[Gllug] lots of little files
Peter Grandi
pg_gllug at gllug.for.sabi.co.UK
Sun Oct 16 10:48:23 UTC 2005
>>> On Sat, 15 Oct 2005 09:23:20 +0100, Richard Jones
>>> <rich at annexia.org> said:
rich> On Sat, Oct 15, 2005 at 01:46:40AM +0100, Minty wrote:
Minty> I have a little script, the job of which is to create a
Minty> lot of very small files (~1 million files, typically
Minty> ~50-100bytes each). [ ... ]
rich> I too would be seriously tempted to change to using a
rich> database.
To support this rather wise point, since some people have in all
seriousness been offering ''helpful'' suggestions on how to
''optimize'' the performance of a filesystem subtree with
1,000,000 very small files, perhaps it is worth detailing a bit
why sensible people would be «seriously tempted» by a
database...
First, I have appended two little Perl scripts: one creates a
Berkeley DB database of K records of random length varying
between I and J bytes, and the second does N accesses at random
in that database.
I have a 1.6GHz Athlon XP with 512MB of memory and a fairly
standard 80GB 7200RPM disc. The database is created on a 70%
full 8GB JFS filesystem that was made fairly recently:
----------------------------------------------------------------
$ time perl megamake.pl /var/tmp/db 1000000 50 100
real 6m28.947s
user 0m35.860s
sys 0m45.530s
----------------------------------------------------------------
$ ls -sd /var/tmp/db*
130604 /var/tmp/db
----------------------------------------------------------------
Now, after an interval but without a cold start (for good
reasons), 100,000 random fetches:
----------------------------------------------------------------
$ time perl megafetch.pl /var/tmp/db 1000000 100000
average length: 75.00628
real 3m3.491s
user 0m2.870s
sys 0m2.800s
----------------------------------------------------------------
So we got 130MiB of disc space used in a single file, >2500
records per second sustained on insertion over six and a half
minutes, and >500 records per second sustained on fetching over
three minutes. Now, the scripts and the database are just
quick'n'dirty (e.g. the needless creation of a string for every
record inserted), but it looks like a good start to me, with
nice tidy numbers.
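For the record, those rates are simply the record counts divided
by the elapsed times reported by 'time' above:
  1,000,000 inserts / ~389s   = ~2570 inserts per second
    100,000 fetches / ~183.5s =  ~545 fetches per second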
Well, it would be great to compare this with the 1,000,000 small
files scheme, but today I am feeling lazy, so I will offer only
some arithmetic on that:
* The size of the tree will be around 1M filesystem blocks on
most filesystems, whose block size usually defaults to 4KiB,
for a total of around 4GiB, or can be set as low as 512B, for
a total of around 0.5GiB.
* With 1,000,000 files and a fanout of 50, we need 20,000
directories above them, 400 above those and 8 above those.
So 3 directory opens/reads every time a file has to be
accessed, in addition to opening and reading the file.
* Each file access will therefore involve four inode accesses
and four filesystem block accesses, probably rather widely
scattered. Depending on the size of the filesystem block and
whether the inode is contiguous to the body of the file, this
can involve anything between 2KiB and 32KiB of logical IO per
file access.
* It is likely that the logical IOs relating to the two top
levels of the subtree (the 8 and 400 directories) will be
avoided by caching between 200KiB and 1.6MiB of them, but the
other two levels, the 20,000 bottom directories and the
1,000,000 leaf files, are unlikely to be cached.
If the reader does the math it is pretty easy to see how that
compares, on paper, with 130MiB of space in a single file opened
once, a >2500/s insert rate and a >500/s fetch rate...
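To save a bit of mental arithmetic, here is a tiny Perl sketch of
my own back-of-the-envelope figures for the small-files scheme;
it only encodes the assumptions already made above (fanout 50,
512B or 4KiB blocks, four inode plus four block accesses per file
access), so treat it as an illustration rather than a measurement:
----------------------------------------------------------------
use strict;
use warnings;

# Back-of-the-envelope figures for the 1,000,000-small-files scheme.
my $files  = 1_000_000;
my $fanout = 50;

my $l1 = $files / $fanout;      # 20,000 bottom directories
my $l2 = $l1    / $fanout;      #    400 directories above those
my $l3 = $l2    / $fanout;      #      8 directories above those
print "directories: $l1 + $l2 + $l3\n";

for my $block (512,4096)
{
  my $tree_mib  = $files * $block / 2**20;      # leaf file bodies alone
  my $io_best   = 4 * $block / 2**10;           # inode contiguous with the file body
  my $io_worst  = 4 * 2 * $block / 2**10;       # separate inode and data blocks
  my $cache_kib = ($l2 + $l3) * $block / 2**10; # caching the two top levels

  printf "block %4dB: tree ~%.0f MiB, IO per access %d-%d KiB, top levels ~%.0f KiB\n",
    $block,$tree_mib,$io_best,$io_worst,$cache_kib;
}
----------------------------------------------------------------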
----------------------------------------------------------------
use strict;
use warnings;
# This is just a rough test, not a proper script.
package main;
use Fcntl;
use DB_File;

my $type = 'DB_File';

# Arguments: database path, number of records, minimum and maximum
# record size in bytes.
my ($name,$entries,$minsize,$maxsize) = @ARGV;
my $upper = $maxsize - $minsize + 1;

# Hint the expected number of elements to the hash database.
$DB_HASH->{'nelem'} = $entries;

my %db;
tie (%db,$type,$name,(O_CREAT|O_RDWR),0666,$DB_HASH)
  || die "$!: Cannot tie '$name' of type '$type'";

# Insert $entries records of random length between $minsize and
# $maxsize bytes, keyed by the record number in hex.
while ($entries > 0)
{
  my $size  = int ($minsize + rand $upper);
  my $key   = sprintf "%05x",--$entries;
  my $entry = "@" x $size;
  $db{$key} = $entry;
  undef $entry;
  undef $key;
}

untie %db;
----------------------------------------------------------------
use strict;
use warnings;
# This is just a rough test, not a proper script.
package main;
use Fcntl;
use DB_File;

my $type = 'DB_File';

# Arguments: database path, number of records in it, number of
# random fetches to perform.
my ($name,$entries,$fetches) = @ARGV;

$DB_HASH->{'nelem'} = $entries;

my %db;
tie (%db,$type,$name,O_RDONLY,0666,$DB_HASH)
  || die "$!: Cannot tie '$name' of type '$type'";

# Fetch $fetches records chosen at random and accumulate their
# lengths, to report the average at the end.
my $count  = 0;
my $length = 0;
while ($count < $fetches)
{
  my $entryno = int rand $entries;
  my $key     = sprintf "%05x",$entryno;
  my $entry   = $db{$key};
  $length += length $entry;
  undef $key;
  $count++;
}

$length /= $count;
print "average length: $length\n";

untie %db;
----------------------------------------------------------------