[sclug] performance puzzle

Alex Butcher lug at assursys.co.uk
Sat Jul 9 12:27:51 UTC 2005


On Sat, 9 Jul 2005, Tom Dawes-Gamble wrote:

> On Wed, 2005-01-19 at 22:57 +0000, Tom Dawes-Gamble wrote:
>> Hi
>>
>> Here is an interesting situation.
>>
>> # time dd if=/mnt/tmp/foo of=/dev/null bs=1024
>> 1048577+0 records in
>> 1048577+0 records out
>>
>> real    0m8.793s
>> user    0m2.346s
>> sys     0m6.422s
>> # time dd if=/mnt/tmp/foo of=/mnt/tmp/bar bs=1024
>> 1048577+0 records in
>> 1048577+0 records out
>>
>> real    0m27.614s
>> user    0m3.117s
>> sys     0m14.983s

Not grossly poor performance. After all, in the first run the data was read
and thrown away; in the second, it was read and then had to be written back
to disc (and, what's more, to the same filesystem it was being read from).
I'd therefore expect at least a doubling of the 'user' and 'sys' times, and
the results you obtained are consistent with that.

I'd also query your experimental method. Did you run these tests multiple
times and take an average in order to limit experimental error? Did you run
them on a system in run level 1 (i.e. with nothing else running)? The latter
would appear not to be the case, since the gap between the 'real' time and
user+sys suggests to me that the dd process was pre-empted by something
else. A harness along the lines of the sketch below would make the runs more
repeatable.
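
Something like this (a rough sketch, untested; it assumes /mnt/tmp can be
unmounted between runs) would flush the page cache by remounting the
filesystem, and repeat each measurement so you can average the results:

  for run in 1 2 3
  do
      sync
      umount /mnt/tmp && mount /mnt/tmp   # empty the cache for this fs
      time dd if=/mnt/tmp/foo of=/dev/null bs=1024
  done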

>> # time dd if=/mnt/tmp/bar of=/mnt/tmp/baz bs=1024
>> 1048577+0 records in
>> 1048577+0 records out
>>
>> real    0m54.358s
>> user    0m3.076s
>> sys     0m14.857s

This appears more interesting on the surface, but looking a little closer,
the big difference between the last run and this one is in the 'real' time,
rather than 'user' or 'sys'. I wonder whether this result would have been
different if you had run 'sync' or rebooted (and run the dd of=/dev/null
again, to be fair) between the second and third runs.
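
One way to make the write timings comparable is to include the flush of
dirty pages in the measurement, e.g. (a sketch, untested):

  # time the copy *and* the writeback of dirty pages to disc
  time sh -c 'dd if=/mnt/tmp/bar of=/mnt/tmp/baz bs=1024; sync'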

>> # for i in /mnt/tmp/???
>>> do
>>> time md5sum $i
>>> done
>> 4763dafae2d293e98b62979a00308158  /mnt/tmp/bar
>>
>> real    0m21.199s
>> user    0m3.679s
>> sys     0m2.155s
>> 4763dafae2d293e98b62979a00308158  /mnt/tmp/baz
>>
>> real    0m21.194s
>> user    0m3.667s
>> sys     0m2.163s
>> 4763dafae2d293e98b62979a00308158  /mnt/tmp/foo
>>
>> real    0m5.770s
>> user    0m3.685s
>> sys     0m2.005s
>> # ll /mnt/tmp/???
>> -rw-r--r--  1 root root 1073742848 Jan 19 16:33 /mnt/tmp/bar
>> -rw-r--r--  1 root root 1073742848 Jan 19 16:34 /mnt/tmp/baz
>> -rw-r--r--  1 root root 1073742848 Jan 19 16:31 /mnt/tmp/foo
>>
>>
>> Why does foo perform so much better than bar or baz?

In this case, I'd suggest that foo is stored in a faster or more-dense zone
of the disc. Is /mnt/tmp on a logical volume?
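
The zone hypothesis is testable by reading straight from the underlying
block device at different offsets. I'm assuming /dev/sda here; substitute
your actual device, and pick a skip value that stays within the disc:

  # raw sequential read near the start of the disc...
  dd if=/dev/sda of=/dev/null bs=1M count=256
  # ...and the same amount from much further in (skip is in 1M blocks)
  dd if=/dev/sda of=/dev/null bs=1M count=256 skip=20000

Outer zones of a disc typically sustain noticeably higher transfer rates
than inner ones.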

> I didn't ever get a correct answer to this little puzzle I set in
> January.  A number of people did offer the 'disk fragmentation' solution
> but that is not the case here.

I see no evidence indicating that fragmentation is definitely not the case
here. On what basis are you drawing that conclusion? It's easy enough to
check:
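
filefrag (from e2fsprogs) reports how many extents each file occupies, and a
file written in one sequential pass should show far fewer extents than a
badly fragmented one:

  # more extents reported = more fragmentation
  filefrag /mnt/tmp/foo /mnt/tmp/bar /mnt/tmp/baz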

> To add a little more information.
>
> The system has 1 Gig of ram.
> The file system is an ext3 file system.
> The file system was created and mounted, and then the program that
> originally creates foo was run. The system was then rebooted so that no
> data would be in the buffer cache. The file system was remounted and the
> commands above run.
>
> So why do the files bar and baz perform so badly?

Note also that there appears to be some weirdness going on with various
parts of the Linux kernel these days. Performance regressions have been
reported in the IDE drivers, and I've personally noticed some very poor
performance on RAID10 arrays hosted on LSI MegaRaid320 controllers that was
only fixed by a) disabling the controller's built-in read-ahead and b) using
blockdev --setra to crank the kernel's own read-ahead on /dev/sda, /dev/md?
and all LVM LVs up to 4096 or 8192 or so.
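
For reference, that tuning looks something like this (the device and LV
names are examples from my setup, not necessarily yours; readahead is
measured in 512-byte sectors):

  # show the current readahead setting
  blockdev --getra /dev/sda
  # crank it up to 8192 sectors (4MB) on the disc, the md device and each LV
  blockdev --setra 8192 /dev/sda
  blockdev --setra 8192 /dev/md0
  blockdev --setra 8192 /dev/mapper/vg0-lv0   # placeholder LV name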

I'm also becoming disillusioned with Linux's software RAID implementation,
both in terms of reliability and performance. On the reliability side, if a
single block generates a read error, the entire block device is dropped out
of a RAID1 array immediately, rather than doing as BSD does: attempting to
re-write the block from a copy on another block device in the array, and
only dropping the drive out if the /write/ fails. On the performance side, I
would expect reads on RAID1 to perform about the same as reads and writes on
RAID0 arrays, but this is not the case.
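
A quick way to see this for yourself (device names assumed; run on an
otherwise idle box):

  # sequential read from the RAID1 md device...
  dd if=/dev/md0 of=/dev/null bs=1M count=512
  # ...versus a single component disc
  dd if=/dev/sda of=/dev/null bs=1M count=512

If RAID1 reads were balanced across both mirrors the way RAID0 stripes are,
the first read would come out close to twice as fast; in practice it
doesn't.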

> tom.

Best Regards,
Alex.
-- 
Alex Butcher      Brainbench MVP for Internet Security: www.brainbench.com
Bristol, UK                      Need reliable and secure network systems?
PGP/GnuPG ID:0x271fd950                         <http://www.assursys.com/>

