[Gllug] Re: Memory usage

Peter Grandi pg_gllug at gllug.for.sabi.co.UK
Thu Oct 13 17:36:53 UTC 2005


>>> On Thu, 13 Oct 2005 17:05:48 +0100, Nix <nix at esperi.org.uk>
>>> said:

[ ... ]

>> But guessing wildly the difference is most likely due to
>> scattered on-disc layout; I have recently reloaded my 'root'
>> partition, thus ensuring more clustering:
>> 
>> http://WWW.sabi.co.UK/Notes/anno05-4th.html#051010
>> 
>> and a 'tar' of the partition became seven times faster, and
>> interactive usage feels rather snappier.

nix> The latter is peculiar, 'cos unless /usr is on your root
nix> filesystem, most of the references to things on your root
nix> filesystem will be for things like /bin/ls and libc, which
nix> are so heavily referenced they should be pretty much
nix> entirely resident in the page cache.

Such clueless speculation might be avoided by the simple device
of reading the notes in the link above.

And no thanks for assuming, willy-nilly, that I am similarly
clueless and would test filesystem reading speed in a warm
state, despite the very explicit statement to the contrary in
the link above.

nix> (If you're mounting without noatime, you might be triggering
nix> a lot of dirty block flushes, which in large tars and dus and
nix> finds will contend with reads...)

The rather subtle but not large influence of this aspect of
things is discussed in more detail in these entries:

  http://WWW.sabi.co.UK/Notes/anno05-3rd.html#050910
  http://WWW.sabi.co.UK/Notes/anno05-3rd.html#050925

>> Now starting an application that consists of many files (main
>> executable, libraries, fonts, ...) is roughly equivalent to
>> 'tar'ing it up, and if all the bits and pieces are scattered
>> on disc, bad news.

nix> It's completely unlike tarring it up: only the metadata
nix> gets read at once, the rest gets paged in in an effectively
nix> random manner (it's not random, but the initial page-in
nix> order of most programs is very non-linear).

Perhaps you should consider the tree-traversal aspect, given
that it was explicitly written «bits and pieces scattered on
disc». Reading is relatively cheap; moving the arm around the
filesystem tree can dominate even when not much is read from
each file. Compare the times taken by a pure 'find' or 'fsck'
with the full 'tar'ing of the same filesystem here:

  http://WWW.sabi.co.UK/Notes/anno05-3rd.html#050913
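A minimal sketch of that comparison, not from the post
(hypothetical scratch tree, warm cache, so the absolute numbers
mean little; the point is only what each command must touch):

```shell
# Sketch: build a small scratch tree, then compare a metadata-only
# traversal with a full read of every data block. 'find' touches only
# directory entries and inodes; 'tar' additionally seeks to and reads
# each file's data, so on a badly clustered on-disc layout the latter
# pays far more in arm movement. (Everything is cached here, so the
# gap is much smaller than on a real cold disc.)
dir=$(mktemp -d)
i=0
while [ "$i" -lt 50 ]; do
    mkdir "$dir/d$i"
    dd if=/dev/zero of="$dir/d$i/f" bs=1k count=8 2> /dev/null
    i=$((i + 1))
done
time find "$dir" -type f > /dev/null      # metadata only
time tar cf - -C "$dir" . > /dev/null     # metadata plus all data blocks
rm -rf "$dir"
```

Note that the archive is written to a pipe rather than given as
'tar cf /dev/null', because GNU tar special-cases a /dev/null
archive and skips reading file data.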

>> To check this do the following: type "time soffice" and then
>> as soon as the window appears type "CTRL-Q" to terminate it
>> as soon as it is ready. What I get from cold and from warm
>> is: [ ... 10 seconds elapsed vs. 3 seconds elapsed ... ]

nix> That's an *extreme* difference. What does oprofile show?

That's a very common difference for applications that make a
fair number of scattered references to a lot of files...
Sometimes I wonder if ''resource forks'' were actually a good
idea. Bah! What am I saying? :-(.

nix> (And if it *is* the filesystem, what the hell have you done
nix> to it?

I would like to point out that I have explicitly stated in a
previous message that my filesystems have been freshly reloaded
recently, and in any case this is clearly stated again in some
of the entries in the already mentioned discussion of filesystem
performance:

  http://WWW.sabi.co.UK/Notes/anno05-3rd.html#050917
  http://WWW.sabi.co.UK/Notes/anno05-3rd.html#050915

It would be very nice if you actually spared some of your
precious time to read what you are commenting upon instead of
investing it in making unrelated comments.

nix> I have eight-year-old ext2 filesystems

First of all, 'ext2' is a special case: following some
discussion I had with its developers many years ago, it does
both static (preallocation) clustering and dynamic IO
clustering, about which I recently added a bit of discussion:

  http://WWW.sabi.co.UK/Notes/anno05-4th.html#051011b

nix> that don't show that degree of slowdown from run to run.)

Secondly, the more meaningful comparison is not run-to-run, but
cold start on the existing filesystem vs. cold start on the
filesystem as freshly reloaded, if the filesystem has been
updated often, and in particular if it has been almost full for
any length of time. Spatial clustering depends on rewrites and
on how scattered/fragmented the free list itself has become over
time...
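One crude way to put a number on that accumulated scattering (a
sketch, not from the post; it assumes e2fsprogs is installed,
and uses a scratch image file so that no real device or root
access is needed — on a real system one would point 'e2fsck' at
the actual, unmounted device):

```shell
# Sketch: a forced, read-only e2fsck reports the percentage of
# non-contiguous files, a crude measure of filesystem fragmentation.
# Scratch image file instead of a real device.
img=$(mktemp)
dd if=/dev/zero of="$img" bs=1k count=1024 2> /dev/null
mke2fs -F -q "$img"          # tiny scratch ext2 filesystem
e2fsck -n -f "$img"          # summary line ends "(N% non-contiguous)"
rm -f "$img"
```

A freshly made filesystem like this one reports close to 0%
non-contiguous; an old, often-nearly-full one typically reports
considerably more.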

Still, the comparison above between 10 and 3 seconds was about
a cold start of an app versus an immediate restart of the same
app, that is a rather warm start.

If, as shown later, that app does attempt or succeed in opening
a large number of files, and the relevant inodes are widely
scattered, very bad news indeed for the cold start, not so bad
for the warm start.

[ ... ]

>> Also, check the FontConfig/Xft2 font list with
>> 
>> fc-list | sort -df | less -S
>> 
>> as it might also be useful to see if it can be trimmed,
>> especially of asian fonts.

nix> New fontconfig releases have sped up font lookup a bit.
nix> (What's the slowdown here? Anyone done any profiling?)

Newer FontConfigs have font list ''caches'', but it is the
mapping and examination of fonts that might hit heavily.
I haven't looked but perhaps when OOo starts up it grabs a list
of fonts and looks into them.

Ah, just out of curiosity I have done a 'strace -f -e
trace=open,stat64 ....' and it turns out that when starting up
OOo 1.1.5 to an empty window I get this:

----------------------------------------------------------------
$  egrep -c '\<(open|stat64)\(' /tmp/soffice.strace
1660
$  egrep '\<(open|stat64)\(' /tmp/soffice.strace | egrep -c ' = [0-9]'
715
----------------------------------------------------------------
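The counting pipeline itself can be checked on a tiny
hand-written trace (hypothetical pathnames; the real trace file
would come from something like 'strace -f -e trace=open,stat64
-o /tmp/soffice.strace soffice', an assumed invocation since the
exact one was elided above):

```shell
# Sketch: three fake strace lines, two successful accesses and one
# failed one; the first count is attempts, the second is successes
# (a non-negative return value after " = ").
cat > /tmp/demo.strace <<'EOF'
open("/etc/ld.so.cache", O_RDONLY)   = 3
stat64("/usr/lib/missing.so", 0xbf8) = -1 ENOENT (No such file or directory)
open("/lib/libc.so.6", O_RDONLY)     = 3
EOF
egrep -c '\<(open|stat64)\(' /tmp/demo.strace                     # 3 attempted
egrep '\<(open|stat64)\(' /tmp/demo.strace | egrep -c ' = [0-9]'  # 2 successful
rm -f /tmp/demo.strace
```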

As points of comparison, I just tried with AbiWord:

----------------------------------------------------------------
$  egrep -c '\<(open|stat64)\(' /tmp/abiword.strace
1597
$  egrep '\<(open|stat64)\(' /tmp/abiword.strace | egrep -c ' = [0-9]'
1274
----------------------------------------------------------------

For further horror, here is the story for KWord:

----------------------------------------------------------------
$  egrep -c '\<(open|stat64)\(' /tmp/kword.strace
1836
$  egrep '\<(open|stat64)\(' /tmp/kword.strace | egrep -c ' = [0-9]'
1632
----------------------------------------------------------------

Doing some other counting and checking, it looks like relatively
few of those are for fonts, unless one opens the font menu, but
then they are still not that many.

  Note: details omitted, but gathered more or less by visual
  inspection of:
    egrep '\<(open|stat64)\(' /tmp/$WHAT.strace \
      | sort -t '"' +1 | less -S
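One could also let the machine do the tallying instead of
eyeballing the sorted list; a sketch (hypothetical sample lines,
not the real traces) that extracts each quoted pathname and
counts accesses per top-level directory pair:

```shell
# Sketch: pull the quoted pathname out of each open/stat64 line,
# truncate it to its first two components, and tally, to see where
# the accesses cluster.
cat > /tmp/sample.strace <<'EOF'
open("/usr/lib/libfoo.so", O_RDONLY)    = 3
open("/usr/lib/libbar.so", O_RDONLY)    = 3
stat64("/etc/fonts/fonts.conf", 0xbf80) = 0
EOF
egrep '\<(open|stat64)\(' /tmp/sample.strace \
  | sed 's/^[^"]*"\([^"]*\)".*/\1/' \
  | sed 's,^\(/[^/]*/[^/]*\)/.*,\1,' \
  | sort | uniq -c | sort -rn        # "/usr/lib" comes first, count 2
rm -f /tmp/sample.strace
```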

As a final point of comparison, VIM startup:

----------------------------------------------------------------
$  egrep -c '\<(open|stat64)\(' /tmp/vim.strace
187
$  egrep '\<(open|stat64)\(' /tmp/vim.strace | egrep -c ' = [0-9]'
92
----------------------------------------------------------------

Whatever, except for VIM above we see lots and lots of attempted
or successful inode accesses. If those inodes are scattered
around the disc, very bad news. Extensive discussion of locality
in file systems here:

  http://WWW.sabi.co.UK/Notes/anno05-4th.html#051010
  http://WWW.sabi.co.UK/Notes/anno05-3rd.html#050915

As remarked in my futile annotations, the ancient UNIX
assumption that starting a process is cheap is being voided by
frameworks and applications that come with a lot of baggage,
with EMACS probably starting the trend:

  http://WWW.sabi.co.UK/Notes/anno05-3rd.html#050701

-- 
Gllug mailing list  -  Gllug at gllug.org.uk
http://lists.gllug.org.uk/mailman/listinfo/gllug



