[Gllug] Odd (but apparently correct) behaviour from du

John Winters john at sinodun.org.uk
Thu Jul 17 10:14:59 UTC 2008


C. Cooke wrote:
> On Thu, Jul 17, 2008 at 10:09:33AM +0100, John Winters wrote:
>> Nix wrote:
>>> On 16 Jul 2008, John Winters told this:
>>>> The results which came back were startlingly variable.  Instead of a
>>>> consistent 30G or so (slowly increasing) in each snapshot I got wildly
>>>> varying values.  One night's snapshot would be 31G and the next night's
>>>> apparently 2G.  Looking at them manually however they all seemed to be
>>>> complete.
>>> That seems very strange.
>>>
>>> du's process_file() handles hard links by hashing every (inode, dev) it
>>> finds for inodes with a link count >1, then not accumulating sizes or
>>> names for inodes it's seen already (unless --count-links is active).
>>>
>>> Therefore you shouldn't see *varying* output unless your filesystem
>>> is returning readdir() results in a different order every time du runs
>>> (which is possible but really rather unlikely).
>> Possible misconception here.  When I said "varying" I didn't mean
>> "varying from run to run" - just "varying from snapshot to snapshot".
>> If you run it again you get the same result.
> 
> On the other hand, it's doing exactly the right thing here - it's
> telling you how much space each snapshot takes. After all, using hard
> links means you're taking incremental backups; it's expected that
> subsequent snapshots of a largely unchanged tree will be smaller.

Well...  yes and no.  It's not correct to say that the subsequent
snapshots are smaller.  In fact each snapshot is slightly larger than
the previous one (due to the way filing systems monotonically fill up).
You can't make a meaningful pronouncement about the files being really
in one snapshot and not another - once they are hard-linked they're in
both.

As I said in my previous post, it really depends on what you're
expecting.  I was expecting it to tell me the disc usage of each
snapshot separately (because I'd named them separately on the command
line).  Had I asked for a report on a directory which *contained* all
the snapshots then yes, it would have to take account of shared files to
give a correct total.
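
To make the difference concrete, here is a rough Python sketch of the
two accounting schemes.  It is not du's actual code: the function name
usage() and the shared_seen flag are made up for illustration, it sums
apparent file sizes (st_size) where du counts allocated blocks, and it
ignores directories themselves.  The only thing it borrows from Nix's
description is the table of (dev, inode) pairs for files whose link
count is greater than 1.

```python
import os


def usage(roots, shared_seen=True):
    """Return {root: bytes}, counting each hard-linked inode only once.

    shared_seen=True  -> one table of seen inodes across all roots
                         (the behaviour described in this thread).
    shared_seen=False -> a fresh table per root (what I was expecting).
    Hypothetical helper: sizes are apparent sizes, directories are skipped.
    """
    seen = set()
    totals = {}
    for root in roots:
        if not shared_seen:
            seen = set()  # forget inodes met under earlier roots
        total = 0
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                st = os.lstat(os.path.join(dirpath, name))
                if st.st_nlink > 1:
                    key = (st.st_dev, st.st_ino)
                    if key in seen:
                        continue  # already charged to an earlier name
                    seen.add(key)
                total += st.st_size
        totals[root] = total
    return totals
```

With the shared table, a hard-linked file is charged to whichever
directory is named first and the later ones only show what is new in
them, so the figures depend on argument order.  With a table per root
(or, equivalently, one du run per directory) every snapshot reports its
full size.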

Note that I'm not saying that du's behaviour is wrong - just that it's
not obvious that it has to behave in this way *when invoked on
separately named directories*.  It would also help if the Linux man
page weren't missing this information (although as you point out, there
is something there from which the missing information can be inferred).

Cheers,
John

-- 
Gllug mailing list  -  Gllug at gllug.org.uk
http://lists.gllug.org.uk/mailman/listinfo/gllug



