[Gllug] Odd (but apparently correct) behaviour from du

C. Cooke ccooke-gllug at gkhs.net
Thu Jul 17 09:14:02 UTC 2008

On Thu, Jul 17, 2008 at 10:09:33AM +0100, John Winters wrote:
> Nix wrote:
> > On 16 Jul 2008, John Winters told this:
> >> The results which came back were startlingly variable.  Instead of a
> >> consistent 30G or so (slowly increasing) in each snapshot I got wildly
> >> varying values.  One night's snapshot would be 31G and the next night's
> >> apparently 2G.  Looking at them manually however they all seemed to be
> >> complete.
> > 
> > That seems very strange.
> > 
> > du's process_file() handles hard links by hashing every (inode, dev) it
> > finds for inodes with a link count >1, then not accumulating sizes or
> > names for inodes it's seen already (unless --count-links is active).
> > 
> > Therefore you shouldn't see *varying* output unless your filesystem
> > is returning readdir() results in a different order every time du runs
> > (which is possible but really rather unlikely).
> Possible misconception here.  When I said "varying" I didn't mean
> "varying from run to run" - just "varying from snapshot to snapshot".
> If you run it again you get the same result.

On the other hand, it's doing exactly the right thing here - it's
telling you how much extra space each snapshot takes beyond the ones
listed before it. After all, using hard links means you're taking
incremental backups; it's expected that subsequent snapshots of a
largely unchanged tree will be smaller.
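
A rough Python sketch of the bookkeeping Nix describes may make this
concrete. It is not du's actual code: du_like() and demo() are names
made up for illustration, it sums apparent file sizes (st_size) rather
than disk blocks, and it ignores the space taken by the directories
themselves. It only shows that an inode with more than one link is
charged to whichever argument reaches it first.

```python
import os
import tempfile


def du_like(roots):
    """Return {root: bytes}, charging each (device, inode) pair only once."""
    seen = set()  # (st_dev, st_ino) pairs already accounted for
    totals = {}
    for root in roots:
        total = 0
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                st = os.lstat(os.path.join(dirpath, name))
                if st.st_nlink > 1:
                    key = (st.st_dev, st.st_ino)
                    if key in seen:
                        continue  # already charged to an earlier root
                    seen.add(key)
                total += st.st_size
        totals[root] = total
    return totals


def demo():
    """Build two tiny 'snapshots' sharing one hard-linked file."""
    with tempfile.TemporaryDirectory() as base:
        snap1 = os.path.join(base, "snap1")
        snap2 = os.path.join(base, "snap2")
        os.mkdir(snap1)
        os.mkdir(snap2)
        with open(os.path.join(snap1, "big"), "wb") as f:
            f.write(b"x" * 1000)
        # Unchanged file: the second snapshot just links to the first copy.
        os.link(os.path.join(snap1, "big"), os.path.join(snap2, "big"))
        with open(os.path.join(snap2, "new"), "wb") as f:
            f.write(b"y" * 10)
        forward = du_like([snap1, snap2])
        backward = du_like([snap2, snap1])
        return (forward[snap1], forward[snap2],
                backward[snap2], backward[snap1])


print(demo())  # (1000, 10, 1010, 0)
```

Listing the snapshots in the other order moves the shared file's size
to whichever directory comes first, but the grand total stays the same.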

> > 
> >> I got the right answer.  Now hands up all those who knew that du behaved
> >> this way.
> > 
> > Er, I thought it was obvious that it had to do something like this from
> > the moment I first met du. I don't see how else it could possibly give
> > the right totals in the presence of hard links.
> I suppose it depends what you're expecting it to do.  When I invoked it
> with a list of directories I rather expected it to tell me the disc
> usage for each of those directories separately.  The author on the other
> hand clearly expected it to provide values which could subsequently be
> added together to give a correct total for all the directories together.
> Once you're working towards the latter objective then clearly it does
> need to work in the way which it does.

du is supposed to give you a reading of disk usage; if it could, by
default, report that you have used more disk space than you *have*,
that would quite rightly be considered a bug.

If it helps, you might like the -l (--count-links) switch, which tells
du to count hard-linked files multiple times.
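
To illustrate the difference that switch makes, here is a small Python
sketch under the same caveats as any toy model: tree_bytes() and
compare() are hypothetical helpers, not anything from du itself, and
they sum apparent file sizes rather than disk blocks. With count_links
set, every link is counted wherever it turns up, so the per-directory
figures look "complete" but no longer add up to the space really used.

```python
import os
import tempfile


def tree_bytes(root, seen, count_links=False):
    """Sum file sizes under root.

    seen holds (st_dev, st_ino) keys that were already counted; it is
    only consulted when count_links is False.
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            if st.st_nlink > 1 and not count_links:
                key = (st.st_dev, st.st_ino)
                if key in seen:
                    continue  # skip an inode counted earlier
                seen.add(key)
            total += st.st_size
    return total


def compare():
    """Return per-directory sizes without and with link counting."""
    with tempfile.TemporaryDirectory() as base:
        monday = os.path.join(base, "monday")
        tuesday = os.path.join(base, "tuesday")
        os.mkdir(monday)
        os.mkdir(tuesday)
        with open(os.path.join(monday, "data"), "wb") as f:
            f.write(b"d" * 2048)
        os.link(os.path.join(monday, "data"),
                os.path.join(tuesday, "data"))

        seen = set()
        default = [tree_bytes(d, seen) for d in (monday, tuesday)]
        seen = set()
        counted = [tree_bytes(d, seen, count_links=True)
                   for d in (monday, tuesday)]
        return default, counted


print(compare())  # ([2048, 0], [2048, 2048])
```

The second pair sums to 4096 bytes although only 2048 are stored, which
is exactly the over-reporting that the default behaviour avoids.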

d=(1 0 6 0 1 0 5 5 41 5 3 12 4 5 15 1 4 -2 5 5 0 5 4 24 3 5 27 1 3 -2 1 3 6)
a=0;while :;do ((v=(c=a)+3));((x=d[d[a]]-d[d[a+1]]));d[d[a]]=$x;((a=d[d[a]]\
<0?${d[a+2]}:v));case $a in -1)read d[d[c]];a=$v;;-2)echo ${d[d[c+1]]};a=$v\
;;0)exit;;esac;done 2>&- # Charles Cooke, Sysadmin.  