[Nottingham] NTL Cache Performance Suggestions

Robert Davies nottingham at mailman.lug.org.uk
Sat Mar 1 16:27:01 2003


On Saturday 01 March 2003 14:13, you wrote:
> > I like that:  ``Instead of downloading the odd 5MB
> > file from the mirror, I'm
> > going to continuously rsync 80GB but never use 99.9%
> > of it.''
>
> ... although, with sarge practically stalled at the
> moment, the downloads won't even reach that. rsync
> AFAIK takes large quantities of _server_ CPU time -
> which is why jigdo was introduced to replace the
> pseudo image kit
> (http://www.tldp.org/HOWTO/mini/Debian-Jigdo/whyjigdo.html#WHYNOTUSETHEWHOLEPIK).
> So, while the bandwidth donor has to pay by the GB,
> rsync'ing limits the number of concurrent users far
> more.

Look, I just spotted that an FTP mirror I have been mirroring also offers an 
rsync service.  Guess what: the 5 minute FTP job, even with little data to 
transfer, finished in < 10 seconds over rsync.  So in this case the FTP server 
would need to allow 30 times as many concurrent users to match the rsync 
server.  Yes, it's an extreme example, but it approaches the 'normal case' for 
mirroring applications.
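
The 30x figure is just Little's law (sessions in progress = arrival rate x 
session length).  A rough sketch using the two timings above; the arrival 
rate is a made-up number, and the ratio does not depend on it:

```python
# Little's law: average sessions in progress = arrival rate * session duration.
# The two durations are the timings quoted above; the arrival rate is a
# hypothetical figure chosen only for illustration.

def concurrent_sessions(arrivals_per_second: float, session_seconds: float) -> float:
    """Average number of sessions open at once for a steady arrival rate."""
    return arrivals_per_second * session_seconds

FTP_SECONDS = 5 * 60         # the 5 minute FTP mirror run
RSYNC_SECONDS = 10           # the same job over rsync ("< 10 seconds")
ARRIVALS_PER_SECOND = 0.5    # hypothetical: one mirror client every 2 seconds

ftp_slots = concurrent_sessions(ARRIVALS_PER_SECOND, FTP_SECONDS)
rsync_slots = concurrent_sessions(ARRIVALS_PER_SECOND, RSYNC_SECONDS)
ratio = ftp_slots / rsync_slots

print(f"FTP needs ~{ftp_slots:.0f} slots, rsync ~{rsync_slots:.0f} (ratio {ratio:.0f}x)")
```

Whatever arrival rate you plug in, the ratio stays at 300 s / 10 s.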

Now think what impact that fast service temporarily has on CPU %, but does 
that mean it actually consumes more CPU time?  Remember there are also 
additional hidden system costs in a large number of concurrent processes 
running slowly and sitting around for a long time.  You might also like to look 
at the effect of 1 connection per file transfer on throughput and server load 
(just count the forks), and ponder why FTP servers have always had to have 
user limits on them.  Ever wondered why HTTP mirrors are preferred and why 
programs like Apache pre-fork their worker processes?
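
To make the CPU % versus CPU time distinction concrete, here is a small 
sketch.  It assumes, purely for illustration, that the job costs the same 
2 CPU seconds whether it runs fast or slow; that figure is invented, not 
measured:

```python
# Utilisation is CPU seconds divided by wall-clock seconds, so a job that
# finishes sooner shows a higher CPU % without using any more CPU time.
# The 2 CPU seconds per job is a hypothetical figure, not a measurement.

def cpu_percent(cpu_seconds: float, wall_seconds: float) -> float:
    """Average CPU utilisation over the lifetime of one job."""
    return 100.0 * cpu_seconds / wall_seconds

CPU_SECONDS_PER_JOB = 2.0    # hypothetical: same CPU work either way

fast = cpu_percent(CPU_SECONDS_PER_JOB, 10)     # job done in 10 seconds
slow = cpu_percent(CPU_SECONDS_PER_JOB, 300)    # job dragged out over 5 minutes

print(f"fast job: {fast:.1f}% CPU for 10 s")
print(f"slow job: {slow:.2f}% CPU for 300 s")
print(f"CPU seconds consumed in both cases: {CPU_SECONDS_PER_JOB}")
```

The fast job looks 30 times "hotter" in top, yet the CPU seconds billed are 
identical, and the slow one ties up a process slot 30 times longer.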

There are some rsync options which will consume considerable server CPU: 
compression, obviously (but it shouldn't be needed for .rpm, .deb, .tgz etc. 
transfers); similarly encryption (via SSH); and finally the checksum option, 
which is also not needed for formats like .rpm and .deb.  I've not looked, but 
if there isn't a way to disable those in the rsyncd server config, someone will 
shortly add one to prevent DoS attacks.
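
On the client side, leaving those options off might look something like 
the sketch below.  The helper, the host name, the module and the destination 
path are all hypothetical; only the rsync flags themselves (-a, -z, -c, -e) 
are real:

```python
# Build an rsync client command for mirroring already-compressed packages.
# mirror_command, the host, the module and the destination are hypothetical.

def mirror_command(src: str, dest: str, compress: bool = False,
                   checksum: bool = False, via_ssh: bool = False) -> list:
    cmd = ["rsync", "-a"]        # archive mode: recurse, keep times and perms
    if compress:
        cmd.append("-z")         # --compress: costs CPU on both ends
    if checksum:
        cmd.append("-c")         # --checksum: every file gets read and hashed
    if via_ssh:
        cmd += ["-e", "ssh"]     # SSH transport adds encryption cost; normally
                                 # used with a host:path source, not rsync://
    return cmd + [src, dest]

# Cheap default: plain rsync daemon, no compression, no full checksums.
cmd = mirror_command("rsync://mirror.example.org/debian/", "/srv/mirror/debian/")
print(" ".join(cmd))
```

With the defaults, rsync decides what to transfer from file size and 
modification time alone, which is all a package archive normally needs.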

Now if CPU time were really the precious resource, rather than bandwidth, 
there'd be a lot more ppl clustering old PCs and using things like Mosix to 
get the grunt to overcome this problem.  If checksumming of partially 
modified files were a serious issue, the server could look the checksums up 
from a DB, though I can't see it happening much on an archive for file 
distribution.

I don't expect inconvenient facts to sway the /. climate of opinion, but 
those who are wedded to old protocols, please install 20 phone lines and bond 
dialup modems into a fat pipe, rather than use 1 high-bandwidth line, if you're 
really willing to put your money where your mouth is.

Rob