[Nottingham] NTL Cache Performance Suggestions
Robert Davies
nottingham at mailman.lug.org.uk
Sat Mar 1 16:27:01 2003
On Saturday 01 March 2003 14:13, you wrote:
> > I like that: ``Instead of downloading the odd 5MB
> > file from the mirror, I'm
> > going to continuously rsync 80GB but never use 99.9%
> > of it.''
>
> ... although, with sarge practically stalled at the
> moment, the downloads won't even reach that. rsync
> AFAIK takes large quantities of _server_ CPU time -
> which is why jigdo was introduced to replace the
> pseudo image kit
> (http://www.tldp.org/HOWTO/mini/Debian-Jigdo/whyjigdo.html#WHYNOTUSETHEWHOLEPIK).
> So, while the bandwidth donor has to pay by the GB,
> rsync'ing limits the number of concurrent users far
> more.
Look, I just spotted that an FTP mirror I have been mirroring also offers
rsync. Guess what: the 5 minute FTP job, even with little data to
transfer, finished in < 10 seconds over rsync. So in this case the FTP server
would need to allow 30 times as many concurrent users to match the rsync
server. Yes, it's an extreme example, but it approaches the 'normal case' for
mirroring applications.
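The "30 times" figure is just the ratio of session lengths. A minimal
sketch of that arithmetic, using Little's law (average sessions in progress =
arrival rate x session length). The arrival rate below is an assumed,
purely illustrative number; only the 5 minutes and the 10 seconds come from
the example above, and the 10 seconds is taken as an upper bound.

```python
def concurrent_sessions(arrivals_per_second: float, session_seconds: float) -> float:
    """Little's law: average sessions in progress = arrival rate x duration."""
    return arrivals_per_second * session_seconds


FTP_JOB_SECONDS = 5 * 60   # the "5 minute FTP job"
RSYNC_JOB_SECONDS = 10     # "finished in < 10 seconds", used as an upper bound

# Assumption for illustration only: one mirroring client arrives every 10 s.
ARRIVALS_PER_SECOND = 0.1

ftp_slots = concurrent_sessions(ARRIVALS_PER_SECOND, FTP_JOB_SECONDS)
rsync_slots = concurrent_sessions(ARRIVALS_PER_SECOND, RSYNC_JOB_SECONDS)

# The arrival rate cancels out, so the ratio depends only on session length.
slot_ratio = FTP_JOB_SECONDS / RSYNC_JOB_SECONDS

print(f"FTP sessions in progress:   {ftp_slots:.1f}")
print(f"rsync sessions in progress: {rsync_slots:.1f}")
print(f"FTP needs {slot_ratio:.0f}x the user slots")
```

Whatever arrival rate you plug in, the ratio stays the same, which is why
the user limit on the slow service has to be so much higher.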
Now think about the impact that fast service has temporarily on CPU %, but does
that mean it actually consumes more CPU time? Remember there are also
additional hidden system costs in a large number of concurrent processes
running slowly and sitting around for a long time. You might also like to look
at the effect of 1 connect per file transfer on throughput and server load
(just count the forks), and ponder why FTP servers have always had to have
user limits on them. Ever wondered why HTTP mirrors are preferred, and why
programs like Apache pre-fork their server processes?
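To make the CPU % versus CPU time distinction concrete, here is a rough
sketch. The utilisation figures are hypothetical, picked only to show the
shape of the argument; they are not measurements from any server. Only the
two durations echo the example earlier in this mail.

```python
def cpu_seconds(cpu_fraction: float, wall_seconds: float) -> float:
    """Total CPU time consumed = fraction of one CPU used x wall-clock time."""
    return cpu_fraction * wall_seconds


# Hypothetical figures, for illustration only.
fast_job = cpu_seconds(cpu_fraction=0.50, wall_seconds=10)    # brief spike
slow_job = cpu_seconds(cpu_fraction=0.02, wall_seconds=300)   # long trickle

print(f"fast job: {fast_job:.1f} CPU-seconds")
print(f"slow job: {slow_job:.1f} CPU-seconds")

# A high CPU % for a short time need not mean more CPU time overall.
fast_job_is_cheaper = fast_job < slow_job
```

With these made-up numbers the job that spikes the CPU meter still uses
less CPU time in total, and that is before counting the per-process
overhead of the slow sessions that hang around.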
There are some rsync options which will consume considerable server CPU:
compression, obviously (but it shouldn't be needed for .rpm, .deb, .tgz etc.
transfers), similarly encryption (via SSH), and finally the checksum option,
which is also not needed for formats like .rpm and .deb. I've not looked, but
if there's not a way to disable those in the rsyncd server config, someone will
shortly add one to prevent DoS attacks.
Now if CPU time were really the precious resource, rather than bandwidth,
there'd be a lot more people clustering old PCs and using things like Mosix to
get the grunt to overcome this problem. If checksumming of partially
modified files were a serious issue, the server could look the checksums up
from a DB, though I can't see it happening much on an archive for file
distribution.
I don't expect inconvenient facts to sway the /. climate of opinion, but
those who are wedded to old protocols: please install 20 phone lines and bond
dialup modems into a fat pipe, rather than use 1 high-bandwidth line, if you're
really willing to put your money where your mouth is.
Rob