[Gllug] squid, load balancing

ccooke ccooke-gllug at gkhs.net
Mon Jan 16 11:52:46 UTC 2006


On Mon, Jan 16, 2006 at 10:54:30AM +0000, SteveC wrote:
> Hello all,
> 
> I'm looking for advice on scaling a website (openstreetmap.org) above
> one box as we're hitting quite high load. The current topology is a db
> machine, the web server www, a tile server for imagery and a machine
> with backup harddrive space. Everything runs debian and apache2.
> 
> We're using squid to do some caching but it really isn't fun to
> configure from my limited experience. Are there other things to consider
> apart from squid? What are the options for load balancing?

Much of this is going to depend on your budget. I'm going to assume that
'cheap' is important - otherwise, you can just throw money at it.

A decent configuration for what you're talking about would be three or
four layers, depending on your needs. The layers can be implemented
separately.

The first thing you need to determine is where your bottleneck is. In
most cases like this, it's the web server - but *not always*.

Second, if you're considering improving the performance of your database,
you need to analyse how you would do that. Perhaps some changes to the
structure could give you the speedups you need? Is the database
normalised? Are there keys for every common search?
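
To make the "keys for every common search" question concrete, here is a
minimal sketch of how you might check it. It uses SQLite from Python's
standard library purely because it runs anywhere - it is not the database
in question - and the tiles table, its columns and the index name are all
made up for illustration. Your own database will have its own EXPLAIN
output, but the idea is the same: ask the planner what it does for a
common query, before and after adding a key.

```python
import sqlite3

def query_plan(conn, sql, params=()):
    """Return SQLite's EXPLAIN QUERY PLAN detail text for one statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the last column of each row.
    return "; ".join(str(row[-1]) for row in rows)

conn = sqlite3.connect(":memory:")

# Hypothetical table - the real schema is not shown in this thread.
conn.execute(
    "CREATE TABLE tiles (id INTEGER PRIMARY KEY, x INTEGER, y INTEGER, path TEXT)"
)
conn.executemany(
    "INSERT INTO tiles (x, y, path) VALUES (?, ?, ?)",
    [(x, y, "tile_%d_%d.png" % (x, y)) for x in range(50) for y in range(50)],
)

# Assumed "common search": look a tile up by its coordinates.
common_search = "SELECT path FROM tiles WHERE x = ? AND y = ?"

plan_before = query_plan(conn, common_search, (10, 20))  # no key yet: table scan
conn.execute("CREATE INDEX idx_tiles_xy ON tiles (x, y)")
plan_after = query_plan(conn, common_search, (10, 20))   # planner can use the key

print("before:", plan_before)
print("after: ", plan_after)
```

If the plan for one of your frequent queries still shows a full scan, that
query is a candidate for a new key before you spend anything on hardware.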

If you can't optimise it further without additional hardware, what
database is it? Does your cgi code update the database? If so, you need
to work out how to arrange that. If you're using mysql and writes are
rare, a star topology is rather useful: You have a 'master' database
which only does writes and a set of slave databases replicating from it
which only get reads. If speed is a real problem, you can even run the
slave databases from a ramdisk (or the newer mysql in-memory storage
engine). 
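
The master/slave split above can be sketched roughly like this. It is only
an illustration of the routing decision, not a database driver: the class
name, the host names and the list of write verbs are my own assumptions,
and a real setup would hand the chosen host to whatever client library you
use.

```python
import itertools

class ReplicatedRouter:
    """Send writes to the master and spread reads round-robin over the slaves."""

    # Assumed list of statement verbs that modify data; extend as needed.
    WRITE_VERBS = {"INSERT", "UPDATE", "DELETE", "REPLACE", "CREATE", "ALTER", "DROP"}

    def __init__(self, master, slaves):
        self.master = master
        # With no slaves configured, reads fall back to the master.
        self._read_cycle = itertools.cycle(list(slaves) or [master])

    def host_for(self, sql):
        """Return the host name that should run this SQL statement."""
        words = sql.split(None, 1)
        verb = words[0].upper() if words else ""
        if verb in self.WRITE_VERBS:
            return self.master
        return next(self._read_cycle)

# Hypothetical host names.
router = ReplicatedRouter("db-master", ["db-slave1", "db-slave2"])
print(router.host_for("UPDATE nodes SET lat = 51.5 WHERE id = 1"))
print(router.host_for("SELECT * FROM nodes WHERE id = 1"))
```

One thing to keep in mind: replication is not instantaneous, so a read sent
to a slave straight after a write may briefly see old data. If your code
depends on reading its own writes, send those particular reads to the
master as well.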

For load balancing, I can recommend the linux virtual server project
(not to be confused with the linux vserver project, which does quite
the opposite job but is also very much recommended!). It's included in
the standard kernel - has been since early in 2.4 - and is very stable.
It's configurable and flexible. You'll want two load balancer servers,
to allow them to fail over safely.

See http://www.linuxvirtualserver.org/ for more information.
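
To give a feel for what the load balancer decides on each new connection,
here is a small sketch of a weighted least-connection policy, which is one
of the scheduling methods LVS offers. This is not LVS code and not how you
configure it (that happens in the kernel, driven by ipvsadm); the class,
the server names and the weights are invented for the example.

```python
class LeastConnectionBalancer:
    """Pick the real server with the fewest active connections per unit of weight."""

    def __init__(self, weights):
        # weights: mapping of server name -> relative capacity (must be > 0)
        self.weights = dict(weights)
        self.active = {name: 0 for name in self.weights}

    def acquire(self):
        """Choose a server for a new connection and count it as active."""
        # min() returns the first minimum, so ties go to the earliest-listed server.
        name = min(self.active, key=lambda n: self.active[n] / self.weights[n])
        self.active[name] += 1
        return name

    def release(self, name):
        """Record that a connection to the named server has finished."""
        if self.active[name] > 0:
            self.active[name] -= 1

# Hypothetical web servers; www2 is assumed to have twice the capacity of www1.
lb = LeastConnectionBalancer({"www1": 1, "www2": 2})
picks = [lb.acquire() for _ in range(4)]
print(picks)
```

The weights are what let you mix old and new hardware in one pool: a box
with weight 2 ends up holding roughly twice as many connections as a box
with weight 1.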

For an 'ultimate' configuration, optimising everything, I'd suggest an
architecture like this:

    [ Load balancer pair ]          First LB layer

       [squid] [squid]              Cache layer

    [ Load balancer pair ]          Second LB layer

     [http] [http] [http]           Web server layer

    [ Load balancer pair ]          Third LB layer

  [readDB] [readDB] [writeDB]       DB layer

(The three LB layers can be the same pair of load balancers - the
connection just passes through them three times)
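
The reason the cache layer sits above the web servers can be shown with a
toy model. This is only a sketch of the idea - real squid also honours
expiry and cache-control headers, which this ignores - and the class and
function names here are made up.

```python
class CachingFrontend:
    """Answer repeated requests from memory; only misses reach the backend."""

    def __init__(self, backend):
        self.backend = backend  # callable taking a URL and returning a body
        self.store = {}
        self.hits = 0
        self.misses = 0

    def get(self, url):
        if url in self.store:
            self.hits += 1
            return self.store[url]
        self.misses += 1
        body = self.backend(url)  # this is the only path that loads the web tier
        self.store[url] = body
        return body

backend_calls = []

def web_tier(url):
    """Stand-in for the web server layer; records each request it has to serve."""
    backend_calls.append(url)
    return "content of " + url

cache = CachingFrontend(web_tier)
for url in ["/tile/1", "/tile/1", "/tile/2", "/tile/1"]:
    cache.get(url)

print(cache.hits, cache.misses, len(backend_calls))
```

For mostly-static content like map tiles, the hit rate should be high, so
each layer below the cache sees only a fraction of the incoming traffic.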

I've implemented this configuration several times myself, in different
forms - on Debian, as well. Every package you need is in Debian, and the
standard kernel has all the options you need.

-- 
Charles Cooke, Sysadmin 
Say it with flowers, send a triffid.
-- 
Gllug mailing list  -  Gllug at gllug.org.uk
http://lists.gllug.org.uk/mailman/listinfo/gllug



