[Gllug] Redundant hosting recommendation

Fri Nov 7 10:28:01 UTC 2008

On 6 Nov 2008, at 13:57, Simon Perry wrote:
> I need to come up with a hosting solution for a new site that offers
> 100% uptime but not having done this before I don't know what
> solutions are available? What I have come up with so far is;
>
> 1) Two dedicated servers in separate data centres with a 3rd party
> round robin DNS service

If you're serious about this, at least pick two different hosting
providers for the data centres.

I've experienced:
   - hosting providers going bankrupt, forcing us to move servers at a
day's notice (dot-com bubble... Luckily we *didn't* rely on a single
provider)
   - hosting providers that turn out to have systemic problems with
the way they handle hardware or software upgrades, affecting several
of their sites at once, meaning the disaster recovery site would fail
at the same time as the primary far more often than you should expect.
   - hosting providers who route all their outbound traffic from  
multiple data centres through the same network path, so failures with  
their network providers make everything fail at once.

Also consider that for two data centres to provide full failover you
need at least twice the equipment needed to run the site - each data
centre must be able to take the full load. If traffic is high enough,
going for three sites can be a cheaper alternative, as each site then
"only" needs 50% extra capacity over its normal share and you can
still handle a full site failure (the tradeoff is a slightly higher
risk of having to do a failover, of course, as you now have three
sites that can fail).
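
To make the arithmetic concrete, here is a rough sketch in Python. It
assumes load is spread evenly across sites and that you only plan for
one site failing at a time; the function names are mine, purely for
illustration.

```python
from fractions import Fraction


def capacity_per_site(num_sites: int) -> Fraction:
    """Share of the full site load each data centre must be able to carry
    so that the remaining sites cover everything when one site fails."""
    if num_sites < 2:
        raise ValueError("need at least two sites to survive a site failure")
    # After one failure, num_sites - 1 sites split the full load evenly.
    return Fraction(1, num_sites - 1)


def total_capacity(num_sites: int) -> Fraction:
    """Total provisioned capacity as a multiple of the full site load."""
    return num_sites * capacity_per_site(num_sites)


if __name__ == "__main__":
    for n in (2, 3, 4):
        print(n, "sites:",
              "each carries", capacity_per_site(n), "of full load,",
              "total", total_capacity(n), "x")
```

With two sites each must carry the full load (2x total); with three,
each carries half the load, which is 50% more than its normal one-third
share (1.5x total).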

DNS round robin is not entirely reliable, though providers like
UltraDNS / Neustar do a decent (but expensive) job of handling
failover. The problem is that you'll find many people cache DNS
entries longer than they are supposed to. If you use DNS, you *will*
be unavailable to many people for at least a few minutes when a single
site fails. If this is not acceptable, your only real choice, if you
want to do it all yourself, is to set up BGP (or have an ISP do it for
you) so you can announce multiple routes and have both sites answer on
the same IP. You may still get "blips" when the site fails over, but
they can be made a lot shorter.
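
For illustration, the decision a DNS failover monitor has to make can
be sketched roughly as below. This is not how any particular provider
does it: the function names, the plain TCP probe, the addresses (taken
from the documentation ranges) and the "publish everything if all
checks fail" rule are all my assumptions. Note that it does nothing
about clients that keep stale entries cached, which is the real
limitation described above.

```python
import socket
from typing import Callable, Iterable, List


def tcp_probe(ip: str, port: int = 80, timeout: float = 3.0) -> bool:
    """Hypothetical health check: can we open a TCP connection at all?"""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


def records_to_publish(site_ips: Iterable[str],
                       is_healthy: Callable[[str], bool]) -> List[str]:
    """Return the A records the round robin should hand out right now."""
    ips = list(site_ips)
    healthy = [ip for ip in ips if is_healthy(ip)]
    # Assumption: if every check fails, the monitor is more likely to be
    # broken than all sites at once, so keep publishing every address.
    return healthy or ips


if __name__ == "__main__":
    sites = ["192.0.2.10", "198.51.100.10"]  # placeholder addresses
    print(records_to_publish(sites, tcp_probe))
```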

If your content is mainly static, you can sidestep (part of) the
problem by using a content distribution network such as Panther
Express, which can be set up to cache everything that has been
requested from your site more or less permanently. As a bonus you get
geographic load balancing that can significantly speed up access from
outside the UK - when I tested Panther at a company I worked for, they
served up content to our test script faster than our web server
located on the local host did. Most of these networks use BGP to
announce multiple routes, and so will be very reliable if well
managed. Some of them can also be set up to talk to multiple backends
at your end, so that _they_ handle the load balancing / failover
between your sites. They are at best problematic for dynamic content
though, as they are geared towards longer term caching.
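
One way to make the static / dynamic split explicit to a caching CDN is
the standard HTTP Cache-Control response header. The sketch below is
only an illustration: the extension list, the one-year lifetime and the
function name are my assumptions, and whether a given CDN honours these
headers is something to check with the provider.

```python
import posixpath

# Assumption: these file types never change once published.
STATIC_EXTENSIONS = {".css", ".js", ".png", ".jpg", ".gif", ".ico"}

ONE_YEAR_SECONDS = 365 * 24 * 60 * 60


def cache_control_for(url_path: str) -> str:
    """Pick a Cache-Control header value for a request path."""
    # Ignore any query string when looking at the file extension.
    path_only = url_path.split("?", 1)[0]
    extension = posixpath.splitext(path_only)[1].lower()
    if extension in STATIC_EXTENSIONS:
        # Static asset: let shared caches keep it for a long time.
        return f"public, max-age={ONE_YEAR_SECONDS}"
    # Anything else is treated as dynamic: caches must revalidate.
    return "private, no-cache"


if __name__ == "__main__":
    for path in ("/img/logo.png", "/style.css?v=2", "/account"):
        print(path, "->", cache_control_for(path))
```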

Keep in mind that some data centres - especially smaller ones - in the  
London area are also vulnerable to failures in core infrastructure in  
Docklands - ask hard questions about where they get their  
connectivity. You might find both the data centres you pick get all  
their bandwidth from the exact same locations (one of the peering  
points in Docklands, typically - if you're extra unlucky they may be  
going via the same fibre bundles), and while the big peering points
are typically very reliable, there are enough alternatives with
backup paths that don't go via Docklands that there's no need to
take the chance if the uptime is that critical to you.
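
If you can get traceroute output towards both candidate data centres, a
first rough check for shared upstream paths could look like the sketch
below. The hop lists are made-up placeholders from the documentation
address ranges and the function name is mine. It only compares IP hops
from one vantage point, so it cannot see shared fibre or buildings, and
hops near your own end will always be shared; it is no substitute for
asking the providers directly.

```python
from typing import List, Sequence


def shared_hops(path_a: Sequence[str], path_b: Sequence[str]) -> List[str]:
    """Return the hops of path_a that also appear in path_b, in order."""
    seen = set(path_b)
    return [hop for hop in path_a if hop in seen]


if __name__ == "__main__":
    # Hypothetical hop lists, e.g. copied by hand from traceroute output.
    to_site_one = ["192.0.2.1", "203.0.113.5", "203.0.113.9", "198.51.100.20"]
    to_site_two = ["192.0.2.1", "203.0.113.5", "203.0.113.77", "198.51.100.90"]
    print("Hops common to both paths:", shared_hops(to_site_one, to_site_two))
```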

> 2) A managed redundant hosting solution. Can anyone recommend a
> supplier?

I don't know of anyone who sells this as a "packaged" solution. As
other people have pointed out, data synchronization gets complex the
moment dynamic data is involved, and those issues tend to be very
application specific.

-- 
Vidar Hokstad
Technical Director
Aardvark Media Limited
