[Gllug] Redbus outage

Jason Clifford jason at ukpost.com
Wed Mar 2 14:03:33 UTC 2005


On Wed, 2 Mar 2005, Liam Delahunty wrote:

> Just to conclude this whole tale of woe, we're back online now (back
> at 11:20), so we were out for just over 24 hours.
> 
> They replaced the PSU and internal cables in our box.
> 
> Now I need to work on a recovery plan that includes the fscking
> datacentre being attacked by aliens.
> 
> I mean even if we had another box elsewhere, wouldn't DNS be pointed
> at the old box and so email/web requests would still go there? Updates
> would take 24 hours or so wouldn't they?

Set up DNS so that TTL values on the records are no more than you need
them to be (30 minutes?). If you have multiple servers in different
locations, all holding the correct data to provide the services, use
round robin DNS balancing so that load is already distributed.

This way if a location becomes unavailable you don't lose the service
entirely and you can remove the "bad" entries from DNS within 30 minutes.
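
To make the timing concrete, here is a rough toy model in Python. It is
only a sketch and not real DNS software: the names Zone and
CachingResolver are made up for illustration, the addresses come from
the reserved documentation ranges, and it assumes a resolver that keeps
an answer for exactly the TTL and then asks again.

```python
class Zone:
    """Toy authoritative zone: one name, several A records, one TTL."""

    def __init__(self, ttl, addresses):
        self.ttl = ttl                  # seconds a resolver may cache an answer
        self.addresses = list(addresses)
        self._queries = 0               # number of answers handed out so far

    def answer(self):
        """Return every address, rotated one place per query (round robin)."""
        if not self.addresses:
            return []
        k = self._queries % len(self.addresses)
        self._queries += 1
        return self.addresses[k:] + self.addresses[:k]

    def remove(self, address):
        """Pull a failed site out of the record set."""
        self.addresses.remove(address)


class CachingResolver:
    """Toy resolver that reuses a cached answer until its TTL runs out."""

    def __init__(self, zone):
        self.zone = zone
        self.cached = None
        self.expires_at = 0

    def lookup(self, now):
        """Return the address list as seen at time `now` (in seconds)."""
        if self.cached is None or now >= self.expires_at:
            self.cached = self.zone.answer()
            self.expires_at = now + self.zone.ttl
        return self.cached


if __name__ == "__main__":
    # Hypothetical setup: two sites, 30 minute TTL.
    zone = Zone(ttl=1800, addresses=["192.0.2.10", "198.51.100.20"])
    resolver = CachingResolver(zone)

    print(resolver.lookup(now=0))      # both sites, cached for 1800 seconds
    zone.remove("192.0.2.10")          # that site has just lost power
    print(resolver.lookup(now=1799))   # stale cache still lists the dead site
    print(resolver.lookup(now=1800))   # cache expired, only the live site
```

In this model the dead address lingers in a cache for at most one TTL
after you remove it, which is where the "within 30 minutes" figure comes
from. Until then, clients that were handed the other address first are
unaffected.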

You might also consider that over the past 2 years RedBus have had 
multiple power failure issues at Harbour Exchange while the Meridian Gate 
and Sovereign House sites have not suffered in the same way.

Our servers (and our colo facility) are at Meridian Gate. Clara provide 
our upstream bandwidth. We suffered about 10 minutes of bandwidth outage 
as a result of yesterday's fiasco as the main router for Clara's RedBus 
facility is in HEX. Once the routing updates had been made things just 
carried on for us.

Jason Clifford
-- 
UKFSN.ORG		Finance Free Software while you surf the 'net
http://www.ukfsn.org/	   ADSL Broadband from just £21.50 / month 




