[Gllug] Redundant hosting recommendation

Thu Nov 6 17:28:44 UTC 2008

Simon Perry wrote:
> Hi,
>
> I need to come up with a hosting solution for a new site that offers 
> 100% uptime but not having done this before I don't know what solutions 
> are available? What I have come up with so far is;
>   

heh heh heh. The old 100% uptime chestnut.

Simon, I think you will find that 100% uptime is not realistic. Never 
EVER sell it as such. There are more ways things can go wrong than you 
would believe. Having worked for a hosting provider for most of my 
professional life, I can give you some insights.

1. Data data data

  If you have static HTML pages then this is pretty easy. You do not 
have to worry about data consistency and so you can use some kind of 
load balancing between sites.

  However, if you have any kind of database that needs to be WRITTEN to 
then it makes things harder. You will need some kind of replication of 
the database and also a "fail back" plan which will involve an outage. 
Also failover in this scenario involves an outage but you can probably 
cut it down to about 60 seconds or so.
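To put the 60 second figure in context, here is a rough sketch of the arithmetic behind availability targets. The targets listed are just common examples, not anything a particular provider promises, and the function names are made up for illustration.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # non-leap year


def downtime_budget_seconds(availability_percent: float) -> float:
    """Seconds of downtime per year that an availability target allows."""
    return SECONDS_PER_YEAR * (1 - availability_percent / 100)


def failovers_within_budget(availability_percent: float,
                            outage_seconds: float = 60) -> int:
    """How many outages of `outage_seconds` fit in one year's budget."""
    return int(downtime_budget_seconds(availability_percent) // outage_seconds)


if __name__ == "__main__":
    for target in (99.9, 99.99, 99.999, 100.0):
        budget = downtime_budget_seconds(target)
        count = failovers_within_budget(target)
        print(f"{target}%: {budget:.0f}s per year, {count} failovers of 60s")
```

At 100% the budget is zero, so even a single 60 second failover breaks the promise.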

2. Failover

  Before you invoke your DR site, you need to make sure that the primary 
site is definitely down. This is usually done via a string check on a 
page on your website. If you have a DB, make sure the page pulls data 
from the DB so that it is a full end-to-end test of your website. Also, 
do not put the monitors on a hair trigger, i.e. do not have the site 
fail over because one check failed after 1 second.
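A minimal sketch of such a monitor, assuming a hypothetical status page that prints a known string only when the DB query behind it succeeds. The class name, the threshold of 3 and the example URL and string are all assumptions for illustration. A real monitor would also sleep between checks and ideally check from more than one location.

```python
import urllib.request
from typing import Callable


def fetch_page(url: str, timeout: float = 5.0) -> str:
    """Fetch a page body as text; raises on network or HTTP errors."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


class FailoverMonitor:
    """String check that only trips after several consecutive failures."""

    def __init__(self, fetch: Callable[[], str], expected: str,
                 threshold: int = 3):
        self.fetch = fetch            # callable returning the page body
        self.expected = expected      # string that proves the DB was read
        self.threshold = threshold    # consecutive failures before failover
        self.consecutive_failures = 0

    def check_once(self) -> bool:
        """Run one check; return True if failover should be invoked."""
        try:
            healthy = self.expected in self.fetch()
        except Exception:
            # Any fetch error (timeout, refused, HTTP 5xx) counts as a failure.
            healthy = False
        if healthy:
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
        return self.consecutive_failures >= self.threshold


# Hypothetical usage (URL and marker string are placeholders):
# monitor = FailoverMonitor(
#     lambda: fetch_page("http://www.example.com/status"), "DB OK")
```

A single good response resets the counter, so one slow or dropped check never invokes DR on its own.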

3. Failback

  Ensure that you have an adequate fail back plan. Again, with a DB site 
this will generally involve:

   1. Stopping the site
   2. Dumping the database from DR
   3. Importing the DB into PROD
   4. Starting PROD and testing
   5. Going live
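The steps above have to run strictly in order, and a failure part way through should halt the whole thing rather than carry on to go live. A rough sketch of that idea follows. The step actions are placeholders: what actually stops the site or dumps the DB depends entirely on your setup.

```python
from typing import Callable, List, Optional, Tuple

Step = Tuple[str, Callable[[], bool]]


def run_failback(steps: List[Step]) -> Tuple[List[str], Optional[str]]:
    """Run steps in order and stop at the first one that returns False.

    Returns (names of completed steps, name of the failed step or None).
    """
    completed: List[str] = []
    for name, action in steps:
        if not action():
            return completed, name
        completed.append(name)
    return completed, None


# Placeholder actions; each would wrap the real command for your site.
example_steps: List[Step] = [
    ("stop the site", lambda: True),
    ("dump the database from DR", lambda: True),
    ("import the DB into PROD", lambda: True),
    ("start PROD and test", lambda: True),
    ("go live", lambda: True),
]

if __name__ == "__main__":
    done, failed = run_failback(example_steps)
    print("completed:", done, "failed at:", failed)
```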

4. Active Active

  If it is possible for you to use an active-active environment with 
2-way database replication then I would recommend this. However, this 
all depends on your application. MySQL does support master-master 
replication, but if you have any auto-increment keys then PRI needs to 
generate odd values and DR even values so that the two never collide.
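To see why the odd/even split avoids collisions, here is a small simulation. In MySQL this is normally done with the `auto_increment_increment` and `auto_increment_offset` server variables (increment 2 on both servers, offset 1 on PRI and 2 on DR). The Python below only mimics the resulting ID sequences and does not talk to a database.

```python
from itertools import count, islice
from typing import List


def auto_increment_ids(offset: int, increment: int, n: int) -> List[int]:
    """First n IDs a server would hand out with the given offset/increment."""
    return list(islice(count(offset, increment), n))


pri_ids = auto_increment_ids(offset=1, increment=2, n=5)  # odd IDs on PRI
dr_ids = auto_increment_ids(offset=2, increment=2, n=5)   # even IDs on DR

if __name__ == "__main__":
    print("PRI:", pri_ids)
    print("DR: ", dr_ids)
    print("overlap:", set(pri_ids) & set(dr_ids))
```

Because the two sequences never overlap, rows inserted on either side can replicate to the other without a duplicate key error.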

5. Gotchas with load balancing.

  Depending on the way you set up the load balancing (if you even use 
it), you may suddenly find that some of the stats you run on your web 
logs do not work, as all the requests appear to come from the load 
balancer. If possible, have a chain like the one below:

LB
|
|
FW
|
|
Server

 where the LB is the default GW for the firewall and the firewall is the 
default gateway for the server. This way no NAT'ing or other network 
magic needs to be performed.

6. Gotchas with DNS

  Some DNS servers do NOT honor TTLs. Crazy I know, but true. The 
worst culprits seem to be proxy servers and some very old M$ DNS 
servers. Not a lot you can really do about that, I am afraid.

7. Single point of failure.

 Ensure redundancy in ALL your equipment. That means:

dual NICs connected to dual switches
RAID hard disks
dual power supplies connected to DUAL power feeds
You probably want 2 of each server in PROD so that small hardware 
failures do not invoke DR, but that will depend on your budget.
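A back-of-the-envelope sketch of what doubling up buys you. It assumes the two copies fail independently, which is optimistic: a shared power feed or switch is exactly what breaks that assumption. The 99% figure is purely illustrative.

```python
def redundant_availability(single: float, copies: int = 2) -> float:
    """Availability of `copies` independent components where one suffices.

    `single` is the availability of one component as a fraction (0..1).
    The set is down only when every copy is down at the same time.
    """
    return 1 - (1 - single) ** copies


if __name__ == "__main__":
    for copies in (1, 2, 3):
        combined = redundant_availability(0.99, copies)
        print(f"{copies} x 99% component(s): {combined:.6f}")
```

Each extra copy adds nines but never reaches 1.0, which is the same point as the 100% uptime chestnut.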

8. Test test test

  Make sure that before you go live you test all the redundant hardware. 
So pull out power cables / network cards / disks etc. Much better to 
pick up a hole in testing than in production.

I am probably making this sound much more difficult than it really is; 
however, this is quite an undertaking. It is a complex process but one 
that can be done. It will take time and patience but good luck!

Cheers

Iain

> 1) Two dedicated servers in separate data centres with a 3rd party round 
> robin DNS service
>
> 2) A managed redundant hosting solution. Can anyone recommend a supplier?
>
> 3) Suggestions please
>
> Regards,
>
> Simon
>
>   

-- 
Gllug mailing list  -  Gllug at gllug.org.uk
http://lists.gllug.org.uk/mailman/listinfo/gllug