[Gllug] Redundant hosting recommendation
Iain M Conochie
iain at shihad.org
Thu Nov 6 17:28:44 UTC 2008
Simon Perry wrote:
> Hi,
>
> I need to come up with a hosting solution for a new site that offers
> 100% uptime but not having done this before I don't know what solutions
> are available? What I have come up with so far is;
>
heh heh heh. The old 100% uptime chestnut.
Simon, I think you will find that 100% uptime is not realistic. Never
EVER sell it as such. There are more ways things can go wrong than you
would believe. Having worked for a hosting provider for most of my
professional life, I shall give you some insights.
1. Data data data
If you have static HTML pages then this is pretty easy. You do not
have to worry about data consistency and so you can use some kind of
load balancing between sites.
However, if you have any kind of database that needs to be WRITTEN to
then it makes things harder. You will need some kind of replication of
the database and also a "fail back" plan which will involve an outage.
Failover in this scenario also involves an outage, but you can probably
cut it down to about 60 seconds or so.
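
To put the 60 second figure in context, here is a rough sketch of the
availability arithmetic. The function names are my own for illustration,
and it assumes a 365 day year; it is only meant to show why even one short
failover already rules out a literal 100%.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumes a non-leap year


def availability(downtime_seconds, period_seconds=SECONDS_PER_YEAR):
    """Return availability as a percentage for the given downtime."""
    return 100.0 * (1 - downtime_seconds / period_seconds)


def downtime_budget(target_percent, period_seconds=SECONDS_PER_YEAR):
    """Return the seconds of downtime allowed per period at a target."""
    return period_seconds * (1 - target_percent / 100.0)


if __name__ == "__main__":
    # One 60 second failover in a year is already below 100%.
    print(f"{availability(60):.5f}% with one 60 second outage")
    # How much downtime a "three nines" promise leaves you per year.
    print(f"{downtime_budget(99.9) / 3600:.2f} hours allowed at 99.9%")
```

Running the numbers the other way round (target first, budget second) is
usually the more useful direction when deciding what to promise.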
2. Failover
Before you invoke your DR site, you need to make sure that the primary
site is definitely down. This is usually done via a string check on a
page on your website. If you have a DB, make sure the page pulls data
from the DB so that it is a full end to end test of your website. Also,
do not have the monitors on a hair trigger, i.e. do not have the site
fail over because a single check failed after 1 second.
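
A minimal sketch of that kind of monitor, using only the Python standard
library. The class and function names, the 10 second timeout and the
threshold of three consecutive failures are all assumptions for
illustration, not values from any particular monitoring product; tune them
for your own site.

```python
import http.client
import urllib.request


def page_contains(url, expected, timeout=10):
    """Fetch url and report whether the body contains the expected string.

    Point this at a page that reads from the DB so the check is end to end.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
    except (OSError, http.client.HTTPException):
        # Connection errors, timeouts and HTTP errors all count as a failure.
        return False
    return expected in body


class FailoverMonitor:
    """Only signal failover after several consecutive failed checks."""

    def __init__(self, check, failures_needed=3):
        self.check = check                    # callable returning True/False
        self.failures_needed = failures_needed
        self.consecutive_failures = 0

    def poll(self):
        """Run one check; return True once failover should be triggered."""
        if self.check():
            self.consecutive_failures = 0     # any success resets the count
        else:
            self.consecutive_failures += 1
        return self.consecutive_failures >= self.failures_needed
```

You would call poll() on a timer, for example every 20 seconds, with
check set to something like lambda: page_contains(url, "some string"),
where both the URL and the string are whatever suits your site.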
3. Failback
Ensure that you have an adequate fail back plan. Again, with a DB site
this will generally involve:
1. Stopping the site
2. Dumping the database from DR
3. Importing DB into PROD
4. Starting PROD and testing
5. Go live
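
The point of writing the fail back steps down is that they run in a fixed
order and you stop at the first one that fails. A small sketch of that
idea follows; run_failback is a hypothetical helper of mine, and the step
actions are placeholders you would replace with your own dump, import and
test procedures.

```python
def run_failback(steps):
    """Run (name, action) pairs in order, stopping at the first failure.

    Each action is a callable returning True on success. Returns a tuple
    of (names of completed steps, name of the failed step or None).
    """
    completed = []
    for name, action in steps:
        if not action():
            return completed, name
        completed.append(name)
    return completed, None


if __name__ == "__main__":
    # Placeholder actions; each real one would do the work and verify it.
    plan = [
        ("stop the site", lambda: True),
        ("dump database from DR", lambda: True),
        ("import DB into PROD", lambda: True),
        ("start PROD and test", lambda: True),
        ("go live", lambda: True),
    ]
    done, failed = run_failback(plan)
    print("completed:", done)
    print("failed at:", failed)
```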
4. Active Active
If it is possible for you to use an active-active, 2 way replication
database environment then I would recommend this. However, this all
depends on your application. MySQL does support master-master
replication, but if you have any auto-increment columns then PRI needs
to generate odd IDs and DR needs to generate even IDs, so that the two
sites never hand out the same key.
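
To illustrate the odd/even split: each site steps its generated IDs by 2
from a different starting point, so the two ranges never collide. In MySQL
this is normally done with the auto_increment_increment and
auto_increment_offset server settings; the Python below only simulates the
resulting sequences, and id_sequence is a name I made up for the sketch.

```python
from itertools import count, islice


def id_sequence(offset, increment=2):
    """Yield the IDs one site would hand out: offset, offset + increment, ..."""
    return count(start=offset, step=increment)


# PRI starts at 1 and DR starts at 2; both step by 2.
pri_ids = list(islice(id_sequence(offset=1), 5))
dr_ids = list(islice(id_sequence(offset=2), 5))

if __name__ == "__main__":
    print("PRI:", pri_ids)
    print("DR: ", dr_ids)
    # No ID is ever generated on both sites.
    print("overlap:", set(pri_ids) & set(dr_ids))
```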
5. Gotchas with load balancing.
Depending on the way you set up the load balancing (and if you even
use it), you may suddenly find that some of the stats you run on your web
logs do not work, as all the requests appear to come from the load
balancer. If possible have a chain like below:
LB
|
|
FW
|
|
Server
where the LB is the default gateway for the firewall and the firewall is
the default gateway for the server. This way no NAT'ing or any other
network magic needs to be performed.
6. Gotchas with DNS
Some DNS servers do NOT honor TTLs. Crazy I know, but it is the truth.
The worst culprits seem to be proxy servers and some very old M$ DNS
servers. Not a lot you can really do about that, I am afraid.
7. Single point of failure.
Ensure redundancy in ALL your equipment. That means:
Dual NICs connected to dual switches
RAID hard disks
Dual Power supplies connected to DUAL power feeds.
You probably want 2 of each server in PROD so that small hardware
failures do not invoke DR, but that will depend on your budget.
8. Test test test
Make sure that before you go live you test all the redundant hardware.
So pull out power cables / network cards / disks etc. Much better to
pick up a hole in testing rather than in production.
I am probably making this sound much more difficult than it really is;
however this is quite an undertaking. It is a complex process but one
that can be done. It will take time and patience but good luck!
Cheers
Iain
> 1) Two dedicated servers in separate data centres with a 3rd party round
> robin DNS service
>
> 2) A managed redundant hosting solution. Can anyone recommend a supplier?
>
> 3) Suggestions please
>
> Regards,
>
> Simon
>
>
--
Gllug mailing list - Gllug at gllug.org.uk
http://lists.gllug.org.uk/mailman/listinfo/gllug