[Gllug] best way to update a single production server?

Jose Luis Martinez jjllmmss at googlemail.com
Fri May 1 13:59:23 UTC 2009


On Fri, May 1, 2009 at 11:21 AM, Khusro Jaleel <kerneljack at gmail.com> wrote:
> Hi everyone,
>
> We have a very important production server that needs updates (just
> simple Debian Etch updates, not Etch to Lenny). There are several
> updates, including kernel and other updates, etc.
>
> Since unfortunately we have only this server, and it MUST be up 24/7,
> I'm not sure how to apply these updates, since a reboot is needed.
>
> Some suggestions have been:
>
> 1. Ask the DC to transparently re-route the IPs on that server to
> another server, run the website from there. Then, apply updates to
> original server, reboot, and point things back.
> 2. Change DNS for the website to somewhere else for a few days
> (depending on TTL), update/reboot, then change DNS back.
>     - I have been told that this will be tricky because not everyone
> respects TTLs; allegedly AOL servers will keep the old IPs for up to 2
> weeks! I'm not sure if this is true or not.
> 3. Set up another server that just shows a maintenance page and point
> the main website to that; then we can update and reboot and point
> things back.
>
> In the longer term, what is the proper way to manage this process? Do
> you guys always install 2 servers in some sort of HA config so that
> while 1 is being updated, the other one takes over?
>
> Or do you use load balancers like F5 BigIP that handle this for you?
>
> Another option might be to use VMs? So we set up each server with 2
> VMs, so while we update one, the other one takes over, but uses the
> same IPs?
>
> Thanks for any insight.
> Khusro

We have so few details about your setup that we can only offer
generalities. If you must upgrade pretty much now (no time for tests, no
time for alternative configurations), I would just wait for the quietest
moment, do the work, and pray for the best; the penalty for not having
built redundancy into your system from the start is unavoidable downtime.
If your server's disks are mirrored, you should detach one half of the
mirror (or take a snapshot if you are not mirroring) so you have
something to recover from if the upgrade fails catastrophically.
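As a rough sketch, assuming a Linux md RAID1 array (or LVM if you are not
mirroring); the device and volume names below are examples only, adjust
them to your own layout:

  # detach one half of the md RAID1 mirror before the upgrade
  mdadm --manage /dev/md0 --fail /dev/sdb1
  mdadm --manage /dev/md0 --remove /dev/sdb1
  # ... apply updates, reboot, check everything works ...
  mdadm --manage /dev/md0 --add /dev/sdb1    # re-add; it resyncs itself

  # or, if root lives on LVM instead, take a snapshot to restore from
  lvcreate --snapshot --size 5G --name pre-upgrade /dev/vg0/root

If the upgrade goes wrong you can boot from the detached half, or restore
files from the snapshot, rather than rebuilding the machine from backups.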

As for the way forward, you hint that your server is a web server, so a
load balancer would work wonders here. I have seen BigIP used and it
works quite well: take the server that needs maintenance offline, do
your work, and bring the machine back online; BigIP will route traffic
gracefully to the other server(s) in a transparent manner.
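If a BigIP is out of budget, a free balancer such as HAProxy gives you
the same pattern on commodity hardware. A minimal sketch of my own (the
names and addresses below are invented, adjust to taste):

  # /etc/haproxy/haproxy.cfg (fragment)
  frontend www
      bind *:80
      default_backend webservers

  backend webservers
      server web1 192.0.2.10:80 check
      server web2 192.0.2.11:80 check

  # to take web1 out of rotation for maintenance, mark it "disabled"
  # on its server line and reload haproxy; drop the keyword and reload
  # again once the machine is back:
  #   server web1 192.0.2.10:80 check disabled

The "check" health checks also mean a crashed backend gets skipped, so
you pick up some unplanned failover on top of planned maintenance.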

To such a setup I would add at least one more machine as a test machine.
If at all possible, you should never apply updates to the machine that
is earning your bread and butter before testing those updates elsewhere.
Test machines can be virtual nowadays, so a single server running
several virtual machines could serve as the test bed for all your
systems.
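The test cycle itself need not be fancy: run the pending updates on the
test machine first and watch what changes, and only repeat it on
production once you are happy. Something along these lines with the
standard Debian tools (nothing specific to your setup):

  apt-get update
  apt-get -s upgrade     # simulate: review what would be changed
  apt-get upgrade
  # reboot if a new kernel came in, then check services and logs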

Three machines, you must be asking? How much is that in money if you use
cheap hardware, and how does it compare to losing your service for any
length of time? You have to do some number juggling, but I find it hard
to believe that one could not find the money for something that *must*
be live 24/7. "Must" in this context means that your enterprise would go
broke if it isn't; if 24/7 is just an aspiration that is not backed by
hard ££££, I suggest you adjust the expectations around you to match
your current setup.
-- 
Gllug mailing list  -  Gllug at gllug.org.uk
http://lists.gllug.org.uk/mailman/listinfo/gllug



