[YLUG] Server purchase

Richard G. Clegg richard at richardclegg.org
Sat May 20 11:55:38 BST 2006


Doug Winter wrote:

> Why do you need a heartbeat and a hot spare?  Can you really not cope 
> with even 30 mins downtime in the case of catastrophic failure?  My 
> experience is that hot failover is *way* more hassle than it's really 
> worth, and introduces a load of complex failure modes that are difficult 
> to test.  

I'm torn on this subject.  However, there are only two people who are 
capable of restoring the system at the moment and quite often we're both 
away at the same time.

> The things that break are the things with moving parts: PSU, disks and 
> fans.  RAM goes sometimes, although that's rarer.  Expect the moving 
> parts to fail.  MTBF on disks has gone through the floor with their 
> increasing capacity.  I used to reckon on 100 years, but now I think 
> it's more like 20.

Hmm... My "two disk failures in the last four weeks" seems relatively 
unlucky.
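
As a rough check on just how unlucky, here is a minimal sketch, assuming a
constant failure rate (a Poisson model) and a hypothetical machine with four
disks. The disk count, the function name and the model are illustrative
assumptions, not figures from this thread.

```python
import math


def prob_at_least_two_failures(mtbf_years, n_disks, window_weeks):
    """Poisson estimate of P(two or more disk failures) within the window.

    Assumes independent disks with a constant failure rate of 1/MTBF each.
    """
    window_years = window_weeks * 7 / 365.25
    # Expected number of failures across all disks during the window.
    lam = n_disks * window_years / mtbf_years
    # P(N >= 2) = 1 - P(N = 0) - P(N = 1)
    return 1.0 - math.exp(-lam) * (1.0 + lam)


if __name__ == "__main__":
    # n_disks=4 is a made-up figure for illustration only.
    for mtbf in (100, 20):
        p = prob_at_least_two_failures(mtbf, n_disks=4, window_weeks=4)
        print(f"MTBF {mtbf} years: P(>=2 failures in 4 weeks) = {p:.6f}")
```

Under these assumptions the chance stays well below one percent for either
MTBF figure, so two failures in four weeks points to bad luck or to disks
that are not failing independently (same batch, heat, power).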

> Finally, and I think more importantly than anything else in the entire 
> world of running servers: TEST YOUR BACKUPS.  

Having reinstalled from scratch three times recently, I think I can say, 
hand on heart, that the backups do all work. :-)

Thanks very much for the suggestions though, all useful stuff.

-- 
Richard G. Clegg,
Networks & NonLinear Dynamics Group,
Dept. of Maths, Uni. of York.
http://www.richardclegg.org/


