[Scottish] Re: Large Linux environments

Andrew Back andy at smokebelch.org
Sun Apr 13 15:21:25 BST 2008


On Fri, 11 Apr 2008, Craig Perry wrote:

>> how companies manage
>> large numbers of Linux servers (500-700 machines).
>
> Badly?! :-) Just jesting!
>
>> What distros do
>> data-centers use
>
> There was a time when it was unequivocally redhat. These days i don't think it'd
> be fair to say that; the playing field has opened up slightly (for the better!).
> I know of 2 separate companies near glasgow going down the ubuntu & suse routes
> for medium (100ish node) clusters.

For 'Enterprise' applications you tend to really only have two options: 
Red Hat and Novell. It's a pretty miserable situation where most hardware 
and software vendors will only certify for these two. Sure, Oracle might 
run fine on Gentoo, or Slackware on an HP server. But if you have support 
contracts, the moment they realise you're not running something with a
veneer of corporate gloss and their seal of approval, you are on 
your own.

Application and hardware support aside, with mission-critical systems you 
need some form of OS support, either contracted out or in-house. I have 
been told that HP will offer support for Debian, and I believe they use it 
extensively internally. And Yahoo!, being a big FreeBSD shop, have their own 
team of kernel hackers in-house.

> If you've got budget for good support, i've found the suse sales reps are
> particularly co-operative on price both times i've used them.

Yeah, and you can get even cheaper support on SuSE/Novell Linux if you 
have a relationship with Microsoft. You can buy some of Bill's discounted 
support vouchers and with that the assurance that MS won't sue you for 
the patent infringements in Linux.

Personally I'm wary of Novell.

>> what management software (Elwell says he uses
>> cfengine).
>
> cfengine (crazy config language though ;-), puppet (quite buggy!), adhoc shell
> scripts with SSH & keyed logins. Where i work we use a mixture of these & a
> distributed job scheduling tool (and lots of prayers!) for our sizable compute
> cluster (2,000ish nodes).
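
For anyone wondering what the "adhoc scripts with SSH & keyed logins" 
approach tends to look like, here is a minimal sketch in Python. It is an 
illustration only, not what anyone in this thread actually runs. The 
function names (ssh_command, run_on_host, fan_out), the worker count and 
the timeouts are all made up for the example; the only real things in it 
are the ssh options BatchMode and ConnectTimeout.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def ssh_command(host, remote_cmd):
    """Build the ssh argv for one host (hypothetical helper)."""
    # BatchMode=yes makes ssh fail instead of prompting for a password,
    # so a host without a working key shows up as an error, not a hang.
    return ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5",
            host, remote_cmd]


def run_on_host(host, remote_cmd, runner=subprocess.run):
    """Run one command on one host; return (host, exit code, output)."""
    try:
        proc = runner(ssh_command(host, remote_cmd),
                      capture_output=True, text=True, timeout=30)
        return host, proc.returncode, proc.stdout.strip()
    except (subprocess.TimeoutExpired, OSError) as exc:
        # Exit code None marks "could not run at all".
        return host, None, str(exc)


def fan_out(hosts, remote_cmd, workers=20, runner=subprocess.run):
    """Run the same command on many hosts in parallel threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(
            lambda h: run_on_host(h, remote_cmd, runner), hosts))


# Example use (host names are placeholders):
#   for host, rc, out in fan_out(["node001", "node002"], "uptime"):
#       print(host, rc, out)
```

The runner argument is there only so the fan-out logic can be exercised 
without touching the network. Something like this is fine for one-off 
jobs, but it keeps no record of desired state, which is the gap that 
cfengine and puppet are meant to fill.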

I believe the University of Edinburgh uses LCFG.

http://www.lcfg.org/

A lot of major corporates have big clunky enterprise management platforms 
that are sold on the back of an unattainable utopian vision, as delivered 
by a Rolex sporting sales droid over a sumptuous business lunch. 
Multi-tiered magical systems that supposedly monitor and configure 
everything, and can manage application deployments and workloads across 
heterogeneous environments - or some such similar claims.

>> What other
>> considerations are there when running an operation that large? These
>> machines would be accessible via the public internet.
>
> Well, it's genuinely difficult to keep on top of that many hosts if they're
> visible from the outside. Keeping them secure would probably keep 2x full timers
> on their toes all day :-) In general, lock down everything you can, be as anal
> as your imagination will let you. Setting noexec mount option on /data may
> just be the difference between a success and failure for a scripted attack.
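
With hundreds of hosts, a setting like noexec only helps if something 
checks that it is actually in force. A rough sketch of such a check in 
Python, assuming the usual Linux /proc/mounts layout (device, mount 
point, filesystem type, options, then two numbers). The function names 
and the sample mount table are invented for the example.

```python
def mount_options(mounts_text):
    """Map mount point -> set of options, from /proc/mounts-style text."""
    result = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        result[fields[1]] = set(fields[3].split(","))
    return result


def missing_noexec(mounts_text, required):
    """Return the required mount points that lack the noexec option.

    A path that is not mounted at all is reported as missing too.
    """
    opts = mount_options(mounts_text)
    return [mp for mp in required if "noexec" not in opts.get(mp, set())]


# Made-up sample; on a real host, read the text from /proc/mounts instead.
SAMPLE = """\
/dev/sda1 / ext3 rw 0 0
/dev/sdb1 /data ext3 rw,noexec,nosuid 0 0
tmpfs /tmp tmpfs rw 0 0
"""

print(missing_noexec(SAMPLE, ["/data", "/tmp"]))
```

Run against the sample, this reports /tmp and leaves /data alone.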

And also your application will dictate network architecture. If it's just 
a load of web servers then a flat network is fine. But if you have 
application servers and database servers, and the data is of high value, 
then you may want to take a multi-tiered approach with firewalls between 
layers creating separate security domains. And if your data is on 
NFS filers you may want a separate LAN for this traffic, and maybe even 
one also for backup, and possibly management. All depends on requirements.
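
To make the multi-tiered idea a bit more concrete, here is a toy model 
of it in Python. It is only a sketch: the tier names and port numbers 
are assumptions picked for illustration, and a real deployment would 
express this in firewall rules rather than code. The point is the 
default-deny shape, where each tier may only talk to the next one down 
on a named port.

```python
# Hypothetical policy: (source tier, destination tier) -> allowed TCP ports.
ALLOWED_FLOWS = {
    ("internet", "web"): {80, 443},
    ("web", "app"): {8080},
    ("app", "db"): {5432},
}


def is_allowed(src, dst, port):
    """Return True if traffic from src tier to dst tier on port is permitted."""
    if src == dst:
        # Assumption: hosts within one tier share a flat network.
        return True
    # Anything not listed explicitly is denied.
    return port in ALLOWED_FLOWS.get((src, dst), set())


print(is_allowed("internet", "web", 443))  # public web traffic
print(is_allowed("internet", "db", 5432))  # skips two tiers
print(is_allowed("web", "db", 5432))       # skips the app tier
```

Only the first of the three checks passes; the database is reachable 
from the application tier and nowhere else.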

And if it is mission critical then you might need to think about high 
availability solutions, and possibly disaster recovery with a second site, 
etc.

Andrew


