[Nottingham] Linux PC super-cluster possible?

Wed Jul 4 12:35:58 BST 2007

On Tue, 2007-07-03 at 19:30 +0100, Martin wrote:
> Folks,
> 
> Is it possible to...
> 
> Have a number of PCs, interconnected by LAN, appear as a single
> multi-cpu system?
> 
> That is, you write a program with multiple parallel threads, and you can
> then run it unchanged on the multi-PC cluster as though it was on a
> multi-CPU single PC?
> 
> Another requirement is that as far as the application software and LAN
> activity is concerned, contact with anything else on the network that is
> not part of the cluster must appear as a single computer.
> 

It all depends what you have got and what you are trying to do.
You are talking about several different things here.

1. The cluster:
If you want "anything else on the network that is not part of the
cluster must appear as a single computer"  then you are talking about
having an IP masquerade between your cluster and the rest of the
universe.  With the exception of a few awkward protocols (ftp being one)
this is easily achieved.  Give your cluster a single 'public node' and
use iptables to hide the rest of the cluster behind it.

NB1. If you can have a different public node and admin node.
NB2. You really really really want to give the cluster it's own network
switch.

2. Now the application:
Is your application parallel at the process level or sub-process level ?
What language is it being written in ?
To what extent is it parallelisable ? 
How small/big are the units of work ?

If you want to transcode 100 movies you can:
Either launch a transcode processes per cluster node (or CPU) with each
processing a different movie so:
ssh my at node1 transcode movie1.avi movie1.mpg &
ssh my at node2 transcode movie2.avi movie2.mpg &	 
.
.
.
OR you can launch a single transcode process and use PVM to spread the
load across the cluster one movie at a time.

The first method is cheap and easy - you can achieve good results using
script+ssh in 10 minutes and also semi-robust against node failures
(just restart on a different node).

The second is probably more efficient processing wise (especially when
no. films < cluster nodes) but is more difficult to get right because
you may have to deal with things like node failures in your code.

I chose transcode because it can do this.  It is a good example of an
application which can be built to make use a PVM cluster if one is
available but still works fine when one is not.

There are other clustering libraries (http://www.open-mpi.org) too.
Sometimes the one you choose will depend on what else you need  (eg.
language bindings or integration with a load balancer).  None of these
are your traditional threads.

3. Load balancing
You can install a distributed resource manager (eg condor , SGE ,
torque) to manage and load balance jobs across the cluster.  SGE has a
job queue and can balance processes but it integrates with parallel
environments (make -j , PVM , possibly MPI) too.
I use SGE because it was easy to get up and running and the most free
at the time I needed it.  

I don't know about the others but I have made our use of SGE mostly
transparent by the judicious use of shell and perl scripts.

4. SMP and Threads
There are 'single image' products (openssi, Xen, Kerrighed) that can do
automatic job migration around a cluster of machines.   They all seem to
manage process migration but I don't know if any of them do thread
migration.     It also depend on the nature of your processes - mine
often use 100 -- 2GB of memory and there is no way I want those spewed
automagically around my network :)

I recommend http://www.clustermonkey.net/ and the beowulf mailing list
http://www.beowulf.org/mailman/listinfo/beowulf as sources of good
reading material on anything cluster related.

> Possible?
> Hackable?
> Ideas?
> 

Have fun,
Duncan

PS. See you tonight.
> Cheers,
> Martin
>