[vox-tech] beowulf cluster

Sun, 12 Jan 2003 10:25:12 +0000

on Fri, Jan 10, 2003 at 04:43:25PM -0800, Shawn P. Neugebauer (spn@ucdavis.edu) wrote:
> On Friday 10 January 2003 03:41 pm, Bill Broadley wrote:
> > On Mon, Jan 06, 2003 at 02:48:38PM -0800, Ryan Detert wrote:
> > > I am looking for a good howto or a really clear book on setting up a
> > > beowulf cluster. I have 3 computers and I am wondering first off if it
> > > would be easier to use NFS or having each node have a completely
> > > functional OS.
> >
> > Hrm, that's a pretty large subject, what exactly are your goals?
> > Resume fodder?  A particular application?  Parallelizing a
> > particular code?
> 
> I hear lots and lots of talk about building beowulf clusters, but I
> don't hear much talk about applications.  Is there existing software
> that will distribute problems across such clusters
> *automatically*?  I'm thinking about high-level software, e.g.,
> Matlab, or octave, or even a more special-purpose application.  Or is
> the power of these clusters only harnessed when I write near-custom,
> MPI-based code that specifically parallelizes *my* problem?  I
> recognize the difficulty in auto-magically parallelizing, but what
> *good* is such a cluster if I have to write custom code all the time?

That's one approach.

Another is to look at the cluster as a cycle-scavenging queue-management
system.  That is, jobs are submitted in batch, and the batching system,
intelligently or otherwise (round-robin is a relatively decent
first-order approximation), allocates jobs to slave nodes.  This isn't as
sexy as NUMA, but it can be useful, and isn't terribly complicated to
set up.

This doesn't do much for interactive processes (unless you can
interactively gin up something, then submit it for batch work).  It can
work well for certain types of database, bioinformatics, quantitative, 
and image processing.
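
For what it's worth, here is a rough sketch of the round-robin
allocation mentioned above, in Python.  The node names and job strings
are made up for illustration, and a real batching system would actually
dispatch the jobs and watch node load rather than just build a table of
who gets what.

```python
from itertools import cycle

def assign_round_robin(jobs, nodes):
    """Deal jobs out to nodes in turn; return {node: [jobs...]}."""
    if not nodes:
        raise ValueError("need at least one node")
    assignment = {node: [] for node in nodes}
    # cycle() repeats the node list forever; zip() stops when jobs run out.
    for job, node in zip(jobs, cycle(nodes)):
        assignment[node].append(job)
    return assignment

# Hypothetical node names and job commands, for illustration only.
nodes = ["node1", "node2", "node3"]
jobs = [f"render frame{i}.pov" for i in range(7)]

schedule = assign_round_robin(jobs, nodes)
for node, queue in schedule.items():
    print(node, queue)
```

Round-robin ignores how long each job takes, which is why it is only a
first-order approximation: it balances job counts, not actual load.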

The fundamental requirement is to have a network of systems with
equivalent capabilities, available to the batching system.  Free
software lends itself well to this, particularly where the proprietary
alternative is licensed on a per-node or per-processor basis.

A _lot_ of cluster-based solutions are still based on the idea of solving
a specific problem by tying together multiple machines, in the special
rather than the general case.  HA, load balancing, cycle scavenging, and
designed segmented searches are typical.

Peace.

-- 
Karsten M. Self <kmself@ix.netcom.com>        http://kmself.home.netcom.com/
 What Part of "Gestalt" don't you understand?
   Geek for hire:  http://kmself.home.netcom.com/resume.html