[vox] Who thinks Java is cool?

Bill Broadley bill at broadley.org
Thu Jun 16 23:59:50 PDT 2011


On 06/15/2011 11:41 PM, Norm Matloff wrote:
>> I kind of think of Go as a second generation C designed to be more OO
>> and multicore friendly.  In particular I'm really fond of channels.
> 
> Unfortunately, I have the opposite point of view.  Thanks for pointing
> us to Go (I'd been vaguely aware of it), but the use of channels would
> be a big negative for me. 

Interesting, I've got the opposite point of view.

> It's generally believed in the parallel
> processing community that the shared-memory paradigm makes for clearer
> code than does message-passing, and I personally agree.

That's an interesting assertion, possibly better discussed in person.

I am quite biased, having taught a grad-level MPI class and run 10-ish
(mostly MPI) clusters.
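
To make the comparison concrete, here's a toy Go sketch of the same
parallel sum done both ways (the names and the trivial workload are of
course made up):

package main

import (
    "fmt"
    "sync"
)

// Shared-memory style: goroutines update one total under a mutex.
func sharedSum(n int) int {
    var mu sync.Mutex
    var wg sync.WaitGroup
    total := 0
    for i := 1; i <= n; i++ {
        wg.Add(1)
        go func(i int) {
            defer wg.Done()
            mu.Lock()
            total += i
            mu.Unlock()
        }(i)
    }
    wg.Wait()
    return total
}

// Message-passing style: goroutines send results, one place sums them.
func channelSum(n int) int {
    results := make(chan int)
    for i := 1; i <= n; i++ {
        go func(i int) { results <- i }(i)
    }
    total := 0
    for i := 0; i < n; i++ {
        total += <-results
    }
    return total
}

func main() {
    fmt.Println(sharedSum(100), channelSum(100)) // both print 5050
}

To my eye the channel version has less to get wrong: nothing is
shared, so there's no lock to forget.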

The HPC world of parallel processing seems to be largely invested in
MPI, which looks orders of magnitude more popular than shared memory,
both in HPC clusters and in HPC codes.  Granted, that's mostly about
delivered performance/scaling and long-term stability for
expensive-to-produce codes.  Sure, OpenMP works, but only for toy
scaling.  If you want to scale by, say, a factor of 100, 1,000, or
10,000 on a code that's not embarrassingly parallel, then you use
message passing.  And while there may be other market reasons, SGI
(the only big shared-memory company/product I know of) doesn't seem
to be doing very well.  Even in computer architecture, things seem to
be moving towards serial point-to-point async connections...
basically message passing.  It hasn't happened to DIMMs yet, but it
seems to be headed that way.

I've had a few experiences with MPI over shared memory (SGI and
MPICH), and both worked rather poorly.  Granted, that's not directly
relevant to getting useful things done with shared memory.  As
counterintuitive as it was, it was faster to send a local message
through a PCIe-attached card than to send the same message through
shared memory.

Seems like the inherent overhead of sharing memory between multiple
sockets (cache invalidations, the broadcasts required for exclusive
use, and so on) means that shared memory is neither high bandwidth
nor low latency.  It is, however, convenient for sharing an
arbitrarily complex data structure via a pointer.  The main problem
is that the ever-growing gap between CPU performance and the distance
to memory means you want to minimize the CPU<->memory distance and
avoid anything shared.  Thus today's push for performance from
multiple cores means that everything that used to be shared (CPU
busses, memory busses, disk busses, networks, and I/O busses) is now
a point-to-point unshared connection (HyperTransport/QPI, 3-4 memory
channels per socket, full-backplane networks, and PCIe).  Shared
means not scaling; Intel finally figured this out after AMD led the
HPC market for years with the Opteron.

Messages, on the other hand, imply nothing shared, are
fire-and-forget, and don't (necessarily, anyway) block.
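
For instance, a buffered Go channel makes the fire-and-forget part
literal; the send returns immediately until the buffer fills (a
minimal sketch):

package main

import "fmt"

func main() {
    // Sends into a buffered channel are fire-and-forget until the
    // buffer (8 messages deep here) fills up.
    msgs := make(chan int, 8)
    done := make(chan bool)

    msgs <- 1 // returns immediately; no receiver is even running yet
    msgs <- 2

    go func() {
        for m := range msgs {
            fmt.Println("got", m) // the receiver drains at its own pace
        }
        done <- true
    }()

    close(msgs) // no more sends; lets the receiver's range loop finish
    <-done
}

An unbuffered channel does block on send until someone receives,
which is exactly the synchronization you sometimes want.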

Additionally, if you think your code might ever need more than a
single computer's worth of performance, then message passing seems
like your only choice.

To show how simple and easy message passing can be, I thought an
example might be useful.  This code:
 * launches pool workers
 * calculates size rows, spread across the pool of workers
 * as soon as a worker goes idle it can ask for more work.  The channel
   can be buffered or not.

The code:

// hand each row index y to whichever worker is idle, starting a
// new worker on each of the first pool passes through the loop
for y := 0; y < size; y++ {
   if y < pool {
      go renderRow(w, h, bytesPerRow, workChan, iter, finishChan)
   }
   workChan <- y
}
close(workChan) // no more rows; lets each worker's range loop exit

renderRow:
for y := range workChan {
  /* calculations for row y */
}
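
In case anyone wants to run it, here's one way to flesh that fragment
out into a complete program.  The per-row work (just filling in bytes
from a made-up function) and the sizes are invented for illustration:

package main

import "fmt"

const (
    w, h = 256, 256 // image size (made up)
    iter = 100      // iteration budget per pixel (made up)
    pool = 4        // number of workers
)

// renderRow pulls row indices off workChan until it is closed,
// computes bytesPerRow bytes for each row, and reports each
// finished row on finishChan.
func renderRow(w, h, bytesPerRow int, workChan <-chan int,
    iter int, finishChan chan<- int) {
    for y := range workChan {
        row := make([]byte, bytesPerRow)
        for x := range row {
            // stand-in for the real per-pixel calculation
            row[x] = byte((x + y*iter) % 256)
        }
        finishChan <- y
    }
}

func main() {
    size := h
    bytesPerRow := w
    workChan := make(chan int) // can be buffered or not
    // buffered, so workers never block while reporting results
    finishChan := make(chan int, size)

    for y := 0; y < size; y++ {
        if y < pool {
            go renderRow(w, h, bytesPerRow, workChan, iter, finishChan)
        }
        workChan <- y
    }
    close(workChan) // lets each worker's range loop exit

    for i := 0; i < size; i++ {
        <-finishChan // wait until every row has been rendered
    }
    fmt.Println("rendered", size, "rows")
}

The one detail worth calling out: finishChan is buffered so a worker
never stalls reporting a row while main is still handing out work.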

Nice, clean and easy.  While Go channels are (at least for now) only for
communicating within a box/node, message passing works fine between
nodes (as is the case with MPI).
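
It's no MPI, but to show messages between nodes aren't painful
either, here's a toy sketch using the standard library's encoding/gob
to send a typed message over TCP (the address and the Work type are
made up, and the "remote" end runs in-process just for the demo):

package main

import (
    "encoding/gob"
    "fmt"
    "net"
)

// Work is a made-up message type; gob handles the encoding.
type Work struct {
    Row  int
    Iter int
}

func main() {
    done := make(chan bool)
    ln, err := net.Listen("tcp", "127.0.0.1:4242")
    if err != nil {
        panic(err)
    }

    // The "remote" node, run in-process just for the demo.
    go func() {
        conn, _ := ln.Accept() // toy code: errors ignored
        var w Work
        if err := gob.NewDecoder(conn).Decode(&w); err == nil {
            fmt.Println("received", w)
        }
        conn.Close()
        done <- true
    }()

    conn, err := net.Dial("tcp", "127.0.0.1:4242")
    if err != nil {
        panic(err)
    }
    gob.NewEncoder(conn).Encode(Work{Row: 7, Iter: 100})
    conn.Close()
    <-done
}

Obviously a toy (no error handling, no framing beyond what gob does),
but the point is the messaging model carries across the wire
unchanged.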

So with all that said: what is the state-of-the-art shared-memory
approach these days?  New languages?  New libraries?  How well do
they scale?  Are any aware of today's ever more complex memory
hierarchies?  Are they good at maintaining process, core, cache, and
memory locality?

> It's too bad, because I really like coroutines, which I see Go includes
> and indeed highlights.  (Python can set them up too, using generators.)
> But lack of shared-memory would be a big drawback for me.

Writing and especially debugging MPI codes is certainly hard, but
that's the cost of scaling to more than a node.  Sure, blood, sweat,
and tears have to be spent minimizing sharing, and parallelizing a
serial code with messages forces exactly that.  Even within a node I
often find message passing the easiest, at least with Python and Go,
which provide nice, safe, fast, race-free message-passing systems so
you don't have to write and debug your own.  Can you give an example
or two where shared memory shines?  I'm curious whether it requires a
different language.
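
To illustrate what I mean by race-free: the Go idiom is to give one
goroutine sole ownership of the data and let everyone else talk to it
over channels, so there's no lock to forget.  A tiny sketch (names
made up):

package main

import "fmt"

// counter owns count outright; other goroutines can only touch it
// by sending on incr, so a data race is impossible by construction.
func counter(incr <-chan int, total chan<- int) {
    count := 0
    for n := range incr {
        count += n
    }
    total <- count // report once incr is closed
}

func main() {
    incr := make(chan int)
    total := make(chan int)
    done := make(chan bool)
    go counter(incr, total)

    for i := 0; i < 10; i++ {
        go func() {
            incr <- 1 // no mutex to take, no race to debug
            done <- true
        }()
    }
    for i := 0; i < 10; i++ {
        <-done
    }
    close(incr)
    fmt.Println("count =", <-total) // always 10
}

No mutex appears anywhere, and only the counter goroutine can touch
count.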


