[vox] Who thinks Java is cool?

Bill Broadley bill at broadley.org
Fri Jun 17 00:40:28 PDT 2011


On 06/16/2011 09:13 AM, Alex Mandel wrote:
> I'm curious to hear yours and Bill's take on Celery:
> http://celeryproject.org/

Interesting.  I'm not sure exactly what it's meant for though.  I am a
fan of message passing, in particular of systems that are designed to be
lightweight and async.

I've written some code with AMP, using it as the basis for a p2p
distributed backup system with performance as one of the primary goals.

Amusingly, the examples are identical, although the AMP version is
somewhat more verbose than the Celery project's:

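from twisted.protocols import amp

# The command has to be defined before the protocol class that
# registers a responder for it.
class Sum(amp.Command):
    # Byte-string names so the same code also runs on Python 3.
    arguments = [(b'a', amp.Integer()),
                 (b'b', amp.Integer())]
    response = [(b'total', amp.Integer())]

class JustSum(amp.AMP):
    def sum(self, a, b):
        total = a + b
        print('Did a sum: %d + %d = %d' % (a, b, total))
        return {'total': total}
    Sum.responder(sum)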

> I did talk to several people at a recent conference who were doing
> multi-processing in python and seemed happy with it, but they were
> Geographers not Computer Science and in Geography python is the language
> to go with these days for interoperability and libraries.

Yeah, I'm pretty pleased with multiprocessing.  I spent a weekend trying
to figure out why nothing was scaling with my python implementation
before I found (and cursed) the GIL.

Basically I wanted the following threads:
* one to talk to each file system (per disk head/RAID) and find changed
  files
* one per CPU core/thread for calculating SHA256 or Skein checksums
  and doing the encryption
* one for feeding encrypted, checksummed blobs to the p2p
  server

Multiprocessing handles this very well.  I just set up a queue and I get
perfect scaling.
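
A minimal sketch of that queue setup, covering only the checksum stage.
This is not the actual backup code: the names checksum_worker and
checksum_blobs are hypothetical, SHA256 comes from the standard hashlib
module, and the "fork" start method assumes a Unix-like system.

```python
import hashlib
import multiprocessing

# Assumption: a Unix-like system, where the "fork" start method exists.
ctx = multiprocessing.get_context("fork")


def checksum_worker(in_queue, out_queue):
    """Hash blobs from in_queue until a None sentinel arrives."""
    while True:
        item = in_queue.get()
        if item is None:
            break
        index, blob = item
        out_queue.put((index, hashlib.sha256(blob).hexdigest()))


def checksum_blobs(blobs, workers=None):
    """Return the SHA256 hex digest of each blob, in input order."""
    workers = workers or multiprocessing.cpu_count()
    in_queue = ctx.Queue()
    out_queue = ctx.Queue()
    procs = [ctx.Process(target=checksum_worker, args=(in_queue, out_queue))
             for _ in range(workers)]
    for proc in procs:
        proc.start()
    for item in enumerate(blobs):
        in_queue.put(item)
    for _ in procs:
        in_queue.put(None)  # one sentinel per worker
    # Drain every result before joining so no worker blocks on a full pipe.
    results = dict(out_queue.get() for _ in blobs)
    for proc in procs:
        proc.join()
    return [results[i] for i in range(len(blobs))]


if __name__ == "__main__":
    for digest in checksum_blobs([b"first blob", b"second blob"]):
        print(digest)
```

Each worker is a separate process with its own interpreter, so the GIL
does not serialize the hashing the way it does with threads.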

Then the P2P server would talk to its peers, trading encrypted blobs by
their checksum to ensure the (user defined) redundancy was met, and to
keep its peers honest with challenges.  I wanted to use AMP so I could
say things like:
 1) peer A -> B: Do you have any of these blobs <list of checksums>
 2) peer B -> A: I have these <list>
 3) peer A -> B: please store these <list>
 4) peer A -> B: please register me for these blobs you already have
    <list>
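
The four steps above can be sketched in plain Python.  This is an
in-memory toy under stated assumptions, not AMP and not the real
server: the Peer class, its method names and the replicate() helper are
all hypothetical, and in a real system each method call would be a
message on the wire.

```python
import hashlib


class Peer:
    """Toy in-memory peer holding blobs keyed by SHA256 checksum."""

    def __init__(self, name):
        self.name = name
        self.blobs = {}   # checksum -> encrypted blob bytes
        self.owners = {}  # checksum -> names of peers registered for it

    def have(self, checksums):
        # Step 2: report which of the offered checksums are already stored.
        return [c for c in checksums if c in self.blobs]

    def store(self, owner, blobs):
        # Step 3: store new blobs and register the sender as an owner.
        for blob in blobs:
            checksum = hashlib.sha256(blob).hexdigest()
            self.blobs[checksum] = blob
            self.owners.setdefault(checksum, set()).add(owner)

    def register(self, owner, checksums):
        # Step 4: add the sender as an owner of blobs that already exist.
        for checksum in checksums:
            if checksum in self.blobs:
                self.owners.setdefault(checksum, set()).add(owner)


def replicate(sender, local_blobs, remote):
    """Run steps 1-4 against one remote peer; return what it already had."""
    by_checksum = {hashlib.sha256(b).hexdigest(): b for b in local_blobs}
    already = set(remote.have(list(by_checksum)))  # steps 1 and 2
    missing = [b for c, b in by_checksum.items() if c not in already]
    remote.store(sender, missing)                  # step 3
    remote.register(sender, sorted(already))       # step 4
    return already


if __name__ == "__main__":
    peer_b = Peer("B")
    peer_b.store("C", [b"shared blob"])
    known = replicate("A", [b"shared blob", b"new blob"], peer_b)
    print(len(known), len(peer_b.blobs))
```

Only blobs the remote peer lacks are sent in step 3; for everything
else a checksum is enough, which keeps the traffic small.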

AMP seemed ideal for this: lightweight, async, fast.  I was hoping to
sustain a decent fraction of a GigE network connection even when
handling small encrypted blobs (because often the average file size in a
backup is small).

More info on AMP:
http://twistedmatrix.com/documents/8.2.0/api/twisted.protocols.amp.html

So I guess I don't see the hard part that celery solves.  I was hoping
for more detail, maybe a presentation or paper.  I glanced at the docs
without really finding anything particularly unique.

In general I'd stick with something popular (like twisted) or part of
the language standard (like multiprocessing) unless there was a
significant improvement.  I've seen many small projects die leaving
anyone using them stranded.

