[vox-tech] mmap()

Thu Oct 24 01:01:54 PDT 2013

On 10/23/2013 11:51 PM, Norm Matloff wrote:
> Thanks, everyone, for the comments.
>
> Someone asked what I am trying to do.  Here it is in a nutshell:
>
> I have an R package, Rdsm, to do shared-memory parallel computing,
> http://cran.r-project.org/web/packages/Rdsm/index.html  It does genuine
> shared-memory access when all the processes (nominally independent R
> processes) are on the same machine, and I'd like to be able to do this
> on a cluster, with the programmer illusion of shared memory.
> Performance of course would be much lower in the cluster case, but
> better than nothing (and probably no worse than Rmpi etc.).

I've minimal experience with Rmpi.  But sub 2us MPI wasn't hard even 7
years ago.  With app -> mmap -> nfs client -> nfs server -> nfs
client -> mmap, I'd be impressed with sub 2ms (a factor of 1000
slower).  Does Rmpi do anything stupid like assume TCP over ethernet?

Even relatively simple (compared to mmap) uses involving Mbox (instead of
maildir) or SQL databases have been problematic over NFS.  Typically,
with tuning, a specific use case, and careful configuration, correctness
can be achieved.  I've never heard it referred to as fast though.

Even older direct approaches for using MPI over shared memory have
often been rather slow compared to NIC level loopback, at least until
the more modern methods like LiMIC, KNEM, XPMEM, and CMA.

> Rdsm runs on top of another R package, bigmemory.  The latter allows
> storage in files rather than solely in memory, so in principle Rdsm
> should work on a cluster running a shared file system.

Byte range locking over NFS seems more about correctness, and I wouldn't
expect it to be particularly fast.  The NFS synchronization model seems
optimized for the common case, which is VERY loosely synchronized, i.e. a
single client accessing a file read/write, or N clients accessing a file
read only.  Two clients accessing the same file read/write is not fast.
For similar reasons MPI-IO can be made to work with NFS, but is
slow and not recommended.  PVFS2 or Lustre seem to handle it much better.
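
For reference, a minimal sketch of what byte range locking looks like
from the application side, in Python.  The helper name update_range, the
temp file and the offsets are made up for illustration.  It runs against
a local file, so it only shows the mechanism; it says nothing about how
long an NFS lock manager would take to grant the same request.

```python
import fcntl
import os
import tempfile


def update_range(path, offset, data):
    """Overwrite len(data) bytes at offset while holding an exclusive
    lock on just that byte range (hypothetical helper, for illustration)."""
    fd = os.open(path, os.O_RDWR)
    try:
        # Blocks until no other process holds a conflicting lock on these bytes.
        fcntl.lockf(fd, fcntl.LOCK_EX, len(data), offset, os.SEEK_SET)
        try:
            os.pwrite(fd, data, offset)
            os.fsync(fd)  # push the write out before giving up the lock
        finally:
            fcntl.lockf(fd, fcntl.LOCK_UN, len(data), offset, os.SEEK_SET)
    finally:
        os.close(fd)


def demo():
    # Create a 64-byte scratch file, update bytes 16..20, read them back.
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"\0" * 64)
        path = f.name
    try:
        update_range(path, 16, b"hello")
        with open(path, "rb") as f:
            return f.read()[16:21]
    finally:
        os.unlink(path)
```

Every update pays for a lock, a write, a sync and an unlock.  On a local
disk that is cheap; over NFS each of those steps can be a round trip to
the server, which is presumably where the time goes.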

> The culprit seems to be mmap(), which apparently works quite differently
> from ordinary file access.  For the latter, the changes made by one
> process seem to propagate to the other processes reasonably quickly.

Indeed, using mmap to map address space onto disk blocks is quite common
and has had a wide range of uses for decades.  Thank god for cheap/fast 64
bit CPUs.  Unfortunately mmap is a poor match for how NFS works.
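
To make the difference from ordinary file access concrete, here is a
small Python sketch.  The function name and the scratch file are
assumptions for illustration.  It stores into a shared mapping, flushes
it, and then reads the same bytes back with a plain read().

```python
import mmap
import os
import tempfile


def mmap_write_then_read(payload=b"shared"):
    """Store payload through a shared mapping, then read it back with
    ordinary file I/O (hypothetical helper, for illustration)."""
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"\0" * mmap.PAGESIZE)  # the file must be non-empty to map
        path = f.name
    try:
        with open(path, "r+b") as f:
            # Length 0 maps the whole file; the mapping is shared by default.
            with mmap.mmap(f.fileno(), 0) as m:
                m[:len(payload)] = payload  # a memory store, no write() call
                m.flush()                   # ask the OS to write dirty pages back
        with open(path, "rb") as f:
            return f.read(len(payload))
    finally:
        os.unlink(path)
```

On one machine the mapping and read() share the same page cache, so the
reader sees the store right away.  My understanding (an assumption, not
something measured here) is that an NFS client has no reason to send a
store into a mapped page to the server until the flush or later
writeback, and other clients only revalidate their caches at open time
or after an attribute timeout, hence the stale views.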

> I'm probably going to give up, and bypass the mmap() part completely in
> the cluster case, settling instead for plain files (calling R's save()
> in one process and load() in the other).

Sounds more practical and will avoid most of the byte range locking/sync
issues.  I've heard of a few related research projects to define a
useful set of thread primitives with some wrapper magic to allow Python 
multiprocessing and Go's goroutines and channels to work across 
machines.  The only working product I know of that works in the somewhat 
general case is ScaleMP.  I suspect that, as far as correctness goes,
ScaleMP would allow R, bigmemory and friends to "just work" on a cluster
of machines.  Performance however is another matter; if you find out,
please let me know.
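
On the plain file approach: one common way to hand a whole object from
one process to another through a shared directory is to write it to a
temporary name and rename it into place, so the reader never sees a half
written file.  A rough Python sketch follows, with pickle standing in
for R's save()/load(); the names publish and fetch and the file name are
made up for illustration.

```python
import os
import pickle
import tempfile


def publish(obj, path):
    """Write obj so a reader sees either the old file or the complete
    new one (hypothetical helper, for illustration)."""
    directory = os.path.dirname(os.path.abspath(path))
    # Temp file in the same directory, so the rename stays on one filesystem.
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(obj, f)
            f.flush()
            os.fsync(f.fileno())  # get the data to stable storage first
        os.replace(tmp, path)     # swap the finished file into place
    except BaseException:
        os.unlink(tmp)
        raise


def fetch(path):
    """Load whatever complete object was last published at path."""
    with open(path, "rb") as f:
        return pickle.load(f)


def demo():
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "shared.bin")
        publish({"x": [1, 2, 3]}, path)
        return fetch(path)
```

The reader opens the file fresh on every fetch, which lines up with how
NFS clients check for changes at open time.  How quickly a new version
becomes visible on another node would still depend on the mount options,
so treat this as a starting point rather than a guarantee.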