[vox-tech] mmap()
Bill Broadley
bill at broadley.org
Thu Oct 24 01:01:54 PDT 2013
On 10/23/2013 11:51 PM, Norm Matloff wrote:
> Thanks, everyone, for the comments.
>
> Someone asked what I am trying to do. Here it is in a nutshell:
>
> I have an R package, Rdsm, to do shared-memory parallel computing,
> http://cran.r-project.org/web/packages/Rdsm/index.html It does genuine
> shared-memory access when all the processes (nominally independent R
> processes) are on the same machine, and I'd like to be able to do this
> on a cluster, with the programmer illusion of shared memory.
> Performance of course would be much lower in the cluster case, but
> better than nothing (and probably no worse than Rmpi etc.).
I have minimal experience with Rmpi, but sub-2us MPI latency wasn't hard
even 7 years ago. With app -> mmap -> nfs client -> nfs server -> nfs
client -> mmap, I'd be impressed with sub-2ms (a factor of 1000
slower). Does Rmpi do anything stupid like assume TCP over ethernet?
Even relatively simple (compared to mmap) uses involving mbox (instead of
maildir) or SQL databases have been problematic over NFS. Typically,
with tuning, a specific use case, and careful configuration, correctness
can be achieved. I've never heard it referred to as fast though.
Even older direct approaches for running MPI over shared memory have
often been rather slow compared to NIC-level loopback, at least until
the more modern methods like LiMIC, KNEM, XPMEM, and CMA.
> Rdsm runs on top of another R package, bigmemory. The latter allows
> storage in files rather than solely in memory, so in principle Rdsm
> should work on a cluster running a shared file system.
Byte-range locking over NFS seems more about correctness, and I wouldn't
expect it to be particularly fast. The NFS synchronization model seems
optimized for the common case, which is VERY loosely synchronized, i.e. a
single client accessing a file read/write, or N clients accessing a file
read-only. Two clients accessing the same file read/write is not fast.
For similar reasons MPI-IO can be made to work with NFS, but it is
slow and not recommended. PVFS2 or Lustre seem to handle it much better.
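To make the locking pattern concrete, here is a minimal sketch in Python
of a read-modify-write on one small region of a shared file, holding a
POSIX byte-range lock on just that region. The 8-byte slot layout and the
names add_to_slot and make_slots are made up for illustration; nothing
here is taken from Rdsm or bigmemory. On NFS each lock/unlock pair is
expected to cost round trips to the server, which is why I wouldn't
expect this to be fast.

```python
import fcntl
import os
import struct
import tempfile

# Hypothetical layout: the file is an array of 8-byte signed integers.
RECORD = struct.Struct("<q")


def add_to_slot(path, slot, delta):
    """Add delta to one slot while holding a lock on only that slot's bytes."""
    offset = slot * RECORD.size
    fd = os.open(path, os.O_RDWR)
    try:
        # Exclusive lock on RECORD.size bytes starting at offset.
        # This call blocks until the lock is granted.
        fcntl.lockf(fd, fcntl.LOCK_EX, RECORD.size, offset, os.SEEK_SET)
        try:
            (value,) = RECORD.unpack(os.pread(fd, RECORD.size, offset))
            value += delta
            os.pwrite(fd, RECORD.pack(value), offset)
            os.fsync(fd)  # push the write out before releasing the lock
        finally:
            fcntl.lockf(fd, fcntl.LOCK_UN, RECORD.size, offset, os.SEEK_SET)
    finally:
        os.close(fd)
    return value


def make_slots(n):
    """Create a temporary file holding n zeroed slots and return its path."""
    fd, path = tempfile.mkstemp()
    os.write(fd, RECORD.pack(0) * n)
    os.close(fd)
    return path
```

Two processes calling add_to_slot on different slots do not block each
other; two calling it on the same slot are serialized.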
> The culprit seems to be mmap(), which apparently works quite differently
> from ordinary file access. For the latter, the changes made by one
> process seem to propagate to the other processes reasonably quickly.
Indeed, using mmap to map address space onto disk blocks is quite common
and has had a wide range of uses for decades. Thank god for cheap/fast
64-bit CPUs. Unfortunately, mmap is a poor match for how NFS works.
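A minimal sketch of the single-machine case in Python, assuming a local
filesystem. Two shared mappings of the same file stand in for two
processes, and the names open_shared and demo are made up for
illustration. On one machine both mappings are backed by the same page
cache, so a store through one is visible through the other. Across NFS
clients there is no shared page cache, and as far as I understand the
client mostly revalidates cached data when a file is opened, so a
long-lived mapping on another node has no reason to notice the change.

```python
import mmap
import os
import tempfile


def open_shared(path, size):
    """Map size bytes of path read/write; MAP_SHARED is the default on Unix."""
    fd = os.open(path, os.O_RDWR)
    try:
        return mmap.mmap(fd, size)  # the mapping stays valid after fd is closed
    finally:
        os.close(fd)


def demo():
    size = mmap.PAGESIZE
    fd, path = tempfile.mkstemp()
    os.ftruncate(fd, size)  # the file must be at least as long as the mapping
    os.close(fd)
    writer = open_shared(path, size)
    reader = open_shared(path, size)
    try:
        writer[0:5] = b"hello"
        writer.flush()  # msync(): ask the kernel to write dirty pages back
        return bytes(reader[0:5])
    finally:
        writer.close()
        reader.close()
        os.remove(path)
```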
> I'm probably going to give up, and bypass the mmap() part completely in
> the cluster case, settling instead for plain files (calling R's save()
> in one process and load() in the other).
Sounds more practical, and it will avoid most of the byte-range
locking/sync issues. I've heard of a few related research projects to
define a useful set of thread primitives with some wrapper magic to allow
Python multiprocessing and Go's goroutines and channels to work across
machines. The only working product I know of that handles the somewhat
general case is ScaleMP. I suspect that, correctness-wise, the
combination of ScaleMP, R, bigmemory and friends would "just work" for a
cluster of machines. Performance, however, is another matter. If you
find out, please let me know.
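For the plain-file route, a minimal sketch in Python, with pickle
standing in for R's save() and load(). The names publish and fetch are
made up for illustration. The writer dumps to a temporary file in the
same directory and then renames it over the target, so a reader should
see either the old file or the new one, never a half-written one. This
is an assumption-laden sketch: over NFS a reader may still see the old
version for a while because of client-side caching.

```python
import os
import pickle
import tempfile


def publish(obj, path):
    """Write obj to path by dumping to a temp file and renaming it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    # Same directory as the target, so the rename stays within one filesystem.
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(obj, f)
            f.flush()
            os.fsync(f.fileno())  # make sure the data is out before renaming
        os.replace(tmp, path)
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def fetch(path):
    """Load whatever object was most recently published at path."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

One process calls publish(x, "/shared/x.pkl") and another calls
fetch("/shared/x.pkl"); the path here is just a placeholder for a file on
the shared filesystem.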