Welcome to Remap.

Remap is a generic execution environment for distributed worker processes. Worker processes for map/reduce are implemented, and work is in progress on "vertex" jobs, which are worker processes for graph data.
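For readers new to map/reduce, here is a minimal single-process sketch of the idea in plain Python. It does not use Remap's API; the names `map_words`, `reduce_counts` and `run_job` are made up for illustration, and the in-memory grouping only stands in for the step where a real system moves data between worker processes.

```python
from collections import defaultdict


def map_words(line):
    """Map step: emit a (word, 1) pair for every word in a line of text."""
    for word in line.lower().split():
        yield word, 1


def reduce_counts(word, counts):
    """Reduce step: sum all counts that were emitted for one word."""
    return word, sum(counts)


def run_job(lines):
    """Tiny in-process driver (hypothetical, not part of Remap).

    Groups the mapper output by key, then calls the reducer once per key.
    """
    grouped = defaultdict(list)
    for line in lines:
        for word, count in map_words(line):
            grouped[word].append(count)
    return dict(reduce_counts(word, counts) for word, counts in grouped.items())


result = run_job(["the quick fox", "the lazy dog"])
```

In a distributed setting the map and reduce functions stay the same in spirit; what changes is that many worker processes run them in parallel over different input files.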

The entire code base is written in Python. It relies on nanomsg for message passing. If you want to run distributed, you need either a distributed file system (cephfs, nfs) or a copy of all files on all disks.

Remap is targeted at developers, researchers and tinkerers with small installations of roughly 1-50 nodes, so it is far from ready for prime-time installations with thousands of nodes. The main benefit is that if you don't have a cluster yet, have plenty of files to process and want to get your feet wet with the technology, Remap can be very helpful.

Remap is extensible, so if map/reduce or 'vertex' (Pregel) jobs do not work for you, you can write your own worker processes as a module on top of a daemon process. Such a module should implement a couple of functions for status reporting, job control and lifetime management. You can always get in touch on GitHub for more information on how to do that; the existing implementations (src/core and src/initiator) should give you some insight into how this is done.
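To give a rough feel for what such a module might look like, here is a sketch of a worker with functions for status reporting, job control and lifetime management. Everything in it is an assumption made for illustration: the class name `CountingWorker` and the methods `start`, `status` and `stop` are hypothetical and are not Remap's actual interface. The code in src/core and src/initiator is the authoritative reference.

```python
class CountingWorker:
    """Hypothetical worker module; names are illustrative, not Remap's API."""

    def __init__(self):
        self.state = "idle"
        self.processed = 0

    def start(self, items):
        """Job control: process a batch of work items."""
        self.state = "running"
        for _ in items:
            if self.state != "running":
                break  # a stop request ends the batch early
            self.processed += 1
        if self.state == "running":
            self.state = "done"

    def status(self):
        """Status reporting: return a snapshot a daemon could forward."""
        return {"state": self.state, "processed": self.processed}

    def stop(self):
        """Lifetime management: mark the worker as stopped."""
        self.state = "stopped"


worker = CountingWorker()
worker.start(["a.txt", "b.txt", "c.txt"])
report = worker.status()
```

Keeping the three concerns in separate methods means a supervising process only needs to know this small surface, whatever the worker does internally.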

There's a wiki with the design, some info to get started and some example implementations.