[mpiwg-tools] Cram v1.0
Todd Gamblin
tgamblin at llnl.gov
Tue Sep 23 18:25:44 CDT 2014
Hi all,
I've just released version 1.0 of Cram, a new PMPI tool that members of
this list may be interested in:
https://github.com/scalability-llnl/cram
Cram lets you run many small MPI jobs within a single, large MPI job by
splitting MPI_COMM_WORLD up into many small communicators. This is not
unprecedented; tools have done this for a while. However, I think it's
the first tool that actually makes it easy.
Cram has command-line and Python scripting interfaces that allow you to
create "cram files". Each cram file is packed with an ensemble of jobs,
where a job comprises:
- process count
- working directory
- command line arguments
- environment variables
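For illustration only, one way to picture a job entry in memory is the C struct below. The type `cram_job`, its field names, and the helper `total_procs` are hypothetical and are not taken from Cram's source; they simply mirror the four items listed above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical in-memory form of one job in a cram file.
   Field names are illustrative, not Cram's actual types. */
typedef struct {
    int nprocs;               /* process count */
    const char *working_dir;  /* working directory */
    int argc;                 /* number of command-line arguments */
    const char **argv;        /* command-line arguments */
    int nenv;                 /* number of environment entries */
    const char **env;         /* environment variables, "NAME=value" */
} cram_job;

/* Sum of process counts across the ensemble: the large MPI job
   would need at least this many ranks to run every job at once. */
int total_procs(const cram_job *jobs, size_t njobs) {
    int total = 0;
    for (size_t i = 0; i < njobs; i++) {
        assert(jobs[i].nprocs > 0);  /* each job needs at least one process */
        total += jobs[i].nprocs;
    }
    return total;
}
```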
When you link against libcram (or libfcram) and launch your job with a
particular cram file, Cram will split COMM_WORLD and run all the jobs in
the cram file independently. And yes, it handles args for Fortran, at
least on BG/Q. If people want that capability on some other platform, let
me know.
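Here is a minimal sketch of how ranks might be assigned to jobs, assuming each job gets a contiguous block of ranks in the order the jobs appear in the cram file (an assumption, not necessarily how Cram lays them out). The hypothetical helper `job_for_rank` returns the job index for a world rank, which is the kind of value one would pass as the `color` argument to `MPI_Comm_split(MPI_COMM_WORLD, color, rank, &job_comm)`; ranks past the end of the ensemble would pass `MPI_UNDEFINED` instead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: map a rank in the large MPI job to the index of the
   small job it belongs to. Jobs are assumed to occupy contiguous
   blocks of ranks, in file order. On success, *local_rank (if non-NULL)
   receives the rank's position within its job. Returns -1 for ranks
   past the end of the ensemble. */
int job_for_rank(const int *nprocs, size_t njobs, int rank, int *local_rank) {
    assert(rank >= 0);
    int start = 0;  /* first world rank of job i */
    for (size_t i = 0; i < njobs; i++) {
        if (rank < start + nprocs[i]) {
            if (local_rank != NULL) {
                *local_rank = rank - start;
            }
            return (int)i;
        }
        start += nprocs[i];
    }
    return -1;
}
```

Passing the world rank as the `key` to `MPI_Comm_split` keeps ranks in their original relative order inside each job's communicator, so a process's rank in the new communicator matches `local_rank`.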
Cram is NOT a job scheduler; it is a simple, lightweight layer between
your jobs and the MPI runtime. There are no plans to support things like
job queues or emulating a resource manager inside an MPI job.
Cram was created to allow automated test suites to pack more jobs into a
BG/Q partition, and to run large ensembles on systems where the scheduler
won't scale. On BG/Q, for example, SLURM can run ~20,000 simultaneous
jobs before it falls over. With Cram we've been able to run 1.5 million
MPI jobs at once, i.e. one job per core on all of Sequoia. This has been
useful so far for running large ensembles of uncertainty quantification
jobs.
For full documentation, see the GitHub page above.
Thanks,
-Todd