[Mpi-22] reConfigurable MPI

siavash ghiasvand siavash.ghiyasvand at [hidden]
Thu Jul 1 14:26:39 CDT 2010


>
> You seem to be interested in the situation where resources are added to a
> cluster (or maybe freed up by other jobs completing) and having a running
> MPI job get notified asynchronously that there are newly available resources
> it can make a bid for.

Absolutely, this is what I want to do. But first I tried to find out why
MPI, as the leading standard in the HPC cluster world, didn't include this
concept (maybe it's totally against the HPC world!). With guidance from
Mr. Solt, Mr. Gropp, and you, I now know the answer to that "Why?"

> This idea of the cluster manager pushing resources to a job without regard
> to where the job is in its execution would bring lots of new issues. I am
> not aware of anybody having made a serious attempt to even define what would
> be needed inside the MPI standard to let applications catch and act on an
> asynchronous notification like this.
>

I have heard (I'm not sure) of something like this in the "MPI/GAMMA
Project" [1], which pushes additional resources to a running MPI cluster;
when the running program reaches an MPI_Barrier point, those new resources
get involved (completely asynchronously).

> My first guess is that pushing an offer of additional resource would not be
> very hard to design into a resource manager but the MPI API side of how to
> react asynchronously to that offer would be very complex.
>

You are right, handling this automatically is really daunting. For example,
if we have divided a loop among 5 machines and the cluster is already
running, how can we involve a new (6th) machine without restarting the
entire cluster?!
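
To make the 5-to-6 machine question concrete, here is a minimal sketch in
plain C (no MPI calls) of the bookkeeping involved. It assumes the loop runs
inside an outer sequence of steps with a synchronization point between them,
so the split can be recomputed from the current worker count at each step.
The names chunk_t and chunk_for are hypothetical, chosen only for this
illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Half-open iteration range [begin, end) owned by one worker. */
typedef struct {
    size_t begin;
    size_t end;
} chunk_t;

/* Block-partition n_iters loop iterations over n_workers workers.
 * The first (n_iters % n_workers) workers each get one extra iteration,
 * so the chunks are contiguous and cover [0, n_iters) exactly once. */
chunk_t chunk_for(size_t n_iters, size_t n_workers, size_t rank)
{
    assert(n_workers > 0 && rank < n_workers);

    size_t base  = n_iters / n_workers;   /* iterations every worker gets */
    size_t extra = n_iters % n_workers;   /* leftover iterations to spread */

    chunk_t c;
    c.begin = rank * base + (rank < extra ? rank : extra);
    c.end   = c.begin + base + (rank < extra ? 1 : 0);
    return c;
}
```

With 100 iterations, chunk_for(100, 5, rank) gives each of 5 workers 20
iterations; after a 6th machine joins, every worker calls
chunk_for(100, 6, rank) at the next synchronization point instead. The
arithmetic is the easy part. Moving the data each new chunk needs, and
agreeing on the new worker count, is where the real difficulty lies.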

> Running job decides to try for more resource vs resource manager tries to
> volunteer more resource to running job

In PVM we have two functions, pvm_addhosts and pvm_delhosts [2], and they
can more or less handle the first type ("Running job decides to try for
more resource"). The big issue is with the second one, "resource manager
tries to volunteer more resource to running job", which means jobs are not
aware of those new resources.

Any help or ideas on this concept would be greatly appreciated.

[1]  http://www.disi.unige.it/project/gamma/mpigamma/
[2]  http://docs.cray.com/books/004-3686-001/html-004-3686-001/vemjlb.html

Regards,
Siavash


