[mpiwg-ft] MPI_Comm_revoke behavior
bosilca at icl.utk.edu
Wed Nov 27 13:47:22 CST 2013
On Nov 27, 2013, at 20:33, Richard Graham <richardg at mellanox.com> wrote:
> I am thinking about the next step, and have some questions on the semantics of MPI_Comm_revoke()
What next step are you referring to?
> - When the routine returns, can the communicator ever be used again? If I remember correctly, the communicator is still available for point-to-point traffic, but not for collective traffic. Is this correct?
A revoked communicator cannot support any communication, point-to-point or collective, with the exception of agree and shrink. If this is not clear enough in the current version of the proposal, we should definitely address it.
> - Looking forward, if one wants to restart the failed ranks (let’s assume we add support for this), what can be assumed about the “repaired” communicator? What can’t I assume about this communicator?
What you can assume depends on what “repaired” means. Already today, one can spawn new processes and reconstruct a communicator identical to the original one before any fault. This can be done using MPI dynamic process management together with the agreement available in the ULFM proposal.