[mpiwg-ft] MPI_Comm_revoke behavior

Richard Graham richardg at mellanox.com
Wed Nov 27 13:54:55 CST 2013


From: mpiwg-ft [mailto:mpiwg-ft-bounces at lists.mpi-forum.org] On Behalf Of George Bosilca
Sent: Wednesday, November 27, 2013 2:48 PM
To: MPI WG Fault Tolerance and Dynamic Process Control working Group
Subject: Re: [mpiwg-ft] MPI_Comm_revoke behavior

On Nov 27, 2013, at 20:33, Richard Graham <richardg at mellanox.com> wrote:

I am thinking about the next step, and have some questions on the semantics of MPI_Comm_revoke()

What next step are you referring to?
[rich] To the full recovery stage.  Post what we are talking about now.

-          When the routine returns, can the communicator ever be used again? If I remember correctly, the communicator is available for point-to-point traffic, but not collective traffic - is this correct?

A revoked communicator is unable to support any communication (point-to-point or collective), with the exception of agree and shrink. If this is not clear enough in the current version of the proposal we should definitely address it.
[rich] Does this mean all current state associated with the communicator (aside from knowledge of which ranks are alive) is gone? Can we no longer rely on pending messages being delivered?
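The semantics discussed above (a revoked communicator accepts only the agreement and shrink operations) can be sketched as a typical ULFM recovery handler. This is a sketch under assumptions: the MPIX_ prefix and exact signatures follow the ULFM prototype implementations, not text quoted in this thread.

```c
/* Sketch of the recovery pattern discussed above: after a failure,
 * revoke the communicator, then use the only two operations that
 * remain valid on it (agree and shrink).  MPIX_ names are the ones
 * used by ULFM prototypes and are an assumption here. */
#include <mpi.h>

void on_failure(MPI_Comm comm, MPI_Comm *newcomm)
{
    int flag = 1;

    /* Interrupt all pending communication on comm at every process. */
    MPIX_Comm_revoke(comm);

    /* Agreement is still valid on the revoked communicator: reach a
     * consistent decision among the survivors. */
    MPIX_Comm_agree(comm, &flag);

    /* Shrink is the other valid operation: build a new, working
     * communicator containing only the surviving processes. */
    MPIX_Comm_shrink(comm, newcomm);

    /* Any other traffic on comm (point-to-point or collective) now
     * fails; no pre-revoke pending messages can be counted on. */
}
```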

          Looking forward, if one wants to restart the failed ranks (let's assume we add support for this), what can be assumed about the "repaired" communicator? What can't be assumed about it?

What you can assume depends on what "repaired" means. Even today one can spawn new processes and reconstruct a communicator identical to the original communicator before any fault. This can be done using MPI dynamic process management together with the agreement operation available in the ULFM proposal.
[rich] This implies that all outstanding traffic is flushed - is this correct?
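The reconstruction George describes (spawning replacements and rebuilding a full-size communicator) might look roughly like the following. This is a sketch, not part of the proposal text; the helper name `repair` and its parameters are hypothetical, and only standard MPI dynamic process management calls are used.

```c
/* Hypothetical sketch: rebuild a communicator of the original size
 * by respawning the failed ranks from the shrunken communicator of
 * survivors.  Standard MPI-2 dynamics only; no ULFM calls needed. */
#include <mpi.h>

MPI_Comm repair(MPI_Comm shrunken, int nfailed, char *program)
{
    MPI_Comm intercomm, repaired;

    /* Survivors collectively spawn replacements for the dead ranks. */
    MPI_Comm_spawn(program, MPI_ARGV_NULL, nfailed, MPI_INFO_NULL,
                   0, shrunken, &intercomm, MPI_ERRCODES_IGNORE);

    /* Merge survivors and replacements into one intracommunicator.
     * Caveats relevant to Rich's question: ranks are not guaranteed
     * to match the pre-fault layout, and no pre-fault pending traffic
     * survives - the application must re-establish its own state. */
    MPI_Intercomm_merge(intercomm, 0, &repaired);
    MPI_Comm_free(&intercomm);
    return repaired;
}
```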


