[mpiwg-ft] MPI_Comm_revoke behavior

George Bosilca bosilca at icl.utk.edu
Wed Nov 27 13:47:22 CST 2013


On Nov 27, 2013, at 20:33, Richard Graham <richardg at mellanox.com> wrote:

> I am thinking about the next step, and have some questions on the semantics of MPI_Comm_revoke()

What next step are you referring to?

> - When the routine returns, can the communicator ever be used again? If I remember correctly, the communicator is available for point-to-point traffic, but not collective traffic – is this correct?

A revoked communicator cannot support any communication (point-to-point or collective), with the exception of agree and shrink. If this is not clear enough in the current version of the proposal, we should definitely address it.
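
To make that rule concrete, here is a toy, single-process C model of it. This is not MPI code and not the proposal's API: toy_comm, toy_revoke, toy_send, toy_bcast, toy_agree, toy_shrink and the TOY_* codes are all hypothetical names invented for illustration. It only encodes the statement above: once revoked, point-to-point and collective calls are refused, while agree and shrink still go through.

```c
#include <stdbool.h>

/* Hypothetical return codes for this toy model (not MPI error classes). */
enum { TOY_SUCCESS = 0, TOY_ERR_REVOKED = 1 };

/* Toy stand-in for a communicator: tracks only its size and revoked state. */
typedef struct {
    int  size;
    bool revoked;
} toy_comm;

/* Mark the communicator as revoked; in this model the change is permanent. */
void toy_revoke(toy_comm *c) { c->revoked = true; }

/* Point-to-point: refused once the communicator is revoked. */
int toy_send(const toy_comm *c)
{
    return c->revoked ? TOY_ERR_REVOKED : TOY_SUCCESS;
}

/* Collective: refused once the communicator is revoked. */
int toy_bcast(const toy_comm *c)
{
    return c->revoked ? TOY_ERR_REVOKED : TOY_SUCCESS;
}

/* Exception 1: agree still works on a revoked communicator.
 * With a single process the agreed value is just the normalized input. */
int toy_agree(const toy_comm *c, int *flag)
{
    (void)c;                 /* revoked state is deliberately ignored */
    *flag = (*flag != 0);
    return TOY_SUCCESS;
}

/* Exception 2: shrink still works and yields a fresh, non-revoked
 * communicator that excludes the nfailed failed processes. */
int toy_shrink(const toy_comm *c, int nfailed, toy_comm *out)
{
    out->size    = c->size - nfailed;
    out->revoked = false;
    return TOY_SUCCESS;
}
```

In this model, a caller that revokes a 4-process toy_comm with one failed process would see toy_send and toy_bcast return TOY_ERR_REVOKED on the old communicator, while toy_shrink hands back a usable 3-process one.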

> - Looking forward, if one wants to restart the failed ranks (let’s assume we add support for this), what can be assumed about the “repaired” communicator? What can’t I assume about this communicator?

What you can assume depends on what “repaired” means. Even today, one can spawn new processes and reconstruct a communicator identical to the original communicator as it was before any fault. This can be done using MPI dynamic process management together with the agreement operation available in the ULFM proposal.
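
As a rough illustration of the bookkeeping such a reconstruction needs, the sketch below works out which original ranks the replacement processes would take over. It is plain C with no MPI calls, and plan_replacements is a hypothetical helper, not part of any proposal. The assumed scheme (an assumption of this sketch, not something stated above) is that survivors keep their original rank and the n-th spawned process takes the n-th failed rank, for example by using that rank as the key argument of MPI_Comm_split after the spawned processes have been merged with the survivors.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: failed[r] is true if original rank r has failed.
 * Fills replacement_rank[] with the failed ranks in ascending order, so
 * that the n-th spawned process can take over replacement_rank[n].
 * Returns the number of replacement processes to spawn.
 * The caller must provide room for orig_size entries in replacement_rank.
 */
int plan_replacements(int orig_size, const bool failed[], int replacement_rank[])
{
    int n = 0;
    for (int r = 0; r < orig_size; r++) {
        if (failed[r]) {
            replacement_rank[n++] = r;  /* n-th spawned process takes rank r */
        }
    }
    return n;
}
```

With four original processes and ranks 1 and 3 failed, this plan asks for two replacements, which take ranks 1 and 3 while survivors 0 and 2 keep theirs, so the rebuilt communicator has the same size and rank layout as before the fault.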

  George.

> Rich
>  
> _______________________________________________
> mpiwg-ft mailing list
> mpiwg-ft at lists.mpi-forum.org
> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpiwg-ft
