[mpiwg-ft] MPI_Comm_revoke behavior

Thu Dec 5 08:14:52 CST 2013

From: mpiwg-ft [mailto:mpiwg-ft-bounces at lists.mpi-forum.org] On Behalf Of George Bosilca
Sent: Wednesday, November 27, 2013 3:35 PM
To: MPI WG Fault Tolerance and Dynamic Process Control working Group
Subject: Re: [mpiwg-ft] MPI_Comm_revoke behavior

On Nov 27, 2013, at 20:54 , Richard Graham <richardg at mellanox.com> wrote:

On Nov 27, 2013, at 20:33 , Richard Graham <richardg at mellanox.com> wrote:

I am thinking about the next step, and have some questions on the semantics of MPI_Comm_revoke().

What next step are you referring to?
[rich] To the full recovery stage, that is, what comes after what we are talking about now.

Full recovery stage? Can you give a little more detail here, please?
[rich] The original intent was to allow for full restoration of communicators after failure, with minimal impact on those ranks that did not fail (don't want to get into what that means now ...). Those goals were reduced for pragmatic reasons. I want to make sure that when/if work continues in this direction, the current proposal does not preclude this. One of the issues raised to me recently is that after a revoke one will not be able to accomplish such a goal on the remaining ranks, e.g., because ranks will be reassigned. I am following up very specifically on this question.
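
To make the rank-reassignment concern concrete, here is a toy model in Python (not MPI code). It assumes that a shrink-style operation keeps the survivors in their original relative order and renumbers them densely from 0; that ordering is an assumption of this sketch, and the function name shrink_ranks is hypothetical.

```python
def shrink_ranks(alive):
    """Map each surviving old rank to its new rank after a shrink-like step.

    alive[i] is True if old rank i survived. Survivors keep their relative
    order and are renumbered densely from 0 (an assumption of this model).
    """
    new_rank = {}
    for old_rank, is_alive in enumerate(alive):
        if is_alive:
            # The next free new rank equals the number of survivors seen so far.
            new_rank[old_rank] = len(new_rank)
    return new_rank


# Old ranks 1 and 3 failed in a 5-process communicator.
print(shrink_ranks([True, False, True, False, True]))  # {0: 0, 2: 1, 4: 2}
```

In this model old rank 4 becomes rank 2, so any application state indexed by rank has to be remapped on the surviving processes, which is the kind of impact the question is about.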

- When the routine returns, can the communicator ever be used again? If I remember correctly, the communicator is available for point-to-point traffic, but not collective traffic - is this correct?

A revoked communicator is unable to support any communication (point-to-point or collective) with the exception of agree and shrink. If this is not clear enough in the current version of the proposal we should definitely address it.
[rich] Does this mean all current state (aside from who is alive) associated with the communicator is gone?

All deterministic information (info and attributes) is still available. You can look up the group of processes associated with the communicator, as well as the group of failed processes. If what you are asking about is unexpected messages that may still be pending, that is up to the implementation (see below).
[rich] I don't understand.
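
One way to read the two answers above is as a small state model. The sketch below is plain Python, not an MPI binding: the class ToyComm, its methods, and RevokedError are all hypothetical names. It only illustrates the stated rules (communication is refused after revoke, the group and attributes stay readable, shrink still works) and leaves out agree for brevity.

```python
class RevokedError(Exception):
    """Raised when communication is attempted on a revoked communicator."""


class ToyComm:
    """Toy stand-in for a communicator; hypothetical, not an MPI binding."""

    def __init__(self, ranks, attributes=None):
        self.ranks = list(ranks)                  # group of processes
        self.failed = set()                       # ranks known to have failed
        self.attributes = dict(attributes or {})  # cached attributes
        self.revoked = False

    def revoke(self):
        self.revoked = True

    def send(self, dest, payload):
        # Stands in for any point-to-point or collective call.
        if self.revoked:
            raise RevokedError("communicator has been revoked")
        return (dest, payload)

    def shrink(self):
        # Still permitted after revoke: a new communicator of survivors.
        survivors = [r for r in self.ranks if r not in self.failed]
        return ToyComm(survivors)


comm = ToyComm(range(4), {"name": "solver"})
comm.failed.add(2)
comm.revoke()
print(comm.ranks, comm.attributes)  # group and attributes remain readable
print(comm.shrink().ranks)          # [0, 1, 3]
```

What the model deliberately does not capture is the fate of messages that were in flight when the revoke happened, since the answer above leaves that to the implementation.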

So one can't rely on pending messages continuing to be sent?

Not on a revoked communicator. If continuing to exchange messages is a requirement, the communicator should not be revoked.
[rich] How does one then notify other ranks of the errors - does this have to be a user-level protocol?

- Looking forward, if one wants to restart the failed ranks (let's assume we add support for this), what can be assumed about the "repaired" communicator? What can't I assume about this communicator?

What you can assume depends on what "repaired" means. Already today one can spawn new processes and reconstruct a communicator identical to the original communicator before any fault. This can be done using MPI dynamics together with the agreement available in the ULFM proposal.
[rich] This implies that all outstanding traffic is flushed - is this correct?
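
As a rough illustration of what "identical to the original" could mean for rank layout, the toy Python sketch below computes the rank each process would need in the rebuilt communicator: survivors keep their old rank and respawned processes fill the gaps. The function name rebuild_keys and the "newN" labels are hypothetical, and filling the lowest gap first is an assumption of this sketch. In an actual program such values could plausibly be used as the key argument of MPI_Comm_split once survivors and spawned processes share a communicator, but that wiring is not shown here.

```python
def rebuild_keys(original_size, survivors):
    """Return the rank each process should hold in the rebuilt communicator.

    survivors lists the old ranks that are still alive. Survivors keep their
    old rank; respawned processes (labelled "new0", "new1", ...) take the
    ranks left vacant by failed processes, lowest vacant rank first.
    """
    alive = set(survivors)
    gaps = [r for r in range(original_size) if r not in alive]
    layout = {old: old for old in sorted(alive)}
    for i, rank in enumerate(gaps):
        layout[f"new{i}"] = rank
    return layout


# Old ranks 1 and 3 failed; two replacements take over exactly those ranks.
print(rebuild_keys(5, [0, 2, 4]))
# {0: 0, 2: 2, 4: 4, 'new0': 1, 'new1': 3}
```

The sketch covers only rank layout. It says nothing about outstanding traffic on the old communicator, which is the point of the follow-up question.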

This is up to the MPI implementation. This is specified in the first "Advice to implementors" on the second page.
[rich] That does not seem like a good idea; users should have guarantees about what they get if they use MPI.

  George.
