[Mpi3-ft] Defining the state of MPI after an error

Bronis R. de Supinski bronis at llnl.gov
Mon Sep 20 12:09:46 CDT 2010


> I did not intend to ignore your use case.

No problem.

> I did mention that I have no worries about asking MPI implementations 
> to refrain from blocking future MPI calls after an error is detected. 
> That was an implicit recognition of your use case.

OK, that helps.

> The MPI standard already forbids having an MPI call on one thread block 
> progress on other threads.  I would interpret that to include a case 
> where a thread is blocked in a collective communication or an MPI_Recv 
> that will never be satisfied. That is, the blocked MPI call cannot 
> prevent other threads from using libmpi.  Requiring libmpi to release 
> any lock it took even when doing an error return would be logical but 
> may not be implied by what is currently written.

The current text provides no such guarantee. Once an error is
returned anywhere, all bets are off (at least, that is how I
have read it; I would need to go back through the text to
find the exact words that cause my concern).

> Communicators provide a sort of isolation that keeps stray crap from 
> failed operations from spilling over (such as eager sent message for 
> which the MPI_Recv failed).  If the tool uses its own threads and 
> private communicators, I agree it is reasonable to ask any libmpi to 
> avoid sabotaging that communication.

That would be perfect from my perspective.

> Where I get concerned is when we start talking about affirmative 
> requirements for distributed MPI state after an error

I don't think we can have those beyond "best effort".
The errors may indicate problems that make further
communication impossible -- perhaps because of the
erroneous action itself, or simply due to the state of
the network or of other processes. I do think we can
require accurate return values and add an Advice to
implementors that suggests best effort following errors.
I believe that would satisfy my requirements.


>                   Dick
> Dick Treumann  -  MPI Team
> IBM Systems & Technology Group
> Dept X2ZA / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601
> Tele (845) 433-7846         Fax (845) 433-8363
> From:   "Bronis R. de Supinski" <bronis at llnl.gov>
> To:     "MPI 3.0 Fault Tolerance and Dynamic Process Control working Group" <mpi3-ft at lists.mpi-forum.org>
> Date:   09/20/2010 12:46 PM
> Subject:        Re: [Mpi3-ft] Defining the state of MPI after an error
> Sent by:        mpi3-ft-bounces at lists.mpi-forum.org
> ________________________________
> Dick:
> You seem to be ignoring my use case. Specifically, I
> have tool threads that use MPI. Their use of MPI should
> be unaffected by all of the scenarios that you are raising.
> However, the standard provides no way for me to tell if
> they work correctly in these situations. I just have to
> cross my fingers and hope.
> FYI: Your implementation has long met this requirement
> (my hopes are not dashed with it). Others have recently
> begun to. In any event, I would like some way to tell...
> Further, it is useful in many other scenarios to know
> that the implementation intends to remain usable. I am not
> looking for a promise of correct execution; I am looking
> for a promise of best effort and accurate return codes.
> Bronis
> On Mon, 20 Sep 2010, Richard Treumann wrote:
>> If there is any question about whether these calls are still valid after an error with an error handler that returns (MPI_ERRORS_RETURN or user handler)
>> MPI_Abort,
>> MPI_Error_string
>> MPI_Error_class
>> I assume it should be corrected as a trivial oversight in the original text.
>> I would regard the real issue as being the difficulty with assuring the state of remote processes.
>> There is huge difficulty in making any promise about how an interaction between a process that has not taken an error and one that has will behave.
>> For example, if there were a loop of 100 MPI_Bcast calls and on iteration 5, rank 3 uses a bad communicator, what is the proper state?  Either a sequence number is mandated so the other ranks hang quickly or a sequence number is prohibited so everybody keeps going until the "end" when the missing MPI_Bcast becomes critical.  Of course, with no sequence number, some tasks are stupidly using the iteration n-1 data for their iteration n computation.
>> Dick Treumann  -  MPI Team
>> IBM Systems & Technology Group
>> Dept X2ZA / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601
>> Tele (845) 433-7846         Fax (845) 433-8363
> _______________________________________________
> mpi3-ft mailing list
> mpi3-ft at lists.mpi-forum.org
> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-ft
