[Mpi3-ft] Defining the state of MPI after an error

Richard Treumann treumann at us.ibm.com
Wed Sep 22 22:41:29 CDT 2010


Rich 

I am not on the FT working group. I do not know the ins and outs of your 
ultimate goals or how you hope they can be accomplished.  I do not intend 
to become immersed in the complete FT specification. I cannot afford the 
time.

Those who are immersed seem to feel they have a "minor" tweak to add to 
the standard before they have the rest of FT worked out.  It does not 
sound "minor" to me but maybe I have misunderstood some part of it.

I think the logical thing for me to do is wait for the ticket. To be 
voted, the ticket must completely specify the functionality and include 
the precise text to explain what it means.  If, by the time you have a 
ticket, my concerns are resolved by what it says, I will not object.

If I am still concerned by what the ticket proposes, I must lobby against 
it and at that point I can stick to the merits or risks contained in the 
ticket language only.

At this point, I am trying to discuss the shape of a cloud of smoke and 
the discussion has become unproductive.  I have tried several ways of 
making my concern understandable so I do not know what to make of the 
question "What are your objections here?" at this stage.

I understand that something like MPI_ERR_CANNOT_CONTINUE could possibly 
make sense in the context of a complete FT chapter that devotes tens of 
pages to saying exactly when it is appropriate to return and how it is to 
be understood by people who wish to write portable MPI applications.

            Dick

Dick Treumann  -  MPI Team 
IBM Systems & Technology Group
Dept X2ZA / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601
Tele (845) 433-7846         Fax (845) 433-8363




From: "Graham, Richard L." <rlgraham at ornl.gov>
To: "MPI 3.0 Fault Tolerance and Dynamic Process Control working Group" 
<mpi3-ft at lists.mpi-forum.org>
Date: 09/22/2010 08:52 PM
Subject: Re: [Mpi3-ft] Defining the state of MPI after an error
Sent by: mpi3-ft-bounces at lists.mpi-forum.org



Dick,
  What are your objections here?  All the current proposal is doing is 
trying to define a set of consistent return codes; it is not changing 
anything about MPI.  I am not sure there is sufficient information to act 
on in all cases with the current error handling in MPI, but I may be wrong 
on this.  However, nothing else is being proposed at this 
stage.
  Also, if no changes are made to current implementations, there is a 
chance that apps will hang, but this is no different than if users set 
the MPI_ERRORS_RETURN error handler today, which they would need to do to 
see the effects of the changes.
  Now, it is fair to ask if this really adds something.  If there is 
intent to recover from errors, which is what the FT working group is 
trying to figure out, then this has a lot of value in that it is the venue 
for an initial discussion of how to extend the error classes, which is 
really what Josh has been trying to do.  I believe we need an 
implementation and some app experiments to find what we have missed.
  As for the choice of "MPI_ERR_CANNOT_CONTINUE", how is this any different 
than malloc returning NULL?  It tells the user that the library is no 
longer functional, and leaves it to the app to decide how to respond. 
Depending on the implementation, there are "error" scenarios that both the 
app and the MPI library can survive.  Failure of MPI_Alloc_mem may be one 
such case.  An app may also decide that a failed send to a specific 
destination is acceptable; I can give several real use cases that were 
brought to us as we were looking into this that would be just fine with 
this.  Collective operations, however, are another question.
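The malloc analogy can be made concrete with a minimal sketch of application-side policy, assuming hypothetical error classes: the `DEMO_*` names, `choose_action`, and the action values are illustrative inventions, not part of MPI or of the proposal. The library only reports a class; the application decides which failures it can survive.

```c
/* Hypothetical error classes, for illustration only. Real MPI defines
   MPI_SUCCESS and MPI_ERR_* in mpi.h; MPI_ERR_CANNOT_CONTINUE exists
   only as a proposal in this thread. */
enum demo_err_class {
    DEMO_SUCCESS = 0,
    DEMO_ERR_NO_MEM,          /* e.g. MPI_Alloc_mem could not allocate */
    DEMO_ERR_PEER,            /* e.g. a send to one destination failed */
    DEMO_ERR_CANNOT_CONTINUE  /* proposed: the library is no longer usable */
};

/* Responses an application might choose; this is application policy. */
enum demo_action {
    ACT_PROCEED,           /* nothing went wrong */
    ACT_FALLBACK_ALLOC,    /* intended: use plain malloc, like a NULL check */
    ACT_SKIP_DESTINATION,  /* intended: drop the unreachable peer */
    ACT_SHUTDOWN           /* intended: save what it can and exit */
};

/* Map a returned error class to the application's chosen response. */
enum demo_action choose_action(enum demo_err_class ec)
{
    switch (ec) {
    case DEMO_SUCCESS:             return ACT_PROCEED;
    case DEMO_ERR_NO_MEM:          return ACT_FALLBACK_ALLOC;
    case DEMO_ERR_PEER:            return ACT_SKIP_DESTINATION;
    case DEMO_ERR_CANNOT_CONTINUE: return ACT_SHUTDOWN;
    }
    return ACT_SHUTDOWN;  /* unknown class: be conservative */
}
```

Collectives are deliberately left out of the mapping, since what a failed collective leaves behind is exactly the open question.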
  So, all this proposal is really doing is starting to revive the FT 
discussion at the Forum level, as a partial implementation is getting to a 
state where it can be evaluated.  This is why it is important to 
understand the specific shortcomings you see in what is being proposed, 
which covers just the error propagation issues.

Thanks,
Rich

On 9/22/10 6:55 PM, "Richard Treumann" <treumann at us.ibm.com> wrote:


We are kind of going in circles because the context and rationale for 
CANNOT_CONTINUE are still too ambiguous.

My argument is against adding it into the standard first and figuring out 
later what it means.

I will wait for the ticket. If the ticket gives a full and convincing 
specification of what the implementor and the user are to do with it, I 
will make my judgement based on the whole description.

If the ticket says "Put this minor change in today and we will decide 
later what it means," I must lobby the Forum to reject the ticket.

Note:
1) All current errors detected by an MPI application map to an existing 
error class. An error cannot map to two error classes, so if some user 
error handler is presently checking for MPI_ERR_OP after a non-SUCCESS 
return from MPI_Reduce, and the implementation moves the return code for 
passing a bad OP from class MPI_ERR_OP to MPI_ERR_CANNOT_CONTINUE, it has 
just broken a user code.
2) Mandating that every MPI call after an MPI_ERR_CANNOT_CONTINUE must 
return MPI_ERR_CANNOT_CONTINUE will require that every MPI call check a 
global flag (resulting in overhead and possible displacement of other data 
from cache).
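Point 1 can be illustrated with a small sketch. The class values and function names are hypothetical stand-ins (the real value of MPI_ERR_OP is implementation-defined), and the two `bad_op_class_*` functions simulate one implementation before and after moving the bad-OP error into the proposed class.

```c
#include <stdbool.h>

/* Hypothetical class values standing in for MPI_ERR_OP and the proposed
   MPI_ERR_CANNOT_CONTINUE; the numbers are illustrative only. */
enum { CLS_ERR_OP = 9, CLS_ERR_CANNOT_CONTINUE = 100 };

/* Class reported for a bad reduction OP by the same (hypothetical)
   implementation, before and after remapping the error. */
int bad_op_class_before(void) { return CLS_ERR_OP; }
int bad_op_class_after(void)  { return CLS_ERR_CANNOT_CONTINUE; }

/* Existing user code: after a non-success return from a reduction it
   recovers (say, by retrying with a different op) only when the class
   is the one it was written to expect. */
bool user_code_recovers(int error_class)
{
    return error_class == CLS_ERR_OP;
}
```

Because each error maps to exactly one class, the implementation cannot report both, so the remap silently turns a case this user handled into one it does not recognize.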





From: Darius Buntinas <buntinas at mcs.anl.gov>
To: "MPI 3.0 Fault Tolerance and Dynamic Process Control working Group" 
<mpi3-ft at lists.mpi-forum.org>
Date: 09/22/2010 05:47 PM
Subject: Re: [Mpi3-ft] Defining the state of MPI after an error
Sent by: mpi3-ft-bounces at lists.mpi-forum.org
________________________________




On Sep 22, 2010, at 2:29 PM, Richard Treumann wrote:

>
> You lost me there. In part, I am saying it is useless because there are 
> almost zero cases in which it would be appropriate.  How does that make it 
> "a minor change"?

Well, I figure we're just adding an error class that the implementation can 
return to the user if it gives up and can't continue.  That's minor. 
Whether or not it's useful is another story :-)

> Can you provide me the precise text you would add to the standard? 
> Exactly how does the CANNOT_CONTINUE work?  Under what conditions does an 
> MPI process see a CANNOT_CONTINUE and what does it mean?

I don't know yet.  It might be something as simple as adding an entry to 
the error class table with a description like:

    Process can no longer perform any MPI operations.  If an MPI operation
    returns this error class, all subsequent calls to MPI functions will
    return this error class.
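The proposed wording implies a "sticky" state. A minimal single-threaded sketch, assuming hypothetical names (`demo_call`, the `SK_*` classes, the `cannot_continue` flag) and not taken from any real implementation, shows one way a library could enforce it: latch a process-wide flag when the class is first produced and test it at the top of every entry point. That per-call test is where the overhead concern raised in this thread comes from.

```c
#include <stdbool.h>

/* Hypothetical error classes; not the values from a real mpi.h. */
enum { SK_SUCCESS = 0, SK_ERR_OTHER = 1, SK_ERR_CANNOT_CONTINUE = 2 };

/* Process-wide state: once set, the library has given up for good.
   A thread-safe library would need an atomic or a lock here. */
static bool cannot_continue = false;

/* Stand-in for any MPI entry point. `internal_result` simulates what the
   operation itself would have returned if the flag were not set. */
int demo_call(int internal_result)
{
    /* Checked on every call, even on the fast path: this is the cost. */
    if (cannot_continue)
        return SK_ERR_CANNOT_CONTINUE;

    /* Latch the state the first time the class is produced. */
    if (internal_result == SK_ERR_CANNOT_CONTINUE)
        cannot_continue = true;

    return internal_result;
}
```

Ordinary errors pass through unchanged; only the first CANNOT_CONTINUE changes the result of every later call.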

> Please look at the example again.  The point was that there is nothing 
> there that would justify a CANNOT_CONTINUE and MPI is still working 
> correctly. Despite that, the behavior is a mess from the algorithm 
> viewpoint after the error.

Since we haven't defined what happens in a failed collective yet, consider 
an implementation that will not continue after a failed collective.  The 
odd numbered processes that did not immediately return from barrier with 
an error will continue with the barrier protocol (say it's recursive 
doubling).  Some of the odd processes will need to send messages to some 
of the even processes.  Upon receiving these messages, the even processes 
will respond with an I_QUIT message, or perhaps the connection is closed, 
so the odd processes will get a communication error when trying to send 
the message.  In either case, the odd processes will notice that 
something's wrong with the other processes, and return an error.  The 
second barrier will return a CANNOT_CONTINUE on all of the processes.
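The scenario relies on recursive-doubling pairing, in which rank r exchanges with r XOR 2^k in round k. A sketch, assuming a power-of-two process count and a hypothetical `quit` array marking processes that have already abandoned the barrier, computes the round in which each rank first depends on a quitting partner. With all even ranks quitting, every odd rank hits one in round 0, because that round always pairs an odd rank with an even one; later rounds pair ranks of equal parity.

```c
#include <stdbool.h>

/* Partner of `rank` in round `round` of a recursive-doubling barrier.
   Assumes the number of processes is a power of two. */
int rd_partner(int rank, int round)
{
    return rank ^ (1 << round);
}

/* Round in which `rank` first exchanges with a process marked in `quit`
   (where it might see an I_QUIT reply, a closed connection, or nothing
   at all), or -1 if every partner is still participating. */
int first_failed_round(int rank, int nprocs, const bool *quit)
{
    for (int round = 0; (1 << round) < nprocs; round++) {
        int partner = rd_partner(rank, round);
        if (partner < nprocs && quit[partner])
            return round;
    }
    return -1;
}
```

Whether a rank that reaches that round actually returns an error depends on the quitting peer replying or closing the connection; if it does neither, the rank blocks there.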

OK, what if the odd processes can't determine that the even processes 
can't continue?  The odd processes would hang in the first barrier, and 
the even numbered processes would get a CANNOT_CONTINUE from the second 
barrier.

So we either get a hang, or everyone gets a CANNOT_CONTINUE but we avoided 
the discombobulated scenario.

-d



_______________________________________________
mpi3-ft mailing list
mpi3-ft at lists.mpi-forum.org
http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-ft



