[mpiwg-ft] Madrid Report
Jim Dinan
james.dinan at gmail.com
Tue Oct 1 15:55:07 CDT 2013
Hi Bronis,
Thanks for raising this. I think we were so focused on the interesting
fault tolerance problems that we overlooked the basic MPI error handling
fixes that need to be done.
The text you pointed out definitely does need to be changed. I think the
intended semantic is something along the lines of: If MPI notifies the user
that an error has occurred, then the failed operation has not completed
successfully. Depending on the nature of the error, any output from the
failed operation may be invalid, including local and non-local output MPI
handles and local and non-local output data buffers. Additional semantics
may apply depending on the operation and error class. For process
failures, we refer users to the fault tolerance chapter.
Something we are concerned about is that the broader space of errors could
take us into a deep rat hole that will stall progress on addressing fault
tolerance. For example, MPI_ERR_ARGS and MPI_ERR_TRUNCATE have different
implications for the output buffer of an MPI_Recv. The spec also does not
define what happens at the sender when the receiver gets an
MPI_ERR_TRUNCATE. Things get trickier when we consider collectives
where, for example, one process gave invalid arguments while the others
gave valid arguments. While it would be useful to define these semantics,
these situations are not really a part of the fault tolerance work.
I propose that we take the approach of cleaning up this text wherever we
find it, and fix it in a way that will facilitate future work to better
define behavior after errors are reported. For the FT WG, in particular,
we would focus just on defining the behavior after the MPI_ERR_PROC_FAILED
error is reported.
~Jim.
On Wed, Sep 25, 2013 at 12:51 PM, Bronis R. de Supinski <bronis at llnl.gov> wrote:
>
> Wesley:
>
> I disagree vehemently with your statement that "either one
> individually would be useless." While I was not at the
> meeting (and thus did not raise the objections or make this
> suggestion -- this time), I consider this statement to be
> a continued example of how those working on the proposal
> have failed to understand what users most desire from MPI
> in terms of a first step for fault tolerance. The biggest
> concern is simply that the MPI standard states on p28:24-26:
>
> This document does not specify the state of a computation
> after an erroneous MPI call has occurred. The desired
> behavior is that a relevant error code be returned, and
> the effect of the error be localized to the greatest
> possible extent.
>
> I believe at least one other example exists of this sort of
> text but I will not look for it now. The point is that the
> user has no guarantees about the state of MPI, the computation
> or anything else after an error. While that may be unavoidable,
> the user also has no way to query the implementation about
> what the most likely state is. Thus, the user must make the
> most pessimistic assumption and catching an error is pretty
> much useless other than to print some very basic message -
> or, if the user is feeling frisky, to take some more complex action -
> prior to shutdown soon after the error is observed.
>
> I think the working group has listened to a small set of users
> and missed that the vast majority of users would be happy (at
> least initially) if the standard provided such a capability.
> I have spoken to numerous users for whom this statement is
> true. Further, such a capability would be fairly easy to make
> low cost. While some details would need to be worked out,
> which would take time, you would find wide acceptance of
> that approach -- as opposed to trying to ram your complex,
> unintuitive interface down the throats of those who object.
>
> That brings me to my other major objection to your email.
> You state "I think we all expect that this proposal won't
> be passing unanimously." The point of deferring the FT
> proposal was not to have it pushed with increased urgency.
> The point was to have objections addressed and to make
> the adopted interface broadly accepted. You still seem
> to think that a narrowly passed proposal would be OK.
>
> Bronis
>
> On Wed, 25 Sep 2013, Wesley Bland wrote:
>
>> Let me add my two cents here as well.
>>
>> As Jim mentioned, this wasn't a rehashed slide deck. This was a new one.
>> The confusion may have come from the fact that there haven't been any
>> changes to the proposal recently and the talk that Rich gave was designed
>> to be informational for those who hadn't attended the US meetings recently.
>>
>> As Jim mentioned, this was really supposed to be a justification talk to
>> give background and rationale. It sounds like we may not have achieved that
>> goal. It's unfortunate that more of us couldn't be there to help the
>> conversation, but I'd like to try to find out more about the specific
>> concerns that were raised, as we've worked very hard to respond to
>> these sorts of things in the past.
>>
>> I don't know if you actually meant that there was a desire to see the
>> proposal split in two or not, but I'd really be against that. It would
>> require a lot of work, and without both pieces, either one individually
>> would be useless. We've gone through the text many times to be sure that
>> we're presenting the least intrusive interface possible that still
>> accomplishes all of the needs of FT. As you mentioned, it isn't the most
>> user friendly interface and we don't expect it to be. However, the real
>> user friendly interfaces (C/R, replication, migration, strong consistency,
>> etc.) can be built on top of the proposal, which is what we intended in our
>> design. If that's one of the objections, then we may need to rethink
>> things here or do a better job of convincing the forum that this is a good
>> idea. FWIW, I believe that was always the original intent of MPI anyway.
>>
>> If it's going to be a problem to bring this to the forum for a reading in
>> December, we can push things out, but we've been targeting that date and
>> I'd like to have a better idea of what the objections are before we decide
>> to delay again. If it's just the same arguments again, then it may not do
>> any good to push this for another 3-6 months. I think we all expect that
>> this proposal won't be passing unanimously.
>>
>> Thanks for the feedback Martin. I know that you probably weren't the one
>> objecting and those who were probably aren't on this list. It's too bad
>> that more of us couldn't be at the meeting, but I think we're all familiar
>> with the funding situation in the US right now for international travel.
>>
>> Thanks,
>> Wesley
>>
>> On Sep 25, 2013, at 8:23 AM, Jim Dinan <james.dinan at gmail.com> wrote:
>>
>>> Hi Martin,
>>>
>>> The slides that were presented in Madrid were actually a new deck. We
>>> had tried to address feedback that there is a need to do a better job of
>>> covering the core issues, potential approaches, and justifications for
>>> design decisions, beyond just presenting the proposal. It sounds like more
>>> discussion around these topics is needed, and we can work with that for the
>>> Chicago meeting.
>>>
>>> It sounds like we're being asked to break the proposal up into two sets
>>> of changes -- one that defines failures in MPI and MPI's state after a
>>> failure; and a second that adds a recovery API. Is that right? I'm not sure
>>> if it's possible to tease these things apart, but we can look into it.
>>>
>>> Cheers,
>>> ~Jim.
>>>
>>>
>>> On Tue, Sep 24, 2013 at 2:48 PM, Martin Schulz <schulzm at llnl.gov> wrote:
>>>
>>>>
>>>> Hi all,
>>>>
>>>> This is more from the viewpoint of an external observer, since I
>>>> haven't had the time to really participate lately.
>>>>
>>>> Rich gave a presentation covering the current proposal, but he also
>>>> said that this was basically the old slide deck that had been shown before
>>>> and didn't contain any of the new work that the FT group has been doing in
>>>> coordinating with the other WGs. There were also several questions that
>>>> people asked and that we couldn't answer since Rich was the only FT group
>>>> member at the meeting.
>>>>
>>>> Based on this I got the feeling that many in the forum were still
>>>> concerned about this overarching proposal and would like to hear more. In
>>>> particular, a few meetings back after the MPI 3.0 vote we had a discussion
>>>> about what was missing to make such concerns go away and one thing that we
>>>> agreed on was to first go through the standard and clean things up to make
>>>> it compatible with an FT proposal. From what I can tell, a lot of this has
>>>> happened and it may help to explicitly present that before even going into
>>>> the API approach. This (plus an updated talk with a broader discussion with
>>>> more FT group members being present) may be better for the Chicago meeting
>>>> than doing an actual formal reading (which may be too early since there
>>>> doesn't seem to be consensus about the approach, yet).
>>>>
>>>> From the application side, I have seen a few people starting to use it.
>>>> One of our PDs is using it for a large MD application and there is also the
>>>> following work I saw at a recent conference:
>>>>
>>>> ftp://ftp.inf.ethz.ch/pub/publications/tech-reports/7xx/793.pdf
>>>>
>>>> Both said that the interface is overly complex and has a very (too?)
>>>> high impact on applications, which makes me worried. Not sure how we
>>>> can address this, though.
>>>>
>>>> Martin
>>>>
>>>>
>>>>
>>>>
>>>> On Sep 19, 2013, at 6:43 PM, Aurélien Bouteiller <bouteill at icl.utk.edu>
>>>> wrote:
>>>>
>>>> I am currently in Europe and busy. My schedule will get back to normal
>>>> in October.
>>>>
>>>> I'm also interested in a summary of events though.
>>>>
>>>> Aurelien
>>>>
>>>>
On 20 Sep 2013, at 00:42, Wesley Bland <wbland at mcs.anl.gov> wrote:
>>>>
>>>> Hi WG,
>>>>
I've been out of the loop for a bit since my wife had our first baby
>>>> last week (everyone is doing great). I was wondering how things went at the
>>>> forum meeting last week. Was there any feedback from those in attendance?
>>>>
>>>> Also, I didn't see an email about this week's con call. I assume that
>>>> it didn't happen, but if it did can anyone mention what was discussed?
>>>>
>>>> Thanks,
>>>> Wesley
>>>> _______________________________________________
>>>> mpiwg-ft mailing list
>>>> mpiwg-ft at lists.mpi-forum.org
>>>> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpiwg-ft
>>>>
>>>>
>>>> --
>>>> * Dr. Aurélien Bouteiller
>>>> * Researcher at Innovative Computing Laboratory
>>>> * University of Tennessee
>>>> * 1122 Volunteer Boulevard, suite 309b
>>>> * Knoxville, TN 37996
>>>> * 865 974 9375
>>>>
>>>>
>>>>
>>>> ________________________________________
>>>> Martin Schulz, schulzm at llnl.gov, http://people.llnl.gov/schulzm
>>>> CASC @ Lawrence Livermore National Laboratory, Livermore, USA
>>>>
>>>>
>>>>
>>>
>>
>>