[mpiwg-ft] A meeting this week

Aurélien Bouteiller bouteill at icl.utk.edu
Fri Nov 22 14:14:39 CST 2013


When doing MPI_Recv(…, rank=2, …), only failures of rank 2 have to be reported. Other failures do not have to be monitored at all (good: on most networks these days, the HPC fabric will report the failure of rank 2 without any supplementary action on the implementation’s part). 
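
To make the rule concrete, here is a minimal sketch of it as a predicate. It is only a model, not MPI code: the name recv_must_report and the ANY_SOURCE constant are made up for illustration, and the wildcard branch reflects the "involved process" reading discussed further down in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for MPI_ANY_SOURCE, used only in this model. */
#define ANY_SOURCE (-1)

/* Model of the reporting rule: must a receive posted on `source` raise an
 * error once `failed_rank` (a member of the communicator) has died? */
bool recv_must_report(int source, int failed_rank)
{
    assert(failed_rank >= 0);
    if (source == ANY_SOURCE) {
        /* Wildcard: any member could have been the sender, so every
         * failure counts as involving this receive. */
        return true;
    }
    /* Named source: only the failure of that exact rank matters. */
    return source == failed_rank;
}
```

Under this model, recv_must_report(2, 2) is true while recv_must_report(2, 5) is false, which is why a named receive needs no monitoring of the other ranks.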

Now, consider the following example to understand why revoke is necessary (for correctness). 
Here, only the 2 processes neighboring “fail rank” have an opportunity to detect the failure. When they detect it, they “break” from the normal loop of the algorithm (presumably to go into Shrink or do some recovery activity). Since they do not want to continue the sendrecv loop, they have to revoke, so that the other processes do not stall waiting for them to match the sendrecv call. 

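    /* The designated rank kills itself to simulate a process failure. */
    if( rank == failrank ) {
        if( args.verbose ) printf( "Rank %04d: SEPPUKU\n", rank );
        raise( SIGKILL );
    }

    /* Ring exchange: send to the right neighbor, receive from the left one. */
    do {
        rc = MPI_Sendrecv( b1, count, MPI_DOUBLE, (rank+1)>=np? 0:rank+1, 1,
                           b2, count, MPI_DOUBLE, (rank-1)<0? np-1:rank-1, 1,
                           world, MPI_STATUS_IGNORE );
        if( MPI_SUCCESS != rc ) {
            MPI_Error_string( rc, errorstr, &elen );
            if( MPI_ERR_PROC_FAILED == rc ) {
                /* A neighbor failed: revoke so that the other ranks leave the loop too. */
                if( args.verbose ) printf( "Rank %04d: exception (%s); calling Comm_revoke\n", rank, errorstr );
                OMPI_Comm_revoke( world );
                break;
            }
            else if( MPI_ERR_REVOKED == rc ) {
                /* Another rank already revoked the communicator: stop as well. */
                break;
            }
        }
    } while( 1 );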
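
The effect of the revoke call can be checked with a toy model of this ring. This is a sketch under simplifying assumptions, not MPI code: count_stalled is a made-up helper, a rank whose two neighbors are alive is assumed to block forever once a neighbor has left the loop, and a revocation is assumed to reach every survivor.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the ring above: returns how many surviving ranks can never
 * leave the sendrecv loop after `failrank` has died. */
int count_stalled(int np, int failrank, bool use_revoke)
{
    assert(np > 0 && failrank >= 0 && failrank < np);
    bool revoked = false;
    int undetected = 0;

    for (int rank = 0; rank < np; rank++) {
        if (rank == failrank) continue;           /* the dead rank itself */
        int right = (rank + 1) % np;
        int left  = (rank + np - 1) % np;
        if (left == failrank || right == failrank) {
            /* Neighbor of the failure: gets the process-failed error and
             * breaks out, revoking first if the algorithm asks for it. */
            if (use_revoke) revoked = true;
        } else {
            /* Both peers are alive: no failure is ever reported here. */
            undetected++;
        }
    }
    /* Revocation releases every remaining rank; without it they all wait
     * for a neighbor that will never post the matching sendrecv. */
    return revoked ? 0 : undetected;
}
```

With 8 ranks and rank 3 failing, count_stalled(8, 3, false) gives 5 stalled ranks and count_stalled(8, 3, true) gives 0, which is the correctness argument in a nutshell.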


I hope this clarifies. 



On Nov 22, 2013, at 15:02, Jim Dinan <james.dinan at gmail.com> wrote:

> So, what is the argument for having MPI_Comm/win_revoke?  Is it a performance, rather than a correctness argument?
> 
>  ~Jim.
> 
> 
> On Fri, Nov 22, 2013 at 2:01 PM, Wesley Bland <wbland at mcs.anl.gov> wrote:
> It doesn't require an asynchronous failure detector. It does require that you detect failures (in an unspecified way) insofar as they prevent completion. Once you enter the MPI library, you have to use some sort of detector (probably from the runtime level) to keep from getting deadlocked. 
> 
> Wesley
> 
> On Nov 22, 2013, at 12:53 PM, Jim Dinan <james.dinan at gmail.com> wrote:
> 
>> The latter case requires a failure detector, right?  I had thought the current design would avoid this requirement.
>> 
>>  ~Jim.
>> 
>> 
>> On Thu, Nov 21, 2013 at 12:06 PM, Wesley Bland <wbland at mcs.anl.gov> wrote:
>> No problem about missing the call yesterday. It was a last minute thing. I think we're in good shape to submit the text on Monday, but we're just doing some final passes over the text. There were a few changes that Aurélien will be making and I'm getting Gail to do an English pass, but overall, it's still essentially the same as what we read last year (per the request from the forum).
>> 
>> There are a couple of ways out of this deadlock. The first is, as you mentioned, to have a function in the library to essentially manually trigger an error handler and let the library figure out what is wrong. This method would work, but it is a bit heavy-handed. The alternative solution is that the wildcard receive on process X should return an error because the failure of process Y meets the definition of an "involved process." Process X will get an exception (or an MPI_Errhandler) and can trigger the recovery path. 
>> 
>> Either way should work, but the latter is obviously the preferred and expected solution. 
>> 
>> Thanks,
>> Wesley
>> 
>> On Nov 21, 2013, at 10:58 AM, Jim Dinan <james.dinan at gmail.com> wrote:
>> 
>>> Hi Guys,
>>> 
>>> Sorry I wasn't able to attend.  I'm back from SC now, if you need me.
>>> 
>>> I have a concern about the current approach to revoking communicators.  Consider a program that uses a library with a communicator, CL, that is private to the library.  Process X makes a call to this library and performs a wildcard receive on CL.  Process Y fails; Y would have sent a message to X on CL.  Process Z sees that Y failed, but it sees it in the user code, outside of the library.  Process Z cannot call revoke on CL because it does not have any knowledge about how the library is implemented and it does not have a handle to CL.
>>> 
>>> This seems like a situation that will result in deadlock, unless the library is also extended to include a "respond to process failure" function.  Is this handled in some other way, and I'm just not seeing it?
>>> 
>>> It seems like the revoke(comm) approach requires the programmer to know about all communication and all communicators/windows in use in their entire application, including those contained within libraries.  Is that a correct assessment?
>>> 
>>>  ~Jim.
>>> 
>>> 
>>> On Wed, Nov 20, 2013 at 2:39 PM, Aurélien Bouteiller <bouteill at icl.utk.edu> wrote:
>>> Rich, this is a followup of the proofreading work done during the regular meeting we had last week, and everybody, including SC attendees, had a chance to join. I am sorry you couldn’t.
>>> 
>>> Anyway, here is the working document for today: all diffs since the introduction of the new RMA chapter 5 months ago.
>>> 
>>> On Nov 19, 2013, at 17:07, Richard Graham <richardg at mellanox.com> wrote:
>>> 
>>> > With SC this week this is poor timing
>>> >
>>> > Rich
>>> >
>>> > ------Original Message------
>>> > From: Wesley Bland
>>> > To: MPI WG Fault Tolerance and Dynamic Process Control working Group
>>> > Cc: MPI WG Fault Tolerance and Dynamic Process Control working Group
>>> > ReplyTo: MPI WG Fault Tolerance and Dynamic Process Control working Group
>>> > Subject: Re: [mpiwg-ft] A meeting this week
>>> > Sent: Nov 19, 2013 2:13 PM
>>> >
>>> > Ok. I'll be there. I'll send it off for editing today.
>>> >
>>> > Wesley
>>> >
>>> >> On Nov 19, 2013, at 3:12 PM, Aurélien Bouteiller <bouteill at icl.utk.edu> wrote:
>>> >>
>>> >> Dear WG members,
>>> >>
>>> >> We have been misreading the new forum rules. We have to finalize the text of the proposal this week, not 2 weeks from now, so time is running short. I would like to invite you to a supplementary meeting tomorrow to review the text together.
>>> >>
>>> >> Jim, I don’t know if you will be able to attend on short notice, but your input would be greatly appreciated.
>>> >>
>>> >> Date: Wed, November 20,
>>> >> Time: 3pm EDT/New York
>>> >> Dial-in information: 712-432-0360
>>> >> Code: 623998#
>>> >>
>>> >> Agenda:
>>> >> Review of ULFM text and final work.
>>> >>
>>> >> Aurelien
>>> >>
>>> >> --
>>> >> * Dr. Aurélien Bouteiller
>>> >> * Researcher at Innovative Computing Laboratory
>>> >> * University of Tennessee
>>> >> * 1122 Volunteer Boulevard, suite 309b
>>> >> * Knoxville, TN 37996
>>> >> * 865 974 9375
>>> >>
>>> >> _______________________________________________
>>> >> mpiwg-ft mailing list
>>> >> mpiwg-ft at lists.mpi-forum.org
>>> >> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpiwg-ft
>>> 
>> 
> 

--
* Dr. Aurélien Bouteiller
* Researcher at Innovative Computing Laboratory
* University of Tennessee
* 1122 Volunteer Boulevard, suite 309b
* Knoxville, TN 37996
* 865 974 9375









