[Mpi3-ft] Persistent Communication & Sendrecv

Joshua Hursey jjhursey at open-mpi.org
Tue Aug 31 14:59:58 CDT 2010


I think I see where the confusion is coming from. We are not changing MPI_Recv_init or MPI_Irecv from the current standard, so the status object is only available through a call to one of the completion functions.

I was running with the assumption that if a process is known to have failed at the call to either of these functions, then the call would return MPI_ERR_IN_STATUS (even though these functions do not take a status argument, this return code tells the user to call a completion function to access the status object). Further, if the process is not known to have failed until the call to MPI_Start{all}, then the start function would return MPI_ERR_IN_STATUS. In either case, the application should then use one of the completion calls to access the error code from the returned status object.

Thinking about this some more and rereading the standard, MPI_Irecv, MPI_Recv_init, and MPI_Start{all} should all return immediately since they are local operations. Their return code should -not- be determined by the state of other processes. I say this after reading the following sentence in the introduction to Section 3.7:
"In all cases, the send start call is local: it returns immediately, irrespective of the status of other processes."

So MPI_Startall will start all of the requests and return MPI_SUCCESS (unless one of the arguments is bad). The user will not be notified of an error until they call a completion function, even if the failure was already known at the time of the call to MPI_Startall.
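To make that concrete, below is a minimal sketch of the semantics I have in mind. This is just an illustration, not text from the standard: the ranks, tags, buffers, and the use of MPI_ERRORS_RETURN (so that errors are returned instead of aborting) are all assumptions for the sake of the example.
--------------------
/* Illustrative sketch only: assumes MPI_ERRORS_RETURN on the communicator
 * and made-up ranks/tags/buffers. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    char sendbuf = 'x', recvbuf = 0;
    MPI_Request req[2];
    MPI_Status  stat[2];

    if (rank == 0) {
        /* Persistent send to rank 1 and a wildcard persistent receive. */
        MPI_Send_init(&sendbuf, 1, MPI_CHAR, 1, 123, MPI_COMM_WORLD, &req[0]);
        MPI_Recv_init(&recvbuf, 1, MPI_CHAR, MPI_ANY_SOURCE, MPI_ANY_TAG,
                      MPI_COMM_WORLD, &req[1]);

        /* Local call: returns MPI_SUCCESS even if rank 1 has already failed. */
        MPI_Startall(2, req);

        /* Any failure is reported here, at completion, via MPI_ERR_IN_STATUS. */
        int rc = MPI_Waitall(2, req, stat);
        if (rc == MPI_ERR_IN_STATUS) {
            for (int i = 0; i < 2; ++i)
                if (stat[i].MPI_ERROR != MPI_SUCCESS)
                    printf("request %d failed with error %d\n",
                           i, stat[i].MPI_ERROR);
        }

        MPI_Request_free(&req[0]);
        MPI_Request_free(&req[1]);
    } else if (rank == 1) {
        MPI_Recv(&recvbuf, 1, MPI_CHAR, 0, 123, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        MPI_Send(&sendbuf, 1, MPI_CHAR, 0, 456, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}
--------------------
Under these semantics, even if rank 1 were already known to have failed, the MPI_Startall above would still return MPI_SUCCESS; the MPI_ERR_IN_STATUS would only be reported by the MPI_Waitall, and the per-request error codes would be found in the status array.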

How do people feel about this?

I'll sketch out some examples on the wiki that we can use to help the discussion.

-- Josh

On Aug 31, 2010, at 3:15 PM, Bronis R. de Supinski wrote:

> 
> Josh:
> 
> OK, so I am not fully up to date on the FT plans. If you
> have MPI_Recv_init or MPI_Irecv return a status (which
> is how I read the idea) then it CANNOT return what was
> matched. If, as this post reads, you mean the completion
> associated with the request then I am not clear on how
> it is relevant whether it was an immediate or persistent
> request. Perhaps we are talking across each other...
> 
> Bronis
> 
> 
> 
> On Tue, 31 Aug 2010, Joshua Hursey wrote:
> 
>> I agree that the count=0 is odd (section 3.2.2 in case anyone is interested). Maybe it is for control messages where the buffer does not matter, just that you received a message (e.g., Barrier).
>> 
>> I don't know if I follow the rest. So when I call:
>> ------------------
>> MPI_Irecv(MPI_ANY_SOURCE, MPI_ANY_TAG, count=0, &req);
>> MPI_Wait(req, status);
>> ------------------
>> The field values in 'status' will match the incoming received message, so the user can figure out from whom and with what tag this recv matched, or with what error the request failed.
>> 
>> If the user were to write:
>> ------------------
>> MPI_Irecv(MPI_ANY_SOURCE, MPI_ANY_TAG, count=0, &req);
>> MPI_Wait(req, status);
>> MPI_Wait(req, status);
>> ------------------
>> The second wait would return MPI_SUCCESS with an 'empty' status since the first wait destroys the request.
>> 
>> For the persistent situation, the MPI_Wait does not destroy the request, but just sets it inactive. So in:
>> ------------------
>> MPI_Recv_init(MPI_ANY_SOURCE, MPI_ANY_TAG, count=0, &req);
>> MPI_Start(req);
>> MPI_Request_get_status(req, flag, status); /* A */
>> MPI_Wait(req, status);
>> MPI_Request_get_status(req, flag, status); /* B */
>> MPI_Wait(req, status);
>> ------------------
>> The second MPI_Wait would still return an 'empty' status since the request is inactive. The MPI_Request_get_status in (A), if the request is finished, will set the flag to true and return a status with the various fields set to match the incoming message. If the request is still active then it will return false, and the 'empty' status. The MPI_Request_get_status in (B) will set the flag to true (the request is not active or null) and return 'empty' status.
>> 
>> So in the Startall case the user can figure out which requests were started, which failed, and which were not started by checking the arguments returned by MPI_Request_get_status one at a time:
>> forall requests call MPI_Request_get_status(req, flag, status)
>> if( flag ) {
>>  if( status.error != success ) // Failed
>>  if( status == empty ) // Not started
>>  if( status != empty ) // Started, and now complete
>> } else {
>>  // Started, and not yet complete
>> }
>> 
>> What am I missing?
>> 
>> -- Josh
>> 
>> 
>> On Aug 31, 2010, at 1:20 PM, Bronis R. de Supinski wrote:
>> 
>>> 
>>> Huh? No, the MPI_Recv_init creates a persistent request with
>>> MPI_ANY_SOURCE and MPI_ANY_TAG for those fields. Combined with
>>> the MPI_Startall, it is roughly the same as this:
>>> 
>>> MPI_Irecv (MPI_ANY_SOURCE, MPI_ANY_TAG, count=0, &req[1]);
>>> 
>>> (I'll admit the "count=0" arg is a bit odd). A wait on the
>>> started request can match any message (again, the count is
>>> a bit strange and Josh omitted the type arg but it is
>>> pseudocode so no big deal).
>>> 
>>> On Tue, 31 Aug 2010, Joshua Hursey wrote:
>>> 
>>>> 
>>>> On Aug 31, 2010, at 1:09 PM, Fab Tillier wrote:
>>>> 
>>>>> Joshua Hursey wrote on Tue, 31 Aug 2010 at 08:43:24
>>>>> 
>>>>>> I was thinking more about MPI_Startall() this morning and found a
>>>>>> situation where this technique would not work.
>>>>>> 
>>>>>> If the application does:
>>>>>> --------------------
>>>>>> MPI_Send_init(rank=1, tag=123, req[0]);
>>>>>> MPI_Recv_init(MPI_ANY_SOURCE, MPI_ANY_TAG, count=0, req[1]);
>>>>>> MPI_Send_init(rank=2, tag=123, req[2]);
>>>>>> 
>>>>>> MPI_Startall(3, req) // Fails with MPI_ERR_IN_STATUS
>>>>>> if( failed ) {
>>>>>> for(i=0; i<3; ++i) {
>>>>>>  MPI_Request_get_status(req[i], flag, status);
>>>>>>  if( flag && status.error != success ) // Failed
>>>>>>  if( flag && status == empty ) // Not started
>>>>>>  if( flag && status != empty ) // Complete
>>>>>> }
>>>>>> }
>>>>>> --------------------
>>>>>> 
>>>>>> The problem is with the definition of an 'empty' status which has
>>>>>> (section 3.7.3):
>>>>>> ---------------------
>>>>>> source = MPI_ANY_SOURCE
>>>>>> tag = MPI_ANY_TAG
>>>>>> error = MPI_SUCCESS
>>>>>> MPI_Get_count = 0
>>>>>> MPI_test_cancelled = false
>>>>>> ---------------------
>>>>>> So the successful completion of the MPI_Recv_init() call would be
>>>>>> indistinguishable from the 'not started' or inactive state of the call.
>>>>> 
>>>>> Wouldn't a successful completion of the MPI_Recv_init() return a specific source and tag for the message actually received?  The source and tag fields of the receive are for filtering incoming sends, but when the receive completes, it was matched to exactly one send, with a specific tag and source.
>>>>> 
>>>>> What am I missing?
>>>> 
>>>> Ah yes. You are correct. So this is not a problem then, and the MPI_Request_get_status() technique would be a good way to check the state of all of the requests.
>>>> 
>>>> Thanks,
>>>> Josh
>>>> 
>>>>> 
>>>>> -Fab
>>>>> 
>>>> 
>>>> 
>>>> 
>>> 
>> 
>> 
>> 
> 

------------------------------------
Joshua Hursey
Postdoctoral Research Associate
Oak Ridge National Laboratory
http://www.cs.indiana.edu/~jjhursey







