[Mpi3-ft] New RMA functions

Joshua Hursey jjhursey at open-mpi.org
Thu Feb 24 10:15:29 CST 2011


On Feb 24, 2011, at 10:53 AM, Pavan Balaji wrote:

> 
> The new RMA proposal is an extension of the existing MPI-2.2 RMA 
> interface; it'll all sit in the same chapter (please see Bill's slides 
> at the last Forum for the details).
> 
> Two forms of RMA communication operations will be present:
> 
> 1. The existing PUT/GET/ACCUMULATE operations which are from MPI-2.2. 
> They will not take a request operand, and we want to retain it that way 
> to minimize the performance overhead. Synchronization calls (such as 
> closing an epoch or flush/flushall) wait for their completion, but they 
> do not return a status object currently. Adding a status object to the 
> synchronization calls is an option, though that'll require extensive 
> changes. But adding them to the PUT/GET/ACCUMULATE operations themselves 
> would defeat the purpose of low-overhead communication, so that might not 
> be doable.

I suspected that this was one of the reasons for not having it be part of the interface originally, since the MPI would have to internally track the request or status objects if they were used. So I'm fine with leaving these operations as is.

> 
> 2. The second set of operations is RPUT/RGET/RACCUMULATE, which take a 
> request operand. For this part, I don't believe you'll have any issue 
> with fault propagation, as you'd need to use WAIT/WAITALL/... to 
> complete these requests. But remember that WAIT/WAITALL only complete 
> these requests locally for these operations (i.e., you can reuse the 
> buffer), while most synchronization operations complete the operations 
> at the remote target.

Yeah, I think those would be fine as they are currently formed, since we can access the status object through the wait* operations. If a process does an RPUT to a dead process (or to one that dies just after the RPUT returns), then we can set the error in the request, which the user can access through the status object after the wait* operation. I think we would probably encourage users who would find such notification useful to use the RPUT/RGET/RACCUMULATE operations instead of their request-less counterparts, maybe as an advice to users.
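To make the per-operation reporting concrete, here is a minimal toy model in C. It is not MPI code: `toy_rput`, `toy_notice_failure`, `toy_wait`, and the `toy_*` types are hypothetical names invented for illustration, and the real RPUT/RGET/RACCUMULATE interface and its fault semantics are still under discussion. The sketch only assumes what is described above: a failure of the target is recorded in the request and becomes visible through the status object when the request is waited on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy types; they model the idea discussed above, not MPI. */
enum toy_err { TOY_SUCCESS = 0, TOY_ERR_PROC_FAILED = 1 };

struct toy_status {
    enum toy_err error;   /* outcome of the completed operation */
    int          target;  /* rank the operation was aimed at */
};

struct toy_request {
    int          target;
    enum toy_err error;
    bool         active;
};

/* Issue a request-based put. alive[target] says whether the target is up
 * when the operation is issued. Nothing is reported here; any failure is
 * only stored in the request. */
void toy_rput(struct toy_request *req, int target, const bool *alive)
{
    req->target = target;
    req->error  = alive[target] ? TOY_SUCCESS : TOY_ERR_PROC_FAILED;
    req->active = true;
}

/* Model a target that dies after the put was issued but before the wait:
 * the pending request aimed at that rank is marked as failed. */
void toy_notice_failure(struct toy_request *req, int failed_rank)
{
    if (req->active && req->target == failed_rank)
        req->error = TOY_ERR_PROC_FAILED;
}

/* Complete the request and hand the stored error back to the caller.
 * Passing NULL for status plays the role of MPI_STATUS_IGNORE. */
enum toy_err toy_wait(struct toy_request *req, struct toy_status *status)
{
    assert(req->active);
    req->active = false;
    if (status != NULL) {
        status->error  = req->error;
        status->target = req->target;
    }
    return req->error;
}
```

In this model a caller issues several `toy_rput` calls, waits on each request, and learns exactly which operation hit a failed target, which is the property the request-less operations cannot offer.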

> 
> Overall, if the user uses PUT/GET/ACCUMULATE, returning errors during 
> synchronization calls only is OK, IMO. If the user wants error returns 
> per operation, then he/she should use RPUT/RGET/RACCUMULATE. But it 
> might still require the synchronization calls to return a status object.
> 
> Is this sufficient for you guys? I can bring it up at the next RMA 
> telecon if needed.

Thanks for the insight. I think this helps, and seems to fit with the current way we are thinking about process fault tolerance in the one-sided chapter. We'll have to take a closer look at the proposal just to make sure (I have only looked through the slides so far).

The question that we are trying to address is what semantics and interfaces would be useful to someone using RMA operations if a process in the target or source group is dead. The way we have specified it so far, the only really meaningful check happens at the synchronization operation, so the user only knows that a process failed sometime during the epoch, not necessarily which operation was affected. If they need to know which operation, they should use the new RPUT/RGET/RACCUMULATE operations; if they don't, they can use the non-request operations. So I think the RPUT/RGET/RACCUMULATE operations fill the gap that we found in the current standard.
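The epoch-level granularity can be illustrated with a small toy model in C. Again, this is not MPI code: `toy_epoch`, `toy_put`, `toy_epoch_begin`, `toy_epoch_end`, and `TOY_MAX_RANKS` are hypothetical names made up for this sketch. It assumes that request-less operations return nothing to the caller and that the only check happens when the epoch is closed.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_RANKS 8  /* hypothetical fixed group size for the sketch */

enum toy_err { TOY_SUCCESS = 0, TOY_ERR_PROC_FAILED = 1 };

/* An epoch over request-less operations. It remembers which targets were
 * touched and how many operations were issued, but keeps no per-operation
 * record. */
struct toy_epoch {
    bool touched[TOY_MAX_RANKS];
    int  nops;
};

/* Open a fresh epoch with no targets touched. */
void toy_epoch_begin(struct toy_epoch *ep)
{
    for (int r = 0; r < TOY_MAX_RANKS; r++)
        ep->touched[r] = false;
    ep->nops = 0;
}

/* Request-less put: returns nothing, so no error can be reported here. */
void toy_put(struct toy_epoch *ep, int target)
{
    assert(target >= 0 && target < TOY_MAX_RANKS);
    ep->touched[target] = true;
    ep->nops++;
}

/* Synchronization point: the single place where a failure is reported.
 * The caller learns that some touched target is dead, not which of the
 * nops operations was affected. */
enum toy_err toy_epoch_end(const struct toy_epoch *ep, const bool *alive)
{
    for (int r = 0; r < TOY_MAX_RANKS; r++)
        if (ep->touched[r] && !alive[r])
            return TOY_ERR_PROC_FAILED;
    return TOY_SUCCESS;
}
```

After three `toy_put` calls in one epoch, a failed target yields a single `TOY_ERR_PROC_FAILED` from `toy_epoch_end`, with no indication of which put was lost. A user who needs that detail would move to the request-based operations.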

Thanks again,
Josh

> 
> Thanks,
> 
>  -- Pavan
> 
> On 02/24/2011 08:29 AM, Joshua Hursey wrote:
>> Thanks for checking. I'm glad that the model we are pushing forward with regard to fault checking at synchronization points seems to work for the RMA folks.
>> 
>> So I was thinking about adding an optional status object (rather than a request object) to put/get/accumulate operations that is filled in at the synchronization event. This is slightly different from the way we use status objects in other places in the standard, but I don't know if they need to be a full request. In particular, we want to provide the user with the option to disregard the parameter if they don't care about the specific operation, which can be done with MPI_STATUS_IGNORE, but there is no equivalent for requests. Since the synchronization operations are effectively waitall's on all requests posted during the epoch, we really only care about the status (and we want to avoid a conversation about canceling a one-sided request). What do you all think about that?
>> 
>> So is this (status objects for put/get/accumulate) something that they have already put in the RMA proposal, or just something they would be open to adding?
>> 
>> As I mentioned on the call, I am hesitant to change the one-sided interfaces currently in the standard to add the status object since the new RMA functionality is being brought forward. Is the intention of the new RMA proposal to replace the current one-sided chapter, or to sit beside it? If the latter, then we may want to consider a smaller proposal just to add the status object to the one-sided operations. If the former, then we can just wait for the new semantics.
>> 
>> Thanks,
>> Josh
>> 
>> On Feb 23, 2011, at 2:43 PM, Darius Buntinas wrote:
>> 
>>> 
>>> After today's concall, I talked to Pavan about fault-tolerance and the new RMA functions.  He said that it would be appropriate to check for/report errors at synchronization points (like the end of epochs and things like flush), and for operations that take requests (like puts with requests).
>>> 
>>> This sounds like what was suggested during the call.
>>> 
>>> -d
>>> _______________________________________________
>>> mpi3-ft mailing list
>>> mpi3-ft at lists.mpi-forum.org
>>> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-ft
>>> 
>> 
>> ------------------------------------
>> Joshua Hursey
>> Postdoctoral Research Associate
>> Oak Ridge National Laboratory
>> http://users.nccs.gov/~jjhursey
>> 
>> 
> 
> -- 
> Pavan Balaji
> http://www.mcs.anl.gov/~balaji
> 

------------------------------------
Joshua Hursey
Postdoctoral Research Associate
Oak Ridge National Laboratory
http://users.nccs.gov/~jjhursey




