[Mpi3-ft] New RMA functions
balaji at mcs.anl.gov
Thu Feb 24 09:53:00 CST 2011
The new RMA proposal is an extension of the existing MPI-2.2 RMA
interface; it'll all sit in the same chapter (please see Bill's slides
at the last Forum for the details).
Two forms of RMA communication operations will be present:
1. The existing PUT/GET/ACCUMULATE operations from MPI-2.2. They do not
take a request operand, and we want to keep it that way to minimize
performance overhead. Synchronization calls (such as closing an epoch or
flush/flushall) wait for their completion, but they currently do not
return a status object. Adding a status object to the synchronization
calls is an option, though that'll require extensive changes. But adding
one to the PUT/GET/ACCUMULATE operations themselves would defeat the
purpose of low-overhead communication, so that might not
2. The second set of operations is RPUT/RGET/RACCUMULATE, which take a
request operand. For this part, I don't believe you'll have any issue
with fault propagation, as you'd need to use WAIT/WAITALL/... to
complete these requests. But remember that WAIT/WAITALL only complete
these operations locally (i.e., you can reuse the buffer), while most
synchronization operations complete them at the remote target.
Overall, if the user uses PUT/GET/ACCUMULATE, returning errors only at
synchronization calls is OK, IMO. If the user wants error returns per
operation, then they should use RPUT/RGET/RACCUMULATE. But that might
still require the synchronization calls to return a status object.
Is this sufficient for you guys? I can bring it up at the next RMA
telecon if needed.
On 02/24/2011 08:29 AM, Joshua Hursey wrote:
> Thanks for checking. I'm glad that the model we are pushing forward with regard to fault checking at synchronization points seems to work for the RMA folks.
> So I was thinking about adding an optional status object (rather than a request object) to put/get/accumulate operations that is filled in at the synchronization event. This is slightly different from the way we use status objects elsewhere in the standard, but I don't know if they need to be full requests. In particular, we want to give the user the option to disregard the parameter if they don't care about the specific operation, which can be done with MPI_STATUS_IGNORE; there is no equivalent for requests. Since the synchronization operations are effectively waitalls on all requests posted during the epoch, we really only care about the status (and we want to avoid a conversation about canceling a one-sided request). What do you all think about that?
> So is this (status objects for put/get/accumulate) something that they have already put in the RMA proposal, or just something they would be open to adding?
> As I mentioned on the call, I am hesitant to change the one-sided interfaces currently in the standard to add the status object, since the new RMA functionality is being brought forward. Is the intention of the new RMA proposal to replace the current one-sided chapter, or to sit beside it? If the latter, then we may want to consider a smaller proposal just to add the status object to the one-sided operations. If the former, then we can just wait for the new semantics.
> On Feb 23, 2011, at 2:43 PM, Darius Buntinas wrote:
>> After today's concall, I talked to Pavan about fault-tolerance and the new RMA functions. He said that it would be appropriate to check for/report errors at synchronization points (like the end of epochs and things like flush), and for operations that take requests (like puts with requests).
>> This sounds like what was suggested during the call.
>> mpi3-ft mailing list
>> mpi3-ft at lists.mpi-forum.org
> Joshua Hursey
> Postdoctoral Research Associate
> Oak Ridge National Laboratory