[Mpi3-rma] [EXTERNAL] Re: request-based ops

Mon Jun 17 09:44:21 CDT 2013

First, I apologize for replying on top; Outlook webmail sucks.

I think I agree with Jeff, option #2 is icky.

I think I also agree with Jim that my performance concerns could be overstated, as one could make iflush be a flush and return a completed request (to be tested/waited on later).  There are no locking concerns, as you have to hold the lock in order to call flush (in my implementation, anyway).  But it would limit other implementations that queue RMA ops, and that would be unfortunate.
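
To make the "iflush as a flush" idea concrete, here is a minimal sketch in C.  It is a toy model, not MPI: toy_win, toy_req, toy_put, toy_flush, toy_iflush and toy_test are all made-up names, and a plain counter stands in for the outstanding RMA operations.

```c
#include <stdbool.h>

/* Toy stand-ins (assumptions, not MPI types). */
typedef struct { unsigned in_flight; } toy_win;  /* ops not yet complete */
typedef struct { bool done; } toy_req;           /* request handle */

/* Start one RMA-like operation on the window. */
void toy_put(toy_win *w) { w->in_flight++; }

/* Stands in for a blocking flush: returns once nothing is in flight. */
void toy_flush(toy_win *w) { w->in_flight = 0; }

/* iflush implemented as a blocking flush that hands back a request
 * which is already complete. */
toy_req toy_iflush(toy_win *w)
{
    toy_req r = { true };
    toy_flush(w);
    return r;
}

/* A later test (or wait) succeeds immediately. */
bool toy_test(const toy_req *r) { return r->done; }
```

With this shape, a loop that only ever calls test still terminates, because the request is complete before iflush returns.  The cost is that iflush itself blocks, so there is no overlap.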

Since Pavan's on a make-everything-non-blocking kick, perhaps it's instructive to think about ifence or iunlock (which I don't like, but which I think are useful in this conversation).  In both, I think it's a relatively clean semantic to not allow communication operations between the operation and the test/wait, similar to how we wouldn't allow communication from another thread during fence / unlock, as the access epoch closes when the synchronization call starts.  In that case, why would we put different semantics in for flush?

Or, another way of thinking about it is that for all non-blocking communication operations, we don't allow the user to modify state between the non-blocking operation and the test/wait.  In most cases, the state is the user buffer, but in this case, the state is the operation state.  So again, I think there's good precedent for the easy-to-define, easy-to-understand rule that you can't start new communication operations between the iflush and the test/wait on that window, just like you can't modify the user buffer for isend/ibcast/etc.
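
As a rough illustration of that rule, here is a small C model.  Nothing in it is MPI: model_win, model_put, model_iflush and model_wait are hypothetical names, and "erroneous" is modeled simply as a false return value.

```c
#include <stdbool.h>

/* Hypothetical model of a window (an assumption, not an MPI type). */
typedef struct {
    unsigned issued;          /* RMA ops started by the origin */
    unsigned completed;       /* ops known to be complete */
    bool     iflush_pending;  /* iflush started, not yet waited on */
} model_win;

/* Start an RMA op.  Under the proposed rule this is erroneous
 * (returns false) while an iflush is outstanding on the window. */
bool model_put(model_win *w)
{
    if (w->iflush_pending)
        return false;
    w->issued++;
    return true;
}

/* Start a non-blocking flush; only one may be outstanding here. */
bool model_iflush(model_win *w)
{
    if (w->iflush_pending)
        return false;
    w->iflush_pending = true;
    return true;
}

/* Wait on the iflush.  Because no op could start after the iflush,
 * "everything issued" is exactly "everything issued before iflush".
 * Returns how many ops this wait completed. */
unsigned model_wait(model_win *w)
{
    unsigned n = w->issued - w->completed;
    w->completed = w->issued;
    w->iflush_pending = false;
    return n;
}
```

The point of the sketch is that the two candidate semantics (ops issued before the iflush call vs. before its completion) collapse into one once new operations are disallowed in between.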

Brian

--
  Brian W. Barrett
  Scalable System Software Group
  Sandia National Laboratories
________________________________________
From: mpi3-rma-bounces at lists.mpi-forum.org [mpi3-rma-bounces at lists.mpi-forum.org] on behalf of Jeff Hammond [jhammond at alcf.anl.gov]
Sent: Monday, June 17, 2013 8:14 AM
To: MPI 3.0 Remote Memory Access working group
Subject: Re: [Mpi3-rma] [EXTERNAL] Re: request-based ops

I do not want to consider interpretation 2 because it is problematic
for some implementations.  There are cases where it requires the
flush to happen when the request is waited upon, in which case there
is absolutely no benefit over doing a blocking flush.

Actually, now that I think about it, every implementation has to issue
or re-issue a flush at the time the wait is called if any RMA
operations have been issued (to the relevant targets) since the time
the first flush may have been issued.  Hence, an implementation that
issues the iflush eagerly may do twice the work, whereas one that does
not provides zero benefit over the blocking flush.

The only case where 2 is going to be effective and efficient is the
case where it reduces to 1 because no new RMA operations are issued
between the iflush and the wait.
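
The accounting behind that argument can be sketched with a toy C model
of interpretation 2.  All names here (sem2_win, sem2_op,
sem2_flush_if_needed, sem2_iflush, sem2_wait) are made up for
illustration, and a "network flush" is just a counter increment.

```c
#include <stdbool.h>

/* Toy window state (an assumption, not an MPI type). */
typedef struct {
    unsigned issued;           /* RMA ops issued so far */
    unsigned flushed_through;  /* ops covered by the last network flush */
    unsigned flushes_sent;     /* how many network flushes were sent */
} sem2_win;

/* Issue one RMA-like operation. */
void sem2_op(sem2_win *w) { w->issued++; }

/* Send a network flush only if some op is not yet covered. */
void sem2_flush_if_needed(sem2_win *w)
{
    if (w->flushed_through != w->issued) {
        w->flushed_through = w->issued;
        w->flushes_sent++;
    }
}

/* iflush: an eager implementation flushes now, a lazy one defers. */
void sem2_iflush(sem2_win *w, bool eager)
{
    if (eager)
        sem2_flush_if_needed(w);
}

/* wait under interpretation 2: must also cover ops issued after the
 * iflush, so it flushes again if anything new was issued. */
void sem2_wait(sem2_win *w) { sem2_flush_if_needed(w); }
```

In this model, op / iflush / op / wait costs two flushes for the eager
variant and one flush (at wait time, like a blocking flush) for the
lazy variant.  Without the second op, the eager variant needs only one.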

Jeff

On Mon, Jun 17, 2013 at 8:52 AM, Jim Dinan <james.dinan at gmail.com> wrote:
> It seems like there are two possible semantics for which operations are
> completed by a call to MPI_Win_iflush:
>
> (1) Completes all operations issued by the origin process on the given
> window before MPI_Win_iflush was called.
> (2) Completes all operations issued by the origin process on the given
> window before MPI_Win_iflush completed.
>
> So far, we've just been looking at #1, but I think that #2 is worth
> considering.  Option #2 allows an implementation that just checks if the
> counters are equal.  This avoids the issue where #1 can't be implemented in
> terms of #2, because issuing an unbounded number of operations while testing
> on the iflush request can cause the iflush to not complete indefinitely.
>
> Option #2 does not directly provide the functionality that Jeff was looking
> for, but this could be implemented using two windows.  Issue a bunch of
> operations on win1 and iflush on win2; when win2 has been flushed, switch to
> issuing operations on win2 and iflushing win1.
>
>  ~Jim.
>
> On Mon, Jun 17, 2013 at 7:38 AM, Jim Dinan <james.dinan at gmail.com> wrote:
>>
>> Sorry, I should have been more specific.  An implementation of iflush that
>> waits for the completion of all messages should be valid.  Such an
>> implementation would compare counters and return true if they are the same.
>> This implementation could have the issue I mentioned in the previous
>> message, where the user continuously issuing operations can prevent iflush
>> from completing.
>>
>> Jim.
>>
>> On Jun 16, 2013 10:13 AM, "Pavan Balaji" <balaji at mcs.anl.gov> wrote:
>>>
>>>
>>> On 06/16/2013 10:02 AM, Jim Dinan wrote:
>>>>
>>>> If the channel is unordered, a message after the iflush can increment
>>>> the counter, while one before the iflush has not yet completed.  So, the
>>>> counter is not enough to mark a particular point in time.
>>>
>>>
>>> Ah, good point.
>>>
>>>> An implementation of iflush as flush should still be valid, right?  Just
>>>
>>>
>>> No.  You cannot do this if the user only uses TEST.
>>>
>>> MPI_WIN_IFLUSH(&req);
>>> while (MPI_TEST(req) is not done);
>>>
>>>  -- Pavan
>>>
>>> --
>>> Pavan Balaji
>>> http://www.mcs.anl.gov/~balaji
>
>
>
> _______________________________________________
> mpi3-rma mailing list
> mpi3-rma at lists.mpi-forum.org
> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-rma

--
Jeff Hammond
Argonne Leadership Computing Facility
University of Chicago Computation Institute
jhammond at alcf.anl.gov / (630) 252-5381
http://www.linkedin.com/in/jeffhammond
https://wiki.alcf.anl.gov/parts/index.php/User:Jhammond
ALCF docs: http://www.alcf.anl.gov/user-guides