[Mpi3-rma] RMA proposal 1 update

Pavan Balaji balaji at mcs.anl.gov
Tue May 18 14:48:47 CDT 2010


The only way it can be done today is by using a remote software agent, 
not through hardware -- which has been my point all along. That is, even 
though IB claims that it does remote completion for every request in 
hardware, as far as the MPI semantics go, remote completion and local 
completion cannot have the same cost.

  -- Pavan
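
[Editorial note: to make the local- vs. remote-completion distinction
concrete, here is a minimal sketch, assuming the MPI-3 spellings this
proposal later settled on (MPI_Win_flush for remote completion,
MPI_Win_flush_local for local completion); the draft discussed in this
thread still calls the remote-completion operation MPI_Flush.]

/* Minimal sketch: local vs. remote completion under passive-target RMA.
 * Assumes the target window holds at least 2*n doubles with a
 * displacement unit of sizeof(double). */
#include <mpi.h>

void completion_sketch(MPI_Win win, int target, double *src, int n)
{
    MPI_Win_lock(MPI_LOCK_SHARED, target, 0, win);

    MPI_Put(src, n, MPI_DOUBLE, target, 0, n, MPI_DOUBLE, win);
    /* Local completion only: src may be reused after this returns, but
     * the data is not yet guaranteed to be visible at the target. */
    MPI_Win_flush_local(target, win);

    MPI_Put(src, n, MPI_DOUBLE, target, n, n, MPI_DOUBLE, win);
    /* Remote completion: on return the data is visible in the target
     * window, the same guarantee MPI_Win_unlock provides. */
    MPI_Win_flush(target, win);

    MPI_Win_unlock(target, win);
}

Whether the second flush can be satisfied entirely by the NIC, or needs
a remote software agent as argued above, is exactly the point in dispute.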

On 05/18/2010 02:44 PM, Underwood, Keith D wrote:
> The same way it does it for unlock today...
> 
>> Ok, in that case, how will a network that only gives remote completion
>> up to the adapter ensure ordering between the "foo" and "bar" variables
>> if they go over different adapters?
>>
>> Btw, there are a number of production systems that use multi-rail IB.
>>
>>   -- Pavan
>>
>> On 05/18/2010 02:07 PM, Underwood, Keith D wrote:
>>> Yes, you should get 100.  MPI_Flush() does remote completion, just
>>> like MPI_Win_unlock().  How you do that on some hacked-together dual
>>> rail solution is up to the implementation ;-)
>>> Keith
>>>
>>>> -----Original Message-----
>>>> From: Pavan Balaji [mailto:balaji at mcs.anl.gov]
>>>> Sent: Tuesday, May 18, 2010 1:05 PM
>>>> To: Underwood, Keith D
>>>> Cc: MPI 3.0 Remote Memory Access working group
>>>> Subject: Re: [Mpi3-rma] RMA proposal 1 update
>>>>
>>>>
>>>> On 05/18/2010 01:52 PM, Underwood, Keith D wrote:
>>>>> Now you are just trying to be difficult... First, your scenario is
>>>>> not legal.  You have to call a local MPI_Win_lock()/MPI_Win_unlock()
>>>>> before that data is visible in the private window to allow loads and
>>>>> stores.  Even accessing that item that was Put over NIC1 is undefined
>>>>> until the source has done a completion operation.
>>>>
>>>> Sorry, I don't mean to. Relying on network ordering all the way to
>>>> memory just seems hacky. So, I'm trying to see if there are cases
>>>> where the network doesn't have full control over when things are
>>>> written to memory.
>>>>
>>>>> Even then, I think you are discussing an ordering problem that exists
>>>>> in the base standard:  completing an MPI_Win_unlock() implies remote
>>>>> completion.  Real remote completion.  Until MPI_Win_unlock() completes,
>>>>> there is no guarantee of ordering between anything.  MPI_Flush() does
>>>>> not add to this issue.
>>>>
>>>> Hmm.. Maybe I don't understand MPI_Flush very well then. Here's the
>>>> example case I was thinking of:
>>>>
>>>> MPI_Win_lock(target = 1, SHARED);
>>>> if (rank == 1) {
>>>> 	MPI_Put(win, target = 1, foo = 100, ...);
>>>> 	MPI_Flush(win, target = 1, ...);
>>>> 	MPI_Get_accumulate(win, target = 1, &bar, ...);
>>>> }
>>>> else if (rank == 0) {
>>>> 	do {
>>>> 		MPI_Get_accumulate(win, target = 1, &bar, ...);
>>>> 	} while (bar != 1); /* Get the mutex */
>>>> 	MPI_Get(win, target = 1, &foo, ...);
>>>> }
>>>> MPI_Win_unlock(target = 1);
>>>>
>>>> So, the question is, is process 0 guaranteed to get foo = 100 in this
>>>> case? Note that there are no direct loads/stores here, so everything
>>>> can happen in shared lock mode.
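
[Editorial note: for reference, here is a self-contained sketch of the
example above, written against the MPI-3 API this proposal eventually
became. MPI_Win_flush stands in for the draft's MPI_Flush, and
MPI_Fetch_and_op is used for the atomic flag handoff that the pseudocode
expresses with MPI_Get_accumulate; both substitutions are assumptions,
not part of the original mail.]

#include <mpi.h>
#include <stdio.h>

/* Run with at least 2 processes.  Window layout on every rank:
 * buf[0] = foo, buf[1] = bar (the flag / "mutex"). */
int main(int argc, char **argv)
{
    int rank, *buf;
    MPI_Win win;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Win_allocate(2 * sizeof(int), sizeof(int), MPI_INFO_NULL,
                     MPI_COMM_WORLD, &buf, &win);

    /* Initialize the window; the self-lock/unlock makes the initial
     * values visible in the public window copy. */
    MPI_Win_lock(MPI_LOCK_EXCLUSIVE, rank, 0, win);
    buf[0] = 0; buf[1] = 0;
    MPI_Win_unlock(rank, win);
    MPI_Barrier(MPI_COMM_WORLD);

    MPI_Win_lock(MPI_LOCK_SHARED, 1, 0, win);
    if (rank == 1) {
        int hundred = 100, one = 1, old;
        MPI_Put(&hundred, 1, MPI_INT, 1, 0, 1, MPI_INT, win);
        MPI_Win_flush(1, win);            /* remote completion of the Put */
        /* Atomically set the flag bar = 1 (displacement 1). */
        MPI_Fetch_and_op(&one, &old, MPI_INT, 1, 1, MPI_REPLACE, win);
        MPI_Win_flush(1, win);
    } else if (rank == 0) {
        int bar = 0, dummy = 0, foo = -1;
        do {                               /* atomically poll the flag */
            MPI_Fetch_and_op(&dummy, &bar, MPI_INT, 1, 1, MPI_NO_OP, win);
            MPI_Win_flush(1, win);
        } while (bar != 1);
        MPI_Get(&foo, 1, MPI_INT, 1, 0, 1, MPI_INT, win);
        MPI_Win_flush(1, win);
        printf("rank 0 read foo = %d\n", foo); /* the debated guarantee: 100 */
    }
    MPI_Win_unlock(1, win);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}

Under the MPI-3 semantics that were eventually adopted, Keith's answer
holds: MPI_Win_flush provides remote completion, so once rank 0 observes
the flag, the subsequent Get must return 100.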
>>>>
>>>>   -- Pavan
>>>>
>>>> --
>>>> Pavan Balaji
>>>> http://www.mcs.anl.gov/~balaji
>> --
>> Pavan Balaji
>> http://www.mcs.anl.gov/~balaji

-- 
Pavan Balaji
http://www.mcs.anl.gov/~balaji


