[Mpi3-rma] RMA proposal 1 update

Pavan Balaji balaji at mcs.anl.gov
Mon May 17 21:43:27 CDT 2010


This is orthogonal to the all_fence_all discussion, but I thought I'd 
clarify it nevertheless --

I had mentioned that IB has remote completion, but after the Forum 
Sayantan reminded me that IB's remote-completion semantics are weaker 
than what MPI RMA would require. By remote completion, IB only means 
completion from the remote network adapter's perspective, not the 
remote memory's. So, in a case where there are multiple network adapters, 
the only way to learn of remote completion is through software active 
messages, which carry more overhead than a hardware notification.
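
To make that overhead concrete, here is a rough sketch (mine, not taken 
from any actual IB stack) of what the software active-message scheme 
looks like if you write it in terms of plain MPI calls. The tags and 
helper names are made up for illustration; a real implementation would 
of course do this inside the progress engine, not with user-level 
point-to-point:

#include <mpi.h>

#define TAG_FLUSH_REQ 100   /* hypothetical tags, illustration only */
#define TAG_FLUSH_ACK 101

/* Origin side: issue the put, then wait for a software ack telling us
 * the data is actually visible in the target's memory. */
static void put_with_sw_remote_completion(void *buf, int count, int target,
                                          MPI_Aint disp, MPI_Win win,
                                          MPI_Comm comm)
{
    MPI_Put(buf, count, MPI_BYTE, target, disp, count, MPI_BYTE, win);

    MPI_Send(NULL, 0, MPI_BYTE, target, TAG_FLUSH_REQ, comm);
    MPI_Recv(NULL, 0, MPI_BYTE, target, TAG_FLUSH_ACK, comm,
             MPI_STATUS_IGNORE);
    /* The extra message round trip per flush is exactly the cost that a
     * hardware remote-completion notification would avoid. */
}

/* Target side: only after fencing/draining every local adapter, so the
 * incoming data is in memory, do we send the ack back. */
static void serve_one_flush_request(MPI_Comm comm)
{
    MPI_Status st;
    MPI_Recv(NULL, 0, MPI_BYTE, MPI_ANY_SOURCE, TAG_FLUSH_REQ, comm, &st);
    /* ... fence all local adapters here ... */
    MPI_Send(NULL, 0, MPI_BYTE, st.MPI_SOURCE, TAG_FLUSH_ACK, comm);
}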

Now, with respect to all_fence_all, it doesn't seem any different from 
Win_fence. While I argued for having this function at the Forum, I 
think I see Keith's argument now. Assuming all puts and gets have been 
initiated as they are called, how can this function be made more 
efficient than just doing a fence_all + barrier? Do others have examples 
where it can be more efficient? Note that a bad implementation that 
simply buffers all puts/gets and initiates them at all_fence_all time 
might benefit from this, but I personally think we shouldn't encourage 
such implementations by providing this functionality.
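
Just to spell out the comparison I have in mind (the names below follow 
the current proposal and Jeff's suggested collective, so nothing here is 
standard MPI yet):

#include <mpi.h>

/* User-level stand-in for the proposed collective (all_fence_all /
 * MPI_Win_all_flush_all): complete my outstanding RMA operations
 * everywhere, then wait until everyone else has done the same.
 * Assumes a passive-target epoch (e.g. lock_all) is open on win. */
static void user_level_all_flush_all(MPI_Win win, MPI_Comm win_comm)
{
    MPI_Win_flush_all(win);   /* proposed per-process flush */
    MPI_Barrier(win_comm);    /* everyone has flushed before anyone returns */
}

If puts and gets are initiated eagerly, I don't see where a native 
collective could beat these two calls, short of the buffering 
implementation mentioned above.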

  -- Pavan

On 05/16/2010 09:32 PM, Underwood, Keith D wrote:
> Before doing that, can someone sketch out the platform/API and the implementation that makes that more efficient?  There is no gain for Portals (3 or 4).  There is no gain for anything that supports Cray SHMEM reasonably well (shmem_quiet() has approximately the same semantics as MPI_flush_all).
> Hrm, you can probably say the same thing about anything that supports UPC well - a strict access is basically an MPI_flush_all(); MPI_Put(); MPI_flush_all(); ...  Also, I thought somebody said that IB gave you a notification of remote completion...
> 
> The question then turns to the "other networks".  If you can't figure out remote completion, then the collective is going to be pretty heavy, right?
> 
> Keith
> 
>> -----Original Message-----
>> From: mpi3-rma-bounces at lists.mpi-forum.org [mailto:mpi3-rma-
>> bounces at lists.mpi-forum.org] On Behalf Of Jeff Hammond
>> Sent: Sunday, May 16, 2010 7:27 PM
>> To: MPI 3.0 Remote Memory Access working group
>> Subject: Re: [Mpi3-rma] RMA proposal 1 update
>>
>> Torsten,
>>
>> There seemed to be decent agreement on adding MPI_Win_all_flush_all
>> (equivalent to MPI_Win_flush_all called from every rank in the
>> communicator associated with the window) since this function can be
>> implemented far more efficiently as a collective than the equivalent
>> point-wise function calls.
>>
>> Is there a problem with adding this to your proposal?
>>
>> Jeff
>>
>> On Sun, May 16, 2010 at 12:48 AM, Torsten Hoefler <htor at illinois.edu>
>> wrote:
>>> Hello all,
>>>
>>> After the discussions at the last Forum I updated the group's first
>>> proposal.
>>>
>>> The proposal (one-side-2.pdf) is attached to the wiki page
>>> https://svn.mpi-forum.org/trac/mpi-forum-web/wiki/RmaWikiPage
>>>
>>> The changes with regards to the last version are:
>>>
>>> 1) added MPI_NOOP to MPI_Get_accumulate and MPI_Accumulate_get
>>>
>>> 2) (re)added MPI_Win_flush and MPI_Win_flush_all to passive target mode
>>>
>>> Some remarks:
>>>
>>> 1) We didn't straw-vote on MPI_Accumulate_get, so this function might
>>>   go. The removal would be very clean.
>>>
>>> 2) Should we allow MPI_NOOP in MPI_Accumulate? (This does not make
>>>    sense and is incorrect in my current proposal.)
>>>
>>> 3) Should we allow MPI_REPLACE in MPI_Get_accumulate/MPI_Accumulate_get?
>>>    (this would make sense and is allowed in the current proposal but we
>>>    didn't talk about it in the group)
>>>
>>>
>>> All the Best,
>>>  Torsten
>>>
>>> --
>>>  bash$ :(){ :|:&};: --------------------- http://www.unixer.de/ -----
>>> Torsten Hoefler         | Research Associate
>>> Blue Waters Directorate | University of Illinois
>>> 1205 W Clark Street     | Urbana, IL, 61801
>>> NCSA Building           | +01 (217) 244-7736
>>
>>
>> --
>> Jeff Hammond
>> Argonne Leadership Computing Facility
>> jhammond at mcs.anl.gov / (630) 252-5381
>> http://www.linkedin.com/in/jeffhammond
>>
> 

-- 
Pavan Balaji
http://www.mcs.anl.gov/~balaji


