[mpiwg-rma] Single RMA synchronization for several window handles

Jeff Hammond jeff.science at gmail.com
Mon Aug 11 15:50:46 CDT 2014


Then the right assertions are the solution, not a new function.
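
For concreteness, a minimal sketch (not from the original mail; the function
and window names are illustrative) of the assertion-based optimization meant
here: passing the right assert flags to MPI_Win_fence tells the
implementation which parts of the synchronization it may skip.

    #include <mpi.h>

    /* Illustrative only: halo_exchange and win are made-up names. */
    void halo_exchange(MPI_Win win)
    {
        /* No RMA epoch precedes this fence, so the implementation
           need not wait for completion of earlier RMA operations. */
        MPI_Win_fence(MPI_MODE_NOPRECEDE, win);

        /* ... MPI_Put / MPI_Get calls on win ... */

        /* The local window was not updated by local stores, and no
           RMA epoch follows this fence. */
        MPI_Win_fence(MPI_MODE_NOSTORE | MPI_MODE_NOSUCCEED, win);
    }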

Jeff

On Mon, Aug 11, 2014 at 1:49 PM, Rajeev Thakur <thakur at mcs.anl.gov> wrote:
> Unless the right asserts are passed, a process in a fence has to wait until all other processes have reached their fence. That is because, in the absence of asserts, the process doesn't know if it is the target of RMA operations from another process.
>
> Rajeev
>
> On Aug 11, 2014, at 3:22 PM, Jeff Hammond <jeff.science at gmail.com> wrote:
>
>> Does the MPI standard require fence to do a barrier?  If not, then my
>> argument still stands.  The implementation only has to sync once per
>> remote target across N fences and the rest are nothing more than
>> function call overhead.
>>
>> If we want to optimize for function call overhead, let's consider
>> MPI_Win_fencev, which makes way more sense than creating
>> franken-windows with this combiner function.
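
A sketch of the hypothetical vector fence mentioned above (no such routine
exists in the MPI standard; the MPIX_ name is made up), which also doubles as
its trivial reference implementation: it saves only per-call overhead, not
synchronization.

    #include <mpi.h>

    /* Hypothetical routine, not part of the MPI standard. */
    int MPIX_Win_fencev(int assert, int count, MPI_Win wins[])
    {
        for (int i = 0; i < count; i++) {
            int err = MPI_Win_fence(assert, wins[i]);
            if (err != MPI_SUCCESS)
                return err;
        }
        return MPI_SUCCESS;
    }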
>>
>> Jeff
>>
>> On Mon, Aug 11, 2014 at 7:29 AM, Jim Dinan <james.dinan at gmail.com> wrote:
>>> It's more than function call overhead.  Fence usually includes a barrier, so
>>> this is one barrier versus several.
>>>
>>>
>>> On Mon, Aug 11, 2014 at 10:09 AM, Jeff Hammond <jeff.science at gmail.com>
>>> wrote:
>>>>
>>>> If function call overhead matters, MPI is probably the wrong model.
>>>>
>>>> You'll have to prove to me, with empirical data from some system, that
>>>> this (function call overhead) is significant compared to the cost of
>>>> synchronization.
>>>>
>>>> Jeff
>>>>
>>>> Sent from my iPhone
>>>>
>>>>> On Aug 11, 2014, at 1:49 AM, Rolf Rabenseifner <rabenseifner at hlrs.de>
>>>>> wrote:
>>>>>
>>>>> Jim and all,
>>>>>
>>>>> It is not syntactic sugar;
>>>>> it is a latency-optimizing enhancement:
>>>>>
>>>>> If you are doing one-sided neighbor communication
>>>>> with 6 neighbors - based on 6 windows that
>>>>> are all defined on MPI_COMM_WORLD - then currently
>>>>> you need to issue the synchronization calls 6 times,
>>>>> which implies up to a 6 times larger latency,
>>>>> e.g., 6 calls to MPI_Win_fence instead of one.
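
A sketch of the pattern described above, assuming 6 windows created on
MPI_COMM_WORLD (the function and variable names are illustrative):

    #include <mpi.h>

    /* One exchange step currently needs one MPI_Win_fence per window,
       i.e. up to 6 times the synchronization latency. */
    void exchange_step(MPI_Win win[6])
    {
        for (int i = 0; i < 6; i++)
            MPI_Win_fence(0, win[i]);   /* open an epoch on every window */

        /* ... MPI_Put / MPI_Get to the 6 neighbors, one window each ... */

        for (int i = 0; i < 6; i++)
            MPI_Win_fence(0, win[i]);   /* close every epoch: 6 more calls */
    }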
>>>>>
>>>>> If you are using MPI shared-memory windows,
>>>>> the same latency problem exists, but here
>>>>> the trick with dynamic windows is impossible,
>>>>> because shared-memory windows must be allocated
>>>>> with MPI_Win_allocate_shared.
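
For reference, a minimal sketch of how such a shared-memory window is
obtained (shm_comm is assumed to come from MPI_Comm_split_type with
MPI_COMM_TYPE_SHARED); the memory is allocated by the call itself, so it
cannot simply be attached to an existing dynamic window:

    #include <mpi.h>

    void make_shared_window(MPI_Comm shm_comm, MPI_Aint bytes,
                            double **base, MPI_Win *win)
    {
        /* Allocates the shared memory and creates the window in one call. */
        MPI_Win_allocate_shared(bytes, sizeof(double), MPI_INFO_NULL,
                                shm_comm, base, win);
    }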
>>>>>
>>>>> Of course, it is more complicated to define
>>>>> MPI_Win_combine on a superset of the communicators used
>>>>> for all the combined windows.
>>>>>
>>>>> As a first step, it would already be helpful
>>>>> to define it for the same communicator.
>>>>> To allow enhancements in a future MPI version,
>>>>> I would still recommend keeping the comm argument
>>>>> in the argument list.
>>>>>
>>>>> Rolf
>>>>>
>>>>> ----- Original Message -----
>>>>>> From: "Jim Dinan" <james.dinan at gmail.com>
>>>>>> To: "MPI WG Remote Memory Access working group"
>>>>>> <mpiwg-rma at lists.mpi-forum.org>
>>>>>> Sent: Sunday, August 10, 2014 5:48:02 PM
>>>>>> Subject: Re: [mpiwg-rma] Single RMA synchronization for several window
>>>>>> handles
>>>>>>
>>>>>>
>>>>>>
>>>>>> Hi Rolf,
>>>>>>
>>>>>>
>>>>>> I had initially proposed this in the context of passive target RMA.
>>>>>> Active target RMA already notifies the receiver when data has
>>>>>> arrived, but there is no efficient way to get such a notification in
>>>>>> passive target RMA.  I think what you are proposing here would be
>>>>>> syntactic sugar on top of the existing interface --- could I
>>>>>> implement this by fencing every window to determine that all
>>>>>> transfers are completed?
>>>>>>
>>>>>>
>>>>>> I agree with the comments regarding dynamic windows.  The merged
>>>>>> window would contain discontiguous buffers; thus, it would lose the
>>>>>> ability to do offset-based addressing and would need to use absolute
>>>>>> (BOTTOM-based) addressing.
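
A sketch of the absolute addressing referred to here, as it is already used
with dynamic windows (the names are illustrative): the target attaches a
buffer and publishes its absolute address, which the origin then uses as the
target displacement.

    #include <mpi.h>

    void attach_and_publish(MPI_Win dyn_win, double *buf, MPI_Aint nbytes,
                            MPI_Aint *addr_out)
    {
        MPI_Win_attach(dyn_win, buf, nbytes);
        MPI_Get_address(buf, addr_out);  /* absolute address, not an offset */
        /* addr_out is then communicated to the origin (e.g. via MPI_Bcast)
           and used there as the target_disp of MPI_Put / MPI_Get. */
    }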
>>>>>>
>>>>>>
>>>>>> ~Jim.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Aug 8, 2014 at 10:18 AM, Rolf Rabenseifner
>>>>>> <rabenseifner at hlrs.de> wrote:
>>>>>>
>>>>>>
>>>>>> Jim,
>>>>>>
>>>>>> your topic "Reducing Synchronization Overhead Through Bundled
>>>>>> Communication" might also benefit if we were able to
>>>>>> combine several window handles into one superset window handle.
>>>>>>
>>>>>> If you have several windows for different buffers, but
>>>>>> only one synchronization pattern, e.g. MPI_Win_fence,
>>>>>> then currently you must call MPI_Win_fence separately
>>>>>> for each window handle.
>>>>>>
>>>>>> I would propose:
>>>>>>
>>>>>> MPI_Win_combine(/*IN*/  int       count,
>>>>>>                 /*IN*/  MPI_Win  *win,
>>>>>>                 /*IN*/  MPI_Comm  comm,
>>>>>>                 /*OUT*/ MPI_Win  *win_combined)
>>>>>>
>>>>>> The process group of comm must contain the process groups
>>>>>> of all windows in win.
>>>>>> The resulting window handle win_combined can be used only
>>>>>> in RMA synchronization calls and other helper routines,
>>>>>> but not for dynamic window allocation nor for any
>>>>>> RMA communication routine.
>>>>>> Collective synchronization routines must be called by all processes
>>>>>> of comm.
>>>>>> The semantics of an RMA synchronization call using win_combined
>>>>>> are defined as if the calls were issued separately for
>>>>>> each window handle in the array win. If group handles
>>>>>> are part of the argument list of the synchronization call,
>>>>>> then the appropriate subset is used for each window handle in win.
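
A sketch of how the proposed routine might be used (the routine is
hypothetical, so a prototype is declared here only for illustration); RMA
communication would still go through the individual windows, while
synchronization goes through the combined handle:

    #include <mpi.h>

    /* Hypothetical prototype, copied from the proposal above. */
    int MPI_Win_combine(int count, MPI_Win *win, MPI_Comm comm,
                        MPI_Win *win_combined);

    void proposed_usage(MPI_Win wins[6], MPI_Comm comm)
    {
        MPI_Win win_all;
        MPI_Win_combine(6, wins, comm, &win_all);

        MPI_Win_fence(0, win_all);  /* as if fencing each of the 6 windows */
        /* ... RMA calls on the individual windows in wins ... */
        MPI_Win_fence(0, win_all);
    }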
>>>>>>
>>>>>> What do you think about this idea for MPI-4.0?
>>>>>>
>>>>>> Best regards
>>>>>> Rolf
>>>>>>
>>>>>> ----- Original Message -----
>>>>>>> From: "Jim Dinan" < james.dinan at gmail.com >
>>>>>>> To: "MPI WG Remote Memory Access working group" <
>>>>>>> mpiwg-rma at lists.mpi-forum.org >
>>>>>>> Sent: Thursday, August 7, 2014 4:08:32 PM
>>>>>>> Subject: [mpiwg-rma] RMA Notification
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Hi All,
>>>>>>>
>>>>>>>
>>>>>>> I have added a new proposal for an RMA notification extension:
>>>>>>> https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/439
>>>>>>>
>>>>>>>
>>>>>>> I would like to bring this forward for the RMA WG to consider as an
>>>>>>> MPI-4 extension.
>>>>>>>
>>>>>>>
>>>>>>> Cheers,
>>>>>>> ~Jim.
>>>>>>
>>>>>> --
>>>>>> Dr. Rolf Rabenseifner . . . . . . . . . .. email rabenseifner at hlrs.de
>>>>>> High Performance Computing Center (HLRS) . phone ++49(0)711/685-65530
>>>>>> University of Stuttgart . . . . . . . . .. fax ++49(0)711 / 685-65832
>>>>>> Head of Dpmt Parallel Computing . . . www.hlrs.de/people/rabenseifner
>>>>>> Nobelstr. 19, D-70550 Stuttgart, Germany . . . . (Office: Room 1.307)
>>>>>
>>>>> --
>>>>> Dr. Rolf Rabenseifner . . . . . . . . . .. email rabenseifner at hlrs.de
>>>>> High Performance Computing Center (HLRS) . phone ++49(0)711/685-65530
>>>>> University of Stuttgart . . . . . . . . .. fax ++49(0)711 / 685-65832
>>>>> Head of Dpmt Parallel Computing . . . www.hlrs.de/people/rabenseifner
>>>>> Nobelstr. 19, D-70550 Stuttgart, Germany . . . . (Office: Room 1.307)
>>>
>>>
>>>
>>
>>
>>
>> --
>> Jeff Hammond
>> jeff.science at gmail.com
>> http://jeffhammond.github.io/
>



-- 
Jeff Hammond
jeff.science at gmail.com
http://jeffhammond.github.io/


