[mpiwg-rma] Single RMA synchronization for several window handles
Rajeev Thakur
thakur at mcs.anl.gov
Mon Aug 11 12:47:15 CDT 2014
One doesn't have to define 6 different windows for the stencil example. If you define the whole array as one window, you can do all 6 puts or gets and then a single fence.
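A minimal sketch of this single-window approach, assuming a local array grid of N doubles and purely illustrative neighbor ranks nbr[i] and target displacements disp[i]:

    #include <mpi.h>
    #define N 1000

    void halo_exchange(double grid[N], const int nbr[6], const MPI_Aint disp[6])
    {
        MPI_Win win;
        /* One window covering the whole local array. */
        MPI_Win_create(grid, (MPI_Aint)(N * sizeof(double)), sizeof(double),
                       MPI_INFO_NULL, MPI_COMM_WORLD, &win);

        MPI_Win_fence(0, win);               /* open one epoch for all transfers */
        for (int i = 0; i < 6; i++)
            MPI_Put(&grid[i], 1, MPI_DOUBLE, /* illustrative: one element per neighbor */
                    nbr[i], disp[i], 1, MPI_DOUBLE, win);
        MPI_Win_fence(0, win);               /* a single fence completes all six puts */

        MPI_Win_free(&win);
    }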
Rajeev
On Aug 11, 2014, at 10:47 AM, Rolf Rabenseifner <rabenseifner at hlrs.de> wrote:
> It is not function call overhead; it is my 7-point-stencil example
> executing the synchronization pattern (i.e., everything that is needed
> for MPI_Win_fence) 6 times instead of only once.
> And as far as I can see, the redundant synchronizations
> cannot be eliminated by the MPI library through some
> intelligent optimization.
>
> Moreover, one RMA epoch typically needs two synchronizations,
> so the question is whether 12 or 2 calls to MPI_Win_fence are needed;
> the difference is 10, i.e., my proposal would remove
> 10 times the MPI_Win_fence latency.
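For illustration, the counting above in code, assuming an array win[6] of per-neighbor fence-synchronized windows (all names are illustrative):

    /* current situation: every iteration pays the fence latency 12 times */
    for (int i = 0; i < 6; i++) MPI_Win_fence(0, win[i]);  /* 6 fences to open  */
    /* ... one MPI_Put per neighbor, each on its own window ... */
    for (int i = 0; i < 6; i++) MPI_Win_fence(0, win[i]);  /* 6 fences to close */
    /* with a combined window handle, the same epoch would need only 2 fences */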
>
> Rolf
>
> ----- Original Message -----
>> From: "Jeff Hammond" <jeff.science at gmail.com>
>> To: "MPI WG Remote Memory Access working group" <mpiwg-rma at lists.mpi-forum.org>
>> Sent: Monday, August 11, 2014 4:09:19 PM
>> Subject: Re: [mpiwg-rma] Single RMA synchronization for several window handles
>>
>> If function call overhead matters, MPI is probably the wrong model.
>>
>> You'll have to prove to me that this (function call overhead) is
>> significant compared to the cost of synchronization using empirical
>> data on some system.
>>
>> Jeff
>>
>> Sent from my iPhone
>>
>>> On Aug 11, 2014, at 1:49 AM, Rolf Rabenseifner
>>> <rabenseifner at hlrs.de> wrote:
>>>
>>> Jim and all,
>>>
>>> It is not syntactic sugar.
>>> It is a latency-optimizing enhancement:
>>>
>>> If you do neighbor communication with one-sided
>>> communication to 6 neighbors, based on 6 windows that
>>> are all defined on MPI_COMM_WORLD, then you currently
>>> have to issue the synchronization calls 6 times,
>>> which implies up to 6 times the latency,
>>> e.g., 6 calls to MPI_Win_fence instead of one.
>>>
>>> If you are using MPI shared-memory windows,
>>> then the same latency problem exists, but here
>>> the trick with dynamic windows is impossible
>>> because shared-memory windows must be allocated
>>> with MPI_Win_allocate_shared.
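For context, a minimal sketch of why the dynamic-window workaround does not apply here, assuming a node-local communicator node_comm obtained with MPI_Comm_split_type(..., MPI_COMM_TYPE_SHARED, ...):

    double *base;
    MPI_Win  shm_win;
    /* A shared-memory window can only come from MPI_Win_allocate_shared,
     * which always returns its own window handle, so several such
     * allocations cannot be folded into one dynamic window. */
    MPI_Win_allocate_shared((MPI_Aint)(N * sizeof(double)), sizeof(double),
                            MPI_INFO_NULL, node_comm, &base, &shm_win);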
>>>
>>> Of course, it is more complicated to define
>>> MPI_Win_combine on a superset of the communicators used
>>> for all the combined windows.
>>>
>>> As a first step, it would already be helpful
>>> to define it for the same communicator.
>>> To allow enhancements in a future MPI version,
>>> I would still recommend keeping the comm argument
>>> in the argument list.
>>>
>>> Rolf
>>>
>>> ----- Original Message -----
>>>> From: "Jim Dinan" <james.dinan at gmail.com>
>>>> To: "MPI WG Remote Memory Access working group"
>>>> <mpiwg-rma at lists.mpi-forum.org>
>>>> Sent: Sunday, August 10, 2014 5:48:02 PM
>>>> Subject: Re: [mpiwg-rma] Single RMA synchronization for several
>>>> window handles
>>>>
>>>>
>>>>
>>>> Hi Rolf,
>>>>
>>>>
>>>> I had initially proposed this in the context of passive target RMA.
>>>> Active target RMA already notifies the receiver when data has
>>>> arrived, but there is no efficient way to get such a notification in
>>>> passive target RMA. I think what you are proposing here would be
>>>> syntactic sugar on top of the existing interface --- could I
>>>> implement this by fencing every window to determine that all
>>>> transfers are completed?
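A sketch of the emulation asked about here, with the function name and arguments chosen only for illustration:

    #include <mpi.h>

    /* "Combined" fence emulated by fencing every member window in turn;
     * this is correct but pays the fence latency once per window. */
    static void fence_all(int assert_flag, int nwin, MPI_Win wins[])
    {
        for (int i = 0; i < nwin; i++)
            MPI_Win_fence(assert_flag, wins[i]);
    }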
>>>>
>>>>
>>>> I agree with the comments regarding dynamic windows. The merged
>>>> window would contain discontiguous buffers; thus, it would lose the
>>>> ability to do offset-based addressing and would need to use
>>>> absolute (BOTTOM-based) addressing.
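A minimal sketch of that dynamic-window addressing, assuming a buffer buf of len doubles:

    MPI_Win  dwin;
    MPI_Aint addr;
    /* Buffers are attached to one dynamic window and addressed by
     * absolute address rather than by a window-relative offset. */
    MPI_Win_create_dynamic(MPI_INFO_NULL, MPI_COMM_WORLD, &dwin);
    MPI_Win_attach(dwin, buf, (MPI_Aint)(len * sizeof(double)));
    MPI_Get_address(buf, &addr);
    /* addr is sent to the origin processes, which use it as the target
     * displacement (relative to MPI_BOTTOM) in MPI_Put/MPI_Get. */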
>>>>
>>>>
>>>> ~Jim.
>>>>
>>>>
>>>>
>>>> On Fri, Aug 8, 2014 at 10:18 AM, Rolf Rabenseifner
>>>> <rabenseifner at hlrs.de> wrote:
>>>>
>>>>
>>>> Jim,
>>>>
>>>> your topic "Reducing Synchronization Overhead Through Bundled
>>>> Communication" might also benefit if we were able to
>>>> combine several window handles into one superset window handle.
>>>>
>>>> If you have several windows for different buffers, but
>>>> only one synchronization pattern, e.g., MPI_Win_fence,
>>>> then currently you must call MPI_Win_fence separately
>>>> for each window handle.
>>>>
>>>> I would propose:
>>>>
>>>> MPI_Win_combine(/*IN*/  int       count,
>>>>                 /*IN*/  MPI_Win  *win,
>>>>                 /*IN*/  MPI_Comm  comm,
>>>>                 /*OUT*/ MPI_Win  *win_combined)
>>>>
>>>> The process group of comm must contain the process groups of all
>>>> windows in win.
>>>> The resulting window handle win_combined can be used only
>>>> in RMA synchronization calls and other helper routines,
>>>> but not for dynamic window allocation nor for any
>>>> RMA communication routine.
>>>> Collective synchronization routines must be called by all
>>>> processes of comm.
>>>> The semantics of an RMA synchronization call on win_combined
>>>> are defined as if the calls were issued separately for
>>>> each window handle in the array win. If group handles
>>>> are part of the argument list of the synchronization call,
>>>> then the appropriate subset is used for each window handle in win.
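For illustration, a sketch of how the proposed routine might be used (MPI_Win_combine is only a proposal, not part of MPI; win[0..5] are assumed to be existing windows on MPI_COMM_WORLD):

    MPI_Win win_combined;
    MPI_Win_combine(6, win, MPI_COMM_WORLD, &win_combined);

    MPI_Win_fence(0, win_combined);   /* opens the epoch on all six windows   */
    /* ... MPI_Put/MPI_Get on the individual windows win[0..5] ... */
    MPI_Win_fence(0, win_combined);   /* completes all six epochs at once     */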
>>>>
>>>> What do you think about this idea for MPI-4.0?
>>>>
>>>> Best regards
>>>> Rolf
>>>>
>>>> ----- Original Message -----
>>>>> From: "Jim Dinan" < james.dinan at gmail.com >
>>>>> To: "MPI WG Remote Memory Access working group" <
>>>>> mpiwg-rma at lists.mpi-forum.org >
>>>>> Sent: Thursday, August 7, 2014 4:08:32 PM
>>>>> Subject: [mpiwg-rma] RMA Notification
>>>>>
>>>>>
>>>>>
>>>>> Hi All,
>>>>>
>>>>>
>>>>> I have added a new proposal for an RMA notification extension:
>>>>> https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/439
>>>>>
>>>>>
>>>>> I would like to bring this forward for the RMA WG to consider as
>>>>> an MPI-4 extension.
>>>>>
>>>>>
>>>>> Cheers,
>>>>> ~Jim.
>>>>
>>>
>>
>
> --
> Dr. Rolf Rabenseifner . . . . . . . . . .. email rabenseifner at hlrs.de
> High Performance Computing Center (HLRS) . phone ++49(0)711/685-65530
> University of Stuttgart . . . . . . . . .. fax ++49(0)711 / 685-65832
> Head of Dpmt Parallel Computing . . . www.hlrs.de/people/rabenseifner
> Nobelstr. 19, D-70550 Stuttgart, Germany . . . . (Office: Room 1.307)