[mpiwg-rma] Single RMA synchronization for several window handles

Jeff Hammond jeff.science at gmail.com
Mon Aug 11 15:22:00 CDT 2014


Does the MPI standard require fence to do a barrier?  If not, then my
argument still stands.  The implementation only has to sync once per
remote target across N fences and the rest are nothing more than
function call overhead.

If we want to optimize for function call overhead, let's consider
MPI_Win_fencev, which makes way more sense than creating
franken-windows with this combiner function.
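
A hypothetical signature (MPI_Win_fencev is not in the standard; the
names here are purely illustrative) could be as simple as:

   int MPI_Win_fencev(int assert, int count, MPI_Win wins[]);

One call covers N windows, and the implementation is free to issue a
single synchronization for all of them.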

Jeff

On Mon, Aug 11, 2014 at 7:29 AM, Jim Dinan <james.dinan at gmail.com> wrote:
> It's more than function call overhead.  Fence usually includes a barrier, so
> this is one barrier versus several.
>
>
> On Mon, Aug 11, 2014 at 10:09 AM, Jeff Hammond <jeff.science at gmail.com>
> wrote:
>>
>> If function call overhead matters, MPI is probably the wrong model.
>>
>> You'll have to prove to me that this (function call overhead) is
>> significant compared to the cost of synchronization using empirical data on
>> some system.
>>
>> Jeff
>>
>> Sent from my iPhone
>>
>> > On Aug 11, 2014, at 1:49 AM, Rolf Rabenseifner <rabenseifner at hlrs.de>
>> > wrote:
>> >
>> > Jim and all,
>> >
>> > It is not syntactic sugar.
>> > It is a latency optimizing enhancement:
>> >
>> > If you are doing neighbor communication with one-sided
>> > communication to 6 neighbors, based on 6 windows that
>> > are all defined on MPI_COMM_WORLD, then you currently
>> > need to issue the synchronization calls 6 times,
>> > which implies up to a 6 times larger latency,
>> > e.g., 6 calls to MPI_Win_fence instead of one.
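>> >
>> > As a rough sketch (win[] here is just an illustrative array holding
>> > the 6 window handles), the current pattern is
>> >
>> >    for (int i = 0; i < 6; i++)
>> >       MPI_Win_fence(0, win[i]);   /* 6 separate synchronizations */
>> >
>> > whereas one call on a combined handle could synchronize all of them.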
>> >
>> > If you are using MPI shared-memory windows,
>> > then the same latency problem exists, but here
>> > the trick with dynamic windows is impossible
>> > because shared-memory windows must be allocated
>> > with MPI_Win_allocate_shared.
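>> >
>> > For illustration (node_comm and n are placeholder names), each
>> > shared-memory window must come from its own allocation call, e.g.
>> >
>> >    double *base;
>> >    MPI_Win win_shm;
>> >    MPI_Win_allocate_shared(n * sizeof(double), sizeof(double),
>> >                            MPI_INFO_NULL, node_comm, &base, &win_shm);
>> >
>> > so several such buffers cannot be folded into one dynamic window.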
>> >
>> > Of course, it is more complicated to define
>> > MPI_Win_combine on a superset of the communicators used
>> > for all the combined windows.
>> >
>> > As a first step, it would already be helpful
>> > to define it for the same communicator.
>> > To allow enhancements in a future MPI version,
>> > I would still recommend keeping the comm argument
>> > in the argument list.
>> >
>> > Rolf
>> >
>> > ----- Original Message -----
>> >> From: "Jim Dinan" <james.dinan at gmail.com>
>> >> To: "MPI WG Remote Memory Access working group"
>> >> <mpiwg-rma at lists.mpi-forum.org>
>> >> Sent: Sunday, August 10, 2014 5:48:02 PM
>> >> Subject: Re: [mpiwg-rma] Single RMA synchronization for several window
>> >> handles
>> >>
>> >>
>> >>
>> >> Hi Rolf,
>> >>
>> >>
>> >> I had initially proposed this in the context of passive target RMA.
>> >> Active target RMA already notifies the receiver when data has
>> >> arrived, but there is no efficient way to get such a notification in
>> >> passive target RMA.  I think what you are proposing here would be
>> >> syntactic sugar on top of the existing interface --- could I
>> >> implement this by fencing every window to determine that all
>> >> transfers are completed?
>> >>
>> >>
>> >> I agree with the comments regarding dynamic windows.  The merged
>> >> window would contain discontiguous buffers; thus, it would lose the
>> >> ability to do offset-based addressing and would need to use absolute
>> >> (BOTTOM-based) addressing.
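>> >>
>> >> As a sketch (target, remote_disp and dyn_win are placeholder names):
>> >> with absolute addressing, the target publishes the address of its
>> >> buffer via MPI_Get_address, and the origin uses that address directly
>> >> as the displacement:
>> >>
>> >>    MPI_Put(buf, 1, MPI_DOUBLE, target, remote_disp,
>> >>            1, MPI_DOUBLE, dyn_win);
>> >>
>> >> An offset relative to a single window base would not be meaningful
>> >> for discontiguous buffers.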
>> >>
>> >>
>> >> ~Jim.
>> >>
>> >>
>> >>
>> >> On Fri, Aug 8, 2014 at 10:18 AM, Rolf Rabenseifner <
>> >> rabenseifner at hlrs.de > wrote:
>> >>
>> >>
>> >> Jim,
>> >>
>> >> your topic "Reducing Synchronization Overhead Through Bundled
>> >> Communication" might also benefit if we were able to
>> >> combine several window handles into one superset window handle.
>> >>
>> >> If you have several windows for different buffers, but
>> >> only one synchronization pattern, e.g., MPI_Win_fence,
>> >> then currently you must call MPI_Win_fence separately
>> >> for each window handle.
>> >>
>> >> I would propose:
>> >>
>> >> MPI_Win_combine (/*IN*/  int count,
>> >>                 /*IN*/  MPI_Win *win,
>> >>                 /*IN*/  MPI_Comm comm,
>> >>                 /*OUT*/ MPI_Win *win_combined)
>> >>
>> >> The process group of comm must contain the process groups of all win.
>> >> The resulting window handle win_combined can be used only
>> >> in RMA synchronization calls and other helper routines,
>> >> but not for dynamic window allocation or for any
>> >> RMA communication routine.
>> >> Collective synchronization routines must be called by all processes
>> >> of comm.
>> >> The semantics of an RMA synchronization call using win_combined
>> >> are defined as if the calls were issued separately for
>> >> each window handle of the array win. If group handles
>> >> are part of the argument list of the synchronization call,
>> >> then the appropriate subset is used for each window handle in win.
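>> >>
>> >> A usage sketch, assuming the proposed routine (count windows in the
>> >> array win, all defined on comm):
>> >>
>> >>    MPI_Win win_combined;
>> >>    MPI_Win_combine(count, win, comm, &win_combined);
>> >>    /* ... RMA communication on the individual windows ... */
>> >>    MPI_Win_fence(0, win_combined);  /* as if fence were called on each win[i] */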
>> >>
>> >> What do you think about this idea for MPI-4.0?
>> >>
>> >> Best regards
>> >> Rolf
>> >>
>> >> ----- Original Message -----
>> >>> From: "Jim Dinan" < james.dinan at gmail.com >
>> >>> To: "MPI WG Remote Memory Access working group" <
>> >>> mpiwg-rma at lists.mpi-forum.org >
>> >>> Sent: Thursday, August 7, 2014 4:08:32 PM
>> >>> Subject: [mpiwg-rma] RMA Notification
>> >>>
>> >>>
>> >>>
>> >>> Hi All,
>> >>>
>> >>>
>> >>> I have added a new proposal for an RMA notification extension:
>> >>> https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/439
>> >>>
>> >>>
>> >>> I would like to bring this forward for the RMA WG to consider as an
>> >>> MPI-4 extension.
>> >>>
>> >>>
>> >>> Cheers,
>> >>> ~Jim.
>> >>
>> >> --
>> >> Dr. Rolf Rabenseifner . . . . . . . . . .. email rabenseifner at hlrs.de
>> >> High Performance Computing Center (HLRS) . phone ++49(0)711/685-65530
>> >> University of Stuttgart . . . . . . . . .. fax ++49(0)711 / 685-65832
>> >> Head of Dpmt Parallel Computing . . . www.hlrs.de/people/rabenseifner
>> >> Nobelstr. 19, D-70550 Stuttgart, Germany . . . . (Office: Room 1.307)
>> >
>> > --
>> > Dr. Rolf Rabenseifner . . . . . . . . . .. email rabenseifner at hlrs.de
>> > High Performance Computing Center (HLRS) . phone ++49(0)711/685-65530
>> > University of Stuttgart . . . . . . . . .. fax ++49(0)711 / 685-65832
>> > Head of Dpmt Parallel Computing . . . www.hlrs.de/people/rabenseifner
>> > Nobelstr. 19, D-70550 Stuttgart, Germany . . . . (Office: Room 1.307)
>
>
>
> _______________________________________________
> mpiwg-rma mailing list
> mpiwg-rma at lists.mpi-forum.org
> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpiwg-rma



-- 
Jeff Hammond
jeff.science at gmail.com
http://jeffhammond.github.io/


