[Mpi3-rma] RMA proposal 1 update
Jesper Larsson Traff
traff at par.univie.ac.at
Tue May 18 10:52:35 CDT 2010
Just the trivial remark that a reduce_scatter_block operation also
does this counting, and may be more efficient.
Jesper
On Tue, May 18, 2010 at 10:44:19AM -0500, Douglas Miller wrote:
> If each origin (in a fence epoch) keeps track of the count(s) of RMA
> operations to each of its targets, then an allreduce of those arrays will
> tell each target how many operations were done to itself and can be used to
> determine completion.
>
> _______________________________________________
> Douglas Miller BlueGene Messaging Development
> IBM Corp., Rochester, MN USA Bldg 030-2 A410
> dougmill at us.ibm.com Douglas Miller/Rochester/IBM
>
>
>
> From: "Underwood, Keith D" <keith.d.underwood at intel.com>
> Sent by: mpi3-rma-bounces at lists.mpi-forum.org
> To: "MPI 3.0 Remote Memory Access working group" <mpi3-rma at lists.mpi-forum.org>
> Date: 05/18/2010 10:23 AM
> Subject: Re: [Mpi3-rma] RMA proposal 1 update
> Please respond to: "MPI 3.0 Remote Memory Access working group" <mpi3-rma at lists.mpi-forum.org>
>
> Sorry, but you lost me at "we could just do an allreduce to look at
> counts". Could you go into a bit more detail? If you have received counts
> from all ranks at all ranks (um, that doesn't seem scalable), then it would
> seem that an allfenceall() would require an Alltoall() to figure out if
> everybody was safe. I don't see how an allreduce would do the job. But,
> I'll admit that I don't really know DCMF or the BG network interface
> architecture or... So, I could just be missing something here.
>
> Thanks,
> Keith
>
> From: mpi3-rma-bounces at lists.mpi-forum.org [
> mailto:mpi3-rma-bounces at lists.mpi-forum.org] On Behalf Of Brian Smith
> Sent: Tuesday, May 18, 2010 4:57 AM
> To: MPI 3.0 Remote Memory Access working group
> Cc: MPI 3.0 Remote Memory Access working group;
> mpi3-rma-bounces at lists.mpi-forum.org
> Subject: Re: [Mpi3-rma] RMA proposal 1 update
>
>
> Sorry for the late response....
> On BGP, DCMF Put/Get doesn't do any accounting and DCMF doesn't actually
> have a fence operation. There is no hardware to determine when a put/get
> has completed either. We need to send a get along the same
> (deterministically routed) path to "flush" any messages out to claim we are
> synchronized.
>
> When we implemented ARMCI, we introduced accounting in our "glue" on top of
> DCMF because of the ARMCI_Fence() operation. There are similar concerns in
> the MPI one-sided "glue".
>
> Going forward, we need to figure out how we'd implement the new MPI RMA
> operations and determine if there would be accounting required. If there
> would be (and I'm thinking there would), then an allfenceall in MPI would
> be easy enough to do and would provide a significant benefit on BG. We
> could just do an allreduce to look at counts. If the standard procedure is
> fenceall()+barrier(), I could do that much better as an allfenceall call.
>
> On platforms that have some sort of native accounting, this allfenceall
> would only be the overhead of a barrier. So I think an allfenceall has
> significant value to the middleware more than DCMF and therefore would
> strongly encourage it in MPI, especially given the use-cases we heard from
> Jeff H. at the forum meeting.
>
> This scenario is the same in our next super-secret product offering
> everyone knows about but I don't know if *I* can mention.
>
>
> Brian Smith (smithbr at us.ibm.com)
> BlueGene MPI Development/
> Communications Team Lead
> IBM Rochester
> Phone: 507 253 4717
>
>
>
>
> From: "Underwood, Keith D" <keith.d.underwood at intel.com>
> Sent by: mpi3-rma-bounces at lists.mpi-forum.org
> To: "MPI 3.0 Remote Memory Access working group" <mpi3-rma at lists.mpi-forum.org>
> Date: 05/16/2010 09:33 PM
> Subject: Re: [Mpi3-rma] RMA proposal 1 update
>
>
> Before doing that, can someone sketch out the platform/API and the
> implementation that makes that more efficient? There is no gain for
> Portals (3 or 4). There is no gain for anything that supports Cray SHMEM
> reasonably well (shmem_quiet() is approximately the same semantics as
> MPI_flush_all). Hrm, you can probably say the same thing about anything
> that supports UPC well - a strict access is basically an MPI_flush_all();
> MPI_Put(); MPI_flush_all();... Also, I thought somebody said that IB gave
> you a notification of remote completion...
>
> The question then turns to the "other networks". If you can't figure out
> remote completion, then the collective is going to be pretty heavy, right?
>
> Keith
>
> > -----Original Message-----
> > From: mpi3-rma-bounces at lists.mpi-forum.org [mailto:mpi3-rma-
> > bounces at lists.mpi-forum.org] On Behalf Of Jeff Hammond
> > Sent: Sunday, May 16, 2010 7:27 PM
> > To: MPI 3.0 Remote Memory Access working group
> > Subject: Re: [Mpi3-rma] RMA proposal 1 update
> >
> > Torsten,
> >
> > There seemed to be decent agreement on adding MPI_Win_all_flush_all
> > (equivalent to MPI_Win_flush_all called from every rank in the
> > communicator associated with the window) since this function can be
> > implemented far more efficiently as a collective than the equivalent
> > point-wise function calls.
> >
> > Is there a problem with adding this to your proposal?
> >
> > Jeff
> >
> > On Sun, May 16, 2010 at 12:48 AM, Torsten Hoefler <htor at illinois.edu>
> > wrote:
> > > Hello all,
> > >
> > > After the discussions at the last Forum I updated the group's first
> > > proposal.
> > >
> > > The proposal (one-side-2.pdf) is attached to the wiki page
> > > https://svn.mpi-forum.org/trac/mpi-forum-web/wiki/RmaWikiPage
> > >
> > > The changes with regards to the last version are:
> > >
> > > 1) added MPI_NOOP to MPI_Get_accumulate and MPI_Accumulate_get
> > >
> > > 2) (re)added MPI_Win_flush and MPI_Win_flush_all to passive target
> > >    mode
> > >
> > > Some remarks:
> > >
> > > 1) We didn't straw-vote on MPI_Accumulate_get, so this function might
> > > go. The removal would be very clean.
> > >
> > > 2) Should we allow MPI_NOOP in MPI_Accumulate? (This does not make
> > >    sense and is incorrect in my current proposal.)
> > >
> > > 3) Should we allow MPI_REPLACE in MPI_Get_accumulate/MPI_Accumulate_get?
> > >    (This would make sense and is allowed in the current proposal, but
> > >    we didn't talk about it in the group.)
> > >
> > >
> > > All the Best,
> > > Torsten
> > >
> > > --
> > > bash$ :(){ :|:&};: --------------------- http://www.unixer.de/ -----
> > > Torsten Hoefler | Research Associate
> > > Blue Waters Directorate | University of Illinois
> > > 1205 W Clark Street | Urbana, IL, 61801
> > > NCSA Building | +01 (217) 244-7736
> > > _______________________________________________
> > > mpi3-rma mailing list
> > > mpi3-rma at lists.mpi-forum.org
> > > http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-rma
> > >
> >
> >
> >
> > --
> > Jeff Hammond
> > Argonne Leadership Computing Facility
> > jhammond at mcs.anl.gov / (630) 252-5381
> > http://www.linkedin.com/in/jeffhammond
> >
>
>