[Mpi3-rma] RMA proposal 1 update

Douglas Miller dougmill at us.ibm.com
Tue May 18 10:44:19 CDT 2010


If each origin (in a fence epoch) keeps track of the count of RMA
operations it issues to each of its targets, then an allreduce (sum) of those
per-target count arrays tells each target how many operations were directed at
it; each target can then wait until that many operations have completed locally
to determine completion.
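
A minimal sketch of that scheme in C, assuming hypothetical hooks for observing
local completion (ops_completed_here() and advance_progress() below are
illustrative, not MPI or DCMF calls):

#include <stdlib.h>
#include <mpi.h>

/* Illustrative hooks, not part of MPI or DCMF: */
extern int  ops_completed_here(void);   /* RMA ops completed at this rank so far */
extern void advance_progress(void);     /* poll the messaging layer */

/* Completion detection by summing per-target operation counts.
 * ops_issued_to[t] holds the number of RMA operations this rank issued
 * to target t during the current epoch. */
void allfenceall(MPI_Comm comm, int *ops_issued_to, int nprocs, int myrank)
{
    int *total_to = malloc(nprocs * sizeof(int));

    /* Element-wise sum: total_to[t] = operations targeting rank t, globally. */
    MPI_Allreduce(ops_issued_to, total_to, nprocs, MPI_INT, MPI_SUM, comm);

    /* Each rank now knows how many operations target itself; wait for them. */
    while (ops_completed_here() < total_to[myrank])
        advance_progress();

    free(total_to);
}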

_______________________________________________
Douglas Miller                  BlueGene Messaging Development
IBM Corp., Rochester, MN USA                     Bldg 030-2 A410
dougmill at us.ibm.com               Douglas Miller/Rochester/IBM


                                                                           
             "Underwood, Keith                                             
             D"                                                            
             <keith.d.underwoo                                          To 
             d at intel.com>              "MPI 3.0 Remote Memory Access       
             Sent by:                  working group"                      
             mpi3-rma-bounces@         <mpi3-rma at lists.mpi-forum.org>      
             lists.mpi-forum.o                                          cc 
             rg                                                            
                                                                   Subject 
                                       Re: [Mpi3-rma] RMA proposal 1       
             05/18/2010 10:23          update                              
             AM                                                            
                                                                           
                                                                           
             Please respond to                                             
              "MPI 3.0 Remote                                              
               Memory Access                                               
              working group"                                               
             <mpi3-rma at lists.m                                             
               pi-forum.org>                                               
                                                                           
                                                                           




Sorry, but you lost me at “we could just do an allreduce to look at
counts”. Could you go into a bit more detail? If you have received counts
from all ranks at all ranks (um, that doesn’t seem scalable), then it would
seem that an allfenceall() would require an Alltoall() to figure out whether
everybody was safe. I don’t see how an allreduce would do the job. But I’ll
admit that I don’t really know DCMF or the BG network interface
architecture or… so I could just be missing something here.

Thanks,
Keith

From: mpi3-rma-bounces at lists.mpi-forum.org [
mailto:mpi3-rma-bounces at lists.mpi-forum.org] On Behalf Of Brian Smith
Sent: Tuesday, May 18, 2010 4:57 AM
To: MPI 3.0 Remote Memory Access working group
Cc: MPI 3.0 Remote Memory Access working group;
mpi3-rma-bounces at lists.mpi-forum.org
Subject: Re: [Mpi3-rma] RMA proposal 1 update


Sorry for the late response....
On BGP, DCMF Put/Get doesn't do any accounting, and DCMF doesn't actually
have a fence operation. There is also no hardware support for determining
when a put/get has completed. We have to send a get along the same
(deterministically routed) path to "flush" any outstanding messages before we
can claim we are synchronized.
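
A rough sketch of that flush-by-get idea, with net_put()/net_get_blocking() as
hypothetical stand-ins for a deterministically routed network API (this is not
DCMF code):

#include <stddef.h>

/* Hypothetical, deterministically routed network API (stand-ins, not DCMF):
 * all traffic to a given target follows one fixed path, so a response
 * cannot overtake requests issued earlier on that path. */
extern void net_put(int target, const void *buf, size_t len);
extern void net_get_blocking(int target, void *buf, size_t len);

/* "Fence" one target: the blocking get travels the same route as the puts
 * issued before it, so by the time its response returns, those puts have
 * been delivered at the target. */
void fence_one_target(int target)
{
    char dummy;
    net_get_blocking(target, &dummy, 1);
}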

When we implemented ARMCI, we introduced accounting in our "glue" on top of
DCMF because of the ARMCI_Fence() operation. There are similar concerns in
the MPI one-sided "glue".

Going forward, we need to figure out how we'd implement the new MPI RMA
operations and determine whether accounting would be required. If it would be
(and I think it would), then an allfenceall in MPI would be easy enough to do
and would provide a significant benefit on BG. We could just do an allreduce
to look at the counts. If the standard procedure is fenceall()+barrier(), I
could do that much better as an allfenceall call.
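
For comparison, a user-level sketch of the two patterns, using
MPI_Win_flush_all from the current proposal and MPI_Win_all_flush_all as the
collective discussed in this thread (neither is standard MPI yet, so this is a
sketch rather than a definitive interface):

#include <mpi.h>

/* Proposed collective discussed in this thread, not (yet) part of MPI: */
extern int MPI_Win_all_flush_all(MPI_Win win);

/* Pattern 1: per-rank flush followed by a barrier. */
void end_epoch_pointwise(MPI_Win win, MPI_Comm comm)
{
    MPI_Win_flush_all(win);   /* complete all ops issued by this rank */
    MPI_Barrier(comm);        /* every rank has flushed, so all ops are done */
}

/* Pattern 2: one collective call, which an implementation could back with a
 * single allreduce of per-target operation counts instead of N point-wise
 * flushes. */
void end_epoch_collective(MPI_Win win)
{
    MPI_Win_all_flush_all(win);
}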

On platforms that have some sort of native accounting, this allfenceall
would only cost the overhead of a barrier. So I think an allfenceall has
significant value to middleware beyond DCMF, and I would strongly encourage
adding it to MPI, especially given the use cases we heard from Jeff H. at the
forum meeting.

This scenario is the same in our next super-secret product offering, the one
everyone knows about but that I don't know if *I* can mention.


Brian Smith (smithbr at us.ibm.com)
BlueGene MPI Development/
Communications Team Lead
IBM Rochester
Phone: 507 253 4717



                                                                           
From: "Underwood, Keith D" <keith.d.underwood at intel.com>
Sent by: mpi3-rma-bounces at lists.mpi-forum.org
To: "MPI 3.0 Remote Memory Access working group" <mpi3-rma at lists.mpi-forum.org>
Date: 05/16/2010 09:33 PM
Subject: Re: [Mpi3-rma] RMA proposal 1 update

Before doing that, can someone sketch out the platform/API and the
implementation that makes that more efficient?  There is no gain for
Portals (3 or 4). There is no gain for anything that supports Cray SHMEM
reasonably well (shmem_quiet() has approximately the same semantics as
MPI_flush_all). Hrm, you can probably say the same thing about anything
that supports UPC well - a strict access is basically an MPI_flush_all();
MPI_Put(); MPI_flush_all();... Also, I thought somebody said that IB gave
you a notification of remote completion...
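
Spelling out that analogy in MPI terms (a sketch only; the flush/put/flush
mapping is the characterization of a UPC strict write given above, and
MPI_Win_flush_all is the function from the proposal under discussion):

#include <mpi.h>

/* A UPC-style strict remote store, expressed with the proposed calls:
 * complete everything issued before it, do the put, then complete the put. */
void strict_style_store(MPI_Win win, int target, MPI_Aint disp, double value)
{
    MPI_Win_flush_all(win);   /* complete all prior ops from this rank */
    MPI_Put(&value, 1, MPI_DOUBLE, target, disp, 1, MPI_DOUBLE, win);
    MPI_Win_flush_all(win);   /* complete the strict store itself */
}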

The question then turns to the "other networks".  If you can't figure out
remote completion, then the collective is going to be pretty heavy, right?

Keith

> -----Original Message-----
> From: mpi3-rma-bounces at lists.mpi-forum.org [mailto:mpi3-rma-
> bounces at lists.mpi-forum.org] On Behalf Of Jeff Hammond
> Sent: Sunday, May 16, 2010 7:27 PM
> To: MPI 3.0 Remote Memory Access working group
> Subject: Re: [Mpi3-rma] RMA proposal 1 update
>
> Torsten,
>
> There seemed to be decent agreement on adding MPI_Win_all_flush_all
> (equivalent to MPI_Win_flush_all called from every rank in the
> communicator associated with the window) since this function can be
> implemented far more efficiently as a collective than the equivalent
> point-wise function calls.
>
> Is there a problem with adding this to your proposal?
>
> Jeff
>
> On Sun, May 16, 2010 at 12:48 AM, Torsten Hoefler <htor at illinois.edu>
> wrote:
> > Hello all,
> >
> > After the discussions at the last Forum I updated the group's first
> > proposal.
> >
> > The proposal (one-side-2.pdf) is attached to the wiki page
> > https://svn.mpi-forum.org/trac/mpi-forum-web/wiki/RmaWikiPage
> >
> > The changes with regards to the last version are:
> >
> > 1) added MPI_NOOP to MPI_Get_accumulate and MPI_Accumulate_get
> >
> > 2) (re)added MPI_Win_flush and MPI_Win_flush_all to passive target
> >    mode
> >
> > Some remarks:
> >
> > 1) We didn't straw-vote on MPI_Accumulate_get, so this function might
> >   go. The removal would be very clean.
> >
> > 2) Should we allow MPI_NOOP in MPI_Accumulate? (This does not make
> >    sense and is incorrect in my current proposal.)
> >
> > 3) Should we allow MPI_REPLACE in MPI_Get_accumulate/MPI_Accumulate_get?
> >    (This would make sense and is allowed in the current proposal, but
> >    we didn't talk about it in the group.)
> >
> >
> > All the Best,
> >  Torsten
> >
> > --
> >  bash$ :(){ :|:&};: --------------------- http://www.unixer.de/ -----
> > Torsten Hoefler         | Research Associate
> > Blue Waters Directorate | University of Illinois
> > 1205 W Clark Street     | Urbana, IL, 61801
> > NCSA Building           | +01 (217) 244-7736
> > _______________________________________________
> > mpi3-rma mailing list
> > mpi3-rma at lists.mpi-forum.org
> > http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-rma
> >
>
>
>
> --
> Jeff Hammond
> Argonne Leadership Computing Facility
> jhammond at mcs.anl.gov / (630) 252-5381
> http://www.linkedin.com/in/jeffhammond
>
> _______________________________________________
> mpi3-rma mailing list
> mpi3-rma at lists.mpi-forum.org
> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-rma

_______________________________________________
mpi3-rma mailing list
mpi3-rma at lists.mpi-forum.org
http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-rma



