<html>

<head>

<style>

.hmmessage P

{

margin:0px;

padding:0px

}

body.hmmessage

{

font-size: 10pt;

font-family:Verdana

}

</style>

</head>

<body class='hmmessage'>

<div><br></div><div>You are correct in trying to look at the best possible case and estimating cache-misses/performance-bottlenecks. However, I personally don't see any difference between this and shmem. When you cannot really allocate symmetric memory underneath, the amount of bookkeeping is the same in both cases. When there is no heterogeneity, the check for this can be disabled at MPI startup. When there is heterogeneity, we cannot compare with shmem.</div><div><br></div>I cannot imagine not having symmetric/collective memory object creation to support these RMA interfaces; I think it is a must-have. Sorry, so far I have only said that we should have these interfaces without giving an example. Given how often this same issue comes up, I will do so now.<div><br></div><div>Consider the creation interfaces:</div><div>Create_memobj(IN user_ptr, IN size, OUT mem_obj)</div><div>Create_memobj_collective(IN user_ptr, IN size, OUT mem_obj)</div><div>Assign_memobj(IN/OUT mem_obj, IN user_address, IN size) <div><br></div><div>More details will follow on how a mem object that results from Create_memobj on process A is exchanged with process B. When it is exchanged explicitly, the heterogeneity information can be created at process B. 
</div><div><br></div><div>Now take the example with a symmetric object:</div><div><br></div><div>Process A</div><div><br></div><div>myptr = allocate(mysize);</div><div>Create_memobj_collective(myptr, mysize, all_obj);<br></div><div>do all kinds of RMA_Xfers<br></div><div><br></div><div>And an example without a symmetric object:</div><div><br></div><div>myptr = allocate(mysize);</div><div>Create_memobj(myptr, mysize, my_obj);</div><div> ----exchange objects here----</div><div>do all kinds of RMA_Xfers</div><div><div><br></div><div>In both cases, I expect to be able to communicate without any cache misses on mem_obj.</div><div><br><span style="font-size:10pt;font-family:'Verdana','sans-serif'">Vinod Tipparaju ^ http://ft.ornl.gov/~vinod ^ 1-865-241-1802</span><br><br><br><br><hr id="stopSpelling">From: keith.d.underwood@intel.com<br>To: mpi3-rma@lists.mpi-forum.org<br>Date: Tue, 1 Sep 2009 09:07:41 -0600<br>Subject: Re: [Mpi3-rma] MPI3 RMA Design Goals<br><br>


<style>

.ExternalClass p.EC_MsoNormal, .ExternalClass li.EC_MsoNormal, .ExternalClass div.EC_MsoNormal

{margin-bottom:.0001pt;font-size:12.0pt;font-family:'Times New Roman','serif';}

.ExternalClass a:link, .ExternalClass span.EC_MsoHyperlink

{color:blue;text-decoration:underline;}

.ExternalClass a:visited, .ExternalClass span.EC_MsoHyperlinkFollowed

{color:purple;text-decoration:underline;}

.ExternalClass p

{margin-right:0in;margin-left:0in;font-size:12.0pt;font-family:'Times New Roman','serif';}

.ExternalClass span.EC_EmailStyle18

{font-family:'Calibri','sans-serif';color:#1F497D;}

.ExternalClass .EC_MsoChpDefault

{font-size:10.0pt;}

@page Section1

{size:8.5in 11.0in;}

.ExternalClass div.EC_Section1

{page:Section1;}

</style>


<div class="EC_Section1">


<p class="EC_MsoNormal"><span style="font-size:11.0pt;font-family:'Calibri','sans-serif';color:#1F497D">If we take the SINGLE_RMA_INTERFACE_DRAFT_PROPOSAL as an example,

and combine it with the draft design goal #1: </span>In order to support RMA to

arbitrary locations, no constraints on memory, such as symmetric allocation or

collective window creation, can be required</p>


<p class="EC_MsoNormal"> </p>


<p class="EC_MsoNormal">We get an interesting view on how difficult it can be to get

“close to the metal”.  So, for MPI_RMA_xfer, we have to assume

that the user has some array of target_mem data items.  That means the

sequence of steps in user space is:</p>


<p class="EC_MsoNormal"> </p>


<p class="EC_MsoNormal">target_mem = ranks[dest];</p>


<p class="EC_MsoNormal">MPI_RMA_xfer(… target_mem, dest…);</p>


<p class="EC_MsoNormal"> </p>


<p class="EC_MsoNormal">If we assume that the message sizes are small and the

destinations randomly selected and the machine is large… every access to

ranks is a cache miss, and we cannot prevent that by providing fancy

hardware.  This actually leads me to believe that we may need to

reconsider design goal #1, or at least clarify what it means in a way that makes

the access more efficient.</p>
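<p class="EC_MsoNormal">A minimal C sketch of the user-space lookup described above, contrasted with a symmetric descriptor. Everything here is an assumption made for illustration: mem_desc_t, lookup_per_rank, lookup_symmetric and the two size helpers are hypothetical names, not part of any MPI draft. It only shows that the per-rank scheme reads from a table that grows with the number of ranks, while the symmetric scheme reads one shared object regardless of dest.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor for a remote memory region (illustration only). */
typedef struct {
    uintptr_t base;  /* remote base address */
    size_t    size;  /* region length in bytes */
} mem_desc_t;

/* Non-symmetric case: one descriptor per target rank, so every transfer
 * begins with a load from a table indexed by the destination rank. */
mem_desc_t lookup_per_rank(const mem_desc_t *ranks, size_t nranks, size_t dest)
{
    assert(dest < nranks);
    return ranks[dest];
}

/* Symmetric case: all ranks share one descriptor, so the lookup reads a
 * single object and does not depend on the destination rank. */
mem_desc_t lookup_symmetric(const mem_desc_t *shared, size_t dest)
{
    (void)dest;  /* unused on purpose */
    return *shared;
}

/* Descriptor state the origin has to keep under each scheme, in bytes. */
size_t table_bytes_per_rank(size_t nranks)
{
    return nranks * sizeof(mem_desc_t);
}

size_t table_bytes_symmetric(void)
{
    return sizeof(mem_desc_t);
}
```

<p class="EC_MsoNormal">With random destinations on a large machine, the table read in lookup_per_rank is the access that is unlikely to stay in cache; the single descriptor in lookup_symmetric is touched on every transfer and so tends to stay resident.</p>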


<p class="EC_MsoNormal"> </p>


<p class="EC_MsoNormal">MPI_RMA_xfer itself is no picnic either.  If we take

the draft design goal #5: The RMA model must support non-cache-coherent and

heterogeneous environments, then MPI is required to maintain a data structure

for every rank (ok, it has to do this anyway, but we are trying to get close to

the metal) and do a lookup into that data structure with every MPI_RMA_xfer to

find out if the target is heterogeneous relative to the origin rank – another

cache miss.  Now, nominally, since this is inside MPI, a lower layer could

absorb that check… or, a given MPI could refuse to support heterogeneity

or… but, you get the idea.  </p>
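<p class="EC_MsoNormal">One way such a check might be kept off the fast path for a homogeneous job, sketched in C. This is an assumption, not something taken from the draft proposal: hetero_state_t, hetero_init and needs_conversion are hypothetical names. The per-rank table is scanned once at startup, and the per-transfer check consults it only when at least one rank differs.</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-job state; none of these names come from an MPI draft. */
typedef struct {
    const bool *rank_differs;  /* rank_differs[r]: rank r needs data conversion */
    size_t      nranks;
    bool        any_differs;   /* summary flag computed once at startup */
} hetero_state_t;

/* Scan the per-rank table once, for example during library initialization. */
hetero_state_t hetero_init(const bool *rank_differs, size_t nranks)
{
    hetero_state_t st = { rank_differs, nranks, false };
    for (size_t r = 0; r < nranks; r++) {
        if (rank_differs[r]) {
            st.any_differs = true;
            break;
        }
    }
    return st;
}

/* Per-transfer check: a homogeneous job never reads the per-rank table. */
bool needs_conversion(const hetero_state_t *st, size_t dest)
{
    if (!st->any_differs)
        return false;
    assert(dest < st->nranks);
    return st->rank_differs[dest];
}
```

<p class="EC_MsoNormal">In the homogeneous case the only per-transfer cost is a test of one flag; the per-rank load remains when the job really is heterogeneous.</p>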


<p class="EC_MsoNormal"> </p>


<p class="EC_MsoNormal">So, we’ve got two cache line loads for every

transfer.  One in the application and one in the MPI library.  One is

impossible to move to the hardware and the other is simply very difficult to

move.  </p>


<p class="EC_MsoNormal"> </p>


<p class="EC_MsoNormal">For a contrast, look at SHMEM.  Assume homogeneous,

only one communicator context, and hardware mapping of ranks to physical

locations.  A shmem_put() of a short item could literally be turned into a

few instructions and a processor store (on machines that supported such things). 

Personally, I think we will have done well if we can get to the point that a

reasonable hardware implementation can get MPI RMA to within 2x of a reasonable

SHMEM implementation.  I think we have a long way to go to get there.</p>


<p class="EC_MsoNormal"> </p>


<p class="EC_MsoNormal">Keith</p>


<p class="EC_MsoNormal"><span style="font-size:11.0pt;font-family:'Calibri','sans-serif';color:#1F497D"> </span></p>


<p class="EC_MsoNormal"><span style="font-size:11.0pt;font-family:'Calibri','sans-serif';color:#1F497D"> </span></p>


<div style="border:none;border-left:solid blue 1.5pt;padding:0in 0in 0in 4.0pt">


<div>


<div style="border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0in 0in 0in">


<p class="EC_MsoNormal"><b><span style="font-size:10.0pt;font-family:'Tahoma','sans-serif'">From:</span></b><span style="font-size:10.0pt;font-family:'Tahoma','sans-serif'"> mpi3-rma-bounces@lists.mpi-forum.org

[mailto:mpi3-rma-bounces@lists.mpi-forum.org] <b>On Behalf Of </b>Vinod

tipparaju<br>

<b>Sent:</b> Tuesday, September 01, 2009 5:23 AM<br>

<b>To:</b> MPI 3.0 Remote Memory Access working group<br>

<b>Subject:</b> Re: [Mpi3-rma] MPI3 RMA Design Goals</span></p>


</div>


</div>


<p class="EC_MsoNormal"> </p>


<p class="EC_MsoNormal"><span style="font-size:10.0pt;font-family:'Verdana','sans-serif'">Good

points! RMA interfaces should do nothing to prevent utilizing a high message

rate (or low overhead communication) that the underlying hardware may offer. To

ensure this happens, there should always be an unrestricted path (let's call it

this for now, people have called it a "thin layer", "direct

access") to the network. </span></p>


<div>


<p class="EC_MsoNormal"><span style="font-size:10.0pt;font-family:'Verdana','sans-serif'"> </span></p>


</div>


<div>


<p class="EC_MsoNormal"><span style="font-size:10.0pt;font-family:'Verdana','sans-serif'">This

means, despite the fact that the RMA interface has features that abstract out

complexity by providing useful characteristics such as ordering and atomicity,

it (the RMA interface) should always have this unrestricted, close to the heart

of the hardware, path. To achieve this, the unrestricted path should not

require any bookkeeping (from an implementation perspective) in relation to the

feature-rich path or vice-versa.  </span></p>


</div>


<div>


<p class="EC_MsoNormal"><span style="font-size:10.0pt;font-family:'Verdana','sans-serif'"> </span></p>


</div>


<div>


<p class="EC_MsoNormal"><span style="font-size:10.0pt;font-family:'Verdana','sans-serif'">I

believe this is what we have demonstrated with the example interfaces, hence the

null set isn't the case here :-). I will distribute an example implementation

very soon so people can get a feel.</span></p>


</div>


<div>


<p class="EC_MsoNormal"><span style="font-size:10.0pt;font-family:'Verdana','sans-serif'"> </span></p>


</div>


<div>


<div>


<p class="EC_MsoNormal"><span style="font-size:10.0pt;font-family:'Verdana','sans-serif'">---</span></p>


</div>


<div>


<p class="EC_MsoNormal"><span style="font-size:10.0pt;font-family:'Verdana','sans-serif'">Vinod

Tipparaju ^ http://ft.ornl.gov/~vinod ^ 1-865-241-1802<br>

<br>

<br>

<br>

> From: keith.d.underwood@intel.com<br>

> To: mpi3-rma@lists.mpi-forum.org<br>

> Date: Mon, 31 Aug 2009 16:17:28 -0600<br>

> Subject: Re: [Mpi3-rma] MPI3 RMA Design Goals<br>

> <br>

> There has been stunning silence since this email, so I will go ahead and

toss out a thought...<br>

> <br>

> In the draft design goals, I don't see two issues that I see as key. The

first is "support for high message rate/low overhead communications to

random targets". As best I can tell, this is one of the key places where

the existing one-sided operations are perceived as falling down for existing

customers of SHMEM/PGAS. The second is "elimination of the access epoch

requirement". This one may be, um, more controversial, but I believe it is

part and parcel with the first one. That is, the first one is not that valuable

if the programming model requires an excessive amount of access epoch opens and

closes just to force the global visibility of the operations. Unfortunately,

the intersection of this solution space with the solution space for the current

draft design goal #5 (support non-cache-coherent and heterogeneous

environments) may be the null set... I will hold out hope that this isn't the

case ;-)<br>

> <br>

> Keith <br>

> <br>

> > -----Original Message-----<br>

> > From: mpi3-rma-bounces@lists.mpi-forum.org [mailto:mpi3-rma-<br>

> > bounces@lists.mpi-forum.org] On Behalf Of William Gropp<br>

> > Sent: Wednesday, August 05, 2009 12:37 PM<br>

> > To: mpi3-rma@lists.mpi-forum.org<br>

> > Subject: [Mpi3-rma] MPI3 RMA Design Goals<br>

> > <br>

> > I've added versions of the RMA design goals that we discussed at the<br>

> > Forum meeting last week to the wiki page for our group (<br>

> > https://svn.mpi-forum.org/trac/mpi-forum-web/wiki/RmaWikiPage<br>

> > ). This is a draft; let's discuss these. Also, feel free to add to<br>

> > the discussion, particularly in the background section.<br>

> > <br>

> > Bill<br>

> > <br>

> > William Gropp<br>

> > Deputy Director for Research<br>

> > Institute for Advanced Computing Applications and Technologies<br>

> > Paul and Cynthia Saylor Professor of Computer Science<br>

> > University of Illinois Urbana-Champaign<br>

> > <br>

> > <br>

> > <br>

> > <br>

> > _______________________________________________<br>

> > mpi3-rma mailing list<br>

> > mpi3-rma@lists.mpi-forum.org<br>

> > http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-rma<br>

> <br>

</span></p>


</div>


</div>


</div>


</div></div></div></div></body>

</html>