<div dir="ltr">Jeff,<div><br></div><div>Thanks for your explanation. I am trying to find out the motivations for the proposal. The archives for mpi3-hybridpm were a little sparse ;-)</div>

<div><br></div><div>I agree that there are cases where you would just like to use 1 MPI process per node with a useful threading runtime within the node. My suggestion was that, in this case, it might be possible to use the MPI-3 shared memory RMA interface, rather than POSIX shm and hybrid process/thread queues. Does it not satisfy your use case? If it doesn't, why not? After all, WE designed that API.</div>

<div class="gmail_extra"><br></div><div class="gmail_extra" style>There are several approaches to hybrid programming. I am trying to understand how we jumped to the conclusion that endpoints are the answer and that we are now supposed to be discussing the API. I don't see this discussion on the WG email list.</div>

<div class="gmail_extra" style><br></div><div class="gmail_extra" style>Thanks,</div><div class="gmail_extra" style>Sayantan.</div><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Mar 19, 2013 at 5:03 PM, Jeff Hammond <span dir="ltr">&lt;<a href="mailto:jhammond@alcf.anl.gov" target="_blank">jhammond@alcf.anl.gov</a>&gt;</span> wrote:<br>


<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div>On Tue, Mar 19, 2013 at 5:51 PM, Sur, Sayantan &lt;<a href="mailto:sayantan.sur@intel.com" target="_blank">sayantan.sur@intel.com</a>&gt; wrote:<br>


>> > Just as an example: Suppose there is an MPI+OpenMP app that runs on 16<br>

>> cores with 1 MPI rank and 16 threads. On a certain platform you find that with<br>

>> two endpoints you get better network utilization. In this case, can<br>

>> you not just run 2 MPI ranks with 8 threads each? How does this not achieve the<br>

>> same effect as your endpoint proposal?<br>

>><br>

>> Most apps run best with MPI only until they run out of memory.<br>

><br>

> Yes, and folks that run out of memory (MPI only) would use threading to reduce some of the memory consumption.<br>

><br>

> Adding endpoints that behave like ranks would not help the memory case.<br>

<br>

</div>This is not the point at all.  Let me just assert that there are apps<br>

that want to use 1 MPI process per node.  The MPI Forum should try to<br>

enable these users to max out their networks.  If one endpoint isn't<br>

enough, then telling users to use more MPI processes per node is as<br>

stupid a response as telling them to buy more DRAM.  The right<br>

solution is to enable better comm perf via endpoints, provided we can<br>

identify a path forward in that respect.<br>

<div><br>

>> Your<br>

>> argument can and often does lead back to MPI-only if applied inductively.<br>

><br>

> Folks can always adjust the ratio of MPI ranks to threads until adding more processes no longer improves network-related performance, while still achieving the memory balance that you mention above.<br>

<br>

</div>The whole point is that there are apps that MUST run with 1 MPI<br>

process per node and therefore arguing about the process-to-thread<br>

balance is completely useless.  Some of us recognize that load-store<br>

is a damn efficient way to communicate within a shared memory domain<br>

and have apps that use OpenMP, Pthreads, TBB, etc. for task- and/or<br>

data-parallelism within shared memory domains.  Are you suggesting<br>

that we try to create hybrid process-thread queues and annihilate our<br>

existing software to put everything in POSIX shm just to get more<br>

network concurrency within a node?<br>

<div><br>

>> It's really not an intellectually stimulating example to discuss.<br>

>><br>

> I am happy to look at other concrete examples that show the benefit of endpoints.<br>

<br>

</div>MADNESS uses 1 MPI process per node and a TBB-like (we are moving to<br>

actual TBB right now) thread runtime.  We are memory limited in some<br>

cases.  We completely reject your solution of using >1 MPI process per<br>

node.  Threads are mostly autonomous and would benefit from endpoints<br>

so that they can issue remote futures with affinity to data, for<br>

example.<br>

<br>

I don't understand why you're fighting so hard about this.  Just take it<br>

as fact that some apps need 1 MPI process per node and use threads<br>

within the node.  Given this definition of a problem, try to explore<br>

the solution space that enables maximum utilization of the<br>

interconnect, which might be multi-rail, multi-adapter, multi-link<br>

etc.  Multi-rail IB and Blue Gene/Q are good examples of existing tech<br>

where 1 MPI process per node might not be able to saturate the<br>

network.<br>

<span><font color="#888888"><br>

Jeff<br>

</font></span><div><div><br>

>> Jeff Hammond<br>

>> Argonne Leadership Computing Facility<br>

>> University of Chicago Computation Institute <a href="mailto:jhammond@alcf.anl.gov" target="_blank">jhammond@alcf.anl.gov</a> / (630)<br>

>> 252-5381 <a href="http://www.linkedin.com/in/jeffhammond" target="_blank">http://www.linkedin.com/in/jeffhammond</a><br>

>> <a href="https://wiki.alcf.anl.gov/parts/index.php/User:Jhammond" target="_blank">https://wiki.alcf.anl.gov/parts/index.php/User:Jhammond</a><br>

>><br>

>> _______________________________________________<br>

>> Mpi3-hybridpm mailing list<br>

>> <a href="mailto:Mpi3-hybridpm@lists.mpi-forum.org" target="_blank">Mpi3-hybridpm@lists.mpi-forum.org</a><br>

>> <a href="http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-hybridpm" target="_blank">http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-hybridpm</a><br>

><br>


<br>

<br>

<br>

--<br>

Jeff Hammond<br>

Argonne Leadership Computing Facility<br>

University of Chicago Computation Institute<br>

<a href="mailto:jhammond@alcf.anl.gov" target="_blank">jhammond@alcf.anl.gov</a> / <a href="tel:%28630%29%20252-5381" value="+16302525381" target="_blank">(630) 252-5381</a><br>

<a href="http://www.linkedin.com/in/jeffhammond" target="_blank">http://www.linkedin.com/in/jeffhammond</a><br>

<a href="https://wiki.alcf.anl.gov/parts/index.php/User:Jhammond" target="_blank">https://wiki.alcf.anl.gov/parts/index.php/User:Jhammond</a><br>


</div></div></blockquote></div><br></div></div>