<div dir="ltr">Jeff,<div><br></div><div>Thanks for your explanation. I am trying to find out the motivations for the proposal. The archives for mpi3-hybridpm were a little sparse ;-)</div>
<div><br></div><div>I agree that there are cases where you would just like to use 1 MPI process per node with a useful threading runtime within the node. I was suggesting that in this case it might be possible to use the MPI-3 Shared Memory RMA interface (rather than POSIX shm and hybrid process/thread queues). Would you say that it doesn't satisfy your use case? If so, why not? After all, WE designed that API.</div>
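<div><br></div><div>To be concrete, here is a minimal sketch of the kind of thing I have in mind, using MPI_Comm_split_type and MPI_Win_allocate_shared so that ranks on the same node share data through plain loads and stores instead of collapsing to a single process plus threads. The buffer size and the toy neighbor exchange are just placeholders, and error handling plus the finer points of window synchronization are glossed over:</div>
<div><pre>
/* Minimal sketch: one MPI process per core, with on-node data placed in
 * an MPI-3 shared-memory window so that on-node communication is plain
 * load/store.  Error checking and careful treatment of the memory model
 * (MPI_Win_sync placement, etc.) are deliberately glossed over. */
#include &lt;mpi.h&gt;
#include &lt;stdio.h&gt;

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* Split MPI_COMM_WORLD into one communicator per shared-memory node. */
    MPI_Comm nodecomm;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &nodecomm);

    int nrank, nsize;
    MPI_Comm_rank(nodecomm, &nrank);
    MPI_Comm_size(nodecomm, &nsize);

    /* Each rank contributes a slab; every slab is directly addressable
     * by every other rank on the node. */
    double *mine;
    MPI_Win win;
    MPI_Win_allocate_shared(1024 * sizeof(double), sizeof(double),
                            MPI_INFO_NULL, nodecomm, &mine, &win);

    /* Look up the base address of the next rank's slab. */
    MPI_Aint size;
    int disp_unit;
    double *peer;
    MPI_Win_shared_query(win, (nrank + 1) % nsize, &size, &disp_unit, &peer);

    MPI_Win_lock_all(MPI_MODE_NOCHECK, win);
    mine[0] = (double)nrank;                  /* plain store              */
    MPI_Win_sync(win);
    MPI_Barrier(nodecomm);
    MPI_Win_sync(win);
    printf("rank %d sees %.0f next door\n", nrank, peer[0]);  /* plain load */
    MPI_Win_unlock_all(win);

    MPI_Win_free(&win);
    MPI_Comm_free(&nodecomm);
    MPI_Finalize();
    return 0;
}
</pre></div>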
<div class="gmail_extra"><br></div><div class="gmail_extra" style>There are several approaches towards hybrid programming. I am trying to understand how we have jumped to the conclusion that endpoints are the answer and are supposed to discuss the API. I don't see this discussion in the WG email list.</div>
<div class="gmail_extra" style><br></div><div class="gmail_extra" style>Thanks,</div><div class="gmail_extra" style>Sayantan.</div><div class="gmail_extra"><br><div class="gmail_quote">On Tue, Mar 19, 2013 at 5:03 PM, Jeff Hammond <span dir="ltr"><<a href="mailto:jhammond@alcf.anl.gov" target="_blank">jhammond@alcf.anl.gov</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div>On Tue, Mar 19, 2013 at 5:51 PM, Sur, Sayantan <<a href="mailto:sayantan.sur@intel.com" target="_blank">sayantan.sur@intel.com</a>> wrote:<br>
>> > Just as an example: Suppose there is an MPI+OpenMP app that runs on 16<br>
>> > cores with 1 MPI rank and 16 threads. On a certain platform you find that<br>
>> > with two endpoints you get better network utilization. In this case, can<br>
>> > you not just run 2 MPI ranks with 8 threads each? How does this not achieve<br>
>> > the same effect as your endpoint proposal?<br>
>><br>
>> Most apps run best with MPI only until they run out of memory.<br>
><br>
> Yes, and folks that run out of memory (MPI only) would use threading to reduce some of the memory consumption.<br>
><br>
> Adding endpoints that behave like ranks would not help the memory case.<br>
<br>
</div>This is not the point at all. Let me just assert that there are apps<br>
that want to use 1 MPI process per node. The MPI Forum should try to<br>
enable these users to max out their networks. If one endpoint isn't<br>
enough, then telling users to use more MPI processes per node is as<br>
stupid a response as telling them to buy more DRAM. The right<br>
solution is to enable better comm perf via endpoints, provided we can<br>
identify a path forward in that respect.<br>
<div><br>
>> Your<br>
>> argument can and often does lead back to MPI-only if applied inductively.<br>
><br>
> Folks can always adjust the ratio of MPI ranks to threads to reach a point where adding more processes no longer increases network performance, while still achieving the memory balance that you mention above.<br>
<br>
</div>The whole point is that there are apps that MUST run with 1 MPI<br>
process per node, and therefore arguing about the process-to-thread<br>
balance is completely useless. Some of us recognize that load-store<br>
is a damn efficient way to communicate within a shared memory domain<br>
and have apps that use OpenMP, Pthreads, TBB, etc. for task- and/or<br>
data-parallelism within shared memory domains. Are you suggesting<br>
that we try to create hybrid process-thread queues and annihilate our<br>
existing software to put everything in POSIX shm just to get more<br>
network concurrency within a node?<br>
<div><br>
>> It's really not an intellectually stimulating example to discuss.<br>
>><br>
> I am happy to look at other concrete examples that show the benefit of endpoints.<br>
<br>
</div>MADNESS uses 1 MPI process per node and a TBB-like (we are moving to<br>
actual TBB right now) thread runtime. We are memory limited in some<br>
cases. We completely reject your solution of using >1 MPI process per<br>
node. Threads are mostly autonomous and would benefit from endpoints<br>
so that they can issue remote futures with affinity to data, for<br>
example.<br>
<br>
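To make that concrete: below is a rough sketch of what per-thread<br>
endpoints would buy an app like this. It is written against a creation<br>
call along the lines of MPI_Comm_create_endpoints from the draft<br>
proposal; the exact name, signature, and attach semantics should be<br>
read as illustrative, not as settled API.<br>
<br>
<pre>
/* Illustrative sketch only: the endpoints creation call below follows
 * the draft proposal under discussion and is NOT standard MPI.
 * Everything else is plain MPI + OpenMP. */
#include &lt;mpi.h&gt;
#include &lt;omp.h&gt;

#define NUM_EP 16   /* one endpoint per thread on this node */

int main(int argc, char **argv)
{
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);

    /* Proposed interface: the process asks for NUM_EP handles; each
     * handle behaves like a distinct rank in the new communicator. */
    MPI_Comm ep[NUM_EP];
    MPI_Comm_create_endpoints(MPI_COMM_WORLD, NUM_EP, MPI_INFO_NULL, ep);

    #pragma omp parallel num_threads(NUM_EP)
    {
        int t = omp_get_thread_num();
        int me, np;
        MPI_Comm_rank(ep[t], &me);
        MPI_Comm_size(ep[t], &np);

        /* Each thread drives its own endpoint, so the messages behind a
         * remote future can be issued directly by the thread that owns
         * the data, and the implementation is free to spread endpoints
         * over multiple rails / injection FIFOs. */
        int out = me, in = -1;
        MPI_Sendrecv(&out, 1, MPI_INT, (me + 1) % np, 0,
                     &in,  1, MPI_INT, (me + np - 1) % np, 0,
                     ep[t], MPI_STATUS_IGNORE);

        MPI_Comm_free(&ep[t]);   /* collective over all endpoints */
    }

    MPI_Finalize();
    return 0;
}
</pre>
<br>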
I don't understand why you're fighting so hard about this. Just take<br>
as fact that some apps need 1 MPI process per node and use threads<br>
within the node. Given that problem definition, try to explore<br>
the solution space that enables maximum utilization of the<br>
interconnect, which might be multi-rail, multi-adapter, multi-link,<br>
etc. Multi-rail IB and Blue Gene/Q are good examples of existing tech<br>
where 1 MPI process per node might not be able to saturate the<br>
network.<br>
<span><font color="#888888"><br>
Jeff<br>
</font></span><div><div><br>
<br>
<br>
<br>
--<br>
Jeff Hammond<br>
Argonne Leadership Computing Facility<br>
University of Chicago Computation Institute<br>
<a href="mailto:jhammond@alcf.anl.gov" target="_blank">jhammond@alcf.anl.gov</a> / <a href="tel:%28630%29%20252-5381" value="+16302525381" target="_blank">(630) 252-5381</a><br>
<a href="http://www.linkedin.com/in/jeffhammond" target="_blank">http://www.linkedin.com/in/jeffhammond</a><br>
<a href="https://wiki.alcf.anl.gov/parts/index.php/User:Jhammond" target="_blank">https://wiki.alcf.anl.gov/parts/index.php/User:Jhammond</a><br>
_______________________________________________<br>
Mpi3-hybridpm mailing list<br>
<a href="mailto:Mpi3-hybridpm@lists.mpi-forum.org" target="_blank">Mpi3-hybridpm@lists.mpi-forum.org</a><br>
<a href="http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-hybridpm" target="_blank">http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-hybridpm</a><br>
</div></div></blockquote></div><br></div></div>