<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class="">Hi Jim,<div class=""><br class=""></div><div class="">The CPU callback operation you describe seems to be only one-way (GPU notifying/triggering CPU), but a reverse mechanism would be needed to complete the pattern, as discussed on the call?</div><div class=""><br class=""></div><div class="">If the cuStreamWaitValue64 operation works like the “wait” in Stephen’s slides, i.e. it does not actually sit and wait (blocking a whole thread block), but makes the FIFO stream not runnable until the wait condition is satisfied, then that looks promising for a return path. Your hint “memops are processed from within the GPU control processor that manages stream execution” suggests this is true. Doesn’t that provide all that is needed to get this working?</div><div class="">The GPU has a way to signal to the CPU that the send data buffer is ready to be sent.</div><div class="">The CPU has a way to signal to the GPU that the send data buffer can be overwritten.</div><div class=""><div style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class="">The CPU has a way to signal to the GPU that the recv data buffer is ready to be consumed.</div></div><div class=""><div style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);" class="">The GPU has a way to signal to the CPU that the recv data buffer can be overwritten.</div></div><div class="">The CPU needs a helper thread to monitor the memory locations targeted by the memops. This could use generalized requests in MPI and rely on the MPI progress “thread”/mechanism, or be a separate/dedicated CPU thread.</div><div class="">The GPU can pack/unpack data; the CPU can use GPUDirect.</div><div class=""><br class=""></div><div class="">Nothing further seems to be necessary for a complete/functional implementation. 
Future changes — to MPI and/or to GPUs/CUDA — are only needed/desired to reduce/eliminate performance bottlenecks in this pattern.</div><div class=""><br class=""></div><div class="">Am I getting that right?</div><div class=""><div class=""><div class="">
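<div class=""><br class=""></div><div class="">To make the send-side handshake above concrete, here is a minimal CPU-only sketch in Python. It is a model under stated assumptions, not CUDA or MPI code: one thread stands in for the FIFO CUDA stream, another for the MPI helper thread, and the hypothetical Flag class stands in for a 64-bit word that both sides can see. Flag.write plays the role of cuStreamWriteValue64 (or a CPU store) and Flag.wait_geq plays the role of cuStreamWaitValue64 with a greater-or-equal comparison (or a CPU poll). The names Flag, run_exchange, send_ready and send_free are invented for illustration. The recv side would mirror this with the roles swapped.</div>
<pre class="">
```python
import threading


class Flag:
    """Hypothetical stand-in for a 64-bit flag word visible to both sides."""

    def __init__(self):
        self._value = 0
        self._cond = threading.Condition()

    def write(self, value):
        # Models a stream write memop (or a plain CPU store to the flag).
        with self._cond:
            self._value = value
            self._cond.notify_all()

    def wait_geq(self, value):
        # Models a stream wait memop: the caller is not runnable
        # until the flag has reached the requested value.
        with self._cond:
            self._cond.wait_for(lambda: self._value >= value)


def run_exchange(iterations):
    send_ready = Flag()  # GPU to CPU: send buffer is packed and may be sent
    send_free = Flag()   # CPU to GPU: send buffer may be overwritten
    send_buf = [None]    # stands in for the device send buffer
    wire = []            # stands in for data handed to MPI

    def gpu_stream():
        for i in range(1, iterations + 1):
            send_free.wait_geq(i - 1)    # previous send has drained
            send_buf[0] = f"packed-{i}"  # stands in for a packing kernel
            send_ready.write(i)          # tell the helper thread to send

    def cpu_helper():
        for i in range(1, iterations + 1):
            send_ready.wait_geq(i)       # monitor the flag location
            wire.append(send_buf[0])     # stands in for an MPI send
            send_free.write(i)           # buffer can be reused

    threads = [threading.Thread(target=gpu_stream),
               threading.Thread(target=cpu_helper)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return wire
```
</pre>
<div class="">Using monotonically increasing epoch values with a greater-or-equal wait means neither side ever needs to reset a flag, so the two flags are enough to keep the buffer from being overwritten before it has been sent.</div>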

<meta charset="UTF-8" class=""><div dir="auto" style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; word-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;" class=""><div style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;"><br class="Apple-interchange-newline">Cheers,</div><div style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;">Dan.</div><div style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;">—</div><div style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;">Dr Daniel Holmes PhD</div>Executive Director<br class="">Chief Technology Officer<br class=""><div style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); font-family: 
Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;">CHI Ltd</div><div style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;"><a href="mailto:danholmes@chi.scot" class="">danholmes@chi.scot</a></div><div style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); font-family: Helvetica; font-size: 12px; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none;" class=""><br class=""></div></div><br class="Apple-interchange-newline">

</div>

<div><br class=""><blockquote type="cite" class=""><div class="">On 11 Mar 2021, at 15:07, Jim Dinan via mpiwg-hybridpm <<a href="mailto:mpiwg-hybridpm@lists.mpi-forum.org" class="">mpiwg-hybridpm@lists.mpi-forum.org</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><div dir="ltr" class=""><div class="">Unfortunately, CPU callbacks are not a perfect solution on their own. CUDA does not allow CUDA calls from within CPU callbacks, so for example you would not be able to launch data packing kernels or peer-to-peer copy operations from within the callback. However, you can use CPU callbacks to signal a thread in the MPI runtime to process the operation. Another option in this design space is to use CUDA memops (e.g. cuStreamWriteValue64 or cuStreamWaitValue64) to coordinate between CUDA streams and MPI communication helper threads. Because memops are processed from within the GPU control processor that manages stream execution, I would expect these to have lower overheads than CPU callbacks (although I haven't measured this).</div><div class=""><br class=""></div><div class=""> ~Jim.</div><br class=""><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Mar 10, 2021 at 10:08 PM Junchao Zhang <<a href="mailto:junchao.zhang@gmail.com" class="">junchao.zhang@gmail.com</a>> wrote:<br class=""></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr" class="">Jim, <div class="">  Thanks for the slides.  In Stephen's presentation today, it seems with existing techniques, i.e, CPU MPI callback nodes in CUDA graphs, one can solve the MPI GPU problem. Is my understanding correct?</div><div class="">  </div><div class="">  Thanks. 
</div><div class=""><div class=""><div dir="ltr" class=""><div dir="ltr" class="">--Junchao Zhang</div></div></div><br class=""></div></div><br class=""><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, Mar 10, 2021 at 8:34 PM Jim Dinan via mpiwg-hybridpm <<a href="mailto:mpiwg-hybridpm@lists.mpi-forum.org" target="_blank" class="">mpiwg-hybridpm@lists.mpi-forum.org</a>> wrote:<br class=""></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr" class=""><div dir="ltr" class="">Hi All,<div class=""><br class=""></div><div class="">I've posted Stephen's slides: <a href="https://github.com/mpiwg-hybrid/hybrid-issues/tree/master/slides" target="_blank" class="">https://github.com/mpiwg-hybrid/hybrid-issues/tree/master/slides</a><br class=""></div><div class=""><br class=""></div><div class="">Best,</div><div class=""> ~Jim.</div></div><br class=""><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Mar 8, 2021 at 11:21 AM Jim Dinan <<a href="mailto:james.dinan@gmail.com" target="_blank" class="">james.dinan@gmail.com</a>> wrote:<br class=""></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr" class="">Hi All,<br class=""><div class=""><br class=""></div><div class="">We have an invited speaker this week at the HACC WG:</div><div class=""><br class=""></div><div class="">Topic: CUDA Deep Dive For the MPI Forum HACC WG</div><div class="">When:  Wednesday, March 10 10-11:00am ET</div><div class="">Connection Info: <a href="https://github.com/mpiwg-hybrid/hybrid-issues/wiki" target="_blank" class="">https://github.com/mpiwg-hybrid/hybrid-issues/wiki</a></div><div class=""><br class=""></div><div class="">Speaker: Stephen Jones, NVIDIA</div><div class=""><br class=""></div><div class="">Stephen Jones is one of the architects of CUDA, working on defining the language, the 
platform, and the hardware that it runs on, to span the needs of parallel programming from high performance computing to artificial intelligence. Prior to his present position, he led the Simulation & Analytics group at SpaceX, working on large-scale simulation of rocket engines. He has worked in diverse other industries, including networking, CAD/CAM, and scientific computing. He has been a part of CUDA since 2008.<br class=""></div><div class=""><br class=""></div><div class="">Cheers,</div><div class=""> ~Jim.</div><div class=""><br class=""></div><div class="">PS - Apologies for cross posting on the main list. If you would like to continue receiving emails relating to the Hybrid & Accelerator WG, please sign up for the mailing list here: <a href="https://lists.mpi-forum.org/mailman/listinfo/mpiwg-hybridpm" target="_blank" class="">https://lists.mpi-forum.org/mailman/listinfo/mpiwg-hybridpm</a>.</div></div>

</blockquote></div></div>

_______________________________________________<br class="">

mpiwg-hybridpm mailing list<br class="">

<a href="mailto:mpiwg-hybridpm@lists.mpi-forum.org" target="_blank" class="">mpiwg-hybridpm@lists.mpi-forum.org</a><br class="">

<a href="https://lists.mpi-forum.org/mailman/listinfo/mpiwg-hybridpm" rel="noreferrer" target="_blank" class="">https://lists.mpi-forum.org/mailman/listinfo/mpiwg-hybridpm</a><br class="">

</blockquote></div>

</blockquote></div></div>

</div></blockquote></div><br class=""></div></div></body></html>