[mpiwg-hybridpm] Questions about MPI CUDA stream integration
jeff.science at gmail.com
Thu Dec 17 15:31:54 CST 2020
I don't think it's reasonable to ask anybody to reason about the
interaction of MPI with a programming model that has no specification.
Furthermore, I don't think it's fair to ask anyone to dig into a language
that may be encumbered by patents and which has a restrictive,
I would be happy to discuss ISO C++20, C20 and Fortran 2008, Khronos
OpenCL, SYCL and Vulkan, OpenMP or OpenACC here, because those have
specifications and permit implementations on any hardware.
On Thu, Dec 17, 2020 at 2:34 AM Joseph Schuchart via mpiwg-hybridpm <
mpiwg-hybridpm at lists.mpi-forum.org> wrote:
> Jim, all,
> Thanks for your presentation yesterday (and last time). I had a bunch of
> questions but held back in hopes that we could go through the rest of
> the slides. Maybe it's better to have this discussion on the mailing
> list and save the precious hour in the WG meeting. In essence, my points
> boil down to these two:
> 1) I wonder what the benefit is of integrating stream support into MPI
> libraries over accelerator vendors providing their specific APIs on top
> of what MPI already offers? My understanding from the CUDA graph API is
> that you can add a host node that is a callback executed on the CPU.
> That is what I imagine the MPI library would use and it is what a
> third-party library could do as well, right? Otherwise, what is missing
> from the MPI API?
> 2) The CUDA stream and graph APIs seem very similar to task dependencies
> in OpenMP, with the same complications when combined with MPI. I think
> Martin hinted at this last night: MPI adds dependencies between nodes in
> one or more graphs that are not exposed to the CUDA scheduler, which
> opens the door for deadlocks. I think we should strive to do better. In
> OpenMP, detached tasks (and the events used to complete them) provide a
> user-controlled completion mechanism. This may be a model that could be
> picked up by the accelerator APIs. Joachim and I have shown that this
> model is easily coupled with callback-based completion notification in
> MPI :) So maybe the burden does not have to be all on MPI here...
> The discussion last night got held up by many questions, partly because
> two concepts were mixed: device-side communication initiation (which I
> think is very interesting) and stream integration (which I am less
> convinced of). It might be worth splitting the two aspects into separate
> discussions since you can have one without the other and it might make
> it easier for people to follow along.
> Since the problem of passing va-args through PMPI came up last night:
> one way to deal with it would be to provide MPI_Wait_enqueuev(reqs,
> type, va_args) to allow PMPI wrappers to inspect the va-args and pass
> them on to MPI. This is the model that printf/vprintf and friends are
> using. I'm not sure whether that works for Fortran though...
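A minimal sketch of the printf/vprintf pattern described above, in plain C without MPI. All names here (impl_wait_enqueue, impl_wait_enqueuev, wrapped_wait_enqueue, QUEUE_TYPE_STREAM) are hypothetical stand-ins and not part of any MPI proposal. The request array is reduced to a count, and the single variadic argument is assumed to be an int stream id.

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical "type" tag saying which extra arguments follow. */
enum { QUEUE_TYPE_STREAM = 1 };

/* Recorded by the wrapper, only to show that it could inspect the args. */
static int last_stream_seen = -1;

/* "v" variant: takes an already-started va_list, like vprintf.
 * Stand-in for the library entry point a wrapper would forward to. */
int impl_wait_enqueuev(int nreqs, int type, va_list ap)
{
    assert(nreqs >= 0);
    if (type == QUEUE_TYPE_STREAM) {
        int stream = va_arg(ap, int);   /* assumed: one int stream id */
        return nreqs * 100 + stream;    /* dummy result for illustration */
    }
    return -1;                          /* unknown type tag */
}

/* Variadic front end, like printf: packs "..." and calls the v variant. */
int impl_wait_enqueue(int nreqs, int type, ...)
{
    va_list ap;
    va_start(ap, type);
    int rc = impl_wait_enqueuev(nreqs, type, ap);
    va_end(ap);
    return rc;
}

/* PMPI-style wrapper: inspects the variadic arguments through a copy,
 * then forwards the untouched list to the v variant. */
int wrapped_wait_enqueue(int nreqs, int type, ...)
{
    va_list ap, peek;
    va_start(ap, type);
    va_copy(peek, ap);                  /* inspect without consuming ap */
    if (type == QUEUE_TYPE_STREAM)
        last_stream_seen = va_arg(peek, int);
    va_end(peek);
    int rc = impl_wait_enqueuev(nreqs, type, ap);
    va_end(ap);
    return rc;
}
```

Calling wrapped_wait_enqueue(2, QUEUE_TYPE_STREAM, 7) records the stream id in the wrapper and still hands the full argument list to the underlying function. The va_copy is what lets the wrapper read the arguments without disturbing the list it passes on.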
> I hope this is a good place to discuss these topics, so everyone should feel
> free to comment. Otherwise just take it as input for the next WG meeting :)
> Dr-Ing. Joseph Schuchart
> High Performance Computing Center Stuttgart (HLRS)
> Nobelstr. 19
> D-70569 Stuttgart
> Tel.: +49(0)711-68565890
> Fax: +49(0)711-6856832
> E-Mail: schuchart at hlrs.de
jeff.science at gmail.com