[Mpi3-hybridpm] First cut at slides

Martin Schulz schulz6 at llnl.gov
Sun Jun 23 14:21:32 CDT 2013

Hi Jeff, all,

Just catching up with this thread now - looking over the slides, I had a few questions/comments:

Slide 3: why does this eliminate the issues of calling MPI before Init? Wouldn't the user still have to call Init, and wouldn't an MPI call made while the init refcount is 0 still raise an error?

Slide 6: what do you mean by "actually finalize"? Will a user see a difference (unless he/she keeps track of how many times this was called, i.e., keeps a separate ref count)?

Slide 7: I am not sure the term "epoch" really helps. We had the same situation in MPI_T, and there we simply described MPI_T as initialized whenever there have been more init calls than finalize calls. I say this because a user cannot and should not think in terms of epochs. After a part of a program that does its own initialization (e.g., a library) calls finalize, that component should never make any MPI calls until it calls init again, regardless of whether we are in a different epoch or not. Epochs should only make sense as an implementation detail in some implementations.
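
To make the ref-count reading of Slides 3, 6, and 7 concrete, here is a minimal sketch in plain C. It is not the MPI API and not the proposal's text: the names model_init, model_finalize, model_call, and the MODEL_* codes are hypothetical, and the model is single-threaded. It only illustrates the rule "initialized means more init calls than finalize calls", under which a call made at refcount 0 is still an error.

```c
#include <assert.h>

/* Hypothetical model, not the MPI API: one process-wide init ref count. */
static int init_refcount = 0;

enum { MODEL_SUCCESS = 0, MODEL_ERR_NOT_INITIALIZED = 1 };

/* Every init succeeds and bumps the count. In a real implementation only
   the 0 -> 1 transition would do actual setup (an assumption of this sketch). */
int model_init(void) {
    init_refcount++;
    return MODEL_SUCCESS;
}

/* A finalize without a matching init is an error. Only the 1 -> 0
   transition would "actually finalize"; the others are no-ops to the caller. */
int model_finalize(void) {
    if (init_refcount == 0)
        return MODEL_ERR_NOT_INITIALIZED;
    init_refcount--;
    assert(init_refcount >= 0);
    return MODEL_SUCCESS;
}

/* Stand-in for any other MPI call: legal only while the count is positive. */
int model_call(void) {
    return init_refcount > 0 ? MODEL_SUCCESS : MODEL_ERR_NOT_INITIALIZED;
}
```

In this model a caller cannot tell a no-op finalize from the one that tears things down unless it tracks the count itself, and nothing in the rule needs the word "epoch".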

Slide 8: why not deprecate and later remove these functions? Nobody should need to check - if a component wants to call MPI, it can initialize it. We had the same discussion in MPI_T and never found a good use for these functions (that's why they are not there). I know the semantics are slightly different there, but I am just not sure we should expose the epoch concept, or the ability to test for it, to users.

Having said that, though, it may be worthwhile to expose the ref count through a mandatory control variable in MPI_T for tools.

Slide 9: I don't think this restriction makes any sense anymore - a component would have to check whether it is running on the main thread. If yes, its own call must be delayed until the no-op finalizes have been called (which is impossible, since the component will have shut down by that point); if not, it has to check whether it is the one that drops the ref count to zero. If so, it is not allowed to call finalize, which again is problematic because it can't call it later. Do we still need the notion of a main thread (I realize it may/will impact implementations, but I am asking from the perspective of creating a clean and consistent standard)?
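
The check mentioned above (is this the finalize that drops the ref count to zero?) could look roughly like the following C11 sketch. Again, this is an illustration under my own assumptions, not the MPI API: sketch_init, sketch_finalize_is_last, and the atomic counter are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical shared counter; a real implementation may track this differently. */
static atomic_int sketch_refcount = 0;

/* Each component or thread that initializes bumps the shared count. */
void sketch_init(void) {
    atomic_fetch_add(&sketch_refcount, 1);
}

/* Decrements the count and reports whether this caller was the last one.
   atomic_fetch_sub returns the value held *before* the subtraction, so
   a previous value of 1 means this call dropped the count to zero. */
bool sketch_finalize_is_last(void) {
    int prev = atomic_fetch_sub(&sketch_refcount, 1);
    assert(prev > 0); /* finalize without a matching init is a caller bug */
    return prev == 1;
}
```

Note that the caller only learns it was last after the decrement has already happened, so a rule of the form "a non-main thread must not perform the last finalize" cannot be obeyed simply by checking first and deferring the call.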

Slide 14: Is there any guarantee that the size of COMM_WORLD stays the same? If processes drop out between epochs, COMM_WORLD would be smaller, right? Also, if you spawn processes, finalize, and reinitialize on all processes, do the new ones become part of COMM_WORLD? Or should all intracommunicators be recreated?


On Jun 18, 2013, at 11:00 AM, Jeff Squyres (jsquyres) <jsquyres at cisco.com> wrote:

> On Jun 18, 2013, at 11:25 AM, Jeff Hammond <jhammond at alcf.anl.gov> wrote:
>>> But maybe that isn't disastrous.  If you don't want that (potential)
>>> behavior, you should COMM_DISCONNECT before finalize.
> Technically speaking, no.  FINALIZE is collective over all connected processes (where "connected" includes those who have not yet DISCONNECTed).  
> In one sense, everyone disconnects because FINALIZE completes.  But technically, none of them explicitly called DISCONNECT.
>> Could we add that behavior without
>> breaking existing semantics?
> Getting more specific on FINALIZE semantics is a deep, dark, rat hole with dragons and other terrible monsters in it...
> -- 
> Jeff Squyres
> jsquyres at cisco.com
> For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
> _______________________________________________
> Mpi3-hybridpm mailing list
> Mpi3-hybridpm at lists.mpi-forum.org
> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-hybridpm

Martin Schulz, schulzm at llnl.gov, http://people.llnl.gov/schulzm
CASC @ Lawrence Livermore National Laboratory, Livermore, USA
