[mpiwg-hybridpm] Comments from the INIT/FINALIZE presentation today

Jeff Squyres (jsquyres) jsquyres at cisco.com
Fri Sep 13 12:27:13 CDT 2013


Short version: the proposal went down in flames.

More detail:

Here are some of the specific points that caused a lot of contention:

1. Although it took me a while to understand his comments, Torsten more-or-less immediately identified a major flaw in the proposal (and Bill Gropp agreed): the idea of INIT/FINALIZE only being collective when opening/closing an epoch leads to real problems when combined with the idea of multiple threads (assume THREAD_MULTIPLE for the purpose of this discussion point).

The real problem is that individual threads have the power to alter global process state.

One example: it becomes racy if multiple threads in multiple MPI processes repeatedly invoke the following:

  // In each thread of each MPI process
  for (int i = 0; i < LARGE_NUMBER; ++i) {
    MPI_Init(NULL, NULL);
    MPI_Finalize();
  }

You can't know how many epochs there will be in a given process because the interleaving is racy.  Therefore you can't guarantee correctness: different processes can (and will) end up with different numbers of epochs.

Making INIT/FINALIZE always collective solves this issue, but I recall we had other issues with that (I'm too jet lagged/tired to remember what they were offhand).

Regardless, this is a major issue that we'll need to figure out if this proposal is going to go any further.

2. One possible solution to this entire mess was proposed by Alexander Supalov:

- the first call to MPI_INIT initializes MPI
- subsequent calls to MPI_INIT are effectively no-ops
- make MPI_Finalize a no-op

I.e., you initialize once and never finalize.  There are no issues with threads, no issues with collective vs. local-only INIT/FINALIZE behavior, etc.  It's a simple solution.
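
To make that concrete, here's roughly what a stacked library could do under those semantics.  This is illustrative only -- the library function names are made up, and the pattern is *not* legal MPI today (in MPI-3 a second MPI_INIT is erroneous and FINALIZE really does shut MPI down):

  #include <mpi.h>

  // Hypothetical library code under the "init once, never finalize" model.
  void my_library_init(void)
  {
    // The first call anywhere in the process initializes MPI; any later
    // call (from this or any other library/thread) is simply a no-op.
    MPI_Init(NULL, NULL);
  }

  void my_library_cleanup(void)
  {
    // A no-op under this proposal, so a library can call it without
    // shutting MPI down underneath other libraries or the application.
    MPI_Finalize();
  }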

3. Another possible solution was proposed: create MPI_INIT2 / MPI_FINALIZE2 (pick better names, I know), something like this:

  MPI_Init2(&my_mpi_comm_world, &my_mpi_comm_self);
  MPI_Send(..., my_mpi_comm_world);
  MPI_Finalize2(&my_mpi_comm_world, &my_mpi_comm_self);

I.e., you basically get a handle that identifies *your* "session" of MPI usage.  INIT2/FINALIZE2 are always collective; they return a "personalized"/unique COMM_WORLD/COMM_SELF that identifies the caller's MPI "session"; and they are thread safe (e.g., if I ATTR_PUT on my_mpi_comm_world, that attribute is not visible on your_mpi_comm_world), etc.

This avoids the problem of threads calling INIT and effectively altering global state (especially when it's racy).

Granted, there are lots of other MPI global symbols (e.g., datatypes, etc.).  But many, many things come from MPI_COMM_WORLD (e.g., all non-dynamic communicators, files, and windows -- need to think about dynamics, though...).  So perhaps having *your* copy of MPI_COMM_WORLD/SELF would be sufficient...?

It's an interesting idea; needs some thought (e.g., what to do with the existing MPI_COMM_WORLD and MPI_COMM_SELF symbols -- I'm sure there's lots of other details to think through).
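
For those who weren't in the room, here's a slightly longer sketch of how two independent libraries might use such an API.  To be crystal clear: MPI_Init2/MPI_Finalize2 do not exist anywhere, the prototypes below are just my guess based on the snippet above, and the libA/libB functions are made up -- this obviously won't link against any existing MPI:

  #include <mpi.h>

  // Hypothetical prototypes -- not in any MPI standard; guessed from above.
  int MPI_Init2(MPI_Comm *world, MPI_Comm *self);
  int MPI_Finalize2(MPI_Comm *world, MPI_Comm *self);

  // Library A opens its own "session": its own world/self communicators.
  void libA_do_work(void)
  {
    MPI_Comm a_world, a_self;
    int rank;

    MPI_Init2(&a_world, &a_self);
    MPI_Comm_rank(a_world, &rank);
    // Communicators, files, windows, and attributes derived from a_world
    // belong to library A's session only.
    MPI_Finalize2(&a_world, &a_self);
  }

  // Library B does the same, completely unaware of library A.
  void libB_do_work(void)
  {
    MPI_Comm b_world, b_self;

    MPI_Init2(&b_world, &b_self);
    // b_world is distinct from a_world: an ATTR_PUT on a_world is not
    // visible here.
    MPI_Finalize2(&b_world, &b_self);
  }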

4. Torsten was also concerned about the lack of a solid use case for stacked libraries that call INIT/FINALIZE themselves (today's libraries rely on the application to call INIT/FINALIZE), and the lack of a solid use case for re-initialization of MPI.  To me, this is a chicken-and-egg issue (i.e., no one does it because you can't), but it's hard to argue against it.

Perhaps those of you who are closer to app developers can cite specific needs/use cases...?

5. Bill/Torsten pointed out that, per the MPI-2+ definitions, the INITIALIZED/FINALIZED pattern on slide 6 is *not* inherently racy.  Jeff Hammond's original question was how to make it safe for multiple threads in an application (presumably from multiple different libraries) to initialize MPI, perhaps where they all do something like this:

  // Disregarding issues with thread levels below THREAD_MULTIPLE for this
  // discussion point
  int flag, provided;
  MPI_Initialized(&flag);
  if (0 == flag) MPI_Init_thread(NULL, NULL, MPI_THREAD_MULTIPLE, &provided);

Bill pointed out that MPI (2 and 3) says that for THREAD_MULTIPLE, the program must be correct for any interleaving of MPI calls from multiple threads.  By this definition, what JeffH wants to do is erroneous because multiple threads would call MPI_INIT, and it is therefore a new use case.  While this is jaw-dropping to me, technically it is correct.  The standard answer here is that the different threads must use some mechanism outside of MPI for synchronization/state/whatever.

(I realize this doesn't help JeffH's use case at all, because different libraries have no knowledge of each other and can't coordinate who will call MPI_INIT.  This is just more background info for those of you who weren't there, because we've been actively working on extending the standard to allow JeffH's use case -- I'm just reporting on what was discussed.)
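
For completeness, the kind of "mechanism outside of MPI" that Bill means would look something like the pthread_once sketch below.  The shim function names are made up, and it only works if every library in the process funnels its initialization through the same shim -- which is exactly the cross-library coordination that JeffH's libraries don't have:

  #include <mpi.h>
  #include <pthread.h>

  static pthread_once_t mpi_once = PTHREAD_ONCE_INIT;

  static void do_mpi_init(void)
  {
    int provided;
    MPI_Init_thread(NULL, NULL, MPI_THREAD_MULTIPLE, &provided);
    // A real shim would also check that provided >= MPI_THREAD_MULTIPLE.
  }

  // Every thread, in every library, calls this instead of MPI_Init_thread.
  void shim_ensure_mpi_initialized(void)
  {
    pthread_once(&mpi_once, do_mpi_init);
  }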

-----

Martin took notes for me while I was presenting.  His notes are below, interspersed with my comments and further explanations on his notes.

Begin forwarded message:

> From: Martin Schulz <schulz6 at llnl.gov>
> Subject: Hybrid comments
> Date: September 13, 2013 3:12:07 PM GMT+02:00
> To: Jeff Squyres <jsquyres at cisco.com>
> 
> Hybrid Init/Finalize Notes
> 
> Distinction of undefined vs. prohibited things after epoch
> 	- handle don't have to go stale
> 	- behavior is undefined
> 	- goal should be to keep the spirit of the current Finalize
	--> Torsten was picking on slide 21, where we said a "low quality implementation must behave as if finalized when refcount decrements to 0".  His point is that if someone does something erroneous, the behavior is actually *undefined* -- we can't say the implementation *must behave* a certain way if a user does something erroneous.  We're certainly not going for that; it was just a poorly worded slide (although it took me quite a while to understand his objection!).
> 
> What does "collective" mean after close of an epoch
> 	(if multiple processes are left after a finalize and we initialize again)
	--> The question was raised in slide 23: what happens if all 3 MPI processes call MPI_INIT again?  Dunno -- we didn't talk about that...

> Launch a legacy library inside a new MPI code on a subset of nodes
> 	problem: you can't restrict MPI_COMM_WORLD
	--> This comment was basically the result of a lengthy discussion about the "INIT/FINALIZE not collective unless opening/closing an epoch" issue.  If you allow such behavior, then you could allow a subset of processes to initialize on their own -- which leads straight into the next note about getting a potentially different MCW.
> 
> Big issue: you get a potentially different MCW if you do local inits (not sure who else does this)
	--> This issue is also related to the "INIT/FINALIZE not collective unless opening/closing an epoch..." discussion.
> 
> Corner case 6:
> 	thread ordering can result in more or less epochs
	--> See my point #1 up above.
> 
> Initialized/Finalized is not racy because MPI 2 requires that all interleavings have to be valid - otherwise the code is erroneous
	--> See my point #5 up above.
> 
> create bubbles with their own MCW
> 	have init return separate MCW
> 	enables separation
	--> See my point #3 up above.

-- 
Jeff Squyres
jsquyres at cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/



