[Mpi3-ft] MPI_Init / MPI_Finalize
Bronis R. de Supinski
bronis at llnl.gov
Wed Aug 25 23:08:36 CDT 2010
Fab:
There is no wiggle room. MPI_FINALIZE is collective across
MPI_COMM_WORLD. I do not understand why you would say otherwise.
Here is more of the passage I was quoting:
-----------------
MPI_FINALIZE is collective over all connected processes. If no processes
were spawned, accepted or connected then this means over MPI_COMM_WORLD;
otherwise it is collective over the union of all processes that have been
and continue to be connected, as explained in Section Releasing
Connections on page Releasing Connections.
-----------------
The "connected" terminology is used to handle dynamic process
management issues, for which the set of all processes cannot
easily be defined in terms of a single communicator.
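For instance, in a sketch like the one below (my example, not text from
the standard; the executable name "worker" is made up), spawning joins
the parent's and the children's MPI_COMM_WORLDs into one connected set,
and MPI_FINALIZE is collective over that union unless the processes
explicitly disconnect first:

    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_Comm children;
        MPI_Init(&argc, &argv);

        /* Spawning connects the parent's MPI_COMM_WORLD to the
         * children's MPI_COMM_WORLD. */
        MPI_Comm_spawn("worker", MPI_ARGV_NULL, 2, MPI_INFO_NULL, 0,
                       MPI_COMM_WORLD, &children, MPI_ERRCODES_IGNORE);

        /* ... communicate over the intercommunicator ... */

        /* Without this call the two groups remain connected, and
         * MPI_Finalize is collective over all of them. */
        MPI_Comm_disconnect(&children);

        MPI_Finalize();
        return 0;
    }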
Bronis
On Wed, 25 Aug 2010, Fab Tillier wrote:
> What defines "connected"? MPI_FINALIZE isn't collective across MPI_COMM_WORLD, as processes might never communicate with one another. Even if they do, communication may not require a connection, so they may never be connected.
>
> It seems to me there might be enough wiggle room in the standard to allow MPI_Finalize to not be collective at all?
>
> -Fab
>
> Bronis R. de Supinski wrote on Wed, 25 Aug 2010 at 15:06:38
>
>>
>> Josh:
>>
>> On p293 of the 2.2 standard, it says "MPI_FINALIZE is collective
>> over all connected processes." I don't know that the call being
>> collective changes your analysis but your statement that the
>> call is not collective was incorrect...
>>
>> Bronis
>>
>>
>> On Wed, 25 Aug 2010, Joshua Hursey wrote:
>>
>>> During the discussion of the run-through stabilization proposal today
>>> on the teleconf, we spent a while discussing the expected behavior of
>>> MPI_Init and MPI_Finalize in the presence of process failures. I would
>>> like to broaden the discussion a bit to help pin down the expected
>>> behavior.
>>>
>>> MPI_Init():
>>> -----------
>>> Problem: If a process fails before or during MPI_Init, what should
>>> the MPI implementation do?
>>>
>>> The current standard says nothing about the return value of
>>> MPI_Init() (Ch. 8.7). To the greatest possible extent, an application
>>> that wishes to ignore errors (i.e., assumes MPI_ERRORS_ARE_FATAL)
>>> should not be put in danger, so returning an error from this function
>>> (in contrast to aborting the job) might be dangerous. However, an
>>> application that is prepared to handle process failures has no way to
>>> communicate that fact to the MPI implementation until after
>>> MPI_Init() completes.
>>>
>>> So a couple of solutions were presented, each with pros and cons
>>> (please fill in if I missed any):
>>>
>>> 1) If a process fails in MPI_Init() (default error handler is
>>> MPI_ERRORS_ARE_FATAL) then the entire job is aborted (similar to
>>> calling MPI_Abort on MPI_COMM_WORLD).
>>>
>>> 2) If a process fails in MPI_Init(), the MPI implementation will
>>> return an appropriate error code/class (e.g., MPI_ERR_RANK_FAIL_STOP),
>>> and all subsequent calls into the MPI implementation will return the
>>> error class MPI_ERR_OTHER (should we create an MPI_ERR_NOT_ACTIVE?).
>>> Applications should eventually notice the error and terminate.
>>>
>>> 3) Allow the application to register only the MPI_ERRORS_RETURN
>>> handler on MPI_COMM_WORLD before MPI_Init() using the
>>> MPI_Errhandler_set() function (sketched after this list). Errors that
>>> occur before the MPI_Errhandler_set() call are fatal. Errors
>>> afterward, including during MPI_Init(), are not fatal.
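>>>
>>> To make option 3 concrete, here is a minimal sketch of what the
>>> proposed usage might look like. Note that calling MPI_Errhandler_set()
>>> before MPI_Init() is not permitted by the current standard; this only
>>> illustrates the proposed extension, and the error handling shown is a
>>> placeholder:
>>>
>>>     #include <mpi.h>
>>>     #include <stdio.h>
>>>
>>>     int main(int argc, char **argv)
>>>     {
>>>         /* Proposed (not legal in MPI 2.2): ask for errors to be
>>>          * returned before initialization, so that a failure inside
>>>          * MPI_Init() can be reported instead of aborting the job. */
>>>         MPI_Errhandler_set(MPI_COMM_WORLD, MPI_ERRORS_RETURN);
>>>
>>>         if (MPI_Init(&argc, &argv) != MPI_SUCCESS) {
>>>             fprintf(stderr, "MPI_Init reported an error\n");
>>>             /* application-specific handling ... */
>>>         }
>>>
>>>         /* ... rest of the application ... */
>>>         MPI_Finalize();
>>>         return 0;
>>>     }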
>>>
>>> In the cases where MPI_Init() returns MPI_ERR_RANK_FAIL_STOP to
>>> indicate a process failure, is the library usable or not? If the
>>> application can continue running through the failure, then the MPI
>>> library should still be usable; thus MPI_Init() must be fault tolerant
>>> in its initialization in order to handle process failures. If the MPI
>>> implementation finds itself in trouble and cannot continue, it should
>>> return MPI_ERR_CANNOT_CONTINUE from all subsequent calls, including
>>> MPI_Init itself if possible.
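>>>
>>> As a sketch of how an application might act on this (my example;
>>> MPI_ERR_RANK_FAIL_STOP and MPI_ERR_CANNOT_CONTINUE are names from the
>>> proposal, not MPI 2.2 constants, so they are defined here only as
>>> placeholders to make the sketch compile):
>>>
>>>     #include <mpi.h>
>>>     #include <stdio.h>
>>>     #include <stdlib.h>
>>>
>>>     /* Placeholder values for the proposed error classes. */
>>>     #define MPI_ERR_RANK_FAIL_STOP  (MPI_ERR_LASTCODE + 1)
>>>     #define MPI_ERR_CANNOT_CONTINUE (MPI_ERR_LASTCODE + 2)
>>>
>>>     int main(int argc, char **argv)
>>>     {
>>>         int rc = MPI_Init(&argc, &argv);
>>>         if (rc == MPI_SUCCESS || rc == MPI_ERR_RANK_FAIL_STOP) {
>>>             /* A peer failed during startup, but the library is
>>>              * expected to remain usable, so run through the failure. */
>>>         } else {
>>>             /* e.g., MPI_ERR_CANNOT_CONTINUE: MPI is not usable;
>>>              * clean up locally and exit. */
>>>             fprintf(stderr, "MPI could not initialize (rc=%d)\n", rc);
>>>             return EXIT_FAILURE;
>>>         }
>>>
>>>         /* ... application ... */
>>>         MPI_Finalize();
>>>         return 0;
>>>     }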
>>>
>>>
>>> MPI_Finalize():
>>> ---------------
>>> Problem: If a process fails before or during MPI_Finalize (and the
>>> error handler is not MPI_ERRORS_ARE_FATAL), what should this function
>>> return? Should that return value be consistent across all processes?
>>>
>>> To preserve locality of fault handling, a local process should not be
>>> explicitly forced to recognize the failure of a peer process that it
>>> never interacts with, either directly (e.g., point-to-point) or
>>> indirectly (e.g., collectives). So MPI_Finalize should be fault
>>> tolerant and keep trying to complete even in the presence of failures.
>>>
>>> MPI_Finalize is not required to be a collective operation, though it
>>> is often implemented that way. An implementation may need to delay the
>>> return from MPI_Finalize until its role in the failure information
>>> distribution channel is complete. But we should not require a multi-
>>> phase commit protocol to ensure that everyone either succeeds or
>>> returns some error. Implementations may do so internally in order to
>>> ensure that MPI_Finalize does not hang.
>>>
>>> If MPI_Finalize returns an error (say MPI_ERR_RANK_FAIL_STOP
>>> indicating a 'new to this rank' failure), what good is this information
>>> to the application? It cannot query for which rank(s) failed since MPI
>>> has been finalized. Nor can it initiate recovery. The best it could do
>>> is assume that all other processes failed and take local action.
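>>>
>>> In other words, about all the application can do is something purely
>>> local, along the lines of the following sketch (my example, checking
>>> only for a non-success return):
>>>
>>>     #include <mpi.h>
>>>     #include <stdio.h>
>>>     #include <stdlib.h>
>>>
>>>     int main(int argc, char **argv)
>>>     {
>>>         MPI_Init(&argc, &argv);
>>>         /* ... application ... */
>>>
>>>         int rc = MPI_Finalize();
>>>         if (rc != MPI_SUCCESS) {
>>>             /* No further MPI calls are permitted here, so we cannot
>>>              * ask which rank(s) failed or start recovery; log the
>>>              * error and take local action only. */
>>>             fprintf(stderr, "MPI_Finalize returned %d\n", rc);
>>>             return EXIT_FAILURE;
>>>         }
>>>         return EXIT_SUCCESS;
>>>     }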
>>>
>>>
>>> MPI_Finalize: MPI_COMM_WORLD process rank 0:
>>> --------------------------------------------
>>> In chapter 8, Example 8.7 illustrates that "Although it is not
>>> required that all processes return from MPI_Finalize, it is required
>>> that at least process 0 in MPI_COMM_WORLD return, so that users can
>>> know that the MPI portion of the computation is over."
>>>
>>> We deduced that the reasoning behind this statement was to allow for
>>> MPI implementations that create and destroy MPI processes during
>>> init/finalize from rank 0. Or, worded differently, rank 0 is the only
>>> rank that can be assumed to exist before MPI_Init and after
>>> MPI_Finalize.
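>>>
>>> For reference, a sketch in the spirit of Example 8.7 (my paraphrase,
>>> not the standard's text) looks like this; only rank 0 is guaranteed to
>>> get past MPI_Finalize, so only rank 0 announces completion:
>>>
>>>     #include <mpi.h>
>>>     #include <stdio.h>
>>>
>>>     int main(int argc, char **argv)
>>>     {
>>>         int rank;
>>>         MPI_Init(&argc, &argv);
>>>         MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>>         /* ... computation ... */
>>>         MPI_Finalize();
>>>         if (rank == 0) {
>>>             /* The standard only requires that rank 0 return from
>>>              * MPI_Finalize and reach this point. */
>>>             printf("MPI portion of the computation is over\n");
>>>         }
>>>         return 0;
>>>     }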
>>>
>>> Problem: So what if rank 0 fails at some point during the computation
>>> (or at some point during MPI_Finalize)?
>>>
>>> In the proposal, I added an advice to users telling them not to
>>> depend on any specific ranks existing before MPI_Init or after
>>> MPI_Finalize. So, in a faulty environment, the example will produce
>>> incorrect results under certain failure scenarios (e.g., failure of
>>> rank 0).
>>>
>>> In an MPI environment that depends on rank 0 for process creation and
>>> destruction, the failure of rank 0 is (should be?) critical and the MPI
>>> implementation will either abort the job or return
>>> MPI_ERR_CANNOT_CONTINUE from all calls to the MPI implementation. So we
>>> believe that the advice to users was a sufficient addition to this
>>> section. What do others think?
>>>
>>>
>>> So MPI_Init seems to be a more complex issue than MPI_Finalize. What
>>> do folks think about the presented problems and possible solutions? Are
>>> there other issues not mentioned here that we should be addressing?
>>>
>>> -- Josh
>>>
>>> Run-Through Stabilization Proposal:
>>> https://svn.mpi-forum.org/trac/mpi-forum-web/wiki/ft/run_through_stabilization
>>>
>>> ------------------------------------
>>> Joshua Hursey
>>> Postdoctoral Research Associate
>>> Oak Ridge National Laboratory
>>> http://www.cs.indiana.edu/~jjhursey