[Mpi-forum] MPI_Abort and error handlers
jeff.science at gmail.com
Fri Aug 7 08:44:24 CDT 2015
I am looking at MPI-3.1 8.3 "Error Handling" and 8.7.1 "Allowing User
Functions at Process Termination" right now, but I cannot figure out the
interaction between MPI_Abort and error handlers.
When MPI_Abort is called by one process, what is the effect on the others,
besides "MPI will try to clean them up"? I suppose one has to assume the
worst case of "no cleanup" or "it's like kill -9" right now. What does a
high-quality implementation do?
What I am looking for is a way to have error handlers called when MPI_Abort is
called somewhere. I don't expect this can be required, but "a high-quality
implementation will do this" would be very useful.
The motivation is for one-sided job termination, e.g. shmem_global_exit and
upc_global_exit (details below). The challenge is that these functions
require I/O flushing and resource release. I really do not want to have to
burn a thread just to satisfy this requirement in OSHMPI.
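To make the requirement concrete, here is a minimal sketch of only the local half of such an exit, in plain C. It assumes that "normal program termination" means what exit() does in standard C: atexit handlers run, then open streams are flushed and closed. All names here (track_resource, local_cleanup, install_cleanup, local_global_exit) are hypothetical and are not taken from OSHMPI, OpenSHMEM, or MPI. How the other PEs get notified is the open question and is not addressed here.

```c
#include <stdio.h>
#include <stdlib.h>

#define MAX_RESOURCES 16

/* Hypothetical table of allocations that must be released at exit. */
static void *resources[MAX_RESOURCES];
static int n_resources = 0;

/* Remember an allocation; returns 0 on success, -1 if the table is full. */
int track_resource(void *p)
{
    if (n_resources >= MAX_RESOURCES)
        return -1;
    resources[n_resources++] = p;
    return 0;
}

/* Flush every open output stream and free the tracked allocations.
   Returns the number of allocations released. */
int local_cleanup(void)
{
    int released = 0;
    fflush(NULL); /* a null argument flushes all open output streams */
    while (n_resources > 0) {
        free(resources[--n_resources]);
        resources[n_resources] = NULL;
        released++;
    }
    return released;
}

/* atexit handlers take no arguments and return nothing. */
static void cleanup_at_exit(void)
{
    (void)local_cleanup();
}

/* Register the cleanup once; returns 0 on success, as atexit does. */
int install_cleanup(void)
{
    return atexit(cleanup_at_exit);
}

/* Hypothetical local half of a global exit: exit() runs the atexit
   handlers and then flushes and closes stdio streams. */
void local_global_exit(int status)
{
    exit(status);
}
```

The point of going through exit() rather than abort() or _Exit() is that only exit() is guaranteed to run the atexit handlers and flush buffered streams. If MPI_Abort on one process ends the others the way kill -9 would, none of this code gets a chance to run on them, which is exactly the problem.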
OpenSHMEM 1.2 says this:
shmem_global_exit is a non-collective routine that allows any one PE to
force termination of an OpenSHMEM program for all PEs, passing an exit
status to the execution environment. This routine terminates the entire
program, not just the OpenSHMEM portion. When any PE calls
shmem_global_exit, it results in the immediate notification to all PEs
to terminate. shmem_global_exit flushes I/O and releases resources in
accordance with C/C++/Fortran language requirements for normal program
termination. If more than one PE calls shmem_global_exit, then the exit
status returned to the environment shall be one of the values passed to
shmem_global_exit as the status argument. There is no return to the caller
of shmem_global_exit; control is returned from the OpenSHMEM program to the
execution environment for all PEs.
shmem_global_exit may be used in situations where one or more PEs have
determined that the program has completed and/or should terminate early.
Accordingly, the integer status argument can be used to
pass any information about the nature of the exit, e.g. an encountered error
or a found solution. Since shmem_global_exit is a non-collective routine,
there is no implied synchronization, and all PEs must terminate
regardless of their current execution state. While I/O must be flushed for
standard language I/O calls from C/C++/Fortran, it is implementation
dependent as to how I/O done by other means (e.g. third party I/O
libraries) is handled. Similarly, resources are released according to
C/C++/Fortran standard language requirements, but this may not include all
resources allocated for the OpenSHMEM program. However, a quality
implementation will make a best effort to flush all I/O and clean up all
UPC says this:
7.2.1 Termination of all threads
1 #include <upc.h>
void upc_global_exit(int status);
2 upc_global_exit() flushes all I/O, releases all storage,
and terminates the execution for all active threads.
jeff.science at gmail.com