[Mpi3-ft] Matching MPI communication object creation across process failure

Josh Hursey jjhursey at open-mpi.org
Tue Jan 31 11:35:19 CST 2012

Actually the following thread might be more useful for this discussion:

The example did not come out well in the archives, so below is the diagram
again (hopefully that will work):
So the process stack looks like:
P0                        P1
---------------        ----------------
Dup(comm[X-1])         Dup(comm[X-1])
MPI_Allreduce()        MPI_Allreduce()
Dup(comm[X])           -> Error
 -> Error

So should P1 be required to call Dup(comm[X])?
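
To make the question concrete, here is a minimal sketch in plain C (no MPI calls) that replays the diagram above and only counts communicator-creation calls per process. The names `struct proc`, `sim_dup`, and `run_scenario` are hypothetical and exist only for this illustration; they are not part of MPI or of the proposal.

```c
/* Hypothetical per-process record: the number of communicator-creation
 * calls (e.g., Dup) the process has made so far. Illustration only. */
struct proc {
    int dup_calls;
};

/* Hypothetical stand-in for a creation call: it only counts the call. */
void sim_dup(struct proc *p)
{
    p->dup_calls++;
}

/* Creation calls "match" when both processes made the same number. */
int creation_calls_match(const struct proc *a, const struct proc *b)
{
    return a->dup_calls == b->dup_calls;
}

/* Replay the diagram. If p1_calls_dup_after_error is nonzero, P1 still
 * calls Dup(comm[X]) after its Allreduce reported the error.
 * Returns 1 if the creation-call counts match at the end, else 0. */
int run_scenario(int p1_calls_dup_after_error)
{
    struct proc p0 = { 0 };
    struct proc p1 = { 0 };

    sim_dup(&p0);                 /* P0: Dup(comm[X-1]) */
    sim_dup(&p1);                 /* P1: Dup(comm[X-1]) */

    /* MPI_Allreduce(): P0 completes, P1 sees the error (not modeled). */

    sim_dup(&p0);                 /* P0: Dup(comm[X]) -> Error */
    if (p1_calls_dup_after_error)
        sim_dup(&p1);             /* P1: Dup(comm[X]) only if required */

    return creation_calls_match(&p0, &p1);
}
```

In this sketch `run_scenario(0)` returns 0 (P1 has one fewer creation call than P0), while `run_scenario(1)` returns 1. It says nothing about how an implementation would detect or repair the mismatch.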

-- Josh

On Wed, Jan 25, 2012 at 5:08 PM, Josh Hursey <jjhursey at open-mpi.org> wrote:

> The current proposal states that MPI object creation functions (e.g.,
> MPI_Comm_create, MPI_Win_create, MPI_File_open):
> -------------------
> All participating communicator(s) must be collectively active before
> calling any communicator creation operation. Otherwise, the communicator
> creation operation will uniformly raise an error code of the class
> If a process failure prevents the uniform creation of the communicator
> then the communicator construction operation must ensure that the
> communicator is not created, and all alive participating processes will
> raise an error code of the class MPI_ERR_PROC_FAIL_STOP. Communicator
> construction operations will match across the notification of a process
> failure. As such, all alive processes must call the communicator
> construction operations the same number of times regardless of whether the
> emergent process failure makes the call irrelevant to the application.
> -------------------
> So there are three points here:
>  (1) That the communicator must be 'collectively active' before calling
> the operation,
>  (2) Uniform creation of the communication object, and
>  (3) Creation operations match across process failure.
> Point (2) seems to be necessary so that all processes only ever see a
> communication object that is consistent across all processes. This implies
> a fault tolerant agreement protocol (group membership).
> There was a question about why point (3) is necessary. We (Darius, UTK,
> and I) discussed this on 12/12/2011, and I posted my notes on this to the
> list:
>   http://lists.mpi-forum.org/mpi3-ft/2011/12/0938.php
> Looking back at my written notes, they don't have much more than that to
> add to the discussion.
> So the problem with (3) seemed to arise out of the failure handlers,
> though I am not convinced that they are strictly to blame in this
> circumstance. It seems that the agreement protocol might be factoring into
> the discussion as well, since it is strongly synchronizing: if not all
> processes call the operation, how does it know when to bail out? The peer
> processes are (a) calling that operation, (b) going to call it but have not
> yet, or (c) will never call it because they decided independently not to,
> based on a locally reported process failure.
> It seems that the core problem has to do with when to break out of the
> collective creation operation, and when to restore matching.
> So should we reconsider the restriction on (3)? More to the point, is it
> safe to not require (3)?
> -- Josh
> --
> Joshua Hursey
> Postdoctoral Research Associate
> Oak Ridge National Laboratory
> http://users.nccs.gov/~jjhursey
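
The three peer states (a), (b), and (c) from the quoted message can be written down as a small C sketch. It assumes an omniscient view of every peer's state, which a real implementation does not have: locally, (b) and (c) look the same, and that is the difficulty being described. All names here (`enum peer_state`, `enum decision`, `decide`) are hypothetical.

```c
/* The three peer states from the message: (a), (b), and (c). */
enum peer_state {
    PEER_CALLING,      /* (a) is calling the creation operation          */
    PEER_WILL_CALL,    /* (b) is going to call it but has not yet        */
    PEER_NEVER_CALLS   /* (c) decided locally that it will never call it */
};

enum decision {
    DECIDE_PROCEED,    /* every peer is in the call                      */
    DECIDE_WAIT,       /* some peer has not arrived yet                  */
    DECIDE_BAIL_OUT    /* some peer will never arrive                    */
};

/* Assumes the true state of all n peers is known, which is exactly what
 * a process inside the collective cannot observe on its own. */
enum decision decide(const enum peer_state *peers, int n)
{
    int someone_pending = 0;

    for (int i = 0; i < n; i++) {
        if (peers[i] == PEER_NEVER_CALLS)
            return DECIDE_BAIL_OUT;   /* waiting would never terminate */
        if (peers[i] == PEER_WILL_CALL)
            someone_pending = 1;
    }
    return someone_pending ? DECIDE_WAIT : DECIDE_PROCEED;
}
```

If point (3) is kept, state (c) cannot occur among alive processes, so under that assumption `decide` would only ever return `DECIDE_PROCEED` or `DECIDE_WAIT`.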

Joshua Hursey
Postdoctoral Research Associate
Oak Ridge National Laboratory