[Mpi3-ft] MPI_Comm_validate parameters

Graham, Richard L. rlgraham at ornl.gov
Fri Feb 25 14:47:16 CST 2011


Why would an application want to distinguish between what you have called
local processes and global processes?  It seems that an app would just
want to ask for the set of failed procs, and then decide whether to
respond locally or globally.  I can't think of a use case that would
deliberately want only a partial answer to that question.

Rich

On 2/25/11 2:58 PM, "Darius Buntinas" <buntinas at mcs.anl.gov> wrote:

>
>On the last concall, we discussed the semantics of the comm validate
>functions, and how to query for failed processes.  Josh asked me to
>capture these ideas and send them out to the list.  Below is my attempt
>at this.  Please let me know if I missed anything from the discussion.
>
>Comments are encouraged.
>
>-d
>
>
>Processes' Views of Failed Processes
>------------------------------------
>
>For each communicator, with the set of processes C = {0, ...,
>size_of_communicator-1}, consider three sets of failed processes:
>
>    F, the set of actually failed processes in the communicator;
>
>    L, the set of processes of the communicator known to be failed by
>    the local process; and
>
>    G, the set of processes of the communicator that the local process
>    knows are known to be failed by every process in the communicator.
>
>Note that C ⊇ F ⊇ L ⊇ G.  Because a process may have an imperfect
>view of the state of the system, it may never know F exactly, so we
>allow the process to query only for L or G.
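>
>For example (illustrative values only): with C = {0, 1, 2, 3},
>suppose processes 1 and 2 have actually failed, the local process has
>so far detected only the failure of process 1, and no collective
>agreement has taken place yet.  Then F = {1, 2}, L = {1}, and G = {}.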
>
>Determining the Sets of Failed Processes
>----------------------------------------
>
>MPI provides the following functions to determine L and G.  [These
>names can, and probably should, change.]
>
>    MPI_Comm_validate_local(comm, &num_failed, &num_new) -- This is a
>        local function that checks for failed processes and updates
>        the set L.  Note that this check for failed processes may be
>        imperfect: even if no additional processes fail, L might not
>        be the same as F.  When the function returns, num_failed =
>        |L| and num_new is the number of newly discovered failed
>        processes, i.e., if L' was the value of L before
>        MPI_Comm_validate_local was called, then num_new = |L| - |L'|.
>
>    MPI_Comm_validate_global(comm, &num_failed, &num_new) -- This is
>        a collective function.  Each process in the communicator
>        checks for failed processes, performs an all-reduce to take
>        the union of the sets of processes known by each process to
>        have failed, and then updates the sets G and L.  After this
>        function returns, L = G.  The parameters num_failed and
>        num_new are set the same way as in MPI_Comm_validate_local.
>
>Note that the set L will not change outside of a call to
>MPI_Comm_validate_local or MPI_Comm_validate_global, and the set G
>will not change outside of a call to MPI_Comm_validate_global.
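>
>As a rough usage sketch of the two calls (assuming comm is in scope
>and the counts are plain ints; the names are tentative, as noted
>above):
>
>    int num_failed, num_new;
>
>    /* Local, non-collective check: may grow L, never changes G. */
>    MPI_Comm_validate_local(comm, &num_failed, &num_new);
>    /* Now num_failed = |L| and num_new = |L| - |L'|. */
>
>    /* Collective check: all-reduce (union) of every process's L;
>     * afterwards L = G on every process of the communicator. */
>    MPI_Comm_validate_global(comm, &num_failed, &num_new);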
>
>Querying the Sets of Failed Processes
>-------------------------------------
>
>MPI also provides functions to allow the user to query L and G.
>
>    MPI_Comm_get_failed_local(comm, failed_type, failed_array) -- This
>        is a local function.  The parameter failed_type is one of
>        MPI_FAILED_ALL, MPI_FAILED_NEW.
>
>        When failed_type = MPI_FAILED_ALL, the set L is returned in
>        failed_array.  The size of the array returned is the value
>        returned in num_failed in the last call to
>        MPI_Comm_validate_local or MPI_Comm_validate_global.  The
>        caller of the function must ensure that failed_array is large
>        enough to hold num_failed integers.
>
>        When failed_type = MPI_FAILED_NEW, the set of newly
>        discovered failed processes is returned in failed_array,
>        i.e., if L' was the value of L before MPI_Comm_validate_local
>        or MPI_Comm_validate_global was called, then failed_array
>        contains the list of processes in the set L - L'.  It is the
>        caller's responsibility to ensure that failed_array is large
>        enough to hold num_new integers, where num_new is the value
>        returned the last time MPI_Comm_validate_local or
>        MPI_Comm_validate_global was called.
>
>    MPI_Comm_get_failed_global(comm, failed_type, failed_array) -- This
>        is a local function.  The parameter failed_type is one of
>        MPI_FAILED_ALL, MPI_FAILED_NEW.
>
>        When failed_type = MPI_FAILED_ALL, the set G is returned in
>        failed_array.  The size of the array is the value returned in
>        num_failed in the last call to MPI_Comm_validate_global.  The
>        caller of the function must ensure that the failed_array is
>        large enough to hold num_failed integers.
>
>        When failed_type = MPI_FAILED_NEW, the set of newly
>        discovered failed processes is returned in failed_array,
>        i.e., if G' was the value of G before MPI_Comm_validate_global
>        was called, then failed_array contains the list of processes
>        in the set G - G'.  It is the caller's responsibility to
>        ensure that failed_array is large enough to hold num_new
>        integers, where num_new is the value returned the last time
>        MPI_Comm_validate_global was called.
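>
>To show how a query pairs with the matching validate call, here is a
>rough sketch for the global pair (hypothetical; assumes <stdlib.h>
>for malloc/free, int arrays sized by the caller from the returned
>counts, and comm in scope -- the local pair works the same way):
>
>    int num_failed, num_new;
>    MPI_Comm_validate_global(comm, &num_failed, &num_new);
>
>    /* All of G: num_failed entries. */
>    int *all_failed = malloc(num_failed * sizeof(int));
>    MPI_Comm_get_failed_global(comm, MPI_FAILED_ALL, all_failed);
>
>    /* Only G - G', the newly agreed-upon failures: num_new entries. */
>    int *new_failed = malloc(num_new * sizeof(int));
>    MPI_Comm_get_failed_global(comm, MPI_FAILED_NEW, new_failed);
>
>    /* ... use the rank lists ... */
>    free(all_failed);
>    free(new_failed);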
>
>[Initially, I included a num_uncleared variable in
>MPI_Comm_validate_local to return the number of uncleared processes,
>but since the number of cleared processes does not change in that
>function, it seemed redundant.  Instead, I'm thinking we should have a
>separate mechanism to query the number and set of cleared/uncleared
>processes to go along with the functions for clearing a failed
>process.]
>
>
>
>
>Implementation Notes: Memory Requirements for Maintaining the Sets of
>Failed Processes
>---------------------------------------------------------------------
>
>One set G is needed per communicator.  The set G can be stored in a
>distributed fashion across the processes of the communicator to reduce
>the storage needed by each individual process.  [What happens if some
>of the processes storing the distributed array fail?  We could store
>it with some redundancy and tolerate up to K failures.  We could also
>recompute the set, but we'd need the old value to be able to return
>the FAILED_NEW set.]
>
>L cannot be stored in a distributed way, because each process may have
>a different set L.  However, since L ⊇ G, L can be represented as the
>union of G and a set L', so only L' needs to be stored locally.
>Presumably L' would be small.  Note that the implementation is free to
>let L' be a subset of the processes it has connected to or tried to
>connect to.  So the set L' would be about as large as the number of
>connected processes, and can be represented as a state variable (e.g.,
>UP or DOWN) in the data structure of connected processes (see the
>sketch after the next paragraph).  Of course pathological cases can be
>found, but these would require (1) that there are a very large number
>of failed processes, and (2) that the process has tried to connect to
>a very large number of them.
>
>The storage requirements for keeping track of the set of cleared
>processes are similar to those for storing L.  Since G is a subset of
>the set of cleared processes, which itself is a subset of L, the set
>of cleared processes can be represented as an additional state in the
>connected-processes data structure (e.g., UP, DOWN, or CLEARED), as
>sketched below.  Note that after a global validate, L = G = the set
>of cleared processes, so calling the global validate function could
>potentially free up storage.  A note about this and the memory
>requirements of clearing processes locally can be included in advice
>to users.
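>
>A minimal sketch of the per-connection state encoding described in
>the last two paragraphs (all names hypothetical):
>
>    /* Per-connection entry.  G is stored separately (possibly
>     * distributed), so only ranks outside G need an entry here. */
>    typedef enum {
>        CONN_UP,      /* believed alive */
>        CONN_DOWN,    /* in L', i.e., locally known to have failed */
>        CONN_CLEARED  /* failed and subsequently cleared */
>    } conn_state_t;
>
>    struct conn_entry {
>        int          rank;   /* rank in the communicator */
>        conn_state_t state;  /* UP, DOWN, or CLEARED */
>    };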