[Mpi3-ft] MPI_Comm_validate parameters
Graham, Richard L.
rlgraham at ornl.gov
Fri Feb 25 14:47:16 CST 2011
Why would an application want to distinguish between what you have called
local processes and global processes? It seems that an app would just
want to ask for the set of failed procs, and then decide whether to
respond locally or globally. I can't think of a use case that would
deliberately want only a partial answer to that question.
Rich
On 2/25/11 2:58 PM, "Darius Buntinas" <buntinas at mcs.anl.gov> wrote:
>
>On the last concall, we discussed the semantics of the comm validate
>functions, and how to query for failed processes. Josh asked me to
>capture these ideas and send it out to the list. Below is my attempt at
>this. Please let me know if I missed anything from the discussion.
>
>Comments are encouraged.
>
>-d
>
>
>Processes' Views of Failed Processes
>------------------------------------
>
>For each communicator, with the set of processes C = {0, ...,
>size_of_communicator-1}, consider three sets of failed processes:
>
> F, the set of actually failed processes in the communicator;
>
> L, the set of processes of the communicator known to be failed by
> the local process; and
>
> G, the set of processes of the communicator that the local process
> knows that every process in the communicator knows to be failed.
>
>Note that C ⊇ F ⊇ L ⊇ G. For example, if rank 3 has failed but, so
>far, only rank 0 has detected it, then 3 is in F and in rank 0's set
>L, but not yet in G. Because a process may have an imperfect view of
>the state of the system, it may never know F, so we allow the process
>to query only for L or G.
>
>Determining the Sets of Failed Processes
>----------------------------------------
>
>MPI provides the following functions to determine L and G. [These
>names can, and probably should, change.]
>
> MPI_Comm_validate_local(comm, &num_failed, &num_new) -- This is a
> local function that checks for failed processes and updates
> the set L. Note that this check for failed processes may be
>        imperfect, in that even if no additional processes fail,
>        L might not be the same as F.  When the function returns,
>        num_failed = |L| and num_new will be the number of newly
>        discovered failed processes, i.e., if L' = L before
>        MPI_Comm_validate_local was called, then num_new = |L| - |L'|.
>
> MPI_Comm_validate_global(comm, &num_failed, &num_new) -- This is
> collective. Each process in the communicator checks for
>        failed processes, performs an all-reduce to take the union,
>        over all processes, of the sets of processes each one knows to
>        have failed, and then updates the sets G and L.  After this
>        function returns, L = G.
> The parameters num_failed and num_new are set the same as in
> MPI_Comm_validate_local.
>
>Note that the set L will not change outside of a call to
>MPI_Comm_validate_local or MPI_Comm_validate_global, and the set G
>will not change outside of a call to MPI_Comm_validate_global.
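>
>[A minimal usage sketch in C, assuming the proposed signatures above
>(these are proposal names, not standard MPI); error handling omitted:
>
>    #include <mpi.h>
>
>    void check_failures(MPI_Comm comm)
>    {
>        int num_failed, num_new;
>
>        /* Local, non-collective check: updates L only. */
>        MPI_Comm_validate_local(comm, &num_failed, &num_new);
>        if (num_new > 0) {
>            /* react locally, e.g., avoid the newly failed ranks */
>        }
>
>        /* Collective check: all-reduce of everyone's L; afterwards
>         * L = G on every process in comm. */
>        MPI_Comm_validate_global(comm, &num_failed, &num_new);
>    }
>]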
>
>Querying the Sets of Failed Processes
>-------------------------------------
>
>MPI also provides functions to allow the user to query L and G.
>
> MPI_Comm_get_failed_local(comm, failed_type, failed_array) -- This
> is a local function. The parameter failed_type is one of
> MPI_FAILED_ALL, MPI_FAILED_NEW.
>
> When failed_type = MPI_FAILED_ALL, the set L is returned in
> failed_array. The size of the array returned is the value
> returned in num_failed in the last call to
> MPI_Comm_validate_local or MPI_Comm_validate_global. The
> caller of the function must ensure that failed_array is large
> enough to hold num_failed integers.
>
> When failed_type = MPI_FAILED_NEW, the set of newly discovered
> failed processes is returned in failed_array. I.e., if L' = L
> before MPI_Comm_validate_local or MPI_Comm_validate_global was
> called, then failed_array contains the list of processes in
> the set L - L'. It is the caller's responsibility to ensure
> that the failed_array is large enough to hold num_new
> integers, where num_new is the value returned the last time
> MPI_Comm_validate_local or MPI_Comm_validate_global was
> called.
>
> MPI_Comm_get_failed_global(comm, failed_type, failed_array) -- This
> is a local function. The parameter failed_type is one of
> MPI_FAILED_ALL, MPI_FAILED_NEW.
>
>            When failed_type = MPI_FAILED_ALL, the set G is returned in
> failed_array. The size of the array is the value returned in
> num_failed in the last call to MPI_Comm_validate_global. The
> caller of the function must ensure that the failed_array is
> large enough to hold num_failed integers.
>
> When failed_type = MPI_FAILED_NEW, the set of newly discovered
> failed processes is returned in failed_array. I.e., if G' = G
> before MPI_Comm_validate_global was called, then failed_array
> contains the list of processes in the set G - G'. It is the
> caller's responsibility to ensure that the failed_array is
> large enough to hold num_new integers, where num_new is the
> value returned the last time MPI_Comm_validate_global was
> called.
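>
>[A sketch of the query pattern in C, again assuming the proposed
>signatures; the caller sizes the arrays from the counts returned by
>the preceding validate call:
>
>    #include <stdlib.h>
>    #include <mpi.h>
>
>    void list_failures(MPI_Comm comm)
>    {
>        int num_failed, num_new;
>        int *all_failed, *new_failed;
>
>        MPI_Comm_validate_global(comm, &num_failed, &num_new);
>
>        /* The caller allocates; sizes come from the validate call. */
>        all_failed = malloc(num_failed * sizeof(int));
>        new_failed = malloc(num_new   * sizeof(int));
>
>        MPI_Comm_get_failed_global(comm, MPI_FAILED_ALL, all_failed);
>        MPI_Comm_get_failed_global(comm, MPI_FAILED_NEW, new_failed);
>
>        /* ... consume the rank lists ... */
>        free(all_failed);
>        free(new_failed);
>    }
>]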
>
>[Initially, I included a num_uncleared variable in
>MPI_Comm_validate_local to return the number of uncleared processes,
>but since the number of cleared processes does not change in that
>function, it seemed redundant. Instead, I'm thinking we should have a
>separate mechanism to query the number and set of cleared/uncleared
>processes to go along with the functions for clearing a failed
>process.]
>
>
>
>
>Implementation Notes: Memory Requirements for Maintaining the Sets of
>Failed Processes
>---------------------------------------------------------------------
>
>One set G is needed per communicator. The set G can be stored in a
>distributed fashion across the processes of the communicator to reduce
>the storage needed by each individual process. [What happens if some
>of the processes storing the distributed array fail? We can store
>them with some redundancy and tolerate K failures. We can also
>recompute the set, but we'd need the old value to be able to return
>the FAILED_NEW set.]
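>
>[One illustrative layout, purely hypothetical and not part of the
>proposal: represent G as a bitmap of one bit per rank, give each
>process a contiguous slice, and mirror each slice on the next K ranks
>to tolerate K failures:
>
>    /* Hypothetical sketch: each rank owns a contiguous slice of the
>     * n-bit G bitmap for an n-process communicator; each slice is
>     * replicated on K+1 consecutive ranks. */
>    typedef struct {
>        unsigned char *bits;   /* this rank's slice of G          */
>        int            first;  /* first rank covered by the slice */
>        int            count;  /* number of ranks covered         */
>    } g_slice_t;
>]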
>
>L cannot be stored in a distributed way, because each process may have
>a different set L.  However, since L ⊇ G, L can be represented as the
>union of G and a local remainder L' = L - G, so only L' needs to be
>stored locally.  Presumably
>L' would be small. Note that the implementation is free to let L' be
>a subset of the processes it has connected to or tried to connect to.
>So the set L' would be about as large as the number of connected
>processes, and can be represented as a state variable (e.g., UP or
>DOWN) in the data structure of connected processes.  Of course,
>pathological cases can be found, but these would require (1) that
>there are a very large number of failed processes, and (2) that the
>process has tried to connect to a very large number of them.
>
>The storage requirements for keeping track of the set of cleared
>processes are similar to those for storing L.  Since G is a subset of
>the set of cleared processes, which itself is a subset of L, the set
>of cleared
>processes can be represented as an additional state in the connected
>processes data structure (e.g., UP, DOWN, or CLEARED).  Note that after
>a global validate, L = G = the set of cleared processes, so calling
>the global validate function could potentially free up storage. A
>note about this and the memory requirements of clearing processes
>locally can be included in advice to users.
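>
>[A sketch of that data structure in C (hypothetical names): the
>per-connection state variable subsumes both L' and the cleared set,
>so no separate lists are needed:
>
>    typedef enum { UP, DOWN, CLEARED } peer_state_t;
>
>    typedef struct {
>        int          rank;    /* rank in the communicator          */
>        peer_state_t state;   /* DOWN    = failed, not yet cleared */
>                              /* CLEARED = failed and cleared      */
>        /* ... connection bookkeeping ... */
>    } conn_entry_t;
>
>    /* L       = G union { e.rank : e.state != UP }
>     * cleared = G union { e.rank : e.state == CLEARED } */
>]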
>
>
>
>
>
>_______________________________________________
>mpi3-ft mailing list
>mpi3-ft at lists.mpi-forum.org
>http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-ft