[Mpi3-ft] MPI_Comm_validate parameters

Tue Mar 1 08:30:26 CST 2011

On Feb 28, 2011, at 3:13 PM, Darius Buntinas wrote:

> 
> On Feb 28, 2011, at 1:50 PM, Joshua Hursey wrote:
> 
>> Reworked block below
>> -----------------------
> 
> Looks good
> 
>>>> We have the following states (prefix with MPI_RANK_STATE_):
>>>> - OK (active)
>>>> - FAILED (failed, unrecognized)
>>>> - NULL (failed, recognized)
>>>> 
>>>> We could add a few new modifiers (prefix with MPI_RANK_STATE_MOD_):
>>>> - NEW (since last call to {global|local} validate)
>>>> - OLD (before last call to {global|local} validate)
>>>> - RECOGNIZED (maybe to replace the NULL state above?)
>>> 
>>> I like this idea.
>> 
>> The idea of or'ing states, or the idea of having a 'Recognized' modifier, or both?
> 
> Both.  (But I prefer NULL to RECOGNIZED.)
> 
>>> 
>>>> To determine "L" or "G" they would use the following functions:
>>>> ----------------------------
>>>> MPI_Comm_validate_local(comm, &num_failed)
>>>> - Local operation
>>>> - Update L
>>>> - num_failed = |L| (both recognized and unrecognized)
>>>> 
>>>> MPI_Comm_validate_global(comm, &num_failed)
>>>> - Collective operation
>>>> - Update G
>>>> - Update L = G
>>>> - num_failed = |L| = |G|
>>>> ----------------------------
>>>> 
>>>> 
>>>> Accessors have the following properties:
>>>> - These are local operations
>>>> - None of them modify "L" or "G"
>>>> - Take an or'ed list of states and modifiers to determine 'type'
>>>> - If incount = 0, then outcount = |L| or |G|, rank_infos ignored
>>>> 
>>>> ----------------------------
>>>> MPI_Comm_get_state_local(comm, type, incount, &outcount, rank_infos[])
>>>> - Local operation
>>>> - Returns the set of processes in "L" that match the 'type' specified
>>>> - outcount = min(incount, |L|)
>>>> - MPI_ERR_SIZE if incount != 0 and incount < |L|
>>>> 
>>>> MPI_Comm_get_state_global(comm, type, incount, &outcount, rank_infos[])
>>>> - Local operation
>>>> - Returns the set of processes in "G" that match the 'type' specified
>>>> - outcount = min(incount, |G|)
>>>> - MPI_ERR_SIZE if incount != 0 and incount < |G|
>>>> ----------------------------
>>>> 
>>>> 
>>>> So an application can do something like:
>>>> ------------
>>>> MPI_Comm_validate_global(comm, &num_failed_start);
>>>> /* Do work */
>>>> MPI_Comm_validate_global(comm, &num_failed_end);
>>>> 
>>>> if( num_failed_start < num_failed_end ) { /* something failed */
>>>> incount = 0;
>>>> MPI_Comm_get_state_global(comm,
>>>> MPI_RANK_STATE_NULL|MPI_RANK_STATE_MOD_NEW,
>>>> incount, &outcount, NULL);
>>>> rank_infos = malloc(... * outcount);
>>>> incount = outcount;
>>>> MPI_Comm_get_state_global(comm,
>>>> MPI_RANK_STATE_NULL|MPI_RANK_STATE_MOD_NEW,
>>>> incount, &outcount, rank_infos);
>>>> }
>>>> ------------
>>>> 
>>>> Instead of having the 'if incount = 0' rule, we could just introduce a new function like:
>>>> ----------------------------
>>>> MPI_Comm_get_num_state_local(comm, type, &count);
>>>> MPI_Comm_get_num_state_global(comm, type, &count);
>>> 
>>> In that case we can even replace num_failed in the comm_validate functions with a flag: new_failures.  Then use the above to get the counts.
>> 
>> Or even better, eliminate the second argument from the MPI_Comm_validate_{local|global}, and just pass the communicator to it - similar to MPI_Barrier. Since the accessor functions are always related to the last update call there is no real need (other than shorthand) to have the additional parameter.
> 
> I considered that, but that would require two calls to determine if anything failed.  Replacing the count with an are_there_new_failures flag would solve that.

I like the idea of a flag for new failures. It reduces the number of function calls for the common case.

> 
>> 
>> By removing the count parameter from the MPI_Comm_validate_{local|global} we get out of the business of deciding which count to return, and let the user specify it explicitly.
>> 
>> The example would now expand out a bit to be:
>> ------------
>> MPI_Comm_validate_global(comm);
>> MPI_Comm_get_num_state_global(comm, STATE_NULL|MOD_NEW, &num_failed_start);
>> /* Do work */
>> MPI_Comm_validate_global(comm);
>> MPI_Comm_get_num_state_global(comm, STATE_NULL|MOD_NEW, &num_failed_end);
>> 
>> if( num_failed_start < num_failed_end ) { /* something failed */
>> incount = num_failed_end;
>> rank_infos = malloc(... * incount);
>> MPI_Comm_get_state_global(comm,
>>     MPI_RANK_STATE_NULL|MPI_RANK_STATE_MOD_NEW,
>>     incount, &outcount, rank_infos);
>> }
>> ------------
> 
> Replacing count with a flag would look like this.  So in the common, non-error case you just do a branch.  It's not so much a performance thing (validate_global is collective), but a convenience thing to the programmer.
> 
> MPI_Comm_validate_global(comm, &new_failures);
> /* Do work */
> MPI_Comm_validate_global(comm, &new_failures);
> 
> if( new_failures ) { /* something failed */
>  MPI_Comm_get_num_state_global(comm, STATE_NULL|MOD_NEW, &num_failed_end);
>  incount = num_failed_end;
>  rank_infos = malloc(... * incount);
>  MPI_Comm_get_state_global(comm,
>     MPI_RANK_STATE_NULL|MPI_RANK_STATE_MOD_NEW,
>     incount, &outcount, rank_infos);
> }
> 
> Hmm.  We could combine validate and get_num_state:
>    MPI_Comm_validate_global(comm, count_type, &count)
> This would let the user decide what count to return.

I think that if we let the user define the 'type' here, the interface may confuse them. They could read it as "agree only on the processes in the specified state" instead of "agree on the state of all processes, whatever their state, and return a flag if there are new failures".
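
To make the concern concrete, here is a rough sketch of how an or'ed 'type' acts as a filter in the accessors, which is why it could be misread as a filter on validate too. The numeric values and the matching rule are assumptions for illustration only; the thread names the states and modifiers but does not define their encoding or how a mix of several states and modifiers should match.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical encodings: the proposal names these constants but assigns no values. */
enum {
    RANK_STATE_OK       = 1 << 0,  /* active */
    RANK_STATE_FAILED   = 1 << 1,  /* failed, unrecognized */
    RANK_STATE_NULL     = 1 << 2,  /* failed, recognized */
    RANK_STATE_MASK     = RANK_STATE_OK | RANK_STATE_FAILED | RANK_STATE_NULL,

    RANK_STATE_MOD_NEW  = 1 << 8,  /* since the last validate */
    RANK_STATE_MOD_OLD  = 1 << 9,  /* before the last validate */
    RANK_STATE_MOD_MASK = RANK_STATE_MOD_NEW | RANK_STATE_MOD_OLD
};

/*
 * Assumed matching rule (not specified in the proposal):
 *  - the rank's state must be one of the states or'ed into 'type';
 *  - if 'type' carries any modifier, the rank must carry at least one of them;
 *  - a 'type' with no state bits matches nothing.
 */
bool rank_matches(int rank_bits, int type)
{
    int want_states = type & RANK_STATE_MASK;
    int want_mods   = type & RANK_STATE_MOD_MASK;

    if ((rank_bits & want_states) == 0)
        return false;
    if (want_mods != 0 && (rank_bits & want_mods) == 0)
        return false;
    return true;
}
```

Under this rule, a rank marked NULL and NEW matches STATE_NULL|MOD_NEW, while a rank marked NULL and OLD does not. The filter only selects from state that validate has already established; it does not change what validate agrees on.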

I also just noticed that we are missing individual rank accessors (useful when watching one or more critical 'root' ranks, for example):
----------------------------
MPI_Comm_get_state_rank_local(comm, rank, rank_info)
 - Local operation
 - Returns, in the MPI_Rank_info object, the state of the specified rank as known in "L_i"
 - error if rank is invalid

MPI_Comm_get_state_rank_global(comm, rank, rank_info)
 - same as MPI_Comm_get_state_rank_local, but over "G"
----------------------------
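
A rough sketch of how the two per-rank accessors might behave, using a plain C mock rather than real MPI. Every sketch_ name, the fixed communicator size, and the fields of the rank-info struct are hypothetical; the proposal does not define the contents of MPI_Rank_info or the error class returned for an invalid rank.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_COMM_SIZE 4

enum sketch_state { SKETCH_OK, SKETCH_FAILED, SKETCH_NULL };
enum { SKETCH_SUCCESS = 0, SKETCH_ERR_RANK = 1 };

/* Stand-in for MPI_Rank_info; the real fields are not specified in the thread. */
typedef struct {
    int rank;
    enum sketch_state state;
} sketch_rank_info;

/* Stand-in for a communicator that carries both views of rank state. */
typedef struct {
    enum sketch_state local_view[SKETCH_COMM_SIZE];   /* "L_i" */
    enum sketch_state global_view[SKETCH_COMM_SIZE];  /* "G"   */
} sketch_comm;

/* Shared lookup: read-only, so neither view is modified by an accessor. */
int sketch_lookup(const enum sketch_state *view, int rank, sketch_rank_info *info)
{
    if (rank < 0 || rank >= SKETCH_COMM_SIZE || info == NULL)
        return SKETCH_ERR_RANK;          /* "error if rank is invalid" */
    info->rank  = rank;
    info->state = view[rank];
    return SKETCH_SUCCESS;
}

int sketch_get_state_rank_local(const sketch_comm *comm, int rank, sketch_rank_info *info)
{
    return sketch_lookup(comm->local_view, rank, info);
}

int sketch_get_state_rank_global(const sketch_comm *comm, int rank, sketch_rank_info *info)
{
    return sketch_lookup(comm->global_view, rank, info);
}
```

For the 'root' rank use case, an application would query just that one rank after each communication phase instead of fetching the whole failed set. The local and global answers can differ until the next MPI_Comm_validate_global brings "L" in line with "G".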

One side note that we should make explicit: "L_i" can be updated in two ways. First, an application can call MPI_Comm_validate_local() to update "L_i" with all locally known failures. Second, if a failure is detected during any MPI operation (except the validate accessor functions), that operation also updates "L_i". For example, if we call MPI_Send(rank=2) and rank 2 fails during the send, then a later query for the state of rank 2 must report it as failed, not active. The application implicitly learns of the update to "L_i" from the return code of the MPI_Send operation.
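
The two update paths can be sketched with a small C mock (no real MPI calls). All sketch_ names are hypothetical, and the 'alive' array is only a stand-in for whatever failure detector the implementation uses; it is not part of the proposal.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_COMM_SIZE 4

enum sketch_state { SKETCH_OK, SKETCH_FAILED };
enum { SKETCH_SUCCESS = 0, SKETCH_ERR_RANK_FAILED = 1, SKETCH_ERR_RANK = 2 };

typedef struct {
    bool alive[SKETCH_COMM_SIZE];                    /* stand-in for the failure detector */
    enum sketch_state local_view[SKETCH_COMM_SIZE];  /* "L_i" */
} sketch_comm;

/* Path 1: explicit validate folds every locally known failure into "L_i". */
void sketch_validate_local(sketch_comm *comm)
{
    for (int r = 0; r < SKETCH_COMM_SIZE; r++) {
        if (!comm->alive[r])
            comm->local_view[r] = SKETCH_FAILED;
    }
}

/* Path 2: an ordinary operation that hits a failed peer records that one
 * failure in "L_i" before returning its error code. */
int sketch_send(sketch_comm *comm, int dest)
{
    if (dest < 0 || dest >= SKETCH_COMM_SIZE)
        return SKETCH_ERR_RANK;
    if (!comm->alive[dest]) {
        comm->local_view[dest] = SKETCH_FAILED;
        return SKETCH_ERR_RANK_FAILED;
    }
    return SKETCH_SUCCESS;
}

/* Accessor: reads "L_i" and never modifies it. Caller passes a valid rank. */
enum sketch_state sketch_get_state_rank_local(const sketch_comm *comm, int rank)
{
    return comm->local_view[rank];
}
```

In this mock, a failed send to rank 2 marks only rank 2 as failed in "L_i"; other failures the detector knows about stay invisible to the accessors until the next explicit validate. Whether an implementation may fold in more than the one failure it tripped over is left open here.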

Do folks see a problem with this additional semantic?

-- Josh

> 
> -d
> 

------------------------------------
Joshua Hursey
Postdoctoral Research Associate
Oak Ridge National Laboratory
http://users.nccs.gov/~jjhursey