[Mpi3-ft] MPI_Comm_validate parameters
Joshua Hursey
jjhursey at open-mpi.org
Thu Mar 3 11:25:51 CST 2011
If we allow the "L_i" and "G" lists to be updated only explicitly by the user, then the user controls how the view of these vectors is managed among their various threads. Darius's comment about not allowing MPI_Send to update "L_i" reinforces this constraint, though it introduces a slightly awkward semantic.
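
For illustration only, a minimal sketch of one way an application could manage that view, assuming MPI_THREAD_MULTIPLE and the proposed MPI_Comm_validate_local call (name taken from this thread's shorthand). Every thread funnels explicit updates of "L_i" through a single lock:

------------------------
#include <pthread.h>
#include <mpi.h>

/* MPI_Comm_validate_local is the proposed (not yet standard) call
   that folds locally known failures into "L_i". */
static pthread_mutex_t validate_lock = PTHREAD_MUTEX_INITIALIZER;

/* Threads update the shared view only through this wrapper, so
   "L_i" changes only at points the application has agreed on. */
void app_validate_local(MPI_Comm comm)
{
  pthread_mutex_lock(&validate_lock);
  MPI_Comm_validate_local(comm);
  pthread_mutex_unlock(&validate_lock);
}
------------------------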
I'll take another look at the old documents and see if I can pull out any thread safety conversations. Admittedly, on my initial pass I did not look at the thread safety aspects of the original proposal in much detail, so I may have overlooked something subtle.
-- Josh
On Mar 2, 2011, at 11:49 AM, Graham, Richard L. wrote:
> I believe that what is being proposed here is not thread-safe. If you
> look back at some of the docs that were put together when we were actively
> looking at this over a year ago, there was a proposal on how to do this in
> a thread-safe manner. Having said that, I am not sure whether thread safety is
> an issue here or not - it depends very much on the life-cycle of the state
> information. In the original proposal, the "vector" returned by
> comm_validate had some state information.
>
> Rich
>
> On 3/2/11 9:51 AM, "Joshua Hursey" <jjhursey at open-mpi.org> wrote:
>
>>
>> On Mar 1, 2011, at 12:45 PM, Darius Buntinas wrote:
>>
>>>
>>> On Mar 1, 2011, at 8:30 AM, Joshua Hursey wrote:
>>>
>>>> One side note that we should make explicit is that "L_i" can be
>>>> updated in two ways. First, when an application calls
>>>> MPI_Comm_validate_local(), "L_i" is updated with all locally known
>>>> failures. Second, if a failure is detected during any MPI operation
>>>> (except the validate accessor functions), it will also update
>>>> "L_i". So if we do an MPI_Send(rank=2), and rank 2 fails during the
>>>> send, then we want to make sure that if the user asks for the state
>>>> of rank 2 it is identified as failed, not active. The application
>>>> implicitly notices the update to "L_i" through the return code of
>>>> the MPI_Send operation.
>>>>
>>>> Do folks see a problem with this additional semantic?
>>>
>>> This could present a problem with get_num_state and get_state in a
>>> multithreaded environment. Of course if one has only one thread per
>>> communicator, then it's OK, but is that realistic? The idea of L_i and
>>> G being constant between validate calls was to avoid races like this.
>>>
>>> The user should understand that the way to get the state of a process
>>> is
>>> validate_local();get_state_rank()
>>> or
>>> validate_local();get_num_state();malloc();get_state()
>>> And if the process has multiple threads using the same communicator, it
>>> can synchronize access to validate_local as appropriate.
>>>
>>> I think the _local functions can be considered a convenience for the
>>> user, since the user could keep track of L_i herself using the
>>> _global values and noticing failed sends/receives. Looked at that
>>> way, the fact that get_state doesn't report rank 2 (in your example
>>> above) as failed immediately after the send might be OK.
>>
>> It just seems a bit odd as a semantic for the interfaces. But I
>> understand your point.
>>
>> So the situation is (in shorthand):
>> ------------------------
>> validate_local(comm);
>> get_state_rank(comm, 2, &state); /* state == OK */
>>
>> /*** rank 2 fails ***/
>>
>> ret = MPI_Send(comm, 2); /* Error */
>> if( ERR_FAIL_STOP == ret ) {
>>   get_state_rank(comm, 2, &state); /* state == OK     */
>>   validate_local(comm);
>>   get_state_rank(comm, 2, &state); /* state == FAILED */
>> }
>> ------------------------
>>
>> instead of:
>> ------------------------
>> validate_local(comm);
>> get_state_rank(comm, 2, &state); /* state == OK */
>>
>> /*** rank 2 fails ***/
>>
>> ret = MPI_Send(comm, 2); /* Error */
>> if( ERR_FAIL_STOP == ret ) {
>>   get_state_rank(comm, 2, &state); /* state == FAILED */
>> }
>> ------------------------
>>
>> So the MPI implementation must keep another list of failed processes
>> known to it internally (call it "H_i", for Hidden at process i) but
>> not yet made available in "L_i". The implementation checks "H_i" to
>> determine whether the MPI_Send() should fail. "H_i" represents the
>> set of additional failures not in "G" or "L_i" at some time T for
>> process i. We can use list projections (similar to how "L_i" can be
>> physically smaller than "G") to represent "H_i" and reduce the memory
>> impact, but this does mean slightly more work on the MPI internals
>> side of things.
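>>
>> As a rough illustration only (names invented here, not part of the
>> proposal), the per-communicator bookkeeping might look like:
>>
>> ------------------------
>> #include <stdbool.h>
>>
>> #define MAX_RANKS 1024           /* illustration only */
>>
>> typedef struct {
>>   bool L[MAX_RANKS];             /* user-visible snapshot "L_i" */
>>   bool H[MAX_RANKS];             /* hidden, not yet in "L_i"    */
>> } fail_state_t;
>>
>> /* The failure detector records into "H_i" only; the user-visible
>>    snapshot is untouched until the next validate. */
>> void record_failure(fail_state_t *s, int rank) { s->H[rank] = true; }
>>
>> /* A send to `rank` consults both lists, so it can fail even while
>>    get_state still reports the rank as active. */
>> bool known_failed(fail_state_t *s, int rank) {
>>   return s->L[rank] || s->H[rank];
>> }
>>
>> /* validate_local merges "H_i" into "L_i" and clears it, exposing
>>    the hidden failures to subsequent get_state queries. */
>> void validate_local_impl(fail_state_t *s) {
>>   for (int r = 0; r < MAX_RANKS; r++)
>>     if (s->H[r]) { s->L[r] = true; s->H[r] = false; }
>> }
>> ------------------------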
>>
>> So the programming practice we are advocating is that, before any
>> get_state() operation, the user call validate_local() - or, more
>> precisely, synchronize their local/global view of the state of the
>> processes on the communicator. Right?
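>>
>> For concreteness, a minimal C sketch of that practice. The call
>> names, signatures, and the STATE_FAILED constant are assumed from
>> the shorthand in this thread; none of this is settled API:
>>
>> ------------------------
>> #include <stdlib.h>
>> #include <mpi.h>
>>
>> void query_failed(MPI_Comm comm)
>> {
>>   int state, nfailed;
>>
>>   /* Single rank: validate first, then read the snapshot. */
>>   MPI_Comm_validate_local(comm);
>>   MPI_Comm_get_state_rank(comm, 2, &state);
>>
>>   /* All failed ranks: validate, count, allocate, fetch. */
>>   MPI_Comm_validate_local(comm);
>>   MPI_Comm_get_num_state(comm, STATE_FAILED, &nfailed);
>>   int *failed = malloc(nfailed * sizeof(int));
>>   MPI_Comm_get_state(comm, STATE_FAILED, failed);
>>   /* ... inspect failed[0..nfailed-1] ... */
>>   free(failed);
>> }
>> ------------------------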
>>
>> -- Josh
>>
>>
>>>
>>> -d
>>>
>>
>
------------------------------------
Joshua Hursey
Postdoctoral Research Associate
Oak Ridge National Laboratory
http://users.nccs.gov/~jjhursey