<font size=2 face="sans-serif">Since the second paragraph seems to cover

MPI_ANY_SOURCE as a special case, I doubt that a reader would attempt to

draw many conclusions about MPI_ANY_SOURCE from the first paragraph. If

they did, then the first sentence could easily be taken as an indication

that blocking MPI_ANY_SOURCE receives will not return an error due to a

single process failure because that failure is not guaranteed to prevent

the MPI implementation from successfully completing the communication.

We might intend it to mean that a blocking recv on MPI_ANY_SOURCE

returns an error when a process fails, but I don't think the text is clear

about that. The old text wasn't terribly clear either, but the catchall

"In all other cases, the operation must return MPI_ERR_PROC_FAILED"

sort of covered it.</font>

<br><font size=2 face="sans-serif">Dave</font>

<br>

<br>

<br>

<br><font size=1 color=#5f5f5f face="sans-serif">From:      

 </font><font size=1 face="sans-serif">Josh Hursey <jjhursey@open-mpi.org></font>

<br><font size=1 color=#5f5f5f face="sans-serif">To:      

 </font><font size=1 face="sans-serif">"MPI 3.0 Fault

Tolerance and Dynamic Process Control working Group" <mpi3-ft@lists.mpi-forum.org></font>

<br><font size=1 color=#5f5f5f face="sans-serif">Date:      

 </font><font size=1 face="sans-serif">03/16/2012 01:44 PM</font>

<br><font size=1 color=#5f5f5f face="sans-serif">Subject:    

   </font><font size=1 face="sans-serif">Re: [Mpi3-ft]

Clarification Ticket 323: MPI_ANY_SOURCE and        MPI_Test(any)</font>

<br><font size=1 color=#5f5f5f face="sans-serif">Sent by:    

   </font><font size=1 face="sans-serif">mpi3-ft-bounces@lists.mpi-forum.org</font>

<br>

<hr noshade>

<br>

<br>

<br><tt><font size=2>Blocking MPI_ANY_SOURCE receives are covered by the

previous paragraph<br>

in the proposal:<br>

 "Future point-to-point communication with the same process on this<br>

communicator must also return MPI_ERR_PROC_FAILED."<br>

<br>

The paragraph that follows after that is just a clarification for a<br>

nonblocking receive of MPI_ANY_SOURCE. So I do not think that is an<br>

issue.<br>

<br>

I was mostly trying to figure out if my interpretation of the<br>

MPI_Test* functionality was in sync with how others interpreted the<br>

text. I think that my interpretation is correct based on my reading of<br>

the standard.<br>

<br>

I do not know if these semantics are a problem for users as long as<br>

they are aware of them (that the MPI_Test* functions will not return<br>

an error if the nonblocking MPI_ANY_SOURCE receive operation is<br>

'pending'), since the pending operation can still be matched and<br>

completed at some point in the future without the need to clear the<br>

error (via failure_ack). I suppose that it seems odd that MPI_Test*<br>

would behave this way, and users should be aware. If the program just<br>

polls on MPI_Test for completion and never uses MPI_Wait, then it<br>

will not get notification of failures that may affect its<br>

completion, so users will need to be aware of this situation and find a<br>

way to work around it. Seems icky though ('icky' in the technical<br>

sense, of course).<br>

<br>

UTK, can you comment on your interpretation of this?<br>

<br>

-- Josh<br>

<br>

<br>

On Fri, Mar 16, 2012 at 12:06 PM, David Solt <dsolt@us.ibm.com> wrote:<br>

> I believe I agree with everything you wrote. I also believe that our<br>

> previous draft stated what happens for a blocking MPI_ANY_SOURCE recv ("In<br>

> all other cases, the operation must return MPI_ERR_PROC_FAILED"), but during<br>

> our rework of that text we no longer state what happens when a blocking<br>

> MPI_ANY_SOURCE recv sees a failure.<br>

><br>

> I'd like to avoid making many changes between our last reading and the first<br>

> vote because late changes don't inspire confidence, but I think Josh's issue<br>

> is valid.<br>

><br>

> Personally I was always in favor of blocking and non-blocking recv on<br>

> MPI_ANY_SOURCE failing if any process fails. The recv can complete as<br>

> failed with the status pointing to the failed process. The user can still<br>

> call MPI_Comm_failure_ack to exclude failed ranks from triggering further<br>

> failures in MPI_ANY_SOURCE. I don't see the value in MPI_ERR_PENDING.<br>

> Reposting the recv is not a big deal and we don't care that much about<br>

> performance in the failure case. Still, I'd prefer any change that can<br>

> address Josh's issue with the least change to the proposal.<br>

><br>

> Thanks,<br>

> Dave<br>

><br>

><br>

><br>

> From:        Josh Hursey <jjhursey@open-mpi.org><br>

> To:        "MPI 3.0 Fault Tolerance and Dynamic Process Control working<br>

> Group" <mpi3-ft@lists.mpi-forum.org><br>

> Date:        03/16/2012 08:54 AM<br>

> Subject:        [Mpi3-ft] Clarification Ticket 323: MPI_ANY_SOURCE and<br>

> MPI_Test(any)<br>

> Sent by:        mpi3-ft-bounces@lists.mpi-forum.org<br>

> ________________________________<br>

><br>

><br>

><br>

> I believe my reasoning is correct below, but thought I would ask the<br>

> group to confirm.<br>

><br>

> Consider the following code snippet:<br>

> ---------------------<br>

> MPI_Irecv(..., MPI_ANY_SOURCE, ..., &req);<br>

> /* Some other process in the communicator fails */<br>

> MPI_Test(&req, &flag, &status);<br>

> ---------------------<br>

><br>

> The proposal in #323 says that the request should be marked as<br>

> MPI_ERR_PENDING and not complete. So what should the value of 'flag'<br>

> and 'status' be when returning from MPI_Test?<br>

><br>

> According to the standard, 'flag = true' indicates two things:<br>

> 1) the operation is completed<br>

> 2) The 'status' object is set<br>

><br>

> For the MPI_ANY_SOURCE case above, the operation is -not- completed,<br>

> so (1) is violated; therefore I think MPI_Test should set 'flag' equal<br>

> to 'false'. However, is the 'status' also not set? Should MPI_Test<br>

> return MPI_SUCCESS or MPI_ERR_PENDING?<br>

><br>

> If MPI_Test is to return MPI_ERR_PENDING directly, then there is no<br>

> need to inspect 'status'. However, if we replace MPI_Test with<br>

> MPI_Testany(1, &req, &index, &flag, &status) then the operation would<br>

> return MPI_ERR_IN_STATUS, and the user must inspect the 'status' field<br>

> for the true error value. So we would still set 'flag = false', but<br>

> would also need to set the 'status'. That is, if we want MPI_Test* to<br>

> return an error code that indicates that the request has 'failed, but<br>

> not completed'.<br>

><br>

> According to the standard, if no operation is completed then<br>

> MPI_Testany "returns flag = false, returns a value of MPI_UNDEFINED in<br>

> index and status is undefined." So according to the MPI_Testany logic,<br>

> in this case 'flag = false', 'status is undefined', and the operation<br>

> should return MPI_SUCCESS. Is that the expected behavior for the code<br>

> snippet above?<br>

><br>

> I think so, but I thought I would double check with the group.<br>

><br>

> This means that the user can only 'see' the MPI_ERR_PENDING state of<br>

> the request when they call an MPI_Wait* operation, which might not be<br>

> what they would normally want to do (because they do not want to<br>

> block).<br>

><br>

> -- Josh<br>

><br>

> --<br>

> Joshua Hursey<br>

> Postdoctoral Research Associate<br>

> Oak Ridge National Laboratory<br>

> </font></tt><a href=http://users.nccs.gov/~jjhursey><tt><font size=2>http://users.nccs.gov/~jjhursey</font></tt></a><tt><font size=2><br>

><br>

> _______________________________________________<br>

> mpi3-ft mailing list<br>

> mpi3-ft@lists.mpi-forum.org<br>

> </font></tt><a href="http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-ft"><tt><font size=2>http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-ft</font></tt></a><tt><font size=2><br>

><br>

><br>

><br>


<br>

<br>

<br>

-- <br>

Joshua Hursey<br>

Postdoctoral Research Associate<br>

Oak Ridge National Laboratory<br>

</font></tt><a href=http://users.nccs.gov/~jjhursey><tt><font size=2>http://users.nccs.gov/~jjhursey</font></tt></a><tt><font size=2><br>

<br>


<br>

</font></tt>

<br>