[Mpi-forum] Cancelling a send matched by a matching probe
wgropp at illinois.edu
Wed Nov 4 09:49:51 CST 2015
Yes, the current text is buggy, and it is also trying to address a very subtle point.
The intent of the text you quote about MPI_Wait is to ensure that a message is either cancelled successfully or completes, rather than remaining in some deferred not-yet-cancelled-and-not-yet-completed state. The phrase “irrespective of the activities of other processes” is the most confusing part. It was intended to mean that, once cancel is called on the request, the request will either be successfully cancelled or successfully complete without relying on further MPI calls in the user program at the target rank. It was not intended to mean that no processing is needed at the target rank (in fact, some of the cancel text points out that it may be), only that the decision about whether the cancel succeeded cannot wait for some subsequent MPI call made by the user. I know the text isn’t very clear about that, but that was the intent.
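To make the intended contract concrete, here is a minimal C model of it. It is not MPI code: the names `model_send`, `model_cancel`, `model_wait`, and `model_test_cancelled` are hypothetical, and the model assumes for simplicity that a send no receive has matched can always be withdrawn (a real implementation is not obliged to do so). What it shows is that the wait on a request marked for cancellation resolves to exactly one of two outcomes and never consults later user calls at the target.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the intended contract; not an MPI API. */
typedef enum { OUTCOME_CANCELLED, OUTCOME_COMPLETED } cancel_outcome;

typedef struct {
    bool matched;          /* has a receive at the target already matched it? */
    bool cancel_requested; /* set by the modelled MPI_Cancel                 */
} model_send;

/* Modelled MPI_Cancel: only marks the request; nothing is decided yet. */
void model_cancel(model_send *s) { s->cancel_requested = true; }

/* Modelled MPI_Wait on a request marked for cancellation. It always resolves
   to one of two outcomes, never to a deferred "not yet cancelled and not yet
   completed" state, and never consults later user calls at the target.
   Simplifying assumption: an unmatched send can always be withdrawn. */
cancel_outcome model_wait(const model_send *s) {
    assert(s->cancel_requested); /* this model covers only the cancel case */
    return s->matched ? OUTCOME_COMPLETED : OUTCOME_CANCELLED;
}

/* Analogue of MPI_Test_cancelled applied to the status from MPI_Wait. */
bool model_test_cancelled(cancel_outcome o) { return o == OUTCOME_CANCELLED; }
```

In a real program the sender follows the same sequence with `MPI_Cancel(&req)`, then `MPI_Wait(&req, &status)`, then `MPI_Test_cancelled(&status, &flag)` to learn which of the two outcomes occurred.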
The text is also not correct once you include any way to “observe” a message other than posting a receive. Here’s the thinking on this.
Assume that there are only sends and receives. In the case where the cancel failed, the MPI_Wait should succeed because the send is matched. Thus, the receive will complete, and the MPI_Wait will return without the process on the target node needing to execute any *other* MPI calls (this assumes something about progress, of course). Thus, in the MPI sense, the MPI_Wait is local because it doesn’t require any (more) MPI calls at the target.
This breaks once you can observe a message with probe or mprobe (or MPI_T). Then you have an observation without a message match, and then MPI_Wait might not complete until there is a matching receive issued by the user at the target process. Fixing this would require a careful definition of the states and the consequences; in particular, it would probably require that MPI_Wait complete only in the case where the send had been matched by a receive or not observed at all; if observed without a receive, we’d need another state and would require the user to issue the necessary receive commands on it. Getting this right is not easy, which is why removing cancel of send might be the best route.
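The case analysis in the last two paragraphs can be summarised as a small C model. It is illustrative only: the enum `send_state`, the struct `cancel_analysis`, and the function `analyze_cancel` are hypothetical, and the model assumes, following the argument above, that a send nobody has observed can always be cancelled, while one already matched by a receive or observed by a probe cannot. The third state is the one that breaks the reading of MPI_Wait as a local function.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state of a send request when MPI_Cancel is called.
   Illustrative model only; these states are not defined by MPI. */
typedef enum {
    SEND_UNOBSERVED,       /* no receive has matched it, no probe has seen it */
    SEND_MATCHED_BY_RECV,  /* a receive posted at the target matched it       */
    SEND_OBSERVED_BY_PROBE /* MPI_Mprobe (or MPI_Probe) saw it, but the user
                              has not yet posted the receive for it           */
} send_state;

typedef struct {
    bool cancel_succeeds;        /* what MPI_Test_cancelled would report   */
    bool wait_needs_target_call; /* must the sender's MPI_Wait wait for a
                                    further user MPI call at the target?   */
} cancel_analysis;

cancel_analysis analyze_cancel(send_state s) {
    cancel_analysis a = { false, false };
    switch (s) {
    case SEND_UNOBSERVED:
        /* Nothing has seen the message, so it can be withdrawn. */
        a.cancel_succeeds = true;
        break;
    case SEND_MATCHED_BY_RECV:
        /* Cancel fails, but the receive is already posted, so the transfer
           finishes through progress alone: MPI_Wait is local. */
        break;
    case SEND_OBSERVED_BY_PROBE:
        /* Cancel fails because the target has already observed the message,
           yet nothing completes it until the user there posts the receive
           (e.g. MPI_Mrecv on the message handle): MPI_Wait is not local. */
        a.wait_needs_target_call = true;
        break;
    }
    /* A cancelled send never depends on the target's user code. */
    assert(!(a.cancel_succeeds && a.wait_needs_target_call));
    return a;
}
```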
Director, Parallel Computing Institute
Thomas M. Siebel Chair in Computer Science
Chief Scientist, NCSA
University of Illinois Urbana-Champaign
On Nov 3, 2015, at 3:41 PM, Marek Tomáštík <tomastik.marek at gmail.com> wrote:
> Thank you for the detailed explanation. The described intent makes sense, but I am not sure how this statement follows from the standard:
> 2015-11-02 15:41 GMT+01:00 William Gropp <wgropp at illinois.edu>:
> However, note that if the cancel fails, then the communication is not marked for cancellation, and an MPI_Wait could then wait until the message is received.
> The standard says that "[a] call to MPI_CANCEL marks for cancellation a pending, nonblocking communication operation". It does not say that the cancellation has to succeed, but it does say that MPI_Cancel marks the operation for cancellation -- unconditionally, as far as I can see (unless an error occurs, of course -- but that wasn't what you meant by the cancel failing, or was it?). From this and the statement I cited previously ("[i]f a communication is marked for cancellation, then a MPI_WAIT call for that communication is guaranteed to return, irrespective of the activities of other processes (i.e., MPI_WAIT behaves as a local function)") it follows by modus ponens that MPI_Wait for a request on which MPI_Cancel has been called must behave as a local function. Am I interpreting this incorrectly?
> Once again, thank you for your time.
> Marek Tomáštík
> mpi-forum mailing list
> mpi-forum at lists.mpi-forum.org