[Mpi3-ft] MPI_Recv + MPI_Comm_failure_ack
David Solt
dsolt at us.ibm.com
Fri Mar 15 12:32:03 CDT 2013
Based on the proposal:

  MPI_Comm_failure_ack(blah, blah)

  This local operation gives the users a way to acknowledge all locally
  noticed failures on comm. After the call, unmatched MPI_ANY_SOURCE
  receptions that would have raised an error code MPI_ERR_PENDING due to
  process failure (see Section 17.2.2) proceed without further reporting
  of errors due to those acknowledged failures.
I think this clearly indicates that MPI_Recv is unaffected by calls to
MPI_Comm_failure_ack. Therefore, there is no way to call
MPI_Recv(MPI_ANY_SOURCE) and have it ignore failures that were already
acknowledged with MPI_Comm_failure_ack.
I believe the following code will NOT work (i.e. after the first failure,
the MPI_Recv will continuously fail):
/* assumes MPI_ERRORS_RETURN is set on intercomm, and that buf, count,
   type and tag are defined elsewhere */
int size, failures = 0, err;
MPI_Status status;
MPI_Group group;

MPI_Comm_size(intercomm, &size);
while (failures < size) {
    err = MPI_Recv(buf, count, type, MPI_ANY_SOURCE, tag, intercomm, &status);
    if (err == MPI_ERR_PROC_FAILED) {
        MPI_Comm_failure_ack(intercomm);
        MPI_Comm_failure_get_acked(intercomm, &group);
        MPI_Group_size(group, &failures);
        MPI_Group_free(&group);
    } else {
        /* process received data */
    }
}
and instead has to be written as:
int size, failures = 0, err;
MPI_Status status;
MPI_Group group;
MPI_Request request = MPI_REQUEST_NULL;

MPI_Comm_size(intercomm, &size);
while (failures < size) {
    if (request == MPI_REQUEST_NULL) {
        err = MPI_Irecv(buf, count, type, MPI_ANY_SOURCE, tag, intercomm,
                        &request);
    }
    err = MPI_Wait(&request, &status);
    if (err == MPI_ERR_PENDING) {
        /* the request is still pending; acknowledge the failure so the
           next MPI_Wait can proceed */
        MPI_Comm_failure_ack(intercomm);
        MPI_Comm_failure_get_acked(intercomm, &group);
        MPI_Group_size(group, &failures);
        MPI_Group_free(&group);
    } else {
        /* request completed: process received data */
    }
}
Am I correct in my thinking?
If so, was there a reason why MPI_Recv could not also "obey"
MPI_Comm_failure_ack calls?
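For comparison, here is a minimal sketch of how I imagine that behavior
could be layered on top of MPI_Irecv/MPI_Wait. The helper name and its
arguments are my own, not part of the proposal:

#include <mpi.h>

/* A blocking MPI_ANY_SOURCE receive that keeps waiting across process
   failures by acknowledging them as they are noticed.  It relies on the
   quoted semantics: MPI_Wait on an any-source request reports only
   unacknowledged failures (MPI_ERR_PENDING) and leaves the request
   pending, so the same request can be waited on again after the ack.
   Caveat: if every potential sender has failed, this blocks forever, so
   the counting loop above is still needed for termination. */
static int recv_any_source_acked(void *buf, int count, MPI_Datatype type,
                                 int tag, MPI_Comm comm, MPI_Status *status)
{
    MPI_Request req = MPI_REQUEST_NULL;
    int err;

    err = MPI_Irecv(buf, count, type, MPI_ANY_SOURCE, tag, comm, &req);
    if (err != MPI_SUCCESS)
        return err;

    while ((err = MPI_Wait(&req, status)) == MPI_ERR_PENDING) {
        /* a new failure was noticed; acknowledge it and resume waiting
           for the surviving senders */
        MPI_Comm_failure_ack(comm);
    }
    return err;
}

This is essentially the loop above packaged as a blocking call, which is
why I am wondering whether MPI_Recv itself could provide it.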
Thanks,
Dave