[Mpi3-ft] WG call on tuesday aug. 9, 3pm est

Jim Dinan james.dinan at gmail.com
Wed Jul 17 12:15:55 CDT 2013


Hi George,

Responses below:

> The group query routine that I was thinking of is a local query, which
> would return information on which processes I know to have failed.
>
>
> Any particular reason the current get_acked() is not satisfying?
>

Eek, ok, I had completely forgotten about this function.  This is what I
was asking for; hopefully my previous message now makes more sense.  :)

> I am probably going to regret asking this, but is it possible to include
> an MPI_Comm_resume() function that re-activates a revoked communicator with
> holes in it?
>
>
> For many reasons related to the complexities of distributed systems (lack
> of synchronization in the error detection, divergent view of the entire
> system from each process) this operation must have a consensus meaning.
> Thus in terms of cost it is similar to MPI_Comm_shrink (except for the
> reordering of the processes). It might provide some limited benefit for
> people who want to use this type of scenario. Now, if by doing the
> re-enable you expect that the communicator will behave as a fresh
> communicator and everything MPI-related (file, one-sided, collective) will
> just work on this communicator with holes … then we're talking about
> something so complex that I would not even dare consider it for inclusion
> in the standard.
>

What if MPI_COMM_RESUME duplicated the revoked communicator and left the
holes in place?  This would preserve the revoke semantics, but allow the
user to continue without renumbering the processes.

 ~Jim.