[Mpi3-ft] WG call on tuesday aug. 9, 3pm est
james.dinan at gmail.com
Wed Jul 17 12:15:55 CDT 2013
> The group query routine that I was thinking of is a local query, which
> would return information on which processes I know to have failed.
> Any particular reason the current get_acked() is not satisfying?
Eek, OK, I had completely forgotten about this function. This is what I
was asking for; hopefully my previous message now makes more sense. :)
> I am probably going to regret asking this, but is it possible to include
> an MPI_Comm_resume() function that re-activates a revoked communicator with
> holes in it?
> For many reasons related to the complexities of distributed systems (lack
> of synchronization in error detection, divergent views of the entire
> system from each process), this operation must have consensus semantics.
> Thus, in terms of cost, it is similar to MPI_Comm_shrink (except for the
> reordering of the processes). It might provide some limited benefit for
> people who want to use this type of scenario. Now, if by doing the
> re-enable you expect that the communicator will behave as a brand-new
> communicator and that everything MPI-related (files, one-sided,
> collectives) will just work on this communicator with holes … then we're
> talking about something so complex that I would not even dare consider
> it for inclusion in the standard.
What if MPI_COMM_RESUME duplicated the revoked communicator and left the
holes in place? This would preserve the revoked semantics, but allow the
user to continue without renumbering the processes.
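
To make the renumbering difference concrete, here is a minimal sketch in
plain C. It makes no MPI calls and only models the rank mapping. The names
model_shrink, model_resume_with_holes, and RANK_FAILED are hypothetical and
exist purely for illustration; neither function reflects an actual or
proposed MPI binding, and the consensus step discussed above is assumed to
have already produced an agreed-upon alive[] array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical marker for a failed process (a "hole"). */
#define RANK_FAILED (-1)

/*
 * Model of shrink-style recovery: survivors are packed into consecutive
 * new ranks, so every survivor after the first failure is renumbered.
 * alive[i] != 0 means old rank i survived.
 * new_rank[i] receives the new rank of old rank i, or RANK_FAILED.
 * Returns the size of the resulting communicator.
 */
int model_shrink(const int *alive, size_t n, int *new_rank)
{
    assert(alive != NULL && new_rank != NULL);
    int next = 0;
    for (size_t i = 0; i < n; i++)
        new_rank[i] = alive[i] ? next++ : RANK_FAILED;
    return next;
}

/*
 * Model of the resume idea sketched above: every survivor keeps its old
 * rank and failed ranks remain as holes. The communicator size stays n.
 * Returns the number of surviving processes.
 */
int model_resume_with_holes(const int *alive, size_t n, int *new_rank)
{
    assert(alive != NULL && new_rank != NULL);
    int survivors = 0;
    for (size_t i = 0; i < n; i++) {
        if (alive[i]) {
            new_rank[i] = (int)i;   /* rank preserved */
            survivors++;
        } else {
            new_rank[i] = RANK_FAILED;
        }
    }
    return survivors;
}
```

With five processes where ranks 1 and 4 have failed, the shrink model maps
old rank 3 to new rank 2, while the resume model leaves it at rank 3, so
any application data structures indexed by rank stay valid.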