[Mpi3-ft] Multiple Communicator Version of MPI_Comm_validate

Bronis R. de Supinski bronis at llnl.gov
Thu Sep 15 17:08:19 CDT 2011


From a scalability standpoint, and from the standpoint of memory usage
in the MPI implementation, you want validating any communicator that
includes a dead endpoint to eliminate the need to ever use collective
communication to validate other communicators that include that
endpoint. If you can achieve that with the current interface, then you
do not need to add anything (the validate becomes a local call, so no
big deal; it just ensures that the code using that communicator has
current information).

Otherwise, you want validating a communicator that includes that
endpoint to validate all other communicators that use that endpoint,
regardless of how the communicators were derived. You do not want an
inherited interface, either explicit or implicit. Explicit is too hard
to use and does not solve the real problem, which is the excessive cost
that the interface forces on the user; implicit but only inherited does
not solve that problem either. The key is to provide some mechanism to
alert the user that the communicator has been backdoor validated.
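
To make the idea concrete, here is a rough sketch of one way the
bookkeeping could look inside a single process. Everything in it is
hypothetical: failure_table, sketch_comm and the sketch_* functions are
plain C stand-ins, not MPI types or calls, and the sketch does not
address how processes outside the validated communicator would learn
the agreed failure set. It only shows a shared per-endpoint failure
table, an epoch counter that serves as the "backdoor validated" alert,
and a validate that is purely local for every other communicator.

```c
/* Sketch only: plain C stand-ins, not MPI types or MPI calls. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_RANKS 64

/* Failure knowledge shared by every communicator in this process (hypothetical). */
typedef struct {
    bool failed[MAX_RANKS];  /* endpoints agreed to be dead */
    unsigned epoch;          /* bumped once per collective validation */
} failure_table;

/* Hypothetical communicator: a member list plus the last epoch it acknowledged. */
typedef struct {
    const int *members;      /* global ranks of the members */
    size_t num_members;
    unsigned seen_epoch;
} sketch_comm;

/* Stand-in for the single collective validation: record the agreed failures. */
void sketch_validate_collective(failure_table *t, sketch_comm *c,
                                const int *dead, size_t num_dead)
{
    for (size_t i = 0; i < num_dead; ++i) {
        assert(dead[i] >= 0 && dead[i] < MAX_RANKS);
        t->failed[dead[i]] = true;
    }
    t->epoch++;
    c->seen_epoch = t->epoch;
}

/* The alert: true if a validation happened since c last acknowledged one. */
bool sketch_backdoor_validated(const failure_table *t, const sketch_comm *c)
{
    return c->seen_epoch != t->epoch;
}

/* Local, non-collective refresh: count failed members of c and acknowledge. */
size_t sketch_validate_local(const failure_table *t, sketch_comm *c)
{
    size_t num_failed = 0;
    for (size_t i = 0; i < c->num_members; ++i) {
        int rank = c->members[i];
        assert(rank >= 0 && rank < MAX_RANKS);
        if (t->failed[rank]) {
            ++num_failed;
        }
    }
    c->seen_epoch = t->epoch;
    return num_failed;
}
```

In this sketch the extra state per communicator is one counter, and no
communicator needs to know how any other communicator was derived.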



On Thu, 15 Sep 2011, Josh Hursey wrote:

> On Wed, Sep 14, 2011 at 4:17 PM, Sur, Sayantan <sayantan.sur at intel.com> wrote:
>> Hi Josh,
>>
>>>
>>> Workaround:
>>> --------------------
>>> Call MPI_Comm_validate over all of the communicators individually.
>>> This would involve 'num_comms' collective operations and likely impede
>>> scalability.
>>>
>>> for(i=0; i < num_comms; ++i) {
>>>   MPI_Comm_validate(comm[i], failed_grps[i]);
>>> }
>>>
>>
>> Would it not be possible for the app to create a communicator that is the union of all the processes, and subsequently call validate only on that 'super' communicator? I hope I am not missing something from your example.
>
> In the current spec, no. MPI_Comm_validate only changes the state of
> the communicator passed. We probably want to create a new API like
> MPI_Comm_validate_many() to host these new semantics.
>
> It is important to remember that the validate operation changes the
> communicator (primarily just the 'are_collectives_enabled' flag on the
> communicator), and not anything to do with elements of the group that
> form it.
>
> Currently after creation, a communicator does not need to track from
> which communicators it was created (at least that's the way I
> understand it). So creating a super communicator and calling
> MPI_Comm_validate_many() on that would require such tracking to have
> the validation propagate to all of the communicators that built it. So
> we could do it, but the additional state tracking would force
> additional memory consumption even if the operation is never used,
> which is slightly problematic.
>
>
>>
>> I liked your Option B as such, however, as you point out, it has significant problems in case of applications consisting of several layers of libraries.
>
>
> Thinking through your question above, I think Option B would require
> that we track the heritage of communicators after creation, which
> would increase memory consumption. It would also require us to
> maintain that linkage across communicator destruction. For example,
> -----------
> MPI_Comm_dup(MPI_COMM_WORLD, &commA);
> // MCW   is linked to commA
> MPI_Comm_dup(commA, &commB);
> // MCW   is linked to commA
> // commA is linked to commB
> MPI_Comm_dup(commB, &commC);
> // MCW   is linked to commA
> // commA is linked to commB
> // commB is linked to commC
> MPI_Comm_free(&commB);
> // MCW   is linked to commA
> // commA is linked to commC (since commB is now gone)
> ------------
>
> In the discussion so far it seems that the inheritance is only one
> way. Meaning that in the example above calling
> MPI_Comm_validate_many() on commA would validate commC (and commB if
> it is still around), but not MPI_COMM_WORLD. Is that what we are
> looking for, or do we want it to be more complete?
>
>
> The explicit linking in Option C puts the user in more control over
> the overhead of tracking connections between communicators, but has
> other issues. :/
>
> -- Josh
>
>
>>
>> Sayantan.
>>
>>>
>>> Option A:
>>> Array of communicators
>>> --------------------
>>> MPI_Comm_validate_many(comm[], num_comms, failed_grp)
>>> Validate 'num_comms' communicators, and return a failed group.
>>>   - or -
>>> MPI_Comm_validate_many(comm[], num_comms, failed_grps[])
>>> Validate 'num_comms' communicators, and return a failed group for each
>>> communicator.
>>> ----
>>>
>>> In this version of the operation the user passes in an array of
>>> pointers to communicators. Since communicators are not often created
>>> in a contiguous array, pointers to communicators should probably be
>>> used. failed_grps is an array holding the group of failed processes
>>> for each of those communicators.
>>>
>>> Some questions:
>>>  * Should all processes pass in the same set of communicators?
>>>  * Should all communicators be duplicates or subsets of one another?
>>>  * Does this operation run the risk of a circular dependency if the
>>> user does not pass in the same set of communicators at all
>>> participating processes? Is that something the MPI library should
>>> protect the application from?
>>>
>>>
>>> Option B:
>>> Implicit inherited validation
>>> --------------------
>>> MPI_Comm_validate_many(comm, failed_grp)
>>> ----
>>>
>>> The idea is to add an additional semantic (or maybe a new API) that
>>> allows the validation of a communicator to automatically validate all
>>> communicators created from it (only dups and subsets of it?).
>>>
>>> The problem with this is that if an application calls
>>> MPI_Comm_validate on MPI_COMM_WORLD, it changes the semantics of
>>> communicators that libraries might be using internally without
>>> notification in those libraries. So this breaks the abstraction
>>> barrier between the two in a possibly dangerous way.
>>>
>>> Some questions:
>>>  * Are there some other semantics that we can add to help protect
>>> libraries? (e.g., after implicit validation the first use of the
>>> communicator will return a special error code indicating that the
>>> communicator has been adjusted).
>>>  * Are there thread safety issues involved with this? (e.g., the
>>> library operates in a concurrent thread with its own duplicate of the
>>> communicator. The application does not know about or control the
>>> concurrent thread but calls MPI_Comm_validate on its own communicator
>>> and implicitly changes the semantics of the duplicate communicator.)
>>>  * It is only through the call to MPI_Comm_validate that we can
>>> provide a uniform group of failed processes globally known. For those
>>> that were implicitly validated, do we need to provide a way to access
>>> this group after the call? Does this have implications on the amount
>>> of storage required for this semantic?
>>>
>>>
>>> Option C:
>>> Explicit inherited validation
>>> --------------------
>>> MPI_Comm_validate_link(commA, commB);
>>> MPI_Comm_validate_many(commA, failed_grp)
>>> /* Implies MPI_Comm_validate(commB, NULL) */
>>>
>>> MPI_Comm_validate(commA, failed_grp)
>>> /* Does not imply MPI_Comm_validate(commB, NULL) */
>>> ----
>>>
>>> In this version the application explicitly links communicators. This
>>> prevents an application from implicitly altering derived communicators
>>> out of their scope (e.g., in use by other libraries).
>>>
>>> Some questions:
>>>  * It is only through the call to MPI_Comm_validate that we can
>>> provide a uniform group of failed processes globally known. For those
>>> that were implicitly validated, do we need to provide a way to access
>>> this group after the call (e.g., for commB)? Does this have
>>> implications on the amount of storage required for this semantic?
>>>  * Do we need a mechanism to 'unlink' communicators? Or determine
>>> which communicators are linked?
>>>  * Can a communicator be linked to multiple other communicators?
>>>  * Is the linking a unidirectional operation? (so in the example above
>>> validating commB does not validate commA unless there is a separate
>>> MPI_Comm_validate_link(commB, commA) call)
>>>
>>>
>>> Option D:
>>> Other
>>> --------------------
>>> Something else...
>>>
>>>
>>> Thoughts?
>>>
>>> -- Josh
>>>
>>>
>>> --
>>> Joshua Hursey
>>> Postdoctoral Research Associate
>>> Oak Ridge National Laboratory
>>> http://users.nccs.gov/~jjhursey
>>> _______________________________________________
>>> mpi3-ft mailing list
>>> mpi3-ft at lists.mpi-forum.org
>>> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-ft

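For reference, the lineage bookkeeping in the quoted Option B example
(commA relinked to commC once commB is freed, with validation flowing
one way from a communicator to those derived from it) could be sketched
roughly as below. The names lineage_table, lineage_add, lineage_free and
lineage_validate_many are made up for illustration, and communicators
are modeled as small integer indices rather than MPI_Comm handles. The
parent[] array is the extra per-communicator state that has to be kept
even if the operation is never used.

```c
/* Sketch only: communicators are modeled as indices, not MPI_Comm handles. */
#include <assert.h>

#define MAX_COMMS 16

typedef struct {
    int parent[MAX_COMMS];     /* index this one was derived from, -1 for none */
    int live[MAX_COMMS];       /* 0 once the communicator has been freed */
    int validated[MAX_COMMS];  /* stand-in for "collectives re-enabled" */
    int count;
} lineage_table;

/* Register a communicator derived from 'parent' (-1 for a root such as MCW). */
int lineage_add(lineage_table *t, int parent)
{
    assert(t->count < MAX_COMMS);
    assert(parent == -1 || (parent >= 0 && parent < t->count && t->live[parent]));
    int id = t->count++;
    t->parent[id] = parent;
    t->live[id] = 1;
    t->validated[id] = 0;
    return id;
}

/* Free a communicator and relink its children to its own parent. */
void lineage_free(lineage_table *t, int id)
{
    assert(id >= 0 && id < t->count && t->live[id]);
    for (int i = 0; i < t->count; ++i) {
        if (t->parent[i] == id) {
            t->parent[i] = t->parent[id];
        }
    }
    t->live[id] = 0;
}

/* One-way propagation: mark 'root' and every live communicator derived
 * from it, directly or indirectly. Returns how many were marked. */
int lineage_validate_many(lineage_table *t, int root)
{
    assert(root >= 0 && root < t->count && t->live[root]);
    int marked = 0;
    for (int i = 0; i < t->count; ++i) {
        if (!t->live[i]) {
            continue;
        }
        /* Walk up the ancestor chain; parent indices strictly decrease. */
        for (int a = i; a != -1; a = t->parent[a]) {
            if (a == root) {
                t->validated[i] = 1;
                ++marked;
                break;
            }
        }
    }
    return marked;
}
```

Replaying the quoted example with this sketch (add MCW, commA, commB,
commC in a chain, then free commB) leaves commC with commA as its
parent, and validating from commA marks commA and commC but not MCW.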
