[Mpi3-ft] Multiple Communicator Version of MPI_Comm_validate
Josh Hursey
jjhursey at open-mpi.org
Thu Sep 15 08:12:34 CDT 2011
On Wed, Sep 14, 2011 at 4:17 PM, Sur, Sayantan <sayantan.sur at intel.com> wrote:
> Hi Josh,
>
>>
>> Workaround:
>> --------------------
>> Call MPI_Comm_validate over all of the communicators individually.
>> This would involve 'num_comms' collective operations and likely impede
>> scalability.
>>
>> for (i = 0; i < num_comms; ++i) {
>>     MPI_Comm_validate(comm[i], failed_grps[i]);
>> }
>>
>
> Would it not be possible for the app to create a communicator that is the union of all the processes, and subsequently call validate only on that 'super' communicator? I hope I am not missing something from your example.
In the current spec, no: MPI_Comm_validate only changes the state of
the communicator that is passed to it. We would probably want a new
API, something like MPI_Comm_validate_many(), to carry these new
semantics.
It is important to remember that the validate operation changes the
communicator itself (primarily just the 'are_collectives_enabled'
flag on the communicator), not the processes in the group that forms
it.
Currently, after creation a communicator does not need to track which
communicators it was created from (at least that is my understanding
of it). So creating a super communicator and calling
MPI_Comm_validate_many() on it would require such tracking in order
to propagate the validation to all of the communicators that built
it. We could do it, but the additional state tracking would add
memory overhead even if the operation is never used, which is
slightly problematic.
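To make that cost concrete, below is a minimal sketch of the kind of
per-communicator bookkeeping an implementation might keep; the
structure and field names are hypothetical, not from any spec:
-----------
/* Hypothetical bookkeeping (all names made up for illustration) an
 * implementation would need so that validating one communicator can
 * reach the communicators derived from it. */
struct comm_record {
    int are_collectives_enabled;    /* flag flipped by validate */
    struct comm_record **children;  /* communicators derived from this one */
    int num_children;               /* grows with every dup/split */
};
-----------
Every MPI_Comm_dup/MPI_Comm_split would have to append to the
parent's children list, and MPI_Comm_free would have to splice
entries back out; that list is the memory consumed even by
applications that never call validate.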
>
> I liked your Option B as such; however, as you point out, it has significant problems in the case of applications consisting of several layers of libraries.
Thinking through your question above, I think Option B would require
that we track the heritage of communicators after creation, which
would increase memory consumption. It would also require us to
maintain that linkage across communicator destruction. For example:
-----------
MPI_Comm_dup(MPI_COMM_WORLD, &commA);
// MCW is linked to commA
MPI_Comm_dup(commA, &commB);
// MCW is linked to commA
// commA is linked to commB
MPI_Comm_dup(commB, &commC);
// MCW is linked to commA
// commA is linked to commB
// commB is linked to commC
MPI_Comm_free(&commB);
// MCW is linked to commA
// commA is linked to commC (since commB is now gone)
------------
In the discussion so far it seems that the inheritance is only one
way: in the example above, calling MPI_Comm_validate_many() on commA
would validate commC (and commB, if it were still around), but not
MPI_COMM_WORLD. Is that what we are looking for, or do we want it to
be more complete?
The explicit linking in Option C gives the user more control over
the overhead of tracking connections between communicators, but it
has other issues. :/
-- Josh
>
> Sayantan.
>
>>
>> Option A:
>> Array of communicators
>> --------------------
>> MPI_Comm_validate_many(comm[], num_comms, failed_grp)
>> Validate 'num_comms' communicators, and return a failed group.
>> - or -
>> MPI_Comm_validate_many(comm[], num_comms, failed_grps[])
>> Validate 'num_comms' communicators, and return a failed group for each
>> communicator.
>> ----
>>
>> In this version of the operation the user passes in an array of
>> pointers to communicators. Since communicators are not often created
>> in a contiguous array, pointers to communicators should probably be
>> used. The failed_grps argument is an array of the failed groups, one
>> for each of those communicators.
>>
>> Some questions:
>> * Must all processes pass in the same set of communicators?
>> * Should all communicators be duplicates or subsets of one another?
>> * Does this operation run the risk of a circular dependency if the
>> user does not pass in the same set of communicators at all
>> participating processes? Is that something the MPI library should
>> protect the application from?
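>>
>> As a usage sketch (everything below is the proposed, not yet
>> standardized, interface; the variables are assumed to exist), the
>> workaround loop above would collapse to a single collective call:
>> ----
>> /* Validate two communicators in one collective step. An array of
>>  * pointers is used since comms are rarely contiguous in memory. */
>> MPI_Comm *comm_ptrs[2] = { &commA, &commB };
>> MPI_Group failed_grps[2];
>> MPI_Comm_validate_many(comm_ptrs, 2, failed_grps);
>> ----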
>>
>>
>> Option B:
>> Implicit inherited validation
>> --------------------
>> MPI_Comm_validate_many(comm, failed_grp)
>> ----
>>
>> The idea is to add an additional semantic (or maybe a new API) that
>> allows the validation of a communicator to automatically validate all
>> communicators created from it (only dups and subsets of it?).
>>
>> The problem is that if an application calls MPI_Comm_validate on
>> MPI_COMM_WORLD, it changes the semantics of communicators that
>> libraries might be using internally, without notifying those
>> libraries. So this breaks the abstraction barrier between the two in
>> a possibly dangerous way.
>>
>> Some questions:
>> * Are there some other semantics that we can add to help protect
>> libraries? (e.g., after implicit validation the first use of the
>> communicator will return a special error code indicating that the
>> communicator has been adjusted).
>> * Are there thread safety issues involved with this? (e.g., the
>> library operates in a concurrent thread with its own duplicate of the
>> communicator. The application does not know about or control the
>> concurrent thread but calls MPI_Comm_validate on its own communicator
>> and implicitly changes the semantics of the duplicate communicator.)
>> * It is only through the call to MPI_Comm_validate that we can
>> provide a uniform, globally known group of failed processes. For
>> those communicators that were implicitly validated, do we need to
>> provide a way to access this group after the call? Does this have
>> implications for the amount of storage required for this semantic?
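>>
>> To make the thread-safety/abstraction hazard concrete, a sketch (the
>> library and its private duplicate are made up for illustration):
>> ----
>> /* Inside some library's init: it keeps a private duplicate. */
>> MPI_Comm lib_comm;
>> MPI_Comm_dup(MPI_COMM_WORLD, &lib_comm);
>>
>> /* Later, in application code, with implicit inherited validation: */
>> MPI_Group failed_grp;
>> MPI_Comm_validate(MPI_COMM_WORLD, &failed_grp);
>> /* lib_comm was just re-enabled for collectives behind the library's
>>  * back; the library is never notified of the state change. */
>> ----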
>>
>>
>> Option C:
>> Explicit inherited validation
>> --------------------
>> MPI_Comm_validate_link(commA, commB);
>> MPI_Comm_validate_many(commA, failed_grp);
>> /* Implies MPI_Comm_validate(commB, NULL) */
>>
>> MPI_Comm_validate(commA, failed_grp);
>> /* Does not imply MPI_Comm_validate(commB, NULL) */
>> ----
>>
>> In this version the application explicitly links communicators. This
>> prevents an application from implicitly altering derived communicators
>> outside its scope (e.g., those in use by other libraries).
>>
>> Some questions:
>> * It is only through the call to MPI_Comm_validate that we can
>> provide a uniform, globally known group of failed processes. For
>> those communicators that were implicitly validated, do we need to
>> provide a way to access this group after the call (e.g., for commB)?
>> Does this have implications for the amount of storage required for
>> this semantic?
>> * Do we need a mechanism to 'unlink' communicators? Or determine
>> which communicators are linked?
>> * Can a communicator be linked to multiple other communicators?
>> * Is the linking a unidirectional operation? (So, in the example
>> above, validating commB does not validate commA unless there is a
>> separate MPI_Comm_validate_link(commB, commA) call.)
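>>
>> A usage sketch with the proposed (not yet standardized) calls,
>> assuming the linking is unidirectional as described above:
>> ----
>> MPI_Comm commA, commB;
>> MPI_Comm_dup(MPI_COMM_WORLD, &commA);
>> MPI_Comm_dup(commA, &commB);
>>
>> /* The application opts in to propagation explicitly. */
>> MPI_Comm_validate_link(commA, commB);
>>
>> MPI_Group failed_grp;
>> MPI_Comm_validate_many(commA, &failed_grp);  /* also validates commB */
>> MPI_Comm_validate(commA, &failed_grp);       /* does not touch commB */
>> ----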
>>
>>
>> Option D:
>> Other
>> --------------------
>> Something else...
>>
>>
>> Thoughts?
>>
>> -- Josh
--
Joshua Hursey
Postdoctoral Research Associate
Oak Ridge National Laboratory
http://users.nccs.gov/~jjhursey