[Mpi3-ft] Multiple Communicator Version of MPI_Comm_validate

Josh Hursey jjhursey at open-mpi.org
Thu Sep 15 12:59:48 CDT 2011


On Thu, Sep 15, 2011 at 12:43 PM, Sur, Sayantan <sayantan.sur at intel.com> wrote:
>
>
>> -----Original Message-----
>> From: mpi3-ft-bounces at lists.mpi-forum.org [mailto:mpi3-ft-
>> bounces at lists.mpi-forum.org] On Behalf Of Josh Hursey
>> Sent: Thursday, September 15, 2011 6:13 AM
>> To: MPI 3.0 Fault Tolerance and Dynamic Process Control working Group
>> Subject: Re: [Mpi3-ft] Multiple Communicator Version of
>> MPI_Comm_validate
>>
>> On Wed, Sep 14, 2011 at 4:17 PM, Sur, Sayantan <sayantan.sur at intel.com>
>> wrote:
>> > Hi Josh,
>> >
>> >>
>> >> Workaround:
>> >> --------------------
>> >> Call MPI_Comm_validate over all of the communicators individually.
>> >> This would involve 'num_comms' collective operations and likely
>> >> impede scalability.
>> >>
>> >> for (int i = 0; i < num_comms; ++i) {
>> >>   MPI_Comm_validate(comm[i], &failed_grps[i]);
>> >> }
>> >>
>> >
>> > Would it not be possible for the app to create a communicator that is
>> > the union of all the processes, and subsequently call validate only on
>> > that 'super' communicator? I hope I am not missing something from your
>> > example.
>>
>> In the current spec, no. MPI_Comm_validate only changes the state of
>> the communicator passed. We probably want to create a new API like
>> MPI_Comm_validate_many() to host these new semantics.
>
>
> I agree with your comments. Calling 'validate' on communicators that are not somehow owned (linked through heritage, etc.) is tricky business, as it changes their semantics.
>
> I guess my question is whether, in your example, comm[0..num_comms-1] are 'owned' by the layer of application code that is calling validate. There are two cases here:

Yes, in that example I am assuming that the application or library
layer calling the loop 'owns' all of the communicators being validated
in the comm[] array.

>
> i) how to optimize validation for a set of possibly overlapping communicators (all owned by one layer of application code [library])
> ii) how to optimize validation for a set of communicators (possibly linked by heritage through dup) across layers of libraries ... thus eliminating the requirement that each library validate its own communicator
>
> Which of these cases does MPI_Comm_validate_many() aim to optimize?

I think we are looking at (i). We need to be able to protect the
libraries from actions outside of their communication context to the
greatest degree possible. So I do not think that the application
should be able to validate the communicators of the library (unless
the library explicitly allows this), and vice versa.
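
To make (i) concrete, here is a minimal sketch of the pattern I have in
mind: the library dups the communicator it is handed at initialization
and only ever validates its own duplicate, so neither layer can change
the state of the other's communicator. (This assumes the currently
proposed MPI_Comm_validate(comm, &failed_grp) signature; lib_init() and
lib_recover() are just hypothetical names.)
-----------
#include <mpi.h>

static MPI_Comm lib_comm = MPI_COMM_NULL;

int lib_init(MPI_Comm app_comm)
{
  /* Private duplicate: the application never sees or validates this. */
  return MPI_Comm_dup(app_comm, &lib_comm);
}

int lib_recover(void)
{
  MPI_Group failed_grp;
  /* Validate only the library-owned communicator. */
  int rc = MPI_Comm_validate(lib_comm, &failed_grp);
  if (MPI_SUCCESS == rc)
    MPI_Group_free(&failed_grp);
  return rc;
}
-----------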


Sorry I misunderstood the first time around.

-- Josh


>
>
> Thanks.
>
>>
>> It is important to remember that the validate operation changes the
>> communicator itself (primarily just the 'are_collectives_enabled' flag
>> on the communicator), not the processes in the group that forms it.
>>
>> Currently, after creation a communicator does not need to track which
>> communicators it was created from (at least that is the way I
>> understand it). So creating a super communicator and calling
>> MPI_Comm_validate_many() on it would require such tracking so that the
>> validation propagates to all of the communicators that built it. We
>> could do it, but the additional state tracking would increase memory
>> consumption even if the operation is never used, which is slightly
>> problematic.
>>
>>
>> >
>> > I liked your Option B as such; however, as you point out, it has
>> > significant problems for applications consisting of several layers of
>> > libraries.
>>
>>
>> Thinking through your question above, I think Option B would require
>> that we track the heritage of communicators after creation, which
>> would increase memory consumption. It would also require us to
>> maintain that linkage across communicator destruction. For example,
>> -----------
>> MPI_Comm_dup(MPI_COMM_WORLD, &commA);
>> // MCW   is linked to commA
>> MPI_Comm_dup(commA, &commB);
>> // MCW   is linked to commA
>> // commA is linked to commB
>> MPI_Comm_dup(commB, &commC);
>> // MCW   is linked to commA
>> // commA is linked to commB
>> // commB is linked to commC
>> MPI_Comm_free(&commB);
>> // MCW   is linked to commA
>> // commA is linked to commC (since commB is now gone)
>> ------------
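>>
>> Roughly, maintaining that heritage implies per-communicator bookkeeping
>> along these lines (purely illustrative, not a proposed interface):
>> -----------
>> struct comm_heritage {
>>   MPI_Comm               comm;      /* this communicator             */
>>   int                    num_children;
>>   struct comm_heritage **children;  /* communicators derived from it */
>> };
>> -----------
>> Each dup/split would append to the parent's children list, and
>> MPI_Comm_free() would have to splice the freed communicator's children
>> into its parent; that is the extra state and memory referred to above.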
>>
>> In the discussion so far it seems that the inheritance is only one
>> way, meaning that in the example above calling
>> MPI_Comm_validate_many() on commA would validate commC (and commB, if
>> it is still around), but not MPI_COMM_WORLD. Is that what we are
>> looking for, or do we want it to be more complete?
>>
>>
>> The explicit linking in Option C gives the user more control over
>> the overhead of tracking connections between communicators, but it has
>> other issues. :/
>>
>> -- Josh
>>
>>
>> >
>> > Sayantan.
>> >
>> >>
>> >> Option A:
>> >> Array of communicators
>> >> --------------------
>> >> MPI_Comm_validate_many(comm[], num_comms, failed_grp)
>> >> Validate 'num_comms' communicators, and return a failed group.
>> >>   - or -
>> >> MPI_Comm_validate_many(comm[], num_comms, failed_grps[])
>> >> Validate 'num_comms' communicators, and return a failed group for
>> >> each communicator.
>> >> ----
>> >>
>> >> In this version of the operation the user passes in an array of
>> >> pointers to communicators. Since communicators are not often created
>> >> in a contiguous array, pointers to communicators should probably be
>> >> used. failed_grps is an array of the failed groups for each of those
>> >> communicators.
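>> >>
>> >> A possible usage sketch (the prototype is still a proposal, so the
>> >> exact argument types are assumed here; MPI_Group is used for the
>> >> failed groups):
>> >> ----
>> >> MPI_Comm  *comms[3]  = { &commA, &commB, &commC };
>> >> MPI_Group  failed[3];
>> >>
>> >> MPI_Comm_validate_many(comms, 3, failed);
>> >> /* failed[i] is the group of failed processes in *comms[i] */
>> >> ----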
>> >>
>> >> Some questions:
>> >>  * Should all processes pass in the same set of communicators?
>> >>  * Should all communicators be duplicates or subsets of one another?
>> >>  * Does this operation run the risk of a circular dependency if the
>> >> user does not pass in the same set of communicators at all
>> >> participating processes? Is that something the MPI library should
>> >> protect the application from?
>> >>
>> >>
>> >> Option B:
>> >> Implicit inherited validation
>> >> --------------------
>> >> MPI_Comm_validate_many(comm, failed_grp)
>> >> ----
>> >>
>> >> The idea is to add an additional semantic (or maybe a new API) that
>> >> allows the validation of a communicator to automatically validate all
>> >> communicators created from it (only dups and subsets of it?).
>> >>
>> >> The problem with this is that if an application calls
>> >> MPI_Comm_validate on MPI_COMM_WORLD, it changes the semantics of
>> >> communicators that libraries might be using internally, without
>> >> notifying those libraries. So this breaks the abstraction barrier
>> >> between the two in a possibly dangerous way.
>> >>
>> >> Some questions:
>> >>  * Are there some other semantics that we can add to help protect
>> >> libraries? (e.g., after implicit validation the first use of the
>> >> communicator will return a special error code indicating that the
>> >> communicator has been adjusted).
>> >>  * Are there thread safety issues involved with this? (e.g., the
>> >> library operates in a concurrent thread with its own duplicate of the
>> >> communicator. The application does not know about or control the
>> >> concurrent thread, but calls MPI_Comm_validate on its own communicator
>> >> and implicitly changes the semantics of the duplicate communicator.)
>> >>  * It is only through the call to MPI_Comm_validate that we can
>> >> provide a uniform, globally known group of failed processes. For those
>> >> that were implicitly validated, do we need to provide a way to access
>> >> this group after the call? Does this have implications on the amount
>> >> of storage required for this semantic?
>> >>
>> >>
>> >> Option C:
>> >> Explicit inherited validation
>> >> --------------------
>> >> MPI_Comm_validate_link(commA, commB);
>> >> MPI_Comm_validate_many(commA, failed_grp)
>> >> /* Implies MPI_Comm_validate(commB, NULL) */
>> >>
>> >> MPI_Comm_validate(commA, failed_grp)
>> >> /* Does not imply MPI_Comm_validate(commB, NULL) */
>> >> ----
>> >>
>> >> In this version the application explicitly links communicators. This
>> >> prevents an application from implicitly altering derived communicators
>> >> outside of its scope (e.g., those in use by other libraries).
>> >>
>> >> Some questions:
>> >>  * It is only through the call to MPI_Comm_validate that we can
>> >> provide a uniform, globally known group of failed processes. For those
>> >> that were implicitly validated, do we need to provide a way to access
>> >> this group after the call (e.g., for commB)? Does this have
>> >> implications on the amount of storage required for this semantic?
>> >>  * Do we need a mechanism to 'unlink' communicators? Or determine
>> >> which communicators are linked?
>> >>  * Can a communicator be linked to multiple other communicators?
>> >>  * Is the linking a unidirectional operation? (so in the example above
>> >> validating commB does not validate commA unless there is a separate
>> >> MPI_Comm_validate_link(commB, commA) call)
>> >>
>> >>
>> >> Option D:
>> >> Other
>> >> --------------------
>> >> Something else...
>> >>
>> >>
>> >> Thoughts?
>> >>
>> >> -- Josh
>> >>
>> >>
>> >> --
>> >> Joshua Hursey
>> >> Postdoctoral Research Associate
>> >> Oak Ridge National Laboratory
>> >> http://users.nccs.gov/~jjhursey
>> >
>> >
>> >
>>
>>
>>
>> --
>> Joshua Hursey
>> Postdoctoral Research Associate
>> Oak Ridge National Laboratory
>> http://users.nccs.gov/~jjhursey
>>
>
>
>



-- 
Joshua Hursey
Postdoctoral Research Associate
Oak Ridge National Laboratory
http://users.nccs.gov/~jjhursey



