[Mpi3-ft] Nonblocking Process Creation and Management
Josh Hursey
jjhursey at open-mpi.org
Tue May 4 17:05:39 CDT 2010
I have updated the proposal on the wiki to reflect some of the notes from our meeting today at the forum.
One thing that was discussed but is not reflected in the proposal was whether any of the nonblocking interfaces were extraneous. The group seemed to think that each of the nonblocking calls identified in the proposal was useful to add to the standard. David, this was one of your concerns. If you want to touch base before the next call, I can discuss some of the reasoning presented in the meeting (we could also briefly discuss it in the next teleconf).
Thanks for all the feedback so far, keep it coming :)
-- Josh
On Apr 29, 2010, at 2:29 PM, Josh Hursey wrote:
>
> On Apr 29, 2010, at 11:30 AM, Solt, David George wrote:
>
>> Josh,
>>
>> I won't respond to everything right now, but I'll write or talk to you more about it later. Your point about all of these collectives being
>> Barrier-style collectives is a very good one. I'll need to think through that more. One quick point:
>>
>>> However I might be misunderstanding the problem that you are trying to highlight in your example. If so, can you elaborate a bit more?
>>
>> The example is asking: if one rank calls a collective and cancels it, are other ranks still required to call the collective? In my example, only rank n-1 called MPI_Icomm_accept, and it expects the MPI_Wait not to hang even though no other ranks have made matching MPI_Icomm_accept calls. I think the two arguments are: 1) it is illegal because all ranks must call a collective; 2) it is legal because the rank that called the collective cancelled it, so it is as if it was never called.
>
> Ah, I understand now. I believe that this is just an incorrect program, since the collective must be called by all processes in the communicator. The program is therefore illegal because only one process joined the collective. Even though that process turned around and canceled the operation, the semantics of the collective operation were never satisfied, so Wait should never return.
>
> Do others think that is a fair classification of this scenario? Should we explicitly call this out in the standard text?
>
> Thanks,
> Josh
>
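The classification above (every rank must enter the collective before a wait on it may return) can be sketched as a small predicate. This is a toy model in plain C, not MPI code: req_state_t, all_ranks_joined and observed_state are hypothetical names invented for illustration, and the sketch assumes the root honours a cancel request once the collective has been fully entered.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical states a waiting rank could observe for its request. */
typedef enum { REQ_PENDING, REQ_COMPLETED, REQ_CANCELLED } req_state_t;

/* True only when every rank of the communicator has entered the collective. */
static bool all_ranks_joined(const bool joined[], int nranks) {
    assert(nranks > 0);
    for (int r = 0; r < nranks; r++)
        if (!joined[r])
            return false;
    return true;
}

/*
 * State a waiting rank would observe under the classification above.
 * A cancel request only matters once the collective is fully entered;
 * until then the request stays pending, so a wait on it would not return.
 * Assumption: once everyone has joined, the root grants the cancel.
 */
static req_state_t observed_state(const bool joined[], int nranks,
                                  bool cancel_requested) {
    if (!all_ranks_joined(joined, nranks))
        return REQ_PENDING;
    return cancel_requested ? REQ_CANCELLED : REQ_COMPLETED;
}
```

With four ranks where only the last one has joined, observed_state reports REQ_PENDING even when a cancel was requested, which mirrors the argument that the example program would hang in Wait.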
>>
>> Thanks,
>> Dave
>>
>> -----Original Message-----
>> From: mpi3-ft-bounces at lists.mpi-forum.org [mailto:mpi3-ft-bounces at lists.mpi-forum.org] On Behalf Of Josh Hursey
>> Sent: Thursday, April 29, 2010 10:18 AM
>> To: MPI 3.0 Fault Tolerance and Dynamic Process Control working Group
>> Subject: Re: [Mpi3-ft] Nonblocking Process Creation and Management
>>
>> Thanks for the feedback, more below.
>>
>> On Apr 27, 2010, at 11:53 PM, Solt, David George wrote:
>>
>>> This document says:
>>>
>>> "This call starts a nonblocking variant of MPI_COMM_SPAWN. It is
>>> erroneous to call MPI_REQUEST_FREE or MPI_CANCEL for the MPI_REQUEST
>>> associated with the MPI_ICOMM_SPAWN operation.
>>>
>>> If a MPI_REQUEST for MPI_ICOMM_SPAWN or MPI_ICOMM_SPAWN_MULTIPLE is
>>> marked for cancellation using MPI_CANCEL, then it must be the case
>>> that either the operation completed ...."
>>>
>>> Maybe the first paragraph was meant for deletion?
>>
>> Thanks for catching that. Fixed now.
>>
>>> It conflicts with the 2nd one.
>>>
>>> "It is a valid behavior for the MPI_COMM_ACCEPT call to
>>> timeout with accepting connections, and should not be considered
>>> an error."
>>>
>>> I think 'with' should be 'without' in the above text.
>>
>> Yes, it should be 'without'. Good catch, thanks.
>>
>>>
>>> I haven't been focused on any discussions about cancel, so apologies
>>> if my concerns have already been discussed. We have only
>>> implemented non-blocking MPI_Icomm_accept for a singleton accepting
>>> from another singleton so far since that's what we needed. As I try
>>> to expand this to the general case I'm getting concerned about the
>>> difficulty of canceling any of these non-blocking collectives.
>>
>> Yeah. It is going to be a point of concern, if only for the fact
>> that we are expanding the use of MPI_Cancel to include these
>> collective operations and not others (previously, cancel was only
>> defined for point-to-point communication).
>>
>>>
>>> I doubt that we can rely on non-blocking collectives to implement
>>> MPI_Icomm_join, MPI_Icomm_accept, etc. since the non-blocking
>>> collective group has decided they can't be cancelled (at least not
>>> before the point where the operation becomes un-cancellable). If
>>> implementers internally allow cancelling collectives then why not
>>> expose cancel of NB collectives to the users? Has this group talked
>>> to the NB collectives group about why cancel was not allowed for NB
>>> collectives? I am aware that the need to cancel an outstanding
>>> MPI_Comm_accept was a motivator for this whole strategy. We may
>>> want some restriction, such as MPI_Cancel can only be done at the
>>> root of an MPI_Icomm_accept or connect.
>>
>> Applying MPI_Cancel more broadly to the entire class of NB collectives
>> becomes difficult since many of the collectives have a 'leave-early'
>> type of semantic which makes it unclear what cancel could mean if some
>> of the processes have already left the collective (maybe it just
>> becomes non-cancelable at that point). I agree that we should talk
>> with the collectives group about this a bit more. I hope we will get a
>> chance to next week at the forum.
>>
>> The one interesting aspect of the collective operations in the
>> dynamics chapter is that they are all communicator creation
>> collectives. So they have an implicit barrier at the bottom to either
>> commit/create the new communicator or not as decided by the explicit
>> or implicit 'root' of the collective. So the root can decide when/if
>> the communicator creation call should be canceled while creating it.
>>
>> So we discussed whether or not only the root should be allowed to call
>> cancel. After discussion in the last meeting and in the teleconfs, we
>> felt that it was relatively easy for the MPI implementation to provide
>> the MPI_Cancel from any participating process. A non-root process
>> participating in the MPI_Comm_connect (for example) could call MPI_Cancel
>> which will relay a message to the root of the collective indicating
>> that rank X requested that the operation be canceled. The root then
>> decides whether or not to cancel the operation, and sends out the
>> decision to all participating processes (which are waiting for a
>> decision to determine if they have created a communicator or not).
>>
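The relay scheme described above (any rank may request a cancel, the root alone decides, and every rank receives the same decision) can be sketched as follows. This is a toy simulation in plain C, not MPI code: NRANKS, root_state_t, relay_cancel, root_decide and broadcast_decision are hypothetical names, and the policy that a cancel loses once the peer has already connected is an assumption made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define NRANKS 4  /* hypothetical communicator size */

typedef enum { DECISION_PENDING, DECISION_SUCCESS, DECISION_CANCELLED } decision_t;

/* What the (explicit or implicit) root knows about the operation. */
typedef struct {
    bool cancel_relayed[NRANKS];  /* cancel requests that reached the root */
    bool peer_connected;          /* remote side already matched the operation */
} root_state_t;

/* Local part of a cancel call: relay the request to the root. */
static void relay_cancel(root_state_t *root, int rank) {
    assert(rank >= 0 && rank < NRANKS);
    root->cancel_relayed[rank] = true;
}

/* The root alone decides. Assumed policy: an existing connection wins. */
static decision_t root_decide(const root_state_t *root) {
    if (root->peer_connected)
        return DECISION_SUCCESS;
    for (int r = 0; r < NRANKS; r++)
        if (root->cancel_relayed[r])
            return DECISION_CANCELLED;
    return DECISION_PENDING;  /* nothing to decide yet */
}

/* Every participating rank receives the same decision from the root. */
static void broadcast_decision(decision_t d, decision_t seen_by[NRANKS]) {
    for (int r = 0; r < NRANKS; r++)
        seen_by[r] = d;
}
```

In this model a cancel from rank 2 before any peer connects leads every rank to see DECISION_CANCELLED, while a cancel that arrives after the connection exists leaves the decision at DECISION_SUCCESS, which is the case a caller would detect through MPI_Test_cancelled.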
>> So that was the line of reasoning that brought us to the current
>> MPI_Cancel semantics for the collective operations in the dynamics
>> chapter. I don't know if this line of reasoning will apply more
>> broadly to the collectives chapter for NB collectives.
>>
>>>
>>> Is the following code legal MPI code?
>>> {
>>>   MPI_Init(..);
>>>   ....
>>>   if (rank == size-1) {
>>>     MPI_Icomm_accept(......, 0, comm, &newcomm, &req);
>>>     MPI_Cancel(&req);
>>>     MPI_Wait(&req, MPI_STATUS_IGNORE);
>>>   }
>>>   MPI_Finalize();
>>> }
>>>
>>
>> Yes, but with one modification (add MPI_Test_cancelled):
>> {
>>   MPI_Init(..);
>>   ....
>>   if (rank == size-1) {
>>     MPI_Icomm_accept(......, 0, comm, &newcomm, &req);
>>     MPI_Cancel(&req);  // Local operation (relayed to root as needed)
>>     MPI_Wait(&req, &status);
>>     MPI_Test_cancelled(&status, &flag);
>>     if (flag) {
>>       // Successfully canceled
>>     } else {
>>       // Cancel failed; accept completed the connection
>>     }
>>   }
>>   MPI_Finalize();
>> }
>>
>>> If it is legal, I think cancel is going to add a huge amount of
>>> overhead. If it is not legal, then things become easier, but I
>>> think the additional overhead of ensuring consensus between ranks
>>> will outweigh any performance gains we think these routines will
>>> provide.
>>
>> We already need to ensure consensus between the ranks in the 'comm'
>> communicator in order to commit/create the communicator in the first
>> place (if only for the cid allocation). So this is already a part
>> of the blocking versions of these calls. I don't see how adding cancel
>> will add any more overhead since at the bottom of the operation we are
>> just either sending 'success' or 'cancelled' (or 'failed') to all
>> participating processes.
>>
>> However I might be misunderstanding the problem that you are trying to
>> highlight in your example. If so, can you elaborate a bit more?
>>
>>> I'm against the "non-blocking version of everything" approach. I
>>> understand the desire for orthogonality, but this is MPI Standard
>>> bloating to me. I am pretty much bound by my funding management to
>>> vote for any proposal that includes MPI_Icomm_accept or any other
>>> functionality we need. However, I really don't like it. Does
>>> anyone want MPI_Iunpublish_name? What % of applications call
>>> MPI_Unpublish_name? What % of their execution time is spent in
>>> MPI_Unpublish_name? I think we should look at each call and
>>> consider each individually rather than just make non-blocking
>>> versions of all of them.
>>
>> I agree that we should consider each of the functions to determine
>> whether a nonblocking version has merit. If we can think of a
>> reasonable use case for it, then I am inclined to add it to the
>> standard. However, I do not agree that a function or feature should be
>> excluded from consideration just because most applications do not
>> currently use it, or because usage falls under a certain percentage.
>> Take MPI_Comm_accept() as an example: many applications that may have
>> wanted to use it chose not to because it did not have a nonblocking
>> version (leading to non-standard nonblocking implementations of it).
>> It is impossible for us to quantify the number of users who do not
>> use a function because it is perceived as deficient. What we can do is
>> this: if we can convince ourselves that there is a real use case for
>> the function, then we should be open to including it. If we cannot,
>> then we should leave it out of the standard for now.
>>
>> That is a long way around to saying that, overall, I agree with you
>> that we should consider the usefulness of these functions before
>> pressing them into the standard. In the first draft of this proposal I
>> created a nonblocking version of all the functions that I could
>> convince myself of use cases for (particularly those functions that
>> rely on an external resource for completion). But I am open to
>> reducing the proposal if folks feel that some of the nonblocking
>> functions are unnecessary. Slightly unrelated: I am not, however, open
>> to the idea of pushing for deprecation of any functions, as that
>> should be considered in a completely separate proposal.
>>
>> As a side note, the reason for the nonblocking 'publish' and
>> 'unpublish' is that these operations may (often) rely on an
>> external name server. The name server may be slow, so having the
>> nonblocking versions limits the impact of the slow server on the
>> application. The 'timeout' keyword has a similar purpose, but is
>> primarily intended for the blocking versions. So the question is: can
>> we convince ourselves that there is a use case for an application to
>> prefer the nonblocking version over a blocking+timeout version?
>>
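The use-case question above (nonblocking versus blocking with a timeout when the name server is slow) can be illustrated with a toy model. This is a sketch in plain C, not MPI code: fake_ipublish, fake_test, the two driver functions and the tick-based latency are hypothetical stand-ins invented for illustration, and the interfaces in the proposal may behave differently.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a pending nonblocking publish request. */
typedef struct {
    int ticks_left;  /* simulated time until the name server answers */
} fake_request_t;

/* Start the simulated operation; returns immediately, like an MPI_I* call. */
static fake_request_t fake_ipublish(int server_latency_ticks) {
    assert(server_latency_ticks >= 0);
    fake_request_t req = { server_latency_ticks };
    return req;
}

/* Poll once, in the spirit of MPI_Test: advance one tick, report completion. */
static bool fake_test(fake_request_t *req) {
    if (req->ticks_left > 0)
        req->ticks_left--;
    return req->ticks_left == 0;
}

/* Nonblocking style: do one unit of application work per unsuccessful poll. */
static int work_overlapped_nonblocking(int server_latency_ticks) {
    fake_request_t req = fake_ipublish(server_latency_ticks);
    int work_done = 0;
    while (!fake_test(&req))
        work_done++;
    return work_done;
}

/*
 * Blocking + timeout style: the caller idles until the server answers or
 * the timeout expires. Returns whether the publish completed and stores
 * the number of idle ticks in *idle_ticks.
 */
static bool blocking_with_timeout(int server_latency_ticks, int timeout_ticks,
                                  int *idle_ticks) {
    bool completed = server_latency_ticks <= timeout_ticks;
    *idle_ticks = completed ? server_latency_ticks : timeout_ticks;
    return completed;
}
```

In this model the nonblocking caller keeps working for as long as the server takes, while the blocking caller either idles for the full latency or gives up at the timeout without having published. Whether real applications have useful work to overlap is exactly the open question raised above.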
>> Thanks for the feedback :)
>>
>> -- Josh
>>
>>>
>>> Dave S.
>>>
>>> -----Original Message-----
>>> From: mpi3-ft-bounces at lists.mpi-forum.org [mailto:mpi3-ft-bounces at lists.mpi-forum.org] On Behalf Of Josh Hursey
>>> Sent: Wednesday, April 21, 2010 9:44 AM
>>> To: MPI 3.0 Fault Tolerance and Dynamic Process Control working Group
>>> Subject: Re: [Mpi3-ft] Nonblocking Process Creation and Management
>>>
>>> I updated the Nonblocking Process Creation and Management proposal on
>>> the wiki:
>>> https://svn.mpi-forum.org/trac/mpi-forum-web/wiki/Async-proc-mgmt
>>>
>>> The new version reflects conversations over the past couple months
>>> about the role of MPI_Cancel in the various nonblocking interfaces,
>>> and some touchups on the timeout language.
>>>
>>> I think the proposal is pretty stable at the moment. If you have
>>> any issues with the current proposal let me know either on the list or
>>> the teleconf.
>>>
>>> Thanks,
>>> Josh
>>>
>>> On Jan 12, 2010, at 5:03 PM, Josh Hursey wrote:
>>>
>>>> I extended and cleaned up the Nonblocking Process Creation and
>>>> Management proposal on the wiki:
>>>> https://svn.mpi-forum.org/trac/mpi-forum-web/wiki/Async-proc-mgmt
>>>>
>>>> I added the rest of the nonblocking interface proposals, and touched
>>>> up some of the language. I do not have an implementation yet, but
>>>> will work on that next. There are a few items that I need to refine
>>>> a bit still (e.g., MPI_Cancel, mixing of blocking and nonblocking),
>>>> but this should give us a foundation to start from.
>>>>
>>>> I would like to talk about this next week during our working group
>>>> slot at the MPI Forum meeting.
>>>>
>>>> Let me know what you think, and if you see any problems.
>>>>
>>>> Thanks,
>>>> Josh
>>>>
>>>> _______________________________________________
>>>> mpi3-ft mailing list
>>>> mpi3-ft at lists.mpi-forum.org
>>>> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-ft