[Mpi3-ft] Stabilization Updated & MPI_Comm_size question
Darius Buntinas
buntinas at mcs.anl.gov
Mon Sep 20 09:51:15 CDT 2010
How would it be specified?
"To start an MPI application, the user should call 'mpiexec' or another command, which shall either start the number of MPI processes that the user requested or fail with error code XXX."
This would make any runtime that provides the -soft parameter noncompliant. Do we really want to do this?
Alternatively, we could define it something like:
"To start an MPI application, the user should call 'mpiexec' or another command, which shall either start the number of MPI processes that the user requested, start another number of MPI processes depending on other parameters specified by the user, or fail with an error code."
But isn't this what we have now? What's wrong with that?
-d
On Sep 20, 2010, at 9:39 AM, Graham, Richard L. wrote:
> I strongly disagree here. This is such a basic bit of data that we clearly need to specify the output.
>
> Rich
>
> ----- Original Message -----
> From: Joshua Hursey [mailto:jjhursey at open-mpi.org]
> Sent: Monday, September 20, 2010 10:09 AM
> To: MPI 3.0 Fault Tolerance and Dynamic Process Control working Group <mpi3-ft at lists.mpi-forum.org>
> Subject: Re: [Mpi3-ft] Stabilization Updated & MPI_Comm_size question
>
> This is an interesting discussion (I am glad it is archived).
>
> I feel that Darius makes a strong argument for leaving it undefined in the standard, for many of the reasons that have been mentioned. I think an implementation can then decide whether one behavior is more important to its users than another. It is my opinion that defining it does not gain us much, since most applications base their initialization on the result of MPI_Comm_size() instead of requiring a specific number of processes. Further, if they do require a specific number, they can interpret the result of MPI_Comm_size() and choose to abort or work around the difference.
>
> -- Josh
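A minimal sketch of the check Josh describes, written in plain C so it compiles without an MPI installation. The enum values, the function name `decide_on_world_size`, and the separate "minimum usable" count are hypothetical illustrations, not part of MPI or of the proposal; in a real program `actual` would come from `MPI_Comm_size(MPI_COMM_WORLD, &actual)`.

```c
/* Possible reactions of an application to the size of MPI_COMM_WORLD.
 * These names are hypothetical; they are not defined by MPI. */
typedef enum {
    SIZE_PROCEED, /* got exactly the expected number of processes */
    SIZE_ADAPT,   /* got a different count, but enough to run */
    SIZE_ABORT    /* too few processes to do useful work */
} size_decision;

/* expected:   process count the user intended (e.g., passed on the command line)
 * actual:     in an MPI program, the value from MPI_Comm_size(MPI_COMM_WORLD, &actual)
 * min_usable: smallest count the application can still work with (assumption) */
size_decision decide_on_world_size(int expected, int actual, int min_usable)
{
    if (actual == expected)
        return SIZE_PROCEED;
    if (actual >= min_usable)
        return SIZE_ADAPT;
    return SIZE_ABORT;
}
```

On `SIZE_ABORT`, an application would typically call `MPI_Abort(MPI_COMM_WORLD, 1)`; on `SIZE_ADAPT`, it would redistribute its work over `actual` processes.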
>
> On Sep 17, 2010, at 11:00 AM, Graham, Richard L. wrote:
>
>> Why would we select one behaviour over another? Right now users use the value returned from mpi_comm_size() to determine how to set up their parallel job, so telling them what they actually got seems like the most obvious return value to have, especially if we do not provide a way to get both what they asked for and what they actually got. Having this be implementation specific does not seem prudent, since users would have to know which implementation they are using to figure out what to do with the data. I believe this is very different from an implementation's choice between an eager protocol and a rendezvous protocol for large messages, which does not have correctness implications.
>>
>> Also, at startup the user has not set anything up yet with respect to MPI, so forming a dense communicator is well defined and the easiest thing to do. Once a failure occurs, I will argue exactly the opposite, based on the outcome of such a decision and the fact that we will provide users a way within MPI to figure out what has happened.
>>
>> Rich
>>
>> On 9/17/10 10:14 AM, "Darius Buntinas" <buntinas at mcs.anl.gov> wrote:
>>
>>
>>
>> I don't think we need to choose one or the other (in fact I feel strongly that we should not force one behavior or the other). The choice to have MPI_COMM_WORLD contain N or M processes (or failing if it can't get all N) is implementation dependent. Presumably the behavior would be user selectable (e.g., using a -soft option or something similar to mpiexec).
>>
>> The user would use the mechanisms we will provide to deal with any dead processes (e.g., validating a communicator, etc).
>>
>> -d
>>
>> On Sep 17, 2010, at 4:02 PM, Bronevetsky, Greg wrote:
>>
>>> I agree that MPI_Comm_size should return the number of ranks in the communicator regardless of whether they're operational or not. However, this just pushes the question further back: if the user asked for N processes but only M have started up, how many ranks should MPI_COMM_WORLD have? Either choice is going to be self-consistent from MPI's point of view. If it is N, then some ranks will be dead. If it is M, then the application may not have enough processes to work with. The former (N) case has the property that the user doesn't need to add code to check for this condition, since their existing error-checking code will catch this situation. The latter (M) case is nice because it is cheaper to check whether we got fewer processes than expected than to explicitly try to communicate with them.
>>>
>>> As such, I don't see a strong motivation for choosing either. However, we should just pick one and stick with it to avoid unnecessary API divergence.
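To make Greg's cost comparison concrete, here is a hedged sketch in plain C. The `alive` array is an assumption standing in for whatever per-rank liveness check a fault-tolerance interface might offer; it is not an MPI call. Under policy N the application has to examine every rank, while under policy M a single comparison suffices.

```c
#include <stdbool.h>

/* Policy N: MPI_COMM_WORLD keeps all N requested ranks, and ranks that
 * never started are dead. Counting them requires visiting every rank.
 * alive[r] is a hypothetical stand-in for a per-rank liveness check. */
int missing_ranks_policy_n(const bool *alive, int world_size)
{
    int missing = 0;
    for (int r = 0; r < world_size; r++)
        if (!alive[r])
            missing++;
    return missing;
}

/* Policy M: MPI_COMM_WORLD contains only the M processes that started,
 * so comparing the world size with the expected count is enough. */
int missing_ranks_policy_m(int expected, int world_size)
{
    return expected > world_size ? expected - world_size : 0;
}
```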
>>>
>>> Greg Bronevetsky
>>> Lawrence Livermore National Lab
>>> (925) 424-5756
>>> bronevetsky at llnl.gov
>>> http://greg.bronevetsky.com
>>>
>>>> -----Original Message-----
>>>> From: mpi3-ft-bounces at lists.mpi-forum.org [mailto:mpi3-ft-bounces at lists.mpi-forum.org] On Behalf Of Bronis R. de Supinski
>>>> Sent: Friday, September 17, 2010 5:23 AM
>>>> To: MPI 3.0 Fault Tolerance and Dynamic Process Control working Group
>>>> Subject: Re: [Mpi3-ft] Stabilization Updated & MPI_Comm_size question
>>>>
>>>>
>>>> I agree with Rich and Darius.
>>>>
>>>> On Fri, 17 Sep 2010, Darius Buntinas wrote:
>>>>
>>>>>
>>>>> I don't think we should change the standard in this case. For MPI_Comm_size to have any useful meaning, it needs to return the size of the communicator: i.e., if comm_size returns N, you should be able to do a send to processes 0 through N-1. Of course, if some of those processes have failed, you'll get an error associated with the process failure, but never an error for an invalid rank.
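The rule Darius states can be modeled as follows. This is an illustrative sketch with hypothetical names: in real MPI, an out-of-range destination is reported with the error class `MPI_ERR_RANK` (observable when the communicator's error handler is `MPI_ERRORS_RETURN`), while the error class for a failed peer is exactly what the fault-tolerance proposal is defining, so `SEND_PEER_FAILED` is only a placeholder.

```c
#include <stdbool.h>

/* Hypothetical outcome codes; not MPI error classes. */
typedef enum {
    SEND_OK,           /* valid rank, process alive */
    SEND_PEER_FAILED,  /* valid rank, but that process has failed */
    SEND_INVALID_RANK  /* rank outside 0 .. comm_size-1 (MPI_ERR_RANK in MPI) */
} send_outcome;

/* alive[r] is an assumed stand-in for the liveness of rank r.
 * MPI_PROC_NULL is deliberately not modeled here. */
send_outcome classify_send(int dest, int comm_size, const bool *alive)
{
    if (dest < 0 || dest >= comm_size)
        return SEND_INVALID_RANK;
    return alive[dest] ? SEND_OK : SEND_PEER_FAILED;
}
```

Every rank below the value returned by `MPI_Comm_size` is a legal destination; only its liveness varies.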
>>>>>
>>>>> As discussed in the section about mpiexec, an implementation may decide to provide a soft process count argument. So "mpiexec -n 10 -soft 5 ./cpi" can start any number of processes between 5 and 10. But that does not affect the meaning of the size of MPI_COMM_WORLD: regardless of the number of processes the implementation decides to start, MPI_Comm_size will return the _actual_ number of processes started.
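A sketch of the launch policy implied by `mpiexec -n 10 -soft 5`, assuming (as in Darius's example) that the `-soft` value acts as a simple lower bound. The function name and this simplified interpretation are assumptions; a real launcher may accept a richer description of acceptable counts. Whatever count the function returns is what `MPI_Comm_size` on `MPI_COMM_WORLD` would then report.

```c
/* Returns the number of processes a launcher would start, or -1 on failure.
 * requested: the hard count (the -n value)
 * soft_min:  smallest acceptable count (the -soft value, treated here as a
 *            simple minimum); pass soft_min == requested to model no -soft
 * available: how many processes the system can actually start */
int soft_launch_count(int requested, int soft_min, int available)
{
    if (available >= requested)
        return requested;   /* full request satisfied */
    if (available >= soft_min)
        return available;   /* partial start permitted by -soft */
    return -1;              /* cannot meet even the minimum: fail */
}
```

With `requested = 10`, `soft_min = 5`, and only 7 processes available, the model starts 7, and every started process would see a world size of 7.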
>>>>>
>>>>> -d
>>>>>
>>>>> On Sep 17, 2010, at 11:22 AM, Graham, Richard L. wrote:
>>>>>
>>>>>> We need to clearly define what N or M is and not leave it to the implementation. 100% of the codes I have seen over the past 15 years that check this value use it to indicate how many processes have started. Anything else is really useless, aside from letting the user find out how many processes actually started up, and then know how many did not start up.
>>>>>>
>>>>>> Rich
>>>>>>
>>>>>>
>>>>>> On 9/17/10 4:27 AM, "Josh Hursey" <jjhursey at open-mpi.org> wrote:
>>>>>>
>>>>>> So the Run-Through Stabilization proposal has been updated per our discussion in the working group meeting at the MPI Forum. The changes are summarized below:
>>>>>> - Add a Legacy Library Support example
>>>>>> - Clarify new error classes
>>>>>> - Update the MPI_Init and MPI_Finalize wording to be simpler and more direct.
>>>>>> - Fix wording of group creation calls versus communicator creation calls.
>>>>>>
>>>>>> https://svn.mpi-forum.org/trac/mpi-forum-web/wiki/ft/run_through_stabilization
>>>>>>
>>>>>>
>>>>>> One question that we discussed quite a bit during the meeting was the issue of the return value of MPI_Comm_size() when processes fail during launch. I attempted to capture the discussion in the room in the Open Question attached to the discussion of MPI_Init:
>>>>>> https://svn.mpi-forum.org/trac/mpi-forum-web/wiki/ft/run_through_stabilization#MPI_INIT
>>>>>>
>>>>>> Open question:
>>>>>> If the user asks to start N processes on the command line, and only M processes were successfully launched (where M < N), then what should be returned from MPI_COMM_SIZE?
>>>>>>
>>>>>> The return value must be consistent across all alive members of the group. The issue is whether it should return N or M.
>>>>>>
>>>>>> The feeling in the room was that since the MPI standard does not define the ability for the user to ask for a specific number of processes before init, it is hard to mandate that MPI_COMM_SIZE return that number.
>>>>>>
>>>>>> So it is left to the implementation whether it is M or N. If it is M, then the user has other techniques to find out what they originally asked for (e.g., by passing it as a command-line argument to the application itself).
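One way to pass the intended count, as suggested above, is to repeat it as an application argument, e.g. `mpiexec -n 10 ./app 10` (the program name `./app` and the helper `parse_requested_count` are hypothetical). A minimal, defensive parser for that argument, using only the C standard library:

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Parses a positive process count from a string such as argv[1].
 * Returns the count, or -1 if the argument is missing or not a valid
 * positive int. */
int parse_requested_count(const char *arg)
{
    if (arg == NULL || *arg == '\0')
        return -1;
    char *end;
    errno = 0;
    long value = strtol(arg, &end, 10);
    /* Reject overflow, trailing junk, and non-positive values. */
    if (errno != 0 || *end != '\0' || value <= 0 || value > INT_MAX)
        return -1;
    return (int)value;
}
```

The parsed value can then be compared against the result of `MPI_Comm_size` to detect that fewer processes than requested were started.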
>>>>>>
>>>>>>
>>>>>> What do people think about the MPI_Comm_size issue?
>>>>>>
>>>>>> -- Josh
>>>>>>
>>>>>> ------------------------------------
>>>>>> Joshua Hursey
>>>>>> Postdoctoral Research Associate
>>>>>> Oak Ridge National Laboratory
>>>>>> http://www.cs.indiana.edu/~jjhursey
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> mpi3-ft mailing list
>>>>>> mpi3-ft at lists.mpi-forum.org
>>>>>> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-ft
>
> ------------------------------------
> Joshua Hursey
> Postdoctoral Research Associate
> Oak Ridge National Laboratory
> http://www.cs.indiana.edu/~jjhursey
More information about the mpiwg-ft
mailing list