[Mpi3-ft] Stabilization Updated & MPI_Comm_size question

Bronis R. de Supinski bronis at llnl.gov
Fri Sep 17 07:22:59 CDT 2010


I agree with Rich and Darius.

On Fri, 17 Sep 2010, Darius Buntinas wrote:

>
> I don't think we should change the standard in this case.  For MPI_Comm_size to have any useful meaning, it needs to return the size of the communicator: i.e., if comm_size returns N, you should be able to do a send to processes 0 through N-1.  Of course if some of those processes have failed, you'll get an error associated with the process failure, but never an error for an invalid rank.
>
> As discussed in the section about mpiexec, an implementation may decide to provide a soft process count argument.  So "mpiexec -n 10 -soft 5 ./cpi" can start any number between 5 and 10 processes.  But that does not affect the meaning of the size of MPI_COMM_WORLD: regardless of the number of processes the implementation decides to start, MPI_Comm_size will return the _actual_ number of processes started.
>
> -d
>
> On Sep 17, 2010, at 11:22 AM, Graham, Richard L. wrote:
>
>> We need to clearly define what N or M is and not leave it to the implementation.  100% of the codes that I have seen over the past 15 years that check this value use it to indicate how many processes have started.  Anything else is really useless, aside from letting the user find out how many processes actually started up, and then know how many did not start up.
>>
>> Rich
>>
>>
>> On 9/17/10 4:27 AM, "Josh Hursey" <jjhursey at open-mpi.org> wrote:
>>
>> So the Run-Through Stabilization proposal has been updated per our discussion in the working group meeting at the MPI Forum. The changes are summarized below:
>> - Add a Legacy Library Support example
>> - Clarify new error classes
>> - Update the MPI_Init and MPI_Finalize wording to be simpler and more direct.
>> - Fix wording of group creation calls versus communicator creation calls.
>>
>> https://svn.mpi-forum.org/trac/mpi-forum-web/wiki/ft/run_through_stabilization
>>
>>
>> One question that we discussed quite a bit during the meeting was the issue of the return value of MPI_Comm_size() when processes fail during launch. I attempted to capture the discussion in the room in the Open Question attached to the discussion of MPI_Init:
>> https://svn.mpi-forum.org/trac/mpi-forum-web/wiki/ft/run_through_stabilization#MPI_INIT
>>
>> Open question:
>> If the user asks to start N processes on the command line, and only M processes were successfully launched (where M < N), then what should be returned from MPI_COMM_SIZE?
>>
>> The return value must be consistent across all alive members of the group. The issue is whether it should return N or M.
>>
>> The feeling in the room was that since the MPI standard does not define the ability for the user to ask for a specific number of processes before init, it is hard to define that this is the number it should be.
>>
>> So it is left to the implementation whether it is M or N. If it is M, then the user has other techniques to find out what it originally asked for (e.g., by passing it as a command line argument to the application itself).
>>
>>
>> What do people think about the MPI_Comm_size issue?
>>
>> -- Josh
>>
>> ------------------------------------
>> Joshua Hursey
>> Postdoctoral Research Associate
>> Oak Ridge National Laboratory
>> http://www.cs.indiana.edu/~jjhursey
>>
>>
>> _______________________________________________
>> mpi3-ft mailing list
>> mpi3-ft at lists.mpi-forum.org
>> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-ft
>
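
For illustration only, here is a minimal sketch of the workaround mentioned in the quoted thread: the application is handed the requested count N as its own command-line argument and compares it with the value M returned by MPI_Comm_size. It assumes the interpretation Darius describes, in which MPI_Comm_size reports the number of processes actually started. The helpers missing_processes and parse_count and the USE_MPI build switch are hypothetical names made up for this sketch; they are not part of MPI or of the proposal.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not an MPI call): how many of the requested
 * processes are absent from the communicator. Returns -1 when the
 * inputs are unusable (non-positive, or actual > requested). */
int missing_processes(int requested, int actual)
{
    if (requested <= 0 || actual <= 0 || actual > requested)
        return -1;
    return requested - actual;
}

/* Hypothetical helper: parse a positive decimal count; -1 on error. */
int parse_count(const char *s)
{
    char *end;
    long v = strtol(s, &end, 10);
    if (end == s || *end != '\0' || v <= 0 || v > 1000000)
        return -1;
    return (int)v;
}

#ifdef USE_MPI  /* assumed build line: mpicc -DUSE_MPI example.c */
#include <mpi.h>

int main(int argc, char **argv)
{
    int size, rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* M: processes in the communicator */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* N: the count the user says was requested, passed as argv[1]. */
    int requested = (argc > 1) ? parse_count(argv[1]) : -1;
    int missing = missing_processes(requested, size);

    if (rank == 0) {
        if (missing < 0)
            printf("started %d processes (requested count unknown)\n", size);
        else
            printf("started %d of %d requested processes (%d missing)\n",
                   size, requested, missing);
    }

    MPI_Finalize();
    return 0;
}
#endif
```

With an implementation that accepts the soft count from Darius's example, this might be launched as "mpiexec -n 10 -soft 5 ./a.out 10". Whether -soft is available depends on the implementation. Without -DUSE_MPI only the two helpers are compiled, so they can be checked without an MPI installation.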


