[mpiwg-hybridpm] Hybrid WG telecon tomorrow

Daniel Holmes dholmes at epcc.ed.ac.uk
Tue Jul 1 04:34:58 CDT 2014


Hi Jim,

General comment: the word "process" in the MPI Standard is not 
well-defined. IMHO, it should always be short-hand for "MPI process" but 
it is often taken to mean "OS process". Adding a new implicit meaning, 
i.e. "endpoint", is not helpful. There is a one-to-many relationship 
between OS process and MPI process. There is a one-to-many relationship 
between MPI process and endpoint. The term "process" is used to refer to 
the child, the parent, or the grandparent in this hierarchy with the 
context (hopefully) determining which meaning is intended.

I've re-read all of the new text looking for process/rank wording 
issues. There are some inconsistencies where "rank" in the parent 
communicator is still used.

Page 244, lines 36-37
%%CURRENT
associated with a single calling rank in parent_comm.
%%SUGGESTION
associated with a single calling process in parent_comm.

Page 244, line 39
%%CURRENT
at the corresponding rank in parent_comm.
%%SUGGESTION
at the corresponding process in parent_comm.

Page 244, lines 39-40
%%CURRENT
Ranks associated with a process in parent_comm are numbered contiguously 
in the output communicator, and
%%PROBLEM
"process in parent_comm" is intended to refer to "rank in parent_comm".
Example: create an endpoints communicator, use split to re-order the 
ranks then create another endpoints communicator.
The ranks associated with an OS process may not be contiguous - they may 
consist of several sub-groups (each of which is internally contiguous), 
one per local rank in parent_comm.
%%SUGGESTION1
Ranks in new_comm_handles are numbered contiguously in the output 
communicator and
%%SUGGESTION2 (preferred)
Ranks in new_comm_handles are numbered contiguously and
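
To make the example in the PROBLEM above concrete, here is a small sketch in plain Python (no MPI involved). It only simulates rank numbering. The helper names create_endpoints and reorder are made up for illustration and are not part of the proposal; reorder mimics a single-colour split that orders ranks by key, and the labels "A" and "B" stand for two OS processes.

```python
def create_endpoints(parent_owner, my_num_ep):
    """Simulate the rank numbering of an endpoints communicator.

    parent_owner[r] is the OS process that owns rank r in parent_comm.
    my_num_ep[r] is the number of endpoints requested by that rank.
    Returns a list whose entry k is the OS process owning output rank k.
    """
    out = []
    for owner, n in zip(parent_owner, my_num_ep):
        # Each calling process gets a contiguous block of output ranks,
        # in the order of the calling process in parent_comm.
        out.extend([owner] * n)
    return out


def reorder(owner, keys):
    """Mimic a split with a single colour: new ranks are ordered by key,
    with ties broken by the old rank."""
    order = sorted(range(len(owner)), key=lambda r: (keys[r], r))
    return [owner[r] for r in order]


world = ["A", "B"]                             # two OS processes, one rank each
first = create_endpoints(world, [2, 2])        # A, A, B, B
shuffled = reorder(first, keys=[0, 2, 1, 3])   # A, B, A, B
second = create_endpoints(shuffled, [2, 2, 2, 2])

ranks_of_a = [k for k, o in enumerate(second) if o == "A"]
print(ranks_of_a)
```

In this sketch the ranks owned by OS process A in the second endpoints communicator form two sub-groups, each internally contiguous, one per local rank in the (re-ordered) parent communicator.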

Page 244, lines 40-41
%%CURRENT
and the starting rank is defined by the order of the associated rank in 
the parent communicator.
%%PROBLEM
"associated rank" is intended to refer to "calling process"
%%SUGGESTION
and the starting rank is defined by the order of the calling process in 
parent_comm.

Other comments:

%%PROBLEM
The statements about cached information, valid values for my_num_ep, and 
the condition for the error code/class all apply to the 
inter-communicator case as well as the intra-communicator case. Should 
this be made clearer by moving that text out of the intra-communicator 
paragraph and after the inter-communicator paragraph?
%%SUGGESTION
If parent_comm is an intracommunicator ... sum of the values of 
my_num_ep on all calling processes. If parent_comm is an 
intercommunicator ... MPI_COMM_NULL is returned in all entries of 
new_comm_handles.

No cached information ... this function will return MPI_ERR_ENDPOINTS 
at all processes.
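
A rough sketch of the intended behaviour for the intracommunicator case, again in plain Python rather than MPI. The function name endpoints_outcome and the per-process limit max_ep are assumptions made purely for illustration; the proposal does not say how an implementation decides that it cannot satisfy a my_num_ep value, and validity checking of my_num_ep is not modelled here.

```python
def endpoints_outcome(my_num_ep, max_ep):
    """Return (per-process result codes, size of the output communicator).

    my_num_ep[i] is the value passed by calling process i.
    max_ep[i] is a hypothetical limit on what the implementation can
    provide at process i (an assumption for this sketch only).
    """
    any_failure = any(n > limit for n, limit in zip(my_num_ep, max_ep))
    if any_failure:
        # A failure caused by any one process is reported at all processes.
        return ["MPI_ERR_ENDPOINTS"] * len(my_num_ep), None
    # On success the output size is the sum of my_num_ep over all callers.
    return ["MPI_SUCCESS"] * len(my_num_ep), sum(my_num_ep)


ok = endpoints_outcome([2, 3, 1], max_ep=[4, 4, 4])
bad = endpoints_outcome([2, 9, 1], max_ep=[4, 4, 4])
```

Here ok reports success everywhere with a communicator of size 6, while bad reports the error at all three processes even though only the second one asked for too many endpoints.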

Page 245 line 20
%%CURRENT
Some operations, such as collective operations, cannot be used
%%PROBLEM
A single thread can perform a non-blocking collective even when there 
are multiple local endpoints. This would only require MPI_THREAD_SINGLE.
%%SUGGESTION
Some operations, such as blocking collective operations, cannot be used

Cheers,
Dan.

On 01/07/2014 01:45, Jim Dinan wrote:
> All,
>
> Below are two options for the endpoints error text:
>
> %% OLD:
> If the MPI implementation is not able to create a new communicator because
> of the \mpiarg{my\_num\_ep} argument given by any process, this function
> will return \error{MPI\_ERR\_ENDPOINTS}.
>
> %% Option 2 (Suggested by Pavan):
> If a process provides a valid \mpiarg{my\_num\_ep} argument, but the MPI
> implementation is not able to create a new communicator because of the
> \mpiarg{my\_num\_ep} argument at this process, this function will return
> \error{MPI\_ERR\_ENDPOINTS} at all processes.
>
> %% Option 1 (Smallest delta from old text):
> If the MPI implementation is not able to create a new communicator because
> of a valid \mpiarg{my\_num\_ep} argument given by any process, this function
> will return \error{MPI\_ERR\_ENDPOINTS} at all processes.
>
> Opinions or suggestions for improvement?
>
> I have attached an updated review document (with option 2 above, but this
> is still open for discussion).  Please review page 245, lines 7-15.  I
> removed the use of "endpoint" and replaced it with "rank".  I also replaced
> instances where we talk about "rank" in the parent communicator with
> "process".  I think it's clearer now; interested to hear feedback.
>
> Cheers,
>   ~Jim.
>
>
> On Mon, Jun 30, 2014 at 11:43 AM, Jim Dinan <james.dinan at gmail.com> wrote:
>
>> I can attend, but will be 10 minutes late.  Attached is an updated copy of
>> the endpoints proposal with feedback from the Chicago meeting.  Let's
>> review these changes, and we also have an item to discuss from that meeting
>> with regard to how errors are returned from MPI_Comm_create_endpoints.
>>
>>
>> On Sun, Jun 29, 2014 at 2:40 PM, Balaji, Pavan <balaji at anl.gov> wrote:
>>
>>> All,
>>>
>>> This is a reminder that we'll have our hybrid WG telecon tomorrow (06/30)
>>> at 11am central time.
>>>
>>> I've made the promised corrections to ticket 411 and uploaded a new draft.
>>>
>>> https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/411
>>>
>>> We can discuss this tomorrow, together with the remaining tickets.
>>>
>>> Thanks,
>>>
>>>    --- Pavan
>>>
>>> _______________________________________________
>>> mpiwg-hybridpm mailing list
>>> mpiwg-hybridpm at lists.mpi-forum.org
>>> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpiwg-hybridpm
>>>
>>
>
>

-- 
Dan Holmes
Applications Consultant in HPC Research
EPCC, The University of Edinburgh
James Clerk Maxwell Building
The Kings Buildings
Mayfield Road
Edinburgh, UK
EH9 3JZ
T: +44(0)131 651 3465
E: dholmes at epcc.ed.ac.uk
