[Mpi-22] Higher-level languages proposal

Jeff Squyres jsquyres at [hidden]
Thu Oct 16 16:33:16 CDT 2008



Ah -- major point of clarification that I missed in your initial  
description (sorry): the number of threads is a *local* number of  
threads.  That helps my understanding; I glossed over it and thought  
that number was a *total* number of threads.

Replying here on the list because it's a bit easier than lengthy /  
inter-threaded replies on the ticket... (I just put the web archive  
URL of the reply thread on the ticket):

- THREAD_REGISTER/SPLIT_THREAD: per your comment about not really  
being a "split" kind of action: ok, I can see that.  But the color/key  
aspect may still be useful here.
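To be concrete about what I mean by the color/key aspect: MPI_COMM_SPLIT orders ranks in each new communicator by key, with ties broken by the old rank. A rough stand-alone sketch of that ordering rule (plain C, no MPI; the struct and function names are invented, and any thread-register variant of the call is hypothetical):

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical sketch: MPI_COMM_SPLIT-style color/key ordering applied
 * to registering threads.  Plain C, no MPI; all names are invented. */
struct participant { int old_rank; int color; int key; int new_rank; };

static int cmp(const void *a, const void *b) {
    const struct participant *x = a, *y = b;
    if (x->color != y->color) return x->color - y->color;
    if (x->key != y->key) return x->key - y->key;
    return x->old_rank - y->old_rank;  /* ties broken by old rank */
}

/* Assign new_rank within each color group, ordered by (key, old_rank). */
void split_ranks(struct participant *p, int n) {
    qsort(p, n, sizeof *p, cmp);
    int rank = 0, have_prev = 0, prev_color = 0;
    for (int i = 0; i < n; i++) {
        if (!have_prev || p[i].color != prev_color) {
            rank = 0;
            prev_color = p[i].color;
            have_prev = 1;
        }
        p[i].new_rank = rank++;
    }
}
```

The point is just that the same (color, key) rule would let registering threads pick their grouping and rank order deterministically, instead of leaving the grouping unspecified.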

- "Must be invoked by >=1 thread" being superfluous: I still don't  
quite grok your definition of "collective" here -- it's not the same  
definition of "collective" as in other MPI collectives because it's  
*more* than just "every MPI process in the communicator" -- you now  
want every MPI process in the communicator plus other threads.

- Changing addressing to (comm_id, rank, thread_id): I specifically  
mentioned the *internals* of MPI implementation.  I realize that your  
proposal was aimed at keeping the external interface for MPI_SEND  
(etc.) the same.  I was stating that this is a fundamental change for  
the internals of MPI implementations, even for non-communication  
operations such as GROUP_TRANSLATE_RANKS.
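A hedged sketch of what I mean by "internals" (all names are invented, not from any real MPI implementation): the endpoint tuple grows from (context_id, rank) to (context_id, rank, thread_id), so any table keyed by rank -- e.g. what GROUP_TRANSLATE_RANKS consults -- now has to be keyed per thread:

```c
#include <assert.h>

/* Hedged sketch of the internal addressing change; all names are
 * invented and not taken from any real MPI implementation. */

/* Today an endpoint inside the library is, roughly, (context_id, rank). */
struct addr_v1 { int context_id; int rank; };

/* Under the proposal it becomes (context_id, rank, thread_id). */
struct addr_v2 { int context_id; int rank; int thread_id; };

#define MAX_RANKS   8
#define MAX_THREADS 4

/* A rank-translation table, such as GROUP_TRANSLATE_RANKS would
 * consult, now has to be keyed per (rank, thread), not per rank. */
struct xlate {
    int to[MAX_RANKS][MAX_THREADS];  /* -1 means "undefined" */
};

int translate(const struct xlate *x, struct addr_v2 a) {
    return x->to[a.rank][a.thread_id];
}
```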

- FINALIZE: It was an open question that you didn't really answer: if  
THREAD_REGISTER'ed threads *are* MPI processes, then do they each have  
to call MPI_FINALIZE?

- Can you call THREAD_REGISTER/SPLIT_THREAD with a comm argument that  
already contains threads-as-processes? If so, what exactly does it  
mean?  You said: "Same thing that it normally means. See the  
(newcomm,rank) addressing above. The applicability of all communicator  
and group management calls is stated in the description."  Can you  
describe exactly what it means for a thread-now-MPI-process to call  
THREAD_REGISTER with a num_threads argument >1?  What exactly happens  
in this scenario?

main() {
   MPI_INIT();
   spawn_threads(8, thread_main, NULL);
   wait_for_threads();
   MPI_FINALIZE();
}

void thread_main(void *arg) {
   MPI_Comm comm1;
   MPI_THREAD_REGISTER(MPI_COMM_SELF, my_thread_id, 8, &comm1);
   spawn_threads(8, secondary_thread_main, comm1);
}

void secondary_thread_main(void *arg) {
   MPI_Comm comm2, parent = (MPI_Comm) arg;
   MPI_THREAD_REGISTER(parent, my_thread_id, 8, &comm2);
}

Which threads end up in which comm2?  (note that there will be 8  
comm2's)  Since threads are "unbound" to an MPI process before they  
invoke THREAD_REGISTER, the grouping is not guaranteed.

- THREAD_MULTIPLE: I now understand the distinction of  
num_local_threads; thanks.  But I think num_local_threads can only  
be >1 if the local thread level is MPI_THREAD_MULTIPLE.  You didn't  
really address this in your answer.
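Sketching the rule I have in mind (the constants here are defined locally with the standard's required ordering SINGLE < FUNNELED < SERIALIZED < MULTIPLE; the numeric values a real MPI library assigns may differ):

```c
#include <assert.h>

/* Thread-support levels with the MPI standard's required ordering
 * SINGLE < FUNNELED < SERIALIZED < MULTIPLE; the numeric values a
 * real MPI library assigns may differ. */
enum thread_level {
    THREAD_SINGLE, THREAD_FUNNELED, THREAD_SERIALIZED, THREAD_MULTIPLE
};

/* The rule under discussion, as I read it: a thread-register call with
 * num_local_threads > 1 implies several threads per process making
 * concurrent MPI calls, which only MPI_THREAD_MULTIPLE permits.
 * Returns 1 if the combination is permissible. */
int register_allowed(enum thread_level provided, int num_local_threads) {
    if (num_local_threads <= 1)
        return 1;
    return provided == THREAD_MULTIPLE;
}
```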

- Abstracting away locality: I respectfully disagree.  :-)  Yes, we're  
enabling thread-specific addressing, and that may be a good thing.   
But MPI does not [currently] expose which communicator ranks are  
"local" to other communicator ranks.  And with this proposal, now we  
have at least 2 levels of "local" in the MPI spec itself (in the same  
OS process and outside of the OS process).  The hardware that the MPI  
job is running on likely has multiple levels of locality as well (on-  
vs. off-host, on- vs. off-processor, etc.).  So yes, we may have  
enabled one good thing, but made determining locality more difficult.   
That's my only point here.

On Oct 16, 2008, at 3:45 PM, Supalov, Alexander wrote:

> Dear Jeff,
>
> Thanks. I'm looking forward to a lot of discussion.
>
> To start it, I've answered your questions and added some  
> clarifications
> to the proposal. The main point is that in the newcomm, all processes
> (i.e., threads) have their own unique rank as do processes in the
> original "normal" MPI communicator comm.
>
> The number of threads per original MPI process that join the newcomm  
> is
> given by the local_num_threads argument. After this, the usual
> (comm,rank) addressing works both in the old-fashioned process-only
> communicators and in the new thread-based ones.
>
> By using the usual communicator and group management calls one can cut
> and splice these new communicators as needed. This is why the syntax
> comparable to that of the MPI_COMM_SPLIT looks superfluous to me, at
> least for now.
>
> And since we can create and free these communicators as often as  
> needed,
> we can follow the OpenMP parallel regions and other threading  
> constructs
> very closely throughout the course of the program execution.
>
> Best regards.
>
> Alexander
>
> -----Original Message-----
> From: mpi-22-bounces_at_[hidden]
> [mailto:mpi-22-bounces_at_[hidden]] On Behalf Of Jeff Squyres
> Sent: Wednesday, October 15, 2008 3:44 PM
> To: MPI 2.2
> Subject: Re: [Mpi-22] Higher-level languages proposal
>
> I just added 2 lengthy comments on ticket 39 (
> https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/39
>  ).
>
> I suspect that there will need to be a *lot* of discussion about this
> idea.
>
>
> On Oct 14, 2008, at 11:15 AM, Supalov, Alexander wrote:
>
>> Hi,
>>
>> The proposal is ready in draft, see
>> https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/39 . I targeted  
>> it
>> to MPI-2.2 for now. As you will see, it resolves the problem of  
>> thread
>> addressability without any extension to the Probe/Recv calls. I bet
>> there are more things that will follow, too.
>>
>> Here's the current text for reference:
>>
>> "A collective MPI call, MPI_Comm_thread_register, with the following
>> syntax (in C):
>>
>> int MPI_Comm_thread_register(MPI_Comm comm, int index, int num,
>> MPI_Comm
>> *newcomm)
>>
>> returns a newcomm for all num threads of the comm that called this
>> function. All threads are treated as MPI processes in the newcomm,  
>> and
>> their ranks are ordered according to the index argument that ranges
>> between 0 and num-1. This argument must be unique in every thread in
>> the given MPI process of the comm.
>>
>> From this moment on, all threads contained in the newcomm are
>> considered
>> as MPI processes, with all that this entails, including individual  
>> MPI
>> rank that makes the respective thread addressable in the usual  
>> manner.
>> All MPI communicator and group management calls can be applied to the
>> newcomm in order to produce new communicators, reorder the processes
>> in
>> it, etc. (see Figure 1).
>>
>> A slightly modified call MPI_Comm_free with the standard syntax (in
>> C):
>>
>> int MPI_Comm_free(MPI_Comm comm)
>>
>> can be used to destroy the respective communicator comm and thus
>> "demote" all the threads from the status of MPI processes in the comm
>> back to the unnamed threads typical of the MPI standard.
>>
>> This pair of calls, or their equivalent, allow threads to be  
>> addressed
>> directly in all MPI calls, and since the sequence of the
>> MPI_Comm_thread_register and MPI_Comm_free calls can be repeated as
>> needed, OpenMP parallel sections or any equivalent groups of threads
>> in
>> the MPI program can become MPI processes for a while and then return
>> to
>> their original status.
>>
>> If threads use (as they usually do) joint address space with one
>> (former) MPI process, the MPI communication calls can certainly take
>> advantage of this by copying data directly from the source to the
>> destination buffer. This equally applies to all point-to-point,
>> collective, one-sided, and file I/O calls.
>>
>> This call certainly makes sense only at the thread support level
>> MPI_THREAD_MULTIPLE."
>>
>> Best regards.
>>
>> Alexander
>>
>> -----Original Message-----
>> From: Supalov, Alexander
>> Sent: Tuesday, October 14, 2008 2:44 PM
>> To: 'MPI 2.2'
>> Subject: RE: [Mpi-22] Higher-level languages proposal
>>
>> Sure. This will most likely be a MPI-3 topic, though. I'll drop in a
>> link here once ready.
>>
>> -----Original Message-----
>> From: mpi-22-bounces_at_[hidden]
>> [mailto:mpi-22-bounces_at_[hidden]] On Behalf Of Jeff Squyres
>> Sent: Tuesday, October 14, 2008 2:35 PM
>> To: MPI 2.2
>> Subject: Re: [Mpi-22] Higher-level languages proposal
>>
>> On Oct 14, 2008, at 7:07 AM, Supalov, Alexander wrote:
>>
>>> Thanks. I'd rather say that if the purpose of the extension is
>>> indeed to
>>> serialize the Probe/Recv pair, the better way to solve this and many
>>> other problems would be to make threads directly addressable, as if
>>> they
>>> were MPI processes.
>>>
>>> One way to do this might be, say, to create a call like
>>> MPI_Comm_thread_enroll that creates an intra-communicator out of all
>>> threads that call this function in a loosely synchronous fashion,
>>> collectively over one or several MPI processes they constitute.
>>
>> I'm still not sure I follow.  Can you provide more details, perhaps
>> with function prototypes and specific rules?  (i.e., an alternate
>> proposal)?
>>
>>> If paired with the appropriately extended MPI_Comm_free, this would
>>> allow, for example, all threads in an OpenMP parallel section to be
>>> addressed as if they were fully fledged MPI processes. Note that  
>>> this
>>> would allow more than one parallel section during the program run.
>>>
>>> Other threading models would profit from this "opt-in/opt-out"
>>> method,
>>> too. This may be a more flexible way of dealing with threads than  
>>> the
>>> one-time MPI_Init variety mentioned by George Bosilca in his
>>> EuroPVM/MPI keynote, by the way.
>>>
>>> Best regards.
>>>
>>> Alexander
>>>
>>> -----Original Message-----
>>> From: mpi-22-bounces_at_[hidden]
>>> [mailto:mpi-22-bounces_at_[hidden]] On Behalf Of Terry  
>>> Dontje
>>> Sent: Tuesday, October 14, 2008 12:45 PM
>>> To: MPI 2.2
>>> Subject: Re: [Mpi-22] Higher-level languages proposal
>>>
>>> Supalov, Alexander wrote:
>>>> Dear Jeff,
>>>>
>>>> Unfortunately, I won't be in Chicago, so we should rather discuss
>>>> this
>>>> here. I talked to Torsten last time about this extension. As far
>>>> as I
>>>> can remember, the main purpose of this extension is to make sure
>>>> that
>>>> the thread that called the MPI_Probe also calls the MPI_Recv and
>>>> gets
>>>> the message matched by the aforementioned MPI_Probe.
>>>>
>>>> If so, the main problem here is not the matching. The main problem
>>>> is
>>>> that one cannot address threads in MPI. If we fix that, the  
>>>> proposed
>>>> extension with the message handle and such will become superfluous.
>>>>
>>>> See what I mean?
>>>>
>>>>
>>> Interesting, so you are basically redefining the MPI_Probe/Recv pair
>>> to
>>> guarantee that a message goes to a specific thread.  Or in other words
>>> lowering the proposal's MPI_Mprobe/recv to be in the implementation
>>> of
>>> MPI_Probe/Recv.  This seems reasonable to me since MPI_Probe/Recv
>>> itself
>>>
>>> is basically useless unless the programmer assures serialization  
>>> when
>>> that combination is used.
>>>
>>> --td
>>>> Best regards.
>>>>
>>>> Alexander
>>>>
>>>> -----Original Message-----
>>>> From: mpi-22-bounces_at_[hidden]
>>>> [mailto:mpi-22-bounces_at_[hidden]] On Behalf Of Jeff
>>>> Squyres
>>>> Sent: Monday, October 13, 2008 11:48 PM
>>>> To: MPI 2.2
>>>> Subject: Re: [Mpi-22] Higher-level languages proposal
>>>>
>>>> On Oct 13, 2008, at 10:46 AM, Supalov, Alexander wrote:
>>>>
>>>>
>>>>> Thanks. The 2.1.1, which was presented last time, in my opinion
>>>>> does
>>>
>>>>> not
>>>>> seem to solve the right problem. Instead of defining a way for
>>>>> unambiguous addressing of the threads in MPI, which would  
>>>>> eliminate
>>>>> the
>>>>> MPI_Probe/Recv ambiguity and many other issues, it attempts to add
>>> yet
>>>>> another concept (this time, a message id) in the current situation
>>>>> where
>>>>> any thread can do what they please.
>>>>>
>>>>
>>>> I'm not quite sure I understand your proposal.
>>>>
>>>> <... after typing out a lengthy/rambling discourse that made very
>>>> little sense and was fraught with questions and  
>>>> ambiguities :-) ...>
>>>>
>>>> Let's discuss this in Chicago; Rich has allocated 5-7pm on Monday
>>>> for
>>>
>>>> discussion of this proposal.  These are exactly the kinds of larger
>>>> issues that we want to raise via this proposal.
>>>>
>>>>
>>>
>>> _______________________________________________
>>> mpi-22 mailing list
>>> mpi-22_at_[hidden]
>>> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi-22
>>> ---------------------------------------------------------------------
>>> Intel GmbH
>>> Dornacher Strasse 1
>>> 85622 Feldkirchen/Muenchen Germany
>>> Sitz der Gesellschaft: Feldkirchen bei Muenchen
>>> Geschaeftsfuehrer: Douglas Lusk, Peter Gleissner, Hannes Schwaderer
>>> Registergericht: Muenchen HRB 47456 Ust.-IdNr.
>>> VAT Registration No.: DE129385895
>>> Citibank Frankfurt (BLZ 502 109 00) 600119052
>>>
>>> This e-mail and any attachments may contain confidential material  
>>> for
>>> the sole use of the intended recipient(s). Any review or  
>>> distribution
>>> by others is strictly prohibited. If you are not the intended
>>> recipient, please contact the sender and delete all copies.
>>>
>>>
>>
>>
>> -- 
>> Jeff Squyres
>> Cisco Systems
>>
>
>
> -- 
> Jeff Squyres
> Cisco Systems
>


-- 
Jeff Squyres
Cisco Systems



