[Mpi-22] Higher-level languages proposal
Supalov, Alexander
alexander.supalov at [hidden]
Fri Oct 17 06:34:20 CDT 2008
Dear Jeff,
Thanks. I reply below.
Best regards.
Alexander
-----Original Message-----
From: mpi-22-bounces_at_[hidden]
[mailto:mpi-22-bounces_at_[hidden]] On Behalf Of Jeff Squyres
Sent: Thursday, October 16, 2008 11:33 PM
To: MPI 2.2
Subject: Re: [Mpi-22] Higher-level languages proposal
Ah -- major point of clarification that I missed in your initial
description (sorry): that the number of threads is a *local* number of
threads. That helps my understanding; I glossed over it and thought that
number was a *total* number of threads.
Replying here on the list because it's a bit easier than lengthy /
inter-threaded replies on the ticket... (I just put the web archive
URL of the reply thread on the ticket):
- THREAD_REGISTER/SPLIT_THREAD: per your comment about not really
being a "split" kind of action: ok, I can see that. But the color/key
aspect may still be useful here.
AS> I'd rather work around this in a modular way.
- "Must be invoked by >=1 thread" being superfluous: I still don't
quite grok your definition of "collective" here -- it's not the same
definition of "collective" as in other MPI collectives because it's
*more* than just "every MPI process in the communicator" -- you now
want every MPI process in the communicator plus other threads.
AS> In my opinion, it's superfluous because collectives must be called
by all processes in the comm in a loosely synchronous manner. This is
exactly what will happen, so the requirement of >= 1 thread calling
the proposed function will be met automatically. I see your point wrt
the use of the word "collective" here, though. Maybe we'll come up with
better wording.
- Changing addressing to (comm_id, rank, thread_id): I specifically
mentioned the *internals* of MPI implementation. I realize that your
proposal was aimed at keeping the external interface for MPI_SEND
(etc.) the same. I was stating that this is a fundamental change for
the internals of MPI implementations, even for non-communication
operations such as GROUP_TRANSLATE_RANKS.
AS> Agree on internals - this won't be a snap.
- FINALIZE: It was an open question that you didn't really answer: if
THREAD_REGISTER'ed threads *are* MPI processes, then do they each have
to call MPI_FINALIZE?
AS> I effectively said this should be possible, as long as MPI_Finalize
is able to handle live communicators of this kind.
- Can you call THREAD_REGISTER/SPLIT_THREAD with a comm argument that
already contains threads-as-processes? If so, what exactly does it
mean? You said: "Same thing that it normally means. See the
(newcomm,rank) addressing above. The applicability of all communicator
and group management calls is stated in the description." Can you
describe exactly what it means for a thread-now-MPI-process to call
THREAD_REGISTER with a num_threads argument >1? What exactly happens
in this scenario?
main() {
    MPI_INIT();
    spawn_threads(8, thread_main, NULL);      /* 8 threads per MPI process */
    wait_for_threads();
    MPI_FINALIZE();
}

void thread_main(void *arg) {
    MPI_Comm comm1;
    /* each of the 8 first-level threads registers on MPI_COMM_SELF */
    MPI_THREAD_REGISTER(MPI_COMM_SELF, my_thread_id, 8, &comm1);
    spawn_threads(8, secondary_thread_main, comm1);
}

void secondary_thread_main(void *arg) {
    MPI_Comm comm2, parent = (MPI_Comm) arg;
    /* each second-level thread registers again on the parent comm */
    MPI_THREAD_REGISTER(parent, my_thread_id, 8, &comm2);
}
Which threads end up in which comm2? (note that there will be 8
comm2's) Since threads are "unbound" to an MPI process before they
invoke THREAD_REGISTER, the grouping is not guaranteed.
AS> I defer my reply to your follow-up message.
- THREAD_MULTIPLE: I now understand the distinction of
num_local_threads; thanks. But I think num_local_threads can only be
>1 if the local thread level is MPI_THREAD_MULTIPLE. You didn't
really address this in your answer.
AS> I said for starters that the op is only meaningful for
MPI_THREAD_MULTIPLE. Other modes may require additional contemplation.
Calling this function with only one thread should not be a problem.
In the case of MPI_THREAD_SINGLE, this op will be like MPI_COMM_DUP. This
implies that the comm should be an intracomm, by the way.
In MPI_THREAD_FUNNELED, only the main thread can call MPI, so this will
be equivalent to the above. This is not good, because the funneled model
is used very frequently by mixed programs. Maybe we should relax the
requirement here and allow MPI_COMM_THREAD_REGISTER to be called in this
case. This needs additional contemplation.
In MPI_THREAD_SERIALIZED, the call would connect the threads that called
the function. Looks reasonable to me.
Finally, in MPI_THREAD_MULTIPLE everything is fine, even if only one
thread per process is used.
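To make the MPI_THREAD_MULTIPLE restriction concrete, here is a minimal
sketch of a guard a program might use before the proposed call
(MPI_Comm_thread_register is the function proposed in ticket 39 and not
part of MPI today; the index and num values would come from whatever
threading runtime is in use):

/* Sketch only: MPI_Comm_thread_register follows the prototype proposed
 * in ticket 39 and is not an existing MPI function. */
#include <mpi.h>

int register_threads(int index, int num, MPI_Comm *newcomm)
{
    int level;

    MPI_Query_thread(&level);          /* level granted by MPI_Init_thread */
    if (level != MPI_THREAD_MULTIPLE)
        return MPI_ERR_OTHER;          /* other levels still need discussion */

    /* all num threads of this process call this collectively */
    return MPI_Comm_thread_register(MPI_COMM_WORLD, index, num, newcomm);
}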
- Abstracting away locality: I respectfully disagree. :-) Yes, we're
enabling thread-specific addressing, and that may be a good thing.
But MPI does not [currently] expose which communicator ranks are
"local" to other communicator ranks. And with this proposal, now we
have at least 2 levels of "local" in the MPI spec itself (in the same
OS process and outside of the OS process). The hardware that the MPI
job is running on likely has multiple levels of locality as well (on-
vs. off-host, on- vs. off-processor, etc.). So yes, we may have
enabled one good thing, but made determining locality more difficult.
That's my only point here.
AS> I see your point now, thanks. We do address locality already, but in
a different sense and indirectly: we sort of work around this matter by
providing virtual communicators.
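As an aside, the OpenMP pattern referred to in the quoted messages below
might look roughly as follows. This is only a sketch: MPI_Comm_thread_register
follows the prototype from ticket 39 and does not exist in today's MPI;
everything else is standard MPI and OpenMP.

#include <mpi.h>
#include <omp.h>

int main(int argc, char **argv)
{
    int provided;

    /* the proposed call is only meaningful at MPI_THREAD_MULTIPLE */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);

    #pragma omp parallel
    {
        MPI_Comm tcomm;
        int index = omp_get_thread_num();   /* unique per thread in this process */
        int num   = omp_get_num_threads();  /* local number of threads */
        int rank;

        /* every thread of every process joins tcomm and becomes
           addressable by its own rank there (proposed call) */
        MPI_Comm_thread_register(MPI_COMM_WORLD, index, num, &tcomm);
        MPI_Comm_rank(tcomm, &rank);

        /* ... usual point-to-point or collective calls on tcomm ... */
        MPI_Barrier(tcomm);

        /* end of the parallel region: free the communicator and "demote"
           the threads back to their usual, unnamed status */
        MPI_Comm_free(&tcomm);
    }

    MPI_Finalize();
    return 0;
}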
On Oct 16, 2008, at 3:45 PM, Supalov, Alexander wrote:
> Dear Jeff,
>
> Thanks. I'm looking forward to a lot of discussion.
>
> To start it, I've answered your questions and added some clarifications
> to the proposal. The main point is that in the newcomm, all processes
> (i.e., threads) have their own unique rank, just as processes do in the
> original "normal" MPI communicator comm.
>
> The number of threads per original MPI process that join the newcomm is
> given by the local_num_threads argument. After this, the usual
> (comm,rank) addressing works both in the old-fashioned process-only
> communicators and in the new thread-based ones.
>
> By using the usual communicator and group management calls, one can cut
> and splice these new communicators as needed. This is why a syntax
> comparable to that of MPI_COMM_SPLIT looks superfluous to me, at least
> for now.
>
> And since we can create and free these communicators as often as needed,
> we can follow OpenMP parallel regions and other threading constructs
> very closely throughout program execution.
>
> Best regards.
>
> Alexander
>
> -----Original Message-----
> From: mpi-22-bounces_at_[hidden]
> [mailto:mpi-22-bounces_at_[hidden]] On Behalf Of Jeff Squyres
> Sent: Wednesday, October 15, 2008 3:44 PM
> To: MPI 2.2
> Subject: Re: [Mpi-22] Higher-level languages proposal
>
> I just added 2 lengthy comments on ticket 39
> (https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/39).
>
> I suspect that there will need to be a *lot* of discussion about this
> idea.
>
>
> On Oct 14, 2008, at 11:15 AM, Supalov, Alexander wrote:
>
>> Hi,
>>
>> The proposal is ready in draft, see
>> https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/39 . I targeted it
>> to MPI-2.2 for now. As you will see, it resolves the problem of thread
>> addressability without any extension to the Probe/Recv calls. I bet
>> there are more things that will follow, too.
>>
>> Here's the current text for reference:
>>
>> "A collective MPI call, MPI_Comm_thread_register, with the following
>> syntax (in C):
>>
>> int MPI_Comm_thread_register(MPI_Comm comm, int index, int num,
>>                              MPI_Comm *newcomm)
>>
>> returns a newcomm for all num threads of the comm that called this
>> function. All threads are treated as MPI processes in the newcomm, and
>> their ranks are ordered according to the index argument that ranges
>> between 0 and num-1. This argument must be unique for every thread in
>> the given MPI process of the comm.
>>
>> From this moment on, all threads contained in the newcomm are
>> considered as MPI processes, with all that this entails, including an
>> individual MPI rank that makes the respective thread addressable in
>> the usual manner.
>> All MPI communicator and group management calls can be applied to the
>> newcomm in order to produce new communicators, reorder the processes
>> in
>> it, etc. (see Figure 1).
>>
>> A slightly modified call MPI_Comm_free with the standard syntax (in
>> C):
>>
>> int MPI_Comm_free(MPI_Comm *comm)
>>
>> can be used to destroy the respective communicator comm and thus
>> "demote" all the threads from the status of MPI processes in the comm
>> back to the unnamed threads typical of the MPI standard.
>>
>> This pair of calls, or its equivalent, allows threads to be addressed
>> directly in all MPI calls, and since the sequence of the
>> MPI_Comm_thread_register and MPI_Comm_free calls can be repeated as
>> needed, OpenMP parallel sections or any equivalent groups of threads in
>> the MPI program can become MPI processes for a while and then return to
>> their original status.
>>
>> If threads share (as they usually do) the address space of one
>> (former) MPI process, the MPI communication calls can certainly take
>> advantage of this by copying data directly from the source to the
>> destination buffer. This equally applies to all point-to-point,
>> collective, one-sided, and file I/O calls.
>>
>> This call certainly makes sense only at the thread support level
>> MPI_THREAD_MULTIPLE."
>>
>> Best regards.
>>
>> Alexander
>>
>> -----Original Message-----
>> From: Supalov, Alexander
>> Sent: Tuesday, October 14, 2008 2:44 PM
>> To: 'MPI 2.2'
>> Subject: RE: [Mpi-22] Higher-level languages proposal
>>
>> Sure. This will most likely be an MPI-3 topic, though. I'll drop in a
>> link here once ready.
>>
>> -----Original Message-----
>> From: mpi-22-bounces_at_[hidden]
>> [mailto:mpi-22-bounces_at_[hidden]] On Behalf Of Jeff Squyres
>> Sent: Tuesday, October 14, 2008 2:35 PM
>> To: MPI 2.2
>> Subject: Re: [Mpi-22] Higher-level languages proposal
>>
>> On Oct 14, 2008, at 7:07 AM, Supalov, Alexander wrote:
>>
>>> Thanks. I'd rather say that if the purpose of the extension is
>>> indeed to
>>> serialize the Probe/Recv pair, the better way to solve this and many
>>> other problems would be to make threads directly addressable, as if
>>> they
>>> were MPI processes.
>>>
>>> One way to do this might be, say, to create a call like
>>> MPI_Comm_thread_enroll that creates an intra-communicator out of all
>>> threads that call this function in a loosely synchronous fashion,
>>> collectively over one or several MPI processes they constitute.
>>
>> I'm still not sure I follow. Can you provide more details, perhaps
>> with function prototypes and specific rules (i.e., an alternate
>> proposal)?
>>
>>> If paired with the appropriately extended MPI_Comm_free, this would
>>> allow, for example, all threads in an OpenMP parallel section to be
>>> addressed as if they were fully fledged MPI processes. Note that
>>> this
>>> would allow more than one parallel section during the program run.
>>>
>>> Other threading models would profit from this "opt-in/opt-out"
>>> method,
>>> too. This may be a more flexible way of dealing with threads than
>>> the
>>> one-time MPI_Init variety mentioned by George Bosilca in his
>>> EuroPVM/MPI keynote, by the way.
>>>
>>> Best regards.
>>>
>>> Alexander
>>>
>>> -----Original Message-----
>>> From: mpi-22-bounces_at_[hidden]
>>> [mailto:mpi-22-bounces_at_[hidden]] On Behalf Of Terry
>>> Dontje
>>> Sent: Tuesday, October 14, 2008 12:45 PM
>>> To: MPI 2.2
>>> Subject: Re: [Mpi-22] Higher-level languages proposal
>>>
>>> Supalov, Alexander wrote:
>>>> Dear Jeff,
>>>>
>>>> Unfortunately, I won't be in Chicago, so we should discuss this here
>>>> instead. I talked to Torsten last time about this extension. As far
>>>> as I can remember, the main purpose of this extension is to make sure
>>>> that the thread that called the MPI_Probe also calls the MPI_Recv and
>>>> gets the message matched by the aforementioned MPI_Probe.
>>>>
>>>> If so, the main problem here is not the matching. The main problem
>>>> is
>>>> that one cannot address threads in MPI. If we fix that, the
>>>> proposed
>>>> extension with the message handle and such will become superfluous.
>>>>
>>>> See what I mean?
>>>>
>>>>
>>> Interesting, so you are basically redefining the MPI_Probe/Recv pair
>>> to guarantee that a message goes to a specific thread. Or, in other
>>> words, lowering the proposal's MPI_Mprobe/recv into the implementation
>>> of MPI_Probe/Recv. This seems reasonable to me since MPI_Probe/Recv
>>> itself is basically useless unless the programmer ensures
>>> serialization when that combination is used.
>>>
>>> --td
>>>> Best regards.
>>>>
>>>> Alexander
>>>>
>>>> -----Original Message-----
>>>> From: mpi-22-bounces_at_[hidden]
>>>> [mailto:mpi-22-bounces_at_[hidden]] On Behalf Of Jeff
>>>> Squyres
>>>> Sent: Monday, October 13, 2008 11:48 PM
>>>> To: MPI 2.2
>>>> Subject: Re: [Mpi-22] Higher-level languages proposal
>>>>
>>>> On Oct 13, 2008, at 10:46 AM, Supalov, Alexander wrote:
>>>>
>>>>
>>>>> Thanks. The 2.1.1, which was presented last time, in my opinion does
>>>>> not seem to solve the right problem. Instead of defining a way for
>>>>> unambiguous addressing of the threads in MPI, which would eliminate
>>>>> the MPI_Probe/Recv ambiguity and many other issues, it attempts to
>>>>> add yet another concept (this time, a message id) in the current
>>>>> situation where any thread can do as it pleases.
>>>>>
>>>>
>>>> I'm not quite sure I understand your proposal.
>>>>
>>>> <... after typing out a lengthy/rambling discourse that made very
>>>> little sense and was fraught with questions and
>>>> ambiguities :-) ...>
>>>>
>>>> Let's discuss this in Chicago; Rich has allocated 5-7pm on Monday
>>>> for discussion of this proposal. These are exactly the kinds of
>>>> larger issues that we want to raise via this proposal.
>>>>
>>>>
>>>
>>
>>
>> --
>> Jeff Squyres
>> Cisco Systems
>>
>
>
> --
> Jeff Squyres
> Cisco Systems
>
--
Jeff Squyres
Cisco Systems