[Mpi3-hybridpm] endpoint proposal

Tue Oct 25 10:04:03 CDT 2011

PS. I think there's at least one case where this interface won't work, and we should be able to judge this w/o compiler people. Is this a limitation of the interface or a user error?

Imagine a program that uses, say, some non-OpenMP runtime like Intel TBB and some MPI. On one thread in the user program, I decide it's time to reinforce the MPI internals, and rent this thread out to the MPI using the helper thread proposal. The thread goes into the MPI scope, so to speak, while still being under control of the TBB runtime, which may be different from MPI's.

Now, this thread sits idle for a while inside the MPI, and the TBB dispatcher assigns some useful work to it, because it's, well, idle, and the dispatcher can see this. Now I have a thread that is nominally controlled by the MPI and actually doing some work well outside of it. I.e., I'll have to put hooks into the resp. MPI call to inform the TBB that I'm taking over and the TBB dispatcher shouldn't care.

Not a problem per se, unless I have more than TBB to take care of. I may have Cilk+, etc., all based on work stealing. Will this proposal work in this case?

Also, if we recall that there are user threads and actual HW resources that execute them, the assumption of the proposal is that by renting out a user thread one also rents out the resp. resources. I'm afraid the above scenario shows this may not be the case: although the user thread was rented out to the MPI, the resources went away to serve some TBB runtime business.
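
To make the scenario concrete, here is a minimal C model of it. Everything in it is hypothetical: worker_t, lend_to_mpi, dispatcher_try_assign and the lent flag are illustrative names, not TBB, Cilk+ or MPI interfaces. It only sketches the point above: a dispatcher that judges a thread by idleness alone will take it back, unless the lending call also carries a hook telling that dispatcher to keep away.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one user thread as seen by a work-stealing runtime. */
typedef struct {
    bool idle;        /* the runtime sees no user work on this thread */
    bool lent;        /* hook: the runtime was told the thread belongs to MPI */
    bool runs_tasks;  /* the runtime has scheduled its own work on the thread */
} worker_t;

/* The user lends the thread to MPI. notify_runtime models the extra hook
 * that informs the threading runtime about the loan. */
void lend_to_mpi(worker_t *w, bool notify_runtime)
{
    assert(w != NULL);
    w->idle = true;            /* waiting inside MPI looks idle from outside */
    w->lent = notify_runtime;  /* without the hook the runtime never learns */
}

/* Assumed dispatcher policy: hand work to any idle thread not marked as lent.
 * Returns true if the dispatcher took the thread for its own tasks. */
bool dispatcher_try_assign(worker_t *w)
{
    assert(w != NULL);
    if (w->idle && !w->lent) {
        w->idle = false;
        w->runs_tasks = true;
        return true;
    }
    return false;
}
```

With a zero-initialized worker_t, lend_to_mpi(&w, false) followed by dispatcher_try_assign(&w) returns true: the thread is nominally inside the MPI but runs the other runtime's tasks. With lend_to_mpi(&w, true) the same call returns false. With several work-stealing runtimes in one program, each of them would need its own such hook.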

It would be interesting to hear what you think about it.

-----Original Message-----
From: Supalov, Alexander 
Sent: Tuesday, October 25, 2011 4:07 PM
To: 'Bronis R. de Supinski'; mpi3-hybridpm at lists.mpi-forum.org
Subject: RE: [Mpi3-hybridpm] endpoint proposal

Thanks. Even if true (which I'll have to check w/ our compiler people; currently I doubt it), this does not invalidate the observation that we have several states, and we use vastly different interfaces for the state transition.
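
For reference, the three states and the transitions between them (see the quoted messages below) can be sketched in C roughly as follows. This is only a model under stated assumptions: the names thread_state_t, thread_ctx_t, thread_set_state and the by_owner flag are hypothetical and belong to neither proposal nor to any MPI standard. The sketch assumes that the state-change call itself is exempt from the rule that an unbound thread makes no MPI calls, and that a thread in the internal state can only be moved out of it by some other thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical state names mirroring the three mutually exclusive states. */
typedef enum {
    THREAD_UNBOUND,   /* thread has no MPI relation */
    THREAD_ENDPOINT,  /* thread is attached to an endpoint, does user MPI work */
    THREAD_INTERNAL   /* thread is used by the MPI internally */
} thread_state_t;

typedef struct {
    thread_state_t state;
    int endpoint;     /* meaningful only in THREAD_ENDPOINT, otherwise -1 */
} thread_ctx_t;

/* Single-call variant. by_owner is true when the thread changes its own
 * state. Returns 0 on success, -1 if the transition is refused. */
int thread_set_state(thread_ctx_t *t, thread_state_t next, int endpoint,
                     bool by_owner)
{
    assert(t != NULL);
    if (t->state == THREAD_INTERNAL && by_owner)
        return -1;  /* controlled by MPI: some other thread must detach it */
    if (next == THREAD_ENDPOINT && endpoint < 0)
        return -1;  /* binding to an endpoint needs a valid endpoint */
    t->state = next;
    t->endpoint = (next == THREAD_ENDPOINT) ? endpoint : -1;
    return 0;
}

/* Only a thread attached to an endpoint may make ordinary MPI calls. */
bool thread_may_call_mpi(const thread_ctx_t *t)
{
    assert(t != NULL);
    return t->state == THREAD_ENDPOINT;
}

/* Three-call variant, expressed as thin wrappers over the single call. */
int thread_attach_internal(thread_ctx_t *t, bool by_owner)
{
    return thread_set_state(t, THREAD_INTERNAL, -1, by_owner);
}

int thread_attach_endpoint(thread_ctx_t *t, int endpoint, bool by_owner)
{
    return thread_set_state(t, THREAD_ENDPOINT, endpoint, by_owner);
}

int thread_detach(thread_ctx_t *t, bool by_owner)
{
    return thread_set_state(t, THREAD_UNBOUND, -1, by_owner);
}
```

In this model the one-call and the three-call interfaces are interchangeable, and every transition goes through the same check, which is the point about a uniform interface for the state transitions.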

-----Original Message-----
From: mpi3-hybridpm-bounces at lists.mpi-forum.org [mailto:mpi3-hybridpm-bounces at lists.mpi-forum.org] On Behalf Of Bronis R. de Supinski
Sent: Tuesday, October 25, 2011 3:59 PM
To: mpi3-hybridpm at lists.mpi-forum.org
Subject: Re: [Mpi3-hybridpm] endpoint proposal

More important problem: Alexander misses the point of the helper
thread proposal. He blithely asserts "A good threading runtime
will know w/o a hint when to hijack the idle user threads for
the internal use by the MPI library." This assertion assumes
that an idle thread will remain idle. It is essentially the
same statement that Howard made some time ago. He then consulted
his compiler implementers who stated unequivocally that the
compiler cannot make this determination. They need assistance
from the user, who knows the intent of the code. As the issue
is well beyond static analysis (or combined static/dynamic
analysis) that has full access to the source code, it is
even more beyond the runtime or OS. The entire point of the
helper thread proposal is to let the user communicate their
knowledge of the intended use of RESOURCES (so the thread
implementations used by the different levels are irrelevant).

On Tue, 25 Oct 2011, Supalov, Alexander wrote:

> Thanks. That's the point: we can have a thread in three mutually exclusive states. The interface ought to reflect this.
>
> -----Original Message-----
> From: mpi3-hybridpm-bounces at lists.mpi-forum.org [mailto:mpi3-hybridpm-bounces at lists.mpi-forum.org] On Behalf Of Marc Snir
> Sent: Tuesday, October 25, 2011 3:49 PM
> To: mpi3-hybridpm at lists.mpi-forum.org
> Subject: Re: [Mpi3-hybridpm] endpoint proposal
>
> Two problems:
>
> 1. An unbound thread would not normally be allowed to make MPI calls. Need to think through it more carefully.
> 2. An internal thread cannot make MPI calls -- it is controlled by MPI. Some other thread has to detach it.
>
> On Oct 25, 2011, at 12:52 AM, Supalov, Alexander wrote:
>
>> Hi everybody,
>>
>> After some offline discussion w/ Pavan, it appears to me that we may not be understanding each other. I'll try to find time and show up at the next WG meeting to discuss with you the following points that in my opinion indicate that the endpoint and the helper thread proposals may need a closer joint consideration if not an outright merge:
>>
>> 1. A good threading runtime will know w/o a hint when to hijack the idle user threads for the internal use by the MPI library. Note that this admittedly implies that both the user application and the MPI implementation use the same threading runtime. This may not be, and often is not, the case.
>> 2. An assumption that one set of calls will work to pass the hint to the MPI library again assumes that both use the same underlying threading runtime. I.e., this interface will effectively force a threading runtime unification, which may or may not be feasible.
>> 3. The formulation of the helper thread proposal may need to be simplified by eliminating the teams and extending one extra call from the endpoint proposal by a few arguments, or adding 2 more calls to describe the desired thread state transition instead of the total of 7 in both proposals now.
>>
>> The essence of the latter is like this: If I were writing a naive interface that toggles thread state between (not visible to the MPI) - (associated with an endpoint and doing user's MPI work) - (doing some work inside the MPI implementation), I'd probably have but one call with a constant for this, like:
>>
>> int MPI_Thread_state(int state,...)
>>
>> where state is:
>>
>> MPI_THREAD_UNBOUND    // thread has no MPI relation
>> MPI_THREAD_ENDPOINT   // thread is attached to an MPI endpoint. Vararg contains the endpoint to bind the thread to.
>> MPI_THREAD_INTERNAL   // thread is used by the MPI internally
>>
>> Alternatively, I'd have at most 3 calls to represent the required state transition:
>>
>> MPI_Thread_attach_internal
>> MPI_Thread_attach_endpoint
>> MPI_Thread_detach
>>
>> Currently, the endpoint proposal seems to have one call to enter the resp. state (MPI_Thread_attach == MPI_Thread_attach_endpoint) and no exit call. The helper thread proposal describes no fewer than 6 calls - enter, exit, and some for use while inside. Why? Do we really need the thread teams and the overhead related to managing them? I'm not sure we do. Why not have just 3 calls instead of 7?
>>
>> If this is a gross misunderstanding of both proposals on my part, please let me know.
>>
>> Best regards.
>>
>> Alexander
>>
>> -----Original Message-----
>> From: Supalov, Alexander
>> Sent: Tuesday, October 25, 2011 1:16 AM
>> To: 'Pavan Balaji'
>> Cc: mpi3-hybridpm at lists.mpi-forum.org
>> Subject: RE: [Mpi3-hybridpm] endpoint proposal
>>
>> Well, I did explain this. But since you've considered this matter already, and they will be merged in one chapter anyway, that's fine.
>>
>> -----Original Message-----
>> From: Pavan Balaji [mailto:balaji at mcs.anl.gov]
>> Sent: Tuesday, October 25, 2011 12:27 AM
>> To: Supalov, Alexander
>> Cc: mpi3-hybridpm at lists.mpi-forum.org
>> Subject: Re: [Mpi3-hybridpm] endpoint proposal
>>
>> Alexander,
>>
>> We have discussed these two proposals and how they interact with each
>> other several times in the working group. We don't believe any changes
>> are required in either proposal for them to work together. Also, either
>> proposal would work without the other proposal. So they are, in fact,
>> orthogonal.
>>
>> With respect to merging the two proposals, there is no good reason to.
>> You did not explain why one proposal depends on the other. If there is
>> no dependency, merging is not required.
>>
>>  -- Pavan
>>
>> On 10/24/2011 04:53 PM, Supalov, Alexander wrote:
>>> Thanks. I don't think they are orthogonal.
>>>
>>> They are mutually exclusive as far as the resulting thread states are concerned. You may have:
>>>
>>> 0 (thread has no MPI involvement at all)
>>> A (thread is used as a helper thread by the MPI implementation)
>>> B (thread is used by the MPI user in association w/ some endpoint)
>>>
>>> And there's no AB (both helper and user thread w/ MPI). So, B above is actually -A.
>>>
>>> Thus, the proposals will have to deal with the transitions 0A, A0, 0(-A),-A0, and most interestingly, -AA and A(-A).
>>>
>>> Based on this, it may be necessary to consider them together and make reciprocal adjustments in both documents.
>>>
>>> The means of achieving the states A and -A may indeed appear orthogonal at the moment. I'd argue the closer they look, the better off we'll be. The helper thread proposal looks very rich on the ways to cut the cake. The endpoint proposal looks rather minimalistic. Would the helper thread proposal profit from some simplification? After all, you're "only" giving a thread on loan to the MPI. This should be doable with 1 call, maximum 2 (in and out).
>>>
>>> I can even imagine that the THREAD_JOIN might look like (join user MPI space w/ assoc. endpoint) and (join MPI implementation space). Then the means would not be orthogonal either, reflecting the mutual exclusivity of the resp. states. This would arguably be a better design.
>>>
>>> If the above makes sense, we'd rather merge the proposals.
>>>
>>> -----Original Message-----
>>> From: Pavan Balaji [mailto:balaji at mcs.anl.gov]
>>> Sent: Monday, October 24, 2011 5:03 PM
>>> To: mpi3-hybridpm at lists.mpi-forum.org
>>> Cc: Supalov, Alexander
>>> Subject: Re: [Mpi3-hybridpm] endpoint proposal
>>>
>>> Alexander,
>>>
>>> These are orthogonal concepts. Users can certainly use them together,
>>> but there is no reason to merge them.
>>>
>>>   -- Pavan
>>>
>>> On 10/24/2011 05:25 AM, Supalov, Alexander wrote:
>>>> Thanks. Has someone tried to look into #217 (helper threads) and #284 (endpoints) with a view to merging them? After all, threads attaching to the endpoints are almost, but not quite unlike helper threads given on "loan" to the MPI implementation. In the first case, the application is using the threads by itself. In the second case, the MPI is using threads for internal purposes. See the analogy? In both cases we let threads use or be used by MPI. Is this something one could possibly unite? They sit, as it were, on different sides of the boundary:
>>>>
>>>> Endpoints (#284):
>>>>
>>>>     Application threads
>>>>     Endpoints
>>>>     Communication resources
>>>>
>>>> Helper threads (#217):
>>>>
>>>>     [Endpoints]
>>>>     Internal threads
>>>>     Communication resources
>>>>
>>>> So, if we merge the pictures:
>>>>
>>>>     Application threads
>>>>     Endpoints (#284)
>>>>     Internal threads (#217)
>>>>     Communication resources
>>>>
>>>> And what the unification would do is provide a way to convert the application threads into the internal threads and back. May be orthogonal, but certainly deserves a joint consideration.
>>>>
>>>> -----Original Message-----
>>>> From: mpi3-hybridpm-bounces at lists.mpi-forum.org [mailto:mpi3-hybridpm-bounces at lists.mpi-forum.org] On Behalf Of Marc Snir
>>>> Sent: Thursday, October 20, 2011 10:40 PM
>>>> To: mpi3-hybridpm at lists.mpi-forum.org
>>>> Subject: [Mpi3-hybridpm] endpoint proposal
>>>>
>>>> Slides and text for the endpoint proposal
>>>>
>>>>
>>>>
>>>> --------------------------------------------------------------------------------------
>>>> Intel GmbH
>>>> Dornacher Strasse 1
>>>> 85622 Feldkirchen/Muenchen, Deutschland
>>>> Sitz der Gesellschaft: Feldkirchen bei Muenchen
>>>> Geschaeftsfuehrer: Douglas Lusk, Peter Gleissner, Hannes Schwaderer
>>>> Registergericht: Muenchen HRB 47456
>>>> Ust.-IdNr./VAT Registration No.: DE129385895
>>>> Citibank Frankfurt a.M. (BLZ 502 109 00) 600119052
>>>>
>>>>
>>>> _______________________________________________
>>>> Mpi3-hybridpm mailing list
>>>> Mpi3-hybridpm at lists.mpi-forum.org
>>>> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-hybridpm
>>>
>>
>> --
>> Pavan Balaji
>> http://www.mcs.anl.gov/~balaji