[Mpi3-hybridpm] Clarification on single-communication multiple-threads in latest proposal (9-nov-2009)

Douglas Miller dougmill at us.ibm.com
Thu Nov 12 07:22:21 CST 2009


I like the idea of making helper threads exposed to the application. But
I'm still not clear on just how a communication would be performed.
Consider an MPI_Allreduce that splits the operation into 4 units of work,
each of which can be performed on separate threads, in parallel. In that
case, I envision two possible scenarios:

1) One thread attaches to endpoint "A" normally, and calls MPI_Allreduce.
The allreduce appends units of work to endpoints A, B, C, and D. Then 3
other threads attach to endpoints B, C, and D as "helpers" and the
allreduce is performed and completed.

2) One thread attaches to endpoint "A" normally, and calls MPI_Allreduce.
The allreduce appends all four units of work to endpoint A. The 3 other
threads attach to endpoint A as "helpers". All four threads pick up a unit
of work each, and the allreduce is performed and completed.

The problem I see with option 1 is that it allows for units of work to
appear unexpectedly on endpoints, which I think requires all endpoints to
have helper threads attached at all times. This makes it difficult to have
threads that transition between compute and communication phases. It also
seems strange to have a unit of work being performed on an endpoint that
has nothing to do with the "rank" associated with that endpoint.

The problem I see with option 2 is that it requires endpoints to have a
notion of "depth": how many helper threads should be attached. It makes
it less clear why multiple endpoints are needed at all, and it complicates
how the messaging layers would divide up hardware resources: each helper
thread would tend to want its own hardware resources, so a single
endpoint would represent multiple hardware channels, each of which could
be used independently for communication.





From: "Snir, Marc" <snir at illinois.edu>
Sent by: mpi3-hybridpm-bounces at lists.mpi-forum.org
To: "mpi3-hybridpm at lists.mpi-forum.org" <mpi3-hybridpm at lists.mpi-forum.org>
Date: 11/11/2009 02:51 PM
Subject: Re: [Mpi3-hybridpm] Clarification on single-communication multiple-threads in latest proposal (9-nov-2009)





On Nov 11, 2009, at 2:24 PM, Douglas Miller wrote:

>
> That's an interesting idea. How would a thread that attaches as a
> "helper" actually end up in the progress engine? Does the thread
> relinquish control when it attaches as a helper? Thus the MPI ADI would
> simply not return, but rather call into a messaging layer progress loop?

Yes -- that's the idea.

> And then the "detach all helper threads" call would set some flag that
> causes the helper threads to break out of the loop and return? In this
> case, do you envision these helper threads are all attached to the same
> endpoint, or are there separate endpoints?

I envisage that a call by an application thread that is attached to a
particular endpoint could detach all the helper threads from that
particular endpoint. The call would return once the threads have detached.
This avoids the need to identify thread ids.

Helper threads could attach to different endpoints.



>
> From: "Snir, Marc" <snir at illinois.edu>
> Sent by: mpi3-hybridpm-bounces at lists.mpi-forum.org
> To: "mpi3-hybridpm at lists.mpi-forum.org" <mpi3-hybridpm at lists.mpi-forum.org>
> Date: 11/11/2009 05:05 AM
> Subject: Re: [Mpi3-hybridpm] Clarification on single-communication multiple-threads in latest proposal (9-nov-2009)
>
> Possible design with two new calls:
> 1. A thread can attach as a helper;
> 2. an attached compute thread can detach all helper threads from its
> endpoint.
>
> Short iPhone email
>
>
> On Nov 11, 2009, at 12:19 AM, "Snir, Marc" <snir at illinois.edu> wrote:
>
>> I put a placeholder for this idea. It is easy for a thread to join as
>> a helper. It is harder for the application to request the thread back.
>>
>> Short iPhone email
>>
>>
>> On Nov 10, 2009, at 3:03 PM, "Douglas Miller" <dougmill at us.ibm.com>
>> wrote:
>>
>>>
>>> I've been looking at the latest (v3) MPI3 Hybrid proposal and had a
>>> question about support for parallelism within a communication. It
>>> seems
>>> that the direction we're going here is to have the application own
>>> threads
>>> and then "lend" them to the messaging software for use during
>>> communications. This model works well in many situations, so it
>>> seems worth
>>> pursuing. One situation that it works well is when oversubscribing
>>> processor resources is detrimental to performance, and so the
>>> application
>>> and messaging layers should be cooperative about using threads and
>>> avoiding
>>> oversubscription.
>>>
>>> The dimension of parallelism in question is where multiple threads
>>> need to
>>> participate in a single communication, either point-to-point or
>>> collective.
>>> The way in which those threads will participate is up to the
>>> messaging
>>> software, but some examples are: message striping across multiple
>>> channels
>>> of a network (or shared memory) and collectives that consist of
>>> compound
>>> communications operations that benefit from multiple threads each
>>> performing a role in the larger collective. The question is, how
>>> would this
>>> be supported when using the model of threads, endpoints, and agents?
>>>
>>> Some examples might help illustrate the concerns. Consider a
>>> blocking
>>> collective that can benefit from parallelism. In order for the
>>> application
>>> to assign threads (or agents) to the collective, multiple threads
>>> must call
>>> into the messaging layer and indicate that they are part of the
>>> particular
>>> collective. This requires some sort of common identifier or other
>>> mechanism
>>> by which the messaging layer can identify these threads as being
>>> part of
>>> the same collective. Since the operation is blocking, there is no
>>> "request"
>>> or other object that can be used in a WAIT, so in order to ensure
>>> all
>>> threads are involved in the progress of the collective the
>>> application must
>>> arrange for each thread to call. Non-blocking calls also present
>>> problems, as there would probably need to be multiple requests
>>> generated and shared among the threads (agents), which all make
>>> progress individually.
>>>
>>> There are likely other ways to solve this, but the idea is to expose
>>> this
>>> dimension of parallelism such that applications can be written to
>>> take
>>> advantage of it. It would always be the case that a message layer
>>> could
>>> choose to use only one participant, the rest essentially performing
>>> a NO-OP
>>> (or barrier). It would also always be valid for an application to
>>> use only
>>> one thread, and not take advantage of possible parallelism.
>>>
>>> thanks,
>>>
>>> _______________________________________________
>>> Douglas Miller                  BlueGene Messaging Development
>>> IBM Corp., Rochester, MN USA                    Bldg 030-2 A410
>>> dougmill at us.ibm.com               Douglas Miller/Rochester/IBM
>>>
>>> _______________________________________________
>>> Mpi3-hybridpm mailing list
>>> Mpi3-hybridpm at lists.mpi-forum.org
>>> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-hybridpm
>>
>

Marc Snir
4323 Siebel Center, 201 N Goodwin, IL 61801
Tel (217) 244 6568
Web http://www.cs.uiuc.edu/homes/snir









