[Mpi3-hybridpm] [EXTERNAL] Re: Threading homeworking / next telecon

Pavan Balaji balaji at mcs.anl.gov
Mon Mar 25 21:56:29 CDT 2013


Jeff,

One of the goals of the original proposal was to use this in an MPI+UPC
environment.  If there were inheritance, the UPC runtime could dup
COMM_WORLD as many times as there are UPC threads and just hand each
thread its own new "COMM_WORLD", say upc_comm[] (each element of this
array would be a comm specific to a UPC thread).  Without inheritance,
this becomes tricky, since every new communicator creation has to be
tracked and set up explicitly using Comm_set_info.  This is especially
hard when you have stacked libraries.
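
To make this concrete, here is a minimal sketch of the pattern I have in
mind (the info key name "thread_per_object" below is only a placeholder,
not the actual proposed key):

    #include <mpi.h>

    #define NUM_UPC_THREADS 4

    MPI_Comm upc_comm[NUM_UPC_THREADS];

    void setup_upc_comms(void)
    {
        MPI_Info info;
        MPI_Info_create(&info);
        MPI_Info_set(info, "thread_per_object", "true");  /* placeholder key */

        for (int i = 0; i < NUM_UPC_THREADS; i++) {
            /* each UPC thread gets its own "COMM_WORLD" */
            MPI_Comm_dup(MPI_COMM_WORLD, &upc_comm[i]);
            /* without inheritance, this hint must be re-applied by hand to
             * every communicator derived later, including those created
             * inside stacked libraries that the UPC runtime never sees */
            MPI_Comm_set_info(upc_comm[i], info);
        }
        MPI_Info_free(&info);
    }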

I can explain in person why we rejected this model, if it's still not clear.

 -- Pavan

On 03/25/2013 09:29 PM US Central Time, Jeff Hammond wrote:
> Hi Pavan,
> 
> I am confused about why one cannot use MPI_Comm_get_info+MPI_Comm_set_info
> to inherit this information.  I found
> http://meetings.mpi-forum.org/secretary/2012/12/slides/mpi31-hybrid.pptx
> online, and it seems that there is some issue with this method, but I
> cannot determine what it is from the slides.  Can you elaborate on the
> problem with this approach?
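>
> Roughly, I am imagining the following pattern wherever a new communicator
> is created (parent, color, and key are placeholders; this uses only the
> MPI-3 info routines, no new API):
>
>     MPI_Comm child;
>     MPI_Info hints;
>
>     MPI_Comm_split(parent, color, key, &child);  /* split drops the parent's hints */
>     MPI_Comm_get_info(parent, &hints);           /* snapshot the parent's hints */
>     MPI_Comm_set_info(child, hints);             /* re-apply them to the child */
>     MPI_Info_free(&hints);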
> 
> Thanks,
> 
> Jeff
> 
> On Mon, Mar 25, 2013 at 8:19 PM, Pavan Balaji <balaji at mcs.anl.gov> wrote:
>>
>> I guess you could do that.  In our case, it was still not helpful, as we
>> needed the inheritance to be automatic once an upper layer (such as
>> UPC) passes a 'comm' as an alternative to MPI_COMM_WORLD.
>>
>>  -- Pavan
>>
>> On 03/25/2013 07:47 PM US Central Time, Jeff Hammond wrote:
>>> Could the MPI_Info kv-pair not associate a communicator with a
>>> collection of communicators upon which progress was made
>>> simultaneously?  If the key is "communicator team" and the value is an
>>> integer indexing said teams, can one not create such groups?
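>>>
>>> Something like this is what I am imagining (the key name
>>> "communicator_team" and its integer value are purely illustrative):
>>>
>>>     MPI_Info info;
>>>     MPI_Info_create(&info);
>>>     MPI_Info_set(info, "communicator_team", "1");  /* illustrative team index */
>>>     MPI_Comm_set_info(comm_a, info);  /* comm_a and comm_b progress together */
>>>     MPI_Comm_set_info(comm_b, info);
>>>     MPI_Info_free(&info);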
>>>
>>> Jeff
>>>
>>> On Mon, Mar 25, 2013 at 7:41 PM, Pavan Balaji <balaji at mcs.anl.gov> wrote:
>>>>
>>>> The problem was that this wouldn't allow us to create a group of
>>>> communicators on which progress is made.  Each communicator was
>>>> independent of everything else.
>>>>
>>>> However, our goal was to allow each "UPC thread" to be an MPI rank,
>>>> where all threads share that rank.  Your goal is different, so this
>>>> might or might not be a concern for you.
>>>>
>>>>  -- Pavan
>>>>
>>>> On 03/25/2013 07:39 PM US Central Time, Jeff Hammond wrote:
>>>>> Why can't a user do MPI_COMM_SET_INFO explicitly every time they want
>>>>> per-communicator semantics?
>>>>>
>>>>> Jeff
>>>>>
>>>>> On Mon, Mar 25, 2013 at 7:30 PM, Pavan Balaji <balaji at mcs.anl.gov> wrote:
>>>>>>
>>>>>> FWIW, we discussed a similar idea in the hybrid WG a few meetings ago.
>>>>>> The main reason we didn't go down that path was that per-communicator
>>>>>> semantics are not fully inherited by child communicators.  For example,
>>>>>> split does not inherit info arguments or communicator attributes, while
>>>>>> dup does.
>>>>>>
>>>>>>  -- Pavan
>>>>>>
>>>>>> On 03/25/2013 05:31 PM US Central Time, Sur, Sayantan wrote:
>>>>>>> This is interesting.  It might be useful for implementers if the app
>>>>>>> could inform the MPI library that, in its usage model, per-communicator
>>>>>>> queues might lead to a performance benefit, such as in the case of many
>>>>>>> threads (among others).
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Info key? Assert?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Sayantan
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> From: mpi3-hybridpm-bounces at lists.mpi-forum.org On Behalf Of William Gropp
>>>>>>> Sent: Monday, March 25, 2013 2:24 PM
>>>>>>> To: mpi3-hybridpm at lists.mpi-forum.org
>>>>>>> Subject: Re: [Mpi3-hybridpm] [EXTERNAL] Re: Threading homeworking / next telecon
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> An implementation is free to use separate queues for each communicator;
>>>>>>> some of us have discussed this in the past, in part to permit the use of
>>>>>>> lock-free structures for the queue updates, particularly as this is the
>>>>>>> only place where there are never any wild cards.  I believe that this is
>>>>>>> within the existing semantics.  It even has benefits for single-threaded
>>>>>>> execution, since the communicator matching is done once, rather than in
>>>>>>> every query on the queue.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> In terms of progress, the standard is deliberately vague on the details,
>>>>>>> and thus I don't believe we have the requirement that you quote.  And
>>>>>>> some of the other interpretations of progress would not be helped by any
>>>>>>> thread-safety restriction.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Bill
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> William Gropp
>>>>>>>
>>>>>>> Director, Parallel Computing Institute
>>>>>>>
>>>>>>> Deputy Director for Research
>>>>>>>
>>>>>>> Institute for Advanced Computing Applications and Technologies
>>>>>>>
>>>>>>> Thomas M. Siebel Chair in Computer Science
>>>>>>>
>>>>>>> University of Illinois Urbana-Champaign
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Mar 25, 2013, at 4:15 PM, Jeff Hammond wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Mar 25, 2013 at 3:17 PM, William Gropp <wgropp at illinois.edu> wrote:
>>>>>>>
>>>>>>>     I was only addressing the issue of calling the thread level routines
>>>>>>>     before knowing what thread level you had.
>>>>>>>
>>>>>>>
>>>>>>> Okay, sorry, I cannot tell which tickets people are referring to since
>>>>>>> I have a bunch of different ones right now.
>>>>>>>
>>>>>>>
>>>>>>>     I'm not sure what you are looking for.  In the case of
>>>>>>>     MPI_THREAD_MULTIPLE, an implementation can provide significant
>>>>>>>     concurrency today without any change in the MPI standard - that's a
>>>>>>>     major reason for that table (more to the point - this table is meant
>>>>>>>     as a guide for not using locks).  Can you give me an example of
>>>>>>>     something that the current MPI semantics prohibits that you'd like
>>>>>>>     to achieve with MPI_THREAD_PER_OBJECT?
>>>>>>>
>>>>>>>
>>>>>>> It is my understanding of the progress requirements that any call to
>>>>>>> MPI must make progress on all MPI operations.  This means that two
>>>>>>> threads calling e.g. MPI_Recv must walk all of the message queues.  If
>>>>>>> a thread needs to modify any queue because it matches, then this must
>>>>>>> be done in a thread-safe way, which presumably requires something
>>>>>>> resembling mutual exclusion or transactions.  If a call to MPI_Recv
>>>>>>> only had to make progress on its own communicator, then two threads
>>>>>>> calling MPI_Recv on two different communicators would (1) only have to
>>>>>>> walk the message queue associated with that communicator and (2) need
>>>>>>> nothing resembling mutual exclusion to update that queue in the event
>>>>>>> that matching occurs.
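>>>>>>>
>>>>>>> In code, the case I have in mind is roughly this (communicator and
>>>>>>> buffer names are placeholders; it assumes MPI_THREAD_MULTIPLE, with
>>>>>>> OpenMP used only to get the two threads):
>>>>>>>
>>>>>>>     #pragma omp parallel num_threads(2)
>>>>>>>     {
>>>>>>>         int t = omp_get_thread_num();
>>>>>>>         MPI_Comm my_comm = (t == 0) ? comm0 : comm1;  /* disjoint comms */
>>>>>>>         /* with per-communicator progress, this call would only need to
>>>>>>>            walk and update the queue belonging to my_comm */
>>>>>>>         MPI_Recv(buf[t], count, MPI_DOUBLE, src, tag, my_comm,
>>>>>>>                  MPI_STATUS_IGNORE);
>>>>>>>     }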
>>>>>>>
>>>>>>> Forgive me if I've got some of the details wrong.  If I've got all of
>>>>>>> the details and the big picture wrong, then I'll think about it more.
>>>>>>>
>>>>>>> Jeff
>>>>>>>
>>>>>>>
>>>>>>> On Mar 25, 2013, at 2:53 PM, Jeff Hammond wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>     That doesn't do much for me in terms of enabling greater concurrency
>>>>>>>     in performance-critical operations.
>>>>>>>
>>>>>>>     I'd like to propose that we try to make all of "Access Only", "Update
>>>>>>>     RefCount", "Read of List" and "None" thread safe in all cases.  All of
>>>>>>>     these are read-only except for "Update RefCount", but this can be done
>>>>>>>     with atomics.  I am assuming that concurrent reads are only permitted
>>>>>>>     to happen after the writing calls on the object have completed.  This
>>>>>>>     is the essence of MPI_THREAD_PER_OBJECT.
>>>>>>>
>>>>>>>     Jeff
>>>>>>>
>>>>>>> --
>>>>>>> Jeff Hammond
>>>>>>> Argonne Leadership Computing Facility
>>>>>>> University of Chicago Computation Institute
>>>>>>> jhammond at alcf.anl.gov / (630) 252-5381
>>>>>>> http://www.linkedin.com/in/jeffhammond
>>>>>>> https://wiki.alcf.anl.gov/parts/index.php/User:Jhammond
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Pavan Balaji
>>>>>> http://www.mcs.anl.gov/~balaji
>>>>>
>>>>>
>>>>>
>>>>
>>>> --
>>>> Pavan Balaji
>>>> http://www.mcs.anl.gov/~balaji
>>>
>>>
>>>
>>
>> --
>> Pavan Balaji
>> http://www.mcs.anl.gov/~balaji
> 
> 
> 

-- 
Pavan Balaji
http://www.mcs.anl.gov/~balaji


