[Mpi3-hybridpm] [EXTERNAL] Re: Threading homeworking / next telecon

Pavan Balaji balaji at mcs.anl.gov
Tue Mar 26 11:07:12 CDT 2013


A manual approach that modifies every communicator creation call (either
with new calls or by appending a Comm_set_info to each one), such as the
one you suggested, will of course work.  But we didn't want a manual
approach.

 -- Pavan

On 03/26/2013 07:05 AM US Central Time, Jeff Hammond wrote:
> You'll have to explain in person because I still don't see why the
> following doesn't work.
> 
> int MPIX_Comm_split_inherit_info(MPI_Comm comm, int color, int key,
>                                  MPI_Comm *newcomm)
> {
>   /* assuming MPI_ERRORS_ARE_FATAL, so no error checking... */
>   MPI_Info info;
>   MPI_Comm_get_info(comm, &info);     /* snapshot the parent's hints */
>   MPI_Comm_split(comm, color, key, newcomm);
>   MPI_Comm_set_info(*newcomm, info);  /* re-apply them to the child  */
>   MPI_Info_free(&info);
>   return MPI_SUCCESS;
> }
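> 
> Usage would be as a drop-in replacement for MPI_Comm_split wherever the
> child communicator should keep the parent's hints, e.g.:
> 
>   MPIX_Comm_split_inherit_info(MPI_COMM_WORLD, color, key, &newcomm);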
> 
> I did not bother to implement attribute copying because I don't
> understand how attributes work yet, and because I am in favor of using
> MPI_Info to tell the implementation about communicator team progress
> anyway.
> 
> Best,
> 
> Jeff
> 
> On Mon, Mar 25, 2013 at 9:56 PM, Pavan Balaji <balaji at mcs.anl.gov> wrote:
>> Jeff,
>>
>> One of the goals of the original proposal was to use this in an MPI+UPC
>> environment.  If there were inheritance, the UPC runtime could dup
>> COMM_WORLD as many times as there are UPC threads and just hand one to
>> each thread as a new "COMM_WORLD", say upc_comm[] (each element of this
>> array would be a comm specific to a UPC thread).  Without inheritance,
>> this becomes tricky, since each new communicator creation has to be
>> tracked and set up explicitly using Comm_set_info.  This is especially
>> hard when you have stacked libraries.
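>>
>> Concretely, a minimal sketch of the dup step (NUM_UPC_THREADS is an
>> illustrative name):
>>
>> MPI_Comm upc_comm[NUM_UPC_THREADS];
>> for (int t = 0; t < NUM_UPC_THREADS; t++) {
>>     /* dup inherits the parent's info hints, so each per-thread
>>      * "COMM_WORLD" starts out configured like the original... */
>>     MPI_Comm_dup(MPI_COMM_WORLD, &upc_comm[t]);
>> }
>> /* ...but communicators later derived via split would not inherit
>>  * those hints, which is where the manual tracking creeps in. */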
>>
>> I can explain in person why we rejected this model, if it's still not clear.
>>
>>  -- Pavan
>>
>> On 03/25/2013 09:29 PM US Central Time, Jeff Hammond wrote:
>>> Hi Pavan,
>>>
>>> I am confused why one cannot use MPI_Comm_get_info+MPI_Comm_set_info
>>> to inherit this information.  I found
>>> http://meetings.mpi-forum.org/secretary/2012/12/slides/mpi31-hybrid.pptx
>>> online, and it seems that there is some issue with this method, but I
>>> cannot determine it from the slides.  Can you elaborate on the problem
>>> with this approach?
>>>
>>> Thanks,
>>>
>>> Jeff
>>>
>>> On Mon, Mar 25, 2013 at 8:19 PM, Pavan Balaji <balaji at mcs.anl.gov> wrote:
>>>>
>>>> I guess you could do that.  In our case, it was still not helpful, as
>>>> we needed the inheritance to be automatic once an upper layer (such as
>>>> UPC) passes a 'comm' as an alternative to MPI_COMM_WORLD.
>>>>
>>>>  -- Pavan
>>>>
>>>> On 03/25/2013 07:47 PM US Central Time, Jeff Hammond wrote:
>>>>> Could the MPI_Info kv-pair not associate a communicator with a
>>>>> collection of communicators upon which progress is made
>>>>> simultaneously?  If the key is "communicator team" and the value is an
>>>>> integer indexing said teams, can one not create such groups?
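>>>>>
>>>>> A sketch of what I mean (the key name "communicator_team" and the
>>>>> integer encoding are illustrative, not standardized; comm and
>>>>> team_index are assumed to be in scope):
>>>>>
>>>>> MPI_Info info;
>>>>> char team[16];
>>>>> snprintf(team, sizeof(team), "%d", team_index); /* which team this comm joins */
>>>>> MPI_Info_create(&info);
>>>>> MPI_Info_set(info, "communicator_team", team);
>>>>> MPI_Comm_set_info(comm, info);
>>>>> MPI_Info_free(&info);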
>>>>>
>>>>> Jeff
>>>>>
>>>>> On Mon, Mar 25, 2013 at 7:41 PM, Pavan Balaji <balaji at mcs.anl.gov> wrote:
>>>>>>
>>>>>> The problem was that this wouldn't allow us to create a group of
>>>>>> communicators on which progress is made.  Each communicator was
>>>>>> independent of everything else.
>>>>>>
>>>>>> However, our goal was to allow each "UPC thread" to be an MPI rank,
>>>>>> where all threads share that rank.  Your goal is different, so this
>>>>>> might or might not be a concern for you.
>>>>>>
>>>>>>  -- Pavan
>>>>>>
>>>>>> On 03/25/2013 07:39 PM US Central Time, Jeff Hammond wrote:
>>>>>>> Why can't a user do MPI_COMM_SET_INFO explicitly every time they want
>>>>>>> per-communicator semantics?
>>>>>>>
>>>>>>> Jeff
>>>>>>>
>>>>>>> On Mon, Mar 25, 2013 at 7:30 PM, Pavan Balaji <balaji at mcs.anl.gov> wrote:
>>>>>>>>
>>>>>>>> FWIW, we discussed a similar idea in the hybrid WG a few meetings
>>>>>>>> ago.  The main reason we didn't go down that path was that
>>>>>>>> per-communicator semantics are not fully inherited by child
>>>>>>>> communicators.  For example, split does not inherit info arguments or
>>>>>>>> communicator attributes, while dup does.
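>>>>>>>>
>>>>>>>> To illustrate the asymmetry (a sketch; dupcomm and splitcomm are just
>>>>>>>> illustrative names):
>>>>>>>>
>>>>>>>> MPI_Comm_dup(comm, &dupcomm);                 /* dupcomm keeps comm's hints  */
>>>>>>>> MPI_Comm_split(comm, color, key, &splitcomm); /* splitcomm does not          */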
>>>>>>>>
>>>>>>>>  -- Pavan
>>>>>>>>
>>>>>>>> On 03/25/2013 05:31 PM US Central Time, Sur, Sayantan wrote:
>>>>>>>>> This is interesting.  It might be useful for implementers if the app
>>>>>>>>> could inform the MPI library that, in its usage model, per-communicator
>>>>>>>>> queues might lead to a performance benefit, such as in the case of many
>>>>>>>>> threads (among others).
>>>>>>>>>
>>>>>>>>> Info key?  Assert?
>>>>>>>>>
>>>>>>>>> Sayantan
>>>>>>>>>
>>>>>>>>> From: mpi3-hybridpm-bounces at lists.mpi-forum.org
>>>>>>>>> [mailto:mpi3-hybridpm-bounces at lists.mpi-forum.org] On Behalf Of William Gropp
>>>>>>>>> Sent: Monday, March 25, 2013 2:24 PM
>>>>>>>>> To: mpi3-hybridpm at lists.mpi-forum.org
>>>>>>>>> Subject: Re: [Mpi3-hybridpm] [EXTERNAL] Re: Threading homeworking / next telecon
>>>>>>>>>
>>>>>>>>> An implementation is free to use separate queues for each communicator;
>>>>>>>>> some of us have discussed this in the past, in part to permit the use of
>>>>>>>>> lock-free structures for the queue updates, particularly as the
>>>>>>>>> communicator is the only matching field that never allows wild cards.  I
>>>>>>>>> believe that this is within the existing semantics.  It even has benefits
>>>>>>>>> for single-threaded execution, since the communicator matching is done
>>>>>>>>> once, rather than in every query on the queue.
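>>>>>>>>>
>>>>>>>>> As an illustration only (a sketch, not any implementation's actual data
>>>>>>>>> structures): each communicator owns its own posted-receive queue, so a
>>>>>>>>> match touches exactly one queue and the communicator comparison happens
>>>>>>>>> once, when the receive is enqueued.
>>>>>>>>>
>>>>>>>>> struct recv_entry {
>>>>>>>>>     int source, tag;                 /* wild cards possible here...  */
>>>>>>>>>     void *buf;
>>>>>>>>>     struct recv_entry *next;
>>>>>>>>> };
>>>>>>>>> struct comm_state {
>>>>>>>>>     struct recv_entry *posted_head;  /* ...but never on the comm, so
>>>>>>>>>                                         this queue stays private and
>>>>>>>>>                                         can be lock-free per comm    */
>>>>>>>>> };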
>>>>>>>>>
>>>>>>>>> In terms of progress, the standard is deliberately vague on the details,
>>>>>>>>> and thus I don't believe we have the requirement that you quote.  And
>>>>>>>>> some of the other interpretations of progress would not be helped by any
>>>>>>>>> thread-safety restriction.
>>>>>>>>>
>>>>>>>>> Bill
>>>>>>>>>
>>>>>>>>> William Gropp
>>>>>>>>> Director, Parallel Computing Institute
>>>>>>>>> Deputy Director for Research
>>>>>>>>> Institute for Advanced Computing Applications and Technologies
>>>>>>>>> Thomas M. Siebel Chair in Computer Science
>>>>>>>>> University of Illinois Urbana-Champaign
>>>>>>>>>
>>>>>>>>> On Mar 25, 2013, at 4:15 PM, Jeff Hammond wrote:
>>>>>>>>>
>>>>>>>>> On Mon, Mar 25, 2013 at 3:17 PM, William Gropp <wgropp at illinois.edu> wrote:
>>>>>>>>>
>>>>>>>>>     I was only addressing the issue of calling the thread-level routines
>>>>>>>>>     before knowing what thread level you had.
>>>>>>>>>
>>>>>>>>> Okay, sorry, I cannot tell which tickets people are referring to, since
>>>>>>>>> I have a bunch of different ones right now.
>>>>>>>>>
>>>>>>>>>     I'm not sure what you are looking for.  In the case of
>>>>>>>>>     MPI_THREAD_MULTIPLE, an implementation can provide significant
>>>>>>>>>     concurrency today without any change in the MPI standard - that's a
>>>>>>>>>     major reason for that table (more to the point - this table is meant
>>>>>>>>>     as a guide for not using locks).  Can you give me an example of
>>>>>>>>>     something that the current MPI semantics prohibit that you'd like to
>>>>>>>>>     achieve with MPI_THREAD_PER_OBJECT?
>>>>>>>>>
>>>>>>>>> It is my understanding of the progress requirements that any call to
>>>>>>>>> MPI must make progress on all MPI operations.  This means that two
>>>>>>>>> threads calling e.g. MPI_Recv must walk all of the message queues.  If
>>>>>>>>> a thread needs to modify any queue because it matches, then this must
>>>>>>>>> be done in a thread-safe way, which presumably requires something
>>>>>>>>> resembling mutual exclusion or transactions.  If a call to MPI_Recv
>>>>>>>>> only had to make progress on its own communicator, then two threads
>>>>>>>>> calling MPI_Recv on two different communicators would (1) only have to
>>>>>>>>> walk the message queue associated with that communicator and (2) need
>>>>>>>>> nothing resembling mutual exclusion to update the message queue in the
>>>>>>>>> event that matching occurs.
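>>>>>>>>>
>>>>>>>>> For example (a sketch; buf_a, buf_b, n, comm_a, and comm_b are
>>>>>>>>> illustrative names, with comm_a and comm_b used by different threads),
>>>>>>>>> each of these receives would then only touch its own communicator's
>>>>>>>>> queue:
>>>>>>>>>
>>>>>>>>> /* thread 0 */
>>>>>>>>> MPI_Recv(buf_a, n, MPI_INT, 0, 0, comm_a, MPI_STATUS_IGNORE);
>>>>>>>>> /* thread 1 */
>>>>>>>>> MPI_Recv(buf_b, n, MPI_INT, 0, 0, comm_b, MPI_STATUS_IGNORE);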
>>>>>>>>>
>>>>>>>>> Forgive me if I've got some of the details wrong.  If I've got all of
>>>>>>>>> the details and the big picture wrong, then I'll think about it more.
>>>>>>>>>
>>>>>>>>> Jeff
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mar 25, 2013, at 2:53 PM, Jeff Hammond wrote:
>>>>>>>>>
>>>>>>>>>     That doesn't do much for me in terms of enabling greater concurrency
>>>>>>>>>     in performance-critical operations.
>>>>>>>>>
>>>>>>>>>     I'd like to propose that we try to make all of "Access Only", "Update
>>>>>>>>>     RefCount", "Read of List" and "None" thread safe in all cases.  All of
>>>>>>>>>     these are read-only except for "Update RefCount", but this can be done
>>>>>>>>>     with atomics.  I am assuming that concurrent reads are only permitted
>>>>>>>>>     to happen after the writing calls on the object have completed.  This
>>>>>>>>>     is the essence of MPI_THREAD_PER_OBJECT.
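>>>>>>>>>
>>>>>>>>>     For instance, a sketch of the refcount bump assuming C11 atomics
>>>>>>>>>     (illustrative only, not anything the standard specifies):
>>>>>>>>>
>>>>>>>>>     #include <stdatomic.h>
>>>>>>>>>     atomic_int refcount;
>>>>>>>>>     /* increment the reference count without taking any lock */
>>>>>>>>>     atomic_fetch_add_explicit(&refcount, 1, memory_order_relaxed);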
>>>>>>>>>
>>>>>>>>>     Jeff
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Jeff Hammond
>>>>>>>>> Argonne Leadership Computing Facility
>>>>>>>>> University of Chicago Computation Institute
>>>>>>>>> jhammond at alcf.anl.gov / (630) 252-5381
>>>>>>>>> http://www.linkedin.com/in/jeffhammond
>>>>>>>>> https://wiki.alcf.anl.gov/parts/index.php/User:Jhammond
-- 
Pavan Balaji
http://www.mcs.anl.gov/~balaji


