[Mpi3-hybridpm] [EXTERNAL] Re: Threading homeworking / next telecon

Jeff Hammond jhammond at alcf.anl.gov
Tue Mar 26 07:05:41 CDT 2013


You'll have to explain in person because I still don't see why the
following doesn't work.

#include <mpi.h>

int MPIX_Comm_split_inherit_info(MPI_Comm comm, int color, int key,
                                 MPI_Comm *newcomm)
{
  /* assuming MPI_ERRORS_ARE_FATAL, so no need to check return codes... */
  MPI_Info info;
  MPI_Comm_get_info(comm, &info);     /* snapshot the parent's info hints */
  MPI_Comm_split(comm, color, key, newcomm);
  MPI_Comm_set_info(*newcomm, info);  /* apply them to the child */
  MPI_Info_free(&info);
  return MPI_SUCCESS;
}

I did not bother to implement attribute copying because I don't yet
understand how attributes work, and because I am in favor of using
MPI_Info to tell the implementation about communicator team progress
anyway.
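
A caller would then use it exactly like MPI_Comm_split.  A minimal
sketch (the split by rank/4 is just an illustration):

  MPI_Comm rowcomm;
  int rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  /* the child starts with whatever info keys the parent carried */
  MPIX_Comm_split_inherit_info(MPI_COMM_WORLD, rank / 4, rank, &rowcomm);
  /* ... use rowcomm ... */
  MPI_Comm_free(&rowcomm);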

Best,

Jeff

On Mon, Mar 25, 2013 at 9:56 PM, Pavan Balaji <balaji at mcs.anl.gov> wrote:
> Jeff,
>
> One of the goals of the original proposal was to use this in an MPI+UPC
> environment.  If there were inheritance, the UPC runtime could dup
> COMM_WORLD as many times as there are UPC threads and just hand one to
> each thread as its new "COMM_WORLD", say upc_comm[] (each element of
> this array would be a comm specific to a UPC thread).  Without
> inheritance, this becomes tricky, since each new communicator creation
> has to be tracked and set up explicitly using Comm_set_info.  This is
> especially hard when you have stacked libraries.
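>
> Roughly what I mean, as a sketch (NUM_UPC_THREADS is a hypothetical
> name; error checks omitted):
>
>   MPI_Comm upc_comm[NUM_UPC_THREADS];
>   for (int t = 0; t < NUM_UPC_THREADS; t++)
>       MPI_Comm_dup(MPI_COMM_WORLD, &upc_comm[t]);
>   /* UPC thread t treats upc_comm[t] as its "COMM_WORLD"; with
>      inheritance, any communicator later derived from upc_comm[t]
>      would keep the per-thread association automatically */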
>
> I can explain in person why we rejected this model, if it's still not clear.
>
>  -- Pavan
>
> On 03/25/2013 09:29 PM US Central Time, Jeff Hammond wrote:
>> Hi Pavan,
>>
>> I am confused about why one cannot use MPI_Comm_get_info+MPI_Comm_set_info
>> to inherit this information.  I found
>> http://meetings.mpi-forum.org/secretary/2012/12/slides/mpi31-hybrid.pptx
>> online, and it seems that there is some issue with this method, but I
>> cannot determine it from the slides.  Can you elaborate on the problem
>> with this approach?
>>
>> Thanks,
>>
>> Jeff
>>
>> On Mon, Mar 25, 2013 at 8:19 PM, Pavan Balaji <balaji at mcs.anl.gov> wrote:
>>>
>>> I guess you could do that.  In our case, it was still not helpful,
>>> since we needed the inheritance to be automatic once an upper layer
>>> (such as UPC) passes a 'comm' as an alternative to MPI_COMM_WORLD.
>>>
>>>  -- Pavan
>>>
>>> On 03/25/2013 07:47 PM US Central Time, Jeff Hammond wrote:
>>>> Could the MPI_Info kv-pair not associate a communicator with a
>>>> collection of communicators upon which progress was made
>>>> simultaneously?  If the key is "communicator team" and the value is an
>>>> integer indexing said teams, can one not create such groups?
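>>>>
>>>> Something like this sketch, say (the key name and the value encoding
>>>> are only my suggestion, not anything in the standard):
>>>>
>>>>   MPI_Info info;
>>>>   MPI_Info_create(&info);
>>>>   MPI_Info_set(info, "communicator team", "7");  /* join team 7 */
>>>>   MPI_Comm_set_info(comm, info);
>>>>   MPI_Info_free(&info);
>>>>
>>>> All communicators carrying the same team value would then be
>>>> progressed together.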
>>>>
>>>> Jeff
>>>>
>>>> On Mon, Mar 25, 2013 at 7:41 PM, Pavan Balaji <balaji at mcs.anl.gov> wrote:
>>>>>
>>>>> The problem was that this wouldn't allow us to create a group of
>>>>> communicators on which progress is made.  Each communicator was
>>>>> independent of everything else.
>>>>>
>>>>> However, our goal was to allow each "UPC thread" to be an MPI rank,
>>>>> where all threads share that rank.  Your goal is different, so this
>>>>> might or might not be a concern for you.
>>>>>
>>>>>  -- Pavan
>>>>>
>>>>> On 03/25/2013 07:39 PM US Central Time, Jeff Hammond wrote:
>>>>>> Why can't a user do MPI_COMM_SET_INFO explicitly every time they want
>>>>>> per-communicator semantics?
>>>>>>
>>>>>> Jeff
>>>>>>
>>>>>> On Mon, Mar 25, 2013 at 7:30 PM, Pavan Balaji <balaji at mcs.anl.gov> wrote:
>>>>>>>
>>>>>>> FWIW, we discussed a similar approach in the hybrid WG a few meetings
>>>>>>> ago.  The main reason we didn't go down that path is that
>>>>>>> per-communicator semantics are not fully inherited by child
>>>>>>> communicators.  For example, split does not inherit info arguments or
>>>>>>> communicator attributes, while dup does.
>>>>>>>
>>>>>>>  -- Pavan
>>>>>>>
>>>>>>> On 03/25/2013 05:31 PM US Central Time, Sur, Sayantan wrote:
>>>>>>>> This is interesting.  It might be useful for implementers if the app
>>>>>>>> could inform the MPI library that, in its usage model, per-communicator
>>>>>>>> queues might lead to a performance benefit, such as in the case of many
>>>>>>>> threads (among others).
>>>>>>>>
>>>>>>>> Info key?  Assert?
>>>>>>>>
>>>>>>>> Sayantan
>>>>>>>>
>>>>>>>> From: mpi3-hybridpm-bounces at lists.mpi-forum.org On Behalf Of
>>>>>>>> William Gropp
>>>>>>>> Sent: Monday, March 25, 2013 2:24 PM
>>>>>>>> To: mpi3-hybridpm at lists.mpi-forum.org
>>>>>>>> Subject: Re: [Mpi3-hybridpm] [EXTERNAL] Re: Threading homeworking /
>>>>>>>> next telecon
>>>>>>>>
>>>>>>>> An implementation is free to use separate queues for each communicator;
>>>>>>>> some of us have discussed this in the past, in part to permit the use
>>>>>>>> of lock-free structures for the queue updates, particularly as the
>>>>>>>> communicator is the only match field that never has wild cards.  I
>>>>>>>> believe that this is within the existing semantics.  It even has
>>>>>>>> benefits for single-threaded execution, since the communicator matching
>>>>>>>> is done once, rather than in every query on the queue.
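>>>>>>>>
>>>>>>>> A sketch of the idea - purely illustrative, not any particular
>>>>>>>> implementation's internals:
>>>>>>>>
>>>>>>>>   #include <stdatomic.h>
>>>>>>>>   struct req;                    /* a posted receive or unexpected msg */
>>>>>>>>   typedef struct comm_queues {   /* one instance per communicator */
>>>>>>>>       _Atomic(struct req *) posted_head;
>>>>>>>>       _Atomic(struct req *) unexpected_head;
>>>>>>>>   } comm_queues;
>>>>>>>>   /* the queue pair is located once, when an operation is posted, so
>>>>>>>>      matching never rescans a global queue, and threads working on
>>>>>>>>      different communicators never contend on a shared lock */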
>>>>>>>>
>>>>>>>> In terms of progress, the standard is deliberately vague on the details,
>>>>>>>> and thus I don't believe we have the requirement that you quote.  And
>>>>>>>> some of the other interpretations of progress would not be helped by any
>>>>>>>> thread-safety restriction.
>>>>>>>>
>>>>>>>> Bill
>>>>>>>>
>>>>>>>> William Gropp
>>>>>>>> Director, Parallel Computing Institute
>>>>>>>> Deputy Director for Research
>>>>>>>> Institute for Advanced Computing Applications and Technologies
>>>>>>>> Thomas M. Siebel Chair in Computer Science
>>>>>>>> University of Illinois Urbana-Champaign
>>>>>>>>
>>>>>>>> On Mar 25, 2013, at 4:15 PM, Jeff Hammond wrote:
>>>>>>>>
>>>>>>>> On Mon, Mar 25, 2013 at 3:17 PM, William Gropp
>>>>>>>> <wgropp at illinois.edu> wrote:
>>>>>>>>
>>>>>>>>     I was only addressing the issue of calling the thread-level
>>>>>>>>     routines before knowing what thread level you had.
>>>>>>>>
>>>>>>>> Okay, sorry, I cannot tell which tickets people are referring to, since
>>>>>>>> I have a bunch of different ones right now.
>>>>>>>>
>>>>>>>>     I'm not sure what you are looking for.  In the case of
>>>>>>>>     MPI_THREAD_MULTIPLE, an implementation can provide significant
>>>>>>>>     concurrency today without any change in the MPI standard - that's
>>>>>>>>     a major reason for that table (more to the point, this table is
>>>>>>>>     meant as a guide for not using locks).  Can you give me an example
>>>>>>>>     of something that the current MPI semantics prohibit that you'd
>>>>>>>>     like to achieve with MPI_THREAD_PER_OBJECT?
>>>>>>>>
>>>>>>>> It is my understanding of the progress requirements that any call to
>>>>>>>> MPI must make progress on all MPI operations.  This means that two
>>>>>>>> threads calling e.g. MPI_Recv must walk all of the message queues.  If
>>>>>>>> a thread needs to modify any queue because it matches, then this must
>>>>>>>> be done in a thread-safe way, which presumably requires something
>>>>>>>> resembling mutual exclusion or transactions.  If a call to MPI_Recv
>>>>>>>> only had to make progress on its own communicator, then two threads
>>>>>>>> calling MPI_Recv on two different communicators would (1) only have to
>>>>>>>> walk the message queue associated with that communicator and (2) not
>>>>>>>> need anything resembling mutual exclusion to update the message queue
>>>>>>>> in the event that matching occurs.
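>>>>>>>>
>>>>>>>> Concretely, the pattern I care about looks like this (a sketch; it
>>>>>>>> assumes <mpi.h>, pthreads, and MPI_THREAD_MULTIPLE, with commA and
>>>>>>>> commB created elsewhere):
>>>>>>>>
>>>>>>>>   /* each thread receives only on the communicator it owns; under
>>>>>>>>      per-object semantics, neither call would touch the other's queue */
>>>>>>>>   void *worker(void *arg)
>>>>>>>>   {
>>>>>>>>       MPI_Comm comm = *(MPI_Comm *)arg;   /* commA or commB */
>>>>>>>>       double buf[64];
>>>>>>>>       MPI_Recv(buf, 64, MPI_DOUBLE, 0, 0, comm, MPI_STATUS_IGNORE);
>>>>>>>>       return NULL;
>>>>>>>>   }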
>>>>>>>>
>>>>>>>> Forgive me if I've got some of the details wrong.  If I've got all of
>>>>>>>> the details and the big picture wrong, then I'll think about it more.
>>>>>>>>
>>>>>>>> Jeff
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mar 25, 2013, at 2:53 PM, Jeff Hammond wrote:
>>>>>>>>
>>>>>>>>     That doesn't do much for me in terms of enabling greater
>>>>>>>>     concurrency in performance-critical operations.
>>>>>>>>
>>>>>>>>     I'd like to propose that we try to make all of "Access Only",
>>>>>>>>     "Update RefCount", "Read of List" and "None" thread safe in all
>>>>>>>>     cases.  All of these are read-only except for "Update RefCount",
>>>>>>>>     but that update can be done with atomics.  I am assuming that
>>>>>>>>     concurrent reads are only permitted to happen after the writing
>>>>>>>>     calls on the object have completed.  This is the essence of
>>>>>>>>     MPI_THREAD_PER_OBJECT.
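>>>>>>>>
>>>>>>>>     For "Update RefCount" I mean something like this C11 sketch
>>>>>>>>     (illustrative only, not any implementation's actual code):
>>>>>>>>
>>>>>>>>       #include <stdatomic.h>
>>>>>>>>       typedef struct { atomic_int refcount; /* ... */ } mpi_obj;
>>>>>>>>       static inline void obj_addref(mpi_obj *o)
>>>>>>>>       {
>>>>>>>>           /* no lock needed; concurrent increments are safe */
>>>>>>>>           atomic_fetch_add_explicit(&o->refcount, 1,
>>>>>>>>                                     memory_order_relaxed);
>>>>>>>>       }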
>>>>>>>>
>>>>>>>>     Jeff
>>>>>>>>
>>>>>>>> --
>>>>>>>> Jeff Hammond
>>>>>>>> Argonne Leadership Computing Facility
>>>>>>>> University of Chicago Computation Institute
>>>>>>>> jhammond at alcf.anl.gov / (630) 252-5381
>>>>>>>> http://www.linkedin.com/in/jeffhammond
>>>>>>>> https://wiki.alcf.anl.gov/parts/index.php/User:Jhammond
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Pavan Balaji
>>>>>>> http://www.mcs.anl.gov/~balaji
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>> --
>>>>> Pavan Balaji
>>>>> http://www.mcs.anl.gov/~balaji
>>>>
>>>>
>>>>
>>>
>>> --
>>> Pavan Balaji
>>> http://www.mcs.anl.gov/~balaji
>>
>>
>>
>
> --
> Pavan Balaji
> http://www.mcs.anl.gov/~balaji



-- 
Jeff Hammond
Argonne Leadership Computing Facility
University of Chicago Computation Institute
jhammond at alcf.anl.gov / (630) 252-5381
http://www.linkedin.com/in/jeffhammond
https://wiki.alcf.anl.gov/parts/index.php/User:Jhammond


