[mpiwg-p2p] Ordering of P2P messages in multithreaded applications
balaji at anl.gov
Sat Nov 24 11:40:11 CST 2018
That sentence is taken out of context. It only makes sense when you place it in the right context. Something like this:
“the semantics of thread execution may not define a relative order between two send operations executed by two distinct threads. [In such cases] The operations are logically concurrent, even if one physically precedes the other.”
Would this alternate text work:
“the semantics of the threading runtime might or might not define a relative order between two send operations executed by two distinct threads. In such cases, unless the user performs additional synchronization to explicitly order the operations, they are considered to be logically concurrent.”
On Nov 24, 2018, at 11:33 AM, Jeff Hammond <jeff.science at gmail.com> wrote:
On Nov 24, 2018, at 8:56 AM, Balaji, Pavan <balaji at anl.gov> wrote:
I’m OK with adding additional text to clarify it.
FWIW, I still think the text is not ambiguous. In particular, it is simply warning the user that the thread execution may not define a relative order (as in, you might accidentally get some order because of the OS behavior). That does not mean that one cannot achieve a relative order using additional synchronization outside of MPI.
How is the relative order outside of MPI on which you intend to rely not the physical order referenced in the following?
“The operations are logically concurrent, even if one physically precedes the other.”
In any case, let’s just add some text as you suggested below instead of arguing about it.
It’s hard to know what the right fix is when we cannot agree on whether there is a problem in the first place.
On Nov 24, 2018, at 10:38 AM, Jeff Hammond <jeff.science at gmail.com> wrote:
On Fri, Nov 23, 2018 at 2:59 PM Balaji, Pavan <balaji at anl.gov> wrote:
> On Nov 23, 2018, at 4:11 AM, HOLMES Daniel <d.holmes at epcc.ed.ac.uk> wrote:
> However, it is *also* a correct implementation choice to ignore that “physical order” even in this case because the MPI library does not know, and cannot determine, *why* that “physical order” happened.
I don't think this is a correct implementation and I'm not sure what part of the chapter is causing you to interpret this as a correct implementation. If there's algorithmic logic in the application to guarantee an order, then those operations are not logically concurrent. Although I'm happy to help clarify something that's unclear in the standard, I'm at a loss as to what is unclear here.
As I included before, this is the relevant text:
If a process has a single thread of execution, then any two communications executed by this process are ordered. On the other hand, if the process is multithreaded, then the semantics of thread execution may not define a relative order between two send operations executed by two distinct threads. The operations are logically concurrent, even if one physically precedes the other. In such a case, the two messages sent can be received in any order. Similarly, if two receive operations that are logically concurrent receive two successively sent messages, then the two messages can match the two receives in either order.
The problem with the text is that it does not state any means for the user to logically order operations on different threads. The explicit statement that physical order does not imply logical order means that users cannot rely on the order of thread execution alone.
The solution to this problem is twofold. First, add text indicating that the user can impart a logical order via thread synchronization primitives that order the execution of sends. Second, weaken the problematic sentence so that it applies only when the physical ordering is coincidental and not the result of any synchronization between threads.
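To make "imparting a logical order via thread synchronization" concrete, here is a minimal sketch in Python. It does not use MPI: `fake_send` is a hypothetical stand-in for a blocking send such as MPI_Send, and the `issued` list stands in for the order in which a library would see the calls. The names `run_ordered_sends`, `fake_send`, `first_done`, and `issued` are all assumptions made for illustration. The point is only that thread B's send is issued after thread A's because of explicit synchronization, not because of how the OS happened to schedule the threads.

```python
import threading


def run_ordered_sends():
    """Return the order in which two threads issued their stand-in sends."""
    issued = []                     # stand-in for the order the library observes
    lock = threading.Lock()         # protects `issued`
    first_done = threading.Event()  # user-level synchronization outside MPI

    def fake_send(tag):
        # Hypothetical placeholder for a blocking send such as MPI_Send.
        with lock:
            issued.append(tag)

    def thread_a():
        fake_send("A")
        first_done.set()            # signal that A's send has been issued

    def thread_b():
        first_done.wait()           # B's send is ordered after A's send
        fake_send("B")

    tb = threading.Thread(target=thread_b)
    ta = threading.Thread(target=thread_a)
    tb.start()                      # B is started first; start order is irrelevant
    ta.start()
    ta.join()
    tb.join()
    return issued


if __name__ == "__main__":
    print(run_ordered_sends())
```

Without the `first_done` event, the two calls could be issued in either order, which is the coincidental case the weakened sentence would still cover. Whether an MPI library must honor the synchronized order is the question under discussion in this thread; the sketch shows only the user-side pattern.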
FWIW, every implementation of MPI that I know of interprets the standard the way I stated it, i.e., those operations are not concurrent and the MPI library has to process them in the order in which it sees them. The MPI library cannot determine whether that order is an explicit schedule imposed by the user or an accidental schedule created by the OS, so it had better respect the order that it sees.
It would be good to look at MPI implementations that support multi-rail interconnects. How does MVAPICH2 mrail implement ordering in this case? Do they just use one rail per process or one rail per communicator?
jeff.science at gmail.com