[mpiwg-p2p] Ordering of P2P messages in multithreaded applications

HOLMES Daniel d.holmes at epcc.ed.ac.uk
Fri Nov 23 04:11:44 CST 2018


Hi Pavan,

1) The text clearly suffers from ambiguity of interpretation, because there are two different interpretations espoused in this email chain and both have defenders who are not easily swayed to the other position by a re-reading of the text in question.

2) Preserving the “physical order” of these operations as presented to the MPI library is a correct implementation choice. I believe there is no argument or ambiguity on that point. However, it is *also* a correct implementation choice to ignore that “physical order” even in this case, because the MPI library does not know, and cannot determine, *why* that “physical order” happened.

From the point of view of the MPI library, there is no difference between “A occurred before B because a critical section in user code enforced it” and “A occurred before B because of a non-deterministic thread-scheduling accident”. Looking at the program as a whole, it is clear that the operations are not “physically concurrent”, e.g. interleaved or happens-before-in-either-order. But looking only at the information known to the MPI library (two operations with different thread ids), it is not possible to discriminate between happens-before-in-deterministic-order and happens-before-in-nondeterministic-order. Thus, the operations are “logically concurrent even if one physically precedes the other”.

The calling threads having different thread ids gives MPI permission to ignore physical/chronological ordering, if it chooses to do so - or not, if it decides preserving it is easier/better in some way. Only when the thread id is identical for the two operations is the MPI library required to preserve physical/chronological ordering.

Cheers,
Dan.
—
Dr Daniel Holmes PhD
Applications Consultant in HPC Research
d.holmes at epcc.ed.ac.uk
Phone: +44 (0) 131 651 3465
Mobile: +44 (0) 7940 524 088
Address: Room 2.09, Bayes Centre, 47 Potterrow, Central Area, Edinburgh, EH8 9BT
—
The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
—

On 23 Nov 2018, at 00:32, Balaji, Pavan via mpiwg-p2p <mpiwg-p2p at lists.mpi-forum.org> wrote:

Jeff, all,

Sorry, SC took up all of my time so I couldn't respond earlier.

I'm not sure what the confusion is.  I don't think the text is ambiguous.  In the user program, the operations are *not* logically concurrent.  They are protected by a critical section.  Thus, I don't think the MPI implementation can ignore their ordering.

 -- Pavan

On Nov 15, 2018, at 11:09 AM, Jeff Hammond <jeff.science at gmail.com> wrote:

Dan has convinced me that the MPI standard is terrible and that, while my original interpretation is what we want and is consistent with the principle of least surprise, it is not guaranteed by the following text.

Per our discussion, there are a few options:
1) make all MPI_Send logically concurrent, even on a single thread.  this will break stuff and make people sad.
2) force MPI to order injection <somehow>, which might force some implementations to add more memory ordering on the send path than they want, particularly if they do not have a TSO memory model.
3) add something like MPI_Win_sync that logically orders sends from multiple threads explicitly.
4) add MPI_THREAD_SERIALIZED_WITH_EXTRA_SAUCE that does the equivalent of 2 or 3 and thus doesn't cause a performance regression in MPI_THREAD_SERIALIZED.

Jeff
If a process has a single thread of execution, then any two communications executed by this process are ordered. On the other hand, if the process is multithreaded, then the semantics of thread execution may not define a relative order between two send operations executed by two distinct threads. The operations are logically concurrent, even if one physically precedes the other. In such a case, the two messages sent can be received in any order. Similarly, if two receive operations that are logically concurrent receive two successively sent messages, then the two messages can match the two receives in either order.


On Thu, Nov 15, 2018 at 10:55 AM Balaji, Pavan via mpiwg-p2p <mpiwg-p2p at lists.mpi-forum.org> wrote:
Dan,

The matching *is* ordered in this case.  So the program will print 0 followed by 1.

MPI does not order delivery of the actual data, but the first message is guaranteed to go into the first buffer.  If the second message ends up going first, the MPI implementation will need to buffer it.

 — Pavan

Sent from my iPhone

On Nov 15, 2018, at 7:56 AM, HOLMES Daniel via mpiwg-p2p <mpiwg-p2p at lists.mpi-forum.org> wrote:

Hi Joachim,

There is no guarantee of ordering between the two sends because they are logically concurrent. If they were issued on the same thread, then MPI would guarantee that the delivery order is identical to the sequential issuing order.

Many MPI libraries are very likely to deliver these messages “in order”, that is, the first one to be called chronologically at the sender process is likely to leave first and therefore likely to arrive first. Interleaving execution of the sending threads may change the issuing order on the network and out-of-order networks may change the order of arrival.

On the other hand, if an MPI implementation is internally using sequence numbers (or a similar mechanism) to enforce ordering for the same-thread case, then it may also (incidentally) reconstruct the issuing order for this case. However, you cannot rely on this behaviour being portable from system to system or from MPI library to MPI library.

If you wish to enforce a particular ordering of these messages, then you can use tags to differentiate each from the other. There is an argument for always using tags in this type of situation to increase program readability.

Cheers,
Dan.

On 15 Nov 2018, at 04:16, Joachim Protze via mpiwg-p2p <mpiwg-p2p at lists.mpi-forum.org> wrote:

Hi all,

I have a question on the "Semantics of Point-to-Point Communication" in a multithreaded context.

For me the situation for the code below is not clear, especially with respect to the paragraph in MPI-3.1 p.41, l.10-17 :


#include <mpi.h>
#include <stdio.h>

void test(int rank) {
  int msg = 0;
  if (rank == 0) {
#pragma omp parallel num_threads(2)
#pragma omp critical
    {
      MPI_Send(&msg, 1, MPI_INT, 1, 42, MPI_COMM_WORLD);
      msg++;
    }
  } else if (rank == 1) {
    MPI_Recv(&msg, 1, MPI_INT, 0, 42, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    printf("Received %i\n", msg);
    MPI_Recv(&msg, 1, MPI_INT, 0, 42, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    printf("Received %i\n", msg);
  }
}

Two threads on the first process each send a message: the first thread sends 0, the second thread sends 1. From OpenMP semantics, the first send happens before the second send.

Is there a guarantee that the other process receives the 0 first?

Thanks,
Joachim


--
Dipl.-Inf. Joachim Protze

IT Center
Group: High Performance Computing
Division: Computational Science and Engineering
RWTH Aachen University
Seffenter Weg 23
D 52074  Aachen (Germany)
Tel: +49 241 80-24765
Fax: +49 241 80-624765
protze at itc.rwth-aachen.de
www.itc.rwth-aachen.de

_______________________________________________
mpiwg-p2p mailing list
mpiwg-p2p at lists.mpi-forum.org
https://lists.mpi-forum.org/mailman/listinfo/mpiwg-p2p



--
Jeff Hammond
jeff.science at gmail.com
http://jeffhammond.github.io/



