[mpiwg-tools] [mpiwg-p2p] Message matching for tools
sato5 at llnl.gov
Fri Dec 18 18:32:38 CST 2015
Thank you for the reply.
We analyze on the same scale as the measurement, thus we have one
thread per thread-local trace. Each thread processes its own
thread-local trace. When encountering a communication event, it
re-enacts this communication using the recorded communication
parameters (rank, tag, comm). A send event leads to an issued
send, a receive event leads to an issued receive.
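The replay loop described above could be sketched roughly as follows. This is only an illustration, not Scalasca's actual code: the `Event` record, the `replay_trace` function and the `issue_send` / `issue_recv` callbacks are hypothetical names, and the callbacks stand in for whatever real MPI calls the analysis would issue.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass(frozen=True)
class Event:
    kind: str   # "send", "recv", or any other (non-communication) event kind
    rank: int   # recorded peer rank
    tag: int    # recorded message tag
    comm: str   # recorded communicator, represented here by a name

def replay_trace(trace: List[Event],
                 issue_send: Callable[[int, int, str], None],
                 issue_recv: Callable[[int, int, str], None]) -> None:
    """Walk one thread-local trace and re-enact each communication event."""
    for ev in trace:
        if ev.kind == "send":
            issue_send(ev.rank, ev.tag, ev.comm)   # send event -> issued send
        elif ev.kind == "recv":
            issue_recv(ev.rank, ev.tag, ev.comm)   # receive event -> issued receive
        # all other events need no communication and are skipped here

# Usage with stand-in callbacks that only log what would be issued.
issued: List[Tuple[str, int, int, str]] = []
trace = [
    Event("send", 1, 7, "world"),
    Event("compute", -1, -1, ""),
    Event("recv", 1, 7, "world"),
]
replay_trace(
    trace,
    lambda rank, tag, comm: issued.append(("send", rank, tag, comm)),
    lambda rank, tag, comm: issued.append(("recv", rank, tag, comm)),
)
print(issued)
```

In a real tool, one such loop would run per analysis thread, each over its own thread-local trace.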
(1) Replaying receive events
Papers about “parallel replay (or record-and-replay)” use (rank, tag,
comm) for correct replay of message receive orders. Unfortunately,
(rank, tag, comm) cannot replay message receive orders ** in general **,
even in MPI_THREAD_SINGLE (of course, it may work in particular
cases). You need to record (rank, message_id_number); (tag, comm)
does not work for this purpose. The details are described in Section 3.1
of this paper ( http://dl.acm.org/citation.cfm?id=2807642 ).
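A small sketch of why (rank, tag, comm) may be too coarse for recording receive order. It assumes a hypothetical scenario in which one sender posts two messages with the same tag on the same communicator, and the application observes the two receives completing in either order. The `msg_id` field is an assumed per-sender sequence number used for illustration; it is not necessarily the exact identifier defined in the paper.

```python
# Two messages from rank 0 with identical tag and communicator.
# "msg_id" is an assumed per-sender counter, used only for illustration.
msg1 = {"rank": 0, "tag": 5, "comm": "world", "msg_id": 1}
msg2 = {"rank": 0, "tag": 5, "comm": "world", "msg_id": 2}

def record(order, key_fields):
    """Return the recorded receive order, keeping only the given fields."""
    return [tuple(m[f] for f in key_fields) for m in order]

run_a = [msg1, msg2]   # one run observes msg1 first
run_b = [msg2, msg1]   # another run observes msg2 first

coarse_a = record(run_a, ("rank", "tag", "comm"))
coarse_b = record(run_b, ("rank", "tag", "comm"))
fine_a = record(run_a, ("rank", "msg_id"))
fine_b = record(run_b, ("rank", "msg_id"))

print(coarse_a == coarse_b)  # True: the two orders are indistinguishable
print(fine_a == fine_b)      # False: the message id tells them apart
```

With the coarse key, both runs produce the same record, so a replay cannot tell which order to reproduce.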
I think we (Scalasca) do not have requirements as strict as the ones
outlined in the paper. We need only to ensure that the same
send/receive pairs also exchange data during the analysis (ideal case)
or that we can at least detect a mismatch (in case of logically
concurrent messages) and fix it locally, by exchanging the mixed up
If I understand section 3.1 correctly, the problem with out-of-order
receives (Figure 3) does not pose a problem in our case, as we only
care that msg1 is matched by req1 and msg2 is matched by req2, both
during measurement and replay. MPI ordering semantics should take care
If I understand correctly, you only need to ensure the same send/receive pairs,
and you ** DO NOT ** care about the message receive orders.
However, if you do not replay the message receive orders,
you cannot correctly replay the same send/receive pairs,
because send destinations can change depending on the message receive order.
For example, an application such as an N-body code may choose send destinations according to some numerical result (X).
In floating-point arithmetic, (a+b)+c is not necessarily equal to a+(b+c),
so the numerical result (X=a+b+c) can change according to the order in which the messages (a, b and c) are received.
This results in non-deterministic send destinations and, therefore, non-deterministic send/recv pairs.
I know it’s really a corner case.
So if detecting a mismatch is at least fine for you, this problem would be negligible.
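A minimal sketch of the floating-point effect described above. The values of a, b and c and the `choose_destination` rule are made up for illustration; a real application would derive the destination from its own numerical state.

```python
# Three contributions that arrive as messages; the values are illustrative.
a, b, c = 0.1, 0.2, 0.3

# The same sum, accumulated in two different receive orders.
x_order1 = (a + b) + c   # a and b arrive first, then c
x_order2 = a + (b + c)   # b and c arrive first, then a

def choose_destination(x, threshold=0.6):
    """Hypothetical rule: pick the destination rank from the numerical result X."""
    return 1 if x > threshold else 0

print(x_order1 == x_order2)                  # False
print(choose_destination(x_order1),
      choose_destination(x_order2))          # 1 0
```

The two accumulation orders differ only in the last bits of X, yet the hypothetical rule sends to a different rank in each case.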
Kento Sato | Center for Applied Scientific Computing (CASC) | Lawrence Livermore National Laboratory (LLNL) | http://people.llnl.gov/sato5 |