[mpiwg-tools] [mpiwg-p2p] Message matching for tools

Thu Dec 17 15:44:15 CST 2015

Hi Daniel,

> In the existing MPI Standard, any program that relies on the tool 
> being able to recover the particular matching order that actually 
> happened from a set of possible matching orders that all satisfy
> the MPI definitions (for example by relying on the tool using some 
> externally imposed mechanism, such as piggy-backed sequence
> numbers) is relying on a particular sequentialisation of a
> race-condition between logically concurrent MPI messages, which is
> specifically called out as being an erroneous program.

I am a little confused about what the 'program' is and what the 'tool' is.

My tool is a program, and it relies on the reconstruction of the
message matching that actually occurred during measurement.

I agree that our tool would therefore be an erroneous program in the
MPI_THREAD_MULTIPLE case. This is what we want to fix. ;-)

> From MPI Forum's point-of-view, therefore, any behaviour in this 
> situation is allowed, including setting the data-centre on fire,
> and our work here is done.

From the P2P group's perspective, the work may be considered done.

From the tools group's perspective, there would be further work, as it
is our job to create interfaces that enable tools to support the full
MPI feature set.

Some of the proposed solutions may touch "P2P territory" again, which
is why I want to keep the group in the discussion.

> From a tool's perspective, however, the existence and consequences
> of this situation should be discovered and presented to the user
> [...] Alternatively, the existence of this situation could be
> discovered by examination of the trace during post-processing or
> analysis.

As a first step, a warning would be good, but ultimately both we and
our users would want a tool that can deal with such scenarios.

> The consequences of this situation could be presented to the user
> by showing all possible matching orders, as indicated by the
> information in the trace concerning which messages were logically
> concurrent during the actual run of the application. The tool would
> not be able to tell the user which matching order actually occurred
> during the measurement run but it would be able to identify that
> there was a race-condition, display all the possible outcomes, and
> simulate the effects of each possible route through the program.

The problem at hand is that the race-condition is in the analyzer, not
necessarily the application. The application may deal with ordering in
a different way, but a performance tool like ours needs to
reconstruct "what actually happened", as analysis results depend on it.
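
To illustrate the difference, here is a minimal Python sketch. It is
not taken from any tool, and all names in it are hypothetical. It
assumes two threads send to the same receiver with the same tag and
communicator, so that only each thread's own send order is guaranteed.
It enumerates every matching order that respects this constraint, and
then shows how a recorded per-message sequence number, as in the
piggy-backing idea quoted above, would single out the order that was
actually measured.

```python
def possible_matching_orders(sends_by_thread):
    """Yield every interleaving that preserves each thread's send order."""
    pending = [list(s) for s in sends_by_thread if s]
    if not pending:
        yield []
        return
    for i, seq in enumerate(pending):
        # Take the next send of thread i, then interleave what remains.
        rest = pending[:i] + [seq[1:]] + pending[i + 1:]
        for tail in possible_matching_orders(rest):
            yield [seq[0]] + tail


def actual_matching_order(trace_records):
    """Recover the observed order from (sequence_number, message) records."""
    return [msg for _, msg in sorted(trace_records)]


# Hypothetical example: thread 0 sends a0 then a1, thread 1 sends b0.
sends = [["a0", "a1"], ["b0"]]
candidates = list(possible_matching_orders(sends))

# Hypothetical trace: sequence numbers assumed to be recorded at matching time.
trace = [(2, "a1"), (0, "a0"), (1, "b0")]
observed = actual_matching_order(trace)

print(len(candidates), "possible orders:", candidates)
print("observed order:", observed)
```

Without the sequence numbers, a post-mortem analyzer can only list the
candidate orders; with them, it can pick the one that was measured.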

Cheers,
Marc-Andre
-- 
Marc-Andre Hermanns
Jülich Aachen Research Alliance,
High Performance Computing (JARA-HPC)
Jülich Supercomputing Centre (JSC)

Schinkelstrasse 2
52062 Aachen
Germany

Phone: +49 2461 61 2509 | +49 241 80 24381
Fax: +49 2461 80 6 99753
www.jara.org/jara-hpc
email: hermanns at jara.rwth-aachen.de
