[mpiwg-p2p] Message matching for tools

Jeff Hammond jeff.science at gmail.com
Thu Dec 17 17:16:05 CST 2015


On Thu, Dec 17, 2015 at 4:32 AM, Marc-Andre Hermanns <
hermanns at jara.rwth-aachen.de> wrote:

> Hi Jeff,
>
> at the moment we don't handle MPI_THREAD_MULTIPLE at all. But we want
> to get there ;-)
>
>
You should vote for endpoints, as this may help you out here, particularly
if users start mapping endpoints 1:1 w/ threads.


> Here is a short recollection of what we do/need. Sorry for the folks
> who know/read this already in other contexts:
>
> We use what we call "parallel replay" to analyze large event traces
> in parallel. Each thread has its own stream of events, such as enter
> and exit events for tracking the calling context, as well as send and
> receive events for communication among ranks.
>
> We analyze on the same scale as the measurement, thus we have one
> thread per thread-local trace. Each thread processes its own
> thread-local trace. When encountering a communication event, it
> re-enacts this communication using the recorded communication
> parameters (rank, tag, comm). A send event leads to an issued send, a
> receive event leads to an issued receive.
>
> It is critical that, during the analysis, the message matching is
> identical to that of the original application. However, we do not
> re-enact any computational time; that is, the temporal distance
> between sends and receives certainly differs from the original
> application. As a consequence, while two sends may have had a
> significant temporal distance in the original measurement, they may
> be issued right after each other during the analysis.
>
> Markus Geimer and I believe that creating some sort of sequence
> number during measurement could help match the right messages during
> the analysis, since a process could detect that it received a
> mismatched message and communicate with other threads to obtain the
> correct one.
>
>
> It is unclear, however, how to achieve this:
>
> a) Sending an additional message within the MPI wrapper at measurement
> time may lead to invalid matchings, as the additional message may be
> received by a different thread.
>
> b) Creating a derived datatype on the fly to add tool-level data to
> the original payload may induce a large overhead in practically
> _every_ send & receive operation and perturb the measurement.
>
>
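For reference, a minimal sketch of the replay loop described in the quoted
message above. The trace and event types here (event_t, EV_SEND, EV_RECV)
and the byte-count payload are hypothetical placeholders rather than
Scalasca's actual data structures; only the idea of re-enacting each recorded
send/receive with its recorded (rank, tag, comm) parameters comes from the
description above.

    /* Hypothetical sketch of the replay loop described above; the event
     * structure and its names are made up for illustration. */
    #include <mpi.h>

    typedef enum { EV_SEND, EV_RECV, EV_OTHER } event_kind_t;

    typedef struct {
        event_kind_t kind; /* recorded event type */
        int peer;          /* recorded destination/source rank */
        int tag;           /* recorded message tag */
        MPI_Comm comm;     /* recorded communicator */
        int count;         /* recorded payload size in bytes */
    } event_t;

    /* Each thread replays its own thread-local trace: a send event leads
     * to an issued send, a receive event leads to an issued receive. */
    static void replay_trace(const event_t *events, int n, void *buf)
    {
        for (int i = 0; i < n; ++i) {
            const event_t *ev = &events[i];
            if (ev->kind == EV_SEND) {
                MPI_Send(buf, ev->count, MPI_BYTE, ev->peer, ev->tag,
                         ev->comm);
            } else if (ev->kind == EV_RECV) {
                MPI_Recv(buf, ev->count, MPI_BYTE, ev->peer, ev->tag,
                         ev->comm, MPI_STATUS_IGNORE);
            }
            /* computational time between events is not re-enacted */
        }
    }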
You should evaluate the overhead of option (b) experimentally.  I wrote a
simple test (
https://github.com/jeffhammond/BigMPI/blob/master/test/perf/typepiggy.c)
and measured 1.5 us of overhead per call to create a datatype.  That is not
significant except for very small messages.
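For completeness, here is a rough sketch of option (b) using the profiling
(PMPI) interface: piggy-backing a tool-level sequence number onto the user
payload with an on-the-fly struct datatype. The sequence counter and its
handling are illustrative only; a real tool would also need the matching
receive-side wrapper and, under MPI_THREAD_MULTIPLE, an atomic or per-thread
counter.

    /* Illustrative only: a PMPI send wrapper that appends a tool-level
     * sequence number to the payload via a derived datatype (option b). */
    #include <mpi.h>
    #include <stdint.h>

    static uint64_t tool_seqno = 0;   /* would need to be atomic or
                                         per-thread under THREAD_MULTIPLE */

    int MPI_Send(const void *buf, int count, MPI_Datatype datatype,
                 int dest, int tag, MPI_Comm comm)
    {
        uint64_t     seq          = tool_seqno++;
        int          blocklens[2] = { count, 1 };
        MPI_Datatype types[2]     = { datatype, MPI_UINT64_T };
        MPI_Aint     disps[2];
        MPI_Datatype piggy;

        /* Absolute addresses combine the user buffer and the sequence
         * number into one datatype without copying either of them. */
        MPI_Get_address(buf, &disps[0]);
        MPI_Get_address(&seq, &disps[1]);

        MPI_Type_create_struct(2, blocklens, disps, types, &piggy);
        MPI_Type_commit(&piggy);

        int err = PMPI_Send(MPI_BOTTOM, 1, piggy, dest, tag, comm);

        MPI_Type_free(&piggy);
        return err;
    }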

Jeff

> The best idea in the room so far is some callback mechanism into the
> MPI implementation that generates matching information both on sender
> and receiver side to generate some form of a sequence number that can
> then be saved during measurement. If available on both sender and
> receiver this information could then be used to fix incorrect matching
> during the analysis.
>
> Cheers,
> Marc-Andre
>
> On 16.12.15 16:40, Jeff Hammond wrote:
> > How do you handle MPI_THREAD_MULTIPLE?  Understanding what your tool
> > does there is a good starting point for this discussion.
> >
> > Jeff
> >
> > On Wed, Dec 16, 2015 at 1:37 AM, Marc-Andre Hermanns
> > <hermanns at jara.rwth-aachen.de>
> > wrote:
> >
> >     Hi all,
> >
> >     CC: Tools-WG, Markus Geimer (not on either list)
> >
> >     sorry for starting a new thread and being so verbose, but I
> >     subscribed just now. I quoted Dan, Jeff, and Jim from the archive
> >     as appropriate.
> >
> >     First, let me state that we do not want to prevent this assertion in
> >     any way. For us as a tools provider it is just quite a brain tickler
> >     to figure out how to support this in our tool and in general.
> >
> >     Dan wrote:
> >     >>> [...] The basic problem is that message matching would be
> >     >>> non-deterministic and it would be impossible for a tool to show
> >     >>> the user which receive operation satisfied which send operation
> >     >>> without internally using some sort of sequence number for each
> >     >>> send/receive operation. [...]
> >     >>>
> >     >>> My responses were:
> >     >>> 1) the user asked for this behaviour so the tool could simply
> >     >>> gracefully give up the linking function and just state the
> >     >>> information it knows
> >     >
> >     Giving up can only be a temporary solution for tools. The user wants
> >     to use this advanced feature, so just saying "Hey, what you're
> >     doing is too sophisticated for us. You are on your own now." is not
> >     a viable long-term strategy.
> >
> >     >>> 2) the tool could hook all send and receive operations and
> >     >>> piggy-back a sequence number into the message header
> >
> >     We discussed piggy-backing within the tools group some time in the
> >     past, but never came to a satisfactory way to implement it. If, in
> >     the process of reviving the discussion on a piggy-backing interface,
> >     we come to a viable solution, it would certainly help with our
> >     issues with message matching in general.
> >
> >     Scalasca's problem here is that we need to detect (and partly
> >     recreate) the exact order of message matching to have the correct
> >     message reach the right receivers.
> >
> >     >>> 3) the tool could hook all send and receive operations and
> >     >>> serialise them to prevent overtaking
> >
> >     This is not an option for me. A "performance tool" should strive to
> >     measure as close to the original behavior as possible. Changing
> >     communication semantics just to make a tool "work" would have too
> >     great an impact on application behavior. After all, if it had only
> >     little impact, why would the user choose this option in the first
> >     place?
> >
> >     Jeff wrote:
> >     >> Remember that one of the use cases of allow_overtaking is
> >     applications that
> >     >> have exact matching, in which case allow_overtaking is a way of
> >     turning off
> >     >> a feature that isn't used, in order to get a high-performing
> >     message queue
> >     >> implementation. In the exact matching case, tools will have no
> >     problem
> >     >> matching up sends and recvs.
> >
> >     This is true. If the tools can identify this scenario, it could be
> >     supported by current tools without significant change. However, as
> >     inexact matching is not generally forbidden (right?), it is unclear
> >     how the tools would detect it.
> >
> >     What about an additional info key a user can set in this respect:
> >
> >     exact_matching => true/false
> >
> >     with which the user can state whether it is indeed a scenario of
> >     exact matching or not. The tool could check this key and issue a
> >     warning.
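A hypothetical sketch of how such a key could be used, assuming it were
attached to a communicator as an info hint: the exact_matching key is only a
proposal in this thread, not a standard MPI info key, and the function names
below are made up for illustration.

    /* Hypothetical: "exact_matching" is only a proposed key, not part of
     * the MPI standard; function names are made up for illustration. */
    #include <mpi.h>
    #include <string.h>

    /* User side: declare that the application relies on exact matching. */
    void user_set_exact_matching(MPI_Comm comm)
    {
        MPI_Info info;
        MPI_Info_create(&info);
        MPI_Info_set(info, "exact_matching", "true");
        MPI_Comm_set_info(comm, info);   /* attach the hint to the comm */
        MPI_Info_free(&info);
    }

    /* Tool side: query the hint and decide whether to warn the user. */
    int tool_has_exact_matching(MPI_Comm comm)
    {
        MPI_Info info;
        char value[MPI_MAX_INFO_VAL + 1];
        int flag = 0;

        MPI_Comm_get_info(comm, &info);
        MPI_Info_get(info, "exact_matching", MPI_MAX_INFO_VAL, value, &flag);
        MPI_Info_free(&info);

        return flag && strcmp(value, "true") == 0;
    }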
> >
> >     >> If tools cannot handle MPI_THREAD_MULTIPLE already, then I
> >     don't really
> >     >> care if they can't support this assertion either.
> >
> >     Not handling MPI_THREAD_MULTIPLE generally is not carved in
> >     stone. ;-)
> >
> >     As I said, we (Markus and I) see this as a trigger to come to a
> >     viable solution for tools like ours to support either situation.
> >
> >     >> And in any case, such tools can just intercept the info
> >     operations and
> >     >> strip this key if they can't support it.
> >
> >     As I wrote above in reply to Dan, stripping options that influence
> >     behavior is not a good option. I, personally, would rather bail out
> >     than (silently) change messaging semantics. I can't say what Markus'
> >     take on this is.
> >
> >     Jim wrote:
> >     > I don't really see any necessary fix to the proposal. We could
> >     add an
> >     > advice to users to remind them that they should ensure tools are
> >     compatible
> >     > with the info keys. And the reverse advice to tools writers that
> >     they
> >     > should check info keys for compatibility.
> >
> >     I would second this idea, while emphasizing that the burden should be
> >     on the tool to check for this info key (and potentially others) and
> >     warn the user of "undersupport".
> >
> >     Cheers,
> >     Marc-Andre
> >     --
> >     Marc-Andre Hermanns
> >     Jülich Aachen Research Alliance,
> >     High Performance Computing (JARA-HPC)
> >     Jülich Supercomputing Centre (JSC)
> >
> >     Schinkelstrasse 2
> >     52062 Aachen
> >     Germany
> >
> >     Phone: +49 2461 61 2509 | +49 241 80 24381
> >     Fax: +49 2461 80 6 99753
> >     www.jara.org/jara-hpc
> >     email: hermanns at jara.rwth-aachen.de
> >
> >
> >
> >
> >
> >
> > --
> > Jeff Hammond
> > jeff.science at gmail.com
> > http://jeffhammond.github.io/
> >
> >
> >
>
> --
> Marc-Andre Hermanns
> Jülich Aachen Research Alliance,
> High Performance Computing (JARA-HPC)
> Jülich Supercomputing Centre (JSC)
>
> Schinkelstrasse 2
> 52062 Aachen
> Germany
>
> Phone: +49 2461 61 2509 | +49 241 80 24381
> Fax: +49 2461 80 6 99753
> www.jara.org/jara-hpc
> email: hermanns at jara.rwth-aachen.de
>
>
> _______________________________________________
> mpiwg-p2p mailing list
> mpiwg-p2p at lists.mpi-forum.org
> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpiwg-p2p
>



-- 
Jeff Hammond
jeff.science at gmail.com
http://jeffhammond.github.io/