[mpiwg-tools] [mpiwg-p2p] Message matching for tools
Jeff Hammond
jeff.science at gmail.com
Thu Dec 17 17:16:05 CST 2015
On Thu, Dec 17, 2015 at 4:32 AM, Marc-Andre Hermanns <
hermanns at jara.rwth-aachen.de> wrote:
> Hi Jeff,
>
> at the moment we don't handle MPI_THREAD_MULTIPLE at all. But we want
> to get there ;-)
>
>
You should vote for endpoints, as they may help you out here, particularly
if users start mapping endpoints 1:1 with threads.
> Here is a short recollection of what we do and need. Apologies to the
> folks who already know this from other contexts:
>
> We use what we call "parallel replay" to analyze large event traces
> in parallel. Each thread has its own stream of events, such as enter
> and exit events for tracking the calling context, as well as send and
> receive events for communication among ranks.
>
> We analyze at the same scale as the measurement, so we have one
> thread per thread-local trace. Each thread processes its own
> thread-local trace. When it encounters a communication event, it
> re-enacts that communication using the recorded communication
> parameters (rank, tag, comm): a send event leads to an issued send,
> and a receive event leads to an issued receive.
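Just to make sure I follow the mechanics, a minimal sketch of such a
replay loop might look like the following (the event record, the
next_event() reader, and the MAX_MSG bound are made up for illustration,
not your actual code):

#include <mpi.h>

enum { EV_SEND, EV_RECV };      /* event kinds; illustrative only          */
#define MAX_MSG 65536           /* illustrative payload bound; the sketch
                                   assumes the recorded count fits         */

typedef struct {
    int kind, peer, tag, count; /* recorded communication parameters       */
    MPI_Comm comm;              /* recorded communicator                   */
} event_t;

extern int next_event(event_t *ev);  /* hypothetical per-thread trace reader */

void replay_trace(void)
{
    event_t ev;
    char buf[MAX_MSG];          /* re-enacted payload; contents irrelevant  */

    while (next_event(&ev)) {
        if (ev.kind == EV_SEND)
            MPI_Send(buf, ev.count, MPI_BYTE, ev.peer, ev.tag, ev.comm);
        else if (ev.kind == EV_RECV)
            MPI_Recv(buf, ev.count, MPI_BYTE, ev.peer, ev.tag, ev.comm,
                     MPI_STATUS_IGNORE);
        /* Computation between events is not re-enacted, so sends that were
           far apart in the original run may be issued back to back here.  */
    }
}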
>
> It is critical that, during the analysis, the message matching is
> identical to that of the original application. However, we do not
> re-enact any computational time; that is, the temporal distance
> between sends and receives is certainly different from the original
> application. As a consequence, while two sends may have had a
> significant temporal distance in the original measurement, they could
> be issued right after each other during the analysis.
>
> Markus Geimer and I believe that creating some sort of sequence
> number during measurement could help match the right messages
> during the analysis, as a process could detect that it received a
> mismatched message and communicate with other threads to obtain the
> correct one.
>
>
> It is unclear, however, how to achieve this:
>
> a) Sending an additional message within the MPI wrapper at measurement
> time may lead to invalid matchings, as the additional message may be
> received by a different thread.
>
> b) Creating a derived datatype on the fly to add tool-level data to
> the original payload may induce a large overhead in practically
> _every_ send & receive operation and perturb the measurement.
>
>
You should evaluate this experimentally. I wrote a simple test (
https://github.com/jeffhammond/BigMPI/blob/master/test/perf/typepiggy.c)
and measured 1.5 us of per-call overhead to create a datatype. That is not
significant except for very small messages.
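For reference, the heart of that test is just an on-the-fly struct
datatype that glues tool data onto the user payload. A minimal sketch of
the send-side wrapper (next_sequence_number() is a hypothetical tool
helper, and the receive wrapper would need the matching construction):

#include <mpi.h>

/* Hypothetical tool-side counter; a real tool would key it per
   (communicator, destination, tag) and make it thread-safe.             */
extern int next_sequence_number(MPI_Comm comm, int dest, int tag);

int MPI_Send(const void *buf, int count, MPI_Datatype datatype,
             int dest, int tag, MPI_Comm comm)
{
    int seq = next_sequence_number(comm, dest, tag);

    /* Build a struct datatype with absolute addresses that prepends the
       tool's sequence number to the user's payload.                     */
    MPI_Datatype piggy;
    int          blocklens[2] = { 1, count };
    MPI_Datatype types[2]     = { MPI_INT, datatype };
    MPI_Aint     displs[2];

    MPI_Get_address(&seq, &displs[0]);
    MPI_Get_address(buf,  &displs[1]);

    MPI_Type_create_struct(2, blocklens, displs, types, &piggy);
    MPI_Type_commit(&piggy);

    /* With absolute displacements, the buffer argument is MPI_BOTTOM.   */
    int rc = PMPI_Send(MPI_BOTTOM, 1, piggy, dest, tag, comm);

    MPI_Type_free(&piggy);
    return rc;
}

The 1.5 us quoted above is essentially this create/commit/free cycle per
call; the wrapper itself copies nothing.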
Jeff
> The best idea in the room so far is some callback mechanism into the
> MPI implementation that produces matching information on both the
> sender and the receiver side, that is, some form of sequence number
> that can then be saved during measurement. If available on both
> sender and receiver, this information could then be used to fix
> incorrect matching during the analysis.
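No such interface exists in MPI today, so take the following only as a
strawman of the record such a callback would have to deliver (all names
here are illustrative, not proposal text):

#include <mpi.h>
#include <stdint.h>

/* Purely illustrative: the matching information the implementation would
   report to the tool for every completed point-to-point operation.      */
typedef struct {
    MPI_Comm comm;      /* communicator of the matched message            */
    int      source;    /* sending rank                                   */
    int      dest;      /* receiving rank                                 */
    int      tag;       /* message tag                                    */
    uint64_t seq;       /* sequence number per (comm, source, dest, tag)  */
} tool_match_info_t;

/* Hooks the implementation would invoke on each side; the tool stores the
   record in the event trace.                                             */
void tool_send_matched(const tool_match_info_t *info);
void tool_recv_matched(const tool_match_info_t *info);

If both ends recorded consistent sequence numbers this way, the analysis
could later re-establish exactly the original matching, as you describe.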
>
> Cheers,
> Marc-Andre
>
> On 16.12.15 16:40, Jeff Hammond wrote:
> > How do you handle MPI_THREAD_MULTIPLE? Understanding what your tool
> > does there is a good starting point for this discussion.
> >
> > Jeff
> >
> > On Wed, Dec 16, 2015 at 1:37 AM, Marc-Andre Hermanns
> > <hermanns at jara.rwth-aachen.de> wrote:
> >
> > Hi all,
> >
> > CC: Tools-WG, Markus Geimer (not on either list)
> >
> > sorry for starting a new thread and being so verbose, but I
> > subscribed just now. I quoted Dan, Jeff, and Jim from the archive as
> > appropriate.
> >
> > First, let me state that we do not want to prevent this assertion in
> > any way. For us as tools providers, it is just quite a brain teaser
> > to figure out how to support this in our tool and in general.
> >
> > Dan wrote:
> > >>> [...] The basic problem is that message matching would be
> > >>> non-deterministic and it would be impossible for a tool to show
> > >>> the user which receive operation satisfied which send operation
> > >>> without internally using some sort of sequence number for each
> > >>> send/receive operation. [...]
> > >>>
> > >>> My responses were:
> > >>> 1) the user asked for this behaviour so the tool could simply
> > >>> gracefully give up the linking function and just state the
> > >>> information it knows
> > >
> > Giving up can only be a temporary solution for tools. The user wants
> > to use this advanced feature, so just saying "Hey, what you're doing
> > is too sophisticated for us; you are on your own now" is not a viable
> > long-term strategy.
> >
> > >>> 2) the tool could hook all send and receive operations and
> > >>> piggy-back a sequence number into the message header
> >
> > We discussed piggy-backing within the tools group some time in the
> > past, but never arrived at a satisfying way to implement it. If, in
> > the process of reviving the discussion on a piggy-backing interface,
> > we come to a viable solution, it would certainly help with our
> > issues with message matching in general.
> >
> > Scalasca's problem here is that we need to detect (and partly
> > recreate) the exact order of message matching to ensure that the
> > correct messages reach the right receivers.
> >
> > >>> 3) the tool could hook all send and receive operations and
> > >>> serialise them to prevent overtaking
> >
> > This is not an option for me. A "performance tool" should strive to
> > measure as close to the original behavior as possible. Changing
> > communication semantics just to make a tool "work" would have too
> > great an impact on application behavior. After all, if it had only
> > little impact, why would the user choose this option in the first
> > place?
> >
> > Jeff wrote:
> > >> Remember that one of the use cases of allow_overtaking is
> > >> applications that have exact matching, in which case
> > >> allow_overtaking is a way of turning off a feature that isn't
> > >> used, in order to get a high-performing message queue
> > >> implementation. In the exact matching case, tools will have no
> > >> problem matching up sends and recvs.
> >
> > This is true. If the tools can identify this scenario, they could
> > support it without significant changes. However, as inexact matching
> > is not generally forbidden (right?), it is unclear how the tools
> > would detect this case.
> >
> > What about an additional info key a user can set in this respect:
> >
> > exact_matching => true/false
> >
> > with which the user can state whether it is indeed an exact-matching
> > scenario or not? The tool could check this key and issue a warning.
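A minimal sketch of how a tool could query such a key with standard MPI-3
calls (the key name is the hypothetical one proposed above; a tool that
intercepts MPI_Comm_set_info via PMPI would not even have to rely on the
implementation echoing the hint back):

#include <mpi.h>
#include <string.h>

/* Returns nonzero if the user asserted exact matching on this
   communicator. "exact_matching" is the hypothetical key proposed above,
   not a standard MPI info key. A user would set it with
   MPI_Info_set(info, "exact_matching", "true") followed by
   MPI_Comm_set_info(comm, info).                                        */
static int comm_has_exact_matching(MPI_Comm comm)
{
    MPI_Info info;
    char value[MPI_MAX_INFO_VAL + 1];
    int flag = 0;

    MPI_Comm_get_info(comm, &info);
    MPI_Info_get(info, "exact_matching", MPI_MAX_INFO_VAL, value, &flag);
    MPI_Info_free(&info);

    return flag && strcmp(value, "true") == 0;
}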
> >
> > >> If tools cannot handle MPI_THREAD_MULTIPLE already, then I don't
> > >> really care if they can't support this assertion either.
> >
> > That we generally do not handle MPI_THREAD_MULTIPLE is not carved in
> > stone. ;-)
> >
> > As I said, we (Markus and I) see this as a trigger to come to a
> > viable solution for tools like ours to support either situation.
> >
> > >> And in any case, such tools can just intercept the info operations
> > >> and strip this key if they can't support it.
> >
> > As I wrote above in reply to Dan, stripping options that influence
> > behavior is not a good option. I, personally, would rather bail out
> > than (silently) change messaging semantics. I can't say what Markus'
> > take on this is.
> >
> > Jim wrote:
> > > I don't really see any necessary fix to the proposal. We could add
> > > an advice to users to remind them that they should ensure tools are
> > > compatible with the info keys. And the reverse advice to tools
> > > writers that they should check info keys for compatibility.
> >
> > I would second this idea, while emphasizing that the burden should be
> > on the tool to check for this info key (and potentially others) and
> > to warn the user about incomplete support.
> >
> > Cheers,
> > Marc-Andre
> > --
> > Marc-Andre Hermanns
> > Jülich Aachen Research Alliance,
> > High Performance Computing (JARA-HPC)
> > Jülich Supercomputing Centre (JSC)
> >
> > Schinkelstrasse 2
> > 52062 Aachen
> > Germany
> >
> > Phone: +49 2461 61 2509 | +49 241 80 24381
> > Fax: +49 2461 80 6 99753
> > www.jara.org/jara-hpc
> > email: hermanns at jara.rwth-aachen.de
> >
> >
> >
> >
> >
> > --
> > Jeff Hammond
> > jeff.science at gmail.com
> > http://jeffhammond.github.io/
> >
> >
>
> --
> Marc-Andre Hermanns
> Jülich Aachen Research Alliance,
> High Performance Computing (JARA-HPC)
> Jülich Supercomputing Centre (JSC)
>
> Schinkelstrasse 2
> 52062 Aachen
> Germany
>
> Phone: +49 2461 61 2509 | +49 241 80 24381
> Fax: +49 2461 80 6 99753
> www.jara.org/jara-hpc
> email: hermanns at jara.rwth-aachen.de
>
>
> _______________________________________________
> mpiwg-p2p mailing list
> mpiwg-p2p at lists.mpi-forum.org
> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpiwg-p2p
>
--
Jeff Hammond
jeff.science at gmail.com
http://jeffhammond.github.io/