[Mpi3-tools] Meaning of unique_id in mqs_communicator?
Ashley Pittman
ashley at pittman.co.uk
Mon May 11 05:47:30 CDT 2009
All,
I've raised this point before on the ANL mpi-debugger mailing list;
however, I'd still like a resolution to the issue, so I hope it's not
inappropriate to raise it again in this forum.
The problem I have is that when inspecting a parallel job in a debugger
I am unable to match communicators across MPI processes, so it's
impossible to tell which communicators in process A correspond to which
communicators in process B.
Each MPI process reports the existence of a number of communicators to
the debugger, and through the unique_id field of the mqs_communicator
type it reports an ID for each communicator. It's the meaning of this ID
which I'd like clarified.
There are three possible interpretations for this field.
(a) The ID could be unique to this process, typically a pointer to some
internal struct.
(b) The ID could be unique to this process and also common to all
members of the communicator, regardless of the MPI process on which they
reside.
(c) The ID could be unique to this process, common to all members
of the communicator and also unique to this communicator.
Current MPI implementations seem to favour (a); however, this doesn't
give the debugger enough information to match communicators in one
process with communicators in a different process.
I believe the intention of the code when written was (b) and this does
allow the debugger to match communicators across processes.
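
To make the difference concrete, here is a minimal sketch of the lookup a
tool would do once it has read the communicator list from each process. The
comm_record type, the function names and all ID values are hypothetical and
chosen for illustration only; only the unique_id field name comes from the
interface. Under (b) a plain equality test on the ID finds the partner
communicator, while under (a) the IDs are process-local pointers and the
same lookup finds nothing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for what a tool keeps per reported communicator.
   Only the unique_id field name is taken from mqs_communicator. */
typedef struct {
    uint64_t unique_id;   /* ID reported by the MPI library */
    const char *name;     /* illustrative label (assumed) */
} comm_record;

/* Return the index of the record in list whose unique_id equals id,
   or -1 if there is none. */
int find_matching_comm(const comm_record *list, size_t n, uint64_t id)
{
    assert(list != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (list[i].unique_id == id)
            return (int)i;
    }
    return -1;
}

/* Interpretation (b): both processes report the same ID for the same
   communicator, so process A's "row" is found in process B's list. */
int demo_lookup_shared_ids(void)
{
    const comm_record proc_a[] = { {0x10, "world"}, {0x11, "row"} };
    const comm_record proc_b[] = { {0x11, "row"}, {0x10, "world"} };
    return find_matching_comm(proc_b, 2, proc_a[1].unique_id);
}

/* Interpretation (a): the IDs are made-up process-local addresses, so
   nothing in process B's list matches process A's "row". */
int demo_lookup_pointer_ids(void)
{
    const comm_record proc_a[] = { {0x7f3a10, "world"}, {0x7f3b40, "row"} };
    const comm_record proc_b[] = { {0x55d020, "row"}, {0x55d1c0, "world"} };
    return find_matching_comm(proc_b, 2, proc_a[1].unique_id);
}
```

With the values above, demo_lookup_shared_ids() finds the match at index 0
of process B's list, whereas demo_lookup_pointer_ids() returns -1.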
This issue presents me with real problems. I have a collective state
inspection tool which is capable of detecting deadlock and spotting
errant processes; however, without the ability to match communicators
across processes its functionality is severely crippled.
If possible, could we discuss this on the call, with a view to clarifying
the current meaning and, if necessary, devising other means of exporting
the required information?
Ashley Pittman