[mpiwg-hybridpm] End-point ambiguity with MPI_COMM_INTERCOMM_CREATE
Daniel Holmes
dholmes at epcc.ed.ac.uk
Wed May 7 12:02:19 CDT 2014
Hi all,
The operation MPI_INTERCOMM_CREATE (written MPI_COMM_INTERCOMM_CREATE throughout this message) is defined in MPI-3 on pages 261-2.
The peer communicator is supplied to enable communication between the
group leaders.
This requires knowledge of the local and remote ranks in peer_comm.
The remote_leader is the "rank of the remote group leader in peer_comm".
However, the local_leader is the "rank of local group leader in local_comm".
This must be translated to the appropriate rank in peer_comm.
Unfortunately, if peer_comm is an end-points communicator, this rank is
not well-defined, because a single process can hold several ranks in peer_comm.
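To make the ambiguity concrete, here is a minimal Python sketch (a model, not MPI code). It assumes a communicator can be represented as a list giving the owning process of each rank, and that end-points are numbered contiguously per process. The helper name translate_rank is hypothetical.

```python
def translate_rank(src_comm, rank, dst_comm):
    """Return every rank in dst_comm owned by the process holding `rank` in src_comm."""
    owner = src_comm[rank]
    return [r for r, proc in enumerate(dst_comm) if proc == owner]

# Four processes; a conventional communicator has one rank per process.
world = [0, 1, 2, 3]

# Hypothetical end-points communicator: two end-points per process,
# numbered contiguously (an assumption made for illustration).
ep_comm = [proc for proc in world for _ in range(2)]

print(translate_rank(world, 2, world))    # [2]     exactly one candidate
print(translate_rank(world, 2, ep_comm))  # [4, 5]  two candidates: ambiguous
```

Without end-points the translation always yields one rank; with them it yields one candidate per end-point, which is the ambiguity described above.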
Proposal: change the definition of the local_leader parameter to be
"rank of local group leader in peer_comm".
---
In the simplest case, the tie can be broken by examining local parameter
values:
my_mcw_rank <- MPI_COMM_RANK(MPI_COMM_WORLD)
ep_comm <- MPI_COMM_CREATE_ENDPOINTS(parent:=MPI_COMM_WORLD, my_num_ep:=2)
if (my_mcw_rank == 10) {
    // ep_comm[0] represents rank 20, is local_leader rank 20 or rank 21?
    inter_comm <- MPI_COMM_INTERCOMM_CREATE(local_comm:=MPI_COMM_SELF,
        local_leader:=0, peer_comm:=ep_comm[0], remote_leader:=30, tag:=1)
} else if (my_mcw_rank == 15) {
    // ep_comm[0] represents rank 30, is local_leader rank 30 or rank 31?
    inter_comm <- MPI_COMM_INTERCOMM_CREATE(local_comm:=MPI_COMM_SELF,
        local_leader:=0, peer_comm:=ep_comm[0], remote_leader:=20, tag:=1)
}
In the above code, the local_leader (rank 0 in MPI_COMM_SELF) does not
identify a unique rank in the peer communicator.
Either of the two locally available communicator handles could be used
as the local_leader but the choice is ambiguous.
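In this case, the communicator handle supplied as peer_comm can be used
to break the tie: the caller is itself the local leader, so the rank of
ep_comm[0] (rank 20 on MCW rank 10) is the natural choice.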
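The tie-break for this simple case can be sketched in Python as plain rank arithmetic. This is a model under assumptions, not MPI code: end-point ranks are taken to be contiguous per MCW rank (consistent with ep_comm[0] representing rank 20 on MCW rank 10), and the function names are hypothetical.

```python
NUM_EP = 2  # end-points per MPI_COMM_WORLD rank, as in the example above

def endpoint_ranks(mcw_rank, num_ep=NUM_EP):
    """Ranks in ep_comm owned by one MCW rank (assumes contiguous numbering)."""
    return [mcw_rank * num_ep + i for i in range(num_ep)]

def leader_rank_in_peer(mcw_rank, handle_index):
    """With local_comm = MPI_COMM_SELF the caller is the local leader, so the
    handle it passes as peer_comm selects one of its own end-point ranks."""
    return endpoint_ranks(mcw_rank)[handle_index]

print(endpoint_ranks(10))          # [20, 21]  the two candidates
print(leader_rank_in_peer(10, 0))  # 20        ep_comm[0] on MCW rank 10
print(leader_rank_in_peer(15, 0))  # 30        ep_comm[0] on MCW rank 15
```

The resolved ranks 20 and 30 match the remote_leader values each side passes for the other.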
In a trickier case, collective communication over local_comm would
be necessary to break the tie:
my_mcw_rank <- MPI_COMM_RANK(MPI_COMM_WORLD)
split_comm <- MPI_COMM_SPLIT(MPI_COMM_WORLD, my_mcw_rank % 3, my_mcw_rank)
ep_comm <- MPI_COMM_CREATE_ENDPOINTS(parent:=MPI_COMM_WORLD, my_num_ep:=2)
if (my_mcw_rank % 3 == 1) {
    // ep_comm[0] represents rank 2*my_mcw_rank, is local_leader (rank 3
    // in split_comm, rank 10 in mcw) rank 20 or rank 21 in peer_comm?
    inter_comm <- MPI_COMM_INTERCOMM_CREATE(local_comm:=split_comm,
        local_leader:=3, peer_comm:=ep_comm[0], remote_leader:=30, tag:=2)
} else if (my_mcw_rank % 3 == 0) {
    // ep_comm[0] represents rank 2*my_mcw_rank, is local_leader (rank 5
    // in split_comm, rank 15 in mcw) rank 30 or rank 31 in peer_comm?
    inter_comm <- MPI_COMM_INTERCOMM_CREATE(local_comm:=split_comm,
        local_leader:=5, peer_comm:=ep_comm[0], remote_leader:=20, tag:=2)
}
This code splits comm_world into three groups with colors 0, 1 and 2.
Then it attempts to create an inter-communicator between the groups with
colors 0 and 1.
It uses a peer communicator which has two end-points per rank in comm_world.
The specification of the local group leader is ambiguous in peer_comm
because there are two valid answers.
Processes that determine they are in the split group with color 1 (MCW
ranks {1,4,7,10,13,...})
supply a local group leader identified by <split_comm, rank 3>,
which is unambiguously equivalent to <comm_world, rank 10>,
which could be either <ep_comm, rank 20> or <ep_comm, rank 21>.
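The chain of translations above can be checked with a short Python sketch. It is a model only: WORLD_SIZE is a hypothetical value (any size above 10 gives the same chain), end-point ranks are assumed contiguous per MCW rank, and both function names are made up for illustration.

```python
NUM_EP = 2       # end-points per MPI_COMM_WORLD rank
WORLD_SIZE = 18  # hypothetical; any size above 10 shows the same chain

def split_members(color, world_size=WORLD_SIZE):
    """MCW ranks in the split group of one color, in split_comm rank order.
    MPI_COMM_SPLIT orders ranks by key, and the key here is the MCW rank."""
    return [r for r in range(world_size) if r % 3 == color]

def peer_candidates(color, split_rank):
    """Map <split_comm, split_rank> to its MCW rank and its candidate ep_comm ranks."""
    mcw = split_members(color)[split_rank]
    return mcw, [mcw * NUM_EP + i for i in range(NUM_EP)]

print(split_members(1))       # [1, 4, 7, 10, 13, 16]
print(peer_candidates(1, 3))  # (10, [20, 21])
```

The first two steps are single-valued; only the last step, into the end-points communicator, produces more than one candidate.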
This tie can be broken by a broadcast over local_comm, rooted at the
local group leader, of the rank in peer_comm of the communicator handle
that the leader supplied.
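A rough Python simulation of that broadcast, for the group containing MCW rank 10, might look like the following. It is a sketch under assumptions, not an implementation: the dictionary stands in for each process's local knowledge, the return statement stands in for a broadcast over local_comm, and resolve_local_leader is a hypothetical name.

```python
def resolve_local_leader(members, leader_split_rank, supplied_handle_rank):
    """members: MCW ranks in split_comm rank order.
    supplied_handle_rank: for each MCW rank, the peer_comm rank of the
    handle that process passed as peer_comm."""
    leader_mcw = members[leader_split_rank]
    # Initially only the leader knows which of its end-points it supplied.
    chosen = supplied_handle_rank[leader_mcw]
    # Stand-in for a broadcast over local_comm rooted at the leader:
    # afterwards every member holds the same peer_comm rank.
    return {mcw: chosen for mcw in members}

members = [1, 4, 7, 10, 13, 16]              # the split group from the example
handles = {mcw: 2 * mcw for mcw in members}  # everyone passes ep_comm[0]

agreed = resolve_local_leader(members, 3, handles)
print(agreed[1], agreed[10])  # 20 20
```

After the exchange every rank in local_comm agrees that the local leader is rank 20 in peer_comm, at the cost of one extra collective.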
Cheers,
Dan.
--
Dan Holmes
Applications Consultant in HPC Research
EPCC, The University of Edinburgh
James Clerk Maxwell Building
The Kings Buildings
Mayfield Road
Edinburgh, UK
EH9 3JZ
T: +44(0)131 651 3465
E:dholmes at epcc.ed.ac.uk