[mpiwg-hybridpm] End-point ambiguity with MPI_COMM_INTERCOMM_CREATE

Jeff Squyres (jsquyres) jsquyres at cisco.com
Tue May 13 09:29:13 CDT 2014


I'm sorry I missed the last hybrid call -- did this issue get discussed / incorporated into the endpoints proposal?


On May 7, 2014, at 1:02 PM, Daniel Holmes <dholmes at epcc.ed.ac.uk> wrote:

> Hi all,
> 
> The operation MPI_INTERCOMM_CREATE is defined in MPI-3 on pages 261-2.
> 
> The peer communicator is supplied to enable communication between the group leaders.
> This requires knowledge of the local and remote ranks in peer_comm.
> The remote_leader is the "rank of the remote group leader in peer_comm".
> However, the local_leader is the "rank of local group leader in local_comm".
> This must be translated to the appropriate rank in peer_comm.
> 
> Unfortunately, if peer_comm is an end-points communicator, this rank may not be uniquely defined, because one process can hold several ranks in peer_comm.
> 
> Proposal: change the definition of the local_leader parameter to be "rank of local group leader in peer_comm".
> 
> ---
> 
> In the simplest case, the tie can be broken by examining local parameter values:
> 
> my_mcw_rank <- MPI_COMM_RANK(MPI_COMM_WORLD)
> ep_comm <- MPI_COMM_CREATE_ENDPOINTS(parent:=MPI_COMM_WORLD, my_num_ep:=2)
> if (my_mcw_rank == 10) {
>  // ep_comm[0] represents rank 20. Is local_leader rank 20 or rank 21?
>  inter_comm <- MPI_INTERCOMM_CREATE(local_comm:=MPI_COMM_SELF, local_leader:=0, peer_comm:=ep_comm[0], remote_leader:=30, tag:=1)
> } else if (my_mcw_rank == 15) {
>  // ep_comm[0] represents rank 30. Is local_leader rank 30 or rank 31?
>  inter_comm <- MPI_INTERCOMM_CREATE(local_comm:=MPI_COMM_SELF, local_leader:=0, peer_comm:=ep_comm[0], remote_leader:=20, tag:=1)
> }
> 
> In the above code, the local_leader (rank 0 in MPI_COMM_SELF) does not identify a unique rank in the peer communicator.
> Either of the two locally available communicator handles could be used as the local_leader but the choice is ambiguous.
> In this case, the communicator handle supplied as peer_comm can be used to break the tie.
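
A minimal Python sketch of the ambiguity described above; it is a model, not MPI code. It assumes, as the numbers in the example suggest, that a parent rank r with n end-points owns the contiguous ep_comm ranks n*r through n*r+n-1. The helper names endpoint_ranks and resolve_leader are hypothetical and exist only for illustration.

```python
# Hypothetical model: each parent rank owns a contiguous block of end-point ranks.
def endpoint_ranks(parent_rank, num_ep=2):
    """Return every rank in ep_comm that belongs to parent_rank."""
    first = parent_rank * num_ep
    return list(range(first, first + num_ep))


def resolve_leader(parent_rank, handle_index=None, num_ep=2):
    """Translate a leader in the parent communicator to one rank in ep_comm.

    Without knowing which end-point handle was supplied as peer_comm,
    the translation has several valid answers, so an error is raised.
    """
    candidates = endpoint_ranks(parent_rank, num_ep)
    if handle_index is None:
        if len(candidates) > 1:
            raise ValueError(f"ambiguous leader: candidates {candidates}")
        return candidates[0]
    # The supplied handle (ep_comm[handle_index]) breaks the tie.
    return candidates[handle_index]


print(endpoint_ranks(10))     # [20, 21]
print(resolve_leader(10, 0))  # 20
print(resolve_leader(15, 0))  # 30
```

Calling resolve_leader(10) with no handle index raises ValueError, which mirrors the case where only the rank in local_comm is known.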
> 
> In a trickier case, collective communication using local_comm would be necessary to break the tie:
> 
> my_mcw_rank <- MPI_COMM_RANK(MPI_COMM_WORLD)
> split_comm <- MPI_COMM_SPLIT(MPI_COMM_WORLD, my_mcw_rank % 3, my_mcw_rank)
> ep_comm <- MPI_COMM_CREATE_ENDPOINTS(parent:=MPI_COMM_WORLD, my_num_ep:=2)
> if (my_mcw_rank % 3 == 1) {
>  // ep_comm[0] represents rank 2*my_mcw_rank. Is local_leader (rank 3 in split_comm, rank 10 in mcw) rank 20 or rank 21 in peer_comm?
>  inter_comm <- MPI_INTERCOMM_CREATE(local_comm:=split_comm, local_leader:=3, peer_comm:=ep_comm[0], remote_leader:=30, tag:=2)
> } else if (my_mcw_rank % 3 == 0) {
>  // ep_comm[0] represents rank 2*my_mcw_rank. Is local_leader (rank 5 in split_comm, rank 15 in mcw) rank 30 or rank 31 in peer_comm?
>  inter_comm <- MPI_INTERCOMM_CREATE(local_comm:=split_comm, local_leader:=5, peer_comm:=ep_comm[0], remote_leader:=20, tag:=2)
> }
> 
> This code splits comm_world into three groups with colors 0, 1 and 2.
> Then it attempts to create an inter-communicator from the groups with colors 0 and 1.
> It uses a peer communicator which has two end-points per rank in comm_world.
> The specification of the local group leader is ambiguous in peer_comm because there are two valid answers.
> Processes that determine they are in the split group with color 1 (MCW ranks {1,4,7,10,13,...})
>  supply a local group leader identified by <split_comm, rank 3>,
>  which is unambiguously equivalent to <comm_world, rank 10>,
>  which could be either <ep_comm, rank 20> or <ep_comm, rank 21>.
> This tie can be broken by having the local group leader broadcast, to all ranks in local_comm, the peer_comm rank of the communicator handle it supplied as peer_comm.
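
The rank translation and the broadcast tie-break can be modelled in plain Python; this is a rough sketch rather than MPI code. The world size of 18, the contiguous end-point numbering (MCW rank r owns ep_comm ranks 2r and 2r+1), and the helper names split_members, leader_candidates and break_tie are all assumptions made for illustration.

```python
# Hypothetical model of the second example; no MPI calls are made.
WORLD_SIZE = 18  # assumed size, large enough to contain MCW rank 15
NUM_EP = 2       # end-points per MCW rank, as in the example


def split_members(color, modulus=3):
    """MCW ranks in the split group with this color, ordered by key (the MCW rank)."""
    return [r for r in range(WORLD_SIZE) if r % modulus == color]


def leader_candidates(color, split_rank):
    """Translate <split_comm, split_rank> to its MCW rank and candidate ep_comm ranks."""
    mcw_rank = split_members(color)[split_rank]
    return mcw_rank, [mcw_rank * NUM_EP + i for i in range(NUM_EP)]


def break_tie(color, split_rank, leader_handle_index):
    """Stand-in for a broadcast rooted at the leader over local_comm.

    The leader knows which ep_comm handle it passed as peer_comm; every
    member of the group receives that handle's rank in ep_comm.
    """
    _, candidates = leader_candidates(color, split_rank)
    chosen = candidates[leader_handle_index]
    return {member: chosen for member in split_members(color)}


mcw_rank, candidates = leader_candidates(color=1, split_rank=3)
print(mcw_rank, candidates)          # 10 [20, 21]
agreed = break_tie(color=1, split_rank=3, leader_handle_index=0)
print(sorted(set(agreed.values())))  # [20]
```

After the modelled broadcast, every member of the color-1 group holds the same ep_comm rank for its leader, which is the agreement the tie-break is meant to provide.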
> 
> Cheers,
> Dan.
> 
> -- 
> Dan Holmes
> Applications Consultant in HPC Research
> EPCC, The University of Edinburgh
> James Clerk Maxwell Building
> The Kings Buildings
> Mayfield Road
> Edinburgh, UK
> EH9 3JZ
> T: +44(0)131 651 3465
> E: dholmes at epcc.ed.ac.uk
> 


-- 
Jeff Squyres
jsquyres at cisco.com



