[Mpi-forum] additional keys for MPI_COMM_SPLIT_TYPE

Jeff Hammond jhammond at alcf.anl.gov
Sun Mar 24 14:36:33 CDT 2013

Martin encouraged me to socialize this with the Forum.  The idea here
seems broader than just one working group so I'm sending to the entire
Forum for feedback.

See https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/372.  The text
is included below in case it makes it easier to read on your
phone...because I know this is that urgent :-)


---------- Forwarded message ----------
From: MPI Forum <mpi-forum at lists.mpi-forum.org>
Date: Sun, Mar 24, 2013 at 1:50 PM
Subject: [MPI Forum] #372: additional keys for MPI_COMM_SPLIT_TYPE

#372: additional keys for MPI_COMM_SPLIT_TYPE
               Reporter:  jhammond
                  Owner:
                 Status:  new
                   Type:  Enhancements to standard
              Milestone:  2013/03/11 Chicago, USA
               Priority:  Forum feedback requested
               Keywords:
                Version:  MPI <next>
  Implementation status:  Waiting
           Author: Bill Gropp:  0
           Author: Adam Moody:  0
        Author: Dick Treumann:  0
       Author: George Bosilca:  0
   Author: Bronis de Supinski:  0
          Author: Rich Graham:  0
         Author: Jeff Squyres:  0
      Author: Torsten Hoefler:  0
    Author: Rolf Rabenseifner:  0
Author: Jesper Larsson Traeff:  0
           Author: David Solt:  0
        Author: Rajeev Thakur:  0
    Author: Alexander Supalov:  0

 {{{MPI_COMM_SPLIT_TYPE}}} currently supports only one key,
 {{{MPI_COMM_TYPE_SHARED}}}, which refers to shared memory domains, as
 supported by shared memory windows ({{{MPI_WIN_ALLOCATE_SHARED}}}).

 This ticket proposes additional keys that appear useful enough to justify

 The first key addresses the need for users to have a portable way of
 querying properties of the filesystem.  This key requires the user to
 specify the file path of interest using an MPI_Info object.  The
 communicator returned represents the set of processes that can write to a
 single instance of that file path.  For a local disk, it is likely (but
 not necessary) that this communicator be the same as returned by
 {{{MPI_COMM_TYPE_SHARED}}}.  On the other hand, globally shared
 filesystems will return a duplicate of {{{MPI_COMM_WORLD}}}.  When the
 implementation cannot determine the answer, the resulting communicator is
 {{{MPI_COMM_NULL}}} and users cannot assume any information about the
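
 The grouping rule described in this paragraph can be modelled without MPI.
 The sketch below is only an illustration under stated assumptions: it
 supposes that each process can obtain some identifier for the filesystem
 instance behind the path (the fs_id array), which the ticket does not
 specify.  The names filepath_groups, FS_UNKNOWN and GROUP_NONE are
 hypothetical; GROUP_NONE stands in for {{{MPI_COMM_NULL}}}.

```c
#include <stddef.h>

/* Hypothetical sentinel: the filesystem instance behind the path could
 * not be determined on this process. */
#define FS_UNKNOWN (-1L)
/* Hypothetical stand-in for "no communicator" (the MPI_COMM_NULL case). */
#define GROUP_NONE (-1)

/* fs_id[r] is an assumed identifier of the filesystem instance that the
 * path resolves to on process r.  On return, group[r] holds the lowest
 * rank that shares process r's instance, or GROUP_NONE if unknown.
 * Processes with equal group values would end up in one communicator. */
void filepath_groups(const long *fs_id, size_t n, int *group)
{
    for (size_t r = 0; r < n; r++) {
        group[r] = GROUP_NONE;
        if (fs_id[r] == FS_UNKNOWN)
            continue;               /* undetermined: stays GROUP_NONE */
        for (size_t q = 0; q <= r; q++) {
            if (fs_id[q] == fs_id[r]) {
                group[r] = (int)q;  /* first rank seen with this instance */
                break;
            }
        }
    }
}
```

 In this model a globally shared filesystem gives every process the same
 fs_id, so all processes land in group 0 (the analogue of a duplicate of
 {{{MPI_COMM_WORLD}}}), while a node-local disk yields one group per node.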

 To be perfectly honest, the choice of {{{key = MPI_COMM_TYPE_SHARED}}} to
 be used only for shared memory is unfortunate, because we could have
 instead used this key for anything that could be shared and let the
 differences be enumerated by {{{MPI_Info}}}.  For example, shared memory
 windows could use {{{(key,value)=("memory","shared")}}} (granted, this is
 a somewhat silly set of options), but it would naturally permit one to use
 e.g. {{{(key,value)=("filepath","/home")}}} and
 {{{(key,value)=("device","gpu0")}}}.  We could implement the new set of
 options using the existing key if we standardize {{{MPI_INFO_NULL}}} to
 mean "shared memory", but that isn't particularly appealing.
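
 To make the overloading idea concrete, here is a minimal, MPI-free sketch
 of how an implementation might dispatch on a single info pair when
 {{{MPI_COMM_TYPE_SHARED}}} is reused.  Everything in it is an assumption
 for illustration: the enum, the function name classify_shared_split, and
 the convention that a NULL key models {{{MPI_INFO_NULL}}}.  Only the
 (key,value) strings come from the paragraph above.

```c
#include <string.h>

/* Hypothetical categories; none of these names exist in MPI. */
typedef enum {
    SPLIT_SHARED_MEMORY,  /* today's MPI_COMM_TYPE_SHARED behaviour */
    SPLIT_FILEPATH,       /* group by who shares the given file path */
    SPLIT_DEVICE,         /* group by who shares the given device */
    SPLIT_UNSUPPORTED     /* caller would return MPI_COMM_NULL */
} split_kind;

/* Decide what a split with the overloaded key would mean.
 * key == NULL models MPI_INFO_NULL, which keeps the existing
 * shared-memory meaning so that current programs are unaffected. */
split_kind classify_shared_split(const char *key, const char *value)
{
    if (key == NULL)
        return SPLIT_SHARED_MEMORY;
    if (value == NULL)
        return SPLIT_UNSUPPORTED;
    if (strcmp(key, "memory") == 0)
        return strcmp(value, "shared") == 0 ? SPLIT_SHARED_MEMORY
                                            : SPLIT_UNSUPPORTED;
    if (strcmp(key, "filepath") == 0)
        return SPLIT_FILEPATH;
    if (strcmp(key, "device") == 0)
        return SPLIT_DEVICE;
    return SPLIT_UNSUPPORTED;  /* unrecognized info key */
}
```

 The first branch is the backwards-compatibility question in miniature:
 existing callers pass {{{MPI_INFO_NULL}}} and must keep getting the
 shared-memory communicator.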

 In addition to {{{MPI_COMM_TYPE_FILEPATH}}}, which is illustrated below, I
 think that {{{MPI_COMM_TYPE_DEVICE}}} is useful to standardize as a key
 without specifying the possible options that the {{{MPI_Info}}} can take,
 with the exception of noting that, if this key is used with
 {{{MPI_INFO_NULL}}}, the result is always {{{MPI_COMM_NULL}}}.  Devices
 that implementations could define include coprocessors (aka accelerators
 or GPUs), NICs (e.g. in the case of multi-rail IB systems) and any number
 of other machine-specific cases.  The purpose of standardizing the key is
 to encourage implementations to take the most portable route when trying
 to implement this type of feature, thus confining all of the non-portable
 aspects to the definition of the {{{(key,value)}}} pair in the info

 I have also considered {{{MPI_COMM_TYPE_LOCALE}}}, which would also allow
 the implementation to define info keys to specify, e.g., subcomms sharing
 an IO node on Cray or Blue Gene, but this could just as easily be
 implemented using the {{{MPI_COMM_TYPE_DEVICE}}} key.  Furthermore, since
 devices can be specified as a filepath (they are often associated with
 a {{{/dev/something}}} entry on Linux systems), there is no compelling
 reason to add more than one key.  This is, of course, the reason why
 overloading {{{MPI_COMM_TYPE_SHARED}}} via info objects seems like the most
 straightforward choice, except with regard to backwards compatibility.

 What do people think?  Can we augment {{{MPI_COMM_TYPE_SHARED}}} with info
 keys and leave the case of {{{MPI_INFO_NULL}}} to mean shared memory, do
 we break backwards compatibility, or do we add a new key that has the
 desired catch-all properties?

 Here is an example program illustrating the use of the proposed
 functionality for {{{MPI_COMM_TYPE_FILEPATH}}}:

 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
 #include <mpi.h>

 /* MPI_COMM_TYPE_FILEPATH is the key proposed in this ticket; it is not
    part of MPI-3, so this program illustrates intended usage only. */
 int main( int argc, char *argv[] )
 {
     int rank;
     MPI_Info i1, i2, i3, i4, i5;
     MPI_Comm c0, c1, c2, c3, c4, c5;
     int result, happy=0;

     MPI_Init( &argc, &argv );
     MPI_Comm_rank(MPI_COMM_WORLD, &rank);

     MPI_Info_create( &i1 );
     MPI_Info_create( &i2 );
     MPI_Info_create( &i3 );
     MPI_Info_create( &i4 );
     MPI_Info_create( &i5 );

     MPI_Info_set( i1, (char*)"path", (char*)"/global-shared-fs" );
     MPI_Info_set( i2, (char*)"path", (char*)"/proc-local-fs" );
     MPI_Info_set( i3, (char*)"path", (char*)"/node-local-fs" );
     MPI_Info_set( i4, (char*)"path", (char*)"/proc-zero-fs" );
     MPI_Info_set( i5, (char*)"path", (char*)"/dev/rand" );

     /* reference communicator: processes that can share memory */
     MPI_Comm_split_type( MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                          MPI_INFO_NULL, &c0 );

     MPI_Comm_split_type( MPI_COMM_WORLD, MPI_COMM_TYPE_FILEPATH, 0, i1,
                          &c1 );
     MPI_Comm_split_type( MPI_COMM_WORLD, MPI_COMM_TYPE_FILEPATH, 0, i2,
                          &c2 );
     MPI_Comm_split_type( MPI_COMM_WORLD, MPI_COMM_TYPE_FILEPATH, 0, i3,
                          &c3 );
     MPI_Comm_split_type( MPI_COMM_WORLD, MPI_COMM_TYPE_FILEPATH, 0, i4,
                          &c4 );
     MPI_Comm_split_type( MPI_COMM_WORLD, MPI_COMM_TYPE_FILEPATH, 0, i5,
                          &c5 );

     /* a globally visible shared filesystem should result in a comm that
        is equivalent to world */
     MPI_Comm_compare( MPI_COMM_WORLD, c1, &result );
     if ( result == MPI_CONGRUENT) happy++;

     /* a process-local filesystem should result in MPI_COMM_SELF */
     MPI_Comm_compare( MPI_COMM_SELF, c2, &result );
     if ( result == MPI_CONGRUENT) happy++;

     /* a filesystem shared within the node is likely to result in a
        communicator equivalent to the one that supports shared memory,
        provided shared memory is available */
     MPI_Comm_compare( c0, c3, &result );
     if ( result == MPI_CONGRUENT) happy++;

     /* the /proc-zero-fs is only visible from rank 0 of world... */
     if (rank==0) {
        MPI_Comm_compare( MPI_COMM_SELF, c4, &result );
        if ( result == MPI_CONGRUENT) happy++;
     } else {
        if ( c4 == MPI_COMM_NULL) happy++;
     }

     /* the sharable nature of /dev/rand is probably a meaningless concept
        we expect the implementation to return MPI_COMM_NULL for c5 */
     if ( c5 == MPI_COMM_NULL) happy++;

     printf( "rank %d: %d of 5 expectations met\n", rank, happy );

     /* MPI_Comm_free must not be called on MPI_COMM_NULL */
     MPI_Comm_free( &c0 );
     if ( c1 != MPI_COMM_NULL ) MPI_Comm_free( &c1 );
     if ( c2 != MPI_COMM_NULL ) MPI_Comm_free( &c2 );
     if ( c3 != MPI_COMM_NULL ) MPI_Comm_free( &c3 );
     if ( c4 != MPI_COMM_NULL ) MPI_Comm_free( &c4 );
     if ( c5 != MPI_COMM_NULL ) MPI_Comm_free( &c5 );

     MPI_Info_free( &i1 );
     MPI_Info_free( &i2 );
     MPI_Info_free( &i3 );
     MPI_Info_free( &i4 );
     MPI_Info_free( &i5 );

     MPI_Finalize( );
     return 0;
 }

Ticket URL: <https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/372>
MPI Forum <https://svn.mpi-forum.org/>

Jeff Hammond
Argonne Leadership Computing Facility
University of Chicago Computation Institute
jhammond at alcf.anl.gov / (630) 252-5381
