[Mpi-forum] additional keys for MPI_COMM_SPLIT_TYPE

Schulz, Martin schulzm at llnl.gov
Sun Mar 24 15:00:19 CDT 2013


Hi Jeff, all,

On Mar 24, 2013, at 12:36 PM, Jeff Hammond <jhammond at alcf.anl.gov> wrote:

> Martin encouraged me to socialize this with the Forum.  The idea here
> seems broader than just one working group so I'm sending to the entire
> Forum for feedback.

I still think it would be good to have one WG take responsibility for this discussion. It seems to fit best into the collectives WG (and/or the communicator/groups chapter). Torsten: as the lead for the collectives group, does this sound reasonable to you?

Thanks,

Martin


> 
> See https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/372.  The text
> is included below in case it makes it easier to read on your
> phone...because I know this is that urgent :-)
> 
> Jeff
> 
> 
> ---------- Forwarded message ----------
> From: MPI Forum <mpi-forum at lists.mpi-forum.org>
> Date: Sun, Mar 24, 2013 at 1:50 PM
> Subject: [MPI Forum] #372: additional keys for MPI_COMM_SPLIT_TYPE
> To:
> 
> 
> #372: additional keys for MPI_COMM_SPLIT_TYPE
> --------------------------------------------------------------------------
>               Reporter:  jhammond
>                  Owner:
>                   Type:  Enhancements to standard
>                 Status:  new
>               Priority:  Forum feedback requested
>              Milestone:  2013/03/11 Chicago, USA
>                Version:  MPI <next>
>               Keywords:
>  Implementation status:  Waiting
>             Author: Bill Gropp:  0
>             Author: Adam Moody:  0
>          Author: Dick Treumann:  0
>         Author: George Bosilca:  0
>     Author: Bronis de Supinski:  0
>           Author: Jeff Squyres:  0
>      Author: Rolf Rabenseifner:  0
>            Author: Rich Graham:  0
>        Author: Torsten Hoefler:  0
>  Author: Jesper Larsson Traeff:  0
>             Author: David Solt:  0
>          Author: Rajeev Thakur:  0
>      Author: Alexander Supalov:  0
> --------------------------------------------------------------------------
> {{{MPI_COMM_SPLIT_TYPE}}} currently supports only one key,
> {{{MPI_COMM_TYPE_SHARED}}}, which refers to shared memory domains, as
> supported by shared memory windows ({{{MPI_WIN_ALLOCATE_SHARED}}}).
> 
> This ticket proposes additional keys that appear useful enough to justify
> standardization.
> 
> The first key addresses the need for users to have a portable way of
> querying properties of the filesystem.  This key requires the user to
> specify the specific file path of interest using an MPI_Info object.  The
> communicator returned represents the set of processes that can write to a
> single instance of that file path.  For a local disk, it is likely (but
> not guaranteed) that this communicator is the same as the one returned for
> {{{MPI_COMM_TYPE_SHARED}}}.  On the other hand, globally shared
> filesystems will return a duplicate of {{{MPI_COMM_WORLD}}}.  When the
> implementation cannot determine the answer, the resulting communicator is
> {{{MPI_COMM_NULL}}} and users cannot assume any information about the
> path.
> 
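
One way to picture the filepath semantics above is as a color assignment that an implementation could hand to MPI_Comm_split. The sketch below is plain C so that it compiles without mpi.h. The names fs_id, FS_UNKNOWN, COLOR_UNDEFINED, and filepath_colors are hypothetical, and how an implementation would obtain a comparable per-rank filesystem identifier is an assumption, not something the ticket specifies.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sentinel: this rank could not identify the filesystem instance. */
#define FS_UNKNOWN (-1L)
/* Stand-in for MPI_UNDEFINED so the sketch compiles without mpi.h. */
#define COLOR_UNDEFINED (-1)

/*
 * fs_id[r] is an assumed per-rank identifier of the filesystem instance
 * behind the path; equal ids mean "same instance". On return, color[r] is
 * the lowest rank that shares rank r's instance, or COLOR_UNDEFINED when
 * the instance is unknown.
 */
static void filepath_colors(const long *fs_id, int nranks, int *color)
{
    assert(fs_id != NULL && color != NULL && nranks >= 0);
    for (int r = 0; r < nranks; r++) {
        color[r] = COLOR_UNDEFINED;
        if (fs_id[r] == FS_UNKNOWN)
            continue;                 /* unknown: stays undefined */
        for (int s = 0; s <= r; s++) {
            if (fs_id[s] == fs_id[r]) {
                color[r] = s;         /* first rank with the same instance */
                break;
            }
        }
    }
}
```

In a real implementation the ids would presumably be gathered across ranks (for example with MPI_Allgather), and each rank would pass its own color to MPI_Comm_split, using MPI_UNDEFINED where the sketch uses COLOR_UNDEFINED so that those ranks receive MPI_COMM_NULL.
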
> To be perfectly honest, the choice of {{{key = MPI_COMM_TYPE_SHARED}}} to
> be used only for shared memory is unfortunate, because we could have
> instead used this key for anything that could be shared and let the
> differences be enumerated by {{{MPI_Info}}}.  For example, shared memory
> windows could use {{{(key,value)=("memory","shared")}}} (granted, this is
> a somewhat silly set of options, but it would naturally permit one to use
> e.g. {{{(key,value)=("filepath","/home")}}} and
> {{{(key,value)=("device","gpu0")}}}).  We could implement the new set of
> options using the existing key if we standardize {{{MPI_INFO_NULL}}} to
> mean "shared memory" but that isn't particularly appealing.
> 
> In addition to {{{MPI_COMM_TYPE_FILEPATH}}}, which is illustrated below, I
> think that {{{MPI_COMM_TYPE_DEVICE}}} is useful to standardize as a key
> without specifying the possible options that the {{{MPI_Info}}} can take,
> with the exception of noting that, if this key is used with
> {{{MPI_INFO_NULL}}}, the result is always {{{MPI_COMM_NULL}}}.  Devices
> that implementations could define include coprocessors (aka accelerators
> or GPUs), NICs (e.g. in the case of multi-rail IB systems) and any number
> of other machine-specific cases.  The purpose of standardizing the key is
> to encourage implementations to take the most portable route when trying
> to implement this type of feature, thus confining all of the non-portable
> aspects to the definition of the {{{(key,value)}}} pair in the info
> object.
> 
> I have also considered {{{MPI_COMM_TYPE_LOCALE}}}, which would also allow
> the implementation to define info keys to specify, e.g., subcomms sharing
> an IO node on Cray or Blue Gene, but this could just as easily be
> implemented using the {{{MPI_COMM_TYPE_DEVICE}}} key.  Furthermore, since
> devices can be specified as a filepath (they are often associated with
> a {{{/dev/something}}} path on Linux systems), there is no compelling
> reason to add more than one key.  This is, of course, the reason why
> overloading {{{MPI_COMM_TYPE_SHARED}}} via info objects seems like the
> most straightforward choice except with regard to backwards compatibility.
> 
> What do people think?  Can we augment {{{MPI_COMM_TYPE_SHARED}}} with info
> keys and leave the case of {{{MPI_INFO_NULL}}} to mean shared memory, do
> we break backwards compatibility, or do we add a new key that has the
> desired catch-all properties?
> 
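
The first option in the question above (overload the existing key and let a missing info object keep meaning shared memory) could be pictured as a small dispatch on the (key,value) pair. This is only a sketch: split_kind, its enumerators, and classify_shared_split are hypothetical names that do not exist in MPI, and passing key == NULL stands in for MPI_INFO_NULL so that the example compiles without mpi.h.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical categories; none of these names exist in MPI. */
typedef enum {
    SPLIT_SHARED_MEMORY,
    SPLIT_FILEPATH,
    SPLIT_DEVICE,
    SPLIT_UNSUPPORTED   /* an implementation would return MPI_COMM_NULL */
} split_kind;

/*
 * key == NULL models MPI_INFO_NULL and keeps today's meaning (shared
 * memory), so existing callers would be unaffected.
 */
static split_kind classify_shared_split(const char *key, const char *value)
{
    if (key == NULL)
        return SPLIT_SHARED_MEMORY;
    assert(value != NULL);   /* an info key always carries a value */
    if (strcmp(key, "memory") == 0)
        return strcmp(value, "shared") == 0 ? SPLIT_SHARED_MEMORY
                                            : SPLIT_UNSUPPORTED;
    if (strcmp(key, "filepath") == 0)
        return SPLIT_FILEPATH;
    if (strcmp(key, "device") == 0)
        return SPLIT_DEVICE;
    return SPLIT_UNSUPPORTED;
}
```

The backward-compatibility cost shows up in the first branch: the "no info" case is permanently tied to shared memory, which is the part the ticket calls not particularly appealing.
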
> Here is an example program illustrating the use of the proposed
> functionality for {{{MPI_COMM_TYPE_FILEPATH}}}:
> {{{
> 
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <mpi.h>
> 
> int main( int argc, char *argv[] )
> {
>     int rank;
>     MPI_Info i1, i2, i3, i4, i5;
>     MPI_Comm c0, c1, c2, c3, c4, c5;
>     int result, happy=0;
> 
>     MPI_Init(&argc,&argv);
>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
> 
>     MPI_Info_create( &i1 );
>     MPI_Info_create( &i2 );
>     MPI_Info_create( &i3 );
>     MPI_Info_create( &i4 );
>     MPI_Info_create( &i5 );
> 
>     MPI_Info_set( i1, (char*)"path", (char*)"/global-shared-fs" );
>     MPI_Info_set( i2, (char*)"path", (char*)"/proc-local-fs" );
>     MPI_Info_set( i3, (char*)"path", (char*)"/node-local-fs" );
>     MPI_Info_set( i4, (char*)"path", (char*)"/proc-zero-fs" );
>     MPI_Info_set( i5, (char*)"path", (char*)"/dev/rand" );
> 
>     MPI_Comm_split_type( MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
>                          MPI_INFO_NULL, &c0 );
> 
>     MPI_Comm_split_type( MPI_COMM_WORLD, MPI_COMM_TYPE_FILEPATH, 0, i1,
>                          &c1 );
>     MPI_Comm_split_type( MPI_COMM_WORLD, MPI_COMM_TYPE_FILEPATH, 0, i2,
>                          &c2 );
>     MPI_Comm_split_type( MPI_COMM_WORLD, MPI_COMM_TYPE_FILEPATH, 0, i3,
>                          &c3 );
>     MPI_Comm_split_type( MPI_COMM_WORLD, MPI_COMM_TYPE_FILEPATH, 0, i4,
>                          &c4 );
>     MPI_Comm_split_type( MPI_COMM_WORLD, MPI_COMM_TYPE_FILEPATH, 0, i5,
>                          &c5 );
> 
>     /* a globally visible shared filesystem should result in a comm that
>        is equivalent to world */
>     MPI_Comm_compare( MPI_COMM_WORLD, c1, &result );
>     if ( result == MPI_CONGRUENT) happy++;
> 
>     /* a process-local filesystem should result in MPI_COMM_SELF */
>     MPI_Comm_compare( MPI_COMM_SELF, c2, &result );
>     if ( result == MPI_CONGRUENT) happy++;
> 
>     /* a filesystem shared within the node is likely to result in a
>        communicator equivalent to the one that supports shared memory,
>        provided shared memory is available */
>     MPI_Comm_compare( c0, c3, &result );
>     if ( result == MPI_CONGRUENT) happy++;
> 
>     /* the /proc-zero-fs is only visible from rank 0 of world... */
>     if (rank==0) {
>        MPI_Comm_compare( MPI_COMM_SELF, c4, &result );
>        if ( result == MPI_CONGRUENT) happy++;
>     } else {
>        if ( c4 == MPI_COMM_NULL) happy++;
>     }
> 
>     /* the sharable nature of /dev/rand is probably a meaningless concept,
>        so we expect the implementation to return MPI_COMM_NULL for c5 */
>     if ( c5 == MPI_COMM_NULL) happy++;
> 
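>     /* freeing MPI_COMM_NULL is erroneous, so guard every call */
>     if ( c0 != MPI_COMM_NULL ) MPI_Comm_free( &c0 );
>     if ( c1 != MPI_COMM_NULL ) MPI_Comm_free( &c1 );
>     if ( c2 != MPI_COMM_NULL ) MPI_Comm_free( &c2 );
>     if ( c3 != MPI_COMM_NULL ) MPI_Comm_free( &c3 );
>     if ( c4 != MPI_COMM_NULL ) MPI_Comm_free( &c4 );
>     if ( c5 != MPI_COMM_NULL ) MPI_Comm_free( &c5 );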
> 
>     MPI_Info_free( &i1 );
>     MPI_Info_free( &i2 );
>     MPI_Info_free( &i3 );
>     MPI_Info_free( &i4 );
>     MPI_Info_free( &i5 );
> 
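>     /* report how many of the five expectations held on this rank */
>     printf( "rank %d: %d of 5 checks passed\n", rank, happy );
> 
>     MPI_Finalize( );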
>     return 0;
> }
> }}}
> 
> --
> Ticket URL: <https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/372>
> MPI Forum <https://svn.mpi-forum.org/>
> MPI Forum
> 
> 
> -- 
> Jeff Hammond
> Argonne Leadership Computing Facility
> University of Chicago Computation Institute
> jhammond at alcf.anl.gov / (630) 252-5381
> http://www.linkedin.com/in/jeffhammond
> https://wiki.alcf.anl.gov/parts/index.php/User:Jhammond

________________________________________________________________________
Martin Schulz, schulzm at llnl.gov, http://people.llnl.gov/schulzm
CASC @ Lawrence Livermore National Laboratory, Livermore, USA






