[Mpi-forum] additional keys for MPI_COMM_SPLIT_TYPE
Jeff Hammond
jhammond at alcf.anl.gov
Sun Mar 24 14:36:33 CDT 2013
Martin encouraged me to socialize this with the Forum. The idea here
seems broader than just one working group so I'm sending to the entire
Forum for feedback.
See https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/372. The text
is included below in case it makes it easier to read on your
phone...because I know this is that urgent :-)
Jeff
---------- Forwarded message ----------
From: MPI Forum <mpi-forum at lists.mpi-forum.org>
Date: Sun, Mar 24, 2013 at 1:50 PM
Subject: [MPI Forum] #372: additional keys for MPI_COMM_SPLIT_TYPE
To:
#372: additional keys for MPI_COMM_SPLIT_TYPE
-------------------------------------+-------------------------------------
 Reporter:  jhammond                 |  Owner:
 Type:  Enhancements to standard     |  Status:  new
 Priority:  Forum feedback requested |  Milestone:  2013/03/11 Chicago, USA
 Version:  MPI <next>                |  Keywords:
 Implementation status:  Waiting     |  Author: Bill Gropp: 0
 Author: Rich Graham: 0              |  Author: Adam Moody: 0
 Author: Torsten Hoefler: 0          |  Author: Dick Treumann: 0
 Author: Jesper Larsson Traeff: 0    |  Author: George Bosilca: 0
 Author: David Solt: 0               |  Author: Bronis de Supinski: 0
 Author: Rajeev Thakur: 0            |  Author: Jeff Squyres: 0
 Author: Alexander Supalov: 0        |  Author: Rolf Rabenseifner: 0
-------------------------------------+-------------------------------------
{{{MPI_COMM_SPLIT_TYPE}}} currently supports only one key,
{{{MPI_COMM_TYPE_SHARED}}}, which refers to shared memory domains, as
supported by shared memory windows ({{{MPI_WIN_ALLOCATE_SHARED}}}).
This ticket proposes additional keys that appear useful enough to justify
standardization.
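As background for the proposals that follow: {{{MPI_COMM_SPLIT_TYPE}}} behaves like {{{MPI_COMM_SPLIT}}} with the color chosen by the implementation. With {{{MPI_COMM_TYPE_SHARED}}}, all processes in one shared-memory domain end up in the same new communicator, ranked by the integer {{{key}}} argument (not to be confused with the split type, which this ticket also calls a "key"), with ties broken by rank in the parent. The plain-C sketch below models only that grouping and ordering rule; it does not call MPI, and the {{{domain}}} array is a hypothetical stand-in for whatever node identifier an implementation uses internally.

```c
#include <stddef.h>

/* Model of the ordering rule used by MPI_Comm_split / MPI_Comm_split_type:
 * processes with the same domain (color) form one communicator, ranked by
 * key, with ties broken by rank in the parent communicator.
 * domain[] is a hypothetical stand-in for the implementation's notion of
 * "same shared-memory domain" (e.g. a node index). */
void model_split(size_t n, const int domain[], const int key[],
                 int new_rank[], int new_size[])
{
    for (size_t i = 0; i < n; i++) {
        int rank = 0, size = 0;
        for (size_t j = 0; j < n; j++) {
            if (domain[j] != domain[i])
                continue;           /* different domain: different comm */
            size++;
            /* j precedes i if its key is smaller, or keys tie and j < i */
            if (key[j] < key[i] || (key[j] == key[i] && j < i))
                rank++;
        }
        new_rank[i] = rank;
        new_size[i] = size;
    }
}
```

For example, six processes on two nodes with all keys equal get new ranks 0, 1, 2 on each node, and each new communicator has size 3.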
The first key addresses the need for users to have a portable way of
querying properties of the filesystem. This key requires the user to
specify the file path of interest using an MPI_Info object. The
communicator returned represents the set of processes that can write to a
single instance of that file path. For a local disk, it is likely (but
not guaranteed) that this communicator will be the same as the one
returned by {{{MPI_COMM_TYPE_SHARED}}}. On the other hand, a globally
shared filesystem will yield a duplicate of {{{MPI_COMM_WORLD}}} (more
precisely, of the input communicator). When the implementation cannot
determine the answer, the resulting communicator is {{{MPI_COMM_NULL}}}
and users cannot assume any information about the path.
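One way to picture the proposed semantics is in terms of split colors. The sketch below assumes, hypothetically, that the implementation can obtain for each process an identifier of the filesystem instance it sees at the queried path (obtaining it is exactly the non-portable part the key would hide), or a negative value when the path is invisible or the answer is unknown. Each process then takes as its color the lowest rank that sees the same instance, and an undefined color stands for {{{MPI_COMM_NULL}}}. {{{filepath_colors}}} and {{{MODEL_UNDEFINED}}} are illustrative names, not proposed API.

```c
#include <stddef.h>

/* Hypothetical stand-in for MPI_UNDEFINED: a process with this color
 * would receive MPI_COMM_NULL. */
#define MODEL_UNDEFINED (-1)

/* Sketch of how colors could be derived for the proposed
 * MPI_COMM_TYPE_FILEPATH key. instance[r] is a hypothetical identifier of
 * the filesystem instance rank r sees at the queried path, or negative if
 * the path is not visible or the answer is unknown. */
void filepath_colors(size_t n, const long instance[], int color[])
{
    for (size_t i = 0; i < n; i++) {
        color[i] = MODEL_UNDEFINED;
        if (instance[i] < 0)
            continue;                   /* unknown/invisible: MPI_COMM_NULL */
        for (size_t j = 0; j <= i; j++) {
            if (instance[j] == instance[i]) {
                color[i] = (int)j;      /* lowest rank seeing this instance */
                break;
            }
        }
    }
}
```

With four processes on two nodes, a global filesystem gives colors 0 0 0 0 (a duplicate of the parent), a node-local one 0 0 2 2, a process-local one 0 1 2 3, and a path seen only by rank 0 gives 0 -1 -1 -1, mirroring the cases in the example program of this ticket.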
To be perfectly honest, the choice of {{{key = MPI_COMM_TYPE_SHARED}}} to
be used only for shared memory is unfortunate, because we could instead
have used this key for anything that can be shared and let the
differences be enumerated by {{{MPI_Info}}}. For example, shared-memory
windows could use {{{(key,value)=("memory","shared")}}} (granted, this is
a somewhat silly set of options), but it would naturally permit one to use
e.g. {{{(key,value)=("filepath","/home")}}} and
{{{(key,value)=("device","gpu0")}}}. We could implement the new set of
options using the existing key if we standardize {{{MPI_INFO_NULL}}} to
mean "shared memory", but that isn't particularly appealing.
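The overloading alternative amounts to a dispatch on the info contents. A minimal sketch, assuming a hypothetical array of (key,value) pairs in place of a real {{{MPI_Info}}} object and a NULL pointer in place of {{{MPI_INFO_NULL}}}: no info keeps today's shared-memory meaning, and otherwise the first recognized key decides what is being shared. The key names are the ones floated above; nothing here is standardized.

```c
#include <stddef.h>
#include <string.h>

/* Resource kinds a single, info-driven split type could cover. */
enum share_kind { SHARE_MEMORY, SHARE_FILEPATH, SHARE_DEVICE, SHARE_UNKNOWN };

/* Hypothetical stand-in for one (key,value) entry of an MPI_Info object. */
struct kv { const char *key; const char *value; };

/* No info (NULL here, MPI_INFO_NULL in MPI) keeps the shared-memory
 * meaning; otherwise the first recognized key selects what is shared and
 * *what receives its value (NULL when there is none). */
enum share_kind classify(const struct kv *info, size_t n, const char **what)
{
    *what = NULL;
    if (info == NULL || n == 0)
        return SHARE_MEMORY;                /* backward-compatible default */
    for (size_t i = 0; i < n; i++) {
        *what = info[i].value;
        if (strcmp(info[i].key, "memory") == 0 &&
            strcmp(info[i].value, "shared") == 0)
            return SHARE_MEMORY;
        if (strcmp(info[i].key, "filepath") == 0)
            return SHARE_FILEPATH;
        if (strcmp(info[i].key, "device") == 0)
            return SHARE_DEVICE;
    }
    *what = NULL;
    return SHARE_UNKNOWN;                   /* would yield MPI_COMM_NULL */
}
```

The backward-compatibility question raised in this ticket is visible in the first branch: the default must stay shared memory for existing codes to keep working.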
In addition to {{{MPI_COMM_TYPE_FILEPATH}}}, which is illustrated below, I
think that {{{MPI_COMM_TYPE_DEVICE}}} is useful to standardize as a key
without specifying the possible options that the {{{MPI_Info}}} can take,
with the exception of noting that, if this key is used with
{{{MPI_INFO_NULL}}}, the result is always {{{MPI_COMM_NULL}}}. Devices
that implementations could define include coprocessors (aka accelerators
or GPUs), NICs (e.g. in the case of multi-rail IB systems) and any number
of other machine-specific cases. The purpose of standardizing the key is
to encourage implementations to take the most portable route when trying
to implement this type of feature, thus confining all of the non-portable
aspects to the definition of the {{{(key,value)}}} pair in the info
object.
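To make the NIC case concrete, here is a sketch of the color an implementation might compute for {{{MPI_COMM_TYPE_DEVICE}}} on a multi-rail system. Everything about the placement is a hypothetical assumption chosen for illustration: processes are laid out {{{ppn}}} per node in block order, and local rank {{{l}}} is bound to rail {{{l % nrails}}}. Passing no info ({{{have_info == false}}}) models {{{MPI_INFO_NULL}}}, which per the proposal always yields {{{MPI_COMM_NULL}}}.

```c
#include <stdbool.h>

#define MODEL_UNDEFINED (-1)   /* stands in for MPI_UNDEFINED -> MPI_COMM_NULL */

/* Hypothetical color for a device split in the multi-rail NIC case.
 * Assumes block placement of ppn ranks per node and that local rank l
 * uses rail l % nrails; none of this is defined by MPI. */
int device_color(int rank, int ppn, int nrails, bool have_info)
{
    if (!have_info || ppn <= 0 || nrails <= 0)
        return MODEL_UNDEFINED;     /* MPI_INFO_NULL: always MPI_COMM_NULL */
    int node  = rank / ppn;         /* node hosting this rank              */
    int local = rank % ppn;         /* rank within that node               */
    int rail  = local % nrails;     /* NIC this rank is bound to           */
    return node * nrails + rail;    /* one color per (node, rail) pair     */
}
```

With 8 processes, 4 per node and 2 rails, the colors are 0 1 0 1 2 3 2 3, i.e. one communicator per (node, rail) pair.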
I have also considered {{{MPI_COMM_TYPE_LOCALE}}}, which would also allow
the implementation to define info keys to specify, e.g., subcommunicators
sharing an I/O node on Cray or Blue Gene systems, but this could just as
easily be implemented using the {{{MPI_COMM_TYPE_DEVICE}}} key.
Furthermore, since devices can be specified as a filepath (they often have
an associated {{{/dev/something}}} entry on Linux systems), there is no
compelling reason to add more than one key. This is, of course, why
overloading {{{MPI_COMM_TYPE_SHARED}}} via info objects seems like the
most straightforward choice, except with regard to backward compatibility.
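The observation that devices are reachable through filepaths can be checked with plain POSIX: {{{stat()}}} reports whether a path is a character or block device node. The helper below is only an illustration of that point (its name is made up for this example), not part of any MPI interface.

```c
#include <stdbool.h>
#include <sys/stat.h>

/* Returns true if path names a character or block device node, such as
 * the entries under /dev on Linux. Illustrative only, not an MPI API. */
bool is_device_path(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return false;               /* missing or inaccessible path */
    return S_ISCHR(st.st_mode) || S_ISBLK(st.st_mode);
}
```

On a typical Linux system {{{is_device_path("/dev/null")}}} returns true and {{{is_device_path("/")}}} returns false.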
What do people think? Should we augment {{{MPI_COMM_TYPE_SHARED}}} with
info keys and let {{{MPI_INFO_NULL}}} continue to mean shared memory,
break backward compatibility, or add a new key that has the desired
catch-all properties?
Here is an example program illustrating the use of the proposed
functionality for {{{MPI_COMM_TYPE_FILEPATH}}}:
{{{
#include <stdio.h>
#include <mpi.h>

int main( int argc, char *argv[] )
{
    int rank;
    MPI_Info i1, i2, i3, i4, i5;
    MPI_Comm c0, c1, c2, c3, c4, c5;
    int result, happy = 0;

    MPI_Init( &argc, &argv );
    MPI_Comm_rank( MPI_COMM_WORLD, &rank );

    /* one info object per path whose sharing we want to query */
    MPI_Info_create( &i1 );
    MPI_Info_create( &i2 );
    MPI_Info_create( &i3 );
    MPI_Info_create( &i4 );
    MPI_Info_create( &i5 );
    MPI_Info_set( i1, (char*)"path", (char*)"/global-shared-fs" );
    MPI_Info_set( i2, (char*)"path", (char*)"/proc-local-fs" );
    MPI_Info_set( i3, (char*)"path", (char*)"/node-local-fs" );
    MPI_Info_set( i4, (char*)"path", (char*)"/proc-zero-fs" );
    MPI_Info_set( i5, (char*)"path", (char*)"/dev/random" );

    /* MPI_COMM_TYPE_SHARED is defined by MPI-3; MPI_COMM_TYPE_FILEPATH is
       the key proposed in this ticket and is not part of MPI-3 */
    MPI_Comm_split_type( MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                         MPI_INFO_NULL, &c0 );
    MPI_Comm_split_type( MPI_COMM_WORLD, MPI_COMM_TYPE_FILEPATH, 0, i1, &c1 );
    MPI_Comm_split_type( MPI_COMM_WORLD, MPI_COMM_TYPE_FILEPATH, 0, i2, &c2 );
    MPI_Comm_split_type( MPI_COMM_WORLD, MPI_COMM_TYPE_FILEPATH, 0, i3, &c3 );
    MPI_Comm_split_type( MPI_COMM_WORLD, MPI_COMM_TYPE_FILEPATH, 0, i4, &c4 );
    MPI_Comm_split_type( MPI_COMM_WORLD, MPI_COMM_TYPE_FILEPATH, 0, i5, &c5 );

    /* MPI_Comm_compare must not be passed MPI_COMM_NULL, so each check
       first makes sure the split produced a communicator */

    /* a globally visible shared filesystem should result in a comm that
       is congruent to world */
    if ( c1 != MPI_COMM_NULL ) {
        MPI_Comm_compare( MPI_COMM_WORLD, c1, &result );
        if ( result == MPI_CONGRUENT ) happy++;
    }
    /* a process-local filesystem should result in a comm congruent to
       MPI_COMM_SELF */
    if ( c2 != MPI_COMM_NULL ) {
        MPI_Comm_compare( MPI_COMM_SELF, c2, &result );
        if ( result == MPI_CONGRUENT ) happy++;
    }
    /* a filesystem shared within the node is likely to result in a
       communicator congruent to the one that supports shared memory,
       provided shared memory is available */
    if ( c3 != MPI_COMM_NULL ) {
        MPI_Comm_compare( c0, c3, &result );
        if ( result == MPI_CONGRUENT ) happy++;
    }
    /* the /proc-zero-fs is only visible from rank 0 of world... */
    if ( rank == 0 ) {
        if ( c4 != MPI_COMM_NULL ) {
            MPI_Comm_compare( MPI_COMM_SELF, c4, &result );
            if ( result == MPI_CONGRUENT ) happy++;
        }
    } else {
        if ( c4 == MPI_COMM_NULL ) happy++;
    }
    /* the sharable nature of /dev/random is probably a meaningless
       concept, so we expect the implementation to return MPI_COMM_NULL
       for c5 */
    if ( c5 == MPI_COMM_NULL ) happy++;

    printf( "rank %d: %d of 5 checks passed\n", rank, happy );

    /* freeing MPI_COMM_NULL is erroneous, so free only real communicators */
    MPI_Comm_free( &c0 );
    if ( c1 != MPI_COMM_NULL ) MPI_Comm_free( &c1 );
    if ( c2 != MPI_COMM_NULL ) MPI_Comm_free( &c2 );
    if ( c3 != MPI_COMM_NULL ) MPI_Comm_free( &c3 );
    if ( c4 != MPI_COMM_NULL ) MPI_Comm_free( &c4 );
    if ( c5 != MPI_COMM_NULL ) MPI_Comm_free( &c5 );
    MPI_Info_free( &i1 );
    MPI_Info_free( &i2 );
    MPI_Info_free( &i3 );
    MPI_Info_free( &i4 );
    MPI_Info_free( &i5 );
    MPI_Finalize( );
    return 0;
}
}}}
--
Ticket URL: <https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/372>
MPI Forum <https://svn.mpi-forum.org/>
--
Jeff Hammond
Argonne Leadership Computing Facility
University of Chicago Computation Institute
jhammond at alcf.anl.gov / (630) 252-5381
http://www.linkedin.com/in/jeffhammond
https://wiki.alcf.anl.gov/parts/index.php/User:Jhammond