[Mpi-forum] additional keys for MPI_COMM_SPLIT_TYPE

Jeff Hammond jhammond at alcf.anl.gov
Mon Mar 25 12:28:00 CDT 2013


Hi Sayantan,

Per Martin and Torsten, I've moved discussion of this issue to
mpi3-coll at lists.mpi-forum.org.  I'll respond to you on that list.

Best,

Jeff

On Mon, Mar 25, 2013 at 12:24 PM, Sur, Sayantan <sayantan.sur at intel.com> wrote:
> Does it make sense for some of these keys (esp. those that deal with devices, multi-rail, paths, etc.) to be part of the MPI Side documents?
>
> Sayantan
>
>
>> -----Original Message-----
>> From: mpi-forum-bounces at lists.mpi-forum.org [mailto:mpi-forum-
>> bounces at lists.mpi-forum.org] On Behalf Of Jim Dinan
>> Sent: Monday, March 25, 2013 7:16 AM
>> To: mpi-forum at lists.mpi-forum.org
>> Subject: Re: [Mpi-forum] additional keys for MPI_COMM_SPLIT_TYPE
>>
>> Hi Jeff,
>>
>> Please also include ticket #297 in the discussion (merge with your ticket, or
>> otherwise):
>>
>> https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/297
>>
>> This ticket proposes MPI_COMM_TYPE_NEIGHBORHOOD, which looks similar
>> to MPI_COMM_TYPE_LOCALE.
>>
>>   ~Jim.
>>
>> On 3/24/13 2:36 PM, Jeff Hammond wrote:
>> > Martin encouraged me to socialize this with the Forum.  The idea here
>> > seems broader than just one working group so I'm sending to the entire
>> > Forum for feedback.
>> >
>> > See https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/372.  The text
>> > is included below in case it makes it easier to read on your
>> > phone...because I know this is that urgent :-)
>> >
>> > Jeff
>> >
>> >
>> > ---------- Forwarded message ----------
>> > From: MPI Forum <mpi-forum at lists.mpi-forum.org>
>> > Date: Sun, Mar 24, 2013 at 1:50 PM
>> > Subject: [MPI Forum] #372: additional keys for MPI_COMM_SPLIT_TYPE
>> > To:
>> >
>> >
>> > #372: additional keys for MPI_COMM_SPLIT_TYPE
>> > ----------------------------------------------------------------------
>> >   Reporter:               jhammond
>> >   Owner:
>> >   Type:                   Enhancements to standard
>> >   Status:                 new
>> >   Priority:               Forum feedback requested
>> >   Milestone:              2013/03/11 Chicago, USA
>> >   Version:                MPI <next>
>> >   Keywords:
>> >   Implementation status:  Waiting
>> >   Author: Bill Gropp:  0
>> >   Author: Adam Moody:  0
>> >   Author: Dick Treumann:  0
>> >   Author: George Bosilca:  0
>> >   Author: Bronis de Supinski:  0
>> >   Author: Rich Graham:  0
>> >   Author: Jeff Squyres:  0
>> >   Author: Torsten Hoefler:  0
>> >   Author: Rolf Rabenseifner:  0
>> >   Author: Jesper Larsson Traeff:  0
>> >   Author: David Solt:  0
>> >   Author: Rajeev Thakur:  0
>> >   Author: Alexander Supalov:  0
>> > ----------------------------------------------------------------------
>> >   {{{MPI_COMM_SPLIT_TYPE}}} currently supports only one key,
>> >   {{{MPI_COMM_TYPE_SHARED}}}, which refers to shared memory domains, as
>> >   supported by shared memory windows ({{{MPI_WIN_ALLOCATE_SHARED}}}).
>> >
>> >   This ticket proposes additional keys that appear useful enough to justify
>> >   standardization.
>> >
>> >   The first key addresses the need for users to have a portable way of
>> >   querying properties of the filesystem.  This key requires the user to
>> >   specify the file path of interest using an MPI_Info object.  The
>> >   communicator returned represents the set of processes that can write to
>> >   a single instance of that file path.  For a local disk, it is likely (but
>> >   not required) that this communicator is the same as the one returned by
>> >   {{{MPI_COMM_TYPE_SHARED}}}.  On the other hand, globally shared
>> >   filesystems will return a duplicate of {{{MPI_COMM_WORLD}}}.  When the
>> >   implementation cannot determine the answer, the resulting communicator is
>> >   {{{MPI_COMM_NULL}}} and users cannot assume anything about the path.
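
The grouping rule for the filepath key can be modeled without MPI. The sketch below is a plain-C simulation and is not part of the ticket: `assign_colors` and `UNKNOWN_INSTANCE` are hypothetical names, and each rank is assumed to already know an integer ID for the filesystem instance behind the path (how an implementation would obtain such an ID is left open here). Ranks with equal IDs get the same color, which is the kind of value one would hand to MPI_Comm_split; a rank whose ID is unknown gets -1, standing in for MPI_COMM_NULL.

```c
#include <stddef.h>

/* Hypothetical sentinel: the rank could not determine which filesystem
   instance backs the path (the ticket maps this case to MPI_COMM_NULL). */
#define UNKNOWN_INSTANCE (-1L)

/* Assign one color per distinct instance ID, numbered in order of first
   appearance. color[r] is -1 when instance_id[r] is UNKNOWN_INSTANCE.
   Returns the number of distinct groups. O(n^2), which is fine for a sketch. */
int assign_colors(const long *instance_id, size_t n, int *color)
{
    int next_color = 0;
    for (size_t r = 0; r < n; r++) {
        color[r] = -1;
        if (instance_id[r] == UNKNOWN_INSTANCE)
            continue;
        /* Reuse the color of an earlier rank that sees the same instance. */
        for (size_t p = 0; p < r; p++) {
            if (instance_id[p] == instance_id[r]) {
                color[r] = color[p];
                break;
            }
        }
        /* No earlier rank matched: open a new group. */
        if (color[r] == -1)
            color[r] = next_color++;
    }
    return next_color;
}
```

With IDs {7, 7, 9, unknown, 9}, ranks 0 and 1 share color 0, ranks 2 and 4 share color 1, and rank 3 gets -1. A globally shared filesystem corresponds to every rank reporting the same ID, and a process-local one to every rank reporting a different ID.
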
>> >
>> >   To be perfectly honest, the choice to use {{{key = MPI_COMM_TYPE_SHARED}}}
>> >   only for shared memory is unfortunate, because we could have
>> >   instead used this key for anything that could be shared and let the
>> >   differences be enumerated by {{{MPI_Info}}}.  For example, shared memory
>> >   windows could use {{{(key,value)=("memory","shared")}}} (granted, this is
>> >   a somewhat silly set of options, but it would naturally permit one to use
>> >   e.g. {{{(key,value)=("filepath","/home")}}} and
>> >   {{{(key,value)=("device","gpu0")}}}).  We could implement the new set of
>> >   options using the existing key if we standardize {{{MPI_INFO_NULL}}} to
>> >   mean "shared memory" but that isn't particularly appealing.
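
One way to picture the info-driven alternative is as a dispatch on a single (key,value) pair. The sketch below is plain C with no MPI dependency. The enum, the function name `classify_request`, and the use of a NULL key as a stand-in for MPI_INFO_NULL are all assumptions made for illustration; the key strings are the ones used in the paragraph above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical categories for what the caller wants to share. */
enum share_kind { SHARE_MEMORY, SHARE_FILEPATH, SHARE_DEVICE, SHARE_UNSUPPORTED };

/* Classify one (key,value) pair. key == NULL models MPI_INFO_NULL, which
   keeps the existing meaning of "shared memory" for backwards compatibility. */
enum share_kind classify_request(const char *key, const char *value)
{
    if (key == NULL)
        return SHARE_MEMORY;
    if (value == NULL)
        return SHARE_UNSUPPORTED;
    if (strcmp(key, "memory") == 0)
        return strcmp(value, "shared") == 0 ? SHARE_MEMORY : SHARE_UNSUPPORTED;
    if (strcmp(key, "filepath") == 0)
        return SHARE_FILEPATH;
    if (strcmp(key, "device") == 0)
        return SHARE_DEVICE;
    /* Unknown keys are rejected rather than silently ignored. */
    return SHARE_UNSUPPORTED;
}
```

Under this model, `classify_request(NULL, NULL)` and `classify_request("memory", "shared")` land in the same category, which is the compatibility compromise the paragraph describes.
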
>> >
>> >   In addition to {{{MPI_COMM_TYPE_FILEPATH}}}, which is illustrated below, I
>> >   think that {{{MPI_COMM_TYPE_DEVICE}}} is useful to standardize as a key
>> >   without specifying the possible options that the {{{MPI_Info}}} can take,
>> >   with the exception of noting that, if this key is used with
>> >   {{{MPI_INFO_NULL}}}, the result is always {{{MPI_COMM_NULL}}}.  Devices
>> >   that implementations could define include coprocessors (aka accelerators
>> >   or GPUs), NICs (e.g. in the case of multi-rail IB systems) and any number
>> >   of other machine-specific cases.  The purpose of standardizing the key is
>> >   to encourage implementations to take the most portable route when trying
>> >   to implement this type of feature, thus confining all of the non-portable
>> >   aspects to the definition of the {{{(key,value)}}} pair in the info
>> >   object.
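
The one rule the proposal does fix for the device key, that an empty info argument always yields MPI_COMM_NULL, can be sketched as a membership check. This is plain C, not MPI; `device_participates` and the idea that each process holds a list of device names are hypothetical. Treating a process that cannot see the requested device as excluded is likewise an assumption, since the ticket leaves that to the implementation.

```c
#include <stddef.h>
#include <string.h>

/* Return 1 if this process would take part in the device communicator,
   0 if it would receive the MPI_COMM_NULL analog.
   requested == NULL models MPI_INFO_NULL, which the proposal says must
   always produce MPI_COMM_NULL. */
int device_participates(const char *requested,
                        const char *const *local_devices, size_t n_local)
{
    if (requested == NULL)
        return 0;
    for (size_t i = 0; i < n_local; i++) {
        if (strcmp(local_devices[i], requested) == 0)
            return 1;
    }
    /* Assumption: a process that cannot see the device is left out. */
    return 0;
}
```

How the participating processes are then grouped (per node, per NIC, and so on) is exactly the non-portable part the proposal confines to the info value.
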
>> >
>> >   I have also considered {{{MPI_COMM_TYPE_LOCALE}}}, which would also allow
>> >   the implementation to define info keys to specify, e.g., subcomms sharing
>> >   an IO node on Cray or Blue Gene, but this could just as easily be
>> >   implemented using the {{{MPI_COMM_TYPE_DEVICE}}} key.  Furthermore, since
>> >   devices can be specified as a filepath (they often have an associated
>> >   {{{/dev/something}}} entry on Linux systems), there is no compelling
>> >   reason to add more than one key.  This is, of course, the reason why
>> >   overloading {{{MPI_COMM_TYPE_SHARED}}} via info objects seems like the most
>> >   straightforward choice except with regard to backwards compatibility.
>> >
>> >   What do people think?  Can we augment {{{MPI_COMM_TYPE_SHARED}}} with info
>> >   keys and leave the case of {{{MPI_INFO_NULL}}} to mean shared memory, do
>> >   we break backwards compatibility, or do we add a new key that has the
>> >   desired catch-all properties?
>> >
>> >   Here is an example program illustrating the use of the proposed
>> >   functionality for {{{MPI_COMM_TYPE_FILEPATH}}}:
>> >   {{{
>> >
>> >   #include <stdio.h>
>> >   #include <stdlib.h>
>> >   #include <string.h>
>> >   #include <mpi.h>
>> >
>> >   int main( int argc, char *argv[] )
>> >   {
>> >       int rank;
>> >       MPI_Info i1, i2, i3, i4, i5;
>> >       MPI_Comm c0, c1, c2, c3, c4, c5;
>> >       int result, happy=0;
>> >
>> >       MPI_Init(&argc,&argv);
>> >       MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>> >
>> >       MPI_Info_create( &i1 );
>> >       MPI_Info_create( &i2 );
>> >       MPI_Info_create( &i3 );
>> >       MPI_Info_create( &i4 );
>> >       MPI_Info_create( &i5 );
>> >
>> >       MPI_Info_set( i1, (char*)"path", (char*)"/global-shared-fs" );
>> >       MPI_Info_set( i2, (char*)"path", (char*)"/proc-local-fs" );
>> >       MPI_Info_set( i3, (char*)"path", (char*)"/node-local-fs" );
>> >       MPI_Info_set( i4, (char*)"path", (char*)"/proc-zero-fs" );
>> >       MPI_Info_set( i5, (char*)"path", (char*)"/dev/rand" );
>> >
>> >       MPI_Comm_split_type( MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
>> >                            MPI_INFO_NULL, &c0 );
>> >
>> >       MPI_Comm_split_type( MPI_COMM_WORLD, MPI_COMM_TYPE_FILEPATH, 0, i1, &c1 );
>> >       MPI_Comm_split_type( MPI_COMM_WORLD, MPI_COMM_TYPE_FILEPATH, 0, i2, &c2 );
>> >       MPI_Comm_split_type( MPI_COMM_WORLD, MPI_COMM_TYPE_FILEPATH, 0, i3, &c3 );
>> >       MPI_Comm_split_type( MPI_COMM_WORLD, MPI_COMM_TYPE_FILEPATH, 0, i4, &c4 );
>> >       MPI_Comm_split_type( MPI_COMM_WORLD, MPI_COMM_TYPE_FILEPATH, 0, i5, &c5 );
>> >
>> >       /* a globally visible shared filesystem should result in a comm that
>> >          is equivalent to world */
>> >       MPI_Comm_compare( MPI_COMM_WORLD, c1, &result );
>> >       if ( result == MPI_CONGRUENT) happy++;
>> >
>> >       /* a process-local filesystem should result in MPI_COMM_SELF */
>> >       MPI_Comm_compare( MPI_COMM_SELF, c2, &result );
>> >       if ( result == MPI_CONGRUENT) happy++;
>> >
>> >       /* a filesystem shared within the node is likely to result in a
>> >          communicator equivalent to the one that supports shared memory,
>> >          provided shared memory is available */
>> >       MPI_Comm_compare( c0, c3, &result );
>> >       if ( result == MPI_CONGRUENT) happy++;
>> >
>> >       /* the /proc-zero-fs is only visible from rank 0 of world... */
>> >       if (rank==0) {
>> >          MPI_Comm_compare( MPI_COMM_SELF, c4, &result );
>> >          if ( result == MPI_CONGRUENT) happy++;
>> >       } else {
>> >          if ( c4 == MPI_COMM_NULL) happy++;
>> >       }
>> >
>> >       /* the sharable nature of /dev/rand is probably a meaningless concept
>> >          so we expect the implementation to return MPI_COMM_NULL for c5 */
>> >       if ( c5 == MPI_COMM_NULL) happy++;
>> >
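>> >       /* MPI_Comm_free is erroneous on MPI_COMM_NULL, so guard each call;
>> >          c0 must be freed as well */
>> >       if ( c0 != MPI_COMM_NULL ) MPI_Comm_free( &c0 );
>> >       if ( c1 != MPI_COMM_NULL ) MPI_Comm_free( &c1 );
>> >       if ( c2 != MPI_COMM_NULL ) MPI_Comm_free( &c2 );
>> >       if ( c3 != MPI_COMM_NULL ) MPI_Comm_free( &c3 );
>> >       if ( c4 != MPI_COMM_NULL ) MPI_Comm_free( &c4 );
>> >       if ( c5 != MPI_COMM_NULL ) MPI_Comm_free( &c5 );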
>> >
>> >       MPI_Info_free( &i1 );
>> >       MPI_Info_free( &i2 );
>> >       MPI_Info_free( &i3 );
>> >       MPI_Info_free( &i4 );
>> >       MPI_Info_free( &i5 );
>> >
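>> >       /* report how many of the five expectations held on this rank */
>> >       printf( "rank %d: %d of 5 checks passed\n", rank, happy );
>> >
>> >       MPI_Finalize( );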
>> >       return 0;
>> >   }
>> >   }}}
>> >
>> > --
>> > Ticket URL: <https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/372>
>> > MPI Forum <https://svn.mpi-forum.org/> MPI Forum
>> >
>> >
>> _______________________________________________
>> mpi-forum mailing list
>> mpi-forum at lists.mpi-forum.org
>> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi-forum



-- 
Jeff Hammond
Argonne Leadership Computing Facility
University of Chicago Computation Institute
jhammond at alcf.anl.gov / (630) 252-5381
http://www.linkedin.com/in/jeffhammond
https://wiki.alcf.anl.gov/parts/index.php/User:Jhammond


