[mpi3-coll] [Mpi-forum] additional keys for MPI_COMM_SPLIT_TYPE

Torsten Hoefler htor at inf.ethz.ch
Sun Mar 24 15:15:27 CDT 2013


Martin,

Sure, as Jeff suggests, we shouldn't limit the discussion but we can 
discuss it in the context of the colls&topo WG.

All the Best,
	Torsten

On 03/24/2013 09:00 PM, Schulz, Martin wrote:
> Hi Jeff, all,
>
> On Mar 24, 2013, at 12:36 PM, Jeff Hammond<jhammond at alcf.anl.gov>  wrote:
>
>> Martin encouraged me to socialize this with the Forum.  The idea here
>> seems broader than just one working group so I'm sending to the entire
>> Forum for feedback.
>
> I still think it would be good to have one WG take responsibility for this discussion. It seems to fit best into the collectives WG (and/or the communicator/groups chapter). Torsten: as the lead for the collectives group, does this sound reasonable to you?
>
> Thanks,
>
> Martin
>
>
>>
>> See https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/372.  The text
>> is included below in case it makes it easier to read on your
>> phone...because I know this is that urgent :-)
>>
>> Jeff
>>
>>
>> ---------- Forwarded message ----------
>> From: MPI Forum<mpi-forum at lists.mpi-forum.org>
>> Date: Sun, Mar 24, 2013 at 1:50 PM
>> Subject: [MPI Forum] #372: additional keys for MPI_COMM_SPLIT_TYPE
>> To:
>>
>>
>> #372: additional keys for MPI_COMM_SPLIT_TYPE
>> -------------------------------------
>>   Reporter:  jhammond
>>   Owner:     (none)
>>   Status:    new
>>   Type:      Enhancements to standard
>>   Milestone: 2013/03/11 Chicago, USA
>>   Priority:  Forum feedback requested
>>   Keywords:  (none)
>>   Version:   MPI <next>
>>   Implementation status: Waiting
>>   Author votes (all 0): Bill Gropp, Adam Moody, Dick Treumann,
>>     George Bosilca, Bronis de Supinski, Rich Graham, Jeff Squyres,
>>     Torsten Hoefler, Rolf Rabenseifner, Jesper Larsson Traeff,
>>     David Solt, Rajeev Thakur, Alexander Supalov
>> -------------------------------------
>> {{{MPI_COMM_SPLIT_TYPE}}} currently supports only one key,
>> {{{MPI_COMM_TYPE_SHARED}}}, which refers to shared memory domains, as
>> supported by shared memory windows ({{{MPI_WIN_ALLOCATE_SHARED}}}).
>>
>> This ticket proposes additional keys that appear useful enough to justify
>> standardization.
>>
>> The first key addresses the need for users to have a portable way of
>> querying properties of the filesystem.  This key requires the user to
>> specify the file path of interest using an MPI_Info object.  The
>> communicator returned represents the set of processes that can write to a
>> single instance of that file path.  For a local disk, it is likely (but
>> not necessary) that this communicator be the same as returned by
>> {{{MPI_COMM_TYPE_SHARED}}}.  On the other hand, globally shared
>> filesystems will return a duplicate of {{{MPI_COMM_WORLD}}}.  When the
>> implementation cannot determine the answer, the resulting communicator is
>> {{{MPI_COMM_NULL}}} and users cannot assume any information about the
>> path.
>>
>> To be perfectly honest, the choice of {{{key = MPI_COMM_TYPE_SHARED}}} to
>> be used only for shared memory is unfortunate, because we could have
>> instead used this key for anything that can be shared and let the
>> differences be enumerated by {{{MPI_Info}}}.  For example, shared-memory
>> windows could use {{{(key,value)=("memory","shared")}}} (granted, this is
>> a somewhat silly set of options, but it would naturally permit one to use
>> e.g. {{{(key,value)=("filepath","/home")}}} and
>> {{{(key,value)=("device","gpu0")}}}).  We could implement the new set of
>> options using the existing key if we standardized {{{MPI_INFO_NULL}}} to
>> mean "shared memory", but that isn't particularly appealing.
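>> A sketch of how this overloading could look (the info keys used here,
>> "filepath" and the implied "memory", are illustrative only and not
>> standardized; the semantics are the proposal, not current MPI behavior):

```c
/* Hypothetical sketch: reusing MPI_COMM_TYPE_SHARED for any shared
 * resource, selected via MPI_Info.  The "filepath" info key is
 * illustrative, not part of any standard. */
#include <mpi.h>

int main( int argc, char *argv[] )
{
    MPI_Info info;
    MPI_Comm shm_comm, fs_comm;

    MPI_Init( &argc, &argv );

    /* MPI_INFO_NULL would retain today's meaning: shared memory */
    MPI_Comm_split_type( MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                         MPI_INFO_NULL, &shm_comm );

    /* the same key with ("filepath","/home") would instead group the
       processes that see a single instance of /home */
    MPI_Info_create( &info );
    MPI_Info_set( info, "filepath", "/home" );
    MPI_Comm_split_type( MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                         info, &fs_comm );

    MPI_Info_free( &info );
    if ( fs_comm != MPI_COMM_NULL ) MPI_Comm_free( &fs_comm );
    MPI_Comm_free( &shm_comm );
    MPI_Finalize( );
    return 0;
}
```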
>>
>> In addition to {{{MPI_COMM_TYPE_FILEPATH}}}, which is illustrated below, I
>> think that {{{MPI_COMM_TYPE_DEVICE}}} is useful to standardize as a key
>> without specifying the possible options that the {{{MPI_Info}}} can take,
>> with the exception of noting that, if this key is used with
>> {{{MPI_INFO_NULL}}}, the result is always {{{MPI_COMM_NULL}}}.  Devices
>> that implementations could define include coprocessors (aka accelerators
>> or GPUs), NICs (e.g. in the case of multi-rail IB systems) and any number
>> of other machine-specific cases.  The purpose of standardizing the key is
>> to encourage implementations to take the most portable route when trying
>> to implement this type of feature, thus confining all of the non-portable
>> aspects to the definition of the {{{(key,value)}}} pair in the info
>> object.
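>> For instance, splitting by attached accelerator might look like the
>> following ({{{MPI_COMM_TYPE_DEVICE}}} and the "device" info key are the
>> proposal under discussion, not existing MPI, so this is a sketch only):

```c
/* Hypothetical sketch of the proposed MPI_COMM_TYPE_DEVICE key; the
 * constant and the "device" info key do not exist in MPI today. */
#include <mpi.h>

int main( int argc, char *argv[] )
{
    MPI_Info info;
    MPI_Comm dev_comm;

    MPI_Init( &argc, &argv );

    /* group the processes that share the (hypothetical) device gpu0 */
    MPI_Info_create( &info );
    MPI_Info_set( info, "device", "gpu0" );
    MPI_Comm_split_type( MPI_COMM_WORLD, MPI_COMM_TYPE_DEVICE, 0,
                         info, &dev_comm );

    /* an implementation that cannot interpret the value returns
       MPI_COMM_NULL, and the user learns nothing about the device */
    if ( dev_comm != MPI_COMM_NULL )
        MPI_Comm_free( &dev_comm );

    MPI_Info_free( &info );
    MPI_Finalize( );
    return 0;
}
```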
>>
>> I have also considered {{{MPI_COMM_TYPE_LOCALE}}}, which would also allow
>> the implementation to define info keys to specify, e.g., subcommunicators
>> sharing an I/O node on Cray or Blue Gene, but this could just as easily be
>> implemented using the {{{MPI_COMM_TYPE_DEVICE}}} key.  Furthermore, since
>> devices can be specified as a filepath (they are often associated with a
>> {{{/dev/something}}} on Linux systems), there is no compelling reason to
>> add more than one key.  This is, of course, the reason why overloading
>> {{{MPI_COMM_TYPE_SHARED}}} via info objects seems like the most
>> straightforward choice, except with regard to backwards compatibility.
>>
>> What do people think?  Can we augment {{{MPI_COMM_TYPE_SHARED}}} with info
>> keys and leave the case of {{{MPI_INFO_NULL}}} to mean shared memory, do
>> we break backwards compatibility or do we add a new key that has the
>> desired catch-all properties?
>>
>> Here is an example program illustrating the use of the proposed
>> functionality for {{{MPI_COMM_TYPE_FILEPATH}}}:
>> {{{
>>
>> #include <stdio.h>
>> #include <mpi.h>
>>
>> int main( int argc, char *argv[] )
>> {
>>      int rank;
>>      MPI_Info i1, i2, i3, i4, i5;
>>      MPI_Comm c0, c1, c2, c3, c4, c5;
>>      int result, happy = 0;
>>
>>      MPI_Init( &argc, &argv );
>>      MPI_Comm_rank( MPI_COMM_WORLD, &rank );
>>
>>      MPI_Info_create( &i1 );
>>      MPI_Info_create( &i2 );
>>      MPI_Info_create( &i3 );
>>      MPI_Info_create( &i4 );
>>      MPI_Info_create( &i5 );
>>
>>      MPI_Info_set( i1, "path", "/global-shared-fs" );
>>      MPI_Info_set( i2, "path", "/proc-local-fs" );
>>      MPI_Info_set( i3, "path", "/node-local-fs" );
>>      MPI_Info_set( i4, "path", "/proc-zero-fs" );
>>      MPI_Info_set( i5, "path", "/dev/rand" );
>>
>>      MPI_Comm_split_type( MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
>>                           MPI_INFO_NULL, &c0 );
>>
>>      MPI_Comm_split_type( MPI_COMM_WORLD, MPI_COMM_TYPE_FILEPATH, 0, i1, &c1 );
>>      MPI_Comm_split_type( MPI_COMM_WORLD, MPI_COMM_TYPE_FILEPATH, 0, i2, &c2 );
>>      MPI_Comm_split_type( MPI_COMM_WORLD, MPI_COMM_TYPE_FILEPATH, 0, i3, &c3 );
>>      MPI_Comm_split_type( MPI_COMM_WORLD, MPI_COMM_TYPE_FILEPATH, 0, i4, &c4 );
>>      MPI_Comm_split_type( MPI_COMM_WORLD, MPI_COMM_TYPE_FILEPATH, 0, i5, &c5 );
>>
>>      /* a globally visible shared filesystem should result in a comm
>>         that is equivalent to world */
>>      MPI_Comm_compare( MPI_COMM_WORLD, c1, &result );
>>      if ( result == MPI_CONGRUENT ) happy++;
>>
>>      /* a process-local filesystem should result in MPI_COMM_SELF */
>>      MPI_Comm_compare( MPI_COMM_SELF, c2, &result );
>>      if ( result == MPI_CONGRUENT ) happy++;
>>
>>      /* a filesystem shared within the node is likely to result in a
>>         communicator equivalent to the one that supports shared memory,
>>         provided shared memory is available */
>>      MPI_Comm_compare( c0, c3, &result );
>>      if ( result == MPI_CONGRUENT ) happy++;
>>
>>      /* the /proc-zero-fs is only visible from rank 0 of world... */
>>      if ( rank == 0 ) {
>>          MPI_Comm_compare( MPI_COMM_SELF, c4, &result );
>>          if ( result == MPI_CONGRUENT ) happy++;
>>      } else {
>>          if ( c4 == MPI_COMM_NULL ) happy++;
>>      }
>>
>>      /* the sharable nature of /dev/rand is probably a meaningless
>>         concept, so we expect the implementation to return
>>         MPI_COMM_NULL for c5 */
>>      if ( c5 == MPI_COMM_NULL ) happy++;
>>
>>      printf( "rank %d: happy = %d of 5 checks\n", rank, happy );
>>
>>      /* freeing MPI_COMM_NULL is erroneous, so guard the comms that
>>         may legitimately be null */
>>      MPI_Comm_free( &c0 );
>>      MPI_Comm_free( &c1 );
>>      MPI_Comm_free( &c2 );
>>      MPI_Comm_free( &c3 );
>>      if ( c4 != MPI_COMM_NULL ) MPI_Comm_free( &c4 );
>>      if ( c5 != MPI_COMM_NULL ) MPI_Comm_free( &c5 );
>>
>>      MPI_Info_free( &i1 );
>>      MPI_Info_free( &i2 );
>>      MPI_Info_free( &i3 );
>>      MPI_Info_free( &i4 );
>>      MPI_Info_free( &i5 );
>>
>>      MPI_Finalize( );
>>      return 0;
>> }
>> }}}
>>
>> --
>> Ticket URL:<https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/372>
>> MPI Forum<https://svn.mpi-forum.org/>
>>
>>
>> --
>> Jeff Hammond
>> Argonne Leadership Computing Facility
>> University of Chicago Computation Institute
>> jhammond at alcf.anl.gov / (630) 252-5381
>> http://www.linkedin.com/in/jeffhammond
>> https://wiki.alcf.anl.gov/parts/index.php/User:Jhammond
>> _______________________________________________
>> mpi-forum mailing list
>> mpi-forum at lists.mpi-forum.org
>> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi-forum
>
> ________________________________________________________________________
> Martin Schulz, schulzm at llnl.gov, http://people.llnl.gov/schulzm
> CASC @ Lawrence Livermore National Laboratory, Livermore, USA
>
>
>
>


