[mpi3-coll] [Mpi-forum] additional keys for MPI_COMM_SPLIT_TYPE

Jeff Hammond jhammond at alcf.anl.gov
Mon Mar 25 10:08:03 CDT 2013


(moving to MPI-3 Collectives list, per Martin and Torsten...)

Yes, I have added a reference in the ticket text.

One possible reason to keep these separate is that the notion of
neighborhood/locale is perhaps harder to specify than file paths and
thus might be harder to get into the standard.  However, the working
group can sort things out in one ticket and bring one or more
proposals to the Forum if there is contention over these issues.

Best,

Jeff

On Mon, Mar 25, 2013 at 9:16 AM, Jim Dinan <dinan at mcs.anl.gov> wrote:
> Hi Jeff,
>
> Please also include ticket #297 in the discussion (merge with your ticket,
> or otherwise):
>
> https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/297
>
> This ticket proposes MPI_COMM_TYPE_NEIGHBORHOOD, which looks similar to
> MPI_COMM_TYPE_LOCALE.
>
>  ~Jim.
>
>
> On 3/24/13 2:36 PM, Jeff Hammond wrote:
>>
>> Martin encouraged me to socialize this with the Forum.  The idea here
>> seems broader than just one working group so I'm sending to the entire
>> Forum for feedback.
>>
>> See https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/372.  The text
>> is included below in case it makes it easier to read on your
>> phone...because I know this is that urgent :-)
>>
>> Jeff
>>
>>
>> ---------- Forwarded message ----------
>> From: MPI Forum <mpi-forum at lists.mpi-forum.org>
>> Date: Sun, Mar 24, 2013 at 1:50 PM
>> Subject: [MPI Forum] #372: additional keys for MPI_COMM_SPLIT_TYPE
>> To:
>>
>>
>> #372: additional keys for MPI_COMM_SPLIT_TYPE
>>
>> -------------------------------------+-------------------------------------
>>                Reporter:  jhammond
>>                   Owner:
>>                  Status:  new
>>                    Type:  Enhancements to standard
>>               Milestone:  2013/03/11 Chicago, USA
>>                Priority:  Forum feedback requested
>>                Keywords:
>>                 Version:  MPI <next>
>>   Implementation status:  Waiting
>>
>>   Author fields (each currently 0): Bill Gropp, Adam Moody,
>>   Dick Treumann, George Bosilca, Bronis de Supinski, Rich Graham,
>>   Jeff Squyres, Torsten Hoefler, Rolf Rabenseifner,
>>   Jesper Larsson Traeff, David Solt, Rajeev Thakur, Alexander Supalov
>>
>> -------------------------------------+-------------------------------------
>>   {{{MPI_COMM_SPLIT_TYPE}}} currently supports only one key,
>>   {{{MPI_COMM_TYPE_SHARED}}}, which refers to shared memory domains, as
>>   supported by shared memory windows ({{{MPI_WIN_ALLOCATE_SHARED}}}).
>>
>>   This ticket proposes additional keys that appear useful enough to
>>   justify standardization.
>>
>>   The first key addresses the need for users to have a portable way of
>>   querying properties of the filesystem.  This key requires the user to
>>   specify the file path of interest using an MPI_Info object.  The
>>   communicator returned represents the set of processes that can write
>>   to a single instance of that file path.  For a local disk, it is
>>   likely (but not required) that this communicator is the same as the
>>   one returned for {{{MPI_COMM_TYPE_SHARED}}}.  On the other hand, a
>>   globally shared filesystem will return a duplicate of
>>   {{{MPI_COMM_WORLD}}}.  When the implementation cannot determine the
>>   answer, the resulting communicator is {{{MPI_COMM_NULL}}} and users
>>   cannot assume anything about the path.
>>
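
The grouping rule described above can be modeled in plain C without MPI. This is only a rough sketch under stated assumptions, not part of the ticket: `instance_id`, `INSTANCE_UNKNOWN` and `filepath_group_size` are hypothetical names, and how an implementation would actually obtain a per-rank identifier for "the instance of the path this rank can write to" is left open by the proposal.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical marker for "the implementation cannot determine the answer";
   in the proposal this case yields MPI_COMM_NULL. */
#define INSTANCE_UNKNOWN (-1)

/* Returns the size of the communicator that rank `me` would receive under
   the proposed semantics, or 0 to model MPI_COMM_NULL.
   instance_id[r] is an assumed identifier of the file-path instance that
   rank r can write to; ranks with equal identifiers share one instance. */
size_t filepath_group_size(const int *instance_id, size_t nranks, size_t me)
{
    assert(instance_id != NULL);

    if (me >= nranks || instance_id[me] == INSTANCE_UNKNOWN)
        return 0;

    size_t count = 0;
    for (size_t r = 0; r < nranks; r++) {
        if (instance_id[r] == instance_id[me])
            count++;
    }
    return count;
}
```

With four ranks, identical identifiers on every rank model the globally shared filesystem (group size 4, i.e. a duplicate of the parent communicator), pairwise distinct identifiers model a process-local filesystem (group size 1), and `INSTANCE_UNKNOWN` models the case where nothing can be assumed about the path.
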
>>   To be perfectly honest, the choice of {{{key = MPI_COMM_TYPE_SHARED}}}
>>   to be used only for shared memory is unfortunate, because we could
>>   have instead used this key for anything that could be shared and let
>>   the differences be enumerated by {{{MPI_Info}}}.  For example, shared
>>   memory windows could use {{{(key,value)=("memory","shared")}}}
>>   (granted, this is a somewhat silly set of options, but it would
>>   naturally permit one to use e.g.
>>   {{{(key,value)=("filepath","/home")}}} and
>>   {{{(key,value)=("device","gpu0")}}}).  We could implement the new set
>>   of options using the existing key if we standardize
>>   {{{MPI_INFO_NULL}}} to mean "shared memory", but that isn't
>>   particularly appealing.
>>
>>   In addition to {{{MPI_COMM_TYPE_FILEPATH}}}, which is illustrated
>>   below, I think that {{{MPI_COMM_TYPE_DEVICE}}} is useful to
>>   standardize as a key without specifying the possible options that the
>>   {{{MPI_Info}}} can take, with the exception of noting that, if this
>>   key is used with {{{MPI_INFO_NULL}}}, the result is always
>>   {{{MPI_COMM_NULL}}}.  Devices that implementations could define
>>   include coprocessors (aka accelerators or GPUs), NICs (e.g. in the
>>   case of multi-rail IB systems) and any number of other
>>   machine-specific cases.  The purpose of standardizing the key is to
>>   encourage implementations to take the most portable route when trying
>>   to implement this type of feature, thus confining all of the
>>   non-portable aspects to the definition of the {{{(key,value)}}} pair
>>   in the info object.
>>
>>   I have also considered {{{MPI_COMM_TYPE_LOCALE}}}, which would allow
>>   the implementation to define info keys to specify, e.g., subcomms
>>   sharing an IO node on Cray or Blue Gene, but this could just as
>>   easily be implemented using the {{{MPI_COMM_TYPE_DEVICE}}} key.
>>   Furthermore, since devices can be specified as a filepath (they are
>>   often associated with a {{{/dev/something}}} on Linux systems), there
>>   is no compelling reason to add more than one key.  This is, of
>>   course, the reason why overloading {{{MPI_COMM_TYPE_SHARED}}} via
>>   info objects seems like the most straightforward choice except in
>>   regard to backwards compatibility.
>>
>>   What do people think?  Can we augment {{{MPI_COMM_TYPE_SHARED}}} with
>>   info keys and leave the case of {{{MPI_INFO_NULL}}} to mean shared
>>   memory, do we break backwards compatibility, or do we add a new key
>>   that has the desired catch-all properties?
>>
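
To make the first option concrete, the dispatch that an overloaded {{{MPI_COMM_TYPE_SHARED}}} would imply can be sketched in plain C. This is a hedged illustration only: the info keys "memory", "filepath" and "device" come from the ticket text, while `split_kind`, `classify_shared_split` and the choice to treat unrecognized keys as the MPI_COMM_NULL case are assumptions made for the sketch. A NULL `key` stands in for {{{MPI_INFO_NULL}}}.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative result categories; none of these names exist in MPI. */
typedef enum {
    SPLIT_SHARED_MEMORY,  /* the existing MPI-3 behaviour */
    SPLIT_FILEPATH,
    SPLIT_DEVICE,
    SPLIT_NULL            /* models returning MPI_COMM_NULL */
} split_kind;

/* Classifies a single (key,value) pair taken from an info object.
   key == NULL models MPI_INFO_NULL, which keeps today's meaning of
   "shared memory" so that existing callers are unaffected. */
split_kind classify_shared_split(const char *key, const char *value)
{
    if (key == NULL)
        return SPLIT_SHARED_MEMORY;
    if (value == NULL)
        return SPLIT_NULL;
    if (strcmp(key, "memory") == 0)
        return strcmp(value, "shared") == 0 ? SPLIT_SHARED_MEMORY
                                            : SPLIT_NULL;
    if (strcmp(key, "filepath") == 0)
        return SPLIT_FILEPATH;
    if (strcmp(key, "device") == 0)
        return SPLIT_DEVICE;
    return SPLIT_NULL;  /* assumption: unrecognized keys give no answer */
}
```

The backwards-compatibility question reduces to the first branch: code written against MPI-3 passes no info and still gets the shared-memory split, while every new behaviour is opt-in through an info key.
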
>>   Here is an example program illustrating the use of the proposed
>>   functionality for {{{MPI_COMM_TYPE_FILEPATH}}}:
>>   {{{
>>
>>   #include <stdio.h>
>>   #include <stdlib.h>
>>   #include <string.h>
>>   #include <mpi.h>
>>
>>   int main( int argc, char *argv[] )
>>   {
>>       int rank;
>>       MPI_Info i1, i2, i3, i4, i5;
>>       MPI_Comm c0, c1, c2, c3, c4, c5;
>>       int result, happy=0;
>>
>>       MPI_Init(&argc,&argv);
>>       MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>
>>       MPI_Info_create( &i1 );
>>       MPI_Info_create( &i2 );
>>       MPI_Info_create( &i3 );
>>       MPI_Info_create( &i4 );
>>       MPI_Info_create( &i5 );
>>
>>       MPI_Info_set( i1, (char*)"path", (char*)"/global-shared-fs" );
>>       MPI_Info_set( i2, (char*)"path", (char*)"/proc-local-fs" );
>>       MPI_Info_set( i3, (char*)"path", (char*)"/node-local-fs" );
>>       MPI_Info_set( i4, (char*)"path", (char*)"/proc-zero-fs" );
>>       MPI_Info_set( i5, (char*)"path", (char*)"/dev/rand" );
>>
>>       MPI_Comm_split_type( MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
>>                            MPI_INFO_NULL, &c0 );
>>
>>       MPI_Comm_split_type( MPI_COMM_WORLD, MPI_COMM_TYPE_FILEPATH, 0,
>>                            i1, &c1 );
>>       MPI_Comm_split_type( MPI_COMM_WORLD, MPI_COMM_TYPE_FILEPATH, 0,
>>                            i2, &c2 );
>>       MPI_Comm_split_type( MPI_COMM_WORLD, MPI_COMM_TYPE_FILEPATH, 0,
>>                            i3, &c3 );
>>       MPI_Comm_split_type( MPI_COMM_WORLD, MPI_COMM_TYPE_FILEPATH, 0,
>>                            i4, &c4 );
>>       MPI_Comm_split_type( MPI_COMM_WORLD, MPI_COMM_TYPE_FILEPATH, 0,
>>                            i5, &c5 );
>>
>>       /* a globally visible shared filesystem should result in a comm
>>          that is equivalent to world */
>>       MPI_Comm_compare( MPI_COMM_WORLD, c1, &result );
>>       if ( result == MPI_CONGRUENT) happy++;
>>
>>       /* a process-local filesystem should result in MPI_COMM_SELF */
>>       MPI_Comm_compare( MPI_COMM_SELF, c2, &result );
>>       if ( result == MPI_CONGRUENT) happy++;
>>
>>       /* a filesystem shared within the node is likely to result in a
>>          communicator equivalent to the one that supports shared memory,
>>          provided shared memory is available */
>>       MPI_Comm_compare( c0, c3, &result );
>>       if ( result == MPI_CONGRUENT) happy++;
>>
>>       /* the /proc-zero-fs is only visible from rank 0 of world... */
>>       if (rank==0) {
>>          MPI_Comm_compare( MPI_COMM_SELF, c4, &result );
>>          if ( result == MPI_CONGRUENT) happy++;
>>       } else {
>>          if ( c4 == MPI_COMM_NULL) happy++;
>>       }
>>
>>       /* the sharable nature of /dev/rand is probably a meaningless
>>          concept so we expect the implementation to return
>>          MPI_COMM_NULL for c5 */
>>       if ( c5 == MPI_COMM_NULL) happy++;
>>
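>>       /* MPI_Comm_free must not be called on MPI_COMM_NULL, so guard
>>          each call; c0 is freed here as well */
>>       if ( c0 != MPI_COMM_NULL ) MPI_Comm_free( &c0 );
>>       if ( c1 != MPI_COMM_NULL ) MPI_Comm_free( &c1 );
>>       if ( c2 != MPI_COMM_NULL ) MPI_Comm_free( &c2 );
>>       if ( c3 != MPI_COMM_NULL ) MPI_Comm_free( &c3 );
>>       if ( c4 != MPI_COMM_NULL ) MPI_Comm_free( &c4 );
>>       if ( c5 != MPI_COMM_NULL ) MPI_Comm_free( &c5 );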
>>
>>       MPI_Info_free( &i1 );
>>       MPI_Info_free( &i2 );
>>       MPI_Info_free( &i3 );
>>       MPI_Info_free( &i4 );
>>       MPI_Info_free( &i5 );
>>
>>       MPI_Finalize( );
>>       return 0;
>>   }
>>   }}}
>>
>> --
>> Ticket URL: <https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/372>
>> MPI Forum <https://svn.mpi-forum.org/>
>> MPI Forum
>>
>>
> _______________________________________________
> mpi-forum mailing list
> mpi-forum at lists.mpi-forum.org
> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi-forum



-- 
Jeff Hammond
Argonne Leadership Computing Facility
University of Chicago Computation Institute
jhammond at alcf.anl.gov / (630) 252-5381
http://www.linkedin.com/in/jeffhammond
https://wiki.alcf.anl.gov/parts/index.php/User:Jhammond


