[mpi3-coll] [Mpi-forum] additional keys for MPI_COMM_SPLIT_TYPE
Torsten Hoefler
htor at inf.ethz.ch
Sun Mar 24 15:15:27 CDT 2013
Martin,
Sure, as Jeff suggests, we shouldn't limit the discussion, but we can
discuss it in the context of the colls&topo WG.
All the Best,
Torsten
On 03/24/2013 09:00 PM, Schulz, Martin wrote:
> Hi Jeff, all,
>
> On Mar 24, 2013, at 12:36 PM, Jeff Hammond<jhammond at alcf.anl.gov> wrote:
>
>> Martin encouraged me to socialize this with the Forum. The idea here
>> seems broader than just one working group so I'm sending to the entire
>> Forum for feedback.
>
> I still think it would be good to have one WG take responsibility for this discussion. It seems to fit best into the collectives WG (and/or the communicator/groups chapter). Torsten: as the lead for the collectives group, does this sound reasonable to you?
>
> Thanks,
>
> Martin
>
>
>>
>> See https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/372. The text
>> is included below in case it makes it easier to read on your
>> phone...because I know this is that urgent :-)
>>
>> Jeff
>>
>>
>> ---------- Forwarded message ----------
>> From: MPI Forum<mpi-forum at lists.mpi-forum.org>
>> Date: Sun, Mar 24, 2013 at 1:50 PM
>> Subject: [MPI Forum] #372: additional keys for MPI_COMM_SPLIT_TYPE
>> To:
>>
>>
>> #372: additional keys for MPI_COMM_SPLIT_TYPE
>> -------------------------------------------------------------------------
>>   Reporter: jhammond                   Owner:
>>   Type: Enhancements to standard       Status: new
>>   Priority: Forum feedback requested   Milestone: 2013/03/11 Chicago, USA
>>   Version: MPI <next>                  Keywords:
>>   Implementation status: Waiting
>>   Author votes (all 0): Bill Gropp, Adam Moody, Dick Treumann,
>>     George Bosilca, Bronis de Supinski, Rich Graham, Jeff Squyres,
>>     Torsten Hoefler, Rolf Rabenseifner, Jesper Larsson Traeff,
>>     David Solt, Rajeev Thakur, Alexander Supalov
>> -------------------------------------------------------------------------
>> {{{MPI_COMM_SPLIT_TYPE}}} currently supports only one key,
>> {{{MPI_COMM_TYPE_SHARED}}}, which refers to shared-memory domains, as
>> supported by shared memory windows ({{{MPI_WIN_ALLOCATE_SHARED}}}).
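>>
>> For context, here is a minimal sketch of how the existing key is used
>> together with shared-memory windows. This uses only standard MPI-3
>> calls; error handling is omitted for brevity:

```c
#include <mpi.h>

int main(int argc, char *argv[])
{
    MPI_Comm shmcomm;
    MPI_Win win;
    double *base;

    MPI_Init(&argc, &argv);

    /* group the processes that share a memory domain */
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &shmcomm);

    /* allocate a window backed by shared memory on that communicator */
    MPI_Win_allocate_shared(1024 * sizeof(double), sizeof(double),
                            MPI_INFO_NULL, shmcomm, &base, &win);

    MPI_Win_free(&win);
    MPI_Comm_free(&shmcomm);
    MPI_Finalize();
    return 0;
}
```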
>>
>> This ticket proposes additional keys that appear useful enough to justify
>> standardization.
>>
>> The first key addresses the need for users to have a portable way of
>> querying properties of the filesystem. This key requires the user to
>> specify the file path of interest using an {{{MPI_Info}}} object. The
>> communicator returned represents the set of processes that can write to a
>> single instance of that file path. For a local disk, it is likely (but
>> not required) that this communicator be the same as the one returned for
>> {{{MPI_COMM_TYPE_SHARED}}}. On the other hand, globally shared
>> filesystems will return a duplicate of {{{MPI_COMM_WORLD}}}. When the
>> implementation cannot determine the answer, the resulting communicator is
>> {{{MPI_COMM_NULL}}} and users cannot assume any information about the
>> path.
>>
>> To be perfectly honest, the choice of {{{key = MPI_COMM_TYPE_SHARED}}}
>> being used only for shared memory is unfortunate, because we could have
>> instead used this key for anything that can be shared and let the
>> differences be enumerated by {{{MPI_Info}}}. For example, shared memory
>> windows could use {{{(key,value)=("memory","shared")}}} (granted, this is
>> a somewhat silly set of options, but it would naturally permit one to use
>> e.g. {{{(key,value)=("filepath","/home")}}} and
>> {{{(key,value)=("device","gpu0")}}}). We could implement the new set of
>> options using the existing key if we standardized {{{MPI_INFO_NULL}}} to
>> mean "shared memory", but that isn't particularly appealing.
>>
>> In addition to {{{MPI_COMM_TYPE_FILEPATH}}}, which is illustrated below, I
>> think that {{{MPI_COMM_TYPE_DEVICE}}} is useful to standardize as a key
>> without specifying the possible options that the {{{MPI_Info}}} can take,
>> except to note that, if this key is used with {{{MPI_INFO_NULL}}}, the
>> result is always {{{MPI_COMM_NULL}}}. Devices that implementations could
>> define include coprocessors (aka accelerators or GPUs), NICs (e.g. in the
>> case of multi-rail IB systems), and any number of other machine-specific
>> cases. The purpose of standardizing the key is to encourage
>> implementations to take the most portable route when trying to implement
>> this type of feature, thus confining all of the non-portable aspects to
>> the definition of the {{{(key,value)}}} pair in the info object.
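>>
>> As a sketch of the intended usage (hypothetical: {{{MPI_COMM_TYPE_DEVICE}}}
>> and the "device" info key are proposals in this ticket, not part of any
>> MPI standard):

```c
#include <mpi.h>

int main(int argc, char *argv[])
{
    MPI_Info info;
    MPI_Comm devcomm;

    MPI_Init(&argc, &argv);

    /* hypothetical: group the ranks that can drive the first GPU;
       MPI_COMM_TYPE_DEVICE and the "device" key are proposed in this
       ticket and not defined by any MPI standard */
    MPI_Info_create(&info);
    MPI_Info_set(info, (char*)"device", (char*)"gpu0");
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_DEVICE, 0,
                        info, &devcomm);

    /* an implementation that cannot interpret the key would return
       MPI_COMM_NULL */
    if (devcomm != MPI_COMM_NULL)
        MPI_Comm_free(&devcomm);

    MPI_Info_free(&info);
    MPI_Finalize();
    return 0;
}
```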
>>
>> I have also considered {{{MPI_COMM_TYPE_LOCALE}}}, which would likewise
>> allow the implementation to define info keys that specify, e.g.,
>> subcommunicators sharing an I/O node on Cray or Blue Gene systems, but
>> this could just as easily be implemented using the
>> {{{MPI_COMM_TYPE_DEVICE}}} key. Furthermore, since devices can be
>> specified as a filepath (on Linux systems they are often associated with
>> a {{{/dev/something}}} entry), there is no compelling reason to add more
>> than one key. This is, of course, the reason why overloading
>> {{{MPI_COMM_TYPE_SHARED}}} via info objects seems like the most
>> straightforward choice, except with regard to backwards compatibility.
>>
>> What do people think? Can we augment {{{MPI_COMM_TYPE_SHARED}}} with info
>> keys and let {{{MPI_INFO_NULL}}} mean shared memory, should we break
>> backwards compatibility, or should we add a new key that has the desired
>> catch-all properties?
>>
>> Here is an example program illustrating the use of the proposed
>> functionality for {{{MPI_COMM_TYPE_FILEPATH}}}:
>> {{{
>>
>> #include <stdio.h>
>> #include <stdlib.h>
>> #include <string.h>
>> #include <mpi.h>
>>
>> int main(int argc, char *argv[])
>> {
>>     int rank;
>>     MPI_Info i1, i2, i3, i4, i5;
>>     MPI_Comm c0, c1, c2, c3, c4, c5;
>>     int result, happy = 0;
>>
>>     MPI_Init(&argc, &argv);
>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>
>>     MPI_Info_create(&i1);
>>     MPI_Info_create(&i2);
>>     MPI_Info_create(&i3);
>>     MPI_Info_create(&i4);
>>     MPI_Info_create(&i5);
>>
>>     MPI_Info_set(i1, (char*)"path", (char*)"/global-shared-fs");
>>     MPI_Info_set(i2, (char*)"path", (char*)"/proc-local-fs");
>>     MPI_Info_set(i3, (char*)"path", (char*)"/node-local-fs");
>>     MPI_Info_set(i4, (char*)"path", (char*)"/proc-zero-fs");
>>     MPI_Info_set(i5, (char*)"path", (char*)"/dev/rand");
>>
>>     MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
>>                         MPI_INFO_NULL, &c0);
>>
>>     MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_FILEPATH, 0, i1, &c1);
>>     MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_FILEPATH, 0, i2, &c2);
>>     MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_FILEPATH, 0, i3, &c3);
>>     MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_FILEPATH, 0, i4, &c4);
>>     MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_FILEPATH, 0, i5, &c5);
>>
>>     /* a globally visible shared filesystem should result in a
>>        communicator equivalent to MPI_COMM_WORLD */
>>     MPI_Comm_compare(MPI_COMM_WORLD, c1, &result);
>>     if (result == MPI_CONGRUENT) happy++;
>>
>>     /* a process-local filesystem should result in MPI_COMM_SELF */
>>     MPI_Comm_compare(MPI_COMM_SELF, c2, &result);
>>     if (result == MPI_CONGRUENT) happy++;
>>
>>     /* a filesystem shared within the node is likely to result in a
>>        communicator equivalent to the one that supports shared memory,
>>        provided shared memory is available */
>>     MPI_Comm_compare(c0, c3, &result);
>>     if (result == MPI_CONGRUENT) happy++;
>>
>>     /* /proc-zero-fs is only visible from rank 0 of MPI_COMM_WORLD... */
>>     if (rank == 0) {
>>         MPI_Comm_compare(MPI_COMM_SELF, c4, &result);
>>         if (result == MPI_CONGRUENT) happy++;
>>     } else {
>>         if (c4 == MPI_COMM_NULL) happy++;
>>     }
>>
>>     /* the sharable nature of /dev/rand is probably a meaningless
>>        concept, so we expect the implementation to return MPI_COMM_NULL
>>        for c5 */
>>     if (c5 == MPI_COMM_NULL) happy++;
>>
>>     /* freeing MPI_COMM_NULL is erroneous, so guard the frees of the
>>        communicators that may legitimately be null */
>>     MPI_Comm_free(&c0);
>>     MPI_Comm_free(&c1);
>>     MPI_Comm_free(&c2);
>>     MPI_Comm_free(&c3);
>>     if (c4 != MPI_COMM_NULL) MPI_Comm_free(&c4);
>>     if (c5 != MPI_COMM_NULL) MPI_Comm_free(&c5);
>>
>>     MPI_Info_free(&i1);
>>     MPI_Info_free(&i2);
>>     MPI_Info_free(&i3);
>>     MPI_Info_free(&i4);
>>     MPI_Info_free(&i5);
>>
>>     MPI_Finalize();
>>     return 0;
>> }
>> }}}
>>
>> --
>> Ticket URL:<https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/372>
>> MPI Forum<https://svn.mpi-forum.org/>
>>
>>
>> --
>> Jeff Hammond
>> Argonne Leadership Computing Facility
>> University of Chicago Computation Institute
>> jhammond at alcf.anl.gov / (630) 252-5381
>> http://www.linkedin.com/in/jeffhammond
>> https://wiki.alcf.anl.gov/parts/index.php/User:Jhammond
>> _______________________________________________
>> mpi-forum mailing list
>> mpi-forum at lists.mpi-forum.org
>> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi-forum
>
> ________________________________________________________________________
> Martin Schulz, schulzm at llnl.gov, http://people.llnl.gov/schulzm
> CASC @ Lawrence Livermore National Laboratory, Livermore, USA
>
>
>
>