[Mpi3-rma] non-contiguous support in RMA & one-sided pack/unpack (?)
wgropp at uiuc.edu
Wed Sep 16 06:48:43 CDT 2009
The issue of implementation efficiency is an important one - all too
often, APIs are designed with the attitude of "the vendor will make it
run fast". And a common error is to say "just put the progress in a
separate thread," leaving out the "regardless of the impact on the
performance of those classes of application that are latency-bound". So
I urge us all to pay attention to these implementation issues.
Jeff Hammond wrote:
> I must be blind for missing that. Sorry.
> I understand that it is not MPI Forum's responsibility to ensure
> efficient implementations of the standard, but I am still concerned
> about the performance of even simple non-contiguous operations based
> upon what I see with ARMCI. I guess I'll have to wait and see what
> the various groups/vendors produce.
> On Tue, Sep 15, 2009 at 6:18 PM, Vinod tipparaju <tipparajuv at hotmail.com> wrote:
>> Please see the RMA wiki page; it has a draft proposal (as a file
>> attachment). It should include datatypes.
>> Vinod Tipparaju ^ http://ft.ornl.gov/~vinod ^ 1-865-241-1802
>>> Date: Tue, 15 Sep 2009 17:19:51 -0500
>>> From: jeff.science at gmail.com
>>> To: mpi3-rma at lists.mpi-forum.org
>>> Subject: [Mpi3-rma] non-contiguous support in RMA & one-sided pack/unpack
>>> The arguments to MPI_RMA_xfer make no reference to datatype,
>>> suggesting that only contiguous patches of primitive types will be
>>> supported. Do I understand this correctly? I queried old emails and
>>> cannot find an answer to this question if it already exists. I
>>> apologize for my inadequate search skills if I missed it.
>>> It seems there are a few possibilities for non-contiguous support in RMA:
>>> 1. RMA is decidedly low-level and makes no attempt to support
>>> non-contiguous data
>>> 2. RMA supports arbitrary datatypes, including derived datatypes for
>>> non-contiguous patches
>>> 3. RMA supports non-contiguous patches via a few simple mechanisms -
>>> strided, etc. - like ARMCI
>>> 4. RMA supports non-contiguous patches implicitly using one-sided
>>> pack/unpack functionality, presumably implemented with active messages
>>> 5. RMA stipulates non-contiguous support but is vague enough to allow
>>> a variety of implementations
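
To make the trade-off between options 3 and 4 concrete, here is a minimal sketch in plain C of a strided patch descriptor with pack/unpack helpers. It is an illustration only and is not taken from the draft proposal, from MPI, or from ARMCI: the type strided_patch and the functions pack_strided, unpack_strided and segments_if_unpacked are hypothetical names, and a real one-sided unpack would run at the target rather than in local memory.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical descriptor for a 2-D strided patch (not an MPI or ARMCI type):
 * `count` blocks of `block_bytes` bytes whose starts lie `stride_bytes` apart. */
typedef struct {
    size_t count;
    size_t block_bytes;
    size_t stride_bytes;
} strided_patch;

/* Gather the patch that starts at `base` into the contiguous buffer `packed`.
 * Returns the number of bytes written. */
static size_t pack_strided(const void *base, strided_patch p, void *packed)
{
    const unsigned char *src = base;
    unsigned char *dst = packed;

    assert(p.stride_bytes >= p.block_bytes); /* blocks must not overlap */
    for (size_t i = 0; i < p.count; ++i)
        memcpy(dst + i * p.block_bytes, src + i * p.stride_bytes, p.block_bytes);
    return p.count * p.block_bytes;
}

/* Scatter the contiguous buffer `packed` back into a strided patch at `base`.
 * Returns the number of bytes consumed. */
static size_t unpack_strided(const void *packed, strided_patch p, void *base)
{
    const unsigned char *src = packed;
    unsigned char *dst = base;

    assert(p.stride_bytes >= p.block_bytes);
    for (size_t i = 0; i < p.count; ++i)
        memcpy(dst + i * p.stride_bytes, src + i * p.block_bytes, p.block_bytes);
    return p.count * p.block_bytes;
}

/* Number of separate transfers needed when every contiguous segment is sent
 * on its own instead of being packed into a single message. */
static size_t segments_if_unpacked(strided_patch p)
{
    if (p.count == 0)
        return 0;
    return (p.stride_bytes == p.block_bytes) ? 1 : p.count;
}
```

For a 3 x 2 sub-block of doubles inside a row-major array with 6 columns, the descriptor would be {3, 2 * sizeof(double), 6 * sizeof(double)}. Sending it segment by segment takes three transfers, while packing first takes one transfer plus a copy on each side. That per-segment cost is what a strided or pack/unpack interface would try to hide.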
>>> It is not my intent to request any or all of the aforementioned
>>> features, but merely to suggest them as possible ideas to be discussed
>>> and adopted or eliminated based upon their relative merits and the
>>> philosophical preferences of the principals (e.g. Vinod).
>>> (4) seems rather challenging, but potentially desirable in certain
>>> contexts where a large number of sub-MPI calls impedes performance.
>>> Of course, one-sided unpack may result in very negative behavior if
>>> implemented or used incorrectly and is perhaps too risky to consider.
>>> One practical motivation for my thinking about this is the
>>> non-blocking performance (rather, lack thereof) of Global Arrays on
>>> BlueGene/P due to the need to explicitly advance the DCMF messenger
>>> for every contiguous segment, which cannot be done asynchronously due
>>> to the lack of thread-spawning capability. I understand there may be
>>> similar issues on the Cray XT5 (message injection limits in CNL?), but
>>> I don't know enough about the technical details to elaborate.
>>> Jeff Hammond
>>> Argonne Leadership Computing Facility
>>> jhammond at mcs.anl.gov / (630) 252-5381
>>> mpi3-rma mailing list
>>> mpi3-rma at lists.mpi-forum.org