[Mpi3-rma] FW: draft of a proposal for RMA interfaces
jeff.science at gmail.com
Tue Dec 9 13:15:59 CST 2008
On Tue, Dec 9, 2008 at 7:51 AM, Vinod tipparaju <tipparajuv at hotmail.com> wrote:
>> inside of GA, I have to manually do everything that ARMCI_Malloc did previously?
> yes the user has to. Do you think that is a bad idea?
I think it depends on the user. I see the following possibilities:
1. MPI RMA includes functionality for allocating different types of
memory segments (shared, private, etc.) just as ARMCI did, and perhaps
more functionality than that. MPI knows when users are abusing memory
and can address this directly via useful error messages, etc.
2. A separate library that manages the complexities of on-node
memory in a heterogeneous and/or multicore context is developed
independently of MPI.
3. Users roll their own in all cases...
a. ...resulting in properly executing code which runs at top speed
because the user knows what they are doing or their usage is simple
enough that it doesn't matter.
b. ...correctly but with limited performance because a naive
approach is used.
c. ...leading to catastrophic problems because they don't know
what they're doing.
I know of enough evil hacking among application programmers to greatly
fear (3c). The negative consequences of (3b) depend on the machine.
For example, if one uses basic malloc and all memory is
process-private, then on a big SMP node like Ranger at TACC the
performance will suffer because the RMA cannot completely bypass the
communication protocol, as GA/ARMCI currently does. Even if an MPI RMA
implementation takes a shortcut when it knows a transfer stays within
a node, there must still be a memcpy between the two segments of
private memory that would not occur if the user had allocated with
shmalloc. Of course, some users may want this behavior for
consistency, but for others it may kill performance.
I don't believe I understand all the issues clearly enough to say much
more, but if MPI aims to support heterogeneous platforms, the
complexity of on-node memory hierarchies that paradigm introduces,
along with the complexity of dealing with different types of multicore
CPUs, suggests that MPI users would benefit greatly from an allocator
which encapsulates these various paradigms into a single, portable
framework which is in sync with MPI's RMA protocol.
I imagine a mix of (2) and (3) is the most likely scenario but (2)
won't fully address the problem unless it focuses directly on being
paired with MPI RMA. As a user, and a stupid and lazy one at that,
(1) is the best I can hope for.