[MPI3 Fortran] MPI Data types
Jeff Squyres
jsquyres at cisco.com
Thu May 7 12:30:20 CDT 2009
On May 7, 2009, at 12:54 PM, N.M. Maclaren wrote:
> C's array model is confused, but the issue is its semantic aspect. A C
> array argument is simply a pointer to the first element of a
> contiguous sequence of an unspecified number of elements. Nothing
> more. A void * pointer points to the first byte of an unspecified
> length of storage of unspecified type. It is assumed to be of
> appropriate alignment and size for how it is used, but that
> information is not associated with an array argument.
>
> A Fortran array is strongly typed (the first difference), and it comes
> in many forms. Most of them are contiguous, but the most important
> Fortran 90 one (assumed shape) need not be. All but assumed size (a
> Fortran 77 feature) have a strongly associated size. It is the user's
> responsibility never to pass a smaller array to a larger argument.
>
> Pretty well all combinations are allowed in argument passing, and the
> compiler is expected to use copy-in/copy-out when needed. In
> particular, it is needed (in general) when passing an assumed-shape
> array to an assumed-size argument (which is the form that matches C's
> model). That can be locked out, but that means that callers cannot use
> assumed-shape arrays, which (as I said) are the main Fortran 90
> mechanism.
>
Ok. FWIW, MPI would be [very] happy if copy-in/copy-out never happened.
Let the MPI implementation make the copy itself if it wants/needs to.
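(For concreteness, here's a rough sketch of the case I understand Nick
to mean -- the names are made up -- where a compiler may insert
copy-in/copy-out today:

  subroutine old_style_send(buf, count)
    ! F77-style assumed-size dummy: buf must look contiguous in here
    real    :: buf(*)
    integer :: count
    ! ... hand buf off to something that expects contiguous storage ...
  end subroutine old_style_send

  subroutine caller(a)
    ! Assumed-shape dummy: the actual argument may be non-contiguous
    real :: a(:, :)
    ! Passing an assumed-shape array (or a strided section of it) to an
    ! assumed-size dummy is where the compiler may have to copy it to a
    ! contiguous temporary and copy it back afterwards.
    call old_style_send(a, size(a))
  end subroutine caller

MPI would much rather see the original -- possibly strided -- buffer
than a compiler-generated temporary.)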
We talked about this in the last Chicago meeting (I fear I'm going to
get out of my depth here and either say something wrong or use
incorrect terminology): if the MPI can see the underlying Fortran
descriptor and resolve it to an accurate memory layout map of the data
(even if it's an array subset, or a subset of a subset, or ...), then
I think we can do something with that.
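For example (just a sketch; the names, rank, and bounds are arbitrary),
this is the kind of "subset of a subset" I mean, and it is all captured
by a single descriptor:

  subroutine descriptor_sketch(dest, tag)
    use mpi
    integer, intent(in) :: dest, tag
    real, target  :: grid(100, 100)
    real, pointer :: plane(:, :), edge(:)
    integer :: ierr

    grid = 0.0
    plane => grid(1:99:2, :)   ! a strided subset of grid
    edge  => plane(:, 7)       ! a subset of that subset

    ! edge's descriptor records the base address, extent, and stride --
    ! i.e., the "accurate memory layout map" above.  (With today's mpi
    ! module this call probably goes through a contiguous temporary;
    ! the point is that the layout information already exists.)
    call MPI_SEND(edge, size(edge), MPI_REAL, dest, tag, &
                  MPI_COMM_WORLD, ierr)
  end subroutine descriptor_sketch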
However, we get into murky territory here (Rolf, Hubert, back me up
here): what *exactly* would MPI do with that? It's at least somewhat
redundant with the MPI_Datatype argument.
More on this below.
(BTW, do we lose the valuable descriptor information / exact memory
layout if choice buffers are passed through IGNORE_TKR kinds of
interfaces?)
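(For reference, by "IGNORE_TKR kinds of interfaces" I mean something
like the sketch below -- the directive spelling is compiler-specific,
and this is just one of the spellings:

  interface
    subroutine MPI_SEND(buf, count, datatype, dest, tag, comm, ierror)
      !DIR$ IGNORE_TKR buf     ! "ignore type, kind, and rank of buf"
      real    :: buf(*)        ! assumed-size: only an address is passed
      integer :: count, datatype, dest, tag, comm, ierror
    end subroutine MPI_SEND
  end interface

My understanding -- please correct me -- is that because the dummy is
assumed-size, the actual argument gets flattened to a contiguous
address, via copy-in/copy-out if necessary, so the descriptor and the
exact layout never reach the MPI library.)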
> The data layout is part of the MPI datatype, so there is a problem
> when passing a Fortran 90 assumed-shape REAL array (for example). If
> the MPI datatype is MPI_REAL, the MPI send/receive needs to handle
> discontiguous data (which it doesn't). If the MPI datatype is an MPI
> derived type, there is no direct equivalence between the Fortran and C
> interfaces.
>
So the problem here is that we have two constructs that are at least
somewhat overlapping in semantic scope: Fortran arguments carry
metadata with them about their type, size, and shape, and the (count,
MPI_Datatype) tuple specifies the same thing.
This brings up the struggle I brought up before: why does Fortran need
the (count, MPI_Datatype) tuple? Because it's part of MPI, that's why
-- it's the standard. And we've made a choice that the bindings should
be more or less 1:1 with C (the same happened in C++). But then if the
Fortran descriptor and the (count, MPI_Datatype) tuple are mostly (or
wholly?) redundant, aren't we just annoying users? Probably so. :-\
So -- what to do?
I guess this is the $5M question.
Here's one possible answer that wouldn't be wholly incongruous with
the MPI C bindings...
What if the (count, MPI_Datatype) tuple is "applied" to the user's
buffer?
- In C, the user's buffer is just a starting address and all
contiguous bytes after that
- In Fortran, the user's buffer is described by the descriptor
An MPI implementation will therefore need to track whether buffers
come from C or Fortran and basically meld the definition of the
"buffer" with the (count, MPI_Datatype) tuple to come up with the
final memory map.
Hence, you could pass a Fortran N-dimensional array through MPI_SEND
with just MPI_REAL and a total count of how many REAL items are being
sent. Or you could have a more complex datatype that cherry-picks some
of the REALs out of that N-dimensional array. Or you could use Fortran
subsetting to cherry-pick the same REALs out of the array (obviously,
with a smaller MPI count value). I.e., you could pick the same values
out either way -- via MPI or via Fortran.
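Something like this sketch (dest/tag are placeholders, and the last
call assumes the binding can see the section's layout instead of
copying it):

  subroutine three_ways(a, dest, tag)
    use mpi
    real,    intent(in) :: a(8, 8)
    integer, intent(in) :: dest, tag
    integer :: rowtype, ierr

    ! 1. The whole array: plain MPI_REAL plus a total count
    call MPI_SEND(a, size(a), MPI_REAL, dest, tag, MPI_COMM_WORLD, ierr)

    ! 2. An MPI derived datatype cherry-picks row 1 of a (stride 8 in
    !    memory, since Fortran is column-major); the count is 1
    call MPI_TYPE_VECTOR(8, 1, 8, MPI_REAL, rowtype, ierr)
    call MPI_TYPE_COMMIT(rowtype, ierr)
    call MPI_SEND(a, 1, rowtype, dest, tag, MPI_COMM_WORLD, ierr)
    call MPI_TYPE_FREE(rowtype, ierr)

    ! 3. Fortran subsetting picks the same 8 REALs, with a smaller count
    call MPI_SEND(a(1, :), 8, MPI_REAL, dest, tag, MPI_COMM_WORLD, ierr)
  end subroutine three_ways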
--
Jeff Squyres
Cisco Systems