[MPI3 Fortran] MPI Data types
Craig Rasmussen
crasmussen at newmexicoconsortium.org
Thu May 7 13:16:33 CDT 2009
On May 7, 2009, at 11:30 AM, Jeff Squyres wrote:
> On May 7, 2009, at 12:54 PM, N.M. Maclaren wrote:
>
>> C's array model is confused, but the issue is its semantic aspect. A C
>> array is simply a pointer to the first element of a contiguous sequence
>> of an unspecified number of elements. Nothing more. A void * pointer
>> points to the first byte of storage of unspecified length and type. It
>> is assumed to be of appropriate alignment and size for how it is used,
>> but that information is not associated with an array argument.
>>
>> A Fortran array is strongly typed (the first difference), and it comes
>> in many forms. Most of them are contiguous, but the most important
>> Fortran 90 one (plain assumed shape) is not. All but assumed size (a
>> Fortran 77 feature) have a strongly associated size. It is the user's
>> responsibility never to pass a smaller array to a larger argument.
>>
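[To make those forms concrete, here is a minimal Fortran sketch of the
dummy-argument kinds being described; the routine and variable names are
just placeholders.]

    subroutine forms(a, b, n)
      integer, intent(in) :: n
      real, intent(in) :: a(:)   ! assumed shape (Fortran 90): typed, carries its
                                 ! own extent, and need not be contiguous
      real, intent(in) :: b(*)   ! assumed size (Fortran 77): typed and contiguous,
                                 ! but no extent travels with the argument
      print *, size(a)           ! the size of an assumed-shape dummy is known
      print *, b(n)              ! for b, the caller must supply the length itself
    end subroutine forms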
>> Pretty well all combinations are allowed in argument passing, and the
>> compiler is expected to use copy-in/copy-out when needed. In
>> particular, it is needed (in general) when passing an assumed-shape
>> array to an assumed-size argument (which is the form that matches C's
>> model). That can be locked out, but that means that callers cannot use
>> assumed-shape arrays, which (as I said) are the main Fortran 90
>> mechanism.
>>
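[A sketch of the copy-in/copy-out case just described: old_style here
stands in for any routine with a Fortran 77 style assumed-size dummy (the
classic mpif.h bindings have this shape); whether a copy is actually made
is the compiler's decision.]

    subroutine old_style(buf, n)
      integer, intent(in) :: n
      real, intent(in) :: buf(*)        ! must be contiguous
    end subroutine old_style

    subroutine caller(a)
      real, intent(in) :: a(:,:)        ! assumed shape: may be a discontiguous section
      ! Associating an assumed-shape (possibly strided) array with an
      ! assumed-size dummy generally forces the compiler to copy it into
      ! contiguous temporary storage (and back afterwards, in general).
      call old_style(a, size(a))
    end subroutine caller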
>
> Ok. FWIW, MPI would be [very] happy with never having copy-in/copy-out.
> Let the MPI implementation make the copy if it wants/needs to.
>
> We talked about this in the last Chicago meeting (I fear I'm going
> to get out of my depth here and either say something wrong or use
> incorrect terminology): if the MPI can see the underlying Fortran
> descriptor and resolve it to an accurate memory layout map of the
> data (even if it's an array subset, or a subset of a subset,
> or ...), then I think we can do something with that.
>
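[For reference, a rough Fortran-level illustration of what "resolving the
descriptor to a memory layout" gives you: the shape, the bounds, and (with
the Fortran 2008 IS_CONTIGUOUS intrinsic) the contiguity of an
assumed-shape dummy are all queryable. The routine name is hypothetical.]

    subroutine inspect(buf)
      real, intent(in) :: buf(:,:)          ! assumed shape: backed by a descriptor
      print *, shape(buf)                   ! extents in each dimension
      print *, lbound(buf), ubound(buf)     ! lower and upper bounds
      print *, is_contiguous(buf)           ! Fortran 2008: is it a strided section?
    end subroutine inspect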
> However, we get into murky territory here (Rolf, Hubert, back me up
> here): what *exactly* would MPI do with that? It's at least
> somewhat redundant with the MPI_Datatype argument.
>
> More on this below.
>
> (BTW, do we lose the valuable descriptor information / exact memory
> layout if choice buffers are passed through IGNORE_TKR kinds of
> interfaces?)
[For Jeff: yes, the valuable descriptor information is lost with
IGNORE_TKR directives.]
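[Roughly what such an interface looks like; the directive spelling varies
by compiler (the !DIR$ form below is one vendor's), and the effect is that
only a base address is passed, so no descriptor reaches the MPI library.
This is a sketch, not the actual text of any mpi module.]

    interface
      subroutine MPI_SEND(buf, count, datatype, dest, tag, comm, ierror)
        !DIR$ IGNORE_TKR buf       ! suppress type/kind/rank checking on buf
        real :: buf(*)             ! assumed size: address only, no descriptor
        integer :: count, datatype, dest, tag, comm, ierror
      end subroutine MPI_SEND
    end interface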
>
>
>> The data layout is part of the MPI datatype, so there is a problem
>> when passing a Fortran 90 assumed-shape REAL array (for example). If
>> the MPI datatype is REAL, the MPI send / receive needs to handle
>> discontiguous data (which it doesn't). If the MPI datatype is an MPI
>> derived type, there is no direct equivalence between the Fortran and C
>> interfaces.
>>
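[Today the discontiguity is expressed through the MPI derived datatype
rather than through the array argument itself, e.g. sending one row of a
column-major REAL array with MPI_TYPE_VECTOR. The helper below is
hypothetical and assumes the Fortran 90 mpi module.]

    subroutine send_row(a, nrows, ncols, i, dest, tag)
      use mpi
      integer, intent(in) :: nrows, ncols, i, dest, tag
      real, intent(in) :: a(nrows, ncols)
      integer :: rowtype, ierr
      ! ncols blocks of 1 REAL each, separated by nrows REALs (one column).
      call MPI_TYPE_VECTOR(ncols, 1, nrows, MPI_REAL, rowtype, ierr)
      call MPI_TYPE_COMMIT(rowtype, ierr)
      ! a(i,1) supplies only the starting address; the layout lives in rowtype.
      call MPI_SEND(a(i,1), 1, rowtype, dest, tag, MPI_COMM_WORLD, ierr)
      call MPI_TYPE_FREE(rowtype, ierr)
    end subroutine send_row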
>
>
> So the problem here is that we have two constructs that are at least
> somewhat overlapping in semantic scope: Fortran parameters carry
> metadata with them about their type, size, and shape, and the
> (count, MPI_Datatype) tuple specifies the same thing.
>
> This brings up the struggle I brought up before: why does Fortran need
> the (count, MPI_Datatype) tuple? Because it's part of MPI, that's
> why -- it's the standard. And we've made a choice that the bindings
> should be more or less 1:1 with C (same happened in C++). But then
> if the Fortran descriptor and the (count, MPI_Datatype) tuple are
> mostly (or wholly?) redundant, aren't we just annoying users?
> Probably so. :-\ So -- what to do?
>
> I guess this is the $5M question.
>
> Here's one possible answer that wouldn't be wholly incongruous with
> the MPI C bindings...
>
> What if the (count, MPI_Datatype) tuple is "applied" to the user's
> buffer?
>
> - In C, the user's buffer is just a starting address and all
> contiguous bytes after that
> - In Fortran, the user's buffer is described by the descriptor
>
> An MPI implementation will therefore need to track whether buffers
> come from C or Fortran and basically meld the definition of the
> "buffer" with the (count, MPI_Datatype) tuple to come up with the
> final memory map.
>
> Hence, you could pass a Fortran N-dimensional array through MPI_SEND
> with just MPI_REAL and a total count of how many REAL items are being
> sent. Or you could have a more complex datatype that cherry-picks some
> of the REALs out of that N-dimensional array. Or you could use Fortran
> subsetting to cherry-pick the same REALs out of the array (obviously,
> with a smaller MPI count value). I.e., you could pick the same values
> out either way -- via MPI or via Fortran.
>
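[Two of those routes side by side; the derived-datatype route is the one
sketched earlier. The helper is hypothetical and assumes the Fortran 90
mpi module; with the current bindings the array section in the second
call may be copied in and out, whereas a descriptor-aware binding could
see its stride directly.]

    subroutine two_routes(a, n, dest, tag)
      use mpi
      integer, intent(in) :: n, dest, tag
      real, intent(in) :: a(n, n)
      integer :: ierr
      ! Route 1: whole contiguous array, plain MPI_REAL plus a total count.
      call MPI_SEND(a, size(a), MPI_REAL, dest, tag, MPI_COMM_WORLD, ierr)
      ! Route 2: let Fortran cherry-pick with an array section (every other
      ! element of column 1) and shrink the count to match.
      call MPI_SEND(a(1:n:2, 1), size(a(1:n:2, 1)), MPI_REAL, dest, tag, &
                    MPI_COMM_WORLD, ierr)
    end subroutine two_routes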
Yup, the $5M question. Good question about whether the MPI_Datatype can
be applied to a Fortran array. It should be possible, and it would be
checkable at runtime. Let's discuss this at the next MPI meeting.
-craig