[MPI3 Fortran] MPI Data types
Jeff Squyres
jsquyres at cisco.com
Thu May 7 12:30:20 CDT 2009
On May 7, 2009, at 12:54 PM, N.M. Maclaren wrote:
> C's array model is confused, but the issue is its semantic aspect. A C
> array argument is simply a pointer to the first element of a
> contiguous sequence of an unspecified number of elements. Nothing
> more. A void * pointer points to the first byte of an unspecified
> length of storage of unspecified type. It is assumed to be of
> appropriate alignment and size for how it is used, but that
> information is not associated with an array argument.
>
> A Fortran array is strongly typed (the first difference), and it comes
> in many forms. Most of them are contiguous, but the most important
> Fortran 90 one (assumed shape) need not be. All but assumed size (a
> Fortran 77 feature) have a strongly associated size. It is the user's
> responsibility never to pass a smaller array to a larger argument.
>
> Pretty well all combinations are allowed in argument passing, and the
> compiler is expected to use copy-in/copy-out when needed. In
> particular, it is needed (in general) when passing an assumed-shape
> array to an assumed-size argument (which is the form that matches C's
> model). That can be locked out, but that means that callers cannot use
> assumed-shape arrays, which (as I said) are the main Fortran 90
> mechanism.
>
Ok. FWIW, MPI would be [very] happy if copy-in/copy-out never happened.
Let the MPI implementation make the copy itself if it wants/needs to.
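(For concreteness, here's a rough sketch of the case I understand Nick
to mean -- the names are made up -- where a compiler may insert
copy-in/copy-out today:

  subroutine old_style_send(buf, count)
    ! F77-style assumed-size dummy: buf must look contiguous in here
    real    :: buf(*)
    integer :: count
    ! ... hand buf off to something that expects contiguous storage ...
  end subroutine old_style_send

  subroutine caller(a)
    ! Assumed-shape dummy: the actual argument may be non-contiguous
    real :: a(:, :)
    ! Passing an assumed-shape array (or a strided section of it) to an
    ! assumed-size dummy is where the compiler may have to copy it to a
    ! contiguous temporary and copy it back afterwards.
    call old_style_send(a, size(a))
  end subroutine caller

MPI would much rather see the original -- possibly strided -- buffer
than a compiler-generated temporary.)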
We talked about this in the last Chicago meeting (I fear I'm going to
get out of my depth here and either say something wrong or use
incorrect terminology): if the MPI can see the underlying Fortran
descriptor and resolve it to an accurate memory layout map of the data
(even if it's an array subset, or a subset of a subset, or ...), then
I think we can do something with that.
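For example (just a sketch; the names, rank, and bounds are arbitrary),
this is the kind of "subset of a subset" I mean, and it is all captured
by a single descriptor:

  subroutine descriptor_sketch(dest, tag)
    use mpi
    integer, intent(in) :: dest, tag
    real, target  :: grid(100, 100)
    real, pointer :: plane(:, :), edge(:)
    integer :: ierr

    grid = 0.0
    plane => grid(1:99:2, :)   ! a strided subset of grid
    edge  => plane(:, 7)       ! a subset of that subset

    ! edge's descriptor records the base address, extent, and stride --
    ! i.e., the "accurate memory layout map" above.  (With today's mpi
    ! module this call probably goes through a contiguous temporary;
    ! the point is that the layout information already exists.)
    call MPI_SEND(edge, size(edge), MPI_REAL, dest, tag, &
                  MPI_COMM_WORLD, ierr)
  end subroutine descriptor_sketch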
However, we get into murky territory here (Rolf, Hubert, back me up
here): what *exactly* would MPI do with that? It's at least somewhat
redundant with the MPI_Datatype argument.
More on this below.
(BTW, do we lose the valuable descriptor information / exact memory
layout if choice buffers are passed through IGNORE_TKR kinds of
interfaces?)
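(For reference, by "IGNORE_TKR kinds of interfaces" I mean something
like the sketch below -- the directive spelling is compiler-specific,
and this is just one of the spellings:

  interface
    subroutine MPI_SEND(buf, count, datatype, dest, tag, comm, ierror)
      !DIR$ IGNORE_TKR buf     ! "ignore type, kind, and rank of buf"
      real    :: buf(*)        ! assumed-size: only an address is passed
      integer :: count, datatype, dest, tag, comm, ierror
    end subroutine MPI_SEND
  end interface

My understanding -- please correct me -- is that because the dummy is
assumed-size, the actual argument gets flattened to a contiguous
address, via copy-in/copy-out if necessary, so the descriptor and the
exact layout never reach the MPI library.)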
> The data layout is part of the MPI datatype, so there is a problem
> when passing a Fortran 90 assumed-shape REAL array (for example). If
> the MPI datatype is MPI_REAL, the MPI send/receive needs to handle
> discontiguous data (which it doesn't). If the MPI datatype is an MPI
> derived type, there is no direct equivalence between the Fortran and C
> interfaces.
>
So the problem here is that we have two constructs that are at least
somewhat overlapping in semantic scope: Fortran arguments carry
metadata with them about their type, size, and shape, and the (count,
MPI_Datatype) tuple specifies the same thing.
This brings up the struggle I brought up before: why does Fortran need
the (count, MPI_Datatype) tuple? Because it's part of MPI, that's why
-- it's the standard. And we've made a choice that the bindings should
be more or less 1:1 with C (the same happened in C++). But then if the
Fortran descriptor and the (count, MPI_Datatype) tuple are mostly (or
wholly?) redundant, aren't we just annoying users? Probably so. :-\
So -- what to do?
I guess this is the $5M question.
Here's one possible answer that wouldn't be wholly incongruous with
the MPI C bindings...
What if the (count, MPI_Datatype) tuple is "applied" to the user's
buffer?
- In C, the user's buffer is just a starting address and all
contiguous bytes after that
- In Fortran, the user's buffer is described by the descriptor
An MPI implementation will therefore need to track whether buffers
come from C or Fortran and basically meld the definition of the
"buffer" with the (count, MPI_Datatype) tuple to come up with the
final memory map.
Hence, you could pass a Fortran N-dimensional array through MPI_SEND
with just MPI_REAL and a total count of how many REAL items are being
sent. Or you could have a more complex datatype that cherry-picks some
of the REALs out of that N-dimensional array. Or you could use Fortran
subsetting to cherry-pick the same REALs out of the array (obviously,
with a smaller MPI count value). I.e., you could pick the same values
out either way -- via MPI or via Fortran.
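Something like this sketch (dest/tag are placeholders, and the last
call assumes the binding can see the section's layout instead of
copying it):

  subroutine three_ways(a, dest, tag)
    use mpi
    real,    intent(in) :: a(8, 8)
    integer, intent(in) :: dest, tag
    integer :: rowtype, ierr

    ! 1. The whole array: plain MPI_REAL plus a total count
    call MPI_SEND(a, size(a), MPI_REAL, dest, tag, MPI_COMM_WORLD, ierr)

    ! 2. An MPI derived datatype cherry-picks row 1 of a (stride 8 in
    !    memory, since Fortran is column-major); the count is 1
    call MPI_TYPE_VECTOR(8, 1, 8, MPI_REAL, rowtype, ierr)
    call MPI_TYPE_COMMIT(rowtype, ierr)
    call MPI_SEND(a, 1, rowtype, dest, tag, MPI_COMM_WORLD, ierr)
    call MPI_TYPE_FREE(rowtype, ierr)

    ! 3. Fortran subsetting picks the same 8 REALs, with a smaller count
    call MPI_SEND(a(1, :), 8, MPI_REAL, dest, tag, MPI_COMM_WORLD, ierr)
  end subroutine three_ways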
--
Jeff Squyres
Cisco Systems