[MPI3 Fortran] MPI Data types
Craig Rasmussen
crasmussen at newmexicoconsortium.org
Thu May 7 13:16:33 CDT 2009
On May 7, 2009, at 11:30 AM, Jeff Squyres wrote:
> On May 7, 2009, at 12:54 PM, N.M. Maclaren wrote:
>
>> C's array model is confused, but the issue is its semantic aspect. A C
>> array is simply a pointer to the first element of a contiguous sequence
>> of an unspecified number of elements. Nothing more. A void * pointer
>> points to the first byte of storage of unspecified length and type. It
>> is assumed to be of appropriate alignment and size for how it is used,
>> but that information is not associated with an array argument.
>>
>> A Fortran array is strongly typed (the first difference), and it comes
>> in many forms. Most of them are contiguous, but the most important
>> Fortran 90 one (plain assumed shape) is not. All but assumed size (a
>> Fortran 77 feature) have a strongly associated size. It is the user's
>> responsibility never to pass a smaller array to a larger argument.
>>
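[To make those forms concrete, here is a minimal Fortran sketch of the
dummy-argument kinds being described; the routine and variable names are
just placeholders.]

    subroutine forms(a, b, n)
      integer, intent(in) :: n
      real, intent(in) :: a(:)   ! assumed shape (Fortran 90): typed, carries its
                                 ! own extent, and need not be contiguous
      real, intent(in) :: b(*)   ! assumed size (Fortran 77): typed and contiguous,
                                 ! but no extent travels with the argument
      print *, size(a)           ! the size of an assumed-shape dummy is known
      print *, b(n)              ! for b, the caller must supply the length itself
    end subroutine forms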
>> Pretty well all combinations are allowed in argument passing, and the
>> compiler is expected to use copy-in/copy-out when needed. In
>> particular, it is needed (in general) when passing an assumed-shape
>> array to an assumed-size argument (which is the form that matches C's
>> model). That can be locked out, but that means that callers cannot use
>> assumed-shape arrays, which (as I said) are the main Fortran 90
>> mechanism.
>>
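[A sketch of the copy-in/copy-out case just described: old_style here
stands in for any routine with a Fortran 77 style assumed-size dummy (the
classic mpif.h bindings have this shape); whether a copy is actually made
is the compiler's decision.]

    subroutine old_style(buf, n)
      integer, intent(in) :: n
      real, intent(in) :: buf(*)        ! must be contiguous
    end subroutine old_style

    subroutine caller(a)
      real, intent(in) :: a(:,:)        ! assumed shape: may be a discontiguous section
      ! Associating an assumed-shape (possibly strided) array with an
      ! assumed-size dummy generally forces the compiler to copy it into
      ! contiguous temporary storage (and back afterwards, in general).
      call old_style(a, size(a))
    end subroutine caller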
>
> Ok. FWIW, MPI would be [very] happy with never having copy-in/copy-out.
> Let the MPI implementation make the copy if it wants/needs to.
>
> We talked about this in the last Chicago meeting (I fear I'm going
> to get out of my depth here and either say something wrong or use
> incorrect terminology): if the MPI can see the underlying Fortran
> descriptor and resolve it to an accurate memory layout map of the
> data (even if it's an array subset, or a subset of a subset,
> or ...), then I think we can do something with that.
>
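[For reference, a rough Fortran-level illustration of what "resolving the
descriptor to a memory layout" gives you: the shape, the bounds, and (with
the Fortran 2008 IS_CONTIGUOUS intrinsic) the contiguity of an
assumed-shape dummy are all queryable. The routine name is hypothetical.]

    subroutine inspect(buf)
      real, intent(in) :: buf(:,:)          ! assumed shape: backed by a descriptor
      print *, shape(buf)                   ! extents in each dimension
      print *, lbound(buf), ubound(buf)     ! lower and upper bounds
      print *, is_contiguous(buf)           ! Fortran 2008: is it a strided section?
    end subroutine inspect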
> However, we get into murky territory here (Rolf, Hubert, back me up
> here): what *exactly* would MPI do with that? It's at least
> somewhat redundant with the MPI_Datatype argument.
>
> More on this below.
>
> (BTW, do we lose the valuable descriptor information / exact memory
> layout if choice buffers are passed through IGNORE_TKR kinds of
> interfaces?)
[For Jeff: yes, the valuable descriptor information is lost with
IGNORE_TKR directives.]
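[Roughly what such an interface looks like; the directive spelling varies
by compiler (the !DIR$ form below is one vendor's), and the effect is that
only a base address is passed, so no descriptor reaches the MPI library.
This is a sketch, not the actual text of any mpi module.]

    interface
      subroutine MPI_SEND(buf, count, datatype, dest, tag, comm, ierror)
        !DIR$ IGNORE_TKR buf       ! suppress type/kind/rank checking on buf
        real :: buf(*)             ! assumed size: address only, no descriptor
        integer :: count, datatype, dest, tag, comm, ierror
      end subroutine MPI_SEND
    end interface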
>
>
>> The data layout is part of the MPI datatype, so there is a problem
>> when passing a Fortran 90 assumed-shape REAL array (for example). If
>> the MPI datatype is REAL, the MPI send / receive needs to handle
>> discontiguous data (which it doesn't). If the MPI datatype is an MPI
>> derived type, there is no direct equivalence between the Fortran and C
>> interfaces.
>>
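[Today the discontiguity is expressed through the MPI derived datatype
rather than through the array argument itself, e.g. sending one row of a
column-major REAL array with MPI_TYPE_VECTOR. The helper below is
hypothetical and assumes the Fortran 90 mpi module.]

    subroutine send_row(a, nrows, ncols, i, dest, tag)
      use mpi
      integer, intent(in) :: nrows, ncols, i, dest, tag
      real, intent(in) :: a(nrows, ncols)
      integer :: rowtype, ierr
      ! ncols blocks of 1 REAL each, separated by nrows REALs (one column).
      call MPI_TYPE_VECTOR(ncols, 1, nrows, MPI_REAL, rowtype, ierr)
      call MPI_TYPE_COMMIT(rowtype, ierr)
      ! a(i,1) supplies only the starting address; the layout lives in rowtype.
      call MPI_SEND(a(i,1), 1, rowtype, dest, tag, MPI_COMM_WORLD, ierr)
      call MPI_TYPE_FREE(rowtype, ierr)
    end subroutine send_row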
>
>
> So the problem here is that we have two constructs that are at least
> somewhat overlapping in semantic scope: Fortran parameters carry
> metadata with them about their type, size, and shape, and the
> (count, MPI_Datatype) tuple specifies the same thing.
>
> This brings up the struggle I brought up before: why does Fortran need
> the (count, MPI_Datatype) tuple? Because it's part of MPI, that's
> why -- it's the standard. And we've made a choice that the bindings
> should be more or less 1:1 with C (same happened in C++). But then
> if the Fortran descriptor and the (count, MPI_Datatype) tuple are
> mostly (or wholly?) redundant, aren't we just annoying users?
> Probably so. :-\ So -- what to do?
>
> I guess this is the $5M question.
>
> Here's one possible answer that wouldn't be wholly incongruous with
> the MPI C bindings...
>
> What if the (count, MPI_Datatype) tuple is "applied" to the user's
> buffer?
>
> - In C, the user's buffer is just a starting address and all
> contiguous bytes after that
> - In Fortran, the user's buffer is described by the descriptor
>
> An MPI implementation will therefore need to track whether buffers
> come from C or Fortran and basically meld the definition of the
> "buffer" with the (count, MPI_Datatype) tuple to come up with the
> final memory map.
>
> Hence, you could pass a Fortran N-dimensional array through MPI_SEND
> with just MPI_REAL and a total count of how many REAL items are being
> sent. Or you could have a more complex datatype that cherry-picks some
> of the REALs out of that N-dimensional array. Or you could use Fortran
> subsetting to cherry-pick the same REALs out of the array (obviously,
> with a smaller MPI count value). I.e., you could pick the same values
> out either way -- via MPI or via Fortran.
>
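[Two of those routes side by side; the derived-datatype route is the one
sketched earlier. The helper is hypothetical and assumes the Fortran 90
mpi module; with the current bindings the array section in the second
call may be copied in and out, whereas a descriptor-aware binding could
see its stride directly.]

    subroutine two_routes(a, n, dest, tag)
      use mpi
      integer, intent(in) :: n, dest, tag
      real, intent(in) :: a(n, n)
      integer :: ierr
      ! Route 1: whole contiguous array, plain MPI_REAL plus a total count.
      call MPI_SEND(a, size(a), MPI_REAL, dest, tag, MPI_COMM_WORLD, ierr)
      ! Route 2: let Fortran cherry-pick with an array section (every other
      ! element of column 1) and shrink the count to match.
      call MPI_SEND(a(1:n:2, 1), size(a(1:n:2, 1)), MPI_REAL, dest, tag, &
                    MPI_COMM_WORLD, ierr)
    end subroutine two_routes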
Yup, the $5M question. Good question about whether the MPI_Datatype can
be applied to a Fortran array. It should be possible, and it would be
checkable at runtime. Let's discuss this at the next MPI meeting.
-craig