10-237r2 To: J3 From: Nick Maclaren/Malcolm Cohen Subject: Interop TR: TYPE(*) and MPI Support Date: 2010 October 14 Reference: 10-165r2, N1766, 10-233, 10-234, 10-237 1. Introduction --------------- This paper attempts to pull together the various requirements of MPI-3 and others for C interoperability that have been so far inadequately addressed by the draft TR. There are a number of technical choices in the design. These could be changed, but the edits would need further work in that case. 2. Requirements summary ----------------------- (1) Support for passing assumed-size arrays to assumed-rank arguments. Due to historical reasons, existing MPI code makes much use of assumed-size arrays, and due to the way generic overloading works, for maximum usefulness with minimal code changes it is required to pass these to assumed-rank dummies. (2) Support for CHARACTER(*) dummy arguments in BIND(C) routines. Again, the existing facilities (basically passing to CHARACTER(*) arrays and passing the length as a separate explicit argument, or appending a NUL character to each argument - or element in the case of array arguments) would require excessive code changes to MPI code. It seems to be the desire to have all of TYPE(*) and DIMENSION(..) and CHARACTER(*) in the same interface that is driving this issue. (3) Actual arguments that are not otherwise interoperable should be considered interoperable with a TYPE(*) argument, except for derived types with type-bound procedures, final procedures, or type parameters. For further discussion of the background behind these requirements see 10-237 (omitted from the r1). 3. Technical Choices -------------------- It is taken as a principle that needs no explanation that permission to perform such actions as passing assumed-size to assumed-rank dummies not be restricted to BIND(C) procedures with extremely good reasons for such an inconsistency. (1) Assumed-size and assumed-rank. It must be possible to discover from C that a CFI_cdesc_t* formal parameter is associated with an assumed-size array. This COULD be done with a new CFI_attribute_t value. As a matter of principle, it should be possible to discover this directly from Fortran without resorting to a C wrapper to enquire the CFI_cdesc_t value. Furthermore, in Fortran the LBOUND, UBOUND, SHAPE, and SIZE intrinsics can be applied to assumed-rank arguments. The semantics of these need to be properly defined when the association is with an assumed-size dummy argument. Note that an assumed-size array could be described as having an "undefined" extent in its final dimension. Therefore, this proposal specifies: (a) that the UBOUND of the final dimension of an assumed-size array be 2 less than its LBOUND. Note that for a zero extent dimension, LBOUND=1 and UBOUND=0. It would be perfectly reasonable for an undefined extent dimension to be LBOUND=1 and UBOUND=-1. It is also be reasonable to pass the actual lower bound. Since LBOUND of an assumed-size array returns the actual lower bound, for simplicity we will say the same of an assumed-rank argument that is associated (at some level of argument association with an assumed-size array). (b) For the convenience of the C programmer, a new CFI_attribute_t value will also be chosen? The alternative to using UBOUND in this was to indicate would be to add a new intrinsic inquiry IS_ASSUMED_SIZE(variable) and to forbid inquiry of the upper bound or size of the final dimension when this returns true; this seems far more intrusive and harder to use than the UBOUND=LBOUND-2 solution. (2) CHARACTER(*) The obvious implementation choices are to pass the length separately, or to include it in CFI_cdesc_t either as an extra member in the struct, or as an extra (first) dimension. Passing the length separately might appear to have the advantage that many processors use this method for intra-Fortran calls with CHARACTER(*): this is an illusion though, as different processors vary as to the size and position (and presence!) of this hidden argument. We also want to have well-defined results when the user passes a CHARACTER(LEN/=1) actual argument to a TYPE(*) dummy - it is not exactly a surprise that some want to pass CHARACTER data in messages. Given the existence of multibyte character sets, the length member of CFI_cdesc_t should indicate not the character length but the size of an individual character. The disadvantage of dimension-fiddling is that this reduces the effective maximum rank of a CHARACTER actual argument from 15 to 14. This is not considered to be an actual hardship, but if it were, adding an extra member to CFI_cdesc_t (NOT using "length"!) would be the preferred solution. (4) CHARACTER(:) MPI has not asked for support of deferred-length CHARACTER, however from personal experience with modules interfacing to C libraries that return variable-length character strings I can say that this would be very useful. It also comes "for free" with inclusion of character length in CFI_cdesc_t, and is particularly easy with the dimension-fiddling version. (3) TYPE(*) and interoperability of arguments Interoperability of derived types with type-bound procedures or final procedures is probably not technically difficult at all, but might be "philosophically" difficult. This paper does not propose to address this at this time (one cannot build Rome in a day). Derived types with type parameters would however pose difficult implementation issues with CFI_cdesc_t so they should probably not be reconsidered at a future date. For other normally-non-interoperable types, this feature is "a mere matter of exposition". It is, however, slightly tricky, and the wording suggested will probably need careful review, probably at a later date. (5) I take it as a matter of principle that passing an assumed-size array to a C routine via a DIMENSION(..) argument has identical effect to passing it to a Fortran routine with DIMENSION(..) and then to a C routine with DIMENSION(..). (6) CHARACTER(LEN=?) With allocatable and pointer arguments, and character length/=1 via dimension fiddling we get the C routine being able to change the character length. The obvious way to cope with that is to require the Fortran one to be changeable too, i.e. CHARACTER(LEN=:). Note that this is ONLY for ALLOCATABLE and POINTER arguments. There is no real objection to allowing CHARACTER(LEN=something else) for ALLOCATABLE or POINTER, but to enable robust code writing we would then want to have some way to enquire whether the CHARACTER length was changeable. I believe that in general people doing this will either want to have true variable (i.e. deferred) character length, or won't mind doing a single additional pointer assignment if they don't want deferred length in the rest of their program. 4. High-level Proposal Summary ------------------------------ (1) An assumed-size array may be used as an actual argument to an assumed-rank dummy. (2) The UBOUND of an assumed-rank array is defined to be LBOUND-2. The SIZE of that dimension is defined to be -1, making the SIZE of the whole thing negative unless zero-sized. (3) A dummy argument of a BIND(C) routine of type CHARACTER may have CHARACTER(LEN=*) iff (if and only if) it is assumed-rank. (4) A dummy argument of a BIND(C) routine of type CHARACTER may have CHARACTER(LEN=:) iff it is assumed-rank. (5) Any type/kind/length of actual argument is interoperable with TYPE(*), except for derived types with final procedures, type-bound procedures, or type parameters. The interoperability with otherwise non-interoperable types however is limited to copying and use as arguments (and targets). 5. Edits to 10-165r2 -------------------- [2:2] "is" -> "specifies". {EDITORIAL: It doesn't describe all the unchanged stuff from 1539-1, so although it specifies an "upwardly compatible" (whatever that means) extension, it is not in itself the extended 1539-1.} [3:10+] 2.1p2+, insert new text: "An assumed-type object is unlimited polymorphic. \begin{UTI}[TR1] Note that "unlimited polymorphic" just means that its dynamic type is not limited. It does not mean that it is CLASS(*), or can be used everywhere that CLASS(*) can be used, because we already have a rule that limits the places that TYPE(*) can be used. It might prove less confusing to use a new term e.g. "unknown polymorphic" but that probably needs even more edits. \end{UTI}" {FATAL exposition error: sorry guys, but otherwise you run slap-bang into contradictions. This is the simplest fix. It also gets us the "type compatible with everything" feature with no further work.} [5:11-12] 3.2p2, replace with "An assumed-type dummy argument shall not correspond to an actual argument that is of a derived type that has type parameters, type-bound procedures, or final procedures." {This restriction is required. Here. The restriction against type-bound and final procedures is probably unnecessary but that should be discussed.} [5:22+5+] 3.3+, insert new subclause "3.4 Intrinsic procedures 3.4.1 SHAPE The description of SHAPE in ISO/IEC 1539-1:2010 is changed for an assumed-rank array that is associated with an assumed-size array; an assumed-size array has no shape, but in this case the result has a value of [ (SIZE (ARRAY, I), I=1, RANK (ARRAY)) ] 3.4.2 SIZE The description of SIZE in ISO/IEC 1539-1:2010 is changed in the following cases: - for an assumed-rank object that is associated with an assumed-size array, the result has a value of -1 if DIM is present and equal to the rank of ARRAY, and a negative value that is equal to PRODUCT ( [ (SIZE (ARRAY, I), I=1, RANK (ARRAY)) ] ) if DIM is not present; - for an assumed-rank object that is associated with a scalar, the result has a value of 1. 3.4.3 UBOUND The description of UBOUND in ISO/IEC 1539-1:2010 is changed for an assumed-rank object that is associated with an assumed-size array; the result has a value of LBOUND (ARRAY, RANK (ARRAY)) - 2." {NOTE to the EDITOR: "-1" is written "$-1$" in LaTeX, and the minus sign in the expression immediately above is written $-$. Note also careful spacing of the expressions: if there is a bad line break you can use ~ instead of space...} {Further note: These could instead be placed in 4 by splitting 4 into 4.1 Changes to intrinsic procedures and 4.2 New intrinsic procedure, or some such, but this placement seems workable for now.} [7:1] "Intrinsic" -> "New intrinsic". {MINOR EDITORIAL: We do want to say stuff about existing intrinsic procedures elsewhere, so this is a bit misleading.} OPTIONAL? [9:33] Replace "or assumed-shape" by ", assumed-shape or assumed-size". [10:12+] 5.2.3, at the end, append new text: "If the actual argument is of type CHARACTER, or is of assumed type eventually associated with an actual argument of type CHARACTER, the member elem_len shall contain the sizeof() of a variable of character length 1 of that type and kind. The first element of member dim shall contain a lower bound of 1 with a stride equal to elem_len and upper bound equal to the character length of the actual argument; all other elements shall correspond to a dimension one less than for non-CHARACTER types. \begin{UTI}[TR2] This approach has not achieved consensus; people want the rank field to match the RANK intrinsic. The other main contenders are: (1) fold the character length into elem_len, (2) add an additional character length member. In any case more edits are required. \end{UTI} If any actual argument associated with the dummy argument is an assumed-size array, the array shall be simply contiguous, the member attribute shall be CFI_attribute_unknown_size and the member extent of the last dimension of member dim is equal to (CFI_index_t)-2." {Actually in this case, since we are providing an "attribute" flag, I don't mind making it undefined. Note casting probably not needed since we made this a signed integer type, but left for clarify.} [10:24+5+] 5.2.5, table 5.1, insert new row after CFI_attribute_pointer "CFI_attribute_unknown_size & assumed-size \\". [10:27] 5.2.5p4, Change "or a pointer" To ", a pointer, or associated with an assumed-size argument. CFI_attribute_unknown_size specifies an object that is, or is argument-associated with, an assumed-size dummy argument". [12:38+] 5.2.7p2+, insert new paragraph "A C descriptor that describes an object of type CHARACTER shall have rank $\geq$ 1, and dim[0].sm = elem_len." {The C user will be making up character stuff; require the first dimension to be contiguous so the Fortran doesn't get an unpleasant surprise.} [13:14+] 5.2.8p2, item (6), insert new subitem after item (a), "(a2) the dummy argument is a nonallocatable, nonpointer variable of type CHARACTER with assumed length, and corresponds to a formal parameter of type CFI_cdesc_t," {Allow the obvious simple use of CHARACTER(*).} [13:18+] 5.2.8p2+ insert new paragraph "If a dummy argument in an interoperable interface is of type CHARACTER and is allocatable or a pointer, its character length shall be deferred." {Note the existing rules in this subclause were SERIOUSLY broken; this repairs the character length hole. There might be other holes remaining!} 6. Outstanding issues --------------------- (1) I don't *think* we need to say this, but if we do we should say it with more specificity as to which interoperability requirements we are overriding. "Notwithstanding 15.3.2 paragraph 1, if a dummy argument of a procedure with the BIND attribute has type TYPE(*), the actual argument is considered to be interoperable. NOTE: This implies that derived types with components of any intrinsic type are also interoperable with TYPE(*) dummy arguments." (2) If we must have a "type" member, it needs a value something like "CFI_type_non_interoperable a non-interoperable data type" with some appropriate witter. But since this type member has other existing problems that are best handled by deleting it, this paper does not propose adding this entry to the table (or other such fixes) at this time. ===END===