[MPI3 Fortran] Fwd: Serious problem/bug in MPI libraries with the alignment of MPI_DOUBLE_PRECISION

Tue Sep 27 16:40:10 CDT 2011

Here is the summary of the feedback and a proposal
to resolve this inconsistency.

----- Forwarded Message -----
From: "Rolf Rabenseifner" <rabenseifner at hlrs.de>
To: "Jeff Squyres" <jsquyres at cisco.com>, "Shinji Sumimoto" <s-sumi at labs.fujitsu.com>, "Hubert Ritzdorf" <hubert.ritzdorf at emea.nec.com>, "Howard Pritchard" <howardp at cray.com>, "Brian Smith" <smithbr at us.ibm.com>, "Charles J Archer" <archerc at us.ibm.com>, "Rajeev Thakur" <thakur at mcs.anl.gov>, "Fab Tillier" <ftillier at microsoft.com>, "Bill Long" <longb at cray.com>, "Bill Gropp" <wgropp at uiuc.edu>, "Richard Graham" <rlgraham at ornl.gov>
Cc: "N.M. Maclaren" <nmm1 at cam.ac.uk>, "Iain Bason" <iain.bason at oracle.com>
Sent: Tuesday, September 27, 2011 11:38:45 PM
Subject: Re: Serious problem/bug in MPI libraries with the alignment of MPI_DOUBLE_PRECISION

Dear all,

thank you very much for the fast answers.
The result is as bad as it could be:
about 50% of the MPI libraries use SEQUENCE alignment, and the other
50% use a BIND(C)-based calculation of the alignment.

Here are the results (also attached as summary_out.txt).
Please read my proposal after this summary sheet carefully.

In all outputs:
---------------
 Size of one Fortran REAL             =            4
 Size of one Fortran DOUBLE PRECISION =            8

Results:
--------

 a / b / c = Alignments of:
 - DOUBLE PRECISION within a SEQUENCE derived type = a
 - DOUBLE PRECISION within a BIND(C)  derived type = b
 - MPI_DOUBLE_PRECISION (= k_i in MPI-2.2 p.78:45) = c
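
To make the first two numbers (a and b) concrete, here is a minimal sketch
in Python. It is only an illustration and not the attached test program.
It lays out a REAL followed by a DOUBLE PRECISION once with member alignment
capped at 4 bytes (the SEQUENCE behavior reported for Intel) and once with
natural 8-byte alignment (the BIND(C) behavior). The names align_up and
layout and the cap parameter are made up for this sketch.

```python
def align_up(offset, alignment):
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


def layout(members, cap=None):
    """Return (offsets, size) for members given as (size, alignment) pairs.

    cap, if given, limits the alignment of every member (assumption of this
    sketch: cap=4 models a SEQUENCE type with 4-byte DOUBLE PRECISION alignment).
    """
    offsets, pos, max_align = [], 0, 1
    for size, alignment in members:
        if cap is not None:
            alignment = min(alignment, cap)
        pos = align_up(pos, alignment)   # pad up to the member's alignment
        offsets.append(pos)
        pos += size
        max_align = max(max_align, alignment)
    return offsets, align_up(pos, max_align)  # trailing padding


REAL = (4, 4)     # size 4, alignment 4
DOUBLE = (8, 8)   # size 8, natural alignment 8

sequence_like = layout([REAL, DOUBLE], cap=4)
bind_c_like = layout([REAL, DOUBLE])
print(sequence_like)  # ([0, 4], 12)
print(bind_c_like)    # ([0, 8], 16)
```

In this model the DOUBLE PRECISION member sits at offset 4 in the capped
layout and at offset 8 in the natural layout, which corresponds to a = 4 and
b = 8 in the results below.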

A) The 4 / 8 / 4 group = alignment of MPI_DOUBLE_PRECISION is correct for (old-style) SEQUENCE derived types 

JeffSquyres_OpenMP_linux-x86-64-ifort.txt:    Intel     + OpenMPI  on x86-64      Results: 4 / 8 / 4
FabTillier_Intel+MicrosoftMPI_out.txt:        Intel     + MS-MPI   on ?           Results: 4 / 8 / 4

B) The 4 / 8 / 8 group = alignment of MPI_DOUBLE_PRECISION is correct for (modern) BIND(C) derived types 

Rajeev_mpich_out.ifort.x86-64.txt:            Intel     + mpich2   on x86-64      Results: 4 / 8 / 8
Rolf_asama+cray_output.txt:                   Intel     + NEC-MPI? on asama       Results: 4 / 8 / 8
BillLong_Cray_5compiler_out.txt:              Intel     + CrayMPI  on Cray XE6    Results: 4 / 8 / 8
BrianSmith-IBM_outlog-bgp-xl.txt:             Intel     + IBM-XL   on BGP         Results: 4 / 8 / 8
BrianSmith-IBM_outlog-xlf.txt:                xlf       + IBM-XL   on ?           Results: 4 / 8 / 8

C) No problems because all DOUBLE PRECISION alignments are identical = 8

BillLong_Cray_5compiler_out.txt:              PGI       + CrayMPI  on Cray XE6    Results: 8 / 8 / 8
BillLong_Cray_5compiler_out.txt:              Cray      + CrayMPI  on Cray XE6    Results: 8 / 8 / 8
BillLong_Cray_5compiler_out.txt:              gfortran  + CrayMPI  on Cray XE6    Results: 8 / 8 / 8
BillLong_Cray_5compiler_out.txt:              pathscale + CrayMPI  on Cray XE6    Results: 8 / 8 / 8
BrianSmith-IBM_outlog-gcc.txt:                gfortran  + IBM-XL   on BGQ         Results: 8 / 8 / 8
JeffSquyres_OpenMP_linux-x86-64-gfortran.txt: gfortran  + OpenMPI  on x86-64      Results: 8 / 8 / 8
JeffSquyres_OpenMP_osx-x86-84-gfortran.txt:   gfortran  + OpenMPI  on osx-x86-64  Results: 8 / 8 / 8
Rajeev_gnu_laptop_out.txt:                    gfortran  + mpich2   on laptop      Results: 8 / 8 / 8
Rajeev_mpich_out.gfortran.x86-64.txt:         gfortran  + mpich2   on x86-64      Results: 8 / 8 / 8

D) No problems because all DOUBLE PRECISION alignments are identical = 4

Rajeev_mpich_out.gfortran.ia32.txt:           gfortran  + mpich2   on ia32        Results: 4 / 4 / 4
Rajeev_mpich_out.ifort.ia32.txt:              Intel     + mpich2   on ia32        Results: 4 / 4 / 4

Proposal:
---------

The goal of a standard is to make it possible to write portable code.
Therefore the answer should be standardized and not implementation-specific;
in other words, leaving it undefined is the worst solution.

I expect that only a few Fortran codes use arrays of derived types.
The oldest library is mpich.
Therefore I would propose to standardize that Fortran alignments are based
on BIND(C), which should give the same alignment as in C.
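
The reason arrays of derived types are the critical case is that the MPI
extent of a datatype is the span of its typemap rounded up to the largest
alignment k_i (MPI-2.2 p.78). The following is a rough sketch of that rounding
rule in Python, not code from any MPI library; the function name mpi_extent and
the tuple format of the typemap are assumptions made for this illustration.

```python
def mpi_extent(typemap, k):
    """Extent of a typemap, rounded up to the largest alignment k_i.

    typemap: list of (displacement, size, type_name) entries
    k:       dict mapping type_name to its alignment k_i in bytes
    """
    lb = min(disp for disp, _, _ in typemap)
    ub = max(disp + size for disp, size, _ in typemap)
    max_k = max(k[name] for _, _, name in typemap)
    return -(-(ub - lb) // max_k) * max_k  # ceiling division, then scale


# SEQUENCE type (REAL, DOUBLE PRECISION) in the 4-byte layout: array stride 12
seq_type = [(0, 4, "REAL"), (4, 8, "DOUBLE")]
# BIND(C) type (DOUBLE PRECISION, REAL): C pads the struct to 16 bytes
bindc_type = [(0, 8, "DOUBLE"), (8, 4, "REAL")]

k_c4 = {"REAL": 4, "DOUBLE": 4}  # c = 4 (group A)
k_c8 = {"REAL": 4, "DOUBLE": 8}  # c = 8 (group B)

print(mpi_extent(seq_type, k_c4), mpi_extent(seq_type, k_c8))      # 12 16
print(mpi_extent(bindc_type, k_c4), mpi_extent(bindc_type, k_c8))  # 12 16
```

Under these assumptions, c = 4 yields an extent that matches the 12-byte
SEQUENCE stride but not the 16-byte BIND(C) struct, and c = 8 does the
opposite. With a count greater than one, the second and later array elements
are then addressed at the wrong displacement for the mismatched layout.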

This would imply that IBM-XL, mpich2, and its derivatives need no change.
But it also implies that the younger OpenMPI and Microsoft-MPI
need a change. These implementations may add an environment switch
that keeps the old behavior when it is set,
but I would expect that there is nearly no user
who needs or wants to use this switch.

Fab and Jeff, could you live with this proposal?

Best regards
Rolf

PS: Additional remarks:
    Yes, some compilers detect that DOUBLE PRECISION on a 4-byte alignment is
    inefficient, but this is only an informational message.
    Other compilers detect that we use DOUBLE PRECISION in BIND(C) derived types
    instead of the C double through the new C interoperability types in Fortran
    (e.g. REAL(C_DOUBLE)).
    This is also okay, and all compilers work as expected: they use the C alignment.

> Rolf Rabenseifner wrote on Tue, 27 Sep 2011 at 09:14:33
> 
> > In my output from Cray, the compilers
> >  - gnu
> >  - pgi
> >  - cray
> > all use 8 byte double alignment in C and Fortran.
> > ==> no problem with those compilers on the tested platform.
> > There may be differences between IA64 and 32-bit architectures
> > for the same compiler.
> >
> > It is important to check MPI with those compilers that use
> > 4-byte alignment for Fortran DOUBLE PRECISION (e.g. Intel).
> >
> > Best regards
> > Rolf
> >

> >> On Sep 27, 2011, at 10:21 AM, Rolf Rabenseifner wrote:
> >>
> >>> Dear all, (now with attachment)
> >>>
> >>> as far as I know, you represent some MPI libraries that are
> >>> not directly based on another one in the list:
> >>> - mpich2: Rajeev
> >>> - OpenMPI: Jeff
> >>> - IBM: Rich Treumann
> >>> - NEC: Hubert
> >>> - Fujitsu: Shinji Sumimoto
> >>> - Microsoft: Fab
> >>> (Which independent library is missing from this list?)
> >>> (Who on the list already uses mpich2 or OpenMPI as a base
> >>> or does not provide a Fortran binding,
> >>> and can therefore be removed from this list?)
> >>>
> >>> The problem is simple and in the end comes down to only three numbers:
> >>>
> >>>  Alignments of:
> >>>   - DOUBLE PRECISION within a SEQUENCE derived type = 4 (e.g. with Intel)
> >>>   - DOUBLE PRECISION within a BIND(C) derived type = 8
> >>>   - MPI_DOUBLE_PRECISION (= k_i in MPI-2.2 p.78:45) = 4 or 8
> >>>
> >>> The details:
> >>> What is the alignment of Fortran DOUBLE PRECISION according to
> >>> k_i in MPI-2.2, page 78 line 45 and page 96 line 42?
> >>>
> >>> Since 1977 (Fortran 77) for Fortran COMMON blocks and since 1990
> >>> (Fortran 90) for Fortran SEQUENCE derived types (which have the same
> >>> memory layout according to the SEQUENCE rules), the MPI alignment and
> >>> the Fortran alignment in such constructs should be the same; e.g., the
> >>> Intel compiler has alignment 4 !!!
> >>>
> >>> This may be significantly different from C, where Intel uses
> >>> alignment 8!
> >>>
> >>> With Fortran 2003, we got the Fortran BIND(C) derived types.
> >>> Here, a Fortran DOUBLE PRECISION has alignment 8
> >>> with Intel's compiler.
> >>>
> >>> As far as I understand, mpich2 and Cray produce the following results
> >>> with an Intel compiler:
> >>>
> >>> Alignments of:
> >>> - DOUBLE PRECISION within a SEQUENCE derived type = 4
> >>> - DOUBLE PRECISION within a BIND(C) derived type = 8
> >>> - MPI_DOUBLE_PRECISION (= k_i in MPI-2.2 p.78:45) = 8
> >>>
> >>> This is definitely wrong compared to MPI-1.1 and MPI-2.0
> >>> but may be helpful for MPI-3.0 if we want to switch
> >>> to a definition that is based on BIND(C).
> >>>
> >>> Question:
> >>> What is the output of the attached test program when running with
> >>> - your MPI library
> >>> - and your compiler
> >>> - and with Intel's compiler
> >>>
> >>> Could you please send me the output of such mpirun runs?
> >>>
> >>> Thanks in advance and best regards
> >>> Rolf
> >>>
> >>> PS: We need not care about 8/8/8.
> >>>    Only 4/8/x causes problems:
> >>>    If all of those have x=8, then I'll propose to
> >>>    adapt MPI-3.0 to reality.
> >>>    If we have some implementations with 4/8/8 and
> >>>    others with 4/8/4 then we have a serious problem
> >>>    because only one answer can be correct,
> >>>    and the question would be whether the historical
> >>>    answer (x=4) or the modern answer (x=8)
> >>>    should be chosen.

-- 
Dr. Rolf Rabenseifner . . . . . . . . . .. email rabenseifner at hlrs.de
High Performance Computing Center (HLRS) . phone ++49(0)711/685-65530
University of Stuttgart . . . . . . . . .. fax ++49(0)711 / 685-65832
Head of Dpmt Parallel Computing . . . www.hlrs.de/people/rabenseifner
Nobelstr. 19, D-70550 Stuttgart, Germany . (Office: Allmandring 30)

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: summary_out.txt
URL: <http://lists.mpi-forum.org/pipermail/mpiwg-fortran/attachments/20110927/4650cf62/attachment-0001.txt>