[MPIWG Fortran] MPI-3 ticket 349: Fortran question

Fri Dec 20 10:57:39 CST 2013

[The previous version of this was held for moderation, so I subscribed and disabled email delivery.  If you want me to participate in this thread, please keep me CCed.]

On Dec 20, 2013, at 8:22 AM, Jim Dinan <james.dinan at gmail.com> wrote:

> Jeff,
> 
> I don't think that anyone is making an argument that is specific to 32b
> architectures.  The concern is that the MPI standard currently requires:
> 
> (1) a two's complement architecture where overflow/underflow of signed
> arithmetic on an unsigned quantity always results in the right unsigned
> value

I think the requirement for two's complement hardware is probably not a major problem, since sign-and-magnitude and one's complement machines seem unlikely to make a resurgence outside of some weird DSPs.  Although the C standard permits all three representations, this aspect is largely academic and not a real issue for the MPI community.

The larger issue is that the C standard explicitly leaves the behavior of signed integer overflow undefined.  There are two reasons this is a real concern:

A) Compilers really do perform optimizations related to signed integer overflow: http://www.airs.com/blog/archives/120

B) Some platforms/compilers can/will trap on signed integer overflow, e.g., GCC with "-ftrapv".
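
To make both points concrete, here is a minimal sketch in C.  The helper name add_overflows_i64 is hypothetical (it is not part of MPI or of any library); it shows one way to test for overflow before performing a signed addition, so that the undefined operation is never executed.

```c
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only.
 * Returns 1 if a + b would overflow int64_t, 0 otherwise.
 * The check uses only comparisons and subtractions that cannot
 * themselves overflow, so no undefined behavior is ever executed.
 *
 * Contrast with the tempting test "a + b < a" (for b > 0): because
 * signed overflow is undefined, a compiler may assume it never
 * happens and fold that test to "false" (point A), and a build with
 * GCC's -ftrapv may abort on the addition itself (point B).
 */
static int add_overflows_i64(int64_t a, int64_t b)
{
    if (b > 0 && a > INT64_MAX - b)
        return 1;   /* result would exceed INT64_MAX */
    if (b < 0 && a < INT64_MIN - b)
        return 1;   /* result would fall below INT64_MIN */
    return 0;
}
```

A caller would perform a + b only when add_overflows_i64(a, b) returns 0.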

> - OR -
> (2) an MPI_Aint type that is 2x the size of an address to ensure that
> overflow/underflow never occurs (because MPI_BOTTOM is an arbitrary address
> in Fortran)
> 
> Given that every system I know of falls into category (1) and there is a
> solution (2) for all other systems, I am now trying to recall why we felt
> that this problem needs to be solved through new arithmetic routines.  I
> believe there was a concern that 128-bit integers are not standard and not
> well supported by most languages/architectures, so (2) may not be a
> workable solution.  Maybe Dave G [added to cc] remembers our rationale?

Systems don't seem to have easily-used, portable language support for >64-bit integer math right now.  Even in C11 the largest guaranteed minimum width of any standard integer type is only 64 bits; anything wider is an extension or an implementation's choice to define "long long" as larger.  I just checked a 64-bit x86_64 Linux system here at Cisco and couldn't find an obvious 128-bit signed integer type in a couple of minutes of looking.

Also, though the MPI_BOTTOM issue is annoying, it's irrelevant.  There's nothing to prevent an operating system from giving you an address with the most significant bit set, so you have the problem even if MPI_BOTTOM==0 everywhere.  Indeed, I just saw a bug report fly by the other day with such an address (0xffffffff7d3ca461) on Solaris: http://www.open-mpi.org/community/lists/users/2013/12/23149.php
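
As a rough illustration of the high-bit point, the sketch below treats the address from that bug report as a plain 64-bit unsigned value.  The helper names top_bit_set and addr_offset are hypothetical, and the assumption that an address fits in uint64_t is mine; this is not how any MPI implementation is known to do it.  It only shows that such an address is negative when viewed as a signed 64-bit integer, and that offsetting it in unsigned arithmetic stays well defined.

```c
#include <stdint.h>

/* Hypothetical helper: nonzero if the most significant bit of a
 * 64-bit address value is set, i.e. the value would be negative
 * if reinterpreted as a signed 64-bit integer. */
static int top_bit_set(uint64_t addr)
{
    return (addr >> 63) != 0;
}

/* Hypothetical helper: apply a signed displacement to an address
 * value using unsigned arithmetic.  Converting disp to uint64_t is
 * defined as reduction modulo 2^64, and unsigned addition wraps
 * modulo 2^64, so nothing here is undefined even near the top of
 * the address range. */
static uint64_t addr_offset(uint64_t base, int64_t disp)
{
    return base + (uint64_t)disp;
}
```

For example, top_bit_set(UINT64_C(0xffffffff7d3ca461)) is nonzero, and addr_offset applied to that value with a displacement of 0x100 gives 0xffffffff7d3ca561.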

So I agree with Jim that (2) is not a workable solution.

-Dave