[Mpiwg-large-counts] Large Count - the principles for counts, sizes, and byte and nonbyte displacements
Rolf Rabenseifner
rabenseifner at hlrs.de
Tue Oct 29 04:19:30 CDT 2019
Dear Jim and all,
I'm not sure whether I'm really able to understand your email.
I take the MPI view:
(1) An absolute address can be stored in an MPI_Aint variable
with, and only with, MPI_Get_address or MPI_Aint_add.
(2) A positive or negative number of bytes, or a relative address,
which is by definition the number of bytes between two locations
in an MPI "sequential storage" (MPI-3.1 page 115),
can be assigned by any method to an MPI_Aint variable
as long as the original value fits into MPI_Aint.
In both languages, automatic type conversion (i.e., sign extension)
is done.
(3) If users misuse MPI_Aint by storing anything else in an MPI_Aint
variable, then this is out of the scope of MPI.
If such values are used in a minus operation, then it is
out of the scope of MPI whether this makes sense.
If the user is sure that the new value falls into category (2),
then all is fine as long as the user is correct.
I assume that your => does not mean "greater than or equal to".
I assume that you meant assignments.
> intptr_t => MPI_Aint
"intptr_t: integer type capable of holding a pointer."
> uintptr_t => ??? (Anyone remember the MPI_Auint "golden Aint" proposal?)
"uintptr_t: unsigned integer type capable of holding a pointer."
may fall exactly into (3) when used for pointers.
Especially on a 64-bit system, the user may in the future have exactly
the problems (a), (a1), (a2) and (b) as described below.
But here the user is responsible for implementing, for example, (a3),
whereas for MPI_Get_address the implementors of the MPI library
are responsible, and the MPI Forum may be responsible for giving
the correct advice.
By the way, the golden MPI_Auint was never golden.
Such need was "resolved" by introducing MPI_Aint_diff and MPI_Aint_add
in MPI-3.1.
> ptrdiff_t => MPI_Aint
"std::ptrdiff_t is the signed integer type of the result of subtracting two pointers."
may fit perfectly into (2).
All of the following falls into category (2):
> size_t (sizeof) => MPI_Count, int
"sizeof( type ) (1)
sizeof expression (2)
Both versions are constant expressions of type std::size_t."
> size_t (offsetof) => MPI_Aint, int
"Defined in header <cstddef>
#define offsetof(type, member) /*implementation-defined*/
The macro offsetof expands to an integral constant expression
of type std::size_t, the value of which is the offset, in bytes,
from the beginning of an object of specified type to its
specified member, including padding if any."
Note that this offsetof has nothing to do with MPI_Offset.
On a system with less than 2**31 bytes and a 4-byte int, it is guaranteed
that size_t => int works.
On a system with less than 2**63 bytes and an 8-byte MPI_Aint, it is guaranteed
that size_t => MPI_Aint works.
Problem: size_t is unsigned, int and MPI_Aint are signed.
MPI_Count should be defined in such a way that, on systems with more than
2**63 bytes of disk space, MPI_Count can hold such values,
because
int .LE. {MPI_Aint, MPI_Offset} .LE. MPI_Count
Therefore size_t => MPI_Count should always work.
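The guarantees above for size_t => int and size_t => MPI_Aint can be made explicit with a range check before the assignment. The following is only a minimal sketch in plain C without MPI: the helper names size_fits_int, size_fits_aint64 and size_to_int are hypothetical, and int64_t is used as a stand-in for an 8-byte MPI_Aint (an assumption, since MPI does not fix the width of MPI_Aint).

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: 1 if the unsigned size can be assigned to an int
   without changing its value, 0 otherwise. */
static int size_fits_int(size_t n)
{
    return n <= (size_t)INT_MAX;
}

/* Hypothetical helper: the same check for an 8-byte signed integer, used
   here as a stand-in for MPI_Aint (assumption: MPI_Aint has 64 bits). */
static int size_fits_aint64(size_t n)
{
    return (uintmax_t)n <= (uintmax_t)INT64_MAX;
}

/* Hypothetical helper: checked conversion size_t => int. */
static int size_to_int(size_t n)
{
    assert(size_fits_int(n));   /* the value must survive the assignment */
    return (int)n;
}
```

For example, size_to_int(sizeof(double)) is always safe, whereas a size of INT_MAX + 1 bytes is rejected by size_fits_int.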
> ssize_t => Mostly for error handling. Out of scope for MPI?
"In short, ssize_t is the same as size_t, but is a signed type -
read ssize_t as “signed size_t”. ssize_t is able to represent
the number -1, which is returned by several system calls
and library functions as a way to indicate error.
For example, the read and write system calls: ...
ssize_t read(int fildes, void *buf, size_t nbyte); ..."
ssize_t therefore fits better to MPI_Aint, because both
are signed types that can hold byte counts, but
the value -1 in an MPI_Aint variable stands for a
byte displacement of -1 bytes and not for an error code -1.
All use of (2) is in principle no problem.
------------------------------------------
All the complex discussion of the last days is about (1):
(1) An absolute address can be stored in an MPI_Aint variable
with, and only with, MPI_Get_address or MPI_Aint_add.
In MPI-1 to MPI-3.0 and still in MPI-3.1 (here marked as possibly not portable),
we also allow
MPI_Aint variable := absolute address in MPI_Aint variable
+ or -
a number of bytes (in any integer type).
The result is then still in category (1).
For the difference of two absolute addresses,
MPI_Aint_diff can be used. The result is then an MPI_Aint of category (2).
In MPI-1 to MPI-3.0 and still in MPI-3.1 (here marked as possibly not portable),
we also allow
MPI_Aint variable := absolute address in MPI_Aint variable
- absolute address in MPI_Aint variable.
The result is then in category (2).
The problems we have discussed over the last days are about systems
that internally use unsigned addresses, where the MPI library stores
these addresses in MPI_Aint variables and
(a) a sequential storage can have virtual addresses that
are both in the area with highest bit =0 and other addresses
in the same sequential storage (i.e., same array or structure)
with highest bit =1.
or
(b) some higher bits contain segment addresses.
(b) is not a problem as long as a sequential storage always resides
within one segment.
Therefore, we only have to discuss (a).
The two problems that we have are
(a1) that for the minus operations an integer overflow will
happen and must be ignored.
(a2) if such addresses are widened to larger variables,
e.g., MPI_Count with more bits in MPI_Count than in MPI_Aint,
sign extension will result in completely wrong results.
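Problems (a1) and (a2) can be illustrated on a small scale. The sketch below is only an illustration under stated assumptions: a 32-bit signed type small_aint plays the role of MPI_Aint, a 64-bit signed type wide_count plays the role of a wider MPI_Count, and the names store_raw and widen are hypothetical. Two addresses that are only 32 bytes apart, but lie on both sides of the highest-bit boundary, end up about 2**32 apart after widening.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins (assumptions): a 32-bit signed "address integer" in the role
   of MPI_Aint and a 64-bit signed integer in the role of MPI_Count. */
typedef int32_t small_aint;
typedef int64_t wide_count;

/* Store an unsigned address bit for bit in the signed type: addresses
   with highest bit = 1 become negative values. */
static small_aint store_raw(uint32_t addr)
{
    int64_t v = (int64_t)addr;
    if (v > INT32_MAX)
        v -= INT64_C(4294967296);            /* subtract 2**32 */
    assert(v >= INT32_MIN && v <= INT32_MAX);
    return (small_aint)v;
}

/* Widening to the larger type performs sign extension. */
static wide_count widen(small_aint a)
{
    return (wide_count)a;
}
```

With lo = 0x7FFFFFF0 and hi = 0x80000010, the true distance is 32 bytes, but widen(store_raw(hi)) - widen(store_raw(lo)) yields -4294967264 (a2), and the same subtraction in the 32-bit type would overflow (a1).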
And here, the simplest trick is
(a3) that MPI_Get_address really shall
map the contiguous unsigned range from 0 to 2**64-1 to the
signed (and also contiguous) range from -2**63 to 2**63-1
by simply subtracting 2**63.
With this simple trick in MPI_Get_address, problems
(a1) and (a2) are resolved.
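The mapping in (a3) can be sketched in a few lines of plain C. This is not how any particular MPI library implements MPI_Get_address; the names map_address and mapped_diff are hypothetical, and int64_t is assumed as a stand-in for a 64-bit MPI_Aint.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch of (a3): map an unsigned 64-bit address in
   0 .. 2**64-1 to the signed range -2**63 .. 2**63-1 by subtracting
   2**63, written so that no intermediate step overflows. */
static int64_t map_address(uint64_t addr)
{
    const uint64_t half = UINT64_C(1) << 63;     /* 2**63 */
    if (addr >= half)
        return (int64_t)(addr - half);           /* 0 .. 2**63-1 */
    return (int64_t)addr - INT64_MAX - 1;        /* -2**63 .. -1 */
}

/* Difference of two mapped addresses with an explicit overflow check. */
static int64_t mapped_diff(int64_t a, int64_t b)
{
    assert(b >= 0 ? a >= INT64_MIN + b : a <= INT64_MAX + b);
    return a - b;
}
```

Because the mapping preserves the order of the addresses, two locations in the same sequential storage that lie on both sides of the highest-bit boundary, e.g. 0x7FFFFFFFFFFFFFF0 and 0x8000000000000010, map to -16 and 16. Their difference is 32 without any overflow, and widening the mapped values to a larger signed type keeps that difference.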
It looks as if (a) and therefore (a1) and (a2)
may be far in the future.
But they may be less far in the future if a system
maps the whole application's cluster address space
into virtual memory (not cache coherent, but accessible).
None of this is written, or it is only partially written, in the
MPI Standard, although all of it is (well) known to the MPI Forum,
with the following exceptions:
- (a2) is new.
- (a1) is solved in MPI-3.1 only for MPI_Aint_diff and
MPI_Aint_add, but not for the operators - and +
if a user switches on integer overflow detection
in the future when we have such large systems.
- (a3) is new and in principle solves the problem also
for the + and - operators.
At least (a1)+(a2) should be added as rationale to MPI-4.0
and (a3) as advice to implementors within the framework
of big count, because (a2) newly arises with big count.
I hope this helps a bit if you took the time to read
this long email.
Best regards
Rolf
----- Original Message -----
> From: "mpiwg-large-counts" <mpiwg-large-counts at lists.mpi-forum.org>
> To: "mpiwg-large-counts" <mpiwg-large-counts at lists.mpi-forum.org>
> Cc: "Jim Dinan" <james.dinan at gmail.com>, "James Dinan" <james.dinan at intel.com>
> Sent: Monday, October 28, 2019 5:07:37 PM
> Subject: Re: [Mpiwg-large-counts] Large Count - the principles for counts, sizes, and byte and nonbyte displacements
> Still not sure I see the issue. MPI's memory-related integers should map to
> types that serve the same function in C. If the base language is broken for
> segmented addressing, we won't be able to fix it in a library. Looking at the
> mapping below, I don't see where we would have broken it:
>
> intptr_t => MPI_Aint
> uintptr_t => ??? (Anyone remember the MPI_Auint "golden Aint" proposal?)
> ptrdiff_t => MPI_Aint
> size_t (sizeof) => MPI_Count, int
> size_t (offsetof) => MPI_Aint, int
> ssize_t => Mostly for error handling. Out of scope for MPI?
>
> It sounds like there are some places where we used MPI_Aint in place of size_t
> for sizes. Not great, but MPI_Aint already needs to be at least as large as
> size_t, so this seems benign.
>
> ~Jim.
>
> On Fri, Oct 25, 2019 at 8:25 PM Dinan, James via mpiwg-large-counts
> <mpiwg-large-counts at lists.mpi-forum.org> wrote:
>
> Jeff, thanks so much for opening up these old wounds. I’m not sure I have enough
> context to contribute to the discussion. Where can I read up on the issue with
> MPI_Aint?
>
>
>
> I’m glad to hear that C signed integers will finally have a well-defined
> representation.
>
>
>
> ~Jim.
>
>
>
>
> From: Jeff Hammond <jeff.science at gmail.com>
> Date: Thursday, October 24, 2019 at 7:03 PM
> To: "Jeff Squyres (jsquyres)" <jsquyres at cisco.com>
> Cc: MPI BigCount Working Group <mpiwg-large-counts at lists.mpi-forum.org>,
> "Dinan, James" <james.dinan at intel.com>
> Subject: Re: [Mpiwg-large-counts] Large Count - the principles for counts,
> sizes, and byte and nonbyte displacements
>
>
>
>
>
> Jim (cc) suffered the most in MPI 3.0 days because of AINT_DIFF and AINT_SUM, so
> maybe he wants to create this ticket.
>
>
>
>
>
> Jeff
>
>
>
>
>
> On Thu, Oct 24, 2019 at 2:41 PM Jeff Squyres (jsquyres)
> <jsquyres at cisco.com> wrote:
>
>
>
>
>
> Not opposed to ditching segmented addressing at all. We'd need a ticket for this
> ASAP, though.
>
>
>
>
>
> This whole conversation is predicated on:
>
>
>
>
>
> - MPI supposedly supports segmented addressing
>
>
> - MPI_Aint is not sufficient for modern segmented addressing (i.e., representing
> an address that may not be in main RAM and is not mapped in to the current
> process' linear address space)
>
>
>
>
>
> If we no longer care about segmented addressing, that makes a whole bunch of
> BigCount stuff a LOT easier. E.g., MPI_Aint can basically be a
> non-segment-supporting address integer. AINT_DIFF and AINT_SUM can go away,
> too.
>
> On Oct 24, 2019, at 5:35 PM, Jeff Hammond via mpiwg-large-counts
> <mpiwg-large-counts at lists.mpi-forum.org> wrote:
>
>
>
>
>
> Rolf:
>
>
>
> Before anybody spends any time analyzing how we handle segmented addressing, I
> want you to provide an example of a platform where this is relevant. What
> system can you boot today that needs this and what MPI libraries have expressed
> an interest in supporting it?
>
>
>
>
>
> For anyone who didn't hear, ISO C and C++ have finally committed to
> twos-complement integers (
> http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0907r1.html ,
> http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2218.htm ) because modern
> programmers should not be limited by hardware designs from the 1960s. We should
> similarly not waste our time on obsolete features like segmentation.
>
>
>
>
>
> Jeff
>
>
>
>
>
> On Thu, Oct 24, 2019 at 10:13 AM Rolf Rabenseifner via mpiwg-large-counts
> <mpiwg-large-counts at lists.mpi-forum.org> wrote:
>
>
>
>
>> I think that changes the conversation entirely, right?
>
> Not the first part, the state-of-current-MPI.
>
> It may change something for the future, or a new interface may be needed.
>
> Please, can you describe how MPI_Get_address can work with the
> different variables from different memory segments.
>
> Or whether a completely new function or a set of functions is needed.
>
> If we can still express variables from all memory segments as
> input to MPI_Get_address, there may still be a way to flatten
> the result of some internal address-inquiry into a flattened
> signed integer with the same behavior as MPI_Aint today.
>
> If this is impossible, then a new way of thinking and a new solution
> may be needed.
>
> I really want to see examples for all current stuff as you
> mentioned in your last email.
>
> Best regards
> Rolf
>
> ----- Original Message -----
>> From: "Jeff Squyres" <jsquyres at cisco.com>
>> To: "Rolf Rabenseifner" <rabenseifner at hlrs.de>
>> Cc: "mpiwg-large-counts" <mpiwg-large-counts at lists.mpi-forum.org>
>> Sent: Thursday, October 24, 2019 5:27:31 PM
>> Subject: Re: [Mpiwg-large-counts] Large Count - the principles for counts,
>> sizes, and byte and nonbyte displacements
>
>> On Oct 24, 2019, at 11:15 AM, Rolf Rabenseifner
>> <rabenseifner at hlrs.de> wrote:
>>
>> To me, it looked as if there was some misunderstanding
>> of the concept that absolute and relative addresses
>> and numbers of bytes can be stored in MPI_Aint.
>>
>> ...with the caveat that MPI_Aint -- as it is right now -- does not support
>> modern segmented memory systems (i.e., where you need more than a small number
>> of bits to indicate the segment where the memory lives).
>>
>> I think that changes the conversation entirely, right?
>>
>> --
>> Jeff Squyres
>> jsquyres at cisco.com
>
> --
> Dr. Rolf Rabenseifner . . . . . . . . . .. email rabenseifner at hlrs.de .
> High Performance Computing Center (HLRS) . phone ++49(0)711/685-65530 .
> University of Stuttgart . . . . . . . . .. fax ++49(0)711 / 685-65832 .
> Head of Dpmt Parallel Computing . . . www.hlrs.de/people/rabenseifner .
> Nobelstr. 19, D-70550 Stuttgart, Germany . . . . (Office: Room 1.307) .
> _______________________________________________
> mpiwg-large-counts mailing list
> mpiwg-large-counts at lists.mpi-forum.org
> https://lists.mpi-forum.org/mailman/listinfo/mpiwg-large-counts
>
> --
>
>
> Jeff Hammond
> jeff.science at gmail.com
> http://jeffhammond.github.io/
>
>
>
>
>
>
>
>
> --
> Jeff Squyres
> jsquyres at cisco.com
>
> --
>
>
> Jeff Hammond
> jeff.science at gmail.com
> http://jeffhammond.github.io/
>
--
Dr. Rolf Rabenseifner . . . . . . . . . .. email rabenseifner at hlrs.de .
High Performance Computing Center (HLRS) . phone ++49(0)711/685-65530 .
University of Stuttgart . . . . . . . . .. fax ++49(0)711 / 685-65832 .
Head of Dpmt Parallel Computing . . . www.hlrs.de/people/rabenseifner .
Nobelstr. 19, D-70550 Stuttgart, Germany . . . . (Office: Room 1.307) .