[Mpiwg-large-counts] Large Count - the principles for counts, sizes, and byte and nonbyte displacements

Rolf Rabenseifner rabenseifner at hlrs.de
Fri Oct 25 07:20:22 CDT 2019


Dear Jeff,

> If we no longer care about segmented addressing, that makes a whole bunch of
> BigCount stuff a LOT easier. E.g., MPI_Aint can basically be a
> non-segment-supporting address integer. 

> AINT_DIFF and AINT_SUM can go away, too.

Both statements are -- in my opinion -- incorrect.
And the real problem is really ugly, see below.

After we seem to agree that MPI_Aint is used as it is used, i.e.,
to currently store
 - absolute addresses (which means the bits of a 64-bit unsigned address
   interpreted as a signed two's-complement 64-bit integer,
   i.e., values between -2**63 and +2**63-1;
   only here is there a discussion about whether some higher bits may
   be used to address segments)
 - relative addresses between -2**63 and +2**63-1
 - byte counts between 0 and 2**63-1

And that for two absolute addresses within the same "sequential storage"
(defined in MPI-3.1 Sect. 4.1.12, page 115, lines 17-19), it is allowed
to use the minus operator (as long as integer overflow detection is
switched off) or MPI_Aint_diff.

In principle, the MPI standard is not fully consistent with that:

MPI-3.1 page 102, lines 45-46 says:
 "To ensure portability, arithmetic on MPI addresses must
  be performed using the MPI_AINT_ADD and MPI_AINT_DIFF functions."
and 
> > ... MPI-3.1 2.5.6 "Absolute
> > Addresses and Relative Address Displacements" p16:39-43:
> > 
> > "For retrieving absolute addresses or any calculation with absolute addresses, one
> > should
> > use the routines and functions provided in Section 4.1.5. Section
> > 4.1.12 provides additional rules for the correct use of absolute addresses. For
> > expressions with relative displacements or other usage without absolute
> > addresses, intrinsic operators (e.g., +, -, *) can be used."
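
As a minimal sketch of what these rules mean in practice (the buffer
name and layout are made up for illustration):

  #include <mpi.h>

  void aint_arithmetic_sketch(void)
  {
      double buf[100];              /* example buffer */
      MPI_Aint base, elem9, disp, addr9;

      MPI_Get_address(&buf[0], &base);
      MPI_Get_address(&buf[9], &elem9);

      /* portable: difference of two absolute addresses within the
         same sequential storage */
      disp = MPI_Aint_diff(elem9, base);   /* 9*sizeof(double) */

      /* portable: absolute address plus relative displacement */
      addr9 = MPI_Aint_add(base, disp);    /* address of buf[9] again */

      /* NOT guaranteed portable: "elem9 - base" with the intrinsic
         "-" operator, although it works on flat-address systems */
      (void)addr9;
  }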

And now about large counts, especially if we want to extend routines that
currently use MPI_Aint to something larger, i.e., a (hypothetical)
MPI_Aint_x or MPI_Count.

Here, the major problem is the automatic cast within an assignment:

  MPI_Aint  addr;
  MPI_Count addr_x;   /* or a hypothetical wider MPI_Aint_x */

  MPI_Get_address(/* ... */, &addr);
  addr_x = addr;      /* ***this statement is the problem*** */

Let's take my example from a previous email (using an 8-bit MPI_Aint):

  addr1 = 01111111 =  127 (signed int) = 127 (unsigned int)
  addr2 = 10000001 = -127 (signed int) = 129 (unsigned int)

Internally, the addresses are viewed by the hardware and OS as unsigned.
MPI_Aint interprets the same bits as a signed int.

  addr2 - addr1 = 129 - 127 = 2 (as unsigned int)
but in a real application code with the "-" operator:
                = -127 - 127 = -254
  --> signed integer overflow, because 8 bits can express only -128 .. +127
  --> detected, or automatically corrected with +256 --> -254 + 256 = 2

And now with a 12-bit MPI_Aint_x:

  addr1_x := addr1  results in (by sign propagation)
  addr1_x = 000001111111 =  127 (signed int) =  127 (unsigned int)

  addr2_x := addr2  results in (by sign propagation)
  addr2_x = 111110000001 = -127 (signed int) = 3969 (unsigned int)

and then
  addr2_x - addr1_x = -127 - 127 = -254,
which is a normal integer within 12 bits,
and therefore ***NO*** overflow correction takes place!

And therefore a completely ***wrong*** result: -254 instead of the
correct byte distance 2.
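
The same effect can be demonstrated in C. A minimal sketch, using
int8_t as a stand-in for the 8-bit MPI_Aint and int16_t as a stand-in
for the wider type (C has no 12-bit integer, but the effect is the same):

  #include <stdio.h>
  #include <stdint.h>

  int main(void)
  {
      int8_t addr1 = (int8_t)0x7F;  /* 01111111 =  127 signed, 127 unsigned */
      int8_t addr2 = (int8_t)0x81;  /* 10000001 = -127 signed, 129 unsigned */

      /* Subtracting in the narrow type: -254 does not fit into 8 bits,
         so converting the result back wraps modulo 256 (on the usual
         two's-complement hardware) and yields the correct distance 2. */
      int8_t diff8 = (int8_t)(addr2 - addr1);
      printf("8-bit difference:   %d\n", diff8);               /* 2 */

      /* Widening first (sign propagation), then subtracting in the
         wider type: -254 fits, no wrap-around occurs, and the wrong
         value survives. */
      int16_t addr1_x = addr1;   /* sign-extended to  127 */
      int16_t addr2_x = addr2;   /* sign-extended to -127 */
      printf("widened difference: %d\n", addr2_x - addr1_x);   /* -254 */

      return 0;
  }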

Using two different types for absolute addresses seems to be a 
real problem in my opinion.


And of course, a signed 64-bit MPI_Aint allows one to specify only
2**63-1 bytes, i.e., 8 * 1024**6 bytes, which is only 8 exbibytes
(about 9.2 * 10**18 bytes).

On systems with less than 8 exbibytes per MPI process, this is not
a problem for message passing, but it is a problem for I/O,
and therefore for derived datatypes.
And derived datatypes use MPI_Aint at several locations,
and some of them with the possibility of providing absolute addresses.
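
For illustration, a minimal sketch of one such location: the displacement
array of MPI_Type_create_hindexed is typed MPI_Aint, and when absolute
addresses are used as displacements, the communication buffer is
MPI_BOTTOM (the buffers a and b here are made up):

  #include <mpi.h>

  void hindexed_sketch(double *a, double *b)
  {
      int          blocklens[2] = { 1, 1 };
      MPI_Aint     disps[2];
      MPI_Datatype dtype;

      /* absolute addresses as displacements: limited to what a
         signed 64-bit MPI_Aint can represent */
      MPI_Get_address(a, &disps[0]);
      MPI_Get_address(b, &disps[1]);

      MPI_Type_create_hindexed(2, blocklens, disps, MPI_DOUBLE, &dtype);
      MPI_Type_commit(&dtype);

      /* with absolute addresses, the buffer argument is MPI_BOTTOM */
      MPI_Send(MPI_BOTTOM, 1, dtype, /*dest=*/0, /*tag=*/0, MPI_COMM_WORLD);

      MPI_Type_free(&dtype);
  }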


A solution to this problem seems nontrivial -- or is there one?

And always making MPI_Aint larger than 8 bytes is not an option either,
based on the ABI discussion, and would also be a waste of memory.


Best regards
Rolf


----- Original Message -----
> From: "mpiwg-large-counts" <mpiwg-large-counts at lists.mpi-forum.org>
> To: "Jeff Squyres" <jsquyres at cisco.com>
> Cc: "Jeff Hammond" <jeff.science at gmail.com>, "James Dinan" <james.dinan at intel.com>, "mpiwg-large-counts"
> <mpiwg-large-counts at lists.mpi-forum.org>
> Sent: Friday, October 25, 2019 1:02:35 AM
> Subject: Re: [Mpiwg-large-counts] Large Count - the principles for counts, sizes, and byte and nonbyte displacements

> Jim (cc) suffered the most in MPI 3.0 days because of AINT_DIFF and AINT_SUM, so
> maybe he wants to create this ticket.
> 
> Jeff


> On Thu, Oct 24, 2019 at 2:41 PM Jeff Squyres (jsquyres) <jsquyres at cisco.com> wrote:
> 
> 
> Not opposed to ditching segmented addressing at all. We'd need a ticket for this
> ASAP, though.
> 
> This whole conversation is predicated on:
> 
> - MPI supposedly supports segmented addressing
> - MPI_Aint is not sufficient for modern segmented addressing (i.e., representing
> an address that may not be in main RAM and is not mapped in to the current
> process' linear address space)
> 
> If we no longer care about segmented addressing, that makes a whole bunch of
> BigCount stuff a LOT easier. E.g., MPI_Aint can basically be a
> non-segment-supporting address integer. AINT_DIFF and AINT_SUM can go away,
> too.


> On Oct 24, 2019, at 5:35 PM, Jeff Hammond via mpiwg-large-counts <mpiwg-large-counts at lists.mpi-forum.org> wrote:
> 
> Rolf:
> 
> Before anybody spends any time analyzing how we handle segmented addressing, I
> want you to provide an example of a platform where this is relevant. What
> system can you boot today that needs this and what MPI libraries have expressed
> an interest in supporting it?
> 
> For anyone who didn't hear, ISO C and C++ have finally committed to
> twos-complement integers
> (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0907r1.html,
> http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2218.htm) because modern
> programmers should not be limited by hardware designs from the 1960s. We should
> similarly not waste our time on obsolete features like segmentation.
> 
> Jeff
> 
> On Thu, Oct 24, 2019 at 10:13 AM Rolf Rabenseifner via mpiwg-large-counts
> <mpiwg-large-counts at lists.mpi-forum.org> wrote:
> 
> 
>> I think that changes the conversation entirely, right?
> 
> Not the first part, the state-of-current-MPI.
> 
> It may change something for the future, or a new interface may be needed.
> 
> Please, can you describe how MPI_Get_address can work with the
> different variables from different memory segments?
> 
> Or whether a completely new function or a set of functions is needed.
> 
> If we can still express variables from all memory segments as
> input to MPI_Get_address, there may still be a way to flatten
> the result of some internal address inquiry into a
> signed integer with the same behavior as MPI_Aint today.
> 
> If this is impossible, then a new way of thinking and a new
> solution may be needed.
> 
> I really want to see examples for all current stuff as you
> mentioned in your last email.
> 
> Best regards
> Rolf


----- Original Message -----
> From: "HOLMES Daniel" <d.holmes at epcc.ed.ac.uk>
> To: "mpiwg-large-counts" <mpiwg-large-counts at lists.mpi-forum.org>
> Cc: "Rolf Rabenseifner" <rabenseifner at hlrs.de>, "Jeff Squyres" <jsquyres at cisco.com>
> Sent: Thursday, October 24, 2019 6:41:34 PM
> Subject: Re: [Mpiwg-large-counts] Large Count - the principles for counts, sizes, and byte and nonbyte displacements

> Hi Rolf & Jeff,
> 
> I think this wiki article is instructive on this topic also:
> https://en.wikipedia.org/wiki/X86_memory_segmentation
> 
> This seems like a crazy memory addressing system to me personally, but it is a
> (historic) example of a segmented addressing approach that MPI_Aint can
> support.
> 
> The “strange properties” for arithmetic are strange indeed, depending on what
> the MPI_Aint stores and how.
> 
> If MPI_Aint was 20 bits long and stores only the address, then it cannot be used
> to determine uniquely which segment is being used or what the offset is within
> that segment (there are 4096 possible answers). Does MPI need that more
> detailed information? Probably - because segments were a way of implementing
> memory protection, i.e. accessing a segment you did not have permission to
> access led to a “segmentation fault” error. I do not know enough about these
> old architectures to say whether an attempt to access the *same byte* using two
> different segment:offset pairs that produce the *same* address could result in
> different behaviour. That is, if I have access permissions for segment 3 but
> not for segment 4, I can access {seg=3,offset=2^16-16} but can I access
> {segment=4,offset=2^16-32}, which is the same byte? If not, then MPI needs to
> store segment and offset inside MPI_Aint to be able to check and to set
> registers correctly.
> 
> If MPI_Aint is 32 bits long and stores the segment in the first 16 bits and the
> offset in the last 16 bits, then the 20 bit address can be computed in a single
> simple instruction and both segment and offset are immediately retrievable.
> However, doing ordinary arithmetic with this bitwise representation is unwise
> because it is a compound structure type. Let us subtract 1 from an MPI_Aint of
> this layout which stores offset of 0 and some non-zero segment. We get offset
> (2^16-1) in segment (s-1), which is not 1 byte before the previous MPI_Aint
> because segments overlap. The same happens when adding and overflowing the
> offset portion - it changes the segment in an incorrect way. Segment++ moves
> the address forward only 16 bytes, not 2^16 bytes.
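
As a small illustration of this segment:offset arithmetic, here is a
sketch of the classic x86 real-mode scheme described in the wiki article
(the function name phys and the example values are made up):

  #include <stdio.h>
  #include <stdint.h>

  /* x86 real-mode style: 20-bit address = segment*16 + offset,
     wrapping at 1 MiB */
  static uint32_t phys(uint16_t seg, uint16_t off)
  {
      return (((uint32_t)seg << 4) + off) & 0xFFFFF;
  }

  int main(void)
  {
      /* the same byte reached via two different segment:offset pairs */
      printf("%05X\n", (unsigned)phys(3, 0xFFF0)); /* {3, 2^16-16} -> 10020 */
      printf("%05X\n", (unsigned)phys(4, 0xFFE0)); /* {4, 2^16-32} -> 10020 */

      /* segment++ with the same offset advances only 16 bytes */
      printf("%05X vs %05X\n",
             (unsigned)phys(3, 0), (unsigned)phys(4, 0)); /* 00030 vs 00040 */
      return 0;
  }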
> 
> The wrap-around from the end of the address space back to the beginning is also
> a source of strange properties for arithmetic.
> 
> One of the key statements from that wiki page is this:
> 
> The root of the problem is that no appropriate address-arithmetic instructions
> suitable for flat addressing of the entire memory range are available.
> Flat addressing is possible by applying multiple instructions, which
> however leads to slower programs.
> 
> Cheers,
> Dan.
> --
> Dr Daniel Holmes PhD
> Architect (HPC Research)
> d.holmes at epcc.ed.ac.uk
> Phone: +44 (0) 131 651 3465
> Mobile: +44 (0) 7940 524 088
> Address: Room 2.09, Bayes Centre, 47 Potterrow, Central Area, Edinburgh, EH8 9BT
>
> The University of Edinburgh is a charitable body, registered in Scotland, with
> registration number SC005336.

> ----- Original Message -----
>> From: "Jeff Squyres" < [ mailto:jsquyres at cisco.com | jsquyres at cisco.com ] >
>> To: "Rolf Rabenseifner" < [ mailto:rabenseifner at hlrs.de | rabenseifner at hlrs.de ]
>> >
>> Cc: "mpiwg-large-counts" < [ mailto:mpiwg-large-counts at lists.mpi-forum.org |
>> mpiwg-large-counts at lists.mpi-forum.org ] >
>> Sent: Thursday, October 24, 2019 5:27:31 PM
>> Subject: Re: [Mpiwg-large-counts] Large Count - the principles for counts,
>> sizes, and byte and nonbyte displacements
> 
>> On Oct 24, 2019, at 11:15 AM, Rolf Rabenseifner
>> < [ mailto:rabenseifner at hlrs.de | rabenseifner at hlrs.de ] <mailto: [
>> mailto:rabenseifner at hlrs.de | rabenseifner at hlrs.de ] >> wrote:
>> 
>> For me, it looked like there was some misunderstanding
>> of the concept that absolute and relative addresses
>> and numbers of bytes can be stored in MPI_Aint.
>> 
>> ...with the caveat that MPI_Aint -- as it is right now -- does not support
>> modern segmented memory systems (i.e., where you need more than a small number
>> of bits to indicate the segment where the memory lives).
>> 
>> I think that changes the conversation entirely, right?
>> 
>> --
>> Jeff Squyres
>> jsquyres at cisco.com
> 
> --
> Dr. Rolf Rabenseifner . . . . . . . . . .. email rabenseifner at hlrs.de .
> High Performance Computing Center (HLRS) . phone ++49(0)711/685-65530 .
> University of Stuttgart . . . . . . . . .. fax ++49(0)711 / 685-65832 .
> Head of Dpmt Parallel Computing . . . www.hlrs.de/people/rabenseifner .
> Nobelstr. 19, D-70550 Stuttgart, Germany . . . . (Office: Room 1.307) .
> 
> 
> --
> Jeff Hammond
> jeff.science at gmail.com
> http://jeffhammond.github.io/
> 
> 
> --
> Jeff Squyres
> jsquyres at cisco.com
> 
> 
> 
> --
> Jeff Hammond
> jeff.science at gmail.com
> http://jeffhammond.github.io/
> 
> _______________________________________________
> mpiwg-large-counts mailing list
> mpiwg-large-counts at lists.mpi-forum.org
> https://lists.mpi-forum.org/mailman/listinfo/mpiwg-large-counts

-- 
Dr. Rolf Rabenseifner . . . . . . . . . .. email rabenseifner at hlrs.de .
High Performance Computing Center (HLRS) . phone ++49(0)711/685-65530 .
University of Stuttgart . . . . . . . . .. fax ++49(0)711 / 685-65832 .
Head of Dpmt Parallel Computing . . . www.hlrs.de/people/rabenseifner .
Nobelstr. 19, D-70550 Stuttgart, Germany . . . . (Office: Room 1.307) .

