[Mpiwg-large-counts] Large Count - the principles for counts, sizes, and byte and nonbyte displacements

Rolf Rabenseifner rabenseifner at hlrs.de
Tue Nov 5 11:10:25 CST 2019


Hi Dan,

our mailers do not want to work together.
I cannot find your "lengthy comment on that issue" in this email.

As you can see below, it is flattened.

Originally, it had huge indents, but without any ">".
The text then looked like

                                        d
                                        d
                                        r
                                        e
                                        s
                                        s
                                        e
                                        s


In the telcon, we completely agreed that the large count versions MPI...._l
should keep all MPI_Aint arguments and should add MPI_Aint where it was
wrongly missing (MPI_Alltoallw and MPI_(Un)pack).

We completely agreed that all the MPI_Aint discussion is done.
MPI_Aint is a signed integer mis-used to also store absolute
addresses, whatever their bits may mean. And we do not change this,
nor do we introduce any new such strange type.
 
And we agreed that there should be for the few derived datatype routines 
with MPI_Aint an additional set with MPI_Count (or MPI_Offset, which would 
be weird) instead of MPI_Aint.
These MPI_Count versions do not allow the use of absolute addresses.
These MPI_Count versions are needed for the case that sizeof(MPI_Aint)
is less than sizeof(MPI_Count), maybe 4:8 or 8:12 or 8:16,
depending on the memory size per MPI process and the filesystem size.
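
A minimal sketch of the 4:8 case, assuming hypothetical stand-in types
(aint4_t for a 4-byte MPI_Aint and count8_t for an 8-byte MPI_Count; neither
is an MPI name): a relative displacement survives a plain assignment to the
wider type, while the reverse direction needs a range check first.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins, not MPI types: a 4-byte "MPI_Aint" and an
 * 8-byte "MPI_Count" model the 4:8 case. */
typedef int32_t aint4_t;
typedef int64_t count8_t;

/* Widening a relative displacement: plain assignment, sign-extended. */
count8_t widen_displacement(aint4_t disp)
{
    return disp;
}

/* Returns 1 if a byte count or displacement is representable in aint4_t. */
int fits_in_aint(count8_t value)
{
    return value >= INT32_MIN && value <= INT32_MAX;
}

/* Narrowing is only valid after the range check above. */
aint4_t narrow_displacement(count8_t value)
{
    assert(fits_in_aint(value));
    return (aint4_t)value;
}
```

This only covers relative displacements and byte counts; absolute addresses
are exactly the case that these MPI_Count versions would not allow.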

I only proposed in my latest email, that exactly two of them already exist:
 
   MPI_Type_get_extent_x and MPI_Type_get_true_extent_x

and we should keep them and take the postfix _x for this set of routines
instead of throwing them away and reinventing them again.


For ... I would say, 
 - MPI_TYPE_SIZE_X
      the _l version should have MPI_Aint size because it is a byte size.
 - MPI_GET_ELEMENTS_X and MPI_STATUS_SET_ELEMENTS_X
      in my opinion, we should keep _X, and the _l version 
      is then a duplicate of it.

Best regards
Rolf

----- Original Message -----
> From: "HOLMES Daniel" <d.holmes at epcc.ed.ac.uk>
> To: "Rolf Rabenseifner" <rabenseifner at hlrs.de>
> Cc: "mpiwg-large-counts" <mpiwg-large-counts at lists.mpi-forum.org>
> Sent: Tuesday, November 5, 2019 3:18:13 PM
> Subject: Re: [Mpiwg-large-counts] Large Count - the principles for counts, sizes, and byte and nonbyte displacements

> Hi Rolf (et al),
> 
> I wrote a lengthy comment on that issue to capture my current understanding of
> your “really wrong” assertion.
> 
> Broadly, we agree - I just wanted to write down the reasoning and nuances of
> that outcome.
> 
> Cheers,
> Dan.
>> Dr Daniel Holmes PhD
> Architect (HPC Research)
> d.holmes at epcc.ed.ac.uk
> Phone: +44 (0) 131 651 3465
> Mobile: +44 (0) 7940 524 088
> Address: Room 2.09, Bayes Centre, 47 Potterrow, Central Area, Edinburgh, EH8 9BT
>> The University of Edinburgh is a charitable body, registered in Scotland, with
> registration number SC005336.
>> 
> On 2 Nov 2019, at 09:13, Rolf Rabenseifner via mpiwg-large-counts
> <mpiwg-large-counts at lists.mpi-forum.org> wrote:
> 
> After the Telcon it seems that this ticket is really wrong.
> Some or all of the routines may/should be kept.
> And these routines are an essential part of the future large count concept.
> 
> Thank you very much for pointing us to this ticket.
> Rolf
> 
> ----- Jeff Hammond <jeff.science at gmail.com> wrote:
> Rolf:
> 
> Have you looked at
> https://github.com/mpiwg-large-count/large-count-issues/issues/6?
> 
> Jeff
> 
> On Fri, Nov 1, 2019 at 1:00 AM Rolf Rabenseifner <rabenseifner at hlrs.de>
> wrote:
> 
> A small comment on the result of our telcon:
> - Postfix _l for int -> MPI_Count
> - Postfix _x for additionally
>  MPI_Aint -> MPI_Count
> I.e., the additional routines in the derived datatype chapter.
> 
> Two of them already exist
> MPI_Type_get_(true)extent_x
> 
> In Fortran we will have then for the
> same routine two aliases:
> 
>  - the overload one without _x
>  - and the explicit one with _x
> 
> For both ones, the internal function name is the same, with _x.
> 
> Best regards
> Rolf
> 
> 
> ----- Jeff Hammond <jeff.science at gmail.com> wrote:
> On Thu, Oct 31, 2019 at 7:48 AM Rolf Rabenseifner <rabenseifner at hlrs.de>
> wrote:
> 
> Dear all,
> 
> here my summary as input for our telcon today.
> 
> In principle, it is a very simple question:
> 
> with large Counts, do we
> - keep all MPI_Aint
> - or do we substitute MPI_Aint by MPI_Count?
> 
> 
> I haven't been involved as much lately but did we not use MPI_Count for
> count and element displacements in the large count proposal?  We need to
> use MPI_Aint for offsets into memory because that is what this type is
> for.
> 
> Jeff
> 
> 
> 
> In principle, the MPI Forum answered this question already
> for MPI-3.0 in 2012 with a clear YES:
> 
> int MPI_Type_get_extent(MPI_Datatype datatype,
>     MPI_Aint *lb,  MPI_Aint *extent)
> int MPI_Type_get_extent_x(MPI_Datatype datatype,
>     MPI_Count *lb, MPI_Count *extent)
> 
> About Jeff H.'s question:
> Limiting the API so that it does not support MPI_Count
> means that an MPI implementation does not really have such quality options
> when using I/O fileviews, because the API is restricted to
> MPI_Aint (which should be implemented based on the, e.g.,
> 64-bit memory system).
> 
> About Jim's comment:
> 
> Apologies, it's been a while since I looked at the I/O interfaces. If I/O
> only needs relative displacements that have normal integer semantics, then
> I don't see why MPI_Count would not work for this purpose. If you have an
> MPI_Aint that contains a relative displacement, it also has normal integer
> semantics and can be converted to an MPI_Count.
> 
> Yes, but this automatically implies that the datatypes must also
> be able to handle MPI_Count.
> 
> The only case we really need to look out for is when an integer type
> contains an absolute address. In those cases, the quantity in the variable
> cannot be treated as a normal integer and we need special routines to work
> with it.
> 
> Yes, this happens when we extend MPI_Aint in the derived datatype routines
> to MPI_Count.
> 
> But in principle, this is not a big problem, as you all could see in
> the previous emails:
> 
> - We must do for MPI_Count the same as we did for MPI_Aint,
> i.e., we'll have long versions of the routines
>  MPI_Get_address, MPI_Aint_diff, MPI_Aint_add
> 
> - And we must ensure that the type cast from MPI_Aint to
> MPI_Count works, which is a small new advice to implementors
> for MPI_Get_address.
> 
> Therefore again my 4 questions:
> 
> - Should the new large count routines be prepared for
> more than 10 or 20 Exabyte files where we need 64/65 or
> 65/66 unsigned/signed integers for relative byte
> displacements or byte counts?
> If yes, then all MPI_Aint arguments must be substituted by MPI_Count.
> 
> (In other words, do we want to be prepared for another 25 years of MPI? :-)
> 
> As stated above, the MPI Forum already decided this in 2012 with a YES.
> 
> - Should we allow that these new routines are also used for memory description,
> where we typically need only the large MPI_Count "count" arguments?
> (or should we provide two different new routines for each routine that
>  currently has int count/... and MPI_Aint disp/... arguments)
> 
> I expect that nobody wants to have two different large versions of,
> for example, MPI_Type_create_struct.
> 
> - Should we allow a mix of old and new routines, especially for memory-based
> usage, such that old-style MPI_Get_address is used to retrieve an absolute
> address and then, e.g., new-style MPI_Type_create_struct with
> MPI_Count blocklength and displacements is used?
> 
> I expect that forbidding such a mix would be a problem for software
> development.
> Often old-style modules must work together with new-style modules.
> 
> - Do we want to require for this type cast of MPI_Aint addr into MPI_Count
> that it is allowed to do this cast with a normal assignment, rather than
> a special MPI function?
> 
> I expect yes, because most usage of MPI_Aint and MPI_Count
> is for relative displacements or byte counts, i.e., for normal
> integers, and therefore automatic type cast between MPI_Aint
> and MPI_Count is a must.
> 
> With yes to all four questions, the proposed solution above is
> the easiest way.
> 
> Hope to see/hear you today in our telcon.
> 
> Best regards
> Rolf
> 
> 
> ----- Original Message -----
> From: "Jeff Hammond" <jeff.science at gmail.com>
> To: "mpiwg-large-counts" <mpiwg-large-counts at lists.mpi-forum.org>
> Cc: "Rolf Rabenseifner" <rabenseifner at hlrs.de>, "Jim Dinan" <james.dinan at gmail.com>
> Sent: Thursday, October 31, 2019 5:58:30 AM
> Subject: Re: [Mpiwg-large-counts] Large Count - the principles for counts, sizes, and byte and nonbyte displacements
> 
> What if we just decided not to support IO displacements bigger than 2^63?
> What use case would that break?  If the underlying filesystem uses 128b
> displacements, fine, then MPI will promote into those before using the
> system APIs.
> 
> We already limit all sorts of things.  For example, posting 17 billion
> Isends is not guaranteed to work.  Maybe it does, but that's a quality of
> implementation issue.  No sane person is going to have a data type spanning
> 8 exabyte increments.  Not now, not in 2030, not in 2040, not ever.
> 
> Jeff
> 
> On Wed, Oct 30, 2019 at 9:10 AM Jim Dinan via mpiwg-large-counts <
> mpiwg-large-counts at lists.mpi-forum.org> wrote:
> 
> Apologies, it's been a while since I looked at the I/O interfaces.  If I/O
> only needs relative displacements that have normal integer semantics, then
> I don't see why MPI_Count would not work for this purpose.  If you have an
> MPI_Aint that contains a relative displacement, it also has normal integer
> semantics and can be converted to an MPI_Count.  The only case we really
> need to look out for is when an integer type contains an absolute address.
> In those cases, the quantity in the variable cannot be treated as a normal
> integer and we need special routines to work with it.  If MPI never treats
> an MPI_Count quantity as an absolute address then MPI_Count should always
> have normal integer semantics via the MPI interfaces and doesn't need
> special treatment.  Unless, of course, we want to enable MPI_Count that is
> large enough to need special support for basic operations, but that's a
> different can of worms.
> 
> ~Jim.
> 
> On Wed, Oct 30, 2019 at 11:02 AM Rolf Rabenseifner <rabenseifner at hlrs.de> wrote:
> 
> Dear Jim,
> 
> This sounds to me like it is creating again the same problem we have with
> MPI_Aint --- one type doing too many things.  If MPI_Aint can't accommodate
> absolute addresses in the I/O interfaces,
> 
> I/O has no absolute addresses, only relative ones, i.e., byte displacements
> and byte sizes.
> But they can be huge.
> 
> The same routines are used for message passing, for example
> - MPI_TYPE_CREATE_STRUCT or
> - MPI_TYPE_CREATE_RESIZED
> 
> we should consider adding a new
> type like MPI_Faint (file address int) for this quantity and include
> accessor routines to ensure manipulations of file addresses respect the
> implementation defined meaning of the bits.
> 
> Yes, you are right, there are two possibilities:
> Substitute MPI_Aint in the large count version by
> - MPI_Count or
> - a new type MPI_Laint (for Long Aint)
> 
> Others on this list have already expressed that they never want to see
> such an MPI_Laint.
> 
> Even in C, it is not portable
> to do arithmetic on intptr_t because the integer representation of an
> address is implementation defined.  We were careful in the definition of
> MPI_Aint_add and diff to describe them in terms of casting the absolute
> address arguments back to pointers before performing arithmetic.
> 
> Yes, therefore, for this longer version of MPI_Aint, let's name it
> XXX for the moment, we need
> MPI_XXX_diff and MPI_XXX_add,
> i.e., MPI_Laint_diff and _add or MPI_Count_diff and _add,
> which should be used only if the corresponding addresses
> are returned from MPI_Get_address_l.
> Or from MPI_Get_address, and with this we have again the
> type casting problem between MPI_Aint and MPI_Count or MPI_Laint.
> 
> Best regards
> Rolf
> 
> ----- Original Message -----
> From: "Jim Dinan" <james.dinan at gmail.com>
> To: "Rolf Rabenseifner" <rabenseifner at hlrs.de>
> Cc: "mpiwg-large-counts" <mpiwg-large-counts at lists.mpi-forum.org>
> Sent: Wednesday, October 30, 2019 3:45:01 PM
> Subject: Re: [Mpiwg-large-counts] Large Count - the principles for counts, sizes, and byte and nonbyte displacements
> 
> This sounds to me like it is creating again the same problem we have with
> MPI_Aint --- one type doing too many things.  If MPI_Aint can't accommodate
> absolute addresses in the I/O interfaces, we should consider adding a new
> type like MPI_Faint (file address int) for this quantity and include
> accessor routines to ensure manipulations of file addresses respect the
> implementation defined meaning of the bits.  Even in C, it is not portable
> to do arithmetic on intptr_t because the integer representation of an
> address is implementation defined.  We were careful in the definition of
> MPI_Aint_add and diff to describe them in terms of casting the absolute
> address arguments back to pointers before performing arithmetic.
> 
> ~Jim.
> 
> On Wed, Oct 30, 2019 at 5:18 AM Rolf Rabenseifner <rabenseifner at hlrs.de> wrote:
> 
> Dear all and Jim,
> 
> Jim asked:
> When you assign an MPI_Aint to an MPI_Count, there are two cases depending
> on what the bits in the MPI_Aint represent: absolute address and relative
> displacements.  The case where you assign an address to a count doesn't
> make sense to me.  Why would one do this and why should MPI support it?
> The case where you assign a displacement to a count seems fine, you would
> want sign extension to happen.
> 
> The answer is very simple:
> All derived datatype routines serve to describe memory **and** file space.
> 
> Therefore, the large count working group should decide:
> - Should the new large count routines be prepared for more than 10 or 20
> Exabyte files where we need 64/65 or 65/66 unsigned/signed integers for
> relative byte displacements or byte counts?
> If yes, then all MPI_Aint arguments must be substituted by MPI_Count.
> (In other words, do we want to be prepared for another 25 years of MPI? :-)
> - Should we allow that these new routines are also used for memory description,
> where we typically need only the large MPI_Count "count" arguments?
> (or should we provide two different new routines for each routine that
>  currently has int count/... and MPI_Aint disp/... arguments)
> - Should we allow a mix of old and new routines, especially for memory-based
> usage, such that old-style MPI_Get_address is used to retrieve an absolute
> address and then, e.g., new-style MPI_Type_create_struct with
> MPI_Count blocklength and displacements is used?
> - Do we want to require for this type cast of MPI_Aint addr into MPI_Count
> that it is allowed to do this cast with a normal assignment, rather than
> a special MPI function?
> 
> If we answer all four questions with yes (and in my opinion, we must)
> then Jim's question
> "Why would one do this [assign an address to a count]
> and why should MPI support it?"
> is answered with this set of reasons.
> 
> I would say that this is the most complex decision that the
> large count working group has to make.
> A wrong decision would be hard to fix in the future.
> 
> Best regards
> Rolf
> 
> ----- Original Message -----
> From: "Jim Dinan" <james.dinan at gmail.com>
> To: "Rolf Rabenseifner" <rabenseifner at hlrs.de>
> Cc: "mpiwg-large-counts" <mpiwg-large-counts at lists.mpi-forum.org>
> Sent: Tuesday, October 29, 2019 10:28:46 PM
> Subject: Re: [Mpiwg-large-counts] Large Count - the principles for counts, sizes, and byte and nonbyte displacements
> 
> If you do pointer arithmetic, the compiler will ensure that the result is
> correct.  If you convert a pointer into an integer and then do the
> arithmetic, the compiler can't help you and the result is not portable.
> This is why MPI_Aint_add describes what it does in terms of pointer
> arithmetic.  The confusing and frustrating thing about MPI_Aint is that
> it's one type for two very different purposes.  Allowing direct +/- on
> MPI_Aint values that represent addresses is not portable and is a mistake
> that we tried to correct with MPI_Aint_add/diff (I am happy to strengthen
> should to must if needed).  It's perfectly fine to do arithmetic on
> MPI_Aint values that are displacements.
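
The distinction drawn in this paragraph can be sketched in plain C without
any MPI calls. The helper names below (byte_distance, byte_advance,
demo_distance) are hypothetical and only mirror the idea that address
arithmetic is done on pointers rather than on their integer images; they are
not the implementation of MPI_Aint_add or MPI_Aint_diff.

```c
#include <assert.h>
#include <stddef.h>

/* Byte distance between two locations inside the SAME array or struct.
 * The subtraction is done on pointers, so the language defines the
 * result; subtracting integer casts of the pointers would not be portable. */
ptrdiff_t byte_distance(const void *to, const void *from)
{
    return (const char *)to - (const char *)from;
}

/* Moves an address by a signed byte displacement, again via pointers. */
void *byte_advance(void *base, ptrdiff_t disp)
{
    return (char *)base + disp;
}

/* Small demonstration on one array (one "sequential storage"). */
ptrdiff_t demo_distance(void)
{
    double a[10];
    void *third = byte_advance(a, 3 * (ptrdiff_t)sizeof(double));
    assert(third == (void *)&a[3]);
    return byte_distance(third, a);
}
```

The returned distance is an ordinary signed displacement, so plain + and -
on it are fine; only the address itself needs the pointer-based helpers.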
> 
> When you assign an MPI_Aint to an MPI_Count, there are two cases depending
> on what the bits in the MPI_Aint represent: absolute address and relative
> displacements.  The case where you assign an address to a count doesn't
> make sense to me.  Why would one do this and why should MPI support it?
> The case where you assign a displacement to a count seems fine, you would
> want sign extension to happen.
> 
> ~Jim.
> 
> On Tue, Oct 29, 2019 at 4:52 PM Rolf Rabenseifner <rabenseifner at hlrs.de> wrote:
> 
> Dear Jim,
> 
> (a3) Section 4.1.5 of MPI 3.1 states "To ensure portability, arithmetic on
> absolute addresses should not be performed with the intrinsic operators
> "-" and "+"."
> 
> The major problem is that we decided on "should" and not "must" or "shall",
> because there is so much existing MPI-1 ... MPI-3.0 code that must have
> used + or - operators.
> 
> The only requirement that has held from the beginning is that MPI
> addresses must be retrieved with MPI_Get_address.
> 
> And the second, also major, problem is the new assignment of an MPI_Aint
> value into an MPI_Count variable with MPI_Count larger than MPI_Aint.
> 
> Therefore, I would prefer that we keep this "should" and design, in the
> long term, MPI_Get_address in a way that in principle MPI_Aint_diff and
> _add need not do anything other than the + or - operator.
> 
> And this depends on the meaning of the unsigned addresses, i.e.,
> what is the sequence of addresses (i.e., is it really going from
> 0 to FFFF...FFFF) and then on mapping these addresses to the mathematical
> sequence of MPI_Aint, which starts at -2**(n-1) and ends at 2**(n-1)-1.
> 
> That's all. For the moment, as far as the web and some emails told us,
> we are far away from this contiguous 64-bit address space
> (0 to FFFF...FFFF).
> 
> But we should be correctly prepared.
> 
> Or in other words:
> (a2) Should be solved by MPI_Aint_add/diff.
> In my opinion no, it must be solved by MPI_Get_address
> and MPI_Aint_add/diff can stay normal + or - operators.
> 
> I should also mention that of course all MPI routines that
> accept MPI_BOTTOM must reverse the work of MPI_Get_address
> to get back the real "unsigned" virtual addresses of the OS.
> 
> This is the same as what we already had if an implementation has chosen
> to use the address of an MPI common block as base for MPI_BOTTOM.
> Here, the MPI lib had the freedom to revert the mapping
> within MPI_Get_address or within all functions called with MPI_BOTTOM.
> 
> Best regards
> Rolf
> 
> 
> 
> ----- Original Message -----
> From: "Jim Dinan" <james.dinan at gmail.com>
> To: "Rolf Rabenseifner" <rabenseifner at hlrs.de>
> Cc: "mpiwg-large-counts" <mpiwg-large-counts at lists.mpi-forum.org>
> Sent: Tuesday, October 29, 2019 3:58:18 PM
> Subject: Re: [Mpiwg-large-counts] Large Count - the principles for counts, sizes, and byte and nonbyte displacements
> 
> Hi Rolf,
> 
> (a1) seems to me like another artifact of storing an unsigned quantity in a
> signed variable, i.e., the quantity in an MPI_Aint can be an unsigned
> address or a signed displacement.  Since we don't have an unsigned type for
> addresses, the user can't portably fix this above MPI.  We will need to add
> functions to deal with combinations of MPI_Aint and MPI_Counts.  This is
> essentially why we needed MPI_Aint_add/diff.  Or ... the golden (Au is
> gold) int ... MPI_Auint.
> 
> (a2) Should be solved by MPI_Aint_add/diff.
> 
> (a3) Section 4.1.5 of MPI 3.1 states "To ensure portability, arithmetic on
> absolute addresses should not be performed with the intrinsic operators
> "-" and "+"."  MPI_Aint_add was written carefully to indicate that the "base"
> argument is treated as an unsigned address and the "disp" argument is
> treated as a signed displacement.
> 
> ~Jim.
> 
> On Tue, Oct 29, 2019 at 5:19 AM Rolf Rabenseifner <rabenseifner at hlrs.de> wrote:
> 
> Dear Jim and all,
> 
> I'm not sure whether I'm really able to understand your email.
> 
> I take the MPI view:
> 
> (1) An absolute address can be stored in an MPI_Aint variable
>   with and only with MPI_Get_address or MPI_Aint_add.
> 
> (2) A positive or negative number of bytes or a relative address,
>   which is by definition the number of bytes between two locations
>   in an MPI "sequential storage" (MPI-3.1 page 115),
>   can be assigned with any method to an MPI_Aint variable
>   as long as the original value fits into MPI_Aint.
>   In both languages automatic type cast (i.e., sign expansion)
>   is done.
> 
> (3) If users misuse MPI_Aint for storing anything else in an MPI_Aint
>   variable then this is out of scope of MPI.
>   If such values are used in a minus operation then it is
>   out of the scope of MPI whether this makes sense.
>   If the user is sure that the new value falls into category (2)
>   then all is fine as long as the user is correct.
> 
> I expect that your => is not a "greater than or equal to".
> I expect that you meant assignments.
> 
> intptr_t => MPI_Aint
> "intptr_t:  integer type capable of holding a pointer."
> 
> uintptr_t => ??? (Anyone remember the MPI_Auint "golden Aint" proposal?)
> "uintptr_t:  unsigned integer type capable of holding a pointer."
> 
> may fall exactly into (3) when used for pointers.
> 
> 
> Especially on a 64-bit system the user may have in the future exactly
> the problems (a), (a1), (a2) and (b) as described below.
> But here, the user is responsible for implementing, for example, (a3),
> whereas for MPI_Get_address, the implementors of the MPI library
> are responsible and the MPI Forum may be responsible for giving
> the correct advice.
> 
> By the way, the golden MPI_Auint was never golden.
> Such need was "resolved" by introducing MPI_Aint_diff and MPI_Aint_add
> in MPI-3.1.
> 
> 
> ptrdiff_t => MPI_Aint
> "std::ptrdiff_t is the signed integer type of the result of subtracting
> two pointers."
> 
> may perfectly fit to (2).
> 
> All of the following falls into category (2):
> 
> size_t (sizeof) => MPI_Count, int
> "sizeof( type )  (1)
> sizeof expression   (2)
> Both versions are constant expressions of type std::size_t."
> 
> size_t (offsetof) => MPI_Aint, int
> "Defined in header <cstddef>
> #define offsetof(type, member) /*implementation-defined*/
> The macro offsetof expands to an integral constant expression
> of type std::size_t, the value of which is the offset, in bytes,
> from the beginning of an object of specified type to its
> specified member, including padding if any."
> 
> Note that this offsetof has nothing to do with MPI_Offset.
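
As an illustration of the offsetof case, a minimal sketch follows. The
struct particle and the helper names are hypothetical, and ptrdiff_t is used
only as a stand-in for MPI_Aint; the point is that offsetof yields an
unsigned size_t that is then stored as a signed byte displacement of
category (2).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical example struct, not taken from the thread. */
struct particle {
    int    id;
    double pos[3];
};

/* Relative displacements of the members as signed byte counts
 * (category (2)). ptrdiff_t stands in for MPI_Aint here. */
ptrdiff_t disp_of_id(void)
{
    return (ptrdiff_t)offsetof(struct particle, id);
}

ptrdiff_t disp_of_pos(void)
{
    size_t off = offsetof(struct particle, pos);
    assert(off <= (size_t)PTRDIFF_MAX);  /* unsigned -> signed must fit */
    return (ptrdiff_t)off;
}
```

Displacements obtained this way are the kind of values that would be passed
as the displacement array of MPI_Type_create_struct.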
> 
> On a system with less than 2**31 bytes and 4-byte int, it is guaranteed
> that  size_t => int  works.
> 
> On a system with less than 2**63 bytes and 8-byte MPI_Aint, it is guaranteed
> that  size_t => MPI_Aint  works.
> 
> Problem: size_t is unsigned, int and MPI_Aint are signed.
> 
> MPI_Count should be defined in a way that on systems with more than
> 2**63 bytes of disk space, MPI_Count can hold such values, because
> int .LE. {MPI_Aint, MPI_Offset} .LE. MPI_Count
> 
> Therefore  size_t => MPI_Count  should always work.
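
The unsigned-to-signed problem named above can be made explicit with range
checks. This is a minimal sketch in plain C, assuming int64_t as a stand-in
for an 8-byte MPI_Aint; the function names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if an unsigned size is representable in a signed int. */
int size_fits_int(size_t n)
{
    return n <= (size_t)INT_MAX;
}

/* Returns 1 if an unsigned size is representable in a signed 64-bit
 * integer; int64_t stands in for an 8-byte MPI_Aint (assumption). */
int size_fits_aint64(size_t n)
{
    return (uintmax_t)n <= (uintmax_t)INT64_MAX;
}

/* Checked conversion: the caller must have a size below 2**63. */
int64_t size_to_aint64(size_t n)
{
    assert(size_fits_aint64(n));
    return (int64_t)n;
}
```

A signed target that is strictly wider than size_t would make both checks
trivially true, which is the argument for the ordering
int .LE. {MPI_Aint, MPI_Offset} .LE. MPI_Count.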
> 
> ssize_t => Mostly for error handling. Out of scope for MPI?
> "In short, ssize_t is the same as size_t, but is a signed type -
> read ssize_t as “signed size_t”. ssize_t is able to represent
> the number -1, which is returned by several system calls
> and library functions as a way to indicate error.
> For example, the read and write system calls: ...
> ssize_t read(int fildes, void *buf, size_t nbyte); ..."
> 
> ssize_t fits therefore better to MPI_Aint, because both
> are signed types that can hold byte counts, but
> the value -1 in an MPI_Aint variable stands for a
> byte displacement of -1 bytes and not for an error code -1.
> 
> 
> All use of (2) is in principle no problem.
> ------------------------------------------
> 
> All the complex discussion of the last days is about (1):
> 
> (1) An absolute address can be stored in an MPI_Aint variable
>   with and only with MPI_Get_address or MPI_Aint_add.
> 
> In MPI-1 to MPI-3.0 and still in MPI-3.1 (here as possibly not portable),
> we also allow
> MPI_Aint variable := absolute address in MPI_Aint variable
>                      + or -
>                     a number of bytes (in any integer type).
> 
> The result is then still in category (1).
> 
> 
> For the difference of two absolute addresses,
> MPI_Aint_diff can be used. The result is then an MPI_Aint of category (2).
> 
> In MPI-1 to MPI-3.0 and still in MPI-3.1 (here as possibly not portable),
> we also allow
> MPI_Aint variable := absolute address in MPI_Aint variable
>                     - absolute address in MPI_Aint variable.
> 
> The result is then in category (2).
> 
> 
> The problems we have discussed in the last days are about systems
> that internally use unsigned addresses and the MPI library stores
> these addresses into MPI_Aint variables and
> 
> (a) a sequential storage can have virtual addresses that
>   are both in the area with highest bit =0 and other addresses
>   in the same sequential storage (i.e., same array or structure)
>   with highest bit =1.
> 
> or
> (b) some higher bits contain segment addresses.
> 
> (b) is not a problem as long as a sequential storage always resides
>   within one segment.
> 
> Therefore, we only have to discuss (a).
> 
> The two problems that we have are
> (a1) that for the minus operations an integer overflow will
>    happen and must be ignored.
> (a2) if such addresses are expanded to larger variables,
>    e.g., MPI_Count with more bits in MPI_Count than in MPI_Aint,
>    sign expansion will result in completely wrong results.
> 
> And here, the simplest trick is
> (a3) that MPI_Get_address really shall
> map the contiguous unsigned range from 0 to 2**64-1 to the
> signed (and also contiguous) range from -2**63 to 2**63-1
> by simply subtracting 2**63.
> With this simple trick in MPI_Get_address, problems
> (a1) and (a2) are resolved.
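
Problems (a1) and (a2) and the trick (a3) can be reproduced in a scaled-down
model. The sketch below is an assumption-laden illustration, not MPI code: it
uses 32-bit unsigned addresses, int32_t as the "MPI_Aint" and int64_t as the
"MPI_Count", so the bias is 2**31 instead of 2**63, and all function names
are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scaled-down model (assumption): 32-bit addresses, a 32-bit signed
 * "MPI_Aint" and a 64-bit signed "MPI_Count". */

/* Naive storage: keep the address bits unchanged in the signed variable. */
int32_t naive_get_address(uint32_t addr)
{
    int32_t out;
    memcpy(&out, &addr, sizeof out);
    return out;
}

/* Trick (a3): shift the unsigned range onto the signed range by
 * subtracting 2**31 (2**63 in the 64-bit case). */
int32_t biased_get_address(uint32_t addr)
{
    return (int32_t)((int64_t)addr - INT64_C(2147483648));
}

/* Reverse mapping, as needed by routines that accept MPI_BOTTOM. */
uint32_t biased_to_unsigned(int32_t stored)
{
    return (uint32_t)((int64_t)stored + INT64_C(2147483648));
}

/* Difference of two stored addresses after widening to the larger type. */
int64_t widened_diff(int32_t hi, int32_t lo)
{
    return (int64_t)hi - (int64_t)lo;
}
```

Take the addresses 0x7FFFFFF0 and 0x80000010, which are 32 bytes apart and
straddle the highest bit. With the naive storage the widened difference is
-4294967264 (problem (a2)), and the same subtraction done in 32 bits would
overflow (problem (a1)). With the biased storage the two values become -16
and 16, and the difference is 32 in both widths.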
> 
> It looks like (a) and therefore (a1) and (a2)
> may be far in the future.
> But they may be less far in the future, if a system may
> map the whole application's cluster address space
> into virtual memory (not cache coherent, but accessible).
> 
> 
> And all this is never or only partially written into the
> MPI Standard, although all of it is (well) known by the MPI Forum,
> with the following exceptions:
> - (a2) is new.
> - (a1) is solved in MPI-3.1 only for MPI_Aint_diff and
>      MPI_Aint_add, but not for the operators - and +
>      if a user will switch on integer overflow detection
>      in the future when we will have such large systems.
> - (a3) is new and in principle solves the problem also
>      for + and - operators.
> 
> At least (a1)+(a2) should be added as rationale to MPI-4.0
> and (a3) as advice to implementors within the framework
> of big count, because (a2) is newly coming with big count.
> 
> I hope this helps a bit if you took the time to read
> this long email.
> 
> Best regards
> Rolf
> 
> 
> 
> ----- Original Message -----
> From: "mpiwg-large-counts" <mpiwg-large-counts at lists.mpi-forum.org>
> To: "mpiwg-large-counts" <mpiwg-large-counts at lists.mpi-forum.org>
> Cc: "Jim Dinan" <james.dinan at gmail.com>, "James Dinan" <james.dinan at intel.com>
> Sent: Monday, October 28, 2019 5:07:37 PM
> Subject: Re: [Mpiwg-large-counts] Large Count - the principles for counts, sizes, and byte and nonbyte displacements
> 
> Still not sure I see the issue. MPI's memory-related integers should map to
> types that serve the same function in C. If the base language is broken for
> segmented addressing, we won't be able to fix it in a library. Looking at the
> mapping below, I don't see where we would have broken it:
> 
> intptr_t => MPI_Aint
> uintptr_t => ??? (Anyone remember the MPI_Auint "golden Aint" proposal?)
> ptrdiff_t => MPI_Aint
> size_t (sizeof) => MPI_Count, int
> size_t (offsetof) => MPI_Aint, int
> ssize_t => Mostly for error handling. Out of scope for MPI?
> 
> It sounds like there are some places where we used MPI_Aint in place of
> size_t for sizes. Not great, but MPI_Aint already needs to be at least as
> large as size_t, so this seems benign.
> 
> ~Jim.
> 
> On Fri, Oct 25, 2019 at 8:25 PM Dinan, James via mpiwg-large-counts
> <mpiwg-large-counts at lists.mpi-forum.org> wrote:
> 
> Jeff, thanks so much for opening up these old wounds. I’m not sure I have
> enough context to contribute to the discussion. Where can I read up on the
> issue with MPI_Aint?
> 
> I’m glad to hear that C signed integers will finally have a well-defined
> representation.
> 
> ~Jim.
> 
> From: Jeff Hammond <jeff.science at gmail.com>
> Date: Thursday, October 24, 2019 at 7:03 PM
> To: "Jeff Squyres (jsquyres)" <jsquyres at cisco.com>
> Cc: MPI BigCount Working Group <mpiwg-large-counts at lists.mpi-forum.org>, "Dinan, James" <james.dinan at intel.com>
> Subject: Re: [Mpiwg-large-counts] Large Count - the principles for counts, sizes, and byte and nonbyte displacements
> 
> Jim (cc) suffered the most in MPI 3.0 days because of AINT_DIFF and
> AINT_SUM, so maybe he wants to create this ticket.
> 
> Jeff
> 
> On Thu, Oct 24, 2019 at 2:41 PM Jeff Squyres (jsquyres) <jsquyres at cisco.com> wrote:
> 
> Not opposed to ditching segmented addressing at all. We'd need a ticket
> for this ASAP, though.
> 
> This whole conversation is predicated on:
> 
> - MPI supposedly supports segmented addressing
> - MPI_Aint is not sufficient for modern segmented addressing (i.e.,
> representing an address that may not be in main RAM and is not mapped in to
> the current process' linear address space)
> 
> If we no longer care about segmented addressing, that makes a whole bunch of
> BigCount stuff a LOT easier. E.g., MPI_Aint can basically be a
> non-segment-supporting address integer. AINT_DIFF and AINT_SUM can go away,
> too.
> 
> On Oct 24, 2019, at 5:35 PM, Jeff Hammond via mpiwg-large-counts
> <mpiwg-large-counts at lists.mpi-forum.org> wrote:
> 
> Rolf:
> 
> 
> 
> Before anybody spends any time analyzing how we handle segmented
> addressing, I want you to provide an example of a platform where this
> is relevant. What system can you boot today that needs this and what
> MPI libraries have expressed an interest in supporting it?
> 
> 
> 
> 
> 
> For anyone who didn't hear, ISO C and C++ have finally committed to
> twos-complement integers
> (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0907r1.html,
> http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2218.htm) because
> modern programmers should not be limited by hardware designs from the
> 1960s. We should similarly not waste our time on obsolete features
> like segmentation.
> 
> 
> 
> 
> 
> Jeff
> 
> 
> 
> 
> 
> On Thu, Oct 24, 2019 at 10:13 AM Rolf Rabenseifner via mpiwg-large-counts <mpiwg-large-counts at lists.mpi-forum.org> wrote:
> 
> 
> 
> 
> I think that changes the conversation entirely, right?
> 
> Not the first part, the state-of-current-MPI.
> 
> It may change something for the future, or a new interface may be
> needed.
> 
> Can you please describe how MPI_Get_address can work with variables
> from different memory segments, or whether a completely new function
> or set of functions is needed?
> 
> If we can still express variables from all memory segments as input
> to MPI_Get_address, there may still be a way to flatten the result of
> some internal address inquiry into a signed integer with the same
> behavior as MPI_Aint today.
> 
> If this is impossible, then a new way of thinking and a new solution
> may be needed.
> 
> I really want to see examples for all current stuff as you mentioned
> in your last email.
> 
> Best regards
> Rolf
> 
> ----- Original Message -----
> From: "Jeff Squyres" <jsquyres at cisco.com>
> To: "Rolf Rabenseifner" <rabenseifner at hlrs.de>
> Cc: "mpiwg-large-counts" <mpiwg-large-counts at lists.mpi-forum.org>
> Sent: Thursday, October 24, 2019 5:27:31 PM
> Subject: Re: [Mpiwg-large-counts] Large Count - the principles for counts, sizes, and byte and nonbyte displacements
> 
> On Oct 24, 2019, at 11:15 AM, Rolf Rabenseifner <rabenseifner at hlrs.de> wrote:
> 
> To me, it looked as though there was some misunderstanding of the
> concept that absolute addresses, relative addresses, and numbers of
> bytes can all be stored in MPI_Aint.
> 
> ...with the caveat that MPI_Aint -- as it is right now -- does not
> support modern segmented memory systems (i.e., where you need more
> than a small number of bits to indicate the segment where the memory
> lives).
> 
> I think that changes the conversation entirely, right?
> 
> --
> Jeff Squyres
> jsquyres at cisco.com
> 
> --
> Dr. Rolf Rabenseifner . . . . . . . . . .. email rabenseifner at hlrs.de .
> High Performance Computing Center (HLRS) . phone ++49(0)711/685-65530 .
> University of Stuttgart . . . . . . . . .. fax ++49(0)711 / 685-65832 .
> Head of Dpmt Parallel Computing . . . www.hlrs.de/people/rabenseifner .
> Nobelstr. 19, D-70550 Stuttgart, Germany . . . . (Office: Room 1.307) .
> _______________________________________________
> mpiwg-large-counts mailing list
> mpiwg-large-counts at lists.mpi-forum.org
> https://lists.mpi-forum.org/mailman/listinfo/mpiwg-large-counts
> 
> 
> 
> 
> 
> 
> 
> 
> --
> Jeff Hammond
> jeff.science at gmail.com
> http://jeffhammond.github.io/
> 
> 
> --
> Jeff Squyres
> jsquyres at cisco.com
> 

-- 
Dr. Rolf Rabenseifner . . . . . . . . . .. email rabenseifner at hlrs.de .
High Performance Computing Center (HLRS) . phone ++49(0)711/685-65530 .
University of Stuttgart . . . . . . . . .. fax ++49(0)711 / 685-65832 .
Head of Dpmt Parallel Computing . . . www.hlrs.de/people/rabenseifner .
Nobelstr. 19, D-70550 Stuttgart, Germany . . . . (Office: Room 1.307) .



