[Mpiwg-large-counts] Large Count - the principles for counts, sizes, and byte and nonbyte displacements

Jim Dinan james.dinan at gmail.com
Tue Oct 29 09:58:18 CDT 2019


Hi Rolf,

(a1) seems to me like another artifact of storing an unsigned quantity in a
signed variable, i.e., the quantity in an MPI_Aint can be an unsigned
address or a signed displacement.  Since we don't have an unsigned type for
addresses, the user can't portably fix this above MPI.  We will need to add
functions to deal with combinations of MPI_Aint and MPI_Counts.  This is
essentially why we needed MPI_Aint_add/diff.  Or ... the golden (Au is
gold) int ... MPI_Auint.

(a2) Should be solved by MPI_Aint_add/diff.

(a3) Section 4.1.5 of MPI 3.1 states "To ensure portability, arithmetic on
absolute addresses should not be performed with the intrinsic operators "-"
and "+"".  MPI_Aint_add was written carefully to indicate that the "base"
argument is treated as an unsigned address and the "disp" argument is
treated as a signed displacement.
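
As a rough illustration of that unsigned-base/signed-displacement reading,
here is a minimal C sketch. It does not use MPI: aint_t, aint_add_sketch, and
aint_diff_sketch are hypothetical stand-ins, not the MPI_Aint_add/MPI_Aint_diff
of any real implementation, and the conversion back to the signed type is
assumed to wrap, as it does on common two's-complement compilers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for MPI_Aint: a signed, pointer-sized integer. */
typedef intptr_t aint_t;

/* Sketch of the MPI_Aint_add reading: "base" is an unsigned address and
 * "disp" a signed displacement. The sum is formed in unsigned arithmetic,
 * which wraps and therefore cannot trigger signed-overflow behavior. */
static aint_t aint_add_sketch(aint_t base, aint_t disp)
{
    assert(sizeof(aint_t) == sizeof(uintptr_t));
    uintptr_t sum = (uintptr_t)base + (uintptr_t)disp;
    return (aint_t)sum; /* assumption: out-of-range conversion wraps */
}

/* Sketch of the MPI_Aint_diff reading: both arguments are unsigned
 * addresses, and the result is a signed displacement. */
static aint_t aint_diff_sketch(aint_t addr1, aint_t addr2)
{
    assert(sizeof(aint_t) == sizeof(uintptr_t));
    uintptr_t diff = (uintptr_t)addr1 - (uintptr_t)addr2;
    return (aint_t)diff; /* assumption: out-of-range conversion wraps */
}
```

Under those assumptions, stepping one byte past the largest positive stored
value (INTPTR_MAX) gives INTPTR_MIN instead of a signed overflow, which is the
case the intrinsic "+" cannot handle portably.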

 ~Jim.

On Tue, Oct 29, 2019 at 5:19 AM Rolf Rabenseifner <rabenseifner at hlrs.de>
wrote:

> Dear Jim and all,
>
> I'm not sure whether I'm really able to understand your email.
>
> I take the MPI view:
>
> (1) An absolute address can be stored in an MPI_Aint variable
>     with, and only with, MPI_Get_address or MPI_Aint_add.
>
> (2) A positive or negative number of bytes or a relative address,
>     which is by definition the number of bytes between two locations
>     in an MPI "sequential storage" (MPI-3.1 page 115),
>     can be assigned with any method to an MPI_Aint variable
>     as long as the original value fits into MPI_Aint.
>     In both languages, an automatic type cast (i.e., sign extension)
>     is done.
>
> (3) If users misuse MPI_Aint by storing anything else in an MPI_Aint
>     variable, then this is out of scope of MPI.
>     If such values are used in a minus operation then it is
>     out of the scope of MPI whether this makes sense.
>     If the user is sure that the new value falls into category (2)
>     then all is fine as long as the user is correct.
>
> I assume that your => is not a "greater than or equal to"
> but denotes an assignment.
>
> > intptr_t => MPI_Aint
> "intptr_t:  integer type capable of holding a pointer."
>
> > uintptr_t => ??? (Anyone remember the MPI_Auint "golden Aint" proposal?)
> "uintptr_t:  unsigned integer type capable of holding a pointer."
>
> may fall exactly into (3) when used for pointers.
>
>
> Especially on a 64 bit system the user may have in the future exactly
> the problems (a), (a1), (a2) and (b) as described below.
> But here, the user is responsible for implementing, for example, (a3),
> whereas for MPI_Get_address, the implementors of the MPI library
> are responsible and the MPI Forum may be responsible for giving
> the correct advice.
>
> By the way, the golden MPI_Auint was never golden.
> Such need was "resolved" by introducing MPI_Aint_diff and MPI_Aint_add
> in MPI-3.1.
>
>
> > ptrdiff_t => MPI_Aint
> "std::ptrdiff_t is the signed integer type of the result of subtracting
> two pointers."
>
> may perfectly fit to (2).
>
> All of the following falls into category (2):
>
> > size_t (sizeof) => MPI_Count, int
> "sizeof( type )  (1)
>  sizeof expression   (2)
>  Both versions are constant expressions of type std::size_t."
>
> > size_t (offsetof) => MPI_Aint, int
> "Defined in header <cstddef>
>  #define offsetof(type, member) /*implementation-defined*/
>  The macro offsetof expands to an integral constant expression
>  of type std::size_t, the value of which is the offset, in bytes,
>  from the beginning of an object of specified type to its
>  specified member, including padding if any."
>
> Note that this offsetof has nothing to do with MPI_Offset.
>
> On a system with less than 2**31 bytes and 4-byte int, it is guaranteed
> that  size_t => int  works.
>
> On a system with less than 2**63 bytes and 8-byte MPI_Aint, it is guaranteed
> that  size_t => MPI_Aint  works.
>
> Problem: size_t is unsigned, int and MPI_Aint are signed.
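
A minimal C sketch of a guarded size_t => signed assignment follows. It does
not use MPI: aint_t is a hypothetical stand-in for MPI_Aint, and size_to_aint
and size_to_int are illustrative helper names, not existing functions. The
point is only that the value is checked against the signed maximum before it
is converted.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for MPI_Aint: a signed, pointer-sized integer. */
typedef intptr_t aint_t;

/* Store a size_t in the signed stand-in only if the value is representable.
 * Returns false, leaving *out untouched, when it does not fit. */
static bool size_to_aint(size_t n, aint_t *out)
{
    assert(out != NULL);
    if ((uintmax_t)n > (uintmax_t)INTPTR_MAX) {
        return false;
    }
    *out = (aint_t)n;
    return true;
}

/* Same check for a plain int, as used by the classic int count arguments. */
static bool size_to_int(size_t n, int *out)
{
    assert(out != NULL);
    if ((uintmax_t)n > (uintmax_t)INT_MAX) {
        return false;
    }
    *out = (int)n;
    return true;
}
```

A sizeof or offsetof result of an ordinary object passes both checks; a value
above the signed maximum is rejected instead of silently turning negative.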
>
> MPI_Count should be defined in such a way that, on systems with more than
> 2**63 bytes of disk space, MPI_Count can hold such values,
> because
>   int .LE. {MPI_Aint, MPI_Offset} .LE. MPI_Count
>
> Therefore  size_t => MPI_Count  should always work.
>
> > ssize_t => Mostly for error handling. Out of scope for MPI?
> "In short, ssize_t is the same as size_t, but is a signed type -
>  read ssize_t as “signed size_t”. ssize_t is able to represent
>  the number -1, which is returned by several system calls
>  and library functions as a way to indicate error.
>  For example, the read and write system calls: ...
>  ssize_t read(int fildes, void *buf, size_t nbyte); ..."
>
> ssize_t therefore fits MPI_Aint better, because both
> are signed types that can hold byte counts, but
> the value -1 in an MPI_Aint variable stands for a
> byte displacement of -1 bytes and not for an error code -1.
>
>
> All use of (2) is in principle no problem.
> ------------------------------------------
>
> All the complex discussion of the last days is about (1):
>
> (1) An absolute address can be stored in an MPI_Aint variable
>     with, and only with, MPI_Get_address or MPI_Aint_add.
>
> In MPI-1 to MPI-3.0 and still in MPI-3.1 (there as possibly not portable),
> we also allow
>  MPI_Aint variable := absolute address in MPI_Aint variable
>                        + or -
>                       a number of bytes (in any integer type).
>
> The result is then still in category (1).
>
>
> For the difference of two absolute addresses,
> MPI_Aint_diff can be used. The result is then an MPI_Aint of category (2).
>
> In MPI-1 to MPI-3.0 and still in MPI-3.1 (there as possibly not portable),
> we also allow
>  MPI_Aint variable := absolute address in MPI_Aint variable
>                       - absolute address in MPI_Aint variable.
>
> The result is then in category (2).
>
>
> The problems we have discussed over the last days are about systems
> that internally use unsigned addresses, where the MPI library stores
> these addresses in MPI_Aint variables and
>
> (a) a sequential storage can have virtual addresses that
>     are both in the area with highest bit =0 and other addresses
>     in the same sequential storage (i.e., same array or structure)
>     with highest bit =1.
>
> or
> (b) some higher bits contain segment addresses.
>
> (b) is not a problem as long as a sequential storage resides
>     always within one segment.
>
> Therefore, we only have to discuss (a).
>
> The two problems that we have are
> (a1) that for the minus operations an integer overflow will
>      happen and must be ignored.
> (a2) if such addresses are expanded to larger variables,
>      e.g., MPI_Count with more bits in MPI_Count than in MPI_Aint,
>      sign extension will result in completely wrong results.
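
Problem (a2) can be shown with a scaled-down C sketch: a 32-bit signed type
plays the role of MPI_Aint and a 64-bit signed type the role of MPI_Count.
All type and function names here are hypothetical and chosen only for this
illustration; no MPI call is involved.

```c
#include <assert.h>
#include <stdint.h>

/* Scaled-down model: a 32-bit signed "Aint" and a 64-bit signed "Count". */
typedef int32_t small_aint_t;
typedef int64_t wide_count_t;

/* Store a 32-bit unsigned address bit-for-bit in the signed type, written
 * so that it does not rely on implementation-defined conversion. */
static small_aint_t store_address(uint32_t addr)
{
    if (addr <= (uint32_t)INT32_MAX) {
        return (small_aint_t)addr;
    }
    return (small_aint_t)((int32_t)(addr - UINT32_C(0x80000000)) + INT32_MIN);
}

/* Plain widening sign-extends: an address with the highest bit set becomes
 * a large negative number instead of the original address. */
static wide_count_t widen_naive(small_aint_t a)
{
    assert(sizeof(wide_count_t) > sizeof(small_aint_t));
    return (wide_count_t)a;
}

/* Widening through the unsigned type keeps the original address value. */
static wide_count_t widen_as_address(small_aint_t a)
{
    return (wide_count_t)(uint32_t)a;
}
```

For the address 0x80000010, widen_naive returns a negative value while
widen_as_address returns 0x80000010 again; for addresses with the highest
bit clear, both agree.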
>
> And here, the simplest trick is
> (a3) that MPI_Get_address really shall
> map the contiguous unsigned range from 0 to 2**64-1 to the
> signed (and also contiguous) range from -2**63 to 2**63-1
> by simply subtracting 2**63.
> With this simple trick in MPI_Get_address, problems
> (a1) and (a2) are resolved.
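
The mapping in (a3) can be sketched in C as below. This is only an
illustration of the arithmetic, assuming 64-bit addresses; map_address and
unmap_address are hypothetical names, and nothing here claims to be how any
MPI library implements MPI_Get_address.

```c
#include <assert.h>
#include <stdint.h>

/* Inverse of the mapping: add 2**63 back in wrapping unsigned arithmetic. */
static uint64_t unmap_address(int64_t v)
{
    return (uint64_t)v + (UINT64_C(1) << 63);
}

/* (a3) as a sketch: shift the unsigned range [0, 2**64-1] onto the signed
 * range [-2**63, 2**63-1] by subtracting 2**63. Both branches stay within
 * the int64_t range, so no signed overflow can occur. */
static int64_t map_address(uint64_t addr)
{
    const uint64_t half = UINT64_C(1) << 63;
    int64_t mapped;
    if (addr >= half) {
        mapped = (int64_t)(addr - half);
    } else {
        mapped = (int64_t)addr + INT64_MIN;
    }
    assert(unmap_address(mapped) == addr); /* sanity check: round trip */
    return mapped;
}
```

Take two addresses in one array that straddle the highest-bit boundary,
2**63-16 and 2**63+16. They map to -16 and 16, so the plain "-" operator
gives 32 without overflow, and the order of the two values is preserved.
Stored bit-for-bit instead, they would be INT64_MAX-15 and INT64_MIN+16, and
their signed difference would overflow.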
>
> It looks like (a), and therefore (a1) and (a2),
> may be far in the future.
> But they may be less far in the future if a system
> maps the whole application's cluster address space
> into virtual memory (not cache coherent, but accessible).
>
>
> And all this is never or only partially written into the
> MPI Standard, although all of it is (well) known by the MPI Forum,
> with the following exceptions:
> - (a2) is new.
> - (a1) is solved in MPI-3.1 only for MPI_Aint_diff and
>        MPI_Aint_add, but not for the operators - and +
>        if a user will switch on integer overflow detection
>        in the future when we will have such large systems.
> - (a3) is new and in principle solves the problem also
>        for + and - operators.
>
> At least (a1)+(a2) should be added as rationale to MPI-4.0
> and (a3) as advice to implementors within the framework
> of big count, because (a2) is newly coming with big count.
>
> I hope this helps a bit if you took the time to read
> this long email.
>
> Best regards
> Rolf
>
>
>
> ----- Original Message -----
> > From: "mpiwg-large-counts" <mpiwg-large-counts at lists.mpi-forum.org>
> > To: "mpiwg-large-counts" <mpiwg-large-counts at lists.mpi-forum.org>
> > Cc: "Jim Dinan" <james.dinan at gmail.com>, "James Dinan" <
> james.dinan at intel.com>
> > Sent: Monday, October 28, 2019 5:07:37 PM
> > Subject: Re: [Mpiwg-large-counts] Large Count - the principles for
> counts, sizes, and byte and nonbyte displacements
>
> > Still not sure I see the issue. MPI's memory-related integers should map to
> > types that serve the same function in C. If the base language is broken for
> > segmented addressing, we won't be able to fix it in a library. Looking at the
> > mapping below, I don't see where we would have broken it:
> >
> > intptr_t => MPI_Aint
> > uintptr_t => ??? (Anyone remember the MPI_Auint "golden Aint" proposal?)
> > ptrdiff_t => MPI_Aint
> > size_t (sizeof) => MPI_Count, int
> > size_t (offsetof) => MPI_Aint, int
> > ssize_t => Mostly for error handling. Out of scope for MPI?
> >
> > It sounds like there are some places where we used MPI_Aint in place of size_t
> > for sizes. Not great, but MPI_Aint already needs to be at least as large as
> > size_t, so this seems benign.
> >
> > ~Jim.
> >
> > On Fri, Oct 25, 2019 at 8:25 PM Dinan, James via mpiwg-large-counts
> > <mpiwg-large-counts at lists.mpi-forum.org> wrote:
> >
> > Jeff, thanks so much for opening up these old wounds. I’m not sure I have enough
> > context to contribute to the discussion. Where can I read up on the issue with
> > MPI_Aint?
> >
> > I’m glad to hear that C signed integers will finally have a well-defined
> > representation.
> >
> > ~Jim.
> >
> > From: Jeff Hammond <jeff.science at gmail.com>
> > Date: Thursday, October 24, 2019 at 7:03 PM
> > To: "Jeff Squyres (jsquyres)" <jsquyres at cisco.com>
> > Cc: MPI BigCount Working Group <mpiwg-large-counts at lists.mpi-forum.org>,
> > "Dinan, James" <james.dinan at intel.com>
> > Subject: Re: [Mpiwg-large-counts] Large Count - the principles for counts,
> > sizes, and byte and nonbyte displacements
> >
> > Jim (cc) suffered the most in MPI 3.0 days because of AINT_DIFF and AINT_SUM, so
> > maybe he wants to create this ticket.
> >
> > Jeff
> >
> > On Thu, Oct 24, 2019 at 2:41 PM Jeff Squyres (jsquyres)
> > <jsquyres at cisco.com> wrote:
> >
> > Not opposed to ditching segmented addressing at all. We'd need a ticket for this
> > ASAP, though.
> >
> > This whole conversation is predicated on:
> >
> > - MPI supposedly supports segmented addressing
> >
> > - MPI_Aint is not sufficient for modern segmented addressing (i.e., representing
> > an address that may not be in main RAM and is not mapped in to the current
> > process' linear address space)
> >
> > If we no longer care about segmented addressing, that makes a whole bunch of
> > BigCount stuff a LOT easier. E.g., MPI_Aint can basically be a
> > non-segment-supporting address integer. AINT_DIFF and AINT_SUM can go away,
> > too.
> >
> > On Oct 24, 2019, at 5:35 PM, Jeff Hammond via mpiwg-large-counts
> > <mpiwg-large-counts at lists.mpi-forum.org> wrote:
> >
> > Rolf:
> >
> > Before anybody spends any time analyzing how we handle segmented addressing, I
> > want you to provide an example of a platform where this is relevant. What
> > system can you boot today that needs this and what MPI libraries have expressed
> > an interest in supporting it?
> >
> > For anyone who didn't hear, ISO C and C++ have finally committed to
> > twos-complement integers (
> > http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0907r1.html ,
> > http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2218.htm ) because modern
> > programmers should not be limited by hardware designs from the 1960s. We should
> > similarly not waste our time on obsolete features like segmentation.
> >
> > Jeff
> >
> > On Thu, Oct 24, 2019 at 10:13 AM Rolf Rabenseifner via mpiwg-large-counts
> > <mpiwg-large-counts at lists.mpi-forum.org> wrote:
> >
> >> I think that changes the conversation entirely, right?
> >
> > Not the first part, the state-of-current-MPI.
> >
> > It may change something for the future, or a new interface may be needed.
> >
> > Can you please describe how MPI_Get_address can work with
> > variables from different memory segments,
> > or whether a completely new function or set of functions is needed?
> >
> > If we can still express variables from all memory segments as
> > input to MPI_Get_address, there may be still a way to flatten
> > the result of some internal address inquiry into a flattened
> > signed integer with the same behavior as MPI_Aint today.
> >
> > If this is impossible, then a new way of thinking and a new
> > solution may be needed.
> >
> > I really want to see examples for all current stuff as you
> > mentioned in your last email.
> >
> > Best regards
> > Rolf
> >
> > ----- Original Message -----
> >> From: "Jeff Squyres" <jsquyres at cisco.com>
> >> To: "Rolf Rabenseifner" <rabenseifner at hlrs.de>
> >> Cc: "mpiwg-large-counts" <mpiwg-large-counts at lists.mpi-forum.org>
> >> Sent: Thursday, October 24, 2019 5:27:31 PM
> >> Subject: Re: [Mpiwg-large-counts] Large Count - the principles for counts,
> >> sizes, and byte and nonbyte displacements
> >
> >> On Oct 24, 2019, at 11:15 AM, Rolf Rabenseifner
> >> <rabenseifner at hlrs.de> wrote:
> >>
> >> For me, it looked like there was some misunderstanding
> >> of the concept that absolute and relative addresses
> >> and numbers of bytes can be stored in MPI_Aint.
> >>
> >> ...with the caveat that MPI_Aint -- as it is right now -- does not support
> >> modern segmented memory systems (i.e., where you need more than a small number
> >> of bits to indicate the segment where the memory lives).
> >>
> >> I think that changes the conversation entirely, right?
> >>
> >> --
> >> Jeff Squyres
> >> jsquyres at cisco.com
> >
> > --
> > Dr. Rolf Rabenseifner . . . . . . . . . .. email rabenseifner at hlrs.de .
> > High Performance Computing Center (HLRS) . phone ++49(0)711/685-65530 .
> > University of Stuttgart . . . . . . . . .. fax ++49(0)711 / 685-65832 .
> > Head of Dpmt Parallel Computing . . . www.hlrs.de/people/rabenseifner .
> > Nobelstr. 19, D-70550 Stuttgart, Germany . . . . (Office: Room 1.307) .
> > _______________________________________________
> > mpiwg-large-counts mailing list
> > mpiwg-large-counts at lists.mpi-forum.org
> > https://lists.mpi-forum.org/mailman/listinfo/mpiwg-large-counts
> >
> > --
> > Jeff Hammond
> > jeff.science at gmail.com
> > http://jeffhammond.github.io/
> >
> > --
> > Jeff Squyres
> > jsquyres at cisco.com
> >
> > --
> > Jeff Hammond
> > jeff.science at gmail.com
> > http://jeffhammond.github.io/
> >
>
> --
> Dr. Rolf Rabenseifner . . . . . . . . . .. email rabenseifner at hlrs.de .
> High Performance Computing Center (HLRS) . phone ++49(0)711/685-65530 .
> University of Stuttgart . . . . . . . . .. fax ++49(0)711 / 685-65832 .
> Head of Dpmt Parallel Computing . . . www.hlrs.de/people/rabenseifner .
> Nobelstr. 19, D-70550 Stuttgart, Germany . . . . (Office: Room 1.307) .
>