[Mpiwg-large-counts] Large Count - the principles for counts, sizes, and byte and nonbyte displacements

Rolf Rabenseifner rabenseifner at hlrs.de
Wed Oct 30 10:59:13 CDT 2019


Dear all,

> discussion.  The minimum set of "we" probably includes Rolf and Jim.

The minimum set includes everyone who is part of the discussion and
decisions about the large-count versions of

MPI_Get_address
MPI_Type_create_struct   (and all routines mentioned in 4.1.1)
MPI_Type_create_resized  
MPI_Aint_diff
MPI_Aint_add
MPI_Send(MPI_BOTTOM, ...


and the MPI-3.1 sections
2.5.6 Absolute Addresses and Relative Address Displacements
2.5.7 File Offsets
2.5.8 Counts
2.6.4 Functions and Macros
4.1.5 Address and Size Functions
4.1.12 Correct Use of Addresses

Is this list complete?

Less relevant are routines and items such as
- MPI_Type_get(_true)_extent(_x), MPI_Type_get_contents
- MPI_(UN)Pack and MPI_(UN)Pack_external 
  and MPI_Pack_size and MPI_Pack_external_size
- The examples in 4.1.14 Examples
- The datatypes MPI_AINT, MPI_OFFSET, MPI_COUNT
  and the reduction operations for them
- 5.11.3 with a user defined reduction Operation
- MPI_Neighbor_alltoallw
- MPI_Alloc_mem, MPI_Win_create/allocate/allocate_shared, MPI_Win_attach
- MPI_Put and all other RMA
  with MPI_Aint target_disp (only relative address displacements)
- Example 11.23 on page 470
- MPI_File_get_type_extent
- Callback MPI_Datarep_extent_function
- 17.2.7 Attributes

Best regards
Rolf

----- Original Message -----
> From: "Jeff Squyres" <jsquyres at cisco.com>
> To: "Rolf Rabenseifner" <rabenseifner at hlrs.de>, "mpiwg-large-counts" <mpiwg-large-counts at lists.mpi-forum.org>
> Sent: Wednesday, October 30, 2019 2:55:29 PM
> Subject: Re: [Mpiwg-large-counts] Large Count - the principles for counts, sizes, and byte and nonbyte displacements

> The clock is ticking -- we're running out of time before the December meeting.
> I'd like to propose that we get on a webex and have a higher-bandwidth
> discussion.  The minimum set of "we" probably includes Rolf and Jim.
> 
> Here's a Doodle to find a time that we can all meet (gentle reminder: Europe
> [and others?] changed time last weekend, but the US won't change time until
> this upcoming Sunday Nov 3 -- please be sure to look at the Doodle with
> appropriate timezone enablement):
> 
>    https://doodle.com/poll/2inm4aqakak9kcgy
> 
> Please fill out the Doodle today, and we'll get a webex setup ASAP.
> 
> Thanks!
> 
> 
> 
> 
>> On Oct 30, 2019, at 5:18 AM, Rolf Rabenseifner via mpiwg-large-counts
>> <mpiwg-large-counts at lists.mpi-forum.org> wrote:
>> 
>> Dear all and Jim,
>> 
>> Jim asked:
>>> When you assign an MPI_Aint to an MPI_Count, there are two cases depending
>>> on what the bits in the MPI_Aint represent: absolute address and relative
>>> displacements.  The case where you assign an address to a count doesn't
>>> make sense to me.  Why would one do this and why should MPI support it?
>>> The case where you assign a displacement to a count seems fine, you would
>>> want sign extension to happen.
>> 
>> The answer is very simple:
>> All derived datatype routines serve to describe memory **and** file space.
>> 
>> Therefore, the large count working group should decide:
>> - Should the new large count routines be prepared for files larger than 10 or
>>   20 exabytes, where we need 64/65-bit or 65/66-bit unsigned/signed integers
>>   for relative byte displacements or byte counts?
>>   If yes, then all MPI_Aint arguments must be replaced by MPI_Count.
>>   (In other words, do we want to be prepared for another 25 years of MPI? :-)
>> - Should we allow these new routines to be used for memory description as well,
>>   where we typically need only the large MPI_Count "count" arguments?
>>   (Or should we provide two different new routines for each routine that
>>    currently has int count/... and MPI_Aint disp/... arguments?)
>> - Should we allow a mix of old and new routines, especially for memory-based
>>   usage, so that the old-style MPI_Get_address is used to retrieve an absolute
>>   address and then, e.g., the new-style MPI_Type_create_struct with
>>   MPI_Count blocklengths and displacements is used?
>> - Do we want to require that this conversion of an MPI_Aint address into an
>>   MPI_Count can be done with a normal assignment, rather than with
>>   a special MPI function?
>> 
>> If we answer all four questions with yes (and in my opinion, we must),
>> then Jim's question
>>  "Why would one do this [assign an address to a count]
>>   and why should MPI support it?"
>> is answered by this set of reasons.
>> 
>> I would say that this is the most complex decision that the
>> large count working group has to make.
>> A wrong decision would be hard to fix in the future.
>> 
>> Best regards
>> Rolf
>> 
>> ----- Original Message -----
>>> From: "Jim Dinan" <james.dinan at gmail.com>
>>> To: "Rolf Rabenseifner" <rabenseifner at hlrs.de>
>>> Cc: "mpiwg-large-counts" <mpiwg-large-counts at lists.mpi-forum.org>
>>> Sent: Tuesday, October 29, 2019 10:28:46 PM
>>> Subject: Re: [Mpiwg-large-counts] Large Count - the principles for counts,
>>> sizes, and byte and nonbyte displacements
>> 
>>> If you do pointer arithmetic, the compiler will ensure that the result is
>>> correct.  If you convert a pointer into an integer and then do the
>>> arithmetic, the compiler can't help you and the result is not portable.
>>> This is why MPI_Aint_add describes what it does in terms of pointer
>>> arithmetic.  The confusing and frustrating thing about MPI_Aint is that
>>> it's one type for two very different purposes.  Allowing direct +/- on
>>> MPI_Aint values that represent addresses is not portable and is a mistake
>>> that we tried to correct with MPI_Aint_add/diff (I am happy to strengthen
>>> should to must if needed).  It's perfectly fine to do arithmetic on
>>> MPI_Aint values that are displacements.
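The pointer-arithmetic reading of MPI_Aint_add and MPI_Aint_diff can be sketched in plain C without an MPI library. Everything named demo_* below is a hypothetical stand-in, and the sketch assumes a flat address space in which a pointer survives a round trip through intptr_t; it is an illustration of the idea, not how any particular MPI implementation does it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for MPI_Aint; a real MPI library chooses its own type. */
typedef intptr_t demo_aint;

/* Stand-in for MPI_Get_address on a flat address space (assumption). */
static demo_aint demo_get_address(const void *location)
{
    return (demo_aint)(intptr_t)location;
}

/* Stand-in for MPI_Aint_add: convert back to a pointer, let the compiler do
 * the pointer arithmetic, then convert the result to an address integer. */
static demo_aint demo_aint_add(demo_aint base, demo_aint disp)
{
    assert(base != 0);  /* sketch only: expects a value from demo_get_address */
    return demo_get_address((const char *)(intptr_t)base + disp);
}

/* Stand-in for MPI_Aint_diff: subtract as pointers, return a displacement. */
static demo_aint demo_aint_diff(demo_aint addr1, demo_aint addr2)
{
    return (demo_aint)((const char *)(intptr_t)addr1 -
                       (const char *)(intptr_t)addr2);
}
```

For an array `double buf[4]`, adding `2 * sizeof(double)` to the address of `buf[0]` with demo_aint_add gives the address of `buf[2]`, and demo_aint_diff between the addresses of `buf[3]` and `buf[0]` gives `3 * sizeof(double)`.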
>>> 
>>> When you assign an MPI_Aint to an MPI_Count, there are two cases depending
>>> on what the bits in the MPI_Aint represent: absolute address and relative
>>> displacements.  The case where you assign an address to a count doesn't
>>> make sense to me.  Why would one do this and why should MPI support it?
>>> The case where you assign a displacement to a count seems fine, you would
>>> want sign extension to happen.
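The two cases can be illustrated with scaled-down stand-in types. The sketch below assumes a hypothetical 32-bit demo_aint and 64-bit demo_count (real MPI_Aint and MPI_Count widths are chosen by the implementation) and shows what a plain assignment does to a displacement and to the raw bits of an unsigned address.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef int32_t demo_aint;   /* hypothetical narrow "MPI_Aint"  */
typedef int64_t demo_count;  /* hypothetical wider "MPI_Count"  */

/* Plain assignment to the wider signed type: the value is preserved,
 * which at the bit level is sign extension. */
static demo_count demo_widen(demo_aint value)
{
    return value;
}

/* Store the raw bits of an unsigned address in the signed demo_aint,
 * as a library would if it did not remap addresses (assumption). */
static demo_aint demo_store_address_bits(uint32_t address)
{
    demo_aint stored;
    assert(sizeof stored == sizeof address);
    memcpy(&stored, &address, sizeof stored);
    return stored;
}
```

A displacement of -8 widens to -8, as wanted. The address 0x80000010, whose highest bit is set, is stored as -2147483632 and widens to -2147483632, not to 2147483664, so the widened value no longer equals the original unsigned address.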
>>> 
>>> ~Jim.
>>> 
>>> On Tue, Oct 29, 2019 at 4:52 PM Rolf Rabenseifner <rabenseifner at hlrs.de>
>>> wrote:
>>> 
>>>> Dear Jim,
>>>> 
>>>>> (a3) Section 4.1.5 of MPI 3.1 states "To ensure portability, arithmetic on
>>>>> absolute addresses should not be performed with the intrinsic operators "-"
>>>>> and "+"."
>>>> 
>>>> The major problem is that we decided on "should" and not "must" or "shall",
>>>> because there is so much existing MPI-1 ... MPI-3.0 code that must have
>>>> used the + or - operators.
>>>> 
>>>> The only requirement that has held from the beginning is that MPI addresses
>>>> must be retrieved with MPI_Get_address.
>>>> 
>>>> And the second major problem is the new assignment of an MPI_Aint value
>>>> to an MPI_Count variable, with MPI_Count larger than MPI_Aint.
>>>> 
>>>> Therefore, I would prefer that we keep this "should" and, in the long term,
>>>> design MPI_Get_address in a way that in principle MPI_Aint_diff and _add
>>>> need not do anything other than the + or - operator.
>>>> 
>>>> And this depends on the meaning of the unsigned addresses, i.e.,
>>>> what is the sequence of addresses (i.e., is it really going from
>>>> 0 to FFFF...FFFF), and then on mapping these addresses to the mathematical
>>>> sequence of MPI_Aint, which starts at -2**(n-1) and ends at 2**(n-1)-1.
>>>> 
>>>> That's all. For the moment, as far as the web and some emails told us,
>>>> we are far away from this contiguous 64-bit address space (0 to
>>>> FFFF...FFFF).
>>>> 
>>>> But we should be correctly prepared.
>>>> 
>>>> Or in other words:
>>>>> (a2) Should be solved by MPI_Aint_add/diff.
>>>> In my opinion no, it must be solved by MPI_Get_address,
>>>> and MPI_Aint_add/diff can stay normal + or - operators.
>>>> 
>>>> I should also mention that of course all MPI routines that
>>>> accept MPI_BOTTOM must reverse the work of MPI_Get_address
>>>> to get back the real "unsigned" virtual addresses of the OS.
>>>> 
>>>> This is the same as what we already had if an implementation has chosen
>>>> to use the address of an MPI common block as base for MPI_BOTTOM.
>>>> Here, the MPI library had the freedom to revert the mapping
>>>> within MPI_Get_address or within all functions called with MPI_BOTTOM.
>>>> 
>>>> Best regards
>>>> Rolf
>>>> 
>>>> 
>>>> 
>>>> ----- Original Message -----
>>>>> From: "Jim Dinan" <james.dinan at gmail.com>
>>>>> To: "Rolf Rabenseifner" <rabenseifner at hlrs.de>
>>>>> Cc: "mpiwg-large-counts" <mpiwg-large-counts at lists.mpi-forum.org>
>>>>> Sent: Tuesday, October 29, 2019 3:58:18 PM
>>>>> Subject: Re: [Mpiwg-large-counts] Large Count - the principles for
>>>>> counts, sizes, and byte and nonbyte displacements
>>>> 
>>>>> Hi Rolf,
>>>>> 
>>>>> (a1) seems to me like another artifact of storing an unsigned quantity in a
>>>>> signed variable, i.e., the quantity in an MPI_Aint can be an unsigned
>>>>> address or a signed displacement.  Since we don't have an unsigned type for
>>>>> addresses, the user can't portably fix this above MPI.  We will need to add
>>>>> functions to deal with combinations of MPI_Aint and MPI_Counts.  This is
>>>>> essentially why we needed MPI_Aint_add/diff.  Or ... the golden (Au is
>>>>> gold) int ... MPI_Auint.
>>>>> 
>>>>> (a2) Should be solved by MPI_Aint_add/diff.
>>>>> 
>>>>> (a3) Section 4.1.5 of MPI 3.1 states "To ensure portability, arithmetic on
>>>>> absolute addresses should not be performed with the intrinsic operators "-"
>>>>> and "+"."  MPI_Aint_add was written carefully to indicate that the "base"
>>>>> argument is treated as an unsigned address and the "disp" argument is
>>>>> treated as a signed displacement.
>>>>> 
>>>>> ~Jim.
>>>>> 
>>>>> On Tue, Oct 29, 2019 at 5:19 AM Rolf Rabenseifner <rabenseifner at hlrs.de>
>>>>> wrote:
>>>>> 
>>>>>> Dear Jim and all,
>>>>>> 
>>>>>> I'm not sure whether I'm really able to understand your email.
>>>>>> 
>>>>>> I take the MPI view:
>>>>>> 
>>>>>> (1) An absolute address can be stored in an MPI_Aint variable
>>>>>>     with and only with MPI_Get_address or MPI_Aint_add.
>>>>>> 
>>>>>> (2) A positive or negative number of bytes, or a relative address,
>>>>>>     which is by definition the number of bytes between two locations
>>>>>>     in an MPI "sequential storage" (MPI-3.1 page 115),
>>>>>>     can be assigned by any method to an MPI_Aint variable
>>>>>>     as long as the original value fits into MPI_Aint.
>>>>>>     In both languages, automatic type conversion (i.e., sign extension)
>>>>>>     is done.
>>>>>> 
>>>>>> (3) If users misuse MPI_Aint to store anything else in an MPI_Aint
>>>>>>     variable, then this is out of the scope of MPI.
>>>>>>     If such values are used in a minus operation, then it is
>>>>>>     out of the scope of MPI whether this makes sense.
>>>>>>     If the user is sure that the new value falls into category (2),
>>>>>>     then all is fine as long as the user is correct.
>>>>>> 
>>>>>> I assume that your => is not a "greater than or equal to".
>>>>>> I assume that you meant assignments.
>>>>>> 
>>>>>>> intptr_t => MPI_Aint
>>>>>> "intptr_t:  integer type capable of holding a pointer."
>>>>>> 
>>>>>>> uintptr_t => ??? (Anyone remember the MPI_Auint "golden Aint" proposal?)
>>>>>> "uintptr_t:  unsigned integer type capable of holding a pointer."
>>>>>> 
>>>>>> may fall exactly into (3) when used for pointers.
>>>>>> 
>>>>>> 
>>>>>> Especially on a 64-bit system, the user may in the future have exactly
>>>>>> the problems (a), (a1), (a2) and (b) as described below.
>>>>>> But here, the user is responsible for implementing, for example, (a3),
>>>>>> whereas for MPI_Get_address, the implementors of the MPI library
>>>>>> are responsible and the MPI Forum may be responsible for giving
>>>>>> the correct advice.
>>>>>> 
>>>>>> By the way, the golden MPI_Auint was never golden.
>>>>>> That need was "resolved" by introducing MPI_Aint_diff and MPI_Aint_add
>>>>>> in MPI-3.1.
>>>>>> 
>>>>>> 
>>>>>>> ptrdiff_t => MPI_Aint
>>>>>> "std::ptrdiff_t is the signed integer type of the result of subtracting
>>>>>> two pointers."
>>>>>> 
>>>>>> may perfectly fit to (2).
>>>>>> 
>>>>>> All of the following falls into category (2):
>>>>>> 
>>>>>>> size_t (sizeof) => MPI_Count, int
>>>>>> "sizeof( type )  (1)
>>>>>>  sizeof expression   (2)
>>>>>>  Both versions are constant expressions of type std::size_t."
>>>>>> 
>>>>>>> size_t (offsetof) => MPI_Aint, int
>>>>>> "Defined in header <cstddef>
>>>>>>  #define offsetof(type, member) /*implementation-defined*/
>>>>>>  The macro offsetof expands to an integral constant expression
>>>>>>  of type std::size_t, the value of which is the offset, in bytes,
>>>>>>  from the beginning of an object of specified type to its
>>>>>>  specified member, including padding if any."
>>>>>> 
>>>>>> Note that this offsetof has nothing to do with MPI_Offset.
>>>>>> 
>>>>>> On a system with less than 2**31 bytes and a 4-byte int, it is guaranteed
>>>>>> that  size_t => int  works.
>>>>>> 
>>>>>> On a system with less than 2**63 bytes and an 8-byte MPI_Aint, it is
>>>>>> guaranteed that  size_t => MPI_Aint  works.
>>>>>> 
>>>>>> Problem: size_t is unsigned, int and MPI_Aint are signed.
>>>>>> 
>>>>>> MPI_Count should be defined in such a way that on systems with more than
>>>>>> 2**63 bytes of disk space, MPI_Count can hold such values,
>>>>>> because
>>>>>>   int .LE. {MPI_Aint, MPI_Offset} .LE. MPI_Count
>>>>>> 
>>>>>> Therefore  size_t => MPI_Count  should always work.
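Because size_t is unsigned and the targets are signed, a conversion is only value-preserving when the size fits into the positive range of the target. A minimal sketch of such a range check follows; demo_aint (an 8-byte stand-in for MPI_Aint), the two helper functions, and struct demo_record are hypothetical names introduced only for this illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

typedef int64_t demo_aint;  /* hypothetical 8-byte "MPI_Aint" */

/* Returns 1 and stores n in *out if n fits into int, otherwise returns 0. */
static int demo_size_to_int(size_t n, int *out)
{
    assert(out != NULL);
    if ((uintmax_t)n > (uintmax_t)INT_MAX)
        return 0;
    *out = (int)n;
    return 1;
}

/* Returns 1 and stores n in *out if n fits into demo_aint, otherwise 0. */
static int demo_size_to_aint(size_t n, demo_aint *out)
{
    assert(out != NULL);
    if ((uintmax_t)n > (uintmax_t)INT64_MAX)
        return 0;
    *out = (demo_aint)n;
    return 1;
}

/* Example structure for an offsetof-based displacement. */
struct demo_record {
    char   tag;
    double value;
};
```

With these helpers, `sizeof(char)` converts to the int value 1, a size of `INT_MAX + 1` is rejected for int, and `offsetof(struct demo_record, value)` converts to a demo_aint displacement without loss.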
>>>>>> 
>>>>>>> ssize_t => Mostly for error handling. Out of scope for MPI?
>>>>>> "In short, ssize_t is the same as size_t, but is a signed type -
>>>>>>  read ssize_t as “signed size_t”. ssize_t is able to represent
>>>>>>  the number -1, which is returned by several system calls
>>>>>>  and library functions as a way to indicate error.
>>>>>>  For example, the read and write system calls: ...
>>>>>>  ssize_t read(int fildes, void *buf, size_t nbyte); ..."
>>>>>> 
>>>>>> ssize_t therefore corresponds better to MPI_Aint, because both
>>>>>> are signed types that can hold byte counts, but
>>>>>> the value -1 in an MPI_Aint variable stands for a
>>>>>> byte displacement of -1 bytes and not for an error code -1.
>>>>>> 
>>>>>> 
>>>>>> All use of (2) is in principle no problem.
>>>>>> ------------------------------------------
>>>>>> 
>>>>>> All the complex discussion of the last days is about (1):
>>>>>> 
>>>>>> (1) An absolute address can be stored in an MPI_Aint variable
>>>>>>     with and only with MPI_Get_address or MPI_Aint_add.
>>>>>> 
>>>>>> In MPI-1 to MPI-3.0 and still in MPI-3.1 (there as possibly not portable),
>>>>>> we also allow
>>>>>>  MPI_Aint variable := absolute address in MPI_Aint variable
>>>>>>                        + or -
>>>>>>                       a number of bytes (in any integer type).
>>>>>> 
>>>>>> The result is then still in category (1).
>>>>>> 
>>>>>> 
>>>>>> For the difference of two absolute addresses,
>>>>>> MPI_Aint_diff can be used. The result is then an MPI_Aint of category (2).
>>>>>> 
>>>>>> In MPI-1 to MPI-3.0 and still in MPI-3.1 (there as possibly not portable),
>>>>>> we also allow
>>>>>>  MPI_Aint variable := absolute address in MPI_Aint variable
>>>>>>                       - absolute address in MPI_Aint variable.
>>>>>> 
>>>>>> The result is then in category (2).
>>>>>> 
>>>>>> 
>>>>>> The problems we have discussed over the last days concern systems
>>>>>> that internally use unsigned addresses, where the MPI library stores
>>>>>> these addresses in MPI_Aint variables and
>>>>>> 
>>>>>> (a) a sequential storage can have virtual addresses that
>>>>>>     are both in the area with highest bit =0 and other addresses
>>>>>>     in the same sequential storage (i.e., same array or structure)
>>>>>>     with highest bit =1.
>>>>>> 
>>>>>> or
>>>>>> (b) some higher bits contain segment addresses.
>>>>>> 
>>>>>> (b) is not a problem as long as a sequential storage resides
>>>>>>     always within one Segment.
>>>>>> 
>>>>>> Therefore, we only have to discuss (a).
>>>>>> 
>>>>>> The two problems that we have are
>>>>>> (a1) that for the minus operations an integer overflow will
>>>>>>      happen and must be ignored.
>>>>>> (a2) if such addresses are extended to larger variables,
>>>>>>      e.g., MPI_Count with more bits in MPI_Count than in MPI_Aint,
>>>>>>      sign extension will produce completely wrong results.
>>>>>> 
>>>>>> And here, the simplest trick is
>>>>>> (a3) that MPI_Get_address really shall
>>>>>> map the contiguous unsigned range from 0 to 2**64-1 to the
>>>>>> signed (and also contiguous) range from -2**63 to 2**63-1
>>>>>> by simply subtracting 2**63.
>>>>>> With this simple trick in MPI_Get_address, problems
>>>>>> (a1) and (a2) are resolved.
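The mapping in (a3) can be sketched in a few lines of C. The names demo_aint, DEMO_BIAS, demo_map_address and demo_unmap_address are hypothetical; the sketch assumes a 64-bit flat unsigned address space and only illustrates the arithmetic, not an actual MPI_Get_address implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef int64_t demo_aint;  /* hypothetical 64-bit "MPI_Aint" */

#define DEMO_BIAS (UINT64_C(1) << 63)  /* the 2**63 that (a3) subtracts */

/* Map the unsigned range 0 .. 2**64-1 onto the signed range
 * -2**63 .. 2**63-1 while preserving the order of addresses. */
static demo_aint demo_map_address(uint64_t address)
{
    uint64_t shifted = address - DEMO_BIAS;  /* arithmetic modulo 2**64 */
    demo_aint mapped;
    assert(sizeof mapped == sizeof shifted);
    memcpy(&mapped, &shifted, sizeof mapped);  /* reinterpret the bits */
    return mapped;
}

/* Reverse mapping, as a routine called with MPI_BOTTOM would need. */
static uint64_t demo_unmap_address(demo_aint mapped)
{
    uint64_t shifted;
    memcpy(&shifted, &mapped, sizeof shifted);
    return shifted + DEMO_BIAS;
}
```

Take two addresses in one sequential storage that straddles the highest-bit boundary, 0x7FFFFFFFFFFFFFF0 and 0x8000000000000010. They map to -16 and 16, so the plain minus operator yields 32 without overflow (a1), and because the mapped values keep the order of the original addresses, sign extension into a wider signed type keeps their differences intact (a2).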
>>>>>> 
>>>>>> It looks as if (a) and therefore (a1) and (a2)
>>>>>> may be far in the future.
>>>>>> But they may be less far in the future if a system maps
>>>>>> the whole application's cluster address space
>>>>>> into virtual memory (not cache coherent, but accessible).
>>>>>> 
>>>>>> 
>>>>>> None of this is written, or it is only partially written, in the
>>>>>> MPI Standard, although all of it is (well) known by the MPI Forum,
>>>>>> with the following exceptions:
>>>>>> - (a2) is new.
>>>>>> - (a1) is solved in MPI-3.1 only for MPI_Aint_diff and
>>>>>>        MPI_Aint_add, but not for the operators - and +
>>>>>>        if a user switches on integer overflow detection
>>>>>>        in the future when we have such large systems.
>>>>>> - (a3) is new and in principle solves the problem also
>>>>>>        for + and - operators.
>>>>>> 
>>>>>> At least (a1)+(a2) should be added as rationale to MPI-4.0
>>>>>> and (a3) as advice to implementors within the framework
>>>>>> of big count, because (a2) is newly coming with big count.
>>>>>> 
>>>>>> I hope this helps a bit if you took the time to read
>>>>>> this long email.
>>>>>> 
>>>>>> Best regards
>>>>>> Rolf
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> ----- Original Message -----
>>>>>>> From: "mpiwg-large-counts" <mpiwg-large-counts at lists.mpi-forum.org>
>>>>>>> To: "mpiwg-large-counts" <mpiwg-large-counts at lists.mpi-forum.org>
>>>>>>> Cc: "Jim Dinan" <james.dinan at gmail.com>, "James Dinan" <james.dinan at intel.com>
>>>>>>> Sent: Monday, October 28, 2019 5:07:37 PM
>>>>>>> Subject: Re: [Mpiwg-large-counts] Large Count - the principles for
>>>>>>> counts, sizes, and byte and nonbyte displacements
>>>>>> 
>>>>>>> Still not sure I see the issue. MPI's memory-related integers should map to
>>>>>>> types that serve the same function in C. If the base language is broken for
>>>>>>> segmented addressing, we won't be able to fix it in a library. Looking at the
>>>>>>> mapping below, I don't see where we would have broken it:
>>>>>>> 
>>>>>>> intptr_t => MPI_Aint
>>>>>>> uintptr_t => ??? (Anyone remember the MPI_Auint "golden Aint" proposal?)
>>>>>>> ptrdiff_t => MPI_Aint
>>>>>>> size_t (sizeof) => MPI_Count, int
>>>>>>> size_t (offsetof) => MPI_Aint, int
>>>>>>> ssize_t => Mostly for error handling. Out of scope for MPI?
>>>>>>> 
>>>>>>> It sounds like there are some places where we used MPI_Aint in place of size_t
>>>>>>> for sizes. Not great, but MPI_Aint already needs to be at least as large as
>>>>>>> size_t, so this seems benign.
>>>>>>> 
>>>>>>> ~Jim.
>>>>>>> 
>>>>>>> On Fri, Oct 25, 2019 at 8:25 PM Dinan, James via mpiwg-large-counts
>>>>>>> <mpiwg-large-counts at lists.mpi-forum.org> wrote:
>>>>>>> 
>>>>>>> Jeff, thanks so much for opening up these old wounds. I’m not sure I have
>>>>>>> enough context to contribute to the discussion. Where can I read up on the
>>>>>>> issue with MPI_Aint?
>>>>>>> 
>>>>>>> I’m glad to hear that C signed integers will finally have a well-defined
>>>>>>> representation.
>>>>>>> 
>>>>>>> ~Jim.
>>>>>>> 
>>>>>>> From: Jeff Hammond <jeff.science at gmail.com>
>>>>>>> Date: Thursday, October 24, 2019 at 7:03 PM
>>>>>>> To: "Jeff Squyres (jsquyres)" <jsquyres at cisco.com>
>>>>>>> Cc: MPI BigCount Working Group <mpiwg-large-counts at lists.mpi-forum.org>,
>>>>>>> "Dinan, James" <james.dinan at intel.com>
>>>>>>> Subject: Re: [Mpiwg-large-counts] Large Count - the principles for counts,
>>>>>>> sizes, and byte and nonbyte displacements
>>>>>>> 
>>>>>>> Jim (cc) suffered the most in MPI 3.0 days because of AINT_DIFF and AINT_SUM,
>>>>>>> so maybe he wants to create this ticket.
>>>>>>> 
>>>>>>> Jeff
>>>>>>> 
>>>>>>> On Thu, Oct 24, 2019 at 2:41 PM Jeff Squyres (jsquyres)
>>>>>>> <jsquyres at cisco.com> wrote:
>>>>>>> 
>>>>>>> Not opposed to ditching segmented addressing at all. We'd need a ticket for
>>>>>>> this ASAP, though.
>>>>>>> 
>>>>>>> This whole conversation is predicated on:
>>>>>>> 
>>>>>>> - MPI supposedly supports segmented addressing
>>>>>>> - MPI_Aint is not sufficient for modern segmented addressing (i.e.,
>>>>>>>   representing an address that may not be in main RAM and is not mapped in to
>>>>>>>   the current process' linear address space)
>>>>>>> 
>>>>>>> If we no longer care about segmented addressing, that makes a whole bunch of
>>>>>>> BigCount stuff a LOT easier. E.g., MPI_Aint can basically be a
>>>>>>> non-segment-supporting address integer. AINT_DIFF and AINT_SUM can go away,
>>>>>>> too.
>>>>>>> 
>>>>>>> On Oct 24, 2019, at 5:35 PM, Jeff Hammond via mpiwg-large-counts
>>>>>>> <mpiwg-large-counts at lists.mpi-forum.org> wrote:
>>>>>>> 
>>>>>>> Rolf:
>>>>>>> 
>>>>>>> Before anybody spends any time analyzing how we handle segmented addressing, I
>>>>>>> want you to provide an example of a platform where this is relevant. What
>>>>>>> system can you boot today that needs this and what MPI libraries have expressed
>>>>>>> an interest in supporting it?
>>>>>>> 
>>>>>>> For anyone who didn't hear, ISO C and C++ have finally committed to
>>>>>>> twos-complement integers
>>>>>>> (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/p0907r1.html,
>>>>>>> http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2218.htm) because modern
>>>>>>> programmers should not be limited by hardware designs from the 1960s. We should
>>>>>>> similarly not waste our time on obsolete features like segmentation.
>>>>>>> 
>>>>>>> Jeff
>>>>>>> 
>>>>>>> On Thu, Oct 24, 2019 at 10:13 AM Rolf Rabenseifner via mpiwg-large-counts
>>>>>>> <mpiwg-large-counts at lists.mpi-forum.org> wrote:
>>>>>>> 
>>>>>>>> I think that changes the conversation entirely, right?
>>>>>>> 
>>>>>>> Not the first part, the state-of-current-MPI.
>>>>>>> 
>>>>>>> It may change something for the future, or a new interface may be
>>>> needed.
>>>>>>> 
>>>>>>> Please, can you describe how MPI_Get_address can work with the
>>>>>>> different variables from different memory segments.
>>>>>>> 
>>>>>>> Or whether a completely new function or a set of functions is needed.
>>>>>>> 
>>>>>>> If we can still express variables from all memory segments as
>>>>>>> input to MPI_Get_address, there may still be a way to flatten
>>>>>>> the result of some internal address inquiry into a flattened
>>>>>>> signed integer with the same behavior as MPI_Aint today.
>>>>>>> 
>>>>>>> If this is impossible, then a new way of thinking and a new solution
>>>>>>> may be needed.
>>>>>>> 
>>>>>>> I really want to see examples for all current stuff as you
>>>>>>> mentioned in your last email.
>>>>>>> 
>>>>>>> Best regards
>>>>>>> Rolf
>>>>>>> 
>>>>>>> ----- Original Message -----
>>>>>>>> From: "Jeff Squyres" <jsquyres at cisco.com>
>>>>>>>> To: "Rolf Rabenseifner" <rabenseifner at hlrs.de>
>>>>>>>> Cc: "mpiwg-large-counts" <mpiwg-large-counts at lists.mpi-forum.org>
>>>>>>>> Sent: Thursday, October 24, 2019 5:27:31 PM
>>>>>>>> Subject: Re: [Mpiwg-large-counts] Large Count - the principles for counts,
>>>>>>>> sizes, and byte and nonbyte displacements
>>>>>>> 
>>>>>>>> On Oct 24, 2019, at 11:15 AM, Rolf Rabenseifner
>>>>>>>> <rabenseifner at hlrs.de> wrote:
>>>>>>>> 
>>>>>>>> For me, it looked as if there was some misunderstanding
>>>>>>>> of the concept that absolute and relative addresses
>>>>>>>> and numbers of bytes can be stored in MPI_Aint.
>>>>>>>> 
>>>>>>>> ...with the caveat that MPI_Aint -- as it is right now -- does not support
>>>>>>>> modern segmented memory systems (i.e., where you need more than a small number
>>>>>>>> of bits to indicate the segment where the memory lives).
>>>>>>>> 
>>>>>>>> I think that changes the conversation entirely, right?
>>>>>>>> 
>>>>>>>> --
>>>>>>>> Jeff Squyres
>>>>>>>> jsquyres at cisco.com
>>>>>>> 
>>>>>>> --
>>>>>>> Dr. Rolf Rabenseifner . . . . . . . . . .. email rabenseifner at hlrs.de .
>>>>>>> High Performance Computing Center (HLRS) . phone ++49(0)711/685-65530 .
>>>>>>> University of Stuttgart . . . . . . . . .. fax ++49(0)711 / 685-65832 .
>>>>>>> Head of Dpmt Parallel Computing . . . www.hlrs.de/people/rabenseifner .
>>>>>>> Nobelstr. 19, D-70550 Stuttgart, Germany . . . . (Office: Room 1.307) .
>>>>>>> _______________________________________________
>>>>>>> mpiwg-large-counts mailing list
>>>>>>> mpiwg-large-counts at lists.mpi-forum.org
>>>>>>> https://lists.mpi-forum.org/mailman/listinfo/mpiwg-large-counts
>>>>>>> 
>>>>>>> --
>>>>>>> Jeff Hammond
>>>>>>> jeff.science at gmail.com
>>>>>>> http://jeffhammond.github.io/
> 
> 
> --
> Jeff Squyres
> jsquyres at cisco.com

-- 
Dr. Rolf Rabenseifner . . . . . . . . . .. email rabenseifner at hlrs.de .
High Performance Computing Center (HLRS) . phone ++49(0)711/685-65530 .
University of Stuttgart . . . . . . . . .. fax ++49(0)711 / 685-65832 .
Head of Dpmt Parallel Computing . . . www.hlrs.de/people/rabenseifner .
Nobelstr. 19, D-70550 Stuttgart, Germany . . . . (Office: Room 1.307) .

