[Mpiwg-large-counts] Large Count - the principles for counts, sizes, and byte and nonbyte displacements

Wed Oct 9 19:13:02 CDT 2019

Hi Rolf,

I can’t find an email from the 23rd - I have one from you on the 20th, 22nd, and then every day from the 24th to the 28th, but none on the 23rd :). Is this the same as below, or am I missing something? Let me take a look at this, but I’ll need a few days.

Martin

—
Prof. Dr. Martin Schulz, Chair of Computer Architecture and Parallel Systems
Department of Informatics, TU-Munich, Boltzmannstraße 3, D-85748 Garching
Member of the Board of Directors at the Leibniz Supercomputing Centre (LRZ)
Email: schulzm at in.tum.de

> On 9. Oct 2019, at 13:44, Rolf Rabenseifner via mpiwg-large-counts <mpiwg-large-counts at lists.mpi-forum.org> wrote:
> 
> 
> ----- Forwarded Message -----
> From: Rolf Rabenseifner <rabenseifner at hlrs.de>
> To: HOLMES Daniel <d.holmes at epcc.ed.ac.uk>
> Cc: Purushotham V. Bangalore <puri at uab.edu>, Martin Ruefenacht <m.a.ruefenacht at gmail.com>, Claudia Blaas-Schenner <claudia.blaas-schenner at tuwien.ac.at>, Jeff Squyres <jsquyres at cisco.com>, Anthony Skjellum <tony-skjellum at utc.edu>
> Sent: Tue, 08 Oct 2019 20:12:18 +0200 (CEST)
> Subject: Re: [Mpiwg-large-counts] Large Count - the principles for counts, sizes, and byte and nonbyte displacements
> 
> Hi all,
> 
> (Dan and Puri, on Sep. 23 I also sent a copy of my email to
> Martin. It looks to me like he had no complaints.)
> 
> The rest of my answers are inline below.
> 
> Best regards
> Rolf
> 
> 
> ----- Original Message -----
>> From: "HOLMES Daniel" <d.holmes at epcc.ed.ac.uk>
>> To: "Rolf Rabenseifner" <rabenseifner at hlrs.de>
>> Cc: "Purushotham V. Bangalore" <puri at uab.edu>, "Martin Ruefenacht" <m.a.ruefenacht at gmail.com>, "Claudia Blaas-Schenner" <claudia.blaas-schenner at tuwien.ac.at>, "Jeff Squyres" <jsquyres at cisco.com>, "Anthony Skjellum" <tony-skjellum at utc.edu>
>> Sent: Monday, October 7, 2019 8:25:03 PM
>> Subject: Re: [Mpiwg-large-counts] Large Count - the principles for counts, sizes, and byte and nonbyte displacements
> 
>> Hi Rolf,
>> 
>> Apologies - I have no record of this email arriving before today. This probably
>> means that I am not a member of the mpiwg-large-counts email list, which seems
>> somewhat sub-optimal.
>>> 
>>> To understand how big and large counts should be implemented in MPI-4,
>>> it is important to understand the count, displacement and size model in
>>> MPI-3.1.
>>> 
>>> As long as we do not have a common understanding of MPI-3.1,
>>> we will have problems defining MPI-4.
>> 
>> Agreed.
>> 
>>> Therefore my clear question, do we agree on the rules above?
>> 
>> In short, no.
>> In addition, these rules are incomplete.
>> Also, there are breaches in MPI-3.1 for most of these rules.
>> 
>> —
>> 
>> Below this point in this email, I attempt to cover some of the disagreements
>> with your rules and I include a missing rule. However, this is not the sum
>> total of the knowledge/experience gained in/by the WG. Some of the other WG
>> members may wish to disagree and I expect that some points will generate
>> further discussion.
>> 
>>> - an index into such an array, i.e., the number of an element.
>>> Argument name / descriptions [Routine]:
>>> -- array_of_displacements / displacement ..., in multiples of oldtype extent
>>> (... integer) [MPI_TYPE_INDEXED]
>>> -- sdispls / integer array (of length group size). Entry j specifies the
>>> displacement ... [MPI_ALLTOALLV]
>>> C-type in MPI-3.1: int
>>> C-type in _l: MPI_Count
>> 
>> The two examples you give are not the same. The array_of_displacements is a
>> number, correct type MPI_Count (in C), as you state, because it is a “multiple
>> of” <something>. 
> 
> It looks like you agree for MPI_TYPE_INDEXED.
> 
>> However, each element in the sdispls array is a displacement
>> relative to a memory location, sendbuf. 
>> The correct type is MPI_Aint (in C).
>> Note that it is incorrect in MPI-3.1 for this parameter to be int[] (in C)
>> because int is *not* a smaller version of MPI_Aint. A displacement must not be
>> described as a number of bytes because of segmented address spaces. The phrase
>> “displacement in bytes” is nonsense.
> 
> 
> MPI-3.1 p171:7 clearly writes: sendbuf+sdispls[i]*extent(sendtype)
> 
> This means that if sendbuf were declared as 
>  type_X sendbuf[sendcount[i]];
> then sendbuf[sdispls[i]] would be exactly the same as what we have in
> MPI_TYPE_INDEXED: sdispls[i] is an index into an array of elements.
> 
> To compare it again:
> 
> - MPI_TYPE_INDEXED MPI-3.1 p89:9-10 and 15-16
> 
>   IN array_of_displacements displacement for each block, 
>                             in multiples of oldtype extent (array of integer)
> 
>   int MPI_Type_indexed(..., const int array_of_displacements[],
> 
> - MPI_ALLTOALLV MPI-3.1, p170:23-24 and p171:7
> 
>   int MPI_Alltoallv(..., const int sdispls[],
> 
>   sendbuf+sdispls[i]*extent(sendtype)
> 
> 
> Second reason: When defining MPI_NEIGHBOR_ALLTOALL|V|W, we
> clearly revisited MPI_ALLTOALL|V|W and decided to correct wrong types.
> For MPI(_NEIGHBOR)_ALLTOALL and MPI(_NEIGHBOR)_ALLTOALLV, we decided
> that everything is correct, whereas for MPI(_NEIGHBOR)_ALLTOALLW, we decided
> that the correct type for the displs is MPI_Aint (and not int).
> 
> Therefore, I cannot see any difference, and the MPI Forum also did not
> see this difference when they standardized MPI_NEIGHBOR_ALLTOALLV.
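
The equivalence argued above can be sketched in plain C, without MPI. This is only an illustration under stated assumptions: `elem_t` is a hypothetical stand-in for the C type matching sendtype, extent(sendtype) is taken to be `sizeof(elem_t)` (no padding, no resized type), and the helper name `block_start` is invented for this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the C type that matches sendtype. */
typedef double elem_t;

/*
 * Start of the block for process i, following the formula quoted from
 * MPI-3.1: sendbuf + sdispls[i] * extent(sendtype).
 * Assumption: extent(sendtype) == sizeof(elem_t).
 */
static const void *block_start(const elem_t *sendbuf, const int *sdispls, int i)
{
    assert(sendbuf != NULL && sdispls != NULL && i >= 0);
    const char *base = (const char *)sendbuf;
    /* Widen before multiplying so the byte offset is not computed in int. */
    return base + (ptrdiff_t)sdispls[i] * (ptrdiff_t)sizeof(elem_t);
}
```

Under these assumptions, `block_start(buf, sdispls, i)` yields the same address as `&buf[sdispls[i]]`, i.e. sdispls[i] behaves as an element index rather than a byte displacement.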
> 
> 
>>> - number of bytes
>>> Argument name / descriptions [Routine]:
>>> -- size / size of window in bytes (non-negative integer) [MPI_WIN_CREATE]
>>> -- outsize / output buffer size, in bytes (integer) [MPI_PACK_EXTERNAL]
>>> C-type in MPI-3.1: MPI_Aint
>>> C-type in _l:      MPI_Aint
>>> (Wrong) C-types in MPI-3.1: int [MPI_PACK, MPI_TYPE_SIZE]
>>> C-type (corrected) in _l:   MPI_Aint [MPI_PACK, MPI_TYPE_SIZE]
>> 
>> A number of <something> is a number, not a displacement. The correct “large”
>> type is MPI_Count (in C). There are places in MPI-3.1 where this is, correctly,
>> int or MPI_Count but there are other places in MPI-3.1 where it is,
>> incorrectly, MPI_Aint.
> 
> I never said that a number is a displacement.
> MPI_Aint is clearly used for 
> - addresses relative to a buffer begin,
> - absolute addresses (returned by MPI_GET_ADDRESS)
> - number of bytes.
> 
> MPI-3.1 says
> 2.5.6 Absolute Addresses and Relative Address Displacements
> Some MPI procedures use address arguments that represent an absolute address in the calling
> program, or relative displacement arguments that represent differences of two absolute
> addresses. The datatype of such arguments is MPI_Aint in C and INTEGER (KIND=
> MPI_ADDRESS_KIND) in Fortran. 
> 
> Relative or absolute addresses ==> MPI_Aint.
> It is not written MPI_Aint ==> rel. or abs. addresses.
> 
> Examples for number of bytes:
> 
> MPI_TYPE_CREATE_HVECTOR, result of MPI_AINT_DIFF, MPI_TYPE_GET_EXTENT,
> MPI_WIN_CREATE, MPI_WIN_ALLOCATE, MPI_WIN_ALLOCATE_SHARED, 
> MPI_WIN_SHARED_QUERY, MPI_WIN_ATTACH.
> 
> This is a feature, not a bug.
> 
> Example for byte-displacements:
> 
> MPI_TYPE_CREATE_HINDEXED, MPI_TYPE_CREATE_HINDEXED_BLOCK, 
> MPI_TYPE_CREATE_STRUCT, result of MPI_AINT_DIFF. 
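
A byte displacement of the kind passed to MPI_TYPE_CREATE_STRUCT can be sketched in plain C as follows. This is an illustration only: `struct particle` and both helper names are hypothetical, and `ptrdiff_t` is used as a stand-in for MPI_Aint, which is an assumption of this sketch and not a statement about any MPI implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record type, used only for illustration. */
struct particle {
    int    id;
    double pos[3];
    char   tag;
};

/*
 * Fill disp[0..2] with the byte displacement of each member relative to
 * the start of the struct. ptrdiff_t stands in for MPI_Aint (assumption).
 */
static void member_displacements(ptrdiff_t disp[3])
{
    assert(disp != NULL);
    disp[0] = (ptrdiff_t)offsetof(struct particle, id);
    disp[1] = (ptrdiff_t)offsetof(struct particle, pos);
    disp[2] = (ptrdiff_t)offsetof(struct particle, tag);
}

/* The same displacement, obtained as a difference of two absolute addresses. */
static ptrdiff_t address_difference(const void *base, const void *member)
{
    return (const char *)member - (const char *)base;
}
```

Both routes give the same value for a member of one object, which matches the description of relative displacements as differences of two absolute addresses.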
> 
> A real exception is MPI_PUT, ... with  MPI_Aint target_disp,
> and target_addr = window_base + target_disp x disp_unit.
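
The address rule for MPI_PUT quoted above can be written out as a small C sketch. It is not MPI code: `ptrdiff_t` stands in for MPI_Aint, the function name `target_address` is hypothetical, and disp_unit is kept as int as in MPI-3.1.

```c
#include <assert.h>
#include <stddef.h>

/*
 * target_addr = window_base + target_disp * disp_unit.
 * ptrdiff_t stands in for MPI_Aint (assumption of this sketch).
 */
static char *target_address(char *window_base, ptrdiff_t target_disp, int disp_unit)
{
    assert(window_base != NULL && target_disp >= 0 && disp_unit > 0);
    /* Widen disp_unit so the product is computed in the wide type, not in int. */
    return window_base + target_disp * (ptrdiff_t)disp_unit;
}
```

With disp_unit equal to the element size, target_disp acts as an element index; with disp_unit equal to 1, it acts as a byte displacement.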
> 
> 
>>> - smaller number in bytes
>>> Some argument names: e.g., disp_unit
>>> Description: local unit size for displacements, in bytes (positive integer)
>>> C-type in MPI-3.1: int
>>> C-type in _l: still int? or MPI_Count?
>> 
>> Why must disp_unit be smaller, and is it really a number of bytes? The premise
>> for disp_unit is that a window might consist of a sequence of locations in
>> memory that are used to store values of a particular type (typically
>> represented by an MPI datatype). The disp_unit gives an indication of the
>> extent of the datatype, a quantity by which all offset values for the window
>> will be scaled, in order that address arithmetic adding (offset*disp_unit) to
>> base_address produces a new address that is one of the sequence of locations in
>> memory used to store values, i.e. the location of the beginning of one of the
>> datatypes. This indicates to me that the correct type for disp_unit is MPI_Aint
>> (in C). Note that, if true, this means that using int (as is done in MPI-3.1)
>> is incorrect, even prior to the large count changes because int is *not* a
>> smaller version of MPI_Aint.
> 
> Yes. disp_unit is an extent of a type, which is a number of bytes, which
> should have been MPI_Aint (as in MPI_TYPE_GET_EXTENT).
> 
> Therefore 
>   C-type in _l: should be MPI_Aint.
> 
> 
>> A missing rule concerns "length arguments” (see section 2.5.2 in MPI-3.1 for a
>> definition).
>> The length of the array_of_requests for the MPI_Waitany function is given by a
>> parameter called count, which is of type int (in C). This is a different class
>> of type than any discussed so far. The WG has flip-flopped with regards to
>> whether this class of type should be enlarged or not. Our current position is
>> (I believe) to leave this class of types alone, i.e. they will remain int (in
>> C).
> 
> I have to think about it.
> 
> Best regards
> Rolf
> 
> 
>> 
>> Cheers,
>> Dan.
>> —
>> Dr Daniel Holmes PhD
>> Architect (HPC Research)
>> d.holmes at epcc.ed.ac.uk
>> Phone: +44 (0) 131 651 3465
>> Mobile: +44 (0) 7940 524 088
>> Address: Room 2.09, Bayes Centre, 47 Potterrow, Central Area, Edinburgh, EH8 9BT
>> —
>> The University of Edinburgh is a charitable body, registered in Scotland, with
>> registration number SC005336.
>> —
>> 
>> On 7 Oct 2019, at 18:09, Rolf Rabenseifner <rabenseifner at hlrs.de> wrote:
>> 
>> Dear all,
>> 
>> I've never seen any answer to my email.
>> Maybe there isn't any answer, maybe it never reached me.
>> 
>> Please advise.
>> 
>> Best regards
>> Rolf
>> 
>> ----- Forwarded Message -----
>> From: Rolf Rabenseifner via mpiwg-large-counts <mpiwg-large-counts at lists.mpi-forum.org>
>> To: mpiwg-large-counts at lists.mpi-forum.org
>> Cc: Rolf Rabenseifner <rabenseifner at hlrs.de>
>> Sent: Mon, 23 Sep 2019 17:15:09 +0200 (CEST)
>> Subject: [Mpiwg-large-counts] Large Count - the principles for counts, sizes,
>> and byte and nonbyte displacements
>> 
>> Dear all,
>> 
>> To understand how big and large counts should be implemented in MPI-4,
>> it is important to understand the count, displacement and size model in
>> MPI-3.1.
>> 
>> As long as we do not have a common understanding of MPI-3.1,
>> we will have problems defining MPI-4.
>> 
>> Therefore, here is my understanding of MPI-3. We have
>> 
>> - number of array elements with elements of a given type (typically represented
>> by an MPI datatype handle).
>> Usual argument name: count
>> Usual description: number of elements in ... buffer (non-negative integer)
>> C-type in MPI-3.1: int
>> C-type in _l: MPI_Count
>> 
>> - an index into such an array, i.e., the number of an element.
>> Argument name / descriptions [Routine]:
>> -- array_of_displacements / displacement ..., in multiples of oldtype extent
>> (... integer) [MPI_TYPE_INDEXED]
>> -- sdispls / integer array (of length group size). Entry j specifies the
>> displacement ... [MPI_ALLTOALLV]
>> C-type in MPI-3.1: int
>> C-type in _l: MPI_Count
>> 
>> - number of bytes
>> Argument name / descriptions [Routine]:
>> -- size / size of window in bytes (non-negative integer) [MPI_WIN_CREATE]
>> -- outsize / output buffer size, in bytes (integer) [MPI_PACK_EXTERNAL]
>> C-type in MPI-3.1: MPI_Aint
>> C-type in _l:      MPI_Aint
>> (Wrong) C-types in MPI-3.1: int [MPI_PACK, MPI_TYPE_SIZE]
>> C-type (corrected) in _l:   MPI_Aint [MPI_PACK, MPI_TYPE_SIZE]
>> 
>> - smaller number in bytes
>> Some argument names: e.g., disp_unit
>> Description: local unit size for displacements, in bytes (positive integer)
>> C-type in MPI-3.1: int
>> C-type in _l: still int? or MPI_Count?
>> 
>> - Position or relative byte displacement within an array of bytes.
>> Such values can be calculated as any sum and product of int, long, long long,
>> and MPI_Aint values, as long as the MPI_Aint value contains a pure integer size value,
>> i.e., an (integer) difference of two absolute addresses within one sequential
>> storage, see MPI-3.1 page 115 line 31, or an MPI datatype extent, retrieved,
>> e.g., with MPI_TYPE_GET_EXTENT.
>> Argument names / description:
>> -- position / current position in buffer, in bytes (integer) [MPI_PACK_EXTERNAL]
>> -- array_of_displacements / byte displacement of each block (array of integer)
>> [MPI_TYPE_CREATE_STRUCT]
>> C-type in MPI-3.1: MPI_Aint
>> C-type in _l:      MPI_Aint
>> (Wrong) C-types in MPI-3.1: int [MPI_PACK, MPI_ALLTOALLW]
>> C-type (corrected) in _l:   MPI_Aint [MPI_PACK, MPI_ALLTOALLW]
>> 
>> - Absolute address values for byte displacements.
>> These values are also valid for all byte displacements in datatype routines
>> and in MPI_NEIGHBOR_ALLTOALLW, provided that they are used in combination
>> with buffer=MPI_BOTTOM.
>> They cannot be used in MPI_ALLTOALLW.
>> With "C-type (corrected) in _l: MPI_Aint [MPI_ALLTOALLW]",
>> they are also usable with MPI_ALLTOALLW.
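
The distinction between element counts and byte sizes in the list above can be illustrated with a short C sketch. Everything here is an assumption made for illustration: `count_t` and `aint_t` are hypothetical 64-bit stand-ins for the roles of MPI_Count and MPI_Aint, int is assumed to be 32 bits wide, and the helper names are invented.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical stand-ins; a real MPI library defines MPI_Count and MPI_Aint itself. */
typedef int64_t count_t;  /* role of MPI_Count: number of elements */
typedef int64_t aint_t;   /* role of MPI_Aint: number of bytes or byte displacement */

/* Nonzero if an element count can still be passed through an int argument. */
static int count_fits_int(count_t count)
{
    assert(count >= 0);
    return count <= (count_t)INT_MAX;
}

/*
 * Size in bytes of `count` elements with the given extent.
 * Assumption: the product fits in 64 bits.
 */
static aint_t buffer_bytes(count_t count, aint_t extent)
{
    assert(count >= 0 && extent > 0);
    return count * extent;
}
```

For example, 2^28 elements of extent 8 still fit an int count, but the resulting byte size of 2^31 does not fit a 32-bit int, which is the kind of case behind the "(Wrong) C-types" entries for MPI_PACK and MPI_TYPE_SIZE.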
>> 
>> 
>> I already looked at the Large/Big Count pdf and saw that in the datatype chapter
>> these rules were broken, for example for the ...PACK/UNPACK... routines.
>> 
>> 
>> Therefore my clear question, do we agree on the rules above?
>> 
>> 
>> Bugs already detected in the version from Sep. 13, 2019:
>> 
>> - page 127:
>> No idea why you changed the name from MPI_GET_ELEMENTS to MPI_TYPE_GET_ELEMENTS.
>> Should be reverted.
>> 
>> - MPI_PACK:
>> outsize and position should be handled identically to those in MPI_PACK_EXTERNAL,
>> i.e.,
>> both are MPI_Aint...
>> 
>> - MPI_PACK_SIZE:
>> size should be handled identically to that in MPI_PACK_EXTERNAL_SIZE, i.e.,
>> MPI_Aint...
>> 
>> - MPI_UNPACK:
>> insize and position should be handled identically to those in MPI_UNPACK_EXTERNAL,
>> i.e.,
>> both are MPI_Aint...
>> 
>> - MPI_Type_contiguous: the large count _l version is missing
>> 
>> - MPI_Type_create_darray
>>  -- array_of_distribs must be INTEGER, because it holds enumeration values
>>     and nothing else.
>>  -- array_of_dargs requires significant explanation because it can
>>     hold an enumeration (INTEGER) and also large count values, which
>>     can cause incomprehensible compiler error messages
>>     in Fortran: using MPI_COUNT_KIND array_of_gsizes values
>>     together with an INTEGER enumeration constant,
>>     here MPI_DISTRIBUTE_DFLT_DARG, would cause a compiler message
>>     like "no matching interface found".
>> 
>>     Two possible text/interface solutions:
>>     - If using the mpi_f08 module and MPI_DISTRIBUTE_DFLT_DARG together
>>       with the large count version of this procedure, i.e.,
>>       INTEGER(KIND=MPI_COUNT_KIND) array_of_gsizes and array_of_dargs
>>       arguments, then one should use
>>          INT(MPI_DISTRIBUTE_DFLT_DARG, MPI_COUNT_KIND)
>>       instead of
>>          MPI_DISTRIBUTE_DFLT_DARG.
>>     - Overloading with two versions (long,normal) and (long,long).
>>       I would recommend the first solution because it does
>>       not require additional MPI library implementation overhead.
>> 
>> 
>> Before I can continue with reviewing, the principles above must be
>> clarified/discussed/agreed/...
>> 
>> Best regards
>> Rolf
>> 
>> 
>> 
>> --
>> Dr. Rolf Rabenseifner . . . . . . . . . .. email rabenseifner at hlrs.de .
>> High Performance Computing Center (HLRS) . phone ++49(0)711/685-65530 .
>> University of Stuttgart . . . . . . . . .. fax ++49(0)711 / 685-65832 .
>> Head of Dpmt Parallel Computing . . . www.hlrs.de/people/rabenseifner .
>> Nobelstr. 19, D-70550 Stuttgart, Germany . . . . (Office: Room 1.307) .
>> _______________________________________________
>> mpiwg-large-counts mailing list
>> mpiwg-large-counts at lists.mpi-forum.org
>> https://lists.mpi-forum.org/mailman/listinfo/mpiwg-large-counts
