[Mpi3-bwcompat] Meeting notes from 7/23
Solt, David George
david.solt at [hidden]
Fri Jul 23 17:03:41 CDT 2010
Also available on https://svn.mpi-forum.org/trac/mpi-forum-web/wiki/BackCompatMeetings
The discussion centered on the continuing MPI_Count effort. The latest proposal (#224) reworks an early 2.2 proposal to introduce MPI_Count while allowing implementations to define MPI_Count as an int in order to retain backward compatibility. Fabian has been working through some of the details, and there are a few "boundary" cases where MPI_Count interacts with other concepts. One in particular is MPI_Pack, which currently looks like this:
int MPI_Pack(void* inbuf, int incount, MPI_Datatype datatype, void* outbuf, int outsize, int *position, MPI_Comm comm)
This API uses outsize and position, neither of which is well-defined as either an int or an MPI_Count. Several ideas were discussed for making one or the other of outsize and position one of the following: MPI_Count, size_t, uint64, MPI_Aint, MPI_Size, MPI_Position. We wrestled with the exact meaning of "what is a position" and how it relates to other uses of MPI_Aint, etc. We discussed whether there needs to be a distinction between the size of an "offset" in memory vs. within a message vs. within a file. (Notetaker's note: MPI_Offset is generally used for file positions, MPI_Aint for memory positions, and MPI_Count is proposed to address positions within a message.)
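To make the concern about int-typed outsize and position arguments concrete, here is a minimal sketch in plain C. It does not call MPI; packed_bytes_fit_in_int is a hypothetical helper invented for illustration, and it assumes the packed size is simply count times element size, ignoring any packing overhead an implementation may add.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not part of MPI: reports whether the packed size of
 * `count` elements of `elem_size` bytes each can be represented by the
 * int-typed outsize/position arguments of the current MPI_Pack.
 * Assumption: packed size == count * elem_size (no packing overhead). */
bool packed_bytes_fit_in_int(int64_t count, int64_t elem_size)
{
    assert(count >= 0 && elem_size > 0);
    /* Divide instead of multiplying so the check itself cannot overflow. */
    return count <= (int64_t)INT_MAX / elem_size;
}
```

With a 32-bit int, for example, a buffer of 8-byte elements stops fitting once it holds more than INT_MAX / 8 elements, even though each element count on its own is still a valid int.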
MPI_Pack ties several of these concepts together because it creates something in memory that is also intended to be used as a message or could be used in a file operation. The "result" of MPI_Pack is currently an int (position) which represents a position in memory (and therefore should have been an MPI_Aint) but is used roughly as a message count in sends/recvs (and thus would be an MPI_Count). The number of bytes that can be packed can exceed the range of MPI_Count, which could force us to make requirements about the relative sizes of MPI_Aint and MPI_Count. We did not want to do that. In the end we agreed that position, which is currently defined as "current position in buffer, in bytes", should be re-defined as a "current count of MPI_PACKED datatypes within the buffer" (something like that) and can therefore be an MPI_Count. This means that MPI_Pack is now restricted to working with buffers that can be "indexed" with MPI_Count values. The argument really represents the position within a buffer, but we have re-framed it in this way to avoid introducing a new type and to ensure that position will be something we can pass as an MPI_Count to communication routines. Another option we considered here was using MPI_Offset for the position argument, because that's pretty much what it is: an offset. But we can't do that for two reasons: 1) it would break backwards compatibility (because MPI_Offset != int in many current MPI implementations), and 2) you can't pass an MPI_Offset to the count argument of MPI_Send. We will add some text explaining that MPI_Pack is not intended to pack extremely large buffers and that extremely large messages should be pipelined with multiple MPI_Pack/MPI_Send operations for performance reasons.
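The pipelining advice can be sketched as a simple chunking plan. The code below is plain C with no MPI calls; num_rounds and round_count are hypothetical helpers, and the per-round element limit is an assumed tuning parameter, not something the ticket defines.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of pack/send rounds needed to move `total`
 * elements when each round handles at most `per_round` elements. */
int64_t num_rounds(int64_t total, int64_t per_round)
{
    assert(total >= 0 && per_round > 0);
    return total / per_round + (total % per_round != 0);
}

/* Hypothetical helper: element count handled in round `r` (0-based).
 * Every round is full except possibly the last one. */
int64_t round_count(int64_t total, int64_t per_round, int64_t r)
{
    assert(r >= 0 && r < num_rounds(total, per_round));
    int64_t remaining = total - r * per_round;
    return remaining < per_round ? remaining : per_round;
}

/* In a real program, each round r would reset position to 0, call
 * MPI_Pack on round_count(total, per_round, r) elements starting at
 * element r * per_round, and then send the packed bytes as MPI_PACKED. */
```

Because each round starts again at position 0, the position value never grows beyond the packed size of one round, regardless of the total message size.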
The outsize argument, on the other hand, cannot as easily be converted to an MPI_Count argument. We could say that outsize is always in terms of how many counts of MPI_BYTE would fit into the buffer, but that is a contorted use of MPI_Count. So, we proposed to introduce MPI_Size for use as the outsize argument to MPI_Pack/MPI_Unpack. We did not discuss it much, but I believe it will also get used for MPI_Type_size. MPI_Size will not need to be tied in any way to the size of MPI_Count. If, on a given system, MPI_Count is smaller than MPI_Size, it only means that we can pass in very large buffers to MPI_Pack, but not very large counts. If MPI_Count is larger than MPI_Size, it means that we may not be able to pack as large a message as we can send directly with communication routines.
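The two cases can be illustrated with a small sketch. MPI_Count and MPI_Size are only proposals at this point, so the code below models their maximum values as plain uint64_t parameters; max_packed_bytes and pack_is_bottleneck are hypothetical names, and the sketch assumes position counts bytes (MPI_PACKED units) as described above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: `count_max` is the largest value an MPI_Count can
 * hold and `size_max` the largest value an MPI_Size can hold. Since
 * position would be an MPI_Count and outsize an MPI_Size, a single packed
 * buffer is bounded by the smaller of the two. */
uint64_t max_packed_bytes(uint64_t count_max, uint64_t size_max)
{
    return count_max < size_max ? count_max : size_max;
}

/* True when MPI_Size is the narrower type, i.e. when a message that can
 * be sent directly may still be too large to pack in one MPI_Pack buffer. */
bool pack_is_bottleneck(uint64_t count_max, uint64_t size_max)
{
    return size_max < count_max;
}
```

For instance, with a 31-bit MPI_Count and a 63-bit MPI_Size, the buffer handed to MPI_Pack may be very large, but the usable packed range is still bounded by the 31-bit count.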
We agreed on a schedule for the next two weeks: Fab will update the ticket based on today's discussion, and the other attendees are encouraged to start dividing up the chapters to search for example code that should be changed to reflect the proposed changes.
The plan is to polish off ticket #224 and have an implementation ready for the October 2010 meeting.