[Mpi-forum] MPI_Count

Jeff Squyres jsquyres at cisco.com
Mon Jan 25 09:33:36 CST 2010

Keep in mind that what was proposed in Atlanta (ATL) is not incompatible with what you mention.

1. Fix the IO functions now (because more than one person demonstrated an immediate technical need) by adding MPI_File_write<suffix> versions with an MPI_Count type argument (instead of int).

2. If/when someone demonstrates a technical need for the communication functions, we can add MPI_Send<suffix> versions with an MPI_Count type argument.

There was profound discomfort in the ATL room with the idea of fixing all the communication functions just to make them match the "fixed" IO functions for the following reasons:

- only a small number of users are asking for this
- workarounds exist for sending >2B items with MPI_Send (e.g., via derived datatypes)
- the increase in implementor testing load was deemed "acceptable" for the small number of new File functions, but "oh-my-GAWD! scary" for all the pt2pt/collective functions -- especially when workarounds currently exist
- we can always add the MPI_Send<suffix> functions later if desired/necessary

That being said, there were only a dozen or two people in the ATL room.  So yes, we had an immediate/passionate discussion, but the whole Forum was certainly not represented.  OTOH, only vocal Forum members tend to chime in here on the mailing list.  So neither is a perfect discussion medium...

On Jan 25, 2010, at 2:59 AM, Stephen Poole wrote:

> So, as Ricky states, the assumption of stagnant code and limited-scale
> applications is probably not the best thing for MPI going forward. If we
> do it for I/O, we should keep a consistent model for communication as
> well. And if, as Marc states, we assume memory doubles every 2 years,
> then having 64 bits will hopefully suffice. Having 128 bits might be
> nice, but at this point it would certainly be a tad expensive.
> Best
> Steve...
> Kendall, Ricky A. wrote:
> > Memory per core assumes that applications will always use one process per core.  This is not necessarily a good assumption for a generic application.  Node memory will be growing, and in the lifetime of MPI-3 there are likely to be machines with lots of memory per node -- definitely greater than 2GB (we have them now!).  I don't see any difference between the need for this in I/O and the need for it in the communication layer.  Any algorithm that has a shared buffer >2GB for I/O may want to communicate those buffers to other nodes for some sort of computation.
> >
> > I agree with Marc Snir's comments that making the interface use specific types from the calling languages is the right idea.  I'm not an MPI implementation guy, so I don't know what problems this will cause those folks, but I know that the more "consistent" it is, the better application folks will like it.  We are working with several codes that are going the hybrid MPI/OpenMP route, and moving large buffer data is on the plate for a few of them.  Having to manually break it up because the MPI interface can only address <2GB chunks will be a bone of contention for those application folks.
> >
> > The reasons for aggregating I/O at the node level are partly that it's often a bad idea to have every process per core doing I/O (think funnel).  On modern networks this is less of an issue, because many collectives in MPI implementations understand the node architecture, so the funnel is mitigated for some communication patterns.  Having the MPI implementation cripple the algorithmic choices of application folks by limiting an operation to <2GB seems wrong to me.  An MPI implementation can choose not to support it, but should the standard have that limitation from the beginning?  I don't know.
> > Regards,
> > Ricky
> >
> >
> > Ricky A. Kendall
> > Group Leader, Scientific Computing
> > Oak Ridge Leadership Computing Facility (OLCF)
> > Oak Ridge National Laboratory
> > Phone: (865) 576-6905
> > Cell: (865) 356-3461
> > Email:  kendallra _at_ ornl.gov
> > AIM and Yahoo: rickyakendall
> > Gmail: rickyk
> >
> >
> > -----Original Message-----
> > From: mpi-forum-bounces at lists.mpi-forum.org [mailto:mpi-forum-bounces at lists.mpi-forum.org] On Behalf Of Jeff Hammond
> > Sent: Sunday, January 24, 2010 11:48 PM
> > To: Main MPI Forum mailing list
> > Subject: Re: [Mpi-forum] MPI_Count
> >
> > Compatibility with quantum computing is more pressing than 128-bit
> > support, if my physicist colleagues are to be believed.
> >
> > As someone almost entirely in the applications world, I hold
> > steadfastly to the notion that >2GB contiguous messages are
> > unnecessary.  I cannot think of a single useful operation that would
> > utilize such a feature.  Memory per core is not likely to exceed 2 GB
> > for some time.
> >
> > Jeff
> >
> > On Sun, Jan 24, 2010 at 8:52 PM, Snir, Marc <snir at illinois.edu> wrote:
> >> at least 64 bits would work, but I would not worry about 128 bits. Running out of 64 bit addresses will take 64 years, if we assume memory doubling every 2 years.
> >>
> >> On Jan 24, 2010, at 8:33 PM, Bronis R. de Supinski wrote:
> >>
> >>> Marc:
> >>>
> >>> I was thinking about this before the email discussion. I
> >>> came to the conclusion that we should require them to be
> >>> at LEAST 64-bit integers. However, the hope was to avoid
> >>> this type of problem in the future by using the MPI_Count
> >>> type. That way, we do not need to change the interface
> >>> many years from now when we want 128-bit integers;
> >>> we only need to update the standard to require at LEAST
> >>> 128-bit integers...
> >>>
> >>> Bronis
> >>>
> >>>
> >>> On Sun, 24 Jan 2010, Snir, Marc wrote:
> >>>
> >>>> I can understand the decision to change only the file I/O functions -- this is where the issue is most burning.
> >>>> I also understand the decision to replicate the functions that change, so that one version keeps the current behavior and a new function provides the changed behavior -- this gives a transition period during which old codes keep working.
> >>>>
> >>>> I do not understand the advantage of using a type MPI_COUNT that could be, in some implementations, a 32-bit integer and, in others, a 64-bit integer; in some a long and in others a long long -- rather than defining the new functions to take 64-bit integer count arguments.
> >>>>
> >>>> From the viewpoint of implementers, this saves little headache, since we are discussing only a few functions. From the viewpoint of users, this makes the new functions hard to use. Most programmers expect to run their MPI codes on different platforms, and care about portability. If MPI_COUNT is a 32-bit integer on some platforms and a 64-bit integer on others, then portable code can pass only a value less than 2^31 as a count argument. In particular, it will be dangerous to pass a long or long long value. It also seems gratuitous to have a type MPI_COUNT that need not correspond to any specific native type in C or Fortran. Users expect that the arguments of a library use types defined in the calling programming language -- why introduce an implementation dependence in that correspondence?
> >>>> I would suggest using explicitly 64-bit integers as the type of count in the new functions, i.e., int64_t in C and INTEGER(KIND=8) in Fortran. Both types are part of the respective (C/Fortran) standards.
> >>>>
> >>>> On Jan 22, 2010, at 2:06 PM, Jeff Squyres wrote:
> >>>>
> >>>>> Please note that there was a bunch of discussion about MPI_Count and other compatibility issues at the meeting in Atlanta this week.  I posted a summary of takeaways from the discussion on the bwcompat WG mailing list and wiki:
> >>>>>
> >>>>>   https://svn.mpi-forum.org/trac/mpi-forum-web/wiki/BackCompatMeetings
> >>>>>   http://lists.mpi-forum.org/mpi3-bwcompat/2010/01/0024.php
> >>>>>
> >>>>> Although there are still some decisions to be made (e.g., about Fortran), a surprising amount of consensus emerged.  Please read up on the notes to see what was discussed -- please chime in ASAP if you have dissenting views.
> >>>>>
> >>>>> Thanks!
> >>>>>
> >>>>> --
> >>>>> Jeff Squyres
> >>>>> jsquyres at cisco.com
> >>>>>
> >>>>>
> >>>>> _______________________________________________
> >>>>> mpi-forum mailing list
> >>>>> mpi-forum at lists.mpi-forum.org
> >>>>> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi-forum
> >>>> Marc Snir
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >> Marc Snir
> >>
> >>
> >>
> >>
> >>
> >
> >
> >
> --
> ======================>
> Steve Poole
> Computer Science and Mathematics Division
> Chief Scientist / Director of Special Programs
> National Center for Computational Sciences Division (OLCF)
> Chief Architect
> Oak Ridge National Laboratory
> 865.574.9008
> "Wisdom is not a product of schooling, but of the lifelong attempt to
> acquire it" Albert Einstein
> =====================>

Jeff Squyres
jsquyres at cisco.com
