[Mpi-forum] large count support not as easy as people seem to have thought

George Bosilca bosilca at icl.utk.edu
Tue May 6 16:40:50 CDT 2014


The size of an int is implementation dependent, with a minimum size
requirement of 16 bits; no standard fixes it beyond that. A nice summary is
available at http://en.wikipedia.org/wiki/C_data_types

  George.



On Tue, May 6, 2014 at 5:34 PM, Rob Latham <robl at mcs.anl.gov> wrote:
>
>
> On 05/06/2014 04:14 PM, Jeff Hammond wrote:
>>
>> On Tue, May 6, 2014 at 3:54 PM, Hjelm, Nathan T <hjelmn at lanl.gov> wrote:
>>>
>>> +1 for large counts. I find it just a bit ridiculous that this was punted
>>> on by the forum, but I wasn't around for the discussion. Is this an issue
>>> forced on C/C++ by Fortran?
>>
>>
>> Unfortunately, I don't think we can blame this one on Fortran.  Users
>> want to pass large counts from the C interface.  That's the motivation
>> for BigMPI, at least.
>
>
> I imagine one day (once again?) the C int type will be 64 bits on some
> platform, right?
>
> Or will C ints be 32 bits forever?
>
> ==rob
>
>
>>
>> The Fortran users who want -i8 and friends to work with the MPI
>> Fortran interface are a separate issue; that one doesn't need new
>> functions in the standard, just a bunch of interface gymnastics that
>> most implementers won't enjoy.
>>
>> Best,
>>
>> Jeff
>>
>>> As I noted on this list last Friday, I am working on a higher-level
>>> library to support large counts for MPI communication functions
>>> (https://github.com/jeffhammond/BigMPI).
>>>
>>> In the course of actually trying to implement this the way the Forum
>>> contends it can be done - i.e. using derived datatypes - I have found
>>> some issues that undermine the Forum's contention that it is so easy
>>> for users to do it that it doesn't need to be in the standard.
>>>
>>> To illustrate some of the issues that I have found, let us consider
>>> the large-count implementation of nonblocking reduce...
>>>
>>> # Example Use Case
>>>
>>> It is entirely reasonable to think that some quantum chemist will want
>>> to reduce a contiguous buffer of more than 2^31 doubles corresponding
>>> to the Fock matrix if they have a multithreaded code, since 16 GB is
>>> not an unreasonable amount of memory per node.
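>>>
>>> Concretely, the call one would like to make looks roughly like the
>>> sketch below (sizes and variable names are illustrative); it cannot
>>> be expressed directly because the count argument of MPI_Iallreduce
>>> is an int:
>>>
>>>   #include <mpi.h>
>>>   #include <limits.h>
>>>   #include <stdlib.h>
>>>
>>>   int main(int argc, char **argv)
>>>   {
>>>       MPI_Init(&argc, &argv);
>>>
>>>       /* > 2^31-1 doubles, i.e. more than 16 GB; this is a sketch,
>>>        * so the allocation is not checked. */
>>>       size_t n = (size_t)INT_MAX + 2;
>>>       double *fock = malloc(n * sizeof(double));
>>>       /* ... accumulate partial Fock matrix contributions ... */
>>>
>>>       MPI_Request req;
>>>       /* Not expressible as written: n does not fit in the int count.
>>>        * MPI_Iallreduce(MPI_IN_PLACE, fock, n, MPI_DOUBLE, MPI_SUM,
>>>        *                MPI_COMM_WORLD, &req); */
>>>
>>>       free(fock);
>>>       MPI_Finalize();
>>>       return 0;
>>>   }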
>>>
>>> # Issues
>>>
>>> Unlike the blocking case, where it is reasonable to chop up the data
>>> and perform multiple operations (e.g.
>>> https://github.com/jeffhammond/BigMPI/blob/master/src/reductions_x.c,
>>> assuming that untested code is correct), one must return a single
>>> request to the application if one is to implement MPIX_I(all)reduce_x
>>> with the same semantics as MPI_Iallreduce, as I aspire to do in
>>> BigMPI.
>>>
>>> Issue #1: chopping doesn't work for nonblocking.
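>>>
>>> For reference, a minimal sketch of the blocking chop-up approach (the
>>> function name and the long long count are illustrative, not the actual
>>> reductions_x.c code); the nonblocking analogue would hand back one
>>> request per chunk rather than the single request the user expects:
>>>
>>>   #include <mpi.h>
>>>   #include <limits.h>
>>>
>>>   int MPIX_Allreduce_x(const void *sendbuf, void *recvbuf,
>>>                        long long count, MPI_Datatype datatype,
>>>                        MPI_Op op, MPI_Comm comm)
>>>   {
>>>       MPI_Aint lb, extent;
>>>       MPI_Type_get_extent(datatype, &lb, &extent);
>>>
>>>       const char *s = (const char *)sendbuf;
>>>       char *r = (char *)recvbuf;
>>>
>>>       while (count > 0) {
>>>           /* Reduce at most INT_MAX elements per call. */
>>>           int c = (count > INT_MAX) ? INT_MAX : (int)count;
>>>           int rc = MPI_Allreduce(sendbuf == MPI_IN_PLACE ? MPI_IN_PLACE : s,
>>>                                  r, c, datatype, op, comm);
>>>           if (rc != MPI_SUCCESS) return rc;
>>>           if (sendbuf != MPI_IN_PLACE) s += (MPI_Aint)c * extent;
>>>           r += (MPI_Aint)c * extent;
>>>           count -= c;
>>>       }
>>>       return MPI_SUCCESS;
>>>   }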
>>>
>>> To do the large-count reduction in one nonblocking MPI call, a derived
>>> datatype is required.  However, unlike in RMA, reductions cannot use
>>> built-in ops for user-defined datatypes, even if they are trivially
>>> composed of a large count of a built-in datatype.  See
>>> https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/338 and
>>> https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/34 for elaborate
>>> commentary on why this semantic mismatch is lame.
>>>
>>> Issue #2: cannot use built-in reduce ops.
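>>>
>>> To make the mismatch concrete, here is a rough sketch (the chunk size
>>> and helper name are illustrative; a real large-count type needs more
>>> gymnastics since MPI_Type_contiguous itself takes an int count):
>>>
>>>   #include <mpi.h>
>>>
>>>   int bigtype_iallreduce(double *buf, MPI_Comm comm, MPI_Request *req)
>>>   {
>>>       MPI_Datatype bigtype;
>>>       MPI_Type_contiguous(1 << 20, MPI_DOUBLE, &bigtype);
>>>       MPI_Type_commit(&bigtype);
>>>
>>>       /* Erroneous per the standard: MPI_SUM is defined only for
>>>        * built-in datatypes, even though bigtype is nothing more than
>>>        * a block of MPI_DOUBLE.  This is exactly the restriction that
>>>        * tickets 34 and 338 ask to relax. */
>>>       int rc = MPI_Iallreduce(MPI_IN_PLACE, buf, 1, bigtype, MPI_SUM,
>>>                               comm, req);
>>>
>>>       MPI_Type_free(&bigtype);  /* deferred until pending use completes */
>>>       return rc;
>>>   }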
>>>
>>> Once we rule out using built-in ops with our large-count datatypes, we
>>> must reimplement all of the reduction operations required.  I find
>>> this to be nontrivial.  I have not yet figured out how to get at the
>>> underlying datatype info in a simple manner.  It appears that
>>> MPI_Type_get_envelope exists for this purpose, but it's a huge pain to
>>> have to call this function when all I need to know is the number of
>>> built-in datatypes so that I can apply my clever trick and use
>>> MPI_Reduce_local inside of my user-defined operation.
>>>
>>> Issue #3: implementing the user-defined reduce op isn't easy (in my
>>> opinion).
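>>>
>>> To give a flavor of it, here is a sketch of such an op (it only handles
>>> a datatype built by a single MPI_Type_contiguous over a built-in type,
>>> and it hard-codes MPI_SUM; the function name is illustrative):
>>>
>>>   #include <mpi.h>
>>>   #include <assert.h>
>>>
>>>   static void bigsum_fn(void *invec, void *inoutvec, int *len,
>>>                         MPI_Datatype *dt)
>>>   {
>>>       int ni, na, nd, combiner;
>>>       MPI_Type_get_envelope(*dt, &ni, &na, &nd, &combiner);
>>>       assert(combiner == MPI_COMBINER_CONTIGUOUS && ni == 1 && nd == 1);
>>>
>>>       /* Recover the built-in base type and its count. */
>>>       int count;
>>>       MPI_Aint addrs[1];
>>>       MPI_Datatype basetype;
>>>       MPI_Type_get_contents(*dt, ni, na, nd, &count, addrs, &basetype);
>>>
>>>       MPI_Aint lb, extent;
>>>       MPI_Type_get_extent(basetype, &lb, &extent);
>>>
>>>       /* Each of the *len derived elements holds 'count' built-in
>>>        * elements; delegate each block to the built-in MPI_SUM. */
>>>       char *in = (char *)invec, *inout = (char *)inoutvec;
>>>       for (int i = 0; i < *len; i++) {
>>>           MPI_Reduce_local(in, inout, count, basetype, MPI_SUM);
>>>           in    += (MPI_Aint)count * extent;
>>>           inout += (MPI_Aint)count * extent;
>>>       }
>>>   }
>>>
>>>   /* Register with: MPI_Op_create(bigsum_fn, 1, &bigsum);  (1 = commutative) */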
>>>
>>> Many MPI implementations optimize reductions.  On Blue Gene/Q, MPI has
>>> explicitly vectorized intrinsic/assembly code.  Unless
>>> MPI_Reduce_local hits that code path, I am losing a huge amount of
>>> performance in reductions when I go from 2^31 to 2^31+1 elements.  I
>>> would not be surprised at all if user-defined ops and datatypes exercise
>>> suboptimal code paths in many MPI implementations, which means that
>>> the performance of large-count nonblocking reductions is unnecessarily
>>> crippled.
>>>
>>> Issue #4: inability to use optimizations in the MPI implementation.
>>>
>>> # Conclusion
>>>
>>> I believe this problem is best addressed in one of two ways:
>>>
>>> 1) Approve the semantic changes requested in tickets 34 and 338 so
>>> that one can use built-in ops with homogeneous user-defined datatypes.
>>>   This is my preference for multiple reasons.
>>>
>>> 2) Add large-count reductions to the standard.  This means 8 new
>>> functions: blocking and nonblocking (all)reduce and
>>> reduce_scatter(_block).  We don't need large-count functions for any
>>> other collectives because the datatype solution works just fine there,
>>> as I've already demonstrated in BigMPI
>>> (https://github.com/jeffhammond/BigMPI/blob/master/src/collectives_x.c).
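>>>
>>> For completeness, the datatype trick that suffices for those other
>>> collectives looks roughly like this (a sketch, not the actual
>>> collectives_x.c code; chunking by INT_MAX is illustrative):
>>>
>>>   #include <mpi.h>
>>>   #include <limits.h>
>>>
>>>   /* Describe 'count' elements of 'oldtype' with one datatype: a vector
>>>    * of full INT_MAX-sized chunks plus a contiguous remainder, glued
>>>    * together with a struct. */
>>>   int make_bigtype(long long count, MPI_Datatype oldtype,
>>>                    MPI_Datatype *newtype)
>>>   {
>>>       int nchunks   = (int)(count / INT_MAX);
>>>       int remainder = (int)(count % INT_MAX);
>>>
>>>       MPI_Datatype chunks, tail;
>>>       MPI_Type_vector(nchunks, INT_MAX, INT_MAX, oldtype, &chunks);
>>>       MPI_Type_contiguous(remainder, oldtype, &tail);
>>>
>>>       MPI_Aint lb, extent;
>>>       MPI_Type_get_extent(oldtype, &lb, &extent);
>>>
>>>       int          blocklens[2] = { 1, 1 };
>>>       MPI_Aint     displs[2]    = { 0, (MPI_Aint)nchunks * INT_MAX * extent };
>>>       MPI_Datatype types[2]     = { chunks, tail };
>>>       MPI_Type_create_struct(2, blocklens, displs, types, newtype);
>>>
>>>       MPI_Type_free(&chunks);
>>>       MPI_Type_free(&tail);
>>>       return MPI_Type_commit(newtype);
>>>   }
>>>
>>> after which, e.g., MPI_Bcast(buf, 1, bigtype, root, comm) with the
>>> committed type moves all 'count' elements in one call.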
>>>
>>> # Social Commentary
>>>
>>>  From now on, when the Forum punts on things and says it's no problem
>>> for users to roll their own using the existing functionality in MPI,
>>> we should strive to be a bit more diligent and actually prototype that
>>> implementation in a manner that proves how easy it is for users.  It
>>> turns out that writing code for some things is harder than just talking
>>> about them in a conference room...
>>>
>>> # Related
>>>
>>> https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/338#comment:9
>>> captures some of this feedback in Trac.
>>>
>>> I created https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/423 for
>>> the reasons described therein.
>>>
>>> --
>>> Jeff Hammond
>>> jeff.science at gmail.com
>>> _______________________________________________
>>> mpi-forum mailing list
>>> mpi-forum at lists.mpi-forum.org
>>> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi-forum
>>
>>
>>
>>
>
> --
> Rob Latham
> Mathematics and Computer Science Division
> Argonne National Lab, IL USA
> _______________________________________________
> mpi-forum mailing list
> mpi-forum at lists.mpi-forum.org
> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi-forum


