[mpiwg-rma] same_op_no_op

Jeff Hammond jeff.science at gmail.com
Fri Mar 14 09:25:01 CDT 2014


There's an easy solution for "bad" ops: fall back to a compare-and-swap
(C&S) implementation, the way one does on any shared-memory architecture
when a particular atomic isn't supported but C&S is.

This means that no mutexes are required, NICs that have HW support for
NO_OP, REPLACE, SUM, XOR, etc. can use them, and when a user asks for
something silly like PROD, the performance is degraded for those ops
alone and not for anything else.
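
A minimal sketch of that fallback, using C11 atomics on one 64-bit word as a
stand-in for whatever the NIC or network layer actually provides. The amo_op
tags and the amo_fetch_op function are hypothetical names chosen for
illustration; they only mirror the MPI op names and are not part of any MPI
API or implementation.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical op tags for this sketch; they only mirror the MPI op names. */
typedef enum { OP_NO_OP, OP_REPLACE, OP_SUM, OP_XOR, OP_PROD } amo_op;

/* Fetch-and-op on one 64-bit word; returns the value seen before the update. */
int64_t amo_fetch_op(_Atomic int64_t *target, int64_t operand, amo_op op)
{
    switch (op) {
    /* "Good" ops: each maps directly onto a native atomic. */
    case OP_NO_OP:   return atomic_load(target);
    case OP_REPLACE: return atomic_exchange(target, operand);
    case OP_SUM:     return atomic_fetch_add(target, operand);
    case OP_XOR:     return atomic_fetch_xor(target, operand);
    /* "Bad" op: no native atomic multiply, so retry with compare-and-swap. */
    case OP_PROD: {
        int64_t seen = atomic_load(target);
        /* A failed C&S stores the current value into `seen`, and the
         * product is recomputed from it on the next attempt. */
        while (!atomic_compare_exchange_weak(target, &seen, seen * operand)) {
        }
        return seen;
    }
    }
    return 0; /* not reached for the ops listed above */
}
```

Only the OP_PROD path pays for the retry loop; the other ops take the same
single atomic instruction they would take if PROD were never used.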

As for the argument that one can do C&S above MPI: that does not let me
be efficient when general AMO support is available below MPI.

Is this acceptable?  I find the compromise that "common usage is fast
while silly usage is slow" to be particularly satisfying :-)

Best,

Jeff

On Fri, Mar 14, 2014 at 9:11 AM, Jeff Hammond <jeff.science at gmail.com> wrote:
> In that case impls can add a new info key for same_op_no_op_replace_hardware or something.
>
> SHMEM only needs same_op_no_op_replace as default but UPC appears to need SUM and XOR to be permitted at the same time to be efficient at the UPC runtime level.
>
> Jeff
>
> Sent from my iPhone
>
>> On Mar 14, 2014, at 8:37 AM, "Underwood, Keith D" <keith.d.underwood at intel.com> wrote:
>>
>> The problem here is that some existing hardware supports some atomic operations.  Multiply is frequently not on that list.  Doing an atomic add on a non-coherent NIC and a multiply somewhere else can be challenging to make correct, much less atomic.  Now, if "all bets are off" in the definition of not atomic (i.e. any interleaving of the two implied load-op-store sequencings is legal), then I would argue that the description you attribute to 2.2 is the better one.
>>
>>> -----Original Message-----
>>> From: mpiwg-rma [mailto:mpiwg-rma-bounces at lists.mpi-forum.org] On
>>> Behalf Of Balaji, Pavan
>>> Sent: Thursday, March 13, 2014 1:02 PM
>>> To: MPI WG Remote Memory Access working group
>>> Subject: Re: [mpiwg-rma] same_op_no_op
>>>
>>>
>>> MPI-2.2 says that accumulates with different ops are not atomic.
>>>
>>> MPI-3 says that accumulates with different ops are not allowed (since
>>> same_op_no_op is the default).
>>>
>>> I think we screwed that up?
>>>
>>>  - Pavan
>>>
>>> On Mar 13, 2014, at 11:48 AM, Jeff Hammond <jeff.science at gmail.com>
>>> wrote:
>>>
>>>> It is extremely difficult to see that this is what the MPI-3 standard says.
>>>>
>>>> First we have this:
>>>>
>>>> "The outcome of concurrent accumulate operations to the same location
>>>> with the same predefined datatype is as if the accumulates were done
>>>> at that location in some serial order. Additional restrictions on the
>>>> operation apply; see the info key accumulate_ops in Section 11.2.1.
>>>> Concurrent accumulate operations with different origin and target
>>>> pairs are not ordered. Thus, there is no guarantee that the entire
>>>> call to an accumulate operation is executed atomically. The effect of
>>>> this lack of atomicity is limited: The previous correctness conditions
>>>> imply that a location updated by a call to an accumulate operation
>>>> cannot be accessed by a load or an RMA call other than accumulate
>>>> until the accumulate operation has completed (at the target).
>>>> Different interleavings can lead to different results only to the
>>>> extent that computer arithmetics are not truly associative or
>>>> commutative. The outcome of accumulate operations with overlapping
>>>> types of different sizes or target displacements is undefined."
>>>> [11.7.1 Atomicity]
>>>>
>>>> Then we have this:
>>>>
>>>> "accumulate_ops - if set to same_op, the implementation will assume
>>>> that all concurrent accumulate calls to the same target address will
>>>> use the same operation. If set to same_op_no_op, then the
>>>> implementation will assume that all concurrent accumulate calls to the
>>>> same target address will use the same operation or MPI_NO_OP. This can
>>>> eliminate the need to protect access for certain operation types where
>>>> the hardware can guarantee atomicity. The default is same_op_no_op."
>>>> [11.2.1 Window Creation]
>>>>
>>>> I was not aware that the definition of info keys was normative, given
>>>> that implementations are free to ignore them.  Even if info key text
>>>> is normative, one has to infer from the fact that same_op_no_op is the
>>>> default info behavior - and thus RMA semantic - that accumulate
>>>> atomicity is restricted to the case where one uses the same op or noop
>>>> but not replace.
>>>>
>>>> The MPI-2.2 spec is unambiguous because it explicitly requires the
>>>> same operation in 11.7.1 Atomicity.  This text was removed in MPI-3.0
>>>> in favor of the info key text.
>>>>
>>>> Best,
>>>>
>>>> Jeff
>>>>
>>>>> On Tue, Mar 11, 2014 at 12:04 AM, Balaji, Pavan <balaji at anl.gov> wrote:
>>>>>
>>>>> MPI-2 defines atomicity only for the same operation, not any operation
>>>>> for MPI_ACCUMULATE.
>>>>>
>>>>> - Pavan
>>>>>
>>>>> On Mar 10, 2014, at 11:22 PM, Jeff Hammond <jeff.science at gmail.com> wrote:
>>>>>
>>>>>> So MPI-2 denied compatibility between replace and not-replace?
>>>>>>
>>>>>> Jeff
>>>>>>
>>>>>>> On Mar 11, 2014, at 12:06 AM, "Balaji, Pavan" <balaji at anl.gov> wrote:
>>>>>>>
>>>>>>>
>>>>>>> It doesn't break backward compatibility.  The info argument is still
>>>>>>> useful when you don't want to use replace.  I don't see anything wrong
>>>>>>> with it.
>>>>>>>
>>>>>>>> On Mar 10, 2014, at 11:01 PM, Jeff Hammond <jeff.science at gmail.com> wrote:
>>>>>>>>
>>>>>>>> Does this or does this not break BW compatibility w.r.t. MPI-2.2
>>>>>>>> and did we do it intentionally?  Unless we did so intentionally
>>>>>>>> and explicitly, I will argue that the WG screwed up and the info
>>>>>>>> key+val is invalid.
>>>>>>>>
>>>>>>>> Jeff
>>>>>>>>
>>>>>>>>> On Mon, Mar 10, 2014 at 11:03 PM, Balaji, Pavan <balaji at anl.gov> wrote:
>>>>>>>>>
>>>>>>>>> If hardware can implement MPI_SUM, it should be able to
>>>>>>>>> implement MPI_SUM with 0 as well.
>>>>>>>>>
>>>>>>>>> But that's not a generic solution.
>>>>>>>>>
>>>>>>>>> Jeff: at some point you were planning to bring in a ticket that
>>>>>>>>> covers more combinations of operations than just same_op and no_op.
>>>>>>>>> Maybe it's worthwhile bringing that up again?
>>>>>>>>>
>>>>>>>>> - Pavan
>>>>>>>>>
>>>>>>>>>> On Mar 10, 2014, at 9:26 PM, Jim Dinan <james.dinan at gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>> Maybe there's a loophole that I'm forgetting?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Mon, Mar 10, 2014 at 9:43 PM, Jeff Hammond <jeff.science at gmail.com> wrote:
>>>>>>>>>> How the hell can I do GA or SHMEM then? Roll my own mutexes
>>>>>>>>>> and commit perf-suicide?
>>>>>>>>>>
>>>>>>>>>> Jeff
>>>>>>>>>>
>>>>>>>>>>> On Mar 10, 2014, at 8:32 PM, Jim Dinan <james.dinan at gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> You can't use replace and sum concurrently at a given target
>>>>>>>>>>> address.
>>>>>>>>>>>
>>>>>>>>>>> ~Jim.
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Mar 10, 2014 at 4:30 PM, Jeff Hammond <jeff.science at gmail.com> wrote:
>>>>>>>>>>> Given the following, how do I use MPI_NO_OP, MPI_REPLACE and
>>>>>>>>>>> MPI_SUM in accumulate/atomic operations in a
>>>>>>>>>>> standard-compliant way?
>>>>>>>>>>>
>>>>>>>>>>> accumulate_ops - if set to same_op, the implementation will
>>>>>>>>>>> assume that all concurrent accumulate calls to the same target
>>>>>>>>>>> address will use the same operation. If set to same_op_no_op,
>>>>>>>>>>> then the implementation will assume that all concurrent
>>>>>>>>>>> accumulate calls to the same target address will use the same
>>>>>>>>>>> operation or MPI_NO_OP. This can eliminate the need to protect
>>>>>>>>>>> access for certain operation types where the hardware can
>>>>>>>>>>> guarantee atomicity. The default is same_op_no_op.
>>>>>>>>>>>
>>>>>>>>>>> We discussed this before and the resolution was not satisfying
>>>>>>>>>>> to me.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>>
>>>>>>>>>>> Jeff
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Jeff Hammond
>>>>>>>>>>> jeff.science at gmail.com
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> mpiwg-rma mailing list
>>>>>>>>>>> mpiwg-rma at lists.mpi-forum.org
>>>>>>>>>>> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpiwg-rma



-- 
Jeff Hammond
jeff.science at gmail.com


