[Mpi3-rma] mpi3-rma post from bradc at cray.com requires approval
Underwood, Keith D
keith.d.underwood at intel.com
Sat Jun 5 10:54:30 CDT 2010
> On 05/30/2010 11:27 AM, Underwood, Keith D wrote:
> > You are right, I should have chosen my words more carefully. I
> believe we tossed ordering on MPI_Put when we tossed atomicity.
> > I don't think we ever got to the discussion of what ordering would be
> for the atomics. And, this is where life gets weird. I'm not positive
> that you need ordering for atomics, except that they are the MPI_Puts
> you would actually use when you needed ordering. So... I will go back
> to... all UPC needs is ordering from one source to one target address.
> I see three options:
> > 1) Just define accumulates/get_accumulates to have the ordering that
> UPC needs
> > 2) Default to the UPC ordering and allow an assert to relax that
> restriction. This would be consistent with how the current one-sided
> operations handle locks
> > 3) Add some other call to handle ordering, but we wouldn't want some
> call that you had to call with EVERY put or anything... After all, all
> we need is relatively minimal ordering.
> There's an option (4) similar to (2):
> 4) Default to unordered, and allow an assert to add ordering.
> I believe we decided to use either (2) or (4) in the last meeting, but
> didn't finalize on which one.
> To me, (2) seems more logical since MPI's semantics have typically
> used a conservative approach by default (ordered in this case), and
> allowed the application to assert that it can use a more relaxed mode
> as a performance optimization (unordered).
Yeah, (4) exists, but I didn't list it because, as you observe, (2) is the analogue of it and is "the MPI way". That is, implement the conservative behavior and relax it through an assertion. This is exactly what happens with locks. It is also consistent with the MPI two-sided ordering model.
We would need to think about whether the whole message has to be ordered or whether ordering on a per-target-address basis is enough. For UPC, it is sufficient to order on a per-target-address basis; however, there have been many Cray SHMEM programs written that assume at least "last byte ordering". That is, the last byte in the Put() is guaranteed to arrive after all of the other bytes (the ones in the middle can arrive in any order). You can tell those programs exist because Quadrics provided this guarantee in their implementation ;-) Such an ordering model has an advantage for target-side completion detection (i.e., you can poll the last byte), but is probably not a requirement.
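To make "last byte ordering" concrete, here is a minimal single-threaded sketch in C. It is a toy model, not SHMEM or MPI code: the `put_in_flight` struct and both functions are hypothetical names invented for illustration. The middle bytes are delivered in reverse order as a stand-in for "any order", and the sketch assumes the last payload byte differs from the sentinel the target initialized its buffer with.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a Put in flight; not an MPI or SHMEM type. */
typedef struct {
    const unsigned char *src;  /* source payload */
    unsigned char *dst;        /* target buffer */
    size_t len;                /* payload length in bytes, must be >= 1 */
    size_t delivered;          /* number of bytes delivered so far */
} put_in_flight;

/* Deliver one more byte to the target.
 * Bytes 0..len-2 arrive in reverse order (standing in for "any order");
 * byte len-1 is always delivered last.
 * Returns 1 if a byte was delivered, 0 if the Put was already complete. */
int put_step(put_in_flight *p)
{
    size_t idx;

    assert(p->len >= 1);
    if (p->delivered >= p->len)
        return 0;
    if (p->delivered < p->len - 1)
        idx = (p->len - 2) - p->delivered;  /* middle bytes, reversed */
    else
        idx = p->len - 1;                   /* the last byte goes last */
    p->dst[idx] = p->src[idx];
    p->delivered++;
    return 1;
}

/* Target-side completion check: look only at the last byte.
 * Assumes the buffer was pre-filled with `sentinel` and that the
 * last payload byte is different from `sentinel`. */
int target_sees_completion(const unsigned char *dst, size_t len,
                           unsigned char sentinel)
{
    return dst[len - 1] != sentinel;
}
```

A target would call `target_sees_completion` in a loop and read the rest of the buffer only once it returns nonzero. In a real concurrent setting the poll would also need appropriate memory-ordering guarantees from the hardware and the library, which this sketch deliberately leaves out.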
Of course, I guess you could make full message ordering the default and have two back-offs (one to ordered by address and one to unordered), but that seems like overkill.
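For illustration only, the "conservative default plus assertions that relax it" idea, with the two back-offs, could be modeled roughly as below. Every name here is hypothetical: the `HYP_ASSERT_*` bits are not real MPI constants, and treating "full message ordering" as "any two operations from one source to one target are ordered" is a simplifying assumption of this sketch.

```c
#include <stdint.h>

/* Hypothetical assertion bits; these are NOT real MPI constants. */
enum {
    HYP_ASSERT_ORDER_BY_ADDRESS = 1 << 0, /* relax to per-target-address ordering */
    HYP_ASSERT_NO_ORDER         = 1 << 1  /* relax to fully unordered */
};

typedef enum {
    ORDER_FULL_MESSAGE,  /* conservative default when nothing is asserted */
    ORDER_PER_ADDRESS,   /* first back-off */
    ORDER_NONE           /* second back-off */
} ordering_mode;

/* Map assertion bits to an ordering mode; the weakest asserted mode wins. */
ordering_mode ordering_from_assert(int assert_bits)
{
    if (assert_bits & HYP_ASSERT_NO_ORDER)
        return ORDER_NONE;
    if (assert_bits & HYP_ASSERT_ORDER_BY_ADDRESS)
        return ORDER_PER_ADDRESS;
    return ORDER_FULL_MESSAGE;
}

/* Must two operations issued by the same source to the same target
 * be applied in issue order under the given mode? */
int must_apply_in_order(ordering_mode mode, uintptr_t addr_a, uintptr_t addr_b)
{
    switch (mode) {
    case ORDER_FULL_MESSAGE: return 1;
    case ORDER_PER_ADDRESS:  return addr_a == addr_b;
    case ORDER_NONE:         return 0;
    }
    return 0;
}
```

With no assertion bits set the model stays fully ordered, which is the behavior option (2) would give by default; the per-address mode is the minimum that UPC was said to need.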