[Mpi3-rma] Proposal 1 discussion points for next telecon

Mon Oct 25 22:58:00 CDT 2010

Hi Pavan,
> This proposal doesn't have an atomic GET operation.
>
> MPI_ACCUMULATE with MPI_REPLACE is an atomic PUT.
>
> MPI_GET_ACCUMULATE with MPI_NO_OP does not work as an atomic GET as it  
> does not take more than 1 count, or non-predefined datatypes.
Correct, we had count>1 and derived datatypes (ddts) in an earlier
version and changed it after heated discussions about buffering. I
forgot to add it to the discussion items. I have no strong objection to
allowing ddts and counts >1; however, I believe Brian and Keith were
against it.

We should see whether an advice to users/implementers would suffice:
MPI_NO_OP is a special case that doesn't require buffering, while all
other operations might be very slow with large data.
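
To make the PUT/GET correspondence concrete, here is a minimal sketch in
plain C. It is a local model only, not MPI code and not the proposal's
text: the names model_op, model_get_accumulate, model_atomic_put and
model_atomic_get are hypothetical, the element type is fixed to int, and
the per-element atomicity a real implementation would have to provide is
not modeled. The model accepts count>1 on purpose, to show what lifting
the restriction would mean.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for MPI_SUM, MPI_REPLACE and MPI_NO_OP. */
typedef enum { OP_SUM, OP_REPLACE, OP_NO_OP } model_op;

/*
 * Local model of get-accumulate semantics: result[i] receives the old
 * value of target[i], then target[i] is combined with origin[i] using op.
 * Assumption: int elements only; atomicity is NOT modeled here.
 */
void model_get_accumulate(const int *origin, int *result, int *target,
                          size_t count, model_op op)
{
    /* With OP_NO_OP no origin data is needed at all. */
    assert(origin != NULL || op == OP_NO_OP);

    for (size_t i = 0; i < count; ++i) {
        int old = target[i];
        switch (op) {
        case OP_SUM:     target[i] = old + origin[i]; break;
        case OP_REPLACE: target[i] = origin[i];       break;
        case OP_NO_OP:   /* target unchanged, origin never read */ break;
        }
        if (result != NULL)
            result[i] = old;
    }
}

/* "Atomic PUT": accumulate with REPLACE, the old value is discarded. */
void model_atomic_put(const int *origin, int *target, size_t count)
{
    model_get_accumulate(origin, NULL, target, count, OP_REPLACE);
}

/* "Atomic GET": get-accumulate with NO_OP, no origin buffer involved. */
void model_atomic_get(int *result, int *target, size_t count)
{
    model_get_accumulate(NULL, result, target, count, OP_NO_OP);
}
```

In the OP_NO_OP branch the origin buffer is never read, so no origin data
would have to travel to the target or be buffered there. That is the
special case the advice above refers to.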

Please consider this item 7 on the discussion list!

Thanks,
  Torsten

>  -- Pavan
>
> On 10/25/2010 05:15 PM, Torsten Hoefler wrote:
>> Hi RMA Working Group,
>>
>> Bill and I finished the edits on proposal 1 (see wiki).
>>
>> We have several points to discuss in the next telecon:
>>
>> 1) We should work on a more generic info mechanism which solves the info
>> attach, detach and query problems we mentioned (however, it should not
>> be limited to RMA/windows)
>>
>> 2) We should discuss the register/deregister interface. Right now it
>> has base and size. This has several issues. One possible solution would
>> be to return a handle that is needed to free the memory. This has issues
>> too. We put both choices into the current draft and should discuss.
>>
>> 3) CAS needs a query to determine if a datatype is hardware-optimized.
>> We think it should use RMA_Query for this and assume that anything that
>> returns the RMA unified model is hardware-optimized.
>>
>> 4) Discuss difference between lock-free synchronization and lock/unlock.
>> The semantics of lock-free are different from a lock-all/shared epoch.
>> The main difference is that holding a "shared" lock means that the user
>> guarantees that there are no conflicting accesses (if there are, then he
>> can often "upgrade" to an exclusive lock). So this is consistent with
>> the literature on concurrent programming. If we now allow conflicting
>> accesses (remember that MPI-2 defines a "conflicting" access on a
>> per-window granularity) in lock-shared then this would break this
>> semantic property. One would also need to allow this for the single-lock
>> epochs and this could cost us several optimizations.
>>
>> 5) Do we need ordering semantics for any other synchronization mode? I
>> would like to keep it a bit separate for now (i.e., in the lockfree
>> synch mode), however, it would not be hard to specify it also for other
>> modes. Are there use-cases?
>>
>> 6) We added a sentence about the access granularity to 11.5.5 (lockfree
>> synchronization). This is because we have to allow conflicting accesses
>> (load/store + put/get etc.) in the same epoch on a window. The
>> granularity there is aligned with MPI's access granularity (i.e.,
>> datatype sizes). We avoided mentioning anything in bytes or such.
>>
>> I uploaded the latest working version to the wiki at
>> https://svn.mpi-forum.org/trac/mpi-forum-web/attachment/wiki/mpi3-rma-proposal1/one-side-2.pdf
>>
>> Thanks a lot,
>>   Torsten & Bill
>>
>
> -- 
> Pavan Balaji
> http://www.mcs.anl.gov/~balaji
>
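
On item 2 in the quoted list, the two interface choices can be
contrasted with a toy registry in plain C. This is only a sketch under
assumptions: all names (sketch_register, sketch_deregister,
sketch_register_h, sketch_deregister_h) and the fixed-size table are
hypothetical and are not taken from the draft. The issues with either
choice are not spelled out in this thread, so the code shows the shape of
the two interfaces and nothing more.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_REGIONS 8   /* arbitrary capacity for this toy registry */

typedef struct {
    void  *base;
    size_t size;
    int    in_use;
} region_t;

static region_t regions[MAX_REGIONS];

/* Store (base, size) in a free slot; return the slot index or -1. */
static int claim_slot(void *base, size_t size)
{
    assert(base != NULL);
    for (int i = 0; i < MAX_REGIONS; ++i) {
        if (!regions[i].in_use) {
            regions[i].base = base;
            regions[i].size = size;
            regions[i].in_use = 1;
            return i;
        }
    }
    return -1;
}

/* Choice A: a registration is identified by base and size. */
int sketch_register(void *base, size_t size)
{
    return claim_slot(base, size) >= 0 ? 0 : -1;
}

int sketch_deregister(void *base, size_t size)
{
    /* The library has to search for a matching registration. */
    for (int i = 0; i < MAX_REGIONS; ++i) {
        if (regions[i].in_use && regions[i].base == base &&
            regions[i].size == size) {
            regions[i].in_use = 0;
            return 0;
        }
    }
    return -1;  /* no registration with exactly this base and size */
}

/* Choice B: registering returns a handle that is needed to release. */
int sketch_register_h(void *base, size_t size, int *handle)
{
    int slot = claim_slot(base, size);
    if (slot < 0)
        return -1;
    *handle = slot;
    return 0;
}

int sketch_deregister_h(int handle)
{
    if (handle < 0 || handle >= MAX_REGIONS || !regions[handle].in_use)
        return -1;
    regions[handle].in_use = 0;
    return 0;
}
```

With choice A the caller must present the same base and size again; with
choice B the caller must keep the handle around instead.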

-- 
 bash$ :(){ :|:&};: --------------------- http://www.unixer.de/ -----
Es ist leichter, einen Atomkern zu spalten als ein Vorurteil
                                           [Albert Einstein]