[Mpi3-rma] Updated Proposal 1

Fri Nov 26 16:47:45 CST 2010

Jeff,
> > 2) Keith and Brian, could you please elaborate on the arguments against
> > allowing multiple elements (count>1) in MPI_Get_accumulate? I remember
> > there was some discussion about buffering and failures if one wanted to
> > support it in hardware but I don't remember what the issues were. It
> > seems like one could simply pipeline the hardware operations or just
> > fall back to a software implementation if count is bigger than a certain
> > threshold.
> 
> With pipelining, I assume atomicity is only per element?  I do not see
> how you could realize anything else in hardware.  I don't see any
> value in multi-element RMW if it is element-wise atomic.  One could
> just as easily send multiple messages.
Correct, the current RMA offers only element-wise atomicity, and we
don't plan to extend this. You are right: the only benefit would be
reduced overhead, which isn't all that much. I'm not sure full-message
atomicity could be implemented easily or efficiently.

> What is the use case for multi-element RMW anyway?  I'm definitely going
> to need message-atomic put, but I'll do the portable (i.e., MPI) version
> with RMW-lock + put + RMW-unlock.
That was the use case. Manual lock/unlock (plus flushes, etc.), as you
proposed, is a valid workaround, but it seems to impose additional round
trips (you'd have to poll with CAS until the lock is acquired).
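To make the lock + put + unlock workaround concrete, here is a minimal shared-memory analogue, not MPI code: C11 atomics and POSIX threads stand in for RMA operations on a remote window. In MPI-3 terms the lock word would roughly be acquired with MPI_Compare_and_swap, the data moved with MPI_Put plus MPI_Win_flush, and the lock released with an atomic replace (e.g. MPI_Accumulate or MPI_Fetch_and_op with MPI_REPLACE). All names below (rmw_lock, locked_put, run_demo, cas_attempts, WIN_LEN) are hypothetical. The cas_attempts counter shows the cost being discussed: every lock acquisition needs at least one CAS, and each failed CAS under contention would be one more round trip in the remote case.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define WIN_LEN 64                    /* hypothetical "window" size in elements */

/* Shared state standing in for a remote RMA window plus a lock word. */
static atomic_int lock_word = 0;      /* 0 = free, 1 = held */
static int window[WIN_LEN];
static atomic_long cas_attempts = 0;  /* counts CAS attempts ("round trips") */

/* Acquire: poll with compare-and-swap until we swap 0 -> 1.
   Remotely, every failed attempt would cost another round trip. */
static void rmw_lock(void) {
    for (;;) {
        int expected = 0;
        atomic_fetch_add(&cas_attempts, 1);
        if (atomic_compare_exchange_weak(&lock_word, &expected, 1))
            return;
    }
}

/* Release: atomic replace of the lock word with 0 (seq_cst store). */
static void rmw_unlock(void) {
    atomic_store(&lock_word, 0);
}

/* Message-atomic put: the whole buffer appears as one unit to anyone
   who also takes the lock before reading. */
static void locked_put(const int *src, size_t count) {
    rmw_lock();
    for (size_t i = 0; i < count; i++)
        window[i] = src[i];           /* plain stores, protected by the lock */
    rmw_unlock();
}

/* Locked get: reads a consistent snapshot of the window. */
static void locked_get(int *dst, size_t count) {
    rmw_lock();
    for (size_t i = 0; i < count; i++)
        dst[i] = window[i];
    rmw_unlock();
}

/* Writer thread: repeatedly puts a buffer filled with its own id. */
static void *writer(void *arg) {
    int id = *(const int *)arg;
    int buf[WIN_LEN];
    for (size_t i = 0; i < WIN_LEN; i++)
        buf[i] = id;
    for (int rep = 0; rep < 1000; rep++)
        locked_put(buf, WIN_LEN);
    return NULL;
}

/* True if every element of the snapshot is equal, i.e. no torn put. */
static bool snapshot_is_uniform(const int *snap, size_t count) {
    for (size_t i = 1; i < count; i++)
        if (snap[i] != snap[0])
            return false;
    return true;
}

/* Two concurrent writers plus one reader; the reader must never see
   a mix of both writers' data. */
static bool run_demo(void) {
    pthread_t t[2];
    int ids[2] = {1, 2};
    bool ok = true;
    for (int k = 0; k < 2; k++)
        pthread_create(&t[k], NULL, writer, &ids[k]);
    for (int rep = 0; rep < 1000; rep++) {
        int snap[WIN_LEN];
        locked_get(snap, WIN_LEN);
        if (!snapshot_is_uniform(snap, WIN_LEN))
            ok = false;
    }
    for (int k = 0; k < 2; k++)
        pthread_join(t[k], NULL);
    return ok;
}
```

With only element-wise atomicity and no lock, the reader could observe a window that is half writer 1 and half writer 2. The lock rules that out, at the price of extra CAS traffic (at least 3000 attempts for the 3000 locked operations above, more under contention).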

All the Best,
  Torsten

-- 
 bash$ :(){ :|:&};: --------------------- http://www.unixer.de/ -----
Torsten Hoefler         | Performance Modeling and Simulation Lead
Blue Waters Directorate | University of Illinois (UIUC)
1205 W Clark Street     | Urbana, IL, 61801
NCSA Building           | +01 (217) 244-7736