If the question you were asked was merely "more strict or more relaxed", then it needs more context from the conversation.  The specific semantics in question were:

1) Ordering:
	a) Ordered from a given source to a given address on a given target (unordered otherwise), or
	b) completely unordered
2) Granularity of access/atomicity:
	a) Two accesses from two sources to the same address appear in some (undefined) order, or
	b) two accesses from two sources to the same address yield undefined behavior

I argued strongly for 1a and 2a, since they correspond to the weakest UPC memory model, but Jeff (and others) argued that those were not necessary and imposed unacceptable performance penalties.  Our current trend is toward 1b and 2b.  When Jeff reported back to us that you were fine with the latter two, I must admit that I was rather surprised, given that I assumed that Chapel's weakest memory model was on par with UPC's.


Hi MPI-3 RMA team --

I ran into Jeff Hammond at a workshop a few weeks back and we had a brief chat about whether, as a potential client of MPI-3 RMA, I would prefer its semantics to err more on the strict or relaxed side.  He requested that I consider sending a brief note to this group with my thoughts, so this is that note.  I hope that this opinion will be considered useful and not out-of-turn given how little time I've had to invest in following the work of the MPI-3 team.

I should start with the disclaimer that I'm not an expert on memory consistency models -- I probably know more than the average programmer, but have typically been insulated from worrying about it in a great amount of detail, either by relying on other software layers or languages to take care of it for me or by having the fortune to work with codes and idioms that don't fall afoul of the differences.

My gut response to the question is that I'd prefer things to be on the more relaxed side.  I think one of the key benefits of single-sided communication is its separation of data transfer from synchronization.  I'd worry that trying to enforce too much strictness in the RMA interface would break down this separation and result in performance overheads that couldn't be recouped.

On the other hand, if MPI-3 exported a model that was more relaxed than a particular programmer/programming model wanted, my assumption is that they could increase the strictness by doing more manual synchronization/memory fences/etc. themselves.  That is, a relaxed model would not seem to exclude strictness while a strict model may impact performance negatively without any recourse.  If that's a correct interpretation, the relaxed approach seems like the one to take to me.
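To make that reasoning concrete, here is a rough shared-memory analogy using C11 atomics and POSIX threads.  It is not MPI-3 RMA code and says nothing about what the MPI-3 interface will look like; the names payload, ready, and run_demo are made up for illustration.  The data write and the flag accesses are relaxed, and the ordering comes only from the explicit fences that the programmer adds:

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical names, for illustration only. */
static int payload;       /* plain data: the "transfer" */
static atomic_int ready;  /* flag: the "synchronization" */

static void *producer(void *arg) {
    (void)arg;
    payload = 42;                                   /* unordered data write */
    atomic_thread_fence(memory_order_release);      /* manual fence added by the user */
    atomic_store_explicit(&ready, 1, memory_order_relaxed);
    return NULL;
}

static void *consumer(void *arg) {
    int *out = arg;
    /* Spin on the flag using relaxed loads only. */
    while (atomic_load_explicit(&ready, memory_order_relaxed) == 0) {
    }
    atomic_thread_fence(memory_order_acquire);      /* pairs with the release fence */
    *out = payload;                                 /* ordered after the flag was seen */
    return NULL;
}

/* Runs one producer and one consumer; returns the value the consumer read. */
int run_demo(void) {
    pthread_t p, c;
    int seen = 0;
    int rc;

    payload = 0;
    atomic_store(&ready, 0);

    rc = pthread_create(&c, NULL, consumer, &seen);
    assert(rc == 0);
    rc = pthread_create(&p, NULL, producer, NULL);
    assert(rc == 0);
    (void)rc;

    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return seen;
}
```

Without the two fences, nothing would order the consumer's read of payload after the producer's write, and the program would have a data race.  With them, run_demo() returns 42.  The stricter behavior is layered on top of relaxed operations and paid for only where it is needed, which is the recourse that a strict-by-default model would not offer in the other direction.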

I'm reluctant to speak for others, but wanted to note (if he hasn't already done so) that Dave Grove from IBM's X10 team was with us and seemed to agree with this point of view (though perhaps we were both simply falling prey to Jeff's subliminal hypnosis? :).  All that said, owing to my lack of depth in this area, I would say that if the GASNet team and/or the UPC/Titanium teams who built on top of GASNet felt that this was clearly the wrong approach, I would tend to cast my vote with them, since I think they've studied this issue in far more detail than most parallel language groups, ours included.  (I do think that Kathy Yelick voiced a compatible opinion in another context at this same workshop, which gave me some reassurance that relaxed was the way to go, but again, these were fairly high-level conversations.  More generally, I would encourage you to get input from the GASNet team as you consider this issue and others related to 1-sided communication, if you haven't already.)

If you think it would be useful for me to hear the other side of the debate and/or consider some specific case examples in more detail, I'd be happy to do so as time permits.

Have a good weekend,

