[Mpi3-rma] MPI-3.1 RMA planning
balaji at mcs.anl.gov
Thu Jul 26 12:27:39 CDT 2012
On 07/26/2012 11:31 AM, Torsten Hoefler wrote:
> On Thu, Jul 26, 2012 at 11:18:54AM -0500, Pavan Balaji wrote:
>> On 07/26/2012 08:52 AM, Pavan Balaji wrote:
>>>> I would like to add some info arguments to the list of things to
>>>> consider. I don't have the full list at this point, but we have the
>>>> "same_size" argument for create and allocate. However, we have no
>>>> "same_displ_unit", which goes by the same rationale. We could also add
>>>> some info arguments to dynamic windows to mitigate some of the
>>>> implementation issues (allow optimized implementations on RMA systems).
>>> Sounds good.
>> What does "same_size" mean for heterogeneous systems?
> The size is in bytes, it's clearly defined in the draft standard.
That was a rhetorical question to illustrate my point below :-).
>> If I give 100 * sizeof(int) on both sides, that will not be the same
>> size on a heterogeneous system. This might be worth an advice to users.
> Well, this is a user error.
Correct. But it deserves an advice to users or implementors, since it's
pretty hard for an implementation to catch this kind of mismatch.
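To make the pitfall concrete, here is a minimal sketch in plain C (no MPI calls). The helper names window_bytes and same_size_holds are hypothetical, and the 4-byte and 8-byte int widths are assumed values for two different platforms, not something taken from the draft standard.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: the byte count a process would pass as the window
 * size when it exposes `count` ints and its local int is `int_width`
 * bytes wide (i.e. count * sizeof(int) on that platform). */
static size_t window_bytes(size_t count, size_t int_width)
{
    assert(int_width > 0);
    return count * int_width;
}

/* Returns 1 if two processes with the given int widths would pass the
 * same size in bytes, which is what "same_size" promises; 0 otherwise. */
static int same_size_holds(size_t count, size_t width_a, size_t width_b)
{
    return window_bytes(count, width_a) == window_bytes(count, width_b);
}
```

With these assumed widths, same_size_holds(100, 4, 8) is 0: both sides wrote 100 * sizeof(int), yet one passes 400 bytes and the other 800, so setting "same_size" would be erroneous even though the source code is identical.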
>> 1. The standard currently states, "A put or accumulate must not access a
>> target window once a load/store update or a put ... ". The term
>> "load/store update" doesn't make sense. This has caused major
>> confusion even in the RMA working group, for example on the slide that
>> we put together for compatibility between load, store, put, get,
>> accumulate, etc. It should really say "load/store accesses" -- that is,
>> simultaneous loads and PUTs are not allowed in the SEPARATE model.
> Ah, I thought we cleaned all of those things up - one more straggler,
> probably a ticket 0 (will not do it now since we're supposed to be
We can clean it up in 3.1.
>> 2. The disp_unit is a weird semantic which is really meant to
>> demonstrate what datatype I will be using. We are jumping through hoops
>> to get the same_size and same_disp_unit measures which make little sense
>> on heterogeneous systems. The correct way to do this would have been to
>> not take a disp_unit parameter at all, and instead take a MPI_Datatype
>> parameter. In this case, two different processes can give MPI_INT but
>> have different type sizes. That adds better safety checks in MPI.
> Maybe, but for some reasons, RMA windows are completely specified with
> bytes. I am not sure what the reason for this was.
>> Unfortunately, all our window creation routines are screwed up in this
>> manner. We should consider adding MPI_Win_create_type,
>> MPI_Win_allocate_type, MPI_Win_allocate_shared_type in MPI-3.1 and
>> deprecating the older routines. It would have been much better to do
>> this in 3.0, but it's not a small change.
> Ugh, this makes me shiver. I agree in principle, but it's ugly.
This is just a starting point for discussion. I was just trying to
illustrate the problem and give a starting recommendation for
discussion. The underlying issue is that using bytes directly is a bad
model in MPI, whether it is for send/recv or RMA.
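As a rough sketch of the difference, in plain C with no MPI calls: today the target offset is target_disp scaled by the disp_unit given at window creation, in bytes. The struct typed_win and the typed_* helpers below are hypothetical and only illustrate what a datatype-based window could record; they are not a proposed API.

```c
#include <assert.h>
#include <stddef.h>

/* Byte offset into the target window under the current rule:
 * target_disp is scaled by the target's disp_unit. */
static size_t target_offset(size_t target_disp, size_t disp_unit)
{
    return target_disp * disp_unit;
}

/* Hypothetical typed window: the target records an element count and
 * its own local element size instead of a raw byte size. */
struct typed_win {
    size_t count;      /* number of elements exposed */
    size_t elem_size;  /* local size of one element, in bytes */
};

/* Bounds check expressed in elements, independent of local type size. */
static int typed_access_ok(const struct typed_win *w, size_t index)
{
    return index < w->count;
}

/* Byte offset of element `index`, using the target's local size. */
static size_t typed_offset(const struct typed_win *w, size_t index)
{
    assert(typed_access_ok(w, index));
    return index * w->elem_size;
}
```

Two processes that both expose 100 elements would then agree on the element count and on which indices are valid, while each resolves an index with its own local element size (for example, offset 40 versus 80 for index 10 with assumed 4-byte and 8-byte elements).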