[Mpi3-rma] Dynamic windows example

James Dinan dinan at mcs.anl.gov
Sun Dec 5 22:19:41 CST 2010


Hi Torsten,

Thanks for your comments!

>> In the process of trying to gain a deeper understanding of the
>> dynamic windows proposal I wrote a small distributed linked list
>> example.  I've attached the code in case this may be helpful for
>> others.  Please feel free to dissect, modify, post questions,
>> complaints, etc.
>
> Thanks for sharing your code. It's a good example for a dynamic window
> and the use of flush. Did you consider using MPI_Compare_and_swap() for
> the pointer chase? This would require another MPI_Get() in case you
> didn't get the tail but would allow for much more concurrency than the
> lock exclusive version (each lock/unlock incurs one roundtrip latency).

I had initially done a CAS on the rank followed by a Put on the
displacement, but the code was a bit messy, so I went for the simpler
Get-Flush-Put option.  When chasing the tail, I ended up having to poll
on the second component of the pointer in case I saw the result of the
CAS but not yet the result of the Put.  Looking at it again, though, it
seems like you ought to be able to just define the pointer with the same
type for both elements:

typedef struct { MPI_Aint proc, disp; } llist_ptr_t;

and accomplish this in one CAS operation.  (You might have to resort to
a hack like setting nil.proc to 0, since MPI_Aint is unsigned, and
incrementing all process ids so that proc has a proper nil value in case
two CAS operations overlap.)  If you did this with shared locks, it
seems like you would still have the polling problem, since CAS is
element-wise atomic.  If you use exclusive locks you won't get as much
concurrency, but it's a single one-sided CAS operation instead of
Get-Flush-Put.  Both patterns are sketched below.
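
For concreteness, here is roughly what the exclusive-lock Get-Flush-Put
attach looks like.  The names (llist_win, llist_elem_t, tail_ptr,
new_elem_ptr) are illustrative stand-ins for the attached code, not
taken from it verbatim, and MPI_Win_flush is the proposed flush call:

/* Each element carries a llist_ptr_t "next" field; offsetof is from
 * <stddef.h>.  proc stores rank+1 so that proc == 0 can serve as nil,
 * per the encoding above.  llist_ptr_t is two MPI_Aint, so it can be
 * described as 2 x MPI_AINT. */
llist_ptr_t next;
int target = (int) tail_ptr.proc - 1;

MPI_Win_lock(MPI_LOCK_EXCLUSIVE, target, 0, llist_win);

/* Read the tail's next pointer and complete the Get locally. */
MPI_Get(&next, 2, MPI_AINT, target,
        tail_ptr.disp + offsetof(llist_elem_t, next), 2, MPI_AINT,
        llist_win);
MPI_Win_flush(target, llist_win);

if (next.proc == 0) {
    /* Still the tail: link in the new element. */
    MPI_Put(&new_elem_ptr, 2, MPI_AINT, target,
            tail_ptr.disp + offsetof(llist_elem_t, next), 2, MPI_AINT,
            llist_win);
}   /* otherwise the tail moved; retry with the new tail pointer */
MPI_Win_unlock(target, llist_win);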
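And a sketch of the shared-lock variant I described above: CAS on the
proc field, then a Put of the disp field if we won.  This assumes a
single-element MPI_Compare_and_swap and the same nil encoding; a reader
can still observe the new proc before the matching disp lands, which is
where the polling comes from:

llist_ptr_t nil = { 0, 0 };   /* proc == 0 means nil */
MPI_Aint prev;
int target = (int) tail_ptr.proc - 1;

MPI_Win_lock(MPI_LOCK_SHARED, target, 0, llist_win);

/* Atomically claim tail->next.proc if it is still nil. */
MPI_Compare_and_swap(&new_elem_ptr.proc, &nil.proc, &prev, MPI_AINT,
                     target,
                     tail_ptr.disp + offsetof(llist_elem_t, next.proc),
                     llist_win);
MPI_Win_flush(target, llist_win);

if (prev == nil.proc) {
    /* We won the race: publish the displacement half of the pointer. */
    MPI_Put(&new_elem_ptr.disp, 1, MPI_AINT, target,
            tail_ptr.disp + offsetof(llist_elem_t, next.disp),
            1, MPI_AINT, llist_win);
}
MPI_Win_unlock(target, llist_win);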

> Using MPI_BYTE in the broadcast is also not working in heterogeneous
> environments (where you should also compare the sizes of MPI_Aint).

Will dynamic windows be usable on such systems if processes don't agree
on the size of MPI_Aint?
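
Either way, switching the broadcast from MPI_BYTE to MPI_AINT elements
seems like the right fix; a minimal sketch, with head_ptr as an
illustrative name:

llist_ptr_t head_ptr;

/* Portable: the implementation sees two MPI_Aint values rather than
 * raw bytes, so it can perform any representation conversion. */
MPI_Bcast(&head_ptr, 2, MPI_AINT, 0, MPI_COMM_WORLD);

/* Non-portable byte-wise version:
 *   MPI_Bcast(&head_ptr, sizeof(llist_ptr_t), MPI_BYTE, 0,
 *             MPI_COMM_WORLD);
 */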

Best,
 ~Jim.


