<div class="gmail_extra">On Sat, Dec 15, 2012 at 8:08 AM, Torsten Hoefler <span dir="ltr"><<a href="mailto:htor@illinois.edu" target="_blank">htor@illinois.edu</a>></span> wrote:<br><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div id=":2fl">>    Those use cases ([3]<a href="http://lists.mpi-forum.org/mpi3-coll/2011/11/0239.php" target="_blank">http://lists.mpi-forum.org/mpi3-coll/2011/11/0239.php</a>)<br>

<div class="im">>    were all dependent on being able to reduce to overlapping targets.<br>

</div>Depends on your definition of target.  If you mean processes by<br>

"targets", then the current interface proposal provides this; if you<br>

mean memory locations at one process by "targets", then this will not be<br>

possible within current MPI semantics.<br></div></blockquote><div><br></div><div>I mean that the memory overlaps on the process accumulating the result of the reduction. Think of a bunch of subdomains of a regular grid with one or two cells of overlap. An example of a "reduction" is to add up the contributions from all copies of each given cell. Cells near the middle of a "face" are shared by only two processes, but corner cells are shared by several processes.</div>
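<div>A minimal sketch of the overlap pattern described above, under assumed parameters: a 6x6 global grid, a 2x2 process layout, and one cell of overlap across each interior boundary. These sizes and the helper name extent are illustrative only and do not come from the thread. The sketch counts how many subdomains hold a copy of each cell, which is the number of contributions a sum "reduction" would have to combine for that cell.</div>

```python
# Illustrative assumption: 6x6 global grid, 2x2 process layout,
# each subdomain owns a 3x3 block and extends one cell past
# every interior boundary (the "overlap").
N, P, OVERLAP = 6, 2, 1
BLOCK = N // P


def extent(p):
    """Half-open index range [lo, hi) covered by block p along one axis."""
    lo = max(p * BLOCK - OVERLAP, 0)
    hi = min((p + 1) * BLOCK + OVERLAP, N)
    return lo, hi


# copies[i][j] = number of subdomains holding a copy of cell (i, j)
copies = [[0] * N for _ in range(N)]
for pi in range(P):
    for pj in range(P):
        ilo, ihi = extent(pi)
        jlo, jhi = extent(pj)
        for i in range(ilo, ihi):
            for j in range(jlo, jhi):
                copies[i][j] += 1

for row in copies:
    print(row)
```

<div>With these assumed sizes, cells along the shared edge between two subdomains have two copies, and the cells where all four subdomains meet have four.</div>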

<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div id=":2fl">

<div class="im"><br>

>    As for defining "identity", the operation I would like is to reduce by<br>

>    combining with a local buffer (usually in-place destination buffer). That<br>

>    is, if I have the local buffer<br>

>    mine = [1.1, 2.1, 3.1, 4.1, 5.1, 6.1]<br>

</div>This can be expressed as a self-edge (we can discuss in-place<br>

arguments, but then you would need to guarantee that the local buffer is<br>

larger than the largest neighbor buffer).<br></div></blockquote><div><br></div><div>Useful application semantics would require the same.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<div id=":2fl">

<div class="im"><br>

>    and vector types for my two neighbors (defined by me)<br>

>    incoming_type[0] = [0, 3, 4]<br>

>    incoming_type[1] = [1, 4]<br>

>    with incoming data (actually getting sent)<br>

>    incoming_data[0] = [10.2, 20.2, 30.2]<br>

>    incoming_data[1] = [100.3, 200.3]<br>

>    the result would be<br>

>    [op(1.1, 10.2), op(2.1, 100.3), 3.1, op(4.1, 20.2), op(5.1, 30.2, 200.3),<br>

>    6.1]<br>

>    This would be a natural expression of the operation I call "SFReduce" in<br>

</div>>    [4]<a href="http://59A2.org/files/StarForest.pdf" target="_blank">http://59A2.org/files/StarForest.pdf</a><br>

I see, this may be doable with the vector interface (if we remove the<br>

restriction of equal vector sizes -- this would remove some optimization<br>

opportunities). Can you confirm that the current proposed<br>

neighbor_reducev() interface can cover this case?<br></div></blockquote><div><br></div><div>If you remove the restriction of equal vector sizes, are you going to add an MPI_Datatype describing where to put the result? (I'd expect that to be a neighbor_reducew.) Note that in general, some points would be shared by neighbors {1,2} and others by neighbors {1,3} (and {2,3}, ...), so we can't just sort such that the reduction is always applied to the "first N" elements.</div>
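<div>The quoted example can be written out as a small serial sketch. The function name sf_reduce is hypothetical and is not an MPI routine; the index lists stand in for the per-neighbor datatypes, and op is assumed to be associative and applied pairwise in neighbor order. Addition is used here only to have a concrete op.</div>

```python
from operator import add


def sf_reduce(mine, incoming_type, incoming_data, op):
    """Combine each neighbor's data into a copy of `mine` at the listed indices.

    Hypothetical helper for illustration; not part of any MPI interface.
    """
    result = list(mine)
    for indices, values in zip(incoming_type, incoming_data):
        if len(indices) != len(values):
            raise ValueError("index list and data must have equal length")
        for i, v in zip(indices, values):
            result[i] = op(result[i], v)
    return result


# Values taken from the example in the quoted message.
mine = [1.1, 2.1, 3.1, 4.1, 5.1, 6.1]
incoming_type = [[0, 3, 4], [1, 4]]
incoming_data = [[10.2, 20.2, 30.2], [100.3, 200.3]]

result = sf_reduce(mine, incoming_type, incoming_data, add)
print(result)
```

<div>Index 4 receives a contribution from both neighbors, indices 0 and 3 only from the first, index 1 only from the second, and indices 2 and 5 from neither. The targets are interleaved, which is why sorting so that the reduction applies to the "first N" elements does not work in general.</div>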

<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div id=":2fl">

<br>

One remaining question is if you can always guarantee "packed" data,<br>

i.e., that the "empty" elements are always at the tail. Well, I guess<br>

you could always add identity elements in the middle to create gaps.<br></div></blockquote><div><br></div><div>I could pack, but I thought the point of the W interfaces was to enable the user to avoid packing (with possible performance advantages relative to fully-portable user code).</div>
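<div>For comparison, a rough sketch of the "add identity elements" alternative, assuming op is addition with identity 0.0 and reusing the values from the earlier example. The helper pad_with_identity is hypothetical. Each neighbor contribution is expanded to a full-length buffer so that all vectors have equal size, and the sketch counts how many values that moves compared with sending only the meaningful entries.</div>

```python
from operator import add

IDENTITY = 0.0  # identity element for addition (the op assumed here)


def pad_with_identity(length, indices, values, identity=IDENTITY):
    """Expand sparse (indices, values) into a dense buffer of `length`.

    Hypothetical helper for illustration only.
    """
    dense = [identity] * length
    for i, v in zip(indices, values):
        dense[i] = v
    return dense


mine = [1.1, 2.1, 3.1, 4.1, 5.1, 6.1]
incoming_type = [[0, 3, 4], [1, 4]]
incoming_data = [[10.2, 20.2, 30.2], [100.3, 200.3]]

# Equal-size path: every neighbor contributes a full-length buffer.
padded = [pad_with_identity(len(mine), idx, val)
          for idx, val in zip(incoming_type, incoming_data)]
dense_result = list(mine)
for buf in padded:
    dense_result = [add(a, b) for a, b in zip(dense_result, buf)]

# Number of values moved in each formulation.
sent_dense = sum(len(buf) for buf in padded)
sent_sparse = sum(len(val) for val in incoming_data)
print(dense_result, sent_dense, sent_sparse)
```

<div>The padded form gives the same sums but carries 12 values for 5 meaningful entries in this small case, and it requires the user to build the padded buffers by hand.</div>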

<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div id=":2fl">

<br>

Also, would the static communication topology work for your use cases<br>

(neighborhoods don't change for a while)?<br></div></blockquote><div><br></div><div>Yes, the communication topology is typically static, and we wouldn't use the neighborhood routines if it were changing frequently.</div><div><br>

</div></div></div>