<div class="gmail_extra">On Sat, Dec 15, 2012 at 8:08 AM, Torsten Hoefler <span dir="ltr">&lt;<a href="mailto:htor@illinois.edu" target="_blank">htor@illinois.edu</a>&gt;</span> wrote:<br><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div id=":2fl">> Those use cases ([3]<a href="http://lists.mpi-forum.org/mpi3-coll/2011/11/0239.php" target="_blank">http://lists.mpi-forum.org/mpi3-coll/2011/11/0239.php</a>)<br>
<div class="im">> were all dependent on being able to reduce to overlapping targets.<br>
</div>Depends on your definition of target. If you mean processes by<br>
"targets", then the current interface proposal provides this; if you<br>
mean memory locations at one process by "targets", then this will not be<br>
possible within current MPI semantics.<br></div></blockquote><div><br></div><div>I mean that the memory overlaps on the processor accumulating the result of the reduction. Think of a bunch of subdomains of a regular grid with one or two cells of overlap. An example of a "reduction" is to add up the contribution from all copies of each given cell. Cells near the middle of a "face" are only shared by two processes, but corner cells are shared by several processes.</div>
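<div>As a rough illustration of the overlap pattern, here is a minimal sketch that counts how many subdomains hold a copy of each cell. The 6x6 grid, the 2x2 block decomposition, the one-cell overlap, and the name <code>copies_per_cell</code> are all assumptions chosen for the example; none of them is part of any MPI proposal.</div>

```python
def copies_per_cell(n, p, overlap):
    """Count the subdomains that contain each cell of an n x n grid.

    The grid is split into p x p equal blocks (assumes p divides n), and
    each block is extended by `overlap` cells on every side, clipped at
    the grid boundary.
    """
    block = n // p
    count = [[0] * n for _ in range(n)]
    for pi in range(p):
        for pj in range(p):
            # Extent of this subdomain, including its overlap region.
            r0 = max(pi * block - overlap, 0)
            r1 = min((pi + 1) * block + overlap, n)
            c0 = max(pj * block - overlap, 0)
            c1 = min((pj + 1) * block + overlap, n)
            for i in range(r0, r1):
                for j in range(c0, c1):
                    count[i][j] += 1
    return count


# Hypothetical setup: 6x6 cells, 2x2 subdomains, one cell of overlap.
count = copies_per_cell(6, 2, 1)
for row in count:
    print(row)
```

<div>In this setup, cells along the interface between two subdomains have two copies, while cells where all four subdomains meet have four, so the number of contributions to combine varies from cell to cell.</div>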
<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div id=":2fl">
<div class="im"><br>
> As for defining "identity", the operation I would like is to reduce by<br>
> combining with a local buffer (usually in-place destination buffer). That<br>
> is, if I have the local buffer<br>
> mine = [1.1, 2.1, 3.1, 4.1, 5.1, 6.1]<br>
</div>This can be expressed as a self-edge (we can discuss in-place<br>
arguments, but then you would need to guarantee that the local buffer is<br>
larger than the largest neighbor buffer).<br></div></blockquote><div><br></div><div>Useful application semantics would require the same.</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div id=":2fl">
<div class="im"><br>
> and vector types for my two neighbors (defined by me)<br>
> incoming_type[0] = [0, 3, 4]<br>
> incoming_type[1] = [1, 4]<br>
> with incoming data (actually getting sent)<br>
> incoming_data[0] = [10.2, 20.2, 30.2]<br>
> incoming_data[1] = [100.3, 200.3]<br>
> the result would be<br>
> [op(1.1, 10.2), op(2.1, 100.3), 3.1, op(4.1, 20.2), op(5.1, 30.2, 200.3),<br>
> 6.1]<br>
> This would be a natural expression of the operation I call "SFReduce" in<br>
</div>> [4]<a href="http://59A2.org/files/StarForest.pdf" target="_blank">http://59A2.org/files/StarForest.pdf</a><br>
I see, this may be doable with the vector interface (if we remove the<br>
restriction of equal vector sizes -- this would remove some optimization<br>
opportunities). Can you confirm that the current proposed<br>
neighbor_reducev() interface can cover this case?<br></div></blockquote><div><br></div><div>If you remove the restriction of equal vector sizes, are you going to add an MPI_Datatype describing where to put the result? (I'd expect that to be a neighbor_reducew.) Note that in general, there would be some points shared by neighbors {1,2} and other points shared by neighbors {1,3} (and {2,3}, ...), so we can't just sort such that the reduction is always applied to the "first N" elements.</div>
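<div>The semantics of the quoted example can be sketched serially as follows. This is only an illustration of the intended result, not an MPI interface: the function name <code>sf_reduce</code> is hypothetical, the index lists stand in for the per-neighbor datatypes, and the sketch assumes <code>op</code> is associative and commutative so that the order in which neighbors are applied does not matter.</div>

```python
import operator


def sf_reduce(mine, incoming_type, incoming_data, op):
    """Combine each neighbor's contribution into a copy of the local buffer.

    incoming_type[k] lists the local indices that neighbor k contributes
    to, and incoming_data[k] holds the matching values in the same order.
    Entries that no neighbor touches keep their local value.
    """
    result = list(mine)
    for indices, values in zip(incoming_type, incoming_data):
        if len(indices) != len(values):
            raise ValueError("index list and data length differ")
        for i, v in zip(indices, values):
            result[i] = op(result[i], v)
    return result


# Data taken from the example in the quoted message.
mine = [1.1, 2.1, 3.1, 4.1, 5.1, 6.1]
incoming_type = [[0, 3, 4], [1, 4]]
incoming_data = [[10.2, 20.2, 30.2], [100.3, 200.3]]

summed = sf_reduce(mine, incoming_type, incoming_data, operator.add)
largest = sf_reduce(mine, incoming_type, incoming_data, max)
print(summed)
print(largest)
```

<div>Index 4 receives contributions from both neighbors while indices 2 and 5 receive none, which is why the targets cannot be described as a contiguous "first N" prefix of the local buffer.</div>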
<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div id=":2fl">
<br>
One remaining question is if you can always guarantee "packed" data,<br>
i.e., that the "empty" elements are always at the tail. Well, I guess<br>
you could always add identity elements in the middle to create gaps.<br></div></blockquote><div><br></div><div>I could pack, but I thought the point of the W interfaces was to enable the user to avoid packing (with possible performance advantages relative to fully-portable user code).</div>
<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div id=":2fl">
<br>
Also, would the static communication topology work for your use-cases<br>
(neighborhoods don't change for a while)?<br></div></blockquote><div><br></div><div>Yes, comm topology is typically static, and we wouldn't use the neighborhood routines if it were changing frequently.</div><div><br>
</div></div></div>