[mpi3-coll] Telecon Notes

Jesper Larsson Traeff traff at it.neclab.eu
Mon Feb 25 09:47:55 CST 2008


Dear coll-group,

some follow-up:

On Tue, Feb 12, 2008 at 05:11:37PM -0500, Torsten Hoefler wrote:
> c) Jesper has a paper about this on IPDPS

I have attached two papers. The first goes through some problems,
especially with the graph topologies. Main points: a) the specification
is useless because it is not scalable, b) too little information is
conveyed to the library, c) there could be different optimization
criteria. The remedies are easy: deprecate the old topology creation
functions and introduce a new, scalable interface with weights and an
info object. A quick-and-dirty implementation of the new interface can
trivially be done on top of the old one, as sketched below.
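For illustration only (the MPIX_ name, the signature, and the fact that
weights and info are ignored are my own placeholders, not the interface
from the paper): each process supplies only its own neighbors, and the
full graph required by the old MPI_Graph_create is gathered internally.

#include <mpi.h>
#include <stdlib.h>

/* Hypothetical scalable creation call: each process passes only its own
 * neighbors plus edge weights and an info object.  This quick-and-dirty
 * layering gathers the full graph and falls back to the old, non-scalable
 * MPI_Graph_create; weights and info are accepted but ignored here. */
int MPIX_Graph_create_local(MPI_Comm comm_old, int degree,
                            const int neighbors[], const int weights[],
                            MPI_Info info, int reorder, MPI_Comm *comm_graph)
{
    int size, total = 0, err;
    MPI_Comm_size(comm_old, &size);
    (void)weights; (void)info;        /* not expressible with the old interface */

    /* gather every process' degree to build the cumulative index array */
    int *degrees = malloc(size * sizeof(int));
    int *index   = malloc(size * sizeof(int));
    int *displs  = malloc(size * sizeof(int));
    MPI_Allgather(&degree, 1, MPI_INT, degrees, 1, MPI_INT, comm_old);
    for (int i = 0; i < size; i++) {
        displs[i] = total;
        total    += degrees[i];
        index[i]  = total;            /* MPI_Graph_create wants cumulative degrees */
    }

    /* gather all edge lists -- exactly the non-scalable step a new interface avoids */
    int *edges = malloc((total > 0 ? total : 1) * sizeof(int));
    MPI_Allgatherv((void *)neighbors, degree, MPI_INT,
                   edges, degrees, displs, MPI_INT, comm_old);

    err = MPI_Graph_create(comm_old, size, index, edges, reorder, comm_graph);

    free(degrees); free(index); free(displs); free(edges);
    return err;
}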

The second shows that even weights do not give enough information to
let an MPI library do what the application wants. A more radical remedy:
take topologies out of MPI, but provide some functionality for obtaining
"system independent" topology information ("distance" between processes
and so on) for use by a separate package.

I would propose setting up a topology subgroup for these issues. We can
discuss this at the meeting and see if there is interest.

> 
> 4) IU goes off and proposes better ideas, Jesper wants to join
> 
A very initial idea/question was sent in a separate mail to IU.

> 5) Jesper writes proposal-like document
> 
a first draft is attached. I would like to propose MPI_Reduce_scatter_block
already for 2.2. As for the new operators, I am not even sure they are
something I want to propose for real. What do you think? Is there a need
for anything else of this flavor?

> 6) contact mpi3-subsetting and see what they think
> 
> general things:
> - next meeting in March in Chicago
> - have proposals ready at least one week before the meeting in March
> - contact more application people (forward to Torsten)
> - contact Asian researchers (Jesper?)
I will not do more than what has already been done.

best regards

Jesper

> 
> The full recording (68:43) is available for private use (contact me).
> 
> Best,
>   Torsten
> 
> -- 
>  bash$ :(){ :|:&};: --------------------- http://www.unixer.de/ -----
> Indiana University    | http://www.indiana.edu
> Open Systems Lab      | http://osl.iu.edu/
> 150 S. Woodlawn Ave.  | Bloomington, IN, 474045-7104 | USA
> Lindley Hall Room 135 | +01 (812) 855-3608
> _______________________________________________
> mpi3-coll mailing list
> mpi3-coll at lists.mpi-forum.org
> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-coll
-------------- next part --------------
New reduce-scatter function
---------------------------

MPI_Reduce_scatter_block(sendbuf, recvbuf, count, datatype, op, comm)

IN  sendbuf   starting address of send buffer (choice)
OUT recvbuf   starting address of receive buffer (choice)
IN  count     number of elements per block (each process receives count elements)
IN  datatype  data type of elements of input and output buffers (handle)
IN  op        operation (handle)
IN  comm      communicator (handle)

MPI_Reduce_scatter_block is a regular, non-vector variant of
MPI_Reduce_scatter. It performs an element-wise reduction on vectors
of count*size(comm) elements stored in the send buffers of the
processes, defined by sendbuf, count and datatype, and stores the
segment of count elements of the result starting at position i*count
in the receive buffer of process i, defined by recvbuf, count and
datatype. All processes must provide count*size(comm) elements of
datatype in their send buffers. The "in-place" option is specified by
passing MPI_IN_PLACE as the value of sendbuf. In that case the input
for the reduction is taken from the receive buffer of that process.

[RATIONALE - not to be put in the standard: a non-vector variant seems
to have been missed in the MPI-1 specification. It is useful for many
MPI-internal problems, easy to implement, and can be more efficient than
the general MPI_Reduce_scatter.]
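[Illustration, not proposed standard text: a minimal reference sketch of
the trivial layering on top of the existing MPI_Reduce_scatter, with all
block sizes equal to count. The MPIX_ name is a placeholder; a real
implementation would exploit the regular block structure instead of
allocating a counts array.]

#include <mpi.h>
#include <stdlib.h>

int MPIX_Reduce_scatter_block(void *sendbuf, void *recvbuf, int count,
                              MPI_Datatype datatype, MPI_Op op, MPI_Comm comm)
{
    int size, err;
    MPI_Comm_size(comm, &size);

    /* the vector variant with all blocks of the same size count;
       MPI_IN_PLACE in sendbuf passes through unchanged */
    int *recvcounts = malloc(size * sizeof(int));
    for (int i = 0; i < size; i++)
        recvcounts[i] = count;

    err = MPI_Reduce_scatter(sendbuf, recvbuf, recvcounts, datatype, op, comm);
    free(recvcounts);
    return err;
}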

New binary operators
--------------------

The following operators operate on the predefined (value, index) pair 
datatypes of MPI. For these operators the index component is used as a 
flag/mark.

MPI_SEGMENTED_SUM/PROD/MIN/MAX/.../BXOR
MPI_SELECTIVE_SUM/PROD/MIN/MAX/.../BXOR

MPI_ALL_MAX 
MPI_ALL_MIN

The operator MPI_SELECTIVE_SUM (likewise for MPI_SELECTIVE_PROD, ...)
is used to compute the sum over the selected elements only. An element
is selected by setting the index of the pair to 1; otherwise the index
should be 0. With MPI_Reduce, the operation computes from the pairs
(x_i, ix_i) contributed by the processes i

\sum_{i=0}^{size(comm)-1}  [x_i if ix_i==1, 0 if ix_i==0]

as the value of the result. The index of the result is undefined.
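[Illustration, not proposed standard text: with current MPI, the same
semantics can be obtained with a user-defined, commutative operator on
the MPI_DOUBLE_INT pair type; a predefined operator would make this
boilerplate unnecessary. Names below are placeholders.]

#include <mpi.h>
#include <stdio.h>

typedef struct { double val; int flag; } double_int;  /* layout of MPI_DOUBLE_INT */

/* Only elements whose flag is 1 contribute; the flag of a partial result
 * is set to 1 so that partial sums always propagate further. */
static void selective_sum(void *invec, void *inoutvec, int *len, MPI_Datatype *dt)
{
    double_int *in = invec, *inout = inoutvec;
    (void)dt;                               /* datatype not needed here */
    for (int i = 0; i < *len; i++) {
        double a = in[i].flag    ? in[i].val    : 0.0;
        double b = inout[i].flag ? inout[i].val : 0.0;
        inout[i].val  = a + b;
        inout[i].flag = 1;
    }
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Op op;
    MPI_Op_create(selective_sum, 1 /* commutative */, &op);

    /* only the even ranks contribute their rank value to the sum */
    double_int in = { (double)rank, rank % 2 == 0 }, out;
    MPI_Reduce(&in, &out, 1, MPI_DOUBLE_INT, op, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("selective sum over even ranks: %g\n", out.val);

    MPI_Op_free(&op);
    MPI_Finalize();
    return 0;
}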

The operator MPI_SEGMENTED_SUM (likewise for MPI_SEGMENTED_PROD, ...)
is used primarily with the MPI_Scan and MPI_Exscan collectives (but may
be used and have meaning also for the other reduction collectives) to
compute all segmented sums of a sequence of elements. The start of a
segment is marked by setting the index of the pair to 1; all other
elements of a segment are marked with 0.

With MPI_Scan the operation computes for process i the value

\sum_{j=start}^{i} x_j

where start is the highest-ranked process j with j <= i and index value
ix_j == 1 (start = 0 if no such process exists).
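[Illustration, not proposed standard text: the segmented sum can be
sketched with current MPI as a user-defined, non-commutative operator on
the MPI_DOUBLE_INT pair type. The flag of a partial result records
whether a segment start occurred in the combined span, which keeps the
combine function associative. Names below are placeholders.]

#include <mpi.h>
#include <stdio.h>

typedef struct { double val; int flag; } double_int;  /* layout of MPI_DOUBLE_INT */

/* Combine left operand (invec, lower ranks) with right operand (inoutvec,
 * higher ranks): if the right span contains a segment start, its running
 * sum is kept; otherwise the left sum is carried over. */
static void segmented_sum(void *invec, void *inoutvec, int *len, MPI_Datatype *dt)
{
    double_int *left = invec, *right = inoutvec;
    (void)dt;                               /* datatype not needed here */
    for (int i = 0; i < *len; i++) {
        if (!right[i].flag)
            right[i].val += left[i].val;
        right[i].flag = left[i].flag || right[i].flag;
    }
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Op op;
    MPI_Op_create(segmented_sum, 0 /* non-commutative */, &op);

    /* segments of length 4: ranks 0, 4, 8, ... start a new segment */
    double_int in = { 1.0, rank % 4 == 0 }, out;
    MPI_Scan(&in, &out, 1, MPI_DOUBLE_INT, op, MPI_COMM_WORLD);
    printf("rank %d: segmented prefix sum = %g\n", rank, out.val);

    MPI_Op_free(&op);
    MPI_Finalize();
    return 0;
}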

The operators MPI_ALL_MAX and MPI_ALL_MIN compute the maximum and minimum
over the pairs, respectively, and set the index value of the result to 1
if all processes contributed the same value (and to 0 otherwise).

The operators are defined by the following binary, associative functions
on (value, index) pairs: ... [TO BE DONE] ...
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smpaware.pdf
Type: application/pdf
Size: 83963 bytes
Desc: not available
URL: <http://lists.mpi-forum.org/pipermail/mpiwg-coll/attachments/20080225/2c92089c/attachment-0002.pdf>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 41920293.pdf
Type: application/pdf
Size: 372444 bytes
Desc: not available
URL: <http://lists.mpi-forum.org/pipermail/mpiwg-coll/attachments/20080225/2c92089c/attachment-0003.pdf>

