[Mpi3-hybridpm] Ticket 217 proposal document updated

Jim Dinan dinan at mcs.anl.gov
Mon Jan 30 17:28:00 CST 2012


Hi Doug,

Thanks for the quick response -- and for correcting my poor OpenMP.  I 
think DIM was just meant to keep the amount of work from depending on 
COUNT.  Either way should be fine.

As for the info argument, I would prefer an example without it, with a 
caption stating that "nobreak" could be used.  But I think we tried 
that and it didn't go over well at the Forum, so it may be logistically 
better to keep it.

If we can get rid of the barrier, I think the example would be clearer.
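
Concretely, with both changes the skeleton might look something like 
this (just a sketch; it assumes MPI_Team_create accepts MPI_INFO_NULL 
the way other info-taking MPI calls do, with the caption noting that 
"nobreak" could be passed instead):

MPI_Team team;
MPI_Team_create(omp_get_thread_limit(), MPI_INFO_NULL, &team);
#pragma omp parallel num_threads(omp_get_thread_limit())
{
    /* ... per-thread work and the critical-section update ... */
    MPI_Team_join(omp_get_num_threads(), team);
#pragma omp master
    {
        MPI_Allreduce(sendbuf, recvbuf, COUNT, MPI_DOUBLE,
                      MPI_SUM, MPI_COMM_WORLD);
    }
    MPI_Team_leave(team);   /* could synchronize internally if the
                             * implementation needs sync-first behavior */
    /* ... copy recvbuf back to sendbuf ... */
}
MPI_Team_free(&team);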

  ~Jim.

On 1/30/12 2:55 PM, Douglas Miller wrote:
> Yes, the idea behind the for loop at the bottom was that a parallel copy
> is faster, and it eliminates the need for any additional
> synchronization. It mostly comes down to lines of code: it could be
> "memcpy(...); #pragma omp barrier" instead, but the loop seems to provoke
> more thinking about parallelism.
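>
> Spelled out, the short form would be something like this (a sketch;
> "omp single" has an implicit barrier at its end, which plays the role
> of the explicit one):
>
> #pragma omp single
> memcpy(sendbuf, recvbuf, COUNT * sizeof(double));  /* needs <string.h> */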
>
> Re: "i" being private: in the "#pragma omp for" construct the
> loop-control variable is implicitly made private by OpenMP.
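>
> For example, even if "i" were declared shared in an enclosing scope:
>
> #pragma omp for
> for (i = 0; i < COUNT; i++)   /* "i" is implicitly private within
>                                * the for construct */
>     myval += do_work(i, sendbuf);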
>
> Re: "oldval = newval" having races: there is only one value for
> "newval", so each thread will redundantly copy the same value over the
> first thread's copy. the value is safe since no thread can update newval
> until all threads have completed the for loop. No thread can test the
> value(s) until all threads have completed not only the MPI_Team_leave
> but also the "for" loop that copies the array. Making them private
> complicates the example by requiring more synchronization in order to
> share/compute the single value between all threads.
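>
> For illustration, the obvious alternative of letting one thread publish
> the value needs an extra construct with a barrier of its own, something
> like:
>
> #pragma omp single
> oldval = newval;   /* the implicit barrier at the end of "single" is
>                     * exactly the extra synchronization I mean */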
>
> Re: removing INFO "nobreak": my take from previous readings is that it
> was needed in the example. I am reluctant to remove it and face another
> setback.
>
>
>
> Q: What is "DIM" in the work for-loop? should that be "count" (COUNT) -
> the size of the sendbuf array? I've changed it to COUNT and ran a
> compile. Aside from no declaration for "i" it was fine. I added "i"
> inside the while loop, which eliminates concerns about a problem if shared.
>
> Here's what I currently have for the example:
>
> MPI_Team team;
> MPI_Info info;
> double oldval = 0.0, newval = 9.9e99;
> double tolerance = 1.0e-6;
> double sendbuf[COUNT] = { 0.0 };
> double recvbuf[COUNT] = { 0.0 };
>
> MPI_Info_create(&info);
> MPI_Info_set(info, "nobreak", "true");
> MPI_Team_create(omp_get_thread_limit(), info, &team);
> MPI_Info_free(&info);
>
> #pragma omp parallel num_threads(omp_get_thread_limit())
> {
>     /* fabs, not abs: the operands are doubles (needs <math.h>) */
>     while (fabs(newval - oldval) > tolerance) {
>         double myval = 0.0;
>         int i;
>         oldval = newval;
>
> #pragma omp for
>         for (i = 0; i < COUNT; i++) {
>             myval += do_work(i, sendbuf);
>         }
>
> #pragma omp critical
>         {
>             newval += myval;
>         }
>
>         MPI_Team_join(omp_get_num_threads(), team);
>         /* this barrier is not required but helps ensure
>          * all threads arrive before the MPI_Allreduce begins */
> #pragma omp barrier
> #pragma omp master
>         {
>             MPI_Allreduce(sendbuf, recvbuf, COUNT, MPI_DOUBLE,
>                           MPI_SUM, MPI_COMM_WORLD);
>         }
>         /* the remaining threads go directly to MPI_Team_leave */
>         MPI_Team_leave(team);
>
> #pragma omp for
>         for (i = 0; i < COUNT; i++) {
>             sendbuf[i] = recvbuf[i];
>         }
>     }
> }
> MPI_Team_free(&team);
>
> _______________________________________________
> Douglas Miller BlueGene Messaging Development
> IBM Corp., Rochester, MN USA Bldg 030-2 A401
> dougmill at us.ibm.com Douglas Miller/Rochester/IBM
>
>     Jim Dinan <dinan at mcs.anl.gov>
>     Sent by: mpi3-hybridpm-bounces at lists.mpi-forum.org
>     01/30/2012 01:56 PM
>     Please respond to: mpi3-hybridpm at lists.mpi-forum.org
>     To: mpi3-hybridpm at lists.mpi-forum.org
>     Subject: Re: [Mpi3-hybridpm] Ticket 217 proposal document updated
>
> Hi Doug,
>
> I think we can probably get rid of the barrier, it seems to suggest a
> stronger requirement (sync-first) for helper threads. MPI could
> implement this behavior internally in leave if it's required by the
> implementation.
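>
> E.g., schematically (hypothetical internals, not proposal text):
>
> int MPI_Team_leave(MPI_Team team)
> {
>     /* a sync-first implementation could barrier the joined threads
>      * here, instead of requiring the user's "#pragma omp barrier" */
>     team_internal_barrier(team);        /* hypothetical internal call */
>     /* ... then lend this thread to MPI until the team epoch ends ... */
>     return MPI_SUCCESS;
> }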
>
> This copy code also seems a bit verbose to me:
>
> # pragma omp for
> for (i = 0; i < count; i++) {
>     sendbuf[i] = recvbuf[i];
> }
>
> Is the idea that this would be faster than a serial memcpy? We could
> simplify the example by just calling memcpy. I guess we would need to
> add a barrier, though.
>
> "i" needs to be made thread private (declare in parallel region?).
>
> "double myval = 0.0" should be moved to the top of the block for C89
> compliance.
>
> oldval and newval are shared, "oldval = newval" will have
> race/consistency problems. Should these also be thread private?
>
> "nobreak" adds a lot of lines of code and not a lot of information to
> the example. I think it would be better to point out in a caption that
> you could pass the "nobreak" info key and shorten the example by leaving
> out all the Info wrangling code.
>
> Suggest s/count/COUNT/ so that it looks like a constant in the array
> declarations.
>
> Have you tried compiling/running this with no-op/barrier implementations
> of the helper threads functions?
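>
> For a quick check, no-op stubs along these lines would do (hypothetical,
> since MPI_Team is only the proposed type):
>
> typedef int MPI_Team;
> int MPI_Team_create(int nthreads, MPI_Info info, MPI_Team *team)
>                                            { *team = 0; return MPI_SUCCESS; }
> int MPI_Team_join(int nthreads, MPI_Team team)  { return MPI_SUCCESS; }
> int MPI_Team_leave(MPI_Team team)               { return MPI_SUCCESS; }
> int MPI_Team_free(MPI_Team *team)               { return MPI_SUCCESS; }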
>
> ~Jim.
>
> On 1/30/12 1:32 PM, Douglas Miller wrote:
>  > The "#pragma omp for" does a barrier at the end of the block (since we
>  > did not specify "nowait"), so I don't think it is required.
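>  >
>  > For reference, the implied barrier could be suppressed like so:
>  >
>  > #pragma omp for nowait
>  > for (i = 0; i < COUNT; i++) {
>  >     sendbuf[i] = recvbuf[i];
>  > }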
>  >
>  >
>  > _______________________________________________
>  > Douglas Miller BlueGene Messaging Development
>  > IBM Corp., Rochester, MN USA Bldg 030-2 A401
>  > dougmill at us.ibm.com Douglas Miller/Rochester/IBM
>  >
>  > Jim Dinan <dinan at mcs.anl.gov>
>  > Sent by: mpi3-hybridpm-bounces at lists.mpi-forum.org
>  > 01/30/2012 12:38 PM
>  > Please respond to: mpi3-hybridpm at lists.mpi-forum.org
>  > To: mpi3-hybridpm at lists.mpi-forum.org
>  > Subject: Re: [Mpi3-hybridpm] Ticket 217 proposal document updated
>  >
>  > Hi Doug,
>  >
>  > In Example 12.3, the "#pragma omp barrier" should be required to ensure
>  > that all threads have finished updating sendbuf and reading recvbuf
>  > before the Allreduce.
>  >
>  > ~Jim.
>  >
>  > On 1/30/12 8:27 AM, Douglas Miller wrote:
>  > > I've updated the PDF in Ticket 217 based on the latest comments. Please
>  > > review the text and send me comments. This is an attempt to make the
>  > > concept easier to follow by describing it in terms of a "team epoch".
>  > >
>  > > thanks,
>  > > _______________________________________________
>  > > Douglas Miller BlueGene Messaging Development
>  > > IBM Corp., Rochester, MN USA Bldg 030-2 A401
>  > > dougmill at us.ibm.com Douglas Miller/Rochester/IBM
> _______________________________________________
> Mpi3-hybridpm mailing list
> Mpi3-hybridpm at lists.mpi-forum.org
> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-hybridpm


