<div dir="ltr">Hi All,<div><br></div><div>+1, this is the most successful FT presentation that we've had so far, and we got lots of positive feedback. Here are a few notes I took during the FT plenary, primarily items that were discussed or questions that were asked:<br>
</div><div><br></div><div><span style="font-size:13px;font-family:Arial">* Do we need a new error code in place of MPI_ERR_PENDING?</span><br>
<span style="font-size:13px;font-family:Arial"> * Right now, if there is an ANY_SOURCE receive in a request array passed to e.g. MPI_Waitall, you need to scan the list of requests to see if they are all MPI_ERR_PENDING in order to determine if a process failure may have occurred.</span></div>
<div><font face="Arial"><br></font>
<span style="font-size:13px;font-family:Arial">* MPI_Comm_shrink should specify the process ordering (may already be covered in the spec)</span><br>
<span style="font-size:13px;font-family:Arial"> * Should there be a key argument to MPI_Comm_shrink to allow the user to specify ordering? (Martin)</span><br>
<span style="font-size:13px;font-family:Arial"><br></span></div><div><span style="font-size:13px;font-family:Arial">* Can we query whether a communicator has been revoked? Perhaps through a communicator attribute? (Jim)</span><br>
<span style="font-size:13px;font-family:Arial"><br></span></div><div><span style="font-size:13px;font-family:Arial">* Discussed getting the failed group uniformly at all processes</span><br>
<span style="font-size:13px;font-family:Arial"> * Two protocols, shrink and agree</span><br>
<span style="font-size:13px;font-family:Arial"> * Agree is faster when there are no intervening failures, otherwise shrink is faster</span><br>
<span style="font-size:13px;font-family:Arial"> * Might be worthwhile to add a function in the future to achieve this (Jim)</span><br>
<span style="font-size:13px;font-family:Arial"><br></span></div><div><span style="font-size:13px;font-family:Arial">* Need to verify sane interaction between endpoints, init/finalize, and FT proposals (Martin)</span><br>
</div><div><span style="font-size:13px;font-family:Arial"><br></span></div><div><span style="font-size:13px;font-family:Arial"> ~Jim.</span></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Fri, Dec 13, 2013 at 12:01 PM, Wesley Bland <span dir="ltr">&lt;<a href="mailto:wbland@mcs.anl.gov" target="_blank">wbland@mcs.anl.gov</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word">Now that the forum meeting has finished, I wanted to send a wrap-up email about how things went for those who couldn’t be there and to continue the discussion with those who were.<div>
<br></div><div>We had a productive meeting on Monday and Tuesday within the working group where we discussed some of the concerns raised by some of our outside collaborators. I won’t go into all of the details as those were captured in the wiki page (<a href="https://svn.mpi-forum.org/trac/mpi-forum-web/wiki/2013-12-10" target="_blank">https://svn.mpi-forum.org/trac/mpi-forum-web/wiki/2013-12-10</a>), but the general result was that we seem to have addressed all of the concerns that have been raised so far. The biggest challenge is to continue to have discussions with people and evangelize the proposal.</div>
<div><br></div><div>On Thursday, we had plenary time with the full forum, where we presented the latest version of the proposal. Most changes since the last time this was presented to the forum were minor, but there was one more significant change to MPI_COMM_AGREE that provides new functionality. Again, I won’t go into detail about how it works as that was covered in the talk and text, but it addresses one of the use cases raised by Rich: some users want to be able to continue using a communicator without revoking it and reordering ranks. Now it is possible to use MPI_COMM_AGREE as a transactional-style function to periodically agree on the remaining processes. The slides for the talks given by Aurelien and me should be posted on the web site soon (<a href="http://meetings.mpi-forum.org/secretary/2013/12/slides.php" target="_blank">http://meetings.mpi-forum.org/secretary/2013/12/slides.php</a>).</div>
<div><br></div><div>The reaction from the forum was quite positive. There were plenty of questions, but from what we could tell, most attendees were largely receptive to the current version. According to the people we talked to at the end of the plenary, the two main reasons were that they like the ability to “turn off” FT for systems where it is not needed (smaller scale, reliable hardware, etc.) and that we provided more concrete examples of how to use the proposal. There had been concern about the performance impact of this proposal on systems where it was not needed, but the ability to compile it out should alleviate that. Many people said they still need to take this back to their users now that they have a better understanding of what’s going on in the proposal. We’ll hopefully hear back from them before March if there are concerns on their end. I don’t think there were any major issues with any of the technical content of the presentation.</div>
<div><br></div><div>Our current plan is still to bring this for a reading at the next meeting in March in San Jose and pursue votes at the following two meetings. One of the most requested items for that meeting is performance numbers, so we will try to have something ready by then. This will be easier if we have some application partners that we can use to generate these numbers, so if you have some “real” apps that you can run with ULFM (even if only in failure-free runs), that would be very helpful. The other thing we can all do is to talk to our collaborators and see if there are any concerns that they didn’t raise during the full meeting that might hinder passing the proposal.</div>
<div><br></div><div>Thanks for all of your work!</div><span class="HOEnZb"><font color="#888888"><div>Wesley</div></font></span></div><br>_______________________________________________<br>
mpiwg-ft mailing list<br>
<a href="mailto:mpiwg-ft@lists.mpi-forum.org">mpiwg-ft@lists.mpi-forum.org</a><br>
<a href="http://lists.mpi-forum.org/mailman/listinfo.cgi/mpiwg-ft" target="_blank">http://lists.mpi-forum.org/mailman/listinfo.cgi/mpiwg-ft</a><br></blockquote></div><br></div>