[mpiwg-ft] 2013-12 MPI Forum Wrapup
wbland at mcs.anl.gov
Fri Dec 13 11:01:07 CST 2013
Now that the forum meeting has finished, I wanted to send a wrap-up email about how things went for those who couldn’t be there and to continue the discussion with those who were.
We had a productive meeting on Monday and Tuesday within the working group where we discussed some of the concerns raised by some of our outside collaborators. I won’t go into all of the details as those were captured in the wiki page (https://svn.mpi-forum.org/trac/mpi-forum-web/wiki/2013-12-10), but the general result was that we seem to have addressed all of the concerns that have been raised so far. The biggest challenge is to continue to have discussions with people and evangelize the proposal.
On Thursday, we had a plenary time with the full forum where we presented the latest version of the proposal. There were minor changes since the last time this was presented to the forum, but there was one more important change to MPI_COMM_AGREE that provides new functionality. Again, I won’t go into detail about how it works as that was covered in the talk and text, but it does solve one of the use cases raised by Rich that some users want to be able to continue using a communicator without revoking and reordering ranks. Now it is possible to use MPI_COMM_AGREE as a transactional style function to periodically agree on the remaining processes. The slides for the talks given by Aurelien and me should be posted on the web site soon (http://meetings.mpi-forum.org/secretary/2013/12/slides.php).
The reaction from the forum was quite positive. There were plenty of questions, but from what we could tell, it seems like most attendees were largely receptive to the current version. The major contributing factors to this that we heard from the people we talked to at the end of the plenary were that they like the ability to “turn off” FT for systems where it is not needed (smaller scale, reliable hardware, etc.) and we also provided more concrete examples of how to use the proposal. There had been concern about the performance impact of this proposal on systems where it was not needed, but the ability to compile it out should make that better. Many people said they still need to take this back to their users now that they have a better understanding of what’s going on in the proposal. We’ll hopefully hear back from them before March if there are concerns on their end. I don’t think there were any major issues with any of the technical content of the presentation.
Our current plan is still to bring this for a reading at the next meeting in March in San Jose and pursue votes at the following two meetings. One of the most requested things to show at that meeting is to have performance numbers, so we will try to have something ready by then. These will be easier if we have some application partners that we can use to generate these numbers so if you have some “real” apps that you can run with ULFM (even if it’s failure-free runs), that would be very helpful. The other thing we can all do is to talk to our collaborators and see if there are any concerns that they didn’t raise during the full meeting that might hinder passing the proposal
Thanks for all of your work!
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the mpiwg-ft