[mpiwg-ft] 2017 09 27 Meeting Notes
Bland, Wesley
wesley.bland at intel.com
Wed Sep 27 14:52:46 CDT 2017
Here are the notes from today's WG conference call.
https://github.com/mpiwg-ft/ft-issues/wiki/2017-09-27
Attendees
* Intel - Wesley
* Argonne - Ken, Yanfei
* UTK - Aurelien
* ORNL - Geoffroy
Error Handlers
* Wesley made edits based on the feedback from the face-to-face.
* A couple of very minor edits still need to be made.
Process Fault Tolerance
* Is it possible to use ULFM and Reinit at the same time?
* It's not clear how they can be composed. Even if a smaller communicator used ULFM, a process failure is still likely to trigger the error handler on the larger communicator, which would in turn trigger reinit.
* We don't think using error handlers is a problem, but MPI_ERRORS_REINIT would need to be set consistently across all communicators.
* We still prefer using error handlers to adding an API call:
  * It doesn't add a new API.
  * Changing the error handler is already required for process fault tolerance anyway.
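
The composability concern above can be sketched with a small toy model in plain C. It does not call MPI. The names `handler_kind`, `model_comm`, `failure_triggers_reinit`, and `reinit_is_consistent` are hypothetical and exist only for illustration, and reading "consistent" as "either every communicator uses the reinit handler or none does" is an assumption about what the group meant.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the error handler attached to a communicator.
 * These are NOT MPI constants; they only model the discussion. */
typedef enum {
    HANDLER_RETURN, /* ULFM-style: the error is reported to the caller */
    HANDLER_REINIT  /* Reinit-style: the failure rolls the job back */
} handler_kind;

/* One communicator in the model: its handler, and whether the failed
 * process was one of its members. */
typedef struct {
    handler_kind handler;
    bool contains_failed_rank;
} model_comm;

/* Returns true if a process failure would trigger reinit, i.e. if any
 * communicator containing the failed process uses HANDLER_REINIT. */
bool failure_triggers_reinit(const model_comm *comms, size_t n)
{
    assert(comms != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (comms[i].contains_failed_rank &&
            comms[i].handler == HANDLER_REINIT) {
            return true;
        }
    }
    return false;
}

/* Returns true if the handlers are consistent under the assumption stated
 * in the lead-in: all communicators use HANDLER_REINIT, or none does. */
bool reinit_is_consistent(const model_comm *comms, size_t n)
{
    assert(comms != NULL || n == 0);
    size_t reinit_count = 0;
    for (size_t i = 0; i < n; i++) {
        if (comms[i].handler == HANDLER_REINIT) {
            reinit_count++;
        }
    }
    return reinit_count == 0 || reinit_count == n;
}
```

In this model, a large communicator with the reinit handler plus a smaller one with the ULFM-style handler is flagged as inconsistent, and a failure of a process that belongs to both still triggers reinit, which matches the concern raised in the discussion.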
TODO Items
* Aurelien - Write first draft of ULFM composability/recovery advice to have libraries repair MPI in one place.
* Aurelien - Merge MPI_COMM_ISHRINK branch
* Aurelien - Go back over other ULFM branches so we can discuss them next time
* Wesley - Go back through ULFM RMA discussions to see what we need to do (if anything) to move forward.
* Wesley - Improve slides for catastrophic errors to include example use cases