[Mpi-forum] Fault Tolerance Readings

Rajeev Thakur thakur at mcs.anl.gov
Tue Mar 6 23:24:28 CST 2012


pg 538, ln 11-12: for nonblocking operations, it should be "initiation" instead of "initialization"


For the text on I/O in 17.2.5, pg 541: In independent I/O functions, each process performs I/O on its own, so it won't detect the failure of another process. The issue is in collective functions, and it applies equally to both reading and writing. It is not a matter of whether writes are buffered, but of whether the collective functions are synchronizing or not (they don't have to be).

So, I would replace the text on lines 5-11 with simply this text:

"Since collective I/O functions may not synchronize with other processes, process failures may not be reported during a collective I/O function. If a process failure prevents a file operation from completing, an error of class MPI_ERR_PROC_FAILED is raised." 


For MPI_File_invalidate, as I said during the reading, it is the file handle, not the file, that is invalidated. So the text should be modified to:

"This function notifies all ranks within file handle fh that this file handle is now considered invalid. An invalidated file handle completes any non-local completion operations on fh (see Section 17.2.5) and causes a new operation to complete with error. Once a file handle has been invalidated, all subsequent non-local operations on the file handle must fail with an error of class MPI_ERR_INVALIDATED."



On Mar 6, 2012, at 9:39 PM, Josh Hursey wrote:

> Tomorrow the FT working group will be presenting a revised version of #323 that includes the ticket0 changes mentioned in the meeting. A new PDF is attached to ticket #323:
>   https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/323
>   https://svn.mpi-forum.org/trac/mpi-forum-web/attachment/ticket/323/mpi-report-323-revised-2012-03-06.pdf
> 
> Based on forum feedback, the working group has separated dynamics, RMA, and I/O into separate tickets for additional consideration. In addition to reading #323, we will also be reading #327 and #326:
>   https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/327
>   https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/326
> 
> We are postponing the reading of #325 until a future meeting:
>   https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/325
> 
> -- Josh
> 
> On Mon, Feb 27, 2012 at 12:16 PM, Josh Hursey <jjhursey at open-mpi.org> wrote:
> The Fault Tolerance Working Group has a session from 1-6 pm on March 5, 2012, during which we will have a formal reading of two FT-related tickets, linked below.
> 
> User-Level Failure Mitigation:
>   https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/323
> 
> Clarify MPI_ERRORS_ARE_FATAL scope of abort:
>   https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/324
> 
> The working group will be providing an overview of the User-Level Failure Mitigation proposal followed by the first reading.
> 
> Thanks,
> Josh
> 
> 
> -- 
> Joshua Hursey
> Postdoctoral Research Associate
> Oak Ridge National Laboratory
> http://users.nccs.gov/~jjhursey