[Mpi3-ft] Fault Tolerance (sub)Chapter or Tighter Integration
buntinas at mcs.anl.gov
Wed Mar 2 11:31:54 CST 2011
I think a section in the Environmental Management chapter would make sense. Then we wouldn't need additional text in the Point to Point chapter for things like MPI_Send and MPI_Recv, but where additional explanation is needed (perhaps collectives?), we would add it in those chapters.
Though I would be OK with making it a chapter too. We should then move Error Handling from the Environmental Management chapter there too. (I think that's what you said.)
On Mar 1, 2011, at 8:38 AM, Joshua Hursey wrote:
> We are starting to edge toward a final draft of the run-through stabilization proposal and to embark on process recovery (TBA). As we do so, I wanted to start thinking about how we might integrate this language into the current MPI standard. A PDF version of the working proposal will make it easier for someone new to pick up and read exactly what we are going to add. This is in contrast to the mixture of notes and standard text that is currently on the wiki.
> In particular, should we:
> A) Create an entirely new chapter on Fault Tolerance and Error Management. Pull all existing sections into a central location.
> B) Add a section to the Environmental Management chapter on Fault Tolerance. Pull relevant existing sections on error handling into this section.
> C) Tightly integrate the semantics throughout the MPI standard (e.g., P2P semantics in the P2P chapter, Collective semantics in the Collectives chapter).
> D) Something else...
> There are pros and cons to each. In essence, the question is: should we move all the error management logic to a central location or keep it close to the actual functionality?
> What do folks think about this?
> -- Josh
> Joshua Hursey
> Postdoctoral Research Associate
> Oak Ridge National Laboratory