[Mpi3-ft] FTWG conference call today

Sur, Sayantan sayantan.sur at intel.com
Wed Feb 6 11:23:31 CST 2013


Sorry, wasn't able to make it today. Were there any important minutes of the meeting?

Thanks,
Sayantan

> -----Original Message-----
> From: mpi3-ft-bounces at lists.mpi-forum.org [mailto:mpi3-ft-
> bounces at lists.mpi-forum.org] On Behalf Of Aurélien Bouteiller
> Sent: Wednesday, February 06, 2013 7:29 AM
> To: MPI 3.0 Fault Tolerance and Dynamic Process Control working Group
> Subject: [Mpi3-ft] FTWG conference call today
> 
> Dear WG members,
> 
> This is a reminder that, per our schedule, our regular phone meeting is
> today.
> 
> Agenda:
> - Follow-up on object state discussions
> - Reboot the discussion on RMA?
> 
> 
> Date: Feb. 6, 2013
> Time: Noon EST/New York
> Dial-in information: 218-339-4600
> Code: 623998#
> 
> 
> Next Meeting:
> * Feb. 20, 2013
> 
> On Dec. 12, 2012, at 13:31, "Sur, Sayantan" <sayantan.sur at intel.com> wrote:
> 
> > Hello WG members,
> >
> > Josh, Darius and I were on the call. We discussed our assignment to define
> > what happens to objects upon failure. Specifically, what happens to objects
> > that are created locally (i.e. do not require any remote processes to call
> > MPI), but that the MPI implementation may store in a distributed fashion.
> >
> > We had a short brainstorming session. The thoughts that were discussed
> > were:
> >
> > - We could require that, when such objects are accessed after a failure,
> > the implementation returns either SUCCESS or FAILURE, i.e. there are no
> > corrupted or partially available objects.
> > - It could be that some alive ranks can read their objects, whereas others
> > cannot.
> > - The app could use MPI_Comm_agree to reach consensus on whether all
> > required objects can be read on the ranks that are alive.
> > - For some objects, such as Datatype, there are no accessor functions; the
> > object is only accessed when it is used (e.g. Send/Recv). An MPI
> > implementation could return an error when the app uses a datatype whose
> > internal representation is no longer available to the implementation.
> > However, this is not very useful, as the app then needs a way to discern
> > why a send failed.
> > - Would it make sense to add *_Check functions to objects to see if they
> > are still available (after failure)?
> >
> > Please let me know if I missed something in the notes.
> >
> > Sayantan
> >
> 
> 
> _______________________________________________
> mpi3-ft mailing list
> mpi3-ft at lists.mpi-forum.org
> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-ft
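
The consensus idea in the Dec. 12 notes above (each alive rank checks its own objects, then the alive ranks agree on whether all of them are readable) can be sketched roughly as follows. This is a plain C simulation with no MPI calls, not the working group's design: `rank_state` and `agree_on_objects` are hypothetical names, `object_ok` stands in for the result of a hypothetical local `*_Check` call, and the agreement is assumed to be a logical AND over the flags of alive ranks only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-rank state for this simulation (not an MPI type). */
typedef struct {
    bool alive;      /* false once the rank has failed */
    bool object_ok;  /* stand-in for the result of a local *_Check call */
} rank_state;

/* Simulates the agreement step: returns 1 only if every alive rank
 * reports that its object is readable. Failed ranks do not contribute.
 * Assumption: agreement is modeled as a logical AND over alive ranks. */
static int agree_on_objects(const rank_state *ranks, int nranks)
{
    assert(nranks == 0 || ranks != NULL);

    int flag = 1;
    for (int i = 0; i < nranks; i++) {
        if (ranks[i].alive && !ranks[i].object_ok) {
            flag = 0;  /* one alive rank cannot read its object */
        }
    }
    return flag;
}
```

With four ranks where one has failed and one alive rank cannot read its object, `agree_on_objects` returns 0, so every alive rank would learn that recovery is needed. Once that rank's object becomes readable again, the result is 1, regardless of the failed rank's state.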



