[Mpi3-ft] FTWG conference call today

Aurélien Bouteiller bouteill at icl.utk.edu
Wed Feb 6 09:29:18 CST 2013


Dear WG members,

This is a reminder that, as planned, we are holding our regular phone meeting today.

Agenda:
- Follow-up on object state discussions
- Reboot discussion on RMA?


Date: Feb. 6, 2013
Time: Noon EST/New York
Dial-in information: 218-339-4600
Code: 623998#


Next Meeting:
* Feb. 20, 2013

On Dec. 12, 2012, at 13:31, "Sur, Sayantan" <sayantan.sur at intel.com> wrote:

> Hello WG members,
> 
> Josh, Darius and I were on the call. We discussed our assignment to define what happens to objects upon failure, specifically objects that are created locally (i.e., creation does not require any remote process to call MPI) but that the MPI implementation may store in a distributed fashion.
> 
> We had a short brainstorming session. The thoughts that were discussed were:
> 
> - We could require that, when such objects are accessed after a failure, the implementation returns either SUCCESS or FAILURE, i.e., there are no corrupted or partially available objects.
> - Some alive ranks may be able to read their objects while others cannot.
> - The app could use MPI_Comm_agree to reach consensus on whether all required objects are readable on the ranks that are alive.
> - For some objects, such as Datatype, there are no accessor functions; the object is only touched when it is used (e.g., Send/Recv). An MPI implementation could return an error when the app uses a datatype whose internal representation is no longer available to the implementation. However, this is not very useful, as the app then needs a way to discern why a send failed.
> - Would it make sense to add *_Check functions for objects to see whether they are still available (after failure)?
> 
> Please let me know if I missed something in the notes.
> 
> Sayantan
> 
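
To make the agreement idea in the notes above concrete, here is a minimal sketch of the intended outcome, written as a plain C simulation with no MPI calls. It assumes the agreement behaves like a logical AND over the flags contributed by alive ranks, with failed ranks ignored. The names rank_state and simulate_agree are hypothetical and exist only for this illustration; they are not part of MPI or of any proposal text.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-rank view used only for this illustration. */
typedef struct {
    bool alive;            /* has this rank survived the failure? */
    bool object_readable;  /* did local access to the object return SUCCESS? */
} rank_state;

/*
 * Simulates the result every alive rank would see after an agreement
 * on "all required objects are readable": the logical AND of the flags
 * contributed by alive ranks. Failed ranks contribute nothing.
 * Assumption: this models the semantics discussed in the notes, it is
 * not a call into any MPI library.
 */
static bool simulate_agree(const rank_state *ranks, int n)
{
    assert(n >= 0);
    bool agreed = true;
    for (int i = 0; i < n; i++) {
        if (ranks[i].alive && !ranks[i].object_readable) {
            agreed = false;  /* one alive rank cannot read its object */
        }
    }
    return agreed;
}
```

With this model, a failed rank whose object is lost does not prevent agreement, while a single alive rank that cannot read its object makes every alive rank observe a negative result. In a real application each rank would contribute its own flag through the agreement call rather than through a shared array.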




