[mpi3-ft] Picking up working group activities

Pavan Balaji balaji at mcs.anl.gov
Mon Jan 21 13:27:49 CST 2008


11am EST or later will be fine with me. Here are some things that we 
might want to discuss, depending on interest, during the telecon:

1. Level of fault tolerance: should the MPI implementation report what 
level of fault tolerance it supports (similar to thread support levels)? 
Choices could be: (a) None, (b) MPI provides an error description and 
allows the application to save information and abort (it cannot continue), 
(c) MPI provides an error description and allows recovery in some cases.

2. Turning off non-transparent fault tolerance: If the MPI implementation 
is capable of automatically dealing with faults, the application should 
be allowed to turn on/off each feature. For example, if the MPI 
implementation supports auto-migration to a different network when a 
network fails (or is giving performance problems), the application 
should be able to shut off this capability if it doesn't want it.

3. Distinguish FAULTS vs. HINTS. Faults can be considered fatal errors, 
while hints are non-fatal (e.g., performance is not optimal).

4. User directives --- if the MPI implementation supports some 
capability (e.g., checkpoint), should the application be allowed to 
force (or request) it to happen at some specific time?
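
For item 1, here is a minimal sketch of what thread-style level negotiation could look like, written in Python only for brevity. It mirrors the required/provided pattern of MPI_Init_thread: the application asks for a level and is told what it actually gets. The names FTLevel, negotiate_ft_level, and can_continue_after_fault are hypothetical and are not part of any MPI standard or implementation.

```python
from enum import IntEnum


class FTLevel(IntEnum):
    """Hypothetical fault-tolerance levels, ordered weakest to strongest."""
    NONE = 0          # (a) no fault-tolerance support
    REPORT_ABORT = 1  # (b) error description; app may save state, then abort
    RECOVER = 2       # (c) error description; recovery possible in some cases


def negotiate_ft_level(required: FTLevel, supported: FTLevel) -> FTLevel:
    """Return the level provided: the request, capped by what is supported."""
    return min(required, supported)


def can_continue_after_fault(provided: FTLevel) -> bool:
    """Only the strongest level lets the application keep running."""
    return provided >= FTLevel.RECOVER


if __name__ == "__main__":
    # The application asks for recovery; this (assumed) implementation
    # can only report errors and abort.
    provided = negotiate_ft_level(FTLevel.RECOVER, FTLevel.REPORT_ABORT)
    print(provided.name, can_continue_after_fault(provided))
```

Ordering the levels lets the application compare what it was given against what it needs, the same way thread support levels are compared today.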


  -- Pavan

On 01/21/2008 01:08 PM, Erez Haba wrote:
> Thanks Rich,
> Wednesdays are okay for me; however, 11am EST is too early. 1pm or 2pm EST would work better for me, though it might not for people from Europe.
> From my pov, resolving the MPI_COMM_WORLD FT behavior is the highest priority issue.
> -----Original Message-----
> From: mpi3-ft-bounces at cs.uiuc.edu [mailto:mpi3-ft-bounces at cs.uiuc.edu] On Behalf Of Richard Graham
> Sent: Monday, January 21, 2008 10:15 AM
> To: mpi3-ft at cs.uiuc.edu
> Subject: [mpi3-ft] Picking up working group activities
> I would like to start bi-weekly con calls to discuss Fault Tolerance and
> dynamic process support in the context of MPI 3.0.  First, we need to find
> a time for the telecon that works for most people, so I will start by
> suggesting that we have the call on Wednesdays at 11 am EST, starting 1/30/2008.
> How does this work for people who plan to be active participants in this
> work?
> The items I would like to start addressing next week are
>  - Does MPI provide FT, or enable FT work?
>  - Do we need more than is available in the current standard?
>  - If so, what are the use-case scenarios that we are aiming to support?
> These seem to be a good starting point for discussions.  If you have a
> particular use-case you want us to consider in this working group, please
> write it up and circulate it on the mailing list at least a day before we
> meet, to give people a chance to read these over.  Please keep the
> descriptions high-level, avoiding specific solutions, at this stage.  We
> will start to discuss specifics once we have defined the scope of the
> problem that needs to be addressed - if we agree that there is one in the
> first place.
> Same thing for dynamic process support - if you have specific use-case
> scenarios that are not supported under the current standard, and you would
> like to have these considered for inclusion in the 3.0 standard, please
> provide the use cases you would like the group to consider.
> We should continue to take input on use case scenarios until the next
> face-to-face meeting, but the sooner we get these, the more effective we
> will be in proceeding in a timely manner.
> Thanks,
> Rich
> _______________________________________________
> mpi3-ft mailing list
> mpi3-ft at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/mpi3-ft

Pavan Balaji