Purpose:

This document describes how MPI will respond when a process either fails, or appears to have failed, for example due to a broken network connection.

Use case scenarios:
- Symmetric failure: one process out of N is no longer responsive.
- Asymmetric failure: communications A->B succeed, but B->A do not.

Errors:
- Error detection is external to the MPI specification and is implementation specific.
- Failures that the MPI implementation can correct without input from the application are not considered errors from MPI's perspective. So a transient network failure, or a link failure that can be routed around, is not considered a failure.
- An MPI implementation must support detecting the following errors:
  - Process failure: a process is considered failed if it is non-responsive, i.e., failed or disconnected.
  - Communication failure.

Error states:
- Error state is associated with a communicator.
- A communicator that has lost a process is defined to be in an error state.
- A process moves into an error state if it tries to interact with a failed process.
- When an MPI process is restored, it is restored in an error state.
- Each process that is in an error state must undertake an MPI repair action.
- Recovery from an error state must be done one communicator at a time.

Error notification:

By default, only calls in which the failed remote process is involved will be notified of the error, and that notification is synchronous. So if B fails, A will be notified when:
- a send to B is attempted
- a receive from B is posted
- a receive from MPI_ANY_SOURCE is posted
- a put to B is attempted
- a get from B is attempted
- a collective operation is posted
- collective I/O is attempted
- a new communicator is created in which B is a member

A process can subscribe to notification for events it would not be notified of by default. This notification will be asynchronous.

Error notification must be consistent, i.e., MPI must propagate a single view of the system, which may change over time.

Recovery process:

The overall approach is to keep recovery local to the processes directly affected by the failure, and to minimize global recovery.

Point-to-point communications:
- Recovery is local, i.e., not collective.
- When the application is notified of a failure, it must call the recovery function if it wants to continue using MPI (see the sketch after this list).
- The recovery function will have the following attributes:
  - It is called on a per-communicator basis.
  - The caller specifies the mode of recovery: restore processes, replace missing processes with MPI_PROC_NULL, or eliminate the communicator.
  - The caller specifies what to do with outstanding point-to-point traffic: preserve traffic with unaffected ranks, or remove all traffic queued on the given communicator. (There is a potential race condition in the latter case if this communicator will continue to send data - this needs more thought.)
  - The recovery function may handle recovery for as many failed processes as it is aware of at a given time. This implies that when a subsequent MPI call hits an error condition that has not yet been cleared and returns an error (this is a requirement), the subsequent call to the recovery function will not restore a process that has already been restored.
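To make the point-to-point flow concrete, here is a minimal sketch. Only MPI_Comm_set_errhandler, MPI_ERRORS_RETURN, MPI_Send and MPI_Recv are standard MPI; the recovery routine is a placeholder invented for this sketch, since the proposal above does not yet fix its name or signature.

    #include <mpi.h>

    /* Placeholder for the proposed per-communicator recovery call described
     * above.  The name and the argument-free form are invented for this
     * sketch; the real call would let the caller pick the recovery mode
     * (restore, MPI_PROC_NULL, eliminate) and the handling of outstanding
     * traffic. */
    static int recover_communicator(MPI_Comm comm)
    {
        (void)comm;   /* implementation-provided recovery would happen here */
        return MPI_SUCCESS;
    }

    int main(int argc, char **argv)
    {
        int rank, rc, token = 42;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Errors must be returned rather than aborted on (see "Error return
         * codes" below) for recovery to be possible at all. */
        MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

        if (rank == 0) {
            /* Synchronous notification: rank 0 learns of rank 1's failure
             * only because this send involves the failed process. */
            rc = MPI_Send(&token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
            if (rc != MPI_SUCCESS) {
                /* Local, non-collective repair of this communicator before
                 * attempting any further communication on it. */
                recover_communicator(MPI_COMM_WORLD);
            }
        } else if (rank == 1) {
            rc = MPI_Recv(&token, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                          MPI_STATUS_IGNORE);
        }

        MPI_Finalize();
        return 0;
    }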
Collective communications:
- Recovery is collective.
- The collective call will return an error code that indicates the known state of the communicator; the failure of any process in the communicator puts it into an error state.
- By default, the error code is generated based on local information, so if the collective completes successfully locally but another process has failed, MPI_SUCCESS will be returned.
- An "attribute" can be set on a per-communicator basis that requires the communicator to verify that the collective has completed successfully on a global basis. This does not require a two-phase collective, i.e., data may be written to the destination buffer before global verification has occurred, but completion cannot occur until the global check has occurred.
- MPI should "interrupt" outstanding collectives and return an error code if it detects an error. MPI should also remove all "traces" of the failed collectives.
- MPI should handle outstanding traffic associated with the failed processes as described in the point-to-point recovery section.
- MPI will provide a collective (?) function to check the state of the communicator, as a way to verify global state, something like mpi_fault_status(comm, array_of_status_codes) or MPI_Comm_validate(comm). A usage sketch appears at the end of this document.

Process recovery:

Communicator life cycle:
- A communicator starts its life with N processes and in an ACTIVE state.
- Once a process fails, the communicator enters an error state.
- While in an error state:
  - Point-to-point communications:
    - If the failed process is not involved in the communication, the local part of the communicator does NOT enter an error state, and communication continues as though no error had occurred.
    - If the failed process IS involved in the communication, the local part of the communicator IS marked as being in an ERROR state. The application must call the MPI recovery function to restore the state of the communicator to ACTIVE. No communication can proceed while the communicator is in an error state. (Question: this could require checking the error state all the time, so we may need to think of a better way to do this.)
  - Collective communications:
    - Any process failure with outstanding communications puts all parts of the communicator into an error state. A collective MPI recovery function must be called to restore the state of the communicator. This gives the implementation the ability to perform any optimization it typically does for collective operations when communicators are constructed.

Error return codes:
- The application needs to set MPI_ERRORS_RETURN in all processes if it is to take advantage of MPI's error recovery capabilities.
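To close, a minimal sketch of the collective case under the same assumptions: the validation routine below stands in for the proposed MPI_Comm_validate / mpi_fault_status call and is not standard MPI; it is stubbed out here only so the example compiles.

    #include <mpi.h>

    /* Stand-in for the proposed MPI_Comm_validate / mpi_fault_status call
     * sketched above.  The real routine would be collective and would return
     * the globally agreed state of comm. */
    static int MPIX_Comm_validate(MPI_Comm comm)
    {
        (void)comm;
        return MPI_SUCCESS;
    }

    int main(int argc, char **argv)
    {
        int value = 0, rc;

        MPI_Init(&argc, &argv);

        /* MPI_ERRORS_RETURN must be set in all processes (see "Error return
         * codes" above) so that failures are reported instead of aborting. */
        MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

        /* By default the return code reflects only local knowledge: it can be
         * MPI_SUCCESS here even though another rank has already failed. */
        rc = MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);

        /* When a globally consistent answer is needed, every rank calls the
         * proposed validation routine on the same communicator. */
        if (rc != MPI_SUCCESS ||
            MPIX_Comm_validate(MPI_COMM_WORLD) != MPI_SUCCESS) {
            /* The communicator is in an error state; collective recovery is
             * required before it can be used again. */
        }

        MPI_Finalize();
        return 0;
    }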