[MPIWG Fortran] Fortran coarrays - failed images

Bill Long longb at cray.com
Mon Sep 7 00:45:14 CDT 2015


Hi Jeff,

The current version of the TS is WG5 document N2074.  The failed image feature has not changed much recently, but still better to use the latest version.  N2074 is currently out for WG5 review.  Assuming it passes, this (minus the line numbers that we like but ISO doesn’t) is the version that will be sent to the ISO editors for publication. 

I think it would be an excellent idea for the TS/Fortran 2015  and MPI facilities for FT to be able to use common underlying infrastructure if possible.   I have not thought about how a program that uses both the Fortran and the MPI facilities would work (or how the MPI spec should be written in that regard), but if the links into components like PMI or SLURM would be the same, that would certainly help.   I wrote up a summary of the TS features for the benefit of the MPI FT experts, pasted in below. 

Cheers,
Bill

ISO TS 18508 includes features that can be used to help a program
react to the failure of an image. It is intended to be a minimal
capability. Facilities are included for notification, inquiry,
testing, and simple continuation of execution.


Background:

The parallel programming model using coarrays that is included in
Fortran 2008 assumes that the number of executing images remains
constant for the entire program. As a consequence, failure of an image
(typically from a hardware failure) aborts the whole program.  The
addition of teams in TS 18508 allows for the possibility of the number
of images decreasing following a failure by forming a new team
consisting of the active images and continuing execution in that team.
While this was not the main motivation for introducing teams, this
observation led to the addition of minimal resilience facilities to
the TS.


Notification:

The image control statements include an optional STAT= specifier that
returns an error status. The image selector syntax for remote
references also allows an optional STAT= specifier.  The new collective
and atomic subroutines have an optional STAT argument that will also
return an error status. An error status of zero indicates success. If
the operation involved communication with a failed image, the status
returned is equal to the named constant STAT_FAILED_IMAGE that is
defined in the intrinsic module ISO_FORTRAN_ENV, and execution
continues.  If there is no status variable provided and the operation
involves communication with a failed image, the program aborts.  A
negative value of STAT_FAILED_IMAGE indicates that the processor
cannot detect image failure, and a positive value indicates that it can.
This effectively makes provision of this facility optional.
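For illustration only (this is not text from the TS, and the program
and variable names are mine), a minimal sketch of how a program might
check these status values on a processor that supports the feature:

   ! Minimal sketch: react to a failed image via the STAT= specifier
   ! on an image control statement and on a remote reference.
   program notify_demo
     use, intrinsic :: iso_fortran_env, only : STAT_FAILED_IMAGE
     implicit none
     integer :: st
     real    :: x[*]          ! a simple scalar coarray

     x = real(this_image())

     sync all (stat=st)
     if (st == STAT_FAILED_IMAGE) then
       ! Some image has failed; the surviving images keep executing.
       print *, 'image', this_image(), ': sync all saw a failed image'
     end if

     if (this_image() == 1) then
       if (num_images() > 1) then
         ! STAT= on the image selector covers this one remote access.
         x = x[2, stat=st]
         if (st == STAT_FAILED_IMAGE) print *, 'image 2 has failed'
       end if
     end if
   end program notify_demo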

[MPI analog: Remote communication operations in MPI either return a
status as a function result (C), or have an optional MPI subroutine
argument that returns an error status (Fortran). A named constant
MPI_FAILED_RANK could be added to the MPI spec to provide a
corresponding capability.]


Inquiry:

Two inquiry functions are provided that can return information on the
status of images. IMAGE_STATUS( N ) will return STAT_FAILED_IMAGE if
image N has failed. This could be used to check on the health of an
image just before a loop that involves many accesses to that
image. FAILED_IMAGES( ) returns a 1-D array containing the numbers of
the failed images. This can be used to compute which images should be
omitted from a new team that can be used for continued execution. If
there are no failed images, the returned array has size zero.
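Again for illustration only (not TS text, names are mine), a sketch of
how the two inquiry functions might be used:

   ! Minimal sketch: use the inquiry functions to see which images
   ! are still alive.
   program inquiry_demo
     use, intrinsic :: iso_fortran_env, only : STAT_FAILED_IMAGE
     implicit none
     integer, allocatable :: failed(:)

     ! Check a single image before heavy communication with it.
     if (num_images() > 1) then
       if (image_status(2) == STAT_FAILED_IMAGE) then
         print *, 'image 2 has failed; pick a different partner'
       end if
     end if

     ! Get the full list of failed images; size zero means none failed.
     failed = failed_images()
     if (size(failed) == 0) then
       print *, 'all', num_images(), 'images are healthy'
     else
       print *, 'failed images:', failed
     end if
   end program inquiry_demo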

[MPI analog: Adding corresponding functions ( IMAGE -> RANK ) to MPI
would seem straightforward.]


Testing:

A new statement

   FAIL IMAGE

is added. When executed by image N it causes image N to appear to have
failed as seen from the other images.  This is included so that
programmers can test recovery algorithms without having to wait for an
actual failure. An image that executes this statement does not
continue execution.
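A sketch of how a recovery test might use it (illustrative only):

   ! Minimal sketch: image 2 simulates a failure with FAIL IMAGE so
   ! the recovery path can be exercised during testing.
   program fail_demo
     use, intrinsic :: iso_fortran_env, only : STAT_FAILED_IMAGE
     implicit none
     integer :: st

     ! Image 2 appears failed from here on and executes nothing further.
     if (this_image() == 2) fail image

     sync all (stat=st)
     if (st == STAT_FAILED_IMAGE) then
       print *, 'image', this_image(), ': detected the simulated failure'
     end if
   end program fail_demo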

[MPI analog: One more function.]


Continuation:

The FORM TEAM statement allows the program to create a new team.  In
the case of a failed image, the strategy would be to form a new team
consisting of the remaining active images.  The CHANGE TEAM statement
causes a switch in the execution environment to the specified
team. Note that whether the program can meaningfully continue depends
on the algorithm being implemented and whether the program includes
code to switch teams to continue.
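An illustrative sketch of the regrouping step (again not TS text, and
the names are mine):

   ! Minimal sketch: after detecting a failure, the surviving images
   ! regroup into a new team and continue execution inside it.
   program continue_demo
     use, intrinsic :: iso_fortran_env, only : team_type
     implicit none
     type(team_type) :: survivors
     integer         :: st

     ! ... a failure has already been detected, e.g. STAT_FAILED_IMAGE
     ! from a synchronization.  The active images form one new team.
     form team (1, survivors, stat=st)

     change team (survivors, stat=st)
       ! Inside the construct, this_image() and num_images() refer to
       ! the new, smaller team; the algorithm continues on the survivors.
       print *, 'continuing as image', this_image(), 'of', num_images()
     end team
   end program continue_demo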

[MPI analog: Fortran teams could be mapped to MPI communicators. One
difference is that Fortran allows you to omit a team specification in
most statements involving communication, in which case the "current"
team is used. The MPI spec might want to specify that MPI_COMM_WORLD
is redefined in the case that "team" is shrunk through rank failure,
to reduce the need to modify existing code.]


Implementation issue:

The program launching and management environment (PMI, ALPS, SLURM,
...) needs to be modified to include an API that can provide failed
image information to the program and keep the remaining images
executing, as opposed to the current behavior of aborting all the
images.  The API also needs a function that the program startup code
can call to inform the management environment whether it is employing
resilient features. It would certainly be advantageous to have the
same API used for both Fortran and MPI.

——————————End of Summary ——————————————




On Sep 5, 2015, at 7:09 PM, Jeff Hammond <jeff.science at gmail.com> wrote:

> So Fortran 2015 (TS 18508 - section 6 in the attachment) is going to support failed images.
> 
> The OpenCoarrays folks (who are responsible for enabling GCC 5+ Fortran coarray support) have started looking into how to support this feature.  They currently use MPI-3 RMA and GASNet as communication runtimes, but the need to support FT will likely push them in a new direction.  The options they have mentioned thus far are undesirable.
> 
> It would be great if there were more people who could help look at RMA FT, particularly as it pertains to Fortran 2015.  Are any of the Fortran WG folks savvy on how those failures map to MPI concepts and whether or not the MPI RMA FT discussion is going in the right direction?
> 
> Thanks,
> 
> Jeff
> 
> -- 
> Jeff Hammond
> jeff.science at gmail.com
> http://jeffhammond.github.io/
> <ISO-IECJTC1-SC22-WG5_N2056_Draft_TS_18508_Additional_Paralle.pdf>

Bill Long                                                                       longb at cray.com
Fortran Technical Support  &                                  voice:  651-605-9024
Bioinformatics Software Development                     fax:  651-605-9142
Cray Inc./ Cray Plaza, Suite 210/ 380 Jackson St./ St. Paul, MN 55101




