[mpiwg-ft] FTWG Con Call Today

Ignacio Laguna lagunaperalt1 at llnl.gov
Wed Jun 16 12:50:02 CDT 2021


Hi Wesley, all,

If we have time when we meet again, I'd like to add tools support for 
the Reinit spec to the agenda. Here's a new version (0.2) of the spec:

     https://reinit.github.io/reinit/docs/reinit-0.2.0.pdf

We added a section on tools. The main question we have is what happens 
to performance variables after we resume execution at the rollback 
point: should the MPI implementation reset the variables, or should the 
tool adjust its own recorded values to account for failures? My 
preference is that MPI resets the variables, but we would like to hear 
opinions.

In any case, we are thinking of proposing a performance variable that 
exposes the failure count for tools to use. We could also propose an 
event that is triggered after a rollback occurs.
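
To make the second option concrete, here is a minimal sketch of how a 
tool might adjust a counter-type variable itself if MPI does not reset 
it. It is plain Python and does not call any real MPI_T binding; the 
class and method names are hypothetical, and on_rollback() stands in for 
the event mentioned above, which has not been proposed yet.

```python
class ToolCounterView:
    """Tool-side view of one counter-type performance variable.

    Hypothetical sketch: assumes MPI does NOT reset the variable at
    rollback, so the tool keeps its own baseline.
    """

    def __init__(self):
        self.failures = 0   # rollbacks observed by the tool so far
        self.baseline = 0   # raw counter value at the last rollback

    def on_rollback(self, raw_value):
        # Would be driven by the (assumed) post-rollback event.
        self.failures += 1
        self.baseline = raw_value

    def since_rollback(self, raw_value):
        # Value the tool reports: progress since the last rollback.
        return raw_value - self.baseline


view = ToolCounterView()
print(view.since_rollback(120))  # before any failure
view.on_rollback(120)            # failure, execution resumes
print(view.since_rollback(150))  # counts only post-rollback activity
```

If MPI resets the variables instead, the tool needs none of this 
bookkeeping, which is one argument for that option.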

Ignacio



On 6/14/21 7:38 AM, Wesley Bland via mpiwg-ft wrote:
> The Fault Tolerance Working Group’s weekly con call is today at 12:00 PM Eastern. Today's agenda:
>
> * Serializing MPI Objects (Tony/Derek)
>
> If there's something else that people would like to discuss, please just send an email to the WG so we can get it on the agenda.
>
> Thanks,
> Wes
>
> .........................................................................................................................................
> Join from PC, Mac, Linux, iOS or Android: https://tennessee.zoom.us/j/632356722?pwd=lI4_169CGcewIumekTziMw
>      Password: mpiforum
>
> Or iPhone one-tap (US Toll):  +16468769923,632356722#  or +16699006833,632356722#
>
> Or Telephone:
>      Dial:
>      +1 646 876 9923 (US Toll)
>      +1 669 900 6833 (US Toll)
>      Meeting ID: 632 356 722
>      International numbers available: https://zoom.us/u/6uINe
>
> Or an H.323/SIP room system:
>      H.323: 162.255.37.11 (US West) or 162.255.36.11 (US East)
>      Meeting ID: 632 356 722
>      Password: 364216
>
>      SIP: 632356722 at zoomcrc.com
>      Password: 364216
> .........................................................................................................................................
>
> _______________________________________________
> mpiwg-ft mailing list
> mpiwg-ft at lists.mpi-forum.org
> https://lists.mpi-forum.org/mailman/listinfo/mpiwg-ft
>

