[mpiwg-ft] Subjects for Tomorrow's FTWG Call

Jeff Hammond jeff.science at gmail.com
Thu Nov 2 23:01:55 CDT 2017


> Jim - Should MPI_ERR_DATA_UNAVAILABLE be usable outside of RMA? Does it
> apply to point-to-point or collectives?

No.  RMA is the only context where data movement/access is decoupled from
processing at the target.  It is impossible to make data available via
two-sided communication if the target process is dead.
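A minimal sketch of that asymmetry, assuming MPI_ERRORS_RETURN is set on the
window.  MPI_ERR_DATA_UNAVAILABLE is still just a proposal, so the code only
checks for failure generically and notes where the proposed class would
surface:

#include <mpi.h>
#include <stdio.h>

void rma_read(MPI_Win win, int target, double *buf, int n)
{
    MPI_Win_set_errhandler(win, MPI_ERRORS_RETURN);

    MPI_Win_lock(MPI_LOCK_SHARED, target, 0, win);
    int rc = MPI_Get(buf, n, MPI_DOUBLE, target, 0, n, MPI_DOUBLE, win);
    if (rc == MPI_SUCCESS)
        rc = MPI_Win_unlock(target, win);

    /* If the target has died, the memory it exposed may be unreachable;
     * that is the case the proposed error class would describe.  A
     * two-sided MPI_Recv has no analogue: a dead sender never sends, so
     * the receive simply cannot be satisfied. */
    if (rc != MPI_SUCCESS)
        fprintf(stderr, "read from rank %d failed (rc=%d)\n", target, rc);
}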

> Jim/Aurelien - Is the justification for this work that flush doesn't
> allow detection of process failure? They're still not convinced that this
> is true.

The issue is not that flush can't detect process failure in some cases.  It
is that flush cannot detect process failure in all cases, and by cases, I
mean implementations.  If your RMA implementation utilizes an ASIC, rather
than the CPU, then it has no reason to determine if the target process is
alive or not, and burdening it with doing so may be a performance disaster.
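To make that concrete, here is a sketch of a passive-target put on a network
with hardware RMA offload.  Nothing in it requires the target process to
execute anything, which is exactly why a successful flush proves nothing
about whether the target is alive:

#include <mpi.h>

void rma_update(MPI_Win win, int target, const double *val)
{
    MPI_Win_lock(MPI_LOCK_SHARED, target, 0, win);
    MPI_Put(val, 1, MPI_DOUBLE, target, 0, 1, MPI_DOUBLE, win);

    /* Flush guarantees completion of the transfer, not participation by
     * the target process.  Forcing the implementation to also verify
     * liveness would add a software round trip to every flush. */
    int rc = MPI_Win_flush(target, win);
    (void)rc;  /* rc == MPI_SUCCESS here does not imply the target lives */

    MPI_Win_unlock(target, win);
}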

The other scenario where checking whether the target process is alive does
not help is anything like Casper, where the remote processing is done by an
agent other than the target MPI process.  That said, I recall we discussed a
model where any process failure on the communicator backing a shared-memory
window would need to take out all of the processes attached to that window
in order to keep the model sane.

> Jim - On the other hand, it might be true that we can't guarantee any
> process failure detection in any RMA operation. Maybe we should just not
> allow process failure errors (as opposed to "upgrading" other types of
> errors to process failure).

There is no reason to disallow detection of process failure.  We know many
implementations are built on active messages, where such detection is
trivial.

> Jim - One place this still makes sense as-is is having a process with
> data corrupted because another process failed during a put. If a third
> process is reading the bad memory, it could get MPI_ERR_DATA_UNAVAILABLE
> instead of MPI_ERR_PROC_FAILED.

This sounds like the Byzantine problem.  MPI should not be responsible for
detecting data corruption.

> We're still unclear on the failure model expected here. We probably need
> to get more feedback from Jeff.

First, all-or-nothing window invalidation is awful.  I think we've moved
beyond that, but just to be sure, I'll justify why it's awful.

Some PGAS applications - NWChem being the most famous example - have
read-only and update-only epochs, which are collective.  If a process
fails, the application should be able to continue to use data that it knows
programmatically could not have been modified during the epoch.
Furthermore, the application may know programmatically which processes
touched which data and thus can recover some portion of the window(s)
touched in an update-only epoch.
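Here is a sketch of that recovery pattern - not actual NWChem code;
mark_patch_lost and consume are hypothetical application hooks - assuming
errors are returned through the window error handler:

#include <mpi.h>

extern void mark_patch_lost(int rank);          /* hypothetical hook */
extern void consume(const double *buf, int n);  /* hypothetical hook */

void readonly_epoch_get(MPI_Win win, int target, double *buf, int n)
{
    MPI_Win_set_errhandler(win, MPI_ERRORS_RETURN);

    MPI_Win_lock_all(0, win);
    int rc = MPI_Get(buf, n, MPI_DOUBLE, target, 0, n, MPI_DOUBLE, win);
    if (rc == MPI_SUCCESS)
        rc = MPI_Win_flush(target, win);
    MPI_Win_unlock_all(win);

    if (rc != MPI_SUCCESS) {
        /* The epoch is read-only, so no rank has modified the window;
         * everything this rank already holds is still valid.  It marks
         * the failed rank's patch as lost and keeps computing, rather
         * than having the whole window invalidated. */
        mark_patch_lost(target);
    } else {
        consume(buf, n);
    }
}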

Second, I do not want to burden RMA implementations with "is the target
process alive?" active-message traffic in every communication operation,
when a fault-intolerant implementation would not interact with the target
process at all, e.g. on RDMA networks.  I want an error class for RMA that
recognizes that RMA decouples synchronization from data movement, and that
checking whether a process is alive or dead is a form of synchronization.

Jeff

On Wed, Nov 1, 2017 at 6:24 AM, Bland, Wesley <wesley.bland at intel.com>
wrote:
>
> I was waiting until you got back from leave.
>
> We have some notes from the discussion two weeks ago on the wiki page:
https://github.com/mpiwg-ft/ft-issues/wiki/2017-10-18. We'd like to get a
little more feedback on the failure model you were envisioning. It'd
probably be most productive to do this on a call so we can get some back
and forth.
>
> Why don't we go ahead and cancel for this week. Jeff, can you attend next
week? (Wednesdays at 12:00 PM Eastern US)
>
> Thanks,
> Wesley
>
> On Oct 31, 2017, at 4:46 PM, Jeff Hammond <jeff.science at gmail.com> wrote:
>
> What feedback do you want? I haven’t seen any specific requests for
comment.
>
> Jeff
>
> On Tue, Oct 31, 2017 at 1:04 PM Bland, Wesley <wesley.bland at intel.com>
wrote:
>>
>> Hi all,
>>
>> I think we're still on hold for the ULFM data resilience discussion
until we're able to get some more feedback from Jeff and our non-ULFM
topics are ready for another discussion with the rest of the forum.
>>
>> Are there other topics that people want to discuss tomorrow or should we
cancel the call?
>>
>> Thanks,
>> Wesley
>
> --
> Jeff Hammond
> jeff.science at gmail.com
> http://jeffhammond.github.io/




--
Jeff Hammond
jeff.science at gmail.com
http://jeffhammond.github.io/