[mpiwg-persistence] Completion call before all partitions marked ready

Holmes, Daniel John daniel.john.holmes at intel.com
Tue Jan 24 05:59:52 CST 2023


Hi Joachim,

That exact question was raised and discussed in the Forum during the standardisation procedure for partitioned communication.

The resulting intent, IMHO, was that calling a completion procedure before all partitions are marked ready is permitted, but it will always report its "not completed" semantic: MPI_TEST will set flag==false and MPI_WAIT will not return until something changes. In a multithreaded situation, another thread could make calls to MPI_Pready and release MPI_WAIT from what would otherwise be a deadlock.
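As a minimal sketch (not from the original mail) of that multithreaded pattern, assuming the library was initialised with MPI_THREAD_MULTIPLE and using OpenMP sections for the two threads; the function name, COUNT, and loop structure are illustrative only:

```C
#include <mpi.h>
#include <omp.h>

#define COUNT 1024

/* One thread blocks in MPI_Wait while another marks the partitions
 * ready; the final MPI_Pready releases the wait. Error handling is
 * omitted for brevity. */
void sender(double *message, int partitions, int dest, int tag)
{
    MPI_Request request;
    MPI_Psend_init(message, partitions, COUNT, MPI_DOUBLE, dest, tag,
                   MPI_COMM_WORLD, MPI_INFO_NULL, &request);
    MPI_Start(&request);

    #pragma omp parallel sections num_threads(2)
    {
        #pragma omp section
        {
            /* Completion thread: returns only after every partition
             * has been marked ready and the transfer has completed. */
            MPI_Wait(&request, MPI_STATUS_IGNORE);
        }
        #pragma omp section
        {
            /* Worker thread: fills and marks partitions one by one. */
            for (int i = 0; i < partitions; ++i) {
                /* ... compute partition i of message ... */
                MPI_Pready(i, request);
            }
        }
    }
    MPI_Request_free(&request);
}
```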

This modus operandi may be important for offload to accelerators, where the "other thread" that calls MPI_Pready is actually a kernel running on an accelerator device (once we have accelerator bindings for MPI_Pready and can guarantee sufficient preparedness of the partitioned communication operation for such a device-side call to make sense -- both of which are work in progress).

As this question has arisen several times from different people, I would say that the intent is not stated clearly enough in the document. We should work on better wording. Given that the intended semantics would not change, we might still have time to get a wording-fix proposal into MPI-4.1 -- it needs to be read at the March voting meeting to hit that goal.

Best wishes,
Dan.

-----Original Message-----
From: mpiwg-persistence <mpiwg-persistence-bounces at lists.mpi-forum.org> On Behalf Of Joachim Protze via mpiwg-persistence
Sent: 21 September 2022 08:21
To: mpiwg-persistence at lists.mpi-forum.org
Subject: [mpiwg-persistence] Completion call before all partitions marked ready

Hello wg-persistence,

Looking at the MPI 4.0 document, it is not clear to us whether it is allowed to make a completion call for a partitioned communication request before all partitions are marked ready. A simple single-threaded example would be:

```C
MPI_Psend_init(message, partitions, COUNT, MPI_DOUBLE, dest, tag,
               MPI_COMM_WORLD, MPI_INFO_NULL, &request);
MPI_Start(&request);
for (i = 0; i < partitions - 1; ++i) {
    MPI_Pready(i, request);
}
MPI_Test(&request, &flag, MPI_STATUS_IGNORE); // flag will always be 0
MPI_Pready(partitions - 1, request);
MPI_Wait(&request, MPI_STATUS_IGNORE);
MPI_Request_free(&request);
```

The question becomes more relevant in a multi-threaded context. One thread could finish the work early and call MPI_Wait to detect when all partitions were sent.

From my understanding, the only requirement is that all partitions must be marked ready with explicit ready calls before the operation can complete. Replacing the test in the above example with a wait call would result in deadlock.

Best
Joachim

--
Dr. rer. nat. Joachim Protze

IT Center
Group: High Performance Computing
Division: Computational Science and Engineering
RWTH Aachen University
Seffenter Weg 23
D 52074 Aachen (Germany)
Tel: +49 241 80-24765
Fax: +49 241 80-624765
protze at itc.rwth-aachen.de
www.itc.rwth-aachen.de
