[Mpi-22] [Mpi-forum] MPI 2.2 proposal: resolving MPI_Request_free issues

Erez Haba erezh at [hidden]
Thu Jul 17 14:40:55 CDT 2008



The problem with solution #5 is that it might still be incorrect. Reading the buffer while the local interconnect is still using it is fine, but writing to it is not always safe.

For example, the interconnect may legitimately probe the memory again before releasing it, and it may use a probe for write (which is odd, but can happen). Such a probe is usually implemented as a read followed by a write-back; if it happens at the same time the user is updating the data, the user's data might get corrupted. I know it is far-fetched, but I'd be extremely cautious about recommending that the user write to the buffer before the local resource "owner" (the interconnect) relinquishes it.
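To make the interleaving concrete, here is a minimal single-threaded sketch in plain C (no MPI, no real hardware). The probe_t type and all function names are hypothetical, invented only to model a probe that reads a byte and later writes the same byte back; a real interconnect would do this asynchronously.

```c
#include <stdint.h>

/* Hypothetical model of a "probe for write": the device reads a byte,
 * then writes the same value back to confirm the page is writable. */
typedef struct {
    uint8_t latched; /* value captured by the read half of the probe */
} probe_t;

static void probe_read(probe_t *p, const uint8_t *addr)
{
    p->latched = *addr;
}

static void probe_write_back(const probe_t *p, uint8_t *addr)
{
    *addr = p->latched;
}

/* The user's store lands between the two halves of the probe,
 * so the write-back overwrites it with the stale value. */
uint8_t final_value_user_writes_early(uint8_t old_value, uint8_t user_value)
{
    uint8_t buffer = old_value;
    probe_t probe;

    probe_read(&probe, &buffer);       /* interconnect: read            */
    buffer = user_value;               /* user: overwrites "too early"  */
    probe_write_back(&probe, &buffer); /* interconnect: stale write-back */
    return buffer;
}

/* The user waits until the interconnect is done with the buffer,
 * so the update survives. */
uint8_t final_value_user_writes_after_release(uint8_t old_value,
                                              uint8_t user_value)
{
    uint8_t buffer = old_value;
    probe_t probe;

    probe_read(&probe, &buffer);
    probe_write_back(&probe, &buffer); /* probe finishes first */
    buffer = user_value;               /* user writes afterwards */
    return buffer;
}
```

With an old value of 0x11 and a user value of 0x22, the first function returns 0x11 (the user's update is lost) and the second returns 0x22.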

As a solution, I would add text that discourages the use of MPI_Request_free on active requests and recommends Adam Moody's proposal for MPI_REQUEST_IGNORE instead.
(We might not be able to make this an error case in 2.2, in order to keep backward compatibility.)

The text would specifically say that writing to or freeing the buffer after calling MPI_Request_free is unsafe.

Thanks,
.Erez

-----Original Message-----
From: mpi-22-bounces_at_[hidden] [mailto:mpi-22-bounces_at_[hidden]] On Behalf Of Jeff Squyres
Sent: Wednesday, July 16, 2008 6:59 PM
To: MPI 2.2
Subject: Re: [Mpi-22] [Mpi-forum] MPI 2.2 proposal:resolving MPI_Request_free issues

On Jul 16, 2008, at 7:31 PM, Underwood, Keith D wrote:

>>> [erezh] Correct; this is the scenario I was describing with the RDMA write.
>
> It would be interesting to see exactly what the error mode here is.
> Retransmitting corrupted data should be ok, since a correctly delivered
> message means that the retransmit must be dropped.  I suppose that if
> the NIC speaks virtual addresses and the free actually results in a trap
> to the kernel that unmaps the pages, then the NIC could retransmit and
> find that there isn't a valid page table entry...

'zactly.  And then the local completion entry would be a failure -- so
the sender would [erroneously] think that the message had failed to be
delivered.

> Solution #5:  Change the advice to users - "...the arrival of the reply
> informs the sender that the send has completed and the send buffer can
> be overwritten.  If the buffer will ever be freed, the application
> should call MPI_Wait or MPI_Cancel instead of MPI_Request_free."

That seems like a good compromise.


--
Jeff Squyres
Cisco Systems
_______________________________________________
mpi-22 mailing list
mpi-22_at_[hidden]
http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi-22



