[Mpi-forum] Missing MPI primitives - Probe+Wait

Tue Apr 20 03:27:43 CDT 2010

On Apr 19 2010, Solt, David George wrote:

>Can you do an MPI_Iprobe + MPI_Waitall(iprobe request + other requests)?
>
> Regarding "of the form that will yield to other threads and processes, 
> not a spin loop": this is currently considered an implementation artifact, 
> and it is up to a particular MPI implementation whether to spin, block 
> or yield, or to expose how waiting is handled to the user.

Yes, agreed.  I wasn't writing a specification, but expressing the intent.

One of the Great Myths of parallel programming is that timing and progress
issues are semantically neutral - they aren't, as MPI-2 notes in the section
on one-sided communication.  MPI-1 is about as much as you can do while
remaining semantically neutral, and there are some gotchas even there.

The point about such a facility is that it would ALLOW an implementation
to deliver what it needed, so at least the program could be syntactically
portable.  At present, even that isn't possible.
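
For illustration, the nearest thing a program can write today is a
hand-rolled polling loop that yields between probes.  The following is
only a minimal sketch under stated assumptions: poll_fn, wait_yielding
and countdown_poll are hypothetical names, and the MPI calls are replaced
by a callback so that the fragment compiles without an MPI library.  In a
real program the callback might wrap MPI_Iprobe and MPI_Testall.

```c
#include <assert.h>
#include <sched.h>
#include <stddef.h>

/* Hypothetical callback type: returns nonzero once the awaited event has
 * occurred.  In an MPI program it might wrap MPI_Iprobe and MPI_Testall;
 * here it is left abstract so the sketch needs no MPI library. */
typedef int (*poll_fn)(void *ctx);

/* Poll until poll(ctx) reports completion, asking the scheduler to run
 * something else between attempts instead of spinning flat out.
 * Returns the number of polls that were made (at least 1). */
unsigned long wait_yielding(poll_fn poll, void *ctx)
{
    unsigned long polls = 1;

    assert(poll != NULL);
    while (!poll(ctx)) {
        sched_yield();   /* a hint only; its effect is system-dependent */
        polls++;
    }
    return polls;
}

/* Stand-in event source for demonstration: reports completion once the
 * counter pointed to by ctx has been decremented to zero. */
int countdown_poll(void *ctx)
{
    int *remaining = ctx;

    if (*remaining == 0)
        return 1;
    (*remaining)--;
    return 0;
}
```

Whether sched_yield actually lets other threads and processes make
progress depends on the system and its scheduler, which is exactly the
portability problem: the loop compiles everywhere but promises nothing.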

> The 2nd request could be handled in a number of ways using the current 
> API's primitives, but all would require MPI_THREAD_MULTIPLE to be set. I 
> believe you will need to argue a strong case for why multiple threads 
> should be able to simultaneously call any existing or desired MPI APIs 
> without MPI_THREAD_MULTIPLE set.

Your first sentence is precisely why I posted this!

The reason is implementability.  I believe that I could implement what I
asked for, fairly reliably, on every major system I know about.  I know
of none on which I could implement MPI_THREAD_MULTIPLE without relying on
undefined behaviour, and I very much doubt that it will ever be possible
to write a non-trivial, portable, reliable MPI_THREAD_MULTIPLE program.

I don't know how well you (or others) know POSIX threads and how they
are implemented, or other threading interfaces, but the assumption that
you can write portable code is completely false.  For example, POSIX
contains no specification of how non-memory actions (including almost
everything that happens inside a library call) can be synchronised.  FAR
worse is the number of facilities that are neither thread-neutral nor
thread-specific (signals are the extreme case, but even network I/O is
often like that, and MPI rather relies on it).

Incidentally, the second paragraph of 2.9.2 is false.  If only it were
that simple :-(

Regards,
Nick Maclaren.