[Mpi-forum] MPI_Mprobe workaround

Jeff Hammond jhammond at alcf.anl.gov
Fri Jul 13 14:32:06 CDT 2012

Hmm, it seems I've stumbled upon a variant of Listing 1.1 in your paper :-)

I'm a big fan of buffer pools, so I am unfazed by the potential
performance impact of sbrk().  If you need to malloc() so much memory
that sbrk() is called, I suspect that the message transfer time is
going to take longer than a system call.  But I confess to being
somewhat ignorant of the performance of slow, general-purpose
operating systems like Linux.
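For anyone following along, the MPI-3 matched-probe pattern that this thread is comparing against looks roughly like the sketch below. It is not a definitive implementation, just an illustration of why no mutex is needed: MPI_Mprobe removes the matched message from the queue and returns a per-thread MPI_Message handle, so no other thread can receive it between the probe and the receive. The function name recv_any and the use of tag 0 are assumptions for illustration, mirroring Jeff's snippet.

```c
#include <mpi.h>
#include <stdlib.h>

/* Thread-safe receive of a message of unknown size/source using the
 * MPI-3 matched probe.  Unlike MPI_Probe + MPI_Recv, this sequence is
 * safe to run concurrently in multiple threads without a mutex,
 * because MPI_Mprobe dequeues the matched message and hands back a
 * handle (msg) that only this thread can pass to MPI_Mrecv. */
void recv_any(MPI_Comm comm)
{
    MPI_Message msg;
    MPI_Status status;
    int count;

    /* Match and dequeue the message; it is now invisible to other probes. */
    MPI_Mprobe(MPI_ANY_SOURCE, /*tag=*/0, comm, &msg, &status);
    MPI_Get_count(&status, MPI_INT, &count);

    /* The per-use-case work (the "/* ? */" in Jeff's snippet) can run
     * here without a lock -- e.g. a malloc sized to the message. */
    int *buf = malloc(count * sizeof(int));

    /* Receive the specific message matched above, not just any match. */
    MPI_Mrecv(buf, count, MPI_INT, &msg, MPI_STATUS_IGNORE);

    /* ... use buf ... */
    free(buf);
}
```

The point Torsten makes below is that the MPI-2 workaround has to hold a lock across the probe *and* the allocation, whereas here the allocation is outside any critical section.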


On Fri, Jul 13, 2012 at 2:19 PM, Torsten Hoefler <htor at illinois.edu> wrote:
> Jeff,
>> I was reading through
>> www.unixer.de/publications/img/mprobe-proposal-rev4.pdf and - just for
>> fun - wondered if there is a workaround, not because MPI_Mprobe
>> shouldn't be in the MPI-3 standard, but because some folks might want
>> a workaround for backwards compatibility if they are forced to use
>> MPI-2 somewhere.
>> I apologize in advance if something equivalent to what I say below was
>> discussed at the Forum and I was not present.
>> Torsten and coworkers say the following.
>> ============================
>> For example, the following code cannot be executed concurrently by
>> two threads in an MPI process, because a message could be found by the
>> MPI Probe in both threads, while only one of the threads could
>> successfully receive the message (the other will block):
>> MPI_Status status;
>> int value;
>> MPI_Probe(MPI_ANY_SOURCE, /*tag=*/0, MPI_COMM_WORLD, &status);
>> MPI_Recv(&value, 1, MPI_INT, status.MPI_SOURCE, /*tag=*/0,
>> <snip>
>> There is no known workaround that addresses all of the problems with
>> MPI Probe and MPI Iprobe in multi-threaded MPI applications.
>> ============================
>> Obviously, a fat mutex around this block solves the problem, but the
>> time spent in the mutex will scale with the message size.  I was
>> curious whether the following workaround is reasonable when MPI-2
>> must be used.
>> ============================
>> MPI_Status status;
>> MPI_Request request;
>> AppropriateMutex mutex;
>> int value;
>> ACQUIRE_MUTEX(&mutex);
>> MPI_Probe(MPI_ANY_SOURCE, /*tag=*/0, MPI_COMM_WORLD, &status);
>> /* ? */
>> MPI_Irecv(&value, 1, MPI_INT, status.MPI_SOURCE, /*tag=*/0,
>> MPI_COMM_WORLD, &request);
>> RELEASE_MUTEX(&mutex);
>> MPI_Wait(&request, MPI_STATUS_IGNORE);
>> ============================
> This is correct. But most use cases for probe require a malloc, and
> malloc is not always fast (especially if it needs to call sbrk). This
> will still have the contention problem (assume thousands of cores!).
> Mprobe allows the use of wait-free algorithms for the queue management.
>> Are there nuances regarding the use of MPI that I have missed?
> Nope.
>> Do the real-world use cases have too much to do in the "/* ? */" to
>> make this viable?
> I think so (at least all the use cases I know of). Do you have any that
> only require a trivial "/* ? */"?
> This was brought up at a point when proposals were reviewed much more
> harshly than they are today; we had many discussions and even a full
> EuroMPI paper on this: http://www.unixer.de/publications/index.php?pub=103 .
> All the Best,
>   Torsten
> --
> ### qreharg rug ebs fv crryF ------------- http://www.unixer.de/ -----
> Torsten Hoefler         | Performance Modeling and Simulation Lead
> Blue Waters Directorate | University of Illinois (UIUC)
> 1205 W Clark Street     | Urbana, IL, 61801
> NCSA Building           | +01 (217) 244-7736
> _______________________________________________
> mpi-forum mailing list
> mpi-forum at lists.mpi-forum.org
> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi-forum

Jeff Hammond
Argonne Leadership Computing Facility
University of Chicago Computation Institute
jhammond at alcf.anl.gov / (630) 252-5381
