[Mpi3-ft] API draft: Comments, questions, suggestions

Richard Barrett rbarrett at ornl.gov
Tue Apr 21 13:54:54 CDT 2009

With attachment.

On 4/21/09 2:20 PM, "Richard Barrett" <rbarrett at ornl.gov> wrote:

> Folks,
> Pardon me for jumping in to a conversation that may have already taken place,
> but I'll offer some comments regarding the proposed API anyway :) My doc is
> dated 20 Feb 2009; I've attached it here.
>
> 0.2 Initializing fault tolerance...
> 1. There are several parameters that may be set, each with a call to
> MPI_COMM_SET_NAME. Would it make sense, or be possible, to define a structure
> (or Fortran derived type) instead, with one field per parameter? The
> struct/type would carry default settings upon instantiation, and the user
> would modify fields for new settings. We may also want to keep the
> single-parameter setting function. I don't think I feel strongly about this;
> it just occurred to me that it might be a desirable convenience.
> 2. Also, has the Fortran setting of the recover functions been addressed? I
> recall doing this a few years ago; it required some acrobatics to pass a
> Fortran function into a C world.
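
To make item 1 under 0.2 concrete, here is a minimal C sketch of a
parameter struct with defaults. Everything in it is an assumption made for
illustration: the `ft_params` type, its field names, the default values, the
`FT_PARAMS_DEFAULT` initializer, and the setter are hypothetical and are not
taken from the draft.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical recovery callback type; not from the draft. */
typedef int (*ft_recover_fn)(void *user_state);

/* Hypothetical parameter block: one field per fault-tolerance setting. */
typedef struct {
    int           max_restarts;  /* assumed: cap on process restorations */
    double        timeout_sec;   /* assumed: failure-detection timeout */
    ft_recover_fn recover;       /* assumed: user recovery routine, may be NULL */
    void         *user_state;    /* handed back to the recovery routine */
} ft_params;

/* Defaults upon instantiation; the user overrides only what differs. */
#define FT_PARAMS_DEFAULT { 3, 30.0, NULL, NULL }

/* Single-parameter setter kept alongside the struct, as suggested above. */
void ft_params_set_timeout(ft_params *p, double sec)
{
    assert(p != NULL);
    p->timeout_sec = sec;
}
```

A caller would write `ft_params p = FT_PARAMS_DEFAULT;`, change the fields of
interest, and hand the whole struct to a single initialization call.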
>
> 0.3 Restoring MPI processes
> 1. Function names don't adhere to the MPI_COMM_ prefix of the others.
> 2. MPI_GET_LOST_COMMUNICATOR: the last word should be shortened to COMM,
> right? And with (1), perhaps MPI_COMM_RESTORE_LIST is more descriptive?
>
> 0.5 Communicator state
> 1. It seems that the two versions are more analogous to MPI_Wait and MPI_Test
> than to blocking and non-blocking. For example, the non-blocking query does
> not (seem to) have a completion routine, i.e. something analogous to MPI_Wait
> for, say, MPI_Irecv. And in fact the current text claims that ivalidate is the
> asynchronous version of validate. At the risk of lengthening names, these seem
> more like MPI_COMM_VALIDATE_WAIT and MPI_COMM_VALIDATE_TEST. Also, the mention
> of "collective" in MPI_COMM_VALIDATE is not made for IVALIDATE, but IVALIDATE
> is (effectively) collective too, correct?
> 2. Would (1) then lead to function bloat, e.g.
> MPI_COMM_VALIDATE_WAIT/ANY/SOME/ALL? OK, probably not ANY. And the same for
> TEST?
>
> Misc
> ------
> Could a log file be generated (within MPI_Finalize), perhaps written to
> /tmp/$USER, that lists fault-tolerance "incidents", etc.? For example, the
> total number of restored processes, or perhaps the mean of the "generation"?
> PVM wrote a log file in this manner (I forget what was in it), which the user
> was (permitted to be) unaware of. It came in handy when, for example, a user
> complained of something. Each process would maintain information, aggregated
> upon termination. This could be overridden or otherwise managed via some
> mechanism. One could envision a tool that monitors the individual process
> logs, provides data to user code for writing to its own log file, etc.
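
The per-process bookkeeping described under Misc might look roughly like the
following C sketch. The `ft_log` type and all function names are hypothetical,
and the two statistics tracked (restored-process count and mean generation) are
just the examples mentioned above. The aggregation is done over a plain array
here; in an MPI program it would more plausibly be a reduction across ranks.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical per-process record of fault-tolerance incidents. */
typedef struct {
    int  restored;        /* restorations observed by this process */
    long generation_sum;  /* sum of the generations of restored processes */
} ft_log;

/* Record one restoration and the generation of the restored process. */
void ft_log_record(ft_log *log, int generation)
{
    assert(log != NULL);
    log->restored++;
    log->generation_sum += generation;
}

/* Fold the per-process logs into one, as would happen at termination. */
ft_log ft_log_aggregate(const ft_log *logs, int n)
{
    ft_log total = { 0, 0 };
    for (int i = 0; i < n; i++) {
        total.restored += logs[i].restored;
        total.generation_sum += logs[i].generation_sum;
    }
    return total;
}

/* Mean generation over all restorations; 0.0 when there were none. */
double ft_log_mean_generation(const ft_log *log)
{
    return log->restored ? (double)log->generation_sum / log->restored : 0.0;
}

/* Write the summary to an already-open stream, e.g. a file under /tmp/$USER. */
int ft_log_write(const ft_log *log, FILE *out)
{
    return fprintf(out, "restored=%d mean_generation=%.2f\n",
                   log->restored, ft_log_mean_generation(log));
}
```

Keeping the counters as sums rather than means makes the per-process records
trivially combinable, so the mean only has to be computed once, at write time.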
>
> Again, I hope I'm not intruding into well-trodden ground, but I would greatly
> appreciate your feedback on this topic.
> Richard

  Richard Barrett
  Application Performance Tools group
  Computer Science and Mathematics Division
  Oak Ridge National Laboratory


-------------- next part --------------
A non-text attachment was scrubbed...
Name: api_doc.pdf
Type: application/octet-stream
Size: 83852 bytes
Desc: not available
URL: <http://lists.mpi-forum.org/pipermail/mpiwg-ft/attachments/20090421/c1462bbe/attachment-0001.obj>
