<HTML>

<HEAD>

<TITLE>API draft: Comments, questions, suggestions</TITLE>

</HEAD>

<BODY>

<FONT FACE="Calibri, Verdana, Helvetica, Arial"><SPAN STYLE='font-size:11pt'>Folks,<BR>

<BR>

Pardon me for jumping into a conversation that may have already taken place, but I’ll offer some comments on the proposed API anyway :) My copy of the doc is dated 20 Feb 2009; I've attached it here.<BR>

<BR>

0.2 Initializing fault tolerance...<BR>

<BR>

</SPAN></FONT><OL><LI><FONT FACE="Calibri, Verdana, Helvetica, Arial"><SPAN STYLE='font-size:11pt'>There are several parameters that may be set, each with a call to MPI_COMM_SET_NAME. Would it make sense (or be possible) to define a structure (or Fortran derived type) instead, with one field per parameter? The struct/type would carry default settings upon instantiation, and the user would modify only the fields that need new settings. We may want to keep the single-parameter setting function as well. I don’t feel strongly about this; it just occurred to me that it might be a desirable convenience. 

</SPAN></FONT><LI><FONT FACE="Calibri, Verdana, Helvetica, Arial"><SPAN STYLE='font-size:11pt'>Also, has the Fortran setting of the recover functions been addressed? I recall doing this a few years ago; it required some acrobatics to pass a Fortran function into the C world. <BR>

</SPAN></FONT></OL><FONT FACE="Calibri, Verdana, Helvetica, Arial"><SPAN STYLE='font-size:11pt'><BR>
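To make (1) concrete, here is a minimal sketch of the struct idea in C. Every name, field, and default value below is hypothetical; nothing here comes from the draft.<BR>
<PRE>
```c
/* Hypothetical sketch: none of these names or defaults come from the draft. */
typedef struct {
    int    max_restarts;      /* assumed parameter: restart limit per process */
    double detect_timeout_s;  /* assumed parameter: failure-detection timeout, in seconds */
    int    notify_on_restore; /* assumed parameter: nonzero to report each restore */
} ft_settings;

/* Return a struct populated with defaults, so the user overrides only what differs. */
ft_settings ft_settings_default(void)
{
    ft_settings s;
    s.max_restarts = 3;
    s.detect_timeout_s = 30.0;
    s.notify_on_restore = 0;
    return s;
}

/* A single-parameter setter can be kept alongside the struct. */
void ft_settings_set_max_restarts(ft_settings *s, int n)
{
    s->max_restarts = n;
}
```
</PRE>
A user would call ft_settings_default() and then change only the fields of interest, either directly or through a setter.<BR>
<BR>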

0.3 Restoring MPI processes<BR>

<BR>

</SPAN></FONT><OL><LI><FONT FACE="Calibri, Verdana, Helvetica, Arial"><SPAN STYLE='font-size:11pt'>Function names don’t adhere to the MPI_COMM_ prefix of the others. 

</SPAN></FONT><LI><FONT FACE="Calibri, Verdana, Helvetica, Arial"><SPAN STYLE='font-size:11pt'>MPI_GET_LOST_COMMUNICATOR: The last word should be shortened to COMM, right? And given (1), perhaps MPI_COMM_RESTORE_LIST is more descriptive?<BR>

</SPAN></FONT></OL><FONT FACE="Calibri, Verdana, Helvetica, Arial"><SPAN STYLE='font-size:11pt'><BR>

0.5 Communicator state<BR>

<BR>

</SPAN></FONT><OL><LI><FONT FACE="Calibri, Verdana, Helvetica, Arial"><SPAN STYLE='font-size:11pt'>It seems that the two versions are more analogous to MPI_Wait and MPI_Test than to blocking and non-blocking. For example, the non-blocking query does not (seem to) have a completion routine, i.e. one analogous to MPI_Wait for, say, MPI_Irecv. And in fact the current text claims that ivalidate is the asynchronous version of validate. At the risk of lengthening names, these seem more like MPI_COMM_VALIDATE_WAIT and MPI_COMM_VALIDATE_TEST. Also, “collective” is mentioned for MPI_COMM_VALIDATE but not for IVALIDATE, yet IVALIDATE is (effectively) collective too, correct? 

</SPAN></FONT><LI><FONT FACE="Calibri, Verdana, Helvetica, Arial"><SPAN STYLE='font-size:11pt'>Would (1) then lead to function bloat, e.g. MPI_COMM_VALIDATE_WAIT/ANY/SOME/ALL? OK, probably not ANY. And the same for TEST?<BR>

</SPAN></FONT></OL><FONT FACE="Calibri, Verdana, Helvetica, Arial"><SPAN STYLE='font-size:11pt'><BR>
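To illustrate the WAIT/TEST distinction in (1), here is a toy model in plain C. It is not MPI and not the draft's semantics: validate_op, validate_test, and validate_wait are made-up names, and the "validation" simply completes after a fixed number of progress steps.<BR>
<PRE>
```c
/* Toy model, not MPI: a validation that completes after a fixed number of
   progress steps. All names here are hypothetical. */
typedef struct {
    int steps_left;  /* progress steps remaining before the result is known */
} validate_op;

/* TEST flavor: make one step of progress and return immediately.
   Returns 1 if the validation has completed, 0 otherwise. */
int validate_test(validate_op *op)
{
    if (op->steps_left > 0)
        op->steps_left--;
    return op->steps_left == 0;
}

/* WAIT flavor: do not return until the validation has completed.
   Returns the number of polls it took. */
int validate_wait(validate_op *op)
{
    int polls = 1;
    while (!validate_test(op))
        polls++;
    return polls;
}
```
</PRE>
Neither call starts an operation that is completed later by a separate routine, which is why the WAIT/TEST names seem a closer fit than blocking/non-blocking.<BR>
<BR>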

Misc<BR>

------<BR>

Could a log file be generated (within MPI_Finalize), perhaps written to /tmp/$USER, that lists fault-tolerance “incidents”, etc.? For example, the total number of restored processes, or perhaps the mean of the “generation”? PVM wrote a log file in this manner (I forget what was in it), which the user was (permitted to be) unaware of. It came in handy when, for example, a user complained of something. Each process would maintain its own information, to be aggregated upon termination. This could be overridden or otherwise managed via some mechanism. One could also envision a tool that monitors the individual process logs, provides data to user code for writing to their own log file, etc.<BR>

<BR>
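As a rough sketch of the aggregation step, assuming each process keeps a small record of its own incidents (the struct and function names are hypothetical, and the gathering of records across processes is left out):<BR>
<PRE>
```c
/* Hypothetical per-process record; the field names are illustrative only. */
typedef struct {
    int restores;    /* times this process was restored */
    int generation;  /* generation number of this process at termination */
} ft_incidents;

/* Hypothetical summary that a log file written at finalize time might hold. */
typedef struct {
    int    total_restores;
    double mean_generation;
} ft_summary;

/* Aggregate the per-process records into one summary. */
ft_summary ft_summarize(const ft_incidents *recs, int nprocs)
{
    ft_summary out = {0, 0.0};
    int i;
    for (i = 0; i < nprocs; i++) {
        out.total_restores += recs[i].restores;
        out.mean_generation += recs[i].generation;
    }
    if (nprocs > 0)
        out.mean_generation /= nprocs;  /* sum becomes a mean */
    return out;
}
```
</PRE>
In a real implementation the records would presumably be gathered to one process before summarizing; that part is not shown.<BR>
<BR>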

Again, I hope I’m not retreading well-trodden ground, but I would greatly appreciate your feedback on these points.<BR>

<BR>

Richard<BR>

-- <BR>

  Richard Barrett<BR>

  Application Performance Tools group<BR>

  Computer Science and Mathematics Division<BR>

  Oak Ridge National Laboratory<BR>

<BR>

  <a href="http://users.nccs.gov/~rbarrett">http://users.nccs.gov/~rbarrett</a><BR>

</SPAN></FONT>

</BODY>

</HTML>