[Mpi3-ft] FTWG meeting: supplementary document

Aurélien Bouteiller bouteill at icl.utk.edu
Wed Apr 17 12:41:09 CDT 2013


Here are the minutes from today's meeting. 


Attendees: 
OSU, Livermore, UTK, ORNL, ANL

* ULFM function coverage:
  * The classification of some RMA and File functions is not complete yet.
  * Better to define RMA as "one-sided" rather than "non-local": it is not clear to the RMA people whether one-sided operations should be allowed to block (for example, a put after a lock). Separating the two terms gives flexibility.

* RMA
  * First paragraph: the major change is that put, etc. may return failures immediately (not only in the following sync). This behavior is optional and left to the implementation; in any case, the following sync MUST return an error.
    * The text only implies this; it should state it explicitly.
    * It does not seem to make sense to extend this behavior to communication functions, because it is already defined somewhere that the error is reported in wait/test. Check?

    * We do not force consistent error reporting on puts. This does not matter, because error reporting only needs to be consistent with regard to completions, that is, at the following sync.

  * Locks: point 2 is unclear. Removing the specified behavior for "non target process failure" is a problem. My personal take is that we should phrase it as "the failure of a process that has tried to acquire the lock in the current epoch should raise an error when acquiring the same lock", or something similar. We should not try to "know" which process holds the lock at a given moment (that is in fact difficult), and should simply consider that the failure of any process that -may- hold the lock raises an error.

  * free: use the "high quality" wording with regard to local cleanup.
     * the same wording must appear in file_free

  * Todo: include the new ops in items 1, 2 for completion of passive targets etc. 
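
To make the put/sync error-reporting rule discussed above concrete, here is a toy Python model. It is not MPI code: the `Window` class, its `put` and `sync` methods, the `eager_errors` flag, and the `ProcFailed` exception are all invented for illustration. It only captures the rule as stated in the minutes: a put may or may not report a target failure immediately, depending on the implementation, but the following synchronization must report it.

```python
class ProcFailed(Exception):
    """Stands in for an MPI process-failure error (hypothetical name)."""


class Window:
    """Toy model of one RMA epoch; not an MPI API."""

    def __init__(self, failed_ranks, eager_errors):
        self.failed_ranks = set(failed_ranks)  # ranks assumed to be dead
        self.eager_errors = eager_errors       # implementation's choice
        self.pending_failure = False           # a put in this epoch hit a dead rank

    def put(self, target):
        if target in self.failed_ranks:
            self.pending_failure = True
            if self.eager_errors:
                # Optional behavior: report the failure at the put itself.
                raise ProcFailed(f"put to failed rank {target}")
        # Otherwise the put returns normally, even if the target is dead.

    def sync(self):
        # Mandatory behavior: the sync that closes the epoch reports the failure.
        failed, self.pending_failure = self.pending_failure, False
        if failed:
            raise ProcFailed("epoch touched a failed rank")


def run_epoch(eager_errors):
    """Return the list of calls ("put", "sync") that reported an error."""
    win = Window(failed_ranks={2}, eager_errors=eager_errors)
    reported = []
    try:
        win.put(2)
    except ProcFailed:
        reported.append("put")
    try:
        win.sync()
    except ProcFailed:
        reported.append("sync")
    return reported
```

In this sketch, `run_epoch(False)` and `run_epoch(True)` differ only in whether the put reports the error; the sync reports it in both cases, which is the only point at which reporting needs to be consistent.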

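The lock proposal in the minutes (treat the failure of any process that -may- hold the lock as an error, without tracking the actual holder) can be illustrated with a small Python model. This is only a sketch under stated assumptions: `LockEpoch`, `mark_failed`, `acquire`, and `ProcFailed` are hypothetical names, not part of MPI or of the ULFM text.

```python
class ProcFailed(Exception):
    """Stands in for an MPI process-failure error (hypothetical name)."""


class LockEpoch:
    """Toy model of one window lock during a single epoch; not an MPI API."""

    def __init__(self):
        self.attempted = set()  # ranks that tried to acquire in this epoch
        self.failed = set()     # ranks known to have failed

    def mark_failed(self, rank):
        self.failed.add(rank)

    def acquire(self, rank):
        # No attempt is made to track the actual holder: any failed rank
        # that tried to acquire the lock in this epoch may hold it.
        suspects = self.attempted & self.failed
        self.attempted.add(rank)
        if suspects:
            raise ProcFailed(f"possible lock holder(s) failed: {sorted(suspects)}")
        return True
```

Note that in this model the failure of a rank that never tried to acquire the lock does not raise an error on later acquisitions; only ranks recorded in the current epoch count as possible holders.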

Todo: update the wiki with the latest version of the document, trim out the old ones. 


Talk to you in 2 weeks,
Aurelien 


On Apr 17, 2013, at 11:34, Wesley Bland <wbland at mcs.anl.gov> wrote:

> It should be mentioned that most of the functions related to RMA and File I/O have not been classified as they might require feedback from the appropriate people. We also still need to write the definitions of each of the classifications of functions.
> 
> On Apr 17, 2013, at 9:06 AM, Aurélien Bouteiller <bouteill at icl.utk.edu> wrote:
> 
>> All, 
>> 
>> Wesley and I made a first pass at systematically investigating the compatibility of all MPI functions with ULFM. Here is the first draft of the findings. We found no real problem functions; 3 functions may need clarification but are properly covered.
>> 
>> Please take a look and make sure we made no mistakes in the classification:
>> 
>> https://docs.google.com/document/d/1Ir1zCZP2ZwG1NIO8DEStQGGyQHRe8NjNZXyLolSZqF4/edit?usp=sharing
>> 
>> 
>> 
>> Begin forwarded message:
>> 
>>> From: Aurélien Bouteiller <bouteill at icl.utk.edu>
>>> Subject: FTWG meeting reminder
>>> Date: April 17, 2013 09:29:48 UTC-04:00
>>> To: "MPI 3.0 Fault Tolerance and Dynamic Process Control working Group" <mpi3-ft at lists.mpi-forum.org>
>>> 
>>> Dear WG members,
>>> 
>>> This is a reminder that, according to our schedule, we are having our regular phone meeting today at noon EDT.
>>> 
>>> Agenda:
>>> - followup on RMA work (new interface)
>>> - Preliminary findings on coverage of "all" functions by ULFM
>>> 
>>> 
>>> 
>>> Date: Apr. 17, 2013
>>> Time: Noon EDT/New York
>>> Dial-in information: 218-339-4600
>>> Code: 623998#
>>> 
>>> 
>>> Next meetings:
>>> * May 1, 2013
>>> * May 15, 2013
>>> 
>>> Please contact me if you want to add items to the agenda for these meetings.
>>> 
>>> --
>>> * Dr. Aurélien Bouteiller
>>> * Researcher at Innovative Computing Laboratory
>>> * University of Tennessee
>>> * 1122 Volunteer Boulevard, suite 309b
>>> * Knoxville, TN 37996
>>> * 865 974 9375
>> 
>> _______________________________________________
>> mpi3-ft mailing list
>> mpi3-ft at lists.mpi-forum.org
>> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-ft
> 











