[Mpi3-ft] MPI_Comm_validate - What's in a name?

Anthony Skjellum tony at cis.uab.edu
Wed Jan 25 16:24:32 CST 2012


I think it is a given that point-to-point should remain allowed on a collective with "holes" :-)
Tony

----- Original Message -----
From: "Josh Hursey" <jjhursey at open-mpi.org>
To: "MPI 3.0 Fault Tolerance and Dynamic Process Control working Group" <mpi3-ft at lists.mpi-forum.org>
Sent: Wednesday, January 25, 2012 4:09:56 PM
Subject: Re: [Mpi3-ft] MPI_Comm_validate - What's in a name?


Yes. I am starting that conversation in another email thread (still working on it). I did not want that discussion to distract here. 


-- Josh 


On Wed, Jan 25, 2012 at 5:02 PM, Sur, Sayantan < sayantan.sur at intel.com > wrote: 






Hi Josh, 



I think we also had another discussion that Darius initiated regarding this. We discussed that we will allow p2p communication on a communicator with ‘holes’ but disallow collectives, since this seemed to satisfy your use cases. We were thinking about bringing in the validate as an “add-on” ticket. Are we still on track to do that? 



Thanks. 



=== 

Sayantan Sur, Ph.D. 

Intel Corp. 






From: mpi3-ft-bounces at lists.mpi-forum.org On Behalf Of Josh Hursey 
Sent: Wednesday, January 25, 2012 12:56 PM 
To: MPI 3.0 Fault Tolerance and Dynamic Process Control working Group 
Subject: [Mpi3-ft] MPI_Comm_validate - What's in a name? 





On the call today it was suggested that we re-evaluate the name MPI_Comm_validate. It was pointed out that 'validate' seems a bit too close to 'invalid', which is probably not the semantics that we are trying to imply with the name. An alternative is 'check', but that is a bit close to 'checkpoint', so it might not be the best either. So we are looking for a good name. 





We started a similar discussion for reenable_any_source, which might lend some ideas: 


http://lists.mpi-forum.org/mpi3-ft/2011/12/0931.php 





The semantics behind MPI_Comm_validate [at the moment] are: 


(1) A fault tolerant synchronization point returning a consistent value (failed group) at all participating processes 


(2) Allow for the posting of new collective operations on the communicator (a communicator with potential holes in it) 
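
To make the two points above concrete, here is a rough single-process toy model in Python. It is only a sketch under stated assumptions: ToyComm, fail, send, validate and barrier are hypothetical names invented for illustration, not the proposed MPI interface, and the "consistent value at all participating processes" is modeled trivially by returning one agreed-on set. It also folds in the rule discussed elsewhere in this thread that point-to-point stays allowed on a communicator with holes while collectives do not.

```python
class ToyComm:
    """Toy model of a communicator that may contain 'holes' (failed ranks).

    Hypothetical illustration only; this is not an MPI binding.
    """

    def __init__(self, size):
        self.size = size
        self.failed = set()        # ranks that have actually failed
        self.acknowledged = set()  # failed ranks agreed on by the last validate
        self.collectives_enabled = True

    def fail(self, rank):
        """Simulate a process failure; collectives are disabled until validate."""
        self.failed.add(rank)
        self.collectives_enabled = False

    def send(self, src, dst):
        """Point-to-point stays allowed between live ranks, even with holes."""
        if src in self.failed or dst in self.failed:
            raise RuntimeError("peer has failed")
        return True

    def validate(self):
        """(1) Return one agreed-on failed group; (2) re-enable collectives."""
        self.acknowledged = set(self.failed)
        self.collectives_enabled = True
        return frozenset(self.acknowledged)

    def barrier(self):
        """Stand-in collective; returns the live ranks that take part."""
        if not self.collectives_enabled:
            raise RuntimeError("collectives disabled until validate")
        return sorted(set(range(self.size)) - self.acknowledged)


comm = ToyComm(4)
comm.fail(2)
comm.send(0, 1)          # still fine: both peers are alive
failed = comm.validate() # every caller would see the same failed group
live = comm.barrier()    # collectives work again, skipping the hole
```

In this model, calling barrier() between fail(2) and validate() raises an error, which is the behavior that point (2) is meant to lift.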





Name suggestions are welcome. 





-- Josh 






-- 
Joshua Hursey 
Postdoctoral Research Associate 
Oak Ridge National Laboratory 
http://users.nccs.gov/~jjhursey 
_______________________________________________ 
mpi3-ft mailing list 
mpi3-ft at lists.mpi-forum.org 
http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-ft 





-- 
Anthony Skjellum, PhD
Professor and Chair
Dept. of Computer and Information Sciences
Director, UAB Center for Information Assurance and Joint Forensics Research ("The Center")
University of Alabama at Birmingham
+1-(205)934-8657; FAX: +1- (205)934-5473




