[Mpi3-ft] MPI_INIT_THREAD and MPI_Failhandler_set/get_mode at MPI initialization

Rolf Rabenseifner rabenseifner at hlrs.de
Wed Jan 18 13:06:01 CST 2012


Making MPI_Failhandler_set_mode collective is another choice.
Collective over what?  All connected processes?
What about processes that are spawned after MPI_Failhandler_set_mode?

The other idea is to allow setting special options before MPI_Init/MPI_Init_thread.

The only requirement is that the FT proposal is consistent
with the rest of MPI.

Best regards
Rolf 

----- Original Message -----
> From: "George Bosilca" <bosilca at eecs.utk.edu>
> To: "MPI 3.0 Fault Tolerance and Dynamic Process Control working Group" <mpi3-ft at lists.mpi-forum.org>
> Cc: "Terry D. Dontje" <terry.dontje at oracle.com>, "Josh Hursey" <jjhursey at open-mpi.org>, "Rolf Rabenseifner"
> <rabenseifner at hlrs.de>, "Bronis R. de Supinski" <bronis at llnl.gov>, "Pavan Balaji" <balaji at mcs.anl.gov>
> Sent: Wednesday, January 18, 2012 4:26:43 PM
> Subject: Re: MPI_INIT_THREAD and MPI_Failhandler_set/get_mode at MPI initialization
> I concur with the previous statements. As Rolf highlighted in his
> email, one of the reasons for this new proposal is to fix the "unclear"
> collective behavior of MPI_Failhandler_set_mode. I don't see what is
> unclear, and here are two of my reasons.
> 
> 
> 1. There is no reason to have such a function
> (MPI_Failhandler_set_mode). Setting the fail handler should ALWAYS
> be collective, otherwise the entire purpose of the fail handler is
> annihilated.
> 
> 
> 2. If no collective behavior is required (meaning the software stack
> doesn't have to be rebuilt in a collective way), then the fail handler
> is clear overkill. A saner and clearer behavior can be obtained
> by using a local error handler with carefully crafted requests (as an
> example, a non-blocking, never-to-be-matched request on a duplicate of
> MPI_COMM_WORLD can do the trick).
> 
> 
>   george.
> 
> 
> 
> On Jan 18, 2012, at 09:56 , Josh Hursey wrote:
> 
> 
> I like the motivation of the proposal, but I think Terry has a good
> point. It seems a bit like a hack to repurpose the required/provided
> arguments to achieve semantic assertions. I would almost prefer some
> other functionality that must be called before MPI_Init{_thread} that
> would explicitly set these options. That starts to sound like the
> assertion ticket that Terry mentioned. So maybe they can be merged or
> revised.
> 
> 
> I am also a bit concerned about having conditional semantics in the
> MPI standard. Though the FT proposal is founded on the condition that
> the semantics are only meaningful when the error handler is not
> MPI_ERRORS_ARE_FATAL, which is itself conditional. So I am a bit torn
> on this point.
> 
> 
> One thing that your proposal should clearly specify is whether the
> specified bits must be set to the same value at all processes/threads.
> Additionally, what if two MPI_COMM_WORLDs connect/accept but have
> different bits set? Does that restrict how these two worlds can
> interact? This was one of the problems posed for the
> Failhandler_set_mode() semantics that still needs to be addressed
> here, but in a more general sense.
> 
> 
> So I think it is an interesting proposal worth considering further.
> Setting options like 'enable FT' at initialization time (or just
> before initialization time) might allow the MPI implementation to
> optimize the library appropriately during setup (choosing different
> components or algorithms). It might be worth looking at the assertions
> proposal to see if there is a viable alternative solution there that
> would achieve the same goals as this proposal without repurposing the
> required/provided arguments of MPI_Init_thread.
> 
> 
> -- Josh 
> 
> 
> 
> On Wed, Jan 18, 2012 at 9:32 AM, TERRY DONTJE <
> terry.dontje at oracle.com > wrote:
> 
> 
> 
> I think the idea is worthwhile but it really smells similar to the
> defunct assertion ticket.  I really find piggy-backing the ft, cancel
> and any_source modes onto the required/provided bits a little
> unpleasing to my senses.  The reason I am displeased with the proposal
> is it seems to slightly open a door to give an application the ability
> to give hints and if we are going to do that we might as well open the
> door fully and allow vendor-specific hints.  Doing the latter will
> require more than the required/provided bits.
> 
> The above aside, if the proposal is passed I guess my only other
> comment is that moving MPI_INIT_THREAD & MPI_QUERY_THREAD to 8.7
> (Startup) seems odd to me.   I guess I can see the reasoning for moving
> the interface to the Startup section, but then the threadsafety portion
> of section 12.4.3 seems to stick out strangely IMO.
>  
> --td
> 
> 
> 
> On 1/17/2012 5:31 AM, Rolf Rabenseifner wrote:
> 
> Dear committees of
> - FT,
> - MPI_Init --> 8. Environmental Management,
> - MPI_Init_thread --> 12. External Interfaces.
>
> Before discussing details, I would like to get a clear answer whether
> you believe that the proposal below is a good or bad idea.
>
> As already mentioned at the Jan. 2012 meeting, I would like to propose
> that the FT group may substitute the unclear collective behavior of
> MPI_Failhandler_set_mode by adding the mode to the MPI initialization.
> For this, I added a proposal to slide 4 in
> MPI_Forum_Overview_MPI-3.0_Jan2012_action-items.ppt (see my previous
> mail to the MPI-Forum list):
>
> - If appropriate, a new ticket that enhances MPI_INIT_THREAD
>   -- Required and provided as “bit vector” of "bit-wise OR" of
>      required_/provided_threadsafety | required_/provided_ft_mode |
>      required_/provided_cancel_mode |
>      required_/provided_any_source_mode
>   -- New mask-constants MPI_THREAD_MASK, MPI_FT_MASK,
>      MPI_CANCEL_MASK, MPI_ANY_SOURCE_MASK
>   -- With existing values for required_/provided_threadsafety
>      MPI_THREAD_SINGLE, MPI_THREAD_FUNNELED, MPI_THREAD_SERIALIZED,
>      and MPI_THREAD_MULTIPLE.
>   -- With new values for
>      - required_/provided_ft_mode = MPI_FT_NONE=0, or
>        MPI_FT_FAILHANDLER_MODE_ALL≠0, or
>        MPI_FT_FAILHANDLER_MODE_SUBSET≠0
>      - required_/provided_cancel_mode = MPI_CANCEL_ALLOWED=0, or
>        MPI_NO_CANCEL≠0
>      - required_/provided_any_source_mode =
>        MPI_ANY_SOURCE_ALLOWED=0, or MPI_NO_ANY_SOURCE≠0
>   -- Values must be set identical for all processes in an
>      MPI_COMM_WORLD
>      - It is easier to relax this in further versions of MPI than to
>        relax already now and to restrict later, as now done in ticket
>        #222 for required_/provided_threadsafety
>      - For each of the four "variables", a different decision can be
>        made.
>   -- At least for required_/provided_cancel_mode and
>      ...any_source_mode, I would require that the provided value must
>      be identical to the required value. Reason: Internally, the
>      value can be ignored.
>   -- For required_/provided_ft_mode, I would recommend to allow that
>      provided_ft_mode must be
>      - identical to required_ft_mode or MPI_FT_NONE
>      - and the same in all processes.
>   -- MPI_INIT_THREAD & MPI_QUERY_THREAD move
>      - from 12.4.3 (External Interfaces)
>      - to 8.7 (Startup)
>      - but explanations to ...threadsafety | ...ft_mode |
>        ...cancel_mode | ...any_source_mode are kept or written in the
>        appropriate sections 12.4.3, new 17.5 (FT Environm.), 3.8.4
>        (Cancel), 3.2.4 (Blocking Receive)
>   -- A call to MPI_INIT is identical to MPI_INIT_THREAD with
>      - the rules in 12.4.3 about required_/provided_threadsafety
>      - required_/provided_ft_mode = MPI_FT_NONE,
>      - required_/provided_cancel_mode = MPI_CANCEL_ALLOWED,
>      - required_/provided_any_source_mode = MPI_ANY_SOURCE_ALLOWED
>   -- This ticket would have the following properties:
>      - It is clearly source-code and ABI backward compatible, because
>        - the values of MPI_THREAD_SINGLE, _FUNNELED, ... need not be
>          changed and MPI_THREAD_SINGLE need not be zero;
>        - the values representing the current MPI-2.2 quality are set
>          to zero: MPI_FT_NONE=0, MPI_CANCEL_ALLOWED=0, and
>          MPI_ANY_SOURCE_ALLOWED=0
>      - It is very unlikely that an implementation has used more than
>        24 different bits in these 4 integer constants
>        MPI_THREAD_SINGLE, ... _MULTIPLE. Therefore MPI_THREAD_MASK
>        would have a maximum of 24 bits. Enough room for the 4 bits
>        needed together for the other three ..._MASKs.
>      - FT can be switched on or off at the MPI initialization and is
>        switched off in unchanged applications. Therefore no
>        backward-compatibility-problem with a modified behavior of the
>        default error handlers when FT is switched on.
>      - Normally there should be enough bit-space for further
>        decisions at MPI initialization.
>      - The decisions about the cancel and any_source values would be
>        done in different tickets
>      - FT quality is optional if we add the rule that
>        provided_ft_mode may be identical to required_ft_mode
>        ***or*** MPI_FT_NONE
>      - This rule can be changed in a further version of MPI without
>        backward-compatibility-problems.
>
> I would like to get a reply from
> - the FT group
> - the chapter committee of MPI_Init --> 8. Environmental Manag.
>   George Bosilca(c), Josh Hursey, Terry Dontje
> - the chapter committee of MPI_Init_thread --> 12. External Interf.
>   Bronis R. de Supinski(c), Pavan Balaji
>
> Before discussing details, I would like to get a clear answer whether
> you believe that this is a good or bad idea.
>
> Best regards
> Rolf
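
The bit-vector encoding proposed above can be sketched in C roughly as
follows. This is only an illustration under stated assumptions: the
proposal names the constants but does not fix any values, so every
PROP_* value below (including the 24-bit thread mask and the placement
of the remaining four bits) is hypothetical. The helpers prop_pack,
prop_provided and prop_provided_is_valid are invented for this example
and are not part of MPI or of the proposal.

```c
#include <assert.h>

/* Hypothetical values; the proposal only requires that the "MPI-2.2
 * behavior" constants are zero and that the masks do not overlap. */
enum {
    PROP_THREAD_SINGLE     = 0,
    PROP_THREAD_FUNNELED   = 1,
    PROP_THREAD_SERIALIZED = 2,
    PROP_THREAD_MULTIPLE   = 3,
    PROP_THREAD_MASK       = 0x00FFFFFF,   /* assumed: low 24 bits */

    PROP_FT_NONE                    = 0,
    PROP_FT_FAILHANDLER_MODE_ALL    = 1 << 24,
    PROP_FT_FAILHANDLER_MODE_SUBSET = 2 << 24,
    PROP_FT_MASK                    = 3 << 24,   /* 2 bits */

    PROP_CANCEL_ALLOWED = 0,
    PROP_NO_CANCEL      = 1 << 26,
    PROP_CANCEL_MASK    = 1 << 26,               /* 1 bit */

    PROP_ANY_SOURCE_ALLOWED = 0,
    PROP_NO_ANY_SOURCE      = 1 << 27,
    PROP_ANY_SOURCE_MASK    = 1 << 27            /* 1 bit */
};

/* Build the single "required" argument as a bit-wise OR of the four
 * fields, checking that each field stays inside its own mask. */
static int prop_pack(int thread, int ft, int cancel, int any_source)
{
    assert((thread & ~PROP_THREAD_MASK) == 0);
    assert((ft & ~PROP_FT_MASK) == 0);
    assert((cancel & ~PROP_CANCEL_MASK) == 0);
    assert((any_source & ~PROP_ANY_SOURCE_MASK) == 0);
    return thread | ft | cancel | any_source;
}

/* Model of what a library might return as "provided".
 * Assumptions: the thread level is simply echoed (a real library may
 * return a different level); FT falls back to PROP_FT_NONE when the
 * library does not support it; cancel and any_source are echoed. */
static int prop_provided(int required, int ft_supported)
{
    int thread     = required & PROP_THREAD_MASK;
    int ft         = ft_supported ? (required & PROP_FT_MASK)
                                  : PROP_FT_NONE;
    int cancel     = required & PROP_CANCEL_MASK;
    int any_source = required & PROP_ANY_SOURCE_MASK;
    return thread | ft | cancel | any_source;
}

/* Check the per-process rules stated in the proposal:
 * provided_ft_mode is required_ft_mode or PROP_FT_NONE, and the
 * cancel and any_source fields equal the required ones. */
static int prop_provided_is_valid(int required, int provided)
{
    int ft_req  = required & PROP_FT_MASK;
    int ft_prov = provided & PROP_FT_MASK;
    int ft_ok   = (ft_prov == ft_req) || (ft_prov == PROP_FT_NONE);
    int cancel_ok = (provided & PROP_CANCEL_MASK)
                 == (required & PROP_CANCEL_MASK);
    int any_ok    = (provided & PROP_ANY_SOURCE_MASK)
                 == (required & PROP_ANY_SOURCE_MASK);
    return ft_ok && cancel_ok && any_ok;
}
```

An application would then test individual fields with the masks, for
example (provided & PROP_FT_MASK) == PROP_FT_NONE to detect that fault
tolerance was not granted. The rule that all processes of an
MPI_COMM_WORLD must use identical values is not modeled here, since
checking it needs communication between the processes.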
> 
> --
> 
> 
> 
> Terry D. Dontje | Principal Software Engineer
> 
> Developer Tools Engineering | +1.781.442.2631
> Oracle - Performance Technologies
> 95 Network Drive, Burlington, MA 01803
> Email terry.dontje at oracle.com
> 
> 
> 
> 
> 
> 
> 
> --
> Joshua Hursey
> Postdoctoral Research Associate
> Oak Ridge National Laboratory
> http://users.nccs.gov/~jjhursey

-- 
Dr. Rolf Rabenseifner . . . . . . . . . .. email rabenseifner at hlrs.de
High Performance Computing Center (HLRS) . phone ++49(0)711/685-65530
University of Stuttgart . . . . . . . . .. fax ++49(0)711 / 685-65832
Head of Dpmt Parallel Computing . . . www.hlrs.de/people/rabenseifner
Nobelstr. 19, D-70550 Stuttgart, Germany . (Office: Allmandring 30)



