[Mpi3-ft] <no subject>

Joshua Hursey jjhursey at open-mpi.org
Wed Oct 7 11:37:09 CDT 2009

You can also get at these emails from the web archive:

-- Josh

On Oct 7, 2009, at 12:33 PM, Barrett, Richard F. wrote:

>> -----Original Message-----
>> From: Solt, David George
>> Sent: Tuesday, September 01, 2009 10:36 AM
>> To: MPI 3.0 Fault Tolerance and Dynamic Process Control working Group
>> Subject: [Mpi3-ft] Questions about the the Proposed API document
>> Hi all,
>> Admittedly, I have missed a lot of discussion during recent months, so feel
>> free to ignore questions that have already been answered.
>> Thanks,
>> Dave
>>        a)  How is this different from using MPI_ERRHANDLER_CREATE and
>> 1)  MPI_Comm_validate.
>>        a) Is it required that all callers return the same value for
>> failed_process_count and failed_ranks?
>>        b) If ranks are partitioned into two groups A and B such that all
>> ranks in A can communicate and all ranks in B can communicate, but a rank
>> in A cannot communicate with a rank in B, what should failed_ranks return
>> for a rank in A?  For a rank in B?
>>        c) I was told that MPI_Comm_validate uses a phased system such that
>> the result of the call is based on the callers' states prior to the call or
>> at the start of the call, but with the understanding that the results are
>> not guaranteed to be accurate at the return of the call.  Is this accurate?
>> If so, can you show an example of where this call would either simplify an
>> application code or allow for a recovery case that would not be possible
>> without it?
>> 2) MPI_Comm_Ivalidate.
>>        a) Is there a way to wait on the resulting request in such a way
>> that you can access the failed_process_count and failed_ranks data?
>> 3) MPI_Comm_restorable.
>>        a) Does this return count=0 for a rank that is already a member of
>> the original application-launched MPI_COMM_WORLD?
>>        The following assumes that the answer to the above question is yes:
>> In order for this to have the data necessary, a "replacement" process must
>> be created through MPI_Comm_restore (i.e. users can't bring their own
>> singletons into existence through a scheduler, etc.).
>> 4) MPI_Comm_rejoin.
>>        a) Is this intended only to be used by a process that was not
>> previously a member of comm_names and the caller replaces an exited rank
>> that was a member of comm_names?
>>        b) Must MPI_Comm_rejoin and MPI_Comm_restore be used in a matching
>> way between existing ranks and newly created ranks?  If ranks A and B call
>> MPI_Comm_restore, which creates a new replacement rank C, will the call to
>> MPI_Comm_restore hang until MPI_Comm_rejoin is called by C?
>> 5) MPI_Comm_restore.
>>        a) Does this create processes (I have assumed so in Q#4b above)?
>> If so, I suggest that we learn from the problem with MPI_Comm_spawn from
>> MPI-2 that interaction with a scheduler should be considered as we develop
>> the API.
>> 6) MPI_Comm_proc_gen/MPI_Comm_gen
>>        a) The name MPI_Comm_proc_gen seems like it should be MPI_Proc_gen.
>> I see that all other routines are prefixed with MPI_Comm_, but I think that
>> they all genuinely involve aspects of a communicator except for this one.
>>        b) Ranks 0,1,2 are part of a restorable communicator C.  Rank 2
>> dies.  Rank 0 calls MPI_Comm_restore.  Is rank 1 obligated to make any
>> calls before using communicator C successfully?  What will
>> MPI_Comm_proc_gen return for rank 0? 1? 2?  What will MPI_Comm_gen return
>> for rank 0? 1? 2?
>> 7) General question:
>>        a) If rank x fails to communicate using point-to-point communication
>> to rank y over communicator C, is it guaranteed that any collective call
>> made by rank x or y on communicator C will immediately fail (even if the
>> path between x and y is not used for the collective)?  Or is it up to the
>> implementation?
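
The partition scenario in question 1b can be made concrete with a small
stand-alone simulation. This is only a sketch and makes no MPI calls: the
can_reach table and the helper unreachable_from are hypothetical, and it
assumes failed_ranks would list exactly the ranks the caller cannot reach,
which the proposed API document does not confirm.

```c
#include <assert.h>

#define NRANKS 4

/* Hypothetical reachability table (not part of any MPI proposal):
 * can_reach[i][j] != 0 means rank i can talk to rank j.
 * Ranks {0,1} form group A and ranks {2,3} form group B. */
static const int can_reach[NRANKS][NRANKS] = {
    {1, 1, 0, 0},
    {1, 1, 0, 0},
    {0, 0, 1, 1},
    {0, 0, 1, 1},
};

/* Fill 'failed' with the ranks that 'self' cannot reach, in ascending
 * order, and return how many there are. This is one possible reading of
 * what failed_ranks might contain for the caller. */
static int unreachable_from(int self, int failed[NRANKS])
{
    assert(self >= 0 && self < NRANKS);
    int count = 0;
    for (int r = 0; r < NRANKS; r++) {
        if (!can_reach[self][r]) {
            failed[count++] = r;
        }
    }
    return count;
}
```

Under this reading a rank in A would report every rank in B as failed and a
rank in B would report every rank in A, so callers would not all see the
same failed_ranks, which is the case question 1a asks about.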
>> Some more questions for us to think about.  It is quite possible that I
>> have some fundamental flaws in my thinking that make some, many or all of
>> these questions invalid.  So, I ask that if anyone sees a basic fallacy in
>> my view of how these calls are intended to work that you point that out to
>> me first and I can review my questions and see if there are still issues
>> that do not make sense to me.
>> 8) If MPI_Comm_rejoin passes MPI_COMM_WORLD, then does it really change
>> the size of MPI_COMM_WORLD?
>> 9) Why does MPI_Comm_restore take an array of ranks to restore?  Shouldn't
>> the restored ranks be based on MPI_PROC_RESTORE_POLICY?  Or maybe a better
>> way to ask: "What call does MPI_PROC_RESTORE_POLICY influence?"
>> 10) The API document says this regarding MPI_Comm_Irestore: "It is local in
>> scope, and thus restores local communications (point-to-point, one-sided,
>> data-type creation, etc.), but not collective communications."  If this is
>> true, then how do you restore collective communications?  Can you then go
>> on to collectively call MPI_Comm_restore_all?  If you do, would every rank
>> need to specify that no new ranks are to be created since they have already
>> been created by the earlier call to MPI_Comm_restore?  Also, I don't think
>> it is *really* local in scope; if it were, there would be no reason to have
>> a non-blocking version.
>> 11) MPI_COMM_REJOIN - It seems like the resulting communicator should be
>> collective-capable if the calling process was created through a call to
>> MPI_Comm_restore_all and not collective-capable if created through a call
>> to MPI_Comm_restore?  If we go with that, there should be a way for the
>> caller of MPI_Comm_rejoin to know the status of the communicator with
>> respect to collectives.
>> 12) MPI_Comm_restore specifies ranks of communicators that should be
>> restored.  I assume it will block until the restored ranks call
>> MPI_Comm_rejoin on those communicators?  (I say that because of the line
>> "[MPI_Comm_restore] is local in scope, and thus restores local
>> communications...".)  Restoring local communications to whom?  I assume to
>> the process created by MPI_Comm_restore?  If it does not block, how does it
>> know when it can safely talk to the restored ranks using any of the
>> communicators they are in?  So, I assume it blocks.  That seems to imply
>> that the restored rank MUST call MPI_Comm_rejoin on the communicator
>> referenced in its creation.  If rank 0 calls MPI_Comm_restore() and passes
>> in Rank 3 of communicator FOO, then the restored process must call
>> MPI_Comm_rejoin on communicator FOO.  But when the restored rank 3 calls
>> MPI_Comm_recoverable, it could return several communicators, and rank 3 has
>> no idea that it MUST call MPI_Comm_rejoin on some but is not required to
>> call MPI_Comm_rejoin on others?
>> 13) What does MPI_COMM_WORLD look like before the new process calls
>> MPI_COMM_REJOIN?  If the process was created through a call to
>> MPI_Comm_restore that specified multiple ranks to be restored, are all of
>> those ranks together in an MPI_COMM_WORLD until they call MPI_Comm_rejoin?
>> Is the MPI_Comm_rejoin call collective across all of those newly created
>> processes or can they all call one at a time at their leisure?
>> 14) Is there anything we are proposing with MPI_Comm_rejoin/restore that
>> cannot be accomplished with MPI_Comm_spawn and MPI_Intercomm_merge?  The
>> only thing I can think of is that MPI_COMM_WORLD cannot be "fixed" using
>> MPI_Comm_spawn/merge, but only because it is a constant.
>> 15) The ranks_to_restore struct is not defined in the version of the API I
>> have.
>> 16) MPI_Comm_restore seems to be based on the idea that some ranks have
>> exited.  What if rank A cannot talk to rank B, but rank B still exists and
>> can talk to rank C?  What does it mean to restore a rank in this case?
>> None of the ranks are gone, they are just having communication problems.
>> It seems like there should be some way to come up with a failure-free set
>> of ranks such that all the ranks in the set can communicate across all
>> process pairs.
>> 17) Ranks 0, 1, & 2 are in Comm FOO.  Rank 2 dies.  Rank 0 calls
>> MPI_Comm_restore({FOO,2}) and can now communicate with 2 once again using
>> point-to-point calls?  Is there a way that 1 can ever restore communication
>> to the new rank 2?  I believe the only way is that all ranks (including the
>> new rank 2) collectively call MPI_Comm_restore({})?  I'm not sure that is a
>> problem, but I wanted to check my understanding of how these calls work.
> _______________________________________________
> mpi3-ft mailing list
> mpi3-ft at lists.mpi-forum.org
> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-ft