[Mpi3-ft] <no subject>
Joshua Hursey
jjhursey at open-mpi.org
Wed Oct 7 11:37:09 CDT 2009
You can also get at these emails from the web archive:
http://lists.mpi-forum.org/mpi3-ft/2009/09/0382.php
http://lists.mpi-forum.org/mpi3-ft/2009/09/0383.php
-- Josh
On Oct 7, 2009, at 12:33 PM, Barrett, Richard F. wrote:
>
>> -----Original Message-----
>> From: Solt, David George
>> Sent: Tuesday, September 01, 2009 10:36 AM
>> To: MPI 3.0 Fault Tolerance and Dynamic Process Control working Group
>> Subject: [Mpi3-ft] Questions about the Proposed API document
>>
>> Hi all,
>>
>> Admittedly, I have missed a lot of discussion during recent months,
>> so feel
>> free to ignore questions that have already been answered.
>>
>> Thanks,
>> Dave
>>
>>
>> 0) MPI_ERROR_REPORTING_FN
>>
>> a) How is this different from using MPI_COMM_CREATE_ERRHANDLER and
>> MPI_COMM_SET_ERRHANDLER?
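
For comparison, the existing MPI-2 error-handler mechanism that the
question refers to looks roughly like this (standard API; a minimal
sketch):

    #include <mpi.h>
    #include <stdio.h>

    /* MPI-2 per-communicator error handler: invoked when an MPI call
       on the communicator fails. */
    static void comm_err_fn(MPI_Comm *comm, int *errcode, ...)
    {
        char msg[MPI_MAX_ERROR_STRING];
        int len;
        MPI_Error_string(*errcode, msg, &len);
        fprintf(stderr, "MPI error: %s\n", msg);
    }

    /* Registration, somewhere after MPI_Init: */
    MPI_Errhandler eh;
    MPI_Comm_create_errhandler(comm_err_fn, &eh);
    MPI_Comm_set_errhandler(MPI_COMM_WORLD, eh);

So the question is what MPI_ERROR_REPORTING_FN offers beyond this
callback path.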
>>
>> 1) MPI_Comm_validate.
>>
>> a) Is it required that all callers receive the same values for
>> failed_process_count and failed_ranks?
>
>> b) If ranks are partitioned into two groups A and B such that all
>> ranks in A can communicate and all ranks in B can communicate, but a
>> rank in A cannot communicate with a rank in B, what should
>> failed_ranks return for a rank in A? For a rank in B?
>>
>> c) I was told that MPI_Comm_validate uses a phased system, such
>> that the result of the call is based on the callers' states prior to
>> the call (or at the start of the call), with the understanding that
>> the results are not guaranteed to be accurate by the time the call
>> returns. Is this accurate? If so, can you show an example where this
>> call would either simplify an application code or allow a recovery
>> case that would not be possible without it?
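
A sketch of how an application might use the call; the signature below
is assumed for illustration, not taken from the proposal:

    /* Hypothetical signature: outputs a count of failed processes and
       their ranks, as the questions above assume. */
    int failed_count;
    int failed_ranks[MAX_FAILED];   /* MAX_FAILED: assumed bound */
    MPI_Comm_validate(comm, &failed_count, failed_ranks);
    /* Q1a: must every caller see the same failed_count/failed_ranks?
       Q1b: in a network partition, which side's view is reported?
       Q1c: the results may already be stale by the time we get here. */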
>>
>> 2) MPI_Comm_Ivalidate.
>>
>> a) Is there a way to wait on the resulting request such that you
>> can access the failed_process_count and failed_ranks data?
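
Again a hypothetical sketch (signature assumed): the question is
whether the output buffers are guaranteed valid only once the wait
completes, as with other nonblocking MPI calls:

    MPI_Request req;
    int failed_count;
    int failed_ranks[MAX_FAILED];   /* MAX_FAILED: assumed bound */
    MPI_Comm_Ivalidate(comm, &failed_count, failed_ranks, &req);
    /* ... overlap other work ... */
    MPI_Wait(&req, MPI_STATUS_IGNORE);
    /* Are failed_count and failed_ranks defined at this point? */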
>>
>> 3) MPI_Comm_restorable.
>>
>> a) Does this return count=0 for a rank that is already a member of
>> the original application's MPI_COMM_WORLD?
>> The following assumes that the answer to the above question is yes:
>> in order for this call to have the necessary data, a "replacement"
>> process must be created through MPI_Comm_restore (i.e., users can't
>> bring their own singletons into existence through a scheduler,
>> etc.).
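
A sketch of the assumed query pattern (hypothetical signature; the
comm_names output is taken from Q4 below):

    /* Does a rank launched in the original MPI_COMM_WORLD see
       count == 0 here, as Q3a assumes? */
    int count;
    MPI_Comm_restorable(&count, comm_names);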
>>
>> 4) MPI_Comm_rejoin.
>>
>> a) Is this intended to be used only by a process that was not
>> previously a member of comm_names, where the caller replaces an
>> exited rank that was a member of comm_names?
>>
>> b) Must MPI_Comm_rejoin and MPI_Comm_restore be used in a matching
>> way between existing ranks and newly created ranks? If ranks A and B
>> call MPI_Comm_restore, which creates a new replacement rank C, will
>> the call to MPI_Comm_restore hang until MPI_Comm_rejoin is called by
>> C?
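
The scenario in Q4b, sketched with assumed signatures on both sides:

    /* Survivors (ranks A and B): does this block until the
       replacement has called MPI_Comm_rejoin? */
    MPI_Comm_restore(/* comm/rank arguments assumed */);

    /* Replacement process C: */
    MPI_Comm newcomm;
    MPI_Comm_rejoin("name_of_comm", &newcomm);  /* name and signature
                                                   assumed */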
>>
>> 5) MPI_Comm_restore.
>>
>> a) Does this create processes (I have assumed so in Q#4b above)?
>> If so, I suggest we learn from the problems MPI-2 had with
>> MPI_Comm_spawn and consider interaction with a scheduler as we
>> develop the API.
>>
>> 6) MPI_Comm_proc_gen/MPI_Comm_gen
>>
>> a) The name MPI_Comm_proc_gen seems like it should be MPI_Proc_gen.
>> I see that all the other routines are prefixed with MPI_Comm_, but I
>> think they all genuinely involve aspects of a communicator except
>> for this one.
>>
>> b) Ranks 0, 1, and 2 are part of a restorable communicator C.
>> Rank 2 dies. Rank 0 calls MPI_Comm_restore. Is rank 1 obligated to
>> make any calls before using communicator C successfully? What will
>> MPI_Comm_proc_gen return for ranks 0, 1, and 2? What will
>> MPI_Comm_gen return for ranks 0, 1, and 2?
>>
>> 7) General question:
>>
>> a) If rank x fails to communicate with rank y using point-to-point
>> communication over communicator C, is it guaranteed that any
>> collective call made by rank x or y on communicator C will
>> immediately fail, even if the path between x and y is not used for
>> the collective? (Or is this up to the implementation?)
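
The scenario in Q7a, using standard calls (assume the x-to-y path has
failed and errors are returned rather than aborting):

    /* On rank x: this point-to-point send to rank y fails. */
    int rc = MPI_Send(buf, n, MPI_INT, y, tag, C);

    /* Later, on rank x or y: must this fail immediately, even if the
       collective's internal communication pattern never routes data
       directly between x and y? */
    rc = MPI_Allreduce(&in, &out, 1, MPI_INT, MPI_SUM, C);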
>>
>> Some more questions for us to think about. It is quite possible
>> that I have some fundamental flaws in my thinking that make some,
>> many, or all of these questions invalid. So if anyone sees a basic
>> fallacy in my view of how these calls are intended to work, please
>> point it out to me first, and I can review my questions to see
>> whether there are still issues that do not make sense to me.
>>
>> 8) If MPI_Comm_rejoin is passed MPI_COMM_WORLD, does it really
>> change the size of MPI_COMM_WORLD?
>>
>> 9) Why does MPI_Comm_restore take an array of ranks to restore?
>> Shouldn't the restored ranks be based on MPI_PROC_RESTORE_POLICY?
>> Or maybe a better way to ask: "What call does
>> MPI_PROC_RESTORE_POLICY influence?"
>>
>> 10) The API document says this regarding MPI_Comm_Irestore: "It is
>> local in scope, and thus restores local communications
>> (point-to-point, one-sided, data-type creation, etc.), but not
>> collective communications." If this is true, then how do you restore
>> collective communications? Can you then go on to collectively call
>> MPI_Comm_restore_all? If you do, would every rank need to specify
>> that no new ranks are to be created, since they have already been
>> created by the earlier call to MPI_Comm_restore? Also, I don't think
>> it is *really* local in scope; if it were, there would be no reason
>> to have a non-blocking version.
>>
>> 11) MPI_COMM_REJOIN - It seems like the resulting communicator
>> should be collective-capable if the calling process was created
>> through a call to MPI_Comm_restore_all, and not collective-capable
>> if it was created through a call to MPI_Comm_restore. If we go with
>> that, there should be a way for the caller of MPI_Comm_rejoin to
>> know the status of the communicator with respect to collectives.
>>
>> 12) MPI_Comm_restore specifies ranks of communicators that should
>> be restored. I assume it will block until the restored ranks call
>> MPI_Comm_rejoin on those communicators? (I say that because of the
>> line "[MPI_Comm_restore] is local in scope, and thus restores local
>> communications...". Restoring local communications to whom? I assume
>> to the process created by MPI_Comm_restore?) If it does not block,
>> how does it know when it can safely talk to the restored ranks using
>> any of the communicators they are in? So, I assume it blocks. That
>> seems to imply that the restored rank MUST call MPI_Comm_rejoin on
>> the communicator referenced in its creation. If rank 0 calls
>> MPI_Comm_restore() and passes in rank 3 of communicator FOO, then
>> the restored process must call MPI_Comm_rejoin on communicator FOO.
>> But when the restored rank 3 calls MPI_Comm_restorable, it could
>> return several communicators, and rank 3 has no idea that it MUST
>> call MPI_Comm_rejoin on some but is not required to call
>> MPI_Comm_rejoin on others.
>>
>> 13) What does MPI_COMM_WORLD look like before the new process calls
>> MPI_COMM_REJOIN? If the process was created through a call to
>> MPI_Comm_restore that specified multiple ranks to be restored, are
>> all of those ranks together in an MPI_COMM_WORLD until they call
>> MPI_Comm_rejoin? Is the MPI_Comm_rejoin call collective across all
>> of those newly created processes, or can they each call it at their
>> leisure?
>>
>> 14) Is there anything we are proposing with MPI_Comm_rejoin/restore
>> that cannot be accomplished with MPI_Comm_spawn and
>> MPI_Intercomm_merge? The only thing I can think of is that
>> MPI_COMM_WORLD cannot be "fixed" using MPI_Comm_spawn/merge, but
>> only because it is a constant.
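
For reference, the spawn/merge recovery path mentioned above looks
like this with the existing MPI-2 dynamic-process calls (standard API;
the 'survivors' communicator is assumed to exist):

    /* Spawn one replacement process and merge it with the survivors. */
    MPI_Comm children, merged;
    MPI_Comm_spawn("a.out", MPI_ARGV_NULL, 1, MPI_INFO_NULL,
                   0 /* root */, survivors, &children,
                   MPI_ERRCODES_IGNORE);
    MPI_Intercomm_merge(children, 0 /* low group */, &merged);
    /* 'merged' holds the survivors plus the replacement, but it is a
       new communicator: MPI_COMM_WORLD itself cannot be repaired this
       way. */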
>>
>> 15) The ranks_to_restore struct is not defined in the version of
>> the API document I have.
>>
>> 16) MPI_Comm_restore seems to be based on the idea that some ranks
>> have exited. What if rank A cannot talk to rank B, but rank B still
>> exists and can talk to rank C? What does it mean to restore a rank
>> in this case? None of the ranks are gone; they are just having
>> communication problems. It seems like there should be some way to
>> come up with a failure-free set of ranks such that all the ranks in
>> the set can communicate across all process pairs.
>>
>> 17) Ranks 0, 1, and 2 are in comm FOO. Rank 2 dies. Rank 0 calls
>> MPI_Comm_restore({FOO,2}) and can now communicate with 2 once again
>> using point-to-point calls? Is there a way that rank 1 can ever
>> restore communication to the new rank 2? I believe the only way is
>> for all ranks (including the new rank 2) to collectively call
>> MPI_Comm_restore({})? I'm not sure that is a problem, but I wanted
>> to check my understanding of how these calls work.
>