[Mpi3-ft] list of opaque objects and other mpi entities on the list
sayantan.sur at intel.com
Fri Dec 7 11:25:31 CST 2012
This is an important point that you mention. I think you raised it in the WG meeting too, but I don't recall any deep discussion of this topic beyond that. I agree with your suggested approach that it would be better to deal with the template at a global level first.
One idea could be that we create Info keys that act as hints to the MPI library, saying that the application would like these objects to be kept local, i.e. it wants fast access to them at the cost of increased storage. The application therefore makes a conscious decision about this up front. However, setting info on an info object would be weird ;)
> -----Original Message-----
> From: mpi3-ft-bounces at lists.mpi-forum.org [mailto:mpi3-ft-
> bounces at lists.mpi-forum.org] On Behalf Of Adam T. Moody
> Sent: Thursday, December 06, 2012 4:53 PM
> To: MPI 3.0 Fault Tolerance and Dynamic Process Control working Group
> Subject: Re: [Mpi3-ft] list of opaque objects and other mpi entities on the list
> The goal of this task seems a bit ill-defined to me. I think it'd be good to
> select one object type, have one person work through a draft, send it to the
> list, and then review that as a group to have a template of how to handle the
> remaining objects.
> Also, my guess is that we'll generally end up saying that any MPI object may
> be invalid after any failure, in fact even failures outside the scope of the
> user's MPI processes. This can be true for objects that are generally thought
> of as being "local" to the MPI rank, such as information contained in an info
> object. For example, there is nothing preventing an MPI implementation
> from storing the contents of an info object on a remote node, such that
> simple key/value queries may not work after this node has failed. In fact you
> might have cases where all ranks can talk with one another just fine, but one
> particular rank can't read its info objects.
> At very large scales, certain objects like groups will have to be stored in a
> distributed manner, so that a call to inquire group information will have to
> access memory on a remote node which may have failed. Thus even a "local
> MPI call" like MPI_GROUP_TRANSLATE_RANKS might suddenly start to
> return errors when it didn't before.
> Howard Pritchard wrote:
> >Hi Folks,
> >Here's the list of mpi opaque objects and a few additional constructs
> >for consideration of states in the presence of process failures:
> >communicators - Aourelian, Wesley
> >groups - Rich G.
> >data types - Sayantan
> >RMA windows - Howard
> >files (file handles) - Darius B.
> >info object - Darius
> >error handler - Darius
> >message obj. - David S.
> >request - Manjo
> >status - Manjo
> >op - Darius
> >port (mpi-2 dynamic) - David S.
> >user buffers attached to MPI for bsends - Sayantan
> >Need to define lifecycle of the object in the case of no process
> >failures, and in the case when one or more process failures occur while
> >the object exists.
> >mpi3-ft mailing list
> >mpi3-ft at lists.mpi-forum.org