[Mpi3-ft] Communicator Virtualization as a step forward
Greg Bronevetsky
bronevetsky1 at llnl.gov
Fri Feb 13 10:47:26 CST 2009
>Such statements are way too broad to be true. In fact, it depends on
>which recovery mode was used. Please read the document I sent a few
>emails ago to see all the capabilities that FT-MPI provided.
I make this statement because the FT-MPI model requires all processes
to participate in recovery. At the very least, they need to
participate in recreating communicators. This may become quite
expensive once we're using millions of processes. My concern is that
this is a property of the model, not of any one implementation of it.
>Again, this is not true. First, one will need some kind of database to
>store this information (distributed or centralized), which comes with its
>own scalability and cost problems. In addition, in the context of
Good point. The scalability and cost problems will absolutely need to
be studied. However, as Thomas also pointed out, layering the current
API on top of FT-MPI will be complex and will have very different
performance properties, making it useless for such a study.
>recovery, the new processes will have to retrieve this information and
>let everybody else know not only their new contact information but also
>the fact that they are back in the specified communicator. Unfortunately,
>this is [again] _NOT_ a local operation. The fact that you seem to
>plan to delegate these problems to the runtime environment doesn't
>make it local or more scalable.
You're right: any implementation would need to perform a number of
additional communication operations to support the "local rejoin"
API, so it would not be truly local. However, the key difference
is that in FT-MPI the recovery must employ a series of global
collective operations that require all processes to synchronize. In
contrast, runtime support for the local rejoin option is much less
coupled. When a process sends a message to another process, it will
need to attach the receiver's expected rank in MPI_COMM_WORLD to the
message. If the receiver exists and has the right rank, the message is
delivered normally. If not, the sender gets an error and needs to
ask the runtime environment for the correct physical address of the
given receiver rank. This scheme probably involves as much overall
communication as the everybody-synchronize approach, but since the
communication is decoupled, it should have a much smaller impact on
performance. One thing I don't know is how to do the above for puts
and gets. Is it possible for RDMA hardware to do any verification of
the destination process?
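As a rough illustration of the verify-and-retry scheme for two-sided sends, here is a small C simulation. It does not use MPI at all: the endpoint table, the runtime directory, the sender-side address cache and every function name (init_tables, deliver, runtime_lookup, restart_rank, send_to_rank) are hypothetical stand-ins. It assumes the runtime keeps an authoritative rank-to-address map and that each message carries the receiver's expected MPI_COMM_WORLD rank.

```c
#include <assert.h>

#define NPROCS 4      /* ranks in MPI_COMM_WORLD (hypothetical) */
#define NENDPOINTS 8  /* physical endpoints/nodes (hypothetical) */

/* Rank currently hosted at each physical endpoint (-1 = empty). */
static int endpoint_rank[NENDPOINTS];
/* Hypothetical runtime directory: authoritative rank -> endpoint map. */
static int runtime_directory[NPROCS];
/* Sender-side cache of rank -> endpoint; may go stale after a failure. */
static int sender_cache[NPROCS];

typedef struct {
    int expected_rank; /* receiver's MPI_COMM_WORLD rank, attached by the sender */
    int payload;
} message_t;

/* Start with rank r on endpoint r; every cache is initially correct. */
static void init_tables(void) {
    for (int e = 0; e < NENDPOINTS; e++)
        endpoint_rank[e] = -1;
    for (int r = 0; r < NPROCS; r++) {
        endpoint_rank[r] = r;
        runtime_directory[r] = r;
        sender_cache[r] = r;
    }
}

/* Receiver-side check: deliver only if this endpoint hosts the expected rank.
   Returns 0 on delivery, -1 on a mismatch (the sender used a stale address). */
static int deliver(int endpoint, const message_t *msg, int *out_payload) {
    if (endpoint < 0 || endpoint >= NENDPOINTS) return -1;
    if (endpoint_rank[endpoint] != msg->expected_rank) return -1;
    *out_payload = msg->payload;
    return 0;
}

/* Hypothetical runtime query for the current physical address of a rank. */
static int runtime_lookup(int rank) {
    return runtime_directory[rank];
}

/* A failed rank is restarted on a free endpoint ("local rejoin"). Only the
   runtime directory learns the new address; no other process takes part. */
static void restart_rank(int rank, int new_endpoint) {
    assert(new_endpoint >= 0 && new_endpoint < NENDPOINTS);
    assert(endpoint_rank[new_endpoint] == -1);
    endpoint_rank[runtime_directory[rank]] = -1;
    endpoint_rank[new_endpoint] = rank;
    runtime_directory[rank] = new_endpoint;
}

/* Send with verification: try the cached address; on an error, ask the
   runtime for the correct address, refresh the cache and retry once. */
static int send_to_rank(int dest_rank, int payload, int *out_payload, int *lookups) {
    assert(dest_rank >= 0 && dest_rank < NPROCS);
    message_t msg = { dest_rank, payload };
    if (deliver(sender_cache[dest_rank], &msg, out_payload) == 0)
        return 0;
    (*lookups)++;
    sender_cache[dest_rank] = runtime_lookup(dest_rank);
    return deliver(sender_cache[dest_rank], &msg, out_payload);
}
```

For example, after init_tables() a send to rank 2 succeeds with no runtime query. After restart_rank(2, 6), the next send to rank 2 hits the stale cached address, triggers exactly one lookup and is then delivered; later sends go straight to endpoint 6. Only a sender that actually talks to the restarted process pays for the lookup, which is the decoupling argued for above.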
Greg Bronevetsky
Post-Doctoral Researcher
1028 Building 451
Lawrence Livermore National Lab
(925) 424-5756
bronevetsky1 at llnl.gov