[Mpi3-ft] docs on CP/R with RDMA fabrics
mike.heffner at evergrid.com
Fri Jun 6 09:47:31 CDT 2008
On the last teleconf we primarily gave a brief background on our
Availability Services and the fabrics we support today. We described
briefly how we integrate with MVAPICH to support CP/R over zero-copy
RDMA with low overhead.
As our goal is to remain transparent from an application standpoint, our
plan right now is to present how we integrate below the MPI API. We are
not currently proposing any new MPI interfaces, but as a company we are
interested in following the MPI FT discussions and providing
feedback/insight into the proposals. In particular, we want to see how
applications written to take advantage of the new MPI FT interfaces
could leverage our CP/R solution, and how that may best serve the FT
design decisions.
As a company we'll also be pursuing various partner relationships with
the popular MPI vendors to discuss integration opportunities. I'd
imagine most work would go on under the hood at first, but hopefully we
could form some type of formal integration design that would assist in
future MPI stack integration.
Josh Hursey wrote:
> I was not on the last teleconf so this might have been covered there,
> but just to make sure I understand the latter two proposals you sent.
> You are not proposing any MPI interfaces for checkpoint/restart, but
> just describing how you implemented a transparent solution inside or
> below an MPI implementation. Is that correct?
> -- Josh
> On Jun 5, 2008, at 5:44 PM, Mike Heffner wrote:
>> [3rd try around, mailing list bounce]
>> Attached are the documents I promised during the previous conference
>> call.
>> The IPDPS 2007 paper gives a technical description of the
>> Availability Services product (known academically as "DejaVu")
>> including the online logging algorithm used for BSD socket
>> applications. It also includes some performance numbers from
>> previous experiments with our RDMA MVAPICH implementation.
>> The avs_mpi_integration.pdf document provides a brief description of
>> the interfaces that were required to integrate a user-level CP/R
>> framework with MVAPICH/RDMA. It gives insight into what we
>> required to integrate with a real-world RDMA MPI stack to provide
>> CP/R with very little overhead.
>> The third document is one I wrote this afternoon to propose an
>> asynchronous quiescence interface. It is similar to Joshua's
>> proposal on the wiki, but provides an asynchronous, driver-level
>> version that our solution requires for application transparency.
>> I will try to get some of these onto the wiki as well.
>> Mike Heffner <mike.heffner at evergrid.com>
>> EverGrid Software
>> Blacksburg, VA USA
>> Voice: (540) 443-3500 x603
>> mpi3-ft mailing list
>> mpi3-ft at lists.mpi-forum.org
Mike Heffner <mike.heffner at evergrid.com>
Blacksburg, VA USA
Voice: (540) 443-3500 x603