[Mpi3-ft] docs on CP/R with RDMA fabrics

Mike Heffner mike.heffner at evergrid.com
Fri Jun 6 09:47:31 CDT 2008


Josh,

On the last teleconf we mainly gave a brief background on our 
Availability Services and the fabrics we support today. We also briefly 
described how we integrate with MVAPICH to support CP/R over zero-copy 
RDMA with low overhead.

Since our goal is to remain transparent to the application, our plan 
right now is to present how we integrate below the MPI API. We are not 
currently proposing any new MPI interfaces, but as a company we are 
interested in following the MPI FT discussions and providing 
feedback/insight into the proposals. In particular, we would like to see 
how applications written to take advantage of the new MPI FT interfaces 
could leverage our CP/R solution, and how that could best serve the FT 
design decisions.

As a company we'll also be pursuing partner relationships with the 
popular MPI vendors to discuss integration opportunities. I'd imagine 
most of the work would go on under the hood at first, but hopefully we 
could arrive at some kind of formal integration design that would assist 
in future MPI stack integration.


Cheers,

Mike


Josh Hursey wrote:
> Mike,
> 
> I was not on the last teleconf so this might have been covered there,  
> but just to make sure I understand the latter two proposals you sent.  
> You are not proposing any MPI interfaces for checkpoint/restart, but  
> just describing how you implemented a transparent solution inside or  
> below an MPI implementation. Is that correct?
> 
> -- Josh
> 
> On Jun 5, 2008, at 5:44 PM, Mike Heffner wrote:
> 
>> [3rd try around, mailing list bounce]
>>
>> All,
>>
>> Attached are the documents I promised during the previous conference  
>> call.
>>
>> The IPDPS 2007 paper gives a technical description of the  
>> Availability Services product (known academically as "DejaVu")  
>> including the online logging algorithm used for BSD socket  
>> applications. It also includes some performance numbers from  
>> previous experiments with our RDMA MVAPICH implementation.
>>
>> The avs_mpi_integration.pdf document provides a brief description of  
>> the interfaces that were required to integrate a user-level CP/R  
>> framework with MVAPICH/RDMA. It gives some insight into what we  
>> required to integrate with a real-world RDMA MPI stack to provide  
>> CP/R with very little overhead.
>>
>> The third document is one I wrote this afternoon to propose an  
>> asynchronous quiescence interface. It is similar to Joshua's  
>> proposal on the wiki, but provides an asynchronous, driver-level  
>> version that our solution requires for application transparency.
>>
>> I will try to get some of these onto the wiki as well.
>>
>>
>> Cheers,
>>
>> Mike
>>
>> -- 
>>
>>  Mike Heffner <mike.heffner at evergrid.com>
>>  EverGrid Software
>>  Blacksburg, VA USA
>>
>>  Voice: (540) 443-3500 x603
>> <ipdps07.pdf>
>> <avs_mpi_integration.pdf>
>> <evergrid_async_quiesce.pdf>
>> _______________________________________________
>> mpi3-ft mailing list
>> mpi3-ft at lists.mpi-forum.org
>> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-ft
> 





