[Mpi3-ft] one sided

Josh Hursey jjhursey at open-mpi.org
Fri Oct 21 17:21:54 CDT 2011


As an aside: we will probably be talking more about the communication
object creation semantics next week, along with the one-sided FT
semantics in general.

-- Josh

On Fri, Oct 21, 2011 at 6:08 PM, Sur, Sayantan <sayantan.sur at intel.com> wrote:
> Thanks for the clarification, Josh. It makes sense to me.
>
>> -----Original Message-----
>> From: mpi3-ft-bounces at lists.mpi-forum.org [mailto:mpi3-ft-bounces at lists.mpi-forum.org] On Behalf Of Josh Hursey
>> Sent: Friday, October 21, 2011 2:28 PM
>> To: MPI 3.0 Fault Tolerance and Dynamic Process Control working Group
>> Subject: Re: [Mpi3-ft] one sided
>>
>> Since MPI is creating a new communication object, we require that all
>> processes have access to the object if it was created anywhere.
>>
>> So consider the case where the 'win' object was created at some
>> processes and not others. A process with a valid 'win' calls
>> MPI_Win_fence(). The other processes in the group associated with the
>> window will not be able to call MPI_Win_fence, since they do not have
>> a valid object. The program is erroneous, since not all processes are
>> calling the collective. So the semantics become muddled for collective
>> operations (even MPI_Win_free) when not all processes are guaranteed
>> to have a valid communication object to use.
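A toy model can make the problem concrete. The Python sketch below is not MPI code; the function name is hypothetical, and each list entry simply records whether a process obtained a valid window handle. A collective such as MPI_Win_fence or MPI_Win_free can only be called consistently when that outcome is uniform across the group:

```python
from typing import Sequence


def collective_is_well_defined(created: Sequence[bool]) -> bool:
    """Return True if a collective on the window (e.g. a fence or a free)
    can be called consistently by every process in the group.

    created[i] is True when process i obtained a valid window handle.
    This is a toy model, not an MPI API.
    """
    with_window = sum(1 for ok in created if ok)
    # Either every process holds the window and can enter the collective,
    # or no process does and nobody calls it. A mixed outcome means some
    # processes would call the collective while others cannot.
    return with_window == 0 or with_window == len(created)


print(collective_is_well_defined([True, True, True, True]))      # True
print(collective_is_well_defined([False, False, False, False]))  # True
print(collective_is_well_defined([True, False, True, True]))     # False
```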
>>
>> So we just need the requirement that the object is either created
>> everywhere or nowhere. Since we can only make statements about the
>> behavior of MPI after the MPI_ERR_PROC_FAIL_STOP error code, we
>> restrict the language to just that error code. It could be argued
>> that this is a more general requirement, but that is slightly out of
>> scope for this proposal.
>>
>> Does that help clarify?
>>
>> -- Josh
>>
>> On Fri, Oct 21, 2011 at 4:54 PM, Sur, Sayantan <sayantan.sur at intel.com> wrote:
>> > Hi All,
>> >
>> > The new chapter says this about the window creation:
>> >
>> > "If the MPI_WIN_CREATE operation fails at any live process due to a
>> > process failure, then the operation must fail at every live process
>> > with an error in the class MPI_ERR_PROC_FAIL_STOP."
>> >
>> > I'm wondering what would happen if MPI_WIN_CREATE did not have this
>> > qualification at all, i.e., if it could succeed at some processes and
>> > fail at others. After all, any subsequent GET or PUT call can always
>> > raise an error in the class MPI_ERR_PROC_FAIL_STOP. Also, the
>> > communicator passed to MPI_WIN_CREATE is allowed to contain dead
>> > processes, so why qualify window creation with this requirement?
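A minimal sketch of the alternative raised here, assuming the window always exists at every live process and a failure only surfaces when an RMA operation touches a failed target. The names are hypothetical, and the exception stands in for an MPI_Put or MPI_Get call returning an error in the class MPI_ERR_PROC_FAIL_STOP (real MPI reports this through a return code, not an exception):

```python
from typing import List

PROC_FAIL_STOP = "MPI_ERR_PROC_FAIL_STOP"


class ProcFailStop(Exception):
    """Stands in for an RMA call returning MPI_ERR_PROC_FAIL_STOP."""


def toy_put(alive: List[bool], target: int, value: int, memory: List[int]) -> None:
    """Toy put into the target's window slot (hypothetical helper).

    Window creation is assumed to have succeeded everywhere; the only
    failure point is the operation itself, when the target has died.
    """
    if not alive[target]:
        raise ProcFailStop(f"target {target}: {PROC_FAIL_STOP}")
    memory[target] = value
```

This models only errors on individual put/get operations; it says nothing about collective window calls such as MPI_Win_fence, which is the case Josh addresses in his reply.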
>> >
>> > Thanks,
>> > Sayantan.
>> >
>> >> -----Original Message-----
>> >> From: mpi3-ft-bounces at lists.mpi-forum.org [mailto:mpi3-ft-bounces at lists.mpi-forum.org] On Behalf Of Pavan Balaji
>> >> Sent: Wednesday, October 19, 2011 9:05 AM
>> >> To: mpi3-ft at lists.mpi-forum.org
>> >> Subject: Re: [Mpi3-ft] one sided
>> >>
>> >>
>> >> FYI, you cannot "require" some behavior from MPI through an info
>> >> argument. It is perfectly legitimate for the MPI implementation to
>> >> completely ignore any info arguments passed. They are just user
>> >> hints.
>> >>
>> >>   -- Pavan
>> >>
>> >> On 10/19/2011 07:59 AM, Josh Hursey wrote:
>> >> > Let's be sure to talk about this on today's call. I have some other
>> >> > one-sided notes that I would like to go over as well.
>> >> >
>> >> > It would be fairly easy to support both modes, since the
>> >> > MPI_Win_create operation takes an info argument. We could define a
>> >> > key (similar to what they have done for other operations) that
>> >> > either loosens or tightens the semantics, depending on what the
>> >> > default behavior should be.
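A sketch of how an implementation might read such a key, under loud assumptions: the key name and its values are hypothetical (nothing like them is defined by the standard), and a Python dict stands in for an MPI_Info object populated with MPI_Info_set before MPI_Win_create. The honors_key flag models the fact that an implementation may ignore the hint entirely:

```python
from typing import Mapping

# Hypothetical info key and values, for illustration only.
FT_KEY = "ft_win_create_semantics"
VALID_VALUES = {"all_or_nothing", "non_synchronizing"}


def effective_semantics(info: Mapping[str, str], default: str, honors_key: bool) -> str:
    """Return the window-creation semantics a (toy) implementation applies.

    info mimics the key/value pairs an application might attach to the
    info argument. honors_key models whether this implementation chooses
    to act on the hint at all.
    """
    requested = info.get(FT_KEY)
    if honors_key and requested in VALID_VALUES:
        return requested
    # A missing key, an unknown value, or an implementation that ignores
    # the hint all fall back to the default behavior.
    return default
```

Because ignoring the key is always a legal choice, as Pavan notes in his reply, a portable program can only request one behavior this way; it cannot require it.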
>> >> >
>> >> > I think it is OK to have a non-synchronizing option, as long as we
>> >> > have clear semantics for when the window is not created at all
>> >> > processes due to some process failure. Alternatively, if the window
>> >> > is always created regardless of emerging failures, then we might
>> >> > avoid this issue, but that might require some additional
>> >> > clarification.
>> >> >
>> >> > Thanks,
>> >> > Josh
>> >> >
>> >> > On Wed, Oct 19, 2011 at 4:11 AM, Supalov, Alexander
>> >> > <alexander.supalov at intel.com>  wrote:
>> >> >> Thanks. Why not have two calls or modes of operation to cover both?
>> >> >>
>> >> >> -----Original Message-----
>> >> >> From: mpi3-ft-bounces at lists.mpi-forum.org [mailto:mpi3-ft-bounces at lists.mpi-forum.org] On Behalf Of Darius Buntinas
>> >> >> Sent: Tuesday, October 18, 2011 9:57 PM
>> >> >> To: MPI 3.0 Fault Tolerance and Dynamic Process Control working Group
>> >> >> Subject: [Mpi3-ft] one sided
>> >> >>
>> >> >>
>> >> >> I got some feedback from Jim and Pavan on the one-sided section.
>> >> >> One thing Jim pointed out was that we don't want to make window
>> >> >> creation synchronizing, and the fail-or-succeed-everywhere
>> >> >> requirement would do that.
>> >> >>
>> >> >> If we say that window creation should not fail due to failed
>> >> >> processes, that would accomplish the same thing: if a window is
>> >> >> created by a correct program, then it will succeed at all live
>> >> >> processes. Note that if an incorrect program specifies invalid
>> >> >> parameters, then window creation may fail at some processes and
>> >> >> succeed at others, but this is what we already have today.
>> >> >>
>> >> >> However, it's possible that some implementations cannot satisfy
>> >> >> this requirement because, e.g., they do collectives as part of the
>> >> >> operation. So maybe we should have two options:
>> >> >>
>> >> >>   Either:
>> >> >>     window creation won't fail because of failed processes
>> >> >>   or
>> >> >>     window creation will either succeed everywhere or fail
>> >> >>     everywhere: if it fails at any process, it fails at every process
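The two options can be compared side by side in a toy model. This is a sketch, not an implementation; the mode names and the per-process "saw a failure" flags are assumptions for illustration, and the strings merely label MPI error classes:

```python
from enum import Enum
from typing import Dict, List


class CreateMode(Enum):
    # Hypothetical labels for the two options listed above.
    NO_FAILURE_ON_DEAD_PEERS = "no-failure"
    ALL_OR_NOTHING = "all-or-nothing"


def create_outcomes(alive: List[bool], saw_failure: List[bool], mode: CreateMode) -> Dict[int, str]:
    """Error class each live process would report from window creation.

    alive[i]: whether process i is alive.
    saw_failure[i]: whether live process i detected a peer failure while
    creating the window (ignored for dead processes).
    """
    live = [i for i, up in enumerate(alive) if up]
    if mode is CreateMode.NO_FAILURE_ON_DEAD_PEERS:
        # Failed peers never make creation fail; problems surface later.
        return {i: "MPI_SUCCESS" for i in live}
    # All-or-nothing: if any live process saw a failure, every live process fails.
    any_failure = any(saw_failure[i] for i in live)
    code = "MPI_ERR_PROC_FAIL_STOP" if any_failure else "MPI_SUCCESS"
    return {i: code for i in live}
```

The all-or-nothing branch needs every live process to learn whether any other process saw a failure, which is the synchronization Jim wants to avoid; the first mode needs no such exchange.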
>> >> >>
>> >> >> -d
>> >> >>
>> >> >>
>> >> >>
>> >> >> _______________________________________________
>> >> >> mpi3-ft mailing list
>> >> >> mpi3-ft at lists.mpi-forum.org
>> >> >> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-ft
>> >> >
>> >> >
>> >> >
>> >>
>> >> --
>> >> Pavan Balaji
>> >> http://www.mcs.anl.gov/~balaji
>> >
>> >
>> >
>>
>>
>>
>> --
>> Joshua Hursey
>> Postdoctoral Research Associate
>> Oak Ridge National Laboratory
>> http://users.nccs.gov/~jjhursey
>>
>
>
>






