[Mpi3-ft] Nonblocking Process Creation and Management

Supalov, Alexander alexander.supalov at intel.com
Wed Jan 13 15:28:27 CST 2010


Thanks. Good points. Still, I think what you're really looking for is a way to say "enough" before basically closing something down. That is less general than the ability to CANCEL any individual operation at will, at any time, which is flexible, yes, but cumbersome enough to be asserted away in one of the proposals.

Anyway, we may want to leave CANCEL alone for the moment. Let's get back to JOIN. Are there any apps out there that still use it? I think it was a hack to work around the proper accept/connect being temporarily unavailable back then. If the hack has outlived its usefulness, we may want to deprecate it now.

________________________________
From: mpi3-ft-bounces at lists.mpi-forum.org [mailto:mpi3-ft-bounces at lists.mpi-forum.org] On Behalf Of Richard Treumann
Sent: Wednesday, January 13, 2010 10:19 PM
To: MPI 3.0 Fault Tolerance and Dynamic Process Control working Group
Subject: Re: [Mpi3-ft] Nonblocking Process Creation and Management


An application that is really trying to do overlap of communication and computation is likely to post Isends and Irecvs before entering a computation step. If the computation step discovers the answer and another iteration is not needed then why require all the sends and receives to be done? The application already knows the data is useless.

A master/worker application may have an outstanding MPI_Irecv at each worker with tag 1 to pick up workload and an outstanding MPI_Irecv with tag 2 that is looking for the "all done" message. When the "all done" shows up, the workload MPI_Irecv needs to be completed before a disconnect can proceed. Why make the master send a null workload to each worker just to clear those obsolete MPI_Irecvs?

The standard has several points at which it states that all outstanding sends and receives must be complete. If an Isend or Irecv has been posted, there are two ways to complete it: make the matching Send or Recv happen, or call MPI_Cancel. The pair of operations MPI_Cancel; MPI_Wait will always complete no matter what the other side does. As long as the application does not care whether the data is delivered, this is a clean way to satisfy the requirement that all outstanding sends and receives must be complete.

I have not been following the FT stuff but it seems like MPI_Cancel would be useful there. If I have posted an MPI_Irecv from task 9 and then learned task 9 is gone why wouldn't I want the option of doing an MPI_Cancel on the MPI_Irecv request? That seems cleaner than any other way of getting rid of the receive descriptor.


Dick Treumann - MPI Team
IBM Systems & Technology Group
Dept X2ZA / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601
Tele (845) 433-7846 Fax (845) 433-8363


From: "Supalov, Alexander" <alexander.supalov at intel.com>
Sent by: mpi3-ft-bounces at lists.mpi-forum.org
Date: 01/13/2010 03:52 PM
To: "MPI 3.0 Fault Tolerance and Dynamic Process Control working Group" <mpi3-ft at lists.mpi-forum.org>
Subject: Re: [Mpi3-ft] Nonblocking Process Creation and Management
________________________________



Thanks. Do we know any active app that uses the JOIN still? If none, why the heck keep it afloat?

I meant CANCEL in all its varieties. Again, how many apps cannot live without it?

________________________________
From: mpi3-ft-bounces at lists.mpi-forum.org [mailto:mpi3-ft-bounces at lists.mpi-forum.org] On Behalf Of Richard Treumann
Sent: Wednesday, January 13, 2010 9:48 PM
To: MPI 3.0 Fault Tolerance and Dynamic Process Control working Group
Subject: Re: [Mpi3-ft] Nonblocking Process Creation and Management

Why would JOIN be any less useful today than in the past? Why would you want to deprecate it? It is a trivial bootstrap for the simplest form of ACCEPT/CONNECT. It just hides the ACCEPT and CONNECT operations and lets the user handle the complexity of identifying the two end points any way he likes.

I never felt JOIN was needed very badly but it went in as a "what the heck" decision and I do not see that anything has changed.

Is it because IJOIN seems more difficult than IACCEPT and ICONNECT in some way?

When you say CANCEL should be deprecated, do you mean MPI_Cancel of an outstanding ISEND/IRECV request or something else?

Dick Treumann - MPI Team
IBM Systems & Technology Group
Dept X2ZA / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601
Tele (845) 433-7846 Fax (845) 433-8363


From: "Supalov, Alexander" <alexander.supalov at intel.com>
Sent by: mpi3-ft-bounces at lists.mpi-forum.org
Date: 01/13/2010 02:53 PM
To: "MPI 3.0 Fault Tolerance and Dynamic Process Control working Group" <mpi3-ft at lists.mpi-forum.org>
Subject: Re: [Mpi3-ft] Nonblocking Process Creation and Management
________________________________



Thanks. I meant the JOIN per se as a way of establishing the communicator. Do we need that still? What practically relevant cases can be provided to justify its continuing existence? If there are none, we should rather deprecate the JOIN and drop the IJOIN.

Good point on the CANCEL.

-----Original Message-----
From: mpi3-ft-bounces at lists.mpi-forum.org [mailto:mpi3-ft-bounces at lists.mpi-forum.org] On Behalf Of Josh Hursey
Sent: Wednesday, January 13, 2010 8:25 PM
To: MPI 3.0 Fault Tolerance and Dynamic Process Control working Group
Subject: Re: [Mpi3-ft] Nonblocking Process Creation and Management

Since join() does a handshake to create the new communicator, it
may be delayed by the remote side of the protocol. A nonblocking
version would allow the application to possibly do other computation
while waiting for the remote side.

As far as Cancel, I have been thinking that it might be useful for
Accept and Connect. Though with the normal problems of Cancel, I don't
know how to really specify it. I want to look into it a bit more
before next week to see if anything useful can be said of using Cancel
with Accept/Connect.

-- Josh

On Jan 13, 2010, at 2:01 PM, Supalov, Alexander wrote:

> Hi,
>
> Do we really need the IJOIN thing? I think the JOIN itself should be
> deprecated. Just as CANCEL, by the way.
>
> Best regards.
>
> Alexander
>
> -----Original Message-----
> From: mpi3-ft-bounces at lists.mpi-forum.org [mailto:mpi3-ft-bounces at lists.mpi-forum.org] On Behalf Of Josh Hursey
> Sent: Tuesday, January 12, 2010 11:04 PM
> To: MPI 3.0 Fault Tolerance and Dynamic Process Control working Group
> Subject: [Mpi3-ft] Nonblocking Process Creation and Management
>
> I extended and cleaned up the Nonblocking Process Creation and
> Management proposal on the wiki:
>    https://svn.mpi-forum.org/trac/mpi-forum-web/wiki/Async-proc-mgmt
>
> I added the rest of the nonblocking interface proposals, and touched
> up some of the language. I do not have an implementation yet, but will
> work on that next. There are a few items that I need to refine a bit
> still (e.g., MPI_Cancel, mixing of blocking and nonblocking), but this
> should give us a foundation to start from.
>
> I would like to talk about this next week during our working group
> slot at the MPI Forum meeting.
>
> Let me know what you think, and if you see any problems.
>
> Thanks,
> Josh
>
> _______________________________________________
> mpi3-ft mailing list
> mpi3-ft at lists.mpi-forum.org
> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-ft
> ---------------------------------------------------------------------
> Intel GmbH
> Dornacher Strasse 1
> 85622 Feldkirchen/Muenchen Germany
> Sitz der Gesellschaft: Feldkirchen bei Muenchen
> Geschaeftsfuehrer: Douglas Lusk, Peter Gleissner, Hannes Schwaderer
> Registergericht: Muenchen HRB 47456 Ust.-IdNr.
> VAT Registration No.: DE129385895
> Citibank Frankfurt (BLZ 502 109 00) 600119052
>
> This e-mail and any attachments may contain confidential material for
> the sole use of the intended recipient(s). Any review or distribution
> by others is strictly prohibited. If you are not the intended
> recipient, please contact the sender and delete all copies.