[Mpi3-ft] Nonblocking Process Creation and Management

Richard Treumann treumann at us.ibm.com
Wed Jan 13 15:49:14 CST 2010


If somebody implemented JOIN without first implementing a basic
ACCEPT/CONNECT I do not think they made good use of their time.  JOIN
requires most of the ACCEPT/CONNECT logic anyway.  Much easier to implement
ACCEPT/CONNECT and then layer JOIN on top.

What JOIN does is allow a form of ACCEPT/CONNECT in an environment where
PUBLISH_NAME, LOOKUP_NAME, OPEN_PORT are not very usable.

Why deprecate something like JOIN that is simple to provide, harmless to
have and possibly useful?

You did not answer whether it is because there is something hard about
IJOIN and you want symmetry with IACCEPT and ICONNECT.


Dick Treumann  -  MPI Team
IBM Systems & Technology Group
Dept X2ZA / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601
Tele (845) 433-7846         Fax (845) 433-8363



                                                                                                                                           
From:     "Supalov, Alexander" <alexander.supalov at intel.com>
To:       "MPI 3.0 Fault Tolerance and Dynamic Process Control working Group" <mpi3-ft at lists.mpi-forum.org>
Date:     01/13/2010 04:29 PM
Subject:  Re: [Mpi3-ft] Nonblocking Process Creation and Management
Sent by:  mpi3-ft-bounces at lists.mpi-forum.org





Thanks. Good points. Still, I think what you're really looking for is a way
to say "enough" before basically closing something down. This is less
general than the individual ability to CANCEL this and that at will at any
time: flexible, yes, but cumbersome enough to be asserted out in one of the
proposals.

Anyway, we may want to leave CANCEL alone for the moment. Let's get back to
JOIN. Are there any apps out there that use it still? I think this was a
hack to get around the temporary unavailability of the proper
accept/connect back then. If the hack has lived its useful life, we may
want to deprecate it now.

From: mpi3-ft-bounces at lists.mpi-forum.org [mailto:mpi3-ft-bounces at lists.mpi-forum.org] On Behalf Of Richard Treumann
Sent: Wednesday, January 13, 2010 10:19 PM
To: MPI 3.0 Fault Tolerance and Dynamic Process Control working Group
Subject: Re: [Mpi3-ft] Nonblocking Process Creation and Management



An application that is really trying to do overlap of communication and
computation is likely to post Isends and Irecvs before entering a
computation step. If the computation step discovers the answer and another
iteration is not needed then why require all the sends and receives to be
done? The application already knows the data is useless.

A master/worker application may have an outstanding MPI_Irecv at each
worker with tag 1 to pick up workload and an outstanding MPI_Irecv with tag
2 that is looking for the "all done" message. When the "all done" shows up,
the workload MPI_Irecv needs to be completed before a disconnect can
proceed. Why make the master send a null workload to each worker just to
clear those obsolete MPI_Irecvs?

The standard has several points at which it states that all outstanding
sends and receives must be complete. If an Isend or Irecv has been posted
there are 2 ways to complete it: Make the matching Send or Recv happen or
call MPI_Cancel. The pair of operations MPI_Cancel; MPI_Wait will always
complete no matter what the other side does. As long as the application
does not care whether the data is delivered, this is a clean way to satisfy
the requirement that all outstanding sends and receives must be complete.

I have not been following the FT stuff but it seems like MPI_Cancel would
be useful there. If I have posted an MPI_Irecv from task 9 and then learned
task 9 is gone why wouldn't I want the option of doing an MPI_Cancel on the
MPI_Irecv request? That seems cleaner than any other way of getting rid of
the receive descriptor.


Dick Treumann - MPI Team
IBM Systems & Technology Group
Dept X2ZA / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601
Tele (845) 433-7846 Fax (845) 433-8363


                                                                           
                                                                           
From:     "Supalov, Alexander" <alexander.supalov at intel.com>
To:       "MPI 3.0 Fault Tolerance and Dynamic Process Control working Group" <mpi3-ft at lists.mpi-forum.org>
Date:     01/13/2010 03:52 PM
Subject:  Re: [Mpi3-ft] Nonblocking Process Creation and Management
Sent by:  mpi3-ft-bounces at lists.mpi-forum.org
                                                                           





Thanks. Do we know any active app that uses the JOIN still? If none, why
the heck keep it afloat?

I meant CANCEL in all its varieties. Again, how many apps cannot live
without it?

From: mpi3-ft-bounces at lists.mpi-forum.org [mailto:mpi3-ft-bounces at lists.mpi-forum.org] On Behalf Of Richard Treumann
Sent: Wednesday, January 13, 2010 9:48 PM
To: MPI 3.0 Fault Tolerance and Dynamic Process Control working Group
Subject: Re: [Mpi3-ft] Nonblocking Process Creation and Management


Why would JOIN be any less useful today than in the past? Why would you
want to deprecate it? It is a trivial bootstrap for the simplest form of
ACCEPT/CONNECT. It just hides the ACCEPT and CONNECT operations and lets
the user handle the complexity of identifying the two end points any way he
likes.

I never felt JOIN was needed very badly but it went in as a "what the heck"
decision and I do not see that anything has changed.

Is it because IJOIN seems more difficult than IACCEPT and ICONNECT in some
way?

When you say CANCEL should be deprecated, do you mean MPI_Cancel of an
outstanding ISEND/IRECV request or something else?

Dick Treumann - MPI Team
IBM Systems & Technology Group
Dept X2ZA / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601
Tele (845) 433-7846 Fax (845) 433-8363


                                                                           
                                                                           
From:     "Supalov, Alexander" <alexander.supalov at intel.com>
To:       "MPI 3.0 Fault Tolerance and Dynamic Process Control working Group" <mpi3-ft at lists.mpi-forum.org>
Date:     01/13/2010 02:53 PM
Subject:  Re: [Mpi3-ft] Nonblocking Process Creation and Management
Sent by:  mpi3-ft-bounces at lists.mpi-forum.org
                                                                           




Thanks. I meant the JOIN per se as a way of establishing the communicator.
Do we need that still? What practically relevant cases can be provided to
justify its continuing existence? If there are none, we should rather
deprecate the JOIN and drop the IJOIN.

Good point on the CANCEL.

-----Original Message-----
From: mpi3-ft-bounces at lists.mpi-forum.org [mailto:mpi3-ft-bounces at lists.mpi-forum.org] On Behalf Of Josh Hursey
Sent: Wednesday, January 13, 2010 8:25 PM
To: MPI 3.0 Fault Tolerance and Dynamic Process Control working Group
Subject: Re: [Mpi3-ft] Nonblocking Process Creation and Management

Since join() does a handshake to create the new communicator, it
can be delayed by the remote side of the protocol. A nonblocking
version would allow the application to possibly do other computation
while waiting for the remote side.

As far as Cancel, I have been thinking that it might be useful for
Accept and Connect. Though with the normal problems of Cancel, I don't
know how to really specify it. I want to look into it a bit more
before next week to see if anything useful can be said of using Cancel
with Accept/Connect.

-- Josh

On Jan 13, 2010, at 2:01 PM, Supalov, Alexander wrote:

> Hi,
>
> Do we really need the IJOIN thing? I think the JOIN itself should be
> deprecated. Just as CANCEL, by the way.
>
> Best regards.
>
> Alexander
>
> -----Original Message-----
> From: mpi3-ft-bounces at lists.mpi-forum.org [mailto:mpi3-ft-bounces at lists.mpi-forum.org] On Behalf Of Josh Hursey
> Sent: Tuesday, January 12, 2010 11:04 PM
> To: MPI 3.0 Fault Tolerance and Dynamic Process Control working Group
> Subject: [Mpi3-ft] Nonblocking Process Creation and Management
>
> I extended and cleaned up the Nonblocking Process Creation and
> Management proposal on the wiki:
>    https://svn.mpi-forum.org/trac/mpi-forum-web/wiki/Async-proc-mgmt
>
> I added the rest of the nonblocking interface proposals, and touched
> up some of the language. I do not have an implementation yet, but will
> work on that next. There are a few items that I need to refine a bit
> still (e.g., MPI_Cancel, mixing of blocking and nonblocking), but this
> should give us a foundation to start from.
>
> I would like to talk about this next week during our working group
> slot at the MPI Forum meeting.
>
> Let me know what you think, and if you see any problems.
>
> Thanks,
> Josh
>
> _______________________________________________
> mpi3-ft mailing list
> mpi3-ft at lists.mpi-forum.org
> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-ft
> ---------------------------------------------------------------------
> Intel GmbH
> Dornacher Strasse 1
> 85622 Feldkirchen/Muenchen Germany
> Sitz der Gesellschaft: Feldkirchen bei Muenchen
> Geschaeftsfuehrer: Douglas Lusk, Peter Gleissner, Hannes Schwaderer
> Registergericht: Muenchen HRB 47456 Ust.-IdNr.
> VAT Registration No.: DE129385895
> Citibank Frankfurt (BLZ 502 109 00) 600119052
>
> This e-mail and any attachments may contain confidential material for
> the sole use of the intended recipient(s). Any review or distribution
> by others is strictly prohibited. If you are not the intended
> recipient, please contact the sender and delete all copies.
>
>
