[Mpi3-ft] Nonblocking Process Creation and Management

Richard Treumann treumann at us.ibm.com
Wed Jan 13 15:19:01 CST 2010


An application that is really trying to do overlap of communication and
computation is likely to post Isends and Irecvs before entering a
computation step. If the computation step discovers the answer and another
iteration is not needed then why require all the sends and receives to be
done? The application already knows the data is useless.

A master/worker application may have an outstanding MPI_Irecv at each
worker with tag 1 to pick up workload and an outstanding MPI_Irecv with tag
2 that is looking for the "all done" message.  When the "all done" shows
up, the workload MPI_Irecv needs to be completed before a disconnect can
proceed. Why make the master send a null workload to each worker just to
clear those obsolete MPI_Irecvs?

The standard has several points at which it states that all outstanding
sends and receives must be complete. If an Isend or Irecv has been posted
there are two ways to complete it: make the matching Send or Recv happen, or
call MPI_Cancel. The pair of operations MPI_Cancel; MPI_Wait will always
complete no matter what the other side does. As long as the application
does not care whether the data is delivered, this is a clean way to satisfy
the requirement that all outstanding sends and receives must be complete.

I have not been following the FT stuff but it seems like MPI_Cancel would
be useful there. If I have posted an MPI_Irecv from task 9 and then learned
task 9 is gone why wouldn't I want the option of doing an MPI_Cancel on the
MPI_Irecv request? That seems cleaner than any other way of getting rid of
the receive descriptor.


Dick Treumann  -  MPI Team
IBM Systems & Technology Group
Dept X2ZA / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601
Tele (845) 433-7846         Fax (845) 433-8363



                                                                                                                                           
From:     "Supalov, Alexander" <alexander.supalov at intel.com>
To:       "MPI 3.0 Fault Tolerance and Dynamic Process Control working Group" <mpi3-ft at lists.mpi-forum.org>
Date:     01/13/2010 03:52 PM
Subject:  Re: [Mpi3-ft] Nonblocking Process Creation and Management
Sent by:  mpi3-ft-bounces at lists.mpi-forum.org





Thanks. Do we know of any active app that still uses JOIN? If none, why
the heck keep it afloat?

I meant CANCEL in all its varieties. Again, how many apps cannot live
without it?

From: mpi3-ft-bounces at lists.mpi-forum.org On Behalf Of Richard Treumann
Sent: Wednesday, January 13, 2010 9:48 PM
To: MPI 3.0 Fault Tolerance and Dynamic Process Control working Group
Subject: Re: [Mpi3-ft] Nonblocking Process Creation and Management



Why would JOIN be any less useful today than in the past? Why would you
want to deprecate it? It is a trivial bootstrap for the simplest form of
ACCEPT/CONNECT. It just hides the ACCEPT and CONNECT operations and lets
the user handle the complexity of identifying the two end points any way he
likes.

I never felt JOIN was needed very badly but it went in as a "what the heck"
decision and I do not see that anything has changed.

Is it because IJOIN seems more difficult than IACCEPT and ICONNECT in some
way?

When you say CANCEL should be deprecated, do you mean MPI_Cancel of an
outstanding ISEND/IRECV request or something else?



                                                                           
                                                                           
From:     "Supalov, Alexander" <alexander.supalov at intel.com>
To:       "MPI 3.0 Fault Tolerance and Dynamic Process Control working Group" <mpi3-ft at lists.mpi-forum.org>
Date:     01/13/2010 02:53 PM
Subject:  Re: [Mpi3-ft] Nonblocking Process Creation and Management
Sent by:  mpi3-ft-bounces at lists.mpi-forum.org





Thanks. I meant the JOIN per se as a way of establishing the communicator.
Do we need that still? What practically relevant cases can be provided to
justify its continuing existence? If there are none, we should rather
deprecate the JOIN and drop the IJOIN.

Good point on the CANCEL.

-----Original Message-----
From: mpi3-ft-bounces at lists.mpi-forum.org On Behalf Of Josh Hursey
Sent: Wednesday, January 13, 2010 8:25 PM
To: MPI 3.0 Fault Tolerance and Dynamic Process Control working Group
Subject: Re: [Mpi3-ft] Nonblocking Process Creation and Management

Since join() does a handshake to create the new communicator, it
can be delayed by the remote side of the protocol. A nonblocking
version would allow the application to do other computation
while waiting for the remote side.

As far as Cancel, I have been thinking that it might be useful for
Accept and Connect. Though with the normal problems of Cancel, I don't
know how to really specify it. I want to look into it a bit more
before next week to see if anything useful can be said of using Cancel
with Accept/Connect.

-- Josh

On Jan 13, 2010, at 2:01 PM, Supalov, Alexander wrote:

> Hi,
>
> Do we really need the IJOIN thing? I think the JOIN itself should be
> deprecated. Just as CANCEL, by the way.
>
> Best regards.
>
> Alexander
>
> -----Original Message-----
> From: mpi3-ft-bounces at lists.mpi-forum.org On Behalf Of Josh Hursey
> Sent: Tuesday, January 12, 2010 11:04 PM
> To: MPI 3.0 Fault Tolerance and Dynamic Process Control working Group
> Subject: [Mpi3-ft] Nonblocking Process Creation and Management
>
> I extended and cleaned up the Nonblocking Process Creation and
> Management proposal on the wiki:
>    https://svn.mpi-forum.org/trac/mpi-forum-web/wiki/Async-proc-mgmt
>
> I added the rest of the nonblocking interface proposals, and touched
> up some of the language. I do not have an implementation yet, but will
> work on that next. There are a few items that I need to refine a bit
> still (e.g., MPI_Cancel, mixing of blocking and nonblocking), but this
> should give us a foundation to start from.
>
> I would like to talk about this next week during our working group
> slot at the MPI Forum meeting.
>
> Let me know what you think, and if you see any problems.
>
> Thanks,
> Josh
>
> _______________________________________________
> mpi3-ft mailing list
> mpi3-ft at lists.mpi-forum.org
> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpi3-ft
> ---------------------------------------------------------------------
> Intel GmbH
> Dornacher Strasse 1
> 85622 Feldkirchen/Muenchen Germany
> Sitz der Gesellschaft: Feldkirchen bei Muenchen
> Geschaeftsfuehrer: Douglas Lusk, Peter Gleissner, Hannes Schwaderer
> Registergericht: Muenchen HRB 47456 Ust.-IdNr.
> VAT Registration No.: DE129385895
> Citibank Frankfurt (BLZ 502 109 00) 600119052
>
> This e-mail and any attachments may contain confidential material for
> the sole use of the intended recipient(s). Any review or distribution
> by others is strictly prohibited. If you are not the intended
> recipient, please contact the sender and delete all copies.
