[mpiwg-tools] Intel MPI Backend Breakpoint

Jeff Squyres (jsquyres) jsquyres at cisco.com
Wed Jul 26 12:35:21 CDT 2017


I know I'm a bit late to this party, but I was on vacation last week.  So let me chime in now...

1. I think it is unrealistic for the MPI Forum to insist on standardized tool APIs that begin with "MPI" and strictly adhere to MPI conventions.  Ralph is correct that there are now *many* players in the parallel and distributed computation ecosystem; vendors need to support more than just MPI these days.  And no one wants to support a tools attachment mechanism that differs in each parallel / distributed stack.  Hence, labeling a tools-related / attachment standard as "MPI" is a political non-starter.  Additionally, as the Sessions revolution is attempting to recognize, the runtime is a system worthy of being a first-class citizen.  It's OK to let runtime groups have their own standards, and then figure out how MPI will play nicely with them (similar to making OpenMP and MPI play nicely with each other).

2. Let's be blunt: MPIR is going nowhere.  MPIR-1 supports basic functionality.  In the Tools WG, we have loosely talked about MPIR-2 for *years*, but nothing has happened.  That's led to (at least) 2 problems:

2a.  The rest of the world is evolving.  They want (need) to do things that MPIR-1 cannot do.  If the Tools WG is not going to advance MPIR, then others are going to.  ...and that is exactly what has happened (e.g., the PMIx project).

2b. Those in the HPC/MPI ecosystems have recognized MPIR's limits (including LLNL) and have asked MPI implementors and tools authors to add non-standard functionality to their MPIR implementations so that they can do some new things.  As Ralph correctly said, this makes fragile MPIR implementations that keep breaking because we've extended MPIR in different ways for specific tool X, for specific debugger Y, ...etc.  The irony here is that these tool-specific extensions were exactly what standardizing MPIR was supposed to prevent.  It hasn't.

Think of it this way: if MPI implementations gave you only exactly what was specified in the MPIR-1 doc, you might not be able to run the most modern tools/debuggers/etc.  Let me be totally clear: the MPIR-1 "standard" is not useful any more because the world -- including the HPC/MPI/MPIR communities -- has moved past MPIR-1 functionality.  Unfortunately, the MPI Forum Tools WG has not addressed that problem.  Others have therefore picked up the slack.

3. Given conflicting messages from the Forum (i.e., "MPIR is critically important!" / "We're not advancing MPIR functionality at all"), engineering efforts that have finite resources have to pick what they can support.  I think it's totally fair for the Open MPI community to publicly announce a definite timeline for when something new must happen.  If that announcement inspires the MPI Forum / Tools WG to produce MPIR2 with all the new functionality that modern tools need, great!  

But keep in mind that any MPIR2 effort is, by definition, way behind other efforts (e.g., PMIx).  Not only will MPIR2 need a new set of APIs, but also someone will need to start implementing those APIs (probably from scratch).  This will take time, resources, and money.

It may be time to reflect on the Tools WG's priorities.  If the Tools WG has not put any resources behind MPIR2 in the last several years, perhaps it would be better to simply recognize that, and recognize that other communities are now far ahead in this area.  As such, it may be better to work with other communities to see how MPI can play well with them.

Make sense?




> On Jul 19, 2017, at 9:28 AM, rhc at open-mpi.org wrote:
> 
> Hi Martin
> 
> This probably isn’t worth doing on a mailing list and could best be handled over the phone. Your information is simply incorrect, and I’m rather puzzled by it. For example, we set up PMIx as a separate project two years ago. 
> 
> I don’t understand your comment about the PMIx community resisting being a standard. As I said at SC, we simply don't feel it correct to be part of the MPI Forum standards body since PMIx isn’t part of MPI. We searched for an appropriate existing standards body for quite some time, but didn’t find a good fit.
> 
> Thus, we set up a corresponding standards process over a year ago, based on the IETF process, and follow it rigorously. We now have over 10 member organizations, with more participating in our several working groups (each focused on some area such as OpenMP/MPI coordination). We are in the process of generating an official standards document, but that is taking second priority to completing the API definitions and the reference implementation due to the timeline requirements of our participating members. The standard is very specifically written to be implementation independent. The fact that nobody has opted to write another implementation is solely because they don’t see any benefit from doing so.
> 
> Adoption is always a sensitive subject as I’m not authorized to speak publicly for other organizations. Suffice to say that your comments are out-of-date, and this is quickly becoming a non-issue. I’m happy to fill you in on it offline in private.
> 
> Like it or not, if we only had one person supporting collectives and they retired without anyone in the community interested in supporting them, then yes - they would have to be deprecated and unsupported. That’s how open source communities work. It won’t happen in that area simply because (a) there are multiple people involved, and (b) since it is core to MPI, someone will always step up.
> 
> Runtime support is a tougher problem because it isn’t a core part of the user-facing library. This is why there is so little support for it - the community’s organizations rightly prefer to focus on the user-facing portion of the library. Hence, getting people to work on the runtime at all is difficult, and we are still working through who will pick up the various pieces. I very much doubt we will have one person fully committed to it as I have been.
> 
> I stand behind my comment about funding sources. DOE didn’t fund OMPI for years, and only recently funded an OMPI-related ECP which has limited scale. We appreciate their support, but it is far from providing for the broader community’s needs. Yes, we enjoy industry participation, and we have managed to remain a vibrant community (though we have our ups-and-downs, as we all do). However, as I said, the primary focus of the participating companies is squarely on the MPI portion of the library, and that isn’t what we are talking about here.
> 
> While it may be disappointing, the fact is that yes - enthusiasm for supporting MPIR is low in our community. There is far more energy and interest in PMIx (rightly or wrongly) due to the capabilities it enables and the role it plays in our members’ products. Thus, getting resources to assume responsibility for maintaining the ORTE-level “glue” is a much smaller problem and easier to resolve.
> 
> To be honest, given the above, I’m really not sure just what it is you’re so concerned about. Yes, the PMIx standard is not part of the MPI Forum, and the WG has gone around a couple of times over adopting it. The stumbling block, quite frankly, was Kathryn’s understanding that the Forum insists on defining the tools interface in terms of “MPI_foo” function calls. Your comments seem to indicate a welcome change in that thinking, so perhaps this can become a non-issue.
> 
> Other than that, we are pretty much doing what you say you want :-)
> 
> Ralph
> 
>> On Jul 18, 2017, at 9:50 PM, Martin Schulz <schulzm at llnl.gov> wrote:
>> 
>> Hi Ralph, all,
>> 
>> Comments inline
>> 
>>> On Jul 17, 2017, at 3:35 PM, rhc at open-mpi.org wrote:
>>> 
>>> I think you are lacking information and therefore misunderstanding the situation. Let me attempt to clarify.
>>> 
>>> We have been very cooperative and participated in this working group since it was formed. Some of the issues that have hampered progress relate to the misfit between the subject and the contextual environment of the WG - tools must interface to many libraries, not just MPI, 
>> 
>> MPIR actually does this already, even if it’s “just” an MPI driven standard. We, e.g., use it in SLURM for any parallel application independent of the programming model.
>> 
>>> and so it has become an increasingly awkward fit, as we have discussed within the WG and with you. For those, and I’m sure a host of other reasons, the WG doesn’t appear to be making discernible progress towards a “standard” that we are told would be acceptable to the MPI Forum. It isn’t clear to me, at least, how things resolve to conclusion.
>> 
>> I think the WG was trying to have a very constructive discussion and devoted a lot of time to this, especially Kathryn, but in the end (at least that was my personal take on it, and the two of us even discussed this in SLC) the PMIx group wasn’t quite ready to turn their proposal into a standard. PMIx is a single library that exports an API; it’s not a standard with a deliberate change process (you said yourself that the PMIx team wanted to maintain the right to change the API as needed, which is counter to what we need for a standard), full community input from all vendors, and the option for multiple implementations. If you would be willing to go in that direction, I think this could be a viable path with multiple options:
>> 
>> - Make PMIx a separate standard with its own general body (the MPI Forum will happily share its rules and bylaws if this helps)
>> - Pull PMIx (as PMPIx) into the MPI standardization process as a new side document - we can easily create a new WG for this
>> - Turn PMIx into part of the MPI standard (which would mean renaming the API)
>> 
>> Any of these options could work, but it would mean that you would have to give up full control over this and that we would need a much broader discussion. I am more than happy to help facilitate this. This should be done, though, before removing existing APIs that our users rely on heavily.
>> 
>> I know standards work is slow and sometimes painful and unsatisfying in the short term, but it will pay off in the end.
>> 
>>> This move has nothing to do with PMIx itself or its current state. The primary motivating factor behind this decision is that I am retiring in two years, and am meanwhile taking on other responsibilities that are soaking up my time and reducing my ability to support OMPI. I have provided the runtime MPIR support in OMPI for 13 years. The MPIR interface is extremely fragile and continually breaking, especially the extensions that I personally implemented to support LLNL’s prior requests long before anyone accepted them in the overall MPIR “community”. I quite simply no longer have time to support it, and certainly won’t support it after I retire!
>> 
>> While I understand this from a personal level, this is (IMHO) not a good reason from a project point of view - does this mean, if the person dealing with collectives retires, Open MPI won’t support collectives any more or won’t implement persistent collectives should they be added to the standard? Even though MPIR is in a side document, this doesn’t mean it is less important or less needed by our users. If we don’t have a portable way to debug MPI applications anymore, this could endanger the whole standard.
>> 
>>> Unlike other implementations, we are totally driven by individual developer contributions - we don’t have a DOE or corporation that directly funds this community. 
>> 
>> DOE currently funds both MPICH and Open MPI - I don’t know the exact dollar amounts off the top of my head, but both projects are substantial (and Open MPI has much more support from industry partners). Even though I personally work with the OMPI-X team, I would say Open MPI has less to complain about than MPICH, which has been working hard to stay alive.
>> 
>> Btw., the reason we (as in DOE and ECP) want and support two MPIs is risk mitigation: if we run into bugs, we need to be able to switch MPI implementations for debugging. If the debugging tool chains are no longer compatible, what is the real value of two MPIs, and why not just support one? If I were responsible for the programming models in ECP, I would list Open MPI no longer supporting the only widely accepted and MPI Forum approved debugging interface as a significant risk for exascale.
>> 
>>> After apprising the OMPI community of the situation, we asked if anyone was interested/willing to take on this responsibility. The answer was “NO”, except for Nathan Hjelm indicating he would try (not reassuring given everything on his plate).
>>> 
>>> I can fit support for the PMIx tool integration we have implemented, as developed in partnership with John last year, under my evolving responsibilities. This buys OMPI two years. It also makes it easier for others in the community to pick up tool support going forward as community members (e.g., IBM) are aggressively building PMIx-enabled tools.
>> 
>> This assumes that everyone accepts PMIx as the solution and that is far from clear, especially as long as the API is controlled by a single group tied to one MPI implementation and not by a more general body. The more likely scenario, IMHO, is that each implementation would diverge into their own extensions and then we really have a mess (if Open MPI goes rogue, why can’t others?) - we would need a different debugger for each MPI.
>> 
>>> Thus, the conclusion of the community was that given we don’t have anyone willing to reliably assume responsibility for MPIR support, deprecation provides tool vendors with 1-2 years of warning that the situation will change. 
>> 
>> Again, why is this different from no longer wanting to support collectives? Is support for a portable development environment really that low on the priority scale for Open MPI? That would be really disappointing, especially given all the hard work in the tools WG with substantial contributions and time investment from the Open MPI team!
>> 
>>> It also gives this WG the same time to come up with an alternative suggestion, or for someone to stand up and take on the support in OMPI. We happily welcome patches!
>>> 
>>> As for your contracts - LLNL is welcome to add MPIR support to its contracts! 
>> 
>> It’s in there and (I would assume, not my call) will likely continue to be.
>> 
>>> I’m sure a vendor would, if appropriately compensated, be happy to assume the responsibility if you deem it that critical. Please note that I specifically alerted LLNL (and TotalView, due to the prior collaboration) to the situation months ago, so this isn’t something that suddenly jumped out of the bushes.
>> 
>> It’s not just TV, there are other community tools as well, plus resource managers. This step is basically turning a situation with a portable interface (which we finally turned into a real portable solution thanks to the tools WG) into an MxN solution, which can’t be what we want going forward.
>> 
>> I still have to stick with what I said before - I hope the Open MPI team will reconsider and stick with the MPI Forum approved interfaces, at least until a new community solution has been accepted. 
>> 
>> Martin
>> 
>> PS: Please don’t take this response personally, but I felt I had to respond so directly, since I see this as a threat to the standard itself; IMHO this could erode the value of what the MPI Forum does.
>> 
>> 
>>> 
>>> HTH
>>> Ralph
>>> 
>>>> On Jul 17, 2017, at 11:48 AM, Martin Schulz <schulzm at llnl.gov> wrote:
>>>> 
>>>> Hi Jeff and Ralph,
>>>> 
>>>> I am really concerned about this step and I think this is a huge step in the wrong direction - both from a user and a standards perspective. 
>>>> 
>>>> As of now, PMIx is an implementation-specific interface (if only because the Open MPI community hosts the interface and controls its definition); it’s definitely not a community interface, as we have with the (MPI Forum approved!) MPIR interface. We have contracts that require MPIR for upcoming machines (well beyond the timeframe below) and we have tools that rely on it. This step, if really executed, will de facto kill portable debugging for MPI (and, IMHO, one of the nice features we always claim for MPI). Large tools (like TV) can work around it (at a cost, though), but the many smaller tools coming from the open source community will have a hard time.
>>>> 
>>>> It also diminishes the role and importance of our MPI side documents, which we have fought for so hard - if they suddenly become optional and only implemented by a subset of implementations, what’s their point?
>>>> 
>>>> If you want PMIx as the MPIR interface (and, I agree, there are some good technical reasons for that), we should really make this a standard through a much broader community effort, under the control of the MPI Forum (or a similar body), and make sure it is agreed on and accepted by all major implementors before removing the current portable interface. 
>>>> 
>>>> I hope the Open MPI community will rethink this step,
>>>> 
>>>> Martin
>>>> 
>>>> 
>>>> 
>>>> 
>>>> 
>>>>> On Jul 14, 2017, at 2:14 AM, rhc at open-mpi.org wrote:
>>>>> 
>>>>> We will deprecate for v3.1 (expected out this fall), and may phase it out sometime in 2018 with the release of OMPI 4.0, or maybe as late as 2019. No real schedule has been developed yet. We are just trying to provide folks like you with as much notice as possible. You should plan on at least one year to get ready.
>>>>> 
>>>>>> On Jul 13, 2017, at 9:03 AM, John DelSignore <John.DelSignore at roguewave.com> wrote:
>>>>>> 
>>>>>> Ouch. Have you decided what the deprecation time line looks like yet? In other words, when do you think that Open MPI will stop supporting MPIR?
>>>>>> 
>>>>>> Cheers, John D.
>>>>>> 
>>>>>> 
>>>>>> On 07/13/17 08:00, Jeff Squyres (jsquyres) wrote:
>>>>>>> FWIW, we just decided this week in Open MPI to deprecate the MPIR interface in favor of PMIx.
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>>> On Jul 12, 2017, at 2:02 PM, Durnov, Dmitry <dmitry.durnov at intel.com> wrote:
>>>>>>>> 
>>>>>>>> Sure.
>>>>>>>> 
>>>>>>>> Thanks.
>>>>>>>> 
>>>>>>>> BR,
>>>>>>>> Dmitry
>>>>>>>> 
>>>>>>>> -----Original Message-----
>>>>>>>> From: mpiwg-tools [mailto:mpiwg-tools-bounces at lists.mpi-forum.org] On Behalf Of John DelSignore
>>>>>>>> Sent: Wednesday, July 12, 2017 9:52 PM
>>>>>>>> To: mpiwg-tools at lists.mpi-forum.org
>>>>>>>> Subject: Re: [mpiwg-tools] Intel MPI Backend Breakpoint
>>>>>>>> 
>>>>>>>> I'd be interested in being included in that discussion. FWIW, I work on the TotalView debugger and wrote up the MPIR specification.
>>>>>>>> 
>>>>>>>> Cheers, John D.
>>>>>>>> 
>>>>>>>> -----Original Message-----
>>>>>>>> From: mpiwg-tools [mailto:mpiwg-tools-bounces at lists.mpi-forum.org] On Behalf Of Durnov, Dmitry
>>>>>>>> Sent: Wednesday, July 12, 2017 2:44 PM
>>>>>>>> To: mpiwg-tools at lists.mpi-forum.org
>>>>>>>> Subject: Re: [mpiwg-tools] Intel MPI Backend Breakpoint
>>>>>>>> 
>>>>>>>> Hi Alex,
>>>>>>>> 
>>>>>>>> I've started a separate mail thread where we may discuss details.
>>>>>>>> 
>>>>>>>> Thanks.
>>>>>>>> 
>>>>>>>> BR,
>>>>>>>> Dmitry
>>>>>>>> 
>>>>>>>> -----Original Message-----
>>>>>>>> From: mpiwg-tools [mailto:mpiwg-tools-bounces at lists.mpi-forum.org] On Behalf Of Alexander Zahdeh
>>>>>>>> Sent: Wednesday, July 12, 2017 7:27 PM
>>>>>>>> To: mpiwg-tools at lists.mpi-forum.org
>>>>>>>> Subject: [mpiwg-tools] Intel MPI Backend Breakpoint
>>>>>>>> 
>>>>>>>> Hi,
>>>>>>>> 
>>>>>>>> This is Alex Zahdeh, one of the debugger tools developers at Cray. I had a question about how Intel MPI handles synchronization according to the MPIR debugging standard. The usual procedure for our debugger is to launch tool daemons to attach to the backend application processes while the application launcher is held at MPIR_Breakpoint. At this point the application processes must be in some sort of barrier, so the debugger tries to return the user to their own code by setting breakpoints at various initialization symbols for different parallel models, continuing, hitting one of the breakpoints, deleting the rest, and finishing the current function. This works if the application is held before the breakpoints we set, which does not seem to be the case with Intel MPI. Is there a more standard approach to returning the user to their own code, or does it vary by programming model and implementor? And specifically with Intel MPI, would there be a good breakpoint to set in this scenario?
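The attach sequence described above rests on a handful of symbols from the MPIR Process Acquisition Interface. The C sketch below shows a launcher publishing `MPIR_proctable` and stopping at `MPIR_Breakpoint`, plus the `MPIR_debug_gate` loop that plays the role of the "barrier" inside each MPI process. It is not taken from Intel MPI, Open MPI, or any other implementation: `launcher_publish_proctable` and `wait_for_debugger` are hypothetical helper names, and in a real job the launcher half and the application half live in different processes.

```c
/* Sketch of MPIR process-acquisition symbols. In a real job the launcher
 * symbols and MPIR_debug_gate live in different processes; they share one
 * file here only for illustration. Helper functions are hypothetical. */
#include <stdlib.h>

/* One entry per MPI process, indexed by MPI_COMM_WORLD rank. */
typedef struct {
    char *host_name;       /* host the process runs on */
    char *executable_name; /* path of the executable */
    int pid;               /* process id on that host */
} MPIR_PROCDESC;

/* Values a debugger expects to find in MPIR_debug_state. */
#define MPIR_NULL           0
#define MPIR_DEBUG_SPAWNED  1
#define MPIR_DEBUG_ABORTING 2

/* --- Launcher-side symbols, read or written by the debugger --- */
MPIR_PROCDESC *MPIR_proctable = NULL;
int MPIR_proctable_size = 0;
volatile int MPIR_debug_state = MPIR_NULL;
volatile int MPIR_being_debugged = 0; /* the debugger sets this to 1 */

/* The debugger plants a breakpoint on this function; real launchers build it
 * so it is neither inlined nor optimized away. Return types differ between
 * implementations; void is used here. */
void MPIR_Breakpoint(void) {
}

/* Hypothetical launcher helper: publish the process table once every rank
 * has been spawned, then stop at MPIR_Breakpoint so tool daemons can attach. */
int launcher_publish_proctable(int nprocs, char **hosts, char **exes,
                               const int *pids) {
    MPIR_PROCDESC *table = calloc((size_t)nprocs, sizeof *table);
    if (table == NULL)
        return -1;
    for (int i = 0; i < nprocs; i++) {
        table[i].host_name = hosts[i];
        table[i].executable_name = exes[i];
        table[i].pid = pids[i];
    }
    MPIR_proctable = table;
    MPIR_proctable_size = nprocs;
    MPIR_debug_state = MPIR_DEBUG_SPAWNED;
    MPIR_Breakpoint(); /* the launcher is held here while the debugger works */
    return 0;
}

/* --- Application-side symbol --- */
volatile int MPIR_debug_gate = 0; /* the debugger writes 1 after attaching */

/* Hypothetical hook inside MPI_Init: keeps the process from reaching user
 * code until the debugger opens the gate. */
void wait_for_debugger(int being_debugged) {
    if (!being_debugged)
        return;
    /* Busy-wait for brevity; a real library would sleep or yield here. */
    while (MPIR_debug_gate == 0) {
        /* spin until the debugger writes 1 into MPIR_debug_gate */
    }
}
```

The gate helper takes the being-debugged flag as a parameter because how a launcher passes `MPIR_being_debugged` on to the ranks is implementation specific. Where an implementation places this gate relative to other initialization symbols determines whether breakpoints set on those symbols are still ahead of the held processes.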
>>>>>>> 
>>>>>>>> Thanks much,
>>>>>>>> Alex
>>>>>>>> --
>>>>>>>> Alex Zahdeh | PE Debugger Development | Cray Inc.
>>>>>>>> 
>>>>>>>> azahdeh at cray.com | Office: 651-967-9628 | Cell: 651-300-2005
>>>>>>>> _______________________________________________
>>>>>>>> mpiwg-tools mailing list
>>>>>>>> 
>>>>>>>> mpiwg-tools at lists.mpi-forum.org
>>>>>>>> https://lists.mpi-forum.org/mailman/listinfo/mpiwg-tools
>>>>>>>> 
>>>>>>>> 
>>>>>>>> --------------------------------------------------------------------
>>>>>>>> Joint Stock Company Intel A/O
>>>>>>>> Registered legal address: Krylatsky Hills Business Park,
>>>>>>>> 17 Krylatskaya Str., Bldg 4, Moscow 121614, Russian Federation
>>>>>>>> 
>>>>>>>> This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). Any review or distribution by others is strictly prohibited. If you are not the intended recipient, please contact the sender and delete all copies.
>>>>>> 


-- 
Jeff Squyres
jsquyres at cisco.com


