[Mpi3-tools] MPIT chapter review

Sun Feb 7 21:37:43 CST 2010

On 2/6/10 10:56 PM, Martin Schulz wrote:
> Hi Marty, all,
>
> On Feb 6, 2010, at 11:45 AM, marty.itzkowitz at Sun.COM wrote:
>
>> [Sorry it's taken me so long; as you might imagine, life at Sun, now 
>> a wholly owned
>> subsidiary of Oracle, and soon to disappear as an independent 
>> company, has
>> been "interesting."]
>
> I bet - I hope everything is still OK at the new company and that the 
> old projects
> continue! I see, though, that you are still using "sun.com" email 
> addresses. Will
> this stay this way?

Sun is supposed to disappear as a company on Feb 15, and I (along with 
all the performance
tools people, and the MPI people) will become a genuine Oracle employee 
then.
I imagine they'll keep the email relay for quite a while.
>
>>
>> I've reviewed it, and I've had two of our MPI folks, and one other 
>> tools person
>> review it, too.  The aggregated comments are below.
>
> Excellent - thanks for taking the time.
>
>>
>>   Marty
>>
>> From Marty Itzkowitz:
>>
>> I dislike the use of the feminine pronouns by default.  Either 
>> rewrite so that neither
>>   masculine nor feminine pronouns are needed, or always use "he or she",
>>   "his or her."
>
> I personally don't have a big opinion here, but the rest of the MPI 
> standard is written
> that way and I think we need to be consistent with it (unless we want 
> to suggest
> global changes). In fact, in most cases the feminine versions are 
> straight copies
> of text used in the MPI 2.2 standard.
>
>>
>> p5, line 47.  The meaning does not get more complex, the 
>> interpretation does.
>
> Agreed - however, this is also a straight copy of the existing profile 
> chapter and
> we would need a separate ticket to change this. I'll keep this in mind 
> for when
> we need to make other changes in this chapter to move it under the tools
> chapter.
>
>>
>> p22, line 7, reads better as "*safely *callable from *asynchronous* 
>> signal handlers to allow
>>   their use in sampling-based tools."
>
> Changed - I also tried to describe it more generically to avoid the 
> comments
> we got last time.
>
> As a higher level question: do you see your State proposal represented 
> (or
> implementable) with the MPIT proposal? I think it is, but would 
> definitely
> like to address shortcomings if you don't think so.

I think so.  An MPI that supports it would have that performance 
variable, and it's
readable from a profile signal handler.  We could try it as soon as you 
have a
prototype in, say, Open MPI.

One more thing:  We found a performance cost to running a library with 
the instrumentation,
and set up a means to tell mpirun to use instrumented libraries, rather 
than the default.
It might be nice to have that in the spec, too.
>> From Terry Dontje:
>> 1.  What does MPIT_GET_SETINFO really do?  I guess my problem is it 
>> takes a "set name" but where does one get the set name and how does 
>> that differ from the out parameter "name of the set".  I read the 
>> section preceding the function description and I still just don't get 
>> it.
>
> The idea was that the name used in the set information is just some 
> short name
> with the restricted character set described in the set section. This 
> function could
> then return a "full" name (which is more explicit and descriptive) 
> along with some
> actual description text. I tried to make this a bit more clear, but we 
> could also drop
> returning the name, if we feel the short and restricted name is 
> sufficient.
>
>>
>> 2.  Seems like a lot of functions for the performance accessors.  So 
>> to do the MPI state acquisition you will need to do the following:
>>
>> a.  Loop on MPIT_PERFORMANCE_QUERY until you find the state 
>> performance variable.
>> b.  Call MPIT_PERFORMANCE_INFO with the state performance variable 
>> name to get the appropriate info you need (specifically datatype/count).
>> c.  Call MPIT_PERFORMANCE_GETHANDLE on the state performance variable 
>> name.
>> d.  Call MPIT_PERFORMANCE_START on the state performance handle to 
>> activate the use of the state code.
>> e.  Periodically call MPIT_PERFORMANCE_READ using the state 
>> performance handle while the MPI code is running
>> f.  Once the MPI code finishes call MPIT_PERFORMANCE_FREEHANDLE
>>
>> I guess it isn't that bad we're just adding 4 additional calls which 
>> are necessary in order to have more than one performance variable.  
>> Just seems like a lot on paper (in the spec).
>
> Not sure how to change this - the first two allow the implementation to
> specify what it wants to deliver, the gethandle step allows us to optimize
> the read access without requiring full handles for each variable, which
> could be prohibitive for some implementations, and start/read is the
> typical semantics as in PAPI. Note, though, that a) is optional and can
> be skipped if you know the name (e.g., through a command line
> parameter) beforehand. In this case, b) could also be skipped if you
> are sure you know the type and count.

It's also a one-time overhead at startup, so that's not so bad.
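
For illustration, steps a-f above can be sketched against a toy stand-in
for the library. Everything in the sketch is an assumption: the MockMpit
class, the PerfVar record, the method names (modelled loosely on the draft
MPIT_PERFORMANCE_* calls), and their signatures are hypothetical and are
not taken from the draft chapter.

```python
from dataclasses import dataclass


@dataclass
class PerfVar:
    """Hypothetical record for one performance variable."""
    name: str
    datatype: str
    count: int
    value: int = 0
    started: bool = False


class MockMpit:
    """Toy stand-in for an MPI library; names and signatures are assumed."""

    def __init__(self, variables):
        self._vars = {v.name: v for v in variables}
        self._handles = {}
        self._next_handle = 0

    def performance_query(self, index):
        # Return the variable name at this index, or None past the end.
        names = list(self._vars)
        return names[index] if index < len(names) else None

    def performance_info(self, name):
        var = self._vars[name]
        return var.datatype, var.count

    def performance_gethandle(self, name):
        handle = self._next_handle
        self._next_handle += 1
        self._handles[handle] = self._vars[name]
        return handle

    def performance_start(self, handle):
        self._handles[handle].started = True

    def performance_read(self, handle):
        var = self._handles[handle]
        if not var.started:
            raise RuntimeError("variable was not started")
        return var.value

    def performance_freehandle(self, handle):
        del self._handles[handle]


def find_variable(mpit, wanted):
    # Step a: loop on the query call until the wanted name shows up.
    index = 0
    while (name := mpit.performance_query(index)) is not None:
        if name == wanted:
            return name
        index += 1
    raise LookupError(f"no performance variable named {wanted!r}")


def sample_variable(mpit, wanted, reads, name_known=False):
    # Step a is skipped when the tool already knows the name.
    name = wanted if name_known else find_variable(mpit, wanted)
    datatype, count = mpit.performance_info(name)      # step b
    handle = mpit.performance_gethandle(name)          # step c
    mpit.performance_start(handle)                     # step d
    try:
        values = [mpit.performance_read(handle) for _ in range(reads)]  # step e
    finally:
        mpit.performance_freehandle(handle)            # step f
    return datatype, count, values
```

Steps a-d run once per variable, so only the read in step e sits on the
sampling path.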
>
>>
>> From Eugene Loh:
>>
>> Have the MPIT interfaces been prototyped by anyone?  [Yukon Maruyama 
>> asked the
>> same question.  None of us feels confident we can understand the 
>> issues before we see
>> a prototype.]
>
> The interface has changed slightly since the last forum, but there is 
> a dummy
> implementation in the SVN for the older version of the API (a conforming
> implementation that provides no variables). I also have an early 
> prototype on
> top of MVAPICH returning both configuration and performance variables and
> implementing this has indeed led to a few changes in the API. I am also
> talking to the MPICH, the MVAPICH, and the Open MPI groups and all
> three said that they are interested in adding a prototype.
>
>>
>> Since the returned information is so implementation-dependent, it's 
>> hard for me to imagine that there could be (m)any MPIT-based tools 
>> that make sense for multiple MPI implementations.  Meanwhile, if a 
>> tool makes sense for only one MPI implementation, it should be part 
>> of the implementation itself.  In general, I'm still scratching my 
>> head over this MPIT stuff.
>
> I see this as similar to the CPU counters and their use in PAPI. Each CPU 
> has
> its own counters and they mean something different on each platform,
> but they still have been very helpful in measuring and optimizing 
> performance.
> Also, having a unified interface, like PAPI, has been extremely helpful
> for tools and I expect MPI tools to use the MPIT interface in a 
> similar way.
>
> Many tools will simply measure all variables in a class and then 
> report them
> to the user along with the description text returned by MPI. The user 
> will
> then ultimately be responsible for interpreting the results, but such a 
> profiler
> will run across all platforms.
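
A minimal sketch of such an implementation-agnostic profiler follows. The
Variable record, its field names, and the class label "performance" are
hypothetical, chosen only for illustration. The point is that the tool
never interprets a variable; it only passes along the name, value, and
description that the MPI library supplies.

```python
from typing import NamedTuple


class Variable(NamedTuple):
    """Hypothetical view of one variable as reported by the MPI library."""
    name: str
    var_class: str     # assumed grouping label, e.g. "performance"
    description: str   # text supplied by the MPI implementation
    value: int


def report_class(variables, var_class):
    """Format every variable of one class without interpreting it."""
    lines = []
    for var in variables:
        if var.var_class == var_class:
            lines.append(f"{var.name} = {var.value}  ({var.description})")
    return lines
```

The same report_class call works whatever set of variables a given MPI
exposes; only the printed names and descriptions differ.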
>
> Further, several people have suggested creating a table with a set of
> names that could be used across implementations. The idea would be
> to describe concepts that are present in most MPIs and then 
> implementations
> would be required to name their variable with the common name, if it
> offers such a variable (but it would not be required to offer it if it 
> doesn't
> support that type of information). Such an extension table could be in
> the MPI standard or in a support layer (like PAPI) outside of MPI.
>
>>
>> And some nits:
>> page 6, line 47:
>>
>>  All identifier covered by this interface carry
>> ->
>>  All identifiers covered by this interface carry
>
> fixed.
>
>>
>> page 7, line 1:
>>
>>  all conventions and principle governing
>> ->
>>  all conventions and principles governing
>
> fixed
>
>>
>> page 7, line 33:
>>
>>  must contain exactly one call to the MPIT initialization routine
>> ->
>>  must execute exactly one call to the MPIT initialization routine
>
> I agree, this would sound better. I took this text from the other parts
> of the MPI standard on MPI_Init, though, for consistency.
>
>>
>> page 8, lines 17 and 26:
>>
>>  before MPIT_INIT and after MPIT_FINALIZE
>> ->
>>  before MPIT_INIT or after MPIT_FINALIZE
>
> fixed
>
>>
>> page 12, line 42:  The table of legal "scope" values seems to be 
>> misplaced to the top of page 13.
>
> I changed this to a reference to the table.
>
>>
>> page 12, line 44:
>>
>>  it is not a guarantee that can be changed.
>> Is there a typo in that line?
>> [Marty]  I'm not quite sure what this line is saying.  Does it mean 
>> that even if the
>> scope says that a variable is changeable, the call to change it is 
>> not guaranteed to
>> succeed?  If that's so, what can the tool implementor rely on?
>
> Yes, that's what it means - the call can fail and a tool should check 
> for it and
> must be able to react appropriately. It is expected that a failure is 
> rare (e.g.,
> because a certain resource is busy or the MPI is currently handling 
> something
> that prevents it from executing the variable access). However, if the 
> scope
> says that it can't be changed at all or after Init, then the tool 
> knows that it
> is hopeless to even try.

This is problematic.  Sure, any call may fail, but it's not clear from 
the spec what
the implementor is supposed to do with the failure.  Retry immediately?  
Loop
until it goes through?
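
To make the question concrete, here is one policy a tool could adopt:
check the scope first, then retry a bounded number of times. This is only
a sketch of a possible tool-side choice, not something the draft
prescribes. The scope labels, the write callable, and the attempt and
delay parameters are all hypothetical.

```python
import time

# Hypothetical scope labels; the draft's actual table may differ.
SCOPE_READONLY = "readonly"
SCOPE_BEFORE_INIT = "before_init"
SCOPE_ANYTIME = "anytime"


def try_set_variable(write, scope, initialized, attempts=3, delay=0.0):
    """Return True if the write went through, False otherwise.

    write is a hypothetical callable that returns True on success.
    """
    # The scope rules out some writes entirely, so do not even try.
    if scope == SCOPE_READONLY:
        return False
    if scope == SCOPE_BEFORE_INIT and initialized:
        return False
    # Otherwise a failure may be transient: retry, but only a few times.
    for attempt in range(attempts):
        if write():
            return True
        if attempt + 1 < attempts:
            time.sleep(delay)
    return False
```

A bounded loop keeps the tool from stalling when the library keeps
refusing, and the False return lets the caller fall back or report.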

>
> This semantics is not unusual - any access can fail any time if the MPI
> library decides to return an error. Hence, perhaps this sentence is
> not necessary. I'll take it out.

Ok, thanks.

Talk to you tomorrow.