[mpiwg-tools] reset a stopped pvar

Junchao Zhang jczhang at mcs.anl.gov
Wed Sep 25 09:09:31 CDT 2013


I agree.
But I think a better error code name is MPI_T_ERR_PVAR_WATERMARK_NOTSTARTED.
If you remember an earlier problem I reported, "read a never-started
continuous pvar", we should also have an MPI_T_ERR_PVAR_NEVERSTARTED.
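
To make the comparison concrete, here is a rough sketch that replays the
trace from Martin's mail (quoted below) under the three possible rules for
when the starting value is taken. It is only a toy model in Python, not
MPI_T code: the HighWatermark class, the Policy names and the NotStarted
exception are all made up for illustration, and NotStarted merely stands in
for the proposed (not yet existing) NOTSTARTED error code.

```python
from enum import Enum


class Policy(Enum):
    INIT_AT_START = 1   # starting value is taken at START; off until then
    INIT_AT_RESET = 2   # starting value is taken at RESET; frozen while stopped
    ALWAYS_UPDATE = 3   # starting value is taken at RESET; updated even when stopped


class NotStarted(Exception):
    """Hypothetical stand-in for the proposed NOTSTARTED error code."""


class HighWatermark:
    """Toy model of a high-watermark pvar (illustrative only, not MPI_T)."""

    def __init__(self, policy):
        self.policy = policy
        self.current = None   # latest utilization level of the resource
        self.mark = None      # recorded watermark; None means "no value yet"
        self.running = False

    def observe(self, value):
        # The resource level changes; the mark follows only if allowed to.
        self.current = value
        updating = self.running or self.policy is Policy.ALWAYS_UPDATE
        if self.mark is not None and updating:
            self.mark = max(self.mark, value)

    def reset(self):
        if self.policy is Policy.INIT_AT_START and not self.running:
            self.mark = None            # as if it had never been started
        else:
            self.mark = self.current    # starting value = current level

    def start(self):
        self.running = True
        if self.policy is Policy.INIT_AT_START or self.mark is None:
            self.mark = self.current    # fresh epoch begins here

    def stop(self):
        self.running = False

    def read(self):
        if self.mark is None:
            raise NotStarted("watermark has no value before START")
        return self.mark


# The trace from Martin's mail: numbers are resource levels, strings are calls.
TRACE = [30, 60, "RESET", 60, 20, 70, 20, "READ", 20, 30, "START",
         30, 40, 50, 35, 45, "READ", 45, 40, "STOP", 40, 100, "READ", 100]


def replay(policy):
    """Return the results of READ(1), READ(2), READ(3) under a policy."""
    wm = HighWatermark(policy)
    reads = []
    for event in TRACE:
        if event == "RESET":
            wm.reset()
        elif event == "START":
            wm.start()
        elif event == "STOP":
            wm.stop()
        elif event == "READ":
            try:
                reads.append(wm.read())
            except NotStarted:
                reads.append("NOTSTARTED")
        else:
            wm.observe(event)
    return reads


if __name__ == "__main__":
    for policy in Policy:
        print(policy.name, replay(policy))
```

In this model only INIT_AT_START gives 50 for both READ(2) and READ(3), and
it is also the only rule under which READ(1) has nothing sensible to return,
which is where the new error code would come in.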

--Junchao Zhang


On Tue, Sep 24, 2013 at 6:50 PM, Martin Schulz <schulzm at llnl.gov> wrote:

>
> On Sep 19, 2013, at 11:24 AM, Junchao Zhang <jczhang at mcs.anl.gov> wrote:
>
> For a running (i.e., started) watermark, it is reasonable to return the
> starting value.
> But for a stopped one, it is strange to do a read and return what is read.
>
>
> Yes, I agree - I think we are running into a strange case here where
> definition and intended use don't quite match.
>
> Let's consider a watermark on a particular resource with values changing
> as follows:
>
> 30
> 60
> RESET
> 60
> 20
> 70
> 20
> READ(1)
> 20
> 30
> START
> 30
> 40
> 50
> 35
> 45
> READ(2)
> 45
> 40
> STOP
> 40
> 100
> READ(3)
> 100
>
> Intuitively, as Kathryn also described, you want the watermark inside the
> start/stop region, i.e., READ(2) should return 50. Even more important,
> READ(3) should return 50, since this was the watermark inside the
> start/stop region. This requires, though, that the starting value is
> applied at START - if we do it at RESET, the final value at READ(2) is 60,
> which doesn't make sense at all (in particular due to the peak of 70 in
> between), or it would be 70 if you continue updating between RESET and
> START, which also doesn't make sense.
>
> So what should READ(1) return if we keep it completely turned off until we
> reach START? Perhaps we need a new error code NOTSTARTED?
>
> Martin
>
> --Junchao Zhang
>
>
> On Thu, Sep 19, 2013 at 11:03 AM, Kathryn Mohror <kathryn at llnl.gov> wrote:
>
>> Hi Junchao,
>>
>>
>> Also, for a stopped pvar, after reset and before restarting, what does a
>> pvar_read return?
>> Returning zero sounds good for counters? What about watermarks? Old
>> value, garbage value or MPI_T_ERROR_XXX? I would choose ERROR.
>> The side effect is that it makes resetting pvars less elegant.
>>
>>
>> In my interpretation, it returns the starting value of the variable as
>> defined according to the variable class. So, for watermarks, it would be
>> the current value at the time of the reset. I can imagine a scenario where
>> you want to know what the starting value of a variable is for some reason,
>> so you wouldn't want it to be erroneous for a tool to read a non-started
>> variable.
>>
>> Do others agree with this?
>>
>> Kathryn
>>
>>
>> --Junchao Zhang
>>
>> On Thu, Sep 19, 2013 at 12:32 AM, Martin Schulz <schulzm at llnl.gov> wrote:
>>
>>> Hmm, that is a good catch. I agree with Kathryn's interpretation - in
>>> particular the use case she is laying out. If one does:
>>>
>>> Reset
>>> Start
>>> Stop
>>>
>>> You want the watermark from that interval, i.e., the starting value as
>>> of the start call should be the right thing. This is something we
>>> definitely should clarify.
>>>
>>> Thanks,
>>>
>>> Martin
>>>
>>>
>>>
>>> On Sep 18, 2013, at 8:33 PM, Kathryn Mohror <kathryn at llnl.gov>
>>>  wrote:
>>>
>>> Hi Junchao,
>>>
>>>   What is the right behavior when resetting a stopped pvar? The standard
>>> says to set it to its starting value.
>>>   For counters, timers, etc., setting them to zero sounds reasonable.
>>>   But for a watermark, setting it to "the current utilization level"
>>> looks weird. It implies that a value caught during the stopped period can
>>> affect its future value when the pvar is re-started.
>>>   Probably, we should reset a stopped watermark to a state as if it has
>>> never been started.
>>>   Any comments?  Thanks
>>>
>>>
>>> Hmm. It makes sense to me, but I'll let others chime in if they
>>> disagree. I think that the moment you start the watermark variable, you
>>> want to know what the "mark" is, so it would be the value of current
>>> utilization. So even if a higher (or lower) value is caught during the
>>> stopped period (which it shouldn't be, because variables aren't supposed to
>>> be updated when stopped), it will be set to the current utilization value
>>> when started. I interpret this as being able to measure the watermark
>>> during different epochs of the program execution. Every time you start the
>>> variable, it's a fresh epoch and you want to know what the watermark was
>>> during that epoch.
>>>
>>> However, I can see how this isn't as clear as it could be -- I'll try to
>>> see what we can do to clarify it in the text.
>>>
>>> Thanks again for taking the time to give us this feedback.
>>>
>>> Kathryn
>>>
>>>
>>> --Junchao Zhang
>>>  _______________________________________________
>>> mpiwg-tools mailing list
>>> mpiwg-tools at lists.mpi-forum.org
>>> http://lists.mpi-forum.org/mailman/listinfo.cgi/mpiwg-tools
>>>
>>>
>>>  ______________________________________________________________
>>> Kathryn Mohror, kathryn at llnl.gov, http://people.llnl.gov/mohror1
>>> CASC @ Lawrence Livermore National Laboratory, Livermore, CA, USA
>>>
>>> ________________________________________________________________________
>>> Martin Schulz, schulzm at llnl.gov, http://people.llnl.gov/schulzm
>>> CASC @ Lawrence Livermore National Laboratory, Livermore, USA
>>>
>>
>