<html><head></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; ">The proliferation of error classes is why I question the decision to not use the code/class structure that has worked so well for the rest of MPI (and requiring support of enough routines to decode error codes, including more detailed error strings, would have been a small change to most or all implementations). The error code system was designed to provide a mechanism for detailed error reporting without creating a zillion error classes.<div><br></div><div>Bill</div><div><br><div>
<span class="Apple-style-span" style="border-collapse: separate; color: rgb(0, 0, 0); font-family: Helvetica; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-align: -webkit-auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-border-horizontal-spacing: 0px; -webkit-border-vertical-spacing: 0px; -webkit-text-decorations-in-effect: none; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; font-size: medium; "><span class="Apple-style-span" style="border-collapse: separate; color: rgb(0, 0, 0); font-family: Helvetica; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-border-horizontal-spacing: 0px; -webkit-border-vertical-spacing: 0px; -webkit-text-decorations-in-effect: none; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; font-size: medium; "><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><div><div style="font-size: 12px; ">William Gropp</div><div style="font-size: 12px; ">Director, Parallel Computing Institute</div><div style="font-size: 12px; ">Deputy Director for Research</div><div style="font-size: 12px; ">Institute for Advanced Computing Applications and Technologies</div></div></div></span><span class="Apple-style-span" style="font-size: 12px; ">Thomas M. 
Siebel Chair in Computer Science</span><span class="Apple-style-span" style="border-collapse: separate; color: rgb(0, 0, 0); font-family: Helvetica; font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-border-horizontal-spacing: 0px; -webkit-border-vertical-spacing: 0px; -webkit-text-decorations-in-effect: none; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; font-size: medium; "><div style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><div><div style="font-size: 12px; ">University of Illinois Urbana-Champaign</div></div><div><br></div></div></span><br class="Apple-interchange-newline"></span><br class="Apple-interchange-newline">
</div>
<br><div><div>On Sep 25, 2013, at 9:09 AM, Junchao Zhang wrote:</div><br class="Apple-interchange-newline"><blockquote type="cite"><meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"><div dir="ltr">I agree. <div>But I think a better error code name is MPI_T_ERR_PVAR_WATERMARK_NOTSTARTED.<div>If you remember an earlier problem I reported, "read a never started continuous pvar", we should also have an MPI_T_ERR_PVAR_NEVERSTARTED.</div>
<div class="gmail_extra">
<br clear="all"><div><div dir="ltr">--Junchao Zhang</div></div>
<br><br><div class="gmail_quote">On Tue, Sep 24, 2013 at 6:50 PM, Martin Schulz <span dir="ltr"><<a href="mailto:schulzm@llnl.gov" target="_blank">schulzm@llnl.gov</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div style="word-wrap:break-word"><br><div><div><div>On Sep 19, 2013, at 11:24 AM, Junchao Zhang <<a href="mailto:jczhang@mcs.anl.gov" target="_blank">jczhang@mcs.anl.gov</a>> wrote:</div><br><blockquote type="cite">
<div dir="ltr">For a running (i.e., started) watermark, it is reasonable to return the starting value.<div>But for a stopped one, it is strange to do a read and return what is read. </div></div></blockquote><div><br></div>
</div><div>Yes, I agree - I think we are running into a strange case here where definition and intended use don't quite match.</div><div><br></div><div>Let's consider a watermark on a particular resource with values changing as follows:</div>
<div><br></div><div><span style="white-space:pre-wrap"> </span>30</div><div><span style="white-space:pre-wrap"> </span>60</div><div>RESET</div><div><span style="white-space:pre-wrap"> </span>60</div><div><span style="white-space:pre-wrap"> </span>20</div>
<div><span style="white-space:pre-wrap"> </span>70</div><div><span style="white-space:pre-wrap"> </span>20</div><div>READ(1)</div><div><span style="white-space:pre-wrap"> </span>20</div><div><span style="white-space:pre-wrap"> </span>30</div>
<div>START</div><div><span style="white-space:pre-wrap"> </span>30</div><div><span style="white-space:pre-wrap"> </span>40</div><div><span style="white-space:pre-wrap"> </span>50</div><div><span style="white-space:pre-wrap"> </span>35</div>
<div><span style="white-space:pre-wrap"> </span>45</div><div>READ(2)</div><div><span style="white-space:pre-wrap"> </span>45</div><div><span style="white-space:pre-wrap"> </span>40</div><div>STOP</div><div><span style="white-space:pre-wrap"> </span>40</div>
<div><span style="white-space:pre-wrap"> </span>100</div><div>READ(3)</div><div><span style="white-space:pre-wrap"> </span>100</div><div><br></div><div>Intuitively, as Kathryn also described, you want the watermark inside the start/stop region, i.e., READ(2) should return 50. Even more important, READ(3) should also return 50, since this was the watermark inside the start/stop region. This requires, though, that the starting value is applied at START. If we apply it at RESET, the value at READ(2) is 60, which doesn't make sense at all (in particular given the peak of 70 in between), or it would be 70 if we continue updating between RESET and START, which also doesn't make sense.</div>
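<div><br></div><div>To make the three outcomes concrete, the trace above can be replayed under each reading of "starting value". This is only a toy model in Python, not the MPI_T API: the function name replay and the policy names at_start, at_reset_frozen and at_reset_live are made up for illustration, and None stands in for "no value defined yet", which is exactly the open question for READ(1).</div>
<pre>
```python
# Martin's trace: integers are utilization samples, strings are tool actions.
TRACE = [30, 60, "RESET", 60, 20, 70, 20, "READ(1)", 20, 30,
         "START", 30, 40, 50, 35, 45, "READ(2)", 45, 40,
         "STOP", 40, 100, "READ(3)", 100]


def replay(trace, policy):
    """Return {read label: reported high watermark} for one policy.

    Hypothetical policies (names are assumptions, not from the standard):
      at_start        - starting value is taken at START
      at_reset_frozen - starting value is taken at RESET, no updates until START
      at_reset_live   - starting value is taken at RESET, updates continue
    """
    mark = None        # reported watermark; None means "no value defined yet"
    current = None     # most recent utilization sample
    tracking = False   # True while samples are allowed to raise the mark
    reads = {}
    for event in trace:
        if isinstance(event, int):
            current = event
            if tracking:
                mark = current if mark is None else max(mark, current)
        elif event == "RESET":
            if policy == "at_start":
                mark = None                 # stays undefined until START
            else:
                mark = current              # starting value taken at RESET
                tracking = (policy == "at_reset_live")
        elif event == "START":
            if policy == "at_start":
                mark = current              # starting value taken at START
            tracking = True
        elif event == "STOP":
            tracking = False
        else:                               # one of the READ(n) labels
            reads[event] = mark
    return reads


for policy in ("at_start", "at_reset_frozen", "at_reset_live"):
    print(policy, replay(TRACE, policy))
```
</pre>
<div>Under at_start the model reports 50 for both READ(2) and READ(3) and has nothing to report for READ(1). The two RESET-based policies report 60 and 70 at READ(2), matching the values discussed above.</div>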
<div><br></div><div>So what should READ(1) return if we keep the watermark completely turned off until we reach START? Perhaps we need a new error code NOTSTARTED?</div><span><font color="#888888"><div><br></div><div>
Martin</div></font></span><div><div><div><br></div><div><br></div><div><br></div><div><br></div><br><blockquote type="cite"><div class="gmail_extra"><br clear="all">
<div><div dir="ltr">--Junchao Zhang</div></div>
<br><br><div class="gmail_quote">On Thu, Sep 19, 2013 at 11:03 AM, Kathryn Mohror <span dir="ltr"><<a href="mailto:kathryn@llnl.gov" target="_blank">kathryn@llnl.gov</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div style="word-wrap:break-word"><div>Hi Junchao,</div><div><div><br><blockquote type="cite"><div dir="ltr"><div><span style="font-family:arial,sans-serif;font-size:13px"><br></span></div>
<span style="font-family:arial,sans-serif;font-size:13px">Also, for a stopped pvar, after reset and before restarting, what does a pvar_read return?</span><div style="font-family:arial,sans-serif;font-size:13px">Returning zero sounds good for counters? What about watermarks? Old value, garbage value or MPI_T_ERROR_XXX? I would choose ERROR.</div>
<div style="font-family:arial,sans-serif;font-size:13px">The side-effect is that it makes resetting pvars not beautiful.</div></div></blockquote><div><br></div></div>In my interpretation, it returns the starting value of the variable as defined according to the variable class. So, for watermarks, it would be the current value at the time of the reset. I can imagine a scenario where you want to know what the starting value of a variable is for some reason, so you wouldn't want it to be erroneous for a tool to read a non-started variable.</div>
<div><br></div><div>Do others agree with this?</div><span><font color="#888888"><div><br></div><div>Kathryn</div></font></span><div><div><div><br><blockquote type="cite"><div dir="ltr"><div class="gmail_extra">
<br clear="all"><div><div dir="ltr">--Junchao Zhang</div></div><br><div class="gmail_quote">
On Thu, Sep 19, 2013 at 12:32 AM, Martin Schulz <span dir="ltr"><<a href="mailto:schulzm@llnl.gov" target="_blank">schulzm@llnl.gov</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex">
<div style="word-wrap:break-word">Hmm, that is a good catch. I agree with Kathryn's interpretation - in particular the use case she is laying out. If one does:<div><br></div><div>Reset</div><div>Start</div><div>Stop</div>
<div><br></div><div>You want the watermark from that interval, i.e., the starting value as of the start call should be the right thing. This is something we definitely should clarify.</div><div><br></div><div>Thanks,</div>
<div><br></div><div>Martin</div><div><br></div><div><br></div><div><br><div><div>On Sep 18, 2013, at 8:33 PM, Kathryn Mohror <<a href="mailto:kathryn@llnl.gov" target="_blank">kathryn@llnl.gov</a>></div><div> wrote:</div>
<div><br><blockquote type="cite">
<div style="word-wrap:break-word">Hi Junchao,<div><br><div><blockquote type="cite"><div dir="ltr"><div> What is the right behavior when resetting a stopped pvar? The standard says setting to its starting value.</div><div>
For counters, timers etc, setting them to zero sounds reasonable.</div><div>
But for a watermark, setting it to "the current utilization level" looks weird. It implies that a value caught during the stopped period can affect its future value when the pvar is re-started.</div><div> Probably, we should reset a stopped watermark to a state as if it has never been started.</div>
<div> Any comments? Thanks<br></div></div></blockquote><div><br></div>Hmm. It makes sense to me, but I'll let others chime in if they disagree. I think that the moment you start the watermark variable, you want to know what the "mark" is, so it would be the value of current utilization. So even if a higher (or lower) value is caught during the stopped period (which it shouldn't be, because variables aren't supposed to be updated when stopped), it will be set to the current utilization value when started. I interpret this as being able to measure the watermark during different epochs of the program execution. Every time you start the variable, it's a fresh epoch and you want to know what the watermark was during that epoch.</div>
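<div><br></div><div>The epoch interpretation can be sketched as a small toy class. This is not the MPI_T API: the class HighWatermark and its methods are hypothetical, and the utilization numbers are made up. It only illustrates that each start begins a fresh epoch seeded with the current utilization, and that samples seen while stopped never affect the mark.</div>
<pre>
```python
class HighWatermark:
    """Toy model of a high-watermark variable; not the MPI_T API."""

    def __init__(self):
        self.running = False
        self.mark = None   # None until the variable has been started once

    def start(self, current):
        # A fresh epoch begins: the mark is the utilization at this moment.
        self.running = True
        self.mark = current

    def stop(self):
        self.running = False

    def sample(self, current):
        # Samples seen while stopped never change the mark.
        if self.running:
            self.mark = max(self.mark, current)

    def read(self):
        return self.mark


wm = HighWatermark()

# First epoch: starts at 10, peaks at 25.
wm.start(10)
for level in (25, 15):
    wm.sample(level)
wm.stop()
first_epoch = wm.read()

# A spike while stopped is ignored; the second epoch starts at 5, peaks at 12.
wm.sample(90)
wm.start(5)
wm.sample(12)
wm.stop()
second_epoch = wm.read()

print(first_epoch, second_epoch)
```
</pre>
<div>Here the first epoch reads 25 and the second reads 12. The spike of 90 during the stopped period does not leak into either epoch.</div>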
<div><br></div><div>However, I can see how this isn't as clear as it could be -- I'll try to see what we can do to clarify it in the text.</div><div><br></div><div>Thanks again for taking the time to give us this feedback.</div>
<div><br></div><div>Kathryn</div><div><br></div><div><br><blockquote type="cite"><div dir="ltr"><div><div><div dir="ltr">--Junchao Zhang</div></div>
</div>
</div>
_______________________________________________<br>mpiwg-tools mailing list<br><a href="mailto:mpiwg-tools@lists.mpi-forum.org" target="_blank">mpiwg-tools@lists.mpi-forum.org</a><br><a href="http://lists.mpi-forum.org/mailman/listinfo.cgi/mpiwg-tools" target="_blank">http://lists.mpi-forum.org/mailman/listinfo.cgi/mpiwg-tools</a></blockquote>
</div><br><div>
<div style="font-family:Helvetica;font-size:medium;font-style:normal;font-variant:normal;font-weight:normal;letter-spacing:normal;line-height:normal;text-align:-webkit-auto;text-indent:0px;text-transform:none;white-space:normal;word-spacing:0px;word-wrap:break-word">
<div>______________________________________________________________<br>Kathryn Mohror, <a href="mailto:kathryn@llnl.gov" target="_blank">kathryn@llnl.gov</a>, <a href="http://people.llnl.gov/mohror1" target="_blank">http://people.llnl.gov/mohror1</a><br>
CASC @ Lawrence Livermore National Laboratory, Livermore, CA, USA</div><div><br></div></div><br><br>
</div>
<br></div></div></blockquote>
</div></div><br><div>
<div>________________________________________________________________________<br>Martin Schulz, <a href="mailto:schulzm@llnl.gov" target="_blank">schulzm@llnl.gov</a>, <a href="http://people.llnl.gov/schulzm" target="_blank">http://people.llnl.gov/schulzm</a><br>
CASC @ Lawrence Livermore National Laboratory, Livermore, USA</div>
</div>
<br></div></div><br></blockquote></div><br></div></div>
</blockquote>
</div><br><div>
<div style="text-indent:0px;letter-spacing:normal;font-variant:normal;text-align:-webkit-auto;font-style:normal;font-weight:normal;line-height:normal;text-transform:none;font-size:medium;white-space:normal;font-family:Helvetica;word-wrap:break-word;word-spacing:0px">
<div>______________________________________________________________<br>Kathryn Mohror, <a href="mailto:kathryn@llnl.gov" target="_blank">kathryn@llnl.gov</a>, <a href="http://people.llnl.gov/mohror1" target="_blank">http://people.llnl.gov/mohror1</a><br>
CASC @ Lawrence Livermore National Laboratory, Livermore, CA, USA</div><div><br></div></div><br><br>
</div>
<br></div></div></div><br></blockquote></div><br></div>
</blockquote>
</div></div></div><div><div><br><div>
<div>________________________________________________________________________<br>Martin Schulz, <a href="mailto:schulzm@llnl.gov" target="_blank">schulzm@llnl.gov</a>, <a href="http://people.llnl.gov/schulzm" target="_blank">http://people.llnl.gov/schulzm</a><br>
CASC @ Lawrence Livermore National Laboratory, Livermore, USA</div>
</div>
<br></div></div></div><br></blockquote></div><br></div></div></div>
</blockquote></div><br></div></body></html>