Bug 186635 - [GTK][buildbot] Raise timeouts for the step benchmark-test on the GTK perf bot
Summary: [GTK][buildbot] Raise timeouts for the step benchmark-test on the GTK perf bot
Status: RESOLVED FIXED
Alias: None
Product: WebKit
Classification: Unclassified
Component: WebKitGTK (show other bugs)
Version: WebKit Nightly Build
Hardware: Unspecified Unspecified
: P2 Normal
Assignee: Carlos Alberto Lopez Perez
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-06-14 16:07 PDT by Carlos Alberto Lopez Perez
Modified: 2018-06-15 00:46 PDT (History)
9 users (show)

See Also:


Attachments
Patch (3.71 KB, patch)
2018-06-14 17:04 PDT, Carlos Alberto Lopez Perez
no flags Details | Formatted Diff | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Carlos Alberto Lopez Perez 2018-06-14 16:07:27 PDT
The step benchmark-test on the GTK perf bot runs all the benchmark plans one by one (one after the other).

Each one of this benchmarks plans have different timeouts.

The problem is that buildbot itself will kill the whole step when no output is produced after a 1200 seconds timeout, but some of the plans for run-benchmark have higher timeouts than 1200... that means that when things go wrong (or the machine/built-product is slower than expected) the buildbot timeout will happen before the plan one.

And when that happens, that means that the whole buildbot step will be stopped instead of having simply one test of the test set failing because its timeout has been reached.

This has been an already an issue on the past, and we already fixed this by growing the buildbot timeout for this step on the past


But it seems now some plans (like motionmark) have grown the plan timeout further, so we should adjust.
Comment 1 Carlos Alberto Lopez Perez 2018-06-14 16:10:36 PDT
Example of the issue:

* Build: https://build.webkit.org/builders/GTK%20Linux%2064-bit%20Release%20%28Perf%29/builds/3150
* Stdio log: https://build.webkit.org/builders/GTK%20Linux%2064-bit%20Release%20%28Perf%29/builds/3150/steps/benchmark-test/logs/stdio

2018-06-14 00:22:07,450 - INFO - Copied the benchmark into: /tmp/tmpxeBC7R/motionmark
patching file resources/runner/motionmark.js
2018-06-14 00:22:07,455 - INFO - Start the iteration 1 of 1 for current benchmark
2018-06-14 00:22:07,456 - INFO - Launching an http server
2018-06-14 00:22:07,460 - INFO - Start to fetching the port number of the http server
2018-06-14 00:22:07,460 - INFO - Server port is not found this time, retry after 0.500000 seconds
2018-06-14 00:22:07,964 - INFO - HTTP Server is serving at port: 36079
2018-06-14 00:22:07,981 - INFO - Executing: /home/slave/webkitgtk/gtk-linux-64-release-perf-tests/build/Tools/Scripts/run-minibrowser --gtk --geometry=1024x768 http://127.0.0.1:36079/motionmark/index.html

command timed out: 1200 seconds without output running ['python', './Tools/Scripts/run-benchmark', '--allplans', '--browser', 'minibrowser-gtk'], attempting to kill
process killed by signal 9
program finished with exit code -1
elapsedTime=5473.596568


Buildbot aborts at 1200 seconds without output.. but the motionmark timeout was 1800 according to https://svn.webkit.org/repository/webkit/trunk/Tools/Scripts/webkitpy/benchmark_runner/data/plans/motionmark.plan

so the buildbot one happened before the plan one.. 

we need to avoid this
Comment 2 Carlos Alberto Lopez Perez 2018-06-14 17:04:54 PDT
Created attachment 342779 [details]
Patch
Comment 3 EWS Watchlist 2018-06-14 17:07:05 PDT
Attachment 342779 [details] did not pass style-queue:


ERROR: Tools/BuildSlaveSupport/build.webkit.org-config/factories.py:203:  [BuildAndPerfTestFactory.__init__] Undefined variable 'RunBenchmarkTests'  [pylint/E0602] [5]
ERROR: Tools/BuildSlaveSupport/build.webkit.org-config/factories.py:213:  [DownloadAndPerfTestFactory.__init__] Undefined variable 'RunBenchmarkTests'  [pylint/E0602] [5]
Total errors found: 2 in 2 files


If any of these errors are false positives, please file a bug against check-webkit-style.
Comment 4 Michael Catanzaro 2018-06-14 19:04:24 PDT
Comment on attachment 342779 [details]
Patch

View in context: https://bugs.webkit.org/attachment.cgi?id=342779&action=review

> Tools/BuildSlaveSupport/build.webkit.org-config/steps.py:-831
> -    # Buildbot default timeout without output for a step is 1200.
> -    # The current maximum timeout for a benchmark plan is also 1200.
> -    # So raise the buildbot timeout to avoid aborting this whole step when a test timeouts.
> -    timeout = 1500

Weird that this was added if it has been ignored this whole time... maybe it regressed at some point in buildbot...?
Comment 5 Carlos Alberto Lopez Perez 2018-06-14 19:09:04 PDT
(In reply to Michael Catanzaro from comment #4)
> Comment on attachment 342779 [details]
> Patch
> 
> View in context:
> https://bugs.webkit.org/attachment.cgi?id=342779&action=review
> 
> > Tools/BuildSlaveSupport/build.webkit.org-config/steps.py:-831
> > -    # Buildbot default timeout without output for a step is 1200.
> > -    # The current maximum timeout for a benchmark plan is also 1200.
> > -    # So raise the buildbot timeout to avoid aborting this whole step when a test timeouts.
> > -    timeout = 1500
> 
> Weird that this was added if it has been ignored this whole time... maybe it
> regressed at some point in buildbot...?

I think it never worked.

I guess I just assumed it would, but it didn't ... 

Testing buildbot master changes is kind of difficult because its not easy to access to a test environment.

Now its a bit easier thanks to the script run-buildbot-test.py that was added on r225336 ... I used that now to verify this will work.

But that script was not available back then.
Comment 6 Carlos Alberto Lopez Perez 2018-06-14 19:11:41 PDT
Comment on attachment 342779 [details]
Patch

Clearing flags on attachment: 342779

Committed r232866: <https://trac.webkit.org/changeset/232866>
Comment 7 Carlos Alberto Lopez Perez 2018-06-14 19:11:45 PDT
All reviewed patches have been landed.  Closing bug.