author     Ian Wienand <iwienand@redhat.com>  2018-11-27 12:01:45 +1100
committer  Ian Wienand <iwienand@redhat.com>  2018-11-27 20:25:04 +1100
commit     18fb9ec37eae7fdf6c89eb17a7c0e4b897db4d49 (patch)
tree       f64330c48cde011e807deea1a8222f7ff5cb119d
parent     ac168d1e407c2ce7f8c33641be69e73c31a88e9c (diff)
Add gearman stats reference
The stats emitted under zuul.geard are currently undocumented.  Add
them to the monitoring guide, and add some more details to the geard
troubleshooting guide on what to do if the stats look wrong.

Change-Id: I831def2f7c22d8ffff62569cc7d657033a85ed19
Notes (review):
    Code-Review+2: Tobias Henkel <tobias.henkel@bmw.de>
    Code-Review+2: Monty Taylor <mordred@inaugust.com>
    Workflow+1: Monty Taylor <mordred@inaugust.com>
    Verified+2: Zuul
    Submitted-by: Zuul
    Submitted-at: Fri, 30 Nov 2018 07:57:14 +0000
    Reviewed-on: https://review.openstack.org/620192
    Project: openstack-infra/zuul
    Branch: refs/heads/master
-rw-r--r--  doc/source/admin/monitoring.rst       | 43
-rw-r--r--  doc/source/admin/troubleshooting.rst  | 39
2 files changed, 75 insertions, 7 deletions
diff --git a/doc/source/admin/monitoring.rst b/doc/source/admin/monitoring.rst
index 1452a27..4487d18 100644
--- a/doc/source/admin/monitoring.rst
+++ b/doc/source/admin/monitoring.rst
@@ -264,7 +264,10 @@ These metrics are emitted by the Zuul :ref:`scheduler`:
    .. stat:: current_requests
       :type: gauge
 
-      The number of outstanding nodepool requests from Zuul.
+      The number of outstanding nodepool requests from Zuul.  Ideally
+      this will be at zero, meaning all requests are fulfilled.
+      Persistently high values indicate more testing node resources
+      would be helpful.
 
 .. stat:: zuul.mergers
 
@@ -283,7 +286,9 @@ These metrics are emitted by the Zuul :ref:`scheduler`:
    .. stat:: jobs_queued
       :type: gauge
 
-      The number of merge jobs queued.
+      The number of merge jobs waiting for a merger.  This should
+      ideally be zero; persistently high values indicate more merger
+      resources would be useful.
 
 .. stat:: zuul.executors
 
@@ -307,8 +312,40 @@ These metrics are emitted by the Zuul :ref:`scheduler`:
    .. stat:: jobs_queued
       :type: gauge
 
-      The number of executor jobs queued.
+      The number of jobs allocated nodes, but queued waiting for an
+      executor to run on.  This should ideally be at zero; persistently
+      high values indicate more executor resources would be useful.
 
+.. stat:: zuul.geard
+
+   Gearman job distribution statistics.  Gearman jobs encompass the
+   wide variety of distributed jobs running within the scheduler and
+   across mergers and executors.  These stats are emitted by the `gear
+   <https://pypi.org/project/gear/>`__ library.
+
+   .. stat:: running
+      :type: gauge
+
+      Jobs that Gearman has actively running.  The longest-running
+      jobs will usually relate to active job execution, so you would
+      expect this to have a lower bound around the count of executing
+      jobs.  Note this may be lower than the number of active nodes,
+      as a multiple-node job will only have one active Gearman job.
+
+   .. stat:: waiting
+      :type: gauge
+
+      Jobs waiting in the Gearman queue.  This would be expected to be
+      around zero; note that this is *not* related to the backlogged
+      queue of jobs waiting for a node allocation (node allocations
+      are handled via Zookeeper).  If this is unexpectedly high, see
+      :ref:`debug_gearman` for queue debugging tips to find out which
+      particular function calls are waiting.
+
+   .. stat:: total
+      :type: gauge
+
+      The sum of the `running` and `waiting` jobs.
 
 As an example, given a job named `myjob` in `mytenant` triggered by a
 change to `myproject` on the `master` branch in the `gate` pipeline
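
The new ``zuul.geard`` gauges land in whatever statsd/Graphite backend
the scheduler is configured to emit to.  As a rough, hypothetical sketch
of how the ``waiting`` gauge could be watched, the following Python
snippet polls a Graphite render API; the Graphite URL, the
``stats.gauges.`` prefix and the one-hour window are assumptions that
depend on your statsd and Graphite configuration, not anything defined
by this change::

  # Hypothetical monitoring sketch: warn if zuul.geard.waiting has been
  # non-zero for the whole sampling window.  Adjust GRAPHITE_URL and
  # TARGET for your deployment; the statsd prefix is an assumption.
  import json
  import urllib.request

  GRAPHITE_URL = "http://graphite.example.com"      # assumed Graphite host
  TARGET = "stats.gauges.zuul.geard.waiting"        # assumed statsd prefix

  def recent_waiting(minutes=60):
      """Return (timestamp, value) points for the last `minutes` minutes."""
      url = ("%s/render?target=%s&from=-%dmin&format=json"
             % (GRAPHITE_URL, TARGET, minutes))
      with urllib.request.urlopen(url) as resp:
          series = json.load(resp)
      if not series:
          return []
      # Graphite returns datapoints as [value, timestamp]; drop null gaps.
      return [(ts, val) for val, ts in series[0]["datapoints"]
              if val is not None]

  if __name__ == "__main__":
      points = recent_waiting()
      if points and all(val > 0 for _, val in points):
          print("zuul.geard.waiting has been non-zero for an hour; "
                "see the Gearman troubleshooting section below.")
      else:
          print("Gearman queue looks healthy.")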
diff --git a/doc/source/admin/troubleshooting.rst b/doc/source/admin/troubleshooting.rst
index a4468e7..9edf0e2 100644
--- a/doc/source/admin/troubleshooting.rst
+++ b/doc/source/admin/troubleshooting.rst
@@ -1,10 +1,41 @@
 Troubleshooting
 ---------------
 
-You can use telnet to connect to gearman to check which Zuul
-components are online::
+Some advanced troubleshooting options are provided below.  These are
+generally very low-level and are not normally required.
+
+.. _debug_gearman:
+
+Gearman Jobs
+============
+
+Connecting to Gearman allows you to see whether any Zuul components
+appear not to be accepting requests correctly.
+
+For unencrypted Gearman connections, you can use telnet to connect
+and check which Zuul components are online::
 
   telnet <gearman_ip> 4730
 
-Useful commands are ``workers`` and ``status`` which you can run by just
-typing those commands once connected to gearman.
+For encrypted connections, you will need to provide suitable keys,
+e.g.::
+
+  openssl s_client -connect localhost:4730 -cert /etc/zuul/ssl/client.pem -key /etc/zuul/ssl/client.key
+
+The available commands are described in the Gearman `administrative
+protocol <http://gearman.org/protocol>`__.  Useful commands are
+``workers`` and ``status``, which you can run by simply typing them
+once connected to Gearman.
+
+For ``status`` you will see output for internal Zuul functions in the
+form ``FUNCTION\tTOTAL\tRUNNING\tAVAILABLE_WORKERS``::
+
+  ...
+  executor:resume:ze06.openstack.org 0 0 1
+  zuul:config_errors_list 0 0 1
+  zuul:status_get 0 0 1
+  executor:stop:ze11.openstack.org 0 0 1
+  zuul:job_list 0 0 1
+  zuul:tenant_sql_connection 0 0 1
+  executor:resume:ze09.openstack.org 0 0 1
+  ...
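
The ``status`` output shown above can also be collected
programmatically.  This is a minimal sketch, not part of the change: it
speaks the same plain-text administrative protocol over an unencrypted
connection and flags functions with a backlog or with no registered
workers.  The ``localhost:4730`` endpoint is an assumption; encrypted
geard setups would need a TLS-wrapped socket using the client
certificate and key from the ``openssl s_client`` example instead::

  # Hypothetical helper: issue the Gearman administrative "status" command
  # and report functions whose jobs are queued or which have no workers.
  import socket

  def gearman_status(host="localhost", port=4730):
      """Return {function: (total, running, available_workers)}."""
      data = b""
      with socket.create_connection((host, port), timeout=10) as sock:
          sock.sendall(b"status\n")
          # The administrative protocol ends the response with a line
          # containing a single ".".
          while not (data.endswith(b"\n.\n") or data == b".\n"):
              chunk = sock.recv(4096)
              if not chunk:
                  break
              data += chunk
      rows = {}
      for line in data.decode().splitlines():
          if line == ".":
              break
          func, total, running, workers = line.split("\t")
          rows[func] = (int(total), int(running), int(workers))
      return rows

  if __name__ == "__main__":
      for func, (total, running, workers) in sorted(gearman_status().items()):
          queued = total - running
          if queued or not workers:
              print("%s: queued=%d running=%d workers=%d"
                    % (func, queued, running, workers))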