Add gearman stats reference
The stats emitted under zuul.geard are currently undocumented. Add them to the monitoring guide and add some more details to the geard troubleshooting guide for what to do if the stats look wrong.
Change-Id: I831def2f7c22d8ffff62569cc7d657033a85ed19
parent
ac168d1e40
commit
18fb9ec37e
|
@ -264,7 +264,10 @@ These metrics are emitted by the Zuul :ref:`scheduler`:
|
|||
.. stat:: current_requests
|
||||
:type: gauge
|
||||
|
||||
The number of outstanding nodepool requests from Zuul.
|
||||
The number of outstanding nodepool requests from Zuul. Ideally
|
||||
this will be at zero, meaning all requests are fulfilled.
|
||||
Persistently high values indicate more testing node resources
|
||||
would be helpful.
|
||||
|
||||
.. stat:: zuul.mergers
|
||||
|
||||
|
@ -283,7 +286,9 @@ These metrics are emitted by the Zuul :ref:`scheduler`:
|
|||
.. stat:: jobs_queued
|
||||
:type: gauge
|
||||
|
||||
The number of merge jobs queued.
|
||||
The number of merge jobs waiting for a merger. This should
|
||||
ideally be zero; persistent higher values indicate more merger
|
||||
resources would be useful.
|
||||
|
||||
.. stat:: zuul.executors
|
||||
|
||||
|
@ -307,8 +312,40 @@ These metrics are emitted by the Zuul :ref:`scheduler`:
|
|||
.. stat:: jobs_queued
|
||||
:type: gauge
|
||||
|
||||
The number of executor jobs queued.
|
||||
The number of jobs allocated nodes, but queued waiting for an
|
||||
executor to run on. This should ideally be at zero; persistent
|
||||
higher values indicate more executor resources would be useful.
|
||||
|
||||
.. stat:: zuul.geard
|
||||
|
||||
Gearman job distribution statistics. Gearman jobs encompass the
|
||||
wide variety of distributed jobs running within the scheduler and
|
||||
across mergers and executors. These stats are emitted by the `gear
|
||||
<https://pypi.org/project/gear/>`__ library.
|
||||
|
||||
.. stat:: running
|
||||
:type: gauge
|
||||
|
||||
Jobs that Gearman has actively running. The longest running
|
||||
jobs will usually relate to active job execution, so you would
expect this value to be at least roughly the number of jobs
currently being executed. Note this may
|
||||
be lower than active nodes, as a multiple-node job will only
|
||||
have one active Gearman job.
|
||||
|
||||
.. stat:: waiting
|
||||
:type: gauge
|
||||
|
||||
Jobs waiting in the gearman queue. This would be expected to be
|
||||
around zero; note that this is *not* related to the backlogged
|
||||
queue of jobs waiting for a node allocation (node allocations
|
||||
are via Zookeeper). If this is unexpectedly high, see
|
||||
:ref:`debug_gearman` for queue debugging tips to find out which
|
||||
particular function calls are waiting.
|
||||
|
||||
.. stat:: total
|
||||
:type: gauge
|
||||
|
||||
The sum of the `running` and `waiting` jobs.
|
||||
|
||||
As an example, given a job named `myjob` in `mytenant` triggered by a
|
||||
change to `myproject` on the `master` branch in the `gate` pipeline
|
||||
|
|
|
@ -1,10 +1,41 @@
|
|||
Troubleshooting
|
||||
---------------
|
||||
|
||||
You can use telnet to connect to gearman to check which Zuul
|
||||
components are online::
|
||||
Some advanced troubleshooting options are provided below. These are
|
||||
generally very low-level and are not normally required.
|
||||
|
||||
.. _debug_gearman:
|
||||
|
||||
Gearman Jobs
|
||||
============
|
||||
|
||||
Connecting to Gearman can allow you to see if any Zuul components appear
|
||||
to not be accepting requests correctly.
|
||||
|
||||
For unencrypted Gearman connections, you can use telnet to connect
|
||||
and check which Zuul components are online::
|
||||
|
||||
telnet <gearman_ip> 4730
|
||||
|
||||
Useful commands are ``workers`` and ``status`` which you can run by just
|
||||
typing those commands once connected to gearman.
|
||||
For encrypted connections, you will need to provide suitable keys,
|
||||
e.g.::
|
||||
|
||||
openssl s_client -connect localhost:4730 -cert /etc/zuul/ssl/client.pem -key /etc/zuul/ssl/client.key
|
||||
|
||||
Commands available are discussed in the Gearman `administrative
|
||||
protocol <http://gearman.org/protocol>`__. Useful commands are
|
||||
``workers`` and ``status`` which you can run by just typing those
|
||||
commands once connected to gearman.
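For scripted checks on an unencrypted connection, the same commands can be
sent over a plain socket. The following is only a minimal sketch, not part
of Zuul: the helper names ``read_admin_response`` and ``gearman_admin`` are
hypothetical, and it relies on the administrative protocol ending the
``status`` and ``workers`` responses with a line containing a single ``.``.

```python
import socket


def read_admin_response(sock, command):
    """Send one admin command and return the response lines.

    Reads until the terminating "." line used by the ``status`` and
    ``workers`` commands, or until the peer closes the connection.
    """
    sock.sendall(command.encode("ascii") + b"\n")
    buf = b""
    while not (buf.endswith(b"\n.\n") or buf == b".\n"):
        chunk = sock.recv(4096)
        if not chunk:
            break  # connection closed before the terminator arrived
        buf += chunk
    lines = buf.decode("utf-8", errors="replace").splitlines()
    if lines and lines[-1] == ".":
        lines.pop()  # drop the terminator line
    return lines


def gearman_admin(host, command, port=4730, timeout=5.0):
    """Hypothetical helper: open a plain TCP connection and run one command."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        return read_admin_response(sock, command)
```

With a reachable server, ``gearman_admin("<gearman_ip>", "status")`` would
return one string per function. Encrypted connections would additionally
need the socket wrapped with the client certificate and key, which this
sketch does not attempt.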
|
||||
|
||||
For ``status`` you will see output for internal Zuul functions in the
|
||||
form ``FUNCTION\tTOTAL\tRUNNING\tAVAILABLE_WORKERS``::
|
||||
|
||||
...
|
||||
executor:resume:ze06.openstack.org 0 0 1
|
||||
zuul:config_errors_list 0 0 1
|
||||
zuul:status_get 0 0 1
|
||||
executor:stop:ze11.openstack.org 0 0 1
|
||||
zuul:job_list 0 0 1
|
||||
zuul:tenant_sql_connection 0 0 1
|
||||
executor:resume:ze09.openstack.org 0 0 1
|
||||
...
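When the list is long, it can help to filter it mechanically. The sketch
below parses ``status`` output and picks out functions that have jobs
waiting or no worker registered. It is an illustration only: the names
``FunctionStatus``, ``parse_status`` and ``suspicious`` are hypothetical,
the counts in ``SAMPLE`` are invented, and it assumes that ``TOTAL``
includes the running jobs, so ``TOTAL - RUNNING`` is the number waiting.

```python
from typing import NamedTuple


class FunctionStatus(NamedTuple):
    name: str
    total: int
    running: int
    workers: int

    @property
    def waiting(self):
        # Assumes TOTAL counts both queued and running jobs.
        return self.total - self.running


def parse_status(text):
    """Parse FUNCTION/TOTAL/RUNNING/AVAILABLE_WORKERS lines."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue  # blank line, "..." or the "." terminator
        name, total, running, workers = fields
        try:
            entries.append(
                FunctionStatus(name, int(total), int(running), int(workers)))
        except ValueError:
            continue  # counts were not integers; skip the line
    return entries


def suspicious(entries):
    """Return functions with a backlog or without any available worker."""
    return [e for e in entries if e.waiting > 0 or e.workers == 0]


# Invented counts, reusing function names from the listing above.
SAMPLE = (
    "zuul:status_get\t0\t0\t1\n"
    "zuul:job_list\t4\t1\t1\n"
    "executor:stop:ze11.openstack.org\t0\t0\t0\n"
    ".\n"
)

for entry in suspicious(parse_status(SAMPLE)):
    print(entry.name, "waiting:", entry.waiting, "workers:", entry.workers)
```

A function that keeps a non-zero waiting count, or that shows no available
workers, is a reasonable first place to look when the ``waiting`` gauge is
unexpectedly high.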
|
||||
|
|