
20 Commits

Author SHA1 Message Date
James E. Blair 21b8451947 Resolve statsd client once at startup
We currently create a new statsd client each time we replace a
provider manager.  This means that if we are unable to resolve
DNS at that time, the new provider may crash due to unhandled
exceptions.

To resolve this, let's adopt the same behavior we have in Zuul
which is to set up the statsd client once at startup and continue
to use the same client object for the life of the process.

This means that operators will still see errors at startup during
a misconfiguration, but any external changes after that will
not affect nodepool.

Change-Id: I65967c71e859fddbea15aee89f6ddae44344c87b
2023-08-14 10:47:53 -07:00
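Illustration for 21b8451947: a minimal sketch of the pattern described above, not nodepool's actual code. The client is created once when the process starts, and the same object is handed to every provider manager that is created later. `FakeStatsClient`, `Launcher` and `ProviderManager` are hypothetical names; the fake client stands in for a real statsd client whose construction may involve a DNS lookup.

```python
class FakeStatsClient:
    """Hypothetical stand-in for a statsd client (construction may resolve DNS)."""
    created = 0

    def __init__(self, host):
        FakeStatsClient.created += 1
        self.host = host


class ProviderManager:
    """Hypothetical manager that receives a ready-made client."""

    def __init__(self, name, statsd):
        self.name = name
        self.statsd = statsd


class Launcher:
    def __init__(self, statsd_host):
        # Resolved once at startup; a misconfiguration fails here, visibly.
        self.statsd = FakeStatsClient(statsd_host)
        self.managers = {}

    def replace_manager(self, name):
        # Reuse the existing client instead of constructing a new one.
        self.managers[name] = ProviderManager(name, self.statsd)
        return self.managers[name]


launcher = Launcher("statsd.example.org")
first = launcher.replace_manager("cloud-a")
second = launcher.replace_manager("cloud-a")
```

Replacing a manager any number of times leaves the client count at one, so a later DNS outage cannot affect the replacement.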
James E. Blair 99d2a361a1 Use cached ids in node iterator more often
There are several places where it is now probably safe to use
cached ids when iterating over ZK nodes.  The main reasons not to
use cached ids are in the case of race conditions or in case the
tree cache may have missed an event and is unaware of a node.  We
have increased confidence in the accuracy of our cache now, so at
least in the cases where we know that races are not an issue, we
can switch those to use cached ids and save a ZK round trip (a
possibly significant one if there is a long list of children).

This change adds the flag in the following places (with
explanations of why it's safe):

* State machine cleanup routines

    Leaked instances have to show up on two subsequent calls to
    be acted upon, so this is not sensitive to timing

* Quota calculation

    If we do get the quota wrong, drivers are expected to handle
    that gracefully anyway.

* Held node cleanup

    Worst case is we wait until next iteration to clean up.

* Stats

    They're a snapshot anyway, so a cache mismatch is really just
    a shift in the snapshot time.

Change-Id: Ie7af2f62188951bf302ffdb64827d868609a1e3c
2023-05-30 13:27:45 -07:00
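Illustration for 99d2a361a1: a rough sketch of the trade-off, assuming an iterator that can take its list of node ids either from ZooKeeper or from a local cache. `FakeZooKeeper` and its methods are hypothetical stand-ins, and the `cached_ids` keyword is borrowed from the commit title; the real signature may differ.

```python
class FakeZooKeeper:
    """Hypothetical in-memory stand-in for ZooKeeper plus a tree cache."""

    def __init__(self, nodes):
        self._nodes = dict(nodes)   # authoritative data on the server
        self._cache = dict(nodes)   # locally cached copy
        self.round_trips = 0

    def get_children(self):
        self.round_trips += 1       # simulates a network round trip
        return sorted(self._nodes)

    def node_iterator(self, cached_ids=False):
        # With cached_ids the id list comes from the cache: no round trip.
        ids = sorted(self._cache) if cached_ids else self.get_children()
        for node_id in ids:
            state = self._cache.get(node_id)
            if state is not None:
                yield node_id, state


zk = FakeZooKeeper({"0001": "ready", "0002": "in-use", "0003": "ready"})
# A stats snapshot tolerates a slightly stale id list, so use the cache.
ready = [nid for nid, state in zk.node_iterator(cached_ids=True)
         if state == "ready"]
```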
James E. Blair 1323d0b556 Update some variable names
Now that the component we registered is a "pool", change the call
sites to use "launcher_pools" instead of "launchers".  This may
reduce some ambiguity.

(s/launcher/pool/ might still be ambiguous since it may not be clear
whether we're talking about our own pools or other pools; thus the
choice of "launcher_pool" for the variable name.)

Also, remove a redundant test assertion.

Change-Id: I865883cdb115bf72a3bd034d9290f60666d64b66
2022-05-23 13:30:50 -07:00
James E. Blair a612aa603c Add the component registry from Zuul
This uses a cache and lets us update metadata about components
and act on changes quickly (as compared to the current launcher
registry which doesn't have provision for live updates).

This removes the launcher registry, so operators should take care
to update all launchers within a short period of time since the
functionality to yield to a specific provider depends on it.

Change-Id: I6409db0edf022d711f4e825e2b3eb487e7a79922
2022-05-23 07:41:27 -07:00
James E. Blair 10df93540f Use Zuul-style ZooKeeper connections
We have made many improvements to connection handling in Zuul.
Bring those back to Nodepool by copying over the zuul/zk directory
which has our base ZK connection classes.

This will enable us to bring other Zuul classes over, such as the
component registry.

The existing connection-related code is removed and the remaining
model-style code is moved to nodepool.zk.zookeeper.  Almost every
file imported the model as nodepool.zk, so import adjustments are
made to compensate while keeping the code more or less as-is.

Change-Id: I9f793d7bbad573cb881dfcfdf11e3013e0f8e4a3
2022-05-23 07:40:20 -07:00
Benjamin Schanzel 74c5c00305 Export current tenant limit stats
This adds a new statsd gauge which, in addition to the existing provider
limits, exports the currently configured tenant limits. This is in the
form ``nodepool.tenant_limits.TENANT.[cores,ram,instances]``.

Change-Id: I8e10a0974210d25d071dbbd63849a921fc8b79a2
2022-01-07 09:27:31 +01:00
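Illustration for 74c5c00305: a small sketch of how the gauge keys in the form given above could be built. The input shape (a dict of per-tenant limits) and the function name are assumptions made for this example, as is the choice to skip resources that have no limit set.

```python
def tenant_limit_gauges(tenant_limits):
    """Map {tenant: {resource: limit}} to {statsd key: value}."""
    gauges = {}
    for tenant, limits in tenant_limits.items():
        for resource in ("cores", "ram", "instances"):
            # Assumption: a resource without a configured limit is skipped.
            if resource in limits:
                key = f"nodepool.tenant_limits.{tenant}.{resource}"
                gauges[key] = limits[resource]
    return gauges


gauges = tenant_limit_gauges(
    {"tenant-one": {"cores": 200, "ram": 800000, "instances": 50}}
)
```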
Fabien Boucher f57ac1881a Remove unneeded shebang and exec bit on some files
Having Python files with the exec bit and a shebang defined in
/usr/lib/python-*/site-package/ is not fine in an RPM package.

Instead of carrying a patch in the nodepool RPM packaging, it is better
to fix this directly upstream.

Change-Id: I5a01e21243f175d28c67376941149e357cdacd26
2019-12-13 19:30:03 +01:00
Tobias Henkel 64487baef0 Asynchronously update node statistics
We currently update the node statistics on every node launch or
delete. This cannot use caching at the moment because when the
statistics are updated we might end up pushing slightly outdated
data. If there is then no further update for a long time, we end up
with broken gauges. We already get update events from the node cache
so we can use that to centrally trigger node statistics updates.

This is combined with leader election so there is only a single
launcher that keeps the statistics up to date. This will ensure that
the statistics are not cluttered because of several launchers pushing
their own slightly different view into the stats.

As a side effect this reduces the runtime of a test that creates 200
nodes from 100s to 70s on my local machine.

Change-Id: I77c6edc1db45b5b45be1812cf19eea66fdfab014
2018-11-29 16:48:30 +01:00
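Illustration for 64487baef0: a simplified sketch of the idea, assuming that node cache events only set a flag and that a separate worker performs the update when it holds the leadership. `StatsWorker` and its callbacks are hypothetical; real leader election would go through ZooKeeper and is replaced here by a plain callable.

```python
import threading


class StatsWorker:
    """Hypothetical worker that turns cache events into stats updates."""

    def __init__(self, is_leader, update):
        self._is_leader = is_leader   # callable: are we the elected leader?
        self._update = update         # callable: recompute and send stats
        self._event = threading.Event()

    def on_node_cache_event(self):
        # Cheap: many cache events collapse into one pending update.
        self._event.set()

    def run_once(self, timeout=0.1):
        # Returns True only if an update was actually performed.
        if not self._event.wait(timeout):
            return False
        self._event.clear()
        if not self._is_leader():
            return False
        self._update()
        return True


updates = []
worker = StatsWorker(lambda: True, lambda: updates.append("sent"))
for _ in range(3):
    worker.on_node_cache_event()
did_update = worker.run_once()
```

In this sketch three node events result in a single update, and a worker that is not the leader never sends anything.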
Tobias Henkel 56bac6e9cb Support node caching in the nodeIterator
This adds support to return cached data by the nodeIterator. This can
be done easily by utilizing the TreeCache recipe of kazoo.

Depends-On: https://review.openstack.org/616398
Change-Id: I23a992417d186b712864f2b00e79bc88bbfca967
2018-11-08 11:01:06 +01:00
Tobias Henkel b01c7821e5 Initialize label statistics to zero
The label statistics are gauges that keep their values. Currently
nodepool doesn't initialize the label-based node statistics to
zero. Because of this, any label state that shows up in the statistics
won't be reset to zero and nodepool stops sending updates to
this gauge. This leads to graphs that are mostly stuck at 1 if no
nodes are currently in this state.

This can be fixed by iterating over the supported labels and
initializing all states of them to zero.

Change-Id: I6c7f63f8f64a83b225386f6da567bfae5141be7b
2018-10-18 15:45:42 +02:00
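Illustration for b01c7821e5: a minimal sketch of the fix, assuming nodes are given as (label, state) pairs. The state list is an illustrative subset and the key layout is an assumption for this example.

```python
STATES = ("building", "ready", "in-use", "deleting")  # illustrative subset


def label_gauges(labels, nodes):
    # Start every label/state pair at zero so an emptied state is still sent.
    gauges = {
        f"nodepool.label.{label}.nodes.{state}": 0
        for label in labels
        for state in STATES
    }
    for label, state in nodes:
        key = f"nodepool.label.{label}.nodes.{state}"
        gauges[key] = gauges.get(key, 0) + 1
    return gauges


gauges = label_gauges(["small", "large"], [("small", "ready")])
```

Without the initialization step, the "large" keys would be absent from the result and a gauge that previously held 1 would never be told to drop to 0.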
Tobias Henkel bb3c2dbf1b Fix label name in reported label stats
Now the node type is a list of labels. This is not considered when
updating the node stats so the label in the reported stats is the
string representation of that list. Instead iterate over that list and
increase the counter for every label.

Change-Id: Id64cb933e310fa056deab0b63e9e02d451de5973
2018-09-07 08:20:57 +02:00
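Illustration for bb3c2dbf1b: a short sketch of the difference, assuming each node is a dict whose "type" entry holds the list of labels (a made-up shape for this example).

```python
from collections import Counter


def count_labels(nodes):
    counts = Counter()
    for node in nodes:
        # node["type"] is a list of labels; count each label separately
        # instead of using the string form of the whole list as the key.
        for label in node["type"]:
            counts[label] += 1
    return counts


nodes = [{"type": ["small"]}, {"type": ["small", "large"]}]
counts = count_labels(nodes)
# The string form of a label list is not a usable stats key:
bad_key = str(nodes[1]["type"])
```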
Ian Wienand cd9aa75640 Use pipelines for stats keys
Pipelines buffer stats and then send them out in more reasonable sized
chunks, helping to avoid small UDP packets going missing in a flood of
stats.  Use this in stats.py.

This needs a slight change to the assertedStats handler to extract the
combined stats.  This function is ported from Zuul where we updated to
handle pipeline stats (Id4f6f5a6cd66581a81299ed5c67a5c49c95c9b52) so
it is not really new code.

Change-Id: I3f68450c7164d1cf0f1f57f9a31e5dca2f72bc43
2018-07-25 16:46:13 +10:00
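Illustration for cd9aa75640: the statsd Python package offers pipelines for this purpose. The stand-in below only imitates the batching idea with the standard library so it can run anywhere: gauge lines are buffered and then joined with newlines into packets up to a size limit. `BufferedStats` and its `max_size` parameter are hypothetical.

```python
class BufferedStats:
    """Hypothetical buffer that mimics what a statsd pipeline does."""

    def __init__(self, send, max_size=512):
        self._send = send        # callable that transmits one packet
        self._max = max_size     # upper bound on packet length
        self._lines = []

    def gauge(self, key, value):
        # statsd wire format for a gauge: "<key>:<value>|g"
        self._lines.append(f"{key}:{value}|g")

    def flush(self):
        packet = ""
        for line in self._lines:
            candidate = line if not packet else packet + "\n" + line
            if packet and len(candidate) > self._max:
                self._send(packet)   # packet is full, start a new one
                packet = line
            else:
                packet = candidate
        if packet:
            self._send(packet)
        self._lines = []


sent = []
stats = BufferedStats(sent.append)
stats.gauge("nodepool.nodes.ready", 3)
stats.gauge("nodepool.nodes.building", 1)
stats.flush()
```

Both gauges leave in one packet here; a test helper that asserts on stats then has to split the packet on newlines, which is the kind of adjustment the commit describes.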
Tristan Cacqueray 11f0ffd201 Refactor NodeLauncher to be generic
This change refactors the NodeLauncher object so that it is part
of the generic interface. A nodepool driver handler only needs to
return a subclass implementing the launch method.

Moreover this change adapts the StatsReporter class.

Change-Id: I6cfec650b862cb4fa0cb391bcc1248549e30c91b
2018-04-19 02:23:42 +00:00
James E. Blair 882b06ca6b Revert "Add support for STATSD_IPV6"
This reverts commit 58638db5e7.

This change had no docs or tests.

Change-Id: I85a539fc69150986d0644dfa20b24876f7705d1d
2018-03-26 15:31:39 -07:00
Paul Belanger 58638db5e7 Add support for STATSD_IPV6
If we want to support statsd over IPV6, we need to allow a user to
export STATSD_IPV6=True.

Change-Id: I7f9e5fd01e53dafab8e006f56a3598d5fcd0f6f2
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
2018-03-25 10:03:30 -04:00
Tobias Henkel 7d79770840 Do pep8 housekeeping according to zuul rules
The pep8 rules used in nodepool are somewhat broken. In preparation to
use the pep8 ruleset from zuul we need to fix the findings upfront.

Change-Id: I9fb2a80db7671c590cdb8effbd1a1102aaa3aff8
2018-01-17 02:17:45 +00:00
Tobias Henkel 9a4570844a Make max-servers optional
Now that Nodepool respects the quotas of the tenant, it is safe to make
max-servers optional for the pool.

Change-Id: I17731036ad0d8e33f35edb395a0caa2632026c24
2017-12-13 20:44:13 +01:00
Tristan Cacqueray 4d201328f5 Collect request handling implementation in an OpenStack driver
This change moves OpenStack-related code to a driver. To avoid a circular
import, this change also moves the StatsReporter to the stats module so that
the handlers don't have to import the launcher.

Change-Id: I319ce8780aa7e81b079c3f31d546b89eca6cf5f4
Story: 2001044
Task: 4614
2017-07-25 14:27:17 +00:00
Ian Wienand d5a2bb2c05 Don't use global statsd
Some history to motivate the change: the original statsd 2.0 era
worked by just doing "import statsd" and it set up a global "statsd"
object based on env variables for you, or set it to None if they
didn't exist.

With statsd 3.0 this changed semantics slightly to provide default
values, so to maintain the status-quo we added the "stats" wrapper
(I4791c9d26f2309f78a556de42af5b9945005aa46) which did the same thing.

However, having a global object set at import time like this isn't the
best idea.  For example, when running unit tests, you may want to set
the statsd host to your fake logger, but if it is already set up at
import time your unit test can't override it when it calls the various
classes.  It creates other ordering problems as we are splitting up
nodepool into more components.

Thus we move to creating a separate client in each object as it is
instantiated.  To maintain the existing behaviour of returning "None"
if the env variables aren't set we keep it in stats.py behind a new
function.  All stats callers are modified to get a client in their
__init__()

See also: Ib84655378bdb7c7c3c66bf6187b462b3be2f908d -- similar changes
for zuul

Change-Id: I6d339a8c631f8508a60e9ef890173780157adefd
2016-01-15 12:38:21 +11:00
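Illustration for d5a2bb2c05: a minimal sketch of the per-object approach, not the code from the change itself. A helper returns None when statsd is not configured, and each object asks for its own client in `__init__()`. The names `get_client`, `FakeStatsClient` and `Builder` are assumptions for this example, as is the use of the STATSD_HOST and STATSD_PORT variables; the environment is passed in explicitly so a test can override it.

```python
import os


class FakeStatsClient:
    """Hypothetical stand-in for a real statsd client."""

    def __init__(self, host, port):
        self.host = host
        self.port = port
        self.sent = []

    def incr(self, key):
        self.sent.append(key)


def get_client(environ=os.environ):
    # Keep the old behaviour: no configuration means no client at all.
    host = environ.get("STATSD_HOST")
    if not host:
        return None
    return FakeStatsClient(host, int(environ.get("STATSD_PORT", 8125)))


class Builder:
    def __init__(self, environ=os.environ):
        # Created at instantiation time, not at import time.
        self.statsd = get_client(environ)

    def image_built(self):
        if self.statsd:
            self.statsd.incr("nodepool.image_built")


unconfigured = Builder(environ={})
configured = Builder(environ={"STATSD_HOST": "localhost"})
unconfigured.image_built()
configured.image_built()
```

Because nothing is decided at import time, a unit test can construct an object with whatever settings it needs.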
Ian Wienand 5662105de4 Convert to use latest statsd version
statsd >= 3.0 changed the way it initializes itself; to start-up from
environment variables you need to import from 'statsd.defaults.env'.
It is also slightly different in that it provides default values; so
we check if the environment variable is set and avoid importing it if
statsd isn't configured.

This moves the statsd object creation into a common module so it can
be shared rather than create multiple clients.

Documentation is also updated to describe how to configure this.

Change-Id: I4791c9d26f2309f78a556de42af5b9945005aa46
2015-05-26 15:53:08 +10:00