Commit Graph

25 Commits

Author SHA1 Message Date
James E. Blair 3815cce7aa Change image ID from int sequence to UUID
When we export and import image data (for backup/restore purposes),
we need to reset the ZK sequence counter for image builds in order
to avoid collisions.  The only way we can do that is to create and
then delete a large number of znodes.  Some sites (including
OpenDev) have sequence numbers that are in the hundreds of thousands.

To avoid this time-consuming operation (which is only intended to
be run to restore from backup -- when operators are already under
additional stress!), this change switches the build IDs from integer
sequences to UUIDs.  This avoids the problem with collisions after
import (at least, to the degree that UUIDs avoid collisions).

The actual change is fairly simple, but many unit tests need to be
updated.

Since the change is user-visible in the command output (image lists,
etc), a release note is added.

A related change which updates all of the textual references of
build "number" to build "id" follows this one for clarity and ease
of review.

Change-Id: Ie7c68b094bc9733914a808756eeee8b62f696713
2023-08-02 11:18:15 -07:00
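A minimal sketch of the idea behind 3815cce7aa, not the actual Nodepool code: build IDs drawn from random UUIDs need no shared counter, so nothing has to be reset after an import. The helper name new_build_id and the 32-character hex rendering are assumptions made for illustration.

```python
import uuid


def new_build_id() -> str:
    # Hypothetical helper: a random UUID rendered as 32 hex characters.
    return uuid.uuid4().hex


# IDs generated independently do not rely on a shared sequence counter,
# so "imported" and "newly built" IDs are not expected to collide.
imported = {new_build_id() for _ in range(1000)}
fresh = {new_build_id() for _ in range(1000)}
print(len(imported & fresh))
```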
James E. Blair b0a40f0b47 Use image cache when launching nodes
We consult ZooKeeper to determine the most recent image upload
when we decide whether we should accept or decline a request.  If
we accept the request, we also consult it again for the same
information when we start building the node.  In both cases, we
can use the cache to avoid what may potentially be (especially in
the case of a large number of images or uploads) quite a lot of
ZK requests.  Our cache should be almost up to date (typically
milliseconds, or at the worst, seconds behind), and the worst
case is equivalent to what would happen if an image build took
just a few seconds longer.  The tradeoff is worth it.

Similarly, when we create min-ready requests, we can also consult
the cache.

With those 3 changes, all references to getMostRecentImageUpload
in Nodepool use the cache.

The original un-cached method is kept as well, because there are
an enormous number of references to it in the unit tests and they
don't have caching enabled.

In order to reduce the chances of races in many tests, the startup
sequence is normalized to:
1) start the builder
2) wait for an image to be available
3) start the launcher
4) check that the image cache in the launcher matches what
   is actually in ZK

This sequence (apart from #4) was already used by a minority of
tests (mostly newer tests).  Older tests have been updated.
A helper method, startPool, implements #4 and additionally includes
the wait_for_config method which was used by a random assortment
of tests.

Change-Id: Iac1ff8adfbdb8eb9a286929a59cf07cd0b4ac7ad
2023-04-10 15:57:01 -07:00
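The cached lookup described in b0a40f0b47 can be pictured roughly as follows. This is a sketch under stated assumptions: Upload, UploadCache, on_event and most_recent_ready are hypothetical names, and the real cache is fed by ZooKeeper events rather than direct calls.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Upload:
    image: str
    provider: str
    state: str
    state_time: float


class UploadCache:
    """Hypothetical in-memory stand-in for a ZK-backed upload cache."""

    def __init__(self) -> None:
        self._uploads: List[Upload] = []

    def on_event(self, upload: Upload) -> None:
        # In a real system a ZK watch would deliver this update.
        self._uploads.append(upload)

    def most_recent_ready(self, image: str, provider: str) -> Optional[Upload]:
        # Answered entirely from memory: no ZK request is issued here.
        candidates = [
            u for u in self._uploads
            if u.image == image and u.provider == provider and u.state == "ready"
        ]
        return max(candidates, key=lambda u: u.state_time, default=None)


cache = UploadCache()
cache.on_event(Upload("ubuntu", "cloud-a", "ready", 100.0))
cache.on_event(Upload("ubuntu", "cloud-a", "ready", 200.0))
cache.on_event(Upload("ubuntu", "cloud-a", "uploading", 300.0))
latest = cache.most_recent_ready("ubuntu", "cloud-a")
```

If the cache lags slightly, the lookup returns the previous upload, which matches the worst case the commit message describes: the same result as if the newer image had finished a few seconds later.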
James E. Blair be3edd3e17 Convert openstack driver to statemachine
This updates the OpenStack driver to use the statemachine framework.

The goal is to revise all remaining drivers to use the statemachine
framework for two reasons:

1) We can dramatically reduce the number of threads in Nodepool which
is our biggest scaling bottleneck.  The OpenStack driver already
includes some work in that direction, but in a way that is unique
to it and not easily shared by other drivers.  The statemachine
framework is an extension of that idea implemented so that every driver
can use it.  This change further reduces the number of threads needed
even for the openstack driver.

2) By unifying all the drivers with a simple interface, we can prepare
to move them into Zuul.

There are a few updates to the statemachine framework to accommodate some
features that only the OpenStack driver used to date.

A number of tests need slight alteration since the openstack driver is
the basis of the "fake" driver used for tests.

Change-Id: Ie59a4e9f09990622b192ad840d9c948db717cce2
2023-01-10 10:30:14 -08:00
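To illustrate why a state machine approach needs fewer threads (be3edd3e17), here is a toy sketch, not the Nodepool framework itself: each in-flight operation is a small object with an advance() step, and a single loop drives all of them. The class and method names are hypothetical.

```python
class CreateStateMachine:
    """Hypothetical state machine for one node creation."""

    START, WAITING, COMPLETE = "start", "waiting", "complete"

    def __init__(self, name: str, polls_needed: int) -> None:
        self.name = name
        self.state = self.START
        self.polls_left = polls_needed

    def advance(self) -> None:
        # Each call does a small, non-blocking amount of work.
        if self.state == self.START:
            self.state = self.WAITING  # a real driver would submit an API request
        elif self.state == self.WAITING:
            self.polls_left -= 1  # a real driver would poll the cloud here
            if self.polls_left <= 0:
                self.state = self.COMPLETE

    @property
    def complete(self) -> bool:
        return self.state == self.COMPLETE


def run_all(machines) -> int:
    # One loop (one thread) advances every pending machine in turn.
    passes = 0
    pending = list(machines)
    while pending:
        for machine in pending:
            machine.advance()
        pending = [m for m in pending if not m.complete]
        passes += 1
    return passes


machines = [CreateStateMachine("a", 1), CreateStateMachine("b", 3)]
passes = run_all(machines)
```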
Zuul 9d98386b62 Merge "Add username to detailed node list output" 2022-12-09 20:26:39 +00:00
Zuul 9dd883107a Merge "Add hold command to disable nodes" 2022-11-30 20:05:41 +00:00
Dr. Jens Harbott abe6ba9759 Add username to detailed node list output
With the static driver it is possible to have multiple static nodes
defined that only differ by their username. In order to be able to
distinguish them, include the username in the
"nodepool list --detail" output.

Signed-off-by: Dr. Jens Harbott <harbott@osism.tech>
Change-Id: I702ccbf412731ef3400eb722386629356a244334
2022-11-14 10:24:50 +01:00
Benedikt Loeffler 44c708dd26 Cleanup local builds without .d folder
When using a custom image build tool, the .d folder does not exist and
the cleanup of those local builds is skipped.
For this reason we should not rely on the .d folder; instead, we
should look at all created image files and determine the local builds
based on these.

Change-Id: I1c60af3347868089ebe489ddcadfbad2dc8fadde
2022-10-13 14:50:49 +02:00
mbecker 1658aa9851 Add hold command to disable nodes
This allows nodes to be set in an idle state
so that they will not have jobs scheduled
while e.g. maintenance tasks are performed.
This is probably most useful for static nodes.

Change-Id: Iebc6b909f370fca11fab2be0b8805d4daef33afe
2022-10-13 12:43:34 +02:00
James E. Blair eb9121a733 Include user/driver data in node detail list
This lets operators see the user_data and driver_data in the node
list when passing --detail.  That can be useful for identifying
issues with the metastatic driver, or any other debugging which
would benefit from seeing that data.

Change-Id: I2f36ce98a183b7a8e289376f2228b6370900a057
2022-06-30 15:25:25 -07:00
Zuul e668905f08 Merge "Update ZooKeeper class connection methods" 2022-06-29 20:40:40 +00:00
James E. Blair 7bbdfdc9fd Update ZooKeeper class connection methods
This updates the ZooKeeper class to inherit from ZooKeeperBase
and utilize its connection methods.

It also moves the connection loss detection used by the builder
to be more localized and removes unused methods.

Change-Id: I6c9dbe17976560bc024f74cd31bdb6305d51168d
2022-06-29 07:46:34 -07:00
James E. Blair 6386170914 Support deleting DIB images while builder is offline
We don't currently support deleting a DIB image while the builder
that built it is offline.  The reason for this is to ensure that
we actually remove the files from disk on the builder.  The mechanism
is for all other builders to defer handling "DELETING" image nodes
in ZK to the builder which built them.

This can be problematic if the builder is offline for an extended
period, or permanently.

To address this case without compromising the original goal, we now
let any builder delete the uploads and ZK nodes for a DIB build.
Subsequently, every builder will now look for DIB manifest directories
within its image-dir, and if it finds one that does not have a
corresponding ZK node, it garbage collects that image from disk.

Change-Id: I65efb31ca02cea4bcf7ef8f962a00b5263ccf69c
2022-06-27 13:03:27 -07:00
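The garbage-collection step from 6386170914 amounts to comparing what is on disk with what ZK knows about. The sketch below assumes manifest directories are named "<build>.d" and that the set of known builds has already been read from ZK; the function name and directory layout are illustrative, not taken from the Nodepool source.

```python
import os
import tempfile


def find_orphaned_manifests(image_dir, known_builds):
    """Return builds that have a manifest dir on disk but no ZK record."""
    orphans = []
    for entry in sorted(os.listdir(image_dir)):
        path = os.path.join(image_dir, entry)
        if os.path.isdir(path) and entry.endswith(".d"):
            build = entry[:-2]  # strip the assumed ".d" suffix
            if build not in known_builds:
                orphans.append(build)
    return orphans


with tempfile.TemporaryDirectory() as image_dir:
    for name in ("ubuntu-0001.d", "ubuntu-0002.d"):
        os.mkdir(os.path.join(image_dir, name))
    # Pretend ZK only knows about the second build.
    orphans = find_orphaned_manifests(image_dir, {"ubuntu-0002"})
```

A builder would then remove the files for each orphaned build from its image-dir.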
Zuul b4e9e4a52c Merge "Convert dib-request-list to image-status command" 2022-06-23 20:10:21 +00:00
Zuul 72a0b622b2 Merge "Expose image build requests in web UI and cli" 2022-06-23 20:02:16 +00:00
James E. Blair 138b68a5a7 Convert dib-request-list to image-status command
This augments the dib-request list (which shows what images have
manual build requests) with information about whether the image
is paused.  The resulting command is renamed to "image-status".

Change-Id: If75a8757b4ec93563e47bfdf0a239a9c21660c45
2022-06-21 14:12:22 -07:00
Simon Westphahl d6e8bd72df Expose image build requests in web UI and cli
Image build requests can now be retrieved through the /dib-request-list
endpoint or via the dib-request-list sub-command. The list will show the
age of the request and if it is still pending or if there is already a
build in progress.

Change-Id: If73d6c9fcd5bd94318f389771248604a7f51c449
2022-06-21 13:32:35 -07:00
James E. Blair cacef76d3a Avoid collisions after ZK image data import
When image data are imported, if there are holes in the sequence
numbers, ZooKeeper may register a collision after nodepool-builder
builds or uploads a new image.  This is because ZooKeeper stores
a sequence node counter in the parent node, and we lose that
information when exporting/importing.  Newly built images can end
up with the same sequence numbers as imported images.  To avoid this,
re-create missing sequence nodes so that the import state more
closely matches the export state.

Change-Id: I0b96ebecc53dcf47324b8a009af749a3c04e574c
2022-06-20 13:00:05 -07:00
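The collision described in cacef76d3a, and the workaround, can be simulated without ZooKeeper. FakeSequenceParent below is a hypothetical stand-in for a parent znode that hands out sequence numbers; the import routine creates every number up to the highest exported one and deletes those that were not part of the export.

```python
class FakeSequenceParent:
    """Hypothetical stand-in for a ZK parent node with sequential children."""

    def __init__(self) -> None:
        self.counter = 0
        self.children = set()

    def create_sequential(self) -> int:
        seq = self.counter
        self.counter += 1
        self.children.add(seq)
        return seq

    def delete(self, seq: int) -> None:
        self.children.discard(seq)


def import_builds(parent: FakeSequenceParent, exported) -> None:
    wanted = set(exported)
    if not wanted:
        return
    # Advance the counter past the highest exported number, keeping only
    # the nodes that actually existed in the export.
    for _ in range(max(wanted) + 1):
        seq = parent.create_sequential()
        if seq not in wanted:
            parent.delete(seq)


parent = FakeSequenceParent()
import_builds(parent, [2, 5])
next_seq = parent.create_sequential()
```

With large sequence numbers this loop is slow, which is the cost the later UUID change (3815cce7aa) removes.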
James E. Blair 10df93540f Use Zuul-style ZooKeeper connections
We have made many improvements to connection handling in Zuul.
Bring those back to Nodepool by copying over the zuul/zk directory
which has our base ZK connection classes.

This will enable us to bring other Zuul classes over, such as the
component registry.

The existing connection-related code is removed and the remaining
model-style code is moved to nodepool.zk.zookeeper.  Almost every
file imported the model as nodepool.zk, so import adjustments are
made to compensate while keeping the code more or less as-is.

Change-Id: I9f793d7bbad573cb881dfcfdf11e3013e0f8e4a3
2022-05-23 07:40:20 -07:00
likui 6d5cf05253 Replace deprecated assertEquals
The assertEquals method has been deprecated since it was renamed
to assertEqual in Python 3.2.

https://docs.python.org/3/library/unittest.html#deprecated-aliases
Change-Id: I306d43862eb6c7a36dad1d3a50822c2758fae5fe
2021-11-29 17:29:22 +08:00
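For reference, the replacement in 6d5cf05253 is a one-word rename per call site. A minimal example using the supported spelling (the test class here is made up for illustration):

```python
import unittest


class ExampleTest(unittest.TestCase):
    def test_sum(self):
        # assertEqual is the supported name; assertEquals was the deprecated alias.
        self.assertEqual(sum([1, 2, 3]), 6)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExampleTest)
result = unittest.TextTestRunner().run(suite)
```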
James E. Blair 0b1fa1d57d Add commands to export/import image data from ZK
Change-Id: Id1ac6403f4fe80059b90900c519e56bca7dee0a0
2021-08-24 10:28:39 -07:00
James E. Blair 00d62af3c3 Add image-pause CLI command
This adds a CLI command to set a flag in ZK for images indicating
that the image should be paused.  This can be used to quickly pause
the building and uploading of one or more images globally.  This
will effectively be boolean OR'd with the pause value for diskimage
builds in the config file.

In particular, this can be used to pause images for short durations,
either because a fix is imminent, or to allow the system to remain
stable while a configuration change goes through the CI/CD workflow.

Change-Id: I21a573dfc337c51f319afe3695d5446b2c91d70b
2020-08-20 15:48:03 -07:00
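The "boolean OR" in 00d62af3c3 means an image is paused if either source says so. A tiny sketch, with a hypothetical function name and arguments:

```python
def is_paused(config_pause: bool, zk_pause: bool) -> bool:
    # Paused if the config file or the ZK flag (or both) request it.
    return config_pause or zk_pause


states = [
    is_paused(False, False),
    is_paused(True, False),
    is_paused(False, True),
    is_paused(True, True),
]
```

One consequence is that clearing the ZK flag cannot unpause an image that is paused in the config file.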
David Shrewsbury 394d549c07 Support image uploads in 'info' CLI command
Change the 'info' command output to include image upload data.
For each image, we'll now output each build and the uploads for the build.

Change-Id: Ib25ce30d30ed718b2b6083c2127fdb214c3691f4
2020-03-19 15:03:34 -04:00
Tobias Henkel 35094dbb62 Add second level cache of nodes
An earlier change introduced zNode caching of Nodes. This first
version still required frequent re-parsing of the node json data. This
change introduces a second level cache that updates a dict of the
current nodes based on the events emitted by the TreeCache. Using this
we can reduce json parsing and make node caching more effective.

Change-Id: I4834a7ea722cf2ac7df79455ce077832ae966e63
2018-11-26 20:04:04 +01:00
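A rough sketch of the second-level cache idea from 35094dbb62: JSON is parsed once per change event, and reads are served from a dict of already-parsed nodes. CacheEvent, NodeCache and the example paths are hypothetical; in the real code the events come from kazoo's TreeCache.

```python
import json
from dataclasses import dataclass


@dataclass
class CacheEvent:
    kind: str  # "added", "updated" or "removed"
    path: str
    data: bytes = b""


class NodeCache:
    """Hypothetical second-level cache keyed by node ID."""

    def __init__(self) -> None:
        self.nodes = {}
        self.parse_count = 0

    def handle_event(self, event: CacheEvent) -> None:
        node_id = event.path.rsplit("/", 1)[-1]
        if event.kind == "removed":
            self.nodes.pop(node_id, None)
        else:
            # Parse once when the data changes, not on every read.
            self.nodes[node_id] = json.loads(event.data)
            self.parse_count += 1

    def get(self, node_id: str):
        return self.nodes.get(node_id)


cache = NodeCache()
cache.handle_event(CacheEvent("added", "/nodepool/nodes/0001", b'{"state": "building"}'))
cache.handle_event(CacheEvent("updated", "/nodepool/nodes/0001", b'{"state": "ready"}'))
for _ in range(100):
    node = cache.get("0001")
```

Here 100 reads cost two parses in total, one per event.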
Tobias Henkel 56bac6e9cb Support node caching in the nodeIterator
This adds support to return cached data by the nodeIterator. This can
be done easily by utilizing the TreeCache recipe of kazoo.

Depends-On: https://review.openstack.org/616398
Change-Id: I23a992417d186b712864f2b00e79bc88bbfca967
2018-11-08 11:01:06 +01:00
David Shrewsbury 511ffd9c29 Add tox functional testing for drivers
Reorganizes the unit tests into a new subdirectory, and
adds a new functional test subdirectory (that will be further
divided into subdirs for drivers).

Change-Id: I027bba619deef9379d293fdd0457df6a1377aea8
2018-11-01 15:33:44 +01:00