Commit Graph

296 Commits

Author SHA1 Message Date
James E. Blair fd454706ca Add delete-after-upload option
This allows operators to delete large diskimage files after uploads
are complete, in order to save space.

A setting is also provided to keep certain formats, so that if
operators would like to delete large formats such as "raw" while
retaining a qcow2 copy (which, in an emergency, could be used to
inspect the image, or manually converted and uploaded for use),
that is possible.

Change-Id: I97ca3422044174f956d6c5c3c35c2dbba9b4cadf
2024-03-09 06:51:56 -08:00
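The delete-after-upload behaviour described in fd454706ca can be pictured with a minimal sketch. The function name, the file naming scheme, and the use of the file suffix as the image format are assumptions made for illustration; they are not taken from the Nodepool source.

```python
from pathlib import Path


def files_to_delete(image_files, keep_formats):
    """Return the image files that may be removed once uploads finish.

    image_files: paths such as "ubuntu-1234.raw" (hypothetical naming).
    keep_formats: formats the operator wants retained, e.g. {"qcow2"}.
    """
    keep = {fmt.lower() for fmt in keep_formats}
    return [
        path for path in image_files
        # Assumption: the suffix (without the dot) names the image format.
        if Path(path).suffix.lstrip(".").lower() not in keep
    ]


built = ["ubuntu-1234.raw", "ubuntu-1234.qcow2", "ubuntu-1234.vhd"]
print(files_to_delete(built, keep_formats={"qcow2"}))
```

With "qcow2" kept, only the larger formats are selected for removal, which matches the emergency-inspection use case the commit message mentions.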
James E. Blair 4ef3ebade8 Update references of build "number" to "id"
This follows the previous change and is intended to have little
or no behavior changes (only a few unit tests are updated to use
different placeholder values).  It updates all textual references
of build numbers to build ids to better reflect that they are
UUIDs instead of integers.

Change-Id: I04b5eec732918f5b9b712f8caab2ea4ec90e9a9f
2023-08-02 11:18:15 -07:00
James E. Blair 1dff39fccd Add REPL
As we continue to debug performance, it can be useful to have access
to a REPL as in Zuul.  Since there is no command socket in Nodepool
as there is in Zuul, the REPL must be engaged with a CLI argument
at startup.

Change-Id: Ieece1628494f79f39bd04216fc9ad7b725e541d8
2023-06-22 13:15:27 -07:00
James E. Blair be3edd3e17 Convert openstack driver to statemachine
This updates the OpenStack driver to use the statemachine framework.

The goal is to revise all remaining drivers to use the statemachine
framework for two reasons:

1) We can dramatically reduce the number of threads in Nodepool which
is our biggest scaling bottleneck.  The OpenStack driver already
includes some work in that direction, but in a way that is unique
to it and not easily shared by other drivers.  The statemachine
framework is an extension of that idea implemented so that every driver
can use it.  This change further reduces the number of threads needed
even for the openstack driver.

2) By unifying all the drivers with a simple interface, we can prepare
to move them into Zuul.

There are a few updates to the statemachine framework to accommodate some
features that only the OpenStack driver used to date.

A number of tests need slight alteration since the openstack driver is
the basis of the "fake" driver used for tests.

Change-Id: Ie59a4e9f09990622b192ad840d9c948db717cce2
2023-01-10 10:30:14 -08:00
Zuul 9d98386b62 Merge "Add username to detailed node list output" 2022-12-09 20:26:39 +00:00
Zuul 2c3d825885 Merge "Remove unecessary function" 2022-11-30 21:33:05 +00:00
Zuul 9dd883107a Merge "Add hold command to disable nodes" 2022-11-30 20:05:41 +00:00
James E. Blair 44bf72785d Remove unecessary function
There was only one caller to the change_node_state function, and
it was the one-line hold function.  Remove the unnecessary indirection.

Change-Id: I13efc73edf807c8cb6b430f207bf7cb067520ba3
2022-11-30 08:29:17 -08:00
Dr. Jens Harbott abe6ba9759 Add username to detailed node list output
With the static driver it is possible to have multiple static nodes
defined that only differ by their username. In order to be able to
distinguish them, include the username in the output of the
"nodepool list --detail" output.

Signed-off-by: Dr. Jens Harbott <harbott@osism.tech>
Change-Id: I702ccbf412731ef3400eb722386629356a244334
2022-11-14 10:24:50 +01:00
mbecker 1658aa9851 Add hold command to disable nodes
This allows nodes to be set in an idle state
so that they will not have jobs scheduled
while e.g. maintenance tasks are performed.
This is probably most useful for static nodes.

Change-Id: Iebc6b909f370fca11fab2be0b8805d4daef33afe
2022-10-13 12:43:34 +02:00
Clark Boylan 00b20b0b39 Check labels use valid diskimages in config validator
OpenDev tripped over this. Add validation to the validator tool that
provider pool labels use diskimages that are defined in the main
diskimages list.

Change-Id: Icbfaaa6342dfcc1d555f9b45f278d0e59467f2b3
2022-09-20 12:37:00 -07:00
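The check added in 00b20b0b39 amounts to a set-membership test. The sketch below assumes a parsed config shaped as diskimages[].name and providers[].pools[].labels[].diskimage; the function name is hypothetical and this is not the validator's actual code.

```python
def undefined_pool_diskimages(config):
    """List (provider, label, diskimage) triples whose diskimage is
    not present in the top-level diskimages list."""
    defined = {image["name"] for image in config.get("diskimages", [])}
    problems = []
    for provider in config.get("providers", []):
        for pool in provider.get("pools", []):
            for label in pool.get("labels", []):
                diskimage = label.get("diskimage")
                # Labels without a diskimage (e.g. cloud images) are skipped.
                if diskimage is not None and diskimage not in defined:
                    problems.append(
                        (provider.get("name"), label.get("name"), diskimage)
                    )
    return problems


config = {
    "diskimages": [{"name": "ubuntu-jammy"}],
    "providers": [{
        "name": "cloud-a",
        "pools": [{"labels": [
            {"name": "jammy", "diskimage": "ubuntu-jammy"},
            {"name": "focal", "diskimage": "ubuntu-focal"},
        ]}],
    }],
}
print(undefined_pool_diskimages(config))
```

Returning a list of problems rather than raising on the first one lets a validator report every bad label in a single run.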
James E. Blair 916d62a374 Allow specifying diskimage metadata/tags
For drivers that support tagging/metadata (openstack, aws, azure),
add or enhance support for supplying tags for uploaded diskimages.

This allows users to set metadata on the global diskimage object
which will then be used as default values for metadata on the
provider diskimage values.  The resulting merged dictionary forms
the basis of metadata to be associated with the uploaded image.

The changes needed to reconcile this for the three drivers mentioned
above are:

All: the diskimages[].meta key is added to supply the default values
for provider metadata.

OpenStack: provider diskimage metadata is already supported using
providers[].diskimages[].meta, so no further changes are needed.

AWS, Azure: provider diskimage tags are added using the key
providers[].diskimages[].tags since these providers already use
the "tags" nomenclature for instances.

This results in the somewhat incongruous situation where we have
diskimage "metadata" being combined with provider "tags", but it's
either that or have images with "metadata" while we have instances
with "tags", both of which are "tags" in EC2.  The chosen approach
has consistency within the driver.

Change-Id: I30aadadf022af3aa97772011cda8dbae0113a3d8
2022-08-23 06:39:08 -07:00
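The merge described in 916d62a374, where diskimages[].meta supplies defaults and the provider-level values take precedence, can be sketched as a plain dictionary update. The function name and sample keys are illustrative assumptions.

```python
def merged_image_metadata(diskimage_meta, provider_meta):
    """Combine global diskimage metadata with provider-level values.

    The global values act as defaults; provider values win on conflict.
    Either argument may be None when the key is absent from the config.
    """
    merged = dict(diskimage_meta or {})
    merged.update(provider_meta or {})
    return merged


global_meta = {"owner": "infra", "purpose": "ci"}
provider_tags = {"purpose": "ci-arm64"}
print(merged_image_metadata(global_meta, provider_tags))
```

Copying before updating keeps the global dictionary untouched, so the same defaults can be reused for every provider.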
Zuul 2455b4de12 Merge "Include user/driver data in node detail list" 2022-08-05 23:47:17 +00:00
Zuul d39e651dc2 Merge "Validation check for missing openstack diskimages" 2022-08-05 21:04:36 +00:00
Dr. Jens Harbott 0293064ed9 Validation check for missing openstack diskimages
Having a diskimage in an openstack provider which isn't defined as a
top-level diskimage causes nodepool-builder to fail. Check for this
condition in the config-validator.

Change-Id: I2862386e20292fd370635b5ff45086937482dfde
2022-08-05 17:44:29 +00:00
Zuul 123a32f922 Merge "AWS multi quota support" 2022-07-29 17:01:09 +00:00
James E. Blair 207d8ac63c AWS multi quota support
This adds support for AWS quotas that are specific to instance types.

The current quota support in AWS assumes only the "standard" instance types,
but AWS has several additional types with particular specialties (high memory,
GPU, etc).  This adds automatic support for those by encoding their service
quota codes (like 'L-1216C47A') into the QuotaInformation object.

QuotaInformation accepts not only cores, ram, and instances as resource
values, but now also accepts arbitrary keys such as 'L-1216C47A'.
Extra testing of QI is added to ensure we handle the arithmetic correctly
in cases where one or the other operand does not have a resource counter.

The statemachine drivers did not encode their resource information into
the ZK Node record, so tenant quota was not operating correctly.  This is
now fixed.

The AWS driver now accepts max_cores, _instances, and _ram values similar
to the OpenStack driver.  It additionally accepts max_resources which can
be used to specify limits for arbitrary quotas like 'L-1216C47A'.

The tenant quota system now also accepts arbitrary keys such as 'L-1216C47A'
so that, for example, high memory nodes may be limited by tenant.

The mapping of instance types to quota is manually maintained; however,
AWS doesn't seem to add new instance types too often, and those it does are
highly specialized.  If a new instance type is not handled internally, the
driver will not be able to calculate expected quota usage, but will still
operate until the new type is added to the mapping.

Change-Id: Iefdc8f3fb8249c61c43fe51b592f551e273f9c36
2022-07-25 14:41:07 -07:00
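The arithmetic concern raised in 207d8ac63c (operands that do not share the same resource counters) can be illustrated with a small sketch. This is not the QuotaInformation class; the function names are hypothetical, and treating a missing limit as unlimited and missing usage as zero is an assumption made for the example.

```python
import math


def remaining(limits, used):
    """Subtract usage from limits, key by key.

    A key missing from `limits` is treated as unlimited and a key
    missing from `used` as zero (both are assumptions of this sketch).
    """
    keys = set(limits) | set(used)
    return {k: limits.get(k, math.inf) - used.get(k, 0) for k in keys}


def fits(limits, used, request):
    """True if `request` can be satisfied without exceeding any limit."""
    left = remaining(limits, used)
    return all(left.get(k, math.inf) >= amount
               for k, amount in request.items())


limits = {"cores": 16, "L-1216C47A": 8}
used = {"cores": 4}
print(fits(limits, used, {"cores": 4, "L-1216C47A": 8, "ram": 1024}))
print(fits(limits, used, {"L-1216C47A": 9}))
```

Because the keys are plain strings, an arbitrary service quota code is handled the same way as cores or ram.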
James E. Blair eb9121a733 Include user/driver data in node detail list
This lets operators see the user_data and driver_data in the node
list when passing --detail.  That can be useful for identifying
issues with the metastatic driver, or any other debugging which
would benefit from seeing that data.

Change-Id: I2f36ce98a183b7a8e289376f2228b6370900a057
2022-06-30 15:25:25 -07:00
James E. Blair 138b68a5a7 Convert dib-request-list to image-status command
This augments the dib-request list (which shows what images have
manual build requests) with information about whether the image
is paused.  The resulting command is renamed to "image-status".

Change-Id: If75a8757b4ec93563e47bfdf0a239a9c21660c45
2022-06-21 14:12:22 -07:00
Simon Westphahl d6e8bd72df Expose image build requests in web UI and cli
Image build requests can now be retrieved through the /dib-request-list
endpoint or via the dib-request-list sub-command. The list will show the
age of the request and if it is still pending or if there is already a
build in progress.

Change-Id: If73d6c9fcd5bd94318f389771248604a7f51c449
2022-06-21 13:32:35 -07:00
James E. Blair a63f128d73 Suppress component registry logging in command
Users of the "nodepool" command don't need to see the component
registry logs at info level (which output at least a line for each
connected component).  Set the minimum level to warning to avoid
that.

The component registry may still be useful for command-line use
in the future, so we leave it in place rather than disabling it
entirely.

Change-Id: I8c0937d7304ddc536773cf74fc40bbf6e79918d4
2022-06-20 07:07:01 -07:00
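Raising the minimum level of one logger, as a63f128d73 describes, can be done with the standard logging module. The logger name used below is a placeholder assumption, not necessarily the name the component registry logs under.

```python
import logging


def quiet_logger(logger_name="nodepool.ComponentRegistry"):
    """Hide info-level records from one logger while leaving it enabled.

    The default logger name is hypothetical and only for illustration.
    """
    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.WARNING)
    return logger


log = quiet_logger()
log.info("component connected")       # suppressed
log.warning("component lost session")  # still emitted
```

Setting the level on the specific logger leaves every other logger at its configured verbosity.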
James E. Blair 10df93540f Use Zuul-style ZooKeeper connections
We have made many improvements to connection handling in Zuul.
Bring those back to Nodepool by copying over the zuul/zk directory
which has our base ZK connection classes.

This will enable us to bring other Zuul classes over, such as the
component registry.

The existing connection-related code is removed and the remaining
model-style code is moved to nodepool.zk.zookeeper.  Almost every
file imported the model as nodepool.zk, so import adjustments are
made to compensate while keeping the code more or less as-is.

Change-Id: I9f793d7bbad573cb881dfcfdf11e3013e0f8e4a3
2022-05-23 07:40:20 -07:00
Zuul fc2e592d0d Merge "Add zookeeper-timeout connection config" 2022-03-24 15:23:02 +00:00
Tobias Henkel ec55126f6b Add zookeeper-timeout connection config
The default zookeeper session timeout is 10 seconds which is not enough
on a highly loaded nodepool. As in Zuul, make this configurable so we
can avoid session losses.

Change-Id: Id7087141174c84c6cdcbb3933c233f5fa0e7d569
2022-02-23 23:01:11 +01:00
Zuul 56ebfbf885 Merge "Add commands to export/import image data from ZK" 2021-09-15 22:01:46 +00:00
Benjamin Schanzel ee90100852 Add Tenant-Scoped Resource Quota
This change adds the option to put quota on resources on a per-tenant
basis (i.e. Zuul tenants).

It adds a new top-level config structure ``tenant-resource-limits``
under which one can specify a number of tenants, each with
``max-servers``, ``max-cores``, and ``max-ram`` limits.  These limits
are valid globally, i.e., for all providers. This is contrary to
currently existing provider and pool quotas, which only are considered
for nodes of the same provider.

Change-Id: I0c0154db7d5edaa91a9fe21ebf6936e14cef4db7
2021-09-01 09:07:43 +02:00
James E. Blair 0b1fa1d57d Add commands to export/import image data from ZK
Change-Id: Id1ac6403f4fe80059b90900c519e56bca7dee0a0
2021-08-24 10:28:39 -07:00
James E. Blair 63f38dfd6c Support threadless deletes
The launcher implements deletes using threads, and unlike with
launches, does not give drivers an opportunity to override that
and handle them without threads (as we want to do in the state
machine driver).

To correct this, we move the NodeDeleter class from the launcher
to driver utils, and add a new driver Provider method that returns
the NodeDeleter thread.  This is added in the base Provider class
so all drivers get this behavior by default.

In the state machine driver, we override the method so that instead
of returning a thread, we start a state machine and add it to a list
of state machines that our internal state machine runner thread
should drive.

Change-Id: Iddb7ed23c741824b5727fe2d89c9ddbfc01cd7d7
2021-03-21 14:39:01 -07:00
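The override pattern in 63f38dfd6c can be sketched as follows. Only the NodeDeleter name comes from the commit message; the method name start_delete, the DeleteStateMachine class, and the two-step delete are hypothetical simplifications.

```python
import threading


class NodeDeleter(threading.Thread):
    """Thread-based delete, the default behaviour for all drivers."""

    def __init__(self, node):
        super().__init__(daemon=True)
        self.node = node
        self.done = False

    def run(self):
        # Stand-in for the real cloud API calls and ZooKeeper cleanup.
        self.done = True


class DeleteStateMachine:
    """Delete expressed as small steps that a runner calls repeatedly."""

    def __init__(self, node):
        self.node = node
        self.state = "start"
        self.done = False

    def advance(self):
        if self.state == "start":
            self.state = "deleting"   # e.g. issue the delete request
        elif self.state == "deleting":
            self.state = "complete"   # e.g. confirm the server is gone
            self.done = True


class Provider:
    def start_delete(self, node):
        """Default: start and return a NodeDeleter thread."""
        deleter = NodeDeleter(node)
        deleter.start()
        return deleter


class StateMachineProvider(Provider):
    def __init__(self):
        self.state_machines = []

    def start_delete(self, node):
        """Override: queue a state machine instead of starting a thread."""
        machine = DeleteStateMachine(node)
        self.state_machines.append(machine)
        return machine

    def run_once(self):
        """One pass of the single runner thread over all queued machines."""
        for machine in list(self.state_machines):
            machine.advance()
            if machine.done:
                self.state_machines.remove(machine)
```

In this sketch the caller only needs the returned object, so the launcher does not have to know whether a thread or a state machine is doing the work.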
Zuul 6473a0049c Merge "config: add environment variable substitution" 2020-08-21 20:57:17 +00:00
James E. Blair 00d62af3c3 Add image-pause CLI command
This adds a CLI command to set a flag in ZK for images indicating
that the image should be paused.  This can be used to quickly pause
the building and uploading of one or more images globally.  This
will effectively be boolean OR'd with the pause value for diskimage
builds in the config file.

In particular, this can be used to pause images for short durations,
either because a fix is imminent, or to allow the system to remain
stable while a configuration change goes through the CI/CD workflow.

Change-Id: I21a573dfc337c51f319afe3695d5446b2c91d70b
2020-08-20 15:48:03 -07:00
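The "boolean OR" behaviour described in 00d62af3c3 is small enough to state directly. The function and argument names are hypothetical.

```python
def effective_pause(config_pause, zk_pause):
    """An image is paused if either the config file or the ZK flag says so."""
    return bool(config_pause) or bool(zk_pause)


# The config leaves the image running, but an operator paused it via the CLI.
print(effective_pause(config_pause=False, zk_pause=True))
```

One consequence of OR semantics is that clearing the ZK flag cannot unpause an image that is paused in the config file.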
Tristan Cacqueray eb9af85025 config: add environment variable substitution
This change enables setting configuration values through
environment variables. This is useful to manage user defined
configuration, such as user password, in Kubernetes deployment.

Change-Id: Iafbb63ebbb388ef3038f45fd3a929c3e7e2dc343
2020-05-20 11:44:49 +00:00
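Environment variable substitution of the kind eb9af85025 describes can be sketched with the standard library. The ${VAR} placeholder syntax, the function name, and the variable name below are assumptions for illustration and may differ from what Nodepool accepts.

```python
import os
from string import Template


def substitute_env(config_text, environ=None):
    """Replace ${NAME} placeholders in raw config text.

    Raises KeyError if a referenced variable is not set, so a missing
    secret fails loudly instead of producing a half-filled config.
    """
    env = os.environ if environ is None else environ
    return Template(config_text).substitute(env)


raw = "zookeeper-auth:\n  password: ${ZK_PASSWORD}\n"
print(substitute_env(raw, {"ZK_PASSWORD": "example-secret"}))
```

Substituting on the raw text before parsing keeps the secret out of the config file itself, which suits a Kubernetes deployment where it is injected as an environment variable.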
Zuul 4c521fd208 Merge "config_validator: refactor the schema to a static method" 2020-04-16 03:15:28 +00:00
Zuul 775cd32028 Merge "Add ZooKeeper TLS support" 2020-04-15 01:41:47 +00:00
James E. Blair b62fa3313d Add ZooKeeper TLS support
Change-Id: I009d9f90b32881aaef2d0694da6ff28074f48f8e
2020-04-14 16:03:53 -07:00
Tristan Cacqueray c31158e2f7 config_validator: refactor the schema to a static method
This change moves the top_level schema to a static method so that
it can be used externally.

Change-Id: Ifa4849e3de7731957b90130e080bf3331be44fa9
2020-04-11 13:47:30 +00:00
Zuul cdcedc63be Merge "Support image uploads in 'info' CLI command" 2020-04-07 22:02:52 +00:00
Ian Wienand b5b20b6e2c Add parent and abstract flags for diskimages
While YAML does have inbuilt support for anchors to greatly reduce
duplicated sections, anchors have no support for merging values.  For
diskimages, this can result in a lot of duplicated values for each
image which you can not otherwise avoid.

This provides two new values for diskimages; a "parent" and
"abstract".

Specifying a parent means you inherit all the configuration values
from that image.  Anything specified within the child image overwrites
the parent values as you would expect; caveats, as described in the
documentation, are that the elements field appends and the env-vars
field has update() semantics.

An "abstract" diskimage is not instantiated into a real image, it is
only used for configuration inheritance.  This way you can make a
abstract "base" image with common values and inherit that everywhere
without having to worry about bringing in values you don't want.

You can also chain parents together and the inheritance flows through.

Documentation is updated, and several tests are added to ensure the
correct parenting, merging and override behaviour of the new values.

Change-Id: I170016ef7d8443b9830912b9b0667370e6afcde7
2020-03-20 07:53:08 +11:00
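The inheritance rules in b5b20b6e2c (plain values overwrite, elements append, env-vars use update() semantics, abstract images are never built, parents can be chained) can be sketched like this. The function names and the dictionary layout are assumptions, not the actual config loader.

```python
def resolve_diskimage(name, images, _seen=()):
    """Return the effective config for `name`, following parent links."""
    if name in _seen:
        raise ValueError(f"parent loop involving {name!r}")
    image = images[name]
    parent = image.get("parent")
    if parent is None:
        resolved = {"elements": [], "env-vars": {}}
    else:
        resolved = resolve_diskimage(parent, images, _seen + (name,))
    for key, value in image.items():
        if key in ("parent", "abstract"):
            continue  # bookkeeping flags are not inherited
        if key == "elements":
            resolved["elements"] = resolved["elements"] + list(value)
        elif key == "env-vars":
            resolved["env-vars"] = {**resolved["env-vars"], **value}
        else:
            resolved[key] = value  # child overwrites parent
    return resolved


def concrete_images(images):
    """Resolve every image that is not marked abstract."""
    return {
        name: resolve_diskimage(name, images)
        for name, image in images.items()
        if not image.get("abstract", False)
    }


images = {
    "base": {"abstract": True, "elements": ["vm"],
             "env-vars": {"TMPDIR": "/opt/tmp"}, "release": "focal"},
    "jammy": {"parent": "base", "elements": ["extra"], "release": "jammy"},
}
print(concrete_images(images))
```

The loop check is an addition of this sketch; the commit message does not say how parent cycles are handled.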
Ian Wienand 340df68a7b diskimage: make name primary key
Ensure 'name' is a primary key for diskimages.

Change the constructor to take the name as an argument.  Update the
config validator to ensure there is a name, and that it is unique.

Add tests for both these cases.

Change-Id: I3931dc1457c023154cde0df2bb7b0a41cc6f20d3
2020-03-20 07:53:08 +11:00
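The two validator rules from 340df68a7b (every diskimage has a name, and names are unique) can be sketched as a single pass over the list. The function name and error wording are hypothetical.

```python
def check_diskimage_names(diskimages):
    """Return a list of error strings; an empty list means the check passed."""
    errors = []
    seen = set()
    for index, image in enumerate(diskimages):
        name = image.get("name")
        if not name:
            errors.append(f"diskimage #{index} has no name")
        elif name in seen:
            errors.append(f"duplicate diskimage name: {name}")
        else:
            seen.add(name)
    return errors


print(check_diskimage_names([
    {"name": "ubuntu-jammy"},
    {"elements": ["vm"]},
    {"name": "ubuntu-jammy"},
]))
```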
David Shrewsbury 394d549c07 Support image uploads in 'info' CLI command
Change the 'info' command output to include image upload data.
For each image, we'll now output each build and the uploads for the build.

Change-Id: Ib25ce30d30ed718b2b6083c2127fdb214c3691f4
2020-03-19 15:03:34 -04:00
Tobias Henkel 58ad5123f1 Fix resource warnings when running tests
There are some open() calls that are not protected by a "with" statement.

Change-Id: I98a45c4df38c7a22282fa6abf911a1815fb6bbfa
2019-12-21 11:52:58 +01:00
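The fix pattern behind 58ad5123f1 is the usual context-manager form of open(). The helper name and file contents below are illustrative only.

```python
import tempfile
from pathlib import Path


def read_config(path):
    # The context manager closes the file even if read() raises,
    # so no ResourceWarning is emitted for an unclosed file.
    with open(path, encoding="utf-8") as handle:
        return handle.read()


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "nodepool.yaml"
    sample.write_text("labels: []\n", encoding="utf-8")
    print(read_config(sample))
```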
Fabien Boucher f57ac1881a Remove uneeded shebang and exec bit on some files
Having Python files with the exec bit set and a shebang defined in
/usr/lib/python-*/site-package/ is not acceptable in an RPM package.

Instead of carrying a patch in the nodepool RPM packaging, it is better
to fix this directly upstream.

Change-Id: I5a01e21243f175d28c67376941149e357cdacd26
2019-12-13 19:30:03 +01:00
Ian Wienand ddbcf1b07d Validate openstack provider pool labels have top-level labels
We broke nodepool configuration with
I3795fee1530045363e3f629f0793cbe6e95c23ca by not having the labels
defined in the OpenStack provider in the top-level label list.

The added check here would have found such a case.

The validate() function is reworked slightly; previously it would
return various exceptions from the tools it was calling (YAML,
voluptuous, etc.).  Now we have more testing (and I'd imagine we could
do even more, similar validations too) we'd have to keep adding
exception types.  Just make the function return a value; this also
makes sure the regular exit paths are taken from the caller in
nodepoolcmd.py, rather than dying with an exception at whatever point.

A unit test is added.

Co-Authored-By: Mohammed Naser <mnaser@vexxhost.com>
Change-Id: I5455f5d7eb07abea34c11a3026d630dee62f2185
2019-10-15 15:32:32 +11:00
Ian Wienand 9367cf8ed8 Add a dib-cmd option for diskimages
This change allows you to specify a dib-cmd parameter for disk images,
which overrides the default call to "disk-image-create".  This allows
you to essentially decide the disk-image-create binary to be called
for each disk image configured.

It is inspired by a couple of things:

The "--fake" argument to nodepool-builder has always been a bit of a
wart; a case of testing-only functionality leaking across into the
production code.  It would be clearer if the tests used exposed
methods to configure themselves to use the fake builder.

Because disk-image-create is called from the $PATH, it makes it more
difficult to use nodepool from a virtualenv.  You can not just run
"nodepool-builder"; you have to ". activate" the virtualenv before
running the daemon so that the path is set to find the virtualenv
disk-image-create.

In addressing activation issues by automatically choosing the
in-virtualenv binary in Ie0e24fa67b948a294aa46f8164b077c8670b4025, it
was pointed out that others are already using wrappers in various ways
where preferring the co-installed virtualenv version would break.

With this, such users can ensure they call the "disk-image-create"
binary they want.  We can then make a change to prefer the
co-installed version without fear of breaking.

In theory, there's no reason why a totally separate
"/custom/venv/bin/disk-image-create" would not be valid if you
required a customised dib for some reason for just one image.  This is
not currently possible; even modulo PATH hacks, etc., all images will
use the same binary to build.  It is for this flexibility I think this
is best at the diskimage level, rather than as, say a global setting
for the whole builder instance.

Thus add a dib-cmd option for diskimages.  In the testing case, this
points to the fake-image-create script, and the --fake command-line
option and related bits are removed.

It should have no backwards compatibility effects; documentation and a
release note are added.

Change-Id: I6677e11823df72f8c69973c83039a987b67eb2af
2019-08-22 10:09:00 +10:00
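The per-image override introduced in 9367cf8ed8 can be sketched as a lookup with a default. The dib-cmd key and the "disk-image-create" default come from the commit message; the function name and the extra arguments are illustrative assumptions.

```python
import shlex

DEFAULT_DIB_CMD = "disk-image-create"


def build_command(diskimage, extra_args=()):
    """Return the argv list used to build one diskimage.

    Falls back to the default binary (found via $PATH) when the
    diskimage does not set dib-cmd.
    """
    cmd = diskimage.get("dib-cmd", DEFAULT_DIB_CMD)
    # shlex.split allows dib-cmd to carry its own arguments.
    return shlex.split(cmd) + list(extra_args)


custom = {"name": "special", "dib-cmd": "/custom/venv/bin/disk-image-create"}
print(build_command({"name": "plain"}, ["-t", "qcow2"]))
print(build_command(custom, ["-t", "qcow2"]))
```

Resolving the command per diskimage is what allows one image to use a customised binary while the others keep the default.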
Zuul e87a1e0ed8 Merge "Fix version number when installing dev release" 2019-05-10 17:13:07 +00:00
Paul Belanger a748468adb Fix version number when installing dev release
Now that we publish dev releases to pypi, we should also allow those
versions to be displayed with --version flag.

Change-Id: I045c9d5382a1035cd7678f9882e32d371f108555
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
2019-05-08 14:18:37 -04:00
Tristan Cacqueray 76aa62230c Add python-path option to node
This change adds a new python_path Node attribute so that zuul executor
can remove the default hard-coded ansible_python_interpreter.

Change-Id: Iddf2cc6b2df579636ec39b091edcfe85a4a4ed10
2019-05-07 02:22:45 +00:00
David Shrewsbury 0a4d00fd82 Add support for yappi and objgraph output
This duplicates the logic in zuul, and makes us consistent with
current nodepool documentation that says we already support this.

Change-Id: Ib92272b302a5225726a830ee50571fb7ad96e457
2019-04-17 16:15:33 -04:00
Zuul 2b4fdb03d9 Merge "Remove unused use_taskmanager flag" 2019-04-05 19:58:37 +00:00
Monty Taylor 7618b714e2 Remove unused use_taskmanager flag
Now that there is no more TaskManager class, nor anything using
one, the use_taskmanager flag is vestigial. Clean it up so that we
don't have to pass it around to things anymore.

Change-Id: I7c1f766f948ad965ee5f07321743fbaebb54288a
2019-04-02 12:11:07 +00:00
David Shrewsbury 15fed047e1 Use yaml.safe_load instead of load
Change Ie14935f604f23b0928eed0dd8e28dff49699a2d1 altered one use of
this method, but this one was missed.

Change-Id: I299a12d73a6524f5097712f97342aed640786eea
2019-03-28 11:16:10 -04:00