Commit Graph

127 Commits

Author SHA1 Message Date
James E. Blair d6b13dd47f Fix inheritance of delete-after-upload option
This diskimage option was not correctly inherited when using
a parent diskimage.

Change-Id: If48c218a0b8b9432e54a55bd947b04016db78479
2024-03-28 09:19:15 -07:00
James E. Blair fd454706ca Add delete-after-upload option
This allows operators to delete large diskimage files after uploads
are complete, in order to save space.

A setting is also provided to keep certain formats, so that if
operators would like to delete large formats such as "raw" while
retaining a qcow2 copy (which, in an emergency, could be used to
inspect the image, or manually converted and uploaded for use),
that is possible.

Change-Id: I97ca3422044174f956d6c5c3c35c2dbba9b4cadf
2024-03-09 06:51:56 -08:00
Simon Westphahl 4ae0a6f9a6
Refactor config loading in builder and launcher
In I93400cc156d09ea1add4fc753846df923242c0e6 we've refactore the
launcher config loading to use the last modified timestamps of the
config files to detect if a reload is necessary.

In the builder the situation is even worse as we reload and compare the
config much more often e.g. in the build worker when checking for manual
or scheduled image updates.

With a larger config (2-3MB range) this is a significant performance
problem that can lead to builders being busy with config loading instead
of building images.

Yappi profile (performed with the optimization proposed in
I786daa20ca428039a44d14b1e389d4d3fd62a735, which doesn't fully solve the
problem):

name                                  ncall  tsub      ttot      tavg
..py:880 AwsProviderDiskImage.__eq__  812..  17346.57  27435.41  0.000034
..odepool/config.py:281 Label.__eq__  155..  1.189220  27403.11  0.176285
..643 BuildWorker._checkConfigRecent  58     0.000000  27031.40  466.0586
..depool/config.py:118 Config.__eq__  58     0.000000  26733.50  460.9225

Change-Id: I929bdb757eb9e077012b530f6f872bea96ec8bbc
2024-01-30 13:59:36 +01:00
James E. Blair de02ac5a20 Add OpenStack volume quota
This adds support for staying within OpenStack volume quota limits
on instances that utilize boot-from-volume.

Change-Id: I1b7bc177581d23cecd9443a392fb058176409c46
2023-02-13 06:56:03 -08:00
James E. Blair 916d62a374 Allow specifying diskimage metadata/tags
For drivers that support tagging/metadata (openstack, aws, azure),
Add or enhance support for supplying tags for uploaded diskimages.

This allows users to set metadata on the global diskimage object
which will then be used as default values for metadata on the
provider diskimage values.  The resulting merged dictionary forms
the basis of metadata to be associated with the uploaded image.

The changes needed to reconcile this for the three drivers mentioned
above are:

All: the diskimages[].meta key is added to supply the default values
for provider metadata.

OpenStack: provider diskimage metadata is already supported using
providers[].diskimages[].meta, so no further changes are needed.

AWS, Azure: provider diskimage tags are added using the key
providers[].diskimages[].tags since these providers already use
the "tags" nomenclature for instances.

This results in the somewhat incongruous situation where we have
diskimage "metadata" being combined with provider "tags", but it's
either that or have images with "metadata" while we have instances
with "tags", both of which are "tags" in EC2.  The chosen approach
has consistency within the driver.

Change-Id: I30aadadf022af3aa97772011cda8dbae0113a3d8
2022-08-23 06:39:08 -07:00
Zuul 123a32f922 Merge "AWS multi quota support" 2022-07-29 17:01:09 +00:00
James E. Blair 207d8ac63c AWS multi quota support
This adds support for AWS quotas that are specific to instance types.

The current quota support in AWS assumes only the "standard" instance types,
but AWS has several additional types with particular specialties (high memory,
GPU, etc).  This adds automatic support for those by encoding their service
quota codes (like 'L-1216C47A') into the QuotaInformation object.

QuotaInformation accepts not only cores, ram, and instances as resource
values, but now also accepts arbitraly keys such as 'L-1216C47A'.
Extra testing of QI is added to ensure we handle the arithmetic correctly
in cases where one or the other operand does not have a resource counter.

The statemachine drivers did not encode their resource information into
the ZK Node record, so tenant quota was not operating correctly.  This is
now fixed.

The AWS driver now accepts max_cores, _instances, and _ram values similar
to the OpenStack driver.  It additionally accepts max_resources which can
be used to specify limits for arbitrary quotas like 'L-1216C47A'.

The tenant quota system now also accepts arbitrary keys such as 'L-1216C47A'
so that, for example, high memory nodes may be limited by tenant.

The mapping of instance types to quota is manually maintained, however,
AWS doesn't seem to add new instance types too often, and those it does are
highly specialized.  If a new instance type is not handled internally, the
driver will not be able to calculate expected quota usage, but will still
operate until the new type is added to the mapping.

Change-Id: Iefdc8f3fb8249c61c43fe51b592f551e273f9c36
2022-07-25 14:41:07 -07:00
Zuul 9641b719f9 Merge "Use libyaml parsing when available" 2022-07-25 09:34:19 +00:00
James E. Blair 10df93540f Use Zuul-style ZooKeeper connections
We have made many improvements to connection handling in Zuul.
Bring those back to Nodepool by copying over the zuul/zk directory
which has our base ZK connection classes.

This will enable us to bring other Zuul classes over, such as the
component registry.

The existing connection-related code is removed and the remaining
model-style code is moved to nodepool.zk.zookeeper.  Almost every
file imported the model as nodepool.zk, so import adjustments are
made to compensate while keeping the code more or less as-is.

Change-Id: I9f793d7bbad573cb881dfcfdf11e3013e0f8e4a3
2022-05-23 07:40:20 -07:00
Zuul fc2e592d0d Merge "Add zookeeper-timeout connection config" 2022-03-24 15:23:02 +00:00
Tobias Henkel ec55126f6b
Add zookeeper-timeout connection config
The default zookeeper session timout is 10 seconds which is not enough
on a highly loaded nodepool. Like in zuul make this configurable so we
can avoid session losses.

Change-Id: Id7087141174c84c6cdcbb3933c233f5fa0e7d569
2022-02-23 23:01:11 +01:00
Benjamin Schanzel ee90100852 Add Tenant-Scoped Resource Quota
This change adds the option to put quota on resources on a per-tenant
basis (i.e. Zuul tenants).

It adds a new top-level config structure ``tenant-resource-limits``
under which one can specify a number of tenants, each with
``max-servers``, ``max-cores``, and ``max-ram`` limits.  These limits
are valid globally, i.e., for all providers. This is contrary to
currently existing provider and pool quotas, which only are consindered
for nodes of the same provider.

Change-Id: I0c0154db7d5edaa91a9fe21ebf6936e14cef4db7
2021-09-01 09:07:43 +02:00
Albin Vass 0c84b7fa4e Add shell-type config
Ansible needs to know which shell type the node uses to operate
correctly, especially for ssh connections for windows nodes because
otherwise ansible defaults to trying bash.

Change-Id: I71abfefa57aaafd88f199be19ee7caa64efda538
2021-03-05 15:14:29 +01:00
James E. Blair 4c5fa46540 Require TLS
Require TLS Zookeeper connections before making the 4.0 release.

Change-Id: I69acdcec0deddfdd191f094f13627ec1618142af
Depends-On: https://review.opendev.org/776696
2021-02-19 18:42:33 +00:00
Tobias Henkel 61963917a0
Use libyaml parsing when available
Nodepool periodically reloads its config. On systems with large
configs this can cause a substantial amount of cpu usage. Our
profiling of an idle provider revealed that the yaml parsing accounted
for ~15% cpu load while the provider was idle.

This could be improved by utilizing the native libyaml bindings when
available.

Change-Id: Icbed393c5aa00abd52d1e13ccbbad8546ec6cf97
2021-01-22 11:52:52 +01:00
Ian Wienand 45704178e9 rename imagesdir/elementdir with underscore
This makes absolutely no functional difference, but in a follow-on I
wish to add images_dir_required_free and the mix of underscores with
these values is inconsistent.

Change-Id: I2566afbd5c78c39bfa8f2fce32e2374a589b45aa
2020-11-26 15:15:37 +11:00
Zuul 6473a0049c Merge "config: add environment variable substitution" 2020-08-21 20:57:17 +00:00
Simon Westphahl 2ec2661655 Remove default qcow2 format in diskimage config
When removing a label from a provider that previously required raw
images (while still keeping the diskimage config), the image was
automatically rebuilt in qcow2 format.

It seems the original intent [0] of having the diskimage formats was to
allow building diskimages without needing a provider.

Because manually triggering a diskimage build without a format lead to a
failure, the qcow2 default was added [1] and later fixed [2] to only
provide a default when the diskimage wasn't used by any provider.

By removing the qcow2 default and preventing builds without a format, we
retain the ability to allow diskimage only builds when a format is
given. Otherwise we don't assume a default image format and prevent
builds with no image format.

[0] https://review.opendev.org/#/c/412160/
[1] https://review.opendev.org/#/c/566437/
[2] https://review.opendev.org/#/c/572836/

Change-Id: I374f40b5f9cfcd55e7a4f567fd6480c940f2bc20
2020-07-15 14:31:07 +02:00
Tristan Cacqueray eb9af85025 config: add environment variable substitution
This change enables setting configuration values through
environment variables. This is useful to manage user defined
configuration, such as user password, in Kubernetes deployment.

Change-Id: Iafbb63ebbb388ef3038f45fd3a929c3e7e2dc343
2020-05-20 11:44:49 +00:00
Zuul 06aacd17be Merge "diskimage.username setting was not read from configuration file" 2020-04-15 19:35:14 +00:00
Zuul 775cd32028 Merge "Add ZooKeeper TLS support" 2020-04-15 01:41:47 +00:00
Michał Suszko 314a6ac48a diskimage.username setting was not read from configuration file
Change-Id: I615e530decbee6a46167a40748342d2193851c02
2020-04-15 09:48:45 +10:00
James E. Blair b62fa3313d Add ZooKeeper TLS support
Change-Id: I009d9f90b32881aaef2d0694da6ff28074f48f8e
2020-04-14 16:03:53 -07:00
Ian Wienand 7cc22b4d3c Move config merge into DiskImage object
This was suggested in review of
I170016ef7d8443b9830912b9b0667370e6afcde7; the config merging fits
more naturally as a function in the DiskImage object.  Otherwise this
is a no-op refactor.

Change-Id: I01b49e1d8919fd18b687e5f56bbdda5b51aa1062
2020-03-20 07:53:08 +11:00
Ian Wienand b5b20b6e2c Add parent and abstract flags for diskimages
While YAML does have inbuilt support for anchors to greatly reduce
duplicated sections, anchors have no support for merging values.  For
diskimages, this can result in a lot of duplicated values for each
image which you can not otherwise avoid.

This provides two new values for diskimages; a "parent" and
"abstract".

Specifying a parent means you inherit all the configuration values
from that image.  Anything specified within the child image overwrites
the parent values as you would expect; caveats, as described in the
documentation, are that the elements field appends and the env-vars
field has update() semantics.

An "abstract" diskimage is not instantiated into a real image, it is
only used for configuration inheritance.  This way you can make a
abstrat "base" image with common values and inherit that everywhere
without having to worry about bringing in values you don't want.

You can also chain parents together and the inheritance flows through.

Documentation is updated, and several tests are added to ensure the
correct parenting, merging and override behaviour of the new values.

Change-Id: I170016ef7d8443b9830912b9b0667370e6afcde7
2020-03-20 07:53:08 +11:00
Ian Wienand 340df68a7b diskimage: make name primary key
Ensure 'name' is a primary key for diskimages.

Change the constructor to take the name as an argument.  Update the
config validator to ensure there is a name, and that it is unique.

Add tests for both these cases.

Change-Id: I3931dc1457c023154cde0df2bb7b0a41cc6f20d3
2020-03-20 07:53:08 +11:00
Ian Wienand 9f1a074ae8 DiskImage : alphabetise entries
This is a no-op refactor to just keep the fields in alphabetical
order.  This makes it much easier to visually keep the initalizer and
comparision functions in sync across any future changes that may
update the fields.

Note name is kept first, out of order, to signal it is a primary key
(see next follow-on I3931dc1457c023154cde0df2bb7b0a41cc6f20d3 where
this is enforced)

Change-Id: I0c177b3b0edee3cd811c029d3f03a08c9e11a5fb
2020-03-20 07:51:46 +11:00
Ian Wienand 107383df17 Correct dib_path typo
There's no dib_path, it should be dib_cmd

Change-Id: I0db2841118c95a96d529d7cc6eaa5e22b67794fb
2020-03-17 12:57:27 +00:00
Tobias Henkel 58ad5123f1
Fix resource warnings when running tests
There are some open calls that are not protected using with.

Change-Id: I98a45c4df38c7a22282fa6abf911a1815fb6bbfa
2019-12-21 11:52:58 +01:00
Fabien Boucher f57ac1881a
Remove uneeded shebang and exec bit on some files
Having python files with exec bit and shebang defined in
/usr/lib/python-*/site-package/ is not fine in a RPM package.

Instead of carrying a patch in nodepool RPM packaging better
to fix this directly upstream.

Change-Id: I5a01e21243f175d28c67376941149e357cdacd26
2019-12-13 19:30:03 +01:00
Ian Wienand db87a0845f Set default python-path to "auto"
The "python-path" configuration option makes its way through to Zuul
where it sets the "ansible_interpreter_path" in the inventory.
Currently this defaults to "/usr/bin/python2" which is wrong for
Python 3-only distributions.

Ansible >=2.8 provides for automated discovery of the interpreter to
avoid runtime errors choosing an invalid interpreter [1].  Using this
should mean that "python-path" doesn't need to be explicitly for any
common case.  As more distributions become Python 3 only, this should
"do the right thing" without further configuration.

This switches the default python-path to "auto".  The dependent change
updates Zuul to accept this and use it when running with Ansible
>=2.8, or default back to "/usr/bin/python2" for earlier Ansible
versions.

Testing and documentation is updated, and a release note added.

[1] https://docs.ansible.com/ansible/2.8/reference_appendices/interpreter_discovery.html

Depends-On: https://review.opendev.org/682275
Change-Id: I02a1a618c8806b150049e91b644ec3c0cb826ba4
2019-10-17 09:17:50 +11:00
Ian Wienand 9367cf8ed8 Add a dib-cmd option for diskimages
This change allows you to specify a dib-cmd parameter for disk images,
which overrides the default call to "disk-image-create".  This allows
you to essentially decide the disk-image-create binary to be called
for each disk image configured.

It is inspired by a couple of things:

The "--fake" argument to nodepool-builder has always been a bit of a
wart; a case of testing-only functionality leaking across into the
production code.  It would be clearer if the tests used exposed
methods to configure themselves to use the fake builder.

Because disk-image-create is called from the $PATH, it makes it more
difficult to use nodepool from a virtualenv.  You can not just run
"nodepool-builder"; you have to ". activate" the virtualenv before
running the daemon so that the path is set to find the virtualenv
disk-image-create.

In addressing activation issues by automatically choosing the
in-virtualenv binary in Ie0e24fa67b948a294aa46f8164b077c8670b4025, it
was pointed out that others are already using wrappers in various ways
where preferring the co-installed virtualenv version would break.

With this, such users can ensure they call the "disk-image-create"
binary they want.  We can then make a change to prefer the
co-installed version without fear of breaking.

In theory, there's no reason why a totally separate
"/custom/venv/bin/disk-image-create" would not be valid if you
required a customised dib for some reason for just one image.  This is
not currently possible, even modulo PATH hacks, etc., all images will
use the same binary to build.  It is for this flexibility I think this
is best at the diskimage level, rather than as, say a global setting
for the whole builder instance.

Thus add a dib-cmd option for diskimages.  In the testing case, this
points to the fake-image-create script, and the --fake command-line
option and related bits are removed.

It should have no backwards compatibility effects; documentation and a
release note is added.

Change-Id: I6677e11823df72f8c69973c83039a987b67eb2af
2019-08-22 10:09:00 +10:00
Tristan Cacqueray 76aa62230c Add python-path option to node
This change adds a new python_path Node attribute so that zuul executor
can remove the default hard-coded ansible_python_interpreter.

Change-Id: Iddf2cc6b2df579636ec39b091edcfe85a4a4ed10
2019-05-07 02:22:45 +00:00
Tristan Cacqueray 51074f5e66 config: use yaml.safe_load instead of load
This change replaces an insecure yaml.load procedure by yaml.safe_load.

Change-Id: Ie14935f604f23b0928eed0dd8e28dff49699a2d1
2019-03-18 04:14:43 +00:00
David Shrewsbury 890ea4975e Revert "Revert "Add a timeout for the image build""
This reverts commit ccf40a462a.

The previous version would not work properly when daemonized
because there was no stdout. This version maintains stdout and
uses select/poll with non-blocking stdout to capture the output
to a log file.

Depends-On: https://review.openstack.org/634266

Change-Id: I7f0617b91e071294fe6051d14475ead1d7df56b7
2019-01-31 11:36:47 -05:00
David Shrewsbury ccf40a462a Revert "Add a timeout for the image build"
This reverts commit 7225354ec0.

The disk-image-create command does not appear to be starting.

Change-Id: I81abe25a253a385cae08a57561129a678546f18f
2019-01-25 17:36:31 +00:00
David Shrewsbury 7225354ec0 Add a timeout for the image build
A builder thread can wedge if the build process wedges. Add a timeout
to the subprocess. Since it was the call to readline() that would block,
we change the process to have DIB write directly to the log. This allows
us to set a timeout in the Popen.wait() call. And we kill the dib
subprocess, as well.

The timeout value can be controlled in the diskimage configuration and
defaults to 8 hours.

Change-Id: I188e8a74dc39b55a4b50ade5c1a96832fea76a7d
2019-01-23 16:27:19 -05:00
Zuul 1b4c92262a Merge "builder: support setting diskimage env-vars in secure configuration" 2018-07-23 16:04:00 +00:00
Artem Goncharov fc1f80b6d1
Replace shade and os-client-config with openstacksdk.
os-client-config is now just a wrapper around openstacksdk. The shade
code has been imported into openstacksdk. To reduce complexity, just use
openstacksdk directly.

openstacksdk's TaskManager has had to grow some features to deal with
SwiftService. Making nodepool's TaskManager a subclass of openstacksdk's
TaskManager ensures that we get the thread pool set up properly.

Change-Id: I3a01eb18ae31cc3b61509984f3817378db832b47
2018-07-14 08:44:03 -05:00
Zuul 0d5f50b4e2 Merge "Fix adding qcow2 format without need" 2018-06-06 21:18:02 +00:00
Tobias Henkel 6ec75970b3
Fix adding qcow2 format without need
Currently nodepool is defaulting to the qcow2 if no format was given
in the diskimage config. However nodepool also can auto-detect the
image format based on clouds.yaml. This is not taken into account
leading to qcow2 added when using raw-only clouds and auto-detection.

This can be fixed by delaying the fallback to the point where we
actually do the build.

Change-Id: Ic1d9b5abd837a3225cc13da30a80ebf1dda61b64
2018-06-06 21:58:21 +02:00
James E. Blair e20858755f Have Drivers create Providers
Use the new Driver class to create instances of Providers

Change-Id: Idfbde8d773a971133b49fbc318385893be293fac
2018-06-06 14:57:40 -04:00
James E. Blair 82d8c51250 Create a base Driver class
This is, like the drivers in zuul, designed to be a single instance
per driver that survives for the life of the process.  It is used
to further instantiate driver-specific interfaces.

Here we have it return the config object for the driver (replacing
the previous system which loaded it from specific config files).

We also move the reset method from the ProviderConfig to the Driver
object.  It's currently only used to clear a global os client config
object, so this better matches its lifecycle.

Change-Id: I1f5a7be9c597be842bfe4dbea8f153d7a96d7b9a
2018-06-06 14:11:43 -04:00
Zuul b73049c910 Merge "Fix ConfigValue comparisons" 2018-06-04 16:46:46 +00:00
Tristan Cacqueray eca37d13ea builder: support setting diskimage env-vars in secure configuration
This change enables using diskimage-builder elements with secret securely.
For example, a rhel diskimage needs a REG_PASSWORD that could be define in
the secure file like so:

diskimages:
  - name: rhel-7
    env-vars:
      REG_PASSWORD: secret-password

Change-Id: I814318ae0b5c9e4665f3fa3f011d8a687b540fac
2018-05-29 23:48:13 +00:00
Artem Goncharov 483e51ed82 Default image creation to qcow2 type
It is described in the docs, that nodepool will build image only if it
is referred in one of the clouds, but it is possible to force that
creation: `nodepool image-build fedora-28`.

This leads to inability to define image types and an empty "-t"
argument to DIB; which causes the command invocation to fail.

Default the type value to qcow2.  This matches the default type that
dib will create and keeps with the general principle of nodepool
figuring out what type of image to make for you.

Change-Id: I27f8f45ba62e5ee0cca0ea7cd27893092b726ebd
Story: 2001963
2018-05-14 10:33:49 +10:00
David Shrewsbury 814160fb71 Fix ConfigValue comparisons
Our code was terribly inconsistent with implementing the __eq__()
method for ConfigValue based objects (if we did at all). If this
was implemented, not all of the attributes were used in comparison.

This change forces a ConfigValue derived object to implement the
comparison method, and fixes the places we neglected attributes.

Adds some very simple tests to at least exercise this code.

Change-Id: Ia2f600a9a4f3770087372bbc9f07531d5ea569e1
2018-05-11 15:17:33 -04:00
David Shrewsbury cf920b86b4 Clean up configuration file loading
This extends the config.Config() class to contain methods to
parse YAML top-level configuration sections. It makes the code
that loads the config easier to digest, and makes portions of
it reusable (e.g., setZooKeeperServers()).

The 'db' config attribute is removed since it was unused.

Change-Id: Ia6aa64f8ad378f022f0c422a06935610e8c50de4
2018-05-08 09:38:22 -04:00
mhuin a59528c8e8 Clean held nodes automatically after configurable timeout
Introduce a new configuration setting, "max_hold_age", that specifies
the maximum uptime of held instances. If set to 0, held instances
are kept until manually deleted. A custom value can be provided
at the rpcclient level.

Change-Id: I9a09728e5728c537ee44721f5d5e774dc0dcefa7
2018-02-20 16:13:55 +01:00
James E. Blair baa831192f Store build logs automatically
This updates the builder to store individual build logs in dedicated
files, one per build, named for the image and build id.  Old logs are
automatically pruned.  By default, they are stored in
/var/log/nodepool/builds, but this can be changed.

This removes the need to specially configure logging handler for the
image build logs.

Change-Id: Ia7415d2fbbb320f8eddc4e46c3a055414df5f997
2018-02-09 07:50:20 -08:00