94 Commits

Author SHA1 Message Date
James E. Blair fd454706ca Add delete-after-upload option
This allows operators to delete large diskimage files after uploads
are complete, in order to save space.

A setting is also provided to keep certain formats, so that if
operators would like to delete large formats such as "raw" while
retaining a qcow2 copy (which, in an emergency, could be used to
inspect the image, or manually converted and uploaded for use),
that is possible.

Change-Id: I97ca3422044174f956d6c5c3c35c2dbba9b4cadf
2024-03-09 06:51:56 -08:00
Clark Boylan 00b20b0b39 Check labels use valid diskimages in config validator
OpenDev tripped over this. Add validation to the validator tool that
provider pool labels use diskimages that are defined in the main
diskimages list.

Change-Id: Icbfaaa6342dfcc1d555f9b45f278d0e59467f2b3
2022-09-20 12:37:00 -07:00
James E. Blair 916d62a374 Allow specifying diskimage metadata/tags
For drivers that support tagging/metadata (openstack, aws, azure),
add or enhance support for supplying tags for uploaded diskimages.

This allows users to set metadata on the global diskimage object
which will then be used as default values for metadata on the
provider diskimage values.  The resulting merged dictionary forms
the basis of metadata to be associated with the uploaded image.

The changes needed to reconcile this for the three drivers mentioned
above are:

All: the diskimages[].meta key is added to supply the default values
for provider metadata.

OpenStack: provider diskimage metadata is already supported using
providers[].diskimages[].meta, so no further changes are needed.

AWS, Azure: provider diskimage tags are added using the key
providers[].diskimages[].tags since these providers already use
the "tags" nomenclature for instances.

This results in the somewhat incongruous situation where we have
diskimage "metadata" being combined with provider "tags", but it's
either that or have images with "metadata" while we have instances
with "tags", both of which are "tags" in EC2.  The chosen approach
has consistency within the driver.

Change-Id: I30aadadf022af3aa97772011cda8dbae0113a3d8
2022-08-23 06:39:08 -07:00
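The merge described in 916d62a374 can be pictured with a small sketch. It assumes plain dictionaries and a hypothetical helper named `merge_image_metadata`; the example keys and values are made up, and this is not the driver code itself.

```python
def merge_image_metadata(diskimage_meta, provider_meta):
    """Combine global diskimage metadata with provider-level values.

    Hypothetical helper: the global values act as defaults and the
    provider-level values win when both define the same key.
    """
    merged = dict(diskimage_meta or {})
    merged.update(provider_meta or {})
    return merged


# Values as they might appear under diskimages[].meta and
# providers[].diskimages[].tags (illustrative keys only).
global_meta = {"owner": "infra", "purpose": "ci"}
provider_tags = {"purpose": "ci-aws", "region": "us-east-1"}

merged = merge_image_metadata(global_meta, provider_tags)
print(merged)
```

Copying the defaults before updating keeps the global dictionary untouched, so the same defaults can be reused for every provider.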
Zuul d39e651dc2 Merge "Validation check for missing openstack diskimages" 2022-08-05 21:04:36 +00:00
Dr. Jens Harbott 0293064ed9 Validation check for missing openstack diskimages
Having a diskimage in an openstack provider which isn't defined as a
top-level diskimage causes nodepool-builder to fail. Check for this
condition in the config-validator.

Change-Id: I2862386e20292fd370635b5ff45086937482dfde
2022-08-05 17:44:29 +00:00
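A minimal sketch of the kind of check 0293064ed9 describes, assuming the configuration has already been parsed into nested dictionaries and lists. The function name and the example image names are hypothetical and do not reflect the validator's real interface.

```python
def find_undefined_provider_diskimages(config):
    """Return (provider, diskimage) pairs missing from the top-level list.

    Sketch only: `config` is an already-parsed nodepool-style mapping.
    """
    defined = {image["name"] for image in config.get("diskimages", [])}
    missing = []
    for provider in config.get("providers", []):
        for image in provider.get("diskimages", []):
            if image["name"] not in defined:
                missing.append((provider["name"], image["name"]))
    return missing


config = {
    "diskimages": [{"name": "ubuntu-jammy"}],
    "providers": [
        {
            "name": "cloud-a",
            "diskimages": [{"name": "ubuntu-jammy"}, {"name": "centos-9"}],
        }
    ],
}
problems = find_undefined_provider_diskimages(config)
print(problems)
```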
James E. Blair 207d8ac63c AWS multi quota support
This adds support for AWS quotas that are specific to instance types.

The current quota support in AWS assumes only the "standard" instance types,
but AWS has several additional types with particular specialties (high memory,
GPU, etc).  This adds automatic support for those by encoding their service
quota codes (like 'L-1216C47A') into the QuotaInformation object.

QuotaInformation accepts not only cores, ram, and instances as resource
values, but now also accepts arbitrary keys such as 'L-1216C47A'.
Extra testing of QI is added to ensure we handle the arithmetic correctly
in cases where one or the other operand does not have a resource counter.

The statemachine drivers did not encode their resource information into
the ZK Node record, so tenant quota was not operating correctly.  This is
now fixed.

The AWS driver now accepts max_cores, _instances, and _ram values similar
to the OpenStack driver.  It additionally accepts max_resources which can
be used to specify limits for arbitrary quotas like 'L-1216C47A'.

The tenant quota system now also accepts arbitrary keys such as 'L-1216C47A'
so that, for example, high memory nodes may be limited by tenant.

The mapping of instance types to quota is manually maintained; however,
AWS doesn't seem to add new instance types too often, and those it does are
highly specialized.  If a new instance type is not handled internally, the
driver will not be able to calculate expected quota usage, but will still
operate until the new type is added to the mapping.

Change-Id: Iefdc8f3fb8249c61c43fe51b592f551e273f9c36
2022-07-25 14:41:07 -07:00
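The arithmetic concern raised in 207d8ac63c (one operand lacking a counter that the other has) can be sketched as follows. `Quota` is a toy stand-in, not Nodepool's QuotaInformation, and the choice of defaults (0 for usage, infinity for limits) is an assumption made for this example.

```python
import math


class Quota:
    """Toy quota record keyed by resource name (hypothetical class)."""

    def __init__(self, resources=None, default=0):
        # `default` is the value assumed for a counter that is absent:
        # 0 suits usage records, math.inf suits limits.
        self.default = default
        self.resources = dict(resources or {})

    def get(self, key):
        return self.resources.get(key, self.default)

    def subtract(self, other):
        # Walk the union of keys so counters present on only one side
        # are still accounted for.
        for key in set(self.resources) | set(other.resources):
            self.resources[key] = self.get(key) - other.get(key)

    def non_negative(self):
        return all(value >= 0 for value in self.resources.values())


limit = Quota({"cores": 100, "L-1216C47A": 64}, default=math.inf)
usage = Quota({"cores": 8, "instances": 1, "L-1216C47A": 96})
limit.subtract(usage)
print(limit.resources, limit.non_negative())
```

Here the limit has no "instances" counter, so it is treated as unlimited and stays unlimited after the subtraction, while the arbitrary key goes negative and marks the request as over quota.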
Zuul fc2e592d0d Merge "Add zookeeper-timeout connection config" 2022-03-24 15:23:02 +00:00
Tobias Henkel ec55126f6b Add zookeeper-timeout connection config
The default zookeeper session timeout is 10 seconds, which is not enough
on a highly loaded nodepool. As in zuul, make this configurable so we
can avoid session losses.

Change-Id: Id7087141174c84c6cdcbb3933c233f5fa0e7d569
2022-02-23 23:01:11 +01:00
Benjamin Schanzel ee90100852 Add Tenant-Scoped Resource Quota
This change adds the option to put quota on resources on a per-tenant
basis (i.e. Zuul tenants).

It adds a new top-level config structure ``tenant-resource-limits``
under which one can specify a number of tenants, each with
``max-servers``, ``max-cores``, and ``max-ram`` limits.  These limits
are valid globally, i.e., for all providers. This is in contrast to
the existing provider and pool quotas, which are only considered
for nodes of the same provider.

Change-Id: I0c0154db7d5edaa91a9fe21ebf6936e14cef4db7
2021-09-01 09:07:43 +02:00
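The difference between tenant-wide and per-provider limits in ee90100852 can be illustrated with a short sketch. The data shapes and the function name are assumptions made for this example; the point is only that usage is summed across all providers before it is compared with the tenant's limit.

```python
def tenant_within_limits(limits, usage_by_provider, requested):
    """Check a request against tenant-wide limits.

    `limits` maps resource name -> maximum for one tenant,
    `usage_by_provider` maps provider -> resource name -> current usage,
    `requested` maps resource name -> amount wanted.
    All three shapes are assumptions made for this sketch.
    """
    for resource, maximum in limits.items():
        used = sum(u.get(resource, 0) for u in usage_by_provider.values())
        if used + requested.get(resource, 0) > maximum:
            return False
    return True


limits = {"servers": 10, "cores": 200, "ram": 16384}
usage = {
    "cloud-a": {"servers": 4, "cores": 96, "ram": 8192},
    "cloud-b": {"servers": 5, "cores": 96, "ram": 4096},
}
ok_small = tenant_within_limits(
    limits, usage, {"servers": 1, "cores": 8, "ram": 1024})
ok_large = tenant_within_limits(
    limits, usage, {"servers": 1, "cores": 16, "ram": 1024})
print(ok_small, ok_large)
```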
Tristan Cacqueray eb9af85025 config: add environment variable substitution
This change enables setting configuration values through
environment variables. This is useful for managing user-defined
configuration, such as a user password, in a Kubernetes deployment.

Change-Id: Iafbb63ebbb388ef3038f45fd3a929c3e7e2dc343
2020-05-20 11:44:49 +00:00
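One way to picture the substitution in eb9af85025 is a pass over the raw configuration text before it is parsed. The `%(NAME)` placeholder form, the function name, and the variable names below are assumptions for illustration; the commit message does not state the actual syntax.

```python
import os
import re

# Assumed placeholder form: %(NAME)
PLACEHOLDER = re.compile(r"%\((\w+)\)")


def substitute_env(text, environ=None):
    """Replace %(NAME) placeholders with values from the environment.

    Unknown names are left untouched rather than raising.
    """
    environ = os.environ if environ is None else environ

    def replace(match):
        return environ.get(match.group(1), match.group(0))

    return PLACEHOLDER.sub(replace, text)


raw = "password: %(EXAMPLE_PASSWORD)\nuser: %(UNSET_NAME)"
rendered = substitute_env(raw, {"EXAMPLE_PASSWORD": "s3cret"})
print(rendered)
```

Passing the environment as a parameter keeps the helper easy to test without touching the real process environment.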
Zuul 4c521fd208 Merge "config_validator: refactor the schema to a static method" 2020-04-16 03:15:28 +00:00
Zuul 775cd32028 Merge "Add ZooKeeper TLS support" 2020-04-15 01:41:47 +00:00
James E. Blair b62fa3313d Add ZooKeeper TLS support
Change-Id: I009d9f90b32881aaef2d0694da6ff28074f48f8e
2020-04-14 16:03:53 -07:00
Tristan Cacqueray c31158e2f7 config_validator: refactor the schema to a static method
This change moves the top_level schema to a static method so that
it can be used externally.

Change-Id: Ifa4849e3de7731957b90130e080bf3331be44fa9
2020-04-11 13:47:30 +00:00
Ian Wienand b5b20b6e2c Add parent and abstract flags for diskimages
While YAML does have inbuilt support for anchors to greatly reduce
duplicated sections, anchors have no support for merging values.  For
diskimages, this can result in a lot of duplicated values for each
image which you can not otherwise avoid.

This provides two new values for diskimages; a "parent" and
"abstract".

Specifying a parent means you inherit all the configuration values
from that image.  Anything specified within the child image overwrites
the parent values as you would expect; caveats, as described in the
documentation, are that the elements field appends and the env-vars
field has update() semantics.

An "abstract" diskimage is not instantiated into a real image; it is
only used for configuration inheritance.  This way you can make an
abstract "base" image with common values and inherit that everywhere
without having to worry about bringing in values you don't want.

You can also chain parents together and the inheritance flows through.

Documentation is updated, and several tests are added to ensure the
correct parenting, merging and override behaviour of the new values.

Change-Id: I170016ef7d8443b9830912b9b0667370e6afcde7
2020-03-20 07:53:08 +11:00
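The inheritance rules in b5b20b6e2c (child values overwrite, elements append, env-vars merge with update() semantics, abstract images are never built, parents can be chained) can be sketched as below. The function names and example images are hypothetical; this is not the Nodepool implementation.

```python
def resolve_diskimage(name, images, _seen=()):
    """Flatten one diskimage by walking its chain of parents."""
    if name in _seen:
        raise ValueError("parent loop at %r" % name)
    image = images[name]
    parent_name = image.get("parent")
    if parent_name is None:
        resolved = {}
    else:
        resolved = resolve_diskimage(parent_name, images, _seen + (name,))
    for key, value in image.items():
        if key in ("parent", "abstract"):
            continue  # control flags are not inherited in this sketch
        if key == "elements":
            # elements append to whatever the parent supplied
            resolved["elements"] = resolved.get("elements", []) + list(value)
        elif key == "env-vars":
            # env-vars follow dict.update() semantics
            merged = dict(resolved.get("env-vars", {}))
            merged.update(value)
            resolved["env-vars"] = merged
        else:
            resolved[key] = value  # everything else is overwritten
    return resolved


def buildable_images(images):
    """Resolve every image that is not marked abstract."""
    return {
        name: resolve_diskimage(name, images)
        for name, image in images.items()
        if not image.get("abstract", False)
    }


images = {
    "base": {
        "abstract": True,
        "elements": ["vm"],
        "env-vars": {"TMPDIR": "/opt/tmp", "DIB_DEBUG": "0"},
        "release": "focal",
    },
    "ubuntu": {
        "parent": "base",
        "elements": ["ubuntu-minimal"],
        "env-vars": {"DIB_DEBUG": "1"},
    },
    "ubuntu-big": {"parent": "ubuntu", "release": "jammy"},
}
built = buildable_images(images)
print(sorted(built))
```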
Ian Wienand 340df68a7b diskimage: make name primary key
Ensure 'name' is a primary key for diskimages.

Change the constructor to take the name as an argument.  Update the
config validator to ensure there is a name, and that it is unique.

Add tests for both these cases.

Change-Id: I3931dc1457c023154cde0df2bb7b0a41cc6f20d3
2020-03-20 07:53:08 +11:00
Tobias Henkel 58ad5123f1 Fix resource warnings when running tests
There are some open() calls that are not protected by a with statement.

Change-Id: I98a45c4df38c7a22282fa6abf911a1815fb6bbfa
2019-12-21 11:52:58 +01:00
Ian Wienand ddbcf1b07d Validate openstack provider pool labels have top-level labels
We broke nodepool configuration with
I3795fee1530045363e3f629f0793cbe6e95c23ca by not having the labels
defined in the OpenStack provider in the top-level label list.

The added check here would have found such a case.

The validate() function is reworked slightly; previously it would
raise various exceptions from the tools it was calling (YAML,
voluptuous, etc.).  Now that we have more testing (and I'd imagine we
could add even more similar validations), we'd have to keep adding
exception types.  Just make the function return a value; this also
makes sure the regular exit paths are taken from the caller in
nodepoolcmd.py, rather than dying with an exception at whatever point.

A unit test is added.

Co-Authored-By: Mohammed Naser <mnaser@vexxhost.com>
Change-Id: I5455f5d7eb07abea34c11a3026d630dee62f2185
2019-10-15 15:32:32 +11:00
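The "return a value instead of raising" shape described in ddbcf1b07d might look roughly like this. The function name, the `log` parameter, and the 0/1 return convention are assumptions made for this sketch, not the actual validate() signature.

```python
def validate_pool_labels(config, log=print):
    """Return 0 when every pool label is defined at the top level, else 1."""
    top_level = {label["name"] for label in config.get("labels", [])}
    errors = 0
    for provider in config.get("providers", []):
        for pool in provider.get("pools", []):
            for label in pool.get("labels", []):
                if label["name"] not in top_level:
                    log("%s: label %s is not in the top-level labels list"
                        % (provider["name"], label["name"]))
                    errors += 1
    return 1 if errors else 0


config = {
    "labels": [{"name": "small"}],
    "providers": [
        {
            "name": "cloud-a",
            "pools": [
                {"name": "main",
                 "labels": [{"name": "small"}, {"name": "large"}]},
            ],
        }
    ],
}
messages = []
exit_code = validate_pool_labels(config, log=messages.append)
print(exit_code, messages)
```

A caller can pass the result straight to `sys.exit()`, so every failure takes the same exit path regardless of which check found it.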
Ian Wienand 9367cf8ed8 Add a dib-cmd option for diskimages
This change allows you to specify a dib-cmd parameter for disk images,
which overrides the default call to "disk-image-create".  This allows
you to essentially decide the disk-image-create binary to be called
for each disk image configured.

It is inspired by a couple of things:

The "--fake" argument to nodepool-builder has always been a bit of a
wart; a case of testing-only functionality leaking across into the
production code.  It would be clearer if the tests used exposed
methods to configure themselves to use the fake builder.

Because disk-image-create is called from the $PATH, it makes it more
difficult to use nodepool from a virtualenv.  You can not just run
"nodepool-builder"; you have to ". activate" the virtualenv before
running the daemon so that the path is set to find the virtualenv
disk-image-create.

In addressing activation issues by automatically choosing the
in-virtualenv binary in Ie0e24fa67b948a294aa46f8164b077c8670b4025, it
was pointed out that others are already using wrappers in various ways
where preferring the co-installed virtualenv version would break.

With this, such users can ensure they call the "disk-image-create"
binary they want.  We can then make a change to prefer the
co-installed version without fear of breaking.

In theory, there's no reason why a totally separate
"/custom/venv/bin/disk-image-create" would not be valid if you
required a customised dib for some reason for just one image.  This is
not currently possible; even modulo PATH hacks, etc., all images will
use the same binary to build.  It is for this flexibility that I think
this is best at the diskimage level, rather than as, say, a global
setting for the whole builder instance.

Thus add a dib-cmd option for diskimages.  In the testing case, this
points to the fake-image-create script, and the --fake command-line
option and related bits are removed.

It should have no backwards compatibility effects; documentation and a
release note are added.

Change-Id: I6677e11823df72f8c69973c83039a987b67eb2af
2019-08-22 10:09:00 +10:00
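The per-image override from 9367cf8ed8 amounts to a lookup with a default. In the sketch below only the `dib-cmd` key and the default command name come from the commit message; the function name, the use of `shlex.split`, and the way elements are appended are assumptions.

```python
import shlex

DEFAULT_DIB_CMD = "disk-image-create"


def build_command(diskimage, extra_args=()):
    """Return the argv list used to build one diskimage (illustrative)."""
    # Fall back to the command found on $PATH when no override is set.
    cmd = shlex.split(diskimage.get("dib-cmd", DEFAULT_DIB_CMD))
    return cmd + list(extra_args) + list(diskimage.get("elements", []))


default_argv = build_command({"name": "ubuntu", "elements": ["vm"]})
custom_argv = build_command(
    {
        "name": "special",
        "dib-cmd": "/custom/venv/bin/disk-image-create",
        "elements": ["vm"],
    },
    extra_args=["-x"],
)
print(default_argv, custom_argv)
```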
Tristan Cacqueray 76aa62230c Add python-path option to node
This change adds a new python_path Node attribute so that zuul executor
can remove the default hard-coded ansible_python_interpreter.

Change-Id: Iddf2cc6b2df579636ec39b091edcfe85a4a4ed10
2019-05-07 02:22:45 +00:00
David Shrewsbury 15fed047e1 Use yaml.safe_load instead of load
Change Ie14935f604f23b0928eed0dd8e28dff49699a2d1 altered one use of
this method, but this one was missed.

Change-Id: I299a12d73a6524f5097712f97342aed640786eea
2019-03-28 11:16:10 -04:00
David Shrewsbury 890ea4975e Revert "Revert "Add a timeout for the image build""
This reverts commit ccf40a462a.

The previous version would not work properly when daemonized
because there was no stdout. This version maintains stdout and
uses select/poll with non-blocking stdout to capture the output
to a log file.

Depends-On: https://review.openstack.org/634266

Change-Id: I7f0617b91e071294fe6051d14475ead1d7df56b7
2019-01-31 11:36:47 -05:00
David Shrewsbury ccf40a462a Revert "Add a timeout for the image build"
This reverts commit 7225354ec0.

The disk-image-create command does not appear to be starting.

Change-Id: I81abe25a253a385cae08a57561129a678546f18f
2019-01-25 17:36:31 +00:00
David Shrewsbury 7225354ec0 Add a timeout for the image build
A builder thread can wedge if the build process wedges. Add a timeout
to the subprocess. Since it was the call to readline() that would block,
we change the process to have DIB write directly to the log. This allows
us to set a timeout in the Popen.wait() call. And we kill the dib
subprocess, as well.

The timeout value can be controlled in the diskimage configuration and
defaults to 8 hours.

Change-Id: I188e8a74dc39b55a4b50ade5c1a96832fea76a7d
2019-01-23 16:27:19 -05:00
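The approach in 7225354ec0 (let the child process write directly to the log, wait with a timeout, kill on expiry) can be sketched with the standard library. The function name and the use of the Python interpreter as a stand-in for the build command are assumptions for this example.

```python
import os
import subprocess
import sys
import tempfile

DEFAULT_BUILD_TIMEOUT = 8 * 60 * 60  # eight hours, in seconds


def run_with_timeout(argv, log_path, timeout=DEFAULT_BUILD_TIMEOUT):
    """Run argv with its output sent straight to log_path.

    Returns the exit code, or None if the process was killed on timeout.
    """
    with open(log_path, "wb") as log:
        proc = subprocess.Popen(argv, stdout=log, stderr=subprocess.STDOUT)
        try:
            return proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()  # reap the killed child
            return None


with tempfile.TemporaryDirectory() as tmp:
    log_file = os.path.join(tmp, "build.log")
    quick = run_with_timeout(
        [sys.executable, "-c", "print('built')"], log_file, timeout=30)
    with open(log_file) as f:
        quick_output = f.read()
    stuck = run_with_timeout(
        [sys.executable, "-c", "import time; time.sleep(60)"],
        log_file, timeout=0.5)

print(quick, quick_output.strip(), stuck)
```

Because nothing reads the child's output line by line, there is no blocking readline() call left to wedge the calling thread.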
David Shrewsbury d6ef934b70 Extract common config parsing for ProviderConfig
Adds a ProviderConfig class method that can be called to get
the config schema for the common config options in a Provider.
Drivers are modified to call this method.

Change-Id: Ib67256dddc06d13eb7683226edaa8c8c10a73326
2019-01-07 12:34:05 +00:00
David Shrewsbury f19f89777d Rename get_schema to getSchema
This corrects an inconsistency with a class method name, where
camel case is the norm.

Change-Id: I0e02f2425c89b46a780ec99d8053fad4b04d3f9a
2018-03-07 21:24:24 -05:00
mhuin a59528c8e8 Clean held nodes automatically after configurable timeout
Introduce a new configuration setting, "max_hold_age", that specifies
the maximum uptime of held instances. If set to 0, held instances
are kept until manually deleted. A custom value can be provided
at the rpcclient level.

Change-Id: I9a09728e5728c537ee44721f5d5e774dc0dcefa7
2018-02-20 16:13:55 +01:00
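The rule for "max_hold_age" in a59528c8e8 can be written as a small predicate. The function name and the use of seconds as the unit are assumptions made for this sketch.

```python
def hold_expired(held_since, now, max_hold_age):
    """True when a held node has outlived max_hold_age (assumed seconds).

    A max_hold_age of 0 means held nodes are kept until deleted by hand.
    """
    if max_hold_age == 0:
        return False
    return (now - held_since) > max_hold_age


# Timestamps are plain numbers here, e.g. values from time.time().
print(hold_expired(held_since=1000, now=5000, max_hold_age=3600))
print(hold_expired(held_since=1000, now=5000, max_hold_age=0))
```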
James E. Blair baa831192f Store build logs automatically
This updates the builder to store individual build logs in dedicated
files, one per build, named for the image and build id.  Old logs are
automatically pruned.  By default, they are stored in
/var/log/nodepool/builds, but this can be changed.

This removes the need to specially configure a logging handler for the
image build logs.

Change-Id: Ia7415d2fbbb320f8eddc4e46c3a055414df5f997
2018-02-09 07:50:20 -08:00
Tristan Cacqueray 6a716af6a2 Refactor provider config to driver module
This change adds a new ProviderConfig driver interface so that drivers can
load and validate their config.

This change also adds a new provider abstract method 'cleanupLeakedResources'
that the openstack driver implements to clean up floating IPs. This removes the
need for a shared clean-floating-ip provider config.

Change-Id: I20319aa660ebf5fbe8df5d6af1d77028e1b18350
2017-11-29 05:22:12 +00:00
Rui Chen 32e1e0b616 Apply floating ip for node according to configuration
When nodepool and zuul are deployed in virtual machines on a cloud
provider, the provisioned nodes may be on the same internal network as
the nodepool and zuul instances. In that case we don't have to
allocate floating IPs for nodes; zuul can talk to nodes via the fixed
IPs of the virtual machines. Making this behavior configurable saves
floating IP quota.

Note: the option "floating_ip_source: None" in clouds.yaml can also
decide whether a floating IP is applied for a given cloud, but it
affects all the SDKs and tools that use clouds.yaml. Nodepool's
behavior should be controllable flexibly and independently.

This patch adds a boolean option "auto-floating-ip" to each pool of
the "provider" section in nodepool.conf.

Change-Id: Ia9a1bed6dd4f6e39015bde660f52e4cd6addb26e
2017-11-22 08:34:57 +00:00
Jamie Lennox af0d58e985 Add username to build and upload information
The username should be included in the stored information so that when
this is passed over to zuul it can ssh to the correct username.

Change-Id: Ife0daa79f319aea04ed32513f99c73c460156941
2017-11-13 22:54:09 +01:00
Tristan Cacqueray c0e6d5112b Extend Nodepool configuration syntax to support multiple drivers
Change-Id: I220e8e71c1205174a0a7515899c9bb6c4cc6adcb
Story: 2001044
Task: 4616
2017-07-25 14:27:17 +00:00
James E. Blair a9952312c2 Add image-id and image-name options to cloud-images
The cloud-image name is currently used both to specify the image
in the cloud, and also as a cross-referencing key within the
nodepool config.  As such, it ends up being repeated within the config
(possibly quite often in large configurations).

Separate these functions so that an image can be identified once in
a cloud provider, and referenced from multiple labels with the internal
key.  This makes for improved readability in some cases (such as long
cloud image names, or specifying images by uuid), and reduces churn
when cloud image identifiers change.

Change-Id: I83f2902be4b9b73a949461b7f14da548066b9562
2017-06-14 15:18:07 -07:00
Tristan Cacqueray a0159428d7 Add webapp port and listen_address configuration
This change adds webapp settings to nodepool.yaml to enable custom values
for port and listen_address.

Change-Id: I0f41a0b131bc2a09c47a448c65471e052c0a9e88
2017-06-09 13:56:36 +00:00
Paul Belanger 1d0990a1c1 Add boot-from-volume support for nodes
For example, a cloud may get better performance from a cinder volume
than the local compute drive. As a result, give nodepool the option to
choose if the server should boot from volume or not.

Change-Id: I3faefe99096fef1fe28816ac0a4b28c05ff7f0ec
Depends-On: If58cd96b0b9ce4569120d60fbceb2c23b2f7641d
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
2017-05-30 14:23:24 -04:00
Paul Belanger 1a804c7859 Add console-log to config-validate
Sadly, I missed this on our previous commit. Also update coverage from
nodepool dsvm job.

Change-Id: I6966957ac8162a588531c38bd69a93fb58a15258
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
2017-05-29 15:31:54 -04:00
Jenkins df6051b4aa Merge "Support externally managed images" into feature/zuulv3 2017-05-18 15:34:11 +00:00
Tobias Henkel fbd45ba266 Support externally managed images
This adds support for using images which are not built and managed by
nodepool.

Change-Id: Iabfcf2e2f0d42622c0504b16e5f10ec7dfba97ca
2017-05-18 10:42:03 +02:00
Tobias Henkel ac6406679e Add max-ready-age to label config
This adds the max-ready-age setting to the label config. With this one
can specify how long nodes should live unused in READY state. This
enables the following use cases:

- When switching nodepool between a 'working-hours' and a
  'non-working-hours' configuration with high or low min-ready
  settings this can trigger a (delayed) scale down of unused
  resources. This can be important when using a cloud provider with
  an on-demand billing model.

- Renewing old nodes without having to run a job on them. This can be
  useful for capping the age of the cached data inside the nodes.

Change-Id: Id705f0a5e478ab658ed3a396f92d6eb6694c1c8f
2017-05-18 10:31:20 +02:00
Monty Taylor 8037855400 Add support for specifying key-name per label
In order to support putting fewer things into images via puppet in Infra,
we'd like to be able to pre-populate our clouds with keypairs for the
infra-root accounts and have nova add those at boot time.

Change-Id: I9e2c990040342de722f68de09f273005f57a699f
2017-04-27 13:49:37 -07:00
James E. Blair 02d137a777 Validate flavor specification in config
Validate that at least one of min-ram or flavor-name are present.

Change-Id: I2e42b0d6176e5a15e1ceb2a77b9d557bc0704d50
2017-04-27 13:48:59 -07:00
Monty Taylor 642f14c076 Add ability to select flavor by name or id
It's possible that it's easier for a nodepool user to just specify a
name or id of a flavor in their config instead of the combo of min-ram
and name-filter.

In order to not have two name related items, and also to not have the
pure flavor-name case use a term called "name-filter" - change
name-filter to flavor-name, and introduce the semantics that if
flavor-name is given by itself, it will look for an exact match on
flavor name or id, and if it's given with min-ram it will behave as
name-filter did already.

Change-Id: I8b98314958d03818ceca5abf4e3b537c8998f248
2017-04-27 13:44:25 -07:00
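The selection semantics in 642f14c076, together with the "at least one of min-ram or flavor-name" rule from 02d137a777, can be sketched as below. The function name and flavor dictionaries are hypothetical, and two details are assumptions: that the name-filter behaviour is a substring match, and that the smallest qualifying flavor is chosen.

```python
def find_flavor(flavors, flavor_name=None, min_ram=None):
    """Pick a flavor by exact name/id, or by minimum RAM plus name filter."""
    if flavor_name is None and min_ram is None:
        raise ValueError("one of min-ram or flavor-name is required")
    if min_ram is None:
        # flavor-name on its own: exact match on flavor name or id.
        for flavor in flavors:
            if flavor_name in (flavor["name"], flavor["id"]):
                return flavor
        return None
    # min-ram given: enough RAM, optionally narrowed by a name filter
    # (substring match is an assumption of this sketch).
    candidates = [
        f for f in flavors
        if f["ram"] >= min_ram
        and (flavor_name is None or flavor_name in f["name"])
    ]
    return min(candidates, key=lambda f: f["ram"], default=None)


flavors = [
    {"id": "1", "name": "general-4GB", "ram": 4096},
    {"id": "2", "name": "general-8GB", "ram": 8192},
    {"id": "3", "name": "performance-8GB", "ram": 8192},
]
by_id = find_flavor(flavors, flavor_name="2")
by_ram = find_flavor(flavors, min_ram=8000)
filtered = find_flavor(flavors, flavor_name="performance", min_ram=8000)
print(by_id["name"], by_ram["name"], filtered["name"])
```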
Monty Taylor 6b949f8abb Enforce cloud as a required config value
We require clouds.yaml files now. It's just the way it is. If we don't
have one, os-client-config will become unpleased - but it will do so in
a hard to understand error message (that's the best we can do there for
$reasons) ... so make sure that we present a config validation error and
not "keystoneauth1.exceptions.auth_plugins.MissingRequiredOptions: Auth
plugin requires parameters which were not given: auth_url"

Change-Id: I84e36400f38eecd5d798b772c09d768002f535f5
2017-04-27 08:37:02 -07:00
Joshua Hesketh 94f33cb666 Merge branch 'master' into feature/zuulv3
The nodepool_id feature may need to be removed. I've kept it to simplify
merging both now and if we do it again later.

A couple of the tests are disabled and need reworking in a subsequent
commit.

Change-Id: I948f9f69ad911778fabb1c498aebd23acce8c89c
2017-03-30 21:46:15 +11:00
Jenkins 79bc3908ef Merge "Docs: Remove cron references" into feature/zuulv3 2017-03-28 15:41:55 +00:00
David Shrewsbury 6da49fe732 Docs: Remove cron references
Cron support is gone. Remove the doc/config file references, and
config supporting code.

Change-Id: I6587c7c3122dc1eb16f2c58520e7d76de31624f3
2017-03-27 16:43:15 -04:00
Monty Taylor 34cabe207a Remove ipv6-preferred and rely on interface_ip
shade/occ have a force-ipv4 setting which can be used to change
autodetected behavior, but also have detection for ipv6 viability.
This makes us aggressively use IPv6 and only use v4 if v6 is not
available or has been explicitly disabled. Yay us.

Incidentally, this should also help people use zuul in places that are
completely non-public - as a zuul running in a cloud with a private
network on it and spinning up nodes that only have private networks
means public_v4 won't really have anything in it - but clouds.yaml
supports a private=True setting which will cause the private ip to be
listed as the ip that is desired.

Change-Id: I2b4d992e3b21c00cefe98023267347c02dd961dc
2017-03-27 11:35:25 -07:00
James E. Blair 440c427662 Remove deprecated networks syntax
And simplify.

Change-Id: I8be53c228de9be5dc3cb39ff9d90cda6bbde9124
2017-03-27 11:35:12 -07:00
James E. Blair 8b2dd5f600 Remove api-timeout and provider.image-type
We defer to OCC for both of these.

Change-Id: Ic81972c3ccf2b05beaae6a89f22f8aee2dbc79d2
2017-03-27 11:33:50 -07:00
James E. Blair dcc3b5e071 Update nodepool config syntax
This implements the changes described in:

http://lists.openstack.org/pipermail/openstack-infra/2017-January/005018.html

It also removes some, but not all, extraneous keys from test config files.

Change-Id: Iebc941b4505d6ad46c882799b6230eb23545e5c0
2017-03-27 09:34:02 -07:00