243 Commits

Author SHA1 Message Date
James E. Blair fd454706ca Add delete-after-upload option
This allows operators to delete large diskimage files after uploads
are complete, in order to save space.

A setting is also provided to keep certain formats, so that if
operators would like to delete large formats such as "raw" while
retaining a qcow2 copy (which, in an emergency, could be used to
inspect the image, or manually converted and uploaded for use),
that is possible.
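
A minimal sketch of the selection described above, not nodepool's implementation. The function name and both parameters are hypothetical.

```python
def formats_to_delete(built_formats, keep_formats=()):
    """Return the image formats whose local files may be removed after upload."""
    keep = set(keep_formats)
    # Preserve the build order so the result is deterministic.
    return [fmt for fmt in built_formats if fmt not in keep]


# Delete the large raw file but retain the qcow2 copy for emergencies.
removable = formats_to_delete(["raw", "qcow2"], keep_formats=["qcow2"])
```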

Change-Id: I97ca3422044174f956d6c5c3c35c2dbba9b4cadf
2024-03-09 06:51:56 -08:00
James E. Blair de02ac5a20 Add OpenStack volume quota
This adds support for staying within OpenStack volume quota limits
on instances that utilize boot-from-volume.

Change-Id: I1b7bc177581d23cecd9443a392fb058176409c46
2023-02-13 06:56:03 -08:00
Christian von Schultz a828513ae8 Fix AWS quota limits for vCPUs
In the AWS adapter, when getting the quota for an instance type, set
the quota for the AWS service quota code to be the number of vCPUs
rather than the number of cores. The number of vCPUs is typically
twice the number of cores. This fixes "VcpuLimitExceeded" errors from
AWS.
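
A worked example of why counting cores overcommits a limit expressed in vCPUs. The instance shape and the limit are made-up numbers, and `max_instances` is a hypothetical helper.

```python
def max_instances(quota_limit, units_per_instance):
    """How many instances fit under a limit, given the cost of one instance."""
    return quota_limit // units_per_instance


# Hypothetical instance type: 8 cores with 2 threads each, so 16 vCPUs.
cores, threads_per_core = 8, 2
vcpus = cores * threads_per_core
vcpu_limit = 32  # hypothetical service quota, expressed in vCPUs

by_cores = max_instances(vcpu_limit, cores)  # counting cores overestimates
by_vcpus = max_instances(vcpu_limit, vcpus)  # counting vCPUs matches the limit
```

Launching `by_cores` instances would request `by_cores * vcpus` vCPUs, which exceeds the limit.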

Change-Id: I880e6abb84b0527363893576057aa105a5a448a5
2022-12-14 14:13:47 +01:00
James E. Blair 916d62a374 Allow specifying diskimage metadata/tags
For drivers that support tagging/metadata (openstack, aws, azure),
add or enhance support for supplying tags for uploaded diskimages.

This allows users to set metadata on the global diskimage object
which will then be used as default values for metadata on the
provider diskimage values.  The resulting merged dictionary forms
the basis of metadata to be associated with the uploaded image.
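
The merge can be pictured as a plain dictionary update in which provider values win over the diskimage defaults. This is a sketch under that assumption; `merged_image_metadata` is a hypothetical name.

```python
def merged_image_metadata(diskimage_meta, provider_meta):
    """Combine global diskimage defaults with provider-level overrides."""
    merged = dict(diskimage_meta or {})
    # Provider-level values take precedence over the global defaults.
    merged.update(provider_meta or {})
    return merged


result = merged_image_metadata(
    {"owner": "ci", "stage": "test"},  # diskimages[].meta
    {"stage": "prod"},                 # provider-level meta or tags
)
```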

The changes needed to reconcile this for the three drivers mentioned
above are:

All: the diskimages[].meta key is added to supply the default values
for provider metadata.

OpenStack: provider diskimage metadata is already supported using
providers[].diskimages[].meta, so no further changes are needed.

AWS, Azure: provider diskimage tags are added using the key
providers[].diskimages[].tags since these providers already use
the "tags" nomenclature for instances.

This results in the somewhat incongruous situation where we have
diskimage "metadata" being combined with provider "tags", but it's
either that or have images with "metadata" while we have instances
with "tags", both of which are "tags" in EC2.  The chosen approach
has consistency within the driver.

Change-Id: I30aadadf022af3aa97772011cda8dbae0113a3d8
2022-08-23 06:39:08 -07:00
Zuul 123a32f922 Merge "AWS multi quota support" 2022-07-29 17:01:09 +00:00
James E. Blair 74c95832b2 Clarify disjoint builders in docs
There's a nuance to dealing with diskimages on disjoint builders;
clarify that.

Change-Id: I354877a655b7673c3fbb76177378b931ea283d8d
2022-07-28 10:31:03 -07:00
James E. Blair 207d8ac63c AWS multi quota support
This adds support for AWS quotas that are specific to instance types.

The current quota support in AWS assumes only the "standard" instance types,
but AWS has several additional types with particular specialties (high memory,
GPU, etc).  This adds automatic support for those by encoding their service
quota codes (like 'L-1216C47A') into the QuotaInformation object.

QuotaInformation accepts not only cores, ram, and instances as resource
values, but now also accepts arbitrary keys such as 'L-1216C47A'.
Extra testing of QI is added to ensure we handle the arithmetic correctly
in cases where one or the other operand does not have a resource counter.
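
One way to picture arithmetic with missing counters, using plain dictionaries rather than the real QuotaInformation class. The rule that an unlisted limit is unlimited and an unlisted request is zero is an assumption made for this sketch.

```python
import math


def subtract(available, needed):
    """Return available minus needed, resource by resource."""
    result = {}
    for key in set(available) | set(needed):
        # Assumption: a limit that is not listed is unlimited, and a
        # request that does not list a resource uses none of it.
        result[key] = available.get(key, math.inf) - needed.get(key, 0)
    return result


def non_negative(quota):
    """True when no resource has been driven below zero."""
    return all(value >= 0 for value in quota.values())


remaining = subtract({"cores": 8, "L-1216C47A": 4}, {"cores": 4, "ram": 1024})
```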

The statemachine drivers did not encode their resource information into
the ZK Node record, so tenant quota was not operating correctly.  This is
now fixed.

The AWS driver now accepts max_cores, _instances, and _ram values similar
to the OpenStack driver.  It additionally accepts max_resources which can
be used to specify limits for arbitrary quotas like 'L-1216C47A'.

The tenant quota system now also accepts arbitrary keys such as 'L-1216C47A'
so that, for example, high memory nodes may be limited by tenant.

The mapping of instance types to quota is manually maintained; however,
AWS doesn't seem to add new instance types too often, and those it does are
highly specialized.  If a new instance type is not handled internally, the
driver will not be able to calculate expected quota usage, but will still
operate until the new type is added to the mapping.

Change-Id: Iefdc8f3fb8249c61c43fe51b592f551e273f9c36
2022-07-25 14:41:07 -07:00
James E. Blair ea35fd5152 Add provider/pool priority support
This lets users configure providers which should fulfill requests
before other providers.  This facilitates using a less expensive
cloud before using a more expensive one.

The default priority is 100, to facilitate either raising above
or lowering below the default (while using only positive integers
in order to avoid confusion).
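
A sketch of ordering by priority. It assumes that a lower number is served first, which this message does not state, and `by_priority` is a hypothetical helper rather than nodepool code.

```python
DEFAULT_PRIORITY = 100


def by_priority(pools):
    """Order pools so that the preferred ones come first."""
    # sorted() is stable, so pools with equal priority keep their order.
    return sorted(pools, key=lambda pool: pool.get("priority", DEFAULT_PRIORITY))


ordered = by_priority([
    {"name": "expensive-cloud"},               # falls back to 100
    {"name": "cheap-cloud", "priority": 50},
])
```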

Change-Id: I969ea821e10a7773a0a8d135a4f13407319362ee
2022-05-23 13:28:21 -07:00
Zuul fc2e592d0d Merge "Add zookeeper-timeout connection config" 2022-03-24 15:23:02 +00:00
James E. Blair 50bc4cea49 Add IBM Cloud VPC driver
This is a driver for the IBM Cloud VPC service, which has a
new and distinct API.

Change-Id: I7de7297138f5f50380840e4eef43600f9a761181
2022-03-15 06:49:57 -07:00
Tobias Henkel ec55126f6b Add zookeeper-timeout connection config
The default zookeeper session timeout is 10 seconds which is not enough
on a highly loaded nodepool. Like in zuul make this configurable so we
can avoid session losses.

Change-Id: Id7087141174c84c6cdcbb3933c233f5fa0e7d569
2022-02-23 23:01:11 +01:00
James E. Blair 5862bef141 Add metastatic driver
This driver supplies "static" nodes that are actually backed by
another nodepool node.  The use case is to be able to request a single
large node (a "backing node") from a cloud provider, and then divide
that node up into smaller nodes that are actually used ("requested
nodes").  A backing node can support one or more requested nodes, and
backing nodes should scale up or down as necessary.
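
The scaling arithmetic can be sketched as a ceiling division. The number of requested nodes that one backing node can hold is a hypothetical parameter here.

```python
import math


def backing_nodes_needed(requested_nodes, slots_per_backing_node):
    """Smallest number of backing nodes that can host the requested nodes."""
    return math.ceil(requested_nodes / slots_per_backing_node)
```

For example, five requested nodes at four per backing node need two backing nodes, and zero requested nodes need none.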

Change-Id: I29d78705a87a53ee07dce6022b81a1ce97c54f1d
2021-12-09 11:08:48 -08:00
Benjamin Schanzel ee90100852 Add Tenant-Scoped Resource Quota
This change adds the option to put quota on resources on a per-tenant
basis (i.e. Zuul tenants).

It adds a new top-level config structure ``tenant-resource-limits``
under which one can specify a number of tenants, each with
``max-servers``, ``max-cores``, and ``max-ram`` limits.  These limits
are valid globally, i.e., for all providers. This is contrary to
currently existing provider and pool quotas, which are only considered
for nodes of the same provider.
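
A sketch of a limit that spans providers: usage is summed over every node of the tenant, whichever provider launched it. The node record layout and the function name are assumptions made for illustration.

```python
import math


def tenant_within_limits(nodes, tenant, limits):
    """Check a tenant's total usage, across all providers, against its limits."""
    used = {"max-servers": 0, "max-cores": 0, "max-ram": 0}
    for node in nodes:
        if node["tenant"] != tenant:
            continue
        used["max-servers"] += 1
        used["max-cores"] += node["cores"]
        used["max-ram"] += node["ram"]
    # A limit that is not configured is treated as unlimited.
    return all(used[key] <= limits.get(key, math.inf) for key in used)


nodes = [
    {"tenant": "t1", "provider": "cloud-a", "cores": 4, "ram": 8192},
    {"tenant": "t1", "provider": "cloud-b", "cores": 4, "ram": 8192},
    {"tenant": "t2", "provider": "cloud-a", "cores": 8, "ram": 16384},
]
```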

Change-Id: I0c0154db7d5edaa91a9fe21ebf6936e14cef4db7
2021-09-01 09:07:43 +02:00
Albin Vass 0c84b7fa4e Add shell-type config
Ansible needs to know which shell type the node uses to operate
correctly, especially for ssh connections for windows nodes because
otherwise ansible defaults to trying bash.

Change-Id: I71abfefa57aaafd88f199be19ee7caa64efda538
2021-03-05 15:14:29 +01:00
Albin Vass 7665407799 Reorganize drivers into separate documents
Change-Id: I4274d8d87058a2a5c91da3e994a32d61b2f2aafe
2020-11-11 08:49:16 +00:00
Benjamin Schanzel 19be1a2e26 OpenShift/k8s Provider: Basic Support for k8s nodeSelectors
This adds support to specify node selectors on Pod node labels.
They are used by the k8s scheduler to place a Pod on specific nodes with
corresponding labels.
This allows placing a build node/Pod on k8s nodes with certain
capabilities (e.g. storage types, number of CPU cores, etc.)

Change-Id: Ic00a84181c8ef66189e4259ef6434dc62b81c3c6
2020-08-14 16:39:04 +02:00
Zuul b0fa778ded Merge "OpenShift/k8s Provider: Allow passing env vars to Pods" 2020-07-30 20:17:47 +00:00
Simon Westphahl 2ec2661655 Remove default qcow2 format in diskimage config
When removing a label from a provider that previously required raw
images (while still keeping the diskimage config), the image was
automatically rebuilt in qcow2 format.

It seems the original intent [0] of having the diskimage formats was to
allow building diskimages without needing a provider.

Because manually triggering a diskimage build without a format led to a
failure, the qcow2 default was added [1] and later fixed [2] to only
provide a default when the diskimage wasn't used by any provider.

By removing the qcow2 default and preventing builds without a format, we
retain the ability to allow diskimage only builds when a format is
given. Otherwise we don't assume a default image format and prevent
builds with no image format.

[0] https://review.opendev.org/#/c/412160/
[1] https://review.opendev.org/#/c/566437/
[2] https://review.opendev.org/#/c/572836/

Change-Id: I374f40b5f9cfcd55e7a4f567fd6480c940f2bc20
2020-07-15 14:31:07 +02:00
Benjamin Schanzel b76a0f458e OpenShift/k8s Provider: Allow passing env vars to Pods
For the OpenShift and Kubernetes drivers, allow passing env vars to the
Pod nodes via their label config.
It is not possible to set persistent env vars in containers at run time
because there is no login shell available. Thus, we need to pass in any
env vars during node launch. This allows setting, e.g., ``http_proxy``
variables.

The env vars are passed as a list of dicts with ``name`` and ``value``
fields as per the k8s Pod YAML schema. [1]

```
- name: pod-fedora
  type: pod
  image: docker.io/fedora:28
  env:
  - name: foo
    value: bar
```

[1] https://kubernetes.io/docs/tasks/inject-data-application/define-environment-variable-container/

Change-Id: Ibbd9222fcd8f7dc5be227e7f5c8d8772a4c594e2
2020-07-13 17:11:01 +02:00
Zuul 369799dea6 Merge "aws: add support for attaching instance profiles" 2020-07-01 23:43:47 +00:00
Zuul 64e7d82268 Merge "doc: openshiftpods handles python-path too" 2020-06-29 18:25:35 +00:00
Benjamin Schanzel baf5407adc Kubernetes Driver: Allow cpu/mem resource limits
In the OpenShift and OpenShiftPods drivers, it is possible to configure
resource requests and limits for the container per label attributes.
This feature was missing in the Kubernetes driver, thus this change
introduces it analogously to the OpenShift driver.

Change-Id: I7e67aebf892d10939672bdf76b8b3eb543124f9a
2020-06-19 15:00:25 +02:00
Graham Hayes c1a914fa4a Implement an Azure driver
This change adds an Azure driver.

Supports:
    * Public IPv4 address per VM
    * Private IPv6 address per VM (optional, and not useful yet)
    * Standard Flavors
    * Resource Tagging (for billing / cleanup)

Change-Id: Ief0f8574832df69db472d8704ea3710bc6ca5c59
Co-authored-by: Tristan Cacqueray <tdecacqu@redhat.com>
Co-authored-by: Tobias Henkel <tobias.henkel@bmw.de>
Signed-off-by: Graham Hayes <gr@ham.ie>
2020-06-15 19:57:11 +01:00
Albin Vass 2d59dc461c aws: add support for attaching instance profiles
Change-Id: Ie338f5f9c8f88c7e5584bce02c9b0d081f068da7
2020-06-12 12:22:50 +02:00
Pierre-Louis Bonicoli ae85030108 doc: openshiftpods handles python-path too
Change-Id: I0893032f05c561aa428b3b77a14b55eadbe6c5f1
2020-05-28 01:38:18 +02:00
Ian Wienand b9f6f6bf62 Allow disabling build-log-retention
This allows setting build-log-retention to -1 to disable automatic
collection of logs.  This would facilitate managing these logs with an
external tool like logrotate.  Another case is where you have the
builds failing very quickly -- say, one of the builds has destroyed
the container and so builds fail to even exec dib correctly.  In this
case it's difficult to get to the root-cause of the problem because
the first build's logs (the one that destroyed the container) have
been reaped just seconds after the failure.
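
A sketch of the pruning rule with the new escape hatch. It assumes the setting counts how many of the most recent logs to keep, and `logs_to_delete` is a hypothetical helper.

```python
def logs_to_delete(logs_newest_first, retention):
    """Return the build logs that pruning would remove."""
    if retention < 0:
        # -1 disables pruning, leaving the logs to an external tool.
        return []
    return logs_newest_first[retention:]


logs = ["build-3.log", "build-2.log", "build-1.log"]
```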

Change-Id: I259c78e6a0e30b4c0a8d2f4c12a6941a2d227c38
2020-04-29 13:07:07 +10:00
Zuul 775cd32028 Merge "Add ZooKeeper TLS support" 2020-04-15 01:41:47 +00:00
James E. Blair b62fa3313d Add ZooKeeper TLS support
Change-Id: I009d9f90b32881aaef2d0694da6ff28074f48f8e
2020-04-14 16:03:53 -07:00
David Shrewsbury 8528322cf0 Update tests for node-attributes
This found problems with the openshift and openshiftpods drivers,
so that is fixed. Also update the docs to reflect the fact that
node-attributes is supported across all drivers.

Note that we do not appear to have GCE driver tests, so that one
is just assumed to work.  :(

Change-Id: I98b6f871815d2b564d1550d960e682c180bac7c2
2020-04-02 12:39:56 -07:00
Zuul 24db91f96b Merge "Support node-attributes in static driver" 2020-04-02 19:21:43 +00:00
Zuul 169b69accb Merge "Add parent and abstract flags for diskimages" 2020-03-29 22:08:33 +00:00
David Shrewsbury e389ae2af0 Support node-attributes in static driver
Because the static driver doesn't go through the common driver code
launch process (its nodes are pre-launched), it is in charge of setting
the node attributes itself. It wasn't setting the node-attributes attribute.

Change-Id: I865c3b15711f8c5559964859db92cb4499b901ae
2020-03-24 11:07:29 -04:00
Ian Wienand b5b20b6e2c Add parent and abstract flags for diskimages
While YAML does have inbuilt support for anchors to greatly reduce
duplicated sections, anchors have no support for merging values.  For
diskimages, this can result in a lot of duplicated values for each
image which you can not otherwise avoid.

This provides two new values for diskimages; a "parent" and
"abstract".

Specifying a parent means you inherit all the configuration values
from that image.  Anything specified within the child image overwrites
the parent values as you would expect; caveats, as described in the
documentation, are that the elements field appends and the env-vars
field has update() semantics.

An "abstract" diskimage is not instantiated into a real image, it is
only used for configuration inheritance.  This way you can make an
abstract "base" image with common values and inherit that everywhere
without having to worry about bringing in values you don't want.

You can also chain parents together and the inheritance flows through.
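
The inheritance rules above can be sketched with plain dictionaries. This is not nodepool's config loader: the function names are hypothetical, and the sketch assumes that the abstract flag itself is not inherited.

```python
def resolve(name, images):
    """Return the effective config of an image after applying its parents."""
    image = images[name]
    parent_name = image.get("parent")
    if parent_name is None:
        return dict(image)
    # Resolve the parent first so that chained parents flow through.
    resolved = resolve(parent_name, images)
    resolved.pop("abstract", None)  # assumption: the flag is not inherited
    for key, value in image.items():
        if key == "elements":
            # The elements field appends to the parent's list.
            resolved["elements"] = resolved.get("elements", []) + value
        elif key == "env-vars":
            # The env-vars field has update() semantics.
            env = dict(resolved.get("env-vars", {}))
            env.update(value)
            resolved["env-vars"] = env
        else:
            # Everything else simply overwrites the parent value.
            resolved[key] = value
    return resolved


def concrete_images(images):
    """Resolve every image that is not abstract."""
    return {
        name: resolve(name, images)
        for name, image in images.items()
        if not image.get("abstract", False)
    }


images = {
    "base": {"abstract": True, "elements": ["vm"],
             "env-vars": {"A": "1", "B": "2"}},
    "ubuntu": {"parent": "base", "elements": ["ubuntu"],
               "env-vars": {"B": "3"}},
    "ubuntu-big": {"parent": "ubuntu", "elements": ["growroot"]},
}
concrete = concrete_images(images)
```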

Documentation is updated, and several tests are added to ensure the
correct parenting, merging and override behaviour of the new values.

Change-Id: I170016ef7d8443b9830912b9b0667370e6afcde7
2020-03-20 07:53:08 +11:00
Albin Vass 2ce664ec14 Enable setting label and instance name separately
At the moment nodepool's aws driver uses the label to set the instance
name in aws and fails to launch the instance if "Name" is supplied
as a tag.
This makes it possible to supply name as a tag.

Change-Id: I9585db8fe4b4ad6f5b588fb67a7201296c2fc954
2020-03-12 17:15:32 +01:00
James E. Blair 5d37a0a6e1 Fix GCE volume parameters
We were ignoring the volume-type and volume-size parameters for
GCE; correct that.

Also add a release note.  We forgot to do that.  We may as well
attach it to the next version since it's a new feature, and only
with this change does it actually work as documented.

Change-Id: I6cad4fa7a661997771f9c7ccf622a5f9828bd750
2020-02-27 09:49:35 -08:00
Andy Ladjadj 5bae6272f4 add ebs-optimized support for aws provider
Change-Id: I1f6330a71b85f23e6fbe3abd636764e5f3b8a61d
2020-02-04 18:59:24 +01:00
Clément Mondion 49482e157c add tags support for aws provider
Change-Id: Ib871bfda41192a74ee02b0b3d2e422fde21f2801
2020-01-23 10:32:08 +01:00
Tobias Henkel 52f7d4fb62 Make public ip configurable in aws
When running nodepool against private cloud rooms it can be desirable
that the nodes don't get a public ip address. Let the user specify
this on pool level.

Change-Id: I3d636517837fd8a6593c12e4309372da5c062b06
2019-12-21 13:47:08 +01:00
Tobias Henkel 761a9ee00e Support userdata for instances in aws
In some cases we need to be able to launch instances with custom
userdata also in aws.

Change-Id: I0891961f16bb3bd728622d3413bd185978d79324
2019-12-21 13:35:00 +01:00
James E. Blair f343dbb05a GCE: add use-internal-ip option
This adds an option to the GCE driver to tell nodepool to use the
private ip address even when an external one is provided.

Also add a missing schema entry for rate-limit.

Change-Id: Ib15bdc76fe500dc0fe6bb98f870514e9e157c1a5
2019-12-13 14:46:41 -08:00
James E. Blair 13104ab0ff Add Google Cloud provider
Also add a TaskManager and a SimpleTaskManagerDriver.

Change-Id: I5c44b24600838ae9afcc6a39c482c67933548bc0
2019-12-12 14:33:43 -08:00
Zuul c790ec4721 Merge "Aws cloud-image is referred to from pool labels section" 2019-12-10 14:58:19 +00:00
Albin Vass b829726909 Aws cloud-image is referred to from pool labels section
Change-Id: I50596aed6da3bec6e2bf8049b277aa91e9e685c3
2019-12-09 12:51:53 +01:00
Zuul e391572495 Merge "Documentation fixes" 2019-12-06 23:57:22 +00:00
Albin Vass bb6475177e Documentation fixes
Change-Id: I23d677d5522aec94d3723a71f98f12e58355eeba
2019-12-06 12:54:22 +01:00
Tobias Henkel 0dc40d33e4 Support optional post upload hooks
There are several scenarios where it can be useful to hook into nodepool
after an image got uploaded but before it is taken into use by the
launchers. One use case is to be able to run validations on the image
(e.g. image size, boot test, etc.) before nodepool tries to use that
image and potentially causes node_failures. Another more advanced use
case is to be able to pre-distribute an image to all compute nodes in
a cloud before an image is used at scale.

To facilitate these use cases this adds a new config option
post-upload-hook to the provider config. This takes a path to a user
defined executable script which then can perform various tasks. If the
process fails with an rc != 0 the image gets deleted again and the
upload fails.
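
A sketch of the control flow described above, not the builder's actual code. The function name, the command list, and the `delete_image` callback are hypothetical. The arguments nodepool passes to the script are not covered by this message, so none are assumed.

```python
import subprocess


def run_post_upload_hook(hook_cmd, delete_image):
    """Run the hook; on a non-zero exit, delete the image and fail the upload."""
    result = subprocess.run(hook_cmd)
    if result.returncode != 0:
        delete_image()
        raise RuntimeError(
            f"post-upload hook exited with rc {result.returncode}"
        )
```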

Change-Id: I099cf1243b1bd262b8ee96ab323dbd34c7578c10
2019-11-25 13:37:28 +01:00
Zuul 915be0a5be Merge "AWS driver: add ability to determine AMI id using filters" 2019-10-24 18:45:20 +00:00
Zuul b72a9195e1 Merge "Set default python-path to "auto"" 2019-10-17 05:26:10 +00:00
Ian Wienand db87a0845f Set default python-path to "auto"
The "python-path" configuration option makes its way through to Zuul
where it sets the "ansible_python_interpreter" in the inventory.
Currently this defaults to "/usr/bin/python2" which is wrong for
Python 3-only distributions.

Ansible >=2.8 provides for automated discovery of the interpreter to
avoid runtime errors choosing an invalid interpreter [1].  Using this
should mean that "python-path" doesn't need to be explicitly set for any
common case.  As more distributions become Python 3 only, this should
"do the right thing" without further configuration.

This switches the default python-path to "auto".  The dependent change
updates Zuul to accept this and use it when running with Ansible
>=2.8, or default back to "/usr/bin/python2" for earlier Ansible
versions.
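
The fallback on the Zuul side can be sketched as follows. The function name and the tuple form of the Ansible version are assumptions for illustration, not Zuul's code.

```python
def effective_python_path(python_path, ansible_version):
    """Pick the interpreter setting to use for a given Ansible version."""
    if python_path == "auto" and ansible_version < (2, 8):
        # Interpreter discovery is unavailable before Ansible 2.8.
        return "/usr/bin/python2"
    return python_path
```

An explicitly configured path is passed through unchanged regardless of the Ansible version.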

Testing and documentation are updated, and a release note added.

[1] https://docs.ansible.com/ansible/2.8/reference_appendices/interpreter_discovery.html

Depends-On: https://review.opendev.org/682275
Change-Id: I02a1a618c8806b150049e91b644ec3c0cb826ba4
2019-10-17 09:17:50 +11:00
Jan Gutter c733541633 Fix typo in port-cleanup-interval description
* This is a follow-up to https://review.opendev.org/687024
* An earlier version of the patch had a different field name, this
  clears up the confusing term.

Change-Id: I213746f9af4ead0b4b5a25e4d67ec1bcb7b2a785
2019-10-14 18:03:32 +02:00