This allows operators to delete large diskimage files after uploads
are complete, in order to save space.
A setting is also provided to keep certain formats, so that
operators who would like to delete large formats such as "raw" can
retain a qcow2 copy (which, in an emergency, could be used to
inspect the image, or be manually converted and uploaded for use).
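A minimal sketch of this configuration, assuming the option names
``delete-after-upload`` and ``keep-formats`` (the image name and
formats are illustrative):

```yaml
diskimages:
  - name: ubuntu-jammy
    formats:
      - raw
      - qcow2
    delete-after-upload: true   # remove local files once all uploads finish
    keep-formats:               # ...except these, kept for emergency use
      - qcow2
```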
Change-Id: I97ca3422044174f956d6c5c3c35c2dbba9b4cadf
This adds support for staying within OpenStack volume quota limits
on instances that utilize boot-from-volume.
Change-Id: I1b7bc177581d23cecd9443a392fb058176409c46
In the AWS adapter, when getting the quota for an instance type, set
the quota for the AWS service quota code to be the number of vCPUs
rather than the number of cores. The number of vCPUs is typically
twice the number of cores. This fixes "VcpuLimitExceeded" errors from
AWS.
Change-Id: I880e6abb84b0527363893576057aa105a5a448a5
For drivers that support tagging/metadata (openstack, aws, azure),
add or enhance support for supplying tags for uploaded diskimages.
This allows users to set metadata on the global diskimage object
which will then be used as default values for metadata on the
provider diskimage values. The resulting merged dictionary forms
the basis of metadata to be associated with the uploaded image.
The changes needed to reconcile this for the three drivers mentioned
above are:
All: the diskimages[].meta key is added to supply the default values
for provider metadata.
OpenStack: provider diskimage metadata is already supported using
providers[].diskimages[].meta, so no further changes are needed.
AWS, Azure: provider diskimage tags are added using the key
providers[].diskimages[].tags since these providers already use
the "tags" nomenclature for instances.
This results in the somewhat incongruous situation where we have
diskimage "metadata" being combined with provider "tags", but it's
either that or have images with "metadata" while we have instances
with "tags", both of which are "tags" in EC2. The chosen approach
has consistency within the driver.
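Sketched in configuration, using the keys named above (the provider
name and tag/metadata values are illustrative):

```yaml
diskimages:
  - name: fedora
    meta:
      owner: infra     # default metadata for every provider upload
providers:
  - name: ec2-us-west-2
    driver: aws
    diskimages:
      - name: fedora
        tags:
          purpose: ci  # merged over the diskimage ``meta`` defaults
```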
Change-Id: I30aadadf022af3aa97772011cda8dbae0113a3d8
This adds support for AWS quotas that are specific to instance types.
The current quota support in AWS assumes only the "standard" instance types,
but AWS has several additional types with particular specialties (high memory,
GPU, etc). This adds automatic support for those by encoding their service
quota codes (like 'L-1216C47A') into the QuotaInformation object.
QuotaInformation accepts not only cores, ram, and instances as resource
values, but now also accepts arbitrary keys such as 'L-1216C47A'.
Extra testing of QI is added to ensure we handle the arithmetic correctly
in cases where one or the other operand does not have a resource counter.
The statemachine drivers did not encode their resource information into
the ZK Node record, so tenant quota was not operating correctly. This is
now fixed.
The AWS driver now accepts max_cores, _instances, and _ram values similar
to the OpenStack driver. It additionally accepts max_resources which can
be used to specify limits for arbitrary quotas like 'L-1216C47A'.
The tenant quota system now also accepts arbitrary keys such as 'L-1216C47A'
so that, for example, high memory nodes may be limited by tenant.
The mapping of instance types to quota codes is manually maintained;
however, AWS doesn't seem to add new instance types too often, and those
it does add are
highly specialized. If a new instance type is not handled internally, the
driver will not be able to calculate expected quota usage, but will still
operate until the new type is added to the mapping.
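A hypothetical pool configuration using these options (placing
``max-resources`` at the pool level is an assumption; the quota code
and limits are illustrative):

```yaml
providers:
  - name: ec2-us-east-1
    driver: aws
    pools:
      - name: main
        max-cores: 200
        max-resources:
          'L-1216C47A': 100   # standard on-demand instance vCPU quota
```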
Change-Id: Iefdc8f3fb8249c61c43fe51b592f551e273f9c36
This lets users configure providers which should fulfill requests
before other providers. This facilitates using a less expensive
cloud before using a more expensive one.
The default priority is 100, to facilitate either raising above
or lowering below the default (while using only positive integers
in order to avoid confusion).
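For example, assuming the ``priority`` attribute is set on the provider
pool and lower values are preferred:

```yaml
providers:
  - name: cheap-cloud
    driver: openstack
    pools:
      - name: main
        priority: 50    # fulfilled before pools at the default priority of 100
  - name: expensive-cloud
    driver: openstack
    pools:
      - name: main      # priority defaults to 100
```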
Change-Id: I969ea821e10a7773a0a8d135a4f13407319362ee
The default ZooKeeper session timeout is 10 seconds, which is not enough
on a highly loaded nodepool. As in Zuul, make this configurable so we
can avoid session losses.
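Assuming the option mirrors Zuul's and is named ``zookeeper-timeout``,
this might look like:

```yaml
zookeeper-servers:
  - host: zk01.example.com
zookeeper-timeout: 40.0   # seconds; the default is 10.0
```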
Change-Id: Id7087141174c84c6cdcbb3933c233f5fa0e7d569
This driver supplies "static" nodes that are actually backed by
another nodepool node. The use case is to be able to request a single
large node (a "backing node") from a cloud provider, and then divide
that node up into smaller nodes that are actually used ("requested
nodes"). A backing node can support one or more requested nodes, and
backing nodes should scale up or down as necessary.
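A sketch of such a provider, assuming the driver name ``metastatic``
and the label attributes ``backing-label`` and ``max-parallel-jobs``:

```yaml
providers:
  - name: meta-provider
    driver: metastatic
    pools:
      - name: main
        max-servers: 10
        labels:
          - name: small-node
            backing-label: large-node   # requested from a real cloud provider
            max-parallel-jobs: 4        # requested nodes per backing node
```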
Change-Id: I29d78705a87a53ee07dce6022b81a1ce97c54f1d
This change adds the option to put quota on resources on a per-tenant
basis (i.e. Zuul tenants).
It adds a new top-level config structure ``tenant-resource-limits``
under which one can specify a number of tenants, each with
``max-servers``, ``max-cores``, and ``max-ram`` limits. These limits
are valid globally, i.e., for all providers. This is in contrast to the
existing provider and pool quotas, which are only considered for nodes
of the same provider.
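For example, assuming each entry is keyed by ``tenant-name`` (limit
values are illustrative):

```yaml
tenant-resource-limits:
  - tenant-name: example-tenant
    max-servers: 10
    max-cores: 200
    max-ram: 800000
```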
Change-Id: I0c0154db7d5edaa91a9fe21ebf6936e14cef4db7
Ansible needs to know which shell type a node uses in order to operate
correctly, especially for SSH connections to Windows nodes, because
otherwise Ansible defaults to trying bash.
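Sketched for a static Windows node, assuming the attribute is named
``shell-type`` and sits alongside the connection settings:

```yaml
providers:
  - name: static-provider
    driver: static
    pools:
      - name: main
        nodes:
          - name: winserver.example.com
            labels: windows-node
            connection-type: ssh
            shell-type: cmd   # stops Ansible from assuming a bash shell
```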
Change-Id: I71abfefa57aaafd88f199be19ee7caa64efda538
This adds support to specify node selectors on Pod node labels.
They are used by the k8s scheduler to place a Pod on specific nodes with
corresponding labels.
This allows placing a build node/Pod on k8s nodes with certain
capabilities (e.g. storage types, number of CPU cores, etc.).
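A hypothetical pod label using this, assuming the attribute is named
``node-selector`` and takes a map of k8s node labels:

```yaml
providers:
  - name: kube-cluster
    driver: kubernetes
    pools:
      - name: main
        labels:
          - name: fedora-pod
            type: pod
            image: docker.io/fedora:28
            node-selector:
              disktype: ssd   # only schedule on nodes labeled disktype=ssd
```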
Change-Id: Ic00a84181c8ef66189e4259ef6434dc62b81c3c6
When removing a label from a provider that previously required raw
images (while still keeping the diskimage config), the image was
automatically rebuilt in qcow2 format.
It seems the original intent [0] of having the diskimage formats was to
allow building diskimages without needing a provider.
Because manually triggering a diskimage build without a format led to a
failure, the qcow2 default was added [1] and later fixed [2] to only
provide a default when the diskimage wasn't used by any provider.
By removing the qcow2 default and preventing builds without a format, we
retain the ability to allow diskimage only builds when a format is
given. Otherwise we don't assume a default image format and prevent
builds with no image format.
[0] https://review.opendev.org/#/c/412160/
[1] https://review.opendev.org/#/c/566437/
[2] https://review.opendev.org/#/c/572836/
Change-Id: I374f40b5f9cfcd55e7a4f567fd6480c940f2bc20
For the OpenShift and Kubernetes drivers, allow passing env vars to the
Pod nodes via their label config.
It is not possible to set persistent env vars in containers at run time
because there is no login shell available. Thus, we need to pass in any
env vars during node launch. This makes it possible to set, e.g.,
``http_proxy`` variables.
The env vars are passed as a list of dicts with ``name`` and ``value``
fields as per the k8s Pod YAML schema. [1]
```
- name: pod-fedora
  type: pod
  image: docker.io/fedora:28
  env:
    - name: foo
      value: bar
```
[1] https://kubernetes.io/docs/tasks/inject-data-application/define-environment-variable-container/
Change-Id: Ibbd9222fcd8f7dc5be227e7f5c8d8772a4c594e2
In the OpenShift and OpenShiftPods drivers, it is possible to configure
resource requests and limits for the container per label attributes.
This feature was missing in the Kubernetes driver, thus this change
introduces it analogously to the OpenShift driver.
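Assuming the Kubernetes driver adopts the OpenShift driver's ``cpu``
and ``memory`` label attributes, a pod label might read:

```yaml
providers:
  - name: kube-cluster
    driver: kubernetes
    pools:
      - name: main
        labels:
          - name: fedora-pod
            type: pod
            image: docker.io/fedora:28
            cpu: 2        # container CPU request/limit
            memory: 512   # container memory in MiB
```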
Change-Id: I7e67aebf892d10939672bdf76b8b3eb543124f9a
This change adds an Azure driver.
Supports:
* Public IPv4 address per VM
* Private IPv6 address per VM (optional, and not useful yet)
* Standard Flavors
* Resource Tagging (for billing / cleanup)
Change-Id: Ief0f8574832df69db472d8704ea3710bc6ca5c59
Co-authored-by: Tristan Cacqueray <tdecacqu@redhat.com>
Co-authored-by: Tobias Henkel <tobias.henkel@bmw.de>
Signed-off-by: Graham Hayes <gr@ham.ie>
This allows setting build-log-retention to -1 to disable automatic
collection of logs. This would facilitate managing these logs with an
external tool like logrotate. Another case is when builds are failing
very quickly -- say, one of the builds has destroyed the container and
so builds fail to even exec dib correctly. In this case it's difficult
to get to the root cause of the problem because the first build's logs
(from the build that destroyed the container) have been reaped just
seconds after the failure.
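Assuming ``build-log-retention`` is the top-level builder option,
disabling collection might look like:

```yaml
build-log-dir: /var/log/nodepool/builds
build-log-retention: -1   # never reap logs; rotate them externally instead
```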
Change-Id: I259c78e6a0e30b4c0a8d2f4c12a6941a2d227c38
This found problems with the openshift and openshiftpods drivers,
which are fixed here. Also update the docs to reflect the fact that
node-attributes is supported across all drivers.
Note that we do not appear to have GCE driver tests, so that one
is just assumed to work. :(
Change-Id: I98b6f871815d2b564d1550d960e682c180bac7c2
Because the static driver doesn't go through the common driver code
launch process (its nodes are pre-launched), it is in charge of setting
the node attributes itself. It wasn't setting the node-attributes attribute.
Change-Id: I865c3b15711f8c5559964859db92cb4499b901ae
While YAML does have inbuilt support for anchors to greatly reduce
duplicated sections, anchors have no support for merging values. For
diskimages, this can result in a lot of duplicated values for each
image which you can not otherwise avoid.
This provides two new values for diskimages; a "parent" and
"abstract".
Specifying a parent means you inherit all the configuration values
from that image. Anything specified within the child image overwrites
the parent values as you would expect; caveats, as described in the
documentation, are that the elements field appends and the env-vars
field has update() semantics.
An "abstract" diskimage is not instantiated into a real image, it is
only used for configuration inheritance. This way you can make an
abstract "base" image with common values and inherit that everywhere
without having to worry about bringing in values you don't want.
You can also chain parents together and the inheritance flows through.
Documentation is updated, and several tests are added to ensure the
correct parenting, merging and override behaviour of the new values.
Change-Id: I170016ef7d8443b9830912b9b0667370e6afcde7
At the moment nodepool's AWS driver uses the label to set the instance
name in AWS and fails to launch the instance if "Name" is supplied
as a tag.
This makes it possible to supply name as a tag.
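Sketched on a pool label (the placement of ``tags`` and all values are
illustrative):

```yaml
providers:
  - name: ec2-us-west-2
    driver: aws
    pools:
      - name: main
        labels:
          - name: ubuntu
            cloud-image: ubuntu
            instance-type: t3.medium
            tags:
              Name: ci-ubuntu   # now allowed; used as the instance name
```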
Change-Id: I9585db8fe4b4ad6f5b588fb67a7201296c2fc954
We were ignoring the volume-type and volume-size parameters for
GCE; correct that.
Also add a release note. We forgot to do that. We may as well
attach it to the next version since it's a new feature, and only
with this change does it actually work as documented.
Change-Id: I6cad4fa7a661997771f9c7ccf622a5f9828bd750
When running nodepool against private clouds it can be desirable
that the nodes don't get a public IP address. Let the user specify
this at the pool level.
Change-Id: I3d636517837fd8a6593c12e4309372da5c062b06
This adds an option to the GCE driver to tell nodepool to use the
private ip address even when an external one is provided.
Also add a missing schema entry for rate-limit.
Change-Id: Ib15bdc76fe500dc0fe6bb98f870514e9e157c1a5
There are several scenarios where it can be useful to hook into
nodepool after an image got uploaded but before it is taken into use
by the launchers. One use case is to run validations on the image
(e.g. image size, boot test, etc.) before nodepool tries to use that
image, potentially causing node failures. Another, more advanced, use
case is to pre-distribute an image to all compute nodes in a cloud
before the image is used at scale.
To facilitate these use cases this adds a new config option
post-upload-hook to the provider config. This takes a path to a user
defined executable script which then can perform various tasks. If the
process fails with an rc != 0 the image gets deleted again and the
upload fails.
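In provider configuration this might look like (the script path is
illustrative):

```yaml
providers:
  - name: my-cloud
    driver: openstack
    post-upload-hook: /usr/local/bin/image-validate   # rc != 0 deletes the upload
```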
Change-Id: I099cf1243b1bd262b8ee96ab323dbd34c7578c10
The "python-path" configuration option makes its way through to Zuul
where it sets the "ansible_interpreter_path" in the inventory.
Currently this defaults to "/usr/bin/python2" which is wrong for
Python 3-only distributions.
Ansible >=2.8 provides for automated discovery of the interpreter to
avoid runtime errors choosing an invalid interpreter [1]. Using this
should mean that "python-path" doesn't need to be explicitly set for
any common case. As more distributions become Python 3 only, this should
"do the right thing" without further configuration.
This switches the default python-path to "auto". The dependent change
updates Zuul to accept this and use it when running with Ansible
>=2.8, or default back to "/usr/bin/python2" for earlier Ansible
versions.
Testing and documentation are updated, and a release note is added.
[1] https://docs.ansible.com/ansible/2.8/reference_appendices/interpreter_discovery.html
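Sketched on an OpenStack cloud-image (the attribute's exact location
varies by driver; this placement is an assumption):

```yaml
providers:
  - name: my-cloud
    driver: openstack
    cloud-images:
      - name: fedora-33
        python-path: auto   # let Ansible >=2.8 discover the interpreter
```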
Depends-On: https://review.opendev.org/682275
Change-Id: I02a1a618c8806b150049e91b644ec3c0cb826ba4
* This is a follow-up to https://review.opendev.org/687024
* An earlier version of the patch had a different field name, this
clears up the confusing term.
Change-Id: I213746f9af4ead0b4b5a25e4d67ec1bcb7b2a785