nodepool

Commit Graph

Author	SHA1	Message	Date
James E. Blair	fd454706ca	Add delete-after-upload option This allows operators to delete large diskimage files after uploads are complete, in order to save space. A setting is also provided to keep certain formats, so that if operators would like to delete large formats such as "raw" while retaining a qcow2 copy (which, in an emergency, could be used to inspect the image, or manually converted and uploaded for use), that is possible. Change-Id: I97ca3422044174f956d6c5c3c35c2dbba9b4cadf	2024-03-09 06:51:56 -08:00
James E. Blair	de02ac5a20	Add OpenStack volume quota This adds support for staying within OpenStack volume quota limits on instances that utilize boot-from-volume. Change-Id: I1b7bc177581d23cecd9443a392fb058176409c46	2023-02-13 06:56:03 -08:00
Christian von Schultz	a828513ae8	Fix AWS quota limits for vCPUs In the AWS adapter, when getting the quota for an instance type, set the quota for the AWS service quota code to be the number of vCPUs rather than the number of cores. The number of vCPUs is typically twice the number of cores. This fixes "VcpuLimitExceeded" errors from AWS. Change-Id: I880e6abb84b0527363893576057aa105a5a448a5	2022-12-14 14:13:47 +01:00
James E. Blair	916d62a374	Allow specifying diskimage metadata/tags For drivers that support tagging/metadata (openstack, aws, azure), add or enhance support for supplying tags for uploaded diskimages. This allows users to set metadata on the global diskimage object which will then be used as default values for metadata on the provider diskimage values. The resulting merged dictionary forms the basis of metadata to be associated with the uploaded image. The changes needed to reconcile this for the three drivers mentioned above are: All: the diskimages[].meta key is added to supply the default values for provider metadata. OpenStack: provider diskimage metadata is already supported using providers[].diskimages[].meta, so no further changes are needed. AWS, Azure: provider diskimage tags are added using the key providers[].diskimages[].tags since these providers already use the "tags" nomenclature for instances. This results in the somewhat incongruous situation where we have diskimage "metadata" being combined with provider "tags", but it's either that or have images with "metadata" while we have instances with "tags", both of which are "tags" in EC2. The chosen approach has consistency within the driver. Change-Id: I30aadadf022af3aa97772011cda8dbae0113a3d8	2022-08-23 06:39:08 -07:00
Zuul	123a32f922	Merge "AWS multi quota support"	2022-07-29 17:01:09 +00:00
James E. Blair	74c95832b2	Clarify disjoint builders in docs There's a nuance to dealing with diskimages on disjoint builders; clarify that. Change-Id: I354877a655b7673c3fbb76177378b931ea283d8d	2022-07-28 10:31:03 -07:00
James E. Blair	207d8ac63c	AWS multi quota support This adds support for AWS quotas that are specific to instance types. The current quota support in AWS assumes only the "standard" instance types, but AWS has several additional types with particular specialties (high memory, GPU, etc). This adds automatic support for those by encoding their service quota codes (like 'L-1216C47A') into the QuotaInformation object. QuotaInformation accepts not only cores, ram, and instances as resource values, but now also accepts arbitrary keys such as 'L-1216C47A'. Extra testing of QI is added to ensure we handle the arithmetic correctly in cases where one or the other operand does not have a resource counter. The statemachine drivers did not encode their resource information into the ZK Node record, so tenant quota was not operating correctly. This is now fixed. The AWS driver now accepts max_cores, _instances, and _ram values similar to the OpenStack driver. It additionally accepts max_resources which can be used to specify limits for arbitrary quotas like 'L-1216C47A'. The tenant quota system now also accepts arbitrary keys such as 'L-1216C47A' so that, for example, high memory nodes may be limited by tenant. The mapping of instance types to quota is manually maintained, however, AWS doesn't seem to add new instance types too often, and those it does are highly specialized. If a new instance type is not handled internally, the driver will not be able to calculate expected quota usage, but will still operate until the new type is added to the mapping. Change-Id: Iefdc8f3fb8249c61c43fe51b592f551e273f9c36	2022-07-25 14:41:07 -07:00
James E. Blair	ea35fd5152	Add provider/pool priority support This lets users configure providers which should fulfill requests before other providers. This facilitates using a less expensive cloud before using a more expensive one. The default priority is 100, to facilitate either raising above or lowering below the default (while using only positive integers in order to avoid confusion). Change-Id: I969ea821e10a7773a0a8d135a4f13407319362ee	2022-05-23 13:28:21 -07:00
Zuul	fc2e592d0d	Merge "Add zookeeper-timeout connection config"	2022-03-24 15:23:02 +00:00
James E. Blair	50bc4cea49	Add IBM Cloud VPC driver This is a driver for the IBM Cloud VPC service, which has a new and distinct API. Change-Id: I7de7297138f5f50380840e4eef43600f9a761181	2022-03-15 06:49:57 -07:00
Tobias Henkel	ec55126f6b	Add zookeeper-timeout connection config The default zookeeper session timeout is 10 seconds, which is not enough on a highly loaded nodepool. As in zuul, make this configurable so we can avoid session losses. Change-Id: Id7087141174c84c6cdcbb3933c233f5fa0e7d569	2022-02-23 23:01:11 +01:00
James E. Blair	5862bef141	Add metastatic driver This driver supplies "static" nodes that are actually backed by another nodepool node. The use case is to be able to request a single large node (a "backing node") from a cloud provider, and then divide that node up into smaller nodes that are actually used ("requested nodes"). A backing node can support one or more requested nodes, and backing nodes should scale up or down as necessary. Change-Id: I29d78705a87a53ee07dce6022b81a1ce97c54f1d	2021-12-09 11:08:48 -08:00
Benjamin Schanzel	ee90100852	Add Tenant-Scoped Resource Quota This change adds the option to put quota on resources on a per-tenant basis (i.e. Zuul tenants). It adds a new top-level config structure ``tenant-resource-limits`` under which one can specify a number of tenants, each with ``max-servers``, ``max-cores``, and ``max-ram`` limits. These limits are valid globally, i.e., for all providers. This is contrary to currently existing provider and pool quotas, which are only considered for nodes of the same provider. Change-Id: I0c0154db7d5edaa91a9fe21ebf6936e14cef4db7	2021-09-01 09:07:43 +02:00
Albin Vass	0c84b7fa4e	Add shell-type config Ansible needs to know which shell type the node uses to operate correctly, especially for ssh connections for windows nodes because otherwise ansible defaults to trying bash. Change-Id: I71abfefa57aaafd88f199be19ee7caa64efda538	2021-03-05 15:14:29 +01:00
Albin Vass	7665407799	Reorganize drivers into separate documents Change-Id: I4274d8d87058a2a5c91da3e994a32d61b2f2aafe	2020-11-11 08:49:16 +00:00
Benjamin Schanzel	19be1a2e26	OpenShift/k8s Provider: Basic Support for k8s nodeSelectors This adds support to specify node selectors on Pod node labels. They are used by the k8s scheduler to place a Pod on specific nodes with corresponding labels. This allows to place a build node/Pod on k8s nodes with certain capabilities (e.g. storage types, number of CPU cores, etc.) Change-Id: Ic00a84181c8ef66189e4259ef6434dc62b81c3c6	2020-08-14 16:39:04 +02:00
Zuul	b0fa778ded	Merge "OpenShift/k8s Provider: Allow passing env vars to Pods"	2020-07-30 20:17:47 +00:00
Simon Westphahl	2ec2661655	Remove default qcow2 format in diskimage config When removing a label from a provider that previously required raw images (while still keeping the diskimage config), the image was automatically rebuilt in qcow2 format. It seems the original intent [0] of having the diskimage formats was to allow building diskimages without needing a provider. Because manually triggering a diskimage build without a format led to a failure, the qcow2 default was added [1] and later fixed [2] to only provide a default when the diskimage wasn't used by any provider. By removing the qcow2 default and preventing builds without a format, we retain the ability to allow diskimage only builds when a format is given. Otherwise we don't assume a default image format and prevent builds with no image format. [0] https://review.opendev.org/#/c/412160/ [1] https://review.opendev.org/#/c/566437/ [2] https://review.opendev.org/#/c/572836/ Change-Id: I374f40b5f9cfcd55e7a4f567fd6480c940f2bc20	2020-07-15 14:31:07 +02:00
Benjamin Schanzel	b76a0f458e	OpenShift/k8s Provider: Allow passing env vars to Pods For the OpenShift and Kubernetes drivers, allow passing env vars to the Pod nodes via their label config. It is not possible to set persistent env vars in containers on run time because there is no login shell available. Thus, we need to pass in any env vars during node launch. This allows to set, e.g., ``http_proxy`` variables. The env vars are passed as a list of dicts with ``name`` and ``value`` fields as per the k8s Pod YAML schema. [1] ``` - name: pod-fedora type: pod image: docker.io/fedora:28 env: - name: foo value: bar ``` [1] https://kubernetes.io/docs/tasks/inject-data-application/define-environment-variable-container/ Change-Id: Ibbd9222fcd8f7dc5be227e7f5c8d8772a4c594e2	2020-07-13 17:11:01 +02:00
Zuul	369799dea6	Merge "aws: add support for attaching instance profiles"	2020-07-01 23:43:47 +00:00
Zuul	64e7d82268	Merge "doc: openshiftpods handles python-path too"	2020-06-29 18:25:35 +00:00
Benjamin Schanzel	baf5407adc	Kubernetes Driver: Allow cpu/mem resource limits In the OpenShift and OpenShiftPods drivers, it is possible to configure resource requests and limits for the container per label attributes. This feature was missing in the Kubernetes driver, thus this change introduces it analogously to the OpenShift driver. Change-Id: I7e67aebf892d10939672bdf76b8b3eb543124f9a	2020-06-19 15:00:25 +02:00
Graham Hayes	c1a914fa4a	Implement an Azure driver This change adds an Azure driver. Supports: * Public IPv4 address per VM * Private IPv6 address per VM (optional, and not useful yet) * Standard Flavors * Resource Tagging (for billing / cleanup) Change-Id: Ief0f8574832df69db472d8704ea3710bc6ca5c59 Co-authored-by: Tristan Cacqueray <tdecacqu@redhat.com> Co-authored-by: Tobias Henkel <tobias.henkel@bmw.de> Signed-off-by: Graham Hayes <gr@ham.ie>	2020-06-15 19:57:11 +01:00
Albin Vass	2d59dc461c	aws: add support for attaching instance profiles Change-Id: Ie338f5f9c8f88c7e5584bce02c9b0d081f068da7	2020-06-12 12:22:50 +02:00
Pierre-Louis Bonicoli	ae85030108	doc: openshiftpods handles python-path too Change-Id: I0893032f05c561aa428b3b77a14b55eadbe6c5f1	2020-05-28 01:38:18 +02:00
Ian Wienand	b9f6f6bf62	Allow disabling build-log-retention This allows setting build-log-retention to -1 to disable automatic collection of logs. This would facilitate managing these logs with an external tool like logrotate. Another case is where you have the builds failing very quickly -- say, one of the builds has destroyed the container and so builds fail to even exec dib correctly. In this case it's difficult to get to the root-cause of the problem because the first build's logs (the one that destroyed the container) have been reaped just seconds after the failure. Change-Id: I259c78e6a0e30b4c0a8d2f4c12a6941a2d227c38	2020-04-29 13:07:07 +10:00
Zuul	775cd32028	Merge "Add ZooKeeper TLS support"	2020-04-15 01:41:47 +00:00
James E. Blair	b62fa3313d	Add ZooKeeper TLS support Change-Id: I009d9f90b32881aaef2d0694da6ff28074f48f8e	2020-04-14 16:03:53 -07:00
David Shrewsbury	8528322cf0	Update tests for node-attributes This found problems with the openshift and openshiftpods drivers, so that is fixed. Also update the docs to reflect the fact that node-attributes is supported across all drivers. Note that we do not appear to have GCE driver tests, so that one is just assumed to work. :( Change-Id: I98b6f871815d2b564d1550d960e682c180bac7c2	2020-04-02 12:39:56 -07:00
Zuul	24db91f96b	Merge "Support node-attributes in static driver"	2020-04-02 19:21:43 +00:00
Zuul	169b69accb	Merge "Add parent and abstract flags for diskimages"	2020-03-29 22:08:33 +00:00
David Shrewsbury	e389ae2af0	Support node-attributes in static driver Because the static driver doesn't go through the common driver code launch process (its nodes are pre-launched), it is in charge of setting the node attributes itself. It wasn't setting the node-attributes attribute. Change-Id: I865c3b15711f8c5559964859db92cb4499b901ae	2020-03-24 11:07:29 -04:00
Ian Wienand	b5b20b6e2c	Add parent and abstract flags for diskimages While YAML does have inbuilt support for anchors to greatly reduce duplicated sections, anchors have no support for merging values. For diskimages, this can result in a lot of duplicated values for each image which you can not otherwise avoid. This provides two new values for diskimages; a "parent" and "abstract". Specifying a parent means you inherit all the configuration values from that image. Anything specified within the child image overwrites the parent values as you would expect; caveats, as described in the documentation, are that the elements field appends and the env-vars field has update() semantics. An "abstract" diskimage is not instantiated into a real image, it is only used for configuration inheritance. This way you can make an abstract "base" image with common values and inherit that everywhere without having to worry about bringing in values you don't want. You can also chain parents together and the inheritance flows through. Documentation is updated, and several tests are added to ensure the correct parenting, merging and override behaviour of the new values. Change-Id: I170016ef7d8443b9830912b9b0667370e6afcde7	2020-03-20 07:53:08 +11:00
Albin Vass	2ce664ec14	Enable setting label and instance name separately At the moment nodepools aws driver uses the label to set the instance name in aws and fails to launch the instance if "Name" is supplied as a tag. This makes it possible to supply name as a tag. Change-Id: I9585db8fe4b4ad6f5b588fb67a7201296c2fc954	2020-03-12 17:15:32 +01:00
James E. Blair	5d37a0a6e1	Fix GCE volume parameters We were ignoring the volume-type and volume-size parameters for GCE; correct that. Also add a release note. We forgot to do that. We may as well attach it to the next version since it's a new feature, and only with this change does it actually work as documented. Change-Id: I6cad4fa7a661997771f9c7ccf622a5f9828bd750	2020-02-27 09:49:35 -08:00
Andy Ladjadj	5bae6272f4	add ebs-optimized support for aws provider Change-Id: I1f6330a71b85f23e6fbe3abd636764e5f3b8a61d	2020-02-04 18:59:24 +01:00
Clément Mondion	49482e157c	add tags support for aws provider Change-Id: Ib871bfda41192a74ee02b0b3d2e422fde21f2801	2020-01-23 10:32:08 +01:00
Tobias Henkel	52f7d4fb62	Make public ip configurable in aws When running nodepool against private cloud rooms it can be desirable that the nodes don't get a public ip address. Let the user specify this on pool level. Change-Id: I3d636517837fd8a6593c12e4309372da5c062b06	2019-12-21 13:47:08 +01:00
Tobias Henkel	761a9ee00e	Support userdata for instances in aws In some cases we need to be able to launch instances with custom userdata also in aws. Change-Id: I0891961f16bb3bd728622d3413bd185978d79324	2019-12-21 13:35:00 +01:00
James E. Blair	f343dbb05a	GCE: add use-internal-ip option This adds an option to the GCE driver to tell nodepool to use the private ip address even when an external one is provided. Also add a missing schema entry for rate-limit. Change-Id: Ib15bdc76fe500dc0fe6bb98f870514e9e157c1a5	2019-12-13 14:46:41 -08:00
James E. Blair	13104ab0ff	Add Google Cloud provider Also add a TaskManager and a SimpleTaskManagerDriver. Change-Id: I5c44b24600838ae9afcc6a39c482c67933548bc0	2019-12-12 14:33:43 -08:00
Zuul	c790ec4721	Merge "Aws cloud-image is referred to from pool labels section"	2019-12-10 14:58:19 +00:00
Albin Vass	b829726909	Aws cloud-image is referred to from pool labels section Change-Id: I50596aed6da3bec6e2bf8049b277aa91e9e685c3	2019-12-09 12:51:53 +01:00
Zuul	e391572495	Merge "Documentation fixes"	2019-12-06 23:57:22 +00:00
Albin Vass	bb6475177e	Documentation fixes Change-Id: I23d677d5522aec94d3723a71f98f12e58355eeba	2019-12-06 12:54:22 +01:00
Tobias Henkel	0dc40d33e4	Support optional post upload hooks There are several scenarios where it can be useful to hook into nodepool after an image got uploaded but before it is taken into use by the launchers. One use case is to be able to run validations on the image (e.g. image size, boot test, etc.) before nodepool tries to use that image, potentially causing node_failures. Another more advanced use case is to be able to pre-distribute an image to all compute nodes in a cloud before an image is used at scale. To facilitate these use cases this adds a new config option post-upload-hook to the provider config. This takes a path to a user defined executable script which then can perform various tasks. If the process fails with an rc != 0 the image gets deleted again and the upload fails. Change-Id: I099cf1243b1bd262b8ee96ab323dbd34c7578c10	2019-11-25 13:37:28 +01:00
Zuul	915be0a5be	Merge "AWS driver: add ability to determine AMI id using filters"	2019-10-24 18:45:20 +00:00
Zuul	b72a9195e1	Merge "Set default python-path to "auto""	2019-10-17 05:26:10 +00:00
Ian Wienand	db87a0845f	Set default python-path to "auto" The "python-path" configuration option makes its way through to Zuul where it sets the "ansible_interpreter_path" in the inventory. Currently this defaults to "/usr/bin/python2" which is wrong for Python 3-only distributions. Ansible >=2.8 provides for automated discovery of the interpreter to avoid runtime errors choosing an invalid interpreter [1]. Using this should mean that "python-path" doesn't need to be explicitly set for any common case. As more distributions become Python 3 only, this should "do the right thing" without further configuration. This switches the default python-path to "auto". The dependent change updates Zuul to accept this and use it when running with Ansible >=2.8, or default back to "/usr/bin/python2" for earlier Ansible versions. Testing and documentation is updated, and a release note added. [1] https://docs.ansible.com/ansible/2.8/reference_appendices/interpreter_discovery.html Depends-On: https://review.opendev.org/682275 Change-Id: I02a1a618c8806b150049e91b644ec3c0cb826ba4	2019-10-17 09:17:50 +11:00
Jan Gutter	c733541633	Fix typo in port-cleanup-interval description * This is a follow-up to https://review.opendev.org/687024 * An earlier version of the patch had a different field name, this clears up the confusing term. Change-Id: I213746f9af4ead0b4b5a25e4d67ec1bcb7b2a785	2019-10-14 18:03:32 +02:00


243 Commits