22 Commits

Author SHA1 Message Date
James E. Blair c78fe769f2 Allow custom k8s pod specs
This change adds the ability to use the k8s (and friends) drivers
to create pods with custom specs.  This will allow nodepool admins
to define labels that create pods with options not otherwise supported
by Nodepool, as well as pods with multiple containers.

This can be used to implement the versatile sidecar pattern, which is
useful for running jobs that need a background system process (such as
a database server or container runtime) on a system where it is
difficult to start such a process in the background.

It is still the case that a single resource is returned to Zuul, so
a single pod will be added to the inventory.  Therefore, the expectation
that it should be possible to shell into the first container in the
pod is documented.

Change-Id: I4a24a953a61239a8a52c9e7a2b68a7ec779f7a3d
2024-01-30 15:59:34 -08:00
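
As a rough illustration of the sidecar pattern described in the change above, the following Python sketch builds a pod spec with two containers. The helper name `build_sidecar_spec`, the container names, and the sidecar image are hypothetical; the dictionary follows the standard Kubernetes pod `spec` schema and is not taken from Nodepool's own configuration format.

```python
def build_sidecar_spec(job_image, sidecar_image):
    """Return a Kubernetes pod spec dict with a job container and a sidecar.

    Hypothetical helper for illustration only. The job container is listed
    first because the first container is the one expected to accept a shell.
    """
    return {
        "containers": [
            {
                "name": "job",
                "image": job_image,
                # Keep the container alive so a shell can be opened in it.
                "command": ["/bin/sh", "-c"],
                "args": ["while true; do sleep 30; done"],
            },
            {
                # Background service that the job container can talk to.
                "name": "sidecar",
                "image": sidecar_image,
            },
        ],
    }


# The sidecar image name below is a placeholder, not a real image.
spec = build_sidecar_spec("docker.io/fedora:28", "example.org/database:latest")
```

Listing the job container first keeps the sketch consistent with the documented expectation that the first container in the pod is the one to shell into.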
Benjamin Schanzel 4660bb9aa7
Kubernetes/OpenShift drivers: allow setting dynamic k8s labels
As with the OpenStack/AWS/Azure drivers, allow configuring dynamic
metadata (labels) for Kubernetes resources, populated with information
about the corresponding node request.

Change-Id: I5d174edc6b7a49c2ab579a9a0b1b560389d6de82
2023-09-11 10:49:27 +02:00
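
One way to picture dynamic labels is as format strings expanded against the node request. The sketch below assumes that mechanism; the function name `render_dynamic_labels` and the request attributes `id` and `tenant_name` are illustrative assumptions, not confirmed by the commit message.

```python
from types import SimpleNamespace


def render_dynamic_labels(templates, request):
    """Expand format-string label templates using node request attributes."""
    return {
        key: template.format(request=request)
        for key, template in templates.items()
    }


# Hypothetical request object standing in for a real node request.
request = SimpleNamespace(id="100-0000000042", tenant_name="example-tenant")
labels = render_dynamic_labels(
    {"request-id": "{request.id}", "tenant": "{request.tenant_name}"},
    request,
)
```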
mbecker 3fa6821437 Add gpu support for k8s/openshift pods
This adds the option to request GPUs for kubernetes and openshift pods.

Since the resource name depends on the GPU vendor and the cluster
installation, the resource name is left for the user to define in the
node pool.
To leverage the ability of some schedulers to use fractional GPUs,
the actual GPU value is read as a string.

For GPUs, requests and limits cannot be decoupled (cf.
https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/),
so the same value will be used for requests and limits.

Change-Id: Ibe33b06c374a431f164080edb34c3a501c360df7
2023-07-11 07:10:30 -07:00
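
The rule that GPU requests and limits share one value can be sketched as follows. The function name `gpu_resources` is hypothetical, and `nvidia.com/gpu` is used only as an example resource name; the real name depends on the vendor and cluster.

```python
def gpu_resources(gpu_resource, gpu):
    """Build container resources where the GPU request equals the limit."""
    # The value is kept as a string so that fractional values understood by
    # some schedulers pass through unchanged.
    value = str(gpu)
    return {
        "requests": {gpu_resource: value},
        "limits": {gpu_resource: value},
    }


resources = gpu_resources("nvidia.com/gpu", "1")
```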
James E. Blair ac187302a3 Add extra-resources quota handling to the k8s driver
Some k8s schedulers like run.ai use custom pod annotations rather
than standard k8s resources to specify required resources such as
gpus.  To facilitate quota handling for these resources in nodepool,
this change adds an extra-resources attribute to labels that can be
used to ensure nodepool doesn't try to launch more resources than
can be handled.

Users can already specify a 'max-resources' limit for arbitrary
resources in the nodepool config; this change allows them to also
specify arbitrary resource consumption with 'extra-resources'.

Change-Id: I3d2612a7d168bf415d58029aa295e60c3c83cecd
2023-06-27 14:06:08 -07:00
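
The interplay between 'max-resources' and 'extra-resources' might be pictured as a simple quota check. This is a minimal sketch under the assumption that quota is tracked as per-resource totals; the function name `fits_quota` and the resource name `example-gpu` are hypothetical.

```python
def fits_quota(max_resources, used, label_resources):
    """Return True if launching a node stays within the configured limits.

    Resources without a configured maximum are treated as unlimited.
    """
    for name, amount in label_resources.items():
        limit = max_resources.get(name)
        if limit is not None and used.get(name, 0) + amount > limit:
            return False
    return True


# A label consuming one unit of a custom resource, with three already in use.
allowed = fits_quota({"example-gpu": 4}, {"example-gpu": 3}, {"example-gpu": 1})
```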
mbecker 1822976350 Add k8s annotations to pods
This allows adding key/value pairs under
metadata.annotations in the kubernetes
resource specification.
This information can be used by different tools
to govern handling of resources.

One particular use-case is the runai-scheduler which
uses annotations to allocate fractional GPU resources
to a pod.

Change-Id: Ib319caffe51e00bedda2861e8e1f2bbe04340322
2023-06-27 14:06:01 -07:00
James E. Blair 669552f6f9 Add support for specifying pod resource limits
We currently allow users to specify pod resource requests and limits
for cpu, ram, and ephemeral storage.  But if a user specifies one of
these, the value is used for both the request and the limit.

This updates the specification to allow the use of separate request
and limit values.

It also normalizes related behavior across all 3 pod drivers,
including adding resource reporting to the openshift drivers.

Change-Id: I49f918b01f83d6fd0fd07f61c3e9a975aa8e59fb
2023-02-12 07:14:30 -08:00
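
The separation of requests and limits can be illustrated with a small helper. The sketch assumes that a resource without an explicit limit falls back to its request value, which matches the earlier behavior described in the message; the function name `container_resources` is hypothetical.

```python
def container_resources(requests, limits=None):
    """Combine request and limit values into a container resources dict.

    A resource with no explicit limit reuses its request value (assumption).
    """
    limits = limits or {}
    merged_limits = {
        name: limits.get(name, value) for name, value in requests.items()
    }
    # Keep limits that were given without a matching request.
    for name, value in limits.items():
        merged_limits.setdefault(name, value)
    return {"requests": dict(requests), "limits": merged_limits}


resources = container_resources(
    {"cpu": "2", "memory": "1024Mi"},
    {"memory": "2048Mi"},
)
```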
James E. Blair 9bf44b4a4c Add scheduler, volumes, and labels to k8s/openshift
This adds support for specifying the scheduler name, volumes (and
volume mounts), and additional metadata labels to the Kubernetes
and OpenShift (and OpenShift pods) drivers.

This also extends the k8s and openshift test frameworks so that we
can exercise the new code paths (as well as some previous similar
settings).  Tests and assertions for both a minimal (mostly defaults)
configuration as well as a configuration that uses all the optional
settings are added.

Change-Id: I648e88a518c311b53c8ee26013a324a5013f3be3
2023-02-11 12:03:45 -08:00
James E. Blair aa8580ce32 Add support for privileged containers
To allow users to run docker-in-docker style workloads on k8s
and openshift clusters, add support for adding the privileged
flag to containers created in k8s and openshift pods.

Change-Id: I349d61bf200d7fb6d1effe112f7505815b06e9a8
2023-01-25 11:09:25 -08:00
Benjamin Schanzel 6c9c219eb0
Add config option to limit ephemeral storage on K8s Pod labels
This adds config options for limiting the amount of ephemeral storage
allocatable by a container of a K8s Pod-type label.
This optional config translates to K8s settings

* spec.containers[].resources.limits.ephemeral-storage
* spec.containers[].resources.requests.ephemeral-storage

This provides a mechanism that prevents Pods from filling up their
host's storage and thereby interfering with or breaking other workloads
on the same host (especially on shared clusters).

Like for cpu and memory limits, a pool-scoped default can also be
specified.

Change-Id: I23e90ae53cc2b2eb0e51cc9e3dc5802c86cc0ac9
2022-10-13 13:56:43 +02:00
Benjamin Schanzel d60a27a787
Default limits for k8s labels and quota support
This adds config options to enforce default resource (cpu, memory) limits
on k8s pod labels. With this, we can ensure all pod nodes have resource
information set on them, which makes it possible to account for max-cores
and max-ram quotas for k8s pod nodes; those config options are therefore
added as well. Tenant quotas can then also be considered for pod nodes.

Change-Id: Ida121c20b32828bba65a319318baef25b562aef2
2022-05-02 11:35:04 +02:00
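
The effect of pool-level defaults can be sketched as a lookup that fills in missing label values. The function name `effective_label_resources` and the plain `cpu` and `memory` keys are illustrative assumptions, not the actual option names.

```python
def effective_label_resources(label, pool_defaults):
    """Fill in cpu/memory on a label from pool-level defaults when unset."""
    return {
        key: label.get(key, pool_defaults.get(key))
        for key in ("cpu", "memory")
    }


# The label sets cpu itself and inherits memory from the pool default.
resolved = effective_label_resources({"cpu": 4}, {"cpu": 2, "memory": 1024})
```

With every label resolved this way, each pod node carries resource information that quota accounting can sum up.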
James E. Blair 94fcc70a59 Azure: reconcile config objects
The config objects in the Azure driver have drifted a bit.  This
updates them to match the actual used configuration.  It also
reorganizes them to be a little easier to maintain by moving the
initializers into the individual objects.

Finally, the verbose __eq__ methods are removed in favor of a
simpler __eq__ method in the superclass.

Since the OpenStack, k8s, and OpenShift drivers call super() in
their __eq__ methods, they need to be updated at the same time.

This also corrects an unrelated error with a misnamed parameter
in the fake k8s used in the k8s tests.

Change-Id: Id6971ca002879d3fb056fedc7e4ca6ec35dd7434
2021-03-22 10:39:53 -07:00
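
A single generic __eq__ in a superclass might look like the sketch below. The class names `BaseConfig` and `ExamplePool` are hypothetical, and comparing instance dictionaries is one possible approach, not necessarily the one used in the change.

```python
class BaseConfig:
    """Base class whose equality compares all instance attributes."""

    def __eq__(self, other):
        if not isinstance(other, self.__class__):
            return False
        return self.__dict__ == other.__dict__


class ExamplePool(BaseConfig):
    """Subclass that needs no __eq__ of its own."""

    def __init__(self, name, max_servers):
        self.name = name
        self.max_servers = max_servers


same = ExamplePool("main", 10) == ExamplePool("main", 10)
different = ExamplePool("main", 10) == ExamplePool("main", 5)
```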
Albin Vass 0c84b7fa4e Add shell-type config
Ansible needs to know which shell type the node uses in order to operate
correctly. This matters especially for SSH connections to Windows nodes,
because otherwise Ansible defaults to trying bash.

Change-Id: I71abfefa57aaafd88f199be19ee7caa64efda538
2021-03-05 15:14:29 +01:00
Benjamin Schanzel 19be1a2e26 OpenShift/k8s Provider: Basic Support for k8s nodeSelectors
This adds support for specifying node selectors on Pod node labels.
They are used by the k8s scheduler to place a Pod on specific nodes with
corresponding labels.
This allows placing a build node/Pod on k8s nodes with certain
capabilities (e.g. storage types, number of CPU cores, etc.)

Change-Id: Ic00a84181c8ef66189e4259ef6434dc62b81c3c6
2020-08-14 16:39:04 +02:00
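
In the standard Kubernetes pod schema, node selectors are a key/value map under `spec.nodeSelector`. The sketch below shows where they would end up; the function name, the container name, and the `storageType` label are hypothetical.

```python
def pod_spec_with_node_selector(image, node_selector):
    """Return a pod spec that is only scheduled on nodes matching the labels."""
    return {
        # Every key/value pair must match a label on the target k8s node.
        "nodeSelector": dict(node_selector),
        "containers": [{"name": "main", "image": image}],
    }


spec = pod_spec_with_node_selector("docker.io/fedora:28", {"storageType": "ssd"})
```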
Benjamin Schanzel b76a0f458e OpenShift/k8s Provider: Allow passing env vars to Pods
For the OpenShift and Kubernetes drivers, allow passing env vars to the
Pod nodes via their label config.
It is not possible to set persistent env vars in containers at run time
because there is no login shell available. Thus, we need to pass in any
env vars during node launch. This allows setting, e.g., ``http_proxy``
variables.

The env vars are passed as a list of dicts with ``name`` and ``value``
fields as per the k8s Pod YAML schema. [1]

```
- name: pod-fedora
  type: pod
  image: docker.io/fedora:28
  env:
  - name: foo
    value: bar
```

[1] https://kubernetes.io/docs/tasks/inject-data-application/define-environment-variable-container/

Change-Id: Ibbd9222fcd8f7dc5be227e7f5c8d8772a4c594e2
2020-07-13 17:11:01 +02:00
Benjamin Schanzel baf5407adc Kubernetes Driver: Allow cpu/mem resource limits
In the OpenShift and OpenShiftPods drivers, it is possible to configure
resource requests and limits for the container via label attributes.
This feature was missing in the Kubernetes driver, thus this change
introduces it analogously to the OpenShift driver.

Change-Id: I7e67aebf892d10939672bdf76b8b3eb543124f9a
2020-06-19 15:00:25 +02:00
Ian Wienand db87a0845f Set default python-path to "auto"
The "python-path" configuration option makes its way through to Zuul
where it sets the "ansible_python_interpreter" in the inventory.
Currently this defaults to "/usr/bin/python2" which is wrong for
Python 3-only distributions.

Ansible >=2.8 provides for automated discovery of the interpreter to
avoid runtime errors choosing an invalid interpreter [1].  Using this
should mean that "python-path" doesn't need to be explicitly set for any
common case.  As more distributions become Python 3 only, this should
"do the right thing" without further configuration.

This switches the default python-path to "auto".  The dependent change
updates Zuul to accept this and use it when running with Ansible
>=2.8, or default back to "/usr/bin/python2" for earlier Ansible
versions.

Tests and documentation are updated, and a release note is added.

[1] https://docs.ansible.com/ansible/2.8/reference_appendices/interpreter_discovery.html

Depends-On: https://review.opendev.org/682275
Change-Id: I02a1a618c8806b150049e91b644ec3c0cb826ba4
2019-10-17 09:17:50 +11:00
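
The fallback behavior described for the consuming side can be sketched as a small function. The name `resolve_python_path` and the tuple form of the Ansible version are assumptions made for illustration.

```python
def resolve_python_path(python_path, ansible_version):
    """Pick the interpreter setting for a node.

    ansible_version is a (major, minor) tuple, e.g. (2, 8).
    """
    if python_path != "auto":
        # An explicitly configured path always wins.
        return python_path
    if ansible_version >= (2, 8):
        # Let Ansible discover the interpreter itself.
        return "auto"
    # Older Ansible has no discovery, so fall back to the previous default.
    return "/usr/bin/python2"


modern = resolve_python_path("auto", (2, 9))
legacy = resolve_python_path("auto", (2, 7))
```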
Mohammed Naser 67bdd85425 k8s: make context optional
The Kubernetes driver can optionally use the credentials inside
the pod, which means that context should be optional.

Change-Id: Iaf42cd03bd4a92e36133fd2ec7157869f8747d6b
2019-09-13 14:16:15 -04:00
Flavio Percoco e819f4b4b1 Allow nodepool for using in-cluster configs
When running within a Kubernetes cluster, it's easier to configure a
service account and mount it on the nodepool pod than to create a
kubeconfig file and make it available to nodepool.

For this reason, this commit adds the ability to load configs from the
in-cluster service account paths. It does this as a fallback when the
kubeconfig path doesn't exist.

This commit also makes `context` an optional configuration option,
since it's not needed when a service account is used.

Change-Id: I7762940993c185c17d7468df72dff22e99d7f8c2
2019-06-28 15:11:55 +02:00
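
The fallback decision can be sketched without touching a real cluster. The function name `choose_config_source` is hypothetical; in the Kubernetes Python client the two corresponding loaders are `kubernetes.config.load_kube_config` and `kubernetes.config.load_incluster_config`, which this sketch deliberately does not call.

```python
import os


def choose_config_source(kubeconfig_path):
    """Return "kubeconfig" if the file exists, otherwise "in-cluster"."""
    if os.path.exists(os.path.expanduser(kubeconfig_path)):
        return "kubeconfig"
    # No kubeconfig file: fall back to the mounted service account.
    return "in-cluster"


source = choose_config_source("/nonexistent/path/to/kubeconfig")
```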
Tristan Cacqueray 76aa62230c Add python-path option to node
This change adds a new python_path Node attribute so that zuul executor
can remove the default hard-coded ansible_python_interpreter.

Change-Id: Iddf2cc6b2df579636ec39b091edcfe85a4a4ed10
2019-05-07 02:22:45 +00:00
David Shrewsbury d6ef934b70 Extract common config parsing for ProviderConfig
Adds a ProviderConfig class method that can be called to get
the config schema for the common config options in a Provider.
Drivers are modified to call this method.

Change-Id: Ib67256dddc06d13eb7683226edaa8c8c10a73326
2019-01-07 12:34:05 +00:00
David Shrewsbury a19dffd916 Extract out common config parsing for ConfigPool
Our driver code is in a less-than-ideal situation where each driver
is responsible for parsing config options that are common to all
drivers. This change begins to correct that, starting with ConfigPool.
It changes the driver API in the following ways:

1) Forces objects derived from ConfigPool to implement a load() method
   that should call super's method, then handle loading driver specific
   options from the config.

2) Adds a ConfigPool class method that can be called to get the config
   schema for the common config options, so that drivers only have to
   define the schema for their own config options.

Other base config objects will be modeled after this pattern in
later changes.

Change-Id: I41620590c355cacd2c4fbe6916acfe80f20e3216
2019-01-03 11:05:26 -05:00
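
The two API changes listed above can be sketched as a base class and a driver subclass. All class, method, and option names here (`BasePool`, `ExampleDriverPool`, `get_common_schema`, `max-servers`, `name`) are hypothetical stand-ins, and plain Python types stand in for a real schema library.

```python
class BasePool:
    """Hypothetical stand-in for a common pool config base class."""

    def __init__(self):
        self.max_servers = None

    @classmethod
    def get_common_schema(cls):
        # Schema for the options shared by every driver.
        return {"max-servers": int}

    def load(self, pool_config):
        # Parse the options that are common to all drivers.
        self.max_servers = pool_config.get("max-servers")


class ExampleDriverPool(BasePool):
    """Driver pool that only handles its own options."""

    def __init__(self):
        super().__init__()
        self.name = None

    @classmethod
    def get_schema(cls):
        schema = cls.get_common_schema()
        schema.update({"name": str})
        return schema

    def load(self, pool_config):
        # Call the superclass first, then load driver-specific options.
        super().load(pool_config)
        self.name = pool_config["name"]


pool = ExampleDriverPool()
pool.load({"name": "main", "max-servers": 5})
```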
Tristan Cacqueray 4295ff6870 Implement a Kubernetes driver
This change implements a Kubernetes resource provider.
The driver supports namespace requests and pod requests to enable both
containers-as-machines and native-container workflows.

Depends-On: https://review.openstack.org/605823
Change-Id: I81b5dc5abe92b71cc98b0d71c8a2863cddff6027
2018-10-25 10:24:45 +00:00