
24 Commits

Author SHA1 Message Date
James E. Blair c78fe769f2 Allow custom k8s pod specs
This change adds the ability to use the k8s (and friends) drivers
to create pods with custom specs.  This will allow nodepool admins
to define labels that create pods with options not otherwise supported
by Nodepool, as well as pods with multiple containers.

This can be used to implement the versatile sidecar pattern, which is
useful for running jobs that need a background system process (such as
a database server or container runtime) in a system where it is
difficult to run such a process in the background.

It is still the case that a single resource is returned to Zuul, so
a single pod will be added to the inventory.  Therefore, the expectation
that it should be possible to shell into the first container in the
pod is documented.

Change-Id: I4a24a953a61239a8a52c9e7a2b68a7ec779f7a3d
2024-01-30 15:59:34 -08:00
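Only one resource is returned to Zuul, so commands are expected to run in the first container of a custom pod. The following is a minimal sketch of that selection rule. It assumes the custom spec is a plain dict shaped like a k8s PodSpec. The helper name `first_container_name` and the container images are hypothetical and are not taken from Nodepool.

```python
def first_container_name(pod_spec):
    """Hypothetical helper: return the container Zuul would shell into.

    Assumes pod_spec is a dict shaped like a k8s PodSpec. Only the first
    entry of "containers" is used, matching the documented expectation
    for multi-container (sidecar) pods.
    """
    containers = pod_spec.get("containers") or []
    if not containers:
        raise ValueError("custom pod spec must define at least one container")
    return containers[0]["name"]


# A job container plus a database sidecar (illustrative images only).
spec = {
    "containers": [
        {"name": "main", "image": "docker.io/fedora:39"},
        {"name": "db", "image": "docker.io/library/postgres:16"},
    ]
}
print(first_container_name(spec))  # main
```

Placing the sidecar after the main container keeps the job container first, so it is the one reached by the inventory entry.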
Benjamin Schanzel 4660bb9aa7 Kubernetes/OpenShift drivers: allow setting dynamic k8s labels
As with the OpenStack/AWS/Azure drivers, allow configuring
dynamic metadata (labels) for kubernetes resources with information
about the corresponding node request.

Change-Id: I5d174edc6b7a49c2ab579a9a0b1b560389d6de82
2023-09-11 10:49:27 +02:00
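Dynamic labels are templates that are filled in from the node request when the pod is created. The sketch below uses Python's `str.format` attribute access for the templating. The helper name, the `request` fields (`id`, `requestor`) and the label keys are assumptions made for illustration. Real label values must also satisfy k8s label syntax (at most 63 characters, alphanumerics plus `-`, `_`, `.`).

```python
from types import SimpleNamespace


def render_dynamic_labels(templates, request):
    """Hypothetical helper: expand each label template with request data.

    Each value is a str.format() template. Attribute access such as
    "{request.id}" reads fields from the request object.
    """
    return {key: template.format(request=request) for key, template in templates.items()}


# Illustrative request object; a real node request carries more fields.
request = SimpleNamespace(id="100-0000000001", requestor="zuul-launcher")
templates = {
    "node-request-id": "{request.id}",
    "requestor": "{request.requestor}",
}
labels = render_dynamic_labels(templates, request)
print(labels)  # {'node-request-id': '100-0000000001', 'requestor': 'zuul-launcher'}
```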
mbecker 3fa6821437 Add gpu support for k8s/openshift pods
This adds the option to request GPUs for kubernetes and openshift pods.

Since the resource name depends on the GPU vendor and the cluster
installation, it is left to the user to define in the nodepool
configuration.
To leverage the ability of some schedulers to use fractional GPUs,
the actual GPU value is read as a string.

For GPUs, requests and limits cannot be decoupled (cf.
https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/),
so the same value will be used for requests and limits.

Change-Id: Ibe33b06c374a431f164080edb34c3a501c360df7
2023-07-11 07:10:30 -07:00
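The GPU handling described above has two parts. The resource name comes from the user, and one string value serves as both the request and the limit. The sketch below shows that with a hypothetical helper. `nvidia.com/gpu` is the resource name exposed by NVIDIA's device plugin. `example.com/vgpu` stands in for a scheduler-specific fractional resource.

```python
def gpu_resources(resource_name, gpu):
    """Hypothetical helper: build matching GPU request/limit entries.

    resource_name is vendor/cluster specific. The value is kept as a
    string so schedulers that accept fractional GPUs receive it unchanged.
    """
    value = str(gpu)
    # Kubernetes requires GPU requests and limits to be equal, so reuse the value.
    return {"requests": {resource_name: value}, "limits": {resource_name: value}}


resources = gpu_resources("nvidia.com/gpu", "1")
fractional = gpu_resources("example.com/vgpu", 0.5)
print(resources)
print(fractional)
```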
James E. Blair eedd6b9d2a Add extra-resources handling to openshift drivers
This adds the extra-resources handling that was just added to the
k8s driver to openshift.

Change-Id: I56e5eaf6ec22d10e88420094e92041c0b39b04e5
2023-06-27 14:06:11 -07:00
mbecker 1822976350 Add k8s annotations to pods
This allows adding key/value pairs under
metadata.annotations in the kubernetes
resource specification.
This information can be used by different tools
to govern handling of resources.

One particular use-case is the runai-scheduler which
uses annotations to allocate fractional GPU resources
to a pod.

Change-Id: Ib319caffe51e00bedda2861e8e1f2bbe04340322
2023-06-27 14:06:01 -07:00
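Annotations live next to labels under `metadata` in the pod manifest. The sketch below shows where user-supplied key/value pairs would go. The helper name and the annotation key `example.com/gpu-fraction` are hypothetical. The actual keys a scheduler such as run:ai reads are not given here.

```python
def pod_metadata(name, labels=None, annotations=None):
    """Hypothetical helper: assemble metadata for a pod manifest."""
    metadata = {"name": name, "labels": dict(labels or {})}
    if annotations:
        # Annotation values must be strings; other tools read them as hints.
        metadata["annotations"] = {key: str(value) for key, value in annotations.items()}
    return metadata


meta = pod_metadata(
    "pod-fedora-0000000001",
    annotations={"example.com/gpu-fraction": 0.5},
)
print(meta)
```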
James E. Blair 669552f6f9 Add support for specifying pod resource limits
We currently allow users to specify pod resource requests and limits
for cpu, ram, and ephemeral storage.  But if a user specifies one of
these, the value is used for both the request and the limit.

This updates the specification to allow the use of separate request
and limit values.

It also normalizes related behavior across all 3 pod drivers,
including adding resource reporting to the openshift drivers.

Change-Id: I49f918b01f83d6fd0fd07f61c3e9a975aa8e59fb
2023-02-12 07:14:30 -08:00
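The following sketch shows one way separate request and limit values could be derived from a label config. It assumes `cpu`/`memory`/`storage` keys hold requests and optional `<key>-limit` keys hold limits. It also assumes that a missing limit falls back to the request, which preserves the old "same value for both" behavior. Key names and the helper are illustrative, not Nodepool's code.

```python
# Assumed mapping from label keys to k8s resource names.
RESOURCE_KEYS = {"cpu": "cpu", "memory": "memory", "storage": "ephemeral-storage"}


def build_resources(label):
    """Sketch: derive k8s requests/limits from a label config dict.

    "<key>" is treated as the request and "<key>-limit" as an optional
    limit; when no limit is given, the request value is reused.
    """
    requests, limits = {}, {}
    for key, k8s_name in RESOURCE_KEYS.items():
        request = label.get(key)
        limit = label.get(f"{key}-limit", request)
        if request is not None:
            requests[k8s_name] = str(request)
        if limit is not None:
            limits[k8s_name] = str(limit)
    return {"requests": requests, "limits": limits}


print(build_resources({"cpu": 2, "cpu-limit": 4, "memory": "1024Mi"}))
```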
James E. Blair 9bf44b4a4c Add scheduler, volumes, and labels to k8s/openshift
This adds support for specifying the scheduler name, volumes (and
volume mounts), and additional metadata labels to the Kubernetes
and OpenShift (and OpenShift pods) drivers.

This also extends the k8s and openshift test frameworks so that we
can exercise the new code paths (as well as some previous similar
settings).  Tests and assertions for both a minimal (mostly defaults)
configuration as well as a configuration that uses all the optional
settings are added.

Change-Id: I648e88a518c311b53c8ee26013a324a5013f3be3
2023-02-11 12:03:45 -08:00
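The fields added here correspond to standard pod manifest keys: `spec.schedulerName`, `spec.volumes`, per-container `volumeMounts`, and `metadata.labels`. The sketch below assembles them into a pod dict. The helper name, scheduler name and volume names are placeholders.

```python
def build_pod(name, image, scheduler_name=None, volumes=None,
              volume_mounts=None, labels=None):
    """Sketch: a pod manifest using schedulerName, volumes and labels."""
    container = {"name": name, "image": image}
    if volume_mounts:
        container["volumeMounts"] = volume_mounts
    spec = {"containers": [container]}
    if scheduler_name:
        spec["schedulerName"] = scheduler_name
    if volumes:
        spec["volumes"] = volumes
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "labels": dict(labels or {})},
        "spec": spec,
    }


pod = build_pod(
    "pod-fedora",
    "docker.io/fedora:28",
    scheduler_name="my-scheduler",
    volumes=[{"name": "cache", "emptyDir": {}}],
    volume_mounts=[{"name": "cache", "mountPath": "/cache"}],
    labels={"environment": "ci"},
)
```

Optional fields are only emitted when set, so a minimal configuration produces a manifest that relies on cluster defaults.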
James E. Blair aa8580ce32 Add support for privileged containers
To allow users to run docker-in-docker style workloads on k8s
and openshift clusters, add support for adding the privileged
flag to containers created in k8s and openshift pods.

Change-Id: I349d61bf200d7fb6d1effe112f7505815b06e9a8
2023-01-25 11:09:25 -08:00
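In a container spec, the privileged flag is set through `securityContext.privileged`. The sketch below shows that placement. `container_spec` is a hypothetical helper.

```python
def container_spec(name, image, privileged=False):
    """Sketch: a container entry, optionally privileged (e.g. docker-in-docker)."""
    container = {"name": name, "image": image}
    if privileged:
        container["securityContext"] = {"privileged": True}
    return container


dind = container_spec("dind", "docker.io/library/docker:dind", privileged=True)
plain = container_spec("plain", "docker.io/fedora:28")
```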
Clark Boylan 2a231a08c9 Add idle state to driver providers
This change adds an idle state to driver providers which is used to
indicate that the provider should stop performing actions that are not
safe to perform while we bootstrap a second newer version of the
provider to handle a config update.

This is particularly interesting for the static driver because it is
managing all of its state internally to nodepool and not relying on
external cloud systems to track resources. This means it is important
for the static provider to not have an old provider object update
zookeeper at the same time as a new provider object. This was previously
possible and created situations where the resources in zookeeper did
not reflect our local config.

Since all other drivers rely on external state, the primary update here
is to the static driver. We simply stop performing config
synchronization if the idle flag is set on a static provider. This will
allow the new provider to take over reflecting the new config
consistently.

Note that we don't take other approaches and instead essentially create
a system specific to the static driver, because we're trying to avoid
significantly modifying the nodepool runtime to fix a problem that is
specific to the static driver.

Change-Id: I93519d0c6f4ddf8a417d837f6ae12a30a55870bb
2022-10-24 15:30:31 -07:00
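The idle flag is meant to keep a superseded static provider from writing to ZooKeeper while its replacement takes over. The sketch below models that guard. `StaticProvider`, `sync_config` and the `synced` list (a stand-in for ZooKeeper writes) are hypothetical names, not Nodepool's classes.

```python
class StaticProvider:
    """Hypothetical stand-in for a static provider with an idle flag."""

    def __init__(self, name, nodes):
        self.name = name
        self.nodes = list(nodes)
        self.idle = False
        self.synced = []  # records what would have been written to ZooKeeper

    def sync_config(self):
        # An idle (superseded) provider must not touch shared state.
        if self.idle:
            return False
        self.synced = list(self.nodes)
        return True


old = StaticProvider("static", ["host-a"])
new = StaticProvider("static", ["host-a", "host-b"])
old.idle = True  # set once the newer provider object is bootstrapped
old_result = old.sync_config()  # False: nothing written
new_result = new.sync_config()  # True: new config is reflected
```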
James E. Blair 6d3b5f3bab Add missing cloud/region/az/host_id info to nodes
To the greatest extent possible within the limitation of each provider,
this adds cloud, region, az, and host_id to nodes.

AWS, Azure, GCE, and IBMVPC each have the cloud name hard-coded to
a value that makes sense for each driver, given that each of these
is a singleton cloud.  Their region and az values are added as
appropriate.

The k8s, openshift, and openshiftpods drivers all have their cloud
names set to the k8s context name, which is the closest approximation
of what the "cloud" attribute means in its existing usage in the
OpenStack driver.  If pods are launched, the host_id value is set to
the k8s host node name, which is an approximation of the existing
usage in the OpenStack driver (where it is typically an opaque uuid
that uniquely identifies the hypervisor).

Change-Id: I53765fc3914a84d2519f5d4dda4f8dc8feda72f2
2022-08-25 13:41:05 -07:00
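For the k8s-family drivers, the mapping described above is `cloud` = context name and `host_id` = the k8s node the pod landed on. The node name is available as `spec.nodeName` once the pod is scheduled. The sketch below uses plain dicts. The helper and node field names are illustrative.

```python
def annotate_node(node, context_name, pod=None):
    """Sketch: fill cloud/host_id on a node record from k8s data."""
    node["cloud"] = context_name
    if pod is not None:
        # spec.nodeName is set by the scheduler when the pod is bound to a node.
        node["host_id"] = pod.get("spec", {}).get("nodeName")
    return node


node = annotate_node({}, "admin-cluster.local", pod={"spec": {"nodeName": "worker-3"}})
namespace_only = annotate_node({}, "admin-cluster.local")
```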
Zuul 2926807c65 Merge "Add option of configuring imagePullSecrets for openshift drivers" 2022-04-19 08:58:53 +00:00
James E. Blair 9bcc046ffc Add QuotaSupport to drivers that don't have it
This adds QuotaSupport to all the drivers that don't have it, and
also updates their tests so there is at least one test which exercises
the new tenant quota feature.

Since this is expected to work across all drivers/providers/etc, we
should start including at least rudimentary quota support in every
driver.

Change-Id: I891ade226ba588ecdda835b143b7897bb4425bd8
2022-01-27 10:11:01 -08:00
Albin Vass 700cf38db0 Add option of configuring imagePullSecrets for openshift drivers
Change-Id: If1c877e86a020b4ee1b4dbf795c8ac2e3079b43f
2022-01-11 14:19:29 +01:00
Tristan Cacqueray 05927dae03 kubernetes: refactor client creation to utils_k8s
This change moves the kubernetes client creation to a common
function to re-use the exception handling logic.

Change-Id: I5bdd369f6c9a78e5f79a926d8690f285fda94af9
2021-06-15 16:13:53 +00:00
James E. Blair 63f38dfd6c Support threadless deletes
The launcher implements deletes using threads, and unlike with
launches, does not give drivers an opportunity to override that
and handle them without threads (as we want to do in the state
machine driver).

To correct this, we move the NodeDeleter class from the launcher
to driver utils, and add a new driver Provider method that returns
the NodeDeleter thread.  This is added in the base Provider class
so all drivers get this behavior by default.

In the state machine driver, we override the method so that instead
of returning a thread, we start a state machine and add it to a list
of state machines that our internal state machine runner thread
should drive.

Change-Id: Iddb7ed23c741824b5727fe2d89c9ddbfc01cd7d7
2021-03-21 14:39:01 -07:00
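The refactor makes delete handling a provider hook. The base class starts a deleter thread, and a state machine provider overrides the hook to queue work for its own runner. The sketch below mirrors that shape in simplified form. All class and method names are hypothetical, and the runner thread is not shown.

```python
import threading


class NodeDeleter(threading.Thread):
    """Simplified stand-in for a thread-based node deleter."""

    def __init__(self, node, deleted):
        super().__init__(name=f"NodeDeleter for {node}")
        self.node = node
        self.deleted = deleted

    def run(self):
        self.deleted.append(self.node)


class Provider:
    """Base class: by default, deletes run in a NodeDeleter thread."""

    def __init__(self):
        self.deleted = []

    def start_node_cleanup(self, node):
        thread = NodeDeleter(node, self.deleted)
        thread.start()
        return thread


class StateMachineProvider(Provider):
    """Override: queue a delete state machine instead of starting a thread."""

    def __init__(self):
        super().__init__()
        self.delete_state_machines = []

    def start_node_cleanup(self, node):
        # A separate runner thread (not shown) would drive this list.
        self.delete_state_machines.append(("delete", node))
        return None
```

Because the default lives in the base class, every driver keeps the threaded behavior unless it opts out by overriding the hook.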
Clark Boylan e7f831c34e Bump openshift dep
The openshift library has been completely redesigned in recent
releases, so bump the dep and adapt to the new API. The update is
necessary in order to fix a urllib3 version conflict [1].

[1] Trace:
```
ERROR: nodepool 3.14.1.dev3 has requirement urllib3<1.26,>=1.25.4, but you'll have urllib3 1.24 which is incompatible.
ERROR: kubernetes 8.0.2 has requirement urllib3>=1.24.2, but you'll have urllib3 1.24 which is incompatible.
ERROR: botocore 1.19.30 has requirement urllib3<1.27,>=1.25.4; python_version != "3.4", but you'll have urllib3 1.24 which is incompatible.
```

Change-Id: Ia4d09fd0a4a49d644bb575b74184de930c62ce89
Co-Authored-By: Tobias Henkel <tobias.henkel@bmw.de>
Story: 2008427
Task: 41373
2021-01-11 17:26:31 +01:00
Benjamin Schanzel d4cf0572e6 k8s/OpenShift Provider: Remove workingDir Attribute
For users to be able to specify a custom working dir for their
container nodes, this change removes the hard-coded /tmp workingDir
attribute from the container specs.

The user-specified WORKDIR from the respective Dockerfile is then used.

Change-Id: I0e2c0ca5be0af2360f54336340a40fa37ffe1001
2020-11-02 10:23:12 +01:00
Benjamin Schanzel 19be1a2e26 OpenShift/k8s Provider: Basic Support for k8s nodeSelectors
This adds support for specifying node selectors on Pod node labels.
They are used by the k8s scheduler to place a Pod on specific nodes with
corresponding labels.
This allows placing a build node/Pod on k8s nodes with certain
capabilities (e.g. storage types, number of CPU cores, etc.).

Change-Id: Ic00a84181c8ef66189e4259ef6434dc62b81c3c6
2020-08-14 16:39:04 +02:00
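A node selector is a map of node labels placed under `spec.nodeSelector`. The scheduler only considers nodes carrying all of those labels. The sketch below adds one to a pod spec. The helper name and the `storageType` label are illustrative.

```python
def apply_node_selector(pod_spec, node_selector):
    """Sketch: restrict scheduling to k8s nodes with matching labels."""
    if node_selector:
        pod_spec["nodeSelector"] = dict(node_selector)
    return pod_spec


spec = apply_node_selector(
    {"containers": [{"name": "pod-fedora", "image": "docker.io/fedora:28"}]},
    {"storageType": "ssd"},
)
```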
Benjamin Schanzel b76a0f458e OpenShift/k8s Provider: Allow passing env vars to Pods
For the OpenShift and Kubernetes drivers, allow passing env vars to the
Pod nodes via their label config.
It is not possible to set persistent env vars in containers at run time
because there is no login shell available. Thus, we need to pass in any
env vars during node launch. This allows setting, e.g., ``http_proxy``
variables.

The env vars are passed as a list of dicts with ``name`` and ``value``
fields as per the k8s Pod YAML schema. [1]

```
- name: pod-fedora
  type: pod
  image: docker.io/fedora:28
  env:
  - name: foo
    value: bar
```

[1] https://kubernetes.io/docs/tasks/inject-data-application/define-environment-variable-container/

Change-Id: Ibbd9222fcd8f7dc5be227e7f5c8d8772a4c594e2
2020-07-13 17:11:01 +02:00
Benjamin Schanzel bc172f0471 k8s/OKD Provider: Don't Set ca_cert if TLS verification is skipped
Kubernetes does not allow setting a ca_cert in a kubeconfig if TLS
certificate verification is disabled. Doing so results in an error
message:
`error: specifying a root certificates file with the insecure flag is not allowed`
This change makes sure nodepool-launcher skips the ca_cert option it
generates for the Zuul executor if nodepool's kubeconfig is set to
skip TLS cert verification.

Change-Id: I458c054fc9fae340d187ce40ea1236efdf65d50f
2020-04-08 13:22:30 +02:00
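In kubeconfig terms, the rule is that a cluster entry carries either `insecure-skip-tls-verify` or a CA (`certificate-authority-data`), never both. The sketch below builds a cluster entry that way. The helper name, cluster name and server URL are placeholders.

```python
def cluster_entry(server, ca_data=None, skip_tls_verify=False):
    """Sketch: a kubeconfig cluster entry that never mixes CA and insecure mode."""
    cluster = {"server": server}
    if skip_tls_verify:
        # kubectl rejects a root CA combined with the insecure flag.
        cluster["insecure-skip-tls-verify"] = True
    elif ca_data:
        cluster["certificate-authority-data"] = ca_data
    return {"name": "nodepool", "cluster": cluster}


insecure = cluster_entry("https://k8s.example.com:6443", ca_data="LS0t", skip_tls_verify=True)
secure = cluster_entry("https://k8s.example.com:6443", ca_data="LS0t")
```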
Benjamin Schanzel 7d7b08fadf Kubernetes/OpenShift Provider: Don't Require Bash in Container Images
Currently the Kubernetes and OpenShift providers set the entrypoint
of their build node pods to `/bin/bash`, which then requires `bash`
to be available in the respective container image. This might not
always be the case (e.g. with Alpine based images).
This change makes sure the entrypoint is set to `/bin/sh`, which we
can more reliably assume to be available in the container image.

Change-Id: I799ea95b715e50d9c22e66cc80579cf119db8f38
2020-03-10 11:17:43 +01:00
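The change swaps the entrypoint interpreter from `/bin/bash` to `/bin/sh`. The sketch below shows a container spec with a POSIX-shell entrypoint. The keep-alive loop in `args` is an assumption used to keep the pod running. It is not quoted from the driver.

```python
def keepalive_container(name, image):
    """Sketch: a container whose entrypoint only needs a POSIX shell."""
    return {
        "name": name,
        "image": image,
        # /bin/sh is far more likely than bash to exist (e.g. Alpine images).
        "command": ["/bin/sh", "-c"],
        # Assumed keep-alive loop so the pod stays up for the job.
        "args": ["while true; do sleep 30; done;"],
    }


container = keepalive_container("pod-alpine", "docker.io/library/alpine:3.19")
```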
Tristan Cacqueray fc15740286 Ensure both kubernetes and openshift token are b64decoded
This change decodes the kubernetes secret and also uses
a similar token for the openshift project: secret.data.token instead
of the token-secret.value.

Change-Id: Ie846d362a648268e52b5f56e29567cbff9c84930
2019-10-23 17:31:29 +00:00
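Kubernetes stores Secret values base64-encoded under `data`, so the service account token has to be decoded before use. The sketch below does this with the standard library. The helper name and the sample token are illustrative.

```python
import base64


def token_from_secret(secret_data):
    """Sketch: decode the service account token from a Secret's data map."""
    return base64.b64decode(secret_data["token"]).decode("utf-8")


secret_data = {"token": base64.b64encode(b"example-token").decode("ascii")}
print(token_from_secret(secret_data))  # example-token
```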
Tristan Cacqueray 159038503a Implement an OpenShift Pod provider
This change implements a single project OpenShift pod provider usable by a
regular user service account, without the need for a self-provisioner role.

Change-Id: I84e4bdda64716f9dd803eaa89e576c26a1667809
2019-05-07 02:25:15 +00:00
Tristan Cacqueray c1378c4407 Implement an OpenShift resource provider
This change implements an OpenShift resource provider. The driver currently
supports project requests and pod requests to enable both the
containers-as-machines and native-containers workflows.

Depends-On: https://review.openstack.org/608610
Change-Id: Id3770f2b22b80c2e3666b9ae5e1b2fc8092ed67c
2019-01-10 05:05:46 +00:00