20 Commits

Author SHA1 Message Date
James E. Blair c78fe769f2 Allow custom k8s pod specs
This change adds the ability to use the k8s (and friends) drivers
to create pods with custom specs.  This will allow nodepool admins
to define labels that create pods with options not otherwise supported
by Nodepool, as well as pods with multiple containers.

This can be used to implement the versatile sidecar pattern, which,
in a system where it is difficult to background a system process (such
as a database server or container runtime), is useful for running jobs
with such requirements.

It is still the case that a single resource is returned to Zuul, so
a single pod will be added to the inventory.  Therefore, the expectation
that it should be possible to shell into the first container in the
pod is documented.

Change-Id: I4a24a953a61239a8a52c9e7a2b68a7ec779f7a3d
2024-01-30 15:59:34 -08:00
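
As a rough illustration of the sidecar pattern described in the commit above (c78fe769f2), the sketch below models a two-container pod spec as a plain Python dict. The container names, image names, and the `shell_container` helper are hypothetical and are not taken from Nodepool; only the general shape of a Kubernetes pod spec (a `containers` list) is assumed.

```python
# Hypothetical pod spec: the first container is the one jobs shell into,
# the second is a sidecar providing a background service.
pod_spec = {
    "containers": [
        {
            "name": "job",  # hypothetical name
            "image": "example.org/job-image:latest",  # hypothetical image
            "command": ["sleep", "infinity"],
        },
        {
            "name": "database",  # hypothetical sidecar
            "image": "example.org/database:latest",  # hypothetical image
        },
    ],
}


def shell_container(spec):
    """Return the name of the first container, which the commit message
    says is expected to be the one that can be shelled into."""
    return spec["containers"][0]["name"]


first = shell_container(pod_spec)
```

Because only a single pod is returned to Zuul, the ordering of the `containers` list matters here: the job container has to come first.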
Benjamin Schanzel 4660bb9aa7 Kubernetes/OpenShift drivers: allow setting dynamic k8s labels
Just like for the OpenStack/AWS/Azure drivers, allow configuring
dynamic metadata (labels) for kubernetes resources with information
about the corresponding node request.

Change-Id: I5d174edc6b7a49c2ab579a9a0b1b560389d6de82
2023-09-11 10:49:27 +02:00
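
One way to picture the dynamic labels from the commit above (4660bb9aa7) is as templates that are filled in from the node request. The sketch below assumes Python format strings and a request object with `id` and `tenant_name` attributes; the function name, the attribute names, and the template syntax are all assumptions for illustration, not the driver's confirmed behaviour.

```python
from types import SimpleNamespace


def render_dynamic_labels(templates, request):
    """Fill each label template with attributes of the node request.

    The template syntax and the request attributes are assumptions.
    """
    return {
        key: value.format(request=request)
        for key, value in templates.items()
    }


# Hypothetical node request with made-up values.
request = SimpleNamespace(id="100-0000000001", tenant_name="example-tenant")

labels = render_dynamic_labels(
    {"request-id": "{request.id}", "tenant": "{request.tenant_name}"},
    request,
)
```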
mbecker 3fa6821437 Add gpu support for k8s/openshift pods
This adds the option to request GPUs for kubernetes and openshift pods.

Since the resource name depends on the GPU vendor and the cluster
installation, this option is left for the user to define in the
node pool.
To leverage the ability of some schedulers to use fractional GPUs,
the actual GPU value is read as a string.

For GPUs, requests and limits cannot be decoupled (cf.
https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/),
so the same value will be used for requests and limits.

Change-Id: Ibe33b06c374a431f164080edb34c3a501c360df7
2023-07-11 07:10:30 -07:00
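
The GPU handling described in the commit above (3fa6821437) can be sketched as follows: the resource name is supplied by the user, the value is kept as a string, and the same value is used for both requests and limits. The function name and the `example.com/gpu` resource name are hypothetical.

```python
def build_gpu_resources(gpu_resource, gpu):
    """Build a container 'resources' dict for a GPU request.

    gpu_resource: user-defined resource name (depends on vendor/cluster).
    gpu: GPU amount; kept as a string so fractional values survive.
    """
    value = str(gpu)
    # Requests and limits cannot be decoupled for GPUs,
    # so the same value is used for both.
    return {
        "requests": {gpu_resource: value},
        "limits": {gpu_resource: value},
    }


# Hypothetical resource name and a fractional amount.
resources = build_gpu_resources("example.com/gpu", "0.5")
```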
James E. Blair eedd6b9d2a Add extra-resources handling to openshift drivers
This adds the extra-resources handling that was just added to the
k8s driver to openshift.

Change-Id: I56e5eaf6ec22d10e88420094e92041c0b39b04e5
2023-06-27 14:06:11 -07:00
mbecker 1822976350 Add k8s annotations to pods
This allows adding key/value pairs under
metadata.annotations in the kubernetes
resource specification.
This information can be used by different tools
to govern handling of resources.

One particular use-case is the runai-scheduler which
uses annotations to allocate fractional GPU resources
to a pod.

Change-Id: Ib319caffe51e00bedda2861e8e1f2bbe04340322
2023-06-27 14:06:01 -07:00
James E. Blair b0a40f0b47 Use image cache when launching nodes
We consult ZooKeeper to determine the most recent image upload
when we decide whether we should accept or decline a request.  If
we accept the request, we also consult it again for the same
information when we start building the node.  In both cases, we
can use the cache to avoid what may potentially be (especially in
the case of a large number of images or uploads) quite a lot of
ZK requests.  Our cache should be almost up to date (typically
milliseconds, or at the worst, seconds behind), and the worst
case is equivalent to what would happen if an image build took
just a few seconds longer.  The tradeoff is worth it.

Similarly, when we create min-ready requests, we can also consult
the cache.

With those 3 changes, all references to getMostRecentImageUpload
in Nodepool use the cache.

The original un-cached method is kept as well, because there are
an enormous number of references to it in the unit tests and they
don't have caching enabled.

In order to reduce the chances of races in many tests, the startup
sequence is normalized to:
1) start the builder
2) wait for an image to be available
3) start the launcher
4) check that the image cache in the launcher matches what
   is actually in ZK

This sequence (apart from #4) was already used by a minority of
tests (mostly newer tests).  Older tests have been updated.
A helper method, startPool, implements #4 and additionally includes
the wait_for_config method which was used by a random assortment
of tests.

Change-Id: Iac1ff8adfbdb8eb9a286929a59cf07cd0b4ac7ad
2023-04-10 15:57:01 -07:00
James E. Blair 669552f6f9 Add support for specifying pod resource limits
We currently allow users to specify pod resource requests and limits
for cpu, ram, and ephemeral storage.  But if a user specifies one of
these, the value is used for both the request and the limit.

This updates the specification to allow the use of separate request
and limit values.

It also normalizes related behavior across all 3 pod drivers,
including adding resource reporting to the openshift drivers.

Change-Id: I49f918b01f83d6fd0fd07f61c3e9a975aa8e59fb
2023-02-12 07:14:30 -08:00
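
A minimal sketch of the request/limit split described in the commit above (669552f6f9). It assumes that when no separate limit is given, the request value is reused as the limit, matching the earlier behaviour the commit message describes; the function and parameter names are hypothetical, and only cpu and memory are shown.

```python
def build_resources(cpu=None, memory=None, cpu_limit=None, memory_limit=None):
    """Build separate 'requests' and 'limits' dicts for a container.

    Assumption: a missing limit falls back to the request value.
    """
    requests, limits = {}, {}
    if cpu is not None:
        requests["cpu"] = cpu
    if memory is not None:
        requests["memory"] = memory
    # Prefer an explicit limit; otherwise reuse the request value.
    effective_cpu_limit = cpu_limit if cpu_limit is not None else cpu
    effective_memory_limit = memory_limit if memory_limit is not None else memory
    if effective_cpu_limit is not None:
        limits["cpu"] = effective_cpu_limit
    if effective_memory_limit is not None:
        limits["memory"] = effective_memory_limit
    return {"requests": requests, "limits": limits}


# Separate cpu request and limit; memory reuses the request as the limit.
split = build_resources(cpu=2, cpu_limit=4, memory="1024Mi")
```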
James E. Blair 9bf44b4a4c Add scheduler, volumes, and labels to k8s/openshift
This adds support for specifying the scheduler name, volumes (and
volume mounts), and additional metadata labels to the Kubernetes
and OpenShift (and OpenShift pods) drivers.

This also extends the k8s and openshift test frameworks so that we
can exercise the new code paths (as well as some previous similar
settings).  Tests and assertions for both a minimal (mostly defaults)
configuration as well as a configuration that uses all the optional
settings are added.

Change-Id: I648e88a518c311b53c8ee26013a324a5013f3be3
2023-02-11 12:03:45 -08:00
James E. Blair 6d3b5f3bab Add missing cloud/region/az/host_id info to nodes
To the greatest extent possible within the limitation of each provider,
this adds cloud, region, az, and host_id to nodes.

AWS, Azure, GCE, and IBMVPC each have the cloud name hard-coded to
a value that makes sense for that driver, given that each of these
is a singleton cloud.  Their region and az values are added as
appropriate.

The k8s, openshift, and openshiftpods drivers all have their cloud names set
to the k8s context name, which is the closest approximation of what
the "cloud" attribute means in its existing usage in the OpenStack
driver.  If pods are launched, the host_id value is set to the k8s
host node name, which is an approximation of the existing usage in
the OpenStack driver (where it is typically an opaque uuid that
uniquely identifies the hypervisor).

Change-Id: I53765fc3914a84d2519f5d4dda4f8dc8feda72f2
2022-08-25 13:41:05 -07:00
James E. Blair 10df93540f Use Zuul-style ZooKeeper connections
We have made many improvements to connection handling in Zuul.
Bring those back to Nodepool by copying over the zuul/zk directory
which has our base ZK connection classes.

This will enable us to bring other Zuul classes over, such as the
component registry.

The existing connection-related code is removed and the remaining
model-style code is moved to nodepool.zk.zookeeper.  Almost every
file imported the model as nodepool.zk, so import adjustments are
made to compensate while keeping the code more or less as-is.

Change-Id: I9f793d7bbad573cb881dfcfdf11e3013e0f8e4a3
2022-05-23 07:40:20 -07:00
Zuul 2926807c65 Merge "Add option of configuring imagePullSecrets for openshift drivers"
2022-04-19 08:58:53 +00:00
James E. Blair 9bcc046ffc Add QuotaSupport to drivers that don't have it
This adds QuotaSupport to all the drivers that don't have it, and
also updates their tests so there is at least one test which exercises
the new tenant quota feature.

Since this is expected to work across all drivers/providers/etc, we
should start including at least rudimentary quota support in every
driver.

Change-Id: I891ade226ba588ecdda835b143b7897bb4425bd8
2022-01-27 10:11:01 -08:00
Albin Vass 700cf38db0 Add option of configuring imagePullSecrets for openshift drivers
Change-Id: If1c877e86a020b4ee1b4dbf795c8ac2e3079b43f
2022-01-11 14:19:29 +01:00
Tristan Cacqueray 05927dae03 kubernetes: refactor client creation to utils_k8s
This change moves the kubernetes client creation to a common
function to re-use the exception handling logic.

Change-Id: I5bdd369f6c9a78e5f79a926d8690f285fda94af9
2021-06-15 16:13:53 +00:00
Albin Vass 0c84b7fa4e Add shell-type config
Ansible needs to know which shell type the node uses to operate
correctly, especially for ssh connections for windows nodes because
otherwise ansible defaults to trying bash.

Change-Id: I71abfefa57aaafd88f199be19ee7caa64efda538
2021-03-05 15:14:29 +01:00
Clark Boylan e7f831c34e Bump openshift dep
The openshift library has been completely redesigned with recent
releases so bump the dep and adapt to the new api. The update is
necessary in order to fix a urllib3 version conflict [1].

[1] Trace:
ERROR: nodepool 3.14.1.dev3 has requirement urllib3<1.26,>=1.25.4, but you'll have urllib3 1.24 which is incompatible.
ERROR: kubernetes 8.0.2 has requirement urllib3>=1.24.2, but you'll have urllib3 1.24 which is incompatible.
ERROR: botocore 1.19.30 has requirement urllib3<1.27,>=1.25.4; python_version != "3.4", but you'll have urllib3 1.24 which is incompatible.

Change-Id: Ia4d09fd0a4a49d644bb575b74184de930c62ce89
Co-Authored-By: Tobias Henkel <tobias.henkel@bmw.de>
Story: 2008427
Task: 41373
2021-01-11 17:26:31 +01:00
David Shrewsbury 8528322cf0 Update tests for node-attributes
This found problems with the openshift and openshiftpods drivers,
so those are fixed. Also update the docs to reflect the fact that
node-attributes is supported across all drivers.

Note that we do not appear to have GCE driver tests, so that one
is just assumed to work.  :(

Change-Id: I98b6f871815d2b564d1550d960e682c180bac7c2
2020-04-02 12:39:56 -07:00
Tristan Cacqueray fc15740286 Ensure both kubernetes and openshift token are b64decoded
This change decodes the kubernetes secret and also uses
a similar token for the openshift project: secret.data.token instead
of the token-secret.value.

Change-Id: Ie846d362a648268e52b5f56e29567cbff9c84930
2019-10-23 17:31:29 +00:00
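
Kubernetes stores the values under a secret's `data` field base64-encoded, which is why the commit above (fc15740286) decodes the token before use. The sketch below shows the decoding step on a plain dict standing in for `secret.data`; the helper name and the example token are hypothetical.

```python
import base64


def decode_secret_token(secret_data):
    """Decode the base64-encoded 'token' entry of a secret's data mapping."""
    return base64.b64decode(secret_data["token"]).decode("utf-8")


# Simulate what the API would return: a base64-encoded token string.
encoded = base64.b64encode(b"example-token").decode("ascii")
token = decode_secret_token({"token": encoded})
```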
Tristan Cacqueray 76aa62230c Add python-path option to node
This change adds a new python_path Node attribute so that zuul executor
can remove the default hard-coded ansible_python_interpreter.

Change-Id: Iddf2cc6b2df579636ec39b091edcfe85a4a4ed10
2019-05-07 02:22:45 +00:00
Tristan Cacqueray c1378c4407 Implement an OpenShift resource provider
This change implements an OpenShift resource provider. The driver currently
supports project requests and pod requests to enable both the
containers-as-machines and native-containers workflows.

Depends-On: https://review.openstack.org/608610
Change-Id: Id3770f2b22b80c2e3666b9ae5e1b2fc8092ed67c
2019-01-10 05:05:46 +00:00