39 Commits

Author SHA1 Message Date
Zuul 392cf017c3 Merge "Add support for AWS IMDSv2" 2024-02-28 02:46:53 +00:00
Zuul 8775e54e5d Merge "Remove hostname-format option" 2024-02-28 02:46:52 +00:00
Zuul 202188b2f5 Merge "Reconcile docs/validation for some options" 2024-02-27 18:19:33 +00:00
James E. Blair 8259170516 Change the AWS default image volume-type from gp2 to gp3
gp3 is better in almost every way (cheaper, faster, more configurable).
It seems difficult to find a situation where gp2 would be a better
choice, so update the default when creating images to use gp3.

There are two locations where we can specify volume-type: image creation
(where the volume type becomes the default type for the image) and
instance creation (where we can override what the image specifies).
This change updates only the first (image creation), but not the second,
which has no default (which means to use whatever the image specified).

https://aws.amazon.com/ebs/general-purpose/

Change-Id: Ibfc5dfd3958e5b7dbd73c26584d6a5b8d3a1b4eb
2024-02-20 13:04:26 -08:00
James E. Blair e097731339 Remove hostname-format option
This option has not been used since at least the migration to the
statemachine framework.

Change-Id: I7a0e928889f72606fcbba0c94c2d49fbb3ffe55f
2024-02-08 09:40:41 -08:00
James E. Blair f89b41f6ad Reconcile docs/validation for some options
Some drivers were missing docs and/or validation for options that
they actually support.  This change:

adds launch-timeout to:
  metastatic docs and validation
  aws validation
  gce docs and validation
adds post-upload-hook to:
  aws validation
adds boot-timeout to:
  metastatic docs and validation
adds launch-retries to:
  metastatic docs and validation

Change-Id: Id3f4bb687c1b2c39a1feb926a50c46b23ae9df9a
2024-02-08 09:36:35 -08:00
James E. Blair 3f4fb008b0 Add support for AWS IMDSv2
This is an authenticated http metadata service which is typically
available by default, but a more secure setup is to enforce its
usage.

This change adds the ability to do that for both instances and
AMIs.
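
As a rough illustration of what enforcing IMDSv2 involves at the EC2 API level, the sketch below builds the extra keyword arguments for the instance launch and AMI registration calls. The `MetadataOptions` and `ImdsSupport` parameters are EC2 API fields; the `imdsv2` setting name, its values, and the helper functions are assumptions made for this example and are not taken from the driver.

```python
def imds_run_instances_args(imdsv2):
    """Return extra run_instances kwargs for a hypothetical 'imdsv2' setting."""
    if imdsv2 == "required":
        # Session tokens are mandatory, so only IMDSv2 requests succeed.
        return {"MetadataOptions": {"HttpTokens": "required",
                                    "HttpEndpoint": "enabled"}}
    if imdsv2 == "optional":
        # Both IMDSv1 and IMDSv2 requests are accepted.
        return {"MetadataOptions": {"HttpTokens": "optional",
                                    "HttpEndpoint": "enabled"}}
    # Unset: leave the decision to the image or account defaults.
    return {}


def imds_register_image_args(imdsv2):
    """Return extra register_image kwargs for the same hypothetical setting."""
    # An AMI registered with ImdsSupport 'v2.0' makes instances launched
    # from it require IMDSv2 by default.
    return {"ImdsSupport": "v2.0"} if imdsv2 == "required" else {}
```

The returned dictionaries would be merged into the arguments of the corresponding boto3 calls.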

Change-Id: Ia8554ff0baec260289da0574b92932b37ffe5f04
2024-01-24 15:11:35 -08:00
Zuul 785f7dcbc9 Merge "AWS: Add support for retrying image imports" 2023-08-28 18:43:56 +00:00
James E. Blair c2d9c45655 AWS: Add support for retrying image imports
AWS has limits on the number of image import tasks that can run
simultaneously.  In a busy system with large images, it would be
better to wait until those limits clear rather than delete the
uploaded s3 object and start over, uploading it again.  To support
this, we now detect that condition and optionally retry for a
specified amount of time.

The default remains to bail on the first error.
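
The retry behavior described above can be sketched roughly as follows. This is a minimal illustration, not the driver's code: `ImportLimitExceeded`, `import_with_retry`, and the `timeout`/`interval` parameters are hypothetical names, and a timeout of zero stands in for the default of failing on the first error.

```python
import time


class ImportLimitExceeded(Exception):
    """Hypothetical error standing in for the provider's task-limit response."""


def import_with_retry(start_import, timeout=0.0, interval=1.0,
                      clock=time.monotonic, sleep=time.sleep):
    """Call start_import(), retrying on ImportLimitExceeded until timeout.

    With timeout=0 the first limit error is re-raised immediately.
    """
    deadline = clock() + timeout
    while True:
        try:
            return start_import()
        except ImportLimitExceeded:
            if clock() >= deadline:
                # Out of time: let the caller clean up and start over.
                raise
            # Keep the uploaded object and wait for the limit to clear.
            sleep(interval)
```

Only the limit error is retried; any other exception propagates right away, so unrelated failures are not masked.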

Change-Id: I6aa7f79b2f73c4aa6743f11221907a731a82be34
2023-08-12 11:45:22 -07:00
James E. Blair 202230e16b Use diskimage username in AWS and Azure drivers
The AWS and Azure drivers incorrectly required the user to supply
the username in the pool configuration when using diskimages.
The OpenStack and IBMVPC drivers correctly use the top-level
diskimage configuration to determine the username.

Correct this by deprecating the pool-level configuration in the
drivers that offer it, and default it to using the top-level
configuration.

Change-Id: I4e6b4d4268b32ab7b397a11dd0ccd08b18c09a86
2023-08-03 12:31:31 -07:00
Christian Mueller 36dbff84ba Amazon EC2 Spot support
This adds support for launching Amazon EC2 Spot instances
(https://aws.amazon.com/ec2/spot/), which comes with huge cost saving
opportunities.

Amazon EC2 Spot instances are spare Amazon EC2 capacity that you can get
at a discount of up to 90% compared to on-demand pricing.
In contrast to on-demand instances, Spot instances can be reclaimed with a
2 minute notification in advance
(https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/spot-interruptions.html).

When :attr:`providers.[aws].pools.labels.use-spot` is set to True, the AWS
driver will launch Spot instances. If an instance gets interrupted, it will be
terminated and no replacement instance will be launched.

Change-Id: I9868d014991d78e7b2421439403ae1371b33524c
2023-04-16 21:12:06 +02:00
James E. Blair fdc093a8de Add import_image support to AWS
In I9478c0050777bf35e1201395bd34b9d01b8d5795 we switched from using the
import_image method to import_snapshot in the AWS driver.  This method
is faster and more like other drivers in Nodepool.  However, some operating
systems (such as Windows, RHEL or SLES) require licensing metadata
associated with an AMI which is not available to be set when we register
an AMI from a snapshot.  For these systems, the only viable way to upload
images is with the import_image method.

This change restores the previous method as an option, but keeps the
"snapshot" method as the default.

Change-Id: I81daabebbc9dbe968d8aaf65e6b70f5cdfdd01bf
2023-01-30 20:25:56 -08:00
James E. Blair be3edd3e17 Convert openstack driver to statemachine
This updates the OpenStack driver to use the statemachine framework.

The goal is to revise all remaining drivers to use the statemachine
framework for two reasons:

1) We can dramatically reduce the number of threads in Nodepool which
is our biggest scaling bottleneck.  The OpenStack driver already
includes some work in that direction, but in a way that is unique
to it and not easily shared by other drivers.  The statemachine
framework is an extension of that idea implemented so that every driver
can use it.  This change further reduces the number of threads needed
even for the openstack driver.

2) By unifying all the drivers with a simple interface, we can prepare
to move them into Zuul.

There are a few updates to the statemachine framework to accommodate some
features that only the OpenStack driver used to date.

A number of tests need slight alteration since the openstack driver is
the basis of the "fake" driver used for tests.

Change-Id: Ie59a4e9f09990622b192ad840d9c948db717cce2
2023-01-10 10:30:14 -08:00
James E. Blair 4ea824cfa9 Aws: add support for volume iops and throughput
Users can request specific IOPS and throughput allocations from EC2.
The availability and defaults vary for volume type, but IOPS are
available for all volumes, and throughput is available on gp3 volumes.

Change-Id: Icc7432d8ce1c3514bfe9d8fda20bd399b67ede7a
2022-10-14 07:08:30 -07:00
Zuul f670c53a56 Merge "Add ENA support option on uploaded AWS images" 2022-08-31 20:57:30 +00:00
James E. Blair a028e86b73 Add ENA support option on uploaded AWS images
Change I9478c0050777bf35e1201395bd34b9d01b8d5795 made images
unbootable for instance types that require enhanced networking
(ENA). The reason is that the register_image call needs to set
EnableENA to True [1].

Also add missing documentation of the 'architecture' attribute.

[1] https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/enhanced-networking-ena.html
Co-Authored-By: Tobias Henkel <tobias.henkel@bmw.de>

Change-Id: Iacf1e989c0cda90294f8474d4bd11403cec41c12
2022-08-29 16:21:20 +00:00
James E. Blair 6320b06950 Add support for dynamic tags
This allows users to create tags (or properties in the case of OpenStack)
on instances using string interpolation values.  The use case is to be
able to add information about the tenant* which requested the instance
to cloud-provider tags.

* Note that ultimately Nodepool may not end up using a given node for
the request which originally prompted its creation, so care should be
taken when using information like this.  The documentation notes that.

This feature uses a new configuration attribute on the provider-label
rather than the existing "tags" or "instance-properties" because existing
values may not be safe for use as Python format strings (e.g., an
existing value might be a JSON blob).  This could be solved with YAML
tags (like !unsafe) but the most sensible default for that would be to
assume format strings and use a YAML tag to disable formatting, which
doesn't help with our backwards-compatibility problem.  Additionally,
Nodepool configuration does not use YAML anchors (yet), so this would
be a significant change that might affect people's use of external tools
on the config file.
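
A small sketch of why a separate attribute is needed: existing static values are passed through untouched, while only the new dynamic values are treated as Python format strings. The function name, the `request_info` keys, and the example tag names here are assumptions for illustration and may differ from the real configuration.

```python
def format_dynamic_tags(dynamic_tags, request_info):
    """Interpolate each dynamic tag value with str.format."""
    return {key: value.format(**request_info)
            for key, value in dynamic_tags.items()}


# A pre-existing static tag whose value is a JSON blob. Running it through
# str.format would fail, because the braces look like replacement fields.
static_tags = {"payload": '{"team": "ci"}'}

# New-style dynamic tags are explicitly format strings.
dynamic_tags = {"request_info": "{tenant_name}/{project_name}"}
request_info = {"tenant_name": "acme", "project_name": "example/project"}

tags = dict(static_tags)
tags.update(format_dynamic_tags(dynamic_tags, request_info))
```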

Testing this was beyond the ability of the AWS test framework as written,
so some redesign for how we handle patching boto-related methods is
included.  The new approach is simpler, more readable, and flexible
in that it can better accommodate future changes.

Change-Id: I5f1befa6e2f2625431523d8d94685f79426b6ae5
2022-08-23 11:06:55 -07:00
James E. Blair 916d62a374 Allow specifying diskimage metadata/tags
For drivers that support tagging/metadata (openstack, aws, azure),
add or enhance support for supplying tags for uploaded diskimages.

This allows users to set metadata on the global diskimage object
which will then be used as default values for metadata on the
provider diskimage values.  The resulting merged dictionary forms
the basis of metadata to be associated with the uploaded image.
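
The merge described above amounts to a dictionary update in which provider-level values take precedence over the global defaults. A minimal sketch, with a hypothetical function name and example keys:

```python
def merge_image_metadata(diskimage_meta, provider_tags):
    """Combine global diskimage metadata with provider-level tags.

    Global values act as defaults; provider values override them.
    """
    merged = dict(diskimage_meta or {})
    merged.update(provider_tags or {})
    return merged


global_meta = {"owner": "infra", "os": "debian"}
provider_tags = {"owner": "aws-team"}
merged = merge_image_metadata(global_meta, provider_tags)
```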

The changes needed to reconcile this for the three drivers mentioned
above are:

All: the diskimages[].meta key is added to supply the default values
for provider metadata.

OpenStack: provider diskimage metadata is already supported using
providers[].diskimages[].meta, so no further changes are needed.

AWS, Azure: provider diskimage tags are added using the key
providers[].diskimages[].tags since these providers already use
the "tags" nomenclature for instances.

This results in the somewhat incongruous situation where we have
diskimage "metadata" being combined with provider "tags", but it's
either that or have images with "metadata" while we have instances
with "tags", both of which are "tags" in EC2.  The chosen approach
has consistency within the driver.

Change-Id: I30aadadf022af3aa97772011cda8dbae0113a3d8
2022-08-23 06:39:08 -07:00
James E. Blair 89cda5a1ba AWS: Use snapshot instead of image import
AWS recommends using the import_image method for importing diskimages
into the system as AMIs, but there is a significant caveat with that:
EC2 will attempt to boot an instance from the image and snapshot the
instance.  That process requires that the image match certain
characteristics of already existing images supported by AWS, so only
certain operating systems can be successfully imported.

If anything goes wrong with the import process, the errors are opaque
since the temporary instance used for the import is inaccessible to
the user.

An alternative is to use the import_snapshot method which will produce
a snapshot object in EC2, and then a new AMI can be "registered" and
pointed to that snapshot.  It's an extra step for the Nodepool
builder, but it's simple and takes less time overall than waiting for
EC2 to boot up the temporary instance.

This is a much closer approximation to the import scheme used in
other Nodepool drivers.

I have successfully tested this method with a cirros image, which is
notable for not being a supported operating system with the previous
method.

The caveats and instructions relating to setting up the Import/Export
service roles still apply, so the documentation related to them remains.

The method for reconciling missing tags for aborted image uploads
is updated to accommodate the new process.  Notably while the import_image
method left a breadcrumb in the snapshot description, it does not appear
that we are able to set the description when we import the snapshot.
Instead we need to examine the snapshot import task list to find the
intended tags for a given snapshot or image.

Change-Id: I9478c0050777bf35e1201395bd34b9d01b8d5795
2022-08-04 14:04:13 -07:00
James E. Blair 207d8ac63c AWS multi quota support
This adds support for AWS quotas that are specific to instance types.

The current quota support in AWS assumes only the "standard" instance types,
but AWS has several additional types with particular specialties (high memory,
GPU, etc).  This adds automatic support for those by encoding their service
quota codes (like 'L-1216C47A') into the QuotaInformation object.

QuotaInformation accepts not only cores, ram, and instances as resource
values, but now also accepts arbitrary keys such as 'L-1216C47A'.
Extra testing of QI is added to ensure we handle the arithmetic correctly
in cases where one or the other operand does not have a resource counter.
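
The arithmetic concern can be illustrated with a simplified stand-in for QuotaInformation. The `Quota` class below is hypothetical, and the choice of fallback for a missing counter (an unlimited default for limits, zero for usage) is an assumption of this sketch rather than a statement about the real class.

```python
import math


class Quota:
    """Simplified, hypothetical stand-in for a quota record."""

    def __init__(self, resources=None, default=0):
        # 'default' is the value assumed for any key with no counter.
        self.default = default
        self.resources = dict(resources or {})

    def get(self, key):
        return self.resources.get(key, self.default)

    def subtract(self, other):
        # Walk the union of keys so a counter missing on either side
        # falls back to that side's default instead of raising KeyError.
        for key in set(self.resources) | set(other.resources):
            self.resources[key] = self.get(key) - other.get(key)

    def non_negative(self):
        return all(value >= 0 for value in self.resources.values())


# A limit that only tracks cores; other quotas are treated as unlimited.
limit = Quota({"cores": 16}, default=math.inf)
# Usage for a node that also consumes an instance-type-specific quota.
usage = Quota({"cores": 8, "L-1216C47A": 8})
limit.subtract(usage)
```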

The statemachine drivers did not encode their resource information into
the ZK Node record, so tenant quota was not operating correctly.  This is
now fixed.

The AWS driver now accepts max_cores, _instances, and _ram values similar
to the OpenStack driver.  It additionally accepts max_resources which can
be used to specify limits for arbitrary quotas like 'L-1216C47A'.

The tenant quota system now also accepts arbitrary keys such as 'L-1216C47A'
so that, for example, high memory nodes may be limited by tenant.

The mapping of instance types to quota is manually maintained; however,
AWS doesn't seem to add new instance types too often, and those it does are
highly specialized.  If a new instance type is not handled internally, the
driver will not be able to calculate expected quota usage, but will still
operate until the new type is added to the mapping.

Change-Id: Iefdc8f3fb8249c61c43fe51b592f551e273f9c36
2022-07-25 14:41:07 -07:00
James E. Blair d5b0dee642 AWS driver create/delete improvements
The default AWS rate limit is 2 instances/sec, but in practice, we
can achieve something like 0.6 instances/sec with the current code.
That's because the create instance REST API call itself takes more
than a second to return.  To achieve even the default AWS rate
(much less a potentially faster one which may be obtainable via
support request), we need to alter the approach.  This change does
the following:

* Parallelizes create API calls.  We create a threadpool with
  (typically) 8 workers to execute create instance calls in the
  background.  2 or 3 workers should be sufficient to meet the
  2/sec rate, more allows for the occasional longer execution time
  as well as a customized higher rate.  We max out at 8 to protect
  nodepool from too many threads.
* The state machine uses the new background create calls instead
  of synchronously creating instances.  This allows other state
  machines to progress further (ie, advance to ssh keyscan faster
  in the case of a rush of requests).
* Delete calls are batched.  They don't take as long as create calls,
  yet their existence at all uses up rate limiting slots which could
  be used for creating instances.  By batching deletes, we make
  more room for creates.
* A bug in the RateLimiter could cause it not to record the initial
  time and therefore avoid actually rate limiting.  This is fixed.
* The RateLimiter is now thread-safe.
* The default rate limit for AWS is changed to 2 requests/sec.
* Documentation for the 'rate' parameter for the AWS driver is added.
* Documentation for the 'rate' parameter for the Azure driver is
  corrected to describe the rate as requests/sec instead of delay
  between requests.
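
A rough sketch of two of the ideas above: a thread-safe rate limiter that records its initial time, and a small worker pool that runs create calls in the background. This is an illustration under assumptions, not the driver's implementation; the class and function names and the injectable `clock`/`sleep` parameters are invented for the example.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class RateLimiter:
    """Allow at most `rate` entries per second across all threads."""

    def __init__(self, rate, clock=time.monotonic, sleep=time.sleep):
        self.delta = 1.0 / rate
        self.lock = threading.Lock()
        self.last = None
        self.clock = clock
        self.sleep = sleep

    def __enter__(self):
        with self.lock:
            now = self.clock()
            if self.last is None or now >= self.last + self.delta:
                # Record the time even on the very first call so that
                # the next caller is actually limited.
                self.last = now
            else:
                self.sleep(self.last + self.delta - now)
                self.last = self.last + self.delta
        return self

    def __exit__(self, exc_type, exc, tb):
        return False


def create_instances(create_fn, requests, limiter, workers=8):
    """Run create_fn for each request on a bounded thread pool."""
    def task(request):
        # The limiter only gates the start of each call; the slow API
        # call itself runs in parallel with the others.
        with limiter:
            return create_fn(request)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(task, requests))
```

Holding the lock while sleeping serializes the waiting callers, which keeps the sketch simple at the cost of some fairness.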

Change-Id: Ida2cbc59928e183eb7da275ff26d152eae784cfe
2022-06-22 13:28:58 -07:00
James E. Blair daa4e39a0d Fix default python paths in aws, azure, ibmvpc drivers
The python-path value should default to "auto" per documentation
and to match other drivers.  Correct that.

Change-Id: Ie8254e10d9c4d8ff8f8f298fac32140a18248293
2022-04-12 06:32:41 -07:00
James E. Blair 43678bf4c1 Update AWS driver to use statemachine framework
This updates the aws driver to use the statemachine framework which
should be able to scale to a much higher number of parallel operations
than the standard thread-per-node model.  It is also simpler and
easier to maintain.  Several new features are added to bring it to
parity with other drivers.

The unit tests are changed minimally so that they continue to serve
as regression tests for the new framework.  Following changes will
revise the tests and add new tests for the additional functionality.

Change-Id: I8968667f927c82641460debeccd04e0511eb86a9
2022-02-22 17:06:07 -08:00
James E. Blair d55ea477de Fix AWS driver equality check
A recent refactor of the config objects (which simplified the
repetitive __eq__ methods) missed updating the AWS driver.  This
could lead to infinite recursion (as the AWS driver explicitly
called super().__eq__ which itself called __eq__).

This updates the driver to use the new framework, and it also adds
a unit test which exercises it.
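
One way such a shared equality framework can look is sketched below: the base class owns `__eq__` and subclasses only declare which attributes matter, so no subclass needs to call `super().__eq__` itself. The class names and the `_eq_attrs` attribute are assumptions for illustration and are not taken from the Nodepool source.

```python
class ConfigValue:
    """Hypothetical base class: subclasses list the attributes to compare."""

    _eq_attrs = ()

    def __eq__(self, other):
        if type(self) is not type(other):
            return NotImplemented
        return all(getattr(self, name) == getattr(other, name)
                   for name in self._eq_attrs)


class PoolConfig(ConfigValue):
    # Declaring the attributes is all a subclass has to do.
    _eq_attrs = ("name", "max_servers")

    def __init__(self, name, max_servers):
        self.name = name
        self.max_servers = max_servers
```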

Change-Id: I3c612e2784de1ffd1642587ba6017e36bebd8d67
2021-06-23 12:07:25 -07:00
Albin Vass 0c84b7fa4e Add shell-type config
Ansible needs to know which shell type the node uses to operate
correctly, especially for ssh connections for windows nodes because
otherwise ansible defaults to trying bash.

Change-Id: I71abfefa57aaafd88f199be19ee7caa64efda538
2021-03-05 15:14:29 +01:00
Albin Vass 2d59dc461c aws: add support for attaching instance profiles
Change-Id: Ie338f5f9c8f88c7e5584bce02c9b0d081f068da7
2020-06-12 12:22:50 +02:00
Andy Ladjadj 5bae6272f4 add ebs-optimized support for aws provider
Change-Id: I1f6330a71b85f23e6fbe3abd636764e5f3b8a61d
2020-02-04 18:59:24 +01:00
Clément Mondion 49482e157c add tags support for aws provider
Change-Id: Ib871bfda41192a74ee02b0b3d2e422fde21f2801
2020-01-23 10:32:08 +01:00
Tobias Henkel 52f7d4fb62 Make public ip configurable in aws
When running nodepool against private cloud rooms it can be desirable
that the nodes don't get a public ip address. Let the user specify
this on pool level.

Change-Id: I3d636517837fd8a6593c12e4309372da5c062b06
2019-12-21 13:47:08 +01:00
Tobias Henkel 761a9ee00e Support userdata for instances in aws
In some cases we need to be able to launch instances with custom
userdata also in aws.

Change-Id: I0891961f16bb3bd728622d3413bd185978d79324
2019-12-21 13:35:00 +01:00
Albin Vass ec9a532cdd 'keys' must be defined for host-key-checking: false
If host-key-checking is set to false the aws driver
fails with an UnboundLocalError

Change-Id: I91ec292a48e283f9fb8d60b944da8eaf1bec393b
2019-12-17 16:14:46 +01:00
Zuul 915be0a5be Merge "AWS driver: add ability to determine AMI id using filters" 2019-10-24 18:45:20 +00:00
Ian Wienand db87a0845f Set default python-path to "auto"
The "python-path" configuration option makes its way through to Zuul
where it sets the "ansible_interpreter_path" in the inventory.
Currently this defaults to "/usr/bin/python2" which is wrong for
Python 3-only distributions.

Ansible >=2.8 provides for automated discovery of the interpreter to
avoid runtime errors choosing an invalid interpreter [1].  Using this
should mean that "python-path" doesn't need to be explicitly set for any
common case.  As more distributions become Python 3 only, this should
"do the right thing" without further configuration.

This switches the default python-path to "auto".  The dependent change
updates Zuul to accept this and use it when running with Ansible
>=2.8, or default back to "/usr/bin/python2" for earlier Ansible
versions.

Testing and documentation are updated, and a release note added.

[1] https://docs.ansible.com/ansible/2.8/reference_appendices/interpreter_discovery.html

Depends-On: https://review.opendev.org/682275
Change-Id: I02a1a618c8806b150049e91b644ec3c0cb826ba4
2019-10-17 09:17:50 +11:00
Kerby Ferris 661b40a4b7 AWS driver: add ability to determine AMI id using filters
Change-Id: I99263a5907f72d7993d7549ff8adf1dfeedd3b69
2019-10-08 12:06:46 -07:00
Clark Boylan c009b6aebe Set manage_images to false on aws
We don't have image management for aws so set the manage_images property
on aws providers to false.

This corrects a problem with min ready calculations checking that the
image is ready and doing so by checking for the pool_label diskimage
attribute.

Change-Id: I698f8cb1c6ac2969ce94830510fead8918f8a55e
2019-08-23 12:38:50 -07:00
Tristan Cacqueray 76aa62230c Add python-path option to node
This change adds a new python_path Node attribute so that zuul executor
can remove the default hard-coded ansible_python_interpreter.

Change-Id: Iddf2cc6b2df579636ec39b091edcfe85a4a4ed10
2019-05-07 02:22:45 +00:00
Monty Taylor 516e8cd176 Rename aws flavor-name to instance-type
The AWS term is instance-type, not flavor-name. Rename this while
we don't have a huge userbase.

While we're in there, rename a variable from image_name to image_id
since we use image_id everywhere else.

Change-Id: I1f7f16d2873982626d2434cf5ca1f6280adf739c
2019-02-06 17:09:36 +00:00
Clint Byrum 5c0ca26fea Remove unused fields from AWS driver
These fields were cargo-culted from another provider, but they are not
used in the actual AWS driver, and thus should invalidate the
configuration if specified.

Change-Id: Ic12b47eea1ab4b49e8c6746749f0a6d4fb322435
2019-02-04 15:36:03 -08:00
Tristan Cacqueray aa16b8b891 Amazon EC2 driver
This change adds an experimental AWS driver. It lacks some of the deeper
features of the openstack driver, such as quota management and image
building, but is highly functional for running tests on a static AMI.

Note that the test base had to be refactored to allow fixtures to be
customized in a more flexible way.

Change-Id: I313f9da435dfeb35591e37ad0bec921c8b5bc2b5
Co-Authored-By: Tristan Cacqueray <tdecacqu@redhat.com>
Co-Authored-By: David Moreau-Simard <dmsimard@redhat.com>
Co-Authored-By: Clint Byrum <clint@fewbar.com>
2019-01-28 12:08:36 -08:00