Commit Graph

68 Commits

James E. Blair fc95724601 Fix leaked upload cleanup
The cleanup routine for leaked image uploads based its detection
on upload ids, but they are not unique except in the context of
a provider and build.  This meant that, for example, as long as
there was an upload with id 0000000001 for any image build for
the provider (very likely!) we would skip cleaning up any leaked
uploads with id 0000000001.

Correct this by using a key generated on build+upload (provider
is implied because we only consider uploads for our current
provider).

Update the tests relevant to this code to exercise this condition.
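
A rough sketch of the keying change (illustrative names only, not the
actual Nodepool code):

    # Sketch: qualify upload ids with their build id so that
    # "0000000001" from different builds cannot collide.
    def upload_key(build_id, upload_id):
        return '%s-%s' % (build_id, upload_id)

    known = {upload_key(u.build_id, u.id) for u in zk_uploads}
    for leaked in provider_uploads:
        if upload_key(leaked.build_id, leaked.id) not in known:
            cleanup(leaked)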

Change-Id: Ic68932b735d7439ca39e2fbfbe1f73c7942152d6
2024-03-09 13:46:17 -08:00
Zuul 392cf017c3 Merge "Add support for AWS IMDSv2" 2024-02-28 02:46:53 +00:00
Clark Boylan e5c1790be7 Roll back to 1.28/stable microk8s in functional testing
We use latest/stable by default which very recently updated to
1.29/stable. Unfortunately it appears there are issues [0] with this
version on Debian Bookworm which also happens to be the platform we test
on. Our jobs have been consistently failing in a manner that appears
related to this issue. Update the job to collect logs so that we can
better confirm this is the case, and roll back to 1.28, which should be
working.

Also update the AWS tests to handle a recent moto release which
requires us to use mock_aws rather than individual mock_* classes.

[0] https://github.com/canonical/microk8s/issues/4361

Change-Id: I72310521bdabfc3e34a9f2e87ff80f6d7c27c180
Co-Authored-By: James E. Blair <jim@acmegating.com>
Co-Authored-By: Jeremy Stanley <fungi@yuggoth.org>
2024-01-29 14:15:54 -08:00
James E. Blair 3f4fb008b0 Add support for AWS IMDSv2
IMDSv2 is an authenticated HTTP metadata service that is typically
available by default, but a more secure setup is to enforce its
use.

This change adds the ability to do that for both instances and
AMIs.
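
For reference, a minimal boto3 sketch of enforcing IMDSv2 at launch
(the ids here are placeholders):

    import boto3

    ec2 = boto3.client('ec2', region_name='us-east-1')
    ec2.run_instances(
        ImageId='ami-12345678',      # placeholder AMI id
        InstanceType='t3.micro',
        MinCount=1, MaxCount=1,
        # 'required' rejects IMDSv1 requests; 'optional' allows both.
        MetadataOptions={'HttpTokens': 'required',
                         'HttpEndpoint': 'enabled'},
    )
    # For AMIs, register_image accepts ImdsSupport='v2.0' to require
    # IMDSv2 on instances launched from the image.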

Change-Id: Ia8554ff0baec260289da0574b92932b37ffe5f04
2024-01-24 15:11:35 -08:00
Simon Westphahl 2aeaee92f1 Ignore unrelated error labels in request handler
Nodepool was declining node requests when other unrelated instance types
of a provider were unavailable:

    Declining node request <NodeRequest {... 'node_types': ['ubuntu'],
    ... }> due to ['node type(s) [ubuntu-invalid] not available']

To fix this, we now check error labels against the requested labels
before including them in the list of invalid node types.
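
Schematically, the fix looks something like this (names are
illustrative):

    # Only error labels that were actually requested may cause a
    # decline; unrelated broken labels are ignored.
    requested = set(request.node_types)        # e.g. {'ubuntu'}
    errored = set(provider_error_labels)       # e.g. {'ubuntu-invalid'}
    invalid = sorted(requested & errored)
    if invalid:
        decline(request, 'node type(s) %s not available' % invalid)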

Change-Id: I7bbb3b813ca82baf80821a9e84cc10385ea95a01
2023-11-09 13:58:54 +01:00
James E. Blair 8669acfe6b Use a state machine for ssh key scanning
We currently use a threadpool executor to scan up to 10 nodes at
a time for ssh keys.  If they are slow to respond, that can create
a bottleneck.  To alleviate this, use a state machine model for
managing the process, and drive each state machine from a single
thread.

We use select.epoll() to handle the potentially large number of
connections that could be happening simultaneously.

Note: the paramiko/ssh portion of this process spawns its own
thread in the background (and always has).  Since we are now allowing
more keyscan processes in parallel, we could end up with an
unbounded set of paramiko threads in the background.  If this is
a concern we may need to cap the number of requests handled
simultaneously.  Even if we do that, this will still result in
far fewer threads than simply increasing the cap on the threadpool
executor.
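
A minimal sketch of the epoll-driven connect phase
(handoff_to_paramiko is a stand-in for the real keyscan state
machine; epoll is Linux-specific):

    import select
    import socket

    poller = select.epoll()
    pending = {}  # fd -> socket awaiting connect completion

    def start_scan(host):
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        s.setblocking(False)
        try:
            s.connect((host, 22))
        except BlockingIOError:
            pass  # expected for a non-blocking connect
        poller.register(s.fileno(), select.EPOLLOUT)
        pending[s.fileno()] = s

    def run_once(timeout=1.0):
        # One thread drives every in-flight connection.
        for fd, _ in poller.poll(timeout):
            poller.unregister(fd)
            handoff_to_paramiko(pending.pop(fd))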

Change-Id: I42b76f4c923fd9441fb705e7bffd6bc9ea7240b1
2023-11-04 08:54:20 -07:00
James E. Blair 7d7d81cd46 AWS: improve service quota handling
The AWS API call to get the service quota has its own rate limit
that is separate from EC2.  It is not documented, but the defaults
appear to be very small; experimentally it appears to be something
like a bucket size of 30 tokens and a refill rate somewhere
between 3 and 10 tokens per minute.

This change moves the quota lookup calls to their own rate limiter
so they are accounted for separately from other calls.

We should configure that rate limiter with the new, very low values;
however, that would significantly slow startup, since we need to issue
several calls at once when we start (after that we are not sensitive
to a delay).  The API can handle a burst at startup (with a bucket
size of 30), but our rate limiter doesn't have a burst option.  Instead
of configuring it with the true limit, we just configure it with the
rate limit we use for normal operations (so that we at least have some
delay) and otherwise rely on caching to ensure that we won't actually
exceed the rate limit.

This change therefore also adds a Lazy Executor TTL cache to the
operations with a timeout of 5 minutes.  This means that we will issue
bursts of requests every 5 minutes, and as long as the number of
requests is less than the token replacement rate, we'll be fine.

Because this cache is on the adapter, multiple pool workers will use
the same cache.  This will cause a reduction in API calls since
currently there is only pool-worker level caching of nodepool quota
information objects.  When the 5 minute cache on the nodepool quota
info object expires, we will now hit the adapter cache (with its own
5 minute timeout) rather than go directly to the API repeatedly for
each pool worker.  This does mean that quota changes may take between
5 and 10 minutes to appear in nodepool.

The current code only looks up quota information for instance and
volume types actually used.  If that number is low, all is well, but
if it is high, then we could potentially approach or exceed the token
replacement rate.  To make this more predictable, we will switch the
API call to list all quotas instead of fetching only the ones we need.
Due to pagination, this results in a total of 8 API calls as of writing;
5 for ec2 quotas and 3 for ebs.  These are likely to grow over time,
but very slowly.

Taken all together, these changes mean that a single launcher should
issue at most 8 quota service api requests every 5 minutes, which is
below the lowest observed token replacement rate.
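
The cache behavior might be sketched like this (a hand-rolled TTL
cache standing in for the Lazy Executor TTL cache):

    import time
    import boto3

    _cache = {'expiry': 0, 'quotas': None}

    def all_quotas(client, service_code, ttl=300):
        # Refresh at most once per TTL; bursts of lookups in between
        # are served from the cache and cost no API calls.
        if time.time() > _cache['expiry']:
            quotas = []
            paginator = client.get_paginator('list_service_quotas')
            for page in paginator.paginate(ServiceCode=service_code):
                quotas.extend(page['Quotas'])
            _cache.update(expiry=time.time() + ttl, quotas=quotas)
        return _cache['quotas']

    client = boto3.client('service-quotas', region_name='us-east-1')
    ec2_quotas = all_quotas(client, 'ec2')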

Change-Id: Idb3fb114f5b8cda8a7b6d5edc9c011cb7261be9f
2023-10-17 14:36:37 -07:00
James E. Blair 40eb0fd8d4 Use ec2_client for instances and volumes
There are two modes of use for the boto3 interface library for AWS:
resource or client calls.  The resource approach returns magic objects
representing instances of each type of resource which have their own
methods and can lazy-load relationships to other objects.  The client
approach is more low-level and returns dictionary representations of
the HTTP results.

In general in the AWS driver, we try to use resources when we can
and fall back to client methods otherwise.

When we construct our own AwsInstance objects (which translate what we
get from AWS into what nodepool expects), we filled the "az" field by
getting the availability zone from the subnet.  The subnet is an object
relationship that is lazy-loaded, which means that each time we
retrieved an instance record, we issued another API call to look up
the subnet (and this information is not cached).

To correct this, we switch to getting the availability zone from the
placement field of the instance (which is more sensible anyway).

To prevent future accidental lazy-loads of un-cached data, we switch
instance and volume operations from the resource style of usage to
the client.

The result is a massive increase in performance (as the subnet lazy-load
could take 0.1 seconds for each instance running, and we very frequently
list all instances).
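
The client-style listing, with the AZ read from the placement field,
looks roughly like:

    import boto3

    ec2 = boto3.client('ec2', region_name='us-east-1')
    # Client calls return plain dicts; nothing is lazy-loaded, so no
    # hidden per-instance subnet lookups occur.
    for page in ec2.get_paginator('describe_instances').paginate():
        for reservation in page['Reservations']:
            for instance in reservation['Instances']:
                az = instance['Placement']['AvailabilityZone']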

Change-Id: I529318896fc8096bbd9dbdac60d1a29c3ac641b6
2023-10-17 14:36:37 -07:00
Zuul 76fb25d529 Merge "Handle invalid AWS instance in quota lookup" 2023-09-25 21:38:34 +00:00
James E. Blair b3921d5c40 AWS: handle 'InvalidConversionTaskId.Malformed' error
When we call the describe_import_image_tasks paginator with a
task id specified and that task id does not exist, AWS can return
an InvalidConversionTaskId.Malformed error instead of the expected
empty list.  Handle that case and add a test for it.
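
The handling amounts to something like (a sketch; the surrounding
adapter code is elided):

    import botocore.exceptions

    try:
        tasks = []
        paginator = ec2.get_paginator('describe_import_image_tasks')
        for page in paginator.paginate(ImportTaskIds=[task_id]):
            tasks.extend(page['ImportImageTasks'])
    except botocore.exceptions.ClientError as e:
        code = e.response['Error']['Code']
        if code != 'InvalidConversionTaskId.Malformed':
            raise
        tasks = []  # treat the same as the expected empty list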

Change-Id: I1a3a9297e123a50a4beeb7e18124b8cc17aa28bb
2023-09-05 16:09:26 -07:00
Zuul 785f7dcbc9 Merge "AWS: Add support for retrying image imports" 2023-08-28 18:43:56 +00:00
Zuul 77e2550326 Merge "Test if username is set for diskimage based nodes in AWS" 2023-08-24 08:54:51 +00:00
James E. Blair c2d9c45655 AWS: Add support for retrying image imports
AWS has limits on the number of image import tasks that can run
simultaneously.  In a busy system with large images, it would be
better to wait until those limits clear rather than delete the
uploaded s3 object and start over, uploading it again.  To support
this, we now detect that condition and optionally retry for a
specified amount of time.

The default remains to bail on the first error.
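
A hedged sketch of the retry loop (the error code checked here is an
assumption, and the arguments are placeholders):

    import time
    import botocore.exceptions

    deadline = time.time() + image_import_timeout  # from config
    while True:
        try:
            ec2.import_image(
                Architecture='x86_64',
                DiskContainers=[{'Format': 'raw',
                                 'UserBucket': {'S3Bucket': bucket,
                                                'S3Key': key}}])
            break
        except botocore.exceptions.ClientError as e:
            # Assumed code for "too many concurrent import tasks".
            if (e.response['Error']['Code'] != 'ResourceCountLimitExceeded'
                    or time.time() > deadline):
                raise
            time.sleep(60)  # wait for a running import to finish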

Change-Id: I6aa7f79b2f73c4aa6743f11221907a731a82be34
2023-08-12 11:45:22 -07:00
Benedikt Loeffler 16ef0f8bad Test if username is set for diskimage based nodes in AWS
Change-Id: I4c13e95baf213831d02c31df0e91c82c2c86d3d8
2023-08-03 12:31:31 -07:00
James E. Blair a602654482 Handle invalid AWS instance in quota lookup
Early nodepool performed all cloud operations within the context of
an accepted node request.  This means that any disagreement between
the nodepool configuration and the cloud (such as what instance types,
images, networks, or other resources are actually available) would
be detected within that context and the request would be marked as
completed and failed.

When we added tenant quota support, we also added the possibility
of needing to interact with the cloud before accepting a request.
Specifically, we may now ask the cloud what resources are needed
for a given instance type (and volume, etc) before deciding whether
to accept a request.  If we raise an exception here, the launcher
will simply loop indefinitely.

To avoid this, we will add a new exception class to indicate a
permanent configuration error which was detected at runtime.  If
AWS says an instance type doesn't exist when we try to calculate
its quota, we mark it as permanently errored in the provider
configuration, then return empty quota information back to the
launcher.

This allows the launcher to accept the request, but then immediately
mark it as failed because the label isn't available.

The state of this error is stored on the provider manager, so the
typical corrective action of updating the configuration to correct
the label config means that a new provider will be spawned with an
empty error label list; the error state will be cleared and the
launcher will try again.

Finally, an extra exception handler is added to the main launcher
loop so that if any other unhandled errors slip through, the
request will be deferred and the launcher will continue processing
requests.
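
In outline (identifiers illustrative, not Nodepool's actual names):

    import botocore.exceptions

    def getQuotaForLabel(self, label):
        try:
            return self._quotaForInstanceType(label.instance_type)
        except botocore.exceptions.ClientError:
            # Permanent config error: remember the bad label and
            # report empty quota so the request is accepted, then
            # immediately failed rather than looping forever.
            self.error_labels.add(label.name)
            return QuotaInformation()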

Change-Id: I9a5349203a337ab23159806762cb46c059fe4ac5
2023-07-18 13:51:13 -07:00
James E. Blair acb6772c3a Add AWS volume quota support
Like the OpenStack driver, this automatically applies volume quota
limits if specified in the label configuration.

Change-Id: I71c1b95de08dc72cc777099952892de659d45d41
2023-07-17 15:17:50 -07:00
Zuul 63900e9bc4 Merge "Report leaked resource metrics in statemachine driver" 2023-05-02 23:29:19 +00:00
James E. Blair d4f2c8b9e7 Report leaked resource metrics in statemachine driver
The OpenStack driver reports some leaked metrics.  Extend that in
a generic way to all statemachine drivers.  Doing so also adds
some more metrics to the OpenStack driver.

Change-Id: I97c01b54b576f922b201b28b117d34b5ee1a597d
2023-04-26 06:40:12 -07:00
Zuul 58b24d2203 Merge "Amazon EC2 Spot support" 2023-04-26 11:40:34 +00:00
Christian Mueller 36dbff84ba Amazon EC2 Spot support
This adds support for launching Amazon EC2 Spot instances
(https://aws.amazon.com/ec2/spot/), which comes with huge cost saving
opportunities.

Amazon EC2 Spot instances are spare Amazon EC2 capacity that you can
get at a discount of up to 90% compared to on-demand pricing.
In contrast to on-demand instances, Spot instances can be reclaimed by
AWS with a two-minute interruption notice
(https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/spot-interruptions.html).

When :attr:`providers.[aws].pools.labels.use-spot` is set to True, the AWS
driver will launch Spot instances. If an instance gets interrupted, it will be
terminated and no replacement instance will be launched.
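
A minimal pool-label sketch using the new attribute (surrounding keys
abbreviated):

    providers:
      - name: ec2-us-east-1
        driver: aws
        pools:
          - name: main
            labels:
              - name: ubuntu-jammy-spot
                instance-type: t3.large
                use-spot: true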

Change-Id: I9868d014991d78e7b2421439403ae1371b33524c
2023-04-16 21:12:06 +02:00
James E. Blair b0a40f0b47 Use image cache when launching nodes
We consult ZooKeeper to determine the most recent image upload
when we decide whether we should accept or decline a request.  If
we accept the request, we also consult it again for the same
information when we start building the node.  In both cases, we
can use the cache to avoid what may potentially be (especially in
the case of a large number of images or uploads) quite a lot of
ZK requests.  Our cache should be almost up to date (typically
milliseconds, or at the worst, seconds behind), and the worst
case is equivalent to what would happen if an image build took
just a few seconds longer.  The tradeoff is worth it.

Similarly, when we create min-ready requests, we can also consult
the cache.

With those 3 changes, all references to getMostRecentImageUpload
in Nodepool use the cache.

The original un-cached method is kept as well, because there are
an enormous number of references to it in the unit tests and they
don't have caching enabled.

In order to reduce the chances of races in many tests, the startup
sequence is normalized to:
1) start the builder
2) wait for an image to be available
3) start the launcher
4) check that the image cache in the launcher matches what
   is actually in ZK

This sequence (apart from #4) was already used by a minority of
tests (mostly newer tests).  Older tests have been updated.
A helper method, startPool, implements #4 and additionally includes
the wait_for_config method which was used by a random assortment
of tests.

Change-Id: Iac1ff8adfbdb8eb9a286929a59cf07cd0b4ac7ad
2023-04-10 15:57:01 -07:00
James E. Blair fdc093a8de Add import_image support to AWS
In I9478c0050777bf35e1201395bd34b9d01b8d5795 we switched from using the
import_image method to import_snapshot in the AWS driver.  This method
is faster and closer to how other Nodepool drivers operate.  However, some operating
systems (such as Windows, RHEL or SLES) require licensing metadata
associated with an AMI which is not available to be set when we register
an AMI from a snapshot.  For these systems, the only viable way to upload
images is with the import_image method.

This change restores the previous method as an option, but keeps the
"snapshot" method as the default.

Change-Id: I81daabebbc9dbe968d8aaf65e6b70f5cdfdd01bf
2023-01-30 20:25:56 -08:00
Christian von Schultz a828513ae8 Fix AWS quota limits for vCPUs
In the AWS adapter, when getting the quota for an instance type, set
the quota for the AWS service quota code to be the number of vCPUs
rather than the number of cores. The number of vCPUs is typically
twice the number of cores. This fixes "VcpuLimitExceeded" errors from
AWS.
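
The distinction is visible in the instance type metadata, e.g.:

    import boto3

    ec2 = boto3.client('ec2', region_name='us-east-1')
    info = ec2.describe_instance_types(InstanceTypes=['m5.xlarge'])
    vcpu = info['InstanceTypes'][0]['VCpuInfo']
    # The service quota is charged in DefaultVCpus (typically
    # 2x DefaultCores), so that is the number to count against it.
    print(vcpu['DefaultVCpus'], vcpu['DefaultCores'])  # e.g. 4 2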

Change-Id: I880e6abb84b0527363893576057aa105a5a448a5
2022-12-14 14:13:47 +01:00
Zuul d73d4da72c Merge "Serve all paused handlers before unpausing" 2022-11-09 16:51:11 +00:00
James E. Blair 0673599abc Fix AWS instance_profile tests for moto v4
Moto v4 has added support for iam instance profiles, which means
our tests are now failing since we have two tests that request
an instance be created with a profile association, but the profile
did not exist in moto due to the lack of support.

Now that support has been added, we create a dummy instance profile
and use that for the association.  We can now also validate that
the association was made as expected.

Change-Id: I2839b7e1536a6305e44678dc9b22c71f3eb55677
2022-11-08 07:39:13 -08:00
Simon Westphahl 0f1680be7e Serve all paused handlers before unpausing
The launcher implementation assumed that only one request handler will
be paused at any given point in time. However, this is not true when
e.g. the request handler accepts multiple requests that all run into a
quota limit during launch.

The consequence of this is that the pool is unpaused too early and we
might accept other node requests until the provider is paused again.
This could lead to starvation of earlier paused handlers, as they were
fulfilled in a LIFO fashion.

To fix this edge case we will store paused request handlers in a set and
only unpause the provider when there are no paused handlers anymore.
Paused handlers are now also run in priority order.
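
Schematically (names illustrative):

    def _servePausedHandlers(self):
        # Run every paused handler, highest priority first; unpause
        # the provider only when none remain paused.
        for handler in sorted(self.paused_handlers,
                              key=lambda h: h.request.priority):
            if handler.run():  # True once the request is fulfilled
                self.paused_handlers.discard(handler)
        if not self.paused_handlers:
            self.unpause()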

Change-Id: Ia34e2844533ce9942d489838c4ce14a605d79287
2022-10-20 12:06:11 +02:00
James E. Blair 4ea824cfa9 Aws: add support for volume iops and throughput
Users can request specific IOPS and throughput allocations from EC2.
The availability and defaults vary for volume type, but IOPS are
available for all volumes, and throughput is available on gp3 volumes.

Change-Id: Icc7432d8ce1c3514bfe9d8fda20bd399b67ede7a
2022-10-14 07:08:30 -07:00
James E. Blair 6d3b5f3bab Add missing cloud/region/az/host_id info to nodes
To the greatest extent possible within the limitation of each provider,
this adds cloud, region, az, and host_id to nodes.

Each of AWS, Azure, GCE, and IBMVPC has the cloud name hard-coded to
a value that makes sense for each driver given that each of these
are singleton clouds.  Their region and az values are added as
appropriate.

The k8s, openshift, and openshiftpods all have their cloud names set
to the k8s context name, which is the closest approximation of what
the "cloud" attribute means in its existing usage in the OpenStack
driver.  If pods are launched, the host_id value is set to the k8s
host node name, which is an approximation of the existing usage in
the OpenStack driver (where it is typically an opaque uuid that
uniquely identifies the hypervisor).

Change-Id: I53765fc3914a84d2519f5d4dda4f8dc8feda72f2
2022-08-25 13:41:05 -07:00
James E. Blair 6320b06950 Add support for dynamic tags
This allows users to create tags (or properties in the case of OpenStack)
on instances using string interpolation values.  The use case is to be
able to add information about the tenant* which requested the instance
to cloud-provider tags.

* Note that ultimately Nodepool may not end up using a given node for
the request which originally prompted its creation, so care should be
taken when using information like this.  The documentation notes that.

This feature uses a new configuration attribute on the provider-label
rather than the existing "tags" or "instance-properties" because existing
values may not be safe for use as Python format strings (e.g., an
existing value might be a JSON blob).  This could be solved with YAML
tags (like !unsafe) but the most sensible default for that would be to
assume format strings and use a YAML tag to disable formatting, which
doesn't help with our backwards-compatibility problem.  Additionally,
Nodepool configuration does not use YAML anchors (yet), so this would
be a significant change that might affect people's use of external tools
on the config file.
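
As an illustration only (the attribute name and interpolation fields
shown are assumptions about the final syntax):

    labels:
      - name: ubuntu-jammy
        cloud-image: ubuntu-jammy
        dynamic-tags:
          # Values are Python format strings evaluated per-instance.
          requestor: '{request.requestor}'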

Testing this was beyond the ability of the AWS test framework as written,
so some redesign for how we handle patching boto-related methods is
included.  The new approach is simpler, more readable, and flexible
in that it can better accommodate future changes.

Change-Id: I5f1befa6e2f2625431523d8d94685f79426b6ae5
2022-08-23 11:06:55 -07:00
James E. Blair 916d62a374 Allow specifying diskimage metadata/tags
For drivers that support tagging/metadata (openstack, aws, azure),
add or enhance support for supplying tags for uploaded diskimages.

This allows users to set metadata on the global diskimage object
which will then be used as default values for metadata on the
provider diskimage values.  The resulting merged dictionary forms
the basis of metadata to be associated with the uploaded image.

The changes needed to reconcile this for the three drivers mentioned
above are:

All: the diskimages[].meta key is added to supply the default values
for provider metadata.

OpenStack: provider diskimage metadata is already supported using
providers[].diskimages[].meta, so no further changes are needed.

AWS, Azure: provider diskimage tags are added using the key
providers[].diskimages[].tags since these providers already use
the "tags" nomenclature for instances.

This results in the somewhat incongruous situation where we have
diskimage "metadata" being combined with provider "tags", but it's
either that or have images with "metadata" while we have instances
with "tags", both of which are "tags" in EC2.  The chosen approach
has consistency within the driver.

Change-Id: I30aadadf022af3aa97772011cda8dbae0113a3d8
2022-08-23 06:39:08 -07:00
James E. Blair 89cda5a1ba AWS: Use snapshot instead of image import
AWS recommends using the import_image method for importing diskimages
into the system as AMIs, but there is a significant caveat with that:
EC2 will attempt to boot an instance from the image and snapshot the
instance.  That process requires that the image match certain
characteristics of already existing images supported by AWS, so only
certain operating systems can be successfully imported.

If anything goes wrong with the import process, the errors are opaque
since the temporary instance used for the import is inaccessible to
the user.

An alternative is to use the import_snapshot method which will produce
a snapshot object in EC2, and then a new AMI can be "registered" and
pointed to that snapshot.  It's an extra step for the Nodepool
builder, but it's simple and takes less time overall than waiting for
EC2 to boot up the temporary instance.
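
The snapshot-based path, in rough boto3 terms (polling elided; names
are placeholders):

    def upload_via_snapshot(ec2, bucket, key, image_name):
        resp = ec2.import_snapshot(
            DiskContainer={'Format': 'raw',
                           'UserBucket': {'S3Bucket': bucket,
                                          'S3Key': key}})
        task_id = resp['ImportTaskId']
        # ...poll describe_import_snapshot_tasks(ImportTaskIds=[task_id])
        # until the task completes and yields a snapshot id...
        snapshot_id = wait_for_snapshot(ec2, task_id)  # placeholder
        resp = ec2.register_image(
            Name=image_name, Architecture='x86_64',
            RootDeviceName='/dev/sda1', VirtualizationType='hvm',
            BlockDeviceMappings=[{'DeviceName': '/dev/sda1',
                                  'Ebs': {'SnapshotId': snapshot_id}}])
        return resp['ImageId']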

This is a much closer approximation to the import scheme used in
other Nodepool drivers.

I have successfully tested this method with a cirros image, which is
notable for not being a supported operating system with the previous
method.

The caveats and instructions relating to setting up the Import/Export
service roles still apply, so the documentation related to them remain.

The method for reconciling missing tags for aborted image uploads
is updated to accommodate the new process.  Notably, while the import_image
method left a breadcrumb in the snapshot description, it does not appear
that we are able to set the description when we import the snapshot.
Instead we need to examine the snapshot import task list to find the
intended tags for a given snapshot or image.

Change-Id: I9478c0050777bf35e1201395bd34b9d01b8d5795
2022-08-04 14:04:13 -07:00
Zuul 123a32f922 Merge "AWS multi quota support" 2022-07-29 17:01:09 +00:00
James E. Blair 207d8ac63c AWS multi quota support
This adds support for AWS quotas that are specific to instance types.

The current quota support in AWS assumes only the "standard" instance types,
but AWS has several additional types with particular specialties (high memory,
GPU, etc).  This adds automatic support for those by encoding their service
quota codes (like 'L-1216C47A') into the QuotaInformation object.

QuotaInformation accepts not only cores, ram, and instances as resource
values, but now also accepts arbitrary keys such as 'L-1216C47A'.
Extra testing of QI is added to ensure we handle the arithmetic correctly
in cases where one or the other operand does not have a resource counter.

The statemachine drivers did not encode their resource information into
the ZK Node record, so tenant quota was not operating correctly.  This is
now fixed.

The AWS driver now accepts max-cores, max-instances, and max-ram values similar
to the OpenStack driver.  It additionally accepts max_resources which can
be used to specify limits for arbitrary quotas like 'L-1216C47A'.

The tenant quota system now also accepts arbitrary keys such as 'L-1216C47A'
so that, for example, high memory nodes may be limited by tenant.

The mapping of instance types to quota is manually maintained, however,
AWS doesn't seem to add new instance types too often, and those it does are
highly specialized.  If a new instance type is not handled internally, the
driver will not be able to calculate expected quota usage, but will still
operate until the new type is added to the mapping.
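
A sketch of the new limits in pool configuration (key spellings follow
the commit description and may differ from the final docs):

    providers:
      - name: ec2-us-east-1
        driver: aws
        pools:
          - name: main
            max-cores: 200
            max-resources:
              'L-1216C47A': 4   # high-memory instance quota code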

Change-Id: Iefdc8f3fb8249c61c43fe51b592f551e273f9c36
2022-07-25 14:41:07 -07:00
Zuul e27ce0e8d1 Merge "Fix race in test_aws_resource_cleanup" 2022-07-01 03:05:18 +00:00
James E. Blair 025318b825 Fix race in test_aws_resource_cleanup
If the test was slow enough that the image was not immediately
deleted, we would fail the assertIsNone because image.state=='available'.

Correct this by continuing to iterate in the wait loop while that's
true.

Change-Id: I6ee14c5737abe37d90d6b57442644085b08f30ab
2022-06-30 15:23:15 -07:00
James E. Blair d5b0dee642 AWS driver create/delete improvements
The default AWS rate limit is 2 instances/sec, but in practice, we
can achieve something like 0.6 instances/sec with the current code.
That's because the create instance REST API call itself takes more
than a second to return.  To achieve even the default AWS rate
(much less a potentially faster one which may be obtainable via
support request), we need to alter the approach.  This change does
the following:

* Parallelizes create API calls.  We create a threadpool with
  (typically) 8 workers to execute create instance calls in the
  background.  2 or 3 workers should be sufficient to meet the
  2/sec rate, more allows for the occasional longer execution time
  as well as a customized higher rate.  We max out at 8 to protect
  nodepool from too many threads.
* The state machine uses the new background create calls instead
  of synchronously creating instances.  This allows other state
  machines to progress further (ie, advance to ssh keyscan faster
  in the case of a rush of requests).
* Delete calls are batched.  They don't take as long as create calls,
  yet their existence at all uses up rate limiting slots which could
  be used for creating instances.  By batching deletes, we make
  more room for creates.
* A bug in the RateLimiter could cause it not to record the initial
  time and therefore never actually rate limit.  This is fixed (see
  the sketch after this list).
* The RateLimiter is now thread-safe.
* The default rate limit for AWS is changed to 2 requests/sec.
* Documentation for the 'rate' parameter for the AWS driver is added.
* Documentation for the 'rate' parameter for the Azure driver is
  corrected to describe the rate as requests/sec instead of delay
  between requests.
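
A minimal thread-safe limiter of the kind described might look like
this (a sketch, not the actual implementation):

    import threading
    import time

    class RateLimiter:
        def __init__(self, rate=2.0):        # requests per second
            self.delta = 1.0 / rate
            self.last = time.monotonic()     # record the initial time
            self.lock = threading.Lock()

        def __enter__(self):
            with self.lock:
                wait = self.last + self.delta - time.monotonic()
                if wait > 0:
                    time.sleep(wait)
                self.last = time.monotonic()

        def __exit__(self, exc_type, exc, tb):
            pass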

Change-Id: Ida2cbc59928e183eb7da275ff26d152eae784cfe
2022-06-22 13:28:58 -07:00
Zuul a4acb5644e Merge "Use Zuul-style ZooKeeper connections" 2022-05-23 22:56:54 +00:00
James E. Blair 10df93540f Use Zuul-style ZooKeeper connections
We have made many improvements to connection handling in Zuul.
Bring those back to Nodepool by copying over the zuul/zk directory
which has our base ZK connection classes.

This will enable us to bring other Zuul classes over, such as the
component registry.

The existing connection-related code is removed and the remaining
model-style code is moved to nodepool.zk.zookeeper.  Almost every
file imported the model as nodepool.zk, so import adjustments are
made to compensate while keeping the code more or less as-is.

Change-Id: I9f793d7bbad573cb881dfcfdf11e3013e0f8e4a3
2022-05-23 07:40:20 -07:00
Zuul c1cacfc076 Merge "AWS tests: cleanup image deletion checking" 2022-05-19 16:26:17 +00:00
Ian Wienand e1461659de AWS tests: clean up image deletion checking
This is a follow-on to I3279c3b5cb8cf26d390835fd0a7049bc43ec40b5

As discussed in the referenced issue, the blank return and
AttributeError here are the correct way to determine a recently deleted
image.  We don't need to catch the ClientError as that won't be raised
any more.  Getting a value for the state would indicate it wasn't
deleted.

This bumps the moto base requirement to ensure we have this behaviour.

Change-Id: I2d5b0ccb9802aa0d4c81555a17f40fe8b8595ebd
2022-04-28 14:16:16 +10:00
Zuul df836a6744 Merge "Delete AWS image builds" 2022-04-27 22:58:24 +00:00
James E. Blair eb7f58ce0e Delete AWS image builds
The AWS driver adapter was not actually deleting the image build
from AWS when the builder requested it (it was simply missing
the implementation of the method).

This went unnoticed because as long as a launcher is running, it
will act to clean up leaked images.  But it's still better to
have the builder do so synchronously.

Add a test which verifies the behavior.

Change-Id: I0bea930847ded45574bf53ae098b9926d4924107
2022-04-27 11:59:54 -07:00
Ian Wienand 8ffd70776a AWS tests: catch AttributeError raised by latest moto release
Per the notes inline, the recent 3.1.6 moto release has changed the
way deleted images are returned.  Catching the exception we now see
works, and we can revisit if we find better solutions.

Change-Id: I3279c3b5cb8cf26d390835fd0a7049bc43ec40b5
2022-04-27 14:18:53 +10:00
James E. Blair daa4e39a0d Fix default python paths in aws, azure, ibmvpc drivers
The python-path value should default to "auto" per documentation
and to match other drivers.  Correct that.

Change-Id: Ie8254e10d9c4d8ff8f8f298fac32140a18248293
2022-04-12 06:32:41 -07:00
James E. Blair 46e130fe1a Add more debug info to AWS driver
These changes are all in service of being able to better understand
AWS driver log messages:

* Use annotated loggers in the statemachine provider framework
  so that we see the request, node, and provider information
* Have the statemachine framework pass annotated loggers to the
  state machines themselves so that the above information is available
  for log messages on individual API calls
* Add optional performance information to the rate limit handler
  (delay and API call duration)
* Add some additional log entries to the AWS adapter

Also:

* Suppress boto logging by default in unit tests (it is verbose and
  usually not helpful)
* Add coverage of node deletion in the AWS driver tests

Change-Id: I0e6b4ad72d1af7f776da73c5dd2a50b40f60e4a2
2022-04-11 10:14:20 -07:00
James E. Blair 348fd17e73 Add additional tests to the aws driver
This adds several new tests:

This exercises the new ipv6 functionality.  The boto3 mocks do not
support assigning an ipv6 address on creation, so we adopt a
unit-test style approach and verify the calling side and our
processing of the results separately.  This uses the patching
scheme that we just removed in the previous commit, but alters it
to simply record the call arguments so we can validate them outside
the patched method.

This adds a diskimage upload test to the AWS driver.  Because moto
doesn't support the create_image method, we need to add a fake for
that.  Boto's practice of creating proxy objects makes that hard to
do, so some helper methods are added on the adapter class to make
it easier for the tests to override.

The responses in the fake are based on a recorded session.

This adds a test of leaked resource cleanup, including the automatic
tagging of untagged resources from image import tasks.

Change-Id: I0626061b246e9c52b08c49394d4d22b46beeff7a
2022-03-16 16:12:42 -07:00
James E. Blair 89b3d6a4df Refactor AWS driver tests
This makes the AWS driver tests more modular, with minimal changes.
This will allow us to add more tests without extending the
current single-function-which-performs-all-tests.

It splits the config into multiple files, one for each distinct pool
configuration.  This means that tests only run one pool thread (compared
to the current situation where every test runs every pool).

Whatever defect in moto appeared to require patching the network
methods to verify the call signature has apparently been rectified, so
that patching is removed.

The two tags tests have been combined into one.

Private IP validation is improved so that we verify the values set on
the nodepool node rather than the mock call args.

There was no test that verified the EBS-optimised setting, so that has
been added.

Change-Id: I023ec41f015c2fcb20835fc0e38714149944de84
2022-02-22 17:06:10 -08:00
James E. Blair 43678bf4c1 Update AWS driver to use statemachine framework
This updates the aws driver to use the statemachine framework which
should be able to scale to a much higher number of parallel operations
than the standard thread-per-node model.  It is also simpler and
easier to maintain.  Several new features are added to bring it to
parity with other drivers.

The unit tests are changed minimally so that they continue to serve
as regression tests for the new framework.  Following changes will
revise the tests and add new tests for the additional functionality.

Change-Id: I8968667f927c82641460debeccd04e0511eb86a9
2022-02-22 17:06:07 -08:00
James E. Blair 9bcc046ffc Add QuotaSupport to drivers that don't have it
This adds QuotaSupport to all the drivers that don't have it, and
also updates their tests so there is at least one test which exercises
the new tenant quota feature.

Since this is expected to work across all drivers/providers/etc, we
should start including at least rudimentary quota support in every
driver.

Change-Id: I891ade226ba588ecdda835b143b7897bb4425bd8
2022-01-27 10:11:01 -08:00
James E. Blair d55ea477de Fix AWS driver equality check
A recent refactor of the config objects (which simplified the
repetitive __eq__ methods) missed updating the AWS driver.  This
could lead to infinite recursion (as the AWS driver explicitly
called super().__eq__ which itself called __eq__).

This updates the driver to use the new framework, and it also adds
a unit test which exercises it.

Change-Id: I3c612e2784de1ffd1642587ba6017e36bebd8d67
2021-06-23 12:07:25 -07:00