Commit Graph

167 Commits

Author SHA1 Message Date
Simon Westphahl 3c71fc9f4b Use thread pool executor for AWS API requests
So far we've cached most of the AWS API listings (instances, volumes,
AMIs, snapshots, objects) but with refreshes happening synchronously.

Since some of those methods are used as part of other methods during
request handling, we make them asynchronous.
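
A minimal sketch of the pattern (assuming concurrent.futures; the class
and attribute names are illustrative, not the actual driver code):

    import time
    from concurrent.futures import ThreadPoolExecutor

    class CachedListing:
        # Stand-in for one of the driver's cached API listings.
        def __init__(self, fetch, ttl=10):
            self._fetch = fetch        # e.g. a boto3 describe_* call
            self._ttl = ttl
            self._value = []
            self._stamp = 0.0
            self._executor = ThreadPoolExecutor(max_workers=4)
            self._future = None

        def get(self):
            # Serve the (possibly stale) cache immediately and refresh
            # in the background so request handling never blocks.
            if (time.monotonic() - self._stamp > self._ttl
                    and (self._future is None or self._future.done())):
                self._future = self._executor.submit(self._refresh)
            return self._value

        def _refresh(self):
            self._value = self._fetch()
            self._stamp = time.monotonic()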

Change-Id: I22403699ebb39f3e4dcce778efaeb09328acd932
2023-10-17 14:36:37 -07:00
Zuul 6dde5c55cb Merge "Add ZK cache stats" 2023-08-14 21:00:10 +00:00
James E. Blair 07c83f555d Add ZK cache stats
To observe the performance of the ZK connection and the new tree
caches, add some statsd metrics for each of these.  This will
let us monitor queue size over time.

Also, update the assertReportedStat method to output all received
stats if the expected stat was not found (like Zuul).
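
A hedged sketch of the kind of emission involved (the statsd client is
real, but the metric names here are invented for illustration):

    import statsd

    client = statsd.StatsClient('localhost', 8125, prefix='nodepool')

    def report_cache_stats(event_queue_size, cache_object_count):
        # Gauges let us watch queue size and growth over time.
        client.gauge('zk.cache.event_queue', event_queue_size)
        client.gauge('zk.cache.objects', cache_object_count)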

Change-Id: Ia7e1e0980fdc34007f80371ee0a77d4478948518
Depends-On: https://review.opendev.org/886552
2023-08-03 10:27:25 -07:00
James E. Blair 4ef3ebade8 Update references of build "number" to "id"
This follows the previous change and is intended to have little
or no behavior changes (only a few unit tests are updated to use
different placeholder values).  It updates all textual references
of build numbers to build ids to better reflect that they are
UUIDs instead of integers.

Change-Id: I04b5eec732918f5b9b712f8caab2ea4ec90e9a9f
2023-08-02 11:18:15 -07:00
James E. Blair 3815cce7aa Change image ID from int sequence to UUID
When we export and import image data (for backup/restore purposes),
we need to reset the ZK sequence counter for image builds in order
to avoid collisions.  The only way we can do that is to create and
then delete a large number of znodes.  Some sites (including
OpenDev) have sequence numbers that are in the hundreds of thousands.

To avoid this time-consuming operation (which is only intended to
be run to restore from backup -- when operators are already under
additional stress!), this change switches the build IDs from integer
sequences to UUIDs.  This avoids the problem with collisions after
import (at least, to the degree that UUIDs avoid collisions).

The actual change is fairly simple, but many unit tests need to be
updated.

Since the change is user-visible in the command output (image lists,
etc), a release note is added.

A related change which updates all of the textual references of
build "number" to build "id" follows this one for clarity and ease
of review.
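
The heart of the change is small (a sketch; the real code lives in the
builder's ZooKeeper layer):

    import uuid

    # Before: build IDs came from a ZK sequence node, e.g. '0000153412',
    # whose counter had to be laboriously reset after an import.
    # After: a locally generated UUID needs no shared counter and cannot
    # collide with imported data.
    build_id = uuid.uuid4().hex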

Change-Id: Ie7c68b094bc9733914a808756eeee8b62f696713
2023-08-02 11:18:15 -07:00
James E. Blair 066699f88a Use low-level OpenStack SDK calls for server listing
The OpenStack SDK performs a lot of processing on the JSON data
returned by nova, and on large server lists, this can dwarf the
actual time needed to receive and parse the JSON.

Nodepool uses very little of this information, so let's use the
keystoneauth session to get a simple JSON list.

The Server object that SDK normally returns is a hybrid object
that provides both attributes and dictionary keys.  One method
that we call has some lingering references to accessors, so we
create a UserDict subclass to handle those. Nodepool-internal
references are updated from attributes to dictionary keys.
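
Roughly (a sketch, not the actual adapter code; the endpoint filter and
field handling are simplified):

    from collections import UserDict

    class Server(UserDict):
        # Hybrid object: dictionary keys plus the few attribute-style
        # accessors that one remaining caller still expects.
        def __getattr__(self, name):
            try:
                return self.data[name]
            except KeyError:
                raise AttributeError(name)

    def list_servers(session):
        # 'session' is a keystoneauth1 Session; ask nova for the raw
        # JSON and skip the SDK's per-field processing entirely.
        response = session.get(
            '/servers/detail',
            endpoint_filter={'service_type': 'compute'})
        return [Server(s) for s in response.json()['servers']]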

Change-Id: Iecc5976858e8d2ee6894a521f6a30f10ae9c6177
2023-07-25 11:29:25 -07:00
James E. Blair 9d07d26f51 Move statemachine node init into TPE
This moves the node initialization and lock from the assignHandlers
thread to a new threadpool executor.  There are several ZK calls
that happen in sequence as part of this, and if we move them out
of the assignHandlers thread we can increase overall throughput.

Change-Id: I67a32eed4102ab6ff56b1c21a65fe7dd071448e5
2023-05-16 20:42:49 -07:00
James E. Blair b0a40f0b47 Use image cache when launching nodes
We consult ZooKeeper to determine the most recent image upload
when we decide whether we should accept or decline a request.  If
we accept the request, we also consult it again for the same
information when we start building the node.  In both cases, we
can use the cache to avoid what may potentially be (especially in
the case of a large number of images or uploads) quite a lot of
ZK requests.  Our cache should be almost up to date (typically
milliseconds, or at the worst, seconds behind), and the worst
case is equivalent to what would happen if an image build took
just a few seconds longer.  The tradeoff is worth it.

Similarly, when we create min-ready requests, we can also consult
the cache.

With those 3 changes, all references to getMostRecentImageUpload
in Nodepool use the cache.

The original un-cached method is kept as well, because there are
an enormous number of references to it in the unit tests and they
don't have caching enabled.

In order to reduce the chances of races in many tests, the startup
sequence is normalized to:
1) start the builder
2) wait for an image to be available
3) start the launcher
4) check that the image cache in the launcher matches what
   is actually in ZK

This sequence (apart from #4) was already used by a minority of
tests (mostly newer tests).  Older tests have been updated.
A helper method, startPool, implements #4 and additionally includes
the wait_for_config method which was used by a random assortment
of tests.
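
The cached lookup is conceptually a local-first read (a sketch; the
method name follows the commit, the cache structure is illustrative):

    def get_most_recent_image_upload_cached(upload_cache, image, provider):
        # Scan the in-memory cache kept current by ZooKeeper watches
        # instead of issuing ZK reads; at worst it lags the real tree
        # by a few seconds, which this commit argues is acceptable.
        ready = [u for u in upload_cache
                 if u['image'] == image
                 and u['provider'] == provider
                 and u['state'] == 'ready']
        return max(ready, key=lambda u: u['state_time'], default=None)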

Change-Id: Iac1ff8adfbdb8eb9a286929a59cf07cd0b4ac7ad
2023-04-10 15:57:01 -07:00
James E. Blair be3edd3e17 Convert openstack driver to statemachine
This updates the OpenStack driver to use the statemachine framework.

The goal is to revise all remaining drivers to use the statemachine
framework for two reasons:

1) We can dramatically reduce the number of threads in Nodepool, which
is our biggest scaling bottleneck.  The OpenStack driver already
includes some work in that direction, but in a way that is unique
to it and not easily shared by other drivers.  The statemachine
framework is an extension of that idea implemented so that every driver
can use it.  This change further reduces the number of threads needed
even for the openstack driver.

2) By unifying all the drivers with a simple interface, we can prepare
to move them into Zuul.

There are a few updates to the statemachine framework to accommodate some
features that only the OpenStack driver used to date.

A number of tests need slight alteration since the openstack driver is
the basis of the "fake" driver used for tests.
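
The framework's core contract is deliberately small; an illustrative
reduction (not the full interface):

    class StateMachine:
        # One instance per node operation.  advance() is driven
        # repeatedly from a small shared thread pool instead of
        # dedicating one thread to each node.
        START = 'start'

        def __init__(self):
            self.state = self.START
            self.complete = False

        def advance(self):
            # Perform one short, non-blocking step, update self.state,
            # and set self.complete when the operation is finished.
            raise NotImplementedError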

Change-Id: Ie59a4e9f09990622b192ad840d9c948db717cce2
2023-01-10 10:30:14 -08:00
Zuul 9dd883107a Merge "Add hold command to disable nodes" 2022-11-30 20:05:41 +00:00
mbecker 1658aa9851 Add hold command to disable nodes
This allows nodes to be set in an idle state
so that they will not have jobs scheduled
while e.g. maintenance tasks are performed.
This is probably most useful for static nodes.

Change-Id: Iebc6b909f370fca11fab2be0b8805d4daef33afe
2022-10-13 12:43:34 +02:00
James E. Blair 08fdeed241 Add "slots" to static node driver
Add persistent slot numbers for static nodes.

This facilitates avoiding workspace collisions on nodes with
max-parallel-jobs > 1.

Change-Id: I30bbfc79a60b9e15f1255ad001a879521a181294
2022-10-11 07:02:53 -07:00
James E. Blair 6320b06950 Add support for dynamic tags
This allows users to create tags (or properties in the case of OpenStack)
on instances using string interpolation values.  The use case is to be
able to add information about the tenant* which requested the instance
to cloud-provider tags.

* Note that ultimately Nodepool may not end up using a given node for
the request which originally prompted its creation, so care should be
taken when using information like this.  The documentation notes that.

This feature uses a new configuration attribute on the provider-label
rather than the existing "tags" or "instance-properties" because existing
values may not be safe for use as Python format strings (e.g., an
existing value might be a JSON blob).  This could be solved with YAML
tags (like !unsafe) but the most sensible default for that would be to
assume format strings and use a YAML tag to disable formatting, which
doesn't help with our backwards-compatibility problem.  Additionally,
Nodepool configuration does not use YAML anchors (yet), so this would
be a significant change that might affect people's use of external tools
on the config file.

Testing this was beyond the ability of the AWS test framework as written,
so some redesign for how we handle patching boto-related methods is
included.  The new approach is simpler, more readable, and flexible
in that it can better accommodate future changes.
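
The interpolation itself is plain str.format() over request metadata; a
sketch with invented attribute names:

    # 'dynamic-tags' values are Python format strings; the existing
    # 'tags'/'instance-properties' values are left untouched because
    # they may contain literal '{' or '}' (e.g. JSON blobs).
    class FakeRequest:
        tenant_name = 'example-tenant'   # hypothetical metadata field

    dynamic_tags = {'team': '{request.tenant_name}', 'purpose': 'ci'}
    tags = {k: v.format(request=FakeRequest())
            for k, v in dynamic_tags.items()}
    # tags == {'team': 'example-tenant', 'purpose': 'ci'}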

Change-Id: I5f1befa6e2f2625431523d8d94685f79426b6ae5
2022-08-23 11:06:55 -07:00
James E. Blair 6a56940275 Fix race with two builders deleting images
In a situation with multiple builders, each configured with different
providers, it is possible for one builder to delete the ZK ImageBuild
record for a build from another builder between the time that the build
is completed but before the first upload starts.

This is because every builder looks for images to delete from ZK.  It
keeps the 2 most recent ready images (this should normally cover the
time period between a build and upload), unless the image is not
configured for any provider this builder knows about.  This is where
the disjoint providers come into play -- builder1 in our scenario
is not expected to have a configuration for provider2.

To correct this, we adjust this check so that the only time we
bypass the 2-most-recent-ready-images check is if the diskimage is
not configured at all.

That means that we still expect all builders to have a "diskimage"
entry for every image, but we don't need those to be configured
for any providers which this builder is not expected to handle.
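
The adjusted check, reduced to a sketch (names illustrative):

    def should_delete_build(build, config, recent_ready_builds):
        # Keep the two most recent ready builds for any diskimage that
        # appears in our config at all -- even if none of our providers
        # use it -- so another builder's build-to-upload window is safe.
        if build.image_name not in config.diskimages:
            return True   # image no longer configured anywhere
        return build not in recent_ready_builds[:2]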

Change-Id: Ic2fefda293fa0bcbc98ee7313198b37df0576299
2022-07-25 13:06:25 -07:00
James E. Blair 7bbdfdc9fd Update ZooKeeper class connection methods
This updates the ZooKeeper class to inherit from ZooKeeperBase
and utilize its connection methods.

It also moves the connection loss detection used by the builder
to be more localized and removes unused methods.

Change-Id: I6c9dbe17976560bc024f74cd31bdb6305d51168d
2022-06-29 07:46:34 -07:00
James E. Blair cacef76d3a Avoid collisions after ZK image data import
When image data are imported, if there are holes in the sequence
numbers, ZooKeeper may register a collision after nodepool-builder
builds or uploads a new image.  This is because ZooKeeper stores
a sequence node counter in the parent node, and we lose that
information when exporting/importing.  Newly built images can end
up with the same sequence numbers as imported images.  To avoid this,
re-create missing sequence nodes so that the import state more
closely matches the export state.
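
ZooKeeper keeps the next sequence number in the parent znode's
cversion, so a sketch of the workaround (kazoo assumed) is to create
and immediately delete children until the counter catches up:

    def advance_sequence(zk, parent, target):
        # Every SEQUENCE create bumps the parent's counter by one,
        # even though the child is deleted right away.
        while True:
            path = zk.create(parent + '/seq-', b'', sequence=True)
            zk.delete(path)
            if int(path.rsplit('-', 1)[1]) >= target - 1:
                return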

Change-Id: I0b96ebecc53dcf47324b8a009af749a3c04e574c
2022-06-20 13:00:05 -07:00
Zuul 492f6d5216 Merge "Add the component registry from Zuul" 2022-05-24 01:02:26 +00:00
Zuul a4acb5644e Merge "Use Zuul-style ZooKeeper connections" 2022-05-23 22:56:54 +00:00
James E. Blair a612aa603c Add the component registry from Zuul
This uses a cache and lets us update metadata about components
and act on changes quickly (as compared to the current launcher
registry, which doesn't have provision for live updates).

This removes the launcher registry, so operators should take care
to update all launchers within a short period of time since the
functionality to yield to a specific provider depends on it.

Change-Id: I6409db0edf022d711f4e825e2b3eb487e7a79922
2022-05-23 07:41:27 -07:00
James E. Blair 10df93540f Use Zuul-style ZooKeeper connections
We have made many improvements to connection handling in Zuul.
Bring those back to Nodepool by copying over the zuul/zk directory
which has our base ZK connection classes.

This will enable us to bring other Zuul classes over, such as the
component registry.

The existing connection-related code is removed and the remaining
model-style code is moved to nodepool.zk.zookeeper.  Almost every
file imported the model as nodepool.zk, so import adjustments are
made to compensate while keeping the code more or less as-is.
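
The import adjustment is mechanical:

    # Before:
    #   from nodepool import zk
    # After: the model classes now live one level down, aliased so the
    # existing references keep working unchanged.
    from nodepool.zk import zookeeper as zk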

Change-Id: I9f793d7bbad573cb881dfcfdf11e3013e0f8e4a3
2022-05-23 07:40:20 -07:00
Joshua Watt 2c632af426 Do not reset quota cache timestamp when invalid
The quota cache may not be a valid dictionary when
invalidateQuotaCache() is called (e.g. when 'ignore-provider-quota' is
used in OpenStack). In that case, don't attempt to treat the None as a
dictionary, as this raises a TypeError exception.

This bug was preventing Quota errors from OpenStack from causing
nodepool to retry the node request when ignore-provider-quota is True,
because the OpenStack handler calls invalidateQuotaCache() before
raising the QuotaException. Since invalidateQuotaCache() was raising
TypeError, it prevented the QuotaException from being raised, and the
node allocation failed outright.

A test has been added to verify that nodepool and OpenStack will now
retry node allocations as intended.

This fixes that bug, but does change the behavior of OpenStack when
ignore-provider-quota is True and it returns a Quota error.
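
The guard itself is tiny; a sketch:

    def invalidate_quota_cache(quota_cache):
        # quota_cache may be None when 'ignore-provider-quota' is set;
        # treating None as a dict raised TypeError and masked the
        # QuotaException that should have triggered a retry.
        if quota_cache is not None:
            quota_cache['timestamp'] = 0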

Change-Id: I1916c56c4f07c6a5d53ce82f4c1bb32bddbd7d63
Signed-off-by: Joshua Watt <JPEWhacker@gmail.com>
2022-05-10 15:04:25 -05:00
James E. Blair 46e130fe1a Add more debug info to AWS driver
These changes are all in service of being able to better understand
AWS driver log messages:

* Use annotated loggers in the statemachine provider framework
  so that we see the request, node, and provider information (see the
  sketch after these lists)
* Have the statemachine framework pass annotated loggers to the
  state machines themselves so that the above information is available
  for log messages on individual API calls
* Add optional performance information to the rate limit handler
  (delay and API call duration)
* Add some additional log entries to the AWS adapter

Also:

* Suppress boto logging by default in unit tests (it is verbose and
  usually not helpful)
* Add coverage of node deletion in the AWS driver tests
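
The annotated loggers in the first bullet are, in essence, standard
logging.LoggerAdapter usage (a sketch; identifiers are invented):

    import logging

    class NodeLoggingAdapter(logging.LoggerAdapter):
        # Prefix every message with the node and request in play.
        def process(self, msg, kwargs):
            return ('[node: %s] [req: %s] %s' % (
                self.extra['node'], self.extra['request'], msg), kwargs)

    log = NodeLoggingAdapter(
        logging.getLogger('nodepool.driver.aws'),
        {'node': '0001234', 'request': '200-0005678'})
    log.info('Launching instance')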

Change-Id: I0e6b4ad72d1af7f776da73c5dd2a50b40f60e4a2
2022-04-11 10:14:20 -07:00
James E. Blair 0b1fa1d57d Add commands to export/import image data from ZK
Change-Id: Id1ac6403f4fe80059b90900c519e56bca7dee0a0
2021-08-24 10:28:39 -07:00
James E. Blair 91804a5e16 Azure: switch to Azul
The Azure SDK for Python uses threads to manage async operations.
Every time a virtual machine is created, a new thread is spawned
to wait for it to finish (whether we actually end up polling it or
not).  This will cause the Azure driver to have significant
scalability limits compared to other drivers, possibly halving the
number of simultaneous nodes relative to the others.

To address this, switch to using a very simple requests-based
REST client I'm calling Azul.  The consistency of the Azure API
makes this simple.  As a bonus, we can use the excellent Azure
REST API documentation directly, rather than mapping attribute
names through the Python SDK (which has subtle differences).

A new fake Azure test fixture is also created in order to make
the current unit test a more thorough exercise of the code.

Finally, the "zuul-private-key" attribute is misnamed since we
have a policy of a one-way dependency from Zuul -> Nodepool.  Its
name is updated to match the GCE driver ("key") and moved to the
cloud-image section so that different images may be given different
keys.
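
In spirit, Azul wraps calls like this (an illustrative sketch; the real
client handles auth, paging, and more API versions):

    import requests

    def list_virtual_machines(subscription_id, resource_group, token):
        # One GET against the documented Azure REST endpoint -- no SDK
        # threads are spawned on our behalf.
        url = ('https://management.azure.com/subscriptions/%s'
               '/resourceGroups/%s/providers/Microsoft.Compute'
               '/virtualMachines?api-version=2020-12-01' % (
                   subscription_id, resource_group))
        r = requests.get(url, headers={'Authorization': 'Bearer ' + token})
        r.raise_for_status()
        return r.json()['value']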

Change-Id: I87bfa65733b2a71b294ebe2cf0d3404d0e4333c5
2021-03-08 14:58:31 -08:00
Zuul 74d299ec01 Merge "Offload waiting for server creation/deletion" 2021-03-06 06:09:49 +00:00
James E. Blair 4c5fa46540 Require TLS
Require TLS for ZooKeeper connections before making the 4.0 release.

Change-Id: I69acdcec0deddfdd191f094f13627ec1618142af
Depends-On: https://review.opendev.org/776696
2021-02-19 18:42:33 +00:00
Tobias Henkel 2e59f7b0b3 Offload waiting for server creation/deletion
Currently nodepool has one thread per server creation or
deletion. Each of those waits for the cloud by regularly getting the
server list and checking if their instance is active or gone. On a
busy nodepool this leads to severe thread contention when the server
list gets large and/or there are many parallel creations/deletions in
progress.

This can be improved by offloading the waiting to a single thread that
regularly retrieves the server list and compares that to the list of
waiting server creates/deletes. The calling threads then wait
until the central thread wakes them up to proceed with their task. The
waiting threads wait for the event outside of the GIL and thus
no longer contribute to the thread contention problem.

An alternative approach would be to redesign the code to use fewer
threads, but that would be a much more complex undertaking. Thus this
change keeps the many-threads approach but makes their waiting much
more lightweight, which showed a substantial improvement during load
testing in a test environment.
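
A reduced sketch of the pattern (threading primitives only; the real
change also batches list calls and handles errors and timeouts):

    import threading
    import time

    class ServerWatcher:
        # One background thread polls the server list; callers park on
        # an Event instead of each polling the API themselves.
        def __init__(self, list_servers, interval=5.0):
            self._list_servers = list_servers
            self._interval = interval
            self._waiters = {}   # server_id -> threading.Event
            self._lock = threading.Lock()
            threading.Thread(target=self._run, daemon=True).start()

        def wait_for_active(self, server_id, timeout=None):
            event = threading.Event()
            with self._lock:
                self._waiters[server_id] = event
            return event.wait(timeout)   # sleeps outside the GIL

        def _run(self):
            while True:
                active = {s['id'] for s in self._list_servers()
                          if s['status'] == 'ACTIVE'}
                with self._lock:
                    for sid in active & set(self._waiters):
                        self._waiters.pop(sid).set()
                time.sleep(self._interval)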

Change-Id: I5525f2558a4f08a455f72e6b5479f27684471dc7
2021-02-16 15:37:57 +01:00
Clark Boylan 6276562939 Use iterate_timeout in test waits
This ensures that we don't wait forever for tests to complete tasks.
This is particularly useful if you've disabled the global test timeout.
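
The helper is a guarded generator; a sketch (the signature shown is an
assumption for illustration, not the exact test-utility API):

    import time

    def iterate_timeout(max_seconds, purpose):
        # Yield until the deadline, then fail loudly rather than let
        # the test wait forever.
        start = time.monotonic()
        while time.monotonic() < start + max_seconds:
            yield
            time.sleep(0.1)
        raise Exception('Timed out waiting for %s' % purpose)

    # for _ in iterate_timeout(60, 'image to become ready'):
    #     if image_is_ready():
    #         break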

Change-Id: I0141e62826c3594ed20605cac25e39091d1514e2
2020-01-14 08:25:09 -08:00
Zuul 0a010d94a1 Merge "Fix builder shutdown race in tests" 2019-10-15 15:24:27 +00:00
Ian Wienand ddbcf1b07d Validate openstack provider pool labels have top-level labels
We broke nodepool configuration with
I3795fee1530045363e3f629f0793cbe6e95c23ca by not having the labels
defined in the OpenStack provider in the top-level label list.

The added check here would have found such a case.

The validate() function is reworked slightly; previously it would
return various exceptions from the tools it was calling (YAML,
voluptuous, etc.).  Now that we have more testing (and I'd imagine we
could do even more, similar validations too), we'd have to keep adding
exception types.  Just make the function return a value; this also
makes sure the regular exit paths are taken from the caller in
nodepoolcmd.py, rather than dying with an exception at an arbitrary
point.

A unit test is added.
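
The caller's shape after the change, roughly (the loader call is
hypothetical):

    import logging

    log = logging.getLogger('nodepool.config_validator')

    def validate(config_path):
        # Catch the tool-specific exceptions (YAML, voluptuous, ...)
        # here and return a status code instead of letting them escape.
        try:
            load_and_check(config_path)   # hypothetical loader
        except Exception:
            log.exception('Validation failed for %s', config_path)
            return 1
        return 0

    # nodepoolcmd.py can then simply sys.exit(validate(path)).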

Co-Authored-By: Mohammed Naser <mnaser@vexxhost.com>
Change-Id: I5455f5d7eb07abea34c11a3026d630dee62f2185
2019-10-15 15:32:32 +11:00
David Shrewsbury e732fec5bf Fix builder shutdown race in tests
The builder intentionally does not attempt to shutdown the uploader
threads since that could take an unreasonable amount of time. This
causes a race in our tests where we can shutdown the ZooKeeper
connection while the upload thread is still in progress, which can
cause the test to fail with a ZooKeeper error. This adds uploader
thread cleanup for the builder used in tests.

Change-Id: I25d4b52e17501e5dc6543adef585dd3b86bd70f9
2019-10-10 15:30:35 -04:00
David Shrewsbury 5c605b3240 Reduce upload threads in tests from 4 to 1
Only a single test actually depends on having more than a single
upload thread active, so this is just wasteful. Reduce the default
to 1 and add an option to useBuilder() that tests may use to alter
the value.

Change-Id: I07ec96000a81153b51b79bfb0daee1586491bcc5
2019-09-18 15:39:12 -04:00
Ian Wienand 9367cf8ed8 Add a dib-cmd option for diskimages
This change allows you to specify a dib-cmd parameter for disk images,
which overrides the default call to "disk-image-create".  This lets
you choose, per configured disk image, the binary that is called in
place of the default disk-image-create.

It is inspired by a couple of things:

The "--fake" argument to nodepool-builder has always been a bit of a
wart; a case of testing-only functionality leaking into the
production code.  It would be clearer if the tests used exposed
methods to configure themselves to use the fake builder.

Because disk-image-create is called from the $PATH, it is more
difficult to use nodepool from a virtualenv.  You cannot just run
"nodepool-builder"; you have to ". activate" the virtualenv before
running the daemon so that the path is set to find the virtualenv
disk-image-create.

In addressing activation issues by automatically choosing the
in-virtualenv binary in Ie0e24fa67b948a294aa46f8164b077c8670b4025, it
was pointed out that others are already using wrappers in various ways
where preferring the co-installed virtualenv version would break.

With this, such users can ensure they call the "disk-image-create"
binary they want.  We can then make a change to prefer the
co-installed version without fear of breaking.

In theory, there's no reason why a totally separate
"/custom/venv/bin/disk-image-create" would not be valid if you
required a customised dib for some reason for just one image.  This is
not currently possible, even modulo PATH hacks, etc.; all images will
use the same binary to build.  It is for this flexibility that I think
this is best at the diskimage level, rather than as, say, a global
setting for the whole builder instance.

Thus add a dib-cmd option for diskimages.  In the testing case, this
points to the fake-image-create script, and the --fake command-line
option and related bits are removed.

It should have no backwards compatibility effects; documentation and a
release note are added.
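
A sketch of how the builder can pick the binary (the dib-cmd attribute
is from this change; the surrounding structure is illustrative):

    def build_command(diskimage, output_path):
        # dib-cmd defaults to the historical behaviour; tests point it
        # at fake-image-create, a venv user at their own wrapper.
        dib_cmd = diskimage.get('dib-cmd', 'disk-image-create')
        return [dib_cmd, '-t', ','.join(diskimage['formats']),
                '-o', output_path] + diskimage['elements']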

Change-Id: I6677e11823df72f8c69973c83039a987b67eb2af
2019-08-22 10:09:00 +10:00
Tobias Henkel 4131d7da59 Cleanup kube_config temp files between test runs
In my local tox runs, the temp files of each test case get
deleted after the run. However, kube_config maintains a static list of
temporary files it knows about and tries to re-use them in subsequent
test runs, which causes the test to fail [1]. Fix this by telling
kube_config to clean up its temporary files in the cleanup phase.

[1] Trace
Traceback (most recent call last):
  File "/home/tobias/src/nodepool/nodepool/tests/unit/test_builder.py", line 239, in test_image_rotation_invalid_external_name
    build001, image001 = self._test_image_rebuild_age(expire=172800)
  File "/home/tobias/src/nodepool/nodepool/tests/unit/test_builder.py", line 186, in _test_image_rebuild_age
    self.useBuilder(configfile)
  File "/home/tobias/src/nodepool/nodepool/tests/__init__.py", line 539, in useBuilder
    BuilderFixture(configfile, cleanup_interval, securefile)
  File "/home/tobias/src/nodepool/.tox/py37/lib/python3.7/site-packages/testtools/testcase.py", line 756, in useFixture
    reraise(*exc_info)
  File "/home/tobias/src/nodepool/.tox/py37/lib/python3.7/site-packages/testtools/_compat3x.py", line 16, in reraise
    raise exc_obj.with_traceback(exc_tb)
  File "/home/tobias/src/nodepool/.tox/py37/lib/python3.7/site-packages/testtools/testcase.py", line 731, in useFixture
    fixture.setUp()
  File "/home/tobias/src/nodepool/nodepool/tests/__init__.py", line 318, in setUp
    self.builder.start()
  File "/home/tobias/src/nodepool/nodepool/builder.py", line 1304, in start
    self._config = self._getAndValidateConfig()
  File "/home/tobias/src/nodepool/nodepool/builder.py", line 1279, in _getAndValidateConfig
    config = nodepool_config.loadConfig(self._config_path)
  File "/home/tobias/src/nodepool/nodepool/config.py", line 246, in loadConfig
    driver.reset()
  File "/home/tobias/src/nodepool/nodepool/driver/openshift/__init__.py", line 29, in reset
    config.load_kube_config(persist_config=True)
  File "/home/tobias/src/nodepool/.tox/py37/lib/python3.7/site-packages/kubernetes/config/kube_config.py", line 540, in load_kube_config
    loader.load_and_set(config)
  File "/home/tobias/src/nodepool/.tox/py37/lib/python3.7/site-packages/kubernetes/config/kube_config.py", line 422, in load_and_set
    self._load_cluster_info()
  File "/home/tobias/src/nodepool/.tox/py37/lib/python3.7/site-packages/kubernetes/config/kube_config.py", line 385, in _load_cluster_info
    file_base_path=self._config_base_path).as_file()
  File "/home/tobias/src/nodepool/.tox/py37/lib/python3.7/site-packages/kubernetes/config/kube_config.py", line 112, in as_file
    raise ConfigException("File does not exists: %s" % self._file)
kubernetes.config.config_exception.ConfigException: File does not exists: /tmp/tmplafutg0j/tmpmiti10bn
Ran 2 tests in 4.524s (+0.175s)
FAILED (id=20, failures=1)
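
The fix is a one-line cleanup hook; a sketch (assuming kube_config's
module-level _cleanup_temp_files helper):

    import testtools
    from kubernetes.config import kube_config

    class BaseTestCase(testtools.TestCase):
        def setUp(self):
            super().setUp()
            # kube_config keeps a static list of temp files it wrote;
            # clear it so later runs don't reference deleted paths.
            self.addCleanup(kube_config._cleanup_temp_files)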

Change-Id: Idce8ca9bed49162874af24b224e573121e250385
2019-05-04 11:08:40 +02:00
David Shrewsbury fa2d4bd17c Fix for image build leaks
If, during a long DIB image build, we lose the ZooKeeper session,
it's likely that the CleanupWorker thread could have run and removed
the ZK record for the build (its state would be BUILDING and unlocked,
indicating something went wrong). In that scenario, when the DIB
process finishes (possibly writing out DIB files), it will never get
cleaned up since the ZK record would now be gone. If we fail to update
the ZK record at the end of the build, just delete the leaked DIB files
immediately after the build.

Change-Id: I5cb58318efe51b5b0c3413b7a01f02a50215a8b6
2019-04-01 15:44:31 -04:00
Zuul 280cd5937d Merge "Revert "Revert "Add a timeout for the image build""" 2019-02-06 13:16:06 +00:00
David Shrewsbury 890ea4975e Revert "Revert "Add a timeout for the image build""
This reverts commit ccf40a462a.

The previous version would not work properly when daemonized
because there was no stdout. This version maintains stdout and
uses select/poll with non-blocking stdout to capture the output
to a log file.
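
The capture loop is roughly (a sketch of the select/poll approach; fd
handling and error paths simplified):

    import fcntl
    import os
    import select
    import subprocess
    import time

    def run_dib(cmd, log_path, timeout):
        p = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                             stderr=subprocess.STDOUT)
        # Non-blocking stdout means a quiet build cannot wedge a
        # readline() call.
        flags = fcntl.fcntl(p.stdout, fcntl.F_GETFL)
        fcntl.fcntl(p.stdout, fcntl.F_SETFL, flags | os.O_NONBLOCK)
        poller = select.poll()
        poller.register(p.stdout, select.POLLIN)
        deadline = time.monotonic() + timeout
        with open(log_path, 'wb') as log:
            while p.poll() is None:
                if time.monotonic() > deadline:
                    p.kill()
                    raise TimeoutError('image build timed out')
                for _fd, _event in poller.poll(1000):
                    log.write(p.stdout.read() or b'')
        return p.returncode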

Depends-On: https://review.openstack.org/634266

Change-Id: I7f0617b91e071294fe6051d14475ead1d7df56b7
2019-01-31 11:36:47 -05:00
Tristan Cacqueray aa16b8b891 Amazon EC2 driver
This change adds an experimental AWS driver. It lacks some of the deeper
features of the openstack driver, such as quota management and image
building, but is highly functional for running tests on a static AMI.

Note that the test base had to be refactored to allow fixtures to be
customized in a more flexible way.

Change-Id: I313f9da435dfeb35591e37ad0bec921c8b5bc2b5
Co-Authored-By: Tristan Cacqueray <tdecacqu@redhat.com>
Co-Authored-By: David Moreau-Simard <dmsimard@redhat.com>
Co-Authored-By: Clint Byrum <clint@fewbar.com>
2019-01-28 12:08:36 -08:00
Zuul f2c155821c Merge "Revert "Add a timeout for the image build"" 2019-01-25 22:37:34 +00:00
David Shrewsbury ccf40a462a Revert "Add a timeout for the image build"
This reverts commit 7225354ec0.

The disk-image-create command does not appear to be starting.

Change-Id: I81abe25a253a385cae08a57561129a678546f18f
2019-01-25 17:36:31 +00:00
Zuul 26c57ee5a9 Merge "Add a timeout for the image build" 2019-01-24 16:15:32 +00:00
David Shrewsbury 7225354ec0 Add a timeout for the image build
A builder thread can wedge if the build process wedges. Add a timeout
to the subprocess. Since it was the call to readline() that would block,
we change the process to have DIB write directly to the log. This allows
us to set a timeout in the Popen.wait() call, and we kill the dib
subprocess as well.

The timeout value can be controlled in the diskimage configuration and
defaults to 8 hours.
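
With DIB writing straight to the log file there is no pipe to drain, so
the timeout is a plain Popen.wait() (a sketch):

    import subprocess

    def build_image(cmd, log_path, timeout=8 * 60 * 60):
        with open(log_path, 'wb') as log:
            p = subprocess.Popen(cmd, stdout=log,
                                 stderr=subprocess.STDOUT)
            try:
                return p.wait(timeout=timeout)
            except subprocess.TimeoutExpired:
                # Don't leave a wedged dib process behind.
                p.kill()
                raise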

Change-Id: I188e8a74dc39b55a4b50ade5c1a96832fea76a7d
2019-01-23 16:27:19 -05:00
Tristan Cacqueray c1378c4407 Implement an OpenShift resource provider
This change implements an OpenShift resource provider. The driver currently
supports project requests and pod requests to enable both the
containers-as-machines and native container workflows.

Depends-On: https://review.openstack.org/608610
Change-Id: Id3770f2b22b80c2e3666b9ae5e1b2fc8092ed67c
2019-01-10 05:05:46 +00:00
Tobias Henkel 64487baef0 Asynchronously update node statistics
We currently update the node statistics on every node launch or
delete. This cannot use caching at the moment because when the
statistics are updated we might end up pushing slightly outdated
data. If there is then no further update for a long time, we end up
with broken gauges. We already get update events from the node cache,
so we can use those to centrally trigger node statistics updates.

This is combined with leader election so there is only a single
launcher that keeps the statistics up to date. This will ensure that
the statistics are not cluttered by several launchers pushing
their own slightly different views into the stats.

As a side effect this reduces the runtime of a test that creates 200
nodes from 100s to 70s on my local machine.
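
Leader election over ZooKeeper is a few lines with kazoo; a sketch of
the pattern (paths and identifiers illustrative):

    from kazoo.client import KazooClient

    zk = KazooClient(hosts='127.0.0.1:2181')
    zk.start()

    def update_stats_forever():
        # Only the elected leader pushes node gauges, so launchers do
        # not overwrite each other with slightly different views.
        ...

    election = zk.Election('/nodepool/stats-election', 'launcher-01')
    election.run(update_stats_forever)   # blocks until elected, then runs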

Change-Id: I77c6edc1db45b5b45be1812cf19eea66fdfab014
2018-11-29 16:48:30 +01:00
Tobias Henkel 9d77f05d8e Only setup zNode caches in launcher
We currently only need to set up the zNode caches in the
launcher. Within the command-line client and the builders this is just
unnecessary work.

Change-Id: I03aa2a11b75cab3932e4b45c5e964811a7e0b3d4
2018-11-26 20:13:39 +01:00
Ian Wienand cd9aa75640 Use pipelines for stats keys
Pipelines buffer stats and then send them out in more reasonably sized
chunks, helping to avoid small UDP packets going missing in a flood of
stats.  Use this in stats.py.

This needs a slight change to the assertedStats handler to extract the
combined stats.  This function is ported from Zuul where we updated to
handle pipeline stats (Id4f6f5a6cd66581a81299ed5c67a5c49c95c9b52) so
it is not really new code.
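
With the statsd client this looks like (a sketch):

    import statsd

    client = statsd.StatsClient('localhost', 8125, prefix='nodepool')

    with client.pipeline() as pipe:
        # Buffered and flushed as a few larger UDP packets on exit,
        # instead of one small packet per stat.
        pipe.incr('launch.ready')
        pipe.timing('launch.time', 1250)
        pipe.gauge('nodes.building', 3)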

Change-Id: I3f68450c7164d1cf0f1f57f9a31e5dca2f72bc43
2018-07-25 16:46:13 +10:00
Clark Boylan f385a5821f Fix test patching of clouds.yaml file locations
OpenStack Client Config has been pulled into openstacksdk. As part of
this work OSCC internals were dropped and aliased into the sdk lib. This
move broke patching of the clouds.yaml file location for nodepool tests.

We quickly work around this by using the new location for the value to
be overridden in openstacksdk.

Change-Id: I55ad4333ffddec8eeb023e345156e96773504400
2018-05-03 12:50:33 -07:00
James E. Blair baa831192f Store build logs automatically
This updates the builder to store individual build logs in dedicated
files, one per build, named for the image and build id.  Old logs are
automatically pruned.  By default, they are stored in
/var/log/nodepool/builds, but this can be changed.

This removes the need to specially configure a logging handler for the
image build logs.

Change-Id: Ia7415d2fbbb320f8eddc4e46c3a055414df5f997
2018-02-09 07:50:20 -08:00
Zuul a5173f8f46 Merge "Do pep8 housekeeping according to zuul rules" into feature/zuulv3 2018-01-17 17:07:28 +00:00
Tobias Henkel 7d79770840 Do pep8 housekeeping according to zuul rules
The pep8 rules used in nodepool are somewhat broken. In preparation
for using the pep8 ruleset from zuul, we need to fix the findings
upfront.

Change-Id: I9fb2a80db7671c590cdb8effbd1a1102aaa3aff8
2018-01-17 02:17:45 +00:00