Commit Graph

124 Commits

Author SHA1 Message Date
Clark Boylan 2a231a08c9 Add idle state to driver providers
This change adds an idle state to driver providers which is used to
indicate that the provider should stop performing actions that are not
safe to perform while we bootstrap a second newer version of the
provider to handle a config update.

This is particularly interesting for the static driver because it is
managing all of its state internally to nodepool and not relying on
external cloud systems to track resources. This means it is important
for the static provider to not have an old provider object update
zookeeper at the same time as a new provider object. This was previously
possible and created situations where the resources in zookeeper did
not reflect our local config.

Since all other drivers rely on external state the primary update here
is to the static driver. We simply stop performing config
synchronization if the idle flag is set on a static provider. This will
allow the new provider to take over reflecting the new config
consistently.

Note, we don't take other approaches and essentially create a system
specific to the static driver because we're trying to avoid modifying
the nodepool runtime significantly to fix a problem that is specific to
the static driver.

Change-Id: I93519d0c6f4ddf8a417d837f6ae12a30a55870bb
2022-10-24 15:30:31 -07:00
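
A rough sketch of how such an idle flag might gate the static provider's config synchronization (the class and method names here are illustrative, not nodepool's actual API):

    import threading
    import time

    class StaticProviderSketch:
        """Illustrative provider that stops reconciling shared state once idled."""

        def __init__(self, name):
            self.name = name
            self._idle = threading.Event()

        def set_idle(self):
            # Called on the old provider object before its replacement starts,
            # so only one provider ever writes to ZooKeeper for this config.
            self._idle.set()

        def sync_loop(self, interval=0.1, iterations=3):
            for _ in range(iterations):
                if self._idle.is_set():
                    break
                self.sync_config_to_zookeeper()
                time.sleep(interval)

        def sync_config_to_zookeeper(self):
            print("%s: reconciling static nodes with config" % self.name)

    old = StaticProviderSketch("static-provider")
    old.set_idle()      # a new provider is about to take over
    old.sync_loop()     # performs no further updates
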
James E. Blair b8035de65f Improve handling of errors in provider manager startup
If a provider (or its configuration) is sufficiently broken that
the provider manager is unable to start, then the launcher will
go into a loop where it attempts to restart all providers in the
system until it succeeds.  During this time, no pool managers are
running, which means all requests are ignored by this launcher.

Nodepool continuously reloads its configuration file, and in case
of an error, the expected behavior is to continue running and allow
the user to correct the configuration and retry after a short delay.

We also expect providers on a launcher to be independent of each
other so that if one fails, the others continue working.

However since we neither exit, nor process node requests if a
provider manager fails to start, an error with one provider can
cause all providers to stop handling requests with very little
feedback to the operator.

To address this, if a provider manager fails to start, the launcher
will now behave as if the provider were absent from the config file.
It will still emit the error to the log, and it will continuously
attempt to start the provider so that if the error condition abates,
the provider will start.

If there are no providers online for a label, then as long as any
provider in the system is running, node requests will be handled
and declined (and possibly failed) while the broken provider is offline.

If the system contains only a single provider and it is broken, then
no requests will be handled (or failed), which is the current behavior
and still likely the most desirable in that case.

Change-Id: If652e8911993946cee67c4dba5e6f88e55ac7099
2022-01-14 19:07:32 -08:00
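
A hedged sketch of the behaviour described above - a provider whose manager fails to start is logged and skipped as though it were absent, then retried on the next pass (the driver factory call here is hypothetical, not nodepool's real interface):

    import logging

    logging.basicConfig()
    log = logging.getLogger("launcher")

    def start_provider_managers(provider_configs, running_managers):
        """Start a manager for each configured provider, skipping broken ones."""
        for name, config in provider_configs.items():
            if name in running_managers:
                continue
            try:
                # Hypothetical factory; the real code asks the driver for a manager.
                manager = config["driver"].get_provider(config)
                manager.start()
            except Exception:
                # Behave as if this provider were absent; retry on the next loop.
                log.exception("Error starting provider manager for %s", name)
                continue
            running_managers[name] = manager
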
Fabien Boucher f57ac1881a
Remove unneeded shebang and exec bit on some files
Having python files with the exec bit and a shebang defined in
/usr/lib/python-*/site-packages/ is not fine in an RPM package.

Instead of carrying a patch in the nodepool RPM packaging, it is better
to fix this directly upstream.

Change-Id: I5a01e21243f175d28c67376941149e357cdacd26
2019-12-13 19:30:03 +01:00
Monty Taylor 7618b714e2 Remove unused use_taskmanager flag
Now that there is no more TaskManager class, nor anything using
one, the use_taskmanager flag is vestigial. Clean it up so that we
don't have to pass it around to things anymore.

Change-Id: I7c1f766f948ad965ee5f07321743fbaebb54288a
2019-04-02 12:11:07 +00:00
Tristan Cacqueray c7f2538457 builder: do not configure provider that doesn't manage images
This change prevents the builder service from starting providers that don't
manage images.

Change-Id: Id179e2d3bedb9c9914b13241c77bddad3ec7ca57
2018-07-15 23:10:05 +00:00
David Shrewsbury a418aabb7a Pass zk connection to ProviderManager.start()
In order to support static node pre-registration, we need to give
the provider manager the opportunity to register/deregister any
nodes in its configuration file when it starts (on startup or when
the config changes). It will need a ZooKeeper connection to do this.
The OpenStack driver will ignore this parameter.

Change-Id: Idd00286b2577921b3fe5b55e8f13a27f2fbde5d6
2018-06-12 12:04:16 -04:00
James E. Blair e20858755f Have Drivers create Providers
Use the new Driver class to create instances of Providers

Change-Id: Idfbde8d773a971133b49fbc318385893be293fac
2018-06-06 14:57:40 -04:00
Tristan Cacqueray d0a67878a3 Add a plugin interface for drivers
This change adds a plugin interface so that drivers can be loaded dynamically.
Instead of importing each driver in the launcher, provider_manager and config,
the Drivers class discovers and loads drivers from the driver directory.

This change also adds a reset() method to the driver Config interface to
reset the os_client_config reference when reloading the OpenStack driver.

Change-Id: Ia347aa2501de0e05b2a7dd014c4daf1b0a4e0fb5
2018-01-19 00:45:56 +00:00
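
A simplified sketch of dynamic driver discovery of this kind, assuming each driver lives in its own sub-package and exposes a Driver class (an assumption for illustration, not a statement of nodepool's exact layout):

    import importlib
    import os

    def discover_drivers(drivers_dir, package="nodepool.driver"):
        """Import every sub-package found in the drivers directory."""
        drivers = {}
        for entry in sorted(os.listdir(drivers_dir)):
            if entry.startswith("_"):
                continue
            if not os.path.isdir(os.path.join(drivers_dir, entry)):
                continue
            module = importlib.import_module("%s.%s" % (package, entry))
            drivers[entry] = module.Driver()
        return drivers
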
Tristan Cacqueray b01227c9d4 Move the fakeprovider module to the fake driver
This change is a follow-up to the drivers spec and it makes the fake provider
a real driver. The fakeprovider module is merged into the fake provider and
the get_one_cloud config loader is simplified.

Change-Id: I3f8ae12ea888e7c2a13f246ea5f85d4a809e8c8d
2017-07-28 11:35:07 +00:00
Tristan Cacqueray c0e6d5112b Extend Nodepool configuration syntax to support multiple drivers
Change-Id: I220e8e71c1205174a0a7515899c9bb6c4cc6adcb
Story: 2001044
Task: 4616
2017-07-25 14:27:17 +00:00
Tristan Cacqueray 4d201328f5 Collect request handling implementation in an OpenStack driver
This change moves OpenStack-related code to a driver. To avoid a circular
import, this change also moves the StatsReporter to the stats module so that
the handlers don't have to import the launcher.

Change-Id: I319ce8780aa7e81b079c3f31d546b89eca6cf5f4
Story: 2001044
Task: 4614
2017-07-25 14:27:17 +00:00
Tristan Cacqueray 27b600ee2c Abstract Nodepool provider management code
This change adds a generic Provider metaclass to the common
driver module to support multiple implementations. It also renames
some methods to better match other drivers' use cases, e.g.:
* listServers into listNodes
* cleanupServer into cleanupNode

Change-Id: I6fab952db372312f12e57c6212f6ebde59a1a6b3
Story: 2001044
Task: 4612
2017-07-25 14:27:13 +00:00
Jenkins 279809ed1d Merge "Create group for label type" into feature/zuulv3 2017-06-13 17:34:36 +00:00
Ricardo Carrillo Cruz 7c3263c7df Create group for label type
Currently, we get OOTB groups per provider and per image.
It would be nice to also have groups per label type, for running
plays against a particular label.

Change-Id: Ib4173fc0c15184444a91dc402bb306d34f295106
2017-06-13 18:54:48 +02:00
Monty Taylor 8c59361032
Support booting cloud-images by name or id
The docs say we support this, but the code doesn't.

Also, self._cloud_image.name == self._label._cloud_image and is
essentially a foreign key. That's hard to read at the call site, so just
use self._cloud_image.

We have a cloud id if it's a disk image, so wrap that in a dict. Pass
the other one through unmodified so that we'll search for it.

We also don't have any codepaths using image_name, nor a reason to
distinguish.

Change-Id: I4aa9bd8e7c578ae63d05df453b9886c710a092c0
2017-06-10 10:16:51 -05:00
Paul Belanger 1d0990a1c1
Add boot-from-volume support for nodes
For example, a cloud may get better performance from a cinder volume
than from the local compute drive. As a result, give nodepool the option to
choose whether the server should boot from a volume or not.

Change-Id: I3faefe99096fef1fe28816ac0a4b28c05ff7f0ec
Depends-On: If58cd96b0b9ce4569120d60fbceb2c23b2f7641d
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
2017-05-30 14:23:24 -04:00
Paul Belanger e4e98123d3
Fetch server console log if ssh connection fails
Currently, if the ssh connection fails, we are blind to what the
possible failures are.  As a result, attempt to fetch the server
console log to help debug the failure.

This is the continuation of I39ec1fe591d6602a3d494ac79ffa6d2203b5676b
but for the feature/zuulv3 branch. This was done to avoid merge
conflicts on the recent changes to nodepool.yaml layout.

Change-Id: I75ccb6d01956fb6052473f44cce8f097a56dd16a
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
2017-05-23 12:53:44 -04:00
Paul Belanger 71ff1a9bc5
Sort flavors with operator.itemgetter('ram')
The current syntax is not Python 3 compatible, so we look to shade to
help accomplish our sorting.

Change-Id: Iadb39f976840fd2af6e0bd7b08bd3b01169e37a1
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
2017-05-17 15:19:52 -04:00
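
For illustration, sorting flavor dicts by RAM with operator.itemgetter in a Python 3 compatible way looks roughly like this:

    import operator

    flavors = [
        {"name": "m1.large", "ram": 8192},
        {"name": "m1.small", "ram": 2048},
        {"name": "m1.medium", "ram": 4096},
    ]

    # key= sorting works on both Python 2 and 3; the removed cmp= style does not.
    flavors = sorted(flavors, key=operator.itemgetter("ram"))
    print([f["name"] for f in flavors])   # ['m1.small', 'm1.medium', 'm1.large']
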
Paul Belanger d892837cad
Fix imports for python3
The syntax for imports has changed in Python 3, so let's use the new
syntax.

Change-Id: Ia985424bf23b44e492f51182179d2e476cdcccbb
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
2017-05-17 15:19:48 -04:00
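
As a hedged illustration of the import change: Python 3 drops implicit relative imports, so intra-package imports have to be spelled explicitly (the module name used here is only an example):

    # Python 2 allowed an implicit relative import inside the package:
    #   import nodeutils
    #
    # Python 3 requires one of the explicit forms instead:
    from nodepool import nodeutils        # absolute import
    # or, from within the package itself:
    # from . import nodeutils             # explicit relative import
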
Monty Taylor 642f14c076 Add ability to select flavor by name or id
It's possible that it's easier for a nodepool user to just specify a
name or id of a flavor in their config instead of the combo of min-ram
and name-filter.

In order to avoid two name-related items, and to avoid having the
pure flavor-name case use a term called "name-filter", change
name-filter to flavor-name and introduce the following semantics: if
flavor-name is given by itself, it will look for an exact match on
flavor name or id; if it's given with min-ram, it will behave as
name-filter did already.

Change-Id: I8b98314958d03818ceca5abf4e3b537c8998f248
2017-04-27 13:44:25 -07:00
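
The selection semantics described above might look roughly like this (a sketch, not nodepool's actual implementation):

    def pick_flavor(flavors, flavor_name=None, min_ram=None):
        """Pick a flavor using the flavor-name / min-ram semantics."""
        if flavor_name and not min_ram:
            # flavor-name alone: exact match on flavor name or id.
            for f in flavors:
                if flavor_name in (f["name"], f["id"]):
                    return f
            raise ValueError("no flavor named %r" % flavor_name)
        # min-ram, optionally with flavor-name acting as the old name-filter:
        # smallest flavor with enough RAM whose name contains the filter.
        candidates = [f for f in flavors if f["ram"] >= (min_ram or 0)]
        if flavor_name:
            candidates = [f for f in candidates if flavor_name in f["name"]]
        return min(candidates, key=lambda f: f["ram"])

    flavors = [
        {"id": "1", "name": "m1.small", "ram": 2048},
        {"id": "2", "name": "m1.large", "ram": 8192},
    ]
    print(pick_flavor(flavors, flavor_name="m1.large")["id"])   # exact match
    print(pick_flavor(flavors, min_ram=4096)["name"])           # smallest fit
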
David Shrewsbury 92f375c70b Remove support for nodepool_id
This was a temporary measure to keep production nodepool from
deleting nodes created by v3 nodepool. We don't need to carry
it over.

This is an alternative to: https://review.openstack.org/449375

Change-Id: Ib24395e30a118c0ea57f8958a8dca4407fe1b55b
2017-03-30 12:08:04 -04:00
Jenkins 73f3b56376 Merge "Merge branch 'master' into feature/zuulv3" into feature/zuulv3 2017-03-30 16:03:36 +00:00
Joshua Hesketh 94f33cb666 Merge branch 'master' into feature/zuulv3
The nodepool_id feature may need to be removed. I've kept it to simplify
merging both now and if we do it again later.

A couple of the tests are disabled and need reworking in a subsequent
commit.

Change-Id: I948f9f69ad911778fabb1c498aebd23acce8c89c
2017-03-30 21:46:15 +11:00
Monty Taylor 19e8f2788c
Fetch list of AZs from nova if it's not configured
Nova has an API call that can fetch the list of available AZs. Use it to
provide a default list so that we can provide sane choices to the
scheduler related to multi-node requests rather than just letting nova
pick on a per-request basis.

Change-Id: I1418ab8a513280318bc1fe6e59301fda5cf7b890
2017-03-29 13:09:50 -05:00
James E. Blair 440c427662 Remove deprecated networks syntax
And simplify.

Change-Id: I8be53c228de9be5dc3cb39ff9d90cda6bbde9124
2017-03-27 11:35:12 -07:00
James E. Blair dcc3b5e071 Update nodepool config syntax
This implements the changes described in:

http://lists.openstack.org/pipermail/openstack-infra/2017-January/005018.html

It also removes some, but not all, extraneous keys from test config files.

Change-Id: Iebc941b4505d6ad46c882799b6230eb23545e5c0
2017-03-27 09:34:02 -07:00
Paul Belanger c5c5be30f9 Remove keypair from provider section
This was an unused setting which was left over from when we supported
snapshots.

Change-Id: I940eaa57f5dad8761752d767c0dfa80f2a25c787
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
2017-03-27 08:31:31 -07:00
Paul Belanger f7289a5aca Remove legacy openstack settings from nodepool.yaml
Before os-client-config and shade, we would include cloud credentials
in nodepool.yaml. Now the time has come to remove these
settings in favor of using a local clouds.yaml file.

Change-Id: Ie7af6dcd56dc48787f280816de939d07800e9d11
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
2017-03-27 08:31:29 -07:00
Jenkins 30c4b46b48 Merge "Default config-drive to true" 2017-03-15 17:36:22 +00:00
Monty Taylor 066942a0ac Stop json-encoding the nodepool metadata
When we first started putting nodepool metadata into the server record
in OpenStack, we json encoded the data so that we could store a dict
into a field that only takes strings. We were also going to teach the
ansible OpenStack Inventory about this so that it could read the data
out of the groups list. However, ansible was not crazy about accepting
"attempt to json decode values in the metadata" since json-encoded
values are not actually part of the interface OpenStack expects - which
means one of our goals - ansible inventory groups based on
nodepool information - is no longer really a thing.

We could push harder on that, but we actually don't need the functionality
we're getting from the json encoding. The OpenStack Inventory has
supported comma separated lists of groups since before day one. And the
other nodepool info we're storing can be stored and fetched just as easily
with 4 different top-level keys as in a json dict - and it is
easier to read and deal with when just looking at server records.
Finally, nova has a 255 byte limit on the size of the value that can be
stored, so we cannot grow the information in the nodepool dict
indefinitely anyway.

Migrate the stored data into nodepool_ variables and a comma-separated
list for groups. Consume both forms, so that people upgrading will not
lose track of their existing stock of nodes.

Finally, we don't use snapshot_id anymore - so remove it.

Change-Id: I2c06dc7c2faa19e27d1fb1d9d6df78da45ffa6dd
2017-03-10 16:24:03 -05:00
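
A small illustration of the before/after metadata shape (the specific nodepool_ keys shown are examples of the prefix scheme, not an exhaustive list):

    import json

    # Old: one json-encoded blob under a single key (nova caps values at 255 bytes).
    old_meta = {
        "nodepool": json.dumps({"provider_name": "example-cloud",
                                "node_id": "42",
                                "groups": ["ubuntu-xenial", "bare"]}),
    }

    # New: flat nodepool_ keys plus a comma-separated group list, which the
    # OpenStack Ansible inventory already understands.
    new_meta = {
        "nodepool_provider_name": "example-cloud",
        "nodepool_node_id": "42",
        "groups": "ubuntu-xenial,bare",
    }
    print(old_meta, new_meta)
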
Paul Belanger a6f4f6be9b Add nodepool-id to provider section
Currently, while testing zuulv3, we want to share the
infracloud-chocolate provider between 2 nodepool servers.  The current
issue is that if we launch nodes from zuulv3-dev.o.o, nodepool.o.o will
detect the nodes as leaked and delete them.

A way to solve this is to create a per-provider 'nodepool-id' so that
an admin can configure 2 separate nodepool servers to share the same
tenant.  The big reason for doing this is so we don't have to stand
up a duplicate nodepool-builder and upload duplicate images.

Change-Id: I03a95ce7b8bf06199de7f46fd3d0f82407bec8f5
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
2017-02-27 15:16:57 -05:00
David Shrewsbury 3f42a89df9 Support launch failures in FakeProviderManager
Let's not use mock for testing launch failures. Instead, add an
attribute to FakeProviderManager that tells it how many times
successive calls to createServer() should fail.

Change-Id: Iba6f8f89de84b06d2c858b0ee69bc65c37ef3cf0
2017-02-21 12:59:53 -05:00
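
The counting-failures idea could look something like this (an illustrative stand-in, not the real FakeProviderManager):

    class FakeProviderManagerSketch:
        """Fail the next N createServer() calls, then succeed."""

        def __init__(self):
            self.createServer_fails = 0

        def createServer(self, name):
            if self.createServer_fails > 0:
                self.createServer_fails -= 1
                raise Exception("fake launch failure for %s" % name)
            return {"name": name, "status": "ACTIVE"}

    fake = FakeProviderManagerSketch()
    fake.createServer_fails = 2
    for i in range(3):
        try:
            print(fake.createServer("node-%d" % i))
        except Exception as e:
            print(e)
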
James E. Blair fe153656df Don't use taskmanagers in builder
ProviderManager is a TaskManager, and TaskManagers are intended
to serialize API requests to a single cloud from multiple threads.
Currently each worker in the builder has its own set of
ProviderManagers.  That means that we are performing cloud API calls
in parallel.  That's probably okay since we perform very few of them,
mostly image uploads and deletes.  And in fact, we probably want
to avoid blocking on image uploads.

However, there is a thread associated with each of these
ProviderManagers, and even though they are idle, in aggregate they
add up to a significant CPU cost.

This makes the use of a TaskManager by a ProviderManager optional
and sets the builder not to use it in order to avoid spawning these
useless threads.

Change-Id: Iaf6498c34a38c384b85d3ab568c43dab0bcdd3d5
2016-12-07 11:58:24 -08:00
Paul Belanger baf98e052b Use diskimage-builder checksum files
We recently added the ability for diskimage-builder to generate
checksum files. This means nodepool can validate DIBs and then pass
the contents to shade, saving shade from caclucating the checksums.

Change-Id: I4cd44bb83beb4839c2c2346af081638e61899d4d
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
2016-11-30 12:48:34 -05:00
Joshua Hesketh e14162da13 Merge branch 'master' into feature/zuulv3
Does not include changes to force image deletion or not-run webapp etc.

Change-Id: I74c6c2c575b29e61bb39dca36a71a747cd464587
2016-11-30 21:18:48 +11:00
Monty Taylor 9dbce5a757
Remove unused function make_image_dict
We don't use this any more.

Change-Id: Ib95ed58718a4bbf9ca46bfccc5f24a8211755270
2016-11-29 08:50:13 -06:00
Monty Taylor 919981b652 Unsubvert image and flavor caching
Recent shade allows users to pass in image and flavor to create_server
by name. This results in a potential extra lookup to find the image and
flavor. Since nodepool is not using shade caching, this is causing our
nodepool-level caching to be subverted. Although getting nodepool to
use shade caching is an eventual project, that would be bad scope creep for now.
Just pass in the objects themselves, which keeps shade from attempting to
look them up. In the case where we have an image_id, put it into a
dict so that shade treats it as an object passed in and not a thing that
needs to be treated like a name_or_id.

Depends-On: I4938037decf51001ab5789ee383f6c7ed34889b1
Change-Id: Ic70b19ad5baf25413e20a658163ca718dce63bee
2016-09-01 22:43:49 +00:00
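
A hedged example of the calling convention with shade - pass the cached flavor record and wrap a known image id in a dict so that no name_or_id lookup is performed (the cloud name, flavor record, and image id below are placeholders):

    import shade

    cloud = shade.openstack_cloud(cloud="mycloud")    # reads clouds.yaml

    cached_flavor = {"id": "42", "ram": 8192, "name": "m1.large"}   # from a local cache
    image_id = "11111111-2222-3333-4444-555555555555"               # placeholder id

    server = cloud.create_server(
        name="example-node",
        image={"id": image_id},   # dict form: treated as an object, not a name_or_id
        flavor=cached_flavor,     # object form: no flavor lookup by name
        config_drive=True,
        wait=True,
    )
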
Paul Belanger f1dfb117b0
Default config-drive to true
As we depend more and more on glean to help bootstrap a node, it is
possible for new clouds added to nodepool.yaml to be missing the
setting, which results in broken nodes and multiple configuration
updates.

As a result, we now default config-drive to true to make it easier to
bring nodes online.

Change-Id: I4e214ba7bc43a59ddffb4bfb50576ab3b96acf69
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
2016-08-01 16:54:49 -04:00
Monty Taylor 60e49f110b
Cleanup leaked floating ips periodically
We should not have leaked floating ips in a neutron setup. However,
it sometimes seems to happen around startup. It's
also safe in a neutron context to just clean the unattached ones, so
assume that sometimes clouds get into weird states and just clean them.

Change-Id: I1a30efb3b7994381592c2391881711d6b1f32dff
Depends-On: I93b0c7d0b0eefdfe0fb1cd4a66cdbba9baabeb09
2016-05-09 04:28:10 -05:00
James E. Blair c64e27be15 Make stopping more reliable
* Builders were interfering with the gear shutdown procedure
  by overriding the use of the 'running' variable on gear workers.
  Instead, just rely on the built-in shutdown process in the gear
  worker class.
* Have the builder shutdown provider managers as well.
* Correctly handle signals in the builder.
* Have the nodepool daemon shut down its gearman client.
* Use a condition object so that we can interrupt the main loop
  sleep and exit faster.

Both the builder and the daemon now exit cleanly on CTRL-C when
run in the foreground.

Change-Id: Iefd5ef7df74e701725f4bafe4df51b8276088fe5
2016-04-18 08:51:17 -07:00
James E. Blair 2e05f1850f Restore ability to run nodepoold with fakes
With OSC and shade patches, we lost the ability to run nodepoold
in the foreground with fakes.  This restores that ability.

The shade integration unit tests are updated to use the string
'real' rather than 'fake' in config files, as they are trying to
avoid actually using the nodepool fakes, and the use of the string
'fake' is what triggers their use in many cases.

Change-Id: Ia5d3c3d5462bc03edafcc1567d1bab299ea5d40f
2016-04-18 08:47:46 -07:00
Monty Taylor f0b0ba8a0a
Don't get extra flavor specs
It's not a big deal because we cache this - but we don't care at all
about the extra flavor specs, so skip fetching them for each of the
flavors.

Change-Id: Iff73bdbe598fcf7556eafc484325f79452975a4f
2016-04-16 11:15:48 -05:00
Jenkins c7f8c2be9f Merge "Pass extended network information in to occ/shade" 2016-04-14 20:17:47 +00:00
Monty Taylor 2a30810b2e
Pass extended network information in to occ/shade
We need to know which networks are public/private, which we already have
in nodepool, but were not passing in to the OCC constructor. We also
need to be able to indicate which network should be the target of NAT in
the case of multiple private networks, which can be done via
nat_destination and the new networks list argument support in OCC.
Finally, 'use_neutron' is purely the purview of shade now, so remove it.

Depends-On: I0d469339ba00486683fcd3ce2995002fa0a576d1
Change-Id: I70e6191d60e322a93127abf4105ca087b785130e
2016-04-14 13:27:09 -05:00
James E. Blair cb5a6908fb Only delete keypairs if needed
This restores some logic that was inadvertently removed in the
shade transition, without which, we issue an extra delete keypair
API call for every server delete.

Change-Id: Ib1f50c23d61c1d874f2b235fd57d2a2b0defd6c5
2016-04-01 10:15:16 -07:00
Monty Taylor df45798508 Remove unused functions
We don't use these in shade-world anymore.

Change-Id: Ib4771af9f9f30cfa27020282b6fb8f3823af0db8
2016-03-30 16:23:49 -07:00
Monty Taylor e1f4a12949 Use shade for all OpenStack interactions
We wrote shade as an extraction of the logic we had in nodepool, and
have since expanded it to support more clouds. It's time to start
using it in nodepool, since that will allow us to add more clouds
and also to handle a wider variety of them.

Making a patch series was too tricky because of the way fakes and
threading work, so this is everything in one stab.

Depends-On: I557694b3931d81a3524c781ab5dabfb5995557f5
Change-Id: I423716d619aafb2eca5c1748bc65b38603a97b6a
Co-Authored-By: James E. Blair <jeblair@linux.vnet.ibm.com>
Co-Authored-By: David Shrewsbury <shrewsbury.dave@gmail.com>
Co-Authored-By: Yolanda Robla <yolanda.robla-mota@hpe.com>
2016-03-26 10:23:25 +01:00
James E. Blair afdd58c10a Log shade inner exceptions
With the dependent change, shade now stores inner
exceptions if they occur.  Wrap our use of shade
with a context manager that logs the inner exceptions
in nodepool's own logging context.

Change-Id: I6be2422aa0352ee9f0ff7429ee6e66384c2b5d57
Depends-On: I33269743a8f62b863569130aba3cc9b5a8539aa0
2016-03-23 08:24:31 +01:00
Monty Taylor eed395d637 Be more specific in logging timeout exceptions
At the moment, grepping through logs to determine what's happening with
timeouts on a provider is difficult because for some errors the cause of
the timeout is on a different line than the provider in question.

Give each timeout a specific named exception, and then when we catch the
exceptions, log them specifically with node id, provider and then the
additional descriptive text from the timeout exception. This should
allow for easy grepping through logs to find specific instances of
types of timeouts - or of all timeouts. Also add a corresponding success
debug log so that comparative greps/counts are also easy.

Change-Id: I889bd9b5d92f77ce9ff86415c775fe1cd9545bbc
2016-03-04 17:42:09 -06:00
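
A rough sketch of the named-exception pattern described above (the exception names and the waiting helper are illustrative, not nodepool's exact classes):

    import logging

    logging.basicConfig()
    log = logging.getLogger("nodepool")

    class TimeoutException(Exception):
        """Base class so every timeout can be caught and logged uniformly."""

    class ServerCreateTimeoutException(TimeoutException):
        pass

    class IPAddTimeoutException(TimeoutException):
        pass

    def wait_for_server(node_id, provider):
        # Placeholder that always times out, to exercise the logging path.
        raise ServerCreateTimeoutException("Timeout waiting for server creation")

    try:
        wait_for_server("0000001", "example-provider")
    except TimeoutException as e:
        # One greppable line with node id, provider, and the specific timeout type.
        log.error("Node 0000001 in example-provider: %s: %s",
                  type(e).__name__, e)
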
Monty Taylor 536f7feab0 Add an error log with the server fault message
In case there is useful debug information in the server fault message,
log it so that we can try to track down why servers go away.

Change-Id: I33fd51cbfc110fdb1ccfa6bc30a421d527f2e928
2016-03-03 01:36:49 +00:00