Commit Graph

29 Commits

Author SHA1 Message Date
James E. Blair 7262ef7f6f Include job_uuid in NodeRequests
This is part of the circular dependency refactor.  It updates the
NodeRequest object to include the job_uuid in addition to the job_name
(which is temporarily kept for backwards compatibility).  When node
requests are completed, we now look up the job by uuid if supplied.
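
A minimal sketch of the lookup order (the classes here are simplified
stand-ins for illustration, not Zuul's actual implementations):

  # Illustrative sketch: prefer the uuid lookup; fall back to the
  # (temporary) job_name for backwards compatibility.
  from dataclasses import dataclass
  from typing import List, Optional

  @dataclass
  class Job:
      uuid: str
      name: str

  @dataclass
  class NodeRequest:
      job_uuid: Optional[str]
      job_name: str  # kept temporarily for backwards compatibility

  def find_job(jobs: List[Job], request: NodeRequest) -> Optional[Job]:
      if request.job_uuid is not None:
          return next(
              (j for j in jobs if j.uuid == request.job_uuid), None)
      return next((j for j in jobs if j.name == request.job_name), None)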

Change-Id: I57d4ab6c241b03f76f80346b5567600e1692947a
2023-12-20 10:44:04 -08:00
James E. Blair 80f2d5ae7d Fix race in test_node_request_canceled
Even though our nodepool is paused, it still sees events and so adds
them to the provisioned_requests list.  Previously, this test only
passed when it ran quickly enough to beat the arrival of the event.

To make this safer, allow the event to arrive and assert that even
after the request is deleted, we still only saw one request and
it remains in the 'requested' state (i.e., it was never fulfilled after
we canceled it).
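
Roughly, the safer pattern looks like this (the helper names are
hypothetical; the real test uses Zuul's test fixtures):

  # Hypothetical sketch: instead of racing the event, let it arrive,
  # then assert the canceled request was never fulfilled.
  def test_node_request_canceled(self):
      request = self.submitNodeRequest()   # hypothetical helper
      self.cancelNodeRequest(request)      # hypothetical helper
      self.waitForNodepoolEvents()         # let the event arrive
      seen = self.fake_nodepool.provisioned_requests
      self.assertEqual(1, len(seen))
      self.assertEqual('requested', seen[0]['state'])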

Change-Id: I92996842e1205c7cde880c20d3350c05b2ca1edf
2021-10-08 09:28:50 -07:00
Felix Edel 38776452bb Don't use the AnsibleJob in the nodepool client
This change follows up on a few TODOs left by the lock/unlock nodes on
executor change.

When locking the nodes on the executor we used the AnsibleJob as a
replacement for the old build parameter that was provided to the
nodepool client methods as they were originally called by the scheduler.

However, the AnsibleJob class should only be used internally by the
executor, so we now provide all parameters directly to the nodepool
methods.

This also annotates the logger in the updated nodepool client methods
and fixes an outdated method signature in
test_scheduler.TestSemaphore.test_semaphore_zk_error.

Remove two comments about storing timestamps on the build request in
ZooKeeper as this doesn't make much sense. It sounded like a good idea
in the beginning, but with the current solution, the scheduler doesn't
need to care about the build request anymore after it was submitted
(except for canceling/cleanup purposes) and the result data is
self-contained.

Change-Id: I2d1005f69904c6ace8f79523133f382af0024c52
2021-09-10 10:55:01 -07:00
James E. Blair aee6ef6f7f Report nodepool resource stats gauges in scheduler
We currently report nodepool resource usage whenever we use or return
nodes.  This now happens on the executors, and they don't have a
global view of all nodes used.  The schedulers do, and they already
have a periodic stats reporting method.

Shift the reporting of node resource gauges to the scheduler.  To make
this efficient, use a tree cache for nodes.  Because node records
alone don't have enough information to tie them back to a tenant or
project, use the new user_data field on the Node object to store that
info when we mark a node in use.  Also, store the zuul system id on
the node, so that we can ensure we're only reporting nodes that belong
to us.

Update the node list in the REST API to use the cache as well, and
also filter its results by zuul system id and tenant.
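
A rough sketch of the reporting idea, assuming a local node cache and
a statsd-style client (all names here are illustrative):

  # Illustrative sketch: report resource gauges only for nodes that
  # belong to this Zuul system, attributing them to tenants via the
  # user_data recorded when the node was marked in use.
  def report_node_resource_gauges(statsd, cached_nodes, zuul_system_id):
      totals = {}  # tenant name -> {resource name -> amount}
      for node in cached_nodes:
          if node.get('zuul_system') != zuul_system_id:
              continue  # not one of ours
          tenant = (node.get('user_data') or {}).get('tenant_name')
          if tenant is None:
              continue
          for key, value in (node.get('resources') or {}).items():
              tenant_totals = totals.setdefault(tenant, {})
              tenant_totals[key] = tenant_totals.get(key, 0) + value
      for tenant, resources in totals.items():
          for key, value in resources.items():
              statsd.gauge(
                  f'zuul.nodepool.resources.tenant.{tenant}.{key}', value)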

Depends-On: https://review.opendev.org/807362
Change-Id: I9d0987b250b8fb54b3b937c86db327d255e54abd
2021-09-10 10:54:59 -07:00
James E. Blair b41f467340 Remove internal nodepool request cache
The internal zuul.nodepool.Nodepool.requests dictionary is used so
the scheduler can keep track of its requests.  Since we will have
multiple schedulers emitting requests, we can't use that any more.
Remove any remaining uses of it.

The NodeRequest uid was only used to index that dictionary (and
was used to persist a request across resubmission).  Since it isn't
needed any more, it is removed.

Change-Id: I7c82485d95979c6c9a246c3dc3954bae3c65ac13
2021-09-10 10:53:47 -07:00
James E. Blair 514f62ea31 Refactor the checkNodeRequest method
We perform some checks which aren't necessary any more.  This
method is better thought of as a method of getting a nodeset from
a fulfilled node request, so update it accordingly.

Change-Id: I1113820115af68b706b6fe06d6d03cd35ae6b382
2021-09-10 08:46:42 -07:00
Felix Edel 4e2985638c Add node request cache to zk nodepool interface
This adds a TreeCache to the ZK nodepool interface; it's nearly
identical to the one on the nodepool side.
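
For flavor, kazoo's TreeCache recipe keeps a local, watch-driven
mirror of a ZooKeeper subtree; a minimal standalone example (Zuul's
cache layers its own deserialization and bookkeeping on top):

  # Minimal kazoo TreeCache example; prints subtree changes as the
  # cache observes them.
  from kazoo.client import KazooClient
  from kazoo.recipe.cache import TreeCache, TreeEvent

  client = KazooClient(hosts='localhost:2181')
  client.start()

  cache = TreeCache(client, '/nodepool/requests')

  def listener(event):
      if event.event_type in (TreeEvent.NODE_ADDED,
                              TreeEvent.NODE_UPDATED,
                              TreeEvent.NODE_REMOVED):
          print(event.event_type, event.event_data.path)

  cache.listen(listener)
  cache.start()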

Co-Authored-By: James E. Blair <jim@acmegating.com>
Change-Id: Ie972c397cf235d637619d1e40c5e7ff78431ac0d
2021-09-06 15:26:39 -07:00
James E. Blair e225a28fa5 Make node requests persistent
The original Nodepool protocol specified that node requests should
be ephemeral; that way, if the requestor crashed before accepting
the nodes, the request would automatically be cleaned up and the
nodes returned.  This doesn't comport with multiple schedulers, as
we will soon expect schedulers to stop and start routinely while
we want the node requests they spawn to persist and be handled by
other schedulers.

Fortunately, Nodepool doesn't really care if the request is
ephemeral or not.  So we'll drop the "ephemeral" flag.

But in the short term, we will be stopping the scheduler and that
will leave orphan node requests.  And even in the long term, we
may have a complete Zuul system shutdown or even a bug which may
leak node requests, so we still need a way of deleting node requests
which don't belong.  To handle that, we add a cleanup routine, run
immediately on startup and hourly thereafter, which looks for node
requests created by this Zuul system that don't correspond to any
queue entries.  We create a new UUID to identify the Zuul system
and store it in ZK (so that if Nodepool has any other users we
don't delete their requests).
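
In outline, the cleanup might look like this (the ZK accessors are
hypothetical placeholders for Zuul's actual API):

  # Illustrative outline: delete any request created by this Zuul
  # system that no longer corresponds to a queue entry.
  def cleanup_node_requests(zk, zuul_system_id, outstanding_ids):
      for request_id in zk.getNodeRequestIds():     # hypothetical API
          request = zk.getNodeRequest(request_id)   # hypothetical API
          if request is None:
              continue
          if request.requestor != zuul_system_id:
              continue  # someone else's request; leave it alone
          if request_id in outstanding_ids:
              continue  # still referenced by a queue item
          zk.deleteNodeRequest(request_id)          # hypothetical API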

We no longer need to resubmit requests on connection loss, so tests
addressing that behavior are removed.

Change-Id: Ie22e99ef71cbe6b31d40c25a21498c1e867ca777
2021-09-03 16:17:15 -07:00
James E. Blair dbe13ce076 Remove nodeset from NodeRequest
To make things simpler for schedulers to handle node provisioned
events for node requests which they may not have in their local
pipeline state, we need to make the pipeline storage of node requests
simpler.  That starts by removing the nodeset object as an attribute
of the NodeRequest object.  This means that the scheduler can work
with a node request object without relying on having the associated
nodeset.  It also simplifies the ZooKeeper code that deserializes
NodeRequests (as it doesn't have to create fake NodeSet objects too).
And finally, it simplifies what must be stored in the pipeline
and queue item structures, which will also come in handy later.

Two tests designed to verify that the request->nodeset magic
deserialization worked have been removed since they are no longer
applicable.

Change-Id: I70ae083765d5cd9a4fd1afc2442bf22d6c52ba0b
2021-09-02 09:29:44 -07:00
James E. Blair d87a9a8b8f Clear nodeset when re-submitting node requests
We encountered an issue where Zuul:

* submitted a node request
* nodepool fulfilled it
* zuul received the ZK watch and refreshed the NodeRequest object
* zuul submitted the node provisioned event to the event queue
* ZK was disconnected/reconnected
* zuul processed the node provisioned event
* zuul found the node request no longer existed (because it's ephemeral)
* zuul resubmitted the node request

Because the NodeRequest object had the provisioned node information
attached to it, the re-submitted request was created with an
existing 'nodes' list.  Nodepool appended to that list and fulfilled
the new request (which requested 1 but received 2 nodes).  This caused
an exception in Zuul's nodepool request watch callback, which caused
Zuul to ignore that and all future updates to the node request.

To address this, we make a new copy of the nodeset without any allocated
node info when re-submitting a request.
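
Conceptually, the fix is something like this (getNodes() and the node
attributes are assumptions for illustration):

  # Conceptual sketch: resubmit with a copy of the nodeset containing
  # only the requested labels, discarding allocated node info.
  import copy

  def reset_nodeset_for_resubmit(nodeset):
      fresh = copy.deepcopy(nodeset)
      for node in fresh.getNodes():  # assumed accessor
          node.id = None             # drop the allocated node id
      return fresh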

This contains an unrelated change to the event id handling from an earlier
revision; it is kept because it will simplify future changes which eliminate
the node request cache altogether.

Change-Id: I72f5ed7ad53e44d77b37870546daf61b8a4e7e09
2021-08-04 12:20:11 -07:00
Felix Edel 8f8f4f1898 Switch to ZooKeeper backed NodesProvisionedEvents
This puts the NodesProvisionedEvents into ZooKeeper. With this, all
result events are now in ZooKeeper.

To make the NodesProvisionedEvent serializable, we cannot store the
whole NodeRequest object in the event anymore. Instead we are using the
request.id, job name and the buildset UUID to look up the corresponding
NodeRequest object from the buildset when the NodesProvisionedEvent is
handled.
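
The serializable event then carries only references, roughly:

  # Rough sketch: the event references the request by id instead of
  # embedding the whole NodeRequest object.
  import json

  class NodesProvisionedEvent:
      def __init__(self, request_id, job_name, buildset_uuid):
          self.request_id = request_id
          self.job_name = job_name
          self.buildset_uuid = buildset_uuid

      def toDict(self):
          return {'request_id': self.request_id,
                  'job_name': self.job_name,
                  'buildset_uuid': self.buildset_uuid}

      @classmethod
      def fromDict(cls, data):
          return cls(data['request_id'], data['job_name'],
                     data['buildset_uuid'])

      def serialize(self):
          return json.dumps(self.toDict()).encode('utf8')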

As a result, we have to find a way to return the NodeSet even in
cases where the NodeRequest or the buildset can no longer be found (e.g. due
to a gate reset). In this case we look up the NodeRequest directly from
ZooKeeper and provide a faked NodeSet which allows us to retrieve the
node information from Zookeeper (via the update mechanism of the
NodeRequest in the Nodepool client).

Finally, we can get rid of the local result event queue in the scheduler
as now all result events are in ZooKeeper. Together with that the
`zuul.scheduler.eventqueues.result` gauge was also removed.

Change-Id: Ib5e0f13d25a21ebad908d38f0201e92b704a1c85
2021-07-13 14:57:01 -07:00
Felix Edel 040c5c8032 Move parent provider determination to pipeline manager
Moving the parent provider determination into the pipeline manager
allows us to remove the buildset and job objects from the NodeRequest
constructor. This way we can fully serialize the NodeRequest to
ZooKeeper and restore it without missing important information.

This also has an impact on the NodeRequest's priority property. As this
is only needed to determine the znode path when the NodeRequest is
submitted, we can provide it directly as a parameter to the
submitNodeRequest call (and the related update callbacks).

To ensure that Nodepool doesn't strip this additional information when
it fulfills the NodeRequest, we use the new "requestor_data" field which
is implemented in [1].
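
The round trip via requestor_data might look roughly like this (the
field names are assumptions for illustration):

  # Illustrative sketch: stash identifying info in requestor_data so
  # nodepool hands it back untouched when it fulfills the request.
  def request_to_zk_dict(request):
      return {
          'node_types': [n.label for n in request.nodes],  # assumed
          'requestor': request.requestor,
          'requestor_data': {
              'buildset_uuid': request.buildset_uuid,
              'tenant_name': request.tenant_name,
              'pipeline_name': request.pipeline_name,
              'job_name': request.job_name,
          },
      }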

To make this work, we also have to look up the buildset by its UUID from
the active tenants and pipelines when the NodesProvisioned event is
handled in the scheduler. Something similar was already done for
handling the other result events as well.

[1]: https://review.opendev.org/c/zuul/nodepool/+/798746/

Depends-On: https://review.opendev.org/c/zuul/nodepool/+/798746/
Change-Id: Id794643dcf26b0565499d20adba99d3b0518fdf1
2021-07-08 13:27:08 -07:00
Felix Edel fee46c25bc Lock/unlock nodes on executor server
Currently, the nodes are locked in the scheduler/pipeline manager before
the actual build is created in the executor client. When the nodes are
locked, the corresponding NodeRequest is also deleted.

With this change, the executor will lock the nodes directly before
starting the build and unlock them when the build is completed.

To keep the order of events intact, the nodepool.acceptNodes() method is
split up into two:
    1. nodepool.acceptNodeRequest() does most of the old acceptNodes()
       method except for locking the nodes and deleting the node
       request. It is called on the scheduler side when the
       NodesProvisionedEvent is handled (which is also where
       acceptNodes() was previously called).
    2. nodepool.acceptNodes() is now called on the executor side when
       the job is started. It locks the nodes and deletes the node
       request in ZooKeeper.
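
Schematically, the split looks like this (heavily condensed; the real
methods do much more):

  # Condensed schematic of the split described above.
  class Nodepool:
      def acceptNodeRequest(self, request):
          # Scheduler side: runs when the NodesProvisionedEvent is
          # handled.  Bookkeeping only; nodes are no longer locked
          # and the request is no longer deleted here.
          request.accepted = True

      def acceptNodes(self, request):
          # Executor side: runs when the job starts.
          self.lockNodes(request)
          self.deleteNodeRequest(request)

      def lockNodes(self, request):
          pass  # placeholder for the ZooKeeper node-locking logic

      def deleteNodeRequest(self, request):
          pass  # placeholder for the ZooKeeper request deletion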

Finally, it's also necessary to move the autohold processing to the
executor, as this requires a lock on the node. To allow processing of
autoholds, the executor now also determines the build attempts and sets
the RETRY_LIMIT result if necessary.

Change-Id: I7392ce47e84dcfb8079c16e34e0ed2062ebf4136
2021-07-01 05:46:02 +00:00
Felix Edel ba7f81be2d Provide statsd client to Nodepool and make scheduler optional
To lock/unlock the nodes directly in the executor server, we have to
make the Nodepool API work without a scheduler instance.

To keep the stats emitting intact, we provide a statsd client directly
to the Nodepool instance.

This leaves only one place where the scheduler is used in the Nodepool
class, which is the onNodesProvisioned() callback.
This callback won't be necessary anymore when the nodes are locked on
the executor and thus this function call and the scheduler parameter
itself can be removed.
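
The decoupling, in sketch form (the constructor shape is illustrative):

  # Illustrative sketch: Nodepool takes a statsd client directly, and
  # the scheduler becomes optional (only the onNodesProvisioned()
  # callback still needs it, and that goes away later).
  class Nodepool:
      def __init__(self, zk_client, statsd=None, scheduler=None):
          self.zk_client = zk_client
          self.statsd = statsd
          self.scheduler = scheduler

      def emitStats(self, request):
          if self.statsd:
              self.statsd.incr('zuul.nodepool.requests.fulfilled')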

Change-Id: I3f3e4bfff08e244f68a9be7c6a4efcc194a23332
2021-04-30 12:12:28 +02:00
Felix Edel 71a990d42a Move setupZK() helper function to BaseTestCase class
As we are going to move initialization of the ZooKeeper connection to
the server classes rather than the cmd classes, this should help to
provide the necessary parameters to the various components in the test
cases.

This also removes some duplicate code as the setupZK() functionality can
now also  be used by other TestClasses which don't inherit from
ZuulTestCase.

Change-Id: Iee64b7173ceb78967e48dcd6f51bc998ce9469c5
2021-03-08 06:50:03 -08:00
James E. Blair 74a9c9de9b Use ZooKeeper TLS in tests
This mirrors the configuration in Nodepool for using TLS-enabled
ZooKeeper in tests.  We use the ensure-zookeeper role in order
to get a newer ZooKeeper than is supplied in bionic.

Change-Id: I14413fccbc9a6a7a75b6233d667e2a1d2856d894
2021-03-08 06:49:57 -08:00
Felix Edel b4d8a4e74b Simplify ZooKeeper client initialization
The ZooKeeperClient now provides a fromConfig() method that parses all
necessary configuration values to instantiate a ZooKeeperClient.
Previously, this needed to be done in every component to initialize the
connection to ZooKeeper.
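
The pattern is a classmethod factory, roughly like this (the option
names mirror zuul.conf's [zookeeper] section; details are
illustrative):

  # Rough sketch: parse the connection settings in one place instead
  # of repeating the parsing in every component.
  class ZooKeeperClient:
      def __init__(self, hosts, tls_cert=None, tls_key=None,
                   tls_ca=None):
          self.hosts = hosts
          self.tls_cert = tls_cert
          self.tls_key = tls_key
          self.tls_ca = tls_ca

      @classmethod
      def fromConfig(cls, config):
          # 'config' is a parsed zuul.conf (configparser-style object)
          return cls(
              hosts=config.get('zookeeper', 'hosts'),
              tls_cert=config.get('zookeeper', 'tls_cert', fallback=None),
              tls_key=config.get('zookeeper', 'tls_key', fallback=None),
              tls_ca=config.get('zookeeper', 'tls_ca', fallback=None),
          )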

Change-Id: I5fa4ddab5f85c658291f1262ee0392a60086846e
2021-02-21 07:41:43 -08:00
Jan Kubovy d518e56208 Prepare Zookeeper for scale-out scheduler
This change is a common root for other ZooKeeper-related changes
regarding the scale-out scheduler. With ZooKeeper becoming a central
component, "maxClientCnxns" must be increased.

Since the ZooKeeper class is expected to grow significantly (ZooKeeper
is becoming a central part of Zuul), the ZooKeeper class (zk.py) is
split into a zk module here to avoid the current god class.

Also, the zookeeper log is copied to the "zuul_output_dir".

Change-Id: I714c06052b5e17269a6964892ad53b48cf65db19
Story: 2007192
2021-02-15 14:44:18 +01:00
David Shrewsbury f6b6991af2 Add caching of autohold requests
Change-Id: I94d4a0d2e8630d360ad7c5d07690b6ed33b22f75
2019-09-16 10:46:36 -04:00
James E. Blair 0b00c4685b Set relative priority of node requests
Add a relative_priority field to node requests and continuously
adjust it for each queue item based on the contents of queues.

This allows for a more fair distribution of build resources between
different projects.  The first item in a pipeline from a given
project (or, in the case of a dependent pipeline, group of projects)
has equal priority to all other first items of other projects in
the same pipeline.  Second items have a lower priority, etc.
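
The idea in miniature (attribute names are illustrative; Zuul
recomputes this continuously as queues change):

  # Miniature illustration: an item's relative priority is the number
  # of items ahead of it in the queue from the same project (or
  # project group, in a dependent pipeline).  Lower is better.
  def relative_priority(item, queue_items):
      ahead = 0
      for other in queue_items:
          if other is item:
              break
          if other.project_key == item.project_key:  # assumed attribute
              ahead += 1
      return ahead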

Depends-On: https://review.openstack.org/620954
Change-Id: Id3799aeb2cec6d96a662bfa394a538050f7ea947
2018-11-30 12:50:34 +01:00
Paul Belanger ecb0b84f11 Add support for shared ansible_host in inventory
Today it is possible to create the following ansible inventory file:

  [foo]
  foo01 ansible_host=192.168.1.1

  [bar]
  bar01 ansible_host=192.168.1.1

Which allows a user to create multiple host aliases for a single
connection. This could be done with Ansible groups; however, there are
some functional differences in how Ansible runs in that configuration.

We could also request 2 nodes from nodepool; however, in this case it
would be a waste of CI resources because every alias would need a new
node from nodepool.

Now, a user is able to alias multiple host names to a single node from
nodepool by doing the following:

  nodeset:
    nodes:
      - name:
          - foo
          - bar
        label: ubuntu-xenial

This would result in a single node request from nodepool, but create
an inventory file with 2 aliases sharing a single ansible_host
variable.
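
The generated inventory would then contain something like:

  foo ansible_host=192.168.1.1
  bar ansible_host=192.168.1.1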

Change-Id: I674d6baac26852ee1503feb1ed16c279bf773688
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
2017-11-18 16:22:01 -05:00
James E. Blair 4f1731ba86 Emit some nodepool stats
Change-Id: I7bc3914e8b8d64afee061c002dcc9cca5dd1ef4d
2017-10-13 15:56:59 -07:00
David Shrewsbury 94e95886e2 Handle double node locking snafu
If our queue processing is slow, and we lose the ZooKeeper session
after a node request has been fulfilled, but before we actually accept
the nodes, we need to be aware of this and not try to use the nodes
given to us.

Also pass the request ID along in the event queue since the actual
request object can have its ID changed out from underneath us on a
resubmit. Compare this ID with the request's current ID, and also
verify that the request still exists.

Furthermore, when we lock nodes, let's make sure that they are actually
allocated to the request we are processing.
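
A condensed sketch of those safeguards (the ZK method names are
assumptions):

  # Condensed sketch: verify the request still exists under its
  # original ID, and only lock nodes actually allocated to it.
  def accept_nodes(zk, request, event_request_id):
      if request.id != event_request_id:
          raise RuntimeError('Request ID changed (resubmitted?)')
      if not zk.nodeRequestExists(request):       # assumed API
          raise RuntimeError('Request no longer exists')
      for node in request.nodes:
          if node.allocated_to != request.id:
              raise RuntimeError('Node not allocated to this request')
          zk.lockNode(node)                       # assumed API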

Change-Id: Id89f6542afcf3f5d4a0b392b5cb8cf21ec3f6865
2017-10-05 17:26:59 -04:00
Clark Boylan ffe8f8bb81 Cleanup zookeeper and fake nodepool in nodepool tests
The nodepool tests set up zookeeper connections and a fake nodepool.
These test resources spawned threads that were leaking and causing
subsequent tests to run with zookeeper and nodepool threads running.
Clean that up.

Change-Id: Ib799d4a40eb51e7bbe71673a082f345235c41019
2017-04-25 10:19:31 -07:00
Clark Boylan 246201835f Use current class name on super call
In order to get the method resolution correct for setUp in the tests,
we need to call super on the current class, not the parent.
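
The difference, in brief:

  # Calling super() with the parent class skips the parent's own
  # setUp in the method resolution order; the current class is
  # correct.
  import unittest

  class ParentTestCase(unittest.TestCase):
      def setUp(self):
          super(ParentTestCase, self).setUp()
          self.resource = 'set up by parent'

  class ChildTestCase(ParentTestCase):
      def setUp(self):
          # Wrong (the old bug): super(ParentTestCase, self).setUp()
          # would skip ParentTestCase.setUp entirely.
          super(ChildTestCase, self).setUp()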

Change-Id: I1cf7f45c107a1fad2b70e3b18dc23b8bcc6c4b33
2017-04-25 10:14:41 -07:00
Clark Boylan 621ec9a22b Put test id in zk chroot path
In an effort to help debug leaked zk chroots, add the test id to the
chroot path so that we can see which tests are leaking.

Change-Id: I27047ecfa69d59964ac9ad4a03941b9826099420
2017-04-10 12:36:10 +00:00
James E. Blair 8b2a14704f Use hostname in Nodepool requests
Rather than identifying a Nodepool request as originating from
'zuul', identify it with the hostname of the zuul scheduler.  This
way the operator of a site with multiple zuuls can identify whence
requests originated.

Change-Id: I69aaca90d2bc3fac5fe55cdd9c57d724f6e103b4
2017-03-06 13:44:10 -08:00
James E. Blair 0d5a36e3ff Remove zk host list parsing
Just pass through the zookeeper host list string unparsed, since
that's what we're going to expect from our config file.

Change-Id: Ife8fd97860ad35c793ef956adbb9d626569f60bf
2017-02-21 11:08:47 -05:00
James E. Blair 2a8e0fa5f1 Move tests into test/unit
This makes room for a sibling directory for nodepool functional tests.

Change-Id: Iace94d313edb04192ac23a533ed967f076410980
2017-01-24 10:18:38 -08:00