Commit Graph

432 Commits

Author SHA1 Message Date
Zuul 617bbb229c Merge "Fix validate-tenants isolation" 2024-02-28 02:46:55 +00:00
James E. Blair c531adacae Add --keep-config-cache option to delete-state command
The circular dependency refactor will require deleting all of the
pipeline states as well as the event queues from ZK while zuul
is offline during the upgrade.  This is fairly close to the existing
"delete-state" command, except that we can keep the config cache.
Doing so will allow for a faster recovery since we won't need to
issue all of the cat jobs again in order to fetch file contents.

To facilitate this, we add a "--keep-config-cache" argument to
the "delete-state" command which will then remove everything under
/zuul except /zuul/config.

Also, speed up both operations by implementing a fast recursive
delete method which sends async delete ops depth first and only
checks their results at the end (as opposed to the standard kazoo
delete, which checks each operation as it is issued).
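
A minimal sketch of the approach (assuming a connected kazoo client; this
is illustrative, not Zuul's actual implementation):

    # Walk the tree depth first, queue async deletes children-before-parents
    # (ZooKeeper processes a session's requests in order), and only check
    # the results once everything has been submitted.
    def fast_recursive_delete(client, root):
        pending = []

        def _walk(path):
            for child in client.get_children(path):
                _walk(path + '/' + child)
            pending.append(client.delete_async(path))

        _walk(root)
        for result in pending:
            result.get()   # raises here if any delete failed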

This is added without a release note since it's not widely useful
and the upcoming change which requires its use will have a release
note with usage instructions.

Change-Id: I4db43e00a73f5e5b796261ffe7236ed906e6b421
2024-02-02 12:09:52 -08:00
James E. Blair fb7d24b245 Fix validate-tenants isolation
The validate-tenants scheduler subcommand is supposed to perform
complete tenant validation, and in doing so, it interacts with zk.
It is supposed to isolate itself from the production data, but
it appears to accidentally use the same unparsed config cache
as the production system.  This is mostly okay, but if the loading
paths are different, it could lead to writing cache errors into
the production file cache.

The error is caused because the ConfigLoader creates an internal
reference to the unparsed config cache and therefore ignores the
temporary/isolated unparsed config cache created by the scheduler.

To correct this, we will always pass the unparsed config cache
into the configloader.

Change-Id: I40bdbef4b767e19e99f58cbb3aa690bcb840fcd7
2024-01-31 14:58:45 -08:00
James E. Blair ebb7986c6f Client (old): don't translate null to 0000000
Like I9886cd44f8b4bae6f4a5ce3644f0598a73ecfe0a, have the zuul client
send actual null values for oldrev/newrev instead of 0000000 which
could lead to unintended behavior.

Change-Id: I44994426493d05a039b5a1051504958b36729c9d
2024-01-12 06:49:17 -08:00
Zuul 1edf5b6760 Merge "Fix delete-pipeline-state command" 2023-05-22 11:33:40 +00:00
Clark Boylan c1b0a00c60 Only check bwrap execution under the executor
The reason for this is that containers for zuul services need to run
privileged in order to successfully run bwrap. We currently only expect
users to run the executor as privileged, and the new bwrap execution
checks have broken other services as a result. (Other services load the
bwrap driver because it is a normal zuul driver and all drivers are
loaded by all services.)

This works around the problem by adding a check_bwrap flag to connection
setup and only setting it to true on the executor. A better long-term
follow-up would be to only instantiate the bwrap driver on the executor in
the first place. This can probably be accomplished by overriding the
ZuulApp configure_connections method in the executor and dropping bwrap
creation in ZuulApp.
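
A hedged, illustrative sketch of the shape of the change (class and helper
names are assumptions, not Zuul's actual code):

    import subprocess

    class ConnectionRegistry:
        def configure(self, config, check_bwrap=False):
            if check_bwrap:
                # Only the executor, which runs privileged, performs the
                # bwrap execution check.
                subprocess.run(['bwrap', '--version'], check=True)
            self.connections = self._load_drivers(config)

        def _load_drivers(self, config):
            # Placeholder for loading the normal driver connections.
            return {}

    # Executor:            registry.configure(config, check_bwrap=True)
    # Scheduler/web/etc.:  registry.configure(config)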

Temporarily stop running the quick-start job since it's apparently not
using speculative images.

Change-Id: Ibadac0450e2879ef1ccc4b308ebd65de6e5a75ab
2023-05-17 13:45:23 -07:00
Simon Westphahl cc2ff9742c Fix delete-pipeline-state command
This change also extends the test to verify that the pipeline change
list was re-created by asserting that the node exists in ZooKeeper.

Traceback (most recent call last):
  File "/home/westphahl/src/opendev/zuul/zuul/.nox/tests/bin/zuul-admin", line 10, in <module>
    sys.exit(main())
  File "/home/westphahl/src/opendev/zuul/zuul/zuul/cmd/client.py", line 1066, in main
    Client().main()
  File "/home/westphahl/src/opendev/zuul/zuul/zuul/cmd/client.py", line 592, in main
    if self.args.func():
  File "/home/westphahl/src/opendev/zuul/zuul/zuul/cmd/client.py", line 1045, in delete_pipeline_state
    PipelineChangeList.new(context)
  File "/home/westphahl/src/opendev/zuul/zuul/zuul/zk/zkobject.py", line 225, in new
    obj._save(context, data, create=True)
  File "/home/westphahl/src/opendev/zuul/zuul/zuul/zk/zkobject.py", line 507, in _save
    path = self.getPath()
  File "/home/westphahl/src/opendev/zuul/zuul/zuul/model.py", line 982, in getPath
    return self.getChangeListPath(self.pipeline)
AttributeError: 'PipelineChangeList' object has no attribute 'pipeline'

Change-Id: I8d7bf2fdb3ebf4790ca9cf15519dff4b761fbf2e
2023-04-26 15:58:32 +02:00
Zuul 987fba9f28 Merge "Fix prune-database command" 2023-03-30 01:49:54 +00:00
James E. Blair 7153505cd5 Fix prune-database command
This command had two problems:

* It would only delete the first 50 buildsets
* Depending on DB configuration, it may not have deleted anything, or it
  may have left orphan data.

We did not tell sqlalchemy to cascade delete operations, meaning that
when we deleted the buildset, we didn't delete anything else.

If the database enforces foreign keys (innodb, psql) then the command
would have failed.  If it doesn't (myisam) then it would have deleted
the buildset rows but not anything else.

The tests use myisam, so they ran without error and without deleting
the builds.  They check that the builds are deleted, but only through
the ORM via a joined load with the buildsets, and since the buildsets
are gone, the builds weren't returned.

To address this shortcoming, the tests now use distinct ORM methods
which return objects without any joins.  This would have caught
the error had it been in place before.

Additionally, the delete operation retained the default limit of 50
rows (set in place for the web UI), meaning that when it did run,
it would only delete the most recent 50 matching builds.

We now explicitly set the limit to a user-configurable batch size
(by default, 10,000 builds) so that we keep transaction sizes
manageable and avoid monopolizing database locks.  We continue deleting
buildsets in batches as long as any matching buildsets remain. This
should allow users to remove very large amounts of data without
affecting ongoing operations too much.
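
A minimal sketch of the two fixes (cascading deletes plus batched
deletion); the model and column names here are illustrative, not Zuul's
actual schema:

    import sqlalchemy as sa
    from sqlalchemy.orm import declarative_base, relationship

    Base = declarative_base()

    class BuildSet(Base):
        __tablename__ = 'example_buildset'
        id = sa.Column(sa.Integer, primary_key=True)
        # Cascade so deleting a buildset also deletes its builds.
        builds = relationship('Build', cascade='all, delete-orphan')

    class Build(Base):
        __tablename__ = 'example_build'
        id = sa.Column(sa.Integer, primary_key=True)
        buildset_id = sa.Column(sa.Integer,
                                sa.ForeignKey('example_buildset.id'))

    def prune(session, cutoff_id, batch_size=10000):
        # Keep deleting matching buildsets in batches until none remain.
        while True:
            batch = (session.query(BuildSet)
                     .filter(BuildSet.id < cutoff_id)
                     .limit(batch_size)
                     .all())
            if not batch:
                break
            for buildset in batch:
                session.delete(buildset)   # ORM cascade removes the builds
            session.commit()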

Change-Id: I4c678b294eeda25589b75ab1ce7c5d0b93a07df3
2023-03-29 17:12:13 -07:00
James E. Blair b1490b1d8e Avoid layout updates after delete-pipeline-state
The delete-pipeline-state command forces a layout update on every
scheduler, but that isn't strictly necessary.  While it may be helpful
for some issues, if it really is necessary, the operator can issue
a tenant reconfiguration after performing the delete-pipeline-state.

In most cases, where only the state information itself is causing a
problem, we can omit the layout updates and assume that the state reset
alone is sufficient.

To that end, this change removes the layout state changes from the
delete-pipeline-state command and instead simply empties and recreates
the pipeline state and change list objects.  This is very similar to
what happens in the pipeline manager _postConfig call, except in this
case, we have the tenant lock so we know we can write with impunity,
and we know we are creating objects in ZK from scratch, so we use
direct create calls.

We set the pipeline state's layout uuid to None, which will cause the
first scheduler that comes across it to (assuming its internal layout
is up to date) perform a pipeline reset (which is almost a noop on an
empty pipeline) and update the pipeline state layout to the current
tenant layout state.

Change-Id: I1c503280b516ffa7bbe4cf456d9c900b500e16b0
2023-03-01 13:54:46 -08:00
James E. Blair 7a8882c642 Set layout state event ltime in delete-pipeline-state
The delete-pipeline-state command updates the layout state in order
to force schedulers to update their local layout (essentially perform
a local-only reconfiguration).  In doing so, it sets the last event
ltime to -1.  This is reasonable for initializing a new system, but
in an existing system, when an event arrives at the tenant trigger
event queue it is assigned the last reconfiguration event ltime seen
by that trigger event queue.  Later, when a scheduler processes such
a trigger event after the delete-pipeline-state command has run, it
will refuse to handle the event since it arrived much later than
its local layout state.

This must then be corrected manually by the operator by forcing a
tenant reconfiguration.  This means that the system essentially suffers
the delay of two sequential reconfigurations before it can proceed.

To correct this, set the last event ltime for the layout state to
the ltime of the layout state itself.  This means that once a scheduler
has updated its local layout, it can proceed in processing old events.

Change-Id: I66e798adbbdd55ff1beb1ecee39c7f5a5351fc4b
2023-02-28 07:11:41 -08:00
James E. Blair 8f774043e6 Use importlib for versioning
The semver parsing in PBR doesn't handle the full suite of pep440
versions (for example: 1.2.3+foo1 is the pep440 recommended way
of handling local versions).

Since we aren't doing anything with the parsed versions anyway,
just return the string we get from importlib.
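
A minimal sketch of the approach (assuming Python 3.8+ importlib.metadata;
not necessarily Zuul's exact code):

    from importlib import metadata

    def get_version():
        try:
            # Return the installed version string as-is, without parsing it.
            return metadata.distribution('zuul').version
        except metadata.PackageNotFoundError:
            return 'unknown'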

Change-Id: I0a838c639333c40db5b12cd852b921f1b1c87fed
2023-01-23 10:51:08 -08:00
James E. Blair 3780ed548c Unpin JWT and use integer IAT values
PyJWT 2.6.0 began performing validation of iat (issued at) claims
in 9cb9401cc5

I believe the intent of RFC7519 is to support any numeric values
(including floating point) for iat, nbf, and exp; however, the
PyJWT library has made the assumption that the values should be
integers, and therefore when we supply an iat with decimal seconds,
PyJWT will round down when validating the value. In our unit tests,
this can cause validation errors.

In order to avoid any issues, we will round down the times that
we supply when generating JWT tokens and supply them as integers
in accordance with the robustness principle.
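
A hedged sketch of the idea (the claim set here is illustrative):

    import time
    import jwt  # PyJWT

    now = int(time.time())  # round down to whole seconds
    token = jwt.encode(
        {'iat': now, 'nbf': now, 'exp': now + 600, 'sub': 'admin'},
        'secret', algorithm='HS256')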

Change-Id: Ia8341b4d5de827e2df8878f11f2d1f52a1243cd4
2022-11-15 13:52:53 -08:00
James E. Blair 3a981b89a8 Parallelize some pipeline refresh ops
We may be able to speed up pipeline refreshes in cases where there
are large numbers of items or jobs/builds by parallelizing ZK reads.

Quick refresher: the ZK protocol is async, and kazoo uses a queue to
send operations to a single thread which manages IO.  We typically
call synchronous kazoo client methods which wait for the async result
before returning.  Since this is all thread-safe, we can attempt to
fill the kazoo pipe by having multiple threads call the synchronous
kazoo methods.  If kazoo is waiting on IO for an earlier call, it
will be able to start a later request simultaneously.

Quick aside: it would be difficult for us to use the async methods
directly since our overall code structure is still ordered and
effectively single threaded (we need to load a QueueItem before we
can load the BuildSet and the Builds, etc).

Thus it makes the most sense for us to retain our ordering by using
a ThreadPoolExecutor to run some operations in parallel.
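
A minimal sketch of the pattern (not the actual Zuul code), assuming a
connected, thread-safe kazoo client:

    from concurrent.futures import ThreadPoolExecutor

    def parallel_get(client, paths, max_workers=4):
        # Each worker makes a blocking kazoo call; the single kazoo IO
        # thread can have several requests in flight at once, so the
        # reads overlap instead of being issued strictly one at a time.
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            return list(pool.map(client.get, paths))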

This change parallelizes loading QueueItems within a ChangeQueue,
and also Builds/Jobs within a BuildSet.  These are the points in
a pipeline refresh tree which potentially have the largest number
of children and could benefit the most from the change, especially
if the ZK server has some measurable latency.

Change-Id: I0871cc05a2d13e4ddc4ac284bd67e5e3003200ad
2022-11-09 10:51:29 -08:00
James E. Blair 1eda9ccf96 Correct exit routine in web, merger
Change I216b76d6aaf7ebd01fa8cca843f03fd7a3eea16d unified the
service stop sequence but omitted changes to zuul-web.  Update
zuul-web to match and make its sequence more robust.

Also remove unnecessary sys.exit calls from the merger.

Change-Id: Ifdebc17878aa44d57996e4bdd46e49e6144b406b
2022-10-05 13:25:07 -07:00
James E. Blair 9a279725f9 Strictly sequence reconfiguration events
In the before times when we only had a single scheduler, it was
naturally the case that reconfiguration events were processed as they
were encountered and no trigger events which arrived after them would
be processed until the reconfiguration was complete.  As we added more
event queues to support SOS, it became possible for trigger events
which arrived at the scheduler to be processed before a tenant
reconfiguration caused by a preceding event to be complete.  This is
now even possible with a single scheduler.

As a concrete example, imagine a change merges which updates the jobs
which should run on a tag, and then a tag is created.  A scheduler
will process both of those events in succession.  The first will cause
it to submit a tenant reconfiguration event, and then forward the
trigger event to any matching pipelines.  The second event will also
be forwarded to pipeline event queues.  The pipeline events will then
be processed, and then only at that point will the scheduler return to
the start of the run loop and process the reconfiguration event.

To correct this, we can take one of two approaches: make the
reconfiguration more synchronous, or make it safer to be
asynchronous.  To make reconfiguration more synchronous, we would need
to be able to upgrade a tenant read lock into a tenant write lock
without releasing it.  The lock recipes we use from kazoo do not
support this.  While it would be possible to extend them to do so, it
would lead us further from parity with the upstream kazoo recipes, so
this approach is not used.

Instead, we will make it safer for reconfiguration to be asynchronous
by annotating every trigger event we forward with the last
reconfiguration event that was seen before it.  This means that every
trigger event now specifies the minimum reconfiguration time for that
event.  If our local scheduler has not reached that time, we should
stop processing trigger events and wait for it to catch up.  This
means that schedulers may continue to process events up to the point
of a reconfiguration, but will then stop.  The already existing
short-circuit to abort processing once a scheduler is ready to
reconfigure a tenant (where we check the tenant write lock contenders
for a waiting reconfiguration) helps us get out of the way of pending
reconfigurations as well.  In short, once a reconfiguration is ready
to start, we won't start processing tenant events anymore because of
the existing lock check.  And up until that happens, we will process
as many events as possible until any further events require the
reconfiguration.

We will use the ltime of the tenant trigger event as our timestamp.
As we forward tenant trigger events to the pipeline trigger event
queues, we decide whether an event should cause a reconfiguration.
Whenever one does, we note the ltime of that event and store it as
metadata on the tenant trigger event queue so that we always know what
the most recent required minimum ltime is (ie, the ltime of the most
recently seen event that should cause a reconfiguration).  Every event
that we forward to the pipeline trigger queue will be annotated to
specify that its minimum required reconfiguration ltime is that most
recently seen ltime.  And each time we reconfigure a tenant, we store
the ltime of the event that prompted the reconfiguration in the layout
state.  If we later process a pipeline trigger event with a minimum
required reconfigure ltime greater than the current one, we know we
need to stop and wait for a reconfiguration, so we abort early.

Because this system involves several event queues and objects each of
which may be serialized at any point during a rolling upgrade, every
involved object needs to have appropriate default value handling, and
a synchronized model api change is not helpful.  The remainder of this
commit message is a description of what happens with each object when
handled by either an old or new scheduler component during a rolling
upgrade.

When forwarding a trigger event and submitting a tenant
reconfiguration event:

The tenant trigger event zuul_event_ltime is initialized
from zk, so will always have a value.

The pipeline management event trigger_event_ltime is initialized to the
tenant trigger event zuul_event_ltime, so a new scheduler will write
out the value.  If an old scheduler creates the tenant reconfiguration
event, it will be missing the trigger_event_ltime.

The _reconfigureTenant method is called with a
last_reconfigure_event_ltime parameter, which is either the
trigger_event_ltime above in the case of a tenant reconfiguration
event forwarded by a new scheduler, or -1 in all other cases
(including other types of reconfiguration, or a tenant reconfiguration
event forwarded by an old scheduler).  If it is -1, it will use the
current ltime so that if we process an event from an old scheduler
which is missing the event ltime, or we are bootstrapping a tenant or
otherwise reconfiguring in a context where we don't have a triggering
event ltime, we will use an ltime which is very new so that we don't
defer processing trigger events.  We also ensure we never go backward,
so that if we process an event from an old scheduler (and thus use the
current ltime) then process an event from a new scheduler with an
older (than "now") ltime, we retain the newer ltime.

Each time a tenant reconfiguration event is submitted, the ltime of
that reconfiguration event is stored on the trigger event queue.  This
is then used as the min_reconfigure_ltime attribute on the forwarded
trigger events.  This is updated by new schedulers, and ignored by old
ones, so if an old scheduler processes a tenant trigger event queue, it
won't update the min ltime.  That will just mean that any events
processed by a new scheduler may continue to use an older ltime as
their minimum, which should not cause a problem.  Any events forwarded
by an old scheduler will omit the min_reconfigure_ltime field; that
field will be initialized to -1 when loaded on a new scheduler.

When processing pipeline trigger events:

In process_pipeline_trigger_queue we compare two values: the
last_reconfigure_event_ltime on the layout state which is either set
to a value as above (by a new scheduler), or will be -1 if it was last
written by an old scheduler (including in the case it was overwritten
by an old scheduler; it will re-initialize to -1 in that case).  The
event.min_reconfigure_ltime field will either be the most recent
reconfiguration ltime seen by a new scheduler forwarding trigger
events, or -1 otherwise.  If the min_reconfigure_ltime of an event is
-1, we retain the old behavior of processing the event regardless.
Only if we have a min_reconfigure_ltime > -1 and it is greater than
the layout state last_reconfigure_event_ltime (which itself may be -1,
and thus less than the min_reconfigure_ltime) do we abort processing
the event.
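
A hedged sketch of the comparison (the attribute names follow the commit
message; the surrounding code is illustrative):

    def should_defer(event, layout_state):
        # -1 means the value was written by an old scheduler (or is unset);
        # in that case we retain the old behavior and process the event.
        min_ltime = getattr(event, 'min_reconfigure_ltime', -1)
        last_ltime = getattr(layout_state,
                             'last_reconfigure_event_ltime', -1)
        return min_ltime > -1 and min_ltime > last_ltime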

(The test_config_update test for the Gerrit checks plugin is updated
to include an extra waitUntilSettled since a potential test race was
observed during development.)

Change-Id: Icb6a7858591ab867e7006c7c80bfffeb582b28ee
2022-07-18 10:51:59 -07:00
Zuul c37047fa92 Merge "Replace 'web' section with 'webclient'" 2022-07-01 08:10:57 +00:00
James E. Blair 603b826911 Add --wait-for-init scheduler option
This instructs the scheduler to wait until all tenants have been
initialized before processing pipelines.  This can be useful for
large systems with excess scheduler capacity to speed up a rolling
restart.

This also removes an unused instance variable from
SchedulerTestManager.

Change-Id: I19e733c881d1abf636674bf572f4764a0d018a8a
2022-06-18 07:57:49 -07:00
Vitaliy Lotorev ab68665f12 Replace 'web' section with 'webclient'
'web' section is used by zuul-web component while zuul REST API
client uses 'webclient' section.

Change-Id: I145c9270ca6676abd0d4977ce1c4c637d304a264
2022-06-05 17:47:17 +03:00
Zuul 6cb2692101 Merge "Add prune-database command" 2022-06-01 21:28:53 +00:00
James E. Blair 3ffbf10f25 Add prune-database command
This adds a zuul-admin command which allows operators to delete old
database entries.

Change-Id: I4e277a07394aa4852a563f4c9cdc39b5801ab4ba
2022-05-30 07:31:16 -07:00
James E. Blair 591d7e624a Unify service stop sequence
We still had some variations in how services stop.  Finger, merger,
and scheduler all used signal.pause in a while loop which is
incompatible with stopping via the command socket (since we would
always restart the pause).  Sending these components a stop or
graceful signal would cause them to wait forever.

Instead of using signal.pause, use the thread.join methods within
a while loop, and if we encounter a KeyboardInterrupt (C-c) during
the join, call our exit handler and retry the join loop.

This maintains the intent of the signal.pause loop (which is to
make C-c exit cleanly) while also being compatible with an internal
stop issued via the command socket.
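
A minimal sketch of the loop described above (not the actual Zuul code):

    def run_until_stopped(threads, exit_handler):
        # Join instead of signal.pause so an internal stop via the command
        # socket also unblocks us, while C-c still exits cleanly.
        while True:
            try:
                for thread in threads:
                    thread.join()
                return
            except KeyboardInterrupt:
                exit_handler()   # then retry the join loop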

The stop sequence is now unified across all components.  The executor
has an additional complication in that it forks a process to handle
streaming.  To keep a C-c shutdown clean, we also handle a keyboard
interrupt in the child process and use it to indicate the start of
a shutdown.  In the main executor process, we now close the socket
which is used to keep the child running and then wait for the child
to exit before the main process exits (so that the child doesn't
keep running and emit a log line after the parent returns control
to the terminal).

Change-Id: I216b76d6aaf7ebd01fa8cca843f03fd7a3eea16d
2022-05-28 10:27:50 -07:00
Matthieu Huin 57c78c08e1 Clarify zuul admin CLI scope
We have two CLIs: zuul-client for REST-related operations, which covers
tenant-scoped, workflow-modifying actions such as enqueue, dequeue and
promote; and zuul, which overlaps with zuul-client and also covers true admin
operations like ZooKeeper maintenance, config checking and issuing auth tokens.
This is a bit confusing for users and operators, and can induce code
duplication.

* Rename the zuul CLI to zuul-admin. zuul is still a valid endpoint
  and will be removed after next release.
* Print a deprecation warning when invoking the admin CLI as zuul
  instead of zuul-admin, and when running autohold-*, enqueue-*,
  dequeue and promote subcommands. These subcommands will need to be
  run with zuul-client after next release.
* Clarify the scopes and deprecations in the documentation.

Change-Id: I90cf6f2be4e4c8180ad0f5e2696b7eaa7380b411
2022-05-19 15:35:30 +02:00
James E. Blair 864a2b7701 Make a global component registry
We generally try to avoid global variables, but in this case, it
may be helpful to set the component registry as a global variable.

We need the component registry to determine the ZK data model API
version.  It's relatively straightforward to pass it through the
zkcontext for zkobjects, but we also may need it in other places
where we might alter processing of data we previously got from zk
(eg, the semaphore cleanup).  Or we might need it in serialize or
deserialize methods of non-zkobjects (for example, ChangeKey).

To account for all potential future uses, instantiate a global
singleton object which holds a registry and use that instead of
local-scoped component registry objects.  We also add a clear
method so that we can be sure unit tests start with clean data.
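
A hedged, illustrative sketch of the pattern (names are assumptions, not
Zuul's actual classes):

    class GlobalComponentRegistry:
        """Module-level holder for the component registry."""

        def __init__(self):
            self.registry = None

        def create(self, registry):
            self.registry = registry
            return self.registry

        def clear(self):
            # Lets unit tests start from clean data.
            self.registry = None

    COMPONENT_REGISTRY = GlobalComponentRegistry()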

Change-Id: Ib764dbc3a3fe39ad6d70d4807b8035777d727d93
2022-02-14 10:58:34 -08:00
James E. Blair a160484a86 Add zuul-scheduler tenant-reconfigure
This is a new reconfiguration command which behaves like full-reconfigure
but only for a single tenant.  This can be useful after connection issues
with code hosting systems, or potentially with Zuul cache bugs.

Because this is the first command-socket command with an argument, some
command-socket infrastructure changes are necessary.  Additionally, this
includes some minor changes to make the services more consistent around
socket commands.

Change-Id: Ib695ab8e7ae54790a0a0e4ac04fdad96d60ee0c9
2022-02-08 14:14:17 -08:00
James E. Blair 29fbee7375 Add a model API version
This is a framework for making upgrades to the ZooKeeper data model
in a manner that can support a rolling Zuul system upgrade.

Change-Id: Iff09c95878420e19234908c2a937e9444832a6ec
2022-01-27 12:19:11 -08:00
Zuul 4808bc025e Merge "Add "zuul delete-pipeline-state" command" 2022-01-27 11:26:26 +00:00
James E. Blair 65da4efdd4 Add "zuul delete-pipeline-state" command
This is intended to aid Zuul developers who are diagnosing a bug
with a running Zuul and who have determined that Zuul may be able to
correct the situation and resume if a pipeline is completely reset.

It is intrusive and not at all guaranteed to work.  It may make things
worse.  It's basically just a convenience method to avoid firing up
the REPL and issuing Python commands directly.  I can't enumerate the
requirements where it may or may not work.  Therefore the documentation
recommends against its use and there is no release note included.

Nevertheless, we may find it useful to have such a command during
a crisis in the future.

Change-Id: Ib637c31ff3ebbb2733a4ad9b903075e7b3dc349c
2022-01-26 16:36:04 -08:00
James E. Blair 215c96f500 Remove gearman server
The gearman server is no longer required.  Remove it from tests and
the scheduler.

Change-Id: I34eda003889305dadec471930ab277e31d78d9fe
2022-01-25 06:44:17 -08:00
James E. Blair 3aa546da86 Remove the rpc client and listener
These are not used any more, remove them from the scheduler and
the "zuul" client.

Change-Id: I5a3217dde32c5f8fefbb0a7a8357a737494d2956
2022-01-25 06:44:09 -08:00
Tristan Cacqueray cb13bdb90c Remove ZooKeeperClient for tenant-conf-check
This change enables running the tenant-conf-check without access
to the ZooKeeper service.

Change-Id: I285cd44f86e5d900715b052b13bf7b2bc58e77a4
2022-01-10 20:04:02 +00:00
James E. Blair 704fef6cb9 Add readiness/liveness probes to prometheus server
To facilitate automation of rolling restarts, configure the prometheus
server to answer readiness and liveness probes.  We are 'live' if the
process is running, and we are 'ready' if our component state is
either running or paused (not initializing or stopped).

The prometheus_client library doesn't support this directly, so we need
to handle this ourselves.  We could create yet another HTTP server that
each component would need to start, or we could take advantage of the
fact that the prometheus_client is a standard WSGI service and just
wrap it in our own WSGI service that adds the extra endpoints needed.
Since that is far simpler and less resource intensive, that is what
this change does.

The prometheus_client will actually return the metrics on any path
given to it.  In order to reduce the chances of an operator configuring
a liveness probe with a typo (eg '/healthy/ready') and getting the
metrics page served with a 200 response, we restrict the metrics to
only the '/metrics' URI which is what we specified in our documentation,
and also '/' which is very likely accidentally used by users.
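
A hedged sketch of the wrapper (the probe paths and state names are
assumptions beyond what the text above states; not Zuul's actual code):

    from prometheus_client import make_wsgi_app

    def make_health_app(get_state):
        metrics_app = make_wsgi_app()

        def app(environ, start_response):
            path = environ.get('PATH_INFO', '/')
            if path == '/health/live':
                # Live as long as the process can answer at all.
                start_response('200 OK', [('Content-Type', 'text/plain')])
                return [b'OK']
            if path == '/health/ready':
                ready = get_state() in ('running', 'paused')
                status = '200 OK' if ready else '503 Service Unavailable'
                start_response(status, [('Content-Type', 'text/plain')])
                return [b'OK' if ready else b'NOT READY']
            if path in ('/metrics', '/'):
                # Only serve metrics on the documented path (and '/').
                return metrics_app(environ, start_response)
            start_response('404 Not Found',
                           [('Content-Type', 'text/plain')])
            return [b'']
        return app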

Change-Id: I154ca4896b69fd52eda655209480a75c8d7dbac3
2021-12-09 07:37:29 -08:00
Clark Boylan 5b1ba567c8 Prevent duplicate config file entries
It is currently possible to list default zuul config file paths in the
extra-config-paths config directive. Doing so will duplicate the configs
in Zuul which can cause problems. Prevent this entirely via
configuration validation.
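
A minimal sketch of the validation (the schema shape and the default path
list are assumptions, not Zuul's actual code):

    import voluptuous as vs

    DEFAULT_CONFIG_PATHS = ('zuul.yaml', 'zuul.d/', '.zuul.yaml', '.zuul.d/')

    def extra_config_paths(paths):
        if any(p in DEFAULT_CONFIG_PATHS for p in paths):
            raise vs.Invalid(
                'extra-config-paths may not include a default config path')
        return paths

    project_schema = vs.Schema(
        {'extra-config-paths': extra_config_paths}, extra=vs.ALLOW_EXTRA)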

Note: There has been a bit of refactoring to ensure that the voluptuous
schema is validated when reading the config. This ensures that an
invalid config doesn't produce hard-to-understand error messages because
loadTPCs() has attempted to process configuration that isn't valid.
Instead, we can catch schema errors early and report them with
human-friendly messages.

Change-Id: I07e9d4d3614cbc6cdee06b2866f7ae41d7779135
2021-11-15 15:16:25 -08:00
Simon Westphahl 59edeaf3d1 Use pipeline summary from Zookeeper in zuul-web
With this change zuul-web will generate the status JSON on its own by
directly using the data from Zookeeper. This includes the event queue
lengths as well as the pipeline summary.

Change-Id: Ib80d9c019a15dd9de9d694cb62fd34030016c311
2021-11-10 09:49:48 +01:00
Felix Edel 791c99f64f Load system config and tenant layouts in zuul-web
This uses the configloader in zuul-web to load the system config and
tenant layouts directly from ZooKeeper.

Doing so will allow us to provide the necessary information for most API
endpoints directly in zuul-web without the need to ask the scheduler via
RPC for it.

Change-Id: I4fe19c4e41f3357a07b2fda939c5ffb4e7055e37
2021-11-10 09:25:45 +01:00
Felix Edel 3029b16489 Make the ConfigLoader work independently of the Scheduler
This is an early preparation step for removing the RPC calls between
zuul-web and the scheduler.

We want to format the status JSON and do the job freezing (job freezing
API) directly in zuul-web without utilising the scheduler via RPC. In
order to make this work, zuul-web must instantiate a ConfigLoader.
Currently this would require a scheduler instance, which is not available
in zuul-web, so we have to make this parameter optional.

Change-Id: I41214086aaa9d822ab888baf001972d2846528be
2021-11-10 09:15:53 +01:00
Felix Edel 2c900c2c4a Split up registerScheduler() and onLoad() methods
This is an early preparation step for removing the RPC calls between
zuul-web and the scheduler.

In order to do so we must initialize the ConfigLoader in zuul-web which
requires all connections to be available. Therefore, this change ensures
that we can load all connections in zuul-web without providing a
scheduler instance.

To avoid unnecessary traffic from a zuul-web instance, the onLoad()
method initializes the change cache only if a scheduler instance is
available on the connection.

Change-Id: I3c1d2995e81e17763ae3454076ab2f5ce87ab1fc
2021-11-09 09:17:43 +01:00
Clark Boylan d7bca47d35 Cleanup empty secrets dirs when deleting secrets
The zuul delete-keys command can leave us with empty org and project
dirs in ZooKeeper. When this happens, the zuul export-keys command
complains about secrets not being present. Address this by checking if
the project dir and org dir should be cleaned up when calling
delete-keys.
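
A hedged sketch of the cleanup (the keystorage path layout is an
assumption), assuming a connected kazoo client:

    def delete_project_keys(client, connection, org, project):
        base = f'/keystorage/{connection}'
        # Remove the project's keys and its directory node.
        client.delete(f'{base}/{org}/{project}', recursive=True)
        # If the org node is now empty, remove it as well so export-keys
        # doesn't trip over an orphaned, key-less directory.
        if not client.get_children(f'{base}/{org}'):
            client.delete(f'{base}/{org}')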

Note this happened to OpenDev after renaming all projects from foo/* to
bar/*, orphaning the org-level portion of the name.

Change-Id: I6bba5ea29a752593b76b8e58a0d84615cc639346
2021-10-19 09:38:21 -07:00
Albin Vass 6e96fcfc67 Exit successfully when manipulating project keys
Change-Id: Idb2918fab4d17aa611bf81f42d5b86abc865514f
2021-09-21 16:04:29 +02:00
James E. Blair e2dd49b5be Add delete-state command to delete everything from ZK
This will give operators a tool for manual recovery in case of
emergency.

Change-Id: Ia84beb08b685f59a24f76cb0b6adf518f6e64362
2021-08-24 10:07:41 -07:00
James E. Blair a0af6004de Add copy-keys and delete-keys zuul client commands
These can be used when renaming a project.

Change-Id: I98cf304914449622f9db48651b83e0744b676498
2021-08-24 10:07:41 -07:00
James E. Blair 49d945b5bd Add commands to export/import keys to/from ZK
This removes the filesystem-based keystore in favor of only using
ZooKeeper.  Zuul will no longer load missing keys from the filesystem,
nor will it write out decrypted copies of all keys to the filesystem.

This is more secure since it allows sites better control over when and
where secret data are written to disk.

To provide for system backups to aid in disaster recovery in the case
that the ZK data store is lost, two new scheduler commands are added:

* export-keys
* import-keys

These write the password-protected versions of the keys (in fact, a
raw dump of the ZK data) to the filesystem, and read the same data
back in.  An administrator can invoke export-keys before performing a
system backup, and run import-keys to restore the data.

A minor doc change recommending the use of ``zuul-scheduler stop`` was
added as well; this is left over from a previous version of this change
but warrants updating.

This also removes the test_keystore test file; key generation is tested
in test_v3, and key usage is tested by tests which have encrypted secrets.

Change-Id: I5e6ea37c94ab73ec6f850591871c4127118414ed
2021-08-24 10:07:41 -07:00
Zuul 970e4ed438 Merge "Move sigterm_method to zuul.conf" 2021-08-23 18:22:27 +00:00
Zuul 812c2250bc Merge "Add graceful stop environment variable" 2021-08-23 18:22:25 +00:00
James E. Blair d80555a453 Move sigterm_method to zuul.conf
Instead of using an environment variable for this particular
setting, do what we do for every other aspect of Zuul behavior:
use a setting in zuul.conf.

Change-Id: I5c075dce5b6ad23adc863252af67d7ee7ad0d4d5
2021-08-12 14:22:39 -07:00
Zuul cdcb895323 Merge "Move fingergw config to fingergw" 2021-07-24 12:54:01 +00:00
James E. Blair 7256c52c34 Add graceful stop environment variable
Add an environment variable that lets users (especially container
image users) easily select which way they would like zuul-executor
to handle SIGTERM.

Previous change: I8d42ea1c19f3e627bbfd32a535493de0cb8a04be

Change-Id: Ie15b333712302a3d8f468b083d071d29a7b9043d
2021-07-09 10:36:22 -07:00
James E. Blair 657d8c6fb2 Revert "Add graceful stop environment variable"
This reverts commit f1fca03fd1.

This needs more discussion.

Change-Id: Iebf5c01e4436899a9d6e37150337dcdb4cf9705f
2021-07-09 10:25:47 -07:00
Zuul 2743cb269b Merge "Add graceful stop environment variable" 2021-07-09 16:15:18 +00:00
James E. Blair f1fca03fd1 Add graceful stop environment variable
Add an environment variable that lets users (especially container
image users) easily select which way they would like zuul-executor
to handle SIGTERM.

Change-Id: I8d42ea1c19f3e627bbfd32a535493de0cb8a04be
2021-07-09 08:02:15 -07:00