Commit Graph

38 Commits

Author SHA1 Message Date
James E. Blair 1f026bd49c Finish circular dependency refactor
This change completes the circular dependency refactor.

The principal change is that queue items may now include
more than one change simultaneously in the case of circular
dependencies.

In dependent pipelines, the two-phase reporting process is
simplified because it happens during processing of a single
item.

In independent pipelines, non-live items are still used for
linear dependencies, but multi-change items are used for
circular dependencies.

Previously changes were enqueued recursively and then
bundles were made out of the resulting items.  Since we now
need to enqueue entire cycles in one queue item, the
dependency graph generation is performed at the start of
enqueuing the first change in a cycle.

Some tests exercise situations where Zuul is processing
events for old patchsets of changes.  The new change query
sequence mentioned in the previous paragraph necessitates
more accurate information about out-of-date patchsets than
the previous sequence; therefore, the Gerrit driver has been
updated to query and return more data about non-current
patchsets.

This change is not backwards compatible with the existing
ZK schema, and will require Zuul systems to delete all pipeline
states during the upgrade.  A later change will implement
a helper command for this.

All backwards compatibility handling for the last several
model_api versions which were added to prepare for this
upgrade has been removed.  In general, all model data
structures involving frozen jobs are now indexed by the
frozen job's uuid and no longer include the job name since
a job name no longer uniquely identifies a job in a buildset
(either the uuid or the (job name, change) tuple must be
used to identify it).
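
A minimal, hypothetical sketch of that indexing (FrozenJob and BuildSet
here are illustrative stand-ins, not Zuul's actual classes): jobs live
in a dict keyed by uuid, and a name-based lookup must also supply the
change.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class FrozenJob:
    name: str
    change: object
    uuid: str = field(default_factory=lambda: uuid.uuid4().hex)


class BuildSet:
    def __init__(self):
        # Indexed by the frozen job's uuid; names alone are no longer
        # unique because one queue item may hold several changes.
        self.jobs = {}

    def addJob(self, job):
        self.jobs[job.uuid] = job

    def getJob(self, name, change):
        # Either the uuid or the (job name, change) tuple identifies
        # a job within the buildset.
        for job in self.jobs.values():
            if job.name == name and job.change is change:
                return job
        return None
```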

Job deduplication is simplified and now only needs to
consider jobs within the same buildset.

The fake github driver had a bug (fakegithub.py line 694) where
it did not correctly increment the check run counter, so our
tests that verified that we closed out obsolete check runs
when re-enqueuing were not valid.  This has been corrected, and
doing so necessitated some changes around quiet dequeuing
when we re-enqueue a change.

The reporting in several drivers has been updated to support
reporting information about multiple changes in a queue item.

Change-Id: I0b9e4d3f9936b1e66a08142fc36866269dc287f1
Depends-On: https://review.opendev.org/907627
2024-02-09 07:39:40 -08:00
Simon Westphahl c963526560
Add Zuul event id to merge completed events
Return the Zuul event ID that is already part of the merge request
along with the merge result event so logs can be correlated.

Change-Id: I018709cd4d4afa562e6851d0d52c1ddd7583dc62
2023-08-08 12:02:36 +02:00
Simon Westphahl f1e3d67608
Trace merge requests and merger operations
The span info for the different merger operations is stored on the
request and will be returned to the scheduler via the result event.

This also adds the request UUID to the "refstat" job so that we can
attach that as a span attribute.

Change-Id: Ib6ac7b5e7032d168f53fe32e28358bd0b87df435
2022-09-19 11:25:49 +02:00
James E. Blair 458ba317fd Add pipeline-based merge op metrics
So that operators can see in aggregate how long merge, files-changes,
and repo-state merge operations take in certain pipelines, add
metrics for the merge operations themselves (these exclude the
overhead of pipeline processing and job dispatching).
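
A rough illustration of the kind of measurement involved (the statsd
client usage is standard, but the metric key layout shown here is an
assumption, not Zuul's actual names):

```python
import time

import statsd

client = statsd.StatsClient('localhost', 8125)


def timed_merge_op(pipeline, op_name, func, *args, **kwargs):
    # Time one merger operation and report it under a per-pipeline
    # key (key layout is illustrative only).
    start = time.monotonic()
    try:
        return func(*args, **kwargs)
    finally:
        elapsed_ms = (time.monotonic() - start) * 1000
        client.timing('zuul.pipeline.%s.merger.%s' % (pipeline, op_name),
                      elapsed_ms)
```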

Change-Id: I8a707b8453c7c9559d22c627292741972c47c7d7
2022-07-12 10:25:59 -07:00
James E. Blair 61cb275480 Report which repo failed initial merge ops
When the initial merge job for a queue item fails, users typically
see a message saying "this project or one of its dependencies failed
to merge".  To help users and/or administrators more quickly identify
the problem, include connection project and change information in
a warning message posted to the code review system.

Change-Id: If1bced80b87b908f63867083efb306ebe02ed1ee
2022-02-20 13:06:39 -08:00
James E. Blair 66008900a8 Send synthetic merge completed events on cleanup
When a merger crashes, the scheduler identifies merge jobs which
were left in an incomplete state and cleans them up.  However, there
may be queue items waiting for merge complete events, and nothing
generates those in this case.

Update the merge job cleanup procedure to mimic the executor job
cleanup procedure which, in addition to deleting the incomplete job
requests, also creates synthetic complete events in order to prompt
the scheduler to resume processing.
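
Sketched with hypothetical helper names (lostRequests, remove, and the
result dict layout are assumptions, not Zuul's actual API):

```python
def cleanup_lost_merge_requests(merger_api, result_event_queue):
    # For each merge request whose merger went away, delete the
    # request and push a synthetic "completed" result so any queue
    # item waiting on it resumes processing instead of hanging.
    for request in merger_api.lostRequests():
        merger_api.remove(request)
        result_event_queue.put({
            'request': request.uuid,
            'merged': False,   # treat the lost job as unsuccessful
            'updated': False,
        })
```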

Change-Id: Idea384f636a0cd9e8c82ee92d3f5a65bef0889f2
2021-09-20 10:37:39 -07:00
James E. Blair 97a76de403 Fix race involving job request locks
It's possible for the following sequence to occur (prefixed by
thread ids):

2> process job request cache update

1> finish job
1> set job request state to complete
1> unlock job request
1> delete job request
1> delete job request lock

2> get cached list of running jobs for lostRequests, start examining job
2> check if the job is unlocked (this will re-create the lock dir and return true)
2> attempt to set job request state to complete (this will raise JobRequestNotFound)
2> bail

This leaves a lock node lying around.  We have a cleanup process that
will eventually remove it in production, but its existence can cause
the clean-state checks at the end of unit tests to fail.

To correct this:

a) Try to avoid re-creating the lock (though this is not possible in all cases)
b) If we encounter a JobRequestNotFound error in the cleanup, attempt to
   delete the job nonetheless (so that we re-delete the lock dir)

The remove method is also made entirely idempotent to support this.
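
A minimal sketch of such an idempotent removal using kazoo (the paths
and function name are assumptions, not Zuul's actual API):

```python
from kazoo.exceptions import NoNodeError


def remove_request(zk_client, request_path, lock_path):
    # Delete the request and its lock node; treat "already gone" as
    # success so the cleanup stays idempotent even when another
    # actor has already removed part of it.
    for path in (request_path, lock_path):
        try:
            zk_client.delete(path, recursive=True)
        except NoNodeError:
            pass
```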

Change-Id: I49ad5c38a3c6cbaf0962e805b6c228e36b97a3d2
2021-09-14 09:10:34 -07:00
Simon Westphahl 5e78afd6f9 Fix wrong call to unlock requests in merger client
Change-Id: Ic519132f211dc3613023e2bc2bd8f11b29c9ac42
2021-09-06 07:15:14 +02:00
James E. Blair 6a0b5c419c Several merger cleanups
This change contains several merger-related cleanups which seem
distinct but are intertwined.

* Ensure that the merger API space in ZK is empty at the end of all
  tests.  This assures us that we aren't leaking anything.
* Move some ZK utility functions into the base test class.
* Remove the extra zk_client created in the component registry test
  since we can use the normal ZK client.
* The null result value in the merger server is initialized earlier to
  make sure that it is initialized for use in the exception handler.
* The test_branch_delete_full_reconfiguration leaked a result node
  because one of the cat jobs fails, and later cat jobs are run but
  ignored.

To address the last point, we need to make a change to the cat job
handling.  Currently, when a cat job fails, the exception bubbles up
and we simply ignore all the remaining jobs.  The mergers will run
them, write results to ZK, but no one will see those results.  That
would be fine, except that we created a "waiter" node in ZK to
indicate we want to see those results, and as long as it exists, the
results won't be deleted by the garbage collector, yet we are no
longer waiting for them, so we won't delete them either.

To correct that, we store the merge job request path on the job
future.  Then, when the first cat job fails, we "cancel" all the cat
jobs.  That entails deleting the merge job request if we are able (to
save the mergers from having to do useless work), and regardless of
whether that succeeds, we delete the waiter node in ZK.  If a cat job
happens to be running (and if there's more than one, like in this test
case, it likely is), it will eventually complete and write its result
data.  But since we have removed the waiter node, the periodic cleanup
task will detect it as leaked data and delete it.

Change-Id: I49a459debf5a6c032adc60b66bbd8f6a5901bebe
2021-08-19 15:01:49 -07:00
James E. Blair a729d6c6e8 Refactor Merger/Executor API
The merger and executor APIs have a lot in common, but they behave
slightly differently.  A merger sometimes needs to return results.
An executor needs to have separate queues for zones and be able to
pause or cancel jobs.

This refactors them both into a common class which can handle job
state changes (like pause/cancel) and return results if requested.

The MergerApi can subclass this fairly trivially.

The ExecutorApi adds an intermediate layer which uses a
DefaultKeyDict to maintain a distinct queue for every zone and then
transparently dispatches method calls to the queue object for
that zone.
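
The DefaultKeyDict idea in a minimal form (illustrative, not the
actual class; ZonedExecutorQueue in the usage comment is hypothetical):

```python
class DefaultKeyDict(dict):
    # Like collections.defaultdict, except the factory receives the
    # missing key, so each zone name can get its own queue object.
    def __init__(self, factory):
        super().__init__()
        self.factory = factory

    def __missing__(self, key):
        value = self.factory(key)
        self[key] = value
        return value


# Usage:
#   queues = DefaultKeyDict(lambda zone: ZonedExecutorQueue(zone))
#   queues['us-east'].submit(request)  # queue created on first access
```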

The ZK paths for both are significantly altered in this change.

Change-Id: I3adedcc4ea293e43070ba6ef0fe29e7889a0b502
2021-08-06 15:40:46 -07:00
Felix Edel 8038f9f75c Execute merge jobs via ZooKeeper
This is the second part of I767c0b4c5473b2948487c3ae5bbc612c25a2a24a.
It uses the MergerAPI.

Note: since we no longer have a central gearman server where we can
record all of the merge jobs, some tests now consult the merger api
to get the list of merge jobs which were submitted by that scheduler.
This should generally be equivalent, but something to keep in mind
as we add multiple schedulers.

Change-Id: I1c694bcdc967283f1b1a4821df7700d93240690a
2021-08-06 15:40:41 -07:00
James E. Blair 04ac8287b6 Match tag items against containing branches
To try to approach a more intuitive behavior for jobs which apply
to tags but are defined in-repo (or even for centrally defined
jobs which should behave differently on tags from different branches),
look up which branches contain the commit referenced by a tag and
use that list in branch matchers.

If a tag item is enqueued, we look up the branches which contain
the commit referenced by the tag.  If any of those branches match a
branch matcher, the matcher is considered to have matched.

This means that if a release job is defined on multiple branches,
the branch variant from each branch the tagged commit is on will be
used.

A typical case is for a tagged commit to appear in exactly one branch.
In that case, the most intuitive behavior (the version of the job
defined on that branch) occurs.

A less typical but perfectly reasonable case is that there are two
identical branches (ie, stable has just branched from master but not
diverged).  In this case, if an identical commit is merged to both
branches, then both variants of a release job will run.  However, it's
likely that these variants are identical anyway, so the result is
apparently the same as the previous case.  However, if the variants
are defined centrally, then they may differ while the branch contents
are the same, causing unexpected behavior when both variants are
applied.

If two branches have diverged, it will not be possible for the same
commit to be added to both branches, so in that case, only one of
the variants will apply.  However, tags can be created retroactively,
so that even if a branch has diverged, if a commit in the history of
both branches is tagged, then both variants will apply, possibly
producing unexpected behavior.

Considering that the current behavior is to apply all variants of
jobs on tags all the time, the partial reduction of scope in the most
typical circumstances is probably a useful change.
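
A rough sketch of the matching idea using GitPython (the matcher
semantics are simplified; Zuul's real branch matchers are richer):

```python
import re

import git


def tag_matches_branch(repo_path, tag_sha, branch_pattern):
    # A branch matcher matches a tag item if any branch containing
    # the tagged commit matches the pattern.
    repo = git.Repo(repo_path)
    output = repo.git.branch('--contains', tag_sha)
    branches = [line.strip().lstrip('* ').strip()
                for line in output.splitlines() if line.strip()]
    return any(re.fullmatch(branch_pattern, b) for b in branches)
```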

Change-Id: I5734ed8aeab90c1754e27dc792d39690f16ac70c
Co-Authored-By: Tobias Henkel <tobias.henkel@bmw.de>
2020-03-06 13:29:18 -08:00
Tobias Henkel 5f423346aa
Filter out unprotected branches from builds if excluded
When working with GitHub Enterprise, the recommended working model is
branch & pull within the same repo. This is especially necessary for
workflows that combine multiple repos in a single workspace. This has
the side effect that those repos can contain a large number of
branches that will never be part of a job. Having many branches in a
repo can have a large impact on the executor performance so exclude
them from the repo state if we exclude them in the tenant config. This
change only affects branches, not tags or other references.

Change-Id: Ic8e75fa8bf76d2e5a0b1779fa3538ee9a5c43411
2019-06-25 20:49:54 +02:00
Tobias Henkel 7639053905
Annotate merger logs with event id
If we have an event we should also submit its id to the merger so
we're able to trace merge operations via an event id.

Change-Id: I12b3ab0dcb3ec1d146803006e0ef644e485a7afe
2019-05-17 06:11:04 +02:00
Tobias Henkel e69c9fe97b
Make git clone timeout configurable
When dealing with large repos or slow connections to the scm the
default clone timeout of 5 minutes may not be sufficient. Thus a
configurable clone/fetch timeout can make it possible to handle those
repos.
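
The idea, sketched with a plain subprocess call rather than Zuul's
actual merger code (the default of 300 seconds mirrors the 5 minute
value mentioned above; the function itself is illustrative):

```python
import subprocess


def clone_with_timeout(url, dest, timeout=300):
    # Kill the clone if it runs longer than the configured timeout
    # (in seconds); large repos or slow connections may need more.
    try:
        subprocess.run(['git', 'clone', url, dest],
                       check=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        raise RuntimeError('cloning %s exceeded %ss' % (url, timeout))
```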

Change-Id: I0711895806b7cbcc8b9fa3ba085bcf79d7fb6665
2019-01-31 11:17:05 +01:00
Zuul 91e7e680a1 Merge "Use gearman client keepalive" 2019-01-28 20:09:30 +00:00
Tobias Henkel 8bfc0cd409
Delay Github fileschanges workaround to pipeline processing
Github's pull request files API only returns at most the first 300
changed files of a PR, in alphabetical order.  Change
I10a593e26ac85b8c12ca9c82051cad809382f50a introduced a workaround that
queries the file list from the mergers within the github event
loop.  While this was a minimally invasive approach, it can cause
multi-minute delays in the github event queue.

This can be fixed by making this query asynchronous and delaying it to
the pipeline processing. This query is now handled the same way as
merge requests.

Change-Id: I9c77b35f0da4d892efc420370c04bcc7070c7676
Depends-On: https://review.openstack.org/625596
2018-12-18 13:30:14 +01:00
Tobias Henkel fb4c6402a4
Use gearman client keepalive
If the gearman server vanishes (e.g. due to a VM crash), some clients
like the merger may not notice that it is gone. They just wait forever
for data to be received on an inactive connection. In our case the VM
containing the zuul-scheduler crashed and after the restart of the
scheduler all mergers were waiting for data on the stale connection
which blocked a successful scheduler restart.  Using tcp keepalive we
can detect that situation and let broken inactive connections be
killed by the kernel.
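
Roughly what enabling TCP keepalive on a client socket looks like (the
interval values are examples, and the TCP_KEEP* constants are
platform-specific, e.g. Linux):

```python
import socket


def enable_keepalive(sock, idle=60, interval=30, count=5):
    # Probe an idle connection so the kernel eventually closes a dead
    # one instead of letting the client wait forever for data.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)
```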

Depends-On: I8589cd45450245a25539c051355b38d16ee9f4b9
Change-Id: I30049d59d873d64f3b69c5587c775827e3545854
2018-12-11 21:28:59 +01:00
Fabien Boucher 194a2bf237 Git driver
This patch improves the existing git driver by adding
a refs watcher thread.  This refs watcher looks at
refs added, deleted, or updated and triggers a ref-updated
event.

When a ref is updated and the related commits
from oldrev to newrev include a change to .zuul.yaml/zuul.yaml
or zuul.d/*.yaml, then tenants including that ref are reconfigured.

Furthermore the patch includes a triggering model.  Events are
sent to the scheduler so that jobs can be attached to a pipeline
and run.
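
A minimal polling sketch of such a watcher (git ls-remote based; the
callback and interval are illustrative, not the driver's actual
implementation):

```python
import subprocess
import time


def watch_refs(url, on_ref_updated, interval=60):
    # Poll the remote and report (ref, oldrev, newrev) for every ref
    # that appears, disappears, or moves between polls.
    known = {}
    while True:
        out = subprocess.run(['git', 'ls-remote', url], check=True,
                             capture_output=True, text=True).stdout
        current = {}
        for line in out.splitlines():
            sha, _, ref = line.partition('\t')
            current[ref] = sha
        for ref in set(known) | set(current):
            oldrev, newrev = known.get(ref), current.get(ref)
            if oldrev != newrev:
                on_ref_updated(ref, oldrev, newrev)
        known = current
        time.sleep(interval)
```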

Change-Id: I529660cb20d011f36814abe64f837945dd3f1f33
2017-12-15 14:32:40 +01:00
James E. Blair 3b5b335ca2 Abort reconfiguration when cat jobs fail
Currently, if a cat job fails during reconfiguration, we simply
proceed without that section of the config, which usually doesn't
work out well.  Instead, raise an exception which will abort the
reconfiguration.

Change-Id: I87f2d870f007e3df5f47c04ef49add27c8a0b554
2017-09-12 09:40:06 -06:00
James E. Blair 289f5930fa Ensure ref-updated jobs run with their ref
We were incorrectly preparing the current state of the repo for
ref-updated (eg, post) jobs.  This ensures that we run with the
actual supplied ref, even if the remote has moved on since then.

Change-Id: I52f05406246e6e39805fd8365412f3cb77fe3a0a
2017-08-02 16:56:18 -07:00
Tobias Henkel 34ee088603 Remove zuul_url from merger config
Currently the zuul_url is not used anywhere but is still a required
merger setting.  This removes it.

Change-Id: I627c8a18015f4c148c28d2f7e735b30cc1ef3862
2017-07-31 22:28:35 +02:00
Tristan Cacqueray 829e617bac Add support for zuul.d configuration split
This change implements the zuul_split spec to support configuration split in
a zuul.d directory.

Change-Id: I6bc7250b2045b73dfba109aa0b2f1ba5d66752b2
2017-07-10 05:13:42 +00:00
Tristan Cacqueray 91601d788e config: refactor config get default
This change adds a new get_default library procedure to simplify getting
the default value of a config object.
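
A sketch of what such a helper can look like on top of configparser
(illustrative; the actual signature may differ):

```python
import configparser


def get_default(config, section, option, default=None):
    # Return the option if it is present, otherwise the caller's
    # default.
    if config.has_option(section, option):
        value = config.get(section, option)
        if value is not None:
            return value
    return default


# Usage:
#   config = configparser.ConfigParser()
#   config.read('zuul.conf')
#   timeout = get_default(config, 'merger', 'git_timeout', 300)
```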

Change-Id: I0546b1175b259472a10690273af611ef4bad5a99
2017-06-17 02:00:50 +00:00
Paul Belanger 0a21f0a1d5
Add ssl support to gearman / gearman_server
Enable SSL support for gearman. We also created a new SSLZuulBaseTest
class to provide a simple way to use SSL end to end where possible. A
future patch will enable support in zookeeper.

Change-Id: Ia8b89bab475d758cc6a021988f8d79ead8836a9d
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
2017-06-14 10:10:45 -04:00
James E. Blair 1960d687c9 Use previously stored repo state on executor
When the initial speculative merge for a change is performed at
the request of the pipeline manager, the repo state used to
construct that merge is saved in a data structure.  Pass that
structure to the executor when running jobs so that, after cloning
each repo into the jobdir, the repos are made to appear the same
as those on the merger before it started its merge.  The subsequent
merge operations on the executor will repeat the same operations,
producing the same content (though the actual commits will be
different due to timestamps).

It would be more efficient to have the executors pull changes from
the mergers, however, that would require the mergers to run an
accessible git service, which is one of the things that adds
significant complexity to a zuul deployment.  This method only
requires that the mergers be able to initiate outgoing connections
to gearman and sources.

Because the initial merge may happen well before jobs are executed,
save the dependency chain for a given BuildSet when its configuration
is being finalized.  This will cause us to save not only the repository
configuration that the merger uses, but also the exact sequence of
changes applied on top of that state.  (Currently, we build the series
of changes we apply before running each job; however, the queue state
can change (especially if items are merged) in the period between the
initial merge and job launch.)

The initial merge is performed before we have a shadow layout for the
item, yet we must specify a merge mode for each project for which we
merge a change.  Currently, we are defaulting to the 'merge-resolve'
merge mode for every project during the initial speculative merge, but
then the secondary merge on the executor will use the correct merge
mode since we have a layout at that point.  With this change, where
we are trying to replicate the initial merge exactly, we can't rely
on that behavior any more.  Instead, when attempting to find the merge
mode to use for a project, we use the shadow layout of the nearest
item ahead, or else the current live layout, to find the merge mode,
and only if those fail, do we use the default.  This means that a change
to a project's merge-mode will not use that merge mode.  However,
subsequent changes will.  This seems to be the best we can do, short
of detecting this case and merging such changes twice.  This seems
rare enough that we don't need to do that.

The test_delayed_merge_conflict method is updated to essentially invert
the meaning of the test.  Since the old behavior was for the initial
merge check to be completely independent of the executor merge, this
test examined the case where the initial merge worked but between that
time and when the executor performed its merge, a conflicting change
landed.  That should no longer be possible since the executor merge
now uses the results of the initial merge.  We keep the test, but invert
its final assertion -- instead of checking for a merge conflict being
reported, we check that no merge conflict is reported.

Change-Id: I34cd58ec9775c1d151db02034c342bd971af036f
2017-05-24 14:19:14 -07:00
James E. Blair 34c7daaaa4 Store initial repo state in the merger
When we ask a merger to speculatively merge changes, record the
complete starting state of each repo (defined as all of the refs
other than Zuul refs) and return that at the completion of all
of the merges.

This will later be used so that when a pipeline manager asks a
merger to speculatively merge a change, the process can later
be repeated by the (potentially multiple) executors which will
end up running jobs for the change.  Between the time that the
merger runs and the jobs run, the underlying repos may have changed.
This ensures a consistent state throughout.

The facility which used saved zuul refs within the merger repo
to short-cut the merge sequence for an additional change added to
a previously completed merge sequence is removed, because in that
case, we would not be able to know the original repo state for the
earlier merge sequence.  This is slightly less efficient; however,
we are proposing removing zuul refs anyway due to the maintenance
burden they cause.
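
A minimal sketch of capturing such a state with GitPython (the
refs/zuul/ prefix reflects the Zuul refs mentioned above; the function
name is illustrative):

```python
import git


def capture_repo_state(path):
    # Map every ref path (branches, tags, remotes) to the commit it
    # points at, skipping Zuul's own working refs.
    repo = git.Repo(path)
    state = {}
    for ref in repo.refs:
        if ref.path.startswith('refs/zuul/'):
            continue
        state[ref.path] = ref.commit.hexsha
    return state
```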

Change-Id: If0215d53c3b08877ded7276955a55fc5e617b244
2017-05-24 14:19:13 -07:00
Jenkins 9e958254bb Merge "Remove unused merger:update task" into feature/zuulv3 2017-05-20 17:21:35 +00:00
Clint Byrum e5c4afa94c Use gear Text interface
This makes the transition to python3 much smoother.

Change-Id: I9d8638dd98502bdd91cbe6caf3d94ce197f06c6f
Depends-On: If6bfc35d916cfb84d630af59f4fde4ccae5187d4
Depends-On: I93bfe33f898294f30a82c0a24a18a081f9752354
2017-05-19 06:39:15 -07:00
Jesse Keating ba2f93c5a2 Remove unused merger:update task
This task is no longer used and was the last thing that the merger
claimed to do that the executor did not.  Now what the merger does is
a subset of what the executor does, so mergers can scale out to handle
that work and leave the executor(s) free to focus on running jobs.

Change-Id: Ibc8638cf7c2109d9b32c27fb98fb84605f5d5ac0
Signed-off-by: Jesse Keating <omgjlk@us.ibm.com>
2017-05-17 10:59:29 -07:00
James E. Blair 2a53567014 Use connection to qualify projects in merger
Fully qualify projects in the merger with connection names.
This lets us drop the URL parameter (which always seemed
unnecessary, as the merger can figure that out on its own given a
uniquely identified project).

On disk, use the canonical hostname, so that the checked out
versions of repositories include the canonical hostname, and so that
repos on mergers survive changes in connection names.

This simplifies both the API and the JSON data structure passed to
the merger.

The addProject method of the merger is flagged as an internal method
now, as all "public" API methods indirectly call it.

In the executor, after cloning and merging are completed, the 'origin'
remote is removed from the resulting repositories since it may not
be valid for use within a running job.
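
The on-disk layout and origin removal this implies, sketched roughly
with GitPython (paths and function names are examples, not the
executor's actual code):

```python
import os

import git


def repo_work_path(merge_root, canonical_hostname, project_name):
    # e.g. /var/lib/zuul/git/review.example.com/org/project
    return os.path.join(merge_root, canonical_hostname, project_name)


def drop_origin(path):
    # After cloning and merging in the jobdir, remove the 'origin'
    # remote so jobs cannot rely on a URL that may not be valid for
    # them.
    repo = git.Repo(path)
    if any(r.name == 'origin' for r in repo.remotes):
        repo.delete_remote(repo.remotes.origin)
```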

Change-Id: Idcc9808948b018a271b32492766a96876979d1fa
2017-04-27 15:47:45 -07:00
James E. Blair e47eb770dd Add some gearman related debugging
Make sure all clients are identified.
Log the port on which the gearman server is listening in tests.
Log the arguments for the launch job.

Change-Id: Ia99ea5272241799aa8dd089bdb99f6058838ddff
2017-02-06 10:11:14 -08:00
James E. Blair 8b1dc3fb22 Add dynamic reconfiguration
If a change alters .zuul.yaml in a repo that is permitted to use in-repo
configuration, create a shadow configuration layout specifically for that
and any following changes with the new configuration in place.

Such configuration changes extend only to altering jobs and job trees.
More substantial changes such as altering pipelines will be ignored.  This
only applies to "project" repos (ie, the repositories under test which may
incidentally have .zuul.yaml files) rather than "config" repos (repositories
specifically designed to hold Zuul configuration in zuul.yaml files).  This
is to avoid the situation where a user might propose a change to a config
repository (and Zuul would therefore run) that would perform actions that
the gatekeepers of that repository would not normally permit.

This change also corrects an issue with job inheritance in that the Job
instances attached to the project pipeline job trees (ie, those that
represent the job as invoked in the specific pipeline configuration for
a project) were inheriting attributes at configuration time rather than
when job trees are frozen when a change is enqueued.  This could mean that
they would inherit attributes from the wrong variant of a job.

Change-Id: If3cd47094e6c6914abf0ffaeca45997c132b8e32
2016-07-18 09:58:19 -07:00
James E. Blair 14abdf44c0 Load in-repo configuration
Change-Id: I225934407ce31f92a9b6df4bc282fbd5ec2968b3
2015-12-09 16:17:25 -08:00
James E. Blair b1afc8089f Improve merge client logging
When submitting a job to the mergers, log more information about
the job.  Specifically, the UUID will now be included for easier
cross-correlation with completion events.

Change-Id: Id92ae0c73f725da23761c59c97f0d39d64e802a9
2015-03-10 11:01:36 -07:00
James E. Blair eb98aba7a2 Set gearman timeout to 300
In practice we are seeing that geard can occasionally get disrupted
and then temporarily backlogged enough that it exceeds the 30 second
timeout for submitting a job.  To make Zuul less fragile in this case,
increase the timeouts for any requests submitted to gearman.

Change-Id: I12741bb259c1a78fa2446d764318f84df34bac67
2014-12-12 11:00:10 -08:00
James E. Blair e9a8184fe0 Add precedence to merge jobs
When creating a merge job, give it the precedence of the associated
pipeline.

Change-Id: I96c6a942a08f603ae7cce442427ae171d7e76d78
2014-09-25 08:35:55 -07:00
James E. Blair 4076e2b432 Split the merger into a separate process
Connect it to Zuul via Gearman.  Any number of mergers may be
deployed.

Directly find the pipeline for a build when processing a result,
so that the procedure is roughly the same for build and merge
results.

The timer trigger currently requires the gerrit trigger also be
configured.  Make that explicit inside of the timer trigger so
that the scheduler API interaction with triggers is cleaner.

Change-Id: I69498813764753c97c426e42d17596c2ef1d87cf
2014-02-17 11:47:15 -08:00