Commit Graph

466 Commits

Simon Westphahl c8ec0b25b5
Cancel jobs of abandoned circular dep. change
When a change that is part of a circular dependency is abandoned, we'd
set the item status to dequeued needing change. This will set all builds
as skipped, overwriting existing builds.

This means that when the item was removed, we did not cancel any of the
builds. For normal builds this mainly wastes resources, but if there are
paused builds, those will be leaked and continue running until the
executor is force-restarted.

The fix here is to cancel the jobs before setting the item as dequeued
needing change.

Change-Id: If111fe1a21a1c944abcf460a6601293c255376d6
2024-04-11 12:26:54 +02:00
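The ordering fix above can be sketched with a toy model; the class and state names here are illustrative stand-ins, not Zuul's actual API:

```python
# Toy sketch of the fix: cancel running/paused builds *before* marking
# the item as dequeued, so paused builds are not leaked. All names are
# illustrative, not Zuul's real data model.

class Build:
    def __init__(self, state):
        self.state = state  # e.g. "running", "paused", "complete"

    def cancel(self):
        self.state = "canceled"

def remove_item(builds):
    """Cancel live builds first; only then would the item be set to
    "dequeued needing change", which skips the remaining builds."""
    for build in builds:
        if build.state in ("running", "paused"):
            build.cancel()
    return [b.state for b in builds]
```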
Zuul a3abea408b Merge "Emit per-branch queue stats separately" 2024-03-25 19:22:37 +00:00
Zuul 0496c249be Merge "Reset jobs behind non-mergeable cycle" 2024-03-25 18:21:43 +00:00
Zuul b0a7ed2899 Merge "Attempt to preserve triggering event across re-enqueues" 2024-03-25 10:09:13 +00:00
Simon Westphahl 349c6a029d Don't reset buildset when cycle dependency merged
In case a live change depends on a cycle and the cycle is merged while
the item is still active, the scheduler will detect the cycle as changed
and re-enqueue the dependent change.

The reason for this behavior is that we don't consider dependencies of
merged changes when building the dependency graph.

Change-Id: Ibc952886b56655c0705882497511b120e5a731cd
2024-03-21 13:35:50 -07:00
Simon Westphahl 305d4dbab9
Handle dependency limit errors more gracefully
When the dependency graph exceeds the configured size we will raise an
exception. Currently we don't handle those exceptions and let them
bubble up to the pipeline processing loop in the scheduler.

When this happens during trigger event processing, it only aborts the
current pipeline handling run, and the next scheduler will continue
processing the pipeline as usual.

However, in cases where the item is already enqueued, this exception can
block the pipeline processor and lead to a hanging pipeline:

ERROR zuul.Scheduler: Exception in pipeline processing:
Traceback (most recent call last):
  File "/opt/zuul/lib/python3.11/site-packages/zuul/scheduler.py", line 2370, in _process_pipeline
    while not self._stopped and pipeline.manager.processQueue():
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/zuul/lib/python3.11/site-packages/zuul/manager/__init__.py", line 1800, in processQueue
    item_changed, nnfi = self._processOneItem(
                         ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/zuul/lib/python3.11/site-packages/zuul/manager/__init__.py", line 1624, in _processOneItem
    self.getDependencyGraph(item.changes[0], dependency_graph, item.event,
  File "/opt/zuul/lib/python3.11/site-packages/zuul/manager/__init__.py", line 822, in getDependencyGraph
    self.getDependencyGraph(needed_change, dependency_graph,
  File "/opt/zuul/lib/python3.11/site-packages/zuul/manager/__init__.py", line 822, in getDependencyGraph
    self.getDependencyGraph(needed_change, dependency_graph,
  File "/opt/zuul/lib/python3.11/site-packages/zuul/manager/__init__.py", line 822, in getDependencyGraph
    self.getDependencyGraph(needed_change, dependency_graph,
  [Previous line repeated 8 more times]
  File "/opt/zuul/lib/python3.11/site-packages/zuul/manager/__init__.py", line 813, in getDependencyGraph
    raise Exception("Dependency graph is too large")
Exception: Dependency graph is too large

To fix this, we'll handle the exception and remove the affected item.
We'll also handle the exception during enqueue and ignore the trigger
event in this case.

Change-Id: I210c5fa4c568f2bf03eedc18b3e9c9a022628dc3
2024-03-19 14:37:26 +01:00
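A minimal sketch of the handling described above, under assumed names (`MAX_DEPENDENCIES`, `DependencyLimitError`, `process_item` are illustrative, not Zuul's API): the graph builder raises once the graph grows too large, and the caller catches the error and removes the item instead of letting the exception block the pipeline processor:

```python
MAX_DEPENDENCIES = 10  # illustrative limit, not Zuul's default

class DependencyLimitError(Exception):
    pass

def build_dependency_graph(change, needed_by, graph=None):
    """Recursively collect dependencies, enforcing a size limit."""
    if graph is None:
        graph = {}
    for needed in needed_by.get(change, []):
        graph.setdefault(change, []).append(needed)
        if sum(len(v) for v in graph.values()) > MAX_DEPENDENCIES:
            raise DependencyLimitError("Dependency graph is too large")
        build_dependency_graph(needed, needed_by, graph)
    return graph

def process_item(change, needed_by):
    """Catch the limit error and remove the item rather than hang."""
    try:
        build_dependency_graph(change, needed_by)
        return "processed"
    except DependencyLimitError:
        return "removed"
```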
James E. Blair 6ccbdacdf2 Attempt to preserve triggering event across re-enqueues
When a dependency cycle is updated, we will re-enqueue the changes
in the cycle so that each of the changes goes through the process
of being added to the queue with the updated contents of the cycle.
That may mean omitting changes from the cycle, or adding new ones,
or even splitting into two cycles.

In that case, in order to preserve the idea of the
"triggering change", carry over the triggering change from when the
cycle was originally enqueued.

Note that this now exposes us to the novel idea that the triggering
change may not be part of the queue item.

Change-Id: I9e00009040f91d7edc31f4928e632edde4b2745f
2024-03-13 13:07:08 -07:00
James E. Blair c2103f7058 Reset jobs behind non-mergeable cycle
In the case of a dependency cycle, we check the mergeability of
each change in the item before we try to merge any of them, and
dequeue the item if it looks like one of them won't be able to
merge.  However, that bypasses the normal behavior where we reset
changes behind failing items, which could lead to merging changes
that were tested with changes ahead that did not merge.

To correct this, update the cycle-can-not-be-merged dequeue stanza
with a reset, to mirror the stanza below which handles the failure
of any individual change to merge.

Change-Id: I52a9fc2da4dd89131722d69d2b5dea886eb3d51c
2024-03-13 09:03:16 -07:00
James E. Blair 794545fc64 Emit per-branch queue stats separately
We currently emit 4 statsd metrics for each shared queue, but in
the case that a queue is configured as per-branch, we disregard
the branch and emit the stats under the same hierarchy for any
branch of that queue.  This means that if we have a queue for
integrated-master and a queue for integrated-stable at the same
time, we would emit the stats for the master queue, then
immediately emit the same stats for the stable queue, overwriting
the master stats.

To correct this, move the metrics down a level in the case that
the queue is configured per-branch, and include the branch name
in the key.

Change-Id: I2f4b22394bc3774410a02ae76281eddf080e5c7f
2024-03-06 06:32:22 -08:00
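The keying change might look like the following sketch; the exact statsd hierarchy here is assumed for illustration, not Zuul's documented key layout:

```python
def queue_stat_key(tenant, pipeline, queue, branch=None):
    """Per-branch queues get the branch in the key, so stats for, say,
    integrated-master and integrated-stable no longer overwrite each
    other under the same hierarchy."""
    base = f"zuul.tenant.{tenant}.pipeline.{pipeline}.queue.{queue}"
    if branch is None:
        return base
    return f"{base}.branch.{branch}"
```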
James E. Blair 79a9f86c8d Ignore circular dependencies in supercedent pipelines
There are two issues with supercedent pipelines related to circular deps:

1) When operating in a post-merge configuration on changes (not refs), the
   pipeline manager would throw an exception starting with 10.0.0 because
   any time it operates on change objects, it attempts to collect the
   dependency cycle before enqueuing a change, and starting with 10.0.0,
   the supercedent manager raises an exception in that case.
2) When operating in a pre-merge configuration on changes, the behavior
   regarding circular dependencies was undefined before 10.0.0.  It is
   likely that they were ignored because the manager creates a dynamic
   queue based on the project-ref, but it wasn't explicitly documented
   or tested.

To correct both of these:

Override the cycleForChange method in the supercedent manager so that it
always returns an empty cycle.

Document the expected behavior.

Add tests that cover the cases described above.

Change-Id: Icf30d488334d40a929f31c2f390e18ae599a3c42
2024-03-04 10:50:23 -08:00
Zuul 617bbb229c Merge "Fix validate-tenants isolation" 2024-02-28 02:46:55 +00:00
James E. Blair ced306d5b1 Update gerrit changes more atomically
The following problem was observed:

Change A depends-on change B, which is in turn the tip of a
patch series of several changes.

Drivers warm the change cache on events by querying information
about changes related to those events.  But they don't process
depends-on headers, which means most drivers only warm one change,
and while the gerrit driver will follow other types of dependency
links which are unique to it, it stops at depends-on boundaries.

So in the example above, the only change in the cache which was warm
was change A.

The triggering event was enqueued, forwarded, and processed by
two responding pipelines simultaneously on two executors.

Each of them noticed the depends-on link and started querying gerrit
for change B and its dependencies.  One of the schedulers was about
1 second ahead of the other in this process.

In the gerrit driver, there is a two phase process for updating
changes.  First the change itself is updated in the normal way
common to all drivers, and then gerrit-specific dependency links
are updated.  That means the change is added to the change cache
with no dependencies, then mutated to add dependencies later.

The first scheduler added change B to the cache with no dependencies.
The second scheduler saw the update and refreshed its copy of B.
The second scheduler began updating B, saw that the ltime of its
copy of B was sufficiently new it didn't need to update the cache
and stopped updating.
The second scheduler enqueued changes A and B, but no others in its
pipeline.
The first scheduler finished querying the stack of changes ending at
B, added them to the change cache, and mutated the entry for B in the
cache.
The first scheduler enqueued A, B, and the rest of the stack in its
pipeline.
The second scheduler updated its copy of B to include the new
dependencies.
The second scheduler ran a pipeline processor, noticed that B lacked
dependencies, and dequeued A and B, and reported an error to the user.

The good news is that Zuul noticed the mistake and dequeued the
changes.

To correct this, we will now collect all of the information about a
change and its gerrit-specific dependencies before writing any of
that information to the change cache.  This means that in our example
above, the second scheduler would not have aborted its queries.
Eventually, both schedulers would end up with the same information
before enqueuing anything.

This process is still not quite optimal, in that we will have multiple
schedulers racing to update the same changes at the same time, but
they are designed to deal with collisions like that, so it should
at least be correct.

A future area of work might be to investigate whether we can optimize
this case further.

Change-Id: I647c2b54a55789e521fca71c8c3814907df65da6
2024-02-22 06:37:31 -08:00
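The gather-then-write ordering can be sketched as follows; the query and cache shapes are illustrative, not Zuul's change cache API:

```python
# Sketch of the write ordering described above: gather the change and
# all of its dependency data first, then write to the shared cache, so
# another scheduler never observes (and trusts) a half-populated entry.

def update_change(cache, query, change_id):
    entry = query(change_id)                     # full change data
    dep_entries = [query(d) for d in entry["deps"]]
    # Only after all queries succeed do we touch the cache:
    for dep in dep_entries:
        cache[dep["id"]] = dep
    cache[change_id] = entry                     # written complete
    return entry
```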
James E. Blair 1bec2014bc Remove updateJobParentData method
This method was added as part of the initial deduplication work in
959a0b9834.  Since we now collect
parent data at the time that we run the job, this method doesn't
actually do anything other than decide when jobs are ready to run.

This change moves that logic back into the findJobsToRun method
and removes the unnecessary updateJobParentData method.

Change-Id: Iac744a24ee3902360eeaef371808657a8eeb2080
2024-02-09 10:19:08 -08:00
James E. Blair fa274fcf25 Remove most model_api backwards-compat handling
Since we require deleting the state (except config cache) for the
circular dependency upgrade, we can now remove any zookeeper
backwards compatibility handling not in the config cache.  This
change does so.  It leaves upgrade handling for two attributes
in the branch cache (even though they are old) because it is
plausible that even systems that have been upgrading regularly
may have non-upgraded data in the branch cache for infrequently
used branches, and the cost for retaining the upgrade handling
is small.

Change-Id: I881578159b06c8c7b38a9fa0980aa0626dddad31
2024-02-09 07:39:55 -08:00
James E. Blair ca83980bb7 Clean up safety check
The safety check was originally written to detect when a dependency
cycle changed without the pipeline manager noticing.

Since the dependency cycle refactor, items can have multiple changes
and previous processes that were designed around updates to items
causing cascading updates to items behind them (but in the same bundle)
no longer make as much sense.  However, the "safety check" now seems
to make more sense as the primary method for determining that a
dependency cycle has changed.  It fits in well with other checks
in pipeline processing now that it examines the situation for a
single item.

Resolve the temporary safety check by keeping it.  It is cleaned up
a bit and moved earlier in the pipeline processing.  We can also
clean up the slightly awkward silent dequeue/re-enqueue method and
incorporate it into the safety check.  Now the process is:

  If a dependency cycle changes, dequeue the item without reporting
  and then re-enqueue all of the item's changes.

This means that if a dependency cycle (no matter how large) is
split in half, we will keep both halves (now separate) in the
pipeline.  This behavior is likely to be the most intuitive to
users.

In general, there are two ways to update a dependency cycle: with a
new patchset that changes the graph (typically only gerrit) or with
a PR message or topic change (gerrit and others).  To achieve some
consistency between these methods, we reuse the same re-enqueue method
in both cases (but in the case of a patchset superseding a change,
we don't re-enqueue the old change; but we do expect the
patchset-created event to enqueue the new version).  The timing is
still a little different, but the results are the same.

Change-Id: Ifa42b081cbd103ef04d8814c27ab5c51aa5e8335
2024-02-09 07:39:53 -08:00
James E. Blair 1f026bd49c Finish circular dependency refactor
This change completes the circular dependency refactor.

The principal change is that queue items may now include
more than one change simultaneously in the case of circular
dependencies.

In dependent pipelines, the two-phase reporting process is
simplified because it happens during processing of a single
item.

In independent pipelines, non-live items are still used for
linear dependencies, but multi-change items are used for
circular dependencies.

Previously changes were enqueued recursively and then
bundles were made out of the resulting items.  Since we now
need to enqueue entire cycles in one queue item, the
dependency graph generation is performed at the start of
enqueuing the first change in a cycle.

Some tests exercise situations where Zuul is processing
events for old patchsets of changes.  The new change query
sequence mentioned in the previous paragraph necessitates
more accurate information about out-of-date patchsets than
the previous sequence, therefore the Gerrit driver has been
updated to query and return more data about non-current
patchsets.

This change is not backwards compatible with the existing
ZK schema, and will require Zuul systems delete all pipeline
states during the upgrade.  A later change will implement
a helper command for this.

All backwards compatibility handling for the last several
model_api versions which were added to prepare for this
upgrade have been removed.  In general, all model data
structures involving frozen jobs are now indexed by the
frozen job's uuid and no longer include the job name since
a job name no longer uniquely identifies a job in a buildset
(either the uuid or the (job name, change) tuple must be
used to identify it).

Job deduplication is simplified and now only needs to
consider jobs within the same buildset.

The fake github driver had a bug (fakegithub.py line 694) where
it did not correctly increment the check run counter, so our
tests that verified that we closed out obsolete check runs
when re-enqueing were not valid.  This has been corrected, and
in doing so, has necessitated some changes around quiet dequeuing
when we re-enqueue a change.

The reporting in several drivers has been updated to support
reporting information about multiple changes in a queue item.

Change-Id: I0b9e4d3f9936b1e66a08142fc36866269dc287f1
Depends-On: https://review.opendev.org/907627
2024-02-09 07:39:40 -08:00
James E. Blair fb7d24b245 Fix validate-tenants isolation
The validate-tenants scheduler subcommand is supposed to perform
complete tenant validation, and in doing so, it interacts with zk.
It is supposed to isolate itself from the production data, but
it appears to accidentally use the same unparsed config cache
as the production system.  This is mostly okay, but if the loading
paths are different, it could lead to writing cache errors into
the production file cache.

The error is caused because the ConfigLoader creates an internal
reference to the unparsed config cache and therefore ignores the
temporary/isolated unparsed config cache created by the scheduler.

To correct this, we will always pass the unparsed config cache
into the configloader.

Change-Id: I40bdbef4b767e19e99f58cbb3aa690bcb840fcd7
2024-01-31 14:58:45 -08:00
James E. Blair 7262ef7f6f Include job_uuid in NodeRequests
This is part of the circular dependency refactor.  It updates the
NodeRequest object to include the job_uuid in addition to the job_name
(which is temporarily kept for backwards compatibility).  When node
requests are completed, we now look up the job by uuid if supplied.

Change-Id: I57d4ab6c241b03f76f80346b5567600e1692947a
2023-12-20 10:44:04 -08:00
James E. Blair 9201f9ee28 Store builds on buildset by uuid
This is part of the circular dependency refactor.

This updates the buildset object in memory (and zk) to store builds
indexed by frozen job uuid rather than job name.  This also updates
several related fields and also temporary dictionaries to do the same.

This will allow us, in the future, to have more than one job/build
in a buildset with the same name (for different changes/refs).

Change-Id: I70865ec8d70fb9105633f0d03ba7c7e3e6cd147d
2023-12-12 11:58:21 -08:00
James E. Blair cb3c4883f2 Index job map by uuid
This is part of the circular dependency refactor.  It changes the
job map (a dictionary shared by the BuildSet and JobGraph classes
(BuildSet.jobs is JobGraph._job_map -- this is because JobGraph
is really just a class to encapsulate some logic for BuildSet))
to be indexed by FrozenJob.uuid instead of job name.  This helps
prepare for supporting multiple jobs with the same name in a
buildset.

Change-Id: Ie17dcf2dd0d086bd18bb3471592e32dcbb8b8bda
2023-12-12 10:22:25 -08:00
James E. Blair 071c48c5ae Freeze job dependencies with job graph
This is part of the circular dependency refactor.

Update the job graph to record job dependencies when it is frozen,
and store these dependencies by uuid.  This means our dependency
graph points to actual frozen jobs rather than mere job names.

This is a pre-requisite to being able to disambiguate dependencies
later when a queue item supports multiple jobs with the same name.

The behavior where we would try to unwind an addition to the job
graph if it failed is removed.  This was originally written with the
idea that we would try to run as many jobs as possible if there was
a config error.  That was pre-zuul-v3 behavior.  Long since, in all
cases when we actually encounter an error adding to the job graph,
we bail and report that to the user.  No longer handling that
case simplifies the code somewhat and makes it more future-proof
(while complicating one of the tests that relied on that behavior
as a shortcut).

This attempts to handle upgrades by emulating the old behavior
if a job graph was created on an older model version.  Since it
relies on frozen job uuids, it also attempts to handle the case
where a frozenjob does not have a uuid (which is a very recent
model change and likely to end up in the same upgrade for some
users) by emulating the old behavior.

Change-Id: I0070a07fcb5af950651404fa8ae66ea18c6ca006
2023-12-06 16:41:18 -08:00
Zuul 11c06b5939 Merge "Improve error reporting for circular dependencies" 2023-11-09 21:03:01 +00:00
Simon Westphahl 6c6872841b
Don't schedule initial merge for branch/ref items
Currently we schedule a merge/repo-state for every item that is added to
a pipeline. For changes and tags we need the initial merge in order to
build a dynamic layout or to determine if a given job variant on a
branch should match for a tag.

For other change-types (branches/refs) we don't need the initial
merge/repo-state before we can freeze the job graph. The overhead of
those operations can become quite substantial for projects with a lot of
branches that also have a periodic pipeline config, but only want to
execute jobs for a small subset of those branches.

With this change, branch/ref changes that don't execute any jobs will
be removed without triggering any merge/repo state requests.

In addition we will reduce the number of merge requests for branch/ref
changes as the initial merge is skipped in all cases.

Change-Id: I157ed52dba8f4e197b35798217b23ec7f035b2d9
2023-10-27 12:20:57 +02:00
James E. Blair 6fda08b8eb Load configuration from unknown dynamic branches
The always-dynamic-branches option specifies a regex such that
branches that match it are ignored for Zuul configuration purposes,
unless a change is proposed, at which point the zuul.yaml config
is read from the branch in the same way as if a change was made
to the file.

Because creating and deleting dynamic branches do not cause
reconfigurations, the list of project branches stored on a tenant
may not be updated after a dynamic branch is created.  This list
is used to decide from what branches to try to load config files.

Together, all of this means that if you create an always-dynamic-branch
and propose a change to it shortly afterwards, Zuul is likely to
ignore the change since it won't know to load configuration from
its branch.

To correct this, we extend the list of branches from which Zuul
knows to read configuration with the branch of the item under test
and any items ahead of it in the queue (but only if these branches
match the dynamic config regex so that we don't include an excluded
branch).

Also add a log entry to indicate when we are loading dynamic
configuration from a file.

Change-Id: Ibd15ce4a154311cdb523c5603f4ad17f761d1078
2023-10-09 15:38:46 -07:00
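A hedged sketch of the branch-list extension; the function and field names are assumptions, not Zuul's API:

```python
import re

def config_branches(known_branches, item_branches, dynamic_regex):
    """Extend the tenant's known branch list with branches of enqueued
    items, but only when they match the always-dynamic-branches regex,
    so excluded branches stay excluded."""
    branches = set(known_branches)
    pattern = re.compile(dynamic_regex)
    for branch in item_branches:
        if pattern.fullmatch(branch):
            branches.add(branch)
    return branches
```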
Simon Westphahl e92bb01447
Improve error reporting for circular dependencies
Make it clear from the message reported to the change which project
doesn't allow circular dependencies.

Change-Id: Id614265535dd6f2af419f7eda7dda9799f18ea56
2023-09-29 11:35:00 +02:00
Simon Westphahl 1da1c5e014
Fix child job skip with paused deduplicated parent
When a build pauses, it can also return a list of child jobs to execute.
If the paused build was deduplicated we need to call `setResult()` on
all items that have that particular build.

Change-Id: Iead5c02032bccf46852ee6b2c8adf714689aa2f5
2023-09-22 12:28:45 +02:00
Zuul a75d640b8e Merge "Add a bundle-changed safety check" 2023-09-19 17:05:03 +00:00
Zuul d44b9875b0 Merge "Fix deduplicating child jobs in check" 2023-09-15 22:22:02 +00:00
Zuul 5b7b0aed5f Merge "Fix deduplication with paused jobs" 2023-09-15 22:13:52 +00:00
Zuul 4b347ce91b Merge "Avoid leaked items caused by config errors" 2023-09-15 18:41:33 +00:00
Zuul 5294c582b1 Merge "Fix deduplication of child jobs in check" 2023-09-15 18:24:14 +00:00
James E. Blair 9406bcc2d3 Add a bundle-changed safety check
Several recent bugs and attempted fixes have shown that there may
be some edge cases in the handling of dependency cycles that have
the potential to cause jobs to run with the wrong changes in place.

While we work on longer-term fixes to those, add a safety check to
the pipeline processor so that if we detect a change to the bundle
contents of a queue item, we remove the item from the queue.  We
may not necessarily perform the optimal behavior with this, but it
should keep us from running jobs with known incorrect changes.

This change requires some minor adjustment to some existing unit
tests (it doesn't significantly change the outcome, but it does
cause some jobs to be aborted sooner).  A followup change will add
some more tests which would fail without this change but merit
separate review.

Change-Id: Ia7b1d5b7e3d6910a709478082929f96364ca996b
2023-09-13 14:07:19 -07:00
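The safety check reduces to a comparison like this toy sketch (the structures are illustrative, not Zuul's pipeline model):

```python
def check_item(item_changes, current_cycle):
    """Compare the item's recorded changes against the freshly computed
    dependency cycle; on mismatch, dequeue rather than run jobs with
    known-incorrect contents."""
    if set(item_changes) != set(current_cycle):
        return "dequeue"
    return "keep"
```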
James E. Blair 930c42cd28 Fix deduplicating child jobs in check
If a second change is enqueued in check significantly after the
first, then node requests for child jobs may not be deduplicated
correctly.

Before deduplicating a build, Zuul applies parent data to child
jobs, then compares the child jobs to determine if they are
equivalent.  If so, then they are deduplicated.

This only happens one level in the hierarchy at a time.  Consider
the case where a change is enqueued and both the parent and child
jobs have completed (but the change is still in the queue waiting
on a third, unrelated, job).

If the second change in the bundle is enqueued, Zuul will:
1) Attempt to apply parent data to child jobs.
   Since no jobs have completed yet for this item, no parent data
   are applied.
2) Deduplicate jobs in the second change.
   Zuul will deduplicate the parent job at this point.
3) Zuul will compare the child jobs in the two changes and determine
   they are different because one has parent data and the other does
   not.
4) Zuul submits a node request for the child job.
5) On the next pipeline process, Zuul applies the parent data from
   the deduplicated parent job to the new child job.
6) Zuul deduplicates the child job, and the nodepool request is
   orphaned.

To correct this, we will repeat the process of applying parent data
to child jobs each time we find at least one build to deduplicate.

That means that all existing parent data will be applied to all jobs
on each pass through the pipeline processor no matter how deep the
dependency hierarchy.

Change-Id: Ifff17df40f0d59447f74cdde619246171279b553
2023-09-08 14:20:46 -07:00
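The fixed-point approach can be modeled with a toy sketch in which a job can only be deduplicated once its parent has settled, mirroring the one-level-at-a-time behavior described above (the data model is illustrative, not Zuul's):

```python
def dedup_pass(jobs):
    """One pass: a job merges only if its parent's data is settled
    (parent already merged before this pass). Returns merge count."""
    settled = {id(j) for j in jobs if j["deduped"]}
    merged = 0
    for job in jobs:
        if job["deduped"]:
            continue
        parent = job["parent"]
        if parent is None or id(parent) in settled:
            job["deduped"] = True
            merged += 1
    return merged

def process_queue(jobs):
    """Repeat passes while progress is made, so arbitrarily deep
    hierarchies settle within a single run. Returns pass count."""
    passes = 0
    while dedup_pass(jobs):
        passes += 1
    return passes
```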
James E. Blair 68f80f9749 Fix deduplication with paused jobs
When a deduplicated job paused, it would not wait for all children
across all queue items to complete before resuming; instead it
would wait only for the children in its own queue item.

Check all queue items a build is in before resuming it.

Change-Id: Ic2dec3a6dc58230b0873d7e8ba474bc39ed28385
2023-09-08 12:54:33 -07:00
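A sketch of the resume check, using a toy item structure rather than Zuul's real model: a deduplicated paused build must wait for its children in every queue item it appears in, not just its own.

```python
def can_resume(build_id, items):
    """Return True only if all child jobs of this build are complete
    across every item that contains the (deduplicated) build."""
    for item in items:
        if build_id not in item["builds"]:
            continue
        for child in item["children"].get(build_id, []):
            if not child["complete"]:
                return False
    return True
```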
James E. Blair 742669ab09 Fix deduplication of child jobs in check
When we deduplicate jobs, we intend to call setResult on all of
the queue items with the deduplicated build.  This only worked
in dependent pipelines because we only looked for queue items in
the current bundle.  In independent pipelines, the queue items
can be in different bundles.

To resolve this, search for items with deduplicated builds
across the whole queue in independent pipelines (using the approach
we use when deduplicating them to begin with).

Change-Id: I16436710c47b4f22df39e0cd82d0e289b2293c32
2023-09-07 16:58:19 -07:00
James E. Blair 70c34607f5 Add support for limiting dependency processing
To protect Zuul servers from accidental DoS attacks in case someone,
say, uploads a 1k change tree to gerrit, add an option to limit the
dependency processing in the Gerrit driver and in Zuul itself (since
those are the two places where we recursively process deps).

Change-Id: I568bd80bbc75284a8e63c2e414c5ac940fc1429a
2023-09-07 11:01:29 -07:00
Felix Edel 7ba9307f11 Avoid leaked items caused by config errors
The _reportNonEqueuedItem() method is used to temporarily enqueue a
change, report it and directly dequeue it. However, when the reporting
fails, e.g. due to a config error, the item will never be dequeued.

This results in a leaked change that causes the queue processor to
loop over it indefinitely.

In our case the config error was caused by disabling the branch
protection in GitHub for a release branch in a certain repository. This
branch also defined a project-template which could not be found by Zuul
anymore after the branch protection was disabled [1].

This behaviour can be reproduced in a unit test by enforcing a broken
tenant configuration that references a non-existing project template
during a pipeline run with a circular dependency.

To fix this, ensure that the temporary enqueued item in
_reportNonEqueuedItem() will be dequeued in any case.

Although this fixes the endless loop in the queue processor, the same
exception will still be raised on pipeline level ("exception processing
pipeline...").

[1]:
2023-08-28 15:28:53,507 ERROR zuul.Pipeline.example-tenant.check: [e: 06d1ab80-45b7-11ee-8c99-721bf9f22e8c] Unable to re-enqueue change <Change 0x7f066ba36090 example-tenant/project 1234,80b4068eb1fe485df59185f0c93059fe7b15c23e> which is missing dependencies
Traceback (most recent call last):
  File "/opt/zuul/lib/python3.11/site-packages/zuul/manager/__init__.py", line 1614, in _processOneItem
    quiet_dequeue = self.addChange(
                    ^^^^^^^^^^^^^^^
  File "/opt/zuul/lib/python3.11/site-packages/zuul/manager/__init__.py", line 583, in addChange
    if not self.enqueueChangesAhead(change, event, quiet,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/zuul/lib/python3.11/site-packages/zuul/manager/independent.py", line 73, in enqueueChangesAhead
    r = self.addChange(needed_change, event, quiet=True,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/zuul/lib/python3.11/site-packages/zuul/manager/__init__.py", line 583, in addChange
    if not self.enqueueChangesAhead(change, event, quiet,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/zuul/lib/python3.11/site-packages/zuul/manager/independent.py", line 73, in enqueueChangesAhead
    r = self.addChange(needed_change, event, quiet=True,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/zuul/lib/python3.11/site-packages/zuul/manager/__init__.py", line 583, in addChange
    if not self.enqueueChangesAhead(change, event, quiet,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/zuul/lib/python3.11/site-packages/zuul/manager/independent.py", line 73, in enqueueChangesAhead
    r = self.addChange(needed_change, event, quiet=True,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/zuul/lib/python3.11/site-packages/zuul/manager/__init__.py", line 583, in addChange
    if not self.enqueueChangesAhead(change, event, quiet,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/zuul/lib/python3.11/site-packages/zuul/manager/independent.py", line 73, in enqueueChangesAhead
    r = self.addChange(needed_change, event, quiet=True,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/zuul/lib/python3.11/site-packages/zuul/manager/__init__.py", line 583, in addChange
    if not self.enqueueChangesAhead(change, event, quiet,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/zuul/lib/python3.11/site-packages/zuul/manager/independent.py", line 73, in enqueueChangesAhead
    r = self.addChange(needed_change, event, quiet=True,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/zuul/lib/python3.11/site-packages/zuul/manager/__init__.py", line 583, in addChange
    if not self.enqueueChangesAhead(change, event, quiet,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/zuul/lib/python3.11/site-packages/zuul/manager/independent.py", line 73, in enqueueChangesAhead
    r = self.addChange(needed_change, event, quiet=True,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/zuul/lib/python3.11/site-packages/zuul/manager/__init__.py", line 611, in addChange
    self._reportNonEqueuedItem(change_queue,
  File "/opt/zuul/lib/python3.11/site-packages/zuul/manager/__init__.py", line 676, in _reportNonEqueuedItem
    if self.pipeline.tenant.layout.getProjectPipelineConfig(ci):
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/zuul/lib/python3.11/site-packages/zuul/model.py", line 8194, in getProjectPipelineConfig
    templates = self.getProjectTemplates(template_name)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/zuul/lib/python3.11/site-packages/zuul/model.py", line 8127, in getProjectTemplates
    raise TemplateNotFoundError("Project template %s not found" % name)
zuul.model.TemplateNotFoundError: Project template template-foo not found

Change-Id: I2514b783b646caae2863ee1ccbac4600772fe4d6
2023-09-07 18:26:49 +02:00
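The fix reduces to a try/finally pattern, sketched here with illustrative names (not Zuul's actual `_reportNonEqueuedItem` implementation):

```python
def report_non_enqueued(queue, item, report):
    """Temporarily enqueue an item, report it, and guarantee it is
    dequeued even if reporting raises (e.g. on broken config), so
    the queue processor can never loop over a leaked item."""
    queue.append(item)
    try:
        report(item)        # may raise, e.g. TemplateNotFoundError
    finally:
        queue.remove(item)  # always dequeue, even on error
```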
James E. Blair 267f675533 Allow new warnings when errors exist
If a configuration error existed for a project on one branch
and a change was proposed to update the config on another branch,
that would activate a code path in the manager which attempts to
determine whether errors are relevant.  An error (or warning) is
relevant if it is not in a parent change, and is on the same
project+branch as the current patch.  This is pretty generous.

This means that if a patch touches Zuul configuration with a
warning, all warnings on that branch must be updated.  This was
not the intended behavior.

To correct that, we no longer consider warnings in any of the
places where we check that a queue item is failing due to
configuration errors.

An existing test is updated to include sufficient setup to trigger
the case where a new valid configuration is added to a project
with existing errors and warnings.

A new test case is added to show that we can add new deprecations
as well, and that they are reported to users as warnings.

Change-Id: Id901a540fce7be6fedae668390418aca06a950af
2023-09-04 14:02:13 -07:00
Clark Boylan 4effa487f5 Allow new configs to be used when warnings are present
Prior to this change we checked if there were any errors in the config
(which includes warnings by default) and returned a build error if there
were. Now we only return a build error when proper errors (not just
warnings) are present.

This allows users to push config updates that don't fix all warnings
immediately. Without this any project with warnings present would need
to fix all warnings before newly proposed configs can take effect. This
is particularly problematic for speculative testing, but in general it
seems like warnings shouldn't be fatal.

Change-Id: I31b094fb366328696708b019354b843c4b94ffc0
2023-09-04 11:20:13 -07:00
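The severity check described in the commit above can be sketched as a simple filter (the field names here are illustrative, not Zuul's actual data model):

```python
def blocking_errors(config_errors):
    # Warnings are still reported as comments, but only true errors
    # should make a newly proposed configuration fail.
    return [e for e in config_errors if e["severity"] == "error"]
```

With this, a project carrying only deprecation warnings can still land new speculative configuration, since `blocking_errors()` returns an empty list for it.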
Zuul 90dce8ed12 Merge "Add pipeline queue stats" 2023-08-30 01:28:50 +00:00
Zuul fc622866ec Merge "Add window-ceiling pipeline parameter" 2023-08-30 01:28:43 +00:00
James E. Blair a316015f56 Add pipeline queue stats
Also add the configured window size to the pipeline stats.

Remove the ambiguous phrasing "since Zuul started" from some of
the counter documentation.

Change-Id: Icbb7bcfbf25a1e34d26dd865fa29f61faceb4683
2023-08-29 15:49:52 -07:00
James E. Blair 7044963857 Add window-ceiling pipeline parameter
This allows users to set a maximum value for the active window
in the event they have a project that has long stretches of
passing tests but they still don't want to commit too many resources
in case of a failure.

We should all be so lucky.

Change-Id: I52b5f3a9e7262b88fb16afc4520b35854e8df184
2023-08-29 15:43:28 -07:00
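The ceiling described above behaves like a clamp on the usual window growth. A rough sketch under assumed names and defaults (the real logic lives in Zuul's change queue handling):

```python
def grow_window(current, ceiling=20, increase=1):
    # On success the active window grows, but never past the
    # configured window-ceiling.
    return min(current + increase, ceiling)


def shrink_window(current, floor=3):
    # On failure the window shrinks, but not below the window floor.
    return max(current // 2, floor)
```

This keeps resource usage bounded even for a project with long streaks of passing tests, while still letting the window recover after failures.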
Tobias Henkel 188e1c36ef Only report dequeue if we have reported start
Dequeue reporting was initially introduced in order to make it
possible to mark pending check runs as canceled. This is currently
done unconditionally, even if Zuul hasn't reported the start
yet. This leads to occasional spam of canceled check runs that
weren't supposed to be reported at all. For instance we've seen this
on a repo similar to zuul-jobs that is part of all tenants but is only
gated in one tenant. When a PR in such a repo is approved it enters
the gate in all tenants but doesn't run any jobs in all but one
tenant. If the item gets dequeued before the job freezing has been
finished, Zuul reports canceled check runs from the wrong tenants. This
doesn't harm the workflow but leads to user confusion.

A similar problem can be observed when a user creates a PR against a
non-protected branch which typically runs no jobs. In this case an
abandon of the PR can also lead to canceled check run reporting where
zuul was not supposed to report anything at all on the pr.

This can be fixed by skipping dequeue reporting if start hasn't been
reported yet.

Change-Id: Ibd1d8047168dcb5035c90fa25a629f4a7714c0f7
2023-08-17 15:46:05 -07:00
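The guard in the commit above amounts to remembering whether start was reported for an item before reporting its dequeue. A hypothetical sketch, not Zuul's actual classes:

```python
class QueueItem:
    def __init__(self):
        self.start_reported = False
        self.reports = []

    def report_start(self):
        self.start_reported = True
        self.reports.append("start")

    def report_dequeue(self):
        # Skip dequeue reporting (which would cancel pending check
        # runs) if we never reported start for this item.
        if self.start_reported:
            self.reports.append("dequeue")
```

An item dequeued before start was reported then produces no reports at all, avoiding the spurious "canceled" check runs described above.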
Zuul 0a82e72521 Merge "Don't cancel Github check-run during re-enqueue" 2023-08-15 07:12:39 +00:00
James E. Blair 76f791e4d3 Fix linting errors
A new pycodestyle errors on ",\".  We only use that to support
Python <3.10, and since Zuul is now targeting only 3.11+, these
instances are updated to use implicit continuation.

An instance of "==" is changed to "is".

A function definition which overrides an assignment is separated
so that the assignment always occurs regardless of whether it
ends up pointing to the function def.

Finally, though not required, since we're editing the code anyway
for nits, some typing info is removed.

Change-Id: I6bb096b87582ab1450bed02541483fc6f1d6c44a
2023-08-02 10:28:22 -07:00
Zuul 816afcfdd1 Merge "Add manager/reporter support for config warnings" 2023-07-21 06:11:19 +00:00
James E. Blair 5be57fb87e Add manager/reporter support for config warnings
We recently added a severity field to configuration errors (but all
errors are currently at the "error" severity).  To prepare for
"warning" severity, update the pipeline managers and reporters to
expect both warnings and errors.

Errors will still trigger buildset failures, but warnings will not.
Both will be reported as comments.

Change-Id: Ia24e91f5ddff7d9869e9e83886f996e4f425e110
2023-07-20 16:20:22 -07:00
Simon Westphahl ea5f8fea7c
Don't cancel Github check-run during re-enqueue
So far, when the scheduler re-enqueued a change that was missing
dependencies, it also reported the Github check-run as cancelled but did
not report start as the re-enqueued item was added as a "quiet" item.

The check-run on Github was still marked as success after the item
finished. But until then it appeared as cancelled even if the change was
successfully re-enqueued.

To fix this we'll not call the dequeue reporters when the change could
be re-enqueued. The dequeued item will still be reported to the
database though.

Change-Id: Iea465ca1d9132322b912f7723e3ae41a8c6d3002
2023-07-20 12:45:27 +02:00
James E. Blair 2436c1a5df Don't issue multiple merge requests for bundles
In I82848367bd6f191ec5ae5822a1f438070cde14e1 we avoided spawning
merge jobs for non-live items.

In Id533772f35ebbc76910398e0e0fa50a3abfceb52 we backed that out
partially by spawning merge jobs for non-live items if they update
the config (so we can create a layout).

In I38925e5fd0ed5ff45aab17d108740345716fd478 we accepted that in
the case of non-live items in a bundle that updated config, we
would spawn multiple merge jobs and each one should be responsible
for updating its own item.

However, we can revisit the assumptions in
Id533772f35ebbc76910398e0e0fa50a3abfceb52 which appears not to have
taken bundles into consideration.

A bundle should have the same files results for every item in the
bundle, so, channeling the original spirit of
I82848367bd6f191ec5ae5822a1f438070cde14e1, we can try to avoid
spawning merge jobs for multiple items in a bundle.  This is an
alternate solution to the issue addressed by
I38925e5fd0ed5ff45aab17d108740345716fd478 in that rather than
accepting that we will receive multiple merge jobs in the case of
a bundle with non-live items that each update config, we will instead
receive only one merge job for the entire bundle regardless of
whether they update config, or even whether they are live.

This is accomplished by establishing a single "bundle item" for
the bundle which is defined as the first live item in the bundle.
This is the only item in the bundle that will spawn merge jobs.
When the merge job for that item completes, all of the items in
the bundle will be updated with the results.

Change-Id: Icfe1f2a126eb13349b510107a305c6eef7b622fb
2023-06-26 10:42:17 +00:00
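Selecting the single "bundle item" described above can be sketched as picking the first live item in the bundle (illustrative names only, not Zuul's data model):

```python
def pick_bundle_item(bundle):
    # Only the first live item in the bundle spawns merge jobs; when
    # its merge job completes, the results are copied to every item
    # in the bundle, live or not.
    for item in bundle:
        if item["live"]:
            return item["id"]
    return None
```

Since every item in a bundle should see the same merged files, running one merge job for the whole bundle is safe and avoids the duplicate jobs that the earlier approach accepted.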