1345 Commits

Author SHA1 Message Date
Simon Westphahl c8ec0b25b5
Cancel jobs of abandoned circular dep. change
When a change that is part of a circular dependency is abandoned we'd
set the item status to dequeued needing change. This will set all builds
as skipped, overwriting existing builds.

This means that when the item was removed, we did not cancel any of the
builds. For normal builds this mainly wastes resources, but if there are
paused builds, those will be leaked and continue running until the
executor is force-restarted.

The fix here is to cancel the jobs before setting it as dequeued needing
change.
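
The ordering can be illustrated with a minimal sketch. The Item class,
its fields and the state names are hypothetical stand-ins for this
illustration, not Zuul's actual model:

```python
class Item:
    """Hypothetical queue item tracking one state string per job."""

    def __init__(self, builds):
        self.builds = dict(builds)  # job name -> build state
        self.canceled = []

    def cancel_jobs(self):
        # Cancel anything still consuming resources, including paused builds.
        for job, state in self.builds.items():
            if state in ("running", "paused"):
                self.canceled.append(job)

    def set_dequeued_needing_change(self):
        # Overwrites every build with a skipped placeholder.
        for job in self.builds:
            self.builds[job] = "skipped"


def abandon(item):
    # Cancel first; once the builds are overwritten there is nothing
    # left to tell us which ones were still running.
    item.cancel_jobs()
    item.set_dequeued_needing_change()
```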

Change-Id: If111fe1a21a1c944abcf460a6601293c255376d6
2024-04-11 12:26:54 +02:00
Zuul 3b19ca9cb3 Merge "Add zuul_unreachable ansible host group" 2024-03-25 18:26:14 +00:00
Simon Westphahl 305d4dbab9
Handle dependency limit errors more gracefully
When the dependency graph exceeds the configured size we will raise an
exception. Currently we don't handle those exceptions and let them
bubble up to the pipeline processing loop in the scheduler.

When this happens during trigger event processing this is only aborting
the current pipeline handling run and the next scheduler will continue
processing the pipeline as usual.

However, in the case where the item is already enqueued this exception can
block the pipeline processor and lead to a hanging pipeline:

ERROR zuul.Scheduler: Exception in pipeline processing:
Traceback (most recent call last):
  File "/opt/zuul/lib/python3.11/site-packages/zuul/scheduler.py", line 2370, in _process_pipeline
    while not self._stopped and pipeline.manager.processQueue():
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/zuul/lib/python3.11/site-packages/zuul/manager/__init__.py", line 1800, in processQueue
    item_changed, nnfi = self._processOneItem(
                         ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/zuul/lib/python3.11/site-packages/zuul/manager/__init__.py", line 1624, in _processOneItem
    self.getDependencyGraph(item.changes[0], dependency_graph, item.event,
  File "/opt/zuul/lib/python3.11/site-packages/zuul/manager/__init__.py", line 822, in getDependencyGraph
    self.getDependencyGraph(needed_change, dependency_graph,
  File "/opt/zuul/lib/python3.11/site-packages/zuul/manager/__init__.py", line 822, in getDependencyGraph
    self.getDependencyGraph(needed_change, dependency_graph,
  File "/opt/zuul/lib/python3.11/site-packages/zuul/manager/__init__.py", line 822, in getDependencyGraph
    self.getDependencyGraph(needed_change, dependency_graph,
  [Previous line repeated 8 more times]
  File "/opt/zuul/lib/python3.11/site-packages/zuul/manager/__init__.py", line 813, in getDependencyGraph
    raise Exception("Dependency graph is too large")
Exception: Dependency graph is too large

To fix this, we'll handle the exception and remove the affected item.
We'll also handle the exception during enqueue and ignore the trigger
event in this case.
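
A rough sketch of the handling described above, assuming a dedicated
exception type and a simple dict-based graph. The names
DependencyLimitExceededError, collect_dependencies and process_queue are
made up for this example:

```python
class DependencyLimitExceededError(Exception):
    """Hypothetical exception for an oversized dependency graph."""


def collect_dependencies(change, needs, graph, limit):
    """Recursively record the changes that `change` depends on."""
    if change in graph:
        return
    if len(graph) >= limit:
        raise DependencyLimitExceededError("Dependency graph is too large")
    graph[change] = list(needs.get(change, []))
    for needed in graph[change]:
        collect_dependencies(needed, needs, graph, limit)


def process_queue(queue, needs, limit):
    """Process each item; drop items whose graph exceeds the limit."""
    kept = []
    for item in queue:
        try:
            collect_dependencies(item, needs, {}, limit)
        except DependencyLimitExceededError:
            # Remove the affected item instead of aborting the whole run.
            continue
        kept.append(item)
    return kept
```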

Change-Id: I210c5fa4c568f2bf03eedc18b3e9c9a022628dc3
2024-03-19 14:37:26 +01:00
Simon Westphahl 4680c58a27
Allow rerequested action for Github triggers
The 'requested' action is deprecated in favor of 'rerequested', but the
new schema did not permit the new action name.

Change-Id: I047d2676f44151e7569d38bc1df3d26ffee83202
2024-03-14 14:48:05 +01:00
Simon Westphahl 382e9d386c
Use Github label schema for 'unlabeled' actions
The schema validation for Github trigger events did not use the label
schema for 'unlabeled' actions leading to bogus config warnings.

Change-Id: I6c888d990047e611b560491be9bc784eb1981ada
2024-03-14 12:39:34 +01:00
Zuul a56c9c0ea9 Merge "Produce consistent merge commit shas" 2024-03-06 09:47:14 +00:00
James E. Blair 79a9f86c8d Ignore circular dependencies in supercedent pipelines
There are two issues with supercedent pipelines related to circular deps:

1) When operating in a post-merge configuration on changes (not refs), the
   pipeline manager would throw an exception starting with 10.0.0 because
   any time it operates on change objects, it attempts to collect the
   dependency cycle before enqueuing a change, and starting with 10.0.0,
   the supercedent manager raises an exception in that case.
2) When operating in a pre-merge configuration on changes, the behavior
   regarding circular dependencies was undefined before 10.0.0.  It is
   likely that they were ignored because the manager creates a dynamic
   queue based on the project-ref, but it wasn't explicitly documented
   or tested.

To correct both of these:

Override the cycleForChange method in the supercedent manager so that it
always returns an empty cycle.

Document the expected behavior.

Add tests that cover the cases described above.
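
The override can be pictured roughly as follows. The base class body and
the argument list are assumptions made for this sketch; only the method
name cycleForChange comes from the description above:

```python
class PipelineManager:
    """Stand-in base class; the real one computes an actual cycle."""

    def cycleForChange(self, change, *args, **kwargs):
        # Placeholder behavior: pretend every change has one partner.
        return [change, "partner-of-" + change]


class SupercedentPipelineManager(PipelineManager):
    def cycleForChange(self, change, *args, **kwargs):
        # Circular dependencies are ignored in supercedent pipelines.
        return []
```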

Change-Id: Icf30d488334d40a929f31c2f390e18ae599a3c42
2024-03-04 10:50:23 -08:00
James E. Blair 171d4c56b1 Add some github configuration deprecations
The "event" trigger attribute can currently be a list.  Technically,
it is possible to construct a trigger with an event list, such as:

    trigger:
      github:
        - event:
            - pull_request
            - pull_request_review
          branch: master

Which would trigger on any pull_request or pull_request_review event
on the master branch.  However in practice users typically have much
more narrow event selections, such as only triggering on pull_request
events with the opened action, or a pull_request event with a certain
comment.  It is not possible to construct that example with a single
trigger; the following is invalid:

    trigger:
      github:
        - event:
            - pull_request
            - pull_request_review
          actions:
            - opened
            - commented
          branch: master
          comment: recheck

That will pass syntax validation but would only fire on a recheck
comment; it would never fire on a PR opened event because that event
won't have a comment.

To help users avoid these problems, or worse, let's limit the event
specifier to a single event (of course users can add more triggers for
other events).  That will allow us to inform users when they use
options incompatible with the event they selected.

For now, we make this a deprecation so that in the future we can
enforce it and improve feedback.

This adds syntax validation for each of the possible event/action
combinations in the case where the user has already specified a single
event.  This allows us to go ahead and issue warnings if users specify
incompatible options.  Later, all of these can become errors.

Some time ago (8.3.0) we deprecated the require-status attribute.  It
is eligible for removal now, but that predated the deprecation
warnings system.  Since we haven't yet removed it, and we now have
that system, let's add a deprecation warning for it and give users a
little more time to notice that and remove it before it becomes an
error.

When a Github user requests that a check run start again, Github emits
a "check_run" event with a "rerequested" action.  In zuul < 5.0.0, we
asked users to configure the check_run trigger with the "requested"
action and we silently translated the "rerequested" from github to the
zuul "requested".  In 5.0.0, we reversed that decision in order to
match our policy of passing through data from remote systems as
closely as possible to enable users to match the corresponding
documentation of zuul and the remote system.  We deprecated
"requested" and updated the examples in the documentation to say
"rerequested".  Unfortunately, we left the canonical documentation of
the value as "requested".  To correct this oversight, that
documentation is updated to say "rerequested" and a configuration
deprecation warning is added for uses of "requested".

The "unlabel" trigger attribute is undocumented and unused.  Deprecate
it from syntax checking here so we can gracefully remove it later.

Some unit tests configs are updated since they passed validation
previously but no longer do, and the actual github pull request
review state constants ('approved', etc) are updated to match
what github sends.

Change-Id: I6bf7753d74ec0c5f19dad508c33762a7803fe805
2024-02-29 16:37:47 -08:00
James E. Blair 4421a87806 Add zuul_unreachable ansible host group
This will allow users to write post-run playbooks that skip
running certain tasks on unreachable hosts.

Change-Id: I04106ad0222bcd8073ed6655a8e4ed77f43881f8
2024-02-27 13:57:07 -08:00
James E. Blair 3e4caaac4b Produce consistent merge commit shas
Use a fixed timestamp and merge message so that zuul mergers
produce the exact same commit sha each time they perform a merge
for a queue item.  This can help correlate git repo states for
different jobs in the same change as well as across different
changes in the case of a dependent change series.

The timestamp used is the "configuration time" of the queue item
(ie, the time the buildset was created or reset).  This means
that it will change on gate resets (which could be useful for
distinguishing one run of a build from another).
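
To see why a fixed timestamp and message are enough, recall that a git
commit id is the SHA-1 of the commit object's content.  The sketch below
computes such an id by hand; the helper name commit_sha and the single
shared identity for author and committer are simplifications for this
example, not what the mergers actually call:

```python
import hashlib


def commit_sha(tree, parents, ident, timestamp, message):
    """Compute the id git would assign to a commit with these fields."""
    lines = ["tree " + tree]
    lines += ["parent " + p for p in parents]
    stamp = "%s %d +0000" % (ident, timestamp)
    lines.append("author " + stamp)
    lines.append("committer " + stamp)
    body = ("\n".join(lines) + "\n\n" + message + "\n").encode("utf-8")
    header = b"commit %d\x00" % len(body)
    return hashlib.sha1(header + body).hexdigest()
```

With the same tree, parents, identity, timestamp and message, two
mergers arrive at the same id; changing only the timestamp (as on a gate
reset) yields a different one.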

Change-Id: I3379b19d77badbe2a2ec8347ddacc50a2551e505
2024-02-26 16:32:46 -08:00
James E. Blair 5a8e373c3b Replace Ansible 6 with Ansible 9
Ansible 6 is EOL and Ansible 9 is available.  Remove 6 and add 9.

This is usually done in two changes, but this time it's in one
since we can just rotate the 6 around to make it a 9.

command.py has been updated for ansible 9.

Change-Id: I537667f66ba321d057b6637aa4885e48c8b96f04
2024-02-15 16:20:45 -08:00
James E. Blair b038dcaf9f Deprecate Ansible 6
Ansible 6 is no longer supported and 8 is available and working.
Deprecate Ansible 6.

Change-Id: I721ae1659cc062d9938ceea863ad746996892cc7
2024-02-07 13:22:21 -08:00
James E. Blair b4e49fd8a1 Fix 10 second delay after skipped command task
The log streaming system suffers a 10 second delay after a skipped
command task because:

* when processing the results for the next task, we ensure that the
  log streams for all of the previous tasks have stopped
* when stopping a log stream, we wait for the remote command process
  to finish before we close the stream
* if a log file has not appeared yet, we can't determine if the
  stream has finished
* so we wait 10 seconds for the log to appear before we proceed and
  terminate the stream regardless.

The underlying issue is that the code that terminates the stream does
not know whether to expect a log file (the normal case) or not (the
case for a skipped task).

To correct that, we make a new Streamer class which bundles all of
the information about a particular log streamer and which is
individually addressable.  This way we can stop an individual
streamer immediately if its corresponding task is skipped, or stop
all of the streamers with a 10 second grace period in the normal
case.
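
A simplified sketch of the idea, with no real log streaming involved.
Apart from the Streamer name, everything here (StreamerSet, the key
layout, the grace attribute) is invented for illustration:

```python
import threading


class Streamer:
    """Bundles the state of one log stream so it can be stopped alone."""

    def __init__(self, host, log_uuid):
        self.host = host
        self.log_uuid = log_uuid
        self.stop_event = threading.Event()
        self.grace = None

    def stop(self, grace):
        # Record how long the reader may keep waiting for a log file.
        self.grace = grace
        self.stop_event.set()


class StreamerSet:
    def __init__(self):
        self.streamers = {}

    def add(self, host, log_uuid):
        streamer = Streamer(host, log_uuid)
        self.streamers[(host, log_uuid)] = streamer
        return streamer

    def stop_skipped(self, host, log_uuid):
        # A skipped task never writes a log, so do not wait for one.
        streamer = self.streamers.pop((host, log_uuid), None)
        if streamer is not None:
            streamer.stop(grace=0)

    def stop_all(self):
        # Normal case: allow 10 seconds for late log files to appear.
        for streamer in self.streamers.values():
            streamer.stop(grace=10)
        self.streamers.clear()
```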

Since the behavioral difference is a 10 second delay (but otherwise
no change in job output), we can't test the behavioral outcome of
this change, but we can exercise the code by ensuring that there are
skipped command tasks in the remote stream tests.

Change-Id: Id6ee13e5a82aa8fa3a2a0dd293cf99ed5b84347a
2024-02-02 09:43:37 -08:00
Ahmon Dancy e916f151ff gitlabconnection.py: Handle 404 on unapprove
zuul/driver/gitlab/gitlabconnection.py:
  Handle a 404 response when attempting to unapprove an MR which does
  not have an existing approval.

  Ensure that an exception is raised in otherwise unexpected
  situations.

  The modified codepath is exercised by
  tests.unit.test_gitlab_driver.TestGitlabDriver.test_merge_request_commented

tests/fakegitlab.py:
  Make GitlabWebServer.post_mr_approve()/post_mr_unapprove() act more
  like real GitLab.

tests/fixtures/layouts/basic-gitlab.yaml:
  Add "approval: False" to the pipeline.check.start.gitlab to ensure
  that the test suite ends up trying to unapprove not-yet-approved MRs
  at the start of a pipeline.  This also makes the configuration more
  like the reference pipeline in the documentation.
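
The 404 handling described for gitlabconnection.py could look roughly
like this.  The function name, the injected post callable and the set of
status codes treated as success are assumptions for this sketch:

```python
class GitlabAPIError(Exception):
    """Hypothetical error for unexpected API responses."""


def unapprove_mr(post, url):
    """Unapprove an MR; return False if there was no approval to remove.

    `post` is any callable that performs the request and returns the
    HTTP status code.
    """
    status = post(url)
    if status == 404:
        # No existing approval: nothing to undo, not an error.
        return False
    if status not in (200, 201):
        raise GitlabAPIError("Unexpected status %s for %s" % (status, url))
    return True
```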

Change-Id: Ia000b55e28c9628cf97682939215430baa78d065
2024-01-03 08:48:00 -08:00
Zuul 4fb47d40d3 Merge "Add gerrit hashtags support" 2023-12-07 22:15:27 +00:00
James E. Blair 164b1784c6 Add gerrit hashtags support
This adds support for the hashtags-changed trigger event as well
as using hashtags as pipeline and trigger requirements.

Change-Id: I1f6628d7c227d12355f651c3c822b06e2d5c5562
2023-12-07 07:07:14 -08:00
James E. Blair 27a4beb698 Remove nested templates from assert in tests
Ansible has fixed CVE-2023-5764 by further restricting evaluation
in "nested templates" (which I take to mean a jinja template string
inside of an ansible field that is already expected to be a template
string).  The field named "that" in the "assert" module is such a
field.

To address this, remove the nested template and just treat the
expression as a single jinja template.

Change-Id: I483cf2ec0f61e6484c2768acfb7ab8d9dd5c5117
2023-12-05 09:58:28 -08:00
Zuul f51addbcdc Merge "Add tests for admin api token usage with access-rules" 2023-12-01 17:21:05 +00:00
James E. Blair 033470e8b3 Fix repo state restore for zuul role tag override
When a repo that is being used for a zuul role has override-checkout
set to a tag, the job would fail because we did not reconstruct the
tag in our zuul-role checkout; we only did that for branches.

This fixes the repo state restore for any type of ref.

There is an untested code path where a zuul role repo is checked
out to a tag using override-checkout.  Add a test for that (and
also the same for a branch, for good measure).

Change-Id: I36f142cd3c4e7d0b930318dddd7276f3635cc3a2
2023-11-30 10:06:03 -08:00
Zuul 90746cb286 Merge "Retry lingering deduplicated builds" 2023-11-20 17:06:05 +00:00
James E. Blair dd60903a95 Retry lingering deduplicated builds
We intend to handle the case where two queue items in check share
a deduplicated build and one of them completes and is dequeued while
the other continues.  To handle this, we avoid removing any queue
items (and therefore their associated builds) from ZK if their builds
show up in any currently-enqueued queue item.  However, we don't
actually have a mechanism to load a build from ZK in that situation
if it isn't in a queue item that is currently enqueued.

Adding such a mechanism is complex and risky, whereas the circular
dependency refactoring effort currently in progress will address this
issue in a comprehensive manner.

This change addresses the issue by detecting the situation at the
point where we would try to launch the build (since we failed to
restore it from ZK) and instead of raising an exception as we currently
do, we tell the scheduler to retry the build.  This results in the
buildset not actually taking advantage of the potential deduplication,
but it does at least provide a working build result to the user in
the form of a brand new build.

Change-Id: I86c159c82b858e67433bdaa1e479471b21ea8b86
2023-11-16 14:59:31 -08:00
Simon Westphahl 8f988095b7
Fix issue with new Github default merge mode
Don't return the new default merge modes (recursive or ORT) as long as
the cached project merge modes are not updated.

This fixes the following error when loading dynamic layouts:

    Merge mode merge-recursive not supported by project
    github/org/project. Supported modes: ['merge', 'merge-resolve',
    'squash-merge', 'rebase'].

Change-Id: I473ba605decb136cd527308a63f16a5e548697fb
2023-11-10 15:16:38 +01:00
Zuul 952733340b Merge "Don't schedule initial merge for branch/ref items" 2023-11-08 13:59:01 +00:00
Zuul b6c8294d22 Merge "Add auth token to websocket" 2023-10-30 19:33:33 +00:00
Simon Westphahl 6c6872841b
Don't schedule initial merge for branch/ref items
Currently we schedule a merge/repo-state for every item that is added to
a pipeline. For changes and tags we need the initial merge in order to
build a dynamic layout or to determine if a given job variant on a
branch should match for a tag.

For other change-types (branches/refs) we don't need the initial
merge/repo-state before we can freeze the job graph. The overhead of
those operations can become quite substantial for projects with a lot of
branches that also have a periodic pipeline config, but only want to
execute jobs for a small subset of those branches.

With this change, branch/ref changes that don't execute any jobs will
be removed without triggering any merge/repo state requests.

In addition we will reduce the number of merge requests for branch/ref
changes as the initial merge is skipped in all cases.

Change-Id: I157ed52dba8f4e197b35798217b23ec7f035b2d9
2023-10-27 12:20:57 +02:00
James E. Blair 8a2458aac6 Add tests for admin api token usage with access-rules
This change adds some more tests of the zuul admin api token
(both via zuul-client and just simulated rest api usage) with
read-only access-rules in place.

Change-Id: Idc2fbb50e6f35a080ad5e5aa214ea85cbca42f11
2023-10-25 10:17:14 -07:00
James E. Blair 18fb324f1e Add auth token to websocket
When making a websocket request, browsers do not send the
"Authorization" header.  Therefore if a Zuul tenant is run in
a configuration where authz is required for read-only access,
the websocket-based log streaming will always fail.

To correct this, we will remove the http request authz check
from the console-stream endpoint, and add an optional token
parameter to the websocket message payload.  The JS web app
will be responsible for sending the auth token in the payload,
and the web server will validate it if it is required for the
tenant.  Thanks to Andrei Dmitriev for this suggestion.

Since we essentially have two different authz code paths in
zuul-web now, in order to share as much code as possible, the
authz sequence is refactored in such a way that the final authz
check can be deferred.  First we create an AuthContext at the
start of the request which stores tenant and header information,
then the actual validation is performed in a separate step where
the token can optionally be provided.

In the http code path, we create the AuthContext and validate
immediately, using the Authorization header, and we do all of that
in the cherrypy tool at the start of the request.

In the websocket code path, we create the AuthContext as the
websocket handler is being created by the cherrypy request handler,
then we perform validation after receiving a message on the
websocket.  We use the token supplied from the request.

Error handling is adjusted so in the http code path, exceptions
that return appropriate http errors are raised, but in the
websocket path, these are caught and translated into websocket
close calls.
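
The deferred check can be sketched as below.  Only the AuthContext name
is taken from the text above; the constructor arguments, the
Unauthorized exception, the verify_token callable and the "token" key in
the message payload are assumptions for illustration:

```python
class Unauthorized(Exception):
    """Stand-in for an error that maps to an HTTP 401 response."""


class AuthContext:
    """Captures tenant and header info now; validation happens later."""

    def __init__(self, tenant, headers, read_auth_required):
        self.tenant = tenant
        self.headers = headers
        self.read_auth_required = read_auth_required

    def validate(self, verify_token, token=None):
        # HTTP path: no explicit token, so fall back to the header.
        if token is None:
            header = self.headers.get("Authorization", "")
            if header.startswith("Bearer "):
                token = header[len("Bearer "):]
        if token is None:
            if self.read_auth_required:
                raise Unauthorized("token required for %s" % self.tenant)
            return None
        return verify_token(token)


def on_websocket_message(ctx, verify_token, message, close):
    """Websocket path: validate late and translate errors into a close."""
    try:
        ctx.validate(verify_token, token=message.get("token"))
    except Unauthorized as e:
        close(str(e))
        return False
    return True
```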

A related issue is that we perform no validation that the
streaming build log being requested belongs to the tenant via
which the request is being sent.  This was unnecessary before
read-only access was an option, but now that it is, we should
check that a streaming build request arrives via the correct
tenant URL.  This change adjusts that as well.

During testing, it was noted that the tenant configuration syntax
allows admin-rules and access-rules to use the scalar-or-list
pattern, however some parts of the code assumed only lists.  The
configloader is updated to use scalar-or-list for both of those
values.
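
The scalar-or-list pattern amounts to normalizing the value before use.
A minimal sketch, with to_list as a hypothetical helper name:

```python
def to_list(value):
    """Accept a single value or a list and always return a list."""
    if value is None:
        return []
    if isinstance(value, list):
        return value
    return [value]
```

Both "access-rules: rule-a" and a list of rule names then take the same
code path.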

Change-Id: Ifd4c21bb1fe962bf23acb5b4f10b3bbaba61e63a
Co-Authored-By: Andrei Dmitriev <andrei.dmitriev@nokia.com>
2023-10-24 07:29:55 -07:00
Simon Westphahl 810191b60e
Select correct merge method for Github
Starting with Github Enterprise 3.8[0] and github.com from September
2022 on[1], the merge strategy changed from using merge-recursive to
merge-ort[0].

The merge-ort strategy is available in the Git client since version
2.33.0 and became the default in 2.34.0[2].

If not configured otherwise, we've so far used the default merge
strategy of the Git client (which varies depending on the client
version). With this change, we are now explicitly choosing the default
merge strategy based on the Github version. This way, we can reduce
errors resulting from the use of different merge strategies in Zuul and
Github.
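
As a sketch of the selection logic only: the function name, the way the
version is represented and the mode name "merge-ort" are assumptions
here, and the github.com cutover date is ignored for simplicity:

```python
def default_merge_mode(enterprise_version):
    """Pick a default merge mode from the Github version.

    `enterprise_version` is a (major, minor) tuple for Github
    Enterprise, or None for github.com.
    """
    if enterprise_version is None or enterprise_version >= (3, 8):
        return "merge-ort"
    return "merge-recursive"
```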

Since the newly added merge strategies must be understood by the mergers
we also need to bump the model API version.

[0] https://docs.github.com/en/enterprise-server@3.8/admin/release-notes
[1] https://github.blog/changelog/2022-09-12-merge-commits-now-created-using-the-merge-ort-strategy/
[2] https://git-scm.com/docs/merge-strategies#Documentation/merge-strategies.txt-recursive

Change-Id: I354a76fa8985426312344818320980c67171d774
2023-10-24 07:15:39 +02:00
Zuul bd11c4ff79 Merge "Add gcloud pubsub support to Gerrit driver" 2023-10-04 03:29:42 +00:00
Zuul a06d4110f5 Merge "Add more safety check tests" 2023-09-22 06:14:10 +00:00
James E. Blair 01ac88f3d3 Add more safety check tests
These tests would fail without the safety check introduced in the
parent change.

They were originally written under the assumption that we would
have the optimal behavior for the pipeline processor, where some
builds may be canceled but others may proceed after certain updates
to circular dependency bundles.  However, fixing that is proving
impractical without a refactor, so these tests are added in updated
form which mostly asserts that a lot of jobs are aborted.

The original form of the tests is also added in this change, but
with skip decorators attached.  It is hoped that after some
refactoring of circular dependency handling, we can use the original
form of these tests to validate the desired behavior.

Change-Id: Ie45da9fc1848a717bf5308e595edd27e598d6882
2023-09-21 13:24:04 -07:00
Zuul ebd223cf07 Merge "github: fallback to api_token when can't find installation" 2023-09-19 17:45:41 +00:00
Ian Wienand 3c2e518c52 github: fallback to api_token when can't find installation
graphql queries (I77be4f16cf7eb5c8035ce0312f792f4e8d4c3e10) require
authentication. Enqueueing changes from GitHub (including Depends-On)
requires we run a graphql query. This means that Zuul must be able to
authenticate either via an application or api_token to support features
like Depends-On.

If the app is set up (app_id in config) but we aren't installed with
permissions on the project we're looking up, then fall back to using a
specified api_token. This will make Depends-On work.

Logging is updated to reflect whether or not we are able to fall back to
the api_token if the application is not installed. We log the lack of an
application installation at info level if we can fall back to the token,
and log at error level if we're falling back to anonymous access.

For backward compatibility we continue to fall back to anonymous access
if neither an application install nor api_token is present. The reason
for this is features like Job required-projects: work fine anonymously,
and there may be Zuul installations that don't need additional
functionality.
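
The fallback order and log levels described above, as a small sketch.
The function name, its arguments and the returned tuples are invented
for this example:

```python
import logging


def choose_auth(installation_token, api_token, log):
    """Return (kind, credential) following the fallback order."""
    if installation_token:
        return ("app", installation_token)
    if api_token:
        log.info("No app installation found, falling back to api_token")
        return ("token", api_token)
    log.error("No app installation or api_token, using anonymous access")
    return ("anonymous", None)
```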

Keep in mind that authenticated requests to GitHub get larger API rate
limits. Zuul installations should consider setting an API token even
when using an application for this reason. This gives Zuul the best
chance that fallback requests will not be rate limited.

Documentation is updated, a changelog added and several test
configuration files are padded with the required info.

Story: #2008940
Change-Id: I2107aeafc55591eea790244701567569fa6e80d4
2023-09-18 09:29:38 -07:00
Zuul d44b9875b0 Merge "Fix deduplicating child jobs in check" 2023-09-15 22:22:02 +00:00
Zuul 5b7b0aed5f Merge "Fix deduplication with paused jobs" 2023-09-15 22:13:52 +00:00
Zuul 4b347ce91b Merge "Avoid leaked items caused by config errors" 2023-09-15 18:41:33 +00:00
Zuul 086151de37 Merge "Add more check deduplication tests" 2023-09-15 18:24:17 +00:00
Zuul 5294c582b1 Merge "Fix deduplication of child jobs in check" 2023-09-15 18:24:14 +00:00
Zuul 2d76e35129 Merge "Add more deduplication tests" 2023-09-15 18:17:40 +00:00
Zuul d543ae2f88 Merge "Make Ansible variable freezing more efficient" 2023-09-15 08:13:59 +00:00
James E. Blair e55748ba69 Make Ansible variable freezing more efficient
We currently iterate over every job/host/etc variable in the freeze
playbook.  The reason is because if any value in any variable is
Undefined according to jinja, the Ansible combine filter throws
an error.  What we want to do in Zuul is merge any variable we can,
but if any is undefined, we skip it.  Thus, the process of combining
the variables one at a time in a task and ignoring errors.

This process can be slow, especially if we start with a large
amount of data in one of the early variables.  The combine filter
needs to reprocess the large data repeatedly for each additional
variable.

To improve the process, we create a new action plugin, "zuul_freeze"
which takes a list of variables we want to freeze, then templates
them one at a time and stores the result in a cacheable fact.  This
is the essence of what we were trying to accomplish with the combine
filter.
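
The core loop can be approximated as follows.  This is not the action
plugin itself: string.Template from the standard library stands in for
jinja templating, and a KeyError stands in for an Undefined value:

```python
from string import Template


def freeze_variables(names, raw_vars, context):
    """Template each variable once; skip any that cannot be resolved."""
    frozen = {}
    for name in names:
        value = raw_vars[name]
        if isinstance(value, str):
            try:
                value = Template(value).substitute(context)
            except KeyError:
                # References something undefined: leave it out.
                continue
        frozen[name] = value
    return frozen
```

Each variable is processed exactly once, so a large early value is not
reprocessed for every later variable.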

Change-Id: Ie41f404762daa1b1a5ae47f6ec1aa1954ad36a39
2023-09-14 14:00:45 -07:00
James E. Blair 930c42cd28 Fix deduplicating child jobs in check
If a second change is enqueued in check significantly after the
first, then node requests for child jobs may not be deduplicated
correctly.

Before deduplicating a build, Zuul applies parent data to child
jobs, then compares the child jobs to determine if they are
equivalent.  If so, then they are deduplicated.

This only happens one level in the hierarchy at a time.  Consider
the case where a change is enqueued and both the parent and child
jobs have completed (but the change is still in the queue waiting
on a third, unrelated, job).

If the second change in the bundle is enqueued, Zuul will:
1) Attempt to apply parent data to child jobs.
   Since no jobs have completed yet for this item, no parent data
   are applied.
2) Deduplicate jobs in the second change.
   Zuul will deduplicate the parent job at this point.
3) Zuul will compare the child jobs in the two changes and determine
   they are different because one has parent data and the other does
   not.
4) Zuul submits a node request for the child job.
5) On the next pipeline process, Zuul applies the parent data from
   the deduplicated parent job to the new child job.
6) Zuul deduplicates the child job, and the nodepool request is
   orphaned.

To correct this, we will repeat the process of applying parent data
to child jobs each time we find at least one build to deduplicate.

That means that all existing parent data will be applied to all jobs
on each pass through the pipeline processor no matter how deep the
dependency hierarchy.
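
The repeated pass can be modelled with a toy example.  The dict layout
and function names below are made up for illustration; a job is only
considered equivalent once its parent data matches the existing build:

```python
def apply_parent_data(jobs):
    """Copy result data from each job's finished parent build."""
    for job in jobs.values():
        parent = jobs.get(job["parent"])
        if parent and parent["build"] is not None:
            job["parent_data"] = parent["build"]["result_data"]


def deduplicate(jobs, existing_builds):
    """Attach existing builds to equivalent jobs; return how many."""
    count = 0
    for name, job in jobs.items():
        candidate = existing_builds.get(name)
        if (job["build"] is None and candidate is not None
                and candidate["parent_data"] == job["parent_data"]):
            job["build"] = candidate
            count += 1
    return count


def process(jobs, existing_builds):
    """Repeat both steps until a pass deduplicates nothing."""
    passes = 0
    while True:
        passes += 1
        apply_parent_data(jobs)
        if deduplicate(jobs, existing_builds) == 0:
            return passes
```

Each pass unlocks one more level of the hierarchy, so the loop ends once
the deepest child has been deduplicated or nothing more matches.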

Change-Id: Ifff17df40f0d59447f74cdde619246171279b553
2023-09-08 14:20:46 -07:00
James E. Blair 68f80f9749 Fix deduplication with paused jobs
When a deduplicated job paused, it would not wait for all children
across all queue items to complete before resuming; instead it
would wait only for the children in its own queue item.

Check all queue items a build is in before resuming it.
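
In sketch form, with hypothetical dict-based queue items:

```python
def can_resume(build_id, queue_items):
    """True only when every item sharing the build is done with it."""
    for item in queue_items:
        if build_id not in item["builds"]:
            continue
        # children_done maps child job name -> whether it has completed
        if not all(item["children_done"].values()):
            return False
    return True
```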

Change-Id: Ic2dec3a6dc58230b0873d7e8ba474bc39ed28385
2023-09-08 12:54:33 -07:00
James E. Blair 493a136dba Add more check deduplication tests
There is enough difference between how jobs are deduplicated in
check vs gate that we should test both code paths.  This duplicates
the remaining tests.

Change-Id: I539a61a575b036021fd46b363a4f7f09262db3a7
2023-09-07 16:58:19 -07:00
James E. Blair 742669ab09 Fix deduplication of child jobs in check
When we deduplicate jobs, we intend to call setResult on all of
the queue items with the deduplicated build.  This only worked
in dependent pipelines because we only looked for queue items in
the current bundle.  In independent pipelines, the queue items
can be in different bundles.

To resolve this, search for items with deduplicated builds
across the whole queue in independent pipelines (using the approach
we use when deduplicating them to begin with).

Change-Id: I16436710c47b4f22df39e0cd82d0e289b2293c32
2023-09-07 16:58:19 -07:00
James E. Blair 77633e0005 Add more deduplication tests
This adds more test cases for automatic job deduplication, as well
as some explanatory comments.

Change-Id: I5ca96ddf655e501af3c9490ea86e8cd6a13d7e44
2023-09-07 14:11:30 -07:00
James E. Blair 70c34607f5 Add support for limiting dependency processing
To protect Zuul servers from accidental DoS attacks in case someone,
say, uploads a 1k change tree to gerrit, add an option to limit the
dependency processing in the Gerrit driver and in Zuul itself (since
those are the two places where we recursively process deps).

Change-Id: I568bd80bbc75284a8e63c2e414c5ac940fc1429a
2023-09-07 11:01:29 -07:00
Felix Edel 7ba9307f11 Avoid leaked items caused by config errors
The _reportNonEqueuedItem() method is used to temporarily enqueue a
change, report it and directly dequeue it. However, when the reporting
fails, e.g. due to a config error, the item will never be dequeued.

This results in a leaked change that causes the queue processor to
loop over it indefinitely.

In our case the config error was caused by disabling the branch
protection in GitHub for a release branch in a certain repository. This
branch also defined a project-template which could not be found by Zuul
anymore after the branch protection was disabled [1].

This behaviour can be reproduced in a unit test by enforcing a broken
tenant configuration that references a non-existing project template
during a pipeline run with a circular dependency.

To fix this, ensure that the temporarily enqueued item in
_reportNonEqueuedItem() will be dequeued in any case.
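
The shape of the fix is a try/finally around the report.  A minimal
sketch with a plain list as the queue and an injected report callable,
both stand-ins for the real objects:

```python
def report_non_enqueued_item(queue, item, report):
    """Temporarily enqueue an item, report it, and always dequeue it."""
    queue.append(item)
    try:
        report(item)
    finally:
        # Runs even when reporting raises, so the item cannot leak.
        queue.remove(item)
```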

Although this fixes the endless loop in the queue processor, the same
exception will still be raised on pipeline level ("exception processing
pipeline...").

[1]:
2023-08-28 15:28:53,507 ERROR zuul.Pipeline.example-tenant.check: [e: 06d1ab80-45b7-11ee-8c99-721bf9f22e8c] Unable to re-enqueue change <Change 0x7f066ba36090 example-tenant/project 1234,80b4068eb1fe485df59185f0c93059fe7b15c23e> which is missing dependencies
Traceback (most recent call last):
  File "/opt/zuul/lib/python3.11/site-packages/zuul/manager/__init__.py", line 1614, in _processOneItem
    quiet_dequeue = self.addChange(
                    ^^^^^^^^^^^^^^^
  File "/opt/zuul/lib/python3.11/site-packages/zuul/manager/__init__.py", line 583, in addChange
    if not self.enqueueChangesAhead(change, event, quiet,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/zuul/lib/python3.11/site-packages/zuul/manager/independent.py", line 73, in enqueueChangesAhead
    r = self.addChange(needed_change, event, quiet=True,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/zuul/lib/python3.11/site-packages/zuul/manager/__init__.py", line 583, in addChange
    if not self.enqueueChangesAhead(change, event, quiet,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/zuul/lib/python3.11/site-packages/zuul/manager/independent.py", line 73, in enqueueChangesAhead
    r = self.addChange(needed_change, event, quiet=True,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/zuul/lib/python3.11/site-packages/zuul/manager/__init__.py", line 583, in addChange
    if not self.enqueueChangesAhead(change, event, quiet,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/zuul/lib/python3.11/site-packages/zuul/manager/independent.py", line 73, in enqueueChangesAhead
    r = self.addChange(needed_change, event, quiet=True,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/zuul/lib/python3.11/site-packages/zuul/manager/__init__.py", line 583, in addChange
    if not self.enqueueChangesAhead(change, event, quiet,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/zuul/lib/python3.11/site-packages/zuul/manager/independent.py", line 73, in enqueueChangesAhead
    r = self.addChange(needed_change, event, quiet=True,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/zuul/lib/python3.11/site-packages/zuul/manager/__init__.py", line 583, in addChange
    if not self.enqueueChangesAhead(change, event, quiet,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/zuul/lib/python3.11/site-packages/zuul/manager/independent.py", line 73, in enqueueChangesAhead
    r = self.addChange(needed_change, event, quiet=True,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/zuul/lib/python3.11/site-packages/zuul/manager/__init__.py", line 583, in addChange
    if not self.enqueueChangesAhead(change, event, quiet,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/zuul/lib/python3.11/site-packages/zuul/manager/independent.py", line 73, in enqueueChangesAhead
    r = self.addChange(needed_change, event, quiet=True,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/zuul/lib/python3.11/site-packages/zuul/manager/__init__.py", line 611, in addChange
    self._reportNonEqueuedItem(change_queue,
  File "/opt/zuul/lib/python3.11/site-packages/zuul/manager/__init__.py", line 676, in _reportNonEqueuedItem
    if self.pipeline.tenant.layout.getProjectPipelineConfig(ci):
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/zuul/lib/python3.11/site-packages/zuul/model.py", line 8194, in getProjectPipelineConfig
    templates = self.getProjectTemplates(template_name)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/zuul/lib/python3.11/site-packages/zuul/model.py", line 8127, in getProjectTemplates
    raise TemplateNotFoundError("Project template %s not found" % name)
zuul.model.TemplateNotFoundError: Project template template-foo not found

Change-Id: I2514b783b646caae2863ee1ccbac4600772fe4d6
2023-09-07 18:26:49 +02:00
Zuul fc622866ec Merge "Add window-ceiling pipeline parameter" 2023-08-30 01:28:43 +00:00
James E. Blair 7044963857 Add window-ceiling pipeline parameter
This allows users to set a maximum value for the active window
in the event they have a project that has long stretches of
passing tests but they still don't want to commit too many resources
in case of a failure.
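
As an illustration of how a ceiling bounds the window, assuming a simple
linear increase and a halving decrease.  The function name and these
default values are assumptions for the sketch, not Zuul's configuration
defaults:

```python
def next_window(window, success, increase=1, decrease_factor=2,
                floor=1, ceiling=None):
    """Grow the window on success, shrink on failure, then clamp it."""
    if success:
        window += increase
    else:
        window //= decrease_factor
    window = max(window, floor)
    if ceiling is not None:
        window = min(window, ceiling)
    return window
```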

We should all be so lucky.

Change-Id: I52b5f3a9e7262b88fb16afc4520b35854e8df184
2023-08-29 15:43:28 -07:00