Commit Graph

466 Commits

Simon Westphahl c8ec0b25b5
Cancel jobs of abandoned circular dep. change
When a change that is part of a circular dependency is abandoned, we'd
set the item status to dequeued needing change. This will set all builds
as skipped, overwriting existing builds.

This means that when the item was removed, we did not cancel any of the
builds. For normal builds this mainly wastes resources, but if there are
paused builds, those will be leaked and continue running until the
executor is force-restarted.

The fix here is to cancel the jobs before setting the item as dequeued
needing change.

Change-Id: If111fe1a21a1c944abcf460a6601293c255376d6
2024-04-11 12:26:54 +02:00
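The ordering fix above can be sketched with a toy model; the class and state names here are illustrative stand-ins, not Zuul's actual API:

```python
# Toy sketch of the fix: cancel running/paused builds *before* marking
# the item as dequeued, so paused builds are not leaked. All names are
# illustrative, not Zuul's real data model.

class Build:
    def __init__(self, state):
        self.state = state  # e.g. "running", "paused", "complete"

    def cancel(self):
        self.state = "canceled"

def remove_item(builds):
    """Cancel live builds first; only then would the item be set to
    "dequeued needing change", which skips the remaining builds."""
    for build in builds:
        if build.state in ("running", "paused"):
            build.cancel()
    return [b.state for b in builds]
```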
Zuul a3abea408b Merge "Emit per-branch queue stats separately" 2024-03-25 19:22:37 +00:00
Zuul 0496c249be Merge "Reset jobs behind non-mergeable cycle" 2024-03-25 18:21:43 +00:00
Zuul b0a7ed2899 Merge "Attempt to preserve triggering event across re-enqueues" 2024-03-25 10:09:13 +00:00
Simon Westphahl 349c6a029d Don't reset buildset when cycle dependency merged
In case a live change depends on a cycle and the cycle is merged while
the item is still active, the scheduler will detect the cycle as changed
and re-enqueue the dependent change.

The reason for this behavior is that we don't consider dependencies of
merged changes when building the dependency graph.

Change-Id: Ibc952886b56655c0705882497511b120e5a731cd
2024-03-21 13:35:50 -07:00
Simon Westphahl 305d4dbab9
Handle dependency limit errors more gracefully
When the dependency graph exceeds the configured size we will raise an
exception. Currently we don't handle those exceptions and let them
bubble up to the pipeline processing loop in the scheduler.

When this happens during trigger event processing, it only aborts the
current pipeline handling run, and the next scheduler will continue
processing the pipeline as usual.

However, in cases where the item is already enqueued, this exception can
block the pipeline processor and lead to a hanging pipeline:

ERROR zuul.Scheduler: Exception in pipeline processing:
Traceback (most recent call last):
  File "/opt/zuul/lib/python3.11/site-packages/zuul/scheduler.py", line 2370, in _process_pipeline
    while not self._stopped and pipeline.manager.processQueue():
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/zuul/lib/python3.11/site-packages/zuul/manager/__init__.py", line 1800, in processQueue
    item_changed, nnfi = self._processOneItem(
                         ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/zuul/lib/python3.11/site-packages/zuul/manager/__init__.py", line 1624, in _processOneItem
    self.getDependencyGraph(item.changes[0], dependency_graph, item.event,
  File "/opt/zuul/lib/python3.11/site-packages/zuul/manager/__init__.py", line 822, in getDependencyGraph
    self.getDependencyGraph(needed_change, dependency_graph,
  File "/opt/zuul/lib/python3.11/site-packages/zuul/manager/__init__.py", line 822, in getDependencyGraph
    self.getDependencyGraph(needed_change, dependency_graph,
  File "/opt/zuul/lib/python3.11/site-packages/zuul/manager/__init__.py", line 822, in getDependencyGraph
    self.getDependencyGraph(needed_change, dependency_graph,
  [Previous line repeated 8 more times]
  File "/opt/zuul/lib/python3.11/site-packages/zuul/manager/__init__.py", line 813, in getDependencyGraph
    raise Exception("Dependency graph is too large")
Exception: Dependency graph is too large

To fix this, we'll handle the exception and remove the affected item.
We'll also handle the exception during enqueue and ignore the trigger
event in this case.

Change-Id: I210c5fa4c568f2bf03eedc18b3e9c9a022628dc3
2024-03-19 14:37:26 +01:00
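A minimal sketch of the handling described above, under assumed names (`MAX_DEPENDENCIES`, `DependencyLimitError`, `process_item` are illustrative, not Zuul's API): the graph builder raises once the graph grows too large, and the caller catches the error and removes the item instead of letting the exception block the pipeline processor:

```python
MAX_DEPENDENCIES = 10  # illustrative limit, not Zuul's default

class DependencyLimitError(Exception):
    pass

def build_dependency_graph(change, needed_by, graph=None):
    """Recursively collect dependencies, enforcing a size limit."""
    if graph is None:
        graph = {}
    for needed in needed_by.get(change, []):
        graph.setdefault(change, []).append(needed)
        if sum(len(v) for v in graph.values()) > MAX_DEPENDENCIES:
            raise DependencyLimitError("Dependency graph is too large")
        build_dependency_graph(needed, needed_by, graph)
    return graph

def process_item(change, needed_by):
    """Catch the limit error and remove the item rather than hang."""
    try:
        build_dependency_graph(change, needed_by)
        return "processed"
    except DependencyLimitError:
        return "removed"
```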
James E. Blair 6ccbdacdf2 Attempt to preserve triggering event across re-enqueues
When a dependency cycle is updated, we will re-enqueue the changes
in the cycle so that each of the changes goes through the process
of being added to the queue with the updated contents of the cycle.
That may mean omitting changes from the cycle, or adding new ones,
or even splitting into two cycles.

In that case, in order to preserve the idea of the
"triggering change", carry over the triggering change from when the
cycle was originally enqueued.

Note that this now exposes us to the novel idea that the triggering
change may not be part of the queue item.

Change-Id: I9e00009040f91d7edc31f4928e632edde4b2745f
2024-03-13 13:07:08 -07:00
James E. Blair c2103f7058 Reset jobs behind non-mergeable cycle
In the case of a dependency cycle, we check the mergeability of
each change in the item before we try to merge any of them, and
dequeue the item if it looks like one of them won't be able to
merge.  However, that bypasses the normal behavior where we reset
changes behind failing items, which could lead to merging changes
that were tested with changes ahead that did not merge.

To correct this, update the cycle-can-not-be-merged dequeue stanza
with a reset, to mirror the stanza below which handles the failure
of any individual change to merge.

Change-Id: I52a9fc2da4dd89131722d69d2b5dea886eb3d51c
2024-03-13 09:03:16 -07:00
James E. Blair 794545fc64 Emit per-branch queue stats separately
We currently emit 4 statsd metrics for each shared queue, but in
the case that a queue is configured as per-branch, we disregard
the branch and emit the stats under the same hierarchy for any
branch of that queue.  This means that if we have a queue for
integrated-master and a queue for integrated-stable at the same
time, we would emit the stats for the master queue, then
immediately emit the same stats for the stable queue, overwriting
the master stats.

To correct this, move the metrics down a level in the case that
the queue is configured per-branch, and include the branch name
in the key.

Change-Id: I2f4b22394bc3774410a02ae76281eddf080e5c7f
2024-03-06 06:32:22 -08:00
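The keying change might look like the following sketch; the exact statsd hierarchy here is assumed for illustration, not Zuul's documented key layout:

```python
def queue_stat_key(tenant, pipeline, queue, branch=None):
    """Per-branch queues get the branch in the key, so stats for, say,
    integrated-master and integrated-stable no longer overwrite each
    other under the same hierarchy."""
    base = f"zuul.tenant.{tenant}.pipeline.{pipeline}.queue.{queue}"
    if branch is None:
        return base
    return f"{base}.branch.{branch}"
```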
James E. Blair 79a9f86c8d Ignore circular dependencies in supercedent pipelines
There are two issues with supercedent pipelines related to circular deps:

1) When operating in a post-merge configuration on changes (not refs), the
   pipeline manager would throw an exception starting with 10.0.0 because
   any time it operates on change objects, it attempts to collect the
   dependency cycle before enqueuing a change, and starting with 10.0.0,
   the supercedent manager raises an exception in that case.
2) When operating in a pre-merge configuration on changes, the behavior
   regarding circular dependencies was undefined before 10.0.0.  It is
   likely that they were ignored because the manager creates a dynamic
   queue based on the project-ref, but it wasn't explicitly documented
   or tested.

To correct both of these:

Override the cycleForChange method in the supercedent manager so that it
always returns an empty cycle.

Document the expected behavior.

Add tests that cover the cases described above.

Change-Id: Icf30d488334d40a929f31c2f390e18ae599a3c42
2024-03-04 10:50:23 -08:00
Zuul 617bbb229c Merge "Fix validate-tenants isolation" 2024-02-28 02:46:55 +00:00
James E. Blair ced306d5b1 Update gerrit changes more atomically
The following problem was observed:

Change A depends-on change B, which is in turn the tip of a
patch series of several changes.

Drivers warm the change cache on events by querying information
about changes related to those events.  But they don't process
depends-on headers, which means most drivers only warm one change,
and while the gerrit driver will follow other types of dependency
links which are unique to it, it stops at depends-on boundaries.

So in the example above, the only change in the cache which was warm
was change A.

The triggering event was enqueued, forwarded, and processed by
two responding pipelines simultaneously on two executors.

Each of them noticed the depends-on link and started querying gerrit
for change B and its dependencies.  One of the schedulers was about
1 second ahead of the other in this process.

In the gerrit driver, there is a two phase process for updating
changes.  First the change itself is updated in the normal way
common to all drivers, and then gerrit-specific dependency links
are updated.  That means the change is added to the change cache
with no dependencies, then mutated to add dependencies later.

The first scheduler added change B to the cache with no dependencies.
The second scheduler saw the update and refreshed its copy of B.
The second scheduler began updating B, saw that the ltime of its
copy of B was sufficiently new it didn't need to update the cache
and stopped updating.
The second scheduler enqueued changes A and B, but no others in its
pipeline.
The first scheduler finished querying the stack of changes ending at
B, added them to the change cache, and mutated the entry for B in the
cache.
The first scheduler enqueued A, B, and the rest of the stack in its
pipeline.
The second scheduler updated its copy of B to include the new
dependencies.
The second scheduler ran a pipeline processor, noticed that B lacked
dependencies, and dequeued A and B, and reported an error to the user.

The good news is that Zuul noticed the mistake and dequeued the
changes.

To correct this, we will now collect all of the information about a
change and its gerrit-specific dependencies before writing any of
that information to the change cache.  This means that in our example
above, the second scheduler would not have aborted its queries.
Eventually, both schedulers would end up with the same information
before enqueuing anything.

This process is still not quite optimal, in that we will have multiple
schedulers racing to update the same changes at the same time, but
they are designed to deal with collisions like that, so it should
at least be correct.

A future area of work might be to investigate whether we can optimize
this case further.

Change-Id: I647c2b54a55789e521fca71c8c3814907df65da6
2024-02-22 06:37:31 -08:00
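The gather-then-write ordering can be sketched as follows; the query and cache shapes are illustrative, not Zuul's change cache API:

```python
# Sketch of the write ordering described above: gather the change and
# all of its dependency data first, then write to the shared cache, so
# another scheduler never observes (and trusts) a half-populated entry.

def update_change(cache, query, change_id):
    entry = query(change_id)                     # full change data
    dep_entries = [query(d) for d in entry["deps"]]
    # Only after all queries succeed do we touch the cache:
    for dep in dep_entries:
        cache[dep["id"]] = dep
    cache[change_id] = entry                     # written complete
    return entry
```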
James E. Blair 1bec2014bc Remove updateJobParentData method
This method was added as part of the initial deduplication work in
959a0b9834.  Since we now collect
parent data at the time that we run the job, this method doesn't
actually do anything other than decide when jobs are ready to run.

This change moves that logic back into the findJobsToRun method
and removes the unnecessary updateJobParentData method.

Change-Id: Iac744a24ee3902360eeaef371808657a8eeb2080
2024-02-09 10:19:08 -08:00
James E. Blair fa274fcf25 Remove most model_api backwards-compat handling
Since we require deleting the state (except config cache) for the
circular dependency upgrade, we can now remove any zookeeper
backwards compatibility handling not in the config cache.  This
change does so.  It leaves upgrade handling for two attributes
in the branch cache (even though they are old) because it is
plausible that even systems that have been upgrading regularly
may have non-upgraded data in the branch cache for infrequently
used branches, and the cost for retaining the upgrade handling
is small.

Change-Id: I881578159b06c8c7b38a9fa0980aa0626dddad31
2024-02-09 07:39:55 -08:00
James E. Blair ca83980bb7 Clean up safety check
The safety check was originally written to detect when a dependency
cycle changed without the pipeline manager noticing.

Since the dependency cycle refactor, items can have multiple changes
and previous processes that were designed around updates to items
causing cascading updates to items behind them (but in the same bundle)
no longer make as much sense.  However, the "safety check" now seems
to make more sense as the primary method for determining that a
dependency cycle has changed.  It fits in well with other checks
in pipeline processing now that it examines the situation for a
single item.

Resolve the temporary safety check by keeping it.  It is cleaned up
a bit and moved earlier in the pipeline processing.  We can also
clean up the slightly awkward silent dequeue/re-enqueue method and
incorporate it into the safety check.  Now the process is:

  If a dependency cycle changes, dequeue the item without reporting
  and then re-enqueue all of the item's changes.

This means that if a dependency cycle (no matter how large) is
split in half, we will keep both halves (now separate) in the
pipeline.  This behavior is likely to be the most intuitive to
users.

In general, there are two ways to update a dependency cycle: with a
new patchset that changes the graph (typically only gerrit) or with
a PR message or topic change (gerrit and others).  To achieve some
consistency between these methods, we reuse the same re-enqueue method
in both cases (but in the case of a patchset superseding a change,
we don't re-enqueue the old change; but we do expect the
patchset-created event to enqueue the new version).  The timing is
still a little different, but the results are the same.

Change-Id: Ifa42b081cbd103ef04d8814c27ab5c51aa5e8335
2024-02-09 07:39:53 -08:00
James E. Blair 1f026bd49c Finish circular dependency refactor
This change completes the circular dependency refactor.

The principal change is that queue items may now include
more than one change simultaneously in the case of circular
dependencies.

In dependent pipelines, the two-phase reporting process is
simplified because it happens during processing of a single
item.

In independent pipelines, non-live items are still used for
linear dependencies, but multi-change items are used for
circular dependencies.

Previously changes were enqueued recursively and then
bundles were made out of the resulting items.  Since we now
need to enqueue entire cycles in one queue item, the
dependency graph generation is performed at the start of
enqueuing the first change in a cycle.

Some tests exercise situations where Zuul is processing
events for old patchsets of changes.  The new change query
sequence mentioned in the previous paragraph necessitates
more accurate information about out-of-date patchsets than
the previous sequence, therefore the Gerrit driver has been
updated to query and return more data about non-current
patchsets.

This change is not backwards compatible with the existing
ZK schema, and will require Zuul systems delete all pipeline
states during the upgrade.  A later change will implement
a helper command for this.

All backwards compatibility handling for the last several
model_api versions which were added to prepare for this
upgrade have been removed.  In general, all model data
structures involving frozen jobs are now indexed by the
frozen job's uuid and no longer include the job name since
a job name no longer uniquely identifies a job in a buildset
(either the uuid or the (job name, change) tuple must be
used to identify it).

Job deduplication is simplified and now only needs to
consider jobs within the same buildset.

The fake github driver had a bug (fakegithub.py line 694) where
it did not correctly increment the check run counter, so our
tests that verified that we closed out obsolete check runs
when re-enqueing were not valid.  This has been corrected, and
in doing so, has necessitated some changes around quiet dequeuing
when we re-enqueue a change.

The reporting in several drivers has been updated to support
reporting information about multiple changes in a queue item.

Change-Id: I0b9e4d3f9936b1e66a08142fc36866269dc287f1
Depends-On: https://review.opendev.org/907627
2024-02-09 07:39:40 -08:00
James E. Blair fb7d24b245 Fix validate-tenants isolation
The validate-tenants scheduler subcommand is supposed to perform
complete tenant validation, and in doing so, it interacts with zk.
It is supposed to isolate itself from the production data, but
it appears to accidentally use the same unparsed config cache
as the production system.  This is mostly okay, but if the loading
paths are different, it could lead to writing cache errors into
the production file cache.

The error is caused because the ConfigLoader creates an internal
reference to the unparsed config cache and therefore ignores the
temporary/isolated unparsed config cache created by the scheduler.

To correct this, we will always pass the unparsed config cache
into the configloader.

Change-Id: I40bdbef4b767e19e99f58cbb3aa690bcb840fcd7
2024-01-31 14:58:45 -08:00
James E. Blair 7262ef7f6f Include job_uuid in NodeRequests
This is part of the circular dependency refactor.  It updates the
NodeRequest object to include the job_uuid in addition to the job_name
(which is temporarily kept for backwards compatibility).  When node
requests are completed, we now look up the job by uuid if supplied.

Change-Id: I57d4ab6c241b03f76f80346b5567600e1692947a
2023-12-20 10:44:04 -08:00
James E. Blair 9201f9ee28 Store builds on buildset by uuid
This is part of the circular dependency refactor.

This updates the buildset object in memory (and zk) to store builds
indexed by frozen job uuid rather than job name.  This also updates
several related fields and also temporary dictionaries to do the same.

This will allow us, in the future, to have more than one job/build
in a buildset with the same name (for different changes/refs).

Change-Id: I70865ec8d70fb9105633f0d03ba7c7e3e6cd147d
2023-12-12 11:58:21 -08:00
James E. Blair cb3c4883f2 Index job map by uuid
This is part of the circular dependency refactor.  It changes the
job map (a dictionary shared by the BuildSet and JobGraph classes
(BuildSet.jobs is JobGraph._job_map -- this is because JobGraph
is really just a class to encapsulate some logic for BuildSet))
to be indexed by FrozenJob.uuid instead of job name.  This helps
prepare for supporting multiple jobs with the same name in a
buildset.

Change-Id: Ie17dcf2dd0d086bd18bb3471592e32dcbb8b8bda
2023-12-12 10:22:25 -08:00
James E. Blair 071c48c5ae Freeze job dependencies with job graph
This is part of the circular dependency refactor.

Update the job graph to record job dependencies when it is frozen,
and store these dependencies by uuid.  This means our dependency
graph points to actual frozen jobs rather than mere job names.

This is a pre-requisite to being able to disambiguate dependencies
later when a queue item supports multiple jobs with the same name.

The behavior where we would try to unwind an addition to the job
graph if it failed is removed.  This was originally written with the
idea that we would try to run as many jobs as possible if there was
a config error.  That was pre-zuul-v3 behavior.  Long since, in all
cases when we actually encounter an error adding to the job graph,
we bail and report that to the user.  No longer handling that
case simplifies the code somewhat and makes it more future-proof
(while complicating one of the tests that relied on that behavior
as a shortcut).

This attempts to handle upgrades by emulating the old behavior
if a job graph was created on an older model version.  Since it
relies on frozen job uuids, it also attempts to handle the case
where a frozenjob does not have a uuid (which is a very recent
model change and likely to end up in the same upgrade for some
users) by emulating the old behavior.

Change-Id: I0070a07fcb5af950651404fa8ae66ea18c6ca006
2023-12-06 16:41:18 -08:00
Zuul 11c06b5939 Merge "Improve error reporting for circular dependencies" 2023-11-09 21:03:01 +00:00
Simon Westphahl 6c6872841b
Don't schedule initial merge for branch/ref items
Currently we schedule a merge/repo-state for every item that is added to
a pipeline. For changes and tags we need the initial merge in order to
build a dynamic layout or to determine if a given job variant on a
branch should match for a tag.

For other change-types (branches/refs) we don't need the initial
merge/repo-state before we can freeze the job graph. The overhead of
those operations can become quite substantial for projects with a lot of
branches that also have a periodic pipeline config, but only want to
execute jobs for a small subset of those branches.

With this change, branch/ref changes that don't execute any jobs will
be removed without triggering any merge/repo state requests.

In addition we will reduce the number of merge requests for branch/ref
changes as the initial merge is skipped in all cases.

Change-Id: I157ed52dba8f4e197b35798217b23ec7f035b2d9
2023-10-27 12:20:57 +02:00
James E. Blair 6fda08b8eb Load configuration from unknown dynamic branches
The always-dynamic-branches option specifies a regex such that
branches that match it are ignored for Zuul configuration purposes,
unless a change is proposed, at which point the zuul.yaml config
is read from the branch in the same way as if a change was made
to the file.

Because creating and deleting dynamic branches do not cause
reconfigurations, the list of project branches stored on a tenant
may not be updated after a dynamic branch is created.  This list
is used to decide from what branches to try to load config files.

Together, all of this means that if you create an always-dynamic-branch
and propose a change to it shortly afterwards, Zuul is likely to
ignore the change since it won't know to load configuration from
its branch.

To correct this, we extend the list of branches from which Zuul
knows to read configuration with the branch of the item under test
and any items ahead of it in the queue (but only if these branches
match the dynamic config regex so that we don't include an excluded
branch).

Also add a log entry to indicate when we are loading dynamic
configuration from a file.

Change-Id: Ibd15ce4a154311cdb523c5603f4ad17f761d1078
2023-10-09 15:38:46 -07:00
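A hedged sketch of the branch-list extension; the function and field names are assumptions, not Zuul's API:

```python
import re

def config_branches(known_branches, item_branches, dynamic_regex):
    """Extend the tenant's known branch list with branches of enqueued
    items, but only when they match the always-dynamic-branches regex,
    so excluded branches stay excluded."""
    branches = set(known_branches)
    pattern = re.compile(dynamic_regex)
    for branch in item_branches:
        if pattern.fullmatch(branch):
            branches.add(branch)
    return branches
```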
Simon Westphahl e92bb01447
Improve error reporting for circular dependencies
Make it clear from the message reported to the change which project
doesn't allow circular dependencies.

Change-Id: Id614265535dd6f2af419f7eda7dda9799f18ea56
2023-09-29 11:35:00 +02:00
Simon Westphahl 1da1c5e014
Fix child job skip with paused deduplicated parent
When a build pauses, it can also return a list of child jobs to execute.
If the paused build was deduplicated we need to call `setResult()` on
all items that have that particular build.

Change-Id: Iead5c02032bccf46852ee6b2c8adf714689aa2f5
2023-09-22 12:28:45 +02:00
Zuul a75d640b8e Merge "Add a bundle-changed safety check" 2023-09-19 17:05:03 +00:00
Zuul d44b9875b0 Merge "Fix deduplicating child jobs in check" 2023-09-15 22:22:02 +00:00
Zuul 5b7b0aed5f Merge "Fix deduplication with paused jobs" 2023-09-15 22:13:52 +00:00
Zuul 4b347ce91b Merge "Avoid leaked items caused by config errors" 2023-09-15 18:41:33 +00:00
Zuul 5294c582b1 Merge "Fix deduplication of child jobs in check" 2023-09-15 18:24:14 +00:00
James E. Blair 9406bcc2d3 Add a bundle-changed safety check
Several recent bugs and attempted fixes have shown that there may
be some edge cases in the handling of dependency cycles that have
the potential to cause jobs to run with the wrong changes in place.

While we work on longer-term fixes to those, add a safety check to
the pipeline processor so that if we detect a change to the bundle
contents of a queue item, we remove the item from the queue.  We
may not necessarily perform the optimal behavior with this, but it
should keep us from running jobs with known incorrect changes.

This change requires some minor adjustment to some existing unit
tests (it doesn't significantly change the outcome, but it does
cause some jobs to be aborted sooner).  A followup change will add
some more tests which would fail without this change but merit
separate review.

Change-Id: Ia7b1d5b7e3d6910a709478082929f96364ca996b
2023-09-13 14:07:19 -07:00
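The safety check reduces to a comparison like this toy sketch (the structures are illustrative, not Zuul's pipeline model):

```python
def check_item(item_changes, current_cycle):
    """Compare the item's recorded changes against the freshly computed
    dependency cycle; on mismatch, dequeue rather than run jobs with
    known-incorrect contents."""
    if set(item_changes) != set(current_cycle):
        return "dequeue"
    return "keep"
```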
James E. Blair 930c42cd28 Fix deduplicating child jobs in check
If a second change is enqueued in check significantly after the
first, then node requests for child jobs may not be deduplicated
correctly.

Before deduplicating a build, Zuul applies parent data to child
jobs, then compares the child jobs to determine if they are
equivalent.  If so, then they are deduplicated.

This only happens one level in the hierarchy at a time.  Consider
the case where a change is enqueued and both the parent and child
jobs have completed (but the change is still in the queue waiting
on a third, unrelated, job).

If the second change in the bundle is enqueued, Zuul will:
1) Attempt to apply parent data to child jobs.
   Since no jobs have completed yet for this item, no parent data
   are applied.
2) Deduplicate jobs in the second change.
   Zuul will deduplicate the parent job at this point.
3) Zuul will compare the child jobs in the two changes and determine
   they are different because one has parent data and the other does
   not.
4) Zuul submits a node request for the child job.
5) On the next pipeline process, Zuul applies the parent data from
   the deduplicated parent job to the new child job.
6) Zuul deduplicates the child job, and the nodepool request is
   orphaned.

To correct this, we will repeat the process of applying parent data
to child jobs each time we find at least one build to deduplicate.

That means that all existing parent data will be applied to all jobs
on each pass through the pipeline processor no matter how deep the
dependency hierarchy.

Change-Id: Ifff17df40f0d59447f74cdde619246171279b553
2023-09-08 14:20:46 -07:00
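The fixed-point approach can be modeled with a toy sketch in which a job can only be deduplicated once its parent has settled, mirroring the one-level-at-a-time behavior described above (the data model is illustrative, not Zuul's):

```python
def dedup_pass(jobs):
    """One pass: a job merges only if its parent's data is settled
    (parent already merged before this pass). Returns merge count."""
    settled = {id(j) for j in jobs if j["deduped"]}
    merged = 0
    for job in jobs:
        if job["deduped"]:
            continue
        parent = job["parent"]
        if parent is None or id(parent) in settled:
            job["deduped"] = True
            merged += 1
    return merged

def process_queue(jobs):
    """Repeat passes while progress is made, so arbitrarily deep
    hierarchies settle within a single run. Returns pass count."""
    passes = 0
    while dedup_pass(jobs):
        passes += 1
    return passes
```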
James E. Blair 68f80f9749 Fix deduplication with paused jobs
When a deduplicated job paused, it would not wait for all children
across all queue items to complete before resuming; instead it
would wait only for the children in its own queue item.

Check all queue items a build is in before resuming it.

Change-Id: Ic2dec3a6dc58230b0873d7e8ba474bc39ed28385
2023-09-08 12:54:33 -07:00
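A sketch of the resume check, using a toy item structure rather than Zuul's real model: a deduplicated paused build must wait for its children in every queue item it appears in, not just its own.

```python
def can_resume(build_id, items):
    """Return True only if all child jobs of this build are complete
    across every item that contains the (deduplicated) build."""
    for item in items:
        if build_id not in item["builds"]:
            continue
        for child in item["children"].get(build_id, []):
            if not child["complete"]:
                return False
    return True
```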
James E. Blair 742669ab09 Fix deduplication of child jobs in check
When we deduplicate jobs, we intend to call setResult on all of
the queue items with the deduplicated build.  This only worked
in dependent pipelines because we only looked for queue items in
the current bundle.  In independent pipelines, the queue items
can be in different bundles.

To resolve this, search for items with deduplicated builds
across the whole queue in independent pipelines (using the approach
we use when deduplicating them to begin with).

Change-Id: I16436710c47b4f22df39e0cd82d0e289b2293c32
2023-09-07 16:58:19 -07:00
James E. Blair 70c34607f5 Add support for limiting dependency processing
To protect Zuul servers from accidental DoS attacks in case someone,
say, uploads a 1k change tree to gerrit, add an option to limit the
dependency processing in the Gerrit driver and in Zuul itself (since
those are the two places where we recursively process deps).

Change-Id: I568bd80bbc75284a8e63c2e414c5ac940fc1429a
2023-09-07 11:01:29 -07:00
Felix Edel 7ba9307f11 Avoid leaked items caused by config errors
The _reportNonEqueuedItem() method is used to temporarily enqueue a
change, report it and directly dequeue it. However, when the reporting
fails, e.g. due to a config error, the item will never be dequeued.

This results in a leaked change that causes the queue processor to
loop over it indefinitely.

In our case the config error was caused by disabling the branch
protection in GitHub for a release branch in a certain repository. This
branch also defined a project-template which could not be found by Zuul
anymore after the branch protection was disabled [1].

This behaviour can be reproduced in a unit test by enforcing a broken
tenant configuration that references a non-existing project template
during a pipeline run with a circular dependency.

To fix this, ensure that the temporary enqueued item in
_reportNonEqueuedItem() will be dequeued in any case.

Although this fixes the endless loop in the queue processor, the same
exception will still be raised on pipeline level ("exception processing
pipeline...").

[1]:
2023-08-28 15:28:53,507 ERROR zuul.Pipeline.example-tenant.check: [e: 06d1ab80-45b7-11ee-8c99-721bf9f22e8c] Unable to re-enqueue change <Change 0x7f066ba36090 example-tenant/project 1234,80b4068eb1fe485df59185f0c93059fe7b15c23e> which is missing dependencies
Traceback (most recent call last):
  File "/opt/zuul/lib/python3.11/site-packages/zuul/manager/__init__.py", line 1614, in _processOneItem
    quiet_dequeue = self.addChange(
                    ^^^^^^^^^^^^^^^
  File "/opt/zuul/lib/python3.11/site-packages/zuul/manager/__init__.py", line 583, in addChange
    if not self.enqueueChangesAhead(change, event, quiet,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/zuul/lib/python3.11/site-packages/zuul/manager/independent.py", line 73, in enqueueChangesAhead
    r = self.addChange(needed_change, event, quiet=True,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/zuul/lib/python3.11/site-packages/zuul/manager/__init__.py", line 583, in addChange
    if not self.enqueueChangesAhead(change, event, quiet,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/zuul/lib/python3.11/site-packages/zuul/manager/independent.py", line 73, in enqueueChangesAhead
    r = self.addChange(needed_change, event, quiet=True,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/zuul/lib/python3.11/site-packages/zuul/manager/__init__.py", line 583, in addChange
    if not self.enqueueChangesAhead(change, event, quiet,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/zuul/lib/python3.11/site-packages/zuul/manager/independent.py", line 73, in enqueueChangesAhead
    r = self.addChange(needed_change, event, quiet=True,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/zuul/lib/python3.11/site-packages/zuul/manager/__init__.py", line 583, in addChange
    if not self.enqueueChangesAhead(change, event, quiet,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/zuul/lib/python3.11/site-packages/zuul/manager/independent.py", line 73, in enqueueChangesAhead
    r = self.addChange(needed_change, event, quiet=True,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/zuul/lib/python3.11/site-packages/zuul/manager/__init__.py", line 583, in addChange
    if not self.enqueueChangesAhead(change, event, quiet,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/zuul/lib/python3.11/site-packages/zuul/manager/independent.py", line 73, in enqueueChangesAhead
    r = self.addChange(needed_change, event, quiet=True,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/zuul/lib/python3.11/site-packages/zuul/manager/__init__.py", line 583, in addChange
    if not self.enqueueChangesAhead(change, event, quiet,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/zuul/lib/python3.11/site-packages/zuul/manager/independent.py", line 73, in enqueueChangesAhead
    r = self.addChange(needed_change, event, quiet=True,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/zuul/lib/python3.11/site-packages/zuul/manager/__init__.py", line 611, in addChange
    self._reportNonEqueuedItem(change_queue,
  File "/opt/zuul/lib/python3.11/site-packages/zuul/manager/__init__.py", line 676, in _reportNonEqueuedItem
    if self.pipeline.tenant.layout.getProjectPipelineConfig(ci):
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/zuul/lib/python3.11/site-packages/zuul/model.py", line 8194, in getProjectPipelineConfig
    templates = self.getProjectTemplates(template_name)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/zuul/lib/python3.11/site-packages/zuul/model.py", line 8127, in getProjectTemplates
    raise TemplateNotFoundError("Project template %s not found" % name)
zuul.model.TemplateNotFoundError: Project template template-foo not found

Change-Id: I2514b783b646caae2863ee1ccbac4600772fe4d6
2023-09-07 18:26:49 +02:00
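The fix reduces to a try/finally pattern, sketched here with illustrative names (not Zuul's actual `_reportNonEqueuedItem` implementation):

```python
def report_non_enqueued(queue, item, report):
    """Temporarily enqueue an item, report it, and guarantee it is
    dequeued even if reporting raises (e.g. on broken config), so
    the queue processor can never loop over a leaked item."""
    queue.append(item)
    try:
        report(item)        # may raise, e.g. TemplateNotFoundError
    finally:
        queue.remove(item)  # always dequeue, even on error
```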
James E. Blair 267f675533 Allow new warnings when errors exist
If a configuration error existed for a project on one branch
and a change was proposed to update the config on another branch,
that would activate a code path in the manager which attempts to
determine whether errors are relevant.  An error (or warning) is
relevant if it is not in a parent change, and is on the same
project+branch as the current patch.  This is pretty generous.

This means that if a patch touches Zuul configuration with a
warning, all warnings on that branch must be updated.  This was
not the intended behavior.

To correct that, we no longer consider warnings in any of the
places where we check that a queue item is failing due to
configuration errors.

An existing test is updated to include sufficient setup to trigger
the case where a new valid configuration is added to a project
with existing errors and warnings.

A new test case is added to show that we can add new deprecations
as well, and that they are reported to users as warnings.

Change-Id: Id901a540fce7be6fedae668390418aca06a950af
2023-09-04 14:02:13 -07:00
Clark Boylan 4effa487f5 Allow new configs to be used when warnings are present
Prior to this change we checked if there were any errors in the config
(which includes warnings by default) and returned a build error if there
were. Now we only return a build error when proper errors (not just
warnings) are present.

This allows users to push config updates that don't fix all warnings
immediately. Without this any project with warnings present would need
to fix all warnings before newly proposed configs can take effect. This
is particularly problematic for speculative testing, but in general it
seems like warnings shouldn't be fatal.

Change-Id: I31b094fb366328696708b019354b843c4b94ffc0
2023-09-04 11:20:13 -07:00
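The severity check described in the commit above can be sketched as a simple filter (the field names here are illustrative, not Zuul's actual data model):

```python
def blocking_errors(config_errors):
    # Warnings are still reported as comments, but only true errors
    # should make a newly proposed configuration fail.
    return [e for e in config_errors if e["severity"] == "error"]
```

With this, a project carrying only deprecation warnings can still land new speculative configuration, since `blocking_errors()` returns an empty list for it.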
Zuul 90dce8ed12 Merge "Add pipeline queue stats" 2023-08-30 01:28:50 +00:00
Zuul fc622866ec Merge "Add window-ceiling pipeline parameter" 2023-08-30 01:28:43 +00:00
James E. Blair a316015f56 Add pipeline queue stats
Also add the configured window size to the pipeline stats.

Remove the ambiguous phrasing "since Zuul started" from some of
the counter documentation.

Change-Id: Icbb7bcfbf25a1e34d26dd865fa29f61faceb4683
2023-08-29 15:49:52 -07:00
James E. Blair 7044963857 Add window-ceiling pipeline parameter
This allows users to set a maximum value for the active window
in the event they have a project that has long stretches of
passing tests but they still don't want to commit too many resources
in case of a failure.

We should all be so lucky.

Change-Id: I52b5f3a9e7262b88fb16afc4520b35854e8df184
2023-08-29 15:43:28 -07:00
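The ceiling described above behaves like a clamp on the usual window growth. A rough sketch under assumed names and defaults (the real logic lives in Zuul's change queue handling):

```python
def grow_window(current, ceiling=20, increase=1):
    # On success the active window grows, but never past the
    # configured window-ceiling.
    return min(current + increase, ceiling)


def shrink_window(current, floor=3):
    # On failure the window shrinks, but not below the window floor.
    return max(current // 2, floor)
```

This keeps resource usage bounded even for a project with long streaks of passing tests, while still letting the window recover after failures.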
Tobias Henkel 188e1c36ef Only report dequeue if we have reported start
Dequeue reporting was initially introduced in order to make it
possible to mark pending check runs as canceled. This is currently
done unconditionally, even if Zuul hasn't reported the start
yet. This leads to occasional spam of canceled check runs that
weren't supposed to be reported at all. For instance we've seen this
on a repo similar to zuul-jobs that is part of all tenants but is only
gated in one tenant. When a PR in such a repo is approved it enters
the gate in all tenants but doesn't run any jobs in all but one
tenant. If the item gets dequeued before the job freezing has been
finished, Zuul reports canceled check runs from the wrong tenants. This
doesn't harm the workflow but leads to user confusion.

A similar problem can be observed when a user creates a PR against a
non-protected branch which typically runs no jobs. In this case an
abandon of the PR can also lead to canceled check run reporting where
zuul was not supposed to report anything at all on the pr.

This can be fixed by skipping dequeue reporting if start hasn't been
reported yet.

Change-Id: Ibd1d8047168dcb5035c90fa25a629f4a7714c0f7
2023-08-17 15:46:05 -07:00
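The guard in the commit above amounts to remembering whether start was reported for an item before reporting its dequeue. A hypothetical sketch, not Zuul's actual classes:

```python
class QueueItem:
    def __init__(self):
        self.start_reported = False
        self.reports = []

    def report_start(self):
        self.start_reported = True
        self.reports.append("start")

    def report_dequeue(self):
        # Skip dequeue reporting (which would cancel pending check
        # runs) if we never reported start for this item.
        if self.start_reported:
            self.reports.append("dequeue")
```

An item dequeued before start was reported then produces no reports at all, avoiding the spurious "canceled" check runs described above.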
Zuul 0a82e72521 Merge "Don't cancel Github check-run during re-enqueue" 2023-08-15 07:12:39 +00:00
James E. Blair 76f791e4d3 Fix linting errors
A new pycodestyle errors on ",\".  We only use that to support
Python <3.10, and since Zuul is now targeting only 3.11+, these
instances are updated to use implicit continuation.

An instance of "==" is changed to "is".

A function definition which overrides an assignment is separated
so that the assignment always occurs regardless of whether it
ends up pointing to the function def.

Finally, though not required, since we're editing the code anyway
for nits, some typing info is removed.

Change-Id: I6bb096b87582ab1450bed02541483fc6f1d6c44a
2023-08-02 10:28:22 -07:00
Zuul 816afcfdd1 Merge "Add manager/reporter support for config warnings" 2023-07-21 06:11:19 +00:00
James E. Blair 5be57fb87e Add manager/reporter support for config warnings
We recently added a severity field to configuration errors (but all
errors are currently at the "error" severity).  To prepare for
"warning" severity, update the pipeline managers and reporters to
expect both warnings and errors.

Errors will still trigger buildset failures, but warnings will not.
Both will be reported as comments.

Change-Id: Ia24e91f5ddff7d9869e9e83886f996e4f425e110
2023-07-20 16:20:22 -07:00
Simon Westphahl ea5f8fea7c
Don't cancel Github check-run during re-enqueue
So far, when the scheduler re-enqueued a change that was missing
dependencies, it also reported the Github check-run as cancelled but did
not report start as the re-enqueued item was added as a "quiet" item.

The check-run on Github was still marked as success after the item
finished. But until then it appeared as cancelled even if the change was
successfully re-enqueued.

To fix this we'll not call the dequeue reporters when the change could
be re-enqueued. The dequeued item will still be reported to the
database though.

Change-Id: Iea465ca1d9132322b912f7723e3ae41a8c6d3002
2023-07-20 12:45:27 +02:00
James E. Blair 2436c1a5df Don't issue multiple merge requests for bundles
In I82848367bd6f191ec5ae5822a1f438070cde14e1 we avoided spawning
merge jobs for non-live items.

In Id533772f35ebbc76910398e0e0fa50a3abfceb52 we backed that out
partially by spawning merge jobs for non-live items if they update
the config (so we can create a layout).

In I38925e5fd0ed5ff45aab17d108740345716fd478 we accepted that in
the case of non-live items in a bundle that updated config, we
would spawn multiple merge jobs and each one should be responsible
for updating its own item.

However, we can revisit the assumptions in
Id533772f35ebbc76910398e0e0fa50a3abfceb52 which appears not to have
taken bundles into consideration.

A bundle should have the same files results for every item in the
bundle, so, channeling the original spirit of
I82848367bd6f191ec5ae5822a1f438070cde14e1, we can try to avoid
spawning merge jobs for multiple items in a bundle.  This is an
alternate solution to the issue addressed by
I38925e5fd0ed5ff45aab17d108740345716fd478 in that rather than
accepting that we will receive multiple merge jobs in the case of
a bundle with non-live items that each update config, we will instead
receive only one merge job for the entire bundle regardless of
whether they update config, or even whether they are live.

This is accomplished by establishing a single "bundle item" for
the bundle which is defined as the first live item in the bundle.
This is the only item in the bundle that will spawn merge jobs.
When the merge job for that item completes, all of the items in
the bundle will be updated with the results.

Change-Id: Icfe1f2a126eb13349b510107a305c6eef7b622fb
2023-06-26 10:42:17 +00:00
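Selecting the single "bundle item" described above can be sketched as picking the first live item in the bundle (illustrative names only, not Zuul's data model):

```python
def pick_bundle_item(bundle):
    # Only the first live item in the bundle spawns merge jobs; when
    # its merge job completes, the results are copied to every item
    # in the bundle, live or not.
    for item in bundle:
        if item["live"]:
            return item["id"]
    return None
```

Since every item in a bundle should see the same merged files, running one merge job for the whole bundle is safe and avoids the duplicate jobs that the earlier approach accepted.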