Commit Graph

557 Commits

Author SHA1 Message Date
Zuul a3abea408b Merge "Emit per-branch queue stats separately" 2024-03-25 19:22:37 +00:00
Zuul 5e7c2f2ef6 Merge "Add job name back to node request data" 2024-03-11 10:12:16 +00:00
Simon Westphahl e41af7c312
Add job name back to node request data
With the circular dependency refactoring, we also removed the job name
from the requestor data in the node request. However, the job name could
previously be used as part of the dynamic-tags in Nodepool, which might
be useful for billing and cost calculations.

Add back the job name so those use-cases start working again.

Change-Id: Ie3be39819bf84d05a7427cd0e859f485de90835d
2024-03-07 08:02:30 +01:00
James E. Blair 794545fc64 Emit per-branch queue stats separately
We currently emit 4 statsd metrics for each shared queue, but in
the case that a queue is configured as per-branch, we disregard
the branch and emit the stats under the same hierarchy for any
branch of that queue.  This means that if we have a queue for
integrated-master and a queue for integrated-stable at the same
time, we would emit the stats for the master queue, then
immediately emit the same stats for the stable queue, overwriting
the master stats.

To correct this, move the metrics down a level in the case that
the queue is configured per-branch, and include the branch name
in the key.

Change-Id: I2f4b22394bc3774410a02ae76281eddf080e5c7f
2024-03-06 06:32:22 -08:00
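
As a rough illustration of the key layout described above (the helper and
key names here are assumptions, not Zuul's actual statsd code), a
per-branch queue simply gains one more level carrying the branch name:

    def queue_stats_key(tenant, pipeline, queue, branch=None):
        # Shared queues keep the existing hierarchy; per-branch queues add a
        # level with the branch name so branches no longer overwrite each other.
        key = f"zuul.tenant.{tenant}.pipeline.{pipeline}.queue.{queue}"
        if branch:
            key += f".branch.{branch}"
        return key
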
James E. Blair 3e4caaac4b Produce consistent merge commit shas
Use a fixed timestamp and merge message so that zuul mergers
produce the exact same commit sha each time they perform a merge
for a queue item.  This can help correlate git repo states for
different jobs in the same change as well as across different
changes in the case of a dependent change series.

The timestamp used is the "configuration time" of the queue item
(ie, the time the buildset was created or reset).  This means
that it will change on gate resets (which could be useful for
distinguishing one run of a build from another).

Change-Id: I3379b19d77badbe2a2ec8347ddacc50a2551e505
2024-02-26 16:32:46 -08:00
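
A minimal sketch of the idea above, not the merger's actual code: with the
parents and tree fixed, pinning the dates, identity, and message makes the
merge commit sha deterministic (the identity values and message here are
assumptions):

    import os
    import subprocess

    def deterministic_merge(repo_dir, ref, timestamp):
        # Fixed author/committer dates plus a fixed message mean the resulting
        # commit sha depends only on the parent commits and the merged tree.
        env = dict(os.environ,
                   GIT_AUTHOR_DATE=f"{timestamp} +0000",
                   GIT_COMMITTER_DATE=f"{timestamp} +0000",
                   GIT_AUTHOR_NAME="Zuul",
                   GIT_AUTHOR_EMAIL="zuul@example.com",
                   GIT_COMMITTER_NAME="Zuul",
                   GIT_COMMITTER_EMAIL="zuul@example.com")
        subprocess.run(["git", "merge", "--no-ff", "-m", f"Merge {ref}", ref],
                       cwd=repo_dir, env=env, check=True)
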
James E. Blair 6819f47525 Fix test_timer_preserve_jobs race
This test was racing the assertFinalState method which checks that
all buildsets are complete.  But we might leave timer builds running
at the end of the job.

To avoid this, adopt the model used by most timer tests: add and
remove the timer pipeline configuration during the test so that the
system is idle at shutdown.

Change-Id: I2d9a7761686ddb0263bbac1b9e8b3cbc476c22b1
2024-02-12 13:11:43 -08:00
James E. Blair 1cc276687a Change status json to use "refs" instead of "changes"
This is mostly an internal API change to replace the use of the
word "change" with "ref" in the status json.  This matches the
database and build/buildsets records.

Change-Id: Id468d16d6deb0af3d1c0f74beb1b25630455b8f9
2024-02-09 07:39:52 -08:00
James E. Blair 1f026bd49c Finish circular dependency refactor
This change completes the circular dependency refactor.

The principal change is that queue items may now include
more than one change simultaneously in the case of circular
dependencies.

In dependent pipelines, the two-phase reporting process is
simplified because it happens during processing of a single
item.

In independent pipelines, non-live items are still used for
linear dependencies, but multi-change items are used for
circular dependencies.

Previously changes were enqueued recursively and then
bundles were made out of the resulting items.  Since we now
need to enqueue entire cycles in one queue item, the
dependency graph generation is performed at the start of
enqueuing the first change in a cycle.

Some tests exercise situations where Zuul is processing
events for old patchsets of changes.  The new change query
sequence mentioned in the previous paragraph necessitates
more accurate information about out-of-date patchsets than
the previous sequence, therefore the Gerrit driver has been
updated to query and return more data about non-current
patchsets.

This change is not backwards compatible with the existing
ZK schema, and will require Zuul systems delete all pipeline
states during the upgrade.  A later change will implement
a helper command for this.

All backwards compatibility handling for the last several
model_api versions which were added to prepare for this
upgrade have been removed.  In general, all model data
structures involving frozen jobs are now indexed by the
frozen job's uuid and no longer include the job name since
a job name no longer uniquely identifies a job in a buildset
(either the uuid or the (job name, change) tuple must be
used to identify it).

Job deduplication is simplified and now only needs to
consider jobs within the same buildset.

The fake github driver had a bug (fakegithub.py line 694) where
it did not correctly increment the check run counter, so our
tests that verified that we closed out obsolete check runs
when re-enqueing were not valid.  This has been corrected, and
in doing so, has necessitated some changes around quiet dequeuing
when we re-enqueue a change.

The reporting in several drivers has been updated to support
reporting information about multiple changes in a queue item.

Change-Id: I0b9e4d3f9936b1e66a08142fc36866269dc287f1
Depends-On: https://review.opendev.org/907627
2024-02-09 07:39:40 -08:00
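
A toy sketch of the indexing change described above (class and attribute
names are illustrative, not Zuul's actual model): keying builds by frozen
job uuid lets two changes in the same queue item run jobs with the same
name without colliding:

    class BuildSetSketch:
        def __init__(self):
            self.builds = {}   # frozen job uuid -> build

        def addBuild(self, frozen_job, build):
            self.builds[frozen_job.uuid] = build

        def getBuild(self, frozen_job):
            # (job name, change) would also identify it; the uuid is simpler.
            return self.builds.get(frozen_job.uuid)
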
James E. Blair 9201f9ee28 Store builds on buildset by uuid
This is part of the circular dependency refactor.

This updates the buildset object in memory (and zk) to store builds
indexed by frozen job uuid rather than job name.  This also updates
several related fields and also temporary dictionaries to do the same.

This will allow us, in the future, to have more than one job/build
in a buildset with the same name (for different changes/refs).

Change-Id: I70865ec8d70fb9105633f0d03ba7c7e3e6cd147d
2023-12-12 11:58:21 -08:00
Simon Westphahl 68d7a99cee
Send job parent data + artifacts via build request
With job parents that supply data, we might end up updating the (secret)
parent data and artifacts of a job multiple times, in addition to
storing duplicate data, since most of this information is part of the
parent's build result.

Instead we will collect the parent data and artifacts before scheduling
a build request and send it as part of the request parameters.

If those parameters are part of the build request the executor will use
them, otherwise it falls back on using the data from the job for
backward compatibility.

This change affects the behavior of job deduplication in that input data
from parent jobs is no longer considered when deciding if a job can be
deduplicated or not.

Change-Id: Ic4a85a57983d38f033cf63947a3b276c1ecc70dc
2023-11-15 07:24:52 +01:00
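
A small sketch of the fallback described above (parameter and attribute
names are assumptions): the executor prefers data shipped in the build
request and only falls back to the job object for requests from older
schedulers:

    def resolve_parent_data(request_params, job):
        if "parent_data" in request_params:
            return (request_params["parent_data"],
                    request_params.get("artifacts", []))
        # Backward compatibility: the request predates this change.
        return job.parent_data, job.artifacts
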
Simon Westphahl 93b4f71d8e
Store frozen jobs using UUID instead of name
Change the frozen job storage in ZK from being identified by name to
UUID. This allows us to handle multiple frozen jobs in a buildset with
the same name.

The job graph will get a new field that is a sorted list of job UUIDs
with the index being the same as for the job name list. Jobs that are
created with the old name-based path will have their UUID set to None.

This is done in preparation of the circular dependency refactoring as
detailed in I8c754689ef73ae20fd97ac34ffc75c983d4797b0.

Change-Id: Ic4df16e8e1ec6908234ecdf91fe08408182d05bb
2023-11-10 07:24:35 +01:00
Simon Westphahl 6c6872841b
Don't schedule initial merge for branch/ref items
Currently we schedule a merge/repo-state for every item that is added to
a pipeline. For changes and tags we need the initial merge in order to
build a dynamic layout or to determine if a given job variant on a
branch should match for a tag.

For other change-types (branches/refs) we don't need the initial
merge/repo-state before we can freeze the job graph. The overhead of
those operations can become quite substantial for projects with a lot of
branches that also have a periodic pipeline config, but only want to
execute jobs for a small subset of those branches.

With this change, branch/ref changes that don't execute any jobs will
be removed without triggering any merge/repo state requests.

In addition we will reduce the number of merge requests for branch/ref
changes as the initial merge is skipped in all cases.

Change-Id: I157ed52dba8f4e197b35798217b23ec7f035b2d9
2023-10-27 12:20:57 +02:00
James E. Blair 1a226acbd0 Emit stats for more build results
We currently typically only emit build stats for success and failure,
but do not emit stats for NODE_FAILURE, CANCELED, SKIPPED, etc.

To round out the feature so that all build results are emitted, this
includes a small refactor to report build stats at the same time we
report build results to the database (in almost all cases).  One
exception to that is when we receive a non-current build result --
that generally happens after a build is canceled, so we don't need
to emit the stats again.

Change-Id: I3bdf4e2643e151e2eae9f204f62cdc106df876b4
2023-09-26 11:32:03 -07:00
Zuul 90dce8ed12 Merge "Add pipeline queue stats" 2023-08-30 01:28:50 +00:00
Zuul fc622866ec Merge "Add window-ceiling pipeline parameter" 2023-08-30 01:28:43 +00:00
James E. Blair a316015f56 Add pipeline queue stats
Also add the configured window size to the pipeline stats.

Remove the ambiguous phrasing "since Zuul started" from some of
the counter documentation.

Change-Id: Icbb7bcfbf25a1e34d26dd865fa29f61faceb4683
2023-08-29 15:49:52 -07:00
James E. Blair 7044963857 Add window-ceiling pipeline parameter
This allows users to set a maximum value for the active window
in the event they have a project that has long stretches of
passing tests but they still don't want to commit too many resources
in case of a failure.

We should all be so lucky.

Change-Id: I52b5f3a9e7262b88fb16afc4520b35854e8df184
2023-08-29 15:43:28 -07:00
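
A minimal sketch of how a ceiling caps window growth (illustrative, not
the actual pipeline manager code; treating "no ceiling" as None is an
assumption):

    def next_window(current, increase, floor, ceiling=None):
        # Grow the active window on success, but never past the ceiling and
        # never below the floor.
        new = current + increase
        if ceiling is not None:
            new = min(new, ceiling)
        return max(new, floor)
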
James E. Blair 3d5f87359d Add configuration support for negative regex
The re2 library does not support negative lookahead expressions.
Expressions such as "(?!stable/)", "(?!master)", and "(?!refs/)" are
very useful branch specifiers with likely many instances in the wild.
We need to provide a migration path for these.

This updates the configuration options which currently accept Python
regular expressions to additionally accept a nested dictionary which
allows specifying that the regex should be negated.  In the future,
other options (global, multiline, etc) could be added.

A few options are already compiled with re2.  These are
left alone for now, but once the transition to re2 is complete, they
can be upgraded to use this syntax as well.

Change-Id: I509c9821993e1886cef1708ddee6d62d1a160bb0
2023-08-28 15:03:58 -07:00
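
A sketch of the negation wrapper described above, using the stdlib re
module for brevity (Zuul itself targets re2, which is why negative
lookahead is unavailable); the class and field names are illustrative:

    import re

    class NegatableRegex:
        def __init__(self, pattern, negate=False):
            self.regex = re.compile(pattern)
            self.negate = negate

        def match(self, value):
            matched = bool(self.regex.match(value))
            return not matched if self.negate else matched

    # Roughly what a config entry like {regex: '^stable/.*', negate: true}
    # would map to:
    not_stable = NegatableRegex(r"^stable/.*", negate=True)
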
James E. Blair 76f791e4d3 Fix linting errors
A new pycodestyle release errors on ",\".  We only use that to support
Python <3.10, and since Zuul is now targeting only 3.11+, these
instances are updated to use implicit continuation.

An instance of "==" is changed to "is".

A function definition which overrides an assignment is separated
so that the assignment always occurs regardless of whether it
ends up pointing to the function def.

Finally, though not required, since we're editing the code anyway
for nits, some typing info is removed.

Change-Id: I6bb096b87582ab1450bed02541483fc6f1d6c44a
2023-08-02 10:28:22 -07:00
Zuul e812ce6a3d Merge "Add missing event id to management events" 2023-05-22 12:07:51 +00:00
Zuul ad532eb3d0 Merge "Allow duplicate queue definitions on same project branches" 2023-05-22 10:53:55 +00:00
James E. Blair d0776916de Refresh builds even less
Change I3824af6149bf27c41a8d895fc682236bd0d91f6b intended to refresh
builds from ZK only when necessary.  Before that change we would
refresh them only if they did not have a result (because once they
have a result they won't change).  That change should have narrowed
that further so that, even when we don't have a result yet, we
still only refresh if the build has actually changed.

In other words, that change should have used an "and" instead of
an "or".

Logically, if builds can't change after they have a result, then
the check of whether we have a result is not necessary.  So rather
than change the operator, we can just drop the build.result check
altogether and rely on the object update check.  This has the effect
of making the code more future proof as well, in that we remove
the assumption that the build will never change after receiving a
result.

This change also surfaced a bug in the original implementation:
because refreshing the Build objects happens inside the deserialize
method of the BuildSet object, the BuildSet has not actually updated
its build_versions variable from ZK yet, which means our comparisons
in shouldRefreshBuild were using outdated data.  To correct, we now
pass in the newly deserialized value.  And the same for Jobs.

Change-Id: Ie688f2ee0343cab5d82776ccfc7b0f2edc5f91e5
2023-05-18 13:33:02 -07:00
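
A condensed sketch of the resulting check (names are illustrative and this
is not the actual model code): the version map comes from the freshly
deserialized BuildSet data, and no separate build.result test is needed:

    def should_refresh_build(build, new_build_versions):
        # Refresh only when ZK reports a newer object version than the one we
        # have cached locally.
        return new_build_versions.get(build.uuid) != build.known_zk_version
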
Simon Westphahl 711e1e5c98
Add missing event id to management events
The change management events via Zuul web and the command socket did not
have an event ID assigned. This made it harder to debug issues where we
need to find the logs related to a certain action.

Change-Id: I05ccbc13c7f906f91e13fb66e4a01a51fc822676
2023-04-14 08:27:29 +02:00
James E. Blair 6caba8e057 Allow duplicate queue definitions on same project branches
Just like secrets and semaphores, queues can be defined in-repo in
untrusted projects with multiple branches.  To aid the branching
workflow in these cases, we will now ignore duplicate queue definitions
on multiple branches of the same project if they are identical.

Change-Id: Ib74e71f425f8e2835ac0000fd76fde478b9d1653
2023-03-09 15:29:15 -08:00
Joshua Watt 28428942f4 merger: Keep redundant cherry-pick commits
In normal git usage, cherry-picking a commit that has already been
applied (and therefore does nothing), or cherry-picking an empty commit,
causes git to exit with an error to let the user decide what they want to do.
However, this doesn't match the behavior of merges and rebases where
non-empty commits that have already been applied are simply skipped
(empty source commits are preserved).

To fix this, add the --keep-redundant-commits option to `git cherry-pick`
to make git always keep a commit when cherry-picking even when it is
empty for either reason. Then, after the cherry-pick, check whether the new
commit is empty and, if it is, back it out as long as the original commit
_wasn't_ empty.

This two-step process is necessary because git doesn't have any options
to simply skip cherry-pick commits that have already been applied to the
tree.

Removing commits that have already been applied is particularly
important in a "deploy" pipeline triggered by a Gerrit "change-merged"
event, since the scheduler will try to cherry-pick the change on top of
the commit that just merged. Without this option, the cherry-pick will
fail and the deploy pipeline will fail with a MERGE_CONFLICT.

Change-Id: I326ba49e2268197662d11fd79e46f3c020675f21
2023-03-01 16:22:17 -06:00
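
A rough sketch of the two-step process described above using plain git
commands (not the merger's actual implementation):

    import subprocess

    def _git(repo, *args, check=True):
        return subprocess.run(["git", *args], cwd=repo, check=check)

    def _same_tree(repo, old, new):
        # "git diff --quiet" exits 0 when the two trees are identical.
        return _git(repo, "diff", "--quiet", old, new, check=False).returncode == 0

    def cherry_pick(repo, commit):
        # Keep redundant/empty commits so git never stops to ask what to do.
        _git(repo, "cherry-pick", "--keep-redundant-commits", commit)
        # If the new commit is empty but the original wasn't, the change was
        # already applied here, so back the empty commit out.
        if (_same_tree(repo, "HEAD~1", "HEAD")
                and not _same_tree(repo, f"{commit}~1", commit)):
            _git(repo, "reset", "--hard", "HEAD~1")
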
Simon Westphahl b23f76e677
Update reconfig event ltime on (smart) reconfig
Make sure to update the last reconfig event ltime in the layout state
during a (smart) reconfig. This is important in order to unblock
pipelines when a tenant reconfig event is lost.

So far the last reconfig event ltime was passed as -1, but wasn't set in
the layout state since the ltime is not allowed to go backward.

Change-Id: Iab04a962abbfbe901c22e4d5f1d484e3f53b0d33
2023-02-17 08:39:07 +01:00
Zuul af96e5786f Merge "Add scheduler run handler metric" 2023-02-15 08:43:19 +00:00
Zuul 5e276007c6 Merge "Fix more file opening ResourceWarnings" 2023-02-14 07:51:38 +00:00
James E. Blair 98dcd51d90 Fix race condition in pipeline change list init
Simon Westphahl describes the race condition:

> [The race condition] can occur after a reconfiguration while
> some schedulers are updating their local layout and some
> already start processing pipelines in the same tenant.
>
> In this case the pipeline manager's `_postConfig()` method that
> calls `PipelineChangeList.create(...)` races with the pipeline
> processor updating the change keys.
>
> This leads to two change lists being written as separate
> shards, that can't be correctly loaded, as all shards combined
> are expected to form a single JSON object.
>
> The sequence of events seems to be the following:
> 1. S1: pipeline processor needs to update the change keys
> 2. S1 the shard writer will delete the `change_list` key with the old
>    shards
> 3. S2: configloader calls the `_postConfig()` method
> 4. S2: `PipelineChangeList.create()` notices that the `change_list` node
>    doesn't exist in Zookeeper:
>    https://opendev.org/zuul/zuul/src/branch/master/zuul/model.py#L921
> 6. S2: the shard writer creates the first shard `0000000000`
> 7. S1: the shard writer creates the second shard `0000000001`
>
> The race condition seems to be introduced with
> Ib1e467b5adb907f93bab0de61da84d2efc22e2a7

That change updated the pipeline manager _postConfig method so
that it no longer acquires the pipeline lock when initializing the
pipeline state and change lists.  This greatly reduces potential
delays during reconfiguration, but has, perhaps predictably, led
to the race condition above.

In the commit message for that change, I noted that we might be
able to avoid even more work if we accept some caveats related to
error reporting.  Avoiding that work means avoiding performing any
writes during _postConfig, which addresses the root cause of the
race condition (steps 3-6 above; ed. note: there is no step 5).

From the commit message:

> We still need to attach new state and change list objects to
> the pipeline in _postConfig (since our pipeline object is new).
> We also should make sure that the objects exist in ZK before we
> leave that method, so that if a new pipeline is created, other
> schedulers will be able to load the (potentially still empty)
> objects from ZK.  As an alternative, we could avoid even this
> work in _postConfig, but then we would have to handle missing
> objects on refresh, and it would not be possible to tell if the
> object was missing due to it being new or due to an error.  To
> avoid masking errors, we keep the current expectation that we
> will create these objects in ZK on the initial reconfiguration.

The current change does exactly that.  We no longer perform any
ZK write operations on the state and change list objects in
_postConfig.  Instead, inside of the refresh methods, we detect
the cases where they should be newly created and do so at that
time.  This happens with the pipeline lock, so is safe against
any simultaneous operation from other components.

There will be "ERROR" level log messages indicating that reading
the state from ZK has failed when these objects are first
initialized.  To indicate that this is probably okay, they will
now be immediately followed by "WARNING" level messages explaining
that.

Strictly speaking, this particular race should only occur for the
change list object, not the pipeline state, since the race
condition above requires a sharded object and of the two, only
the change list is sharded.  However, to keep the lifecycle of
these two objects matched (and to simplify _postConfig) the same
treatment is applied to both.

Note that change I7fa99cd83a857216321f8d946fd42abd9ec427a3 merged
after Ib1e467b and changed the behavior slightly, introducing the
old_state and old_list arguments.  Curiously, the old_list
argument is effectively unused, so it is removed entirely in this
change.  Old_state still has a purpose and is retained.

Change-Id: I519348e7d5d74e675808e990920480fb6e1fb981
2023-02-10 15:03:08 -08:00
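
A simplified sketch of the create-on-refresh approach (illustrative only;
the real objects are sharded and more involved): the refresh path, which
already holds the pipeline lock, creates the ZK node when it is missing
instead of _postConfig writing it:

    import json
    from kazoo.exceptions import NoNodeError

    def refresh_change_list(zk_client, path):
        try:
            data, _stat = zk_client.get(path)
            return json.loads(data)
        except NoNodeError:
            # Safe to create here: the caller holds the pipeline lock, so this
            # cannot race another scheduler's shard writer.
            zk_client.create(path, json.dumps({}).encode("utf8"), makepath=True)
            return {}
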
James E. Blair c3334743f6 Fix race in test_queue unit tests
These tests call getChangeQueue which mutates the pipeline state
objects in ZK.  They should therefore hold the pipeline lock while
doing this.  Otherwise, cleanup routines may run during the test
and delete the state out from under them.

Change-Id: If85d3cf66669f5786203309294528e1f528b0423
2023-02-10 15:01:28 -08:00
Clark Boylan ee3339c8e6 Fix more file opening ResourceWarnings
I've managed to get better at grepping for this and this finds some of
the stragglers. They are all file opens without a matching close, fixed
by using a with open() context manager.

Change-Id: I7b8c8516a86558e2027cb0f01aefe2dd1849069c
2023-02-07 17:12:15 -08:00
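
The pattern being fixed, in miniature:

    # Leaky: the file object is closed only when the GC collects it, which
    # surfaces as a ResourceWarning in the tests.
    def read_settings_leaky(path):
        return open(path).read()

    # Fixed: the context manager closes the file deterministically.
    def read_settings(path):
        with open(path) as f:
            return f.read()
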
Simon Westphahl 95ecb41c51
Add scheduler run handler metric
In order to better understand the scheduler run handler performance, this
change adds a new `zuul.scheduler.run_handler` metric to measure the
duration of one run handler loop.

Change-Id: I77e862cf99d6a8445e71d7daab410d5853487dc3
2023-02-06 08:05:41 +01:00
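
Roughly how such a timer is emitted (illustrative wiring around a generic
statsd client; only the metric name comes from the change above):

    import time

    def timed_run_handler(statsd, run_handler):
        start = time.monotonic()
        run_handler()
        elapsed_ms = (time.monotonic() - start) * 1000
        # statsd timers are reported in milliseconds.
        statsd.timing("zuul.scheduler.run_handler", elapsed_ms)
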
Simon Westphahl a2b114e1a3
Periodically cleanup leaked pipeline state
So far we did not clean up the pipeline state and event queues of deleted
pipelines. To fix that we'll remove the data of pipelines that are no
longer part of the tenant layout in a periodic cleanup task as part of
the general cleanup.

To support the use-case where a pipeline is added back, we also need to
initialize the event queues during a tenant reconfiguration. The local
event queue registry usually takes care of creating the queues when the
event queue is first accessed. However, the old event queue object could
still be cached in the registry when we remove and re-add a pipeline.

For the same use-case we also need to remove the pipeline from the list
of watches in the event watcher. Otherwise we won't re-create the
children watch when the pipeline is added.

Change-Id: I02127fe462cc390c81330e717be55780bc2535eb
2023-01-24 10:25:04 +01:00
Zuul 28eefca0de Merge "Honor independent pipeline requirements for non-live changes" 2023-01-17 19:16:16 +00:00
James E. Blair 3f3101216e Honor independent pipeline requirements for non-live changes
Independent pipelines ignore requirements for non-live changes
because they are not actually executed.  However, a user might
configure an independent pipeline that requires code review and
expect a positive code-review pipeline requirement to be enforced.
Ignoring it risks executing unreviewed code via dependencies.

To correct this, we now enforce pipeline requirements in independent
pipelines in the same way as dependent ones.

This also adds a new "allow-other-connections" pipeline configuration
option which permits users to specify exhaustive pipeline requirements.

Change-Id: I6c006f9e63a888f83494e575455395bd534b955f
Story: 2010515
2023-01-17 09:37:24 -08:00
James E. Blair f82ef0882c Further avoid unnecessary change dependency updates
When adding a unit test for change I4fd6c0d4cf2839010ddf7105a7db12da06ef1074
I noticed that we were still querying the dependent change 4 times instead of
the expected 2.  This was due to an indentation error which caused all 3
query retry attempts to execute.

This change corrects that and adds a unit test that covers this as well as
the previous optimization.

Change-Id: I798d8d713b8303abcebc32d5f9ccad84bd4a28b0
2023-01-04 15:33:49 -08:00
James E. Blair 592d47648e Avoid replacing timer apscheduler jobs
If a timer trigger is configured with a large jitter and a
reconfiguration happens within the jitter time, it is possible
to miss an expected scheduled trigger because the act of
reconfiguration removes and re-adds all of a tenant's timer
trigger apscheduler jobs.

To avoid this situation, we will try to preserve any jobs with
identical configurations.

Change-Id: I5d3a4d7be891fcb4b9a3f268ee347f2069aaded3
2022-11-21 13:36:45 -08:00
James E. Blair 3a981b89a8 Parallelize some pipeline refresh ops
We may be able to speed up pipeline refreshes in cases where there
are large numbers of items or jobs/builds by parallelizing ZK reads.

Quick refresher: the ZK protocol is async, and kazoo uses a queue to
send operations to a single thread which manages IO.  We typically
call synchronous kazoo client methods which wait for the async result
before returning.  Since this is all thread-safe, we can attempt to
fill the kazoo pipe by having multiple threads call the synchronous
kazoo methods.  If kazoo is waiting on IO for an earlier call, it
will be able to start a later request simultaneously.

Quick aside: it would be difficult for us to use the async methods
directly since our overall code structure is still ordered and
effectively single threaded (we need to load a QueueItem before we
can load the BuildSet and the Builds, etc).

Thus it makes the most sense for us to retain our ordering by using
a ThreadPoolExecutor to run some operations in parallel.

This change parallelizes loading QueueItems within a ChangeQueue,
and also Builds/Jobs within a BuildSet.  These are the points in
a pipeline refresh tree which potentially have the largest number
of children and could benefit the most from the change, especially
if the ZK server has some measurable latency.

Change-Id: I0871cc05a2d13e4ddc4ac284bd67e5e3003200ad
2022-11-09 10:51:29 -08:00
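
A condensed sketch of the pattern described above (helper names are
illustrative): synchronous kazoo reads issued from a small thread pool so
the client's single IO thread can pipeline the requests:

    import json
    from concurrent.futures import ThreadPoolExecutor

    def refresh_items(zk_client, item_paths, max_workers=4):
        def load(path):
            data, _stat = zk_client.get(path)   # blocking read per item
            return json.loads(data)

        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            # map() preserves input order, so processing stays deterministic.
            return list(pool.map(load, item_paths))
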
James E. Blair eb7e0998e6 Add a zkprofile scheduler debug command
This adds a temporary debug command to the zuul-scheduler process.

It allows an operator to enable detailed ZK request profiling for
a given tenant-pipeline.  This will be used to identify opportunities
for further optimization.

Change-Id: Id6e0ee3ffc78874e91ebcdbfe14269c93af958cd
2022-10-27 17:05:35 -07:00
Zuul 411a7d0902 Merge "Include some skipped jobs in the code-review report" 2022-10-25 21:18:36 +00:00
James E. Blair ec4c6264ca Add JobData refresh test
We try to avoid refreshing JobData from ZK when it is not necessary
(because these objects rarely change).  However, a bug in the avoidance
was recently discovered and in fact we have been refreshing them more
than necessary.

This adds a test to catch that case, along with fixing an identical
bug (the same process is used in FrozenJobs and Builds).

The fallout from these bugs may not be exceptionally large, however,
since we generally avoid refreshing FrozenJobs once a build has
started, and avoid refreshing Builds once they have completed,
meaning these bugs may have had little opportunity to show themselves.

Change-Id: I41c3451cf2b59ec18a20f49c6daf716de7f6542e
2022-10-15 14:19:10 -07:00
James E. Blair 4f97f953b3 Include some skipped jobs in the code-review report
In the recent change to omit skipped jobs when reporting, we may
have swung the pendulum too far.  While it seems that users may
not want to see a list of hundreds of skipped child_jobs, they may
want to see a list of a small number of skipped jobs due to failed
dependencies.

To try to thread the needle, we omit skipped jobs from the report
iff they were skipped due to zuul_return child_jobs; otherwise
we include them.

Change-Id: I66a223da344a93b4691a969876e887b5eec0e67c
2022-10-11 09:45:28 -07:00
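
A sketch of the resulting filter (the skip_reason field is an assumption,
not Zuul's actual attribute): only jobs skipped via zuul_return child_jobs
are hidden; skips caused by failed dependencies remain in the report:

    def reportable_builds(builds):
        return [b for b in builds
                if not (b.result == "SKIPPED"
                        and b.skip_reason == "zuul_return child_jobs")]
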
Zuul e612a442e3 Merge "Add nodeset alternatives" 2022-09-16 16:52:44 +00:00
James E. Blair 1958bbad03 Add nodeset alternatives
This feature instructs Zuul to attempt a second (or subsequent) node request
with a different node configuration (ie, possibly different labels)
if the first one fails.

It is intended to address the case where a cloud provider is unable
to supply specialized high-performance nodes, and the user would like
the job to proceed anyway on lower-performance nodes.

Change-Id: Idede4244eaa3b21a34c20099214fda6ecdc992df
2022-09-08 13:01:28 -07:00
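
A sketch of the retry flow described above (the launcher API shown is an
assumption): each failed node request falls through to the next
alternative until one succeeds or all are exhausted:

    def request_nodes(launcher, nodeset_alternatives):
        for nodeset in nodeset_alternatives:
            request = launcher.request(nodeset)
            if request.fulfilled:
                return request
        return None   # treated as NODE_FAILURE after the last alternative
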
James E. Blair b1ce0c469c Fix race in test_periodic_freeze_job_failure
Timer unit test jobs should disable timer triggers before ending,
otherwise we may not shut down cleanly and will fail the test.

Change-Id: I2bbbfcaa7da50cd2daedb8f7dea11eb5725d56e4
2022-09-07 10:29:54 -07:00
James E. Blair 5ac9367b25 Add config-error reporter and report config errors to DB
This adds a config-error pipeline reporter configuration option and
now also reports config errors and merge conflicts to the database
as buildset failures.

The driving use case is that if periodic pipelines encounter config
errors (such as being unable to freeze a job graph), they might send
email if configured to send email on merge conflicts, but otherwise
their results are not reported to the database.

To make this more visible, first we need Zuul pipelines to report
buildset ends to the database in more cases -- currently we typically
only report a buildset end if there are jobs (and so a buildset start),
or in some other special cases.  This change adds config errors and
merge conflicts to the set of cases where we report a buildset end.

Because of some shortcuts previously taken, that would end up reporting
a merge conflict message to the database instead of the actual error
message.  To resolve this, we add a new config-error reporter action
and adjust the config error reporter handling path to use it instead
of the merge-conflicts action.

Tests of this as well as the merge-conflicts code path are added.

Finally, a small debug aid is added to the GerritReporter so that we
can easily see in the logs which reporter action was used.

Change-Id: I805c26a88675bf15ae9d0d6c8999b178185e4f1f
2022-08-22 14:35:25 -07:00
Simon Westphahl d61b9772ff Fix zoned executor metric when unzoned is allowed
An executor can have a zone configured and at the same time allow
unzoned jobs. In this case the executor was not counted for the zoned
executor metric (online/accepting).

Change-Id: Ib39947e3403d828b595cf2479e64789e049e63cc
2022-08-11 16:04:31 +02:00
Zuul de1e2a325e Merge "Add pipeline-based merge op metrics" 2022-07-29 19:06:56 +00:00
Zuul cd984ec53c Merge "Hide skipped jobs in status/reports" 2022-07-22 16:05:21 +00:00
James E. Blair feb032d9b5 Hide skipped jobs in status/reports
For heavy users of "dispatch jobs" (where many jobs are declared as
dependencies of a single job which then mutates the child_jobs
return value to indicate which few of those should be run), there
may be large numbers of "SKIPPED" jobs in the status page and in the
final job report, which reduces the usability of both of those.

Yet it is important for users to be able to see skipped jobs since
they may represent an error (they may be inadvertently skipped).

To address this, we remove "SKIPPED" jobs from the status page by
default, but add a button at the bottom of the change box which
can toggle their display.

We remove "SKIPPED" jobs from the report, but add a note at the
bottom which says "Skipped X jobs".  Users can follow the buildset
link to see which ones were skipped.

The buildset page will continue to show all the jobs for the buildset.

Change-Id: Ie297168cdf5b39d1d6f219e9b2efc44c01e87f35
2022-07-21 18:16:42 -07:00
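
A sketch of the report summarization described above (field names and
layout are illustrative): skipped builds are dropped from the listing and
collapsed into a single trailing count:

    def format_report(builds):
        shown = [b for b in builds if b.result != "SKIPPED"]
        lines = [f"- {b.job_name}: {b.result}" for b in shown]
        skipped = len(builds) - len(shown)
        if skipped:
            lines.append(f"Skipped {skipped} jobs")
        return "\n".join(lines)
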