3889 Commits

Author SHA1 Message Date
Zuul a3abea408b Merge "Emit per-branch queue stats separately" 2024-03-25 19:22:37 +00:00
Zuul 3b19ca9cb3 Merge "Add zuul_unreachable ansible host group" 2024-03-25 18:26:14 +00:00
Zuul 0496c249be Merge "Reset jobs behind non-mergeable cycle" 2024-03-25 18:21:43 +00:00
Zuul b2c0d69cbc Merge "Add a zuul.buildset_refs variable" 2024-03-25 10:09:16 +00:00
Zuul b0a7ed2899 Merge "Attempt to preserve triggering event across re-enqueues" 2024-03-25 10:09:13 +00:00
Zuul 51ec0d9159 Merge "Use the triggering change as the zuul change" 2024-03-25 09:56:06 +00:00
James E. Blair 632839804c Add a zuul.buildset_refs variable
This adds information about the changes associated with a
circular dependency queue item.  Currently the bundle_id can be
used to identify which of the items in zuul.items is related to
the current dependency cycle.  That variable is deprecated, so
zuul.buildset_refs can be used to replace that functionality.

Since it repeats some of the information at the top level (eg
zuul.change, zuul.project, etc), the code is refactored so they
can share the dictionary construction.  That is also used by
zuul.items.  This results in a few extra fields in zuul.items
now, such as the change message, but that is relatively
inconsequential, so is not called out in the release notes.

The src_dir is similarly included in all of these places.  In
writing this change it was discovered that
zuul.items.project.src_dir always used the golang scheme, but
zuul.project.src_dir used the correct per-job workspace scheme.
This has been corrected so that they both use the per-job scheme
now.

A significant reorganization of the job variable documentation is
included.  Previously we had a section with additional variables
for each item type, but since many of these are duplicated at the
top level, in the item list, and now in the refs list, that
structure became difficult to work with.  Instead, the
documentation for each of these sections now exhaustively lists
all of the possible variables.  This makes for some repetition,
but it also means that a user can see at a glance what variables
are available, and we can deep-link to each one.  To address the
variation between different item types, the variables that mutate
based on item type now contain a definition list indicating what
types they are valid for and their respective meanings.

Change-Id: Iab8f99d4c4f40c44d630120c458539060cc725b5
2024-03-22 06:41:36 -07:00
Simon Westphahl 349c6a029d Don't reset buildset when cycle dependency merged
In case a live change depends on a cycle and the cycle is merged while
the item is still active the scheduler will detect the cycle as changed
and re-enqueue the dependent change.

The reason for this behavior is that we don't consider dependencies of
merged changes when building the dependency graph.

Change-Id: Ibc952886b56655c0705882497511b120e5a731cd
2024-03-21 13:35:50 -07:00
Simon Westphahl 305d4dbab9 Handle dependency limit errors more gracefully
When the dependency graph exceeds the configured size we will raise an
exception. Currently we don't handle those exceptions and let them
bubble up to the pipeline processing loop in the scheduler.

When this happens during trigger event processing, it only aborts
the current pipeline handling run, and the next scheduler will continue
processing the pipeline as usual.

However, if the item is already enqueued, this exception can
block the pipeline processor and lead to a hanging pipeline:

ERROR zuul.Scheduler: Exception in pipeline processing:
Traceback (most recent call last):
  File "/opt/zuul/lib/python3.11/site-packages/zuul/scheduler.py", line 2370, in _process_pipeline
    while not self._stopped and pipeline.manager.processQueue():
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/zuul/lib/python3.11/site-packages/zuul/manager/__init__.py", line 1800, in processQueue
    item_changed, nnfi = self._processOneItem(
                         ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/zuul/lib/python3.11/site-packages/zuul/manager/__init__.py", line 1624, in _processOneItem
    self.getDependencyGraph(item.changes[0], dependency_graph, item.event,
  File "/opt/zuul/lib/python3.11/site-packages/zuul/manager/__init__.py", line 822, in getDependencyGraph
    self.getDependencyGraph(needed_change, dependency_graph,
  File "/opt/zuul/lib/python3.11/site-packages/zuul/manager/__init__.py", line 822, in getDependencyGraph
    self.getDependencyGraph(needed_change, dependency_graph,
  File "/opt/zuul/lib/python3.11/site-packages/zuul/manager/__init__.py", line 822, in getDependencyGraph
    self.getDependencyGraph(needed_change, dependency_graph,
  [Previous line repeated 8 more times]
  File "/opt/zuul/lib/python3.11/site-packages/zuul/manager/__init__.py", line 813, in getDependencyGraph
    raise Exception("Dependency graph is too large")
Exception: Dependency graph is too large

To fix this, we'll handle the exception and remove the affected item.
We'll also handle the exception during enqueue and ignore the trigger
event in this case.
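
The handling described above can be sketched roughly as follows. This is
a minimal illustration, not the actual scheduler code: the names
DependencyLimitExceededError, build_dependency_graph and process_queue
are hypothetical, and the queue is reduced to a plain list of change
identifiers.

```python
class DependencyLimitExceededError(Exception):
    """Hypothetical exception type used only in this sketch."""


def build_dependency_graph(change, needs, graph, limit):
    """Recursively record the dependencies of `change` in `graph`.

    `needs` maps a change to the changes it depends on.
    """
    if change in graph:
        return
    if len(graph) >= limit:
        raise DependencyLimitExceededError("Dependency graph is too large")
    graph[change] = list(needs.get(change, []))
    for needed in graph[change]:
        build_dependency_graph(needed, needs, graph, limit)


def process_queue(queue, needs, limit):
    """Process every item; remove items whose graph exceeds the limit."""
    removed = []
    for item in list(queue):
        try:
            build_dependency_graph(item, needs, {}, limit)
        except DependencyLimitExceededError:
            # Remove the affected item instead of letting the
            # exception abort the whole processing loop.
            queue.remove(item)
            removed.append(item)
    return removed
```

With this shape, one oversized item no longer blocks the items behind it;
the loop continues with the rest of the queue.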

Change-Id: I210c5fa4c568f2bf03eedc18b3e9c9a022628dc3
2024-03-19 14:37:26 +01:00
Zuul 4d06f081bd Merge "Zuul-Web: substring search for builds, buildsets" 2024-03-19 08:34:42 +00:00
James E. Blair 99b3c11ce2 Use ProjectNotFoundError
This error is used in some places, but not all.  Correct that to
improve config error structured data.

Change-Id: Ice4fbee679ff8e7ab05042452bbd4f45ca8f1122
2024-03-18 15:09:47 -07:00
James E. Blair 1350ce8ad6 Use NodesetNotFoundError class
This error exception class went unused, likely due to complications
from circular imports.

To resolve this, move all of the configuration error exceptions
into the exceptions.py file so they can be imported in both
model.py and configloader.py.

Change-Id: I19b0f078f4d215a2e14c2c7ed893ab225d1e1084
2024-03-18 15:03:58 -07:00
Simon Westphahl 4680c58a27 Allow rerequested action for Github triggers
The 'requested' action is deprecated in favor of 'rerequested', but the
new schema did not permit the new action name.

Change-Id: I047d2676f44151e7569d38bc1df3d26ffee83202
2024-03-14 14:48:05 +01:00
Simon Westphahl 382e9d386c Use Github label schema for 'unlabeled' actions
The schema validation for Github trigger events did not use the label
schema for 'unlabeled' actions leading to bogus config warnings.

Change-Id: I6c888d990047e611b560491be9bc784eb1981ada
2024-03-14 12:39:34 +01:00
James E. Blair 6ccbdacdf2 Attempt to preserve triggering event across re-enqueues
When a dependency cycle is updated, we will re-enqueue the changes
in the cycle so that each of the changes goes through the process
of being added to the queue with the updated contents of the cycle.
That may mean omitting changes from the cycle, or adding new ones,
or even splitting into two cycles.

In that case, in order to preserve the idea of the
"triggering change", carry over the triggering change from when the
cycle was originally enqueued.

Note that this now exposes us to the novel idea that the triggering
change may not be part of the queue item.

Change-Id: I9e00009040f91d7edc31f4928e632edde4b2745f
2024-03-13 13:07:08 -07:00
James E. Blair 802c5a8ca6 Use the triggering change as the zuul change
In the case of circular dependencies with job deduplication, we
arbitrarily pick one of the changes as the zuul change (to use
when setting the zuul.change job variable and friends).  In
theory, it shouldn't matter which change we use, but in practice,
users may be surprised if it is something other than the triggering
change. Since it doesn't really matter to Zuul, let's set the zuul
change to the triggering change when possible.  It still needs to
be one of the changes for the job, so if the triggering change
itself doesn't actually run the job (easily possible if the job is
only run on dependent changes), then we will fall back to the
current behavior.  And of course the change must be one of the
item's changes, so in the case of linear dependencies, we're not
going to start setting it to some other queue item's change.

If we are unable to set it to the triggering change, then the
behavior remains undefined beyond setting it to one of the job's
changes arbitrarily.
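
The selection rule described above might look something like this. It is
a sketch under assumptions: pick_zuul_change is a hypothetical helper,
changes are represented as plain strings, and "arbitrarily" is modelled
as taking the first of the job's changes.

```python
def pick_zuul_change(job_changes, item_changes, triggering_change):
    """Choose the change used for zuul.change and related variables.

    job_changes: changes that actually run this job (non-empty list)
    item_changes: all changes in the queue item
    triggering_change: the change whose event enqueued the item, or None
    """
    if triggering_change in job_changes and triggering_change in item_changes:
        return triggering_change
    # Fallback: previous behavior, any one of the job's changes.
    return job_changes[0]
```

For example, if the triggering change does not run the job, the function
falls back to one of the changes that does.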

Included in this change is a cleanup of a no-longer-needed api
migration from 12->13 related to EventInfo objects that was
missed due to a missing MODEL_API tag.

Information about the triggering change is added to the EventInfo
object to implement this feature.

Because the fallback behavior and the model upgrade behavior are
the same, we don't need to add any conditional api behavior or
upgrade testing -- in both cases we will simply use the current
behavior.

Change-Id: Iee5a7d975fea1f7491b652c406c24d73ada7a1a1
2024-03-13 13:07:08 -07:00
James E. Blair c2103f7058 Reset jobs behind non-mergeable cycle
In the case of a dependency cycle, we check the mergeability of
each change in the item before we try to merge any of them, and
dequeue the item if it looks like one of them won't be able to
merge.  However, that bypasses the normal behavior where we reset
changes behind failing items, which could lead to merging changes
that were tested with changes ahead that did not merge.

To correct this, update the cycle-can-not-be-merged dequeue stanza
with a reset, to mirror the stanza below which handles the failure
of any individual change to merge.

Change-Id: I52a9fc2da4dd89131722d69d2b5dea886eb3d51c
2024-03-13 09:03:16 -07:00
Zuul da07d5ff5e Merge "Report topic to jobs as zuul.topic" 2024-03-12 22:38:31 +00:00
Zuul 93d2118ecf Merge "Replace special characters in MQTT topic" 2024-03-12 14:32:49 +00:00
Benjamin Schanzel f9ebf6a1c9 Zuul-Web: substring search for builds, buildsets
Allow searching for builds and buildsets using substrings of job_name,
project, branch, and pipeline. This is done by placing wildcard
characters (*) into the filter string which get translated to SQL
wildcards (%), representing zero, one, or multiple characters.

Asterisks are accepted instead of SQL-style wildcards (%) because
asterisks as wildcard chars might feel more intuitive, cf. shell file
globbing or regexp.

The SQL LIKE operator is only used if a wildcard is present in the
filter string. This is to not rely on the underlying SQL implementation
of optimizing queries with a LIKE op but no wildcard (ie. exact match),
so that we don't introduce unnecessary performance penalties.
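
A minimal sketch of this translation, assuming a hypothetical helper
build_filter that returns a SQL fragment and its bound parameter. The
escaping of literal "%" and "_" is an extra precaution added for the
sketch and is not described in this change; the column name is assumed
to come from trusted code, not from user input.

```python
def build_filter(column, value):
    """Return (sql_fragment, parameter) for one filter value."""
    if "*" in value:
        # Escape literal SQL wildcards first so that only the user's
        # asterisks act as wildcards, then translate "*" to "%".
        escaped = (
            value.replace("\\", "\\\\")
            .replace("%", "\\%")
            .replace("_", "\\_")
        )
        return f"{column} LIKE ? ESCAPE '\\'", escaped.replace("*", "%")
    # No wildcard present: use an exact match and avoid LIKE entirely.
    return f"{column} = ?", value
```

A filter such as "tox-*" then produces a LIKE clause, while "tox-docs"
stays a plain equality comparison.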

Change-Id: I827a27915308f78fc01019bd988b34ea987c90ea
2024-03-12 13:58:01 +01:00
Zuul 1242e1b5f0 Merge "Include job dependency UUIDs in MQTT payload" 2024-03-12 11:06:52 +00:00
Simon Westphahl ba19e1fa6d Fix retried build result and URL in MQTT payload
The wrong build object was used when formatting the result and web-URL
for retried builds.

Change-Id: I17e2caac833ab7969382257791d6160b2e25ade8
2024-03-11 15:54:50 +01:00
Simon Westphahl 7fd84658e3 Include job dependency UUIDs in MQTT payload
Since jobs are no longer identified by name but by UUID we also need to
reference job dependencies in the MQTT payload by UUID.

For backward-compatibility we'll keep the old "dependencies" field and
add a new "job_dependencies" mapping with the job names and UUIDs.
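
As a rough illustration of the resulting payload shape: the field names
"dependencies" and "job_dependencies" come from the description above,
while the helper format_job, the "job_name" key and the exact layout of
the mapping are assumptions made for this sketch.

```python
def format_job(job_name, dependency_uuids):
    """Build the per-job part of a payload.

    dependency_uuids: mapping of dependency job name -> job UUID
    """
    return {
        "job_name": job_name,
        # Old field, kept for backward compatibility: names only.
        "dependencies": sorted(dependency_uuids),
        # New field: names together with their UUIDs.
        "job_dependencies": dict(dependency_uuids),
    }
```

Consumers that only know the old field keep working, and newer consumers
can resolve each dependency unambiguously by UUID.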

Change-Id: Ib74b11faf72602e1708ea6364cc4a1000e3f0d3b
2024-03-11 14:19:11 +01:00
Zuul 239a4b9142 Merge "Only return the latest config for project-branch" 2024-03-11 10:57:17 +00:00
Zuul 5e7c2f2ef6 Merge "Add job name back to node request data" 2024-03-11 10:12:16 +00:00
Zuul 2354b8a631 Merge "Only use latest proposed config for project-branch" 2024-03-11 10:09:13 +00:00
Simon Westphahl c24314a47f Replace special characters in MQTT topic
The characters '+' and '#' have a special meaning (wildcards) and are
not allowed when publishing messages.

ERROR zuul.MQTTConnection: Could not publish message to topic 'foobar/zuul/c++-test' via mqtt
Traceback (most recent call last):
  File "/opt/zuul/lib/python3.11/site-packages/zuul/driver/mqtt/mqttconnection.py", line 97, in publish
    self.client.publish(topic, payload=json.dumps(message), qos=qos)
  File "/opt/zuul/lib/python3.11/site-packages/paho/mqtt/client.py", line 1233, in publish
    raise ValueError('Publish topic cannot contain wildcards.')
ValueError: Publish topic cannot contain wildcards.
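
One way to avoid the error above is to sanitize the topic before
publishing. This is only a sketch: sanitize_topic is a hypothetical
helper, and the underscore replacement character is an assumption, since
the message does not say what the characters are replaced with.

```python
def sanitize_topic(topic, replacement="_"):
    """Replace the MQTT wildcard characters '+' and '#' in a topic."""
    return topic.replace("+", replacement).replace("#", replacement)
```

With this helper, the topic from the traceback would be published as
'foobar/zuul/c__-test'.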

Change-Id: Iad2ad551151284910de076cec15b3ac6b1cbda52
2024-03-11 07:32:29 +01:00
Joshua Watt d5dcb7eb35 Report topic to jobs as zuul.topic
Reports the change topic to jobs as an ansible variable. This can be
useful for jobs that either want to name artifact output based on a
topic, or enforce that a topic is set using a zuul job.

Change-Id: I678404523d228947541160554623bf4066a729c4
2024-03-08 11:30:45 -07:00
Simon Westphahl e41af7c312 Add job name back to node request data
With the circular dependency refactoring we also removed the job name
from the requestor data in the node request. However, this could
previously be used as part of the dynamic-tags in Nodepool which might
be useful for billing and cost calculations.

Add back the job name so those use-cases start working again.

Change-Id: Ie3be39819bf84d05a7427cd0e859f485de90835d
2024-03-07 08:02:30 +01:00
James E. Blair 794545fc64 Emit per-branch queue stats separately
We currently emit 4 statsd metrics for each shared queue, but in
the case that a queue is configured as per-branch, we disregard
the branch and emit the stats under the same hierarchy for any
branch of that queue.  This means that if we have a queue for
integrated-master and a queue for integrated-stable at the same
time, we would emit the stats for the master queue, then
immediately emit the same stats for the stable queue, overwriting
the master stats.

To correct this, move the metrics down a level in the case that
the queue is configured per-branch, and include the branch name
in the key.
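
The effect on the metric keys can be sketched like this. The key layout
shown here (".queue." and ".branch." segments) and the helper
queue_stats_prefix are hypothetical and chosen only to illustrate the
idea; dots in branch names are replaced because statsd uses them as
hierarchy separators.

```python
def queue_stats_prefix(pipeline_prefix, queue_name, branch=None):
    """Return the statsd key prefix for a shared queue.

    branch is only passed for queues configured as per-branch.
    """
    prefix = f"{pipeline_prefix}.queue.{queue_name}"
    if branch is not None:
        safe_branch = branch.replace(".", "_")
        prefix = f"{prefix}.branch.{safe_branch}"
    return prefix
```

Two per-branch queues with the same name now map to different prefixes,
so the stats for one no longer overwrite the other.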

Change-Id: I2f4b22394bc3774410a02ae76281eddf080e5c7f
2024-03-06 06:32:22 -08:00
Zuul a56c9c0ea9 Merge "Produce consistent merge commit shas" 2024-03-06 09:47:14 +00:00
Zuul 496693bdaa Merge "Ignore circular dependencies in supercedent pipelines" 2024-03-06 01:08:28 +00:00
Simon Westphahl 76882e1b3a Only return the latest config for project-branch
In addition to the safeguard in
Iebf49a9efe193788199197bf7846e336d96edf19 we will only return the final
config for a project-branch as part of the merge result.

Change-Id: I1eb3b75d8762aff4e1ebd057661869df985a79e2
2024-03-05 09:01:57 +01:00
James E. Blair 79a9f86c8d Ignore circular dependencies in supercedent pipelines
There are two issues with supercedent pipelines related to circular deps:

1) When operating in a post-merge configuration on changes (not refs), the
   pipeline manager would throw an exception starting with 10.0.0 because
   any time it operates on change objects, it attempts to collect the
   dependency cycle before enqueuing a change, and starting with 10.0.0,
   the supercedent manager raises an exception in that case.
2) When operating in a pre-merge configuration on changes, the behavior
   regarding circular dependencies was undefined before 10.0.0.  It is
   likely that they were ignored because the manager creates a dynamic
   queue based on the project-ref, but it wasn't explicitly documented
   or tested.

To correct both of these:

Override the cycleForChange method in the supercedent manager so that it
always returns an empty cycle.

Document the expected behavior.

Add tests that cover the cases described above.

Change-Id: Icf30d488334d40a929f31c2f390e18ae599a3c42
2024-03-04 10:50:23 -08:00
Simon Westphahl c52a059821 Revise decision to not deduplicate noop jobs
Since circular dependencies are now modelled as multiple changes in a
single item there is no good reason anymore not to deduplicate noop
jobs.

So far the noop job was listed separately for each change that was part
of a cycle and with that cluttered the UI, especially for large
dependency cycles.

Change-Id: Ic8d447df7d9040a767c88127ba361badc9ef016a
2024-03-04 15:35:18 +01:00
James E. Blair d1a5291882 Fix stack_dump_handler test
If the test suite is run in an environment with yappi or objgraph
installed, the stack_dump_handler test will activate them, and so
we need to call the handler a second time to turn them off.
Otherwise, test performance can suffer or errors can occur.

Change-Id: If073f0e46b24fc4e9f1281f911ce287f5c23d4dd
2024-03-01 14:25:39 -08:00
Zuul 3d30928d39 Merge "Add some github configuration deprecations" 2024-03-01 18:54:10 +00:00
Simon Westphahl 1b3649e754 Only use latest proposed config for project-branch
When an item updates config we will schedule a merge for the proposed
change and its dependencies.

The merger will return a list of config files for each merged change.
The scheduler upon receiving the merge result will combine the collected
config files for a project-branch from all involved changes.

This led to the problem that the old content of renamed config files
was still used when building the dynamic layout.

Since the config we receive from the merger is always exhaustive, we
just need to keep the latest config files.

Another (or additional) fix would be to only return the latest config
files for a project-branch from the mergers. However, in case of
circular dependencies it could make sense in the future to get the
updated config per change to report errors more precisely.
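
The difference between combining and keeping only the latest files can be
sketched as follows. The data layout (an ordered list of per-change
results keyed by project and branch) and the helper latest_config are
assumptions made for illustration, not the scheduler's real structures.

```python
def latest_config(merge_results):
    """Keep only the most recent config files per project-branch.

    merge_results: ordered list, one entry per merged change, each a
    mapping of (project, branch) -> {file path: file content}.
    """
    latest = {}
    for files_by_branch in merge_results:
        for key, files in files_by_branch.items():
            # Replace rather than update, so that files which were
            # renamed or removed by a later change do not linger.
            latest[key] = dict(files)
    return latest
```

Because each result is exhaustive for its project-branch, replacing the
earlier entry loses nothing and drops the stale content of renamed files.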

Change-Id: Iebf49a9efe193788199197bf7846e336d96edf19
2024-03-01 14:42:33 +01:00
James E. Blair 171d4c56b1 Add some github configuration deprecations
The "event" trigger attribute can currently be a list.  Technically,
it is possible to construct a trigger with an event list, such as:

    trigger:
      github:
        - event:
            - pull_request
            - pull_request_review
          branch: master

Which would trigger on any pull_request or pull_request_review event
on the master branch.  However in practice users typically have much
more narrow event selections, such as only triggering on pull_request
events with the opened action, or a pull_request event with a certain
comment.  It is not possible to construct that example with a single
trigger; the following is invalid:

    trigger:
      github:
        - event:
            - pull_request
            - pull_request_review
          actions:
            - opened
            - commented
          branch: master
          comment: recheck

That will pass syntax validation but would only fire on a recheck
comment; it would never fire on a PR opened event because that event
won't have a comment.

To help users avoid these problems, or worse, let's limit the event
specifier to a single event (of course users can add more triggers for
other events).  That will allow us to inform users when they use
options incompatible with the event they selected.

For now, we make this a deprecation so that in the future we can
enforce it and improve feedback.

This adds syntax validation for each of the possible event/action
combinations in the case where the user has already specified a single
event.  This allows us to go ahead and issue warnings if users specify
incompatible options.  Later, all of these can become errors.

Some time ago (8.3.0) we deprecated the require-status attribute.  It
is eligible for removal now, but that predated the deprecation
warnings system.  Since we haven't yet removed it, and we now have
that system, let's add a deprecation warning for it and give users a
little more time to notice that and remove it before it becomes an
error.

When a Github user requests that a check run start again, Github emits
a "check_run" event with a "rerequested" action.  In zuul < 5.0.0, we
asked users to configure the check_run trigger with the "requested"
action and we silently translated the "rerequested" from github to the
zuul "requested".  In 5.0.0, we reversed that decision in order to
match our policy of passing through data from remote systems as
closely as possible to enable users to match the corresponding
documentation of zuul and the remote system.  We deprecated
"requested" and updated the examples in the documentation to say
"rerequested".  Unfortunately, we left the canonical documentation of
the value as "requested".  To correct this oversight, that
documentation is updated to say "rerequested" and a configuration
deprecation warning is added for uses of "requested".

The "unabel" trigger attribute is undocumented and unused.  Deprecate
it from syntax checking here so we can gracefully remove it later.

Some unit tests configs are updated since they passed validation
previously but no longer do, and the actual github pull request
review state constants ('approved', etc) are updated to match
what github sends.

Change-Id: I6bf7753d74ec0c5f19dad508c33762a7803fe805
2024-02-29 16:37:47 -08:00
Zuul 617bbb229c Merge "Fix validate-tenants isolation" 2024-02-28 02:46:55 +00:00
James E. Blair 4421a87806 Add zuul_unreachable ansible host group
This will allow users to write post-run playbooks that skip
running certain tasks on unreachable hosts.

Change-Id: I04106ad0222bcd8073ed6655a8e4ed77f43881f8
2024-02-27 13:57:07 -08:00
James E. Blair 3e4caaac4b Produce consistent merge commit shas
Use a fixed timestamp and merge message so that zuul mergers
produce the exact same commit sha each time they perform a merge
for a queue item.  This can help correlate git repo states for
different jobs in the same change as well as across different
changes in the case of a dependent change series.

The timestamp used is the "configuration time" of the queue item
(ie, the time the buildset was created or reset).  This means
that it will change on gate resets (which could be useful for
distinguishing one run of a build from another).
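
To see why a fixed timestamp and message are sufficient, recall that a
git commit id is a hash over the commit's content. The sketch below
computes the SHA-1 id of a commit object by hand; commit_sha and its
arguments are illustrative names, the identity string is made up, and
this is not the merger's code.

```python
import hashlib


def commit_sha(tree, parents, ident, timestamp, message):
    """Compute the SHA-1 object id git assigns to a commit object."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines.append(f"author {ident} {timestamp} +0000")
    lines.append(f"committer {ident} {timestamp} +0000")
    body = ("\n".join(lines) + "\n\n" + message + "\n").encode()
    # Git hashes "commit <size>\0" followed by the commit content.
    header = f"commit {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()
```

Given the same tree, parents, identity, timestamp and message, the result
is always the same; changing only the timestamp (as happens on a gate
reset) yields a different sha.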

Change-Id: I3379b19d77badbe2a2ec8347ddacc50a2551e505
2024-02-26 16:32:46 -08:00
James E. Blair 8dd4011aa0 Monitor and report executor inode usage
This adds inodes to the hdd executor sensor and reports usage
to statsd as well.
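
Inode usage can be read with os.statvfs on Unix-like systems. The helper
below is a sketch of such a measurement, not the executor's sensor code;
inode_usage_percent is a hypothetical name.

```python
import os


def inode_usage_percent(path):
    """Return the percentage of inodes in use on the filesystem at path."""
    st = os.statvfs(path)
    if st.f_files == 0:
        # Some filesystems do not report inode counts.
        return 0.0
    return 100.0 * (st.f_files - st.f_ffree) / st.f_files
```

A sensor could compare this value against a configured threshold in the
same way as it does for disk space.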

Change-Id: Ifd9a63cfc7682f6679322e39809be69abca6827e
2024-02-19 11:20:57 -08:00
James E. Blair 922a6b53ed Make executor sensors slightly more efficient
Rather than checking all of the sensors to see if they are okay,
then collecting all the data again for stats purposes, do both
at the same time.

Change-Id: Ia974a7d013057880171fd1695a1d17169d093410
2024-02-19 09:04:41 -08:00
James E. Blair 5a8e373c3b Replace Ansible 6 with Ansible 9
Ansible 6 is EOL and Ansible 9 is available.  Remove 6 and add 9.

This is usually done in two changes, but this time it's in one
since we can just rotate the 6 around to make it a 9.

command.py has been updated for ansible 9.

Change-Id: I537667f66ba321d057b6637aa4885e48c8b96f04
2024-02-15 16:20:45 -08:00
Zuul 75860f3bf6 Merge "Fix test_timer_preserve_jobs race" 2024-02-15 18:22:11 +00:00
James E. Blair 56731826ab Increase timeouts in TestComponentRegistry (#2)
The test_executor_component test has been flaky recently.  The cause
is not clear, but one potential cause is a test node that is slow
enough that we hit our 10 second timeouts waiting for zk events
or thread scheduling.  To attempt to improve the test, increase the
timeouts we use for state changes.  25 seconds is used to avoid
interaction with the 30 second zk connection timeout.

This is similar to change I263f853f57f64252f651c898897536afdb034063
which made this change to similar tests in the test_zk file.

Change-Id: I38334943ddb311ea3600573ab8ebd75bcb6279c0
2024-02-14 12:59:39 -08:00
Zuul d91efe232d Merge "Increase timeouts in TestComponentRegistry" 2024-02-12 23:34:32 +00:00
James E. Blair 6819f47525 Fix test_timer_preserve_jobs race
This test was racing the assertFinalState method which checks that
all buildsets are complete.  But we might leave timer builds running
at the end of the job.

To avoid this, adopt the model used by most timer tests: add and
remove the timer pipeline configuration during the test so that the
system is idle at shutdown.

Change-Id: I2d9a7761686ddb0263bbac1b9e8b3cbc476c22b1
2024-02-12 13:11:43 -08:00
Zuul 478d70c4b1 Merge "Fix race in test_deps_by_topic_git_needs" 2024-02-12 18:12:55 +00:00