Commit Graph

955 Commits

Author SHA1 Message Date
Clark Boylan 3984e11020 Handle annotated and signed tags when packing refs
Zuul packs refs directly rather than rely on git to do so. The reason
for this is it greatly speeds up repo resetting. Typically there are two
pieces of information for each packed ref (sha and refname). Git
annotated and signed tags are special because they have the sha of the
tag object proper, the tag refname, and finally the sha of the object
the tag refers to.

Update Zuul's ref packing to handle this extra piece of information for
git tags.

Co-Authored-By: James E. Blair <jim@acmegating.com>
Change-Id: I828ab924a918e3ded2cd64deadf8ad0b4726eb1e
2024-04-05 13:12:59 -07:00
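
The ref-packing commit above (3984e11020) describes two fields for an ordinary packed ref and a third for annotated or signed tags. As a rough illustration, here is a minimal sketch of git's packed-refs text layout, where the extra "peeled" sha is written on a following line that starts with "^". The function name `pack_refs` and the tuple layout are hypothetical and are not taken from Zuul's code.

```python
def pack_refs(refs):
    """Render refs in git's packed-refs text layout (illustrative sketch).

    `refs` is an iterable of (refname, sha, peeled_sha) tuples. peeled_sha
    is None unless the ref is an annotated or signed tag, in which case it
    is the sha of the object the tag refers to. This helper is hypothetical.
    """
    # Header advertising the traits of the file, as git writes it.
    lines = ["# pack-refs with: peeled fully-peeled sorted"]
    # Tuples sort by refname first, which keeps the file sorted by ref.
    for refname, sha, peeled_sha in sorted(refs):
        lines.append(f"{sha} {refname}")
        if peeled_sha is not None:
            # The "^" line carries the sha of the object the tag points to.
            lines.append(f"^{peeled_sha}")
    return "\n".join(lines) + "\n"
```

A branch contributes one line, while an annotated tag contributes two: the tag object's sha with the refname, then the peeled sha.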
James E. Blair 1f026bd49c Finish circular dependency refactor
This change completes the circular dependency refactor.

The principal change is that queue items may now include
more than one change simultaneously in the case of circular
dependencies.

In dependent pipelines, the two-phase reporting process is
simplified because it happens during processing of a single
item.

In independent pipelines, non-live items are still used for
linear dependencies, but multi-change items are used for
circular dependencies.

Previously changes were enqueued recursively and then
bundles were made out of the resulting items.  Since we now
need to enqueue entire cycles in one queue item, the
dependency graph generation is performed at the start of
enqueuing the first change in a cycle.

Some tests exercise situations where Zuul is processing
events for old patchsets of changes.  The new change query
sequence mentioned in the previous paragraph necessitates
more accurate information about out-of-date patchsets than
the previous sequence, therefore the Gerrit driver has been
updated to query and return more data about non-current
patchsets.

This change is not backwards compatible with the existing
ZK schema, and will require that Zuul systems delete all pipeline
states during the upgrade.  A later change will implement
a helper command for this.

All backwards compatibility handling for the last several
model_api versions which were added to prepare for this
upgrade have been removed.  In general, all model data
structures involving frozen jobs are now indexed by the
frozen job's uuid and no longer include the job name since
a job name no longer uniquely identifies a job in a buildset
(either the uuid or the (job name, change) tuple must be
used to identify it).

Job deduplication is simplified and now only needs to
consider jobs within the same buildset.

The fake github driver had a bug (fakegithub.py line 694) where
it did not correctly increment the check run counter, so our
tests that verified that we closed out obsolete check runs
when re-enqueuing were not valid.  This has been corrected, and
in doing so, has necessitated some changes around quiet dequeuing
when we re-enqueue a change.

The reporting in several drivers has been updated to support
reporting information about multiple changes in a queue item.

Change-Id: I0b9e4d3f9936b1e66a08142fc36866269dc287f1
Depends-On: https://review.opendev.org/907627
2024-02-09 07:39:40 -08:00
Artem Goncharov 7e7ce18e8c Check blocking_discussions_resolved in gitlab driver
In addition to the `merge_status` attribute, `blocking_discussions_resolved`
should be checked to know whether it makes sense to attempt merging.
In the project settings it is possible to enable the "All discussions must be
resolved" check box, which results in the attribute being set to false
while there are open discussions. In that case (while merge_status is still
can_be_merged) the merge request cannot be merged.

Sadly this is another badly documented case.

Change-Id: Iba3c7b424fb8acb3134622776eb1518ffddd5374
2024-01-24 20:35:28 +00:00
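
The check described in the commit above (7e7ce18e8c) can be sketched as a small predicate over a merge request dictionary. This is only an illustration: the helper name `can_attempt_merge` is hypothetical, and treating a missing `blocking_discussions_resolved` attribute as "resolved" is an assumption, not something the commit message states.

```python
def can_attempt_merge(mr):
    """Return True if a GitLab merge request looks mergeable (sketch).

    `mr` is a dict shaped like a merge request API response. The helper
    name and the default for a missing attribute are assumptions.
    """
    if mr.get("merge_status") != "can_be_merged":
        return False
    # merge_status alone is not enough: open blocking discussions
    # prevent the merge even when merge_status is can_be_merged.
    return bool(mr.get("blocking_discussions_resolved", True))
```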
Zuul a1acf5f659 Merge "gitlab - avoid trigger build when reviewers added/removed" 2024-01-24 15:38:34 +00:00
Fabien Boucher ff94910877 gitlab - avoid trigger build when reviewers added/removed
This change fixes an unexpected behavior where Zuul triggers
the buildset when a reviewer is added or removed, and also
when a comment thread is resolved.

Now the driver relies only on the 'update' action specifying
the 'oldrev' attribute to detect a code change. The GitLab documentation
states that this attribute is set when the code changes.

https://docs.gitlab.com/ee/user/project/integrations/webhook_events.html#merge-request-events

Change-Id: Ibee53e3e9ead9a0bbfdc0d60a35dcdd4b0a0dba7
2024-01-22 10:45:29 +00:00
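
The rule from the commit above (ff94910877) can be illustrated with a short filter over a merge request webhook payload. This is a sketch under assumptions: the function name `is_code_update` is hypothetical, and the payload is reduced to the `object_attributes` keys named in the commit message.

```python
def is_code_update(event):
    """Return True if a merge request webhook reports a code change (sketch).

    `event` is a dict shaped like a GitLab merge request webhook body.
    The function name is hypothetical.
    """
    attrs = event.get("object_attributes", {})
    # Reviewer changes and resolved threads also arrive as 'update',
    # but only code changes carry the 'oldrev' attribute.
    return attrs.get("action") == "update" and "oldrev" in attrs
```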
Benjamin Schanzel 252b63f097 Gerrit driver: fix for topics containing white space
When using gerrit topics containing white space, zuul fails to find the
changes contained in the topic because the query it builds does not
enclose the topic in quotes. So only the first word of the topic is
considered by the gerrit driver. Fix this by quoting the topic in the
query.

Change-Id: I99d2890d317fb8424740e25d166d17381f1319c8
2024-01-09 15:04:47 +01:00
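
To make the fix in the commit above (252b63f097) concrete, here is a minimal sketch of building a quoted topic query. The helper name `topic_query` is hypothetical, and the sketch does not attempt to handle topics that themselves contain a double quote.

```python
def topic_query(topic):
    """Build a Gerrit search term for a topic (illustrative sketch).

    Quoting keeps a topic such as 'my topic' together as one term
    instead of letting the query stop at the first word. The helper
    name is hypothetical; embedded double quotes are not handled.
    """
    return 'topic:"%s"' % topic
```

For example, `topic_query("my topic")` yields `topic:"my topic"`, whereas the unquoted form `topic:my topic` would only match on `my`.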
James E. Blair 164b1784c6 Add gerrit hashtags support
This adds support for the hashtags-changed trigger event as well
as using hashtags as pipeline and trigger requirements.

Change-Id: I1f6628d7c227d12355f651c3c822b06e2d5c5562
2023-12-07 07:07:14 -08:00
Simon Westphahl 93b4f71d8e Store frozen jobs using UUID instead of name
Change the frozen job storage in ZK from being identified by name to
UUID. This allows us to handle multiple frozen jobs in a buildset with
the same name.

The job graph will get a new field that is a sorted list of job UUIDs
with the index being the same as for the job name list. Jobs that are
created with the old name-based path will have their UUID set to None.

This is done in preparation for the circular dependency refactoring as
detailed in I8c754689ef73ae20fd97ac34ffc75c983d4797b0.

Change-Id: Ic4df16e8e1ec6908234ecdf91fe08408182d05bb
2023-11-10 07:24:35 +01:00
James E. Blair 77633e0005 Add more deduplication tests
This adds more test cases for automatic job deduplication, as well
as some explanatory comments.

Change-Id: I5ca96ddf655e501af3c9490ea86e8cd6a13d7e44
2023-09-07 14:11:30 -07:00
James E. Blair eee6ef3fcf Strip refs/heads from gerrit default branches
The HEAD Gerrit API endpoint returns '/refs/heads/master', not
'master' as the test fixture was constructed with.  Correct this.

Change-Id: I98b0759516bd50c0eddeb9245fc951c58e80ee45
2023-09-06 07:03:27 -07:00
Zuul 3532828c82 Merge "Fix zk host env var for tests" 2023-09-05 12:52:49 +00:00
James E. Blair 5c12ea68c6 Add default branch support to the Gerrit driver
This extends the previous change to include project default branch
support for the Gerrit driver as well as GitHub.

Change-Id: I2b1f6feed72277f5e61a2789d8af5276ee4c7b05
2023-08-23 11:07:09 -07:00
James E. Blair 57a9c13197 Use the GitHub default branch as the default branch
This supplies a per-project default value for Zuul's default-branch
based on what the default branch is set to in GitHub.  This means
that if users omit the default-branch setting on a Zuul project
stanza, Zuul will automatically use the correct value.

If the value in GitHub is changed, an event is emitted which allows
us to automatically reconfigure the tenant.

This could be expanded to other drivers that support an indication
of which branch is default.

Change-Id: I660376ecb3f382785d3bf96459384cfafef200c9
2023-08-23 11:07:08 -07:00
Simon Westphahl d1d886bc03 Report error details on Ansible failure
In case of a retry there might be no logs available to help the user
understand the reason for a failure. To improve this we can report the
details of the failure as part of the build result.

Change-Id: Ib9fdbdec5d783a347d1b6e5ce6510d50acfe1286
2023-08-07 10:13:16 +02:00
James E. Blair 76f791e4d3 Fix linting errors
A new pycodestyle release errors on ",\".  We only use that to support
Python <3.10, and since Zuul is now targeting only 3.11+, these
instances are updated to use implicit continuation.

An instance of "==" is changed to "is".

A function definition which overrides an assignment is separated
so that the assignment always occurs regardless of whether it
ends up pointing to the function def.

Finally, though not required, since we're editing the code anyway
for nits, some typing info is removed.

Change-Id: I6bb096b87582ab1450bed02541483fc6f1d6c44a
2023-08-02 10:28:22 -07:00
Zuul 6c0ffe565f Merge "Report early failure from Ansible task failures" 2023-07-29 18:28:08 +00:00
James E. Blair a485ff5e67 Refactor Gerrit driver event sources
Gerrit supports a number of pub-sub plugins which can act as
alternatives to stream-events.  These can often be easier for
users to configure than ssh access and have the advantage of
providing queueing and delivery guarantees for messages.

Subsequent changes will add support for multiple pub-sub event
sources, so to make driver maintenance easier, this change
refactors the gerrit driver into its two current event sources:
SSH stream-events and the checks plugin.

The checks plugin is a bit of a special case in that we always
start the polling method for it, so event source selection isn't
quite as clean as for the other sources.  But it's still useful
to compartmentalize it as much as possible, so it is moved to
its own file and treated as similarly as possible.

The stream events listener behaves much more like the pub-sub
listeners will.

Change-Id: I28d0f1e37b87927c5f2dd5e9bdc162391ad66d07
2023-07-13 14:02:46 -07:00
James E. Blair 1170b91bd8 Report early failure from Ansible task failures
We can have the Ansible callback plugin tell the executor to tell
the scheduler that a task has failed and therefore the job will
fail.  This will allow the scheduler to begin a gate reset before
the failing job has finished and potentially save much developer
and CPU time.

We take some extra precautions to try to avoid sending a pre-fail
notification where we think we might end up retrying the job
(either due to a failure in a pre-run playbook, or an unreachable
host).  If that does happen then a gate pipeline might end up
flapping between two different NNFI configurations (ie, it may
perform unnecessary gate resets behind the change with the retrying
job), but should still not produce an incorrect result.  Presumably
the detections here should catch that case sufficiently early, but
due to the nature of these errors, we may need to observe it in
production to be sure.

Change-Id: Ic40b8826f2d54e45fb6c4d2478761a89ef4549e4
2023-06-29 13:40:34 -07:00
Zuul c8f88a8154 Merge "Ensure cycle dependencies are enqueued ahead" 2023-05-23 16:48:47 +00:00
Simon Westphahl 381ba7c24f Ensure cycle dependencies are enqueued ahead
This change fixes a bug related to circular dependency resolution where
non-cycle changes could be enqueued between changes of the same cycle.

This violated the invariant assumption that changes of the same
dependency cycle are enqueued in sequence. This could cause the pipeline
processor to loop indefinitely under certain conditions.

The idea behind this fix is to treat all unprocessed dependencies of
other changes in the same cycle as if they were direct dependencies of
the current change. By that we will try to enqueue dependencies of any
change in the cycle ahead of the whole cycle.

Change-Id: I3eeb9fc9f6fca73982ce01d180dca9f58868bff3
2023-05-23 09:14:16 +02:00
Clark Boylan c1b0a00c60 Only check bwrap execution under the executor
The reason for this is that containers for zuul services need to run
privileged in order to successfully run bwrap. We currently only expect
users to run the executor as privileged and the new bwrap execution
checks have broken other services as a result. (Other services load the
bwrap system because it is a normal zuul driver and all drivers are
loaded by all services).

This works around this by adding a check_bwrap flag to connection setup and
only setting it to true on the executor. A better longer term followup
fixup would be to only instantiate the bwrap driver on the executor in
the first place. This can probably be accomplished by overriding the
ZuulApp configure_connections method in the executor and dropping bwrap
creation in ZuulApp.

Temporarily stop running the quick-start job since it's apparently not
using speculative images.

Change-Id: Ibadac0450e2879ef1ccc4b308ebd65de6e5a75ab
2023-05-17 13:45:23 -07:00
Zuul bbdbe81790 Merge "Add Gerrit pipeline trigger requirements" 2023-04-29 21:20:01 +00:00
James E. Blair 546ad5353a Add Gerrit pipeline trigger requirements
This updates the Gerrit driver to match the pattern in the GitHub
driver where instead of specifying individual trigger
requirements such as "require-approvals", a complete ref
filter (a la "requirements") can be embedded in the trigger
filter.

The "require-approvals" and "reject-approvals" attributes are
deprecated in favor of the new approach.

Additionally, all require filters in Gerrit are now available as
reject filters.

And finally, the Gerrit filters are updated to return
FalseWithReason so that log messages are more useful, and the
Github filters are updated to improve the language, avoid
apostrophes for ease of grepping, and match the new Gerrit
filters.

Change-Id: Ia9c749f1c8e318fe01e84e52831a9d0d2c10b203
2023-04-28 11:50:11 -07:00
James E. Blair 4e0da62214 Further fix getting topic changes by git needs
The test helper method that handles fake gerrit queries had a bug
which would cause the "topic:" queries to return all open changes.

When we correct that, we can see, by virtue of a newly raised
exception, that there was some unexercised code in getChangesByTopic
which is now exercised.  This change also corrects the exception
that is raised when mutating a set while iterating over it.

Change-Id: I1874482b2c28fd1082fcd56036afb20333232409
2023-04-17 16:51:50 -07:00
Benjamin Schanzel 20eb8d15d1 Fix zk host env var for tests
NODEPOOL_ZK_HOST was used instead of ZUUL_ZK_HOST

Change-Id: Ib5c694c74a400671093ef3ccf0b7a47b3bb1eab2
2023-04-05 14:01:08 +02:00
James E. Blair 7b08cb15d4 Check Gerrit submit requirements
With newer versions of Gerrit, we are increasingly likely to encounter
systems where the traditional label requirements are minimized in favor
of the new submit requirements rules.  If Gerrit is configured to use
a submit requirement instead of a traditional label blocking rule, that
is typically done by switching the label function to "NoBlock", which,
like the "NoOp" function, will still cause the label to appear in the
"submit_record" field, but with a value of "MAY" instead of "OK", "NEED",
or "REJECT".

Instead, the interesting information will be in the "submit_requirements"
field.  In this field we can see the individual submit requirement rules
and whether they are satisfied or not.

Since submit requirements do not have a 1:1 mapping with labels, determining
whether an "UNSATISFIED" submit requirement should be ignored (because it
pertains to a label that Zuul will alter, like "Verified") is not as
straightforward as it is for submit records.  To be conservative, this
change looks for any of the "allow needs" labels (typically "Verified") in
each unsatisfied submit record and if it finds one, it ignores that record.

With this change in place, we can avoid enqueuing changes which we are certain
can not be merged into gate pipelines, and will continue to enqueue changes
about which we are uncertain.

Change-Id: I667181565684d97c1d036e2db6193dc606c76c57
2023-03-28 16:19:50 -07:00
Joshua Watt 28428942f4 merger: Keep redundant cherry-pick commits
In normal git usage, cherry-picking a commit that has already been
applied and doesn't do anything or cherry-picking an empty commit causes
git to exit with an error to let the user decide what they want to do.
However, this doesn't match the behavior of merges and rebases where
non-empty commits that have already been applied are simply skipped
(empty source commits are preserved).

To fix this, add the --keep-redundant-commits option to `git cherry-pick`
to make git always keep a commit when cherry-picking even when it is
empty for either reason. Then, after the cherry-pick, check if the new
commit is empty and if so back it out if the original commit _wasn't_
empty.

This two step process is necessary because git doesn't have any options
to simply skip cherry-pick commits that have already been applied to the
tree.

Removing commits that have already been applied is particularly
important in a "deploy" pipeline triggered by a Gerrit "change-merged"
event, since the scheduler will try to cherry-pick the change on top of
the commit that just merged. Without this option, the cherry-pick will
fail and the deploy pipeline will fail with a MERGE_CONFLICT.

Change-Id: I326ba49e2268197662d11fd79e46f3c020675f21
2023-03-01 16:22:17 -06:00
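
The two-step process in the commit above (28428942f4) can be sketched as a command builder plus a decision helper. Both function names are hypothetical, and the sketch deliberately leaves out how emptiness is detected and how the commit is backed out, since the commit message does not spell those out.

```python
def cherry_pick_command(commit):
    """Step 1: always keep the commit, even if it ends up empty (sketch)."""
    return ["git", "cherry-pick", "--keep-redundant-commits", commit]


def should_back_out(new_commit_is_empty, original_is_empty):
    """Step 2: decide whether to drop the freshly picked commit (sketch).

    A commit that became empty only because its content is already in
    the tree is dropped; a commit that was empty at the source is kept.
    Both helper names are hypothetical.
    """
    return new_commit_is_empty and not original_is_empty
```

The second helper encodes the asymmetry the message describes: already-applied commits are skipped, while empty source commits are preserved.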
Zuul 2f0a02124e Merge "Handle missing diff_refs attribute" 2023-03-01 09:39:22 +00:00
Zuul 09514e2faa Merge "Cleanup some Python ResourceWarnings in the test suite" 2023-02-09 09:05:27 +00:00
Clark Boylan 753bebbb2c Cleanup some Python ResourceWarnings in the test suite
First, we add support for PYTHONTRACEMALLOC being passed through
nox to easily enable tracebacks on emitted ResourceWarnings. We set this
value to 0 by default as enabling this slows things down and requires
more memory. But it is super useful locally when debugging specific
ResourceWarnings to set `PYTHONTRACEMALLOC=5` in order to correct these
issues.

With that in place we identify and correct two classes of
ResourceWarnings.

First up is the executor server not closing its statsd socket when
stopping the executor server. Address that by closing the socket if
statsd is enabled and set the statsd attribute to None to prevent
anything else from using it later.

Second is a test-only issue where we don't close the fake Gerrit,
Gitlab, or Web Proxy Fixture server's HTTP socket; we only shut down the
server. Add a close call to the server after it is shut down to correct
this.

There are potentially other ResourceWarnings to be found and cleaned up,
but sifting through the noise will be easier as we eliminate these more
widespread warnings.

Change-Id: Iddabe79be1c8557e300dde21a6b34e57b04c48e0
2023-02-06 10:30:18 -08:00
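
The second fix in the commit above (753bebbb2c) corresponds to a general pattern with Python's standard `http.server`: `shutdown()` stops the serving loop but leaves the listening socket open, and `server_close()` releases it. The sketch below uses a throwaway local server, not Zuul's fixtures; `start_and_stop_server` is a hypothetical name.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


def start_and_stop_server():
    """Start a local HTTP server, then stop it without leaking the socket."""
    # Port 0 asks the OS for any free port.
    server = HTTPServer(("127.0.0.1", 0), BaseHTTPRequestHandler)
    thread = threading.Thread(target=server.serve_forever)
    thread.start()

    server.shutdown()      # stops the serve_forever() loop only
    thread.join()
    server.server_close()  # closes the listening socket itself
    return server
```

Without the `server_close()` call, the socket object stays open until garbage collection, which is what surfaces as a ResourceWarning.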
Simon Westphahl 9048706d93 Cleanup deleted pipelines and event queues
When a pipeline is removed during a reconfiguration Zuul will cancel
active builds and node requests. However, since we no longer refresh the
pipeline state during a reconfig we can run into errors when Zuul tries
to cancel builds and node requests based on a possibly outdated pipeline
state.

2023-01-17 10:41:32,223 ERROR zuul.Scheduler: Exception in run handler:
Traceback (most recent call last):
  File "/opt/zuul/lib/python3.10/site-packages/zuul/scheduler.py", line 2007, in run
    self.process_tenant_management_queue(tenant)
  File "/opt/zuul/lib/python3.10/site-packages/zuul/scheduler.py", line 2452, in process_tenant_management_queue
    self._process_tenant_management_queue(tenant)
  File "/opt/zuul/lib/python3.10/site-packages/zuul/scheduler.py", line 2462, in _process_tenant_management_queue
    self._doTenantReconfigureEvent(event)
  File "/opt/zuul/lib/python3.10/site-packages/zuul/scheduler.py", line 1533, in _doTenantReconfigureEvent
    self._reconfigureTenant(ctx, min_ltimes,
  File "/opt/zuul/lib/python3.10/site-packages/zuul/scheduler.py", line 1699, in _reconfigureTenant
    self._reconfigureDeletePipeline(old_pipeline)
  File "/opt/zuul/lib/python3.10/site-packages/zuul/scheduler.py", line 1804, in _reconfigureDeletePipeline
    self.cancelJob(build.build_set, build.job,
  File "/opt/zuul/lib/python3.10/site-packages/zuul/scheduler.py", line 2930, in cancelJob
    build.updateAttributes(
  File "/opt/zuul/lib/python3.10/site-packages/zuul/zk/zkobject.py", line 193, in updateAttributes
    self._save(context, serial)
  File "/opt/zuul/lib/python3.10/site-packages/zuul/zk/zkobject.py", line 392, in _save
    zstat = self._retry(context, self._retryableSave,
  File "/opt/zuul/lib/python3.10/site-packages/zuul/zk/zkobject.py", line 314, in _retry
    return kazoo_retry(func, *args, **kw)
  File "/opt/zuul/lib/python3.10/site-packages/kazoo/retry.py", line 126, in __call__
    return func(*args, **kwargs)
  File "/opt/zuul/lib/python3.10/site-packages/zuul/zk/zkobject.py", line 371, in _retryableSave
    zstat = context.client.set(path, compressed_data,
  File "/opt/zuul/lib/python3.10/site-packages/kazoo/client.py", line 1359, in set
    return self.set_async(path, value, version).get()
  File "/opt/zuul/lib/python3.10/site-packages/kazoo/handlers/utils.py", line 86, in get
    raise self._exception
kazoo.exceptions.BadVersionError

To fix this we need to refresh the pipeline state prior to canceling
those active builds and node requests.

We will also take care of removing the pipeline state and the event
queues from Zookeeper if possible. Errors will be ignored as the
periodic cleanup task takes care of removing leaked pipelines.

Change-Id: I2986419636d8c6557d68d65fb6aff589aa4a680e
2023-01-24 10:25:04 +01:00
Fabien Boucher ee7842961e Handle missing diff_refs attribute
Recently, on a production Zuul acting on projects hosted on gitlab.com,
it has been discovered that a merge request fetched from the
API (just after Zuul receives the merge request created event) could have
the "diff_refs" attribute set to None.

Related bug: https://gitlab.com/gitlab-org/gitlab/-/issues/386562

Leading to the following stacktrace in the logs:

2022-12-14 10:08:47,921 ERROR zuul.GitlabEventConnector: Exception handling Gitlab event:
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/zuul/driver/gitlab/gitlabconnection.py", line 102, in run
    self.event_queue.election.run(self._run)
  File "/usr/local/lib/python3.8/site-packages/zuul/zk/election.py", line 28, in run
    return super().run(func, *args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/kazoo/recipe/election.py", line 54, in run
    func(*args, **kwargs)
  File "/usr/local/lib/python3.8/site-packages/zuul/driver/gitlab/gitlabconnection.py", line 110, in _run
    self._handleEvent(event)
  File "/usr/local/lib/python3.8/site-packages/zuul/driver/gitlab/gitlabconnection.py", line 246, in _handleEvent
    self.connection._getChange(change_key, refresh=True,
  File "/usr/local/lib/python3.8/site-packages/zuul/driver/gitlab/gitlabconnection.py", line 621, in _getChange
    change = self._change_cache.updateChangeWithRetry(change_key, change,
  File "/usr/local/lib/python3.8/site-packages/zuul/zk/change_cache.py", line 432, in updateChangeWithRetry
    update_func(change)
  File "/usr/local/lib/python3.8/site-packages/zuul/driver/gitlab/gitlabconnection.py", line 619, in _update_change
    self._updateChange(c, event, mr)
  File "/usr/local/lib/python3.8/site-packages/zuul/driver/gitlab/gitlabconnection.py", line 665, in _updateChange
    change.commit_id = change.mr['diff_refs'].get('head_sha')
AttributeError: 'NoneType' object has no attribute 'get'

The attribute "diff_refs" becomes an object (with the expected keys) a few
moments later.

In order to avoid this situation, this change adds a mechanism to retry
fetching a MR until it has some expected fields. In our case only
"diff_refs".

https://docs.gitlab.com/ee/api/merge_requests.html#response

Tests are included with this change.

Change-Id: I6f279516728def655acb8933542a02a4dbb3ccb6
2023-01-17 07:01:54 -08:00
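
The retry mechanism described in the commit above (ee7842961e) might look roughly like the following. This is a sketch under assumptions: the names `get_mr_with_retry` and `MissingFieldsError`, the attempt count, the delay, and raising an error once attempts are exhausted are all hypothetical choices, not details from the commit.

```python
import time


class MissingFieldsError(Exception):
    """Hypothetical error for a merge request that never became complete."""


def get_mr_with_retry(fetch, required=("diff_refs",), attempts=5,
                      delay=1.0, sleep=time.sleep):
    """Call fetch() until every required field is present and not None.

    `fetch` is any callable returning a merge request dict. All names
    and defaults here are assumptions made for illustration.
    """
    for attempt in range(attempts):
        mr = fetch()
        if all(mr.get(field) is not None for field in required):
            return mr
        if attempt < attempts - 1:
            sleep(delay)  # give the remote side time to fill in the field
    raise MissingFieldsError("fields still missing: %s" % ", ".join(required))
```

Passing `sleep` as a parameter keeps the helper easy to exercise in tests without real waiting.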
Clark Boylan 647940925f Cleanup test logging
We were overlogging because we check for an openssl flag early and warn
if it isn't present. That warning creates a default root streamhandler
that emits to stderr causing all our logging to be emitted there.

Fix this by creating a specific logger for this warning (avoids
polluting the root logger) and add an assertion that the root logger's
handler list is empty when we modify it for testing.

Note I'm not sure why this warning is happening now and wasn't before.
Maybe our openssl installations changed or cryptography modified the
flag? This is worth investigating in a followup.

Change-Id: I2a82cd6575e86facb80b28c81418ddfee8a32fa5
2023-01-11 10:36:15 -08:00
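
The logging behavior behind the commit above (647940925f) can be shown with the standard library alone: the module-level `logging.warning()` configures the root logger with a default stream handler when it has none, while a warning sent through a named logger leaves the root handler list untouched. The logger name `example.openssl_check` and the function name are hypothetical.

```python
import logging


def warn_without_touching_root(message):
    """Emit a warning via a dedicated logger (illustrative sketch).

    Unlike the module-level logging.warning(), this does not call
    basicConfig() and so does not attach a handler to the root logger.
    The logger name is hypothetical.
    """
    log = logging.getLogger("example.openssl_check")
    log.warning(message)
```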
James E. Blair 279d7fb5cd Fix deduplication exceptions in pipeline processing
If a build is to be deduplicated and has not started yet and has
a pending node request, we store a dictionary describing the target
deduplicated build in the node_requests dictionary on the buildset.

There were a few places where we directly accessed that dictionary
and assumed the results would be the node request id.  Notably, this
could cause an error in pipeline processing (as well as potentially
some other edge cases such as reconfiguring).

Most of the time we can just ignore deduplicated node requests since
the "real" buildset will take care of them.  This change enriches
the API to help with that.  In other places, we add a check for the
type.

To test this, we enable relative_priority in the config file which
is used in the deduplication tests, and we also add an assertion
which runs at the end of every test that ensures there were no
pipeline exceptions during the test (almost all the existing dedup
tests fail this assertion before this change).

Change-Id: Ia0c3f000426011b59542d8e56b43767fccc89a22
2022-11-21 09:22:25 +01:00
Simon Westphahl c8aac6a118 Check if Github detected a merge conflict for a PR
Github uses libgit2 to compute merges without requiring a worktree [0].
In some cases this can lead to Github detecting a merge conflict while
for Zuul the PR merges fine.

To prevent such changes from entering dependent pipelines and e.g. cause
a gate reset, we'll also check if Github detected any merge conflicts.

[0] https://github.blog/2022-10-03-highlights-from-git-2-38/

Change-Id: I22275f24c903a8548bb0ef6c32a2e15ba9eadac8
2022-11-18 11:59:32 +01:00
Zuul ed013d82cc Merge "Parallelize some pipeline refresh ops" 2022-11-10 15:01:09 +00:00
James E. Blair 3a981b89a8 Parallelize some pipeline refresh ops
We may be able to speed up pipeline refreshes in cases where there
are large numbers of items or jobs/builds by parallelizing ZK reads.

Quick refresher: the ZK protocol is async, and kazoo uses a queue to
send operations to a single thread which manages IO.  We typically
call synchronous kazoo client methods which wait for the async result
before returning.  Since this is all thread-safe, we can attempt to
fill the kazoo pipe by having multiple threads call the synchronous
kazoo methods.  If kazoo is waiting on IO for an earlier call, it
will be able to start a later request simultaneously.

Quick aside: it would be difficult for us to use the async methods
directly since our overall code structure is still ordered and
effectively single threaded (we need to load a QueueItem before we
can load the BuildSet and the Builds, etc).

Thus it makes the most sense for us to retain our ordering by using
a ThreadPoolExecutor to run some operations in parallel.

This change parallelizes loading QueueItems within a ChangeQueue,
and also Builds/Jobs within a BuildSet.  These are the points in
a pipeline refresh tree which potentially have the largest number
of children and could benefit the most from the change, especially
if the ZK server has some measurable latency.

Change-Id: I0871cc05a2d13e4ddc4ac284bd67e5e3003200ad
2022-11-09 10:51:29 -08:00
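
The approach in the commit above (3a981b89a8) is to keep the calling code ordered while letting several blocking reads be in flight at once. A minimal sketch with `ThreadPoolExecutor` follows; `load_children`, `load_one`, and the worker count are hypothetical stand-ins for the synchronous ZK reads the message refers to.

```python
from concurrent.futures import ThreadPoolExecutor


def load_children(paths, load_one, max_workers=4):
    """Load many children in parallel while preserving input order (sketch).

    `load_one` stands in for a blocking, thread-safe read such as a
    synchronous kazoo call. Names and the worker count are assumptions.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in the order of `paths`, regardless of
        # which read finishes first.
        return list(pool.map(load_one, paths))
```

The parent is still loaded before its children, so the overall traversal stays sequential; only the sibling reads at one level overlap.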
James E. Blair c355adf44e Add playbook semaphores
This adds the ability to specify that the Zuul executor should
acquire a semaphore before running an individual playbook.  This
is useful for long running jobs which need exclusive access to
a resource for only a small amount of time.

Change-Id: I90f5e0f570ef6c4b0986b0143318a78ddc27bbde
2022-11-07 08:41:10 -08:00
Zuul c6340d7492 Merge "Add JobData refresh test" 2022-10-21 10:52:33 +00:00
James E. Blair ec4c6264ca Add JobData refresh test
We try to avoid refreshing JobData from ZK when it is not necessary
(because these objects rarely change).  However, a bug in the avoidance
was recently discovered and in fact we have been refreshing them more
than necessary.

This adds a test to catch that case, along with fixing an identical
bug (the same process is used in FrozenJobs and Builds).

The fallout from these bugs may not be exceptionally large, however,
since we generally avoid refreshing FrozenJobs once a build has
started, and avoid refreshing Builds once they have completed,
meaning these bugs may have had little opportunity to show themselves.

Change-Id: I41c3451cf2b59ec18a20f49c6daf716de7f6542e
2022-10-15 14:19:10 -07:00
James E. Blair e11ef280e1 Add debugging to waitUntilSettled
Some tests are failing to settle because the ZK queues are not
empty, but it is not clear which queue, and that makes the trouble
hard to track down.  Add debugging around this to try to understand
the problem more.

Change-Id: I5012dec9f80e5413c5303698325d510554d22d3a
2022-10-13 10:27:58 -07:00
Zuul 6fa84faf3f Merge "Add support for configuring and testing tracing" 2022-09-22 22:36:22 +00:00
James E. Blair ce40b29677 Add support for configuring and testing tracing
This adds support for configuring tracing in Zuul along with
basic documentation of the configuration.

It also adds test infrastructure that runs a gRPC-based collector
so that we can test tracing end-to-end, and exercises a simple
test span.

Change-Id: I4744dc2416460a2981f2c90eb3e48ac93ec94964
2022-09-19 08:42:28 +02:00
James E. Blair ccb00d6827 Log more info on gerrit 403 errors
If Gerrit returns a 403 on submit, log the text we get in reply to
help diagnose the problem.

Change-Id: I8c9b286bb63ba1703a6a8f3cd6cd9a4b86e62cf2
2022-09-06 15:04:09 -07:00
Zuul b2b36d413e Merge "Fix nodepool label query" 2022-07-27 21:24:16 +00:00
James E. Blair a9a9d32b21 Fix duplicate setResult calls in deduplicated builds
We call item.setResult after a build is complete so that the queue
item can do any internal processing necessary (for example, prepare
data structures for child jobs, or move the build to the retry_builds
list).

In the case of deduplicated builds, we should do that for every queue
item the build participates in since each item may have a different
job graph.

We were not correctly identifying other builds of deduplicated jobs
and so in the case of a dependency cycle we would call setResult on
jobs of the same name in that cycle regardless of whether they were
deduplicated.

This corrects the issue and adds a test to detect that case.

Change-Id: I4c47beb2709a77c21c11c97f1d1a8f743d4bf5eb
2022-07-25 13:22:19 -07:00
James E. Blair dcd860cfc1 Fix nodepool label query
Nodepool updated its internal interface for registering launchers.
Since Zuul uses that internal interface to determine what labels
are available, update it to match.

Change-Id: Iffa0c1c1d9ef8d195c5e1ea1b625de9d119add3b
2022-07-17 10:32:48 -07:00
James E. Blair 1d7a7df373 Fix submitWholeTopic post-merge check
The fake Gerrit did not actually merge all of the changes that
should be submitted together simultaneously.  This meant our testing
that the git repo branch pointer moved forward was not correct.

This change improves the fake Gerrit so that all changes under
submitWholeTopic are merged simultaneously.

This then shows the error where after two changes are merged
simultaneously, the git repo check fails.

That is because our check is performed by simply ensuring that the
branch pointer is at some sha other than the one it was at right
before we started the merge operation.  The sequence is this:

1) Start reporting change #1
2) Get branch sha and store on change #1: abc1
3) Submit change #1  (Gerrit also merges change #2)
4) Get branch sha: abc2
5) Verify abc1 != abc2 (success)
6) Start reporting change #2
7) Get branch sha and store on change #2: abc2
8) Submit change #2  (already merged)
9) Get branch sha: abc2
A) Verify abc2 != abc2 (failure)

This is corrected by using only the first branch sha in a bundle.
When we store the branch sha on the change, we check to see if
any other changes in the bundle for the same project+branch already
have a sha; if they do, we use that.  Otherwise we know we are
the first, so we fetch it.

This changes the steps above to:

2) Get branch sha for all Gerrit changes in bundle
7) [not needed -- branch sha already stored in step 2]
A) Verify abc1 != abc2 (success)

Change-Id: Ia9ef411cbf24d1e4e31456ddc660e5b2a6eb5321
2022-07-05 10:35:00 -07:00
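
The corrected sequence in the commit above (1d7a7df373) can be sketched as a helper that reuses the first branch sha recorded in a bundle. The dict-based change representation and the names `record_branch_sha` and `fetch_branch_sha` are hypothetical; only the reuse rule comes from the commit message.

```python
def record_branch_sha(change, bundle, fetch_branch_sha):
    """Store the pre-merge branch sha on `change` (illustrative sketch).

    If another change in the bundle for the same project+branch already
    has a sha, reuse it; otherwise this change is the first, so fetch it.
    The data shapes and names are assumptions.
    """
    key = (change["project"], change["branch"])
    for other in bundle:
        if other is change:
            continue
        if (other["project"], other["branch"]) == key and other.get("branch_sha"):
            change["branch_sha"] = other["branch_sha"]
            return
    change["branch_sha"] = fetch_branch_sha(*key)
```

With this rule, the second change in the walkthrough compares abc1 against abc2 rather than abc2 against itself, so the post-merge check succeeds.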
Zuul c73606af95 Merge "Fix gitlab squash merge" 2022-07-01 02:44:06 +00:00
Zuul 9438b47e1e Merge "Fix merging with submitWholeTopic" 2022-06-30 06:45:30 +00:00