Commit Graph

386 Commits

Author SHA1 Message Date
Zuul 569fe78b5a Merge "Initialize github client manager if needed" 2024-02-16 17:41:06 +00:00
Simon Westphahl b6564e42ce
Fix Github protected check for renamed branches
When a branch is renamed in Github the REST API will redirect the
request to the endpoint for the new branch name. So far the Github
client automatically followed those redirects and we did not check if
the branch name in the response matched our request.

This led to cases where an old branch was added to the branch cache as
protected even though the branch no longer existed. This is not a
problem for the schedulers, but since there won't be any cached config
in Zookeeper, zuul-web will display a warning about missing config files
for the branch.

Since the REST endpoint also returns the (new) name of the branch we can
validate this against the requested branch name in addition to disabling
redirects.

However, this fix is not enough as the 'cachecontrol' adapter that we
use also caches the HTTP 301 redirects which is a problem when a new
branch with the same name as the renamed branch is created. To fix this
we will use a cache busting header (no-cache) to not return a cached
response in those cases.
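
A minimal sketch of the validation described above, assuming the request
was sent with redirects disabled and with a no-cache header. The helper
name and the constant are hypothetical and not taken from the Zuul code
base; the "name" and "protected" fields are assumed to match the branch
payload returned by the Github REST API.

```python
from typing import Optional

# Hypothetical constant: header sent with the request so that a cached
# 301 redirect is not served by the caching adapter.
NO_CACHE_HEADERS = {"Cache-Control": "no-cache"}


def is_protected_branch(requested: str, status_code: int,
                        payload: Optional[dict]) -> bool:
    """Decide whether `requested` may be cached as a protected branch.

    With redirects disabled, a renamed branch shows up as a 301 response
    instead of silently returning the data of the new branch.
    """
    if status_code != 200 or payload is None:
        # Covers 301 (renamed) and 404 (deleted): not a live branch.
        return False
    if payload.get("name") != requested:
        # The response describes a different branch than we asked for.
        return False
    return bool(payload.get("protected", False))
```

With this check, a response for "main" that arrives for a request for
"master" is never recorded as a protected "master" branch.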

Change-Id: I2670f951cac1bf41c6569f5495a60e9de262d4a4
2024-01-16 09:48:59 +01:00
Simon Westphahl 810191b60e
Select correct merge method for Github
Starting with Github Enterprise 3.8[0] and github.com from September
2022 on[1], the merge strategy changed from using merge-recursive to
merge-ort[0].

The merge-ort strategy is available in the Git client since version
2.33.0 and became the default in 2.34.0[2].

If not configured otherwise, we've so far used the default merge
strategy of the Git client (which varies depending on the client
version). With this change, we are now explicitly choosing the default
merge strategy based on the Github version. This way, we can reduce
errors resulting from the use of different merge strategies in Zuul and
Github.

Since the newly added merge strategies must be understood by the mergers
we also need to bump the model API version.
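
The selection can be sketched roughly as follows. This is an
illustration only: the function name, the use of None for github.com
and the returned strings (the Git strategy names "ort" and
"recursive") are assumptions, not the identifiers used in Zuul.

```python
from typing import Optional, Tuple

# Per the description above: Github Enterprise 3.8 and later (and
# github.com) create merge commits with the merge-ort strategy.
ORT_MIN_GHE_VERSION = (3, 8)


def default_merge_strategy(ghe_version: Optional[Tuple[int, int]]) -> str:
    """Return the Git merge strategy that matches the Github side.

    `ghe_version` is a (major, minor) tuple for Github Enterprise, or
    None for github.com (a hypothetical convention for this sketch).
    """
    if ghe_version is None or ghe_version >= ORT_MIN_GHE_VERSION:
        return "ort"
    return "recursive"
```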

[0] https://docs.github.com/en/enterprise-server@3.8/admin/release-notes
[1] https://github.blog/changelog/2022-09-12-merge-commits-now-created-using-the-merge-ort-strategy/
[2] https://git-scm.com/docs/merge-strategies#Documentation/merge-strategies.txt-recursive

Change-Id: I354a76fa8985426312344818320980c67171d774
2023-10-24 07:15:39 +02:00
Ian Wienand 3c2e518c52 github: fallback to api_token when can't find installation
graphql queries (I77be4f16cf7eb5c8035ce0312f792f4e8d4c3e10) require
authentication. Enqueueing changes from GitHub (including Depends-On)
requires we run a graphql query. This means that Zuul must be able to
authenticate either via an application or api_token to support features
like Depends-On.

If the app is set up (app_id in config) but we aren't installed with
permissions on the project we're looking up, then fall back to using a
specified api_token. This will make Depends-On work.

Logging is updated to reflect whether or not we are able to fall back to
the api_token if the application is not installed. We log the lack of an
application installation at info level if we can fall back to the token,
and log at error level if we're falling back to anonymous access.

For backward compatibility we continue to fall back to anonymous access
if neither an application install nor an api_token is present. The reason
for this is features like Job required-projects: work fine anonymously,
and there may be Zuul installations that don't need additional
functionality.
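
The fallback order and the log levels described above could look
roughly like this. The function name and the returned mode strings are
hypothetical; only the order (application, then api_token, then
anonymous) and the info/error levels come from the text.

```python
import logging
from typing import Optional, Tuple


def choose_auth(app_installed: bool,
                api_token: Optional[str]) -> Tuple[str, Optional[int]]:
    """Return (auth mode, log level for the missing installation).

    The log level is None when the application installation was found
    and there is nothing to report.
    """
    if app_installed:
        return "app", None
    if api_token:
        # No installation, but an authenticated fallback exists.
        return "token", logging.INFO
    # Backward-compatible last resort: unauthenticated requests.
    return "anonymous", logging.ERROR
```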

Keep in mind that authenticated requests to GitHub get larger API rate
limits. Zuul installations should consider setting an API token even
when using an application for this reason. This gives Zuul the best
chance that fallback requests will not be rate limited.

Documentation is updated, a changelog added and several test
configuration files are padded with the required info.

Story: #2008940
Change-Id: I2107aeafc55591eea790244701567569fa6e80d4
2023-09-18 09:29:38 -07:00
James E. Blair 57a9c13197 Use the GitHub default branch as the default branch
This supplies a per-project default value for Zuul's default-branch
based on what the default branch is set to in GitHub.  This means
that if users omit the default-branch setting on a Zuul project
stanza, Zuul will automatically use the correct value.

If the value in GitHub is changed, an event is emitted which allows
us to automatically reconfigure the tenant.

This could be expanded to other drivers that support an indication
of which branch is default.

Change-Id: I660376ecb3f382785d3bf96459384cfafef200c9
2023-08-23 11:07:08 -07:00
Zuul 210ca5d235 Merge "Add github event processing debug logs" 2023-08-23 00:56:41 +00:00
Zuul cb8c2b7552 Merge "Remove unused github getBranchProtectionRule method" 2023-08-23 00:43:23 +00:00
Zuul 5dffba48a8 Merge "Increase Github event processor thread pool" 2023-08-18 06:33:41 +00:00
James E. Blair b320eba02b Remove unused github getBranchProtectionRule method
This method is no longer used since its functionality has been
replaced with graphql queries.

Change-Id: Ia92f7c5304926ee23bbb4b4727c9211fa3d063a9
2023-08-17 13:27:58 -07:00
James E. Blair 6aeb312c6e Add github event processing debug logs
Occasionally github event processing can be delayed; this will help
us narrow down some potential causes:

* installation-id lock contention
* change update
* branch cache update

Change-Id: I42af923d5aa9af03df562c447ad4873d96da40d7
2023-08-17 07:50:30 -07:00
Zuul e97d6cd540 Merge "Log durations around Github event pre-processing" 2023-08-16 18:47:54 +00:00
Simon Westphahl 9c9d69e058
Fix bug around Github token expiration
Even after increasing the grace time for Github app installation tokens
to 5min we were still seeing exceptions related to expired app tokens.

Upon further investigation it turned out that the current grace time had
no effect at all since we passed the *adjusted* expiry time to the
Github client, which takes it at face value and raises an exception if
the expiry time is exceeded.

To fix this we'll store the original expiry time in the token cache and
pass that directly to the Github client. We then adjust the cutoff time
by the 5min grace time when checking if a token should still be
considered valid.
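
A minimal sketch of the corrected check, assuming the cache stores the
original expiry time as a timezone-aware datetime. The function and
constant names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Grace time from the description above.
TOKEN_GRACE = timedelta(minutes=5)


def token_still_valid(expiry: datetime,
                      now: Optional[datetime] = None) -> bool:
    """Check a cached token against its *original* expiry time.

    The grace time is applied only here, when deciding whether to reuse
    the token; the unmodified expiry is what gets handed to the client.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return now < expiry - TOKEN_GRACE
```

A token that expires in four minutes is treated as expired and
refreshed, while the client never sees an artificially shortened
expiry time.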

Change-Id: I56f51df1d57e4dd7f1f85eba4af28c2a7318ddd1
2023-05-03 07:26:39 +02:00
Simon Westphahl 640c4213b2
Log durations around Github event pre-processing
It seems like we are running into thread pool contention sometimes, so
having the time until the event processor starts and the duration of the
event pre-processing should give us more insights to confirm our
hypothesis.

Change-Id: I7532a39c761b8df6b3e9adb3625aeb1322ab4723
2023-04-13 15:24:10 +02:00
Simon Westphahl 56cf0cba25
Increase Github event processor thread pool
So far we were using the default value for the thread pool executor
running the Github event pre-processing which is determined by
`min(32, (os.cpu_count() or 1) + 4)`.

However, on a busy system this value might be too small when a lot of
events arrive at the same time and some of the events take longer to
process. To avoid contention around the thread pool we'll change the
pool size to `min(32, (os.cpu_count() or 1) * 4)` (similar to the
default value before Python 3.8).
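
For illustration, the two formulas side by side. The helper names are
hypothetical; the expressions are the ones quoted above.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Optional


def previous_default_size(cpu_count: Optional[int]) -> int:
    """Default max_workers of ThreadPoolExecutor since Python 3.8."""
    return min(32, (cpu_count or 1) + 4)


def event_pool_size(cpu_count: Optional[int] = None) -> int:
    """Pool size after this change: scale with the CPU count."""
    if cpu_count is None:
        cpu_count = os.cpu_count()
    return min(32, (cpu_count or 1) * 4)


def make_event_executor() -> ThreadPoolExecutor:
    # Pass the size explicitly instead of relying on the default.
    return ThreadPoolExecutor(max_workers=event_pool_size())
```

On a machine with 4 CPUs this raises the pool from 8 to 16 threads;
from 8 CPUs upwards the cap of 32 applies.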

Change-Id: Ief349638fe41fabe7ca417f23bf439f860e524a7
2023-04-13 15:10:26 +02:00
Zuul e9a3baee59 Merge "Set cache ltime when branch protection changed" 2023-03-24 15:10:33 +00:00
Zuul dd8ad88b8e Merge "Add installation_id to event log" 2023-03-23 09:35:54 +00:00
Simon Westphahl 782be9a990
Set cache ltime when branch protection changed
When we detect a newly protected branch we also need to set the branch
cache ltime accordingly. Otherwise we might end up with schedulers using
an outdated branch cache during reconfig and layout update which can
result in config not being loaded.

Change-Id: Ie18ef0ce9664e58d25f34018f8eb4513bc8b559a
2023-03-23 09:14:50 +01:00
Dong Zhang f3cb00b583 Add installation_id to event log
Occasionally we need to look into hanging event processing. With the
installation_id included in the log, it is easier to find out which
events are blocked waiting for the lock.

Change-Id: I824e299501642b61a57883f4b37dc121f5ea0979
2023-03-23 08:40:25 +01:00
Zuul 286677584a Merge "Truncate Github file annotation message to 64 KB" 2023-03-22 21:51:53 +00:00
Simon Westphahl b372575b62
Truncate Github file annotation message to 64 KB
File annotations that are posted to a PR as part of a check run have a
size limit of 64KB for the message field.

Since it's unclear if this should be 64KiB or 64KB, we'll use KB as a
unit to be on the safe side.
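
A sketch of such a truncation, assuming the limit is counted in bytes
of the UTF-8 encoded message (the text above does not say whether
bytes or characters are meant). The names are hypothetical.

```python
# 64 KB with KB = 1000 bytes, the more conservative reading.
MAX_ANNOTATION_BYTES = 64 * 1000


def truncate_message(message: str,
                     limit: int = MAX_ANNOTATION_BYTES) -> str:
    """Cut `message` so that its UTF-8 encoding fits into `limit` bytes."""
    encoded = message.encode("utf-8")
    if len(encoded) <= limit:
        return message
    # errors="ignore" drops a multi-byte character that was cut in half
    # at the end of the slice.
    return encoded[:limit].decode("utf-8", errors="ignore")
```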

Change-Id: I43e4cfbc3a96bf1e8a9828c55150216940a64728
2023-03-03 10:54:58 +01:00
Simon Westphahl 59857a14b5
Fix race related to PR with changed base branch
Some people use a workflow that's known as "stacked pull requests" in
order to split a change into more reviewable chunks.

In this workflow the first PR in the stack targets a protected branch
(e.g. master) and all other PRs target the unprotected branch of the
next item in the stack.

    E.g. master <- feature-A (PR#1) <- feature-B (PR#2) <- ...

Now, when the first PR in the stack is merged Github will automatically
change the target branch of dependent PRs. For the above example this
would look like the following after PR#1 is merged:

    master <- feature-B (PR#2) <- ...

The problem here is that we might still process events for a PR before
the base branch change, but the Github API already returns the info
about the updated target branch.

The problem with this is that we combined the target branch name from
the event (outdated branch name) with the protection status from the
change object (new target branch) to determine if a branch was
configured as protected.

In the above example Zuul might wrongly conclude that the "feature-A"
branch (taken from the event) is now protected.

In the related incident we also observed that this triggered a
reconfiguration with the wrong state of now two protected branches
(master + feature-A). Since the project in question previously had only
one branch this led to a change in branch matching behavior for jobs
defined in that repository.

Change-Id: Ia037e3070aaecb05c062865a6bb0479b86e4dcde
2023-03-02 12:25:42 +01:00
Simon Westphahl 7c4e66cf74
Return cached Github change on concurrent update
When a pull-request is updated concurrently on a scheduler we'll wait
until the first thread has updated the change. The problem is if we
needed to create a new PR object. In this case we'd return a Github
change that wasn't updated and also doesn't have a cache key set.

ERROR zuul.Scheduler: Exception processing pipeline check in tenant foobar
Traceback (most recent call last):
  File "/opt/zuul/lib/python3.10/site-packages/zuul/scheduler.py", line 2149, in process_pipelines
    refreshed = self._process_pipeline(
  File "/opt/zuul/lib/python3.10/site-packages/zuul/scheduler.py", line 2241, in _process_pipeline
    self.process_pipeline_trigger_queue(tenant, pipeline)
  File "/opt/zuul/lib/python3.10/site-packages/zuul/scheduler.py", line 2447, in process_pipeline_trigger_queue
    self._process_trigger_event(tenant, pipeline, event)
  File "/opt/zuul/lib/python3.10/site-packages/zuul/scheduler.py", line 2480, in _process_trigger_event
    pipeline.manager.addChange(change, event)
  File "/opt/zuul/lib/python3.10/site-packages/zuul/manager/__init__.py", line 534, in addChange
    self.updateCommitDependencies(change, None, event)
  File "/opt/zuul/lib/python3.10/site-packages/zuul/manager/__init__.py", line 868, in updateCommitDependencies
    new_commit_needs_changes = [d.cache_key for d in dependencies]
  File "/opt/zuul/lib/python3.10/site-packages/zuul/manager/__init__.py", line 868, in <listcomp>
    new_commit_needs_changes = [d.cache_key for d in dependencies]
  File "/opt/zuul/lib/python3.10/site-packages/zuul/model.py", line 5946, in cache_key
    return self.cache_stat.key.reference
AttributeError: 'NoneType' object has no attribute 'key'

Change-Id: I2f3012060c486d593ad857e046334c3d9bef0ed5
2023-02-17 11:17:35 +01:00
Zuul 934846b9b3 Merge "Report a config error for unsupported merge mode" 2022-12-19 23:11:50 +00:00
Zuul 8a26020eb6 Merge "Handle Github changed files errors more broadly" 2022-11-29 19:37:32 +00:00
Simon Westphahl c8aac6a118
Check if Github detected a merge conflict for a PR
Github uses libgit2 to compute merges without requiring a worktree [0].
In some cases this can lead to Github detecting a merge conflict while
for Zuul the PR merges fine.

To prevent such changes from entering dependent pipelines and e.g. cause
a gate reset, we'll also check if Github detected any merge conflicts.

[0] https://github.blog/2022-10-03-highlights-from-git-2-38/

Change-Id: I22275f24c903a8548bb0ef6c32a2e15ba9eadac8
2022-11-18 11:59:32 +01:00
Simon Westphahl e9be1d6685
Handle Github changed files errors more broadly
In more recent versions Github seems to return different/additional
exceptions when it can't generate the diff for changed files fast
enough. To make this more robust we will catch all response errors from
Github and fall back to getting the list of changed files via the
mergers in those cases.

2022-11-14 15:52:02,718 ERROR zuul.GithubEventProcessor: [e: 43745d90-6434-11ed-9182-8080263aeb92] Exception when processing event:
Traceback (most recent call last):
  File "/opt/zuul/lib/python3.10/site-packages/zuul/driver/github/githubconnection.py", line 351, in run
    self._process_event()
  File "/opt/zuul/lib/python3.10/site-packages/zuul/driver/github/githubconnection.py", line 413, in _process_event
    change = self.connection._getChange(change_key,
  File "/opt/zuul/lib/python3.10/site-packages/zuul/driver/github/githubconnection.py", line 1428, in _getChange
    pull = self.getPull(change.project.name, change.number,
  File "/opt/zuul/lib/python3.10/site-packages/zuul/driver/github/githubconnection.py", line 1823, in getPull
    for pr_file in probj.files():
  File "/opt/zuul/lib/python3.10/site-packages/github3/structs.py", line 98, in __iter__
    json = self._get_json(response)
  File "/opt/zuul/lib/python3.10/site-packages/github3/structs.py", line 145, in _get_json
    return self._json(response, 200)
  File "/opt/zuul/lib/python3.10/site-packages/github3/models.py", line 161, in _json
    raise exceptions.error_for(response)
github3.exceptions.UnprocessableEntity: 422 Server Error: Sorry, this diff is taking too long to generate.

Change-Id: Idba1775d9d727fcccb7dc5b3a595a4875b9a4ec1
2022-11-15 12:02:24 +01:00
James E. Blair 640059a67a Report a config error for unsupported merge mode
This updates the branch cache (and associated connection mixin)
to include information about supported project merge modes.  With
this, if a project on github has the "squash" merge mode disabled
and a Zuul user attempts to configure Zuul to use the "squash"
mode, then Zuul will report a configuration syntax error.

This change adds implementation support only to the github driver.
Other drivers may add support in the future.

For all other drivers, the branch cache mixin simply returns a value
indicating that all merge modes are supported, so there will be no
behavior change.

This is also the upgrade strategy: the branch cache uses a
defaultdict that reports all merge modes supported for any project
when it first loads the cache from ZK after an upgrade.

Change-Id: I3ed9a98dfc1ed63ac11025eb792c61c9a6414384
2022-11-11 09:53:28 -08:00
Simon Westphahl 7d52b98373
Trace received Github events
We'll create a span when zuul-web receives a Github webhook event which
is then linked to the span for the event pre-processing step.

The pre-processing span context will be added to the trigger events.
Together with Icd240712b86cc22e55fb67f6787a0974d5308043 this completes
tracing of the whole chain from receiving a Github event until a change
is enqueued.

Change-Id: I1734a3a9e44f0ae01f5ed3453f8218945c90db58
2022-09-30 09:50:37 +02:00
Simon Westphahl c98f14025a Fix read-only branches error in zuul-web
When exclude-unprotected-branches is in effect and a project doesn't have
any protected branches (e.g. a wrong branch protection rule in Github or
none at all) the branch cache will contain an empty list.

Since the empty list was so far used to indicate a fetch error, those
projects showed up with a config error in zuul-web ("Will not fetch
project branches as read-only is set").

For the branch cache we need to distinguish three different cases:
1. branch cache miss (no fetch yet)
2. branch cache hit (including empty list of branches)
3. fetch error

Instead of storing an empty list in case of a fetch error we will store
a sentinel value of `None` in those cases. However, with this change we
can't use `None` as an indicator for a cache miss anymore, so we are now
raising a `LookupError` instead.
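
The three cases can be sketched with a small in-memory stand-in. The
class and method names are hypothetical; only the `None` sentinel and
the `LookupError` come from the description above.

```python
from typing import Dict, List, Optional


class BranchCacheSketch:
    """In-memory illustration of the three branch cache states."""

    def __init__(self) -> None:
        self._branches: Dict[str, Optional[List[str]]] = {}

    def set_branches(self, project: str, branches: List[str]) -> None:
        # Cache hit on later reads, even if the list is empty.
        self._branches[project] = list(branches)

    def set_fetch_error(self, project: str) -> None:
        # Sentinel value: the fetch was attempted and failed.
        self._branches[project] = None

    def get_branches(self, project: str) -> Optional[List[str]]:
        if project not in self._branches:
            # Cache miss: nothing has been fetched yet.
            raise LookupError(f"No cached branches for {project}")
        return self._branches[project]
```

Callers can then tell an empty list (no protected branches) apart from
`None` (fetch error) and from a `LookupError` (not fetched yet).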

Change-Id: I5b51805f7d0a5d916a46fe89db7b32f14063d7aa
2022-07-04 11:35:36 +02:00
James E. Blair c41fcbe483 Add support for GHE repository cache
Change-Id: Iec87857aa58f71875d780da3698047dae01120d7
2022-05-05 13:39:41 -07:00
Dong Zhang 79b6252370 Fix bug in getting changed files
The fix includes two parts:
1. For Github, we pass the base_sha instead of the target branch
   as the "tosha" parameter to get precise changed files.
2. In method getFilesChanges(), use the diff() result to filter out
   those files that changed and were reverted between commits.

The reason we do not directly use diff() is that for those
drivers other than github, the "base_sha" is not available yet, and
using diff() may include unexpected files when the target branch
has diverged from the feature branch.

This solution works for 99.9% of the cases, but it may still get an
incorrect list of changed files in the following corner case:
1. In a non-github connection, whose base_sha is not implemented, and
2. Files changed and reverted between commits in the change, and
3. The same file has also diverged in target branch.

The above corner case can be fixed by making base_sha available in
other drivers.

Change-Id: Ifae7018a8078c16f2caf759ae675648d8b33c538
2022-04-25 15:05:48 -07:00
Zuul 463cd1615b Merge "Handle Github branch protection rule webhook events" 2022-03-22 07:25:29 +00:00
Simon Westphahl c379691533 Include original path of renamed file for a PR
When a file is moved/renamed Github will only return an entry for the
file with the new name. However, the previous path also needs to be
included in the list of files. This is especially important when a Zuul
config file is renamed but also when `job.[irrelevant-]files` is used.
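
A minimal sketch of collecting both paths, assuming each file entry is
a dict with a "filename" key and, for renamed files, a
"previous_filename" key as in the Github pull request files API. The
function name is hypothetical.

```python
from typing import Dict, Iterable, List


def changed_paths(pr_files: Iterable[Dict[str, str]]) -> List[str]:
    """Return new and previous paths of all files touched by a PR."""
    paths: List[str] = []
    for pr_file in pr_files:
        paths.append(pr_file["filename"])
        previous = pr_file.get("previous_filename")
        if previous:
            # Renamed or moved file: the old path counts as changed too.
            paths.append(previous)
    return paths
```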

Change-Id: Ieba250bed57c8a9c2e7811476c202d530f2b30f1
2022-03-09 08:20:52 +01:00
James E. Blair 801a1ba551 Handle Github branch protection rule webhook events
Github now emits a webhook event when a branch protection rule changes.

Add support for that in the Github driver.  We will update the branch
protection status in the branch cache immediately, and emit a trigger
event which will cause tenant reconfigurations if the protection status
of a branch has changed in any tenants.

Note: a branch protection rule applies to any number of branches, so
we may generate multiple Zuul trigger events from a single Github
webhook event in this case.

Change-Id: I0a7af786f9c69cf67eaaf4c75f437f8cf64a054a
2022-03-01 16:33:47 -08:00
Zuul 8615daa521 Merge "Populate missing change cache entries" 2022-02-22 12:28:06 +00:00
Zuul 733c40cc0b Merge "github: a change in .gitub/ may prevent a merge" 2022-02-21 17:57:04 +00:00
James E. Blair df220cd4d6 Populate missing change cache entries
The drivers are expected to populate the change cache before
passing trigger events to the scheduler so that all the difficult
work is done outside the main loop.  Further, the cache cleanup
is designed to accommodate this so that events in-flight don't have
their change cache entries removed early.

However, at several points since moving the change cache into ZK,
programming errors have caused us to encounter enqueued changes
without entries in the cache.  This usually causes Zuul to abort
pipeline processing and is unrecoverable.

We should continue to address all incidences of those since they
represent Zuul not working as designed.  However, it would be nice
if Zuul was able to recover from this.

To that end, this change allows missing changes to be added to the
change cache.

That is primarily accomplished by adjusting the Source.getChange
method to accept a ChangeKey instead of an Event.  Events are only
available when the triggering event happens, whereas a ChangeKey
is available when loading the pipeline state.

A ChangeKey represents the minimal distinguishing characteristics
of a change, and so can be used in all cases.  Some drivers obtain
extra information from events, so we still pass it into the getChange
method if available, but it's entirely optional -- we should still
get a workable Change object whether or not it's supplied.

Ref (and derived: Branch, Tag) objects currently only store their
newrev attribute in the ChangeKey, however we need to be able to
create Ref objects with an oldrev as well.  Since the old and new
revs of a Ref are not inherent to the ref but rather the generating
event, we can't get that from the source system.  So we need to
extend the ChangeKey object to include that.  Adding an extra
attribute is troublesome since the ChangeKey is not a ZKObject and
therefore doesn't have access to the model api version.  However,
it's not too much of a stretch to say that the "revision" field
(which like all ChangeKey fields is driver-dependent) should include
the old and new revs.  Therefore, in these cases the field is
upgraded in a backwards compatible way to include old and newrev
in the standard "old..new" git encoding format.  We also need to
support "None" since that is a valid value in Zuul.
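
One way such an encoding could look is sketched below. The function
names, the handling of a legacy value that only holds the newrev, and
the spelling of a missing rev as the string "None" are all assumptions
made for this illustration.

```python
from typing import Optional, Tuple


def _rev_to_str(rev: Optional[str]) -> str:
    return "None" if rev is None else rev


def _str_to_rev(value: str) -> Optional[str]:
    return None if value == "None" else value


def encode_revision(oldrev: Optional[str], newrev: Optional[str]) -> str:
    """Pack old and new rev into a single "old..new" revision field."""
    return f"{_rev_to_str(oldrev)}..{_rev_to_str(newrev)}"


def decode_revision(
        revision: Optional[str]) -> Tuple[Optional[str], Optional[str]]:
    """Unpack a revision field into (oldrev, newrev)."""
    if revision is None:
        return None, None
    if ".." in revision:
        old, new = revision.split("..", 1)
        return _str_to_rev(old), _str_to_rev(new)
    # Legacy value written before the change: newrev only.
    return None, _str_to_rev(revision)
```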

So that we can continue to identify cache errors, any time we encounter
a change key that is not in the cache and we also don't have an
event object, we log an error.

Almost all of this commit is the refactor to accept change keys
instead of events in getChange.  The functional change to populate
the cache if it's missing basically consists of just removing
getChangeByKey and replacing it with getChange.  A test which deletes
the cache midway through is added.

Change-Id: I4252bea6430cd434dbfaacd583db584cc796dfaa
2022-02-17 13:14:23 -08:00
Zuul f648f21304 Merge "Add a model API version" 2022-01-27 22:46:49 +00:00
James E. Blair 29fbee7375 Add a model API version
This is a framework for making upgrades to the ZooKeeper data model
in a manner that can support a rolling Zuul system upgrade.

Change-Id: Iff09c95878420e19234908c2a937e9444832a6ec
2022-01-27 12:19:11 -08:00
Simon Westphahl 3d62dc862d Refresh cached branches in timer driver
The cache maintenance has an inherent data race as it is only
considering changes as relevant that are currently in any pipeline. To
prevent garbage collection of changes for in-flight events, we only
clean up items older than 2h.

Usually the driver will refresh a change when receiving a connection
event. However, this wasn't the case for trigger events created by the
timer driver.

This can lead to a race condition where a cached branch is cleaned up
while a timer triggered item is enqueued.

For consistency all non-change objects (Branch, Tag, Ref) will now be
refreshed in case the refresh flag of `getChange()` is set to True.

2022-01-24 11:31:50,815 ERROR zuul.Scheduler: Exception processing pipeline periodic-xy in tenant foobar
Traceback (most recent call last):
  File "/opt/zuul/lib/python3.8/site-packages/zuul/scheduler.py", line 1786, in process_pipelines
    pipeline.state.refresh(ctx)
  File "/opt/zuul/lib/python3.8/site-packages/zuul/zk/zkobject.py", line 153, in refresh
    self._load(context)
  File "/opt/zuul/lib/python3.8/site-packages/zuul/zk/zkobject.py", line 205, in _load
    self._set(**self.deserialize(data, context))
  File "/opt/zuul/lib/python3.8/site-packages/zuul/model.py", line 690, in deserialize
    queue = ChangeQueue.fromZK(context, queue_path,
  File "/opt/zuul/lib/python3.8/site-packages/zuul/zk/zkobject.py", line 148, in fromZK
    obj._load(context, path=path)
  File "/opt/zuul/lib/python3.8/site-packages/zuul/zk/zkobject.py", line 205, in _load
    self._set(**self.deserialize(data, context))
  File "/opt/zuul/lib/python3.8/site-packages/zuul/model.py", line 944, in deserialize
    item = QueueItem.fromZK(context, item_path,
  File "/opt/zuul/lib/python3.8/site-packages/zuul/zk/zkobject.py", line 148, in fromZK
    obj._load(context, path=path)
  File "/opt/zuul/lib/python3.8/site-packages/zuul/zk/zkobject.py", line 205, in _load
    self._set(**self.deserialize(data, context))
  File "/opt/zuul/lib/python3.8/site-packages/zuul/model.py", line 4093, in deserialize
    change = self.pipeline.manager.resolveChangeReferences(
  File "/opt/zuul/lib/python3.8/site-packages/zuul/manager/__init__.py", line 199, in resolveChangeReferences
    return self.resolveChangeKeys(
  File "/opt/zuul/lib/python3.8/site-packages/zuul/manager/__init__.py", line 211, in resolveChangeKeys
    self._change_cache[change.cache_key] = change
AttributeError: 'NoneType' object has no attribute 'cache_key'
2022-01-24 11:31:50,811 ERROR zuul.Pipeline.foobar.periodic-xy: Unable to resolve change from key <ChangeKey github org/project Branch master None hash=da4e62f669e51a7fbef5db1a9b480b0bd42693a7febffdb47a4eb794faa300a9>

Change-Id: I62f5b816780e244e1426ab8a8871f09379293f3e
2022-01-27 11:47:54 +01:00
Tobias Henkel 4fbc3a8c1d
Enable reprime by default
Currently we by default don't reprime the installation map if we don't
find the installation id. This breaks with multiple schedulers since
updating the installation map is done via the github events which are
only processed by one scheduler. Enabling repriming by default is a
quickfix for this problem.

Change-Id: I10c619d77cdcbe530813cd64b5545b27931a7888
2022-01-14 13:55:15 +01:00
Gonéri Le Bouder 66413b17cd
github: a change in .gitub/ may prevent a merge
Github won't merge a PR if the main branch has a recent change in the
.github/ directory and the PR is not based on top of it.

Change-Id: I595faf0750e277570965767e22c340740cf5a8d5
2021-12-20 11:00:39 -05:00
James E. Blair fb3d3f7471 Add an init phase to scheduler/web startup
This adds a new component state: INITIALIZING and the scheduler and
web components use it when they are creating their initial config.

It is safe for the scheduler to start processing tenant and pipeline
events as soon as it starts because it only processes those for
the tenants that it has already loaded.

However it is not safe for drivers to move events from their
incoming queue into the scheduler since that requires the full
tenant list.  The 4 drivers to which this applies are updated
to wait on config priming.

Zuul-web is already structured to wait until config priming
so does not need a corresponding change.

Change-Id: I36dd4927e583328434e66553aa3ff0cd7469b488
2021-11-16 13:06:32 -08:00
Felix Edel 2c900c2c4a Split up registerScheduler() and onLoad() methods
This is an early preparation step for removing the RPC calls between
zuul-web and the scheduler.

In order to do so we must initialize the ConfigLoader in zuul-web which
requires all connections to be available. Therefore, this change ensures
that we can load all connections in zuul-web without providing a
scheduler instance.

To avoid unnecessary traffic from a zuul-web instance the onLoad()
method initializes the change cache only if a scheduler instance is
available on the connection.

Change-Id: I3c1d2995e81e17763ae3454076ab2f5ce87ab1fc
2021-11-09 09:17:43 +01:00
James E. Blair a836575c60 Store connection branch cache in ZK
This will allow us to sync the branch cache to other participants
via ZK.

Change-Id: I75b2436008e7bc44e086abe680d8b98cf73102f8
2021-11-03 17:15:47 -07:00
James E. Blair 29d0534696 Never externally delete change cache entries
The change cache depends on having singleton objects for entries.
If a scheduler ever ends up with 2 objects for the same change, the
cache will refuse to update the cache with new data for the object
which is not in the cache.  However, there is a simple series of
events which could lead to this:

1) Event from source populates cache with a change.
2) Change is enqueued into pipeline.
3) Event from source triggers a data refresh of same change.
4) Data refresh fails.
5) Exception handler for data refresh deletes change from cache.

Imagine that the pipeline processor is now attempting to refresh the
change to determine whether it has merged.  At this point, updates
to the cache will fail with this error:

2021-09-28 14:25:23,057 ERROR zuul.Scheduler: Exception in pipeline processing:
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/zuul/scheduler.py", line 1615, in _process_pipeline
    while not self._stopped and pipeline.manager.processQueue():
  File "/usr/local/lib/python3.8/site-packages/zuul/manager/__init__.py", line 1418, in processQueue
    item_changed, nnfi = self._processOneItem(
  File "/usr/local/lib/python3.8/site-packages/zuul/manager/__init__.py", line 1356, in _processOneItem
    self.reportItem(item)
  File "/usr/local/lib/python3.8/site-packages/zuul/manager/__init__.py", line 1612, in reportItem
    merged = source.isMerged(item.change, item.change.branch)
  File "/usr/local/lib/python3.8/site-packages/zuul/driver/gerrit/gerritsource.py", line 47, in isMerged
    return self.connection.isMerged(change, head)
  File "/usr/local/lib/python3.8/site-packages/zuul/driver/gerrit/gerritconnection.py", line 1013, in isMerged
    self._change_cache.updateChangeWithRetry(key, change, _update_change)
  File "/usr/local/lib/python3.8/site-packages/zuul/zk/change_cache.py", line 330, in updateChangeWithRetry
    self.set(key, change, version)
  File "/usr/local/lib/python3.8/site-packages/zuul/zk/change_cache.py", line 302, in set
    if self._change_cache[key._hash] is not change:
KeyError: 'ef075359268c2f3ee7d52ccbcb6ac51a3a5922c709e634fdb2efcf97c57095b2'

The process may continue:

6) Event from source triggers a data refresh of same change.
7) Refresh succeeds and cache is populated with new change object.

Then the pipeline will fail with this error:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/zuul/scheduler.py", line 1615, in _process_pipeline
    while not self._stopped and pipeline.manager.processQueue():
  File "/usr/local/lib/python3.8/site-packages/zuul/manager/__init__.py", line 1418, in processQueue
    item_changed, nnfi = self._processOneItem(
  File "/usr/local/lib/python3.8/site-packages/zuul/manager/__init__.py", line 1356, in _processOneItem
    self.reportItem(item)
  File "/usr/local/lib/python3.8/site-packages/zuul/manager/__init__.py", line 1612, in reportItem
    merged = source.isMerged(item.change, item.change.branch)
  File "/usr/local/lib/python3.8/site-packages/zuul/driver/gerrit/gerritsource.py", line 47, in isMerged
    return self.connection.isMerged(change, head)
  File "/usr/local/lib/python3.8/site-packages/zuul/driver/gerrit/gerritconnection.py", line 1013, in isMerged
    self._change_cache.updateChangeWithRetry(key, change, _update_change)
  File "/usr/local/lib/python3.8/site-packages/zuul/zk/change_cache.py", line 330, in updateChangeWithRetry
    self.set(key, change, version)
  File "/usr/local/lib/python3.8/site-packages/zuul/zk/change_cache.py", line 303, in set
    raise RuntimeError(
RuntimeError: Conflicting change objects (existing <Change 0x7f1405c188e0 starlingx/nfv 810014,2> vs. new <Change 0x7f148446c370 starlingx/nfv 810014,2> for key '{"connection_name": "gerrit", "project_name": null, "change_type": "GerritChange", "stable_id": "810014", "revision": "2"}'

To avoid this, we should never remove a change from the cache unless it is
completely unused (that is, we should only remove changes from the cache via
the prune method).  Even if it means that the change is out of date, it is
still the best information that we have, and a future event may succeed and
eventually update the change.

This removes the exception handling which deleted the change from all drivers.

Change-Id: Idbecdf717b517cce5c25975a40d9f42d57a26c9e
2021-09-28 10:21:12 -07:00
Zuul 537cf71c55 Merge "Use structured change cache keys" 2021-09-25 01:11:01 +00:00
Zuul b423630219 Merge "github: handle suspended apps" 2021-09-25 00:18:23 +00:00
James E. Blair c4268b1b46 Use structured change cache keys
This adds a ChangeKey class which is essentially a structured universal
identifier for a change-like object (Ref, Branch, Change, PR, whatever).

We can use this in ZK objects to reference changes, and by doing so, we
can in many cases avoid actually referencing the change objects
themselves.

This also updates the actual keys in ZK to be sha256sums of the structured
key (for brevity and simplicity of encoding).
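
As a rough illustration of hashing a structured key: the field names
below are the ones visible in the key printed in the traceback of
"Never externally delete change cache entries", while the function
name and the exact JSON serialization are assumptions and may differ
from what Zuul does.

```python
import hashlib
import json
from typing import Optional


def change_key_hash(connection_name: str, project_name: Optional[str],
                    change_type: str, stable_id: str,
                    revision: Optional[str]) -> str:
    """Return a sha256 hex digest of a structured change key."""
    reference = json.dumps({
        "connection_name": connection_name,
        "project_name": project_name,
        "change_type": change_type,
        "stable_id": stable_id,
        "revision": revision,
    })
    return hashlib.sha256(reference.encode("utf-8")).hexdigest()
```

The digest has a fixed length of 64 hex characters regardless of
project or branch names, which keeps the ZK node names short and free
of characters that would need escaping.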

Change-Id: I6cd62973d48ad3515f6aa8a8172b9e9c19fcda55
2021-09-24 13:48:37 -07:00
James E. Blair 27b677df91 Only refresh deps if change messages have changed
We only need to call the refreshDeps method if the Depends-On
list has changed.  That can only happen with a new patchset
(gerrit) or the PR body has changed (github et al).  Add a method
to determine if the PR body has changed so we can reduce the
times where we need to call this method.
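
The idea can be sketched as follows. The function names and the
callback are hypothetical; the refreshDeps method itself is not
reproduced here.

```python
from typing import Callable, Optional


def message_changed(cached_message: Optional[str],
                    new_message: Optional[str]) -> bool:
    """True if the PR body (or commit message) differs from the cache."""
    return (cached_message or "") != (new_message or "")


def maybe_refresh_deps(cached_message: Optional[str],
                       new_message: Optional[str],
                       refresh_deps: Callable[[], None]) -> bool:
    """Call `refresh_deps` only when the Depends-On list could differ."""
    if message_changed(cached_message, new_message):
        refresh_deps()
        return True
    return False
```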

Change-Id: Iaa50a274c29347397bc4e10e2c3cefc25e442879
2021-09-24 11:47:49 -07:00