Commit Graph

179 Commits

Simon Westphahl 4680c58a27
Allow rerequested action for Github triggers
The 'requested' action is deprecated in favor of 'rerequested', but the
new schema did not permit the new action name.

Change-Id: I047d2676f44151e7569d38bc1df3d26ffee83202
2024-03-14 14:48:05 +01:00
James E. Blair 171d4c56b1 Add some github configuration deprecations
The "event" trigger attribute can currently be a list.  Technically,
it is possible to construct a trigger with an event list, such as:

    trigger:
      github:
        - event:
            - pull_request
            - pull_request_review
          branch: master

Which would trigger on any pull_request or pull_request_review event
on the master branch.  However in practice users typically have much
more narrow event selections, such as only triggering on pull_request
events with the opened action, or a pull_request event with a certain
comment.  It is not possible to construct that example with a single
trigger; the following is invalid:

    trigger:
      github:
        - event:
            - pull_request
            - pull_request_review
          actions:
            - opened
            - commented
          branch: master
          comment: recheck

That will pass syntax validation but would only fire on a recheck
comment; it would never fire on a PR opened event because that event
won't have a comment.
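
The mismatch is easy to see with a toy matcher. The following is a simplified
illustration only (the dictionary keys and matching logic are made up for the
example and are not Zuul's actual event matcher): when a single trigger
combines an action list with a comment filter, every filter must match, so a
pull_request "opened" event, which carries no comment, can never fire.

    import re

    def matches(trigger, event):
        # All configured filters must match for the trigger to fire.
        if event["type"] not in trigger["event"]:
            return False
        if trigger.get("actions") and event.get("action") not in trigger["actions"]:
            return False
        if trigger.get("comment") and not re.search(
                trigger["comment"], event.get("comment") or ""):
            return False
        return True

    trigger = {"event": ["pull_request", "pull_request_review"],
               "actions": ["opened", "commented"],
               "comment": "recheck"}

    opened = {"type": "pull_request", "action": "opened"}  # no comment payload
    recheck = {"type": "pull_request", "action": "commented",
               "comment": "recheck"}

    print(matches(trigger, opened))   # False: the comment filter never matches
    print(matches(trigger, recheck))  # True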

To help users avoid these problems, or worse, let's limit the event
specifier to a single event (of course users can add more triggers for
other events).  That will allow us to inform users when they use
options incompatible with the event they selected.

For now, we make this a deprecation so that in the future we can
enforce it and improve feedback.

This adds syntax validation for each of the possible event/action
combinations in the case where the user has already specified a single
event.  This allows us to go ahead and issue warnings if users specify
incompatible options.  Later, all of these can become errors.

Some time ago (8.3.0) we deprecated the require-status attribute.  It
is eligible for removal now, but that predated the deprecation
warnings system.  Since we haven't yet removed it, and we now have
that system, let's add a deprecation warning for it and give users a
little more time to notice that and remove it before it becomes an
error.

When a Github user requests that a check run start again, Github emits
a "check_run" event with a "rerequested" action.  In zuul < 5.0.0, we
asked users to configure the check_run trigger with the "requested"
action and we silently translated the "rerequested" from github to the
zuul "requested".  In 5.0.0, we reversed that decision in order to
match our policy of passing through data from remote systems as
closely as possible to enable users to match the corresponding
documentation of zuul and the remote system.  We deprecated
"requested" and updated the examples in the documentation to say
"rerequested".  Unfortunately, we left the canonical documentation of
the value as "requested".  To correct this oversight, that
documentation is updated to say "rerequested" and a configuration
deprecation warning is added for uses of "requested".

The "unabel" trigger attribute is undocumented and unused.  Deprecate
it from syntax checking here so we can gracefully remove it later.

Some unit test configs are updated since they passed validation
previously but no longer do, and the actual github pull request
review state constants ('approved', etc) are updated to match
what github sends.

Change-Id: I6bf7753d74ec0c5f19dad508c33762a7803fe805
2024-02-29 16:37:47 -08:00
James E. Blair 57a9c13197 Use the GitHub default branch as the default branch
This supplies a per-project default value for Zuul's default-branch
based on what the default branch is set to in GitHub.  This means
that if users omit the default-branch setting on a Zuul project
stanza, Zuul will automatically use the correct value.

If the value in GitHub is changed, an event is emitted which allows
us to automatically reconfigure the tenant.

This could be expanded to other drivers that support an indication
of which branch is default.
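
As a point of reference, the value the driver relies on is exposed in the
repository object of the GitHub REST API. The sketch below reads it with plain
requests; it is illustrative only and not how the driver fetches it internally.

    import requests

    def github_default_branch(owner, repo, token=None,
                              api_url="https://api.github.com"):
        # The repository object carries the default branch name; this is the
        # value Zuul can fall back on when a project stanza omits default-branch.
        headers = {"Accept": "application/vnd.github+json"}
        if token:
            headers["Authorization"] = f"token {token}"
        resp = requests.get(f"{api_url}/repos/{owner}/{repo}",
                            headers=headers, timeout=30)
        resp.raise_for_status()
        return resp.json()["default_branch"]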

Change-Id: I660376ecb3f382785d3bf96459384cfafef200c9
2023-08-23 11:07:08 -07:00
James E. Blair 985aefc70c Add commit_id to zuul vars
We added the commit_id to the MQTT reporter a while ago so that
external systems which identify changes by commit id (the Gerrit
current patchset hexsha, or the Github PR HEAD, etc) can track
Zuul work.  To enable the same sort of interfacing with external
systems in Zuul jobs, the variable is added to the set of zuul
variables.

To make it easier for use in multiple pipelines, the variable is
set to oldrev or newrev in the non-change case.

To make it easier to confirm driver parity, a new Gerrit driver
test class is added with the intention that it should parallel
the other driver test classes over time.

Change-Id: I2826349ee9ce696c8f9de8c23276094123292196
2023-07-19 15:16:17 -07:00
Zuul 1ba52ec372 Merge "Don't add PR title in commit message on squash" 2023-03-22 21:52:14 +00:00
Simon Westphahl e3d6cb0724
Don't add PR title in commit message on squash
Github will use the PR title as the commit subject for squash merges, so
we don't need to include the title again in the commit description.

Change-Id: Id5a00701c236235f5a49abd025bcfad1b2da916c
2023-03-20 09:19:08 +01:00
Simon Westphahl 59857a14b5
Fix race related to PR with changed base branch
Some people use a workflow that's known as "stacked pull requests" in
order to split a change into more reviewable chunks.

In this workflow the first PR in the stack targets a protected branch
(e.g. master) and all other PRs target the unprotected branch of the
next item in the stack.

    E.g. master <- feature-A (PR#1) <- feature-B (PR#2) <- ...

Now, when the first PR in the stack is merged Github will automatically
change the target branch of dependent PRs. For the above example this
would look like the following after PR#1 is merged:

    master <- feature-B (PR#2) <- ...

The problem here is that we might still process events for a PR before
the base branch change, but the Github API already returns the info
about the updated target branch.

The problem with this is that we combined the target branch name from
the event (the outdated branch name) with the protection information from
the change object (which reflects the new target branch) to determine
whether a branch was configured as protected.

In the above example Zuul might wrongly conclude that the "feature-A"
branch (taken from the event) is now protected.

In the related incident we also observed that this triggered a
reconfiguration with the wrong state of now two protected branches
(master + feature-A). Since the project in question previously had only
one branch, this led to a change in branch matching behavior for jobs
defined in that repository.

Change-Id: Ia037e3070aaecb05c062865a6bb0479b86e4dcde
2023-03-02 12:25:42 +01:00
Simon Westphahl 7c4e66cf74
Return cached Github change on concurrent update
When a pull-request is updated concurrently on a scheduler we'll wait
until the first thread has updated the change. The problem arises if we
needed to create a new PR object: in that case we'd return a Github
change that wasn't updated and also doesn't have a cache key set.

ERROR zuul.Scheduler: Exception processing pipeline check in tenant foobar
Traceback (most recent call last):
  File "/opt/zuul/lib/python3.10/site-packages/zuul/scheduler.py", line 2149, in process_pipelines
    refreshed = self._process_pipeline(
  File "/opt/zuul/lib/python3.10/site-packages/zuul/scheduler.py", line 2241, in _process_pipeline
    self.process_pipeline_trigger_queue(tenant, pipeline)
  File "/opt/zuul/lib/python3.10/site-packages/zuul/scheduler.py", line 2447, in process_pipeline_trigger_queue
    self._process_trigger_event(tenant, pipeline, event)
  File "/opt/zuul/lib/python3.10/site-packages/zuul/scheduler.py", line 2480, in _process_trigger_event
    pipeline.manager.addChange(change, event)
  File "/opt/zuul/lib/python3.10/site-packages/zuul/manager/__init__.py", line 534, in addChange
    self.updateCommitDependencies(change, None, event)
  File "/opt/zuul/lib/python3.10/site-packages/zuul/manager/__init__.py", line 868, in updateCommitDependencies
    new_commit_needs_changes = [d.cache_key for d in dependencies]
  File "/opt/zuul/lib/python3.10/site-packages/zuul/manager/__init__.py", line 868, in <listcomp>
    new_commit_needs_changes = [d.cache_key for d in dependencies]
  File "/opt/zuul/lib/python3.10/site-packages/zuul/model.py", line 5946, in cache_key
    return self.cache_stat.key.reference
AttributeError: 'NoneType' object has no attribute 'key'
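
The fix amounts to a get-or-return-cached pattern: a thread that loses the
update race waits and then hands back the cached, fully updated change instead
of a freshly created, un-updated one. The sketch below is a toy version of
that pattern, not Zuul's actual change cache.

    import threading

    class ChangeCache:
        """Toy get-or-create cache illustrating the pattern."""

        def __init__(self):
            self._lock = threading.Lock()
            self._changes = {}

        def get_change(self, key, update):
            # A concurrent caller blocks on the lock and then receives the
            # cached change (with its cache key set) rather than creating and
            # returning a second, un-updated object.
            with self._lock:
                change = self._changes.get(key)
                if change is None:
                    change = update(key)
                    self._changes[key] = change
                return change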

Change-Id: I2f3012060c486d593ad857e046334c3d9bef0ed5
2023-02-17 11:17:35 +01:00
Zuul 934846b9b3 Merge "Report a config error for unsupported merge mode" 2022-12-19 23:11:50 +00:00
Simon Westphahl c8aac6a118
Check if Github detected a merge conflict for a PR
Github uses libgit2 to compute merges without requiring a worktree [0].
In some cases this can lead to Github detecting a merge conflict while
for Zuul the PR merges fine.

To prevent such changes from entering dependent pipelines and e.g. causing
a gate reset, we'll also check if Github detected any merge conflicts.

[0] https://github.blog/2022-10-03-highlights-from-git-2-38/
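
Roughly, such a check can be read off the pull request object of the REST API;
the `mergeable` and `mergeable_state` fields reflect GitHub's own merge
computation. The sketch below is illustrative and not the driver's actual
implementation.

    import requests

    def github_reports_conflict(owner, repo, number, token,
                                api_url="https://api.github.com"):
        # GitHub pre-computes merges with libgit2; a detected conflict shows up
        # as mergeable == False / mergeable_state == "dirty", even if a local
        # merge attempt by Zuul would succeed.
        resp = requests.get(
            f"{api_url}/repos/{owner}/{repo}/pulls/{number}",
            headers={"Authorization": f"token {token}",
                     "Accept": "application/vnd.github+json"},
            timeout=30)
        resp.raise_for_status()
        pr = resp.json()
        return pr.get("mergeable") is False or pr.get("mergeable_state") == "dirty"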

Change-Id: I22275f24c903a8548bb0ef6c32a2e15ba9eadac8
2022-11-18 11:59:32 +01:00
James E. Blair 640059a67a Report a config error for unsupported merge mode
This updates the branch cache (and associated connection mixin)
to include information about supported project merge modes.  With
this, if a project on github has the "squash" merge mode disabled
and a Zuul user attempts to configure Zuul to use the "squash"
mode, then Zuul will report a configuration syntax error.

This change adds implementation support only to the github driver.
Other drivers may add support in the future.

For all other drivers, the branch cache mixin simply returns a value
indicating that all merge modes are supported, so there will be no
behavior change.

This is also the upgrade strategy: the branch cache uses a
defaultdict that reports all merge modes supported for any project
when it first loads the cache from ZK after an upgrade.
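
A minimal sketch of that upgrade-safe default, assuming illustrative mode
names and data structures (not Zuul's internal representation):

    from collections import defaultdict

    ALL_MERGE_MODES = {"merge", "squash", "rebase"}  # illustrative names

    # Until real data has been fetched for a project (for example right after
    # an upgrade, when the cache loaded from ZK has no merge-mode info yet),
    # report every merge mode as supported so existing configs keep loading.
    supported_merge_modes = defaultdict(lambda: set(ALL_MERGE_MODES))

    print(supported_merge_modes["org/unknown-project"])  # all modes
    supported_merge_modes["org/known-project"] = {"merge"}  # populated later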

Change-Id: I3ed9a98dfc1ed63ac11025eb792c61c9a6414384
2022-11-11 09:53:28 -08:00
James E. Blair 26b9b0e2fb Add rebase-merge merge mode
GitHub supports a "rebase" merge mode where it will rebase the PR
onto the target branch and fast-forward the target branch to the
result of the rebase.

Add support for this process to the merger so that it can prepare
an effective simulated repo, and map the merge-mode to the merge
operation in the reporter so that gating behavior matches.
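
In terms of plain git operations, a rebase merge corresponds roughly to the
steps below; this is an illustration of the process, not the merger's actual
code.

    import subprocess

    def simulate_rebase_merge(workspace, pr_branch, target_branch):
        # Rebase the PR branch onto the target branch, then fast-forward the
        # target branch to the rebased result.
        def git(*args):
            subprocess.run(["git", *args], cwd=workspace, check=True)

        git("checkout", pr_branch)
        git("rebase", target_branch)
        git("checkout", target_branch)
        git("merge", "--ff-only", pr_branch)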

This change also makes a few tweaks to the merger to improve
consistency (including renaming a variable ref->base), and corrects
some typos in the similar squash merge test methods.

Change-Id: I9db1d163bafda38204360648bb6781800d2a09b4
2022-10-17 14:27:05 -07:00
James E. Blair fbb97bd6b6 Add whitespace around keywords
This fixes pep8 E275 which wants whitespace after assert and del.
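
For illustration, the kind of code E275 flags and its fix:

    x, items = 1, [1, 2, 3]

    # E275: missing whitespace after keyword
    assert(x == 1)
    del(items[0])

    # fixed
    assert x == 1
    del items[0]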

Change-Id: I1f8659f462aa91c3fdf8f7eb8b939b67c0ce9f55
2022-08-02 08:02:30 -07:00
James E. Blair 9a279725f9 Strictly sequence reconfiguration events
In the before times when we only had a single scheduler, it was
naturally the case that reconfiguration events were processed as they
were encountered and no trigger events which arrived after them would
be processed until the reconfiguration was complete.  As we added more
event queues to support SOS, it became possible for trigger events
which arrived at the scheduler to be processed before a tenant
reconfiguration caused by a preceding event to be complete.  This is
now even possible with a single scheduler.

As a concrete example, imagine a change merges which updates the jobs
which should run on a tag, and then a tag is created.  A scheduler
will process both of those events in succession.  The first will cause
it to submit a tenant reconfiguration event, and then forward the
trigger event to any matching pipelines.  The second event will also
be forwarded to pipeline event queues.  The pipeline events will then
be processed, and then only at that point will the scheduler return to
the start of the run loop and process the reconfiguration event.

To correct this, we can take one of two approaches: make the
reconfiguration more synchronous, or make it safer to be
asynchronous.  To make reconfiguration more synchronous, we would need
to be able to upgrade a tenant read lock into a tenant write lock
without releasing it.  The lock recipes we use from kazoo do not
support this.  While it would be possible to extend them to do so, it
would lead us further from parity with the upstream kazoo recipes, so
this approach is not used.

Instead, we will make it safer for reconfiguration to be asynchronous
by annotating every trigger event we forward with the last
reconfiguration event that was seen before it.  This means that every
trigger event now specifies the minimum reconfiguration time for that
event.  If our local scheduler has not reached that time, we should
stop processing trigger events and wait for it to catch up.  This
means that schedulers may continue to process events up to the point
of a reconfiguration, but will then stop.  The already existing
short-circuit to abort processing once a scheduler is ready to
reconfigure a tenant (where we check the tenant write lock contenders
for a waiting reconfiguration) helps us get out of the way of pending
reconfigurations as well.  In short, once a reconfiguration is ready
to start, we won't start processing tenant events anymore because of
the existing lock check.  And up until that happens, we will process
as many events as possible until any further events require the
reconfiguration.

We will use the ltime of the tenant trigger event as our timestamp.
As we forward tenant trigger events to the pipeline trigger event
queues, we decide whether an event should cause a reconfiguration.
Whenever one does, we note the ltime of that event and store it as
metadata on the tenant trigger event queue so that we always know what
the most recent required minimum ltime is (ie, the ltime of the most
recently seen event that should cause a reconfiguration).  Every event
that we forward to the pipeline trigger queue will be annotated to
specify that its minimum required reconfiguration ltime is that most
recently seen ltime.  And each time we reconfigure a tenant, we store
the ltime of the event that prompted the reconfiguration in the layout
state.  If we later process a pipeline trigger event with a minimum
required reconfigure ltime greater than the current one, we know we
need to stop and wait for a reconfiguration, so we abort early.

Because this system involves several event queues and objects each of
which may be serialized at any point during a rolling upgrade, every
involved object needs to have appropriate default value handling, and
a synchronized model api change is not helpful.  The remainder of this
commit message is a description of what happens with each object when
handled by either an old or new scheduler component during a rolling
upgrade.

When forwarding a trigger event and submitting a tenant
reconfiguration event:

The tenant trigger event zuul_event_ltime is initialized
from zk, so will always have a value.

The pipeline management event trigger_event_ltime is initialized to the
tenant trigger event zuul_event_ltime, so a new scheduler will write
out the value.  If an old scheduler creates the tenant reconfiguration
event, it will be missing the trigger_event_ltime.

The _reconfigureTenant method is called with a
last_reconfigure_event_ltime parameter, which is either the
trigger_event_ltime above in the case of a tenant reconfiguration
event forwarded by a new scheduler, or -1 in all other cases
(including other types of reconfiguration, or a tenant reconfiguration
event forwarded by an old scheduler).  If it is -1, it will use the
current ltime so that if we process an event from an old scheduler
which is missing the event ltime, or we are bootstrapping a tenant or
otherwise reconfiguring in a context where we don't have a triggering
event ltime, we will use an ltime which is very new so that we don't
defer processing trigger events.  We also ensure we never go backward,
so that if we process an event from an old scheduler (and thus use the
current ltime) then process an event from a new scheduler with an
older (than "now") ltime, we retain the newer ltime.

Each time a tenant reconfiguration event is submitted, the ltime of
that reconfiguration event is stored on the trigger event queue.  This
is then used as the min_reconfigure_ltime attribute on the forwarded
trigger events.  This is updated by new schedulers, and ignored by old
ones, so if an old scheduler processes a tenant trigger event queue it
won't update the min ltime.  That will just mean that any events
processed by a new scheduler may continue to use an older ltime as
their minimum, which should not cause a problem.  Any events forwarded
by an old scheduler will omit the min_reconfigure_ltime field; that
field will be initialized to -1 when loaded on a new scheduler.

When processing pipeline trigger events:

In process_pipeline_trigger_queue we compare two values: the
last_reconfigure_event_ltime on the layout state which is either set
to a value as above (by a new scheduler), or will be -1 if it was last
written by an old scheduler (including in the case it was overwritten
by an old scheduler; it will re-initialize to -1 in that case).  The
event.min_reconfigure_ltime field will either be the most recent
reconfiguration ltime seen by a new scheduler forwarding trigger
events, or -1 otherwise.  If the min_reconfigure_ltime of an event is
-1, we retain the old behavior of processing the event regardless.
Only if we have a min_reconfigure_ltime > -1 and it is greater than
the layout state last_reconfigure_event_ltime (which itself may be -1,
and thus less than the min_reconfigure_ltime) do we abort processing
the event.
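
Condensed into code, the comparison just described looks roughly like the
sketch below; the function and parameter names are illustrative, not the
scheduler's actual code.

    def should_defer(min_reconfigure_ltime, last_reconfigure_event_ltime):
        # -1 means "unknown": the event (or layout state) was written by an
        # old scheduler, so keep the old behavior and process the event.
        if min_reconfigure_ltime == -1:
            return False
        # Defer only when the event requires a newer reconfiguration than the
        # one recorded in the layout state (which may itself be -1).
        return min_reconfigure_ltime > last_reconfigure_event_ltime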

(The test_config_update test for the Gerrit checks plugin is updated
to include an extra waitUntilSettled since a potential test race was
observed during development.)

Change-Id: Icb6a7858591ab867e7006c7c80bfffeb582b28ee
2022-07-18 10:51:59 -07:00
James E. Blair b784d53f88 Clarify missing merge requirement message
When Zuul dequeues a change because it no longer meets the code
review system's merge requirements, it says it was dequeued
"due to a missing requirement".  To clarify it's the code review
system's merge requirements (as opposed to some kind of job,
artifact, or other change), update the text to "due to a
missing merge requirement".

Change-Id: Ia76c0564726c0e1317302e76f3f2bdae2c1ff431
2022-06-09 08:11:12 -07:00
James E. Blair c41fcbe483 Add support for GHE repository cache
Change-Id: Iec87857aa58f71875d780da3698047dae01120d7
2022-05-05 13:39:41 -07:00
Dong Zhang 79b6252370 Fix bug in getting changed files
The fix includes two parts:
1. For Github, we use the base_sha instead of the target branch to
   be passed as the "tosha" parameter to get a precise list of changed files.
2. In the getFilesChanges() method, use the diff() result to filter out
   files that were changed and reverted between commits.

The reason we do not directly use diff() is that for drivers other
than github, the "base_sha" is not available yet; using diff() may
include unexpected files when the target branch has diverged from the
feature branch.

This solution works for 99.9% of the cases; it may still get an
incorrect list of changed files in the following corner case:
1. A non-github connection, whose base_sha is not implemented, and
2. Files changed and reverted between commits in the change, and
3. The same file has also diverged in the target branch.

The above corner case can be fixed by making base_sha available in
other drivers.
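
Expressed with plain git, the idea looks roughly like this (illustrative only,
not the getFilesChanges() implementation):

    import subprocess

    def changed_files(workspace, base_sha, head_sha):
        # Diff the PR head against its base commit rather than the current tip
        # of the target branch: files that only diverged on the target branch
        # are not reported, and files changed and then reverted within the
        # change drop out of the diff.
        out = subprocess.run(
            ["git", "diff", "--name-only", f"{base_sha}..{head_sha}"],
            cwd=workspace, check=True, capture_output=True, text=True)
        return out.stdout.splitlines()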

Change-Id: Ifae7018a8078c16f2caf759ae675648d8b33c538
2022-04-25 15:05:48 -07:00
Zuul c0f9a606cc Merge "Dequeue items that can no longer merge" 2022-03-23 01:00:06 +00:00
James E. Blair 082ff730fe Dequeue items that can no longer merge
We do not allow items to be enqueued into dependent pipelines if they
do not meet the approval/mergeability requirements of the code review system.
For example, an "Approved" review in gerrit, or a required status check in
github.  However, we do not check those once the item has been enqueued.
This means that if someone revokes approval, the item may remain in the queue
until it finishes, at which point it will fail to merge and be dequeued.

To avoid this, perform the canMerge check (which operates on local data)
each time we process the item.

Note: this does not perform Zuul pipeline requirement checks, but that would
be a natural follow-on from this change.

Change-Id: Iaf0ff530a9cb70052bf6a0908b28e2794dd110ae
2022-03-22 11:33:31 -07:00
Zuul 463cd1615b Merge "Handle Github branch protection rule webhook events" 2022-03-22 07:25:29 +00:00
Simon Westphahl c379691533 Include original path of renamed file for a PR
When a file is moved/renamed Github will only return an entry for the
file with the new name. However, the previous path also needs to be
included in the list of files. This is especially important when a Zuul
config file is renamed but also when `job.[irrelevant-]files` is used.
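
The GitHub pull request files API reports a renamed file once, under its new
name, with the old path in the `previous_filename` field. A minimal sketch of
collecting both paths (illustrative, not the driver's exact code):

    def file_paths(pr_files):
        # pr_files: entries as returned by GET /repos/{owner}/{repo}/pulls/{n}/files
        paths = set()
        for entry in pr_files:
            paths.add(entry["filename"])
            # Keep the old path too so renamed Zuul config files and
            # job.[irrelevant-]files matchers still see it.
            if entry.get("status") == "renamed" and entry.get("previous_filename"):
                paths.add(entry["previous_filename"])
        return sorted(paths)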

Change-Id: Ieba250bed57c8a9c2e7811476c202d530f2b30f1
2022-03-09 08:20:52 +01:00
James E. Blair 801a1ba551 Handle Github branch protection rule webhook events
Github now emits a webhook event when a branch protection rule changes.

Add support for that in the Github driver.  We will update the branch
protection status in the branch cache immediately, and emit a trigger
event which will cause tenant reconfigurations if the protection status
of a branch has changed in any tenants.

Note: a branch protection rule applies to any number of branches, so
we may generate multiple Zuul trigger events from a single Github
webhook event in this case.

Change-Id: I0a7af786f9c69cf67eaaf4c75f437f8cf64a054a
2022-03-01 16:33:47 -08:00
James E. Blair 41d6a4c033 Remove rpcclient from connection test fixtures
There are no uses of the rpc client in tests now, remove the
initialization and parameters.

Change-Id: I2d22fa73138c8cc2db96f78ecbc2dc3792cdd8bb
2022-01-17 12:49:28 -08:00
James E. Blair d75abd08b3 Remove rpcclient use from github tests
It's valuable to make sure that we handle the slightly different
(compared to gerrit) code paths of a github change identifier, so
the existing github RPC client tests are modified to directly use
the event queue system.  This still provides most of the coverage
but doesn't require turning these into full-blown zuul-web tests.

One test also relied on a client dequeue for its action, so the
same approach is used there.

Change-Id: I002ebb19112eeb37a8b9e66b2494848875a0ff7e
2022-01-17 12:42:40 -08:00
Felix Edel 9e157f1af3 Execute Github tests with only one scheduler
Currently, the canMerge check makes most of the tests fail when they are
executed with multiple schedulers.

The main reason is the fake github classes we are using as "backend" in
our tests. Since each scheduler gets a different (fake) connection, each
connection might have a different set of data during the test depending
on which scheduler handled the trigger event.

When the canMerge check is executed by the wrong scheduler the test will
fail, as the fake connection bound to this scheduler is not aware of the
commit in question.

As this is mainly a testing issue, we decided to limit the number of
schedulers for the Github tests to 1. As a long term solution we should
implement a HTTP server that serves as fake Github backend (similar to
our FakeGerrit), so that the connection data isn't bound to a single
python object.

Change-Id: Ic4dcab061b5e0a1951e93da22debbb161a1f3ecb
2021-11-30 08:58:19 -08:00
Felix Edel 28289cb652 Combine different history approaches for merge jobs in tests
Currently, there are two "history approaches" for merge jobs in the unit
tests. Both the RecordingMergeClient and the HoldableMergerApi store
each submitted job in a local dictionary. Those dictionaries can be used
in tests to assert that a specific number (or type) of merge job was
issued during the test.

The majority of the tests uses the history from the HoldableMergerApi.

This change adapts the two remaining test cases that use the history
from the merge client to use the history from the merger api instead.

Additionally, this change adapts the internal structure of the merger
api's history dictionary to reflect the one from the merge client as
this makes the assertions in the unit tests easier.

Change-Id: I459cbe5001d068a10e9851104e0d7a0e213a07cc
2021-11-29 15:26:09 -08:00
Simon Westphahl 982aebe4c8 Store branch cache minimum ltimes in layout state
In order to check whether the project branch cache of a connection is
up-to-date we need to store the cache ltimes as part of the layout state.

Change-Id: I3450dfd584d5b2c21f1c7171337054c137eed12d
2021-11-05 08:41:28 +01:00
Simon Westphahl ba1d165f3b Ensure same layout UUID across schedulers
Until now the layout UUID was randomly generated on each scheduler.
However, since we now use the layout UUID to detect if we need to reset
the pipeline state, the UUID must be the same on all schedulers.

Having the same layout UUID on all schedulers might also make debugging
easier in some cases.

For this we will store the layout UUID as the UUID of the layout
state in Zookeeper.

Change-Id: I400b96b54b3dc9e30fb58e56e1e25a79427b5e70
2021-10-29 12:04:44 +02:00
Felix Edel 8038f9f75c Execute merge jobs via ZooKeeper
This is the second part of I767c0b4c5473b2948487c3ae5bbc612c25a2a24a.
It uses the MergerAPI.

Note: since we no longer have a central gearman server where we can
record all of the merge jobs, some tests now consult the merger api
to get the list of merge jobs which were submitted by that scheduler.
This should generally be equivalent, but something to keep in mind
as we add multiple schedulers.

Change-Id: I1c694bcdc967283f1b1a4821df7700d93240690a
2021-08-06 15:40:41 -07:00
Tristan Cacqueray d567fae882 Fix support for multiple github connection
This change enables using multiple github connections by fixing this
exception:

  File "zuul/executor/server.py", line 1228, in _execute
    repo_state[task.connection_name][task.project_name]
  KeyError: 'org/project'

Change-Id: I678bd8984233ced580e637aeea6d74ad7d799a08
2021-08-05 22:19:48 +00:00
Simon Westphahl ce0e348eb6 Don't clear connection caches during full reconfig
Clearing of connection caches doesn't seem to be needed in practice and
is removed as it can't be easily coordinated between multiple
schedulers.

Change-Id: I663fa1f1752d0dd599af06a7275a14235faaff2f
2021-07-09 09:34:40 +02:00
Simon Westphahl 32f3d5fe35 Store tenant layout state in Zookeeper
To coordinate reconfigurations across schedulers we need to store the
current layout state in Zookeeper. We store the hostname of the
scheduler that processed the reconfiguration event as well as the human
readable timestamp of the reconfig.

The ltime of the layout state is the last modified transaction ID of the
corresponding znode in Zookeeper. It will later be used to detect layout
changes on other schedulers.

Change-Id: I243aa4f54b038fbad78e331fbeeb508989e34b5f
2021-07-08 13:16:38 -07:00
Paul Belanger 94ee11ec18 Add skipped / neutral statuses to the github driver
Github also supports these two statuses in check runs. In the case of a
PR being dequeued it may be nicer in the UI to use 'skipped', as Github
will render this as 'grey'.

https://docs.github.com/en/rest/reference/checks#update-a-check-run

Change-Id: I9313a4f96655054fd2ba1b4f899c1959283f159e
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
2021-06-25 14:48:16 -04:00
James E. Blair be50a6ca42 Freeze job variables at start of build
Freeze Zuul job variables when starting a build so that jinja
templates cannot be used to expose secrets.  The values will be
frozen by running a playbook with set_fact, and that playbook
will run without access to secrets.  After the playbook
completes, the frozen variables are read from the fact cache and then
removed from it.  They are then supplied as normal inventory
variables for any trusted playbooks or playbooks with secrets.

The regular un-frozen variables are used for all other untrusted
playbooks.

Extra-vars are now only used to establish precedence among all
Zuul job variables.  They are no longer passed to Ansible with
the "-e" command line option, as that level of precedence could
also be used to obtain secrets.

Much of this work is accomplished by "squashing" all of the Zuul
job, host, group, and extra variables into a flat structure for
each host in the inventory.  This means that much of the variable
precedence is now handled by Zuul, which then gives Ansible
variables as host vars.  The actual inventory files will be much
more verbose now, since each host will have a copy of every "all"
value.  But this allows the freezing process to be much simpler.

When writing the inventory for the setup playbook, we now use the
!unsafe YAML tag which is understood by Ansible to indicate that
it should not perform jinja templating on variables.  This may
help to avoid any mischief with templated variables since they
have not yet been frozen.
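
For reference, emitting the !unsafe tag from Python can be done with a custom
representer; the class and key names below are made up for the example and
this is not Zuul's inventory writer.

    import yaml

    class Unsafe(str):
        """Marker for values Ansible must not run through Jinja."""

    def unsafe_representer(dumper, data):
        # Emit the value with the !unsafe tag so Ansible skips templating.
        return dumper.represent_scalar('!unsafe', str(data))

    yaml.add_representer(Unsafe, unsafe_representer)

    print(yaml.dump({"password": Unsafe("{{ lookup('env', 'SECRET') }}")}))
    # -> password: !unsafe '{{ lookup(''env'', ''SECRET'') }}'  (quoting may vary)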

Also, be more strict about what characters are allowed in ansible
variable names.  We already checked job variables, but we didn't
verify that secret names/aliases met the ansible variable
requirements.  A check is added for that (and a unit test that
relied on the erroneous behavior is updated).

Story: 2008664
Story: 2008682
Change-Id: I04d8b822fda6628e87a4a57dc368f20d84ae5ea9
2021-06-24 06:24:23 -07:00
Tobias Henkel fc11cf1334 Fix repo state restore / Keep jobgraphs frozen
This is two changes squashed.  First:

Fix missing repo state restore

The global repo state handling misses the restoration of the repo
states of projects that are not part of the dependency chain. This can
be generically fixed by ensuring that the repo state is restored
immediately after clone into the job workspace.

Original Change-Id: I61db67edb3952cdba7709b5b597dac93be4b6dde

Second:

Keep jobgraphs frozen across reconfiguration

This removes test cases which are no longer relevant.

Many of these were testing various mutations across reconfigurations,
but with job graphs frozen, about the only thing that we expect to
change now is when a pipeline, project, or tenant is deleted.  Test
cases are modified or added to test these.

It appears even the current code may have some bugs related to deleting
pipelines and tenants.  The improved testing in this change highlighted
that.  The scheduler is updated to ensure that it cancels all jobs on
pipelines or tenants that are removed from a running configuration.  This
should ensure we don't leak nodes or semaphores.

Change-Id: I2e4bd2fb9222b49cb10661d28d4c52a3c994ba62
Co-Authored-By: James E. Blair <jim@acmegating.com>
2021-04-21 14:53:54 -07:00
Simon Westphahl 5c620ddc9d Fix bug w/ None event in Github event forwarder
Change Ie54fc16488ab8cbc15f97d003f36c12b8a648ed4 introduced a bug in the
Github event forwarder thread that is triggered whenever the
GithubEventProcessor doesn't produce an event.

In case the event was None the GithubEventForwarder stopped instead of
continuing with the next future in the queue.

Change-Id: Ibd76c1a787f5b8bd186bcb271bc526fe5e5ddade
2021-03-23 11:11:46 +01:00
Tobias Henkel 808db0aec9 Report max 50 file comments to github
The github api limits us to report 50 file comments per api
request. Since the check run reporting is in the critical path, don't
do paging but limit this to 50 comments. In order to get diverse
feedback, sort the messages by 'uniqueness' first so that 100 equal
comments don't push out 5 other comments that are valuable.
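
One way to express that "uniqueness first" ordering, as a sketch rather than
the reporter's exact implementation:

    from collections import Counter

    def select_file_comments(comments, limit=50):
        # Rarer messages sort first, so 100 identical comments cannot push out
        # a handful of unique, more valuable ones; then cap at the API limit.
        counts = Counter(c["message"] for c in comments)
        ranked = sorted(comments, key=lambda c: counts[c["message"]])
        return ranked[:limit]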

Change-Id: If5a7c246a58b510d8fcfa31bd71d74c84acbc278
2021-03-15 15:49:24 -07:00
Felix Edel 2dfb34a818 Initialize ZooKeeper connection in server rather than in cmd classes
Currently, the ZooKeeper connection is initialized directly in the cmd
classes like zuul.cmd.scheduler or zuul.cmd.merger and then passed to
the server instance.

Although this makes it easy to reuse a single ZooKeeper connection for
multiple components in the tests it's not very realistic.
A better approach would be to initialize the connection directly in the
server classes so that each component has its own connection to
ZooKeeper.

Those classes already get all necessary parameters, so we could get rid
of the additional "zk_client" parameter.

Furthermore it would allow us to use a dedicated ZooKeeper connection
for each component in the tests which is more realistic than sharing a
single connection between all components.

Change-Id: I12260d43be0897321cf47ef0c722ccd74599d43d
2021-03-08 07:15:32 -08:00
Zuul 8673a6ca6f Merge "Save superfluous api requests in check run reporting" 2021-03-01 17:55:26 +00:00
James E. Blair 554a162001 Use ZooKeeperClient.fromConfig in tests
To make the tests more like running the apps, use fromConfig in the
tests rather than directly creating a client from a string connection
description.  Note that since fromConfig requires TLS and the tests
don't use it, a temporary argument has been added to that method to
allow non-tls connections; it's only used within the test suite.

In future changes, we can configure the tests to use TLS connections,
remove the argument, and then move client instantiation into the
server classes themselves (which will remove additional differences
between test and production startup code).

Change-Id: Ic45c0e60c464ed8eeb44cb32bab039403f73ec69
2021-02-22 09:29:53 -08:00
Jan Kubovy cca699ae39 Mandatory Zookeeper connection for ZuulWeb in tests
This is needed for change: I532dfb5af56a5d3074a808c4cf4a6854285636e8

Change-Id: Ice363c53b53ed90aec2f61849a4fea0f490b06a7
2021-02-15 14:44:18 +01:00
Pierre-Louis Bonicoli 8be15d9aad
gitlab: handle protected branches
Depends-On: https://review.opendev.org/#/c/756725/
Change-Id: Ib31ee55b28145aa9b22f11e976d76bf3ad3df04a
2021-01-30 13:35:53 +01:00
Tobias Henkel ba6d86ada2
Save superfluous api requests in check run reporting
When reporting the check run result zuul currently makes up to five
github api requests:
1. Get repository
2. Get head commit of pr
3. Get check runs of head commit
4. (optional) Create initial version of check run in case start
   reporting is disabled
5. Update check run that has been created in start reporting

The first three requests are basically unneeded if we craft a direct
PATCH request and remember the check run id that has been created in
the start reporting.

This is needed since those run in the critical path of the event
processing and can cause large event processing delays in a very busy
zuul system.
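
A direct update boils down to a single PATCH against the check runs endpoint
using the remembered id. The sketch below uses plain requests and is
illustrative, not the reporter's actual code.

    import requests

    def update_check_run(repo, check_run_id, token, conclusion, output,
                         api_url="https://api.github.com"):
        # One PATCH with the check run id remembered from start reporting,
        # instead of re-fetching the repository, head commit and check runs.
        resp = requests.patch(
            f"{api_url}/repos/{repo}/check-runs/{check_run_id}",
            headers={"Authorization": f"token {token}",
                     "Accept": "application/vnd.github+json"},
            json={"conclusion": conclusion, "output": output},
            timeout=30)
        resp.raise_for_status()
        return resp.json()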

Change-Id: I55df1cc28279bb6923e51686dde8809421486c6a
2020-11-04 08:52:28 +01:00
Zuul 846b6d0f0d Merge "Restore correct reason reporting of merge failures" 2020-11-02 19:35:50 +00:00
Jan Kubovy 8e333e65a9
Separate connection registries in tests
Each scheduler in tests needs its own connection registries.

This change only affects tests.

Change-Id: I2ad188cf5a72c46f7486c23ab653ff574123308b
Story: 2007192
2020-10-13 07:00:09 +02:00
Tobias Henkel 42f58abf02
Restore correct reason reporting of merge failures
The optimization of merges in GitHub [1] has the side effect that the
merge failure reason is always 'Method not allowed'. This went
unnoticed because the test framework did not distinguish between the
status message and the response body. GitHub reports such failures with
405 Method Not Allowed and a detailed reason in the response body.

[1] I48dbb5fc1b7e7dec4cb09c779b9661d82935d9df

Change-Id: Ic2e47ee14c7f97defba0905db68ef8706108c081
2020-10-05 14:46:52 +02:00
Tobias Henkel b2f6d48cc5
Handle review requirements in canMerge
Since GitHub 2.21 we can now query the reviewDecision flag on a PR,
which tells us if a review is required and if it's approved or not
[1]. This can be leveraged in the canMerge check, so we can now finally
check accurately whether a change is allowed to enter the gate.

[1] https://developer.github.com/enterprise/2.21/v4/object/pullrequest/
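
A minimal sketch of querying that flag via the GraphQL API (illustrative only;
the driver uses its own client):

    import requests

    QUERY = """
    query($owner: String!, $name: String!, $number: Int!) {
      repository(owner: $owner, name: $name) {
        pullRequest(number: $number) { reviewDecision }
      }
    }
    """

    def review_decision(token, owner, name, number,
                        url="https://api.github.com/graphql"):
        # Returns APPROVED, CHANGES_REQUESTED, REVIEW_REQUIRED, or None when
        # the repository does not require reviews.
        resp = requests.post(
            url,
            headers={"Authorization": f"bearer {token}"},
            json={"query": QUERY,
                  "variables": {"owner": owner, "name": name, "number": number}},
            timeout=30)
        resp.raise_for_status()
        return resp.json()["data"]["repository"]["pullRequest"]["reviewDecision"]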

Change-Id: I792a28b9f3c7d40ac21e22438bd7c09d3174cbb2
2020-09-03 19:08:20 +02:00
Tobias Henkel 688ecf6233
Move reports from FakeGithubConnection to github data
This is needed to handle comment pull as a direct post request in the
followup. Further, the reports are better located in the github data
since this needs to be global data.

Further this will be needed as preparation for the HA scheduler as
well since there will be more than one connection and the test data
needs to be centralized then.

Change-Id: I3fa96d43f0ffe62d50630856b23972221bf2c69a
2020-08-31 12:12:24 +02:00
Benjamin Schanzel 1ed3992809 GitHub Reporter: Fix User Email in Merge Commit Message
Re-introduce change https://review.opendev.org/#/c/738590/

There was a bug in fetching a GitHub user while loading pull requests.
`getUser` was given a project object instead of a project name string,
so no installation id was found for GitHub app authentication. That lead
to unauthenticated requests to the user API which resulted in HTTP 401
errors.

This change fixes this by passing the correct `project.name` to
`getUser` and also refactors the parameter names of the involved
methods from `project` to `project_name` to be more precise there.

This reverts commit 56f355ab72.

Change-Id: Ib25f0ef5d27115626792890d0ec282ed0d562485
2020-08-26 13:57:51 +02:00
Zuul 0e5e797869 Merge "github: use the same status url for commit status as checks" 2020-08-13 14:26:19 +00:00