
42 Commits

Author SHA1 Message Date
Simon Westphahl 464af2ad24 Fix bug with cached merge modes in TPC
The fix in I473ba605decb136cd527308a63f16a5e548697fb did not fully solve
the problem with the new GitHub default merge modes.

In case the branch cache already contains the new merge modes, but the
tenant project config (TPC) still only supports the old subset of modes,
dynamic layout creation will fail saying that the new default merge mode
is not supported.

To fix this we will supply a new parameter with valid merge modes from
the TPC when getting the project default branch instead of getting the
project merge modes directly from the branch cache. Based on the list of
valid modes the driver can then select the best default merge mode.

This change also updates the model API upgrade test v17 -> v18 to cover
this case.

Change-Id: Ibc5645d4725f0ec31cb7ab18d4500452d866166a
2023-11-16 09:37:08 +01:00
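A minimal sketch of the selection step, assuming the driver can list the modes the remote allows in order of preference (for example as read from the branch cache). `select_default_merge_mode()`, its parameters and the `"merge"` fallback are hypothetical, not Zuul's actual interface; the point is that a mode the TPC does not know about is skipped instead of making dynamic layout creation fail.

```python
from typing import Optional, Sequence


def select_default_merge_mode(remote_modes: Sequence[str],
                              valid_modes: Optional[Sequence[str]] = None,
                              fallback: str = "merge") -> str:
    """Return the first remote-preferred mode that the TPC accepts.

    remote_modes: modes the code review system allows, most preferred
        first (hypothetically read from the branch cache).
    valid_modes: modes the tenant project config (TPC) supports; None
        means "no restriction".
    """
    for mode in remote_modes:
        if valid_modes is None or mode in valid_modes:
            return mode
    # No mode in common: fall back instead of failing layout creation.
    return fallback
```

With an older TPC that only accepts `merge` and `merge-resolve`, a remote whose preferred default is `squash-merge` yields `merge` rather than a "not supported" error.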
James E. Blair d4fac1a0e8 Register RE2 syntax errors as warnings
This adds a configuration warning (viewable in the web UI) for any
regular expressions found in in-repo configuration that cannot
be compiled and executed with RE2.

Change-Id: I092b47e9b43e9548cafdcb65d5d21712fc6cc3af
2023-08-28 15:04:49 -07:00
James E. Blair 57a9c13197 Use the GitHub default branch as the default branch
This supplies a per-project default value for Zuul's default-branch
based on what the default branch is set to in GitHub.  This means
that if users omit the default-branch setting on a Zuul project
stanza, Zuul will automatically use the correct value.

If the value in GitHub is changed, an event is emitted which allows
us to automatically reconfigure the tenant.

This could be expanded to other drivers that support an indication
of which branch is default.

Change-Id: I660376ecb3f382785d3bf96459384cfafef200c9
2023-08-23 11:07:08 -07:00
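A sketch of the lookup order this implies, under stated assumptions: an explicit `default-branch` on the project stanza wins, otherwise the value reported by GitHub is used, otherwise a fallback (assumed here to be `master`). `DefaultBranchTracker` and `resolve_default_branch()` are hypothetical names; `update()` returning `True` stands in for the event that triggers a tenant reconfiguration.

```python
from typing import Dict, Optional


class DefaultBranchTracker:
    """Hypothetical per-connection record of each project's remote default."""

    def __init__(self) -> None:
        self._defaults: Dict[str, str] = {}

    def update(self, project: str, branch: str) -> bool:
        """Store the remote default; True means it changed (reconfigure)."""
        changed = self._defaults.get(project) != branch
        self._defaults[project] = branch
        return changed

    def get(self, project: str) -> Optional[str]:
        return self._defaults.get(project)


def resolve_default_branch(explicit: Optional[str],
                           tracker: DefaultBranchTracker,
                           project: str,
                           fallback: str = "master") -> str:
    # 1. Setting on the project stanza, 2. remote default, 3. fallback.
    if explicit is not None:
        return explicit
    return tracker.get(project) or fallback
```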
Simon Westphahl 8d443f1ada Fix exception on retry in source base class
ERROR zuul.Scheduler: Exception processing pipeline check in tenant foobar
Traceback (most recent call last):
  File "/opt/zuul/lib/python3.10/site-packages/zuul/scheduler.py", line 2149, in process_pipelines
    refreshed = self._process_pipeline(
  File "/opt/zuul/lib/python3.10/site-packages/zuul/scheduler.py", line 2241, in _process_pipeline
    self.process_pipeline_trigger_queue(tenant, pipeline)
  File "/opt/zuul/lib/python3.10/site-packages/zuul/scheduler.py", line 2447, in process_pipeline_trigger_queue
    self._process_trigger_event(tenant, pipeline, event)
  File "/opt/zuul/lib/python3.10/site-packages/zuul/scheduler.py", line 2480, in _process_trigger_event
    pipeline.manager.addChange(change, event)
  File "/opt/zuul/lib/python3.10/site-packages/zuul/manager/__init__.py", line 534, in addChange
    self.updateCommitDependencies(change, None, event)
  File "/opt/zuul/lib/python3.10/site-packages/zuul/manager/__init__.py", line 864, in updateCommitDependencies
    dep = source.getChangeByURLWithRetry(match, event)
  File "/opt/zuul/lib/python3.10/site-packages/zuul/source/__init__.py", line 112, in getChangeByURLWithRetry
    return dep
UnboundLocalError: local variable 'dep' referenced before assignment

Change-Id: I1c706e5e5d2d337ec84b8fc1ad5e900191f2362c
2023-02-14 15:08:48 +01:00
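The traceback shows the retry loop reaching `return dep` without `dep` ever having been assigned. A minimal sketch of a retry helper that cannot take that path, assuming a generic zero-argument callable rather than the real `getChangeByURLWithRetry()` body: it returns on the first success, so a successful query is not repeated, and re-raises once the final attempt fails. `call_with_retry()` and its defaults are hypothetical.

```python
import logging
import time
from typing import Callable, TypeVar

T = TypeVar("T")
log = logging.getLogger("sketch.source")


def call_with_retry(fn: Callable[[], T], attempts: int = 3,
                    delay: float = 1.0) -> T:
    """Call fn until it succeeds, at most `attempts` times.

    Returns on the first success and re-raises the last exception once
    every attempt has failed, so no caller ever sees an unassigned result.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == attempts:
                raise  # out of retries: surface the real error
            log.warning("Attempt %d/%d failed; retrying", attempt, attempts,
                        exc_info=True)
            time.sleep(delay)
    # Unreachable: the loop body either returns or raises.
    raise RuntimeError("retry loop exited unexpectedly")
```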
James E. Blair f82ef0882c Further avoid unnecessary change dependency updates
When adding a unit test for change I4fd6c0d4cf2839010ddf7105a7db12da06ef1074
I noticed that we were still querying the dependent change 4 times instead of
the expected 2.  This was due to an indentation error which caused all 3
query retry attempts to execute.

This change corrects that and adds a unit test that covers this as well as
the previous optimization.

Change-Id: I798d8d713b8303abcebc32d5f9ccad84bd4a28b0
2023-01-04 15:33:49 -08:00
James E. Blair 640059a67a Report a config error for unsupported merge mode
This updates the branch cache (and associated connection mixin)
to include information about supported project merge modes.  With
this, if a project on GitHub has the "squash" merge mode disabled
and a Zuul user attempts to configure Zuul to use the "squash"
mode, then Zuul will report a configuration syntax error.

This change adds implementation support only to the GitHub driver.
Other drivers may add support in the future.

For all other drivers, the branch cache mixin simply returns a value
indicating that all merge modes are supported, so there will be no
behavior change.

This is also the upgrade strategy: the branch cache uses a
defaultdict that reports all merge modes supported for any project
when it first loads the cache from ZK after an upgrade.

Change-Id: I3ed9a98dfc1ed63ac11025eb792c61c9a6414384
2022-11-11 09:53:28 -08:00
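A sketch of the upgrade strategy described above, assuming a simplified in-memory cache instead of the ZooKeeper-backed one: a `defaultdict` makes every project without a stored entry report all merge modes as supported, so only projects whose driver recorded a narrower set can produce the configuration error. The class and function names are hypothetical, and the mode names are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set

# Illustrative set of merge modes a project could request.
ALL_MERGE_MODES = frozenset(
    {"merge", "merge-resolve", "cherry-pick", "squash-merge", "rebase"})


class ConfigError(Exception):
    """Raised when a project asks for a mode its remote has disabled."""


class BranchCacheSketch:
    def __init__(self,
                 stored: Optional[Dict[str, Iterable[str]]] = None) -> None:
        # Projects with no stored entry (e.g. right after an upgrade, or
        # for drivers that never record modes) report every mode.
        self._merge_modes: Dict[str, Set[str]] = defaultdict(
            lambda: set(ALL_MERGE_MODES))
        for project, modes in (stored or {}).items():
            self._merge_modes[project] = set(modes)

    def get_project_merge_modes(self, project: str) -> Set[str]:
        return self._merge_modes[project]


def check_merge_mode(cache: BranchCacheSketch, project: str,
                     mode: str) -> None:
    if mode not in cache.get_project_merge_modes(project):
        raise ConfigError(
            f"Merge mode {mode!r} is not supported by project {project!r}")
```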
James E. Blair e2a472bc97 Change merge mode default based on driver
The default merge mode is 'merge-resolve' because it has been observed
that it more closely matches the behavior of jgit in Gerrit (or, at
least it did the last time we looked into this).  The other drivers
are unlikely to use jgit and more likely to use the default git
merge strategy.

This change allows the default to differ based on the driver, and
changes the default for all non-gerrit drivers to 'merge'.

The implementation anticipates that we may want to add more granularity
in the future, so the API accepts a project as an argument, and in
the future, drivers could provide a per-project default (which they
may obtain from the remote code review system).  That is not implemented
yet.

This adds some extra data to the /projects endpoint in the REST API.
It is currently not easy (and perhaps not possible) to determine what a
project's merge mode is through the API.  This change adds a metadata
field to the output which will show the resulting value computed from
all of the project stanzas.  The project stanzas themselves may have
null values for the merge modes now, so the web app now protects against
that.

Change-Id: I9ddb79988ca08aba4662cd82124bd91e49fd053c
2022-10-13 10:31:19 -07:00
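A sketch of the per-driver default, assuming hypothetical source classes: the base class answers `merge` and a Gerrit subclass overrides it with `merge-resolve`. The project argument is accepted but unused, mirroring the note about future per-project defaults. `effective_merge_mode()` assumes that later project stanzas override earlier ones and that null values are skipped; that precedence rule is an assumption of this sketch.

```python
from typing import Optional, Sequence


class SourceSketch:
    """Hypothetical stand-in for a driver's source class."""

    def get_project_default_merge_mode(self, project: str) -> str:
        # Non-Gerrit systems are unlikely to use jgit, so use git's own
        # default strategy. `project` is unused for now.
        return "merge"


class GerritSourceSketch(SourceSketch):
    def get_project_default_merge_mode(self, project: str) -> str:
        # 'merge-resolve' was observed to track jgit's behavior more closely.
        return "merge-resolve"


def effective_merge_mode(source: SourceSketch, project: str,
                         stanza_values: Sequence[Optional[str]]) -> str:
    """Resolve a project's merge mode from its stanzas, given in order.

    Assumes later stanzas override earlier ones; stanzas that leave the
    mode unset (None) are skipped, and the driver default applies last.
    """
    for value in reversed(stanza_values):
        if value is not None:
            return value
    return source.get_project_default_merge_mode(project)
```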
James E. Blair c41fcbe483 Add support for GHE repository cache
Change-Id: Iec87857aa58f71875d780da3698047dae01120d7
2022-05-05 13:39:41 -07:00
James E. Blair e16fcc80f8 Add queue.dependencies-by-topic
This adds a pipeline queue setting to emulate the Gerrit behavior
of submitWholeTopic without needing to enable it site-wide in Gerrit.

Change-Id: Icb33a1e87d15229e6fb3aa1e4b1ad14a60623a29
2022-03-25 15:25:52 -07:00
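Gerrit's submitWholeTopic submits every change sharing a topic together. A minimal sketch of what emulating that in a queue could mean, assuming changes are given as hypothetical `(change_id, topic)` pairs: each change gains the other changes with the same topic as dependencies, and changes without a topic gain none. `topic_dependencies()` is an illustrative helper, not Zuul's implementation.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple


def topic_dependencies(
        changes: Sequence[Tuple[str, Optional[str]]]) -> Dict[str, List[str]]:
    """Map each change id to the other changes that share its topic."""
    by_topic: Dict[str, List[str]] = defaultdict(list)
    for change_id, topic in changes:
        if topic:
            by_topic[topic].append(change_id)
    deps: Dict[str, List[str]] = {}
    for change_id, topic in changes:
        # Changes without a topic get no topic-based dependencies.
        peers = by_topic[topic] if topic else []
        deps[change_id] = [c for c in peers if c != change_id]
    return deps
```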
James E. Blair df220cd4d6 Populate missing change cache entries
The drivers are expected to populate the change cache before
passing trigger events to the scheduler so that all the difficult
work is done outside the main loop.  Further, the cache cleanup
is designed to accommodate this so that events in-flight don't have
their change cache entries removed early.

However, at several points since moving the change cache into ZK,
programming errors have caused us to encounter enqueued changes
without entries in the cache.  This usually causes Zuul to abort
pipeline processing and is unrecoverable.

We should continue to address all instances of these since they
represent Zuul not working as designed.  However, it would be nice
if Zuul were able to recover from this.

To that end, this change allows missing changes to be added to the
change cache.

That is primarily accomplished by adjusting the Source.getChange
method to accept a ChangeKey instead of an Event.  Events are only
available when the triggering event happens, whereas a ChangeKey
is available when loading the pipeline state.

A ChangeKey represents the minimal distinguishing characteristics
of a change, and so can be used in all cases.  Some drivers obtain
extra information from events, so we still pass it into the getChange
method if available, but it's entirely optional -- we should still
get a workable Change object whether or not it's supplied.

Ref (and derived: Branch, Tag) objects currently only store their
newrev attribute in the ChangeKey, however we need to be able to
create Ref objects with an oldrev as well.  Since the old and new
revs of a Ref are not inherent to the ref but rather the generating
event, we can't get that from the source system.  So we need to
extend the ChangeKey object to include that.  Adding an extra
attribute is troublesome since the ChangeKey is not a ZKObject and
therefore doesn't have access to the model api version.  However,
it's not too much of a stretch to say that the "revision" field
(which like all ChangeKey fields is driver-dependent) should include
the old and new revs.  Therefore, in these cases the field is
upgraded in a backwards compatible way to include old and newrev
in the standard "old..new" git encoding format.  We also need to
support "None" since that is a valid value in Zuul.

So that we can continue to identify cache errors, any time we encounter
a change key that is not in the cache and we also don't have an
event object, we log an error.

Almost all of this commit is the refactor to accept change keys
instead of events in getChange.  The functional change to populate
the cache if it's missing basically consists of just removing
getChangeByKey and replacing it with getChange.  A test which deletes
the cache midway through is added.

Change-Id: I4252bea6430cd434dbfaacd583db584cc796dfaa
2022-02-17 13:14:23 -08:00
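A sketch of the backwards-compatible `revision` encoding described above, assuming the literal string `None` stands for a missing rev. `encode_revision()` and `decode_revision()` are hypothetical helpers; a value without `..` is treated as an older key that only carried the newrev.

```python
from typing import Optional, Tuple


def encode_revision(oldrev: Optional[str], newrev: Optional[str]) -> str:
    """Pack old and new revs into one string using git's 'old..new' form.

    A missing rev is written as the literal 'None' so it survives a round
    trip through the single ChangeKey field.
    """
    return f"{oldrev}..{newrev}"


def _parse(rev: str) -> Optional[str]:
    return None if rev == "None" else rev


def decode_revision(
        revision: Optional[str]) -> Tuple[Optional[str], Optional[str]]:
    """Return (oldrev, newrev) for a stored revision field."""
    if revision is None:
        return None, None
    if ".." not in revision:
        # Older keys stored only the newrev; keep accepting them.
        return None, _parse(revision)
    old, new = revision.split("..", 1)
    return _parse(old), _parse(new)
```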
Clark Boylan ef9d0a6f31 Retry dependency update requests
It is possible for dependency update requests to fail due to errors
with the source. Previously when this happened the change was ignored by
the pipeline and the user got no feedback. Chances are high that
reporting back to the source will also fail, so we can't really report
this error.

Instead we retry the requests in the hope that the error is a one-off
and we can proceed with the originally requested job work.

Story: 2009687
Change-Id: Id010d8c6809b9f9c012b81992590e54bf5e7e1d8
2021-12-01 13:43:40 -08:00
Simon Westphahl 0b048295e4 Add source interface for getting the cache ltime
In order to save a list of ltimes for each connection we need a source
interface to get the current ltime of a project branch cache.

Change-Id: If01db0698024beeed813d2c9910651c757377865
2021-11-04 15:15:15 +01:00
Simon Westphahl 0e9cb51426 Refresh branch cache depending on min. ltime
Change-Id: I373296d2f3b3a4392c98e1226a5e150c48daa2e0
2021-11-04 15:15:15 +01:00
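A sketch of how a per-connection cache ltime can drive refreshes, assuming ltimes are plain increasing integers and that a stale cache is simply dropped and refilled lazily. `LtimeBranchCache`, `get_branches()` and `layout_min_ltimes()` are hypothetical; the idea is that a layout records each connection's cache ltime, and a later consumer whose cache is older than that minimum refreshes before trusting it.

```python
from typing import Callable, Dict, List


class LtimeBranchCache:
    """Hypothetical branch cache stamped with the ltime of its last reset."""

    def __init__(self, fetch: Callable[[str], List[str]]) -> None:
        self._fetch = fetch
        self._branches: Dict[str, List[str]] = {}
        self.ltime = -1  # older than any real ltime

    def get_branches(self, project: str, min_ltime: int,
                     current_ltime: int) -> List[str]:
        if self.ltime < min_ltime:
            # Cache predates what the caller requires: drop everything;
            # anything fetched from now on is at least current_ltime fresh.
            self._branches.clear()
            self.ltime = current_ltime
        if project not in self._branches:
            self._branches[project] = list(self._fetch(project))
        return list(self._branches[project])


def layout_min_ltimes(caches: Dict[str, LtimeBranchCache]) -> Dict[str, int]:
    """Snapshot of each connection's cache ltime, e.g. stored with a layout."""
    return {name: cache.ltime for name, cache in caches.items()}
```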
Simon Westphahl 88f84bc5d5 Reference change dependencies by key
In order to cache changes in Zookeeper we need to make change objects
JSON serializable. This means that we can no longer reference other
change objects directly. Instead we will use a cache key consisting of
the connection name and a connection specific cache key.

Those cache keys can be resolved by getting the source instance using
the connection name and then retrieving the change instance via the new
`getChangeByKey()` interface.

The pipeline manager provides a helper method for resolving a list of
cache keys. Cache keys that were resolved once are also cached by the
manager as long as the reference is needed by any change in the
pipeline. The cache will be cleaned up at the end of a run of the queue
processor.

Until we can run multiple schedulers the change cache in the pipeline
manager will avoid hitting Zookeeper every time we resolve a cache key.

Later on when we have the pipeline state in Zookeeper we probably want
to clear the change cache in the pipeline manager at the end of the
queue processor. This way we make sure the change is recent enough when
we start processing a pipeline.

Change-Id: I09845d65864edc0e54af5f24d3c7be8fe2f7a919
2021-09-08 17:01:21 +02:00
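A sketch of resolving `(connection name, connection-specific key)` pairs through per-connection sources, with the manager-side cache described above. `KeyedSource`, `get_change_by_key()` (standing in for `getChangeByKey()`), `ManagerSketch` and the `"1,1"` key format are hypothetical; `cleanup()` keeps only keys still referenced by changes in the pipeline.

```python
from typing import Dict, Iterable, List, Tuple

# (connection name, connection-specific key): plain strings, so a reference
# to another change stays JSON serializable.
CacheKey = Tuple[str, str]


class KeyedSource:
    """Hypothetical source that looks changes up by connection-specific key."""

    def __init__(self, changes: Dict[str, object]) -> None:
        self._changes = changes
        self.lookups = 0  # counts trips to the (simulated) backing store

    def get_change_by_key(self, key: str) -> object:
        self.lookups += 1
        return self._changes[key]


class ManagerSketch:
    def __init__(self, sources: Dict[str, KeyedSource]) -> None:
        self.sources = sources
        self._resolved: Dict[CacheKey, object] = {}

    def resolve_change_keys(self, keys: Iterable[CacheKey]) -> List[object]:
        result = []
        for key in keys:
            if key not in self._resolved:
                connection_name, change_key = key
                source = self.sources[connection_name]
                self._resolved[key] = source.get_change_by_key(change_key)
            result.append(self._resolved[key])
        return result

    def cleanup(self, keys_in_use: Iterable[CacheKey]) -> None:
        # End of a queue-processor run: forget keys no change references.
        in_use = set(keys_in_use)
        for key in list(self._resolved):
            if key not in in_use:
                del self._resolved[key]
```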
Simon Westphahl 22c379bf80 Add source interface for setting change attributes
Changes are mostly created and updated by the drivers. However, since
there are some change attributes that are also modified from other parts
of the code, we need to make sure to update the cache in Zookeeper in
those cases. For this we introduce `setChangeAttributes()` as an
additional `Source` interface.

Change-Id: Iab9bc4a6e40f254c1cbc4405e90cb5f03e3ecd56
2021-09-08 17:00:53 +02:00
Simon Westphahl 29592c9531 Allow refreshing volatile data in canMerge check
On GitHub we cannot reliably update, via events, all of the information
needed for a canMerge check. Branch protection changes produce no events
at all, and status events are ambiguous: GitHub attaches statuses to the
commit rather than to the PR, so one status event may match several
changes. This was no problem in the past since this information was only
used during the enqueue phase, which directly follows the event
preprocessing phase.

However, with circular dependencies we redo the canMerge check just
before merging and need to act on recent data. Therefore add an
allow_refresh flag that makes it possible to refresh the volatile
parts of the data we don't get events for. This is only used on GitHub
for now, as the other drivers either update their state correctly
using events or have not yet been optimized to avoid API calls within
the main loop (Pagure).

Change-Id: I89ff158642fe32c5004ef62c2e25399110564252
2021-03-01 18:45:02 +00:00
Tobias Henkel e4a207e5c9 Annotate getChangeByUrl logs with event id
This makes requests such as web requests traceable via the event
id in the logs.

Change-Id: Iade2558f2312aedca7480b4ea1d3df60735cfc90
2020-07-30 15:28:30 +02:00
Tobias Henkel 2e8f2b61ab Annotate canMerge check with event id
This helps with debugging from logs in case something doesn't enter a
gate as expected.

Change-Id: Ia0c7e84812d479c455d72f8e4c367975ea0bd709
2019-07-12 12:34:57 +02:00
James E. Blair a48c9101c6 Cache branches in connections/sources
The current attempt to cache branches is ineffective -- we
query the list of branches during every tenant reconfiguration.

The list of branches for a project is really global information;
we might cache it on the Abide, however, drivers may need to filter
that list based on tenant configuration (eg, github protected
branches).  To accommodate that, just allow/expect the drivers to
perform their own caching of branches, and to generally keep
the list up to date (or at least invalidate their caches) by
observing branch create/delete events.

A full reconfiguration instructs the connections to clear their
caches so that we perform a full query.  That way, an operator
can recover from a situation where the cache is invalid.

Change-Id: I3bd0cda5875dd21368e384e3704a61ebb5dcedfa
2018-08-09 16:02:02 -07:00
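A sketch of a connection-side branch cache along the lines described above, assuming a hypothetical `fetch_branches` callable that queries the remote: branches are fetched once per project, kept current from branch create/delete events, and dropped entirely on a full reconfiguration. All names here are illustrative.

```python
from typing import Callable, Dict, List


class BranchCachingConnection:
    def __init__(self, fetch_branches: Callable[[str], List[str]]) -> None:
        self._fetch = fetch_branches
        self._branches: Dict[str, List[str]] = {}

    def get_project_branches(self, project: str) -> List[str]:
        # Query the remote only on a cache miss.
        if project not in self._branches:
            self._branches[project] = list(self._fetch(project))
        return list(self._branches[project])

    def on_branch_created(self, project: str, branch: str) -> None:
        branches = self._branches.get(project)
        if branches is not None and branch not in branches:
            branches.append(branch)

    def on_branch_deleted(self, project: str, branch: str) -> None:
        branches = self._branches.get(project)
        if branches is not None and branch in branches:
            branches.remove(branch)

    def clear_branch_cache(self) -> None:
        # Full reconfiguration: forget everything so the next lookup
        # re-queries the remote and an invalid cache gets corrected.
        self._branches.clear()
```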
Tobias Henkel 619e2fc904 Limit search scope of getChangesDependingOn to tenant
In GitHub with many app installations the getChangesDependingOn
currently iterates over all installations within the system and fires
up a search query. In larger deployments this can add up to hundreds
of queries for a single parent-change-enqueued event. At least for
multi-tenant deployments this can be greatly improved by limiting
the scope to the installations related to the tenant. With this
improvement, most tenants need only a handful of requests.

Change-Id: Ibfad750a685d2ec58f3e452bfe2098bbdc294e37
2018-05-24 00:15:51 +00:00
James E. Blair 0e4c791c7b Support cross-source dependencies
Additional tests and docs in later patches.

Change-Id: I3b86a1e3dd507fa5e584680fb6c86d35f9ff3e23
Story: 2001334
Task: 5885
2018-01-16 09:37:40 -08:00
Tobias Henkel eca4620efa Optionally limit github to protected branches
When using a branch and pull model on a shared repository there are
usually one or more protected branches which are gated and a dynamic
number of temporary personal/feature branches which are the source for
the pull requests. These temporary branches, while ungated, can
contain broken Zuul config and therefore break the tenant-wide
configuration.

To accommodate this model, add support for excluding unprotected
branches. This can be configured at the tenant level and overridden per
project.

Change-Id: I8a45fd41539a3c964a84142f04c1644585c0fdcf
2017-08-03 11:50:26 +02:00
Monty Taylor b934c1a052 Remove use of six library
It exists only for py2/py3 compat. We do not need it anymore.

This will explicitly break Zuul v3 for python2, which is different from
simply ceasing to test it and no longer declaring we support it. Since
we're not testing it any longer, it's bound to degrade over time without
us noticing, so hopefully a clean and explicit break will prevent people
from running under python2 and it working for a minute, then breaking
later.

Change-Id: Ia16bb399a2869ab37a183f3f2197275bb3acafee
2017-06-19 10:34:57 -05:00
James E. Blair aad3ae2fe1 Add driver-specific pipeline requirements
As we expand the Github driver, we're seeing a need to specify driver-specific
pipeline requirements.  To accomplish this, bump the require/reject pipeline
keywords down a level underneath connection names.  This lets users specify
per-source pipeline requirements.

This adds new API methods for sources to create the new pipeline filters
(by returning instances or subclasses of RefFilter, which used to be called
ChangeishFilter).

This change also creates and/or moves driver-specific subclasses of EventFilter
and TriggerEvent in(to) their respective drivers.

Change-Id: Ia56c254e3aa591a688103db5b04b3dddae7b2da4
2017-05-19 13:24:00 -07:00
James E. Blair 1c7744207c Add canonical hostname to source object
This is the start of the implementation of:
http://lists.openstack.org/pipermail/openstack-infra/2017-March/005208.html

It lets us associate a canonical hostname with every connection
that we will later use to uniquely identify source code repos.

Story: 2000953
Change-Id: I7f2e64944d46f304e63a54078e682fd5e1682f27
2017-04-06 13:45:17 -07:00
James E. Blair f43b53a67f Fix constructor arguments to source
Instantiations of sources all passed (driver, connection) to the
constructor, which was expecting (config, connection).  The
config option is currently ignored (sources currently have no
additional configuration), which is why we didn't notice the
discrepancy.

Update the constructor to (driver, connection, optional config)
to match the rest of the driver-related classes.

Change-Id: Ibc878b51b81950559d39b00b1591864c7661fe7c
2017-04-06 13:45:05 -07:00
James E. Blair e511d2f6c4 Reorganize connections into drivers
This change, while substantial, is mostly organizational.
Currently, connections, sources, triggers, and reporters are
discrete concepts, and yet are related by virtue of the fact that
the ConnectionRegistry is used to instantiate each of them.  The
method used to instantiate them is called "_getDriver", in
recognition that behind each "trigger", etc., which appears in
the config file, there is a class in the zuul.trigger hierarchy
implementing the driver for that trigger.  Connections also
specify a "driver" in the config file.

In this change, we redefine a "driver" as a single class that
organizes related connections, sources, triggers and reporters.

The connection, source, trigger, and reporter interfaces still
exist.  A driver class is responsible for indicating which of
those interfaces it supports and instantiating them when asked to
do so.

Zuul instantiates a single instance of each driver class it knows
about (currently hardcoded, but in the future, we will be able to
easily ask entrypoints for these).  That instance will be
retained for the life of the Zuul server process.

When Zuul is (re-)configured, it asks the driver instances to
create new connection, source, trigger, reporter instances as
necessary.  For instance, a user may specify a connection that
uses the "gerrit" driver, and the ConnectionRegistry would call
getConnection() on the Gerrit driver instance.

This is done for two reasons: first, it allows us to organize all
of the code related to interfacing with an external system
together.  All of the existing connection, source, trigger, and
reporter classes are moved as follows:

  zuul.connection.FOO -> zuul.driver.FOO.FOOconnection
  zuul.source.FOO -> zuul.driver.FOO.FOOsource
  zuul.trigger.FOO -> zuul.driver.FOO.FOOtrigger
  zuul.reporter.FOO -> zuul.driver.FOO.FOOreporter

For instance, all of the code related to interfacing with Gerrit
is now in zuul.driver.gerrit.

Second, the addition of a single, long-lived object associated
with each of these systems allows us to better support some types
of interfaces.  For instance, the Zuul trigger maintains a list
of events it is required to emit -- this list relates to a tenant
as a whole rather than individual pipelines or triggers.  The
timer trigger maintains a single scheduler instance for all
tenants, but must be able to add or remove cron jobs based on an
individual tenant being reconfigured.  The global driver instance
for each of these can be used to accomplish this.

As a result of using the driver interface to create new
connection, source, trigger and reporter instances, the
connection setup in ConnectionRegistry is much simpler, and can
easily be extended with entrypoints in the future.

The existing tests of connections, sources, triggers, and
reporters which only tested that they could be instantiated and
have names have been removed, as there are functional tests which
cover them.

Change-Id: Ib2f7297d81f7a003de48f799dc1b09e82d4894bc
2017-01-20 05:43:21 -08:00
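A condensed sketch of the driver arrangement described above. `getConnection()` is named in the text; `getSource()`, the `Sketch*` classes and the configuration keys are hypothetical. The source constructor uses the `(driver, connection, optional config)` order that f43b53a67f later settles on.

```python
from typing import Any, Dict, Iterable, Optional


class SketchConnection:
    def __init__(self, driver: Any, name: str, config: Dict[str, Any]) -> None:
        self.driver = driver
        self.name = name
        self.config = config


class SketchSource:
    def __init__(self, driver: Any, connection: SketchConnection,
                 config: Optional[Dict[str, Any]] = None) -> None:
        self.driver = driver
        self.connection = connection
        self.config = config or {}


class SketchGerritDriver:
    """One object grouping the code for talking to one kind of system."""
    name = "gerrit"

    def getConnection(self, name: str,
                      config: Dict[str, Any]) -> SketchConnection:
        return SketchConnection(self, name, config)

    def getSource(self, connection: SketchConnection) -> SketchSource:
        return SketchSource(self, connection)


class SketchConnectionRegistry:
    def __init__(self, drivers: Iterable[Any]) -> None:
        # A single long-lived instance per driver for the process lifetime.
        self.drivers = {driver.name: driver for driver in drivers}
        self.connections: Dict[str, SketchConnection] = {}

    def configure(self, connection_configs: Dict[str, Dict[str, Any]]) -> None:
        # Each connection names its driver; that driver builds the object.
        for name, config in connection_configs.items():
            driver = self.drivers[config["driver"]]
            self.connections[name] = driver.getConnection(name, config)

    def getSource(self, connection_name: str) -> SketchSource:
        connection = self.connections[connection_name]
        return connection.driver.getSource(connection)
```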
Jenkins 86d2f726be Merge "Add getProjectBranches to Source" into feature/zuulv3 2016-11-09 18:59:03 +00:00
Paul Belanger 9bba490381 Re-enable test_can_merge unit test
Expose the ability to refresh a change in zuul/connection/gerrit.py
too, which is needed for our unit testing.

Change-Id: Iefd09d9b8deef563299e0f209d95e25b61aa4c1e
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
2016-11-09 12:20:02 -05:00
James E. Blair 51b7492e95 Add getProjectBranches to Source
This lets us ask a source for all of the branches for a project.
This uses the git protocol for now, but this can get much nicer
in the future if we switch to using Gerrit's REST API.  It should
also be easy to do with github.

The included comment indicates why it's being added -- implementation
to follow in subsequent changes.

Change-Id: I0dfcd61f343a235dcf935aea434b9772d6e746d9
2016-10-04 15:01:31 -07:00
James E. Blair fef7894c1b Remove scheduler parameter from connection registry
The connection registry should not have to know about the scheduler,
rather, the inverse is true.

(NB, connections themselves still know about the scheduler, but
that's okay, that happens after the connection registry is created.)

Drivers should be able to access the global configuration when
being created, so store that when the connection registry configures
itself.

Change-Id: Iea4b8fe3888b5eefd3df9ce385225b885f2caa0b
2016-03-21 19:58:27 -07:00
James E. Blair 765e11b657 Move gerrit logic from source to connection
Sources and connections are very tightly coupled in reality.  Rather
than trying to maintain them as two abstractions, consider the
connection to hold all of the information and logic about the reality
of the external resource it represents.  Make sources mere local
data structures that are used to interface a connection with a pipeline.

As seen in subsequent changes, this will allow us to simplify the
interconnections between objects.

Change-Id: I2dd88e1165267e4f987a205ba55923eaec7ea9ce
2016-03-21 19:58:27 -07:00
Joshua Hesketh dc7820cf88 Merge branch 'master' into feature/zuulv3
Conflicts:
	zuul/model.py
	zuul/scheduler.py

Change-Id: I2973bfae65b3658549dc13aa3ea0efe60669ba8e
2016-03-11 13:24:00 +11:00
Joshua Hesketh 4bd7da32fa Cache is held and managed by connections
Add reconfigure test case. This test currently fails due to a
regression introduced with the connections changes.

Because multiple sources share a connection, a pipeline that does not hold
(and therefore does not require) any changes in the cache may clear a
connection's cache before a pipeline that does need a given change has an
opportunity to add it to the relevant list.

Allow connections to manage their cache directly rather than having each
source do it without knowledge of other pipelines/sources. Collect the
relevant changes from all pipelines and ask any connections holding a
cache for that item to keep it on reconfiguration.

Co-Authored-By: James E. Blair <jeblair@linux.vnet.ibm.com>
Change-Id: I2bf8ba6b9deda58114db9e9b96985a2a0e2a69cb
2016-02-17 22:10:33 +11:00
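A sketch of the fix's core idea, assuming each pipeline can report the set of change keys it still references: the keys from all pipelines are collected first and only then is each connection's cache pruned, so one pipeline can no longer evict a change another pipeline needs. `CachingConnection`, `maintain_cache()` and `maintain_connection_caches()` are hypothetical names.

```python
from typing import Dict, Iterable, Set


class CachingConnection:
    """Hypothetical connection that owns its change cache."""

    def __init__(self) -> None:
        self.change_cache: Dict[str, object] = {}

    def cache_change(self, key: str, change: object) -> None:
        self.change_cache[key] = change

    def maintain_cache(self, relevant: Set[str]) -> None:
        # Keep only changes that at least one pipeline still references.
        for key in list(self.change_cache):
            if key not in relevant:
                del self.change_cache[key]


def maintain_connection_caches(connections: Iterable[CachingConnection],
                               pipelines: Iterable[Set[str]]) -> None:
    # Gather relevant keys from every pipeline before pruning anything.
    relevant: Set[str] = set()
    for pipeline_keys in pipelines:
        relevant.update(pipeline_keys)
    for connection in connections:
        connection.maintain_cache(relevant)
```

Pruning against the first pipeline alone would have evicted `3,1` in the example below, even though the second pipeline still needs it.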
Joshua Hesketh 89b67f617c Merge master into feature/zuulv3
Conflicts:
	zuul/connection/gerrit.py
	zuul/lib/connections.py
	zuul/model.py
	zuul/scheduler.py

Change-Id: If1c8ac3bf26bd8c4496ac130958b966d9937519e
2016-02-12 14:10:03 +11:00
Joshua Hesketh 811e2e9334 Fix regression in change tracking
Make sure we update the referenced change object on a new gerrit
event rather than waiting to remake the queue item.

This was a performance regression in the connection changes.

Change-Id: I2a967f0347352a7674deb550e34fb94d1d903e89
2015-12-21 15:52:48 +11:00
James E. Blair 8300578a2a Add job inheritance and start refactoring
This begins a lot of related changes refactoring config loading,
the data model, etc., which will continue in subsequent changes.

Change-Id: I2ca52a079a837555c1f668e29d5a2fe0a80c1af5
2015-12-15 15:56:45 -08:00
James E. Blair 93bdde8551 Remove stop method from reporters, sources, triggers
It still remains in the drivers that the connections utilize.

Change-Id: Ie6efb57af297fbd546eed3e1104299b2e1a5205e
2015-12-10 11:25:21 -08:00
James E. Blair 59fdbac119 Add tenants
Change-Id: Ia6c21152c00c9380c17c559290ed98ff22cf767b
2015-12-08 16:38:09 -08:00
Joshua Hesketh 352264b3c2 Add support for 'connection' concept
This is a large refactor and as small as I could feasibly make it
while keeping the tests working. I'll do the documentation and
touch ups in the next commit to make digesting easier.

Change-Id: Iac5083996a183d1d8a9b6cb8f70836f7c39ee910
2015-12-06 14:48:32 +11:00
Joshua Hesketh ecdbd80247 Add base class for sources
and test that all sources adhere to the defined contract.

Also standardise the source (triggers to come) class names
to NameSource.

This will make it easier to do more sources in the future and also
add the possibility of loading sources dynamically.

Co-Authored-By: Gregory Haynes <greg@greghaynes.net>

Change-Id: I15b32013904f60873601dd7cc8fce3c158787de4
2015-12-06 14:48:31 +11:00
Joshua Hesketh 850ccb6022 Refactor sources out of triggers
This is to further differentiate between sources and triggers.
Eventually allowing for multiple triggers per pipeline.

Still to come is separating connections from everything.

Change-Id: I1d680dbed5f650165643842af450f16b32ec5ed9
2015-12-06 14:48:31 +11:00