Commit Graph

202 Commits

Author SHA1 Message Date
James E. Blair 239fe205ec Gerrit: skip ref-updated /meta events
Approximately 40% of all Gerrit events that OpenDev processes are
ref-updated events for refs/changes/.../meta refs.  This is likely
due to the increased use of notedb for storing data in Gerrit.  Since
Zuul users are not likely to need to trigger off of ref-updates to
the meta ref, let's avoid enqueing them into Zuul's event queue.
This will reduce ZK traffic.

Change-Id: I724f5b20790d1ad32e72b1ce642355c2257026c1
2024-04-18 15:02:13 -07:00
Joshua Watt fa431cf4f8 Fix missing label evaluation in Gerrit
Fixes the way that missing labels are handled in Gerrit. The intention
is that labels provided by Zuul are removed from the set of missing
labels on the change (and thus ignored). The original code was using the
">" set comparison operator to do this, but this operator is actually
"issuperset()". This means that if there was any disjoint members in the
allow_needs set (that is allow_needs had labels that were not missing),
the comparison would be False, and any actual missing labels would be
ignored.

The fix is to use set difference to calculate the missing labels and
remove the allow_needs set. If any labels are left after this, they are
actually missing and the change cannot be merged

Change-Id: Ibdb5df44e80d75198493f8287443ed19bcf269f1
2024-04-11 11:33:12 -06:00
James E. Blair ced306d5b1 Update gerrit changes more atomically
The following problem was observed:

Change A depends-on change B, which is in turn the tip of a
patch series of several changes.

Drivers warm the change cache on events by querying information
about changes related to those events.  But they don't process
depends-on headers, which means most drivers only warm one change,
and while the gerrit driver will follow other types of dependency
links which are unique to it, it stops at depends-on boundaries.

So in the example above, the only change in the cach which was warm
was change A.

The triggering event was enqueued, forwarded, and processed by
two responding pipelines simultaneously on two executors.

Each of them noticed the depends-on link and started querying gerrit
for change B and its dependencies.  One of the schedulers was about
1 second ahead of the other in this process.

In the gerrit driver, there is a two phase process for updating
changes.  First the change itself is updated in the normal way
common to all drivers, and then gerrit-specific dependency links
are updated.  That means the change is added to the change cache
with no dependencies, then mutated to add dependencies later.

The first scheduler added change B to the cache with no dependencies.
The second scheduler saw the update and refreshed its copy of B.
The second scheduler begin updating B, saw that the ltime of its
copy of B was sufficiently new it didn't need to update the cache
and stopped updating.
The second scheduler enqueued changes A and B, but no others in its
pipeline.
The first scheduler finished querying the stack of changes ending at
B, added them to the change cache, and mutated the entry for B in the
cache.
The first scheduler enqueued A, B, and the rest of the stack in its
pipeline.
The second scheduler updated its copy of B to include the new
dependencies.
The second scheduler ran a pipeline processor, noticed that B lacked
dependencies, and dequeued A and B, and reported an error to the user.

The good news is that Zuul noticed the mistake and dequeued the
changes.

To correct this, we will now collect all of the information about a
change and its gerrit-specific dependencies before writing any of
that information to the change cache.  This means that in our example
above, the second scheduler would not have aborted its queries.
Eventually, both schedulers would end up with the same information
before enqueing anything.

This process is still not quite optimal, in that we will have multiple
schedulers racing to update the same changes at the same time, but
they are designed to deal with collisions like that, so it should
at least be correct.

A future area of work might be to investigate whether we can optimize
this case further.

Change-Id: I647c2b54a55789e521fca71c8c3814907df65da6
2024-02-22 06:37:31 -08:00
James E. Blair 1f026bd49c Finish circular dependency refactor
This change completes the circular dependency refactor.

The principal change is that queue items may now include
more than one change simultaneously in the case of circular
dependencies.

In dependent pipelines, the two-phase reporting process is
simplified because it happens during processing of a single
item.

In independent pipelines, non-live items are still used for
linear depnedencies, but multi-change items are used for
circular dependencies.

Previously changes were enqueued recursively and then
bundles were made out of the resulting items.  Since we now
need to enqueue entire cycles in one queue item, the
dependency graph generation is performed at the start of
enqueing the first change in a cycle.

Some tests exercise situations where Zuul is processing
events for old patchsets of changes.  The new change query
sequence mentioned in the previous paragraph necessitates
more accurate information about out-of-date patchsets than
the previous sequence, therefore the Gerrit driver has been
updated to query and return more data about non-current
patchsets.

This change is not backwards compatible with the existing
ZK schema, and will require Zuul systems delete all pipeline
states during the upgrade.  A later change will implement
a helper command for this.

All backwards compatability handling for the last several
model_api versions which were added to prepare for this
upgrade have been removed.  In general, all model data
structures involving frozen jobs are now indexed by the
frozen job's uuid and no longer include the job name since
a job name no longer uniquely identifies a job in a buildset
(either the uuid or the (job name, change) tuple must be
used to identify it).

Job deduplication is simplified and now only needs to
consider jobs within the same buildset.

The fake github driver had a bug (fakegithub.py line 694) where
it did not correctly increment the check run counter, so our
tests that verified that we closed out obsolete check runs
when re-enqueing were not valid.  This has been corrected, and
in doing so, has necessitated some changes around quiet dequeing
when we re-enqueue a change.

The reporting in several drivers has been updated to support
reporting information about multiple changes in a queue item.

Change-Id: I0b9e4d3f9936b1e66a08142fc36866269dc287f1
Depends-On: https://review.opendev.org/907627
2024-02-09 07:39:40 -08:00
James E. Blair 164b1784c6 Add gerrit hashtags support
This adds support for the hashtags-changed trigger event as well
as using hashtags as pipeline and trigger requirements.

Change-Id: I1f6628d7c227d12355f651c3c822b06e2d5c5562
2023-12-07 07:07:14 -08:00
James E. Blair ef88c15405 Assign Gerrit change events to a patchset
We receive some events from Gerrit (such as the currently-unhandled
hashtags-changed or currently-partially-handled topic-changed) events
without a patchset, only a change number. This makes sense from a
data model perspective, but is not useful in Zuul since we must have
a patchset to actually enqueue something.

Currently we will handle these events by querying for the change and
storing the results in the change cache under the key (change, None).
Note that the change itself will have the current patchset assigned
as a result of the query.  But this "upgrade" of information applies
only to the change itself, not the change key, so this up-to-date
change object will never be used for anything productive in Zuul.

In order to actually trigger off of these events, let's "upgrade" the
change key as well after performing the query.  To do that, we will
use a new change key with patchset information when storing the change
object in the cache if our initial change key lacked patchset info but
the resulting change has it.  But only in the specific case where we
are performing our first query after receiving an event.  We will also
update the event with the same patchset information.  This should
mean that after receiving an event and performing the initial query,
we should be guaranteed to have patchset information about the change
and therefore Zuul should never see a (change, None) tuple for a
change key any more.

* (Change keys have more information than that tuple, but those are
   the relevant parts for this change.)

Change-Id: I6f077376044ffbbd3853e2050c507f449da77962
2023-12-07 07:07:08 -08:00
Zuul bd11c4ff79 Merge "Add gcloud pubsub support to Gerrit driver" 2023-10-04 03:29:42 +00:00
James E. Blair 70c34607f5 Add support for limiting dependency processing
To protect Zuul servers from accidental DoS attacks in case someone,
say, uploads a 1k change tree to gerrit, add an option to limit the
dependency processing in the Gerrit driver and in Zuul itself (since
those are the two places where we recursively process deps).

Change-Id: I568bd80bbc75284a8e63c2e414c5ac940fc1429a
2023-09-07 11:01:29 -07:00
James E. Blair eee6ef3fcf Strip refs/heads from gerrit default branches
The HEAD Gerrit API endpoint returns '/refs/heads/master', not
'master' as the test fixture was constructed with.  Correct this.

Change-Id: I98b0759516bd50c0eddeb9245fc951c58e80ee45
2023-09-06 07:03:27 -07:00
James E. Blair 5c12ea68c6 Add default branch support to the Gerrit driver
This extends the previous change to include project default branch
support for the Gerrit driver as well as GitHub.

Change-Id: I2b1f6feed72277f5e61a2789d8af5276ee4c7b05
2023-08-23 11:07:09 -07:00
James E. Blair 36276806b8 Add gcloud pubsub support to Gerrit driver
This adds support for the Google Cloud Pub/Sub service to the
Gerrit driver.  It is very similar to Kafka.

Change-Id: Ib6e4dc01058b74e1042bfd1deb9fa4f3f43f7a36
2023-08-02 14:50:28 -07:00
James E. Blair e6e978615f Add AWS Kinesis support
Gerrit has an event plugin for AWS Kinesis (which looks sort of
like Kafka, but without server side checkpoints.  or ordering.).

Add support to the Gerrit driver for it for sites which would
rather use that than ssh.

Change-Id: I942845ac16bf220664499f14ff7c4086ff65de2a
2023-07-25 11:04:19 -07:00
James E. Blair 4dc0962a49 Add Kafka support to Gerrit
Gerrit supports a number of pub-sub plugins which can act as
alternatives to stream-events.  These can often be easier for
users to configure than ssh access and have the advantage of
providing queueing and delivery guarantees for messages.

This change not only adds support for Kafka, but is meant as
a template for adding support for other Gerrit pub-sub plugins
as well.

Change-Id: Ib03d8cb9ef709b625d2717a09125930548c86a22
2023-07-15 14:41:23 -07:00
James E. Blair a485ff5e67 Refactor Gerrit driver event sources
Gerrit supports a number of pub-sub plugins which can act as
alternatives to stream-events.  These can often be easier for
users to configure than ssh access and have the advantage of
providing queueing and delivery guarantees for messages.

Subsequent changes will add support for multiple pub-sub event
sources, so to make driver maintenance easier, this change
refactors the gerrit driver into its two current event sources:
SSH stream-events and the checks plugin.

The checks plugin is a bit of a special case in that we always
start the polling method for it, so event source selection isn't
quite as clean as for the other sources.  But it's still useful
to compartmentalize it as much as possible, so it is moved to
its own file and treated as similarly as possible.

The stream events listener behaves much more like the pub-sub
listeners will.

Change-Id: I28d0f1e37b87927c5f2dd5e9bdc162391ad66d07
2023-07-13 14:02:46 -07:00
Joshua Watt c015c1aa84 gerrit: Handle case where commit has no parents
If a commit has no parents (e.g. the first commit in a repository), the
HTTP request to gerrit to the revision files endpoint must not specify
the parent=1 query parameter. Otherwise, gerrit will return an HTTP 400
error with the message 'invalid parent number: 1'.

Explicitly check for this by examining if any parents are present for
the current revision before adding the query parameter

Change-Id: I3f93899dc79ebc716223ecc6f5adecd0f1ce9b3e
2023-04-17 11:40:36 -05:00
James E. Blair 7b08cb15d4 Check Gerrit submit requirements
With newer versions of Gerrit, we are increasingly likely to encounter
systems where the traditional label requirements are minimized in favor
of the new submit requirements rules.  If Gerrit is configured to use
a submit requirement instead of a traditional label blocking rule, that
is typically done by switching the label function to "NoBlock", which,
like the "NoOp" function, will still cause the label to appear in the
"submit_record" field, but with a value of "MAY" instead of "OK", "NEED",
or "REJECT".

Instead, the interesting information will be in the "submit_requirements"
field.  In this field we can see the individual submit requirement rules
and whether they are satisfied or not.

Since submit requirements do not have a 1:1 mapping with labels, determining
whether an "UNSATISFIED" submit requirement should be ignored (because it
pertains to a label that Zuul will alter, like "Verified") is not as
straightforward is it is for submit records.  To be conservative, this
change looks for any of the "allow needs" labels (typically "Verified") in
each unsatisfied submit record and if it finds one, it ignores that record.

With this change in place, we can avoid enqueing changes which we are certain
can not be merged into gate pipelines, and will continue to enqueue changes
about which we are uncertain.

Change-Id: I667181565684d97c1d036e2db6193dc606c76c57
2023-03-28 16:19:50 -07:00
Dong Zhang effee1fdd0 Ignore fetch-ref-replicated gerrit event
This event appears in our new gerrit setup, and I think we should ignore it.

Change-Id: Iad424678be2e2f4fc1809862acceca5b9240a3cf
2023-02-28 12:09:41 +01:00
Alex Hornung ffc03cfcc1 gerrit driver: fix bug around unicode branch names
If a branch name contains unicode characters that are more than 1-byte
wide, the size in bytes of the pack record won't match the size in
characters, and the pack parsing will be incorrect.

Instead, treat everything as an encoded byte string until parsing is
done - and only decode when handling a single, parsed, record.

Change-Id: I7f1a0cc96a36129fbc04c7a8687da3f66c1eef99
2023-01-24 17:08:53 +00:00
Zuul 4ea740f75e Merge "Trace Gerrit connection events" 2022-10-04 03:34:17 +00:00
Simon Westphahl 35b6016df3
Trace Gerrit connection events
Creates the first span when adding a connection event to the queue that
is then linked to the span for the event pre-processing.

The pre-processing span is implicitly propagated via the trigger event.

Change-Id: Id6790695d6c440c5576db75b859a25ef089f1d46
2022-09-30 09:50:37 +02:00
James E. Blair ccb00d6827 Log more info on gerrit 403 errors
If Gerrit returns a 403 on submit, log the text we get in reply to
help diagnose the problem.

Change-Id: I8c9b286bb63ba1703a6a8f3cd6cd9a4b86e62cf2
2022-09-06 15:04:09 -07:00
Zuul 7491e081bd Merge "Reduce redundant Gerrit queries" 2022-08-19 21:33:58 +00:00
James E. Blair e6530d11d0 Reduce redundant Gerrit queries
Sometimes Gerrit events may arrive in batches (for example, an
automated process modifies several related changes nearly
simultaneously). Because of our inbuilt delay (10 seconds by
default), it's possible that in these cases, many or all of the
updates represented by these events will have settled on the Gerrit
server before we even start processing the first event.  In these
cases, we don't need to query the same changes multiple times.

Take for example a stack of 10 changes.  Someone approves all 10
simultaneously.  That would produce (at least) 10 events for Zuul
to process.  Each event would cause Zuul to query all 10 changes in
the series (since they are related).  That's 100 change queries
(and each change query requires 2 or 3 HTTP queries).

But if we know that all the event arrived before our first set of
change queries, we can reduce that set of 100 queries to 10 by
suppressing any queries after the first.

This change generates a logical timestamp (ltime) immediately
before querying Gerrit for a change, and stores that ltime in the
change cache.  Whenever an event arrives for processing with an
ltime later than the query ltime, we assume the change is already
up to date with that event and do not perform another query.

Change-Id: Ib1b9245cc84ab3f5a0624697f4e3fc73bc8e03bd
2022-08-19 10:08:57 -07:00
James E. Blair fbb97bd6b6 Add whitespace around keywords
This fixes pep8 E275 which wants whitespace after assert and del.

Change-Id: I1f8659f462aa91c3fdf8f7eb8b939b67c0ce9f55
2022-08-02 08:02:30 -07:00
Zuul cbe40d994f Merge "Improve gerrit connection event queue logging" 2022-06-30 19:46:46 +00:00
Zuul 9438b47e1e Merge "Fix merging with submitWholeTopic" 2022-06-30 06:45:30 +00:00
James E. Blair 39aded4517 Fix merging with submitWholeTopic
Previously support for Gerrit's submitWholeTopic feature was added
so that when it is enabled, changes that are submitted together are
treated as circular dependencies in Zuul.  However, this feature did
not work in a gating pipeline because when that setting is enabled,
Gerrit requires all changes to be mergable at once so that it can
attempt to atomically merge all of them.  That means that every
change would need a Verified+2 vote before any change can be
submitted.  Zuul leaves the vote and submits each change one at a
time.

(Note, this does not affect the emulated submitWholeTopic feature
in Zuul, since in that case, Gerrit will merge each change alone.)

To correct this, a two-phase option is added to reporters.  In phase1,
reporters will report all information about the change but not submit.
In phase2, they will only submit.  In the cases where we are about
to submit a successful bundle, we enable the two-phase option and
report the entire bundle without submitting first, then proceed to
submit each change in the bundle in sequence as normal.  In Gerrit,
if submitWholeTopic is enabled, this will cause all changes to be
submitted as soon as the first one is, but that's okay because we
handle the case where we try to submit a change and it is already
submitted.

The fake Gerrit used in the Zuul unit tests is updated to match
the real Gerrit in these cases.  If submitWholeTopic is enabled,
it will return a 409 to submit requests if the whole topic is not
able to be submitted.

One unit test of failed bundle reporting is adjusted since we will
now report the buildset result to all changes before potentially
reporting a second time if the bundle later fails to merge.
While this does mean that some changes will have extra report
messages, it actually makes the behavior consistent (before, some
changes would have 2 reports and some would have only 1; now all
changes will have 2 reports: the expected result and then a second
report of the unexpected merge failure).

All reporters are updated to handle the two-phase reporting.  Since
all reporters have different API methods to leave comments and merge
changes, this won't actually cause any extra API calls even for
reporters which don't need two-phase reporting.

Non-merging reporters (MQTT, SMTP, etc) simply ignore phase2.

Change-Id: Ibf377ab5b7141fe60ecfd5cbbb296bb4f9c24265
2022-06-29 15:33:13 -07:00
James E. Blair 366cc2a64a Improve gerrit connection event queue logging
It can be difficult to understand what happens between the time
a Gerrit event is received over SSH and it is added to the trigger
event queue.  To address that:

* Generate the zuul event id earlier so that we can match the event
  received from gerrit to the event pulled off the connection event
  queue to build the trigger event (the uuid is currently generated
  at the time the trigger event is created).
* Log when the event is handled by the event connector (and how long
  ago it was received).
* Use an annotated logger when logging the connection event submission.
* Log the start and stop of the event connector election.
* Indicate how long the connection event queue is when we start
  processing it.

Change-Id: I2394290cd1612c47525c034342b28dae6cdf71cb
2022-06-23 09:56:05 -07:00
James E. Blair 595fb3e9ef Add ssh_server option to Gerrit driver
In a multi-host Gerrit environment (HA or failover) it's plausible
that admins may use one mechanism for managing ingress for HTTP
requests and a different for SSH requests.  Or admins may have
different firewall rules for each. To accomodate these situations,
Add an "ssh_server" configuration item for Gerrit.  This makes the
set of hostname-like items the following:

* server: the HTTP hostname and default for all others
* canonical_hostname: what to use for golang-style git paths
* ssh_server: the hostname to use for SSH connections
* baseurl: the base URL for HTTP connections

The following are equivalent:
  server=review.example.com
  ssh_server=ssh-review.example.com
and:
  server=ssh-review.example.com
  baseurl=https://review.example.com

Change-Id: I6e9cd9f48c1a78d8d24bfe176efbb932a18ec83c
2022-06-22 14:10:41 +00:00
James E. Blair 38770a7867 Fix recursive Gerrit change query
We have a history memo list which we use to avoid infinite recursion
in the Gerrit change query, however instead of mutating it, we copy
it.  This avoids infinite recursion in some circumstances, but at the
cost of exponential repitition.  It may also allow infinite recursion
depending on the order of results in Gerrit query responses.

To correct this, mutate the memo list instead of duplicating it.

Change-Id: Ic892cf5aa2db2f5e11f8562f857d9ec0a6b60c0a
2022-03-28 15:08:45 -07:00
James E. Blair 81d84e7c15 Support submitWholeTopic in Gerrit
This adds a query for changes which are set (by Gerrit) to be submitted
together due to submitWholeTopic being enabled.  In this case, changes
which are of the same topic will be merged simultaneously by Gerrit,
therefore Zuul should treat them as a set of circular dependencies.

The behavior is automatic, since Gerrit will return a set of changes
if the setting is enabled, or the empty list if it is not.  Zuul still
requires that circular dependencies be enabled in any queues where they
appear.

If users have submitWholeTopic enabled in Gerrit but do not have
allow-circular-dependencies enabled, they may begin to see errors.
However, the errors are self-explanatory, and installations such as
these are already not testing what Gerrit will merge, so reporting
the discrepancy is an improvement.

Change-Id: Icf65913a049a9b9ddbedd20cc73bf44ffcc831b8
2022-03-21 12:53:55 -07:00
Zuul 3ddb9713a4 Merge "Clone from /a/ with authenticated Gerrit HTTP" 2022-02-24 21:10:31 +00:00
Zuul 77d9a0ef8d Merge "Add ssh connection timeout" 2022-02-23 19:10:00 +00:00
James E. Blair df220cd4d6 Populate missing change cache entries
The drivers are expected to populate the change cache before
passing trigger events to the scheduler so that all the difficult
work is done outside the main loop.  Further, the cache cleanup
is designed to accomodate this so that events in-flight don't have
their change cache entries removed early.

However, at several points since moving the change cache into ZK,
programming errors have caused us to encounter enqueued changes
without entries in the cache.  This usually causes Zuul to abort
pipeline processing and is unrecoverable.

We should continue to address all incidences of those since they
represent Zuul not working as designed.  However, it would be nice
if Zuul was able to recover from this.

To that end, this change allows missing changes to be added to the
change cache.

That is primarily accomplished by adjusting the Source.getChange
method to accept a ChangeKey instead of an Event.  Events are only
available when the triggering event happens, whereas a ChangeKey
is available when loading the pipeline state.

A ChangeKey represents the minimal distinguishing characteristics
of a change, and so can be used in all cases.  Some drivers obtain
extra information from events, so we still pass it into the getChange
method if available, but it's entirely optional -- we should still
get a workable Change object whether or not it's supplied.

Ref (and derived: Branch, Tag) objects currently only store their
newrev attribute in the ChangeKey, however we need to be able to
create Ref objects with an oldrev as well.  Since the old and new
revs of a Ref are not inherent to the ref but rather the generating
event, we can't get that from the source system.  So we need to
extend the ChangeKey object to include that.  Adding an extra
attribute is troublesome since the ChangeKey is not a ZKObject and
therefore doesn't have access to the model api version.  However,
it's not too much of a stretch to say that the "revision" field
(which like all ChangeKey fileds is driver-dependent) should include
the old and new revs.  Therefore, in these cases the field is
upgraded in a backwards compatible way to include old and newrev
in the standard "old..new" git encoding format.  We also need to
support "None" since that is a valid value in Zuul.

So that we can continue to identify cache errors, any time we encounter
a change key that is not in the cache and we also don't have an
event object, we log an error.

Almost all of this commit is the refactor to accept change keys
instead of events in getChange.  The functional change to populate
the cache if it's missing basically consists of just removing
getChangeByKey and replacing it with getChange.  A test which deletes
the cache midway through is added.

Change-Id: I4252bea6430cd434dbfaacd583db584cc796dfaa
2022-02-17 13:14:23 -08:00
James E. Blair 32bdb7300e Clone from /a/ with authenticated Gerrit HTTP
When using authenticated HTTP with Gerrit (instead of SSH) for
cloning repos, use the "/a/" URL prefix.

Gerrit supports cloning with or without the prefix going back to
at least 2.16.  However, if users install the login-redirect plugin,
it disallows HTTP access except via "/a/".

Further, the download-commands actually do recommend using /a/ in
clone URLs.

Switching to this should help users with login-redirect installed
without any loss of functionality otherwise.

Change-Id: I9b65d0b9a0f235afe076a34a9e878f1e8fefe1a0
2022-02-05 10:23:05 -08:00
James E. Blair 00fdf4542b Add ssh connection timeout
In case a network problem causes the socket connection to establish
slowly, set a socket timeout of 30 seconds (matching the HTTP request
timeout).

Change-Id: Idd2e29d019357963faa25f834289cf6f318aa720
2022-02-01 09:07:41 -08:00
Zuul f648f21304 Merge "Add a model API version" 2022-01-27 22:46:49 +00:00
James E. Blair 29fbee7375 Add a model API version
This is a framework for making upgrades to the ZooKeeper data model
in a manner that can support a rolling Zuul system upgrade.

Change-Id: Iff09c95878420e19234908c2a937e9444832a6ec
2022-01-27 12:19:11 -08:00
Simon Westphahl 3d62dc862d Refresh cached branches in timer driver
The cache maintenance has an inherent data race as it is only
considering changes as relevent that are currently in any pipeline. To
prevent garbage collection of changes for in-flight events, we only
clean up items older than 2h.

Usually the driver will refresh a change when receiving a connection
event. However, this wasn't the case for trigger events created by the
timer driver.

This can lead to a race condition where a cached branch is cleaned up
while a timer triggered item is enqueued.

For consistency all non-change objects (Branch, Tag, Ref) will now be
refreshed in case the refresh flag of `getChange()` is set to True.

2022-01-24 11:31:50,815 ERROR zuul.Scheduler: Exception processing pipeline periodic-xy in tenant foobar
Traceback (most recent call last):
  File "/opt/zuul/lib/python3.8/site-packages/zuul/scheduler.py", line 1786, in process_pipelines
    pipeline.state.refresh(ctx)
  File "/opt/zuul/lib/python3.8/site-packages/zuul/zk/zkobject.py", line 153, in refresh
    self._load(context)
  File "/opt/zuul/lib/python3.8/site-packages/zuul/zk/zkobject.py", line 205, in _load
    self._set(**self.deserialize(data, context))
  File "/opt/zuul/lib/python3.8/site-packages/zuul/model.py", line 690, in deserialize
    queue = ChangeQueue.fromZK(context, queue_path,
  File "/opt/zuul/lib/python3.8/site-packages/zuul/zk/zkobject.py", line 148, in fromZK
    obj._load(context, path=path)
  File "/opt/zuul/lib/python3.8/site-packages/zuul/zk/zkobject.py", line 205, in _load
    self._set(**self.deserialize(data, context))
  File "/opt/zuul/lib/python3.8/site-packages/zuul/model.py", line 944, in deserialize
    item = QueueItem.fromZK(context, item_path,
  File "/opt/zuul/lib/python3.8/site-packages/zuul/zk/zkobject.py", line 148, in fromZK
    obj._load(context, path=path)
  File "/opt/zuul/lib/python3.8/site-packages/zuul/zk/zkobject.py", line 205, in _load
    self._set(**self.deserialize(data, context))
  File "/opt/zuul/lib/python3.8/site-packages/zuul/model.py", line 4093, in deserialize
    change = self.pipeline.manager.resolveChangeReferences(
  File "/opt/zuul/lib/python3.8/site-packages/zuul/manager/__init__.py", line 199, in resolveChangeReferences
    return self.resolveChangeKeys(
  File "/opt/zuul/lib/python3.8/site-packages/zuul/manager/__init__.py", line 211, in resolveChangeKeys
    self._change_cache[change.cache_key] = change
AttributeError: 'NoneType' object has no attribute 'cache_key'
2022-01-24 11:31:50,811 ERROR zuul.Pipeline.foobar.periodic-xy: Unable to resolve change from key <ChangeKey github org/project Branch master None hash=da4e62f669e51a7fbef5db1a9b480b0bd42693a7febffdb47a4eb794faa300a9>

Change-Id: I62f5b816780e244e1426ab8a8871f09379293f3e
2022-01-27 11:47:54 +01:00
Zuul d27c1c7a93 Merge "gerrit driver - skip ignored events before they hit zookeeper" 2022-01-13 15:54:32 +00:00
Kenny Ho 35522a2053 Add git_over_ssh option for Gerrit connection
This option allows Zuul to continue to use ssh for Git operations even
when HTTP Password is set for the Gerrit connection.  This enable REST
API usage by Zuul even when the Gerrit server requires SSH for Git
operations.

Change-Id: Ie16eac048a54b2a698397f47b232d31177c54e07
2022-01-06 17:44:28 -05:00
Fabien Boucher bfe3bdc8f5 gerrit driver - skip ignored events before they hit zookeeper
Change-Id: Idc3d0c2a1c4ceb752065d961ac84e480e3bf8415
2021-12-21 11:40:43 +00:00
James E. Blair fb3d3f7471 Add an init phase to scheduler/web startup
This adds a new component state: INITIALIZING and the scheduler and
web components use it when they are creating their initial config.

It is safe for the scheduler to start processing tenant and pipeline
events as soon as it starts because it only processes those for
the tenants that it has already loaded.

However it is not safe for drivers to move events from their
incoming queue into the scheduler since that requires the full
tenant list.  The 4 drivers to which this applies are updated
to wait on config priming.

Zuul-web is already structured to wait until config priming
so does not need a corresponding change.

Change-Id: I36dd4927e583328434e66553aa3ff0cd7469b488
2021-11-16 13:06:32 -08:00
Felix Edel 2c900c2c4a Split up registerScheduler() and onLoad() methods
This is an early preparation step for removing the RPC calls between
zuul-web and the scheduler.

In order to do so we must initialize the ConfigLoader in zuul-web which
requires all connections to be available. Therefore, this change ensures
that we can load all connections in zuul-web without providing a
scheduler instance.

To avoid unnecessary traffic from a zuul-web instance the onLoad()
method initializes the change cache only if a scheduler instance is
available on the connection.

Change-Id: I3c1d2995e81e17763ae3454076ab2f5ce87ab1fc
2021-11-09 09:17:43 +01:00
James E. Blair a836575c60 Store connection branch cache in ZK
This will allow us to sync the branch cache to other participants
via ZK.

Change-Id: I75b2436008e7bc44e086abe680d8b98cf73102f8
2021-11-03 17:15:47 -07:00
James E. Blair 2f416b6f1e Use CachedBranchConnection in Gerrit driver
This class is shared by the github and gitlab drivers.  It looks
like we need to move the per-connection branch cache into ZK, so
let's align all the drivers to use this so we can do it in one
place.

Change-Id: I5b42390718a987d3d6bc8423ea12f35f82aca700
2021-11-03 13:31:45 -07:00
Ian Wienand 618d98fdc2 gerrit: trim messages to "human length"
The "message" field of a gerrit response (i.e. the bit that has "Zuul
encountered a syntax error while parsing its configuration ...") has a
limit in Gerrit of "change.commentSizeLimit" which is by default 16KiB
[1].

If you have a syntax error in a very long section, say the middle of
the project: section of OpenDev system-config, you can actually
overflow this length.  Gerrit rejects the update and as an end-user
you don't get any clue why.

This truncates the message in the gerrit driver if it's going to
overflow.

[1] Note change.robotCommentSizeLimit is larger, but that's not how
syntax messages are reported.

Change-Id: I2f14e734ef5f9f203b14556c7d2c8ed1ad052319
2021-10-14 10:09:50 +11:00
Ian Wienand 4a52c8053a gerrit: handle POST 400 errors and don't retry
A 400 response from Gerrit indicates that we sent something bad, or
that something failed validation, etc.  It is not going to disappear
with retries.  Thus this catches 400 separately on POST requests and
stops any retry loops.

The response also returns some text telling you what is wrong; log
that in the exception (it is logged at the zuul.GerritConnection.io
logger, but that is usually set high to avoid too much log traffic).

Change-Id: I6a470bf338cf8b9944760c89d5742c79318c7cb8
2021-10-13 15:16:04 +11:00
James E. Blair 659ba07f63 Include project name in gerrit branch cache
The structured cache data change updated the keys for the gerrit
branch cache, but it omitted the project from non-change items.

Gerrit change items omit the project because when we fetch gerrit
changes from the cache, we often only know the number.  However,
all other types of items (branches, tags, etc) are project specific,
and so their cache keys must contain the branch name.  Without it,
we may fetch the wrong change from the cache for timer pipelines.

The other drivers all require the project name for every change
type, including changes (since PR numbers are not unique), so they
were not affected by this copypasta error.

Change-Id: I399330cdf9880072580d3e70d97b63cb74d95a23
2021-09-28 17:21:32 -07:00
James E. Blair 29d0534696 Never externally delete change cache entries
The change cache depends on having singleton objects for entries.
If a scheduler ever ends up with 2 objects for the same change, the
cache will refuse to update the cache with new data for the object
which is not in the cache.  However, there is a simple series of
events which could lead to this:

1) Event from source populates cache with a change.
2) Change is enqueued into pipeline.
3) Event from source triggers a data refresh of same change.
4) Data refresh fails.
5) Exception handler for data refresh deletes change from cache.

Imagine that the pipeline processor is now attempting to refresh the
change to determine whether it has merged.  At this point, updates
to the cache will fail with this error:

2021-09-28 14:25:23,057 ERROR zuul.Scheduler: Exception in pipeline processing:
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/zuul/scheduler.py", line 1615, in _process_pipeline
    while not self._stopped and pipeline.manager.processQueue():
  File "/usr/local/lib/python3.8/site-packages/zuul/manager/__init__.py", line 1418, in processQueue
    item_changed, nnfi = self._processOneItem(
  File "/usr/local/lib/python3.8/site-packages/zuul/manager/__init__.py", line 1356, in _processOneItem
    self.reportItem(item)
  File "/usr/local/lib/python3.8/site-packages/zuul/manager/__init__.py", line 1612, in reportItem
    merged = source.isMerged(item.change, item.change.branch)
  File "/usr/local/lib/python3.8/site-packages/zuul/driver/gerrit/gerritsource.py", line 47, in isMerged
    return self.connection.isMerged(change, head)
  File "/usr/local/lib/python3.8/site-packages/zuul/driver/gerrit/gerritconnection.py", line 1013, in isMerged
    self._change_cache.updateChangeWithRetry(key, change, _update_change)
  File "/usr/local/lib/python3.8/site-packages/zuul/zk/change_cache.py", line 330, in updateChangeWithRetry
    self.set(key, change, version)
  File "/usr/local/lib/python3.8/site-packages/zuul/zk/change_cache.py", line 302, in set
    if self._change_cache[key._hash] is not change:
KeyError: 'ef075359268c2f3ee7d52ccbcb6ac51a3a5922c709e634fdb2efcf97c57095b2'

The process may continue:

6) Event from source triggers a data refresh of same change.
7) Refresh succeeds and cache is popuplated with new change object.

Then the pipeline will fail with this error:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/zuul/scheduler.py", line 1615, in _process_pipeline
    while not self._stopped and pipeline.manager.processQueue():
  File "/usr/local/lib/python3.8/site-packages/zuul/manager/__init__.py", line 1418, in processQueue
    item_changed, nnfi = self._processOneItem(
  File "/usr/local/lib/python3.8/site-packages/zuul/manager/__init__.py", line 1356, in _processOneItem
    self.reportItem(item)
  File "/usr/local/lib/python3.8/site-packages/zuul/manager/__init__.py", line 1612, in reportItem
    merged = source.isMerged(item.change, item.change.branch)
  File "/usr/local/lib/python3.8/site-packages/zuul/driver/gerrit/gerritsource.py", line 47, in isMerged
    return self.connection.isMerged(change, head)
  File "/usr/local/lib/python3.8/site-packages/zuul/driver/gerrit/gerritconnection.py", line 1013, in isMerged
    self._change_cache.updateChangeWithRetry(key, change, _update_change)
  File "/usr/local/lib/python3.8/site-packages/zuul/zk/change_cache.py", line 330, in updateChangeWithRetry
    self.set(key, change, version)
  File "/usr/local/lib/python3.8/site-packages/zuul/zk/change_cache.py", line 303, in set
    raise RuntimeError(
RuntimeError: Conflicting change objects (existing <Change 0x7f1405c188e0 starlingx/nfv 810014,2> vs. new <Change 0x7f148446c370 starlingx/nfv 810014,2> for key '{"connection_name": "gerrit", "project_name": null, "change_type": "GerritChange", "stable_id": "810014", "revision": "2"}'

To avoid this, we should never remove a change from the cache unless it is
completely unused (that is, we should only remove changes from the cache via
the prune method).  Even if it means that the change is out of date, it is
still the best information that we have, and a future event may succeed and
eventually update the change.

This removes the exception handling which deleted the change from all drivers.

Change-Id: Idbecdf717b517cce5c25975a40d9f42d57a26c9e
2021-09-28 10:21:12 -07:00