Commit Graph

38 Commits

Author SHA1 Message Date
James E. Blair 1f026bd49c Finish circular dependency refactor
This change completes the circular dependency refactor.

The principal change is that queue items may now include
more than one change simultaneously in the case of circular
dependencies.

In dependent pipelines, the two-phase reporting process is
simplified because it happens during processing of a single
item.

In independent pipelines, non-live items are still used for
linear dependencies, but multi-change items are used for
circular dependencies.

Previously changes were enqueued recursively and then
bundles were made out of the resulting items.  Since we now
need to enqueue entire cycles in one queue item, the
dependency graph generation is performed at the start of
enqueuing the first change in a cycle.

Some tests exercise situations where Zuul is processing
events for old patchsets of changes.  The new change query
sequence mentioned in the previous paragraph necessitates
more accurate information about out-of-date patchsets than
the previous sequence; therefore, the Gerrit driver has been
updated to query and return more data about non-current
patchsets.

This change is not backwards compatible with the existing
ZK schema, and will require Zuul systems to delete all pipeline
states during the upgrade.  A later change will implement
a helper command for this.

All backwards compatibility handling for the last several
model_api versions which were added to prepare for this
upgrade has been removed.  In general, all model data
structures involving frozen jobs are now indexed by the
frozen job's uuid and no longer include the job name since
a job name no longer uniquely identifies a job in a buildset
(either the uuid or the (job name, change) tuple must be
used to identify it).
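
A minimal, hypothetical sketch of that indexing (FrozenJob and BuildSet
here are illustrative stand-ins, not Zuul's actual classes): jobs live
in a dict keyed by uuid, and a name-based lookup must also supply the
change.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class FrozenJob:
    name: str
    change: object
    uuid: str = field(default_factory=lambda: uuid.uuid4().hex)


class BuildSet:
    def __init__(self):
        # Indexed by the frozen job's uuid; names alone are no longer
        # unique because one queue item may hold several changes.
        self.jobs = {}

    def addJob(self, job):
        self.jobs[job.uuid] = job

    def getJob(self, name, change):
        # Either the uuid or the (job name, change) tuple identifies
        # a job within the buildset.
        for job in self.jobs.values():
            if job.name == name and job.change is change:
                return job
        return None
```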

Job deduplication is simplified and now only needs to
consider jobs within the same buildset.

The fake github driver had a bug (fakegithub.py line 694) where
it did not correctly increment the check run counter, so our
tests that verified that we closed out obsolete check runs
when re-enqueuing were not valid.  This has been corrected, and
doing so necessitated some changes around quiet dequeuing
when we re-enqueue a change.

The reporting in several drivers has been updated to support
reporting information about multiple changes in a queue item.

Change-Id: I0b9e4d3f9936b1e66a08142fc36866269dc287f1
Depends-On: https://review.opendev.org/907627
2024-02-09 07:39:40 -08:00
Simon Westphahl c963526560
Add Zuul event id to merge completed events
Return the Zuul event ID that is already part of the merge request
along with the merge result event so logs can be correlated.

Change-Id: I018709cd4d4afa562e6851d0d52c1ddd7583dc62
2023-08-08 12:02:36 +02:00
Simon Westphahl f1e3d67608
Trace merge requests and merger operations
The span info for the different merger operations is stored on the
request and will be returned to the scheduler via the result event.

This also adds the request UUID to the "refstat" job so that we can
attach that as a span attribute.

Change-Id: Ib6ac7b5e7032d168f53fe32e28358bd0b87df435
2022-09-19 11:25:49 +02:00
James E. Blair 458ba317fd Add pipeline-based merge op metrics
So that operators can see in aggregate how long merge, files-changes,
and repo-state merge operations take in certain pipelines, add
metrics for the merge operations themselves (these exclude the
overhead of pipeline processing and job dispatching).
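
A rough illustration of the kind of measurement involved (the statsd
client usage is standard, but the metric key layout shown here is an
assumption, not Zuul's actual names):

```python
import time

import statsd

client = statsd.StatsClient('localhost', 8125)


def timed_merge_op(pipeline, op_name, func, *args, **kwargs):
    # Time one merger operation and report it under a per-pipeline
    # key (key layout is illustrative only).
    start = time.monotonic()
    try:
        return func(*args, **kwargs)
    finally:
        elapsed_ms = (time.monotonic() - start) * 1000
        client.timing('zuul.pipeline.%s.merger.%s' % (pipeline, op_name),
                      elapsed_ms)
```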

Change-Id: I8a707b8453c7c9559d22c627292741972c47c7d7
2022-07-12 10:25:59 -07:00
James E. Blair 61cb275480 Report which repo failed initial merge ops
When the initial merge job for a queue item fails, users typically
see a message saying "this project or one of its dependencies failed
to merge".  To help users and/or administrators more quickly identify
the problem, include connection project and change information in
a warning message posted to the code review system.

Change-Id: If1bced80b87b908f63867083efb306ebe02ed1ee
2022-02-20 13:06:39 -08:00
James E. Blair 66008900a8 Send synthetic merge completed events on cleanup
When a merger crashes, the scheduler identifies merge jobs which
were left in an incomplete state and cleans them up.  However, there
may be queue items waiting for merge complete events, and nothing
generates those in this case.

Update the merge job cleanup procedure to mimic the executor job
cleanup procedure which, in addition to deleting the incomplete job
requests, also creates synthetic complete events in order to prompt
the scheduler to resume processing.
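
Sketched with hypothetical helper names (lostRequests, remove, and the
result dict layout are assumptions, not Zuul's actual API):

```python
def cleanup_lost_merge_requests(merger_api, result_event_queue):
    # For each merge request whose merger went away, delete the
    # request and push a synthetic "completed" result so any queue
    # item waiting on it resumes processing instead of hanging.
    for request in merger_api.lostRequests():
        merger_api.remove(request)
        result_event_queue.put({
            'request': request.uuid,
            'merged': False,   # treat the lost job as unsuccessful
            'updated': False,
        })
```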

Change-Id: Idea384f636a0cd9e8c82ee92d3f5a65bef0889f2
2021-09-20 10:37:39 -07:00
James E. Blair 97a76de403 Fix race involving job request locks
It's possible for the following sequence to occur (prefixed by
thread ids):

2> process job request cache update

1> finish job
1> set job request state to complete
1> unlock job request
1> delete job request
1> delete job request lock

2> get cached list of running jobs for lostRequests, start examining job
2> check if the job is unlocked (this will re-create the lock dir and return true)
2> attempt to set job request state to complete (this will raise JobRequestNotFound)
2> bail

This leaves a lock node lying around.  We have a cleanup process that
will eventually remove it in production, but its existence can cause
the clean-state checks at the end of unit tests to fail.

To correct this:

a) Try to avoid re-creating the lock (though this is not possible in all cases)
b) If we encounter a JobRequestNotFound error in the cleanup, attempt to
   delete the job nonetheless (so that we re-delete the lock dir)

The remove method is also made entirely idempotent to support this.
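
A minimal sketch of such an idempotent removal using kazoo (the paths
and function name are assumptions, not Zuul's actual API):

```python
from kazoo.exceptions import NoNodeError


def remove_request(zk_client, request_path, lock_path):
    # Delete the request and its lock node; treat "already gone" as
    # success so the cleanup stays idempotent even when another
    # actor has already removed part of it.
    for path in (request_path, lock_path):
        try:
            zk_client.delete(path, recursive=True)
        except NoNodeError:
            pass
```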

Change-Id: I49ad5c38a3c6cbaf0962e805b6c228e36b97a3d2
2021-09-14 09:10:34 -07:00
Simon Westphahl 5e78afd6f9 Fix wrong call to unlock requests in merger client
Change-Id: Ic519132f211dc3613023e2bc2bd8f11b29c9ac42
2021-09-06 07:15:14 +02:00
James E. Blair 6a0b5c419c Several merger cleanups
This change contains several merger-related cleanups which seem
distinct but are intertwined.

* Ensure that the merger API space in ZK is empty at the end of all
  tests.  This assures us that we aren't leaking anything.
* Move some ZK utility functions into the base test class.
* Remove the extra zk_client created in the component registry test
  since we can use the normal ZK client.
* The null result value in the merger server is initialized earlier to
  make sure that it is initialized for use in the exception handler.
* The test_branch_delete_full_reconfiguration leaked a result node
  because one of the cat jobs fails, and later cat jobs are run but
  ignored.

To address the last point, we need to make a change to the cat job
handling.  Currently, when a cat job fails, the exception bubbles up
and we simply ignore all the remaining jobs.  The mergers will run
them, write results to ZK, but no one will see those results.  That
would be fine, except that we created a "waiter" node in ZK to
indicate we want to see those results, and as long as it exists, the
results won't be deleted by the garbage collector, yet we are no
longer waiting for them, so we won't delete them either.

To correct that, we store the merge job request path on the job
future.  Then, when the first cat job fails, we "cancel" all the cat
jobs.  That entails deleting the merge job request if we are able (to
save the mergers from having to do useless work), and regardless of
whether that succeeds, we delete the waiter node in ZK.  If a cat job
happens to be running (and if there's more than one, like in this test
case, it likely is), it will eventually complete and write its result
data.  But since we have removed the waiter node, the periodic cleanup
task will detect it as leaked data and delete it.

Change-Id: I49a459debf5a6c032adc60b66bbd8f6a5901bebe
2021-08-19 15:01:49 -07:00
James E. Blair a729d6c6e8 Refactor Merger/Executor API
The merger and executor APIs have a lot in common, but they behave
slightly differently.  A merger sometimes needs to return results.
An executor needs to have separate queues for zones and be able to
pause or cancel jobs.

This refactors them both into a common class which can handle job
state changes (like pause/cancel) and return results if requested.

The MergerApi can subclass this fairly trivially.

The ExecutorApi adds an intermediate layer which uses a
DefaultKeyDict to maintain a distinct queue for every zone and then
transparently dispatches method calls to the queue object for
that zone.
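
The DefaultKeyDict idea in a minimal form (illustrative, not the
actual class; ZonedExecutorQueue in the usage comment is hypothetical):

```python
class DefaultKeyDict(dict):
    # Like collections.defaultdict, except the factory receives the
    # missing key, so each zone name can get its own queue object.
    def __init__(self, factory):
        super().__init__()
        self.factory = factory

    def __missing__(self, key):
        value = self.factory(key)
        self[key] = value
        return value


# Usage:
#   queues = DefaultKeyDict(lambda zone: ZonedExecutorQueue(zone))
#   queues['us-east'].submit(request)  # queue created on first access
```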

The ZK paths for both are significantly altered in this change.

Change-Id: I3adedcc4ea293e43070ba6ef0fe29e7889a0b502
2021-08-06 15:40:46 -07:00
Felix Edel 8038f9f75c Execute merge jobs via ZooKeeper
This is the second part of I767c0b4c5473b2948487c3ae5bbc612c25a2a24a.
It uses the MergerAPI.

Note: since we no longer have a central gearman server where we can
record all of the merge jobs, some tests now consult the merger api
to get the list of merge jobs which were submitted by that scheduler.
This should generally be equivalent, but something to keep in mind
as we add multiple schedulers.

Change-Id: I1c694bcdc967283f1b1a4821df7700d93240690a
2021-08-06 15:40:41 -07:00
James E. Blair 04ac8287b6 Match tag items against containing branches
To try to approach a more intuitive behavior for jobs which apply
to tags but are defined in-repo (or even for centrally defined
jobs which should behave differently on tags from different branches),
look up which branches contain the commit referenced by a tag and
use that list in branch matchers.

If a tag item is enqueued, we look up the branches which contain
the commit referenced by the tag.  If any of those branches match a
branch matcher, the matcher is considered to have matched.

This means that if a release job is defined on multiple branches,
the branch variant from each branch the tagged commit is on will be
used.

A typical case is for a tagged commit to appear in exactly one branch.
In that case, the most intuitive behavior (the version of the job
defined on that branch) occurs.

A less typical but perfectly reasonable case is that there are two
identical branches (ie, stable has just branched from master but not
diverged).  In this case, if an identical commit is merged to both
branches, then both variants of a release job will run.  However, it's
likely that these variants are identical anyway, so the result is
apparently the same as the previous case.  However, if the variants
are defined centrally, then they may differ while the branch contents
are the same, causing unexpected behavior when both variants are
applied.

If two branches have diverged, it will not be possible for the same
commit to be added to both branches, so in that case, only one of
the variants will apply.  However, tags can be created retroactively,
so that even if a branch has diverged, if a commit in the history of
both branches is tagged, then both variants will apply, possibly
producing unexpected behavior.

Considering that the current behavior is to apply all variants of
jobs on tags all the time, the partial reduction of scope in the most
typical circumstances is probably a useful change.
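
A rough sketch of the matching idea using GitPython (the matcher
semantics are simplified; Zuul's real branch matchers are richer):

```python
import re

import git


def tag_matches_branch(repo_path, tag_sha, branch_pattern):
    # A branch matcher matches a tag item if any branch containing
    # the tagged commit matches the pattern.
    repo = git.Repo(repo_path)
    output = repo.git.branch('--contains', tag_sha)
    branches = [line.strip().lstrip('* ').strip()
                for line in output.splitlines() if line.strip()]
    return any(re.fullmatch(branch_pattern, b) for b in branches)
```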

Change-Id: I5734ed8aeab90c1754e27dc792d39690f16ac70c
Co-Authored-By: Tobias Henkel <tobias.henkel@bmw.de>
2020-03-06 13:29:18 -08:00
Tobias Henkel 5f423346aa
Filter out unprotected branches from builds if excluded
When working with GitHub Enterprise, the recommended working model is
branch & pull within the same repo. This is especially necessary for
workflows that combine multiple repos in a single workspace. This has
the side effect that those repos can contain a large number of
branches that will never be part of a job. Having many branches in a
repo can have a large impact on the executor performance so exclude
them from the repo state if we exclude them in the tenant config. This
change only affects branches, not tags or other references.

Change-Id: Ic8e75fa8bf76d2e5a0b1779fa3538ee9a5c43411
2019-06-25 20:49:54 +02:00
Tobias Henkel 7639053905
Annotate merger logs with event id
If we have an event we should also submit its id to the merger so
we're able to trace merge operations via an event id.

Change-Id: I12b3ab0dcb3ec1d146803006e0ef644e485a7afe
2019-05-17 06:11:04 +02:00
Tobias Henkel e69c9fe97b
Make git clone timeout configurable
When dealing with large repos or slow connections to the scm the
default clone timeout of 5 minutes may not be sufficient. Thus a
configurable clone/fetch timeout can make it possible to handle those
repos.
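
The idea, sketched with a plain subprocess call rather than Zuul's
actual merger code (the default of 300 seconds mirrors the 5 minute
value mentioned above; the function itself is illustrative):

```python
import subprocess


def clone_with_timeout(url, dest, timeout=300):
    # Kill the clone if it runs longer than the configured timeout
    # (in seconds); large repos or slow connections may need more.
    try:
        subprocess.run(['git', 'clone', url, dest],
                       check=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        raise RuntimeError('cloning %s exceeded %ss' % (url, timeout))
```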

Change-Id: I0711895806b7cbcc8b9fa3ba085bcf79d7fb6665
2019-01-31 11:17:05 +01:00
Zuul 91e7e680a1 Merge "Use gearman client keepalive" 2019-01-28 20:09:30 +00:00
Tobias Henkel 8bfc0cd409
Delay Github fileschanges workaround to pipeline processing
Github's pull request files API only returns at most the first 300
changed files of a PR, in alphabetical order.  Change
I10a593e26ac85b8c12ca9c82051cad809382f50a introduced a workaround that
queries the file list from the mergers within the github event
loop.  While this was a minimally invasive approach, it can cause
multi-minute delays in the github event queue.

This can be fixed by making this query asynchronous and delaying it to
the pipeline processing. This query is now handled the same way as
merge requests.

Change-Id: I9c77b35f0da4d892efc420370c04bcc7070c7676
Depends-On: https://review.openstack.org/625596
2018-12-18 13:30:14 +01:00
Tobias Henkel fb4c6402a4
Use gearman client keepalive
If the gearman server vanishes (e.g. due to a VM crash), some clients
like the merger may not notice that it is gone. They just wait forever
for data to be received on an inactive connection. In our case the VM
containing the zuul-scheduler crashed and after the restart of the
scheduler all mergers were waiting for data on the stale connection
which blocked a successful scheduler restart.  Using tcp keepalive we
can detect that situation and let broken inactive connections be
killed by the kernel.
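
Roughly what enabling TCP keepalive on a client socket looks like (the
interval values are examples, and the TCP_KEEP* constants are
platform-specific, e.g. Linux):

```python
import socket


def enable_keepalive(sock, idle=60, interval=30, count=5):
    # Probe an idle connection so the kernel eventually closes a dead
    # one instead of letting the client wait forever for data.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)
```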

Depends-On: I8589cd45450245a25539c051355b38d16ee9f4b9
Change-Id: I30049d59d873d64f3b69c5587c775827e3545854
2018-12-11 21:28:59 +01:00
Fabien Boucher 194a2bf237 Git driver
This patch improves the existing git driver by adding
a refs watcher thread.  This refs watcher looks at
refs added, deleted, or updated and triggers a ref-updated
event.

When a ref is updated and the related commits
from oldrev to newrev include a change to .zuul.yaml/zuul.yaml
or zuul.d/*.yaml, then tenants including that ref are reconfigured.

Furthermore the patch includes a triggering model.  Events are
sent to the scheduler so that jobs can be attached to a pipeline
and run.
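
A minimal polling sketch of such a watcher (git ls-remote based; the
callback and interval are illustrative, not the driver's actual
implementation):

```python
import subprocess
import time


def watch_refs(url, on_ref_updated, interval=60):
    # Poll the remote and report (ref, oldrev, newrev) for every ref
    # that appears, disappears, or moves between polls.
    known = {}
    while True:
        out = subprocess.run(['git', 'ls-remote', url], check=True,
                             capture_output=True, text=True).stdout
        current = {}
        for line in out.splitlines():
            sha, _, ref = line.partition('\t')
            current[ref] = sha
        for ref in set(known) | set(current):
            oldrev, newrev = known.get(ref), current.get(ref)
            if oldrev != newrev:
                on_ref_updated(ref, oldrev, newrev)
        known = current
        time.sleep(interval)
```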

Change-Id: I529660cb20d011f36814abe64f837945dd3f1f33
2017-12-15 14:32:40 +01:00
James E. Blair 3b5b335ca2 Abort reconfiguration when cat jobs fail
Currently, if a cat job fails during reconfiguration, we simply
proceed without that section of the config, which usually doesn't
work out well.  Instead, raise an exception which will abort the
reconfiguration.

Change-Id: I87f2d870f007e3df5f47c04ef49add27c8a0b554
2017-09-12 09:40:06 -06:00
James E. Blair 289f5930fa Ensure ref-updated jobs run with their ref
We were incorrectly preparing the current state of the repo for
ref-updated (eg, post) jobs.  This ensures that we run with the
actual supplied ref, even if the remote has moved on since then.

Change-Id: I52f05406246e6e39805fd8365412f3cb77fe3a0a
2017-08-02 16:56:18 -07:00
Tobias Henkel 34ee088603 Remove zuul_url from merger config
Currently the zuul_url is not used anywhere but is still a required
merger setting.  This removes it.

Change-Id: I627c8a18015f4c148c28d2f7e735b30cc1ef3862
2017-07-31 22:28:35 +02:00
Tristan Cacqueray 829e617bac Add support for zuul.d configuration split
This change implements the zuul_split spec to support configuration split in
a zuul.d directory.

Change-Id: I6bc7250b2045b73dfba109aa0b2f1ba5d66752b2
2017-07-10 05:13:42 +00:00
Tristan Cacqueray 91601d788e config: refactor config get default
This change adds a new get_default library procedure to simplify getting
the default value of a config object.
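
A sketch of what such a helper can look like on top of configparser
(illustrative; the actual signature may differ):

```python
import configparser


def get_default(config, section, option, default=None):
    # Return the option if it is present, otherwise the caller's
    # default.
    if config.has_option(section, option):
        value = config.get(section, option)
        if value is not None:
            return value
    return default


# Usage:
#   config = configparser.ConfigParser()
#   config.read('zuul.conf')
#   timeout = get_default(config, 'merger', 'git_timeout', 300)
```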

Change-Id: I0546b1175b259472a10690273af611ef4bad5a99
2017-06-17 02:00:50 +00:00
Paul Belanger 0a21f0a1d5
Add ssl support to gearman / gearman_server
Enable SSL support for gearman. We also created a new SSLZuulBaseTest
class to provide a simple way to use SSL end to end where possible. A
future patch will enable support in zookeeper.

Change-Id: Ia8b89bab475d758cc6a021988f8d79ead8836a9d
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
2017-06-14 10:10:45 -04:00
James E. Blair 1960d687c9 Use previously stored repo state on executor
When the initial speculative merge for a change is performed at
the request of the pipeline manager, the repo state used to
construct that merge is saved in a data structure.  Pass that
structure to the executor when running jobs so that, after cloning
each repo into the jobdir, the repos are made to appear the same
as those on the merger before it started its merge.  The subsequent
merge operations on the executor will repeat the same operations,
producing the same content (though the actual commits will be
different due to timestamps).

It would be more efficient to have the executors pull changes from
the mergers, however, that would require the mergers to run an
accessible git service, which is one of the things that adds
significant complexity to a zuul deployment.  This method only
requires that the mergers be able to initiate outgoing connections
to gearman and sources.

Because the initial merge may happen well before jobs are executed,
save the dependency chain for a given BuildSet when its configuration
is being finalized.  This will cause us to save not only the repository
configuration that the merger uses, but also the exact sequence of
changes applied on top of that state.  (Currently, we build the series
of changes we apply before running each job; however, the queue state
can change (especially if items are merged) in the period between the
initial merge and job launch.)

The initial merge is performed before we have a shadow layout for the
item, yet we must specify a merge mode for each project for which we
merge a change.  Currently, we are defaulting to the 'merge-resolve'
merge mode for every project during the initial speculative merge, but
then the secondary merge on the executor will use the correct merge
mode since we have a layout at that point.  With this change, where
we are trying to replicate the initial merge exactly, we can't rely
on that behavior any more.  Instead, when attempting to find the merge
mode to use for a project, we use the shadow layout of the nearest
item ahead, or else the current live layout, to find the merge mode,
and only if those fail, do we use the default.  This means that a change
to a project's merge-mode will not use that merge mode.  However,
subsequent changes will.  This seems to be the best we can do, short
of detecting this case and merging such changes twice.  This seems
rare enough that we don't need to do that.

The test_delayed_merge_conflict method is updated to essentially invert
the meaning of the test.  Since the old behavior was for the initial
merge check to be completely independent of the executor merge, this
test examined the case where the initial merge worked but between that
time and when the executor performed its merge, a conflicting change
landed.  That should no longer be possible since the executor merge
now uses the results of the initial merge.  We keep the test, but invert
its final assertion -- instead of checking for a merge conflict being
reported, we check that no merge conflict is reported.

Change-Id: I34cd58ec9775c1d151db02034c342bd971af036f
2017-05-24 14:19:14 -07:00
James E. Blair 34c7daaaa4 Store initial repo state in the merger
When we ask a merger to speculatively merge changes, record the
complete starting state of each repo (defined as all of the refs
other than Zuul refs) and return that at the completion of all
of the merges.

This will later be used so that when a pipeline manager asks a
merger to speculatively merge a change, the process can later
be repeated by the (potentially multiple) executors which will
end up running jobs for the change.  Between the time that the
merger runs and the jobs run, the underlying repos may have changed.
This ensures a consistent state throughout.

The facility which used saved zuul refs within the merger repo
to short-cut the merge sequence for an additional change added to
a previously completed merge sequence is removed, because in that
case, we would not be able to know the original repo state for the
earlier merge sequence.  This is slightly less efficient; however,
we are proposing removing zuul refs anyway due to the maintenance
burden they cause.
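
A minimal sketch of capturing such a state with GitPython (the
refs/zuul/ prefix reflects the Zuul refs mentioned above; the function
name is illustrative):

```python
import git


def capture_repo_state(path):
    # Map every ref path (branches, tags, remotes) to the commit it
    # points at, skipping Zuul's own working refs.
    repo = git.Repo(path)
    state = {}
    for ref in repo.refs:
        if ref.path.startswith('refs/zuul/'):
            continue
        state[ref.path] = ref.commit.hexsha
    return state
```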

Change-Id: If0215d53c3b08877ded7276955a55fc5e617b244
2017-05-24 14:19:13 -07:00
Jenkins 9e958254bb Merge "Remove unused merger:update task" into feature/zuulv3 2017-05-20 17:21:35 +00:00
Clint Byrum e5c4afa94c Use gear Text interface
This makes the transition to python3 much smoother.

Change-Id: I9d8638dd98502bdd91cbe6caf3d94ce197f06c6f
Depends-On: If6bfc35d916cfb84d630af59f4fde4ccae5187d4
Depends-On: I93bfe33f898294f30a82c0a24a18a081f9752354
2017-05-19 06:39:15 -07:00
Jesse Keating ba2f93c5a2 Remove unused merger:update task
This task is no longer used and was the last thing that the merger
claimed to do that the executor did not.  Now what the merger does is
a subset of what the executor does, so mergers can scale out to handle
that work and leave the executor(s) free to focus on running jobs.

Change-Id: Ibc8638cf7c2109d9b32c27fb98fb84605f5d5ac0
Signed-off-by: Jesse Keating <omgjlk@us.ibm.com>
2017-05-17 10:59:29 -07:00
James E. Blair 2a53567014 Use connection to qualify projects in merger
Fully qualify projects in the merger with connection names.
This lets us drop the URL parameter (which always seemed
unnecessary, as the merger can figure that out on its own given a
uniquely identified project).

On disk, use the canonical hostname, so that the checked out
versions of repositories include the canonical hostname, and so that
repos on mergers survive changes in connection names.

This simplifies both the API and the JSON data structure passed to
the merger.

The addProject method of the merger is flagged as an internal method
now, as all "public" API methods indirectly call it.

In the executor, after cloning and merging are completed, the 'origin'
remote is removed from the resulting repositories since it may not
be valid for use within a running job.
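
The on-disk layout and origin removal this implies, sketched roughly
with GitPython (paths and function names are examples, not the
executor's actual code):

```python
import os

import git


def repo_work_path(merge_root, canonical_hostname, project_name):
    # e.g. /var/lib/zuul/git/review.example.com/org/project
    return os.path.join(merge_root, canonical_hostname, project_name)


def drop_origin(path):
    # After cloning and merging in the jobdir, remove the 'origin'
    # remote so jobs cannot rely on a URL that may not be valid for
    # them.
    repo = git.Repo(path)
    if any(r.name == 'origin' for r in repo.remotes):
        repo.delete_remote(repo.remotes.origin)
```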

Change-Id: Idcc9808948b018a271b32492766a96876979d1fa
2017-04-27 15:47:45 -07:00
James E. Blair e47eb770dd Add some gearman related debugging
Make sure all clients are identified.
Log the port on which the gearman server is listening in tests.
Log the arguments for the launch job.

Change-Id: Ia99ea5272241799aa8dd089bdb99f6058838ddff
2017-02-06 10:11:14 -08:00
James E. Blair 8b1dc3fb22 Add dynamic reconfiguration
If a change alters .zuul.yaml in a repo that is permitted to use in-repo
configuration, create a shadow configuration layout specifically for that
and any following changes with the new configuration in place.

Such configuration changes extend only to altering jobs and job trees.
More substantial changes such as altering pipelines will be ignored.  This
only applies to "project" repos (ie, the repositories under test which may
incidentally have .zuul.yaml files) rather than "config" repos (repositories
specifically designed to hold Zuul configuration in zuul.yaml files).  This
is to avoid the situation where a user might propose a change to a config
repository (and Zuul would therefore run) that would perform actions that
the gatekeepers of that repository would not normally permit.

This change also corrects an issue with job inheritance in that the Job
instances attached to the project pipeline job trees (ie, those that
represent the job as invoked in the specific pipeline configuration for
a project) were inheriting attributes at configuration time rather than
when job trees are frozen when a change is enqueued.  This could mean that
they would inherit attributes from the wrong variant of a job.

Change-Id: If3cd47094e6c6914abf0ffaeca45997c132b8e32
2016-07-18 09:58:19 -07:00
James E. Blair 14abdf44c0 Load in-repo configuration
Change-Id: I225934407ce31f92a9b6df4bc282fbd5ec2968b3
2015-12-09 16:17:25 -08:00
James E. Blair b1afc8089f Improve merge client logging
When submitting a job to the mergers, log more information about
the job.  Specifically, the UUID will now be included for easier
cross-correlation with completion events.

Change-Id: Id92ae0c73f725da23761c59c97f0d39d64e802a9
2015-03-10 11:01:36 -07:00
James E. Blair eb98aba7a2 Set gearman timeout to 300
In practice we are seeing that geard can occasionally get disrupted
and then temporarily backlogged enough that it exceeds the 30 second
timeout for submitting a job.  To make Zuul less fragile in this case,
increase the timeouts for any requests submitted to gearman.

Change-Id: I12741bb259c1a78fa2446d764318f84df34bac67
2014-12-12 11:00:10 -08:00
James E. Blair e9a8184fe0 Add precedence to merge jobs
When creating a merge job, give it the precedence of the associated
pipeline.

Change-Id: I96c6a942a08f603ae7cce442427ae171d7e76d78
2014-09-25 08:35:55 -07:00
James E. Blair 4076e2b432 Split the merger into a separate process
Connect it to Zuul via Gearman.  Any number of mergers may be
deployed.

Directly find the pipeline for a build when processing a result,
so that the procedure is roughly the same for build and merge
results.

The timer trigger currently requires the gerrit trigger also be
configured.  Make that explicit inside of the timer trigger so
that the scheduler API interaction with triggers is cleaner.

Change-Id: I69498813764753c97c426e42d17596c2ef1d87cf
2014-02-17 11:47:15 -08:00