Commit Graph

78 Commits

James E. Blair 1f026bd49c Finish circular dependency refactor
This change completes the circular dependency refactor.

The principal change is that queue items may now include
more than one change simultaneously in the case of circular
dependencies.

In dependent pipelines, the two-phase reporting process is
simplified because it happens during processing of a single
item.

In independent pipelines, non-live items are still used for
linear dependencies, but multi-change items are used for
circular dependencies.

Previously changes were enqueued recursively and then
bundles were made out of the resulting items.  Since we now
need to enqueue entire cycles in one queue item, the
dependency graph generation is performed at the start of
enqueuing the first change in a cycle.

Some tests exercise situations where Zuul is processing
events for old patchsets of changes.  The new change query
sequence mentioned in the previous paragraph necessitates
more accurate information about out-of-date patchsets than
the previous sequence, therefore the Gerrit driver has been
updated to query and return more data about non-current
patchsets.

This change is not backwards compatible with the existing
ZK schema, and will require that Zuul systems delete all pipeline
states during the upgrade.  A later change will implement
a helper command for this.

All backwards compatibility handling for the last several
model_api versions that were added to prepare for this
upgrade has been removed.  In general, all model data
structures involving frozen jobs are now indexed by the
frozen job's uuid and no longer include the job name since
a job name no longer uniquely identifies a job in a buildset
(either the uuid or the (job name, change) tuple must be
used to identify it).
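
As a rough illustration of that indexing change (a minimal sketch,
not Zuul's actual model classes), a buildset keyed by frozen-job uuid
can hold two jobs with the same name for different changes:

  import uuid
  from dataclasses import dataclass, field

  @dataclass
  class FrozenJob:
      name: str
      change: str
      uuid: str = field(default_factory=lambda: uuid.uuid4().hex)

  class BuildSet:
      def __init__(self):
          self.jobs = {}  # keyed by frozen job uuid, not by job name

      def addJob(self, job):
          self.jobs[job.uuid] = job

      def getJob(self, name, change):
          # name alone is ambiguous; the (name, change) tuple still works
          for job in self.jobs.values():
              if job.name == name and job.change == change:
                  return job

  bs = BuildSet()
  bs.addJob(FrozenJob('build-image', 'change-A'))
  bs.addJob(FrozenJob('build-image', 'change-B'))  # same name, other change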

Job deduplication is simplified and now only needs to
consider jobs within the same buildset.

The fake github driver had a bug (fakegithub.py line 694) where
it did not correctly increment the check run counter, so our
tests that verified that we closed out obsolete check runs
when re-enqueuing were not valid.  This has been corrected, and
in doing so has necessitated some changes around quiet dequeuing
when we re-enqueue a change.

The reporting in several drivers has been updated to support
reporting information about multiple changes in a queue item.

Change-Id: I0b9e4d3f9936b1e66a08142fc36866269dc287f1
Depends-On: https://review.opendev.org/907627
2024-02-09 07:39:40 -08:00
Simon Westphahl c963526560
Add Zuul event id to merge completed events
Return the Zuul event ID that is already part of the merge request with
the merge result event so logs can be correlated.

Change-Id: I018709cd4d4afa562e6851d0d52c1ddd7583dc62
2023-08-08 12:02:36 +02:00
Simon Westphahl b17dfc13ed
Cleanup leaked git index.lock files on checkout
When the git command crashes or is aborted due to a timeout, we might end
up with a leaked index.lock file in the affected repository.

This has the effect that all subsequent git operations that try to
create the lock will fail. Since Zuul maintains a separate lock for
serializing operations on a repository, we can be sure that the lock
file was leaked in a previous operation and can be removed safely.

Unable to checkout 8a87ff7cc0d0c73ac14217b653f9773a7cfce3a7
Traceback (most recent call last):
  File "/opt/zuul/lib/python3.10/site-packages/zuul/merger/merger.py", line 1045, in _mergeChange
    repo.checkout(ref, zuul_event_id=zuul_event_id)
  File "/opt/zuul/lib/python3.10/site-packages/zuul/merger/merger.py", line 561, in checkout
    repo.head.reset(working_tree=True)
  File "/opt/zuul/lib/python3.10/site-packages/git/refs/head.py", line 82, in reset
    self.repo.git.reset(mode, commit, '--', paths, **kwargs)
  File "/opt/zuul/lib/python3.10/site-packages/git/cmd.py", line 542, in <lambda>
    return lambda *args, **kwargs: self._call_process(name, *args, **kwargs)
  File "/opt/zuul/lib/python3.10/site-packages/git/cmd.py", line 1005, in _call_process
    return self.execute(call, **exec_kwargs)
  File "/opt/zuul/lib/python3.10/site-packages/git/cmd.py", line 822, in execute
    raise GitCommandError(command, status, stderr_value, stdout_value)
git.exc.GitCommandError: Cmd('git') failed due to: exit code(128)
  cmdline: git reset --hard HEAD --
  stderr: 'fatal: Unable to create '/var/lib/zuul/merger-git/github/foo/foo%2Fbar/.git/index.lock': File exists.
  Another git process seems to be running in this repository, e.g.
  an editor opened by 'git commit'. Please make sure all processes
  are terminated then try again. If it still fails, a git process
  may have crashed in this repository earlier:
  remove the file manually to continue.'
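
A minimal sketch of the cleanup (not the actual merger code), assuming
Zuul's own per-repo lock is already held so any existing index.lock must
be left over from a crashed or aborted git process:

  import os

  def remove_leaked_index_lock(repo_path, log):
      lock_path = os.path.join(repo_path, '.git', 'index.lock')
      try:
          os.unlink(lock_path)
          log.warning("Removed leaked git index.lock: %s", lock_path)
      except FileNotFoundError:
          pass  # nothing leaked; the normal case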

Change-Id: I97334383df476809c39e0d03b1af50cb59ee0cc7
2022-11-15 07:03:21 +01:00
James E. Blair 8a8502f661 Fix race in merger shutdown
We can disconnect from ZK while the merger is still running which
can have some adverse effects and cause tests to never exit.

This moves the zk disconnect in the merger to the join method so
that we ensure that we have exited the main loop.

It also adds some improved logging so that not everything just
says "Stopped".

Change-Id: I459af85ac70ecf1f61645466d0eddc63c7e61ff9
2022-11-08 15:12:22 -08:00
James E. Blair e68f2bfdb3 Don't trace merge jobs that we don't lock
We get a trace from every merger (including executors) for every
merge job because we start the trace before attempting the lock.
So essentially, we get one trace from the merger that runs the job,
and one trace from every other merger indicating that it did not
run the job.

This is perhaps too much detail for us.  While it's true that we
can see the response times of every system component here, it may
be sufficient to have only the response time of the first merger.
This will reduce the noise in trace visualizations significantly.
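
A sketch of the intended ordering with the OpenTelemetry API (lock() and
execute() are hypothetical helpers, not Zuul's real functions): the span
is only started once the merge request has actually been locked:

  from opentelemetry import trace

  tracer = trace.get_tracer(__name__)

  def run_merge_job(api, request):
      if not api.lock(request):   # hypothetical lock helper
          return                  # lost the race: no span at all
      with tracer.start_as_current_span('merge-job'):
          api.execute(request)    # hypothetical execute helper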

Change-Id: I88c56f00c060eae9316473f4a4e222a0db97e510
2022-10-05 11:16:18 -07:00
Simon Westphahl f1e3d67608
Trace merge requests and merger operations
The span info for the different merger operations is stored on the
request and will be returned to the scheduler via the result event.

This also adds the request UUID to the "refstat" job so that we can
attach that as a span attribute.

Change-Id: Ib6ac7b5e7032d168f53fe32e28358bd0b87df435
2022-09-19 11:25:49 +02:00
James E. Blair 458ba317fd Add pipeline-based merge op metrics
So that operators can see in aggregate how long merge, files-changes,
and repo-state merge operations take in certain pipelines, add
metrics for the merge operations themselves (these exclude the
overhead of pipeline processing and job dispatching).
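
An illustrative timer around just the merge operation (the statsd key
shown here is made up, not Zuul's documented metric name):

  import time
  import statsd

  client = statsd.StatsClient('localhost', 8125)

  def timed_merge(tenant, pipeline, do_merge):
      start = time.monotonic()
      result = do_merge()
      elapsed_ms = (time.monotonic() - start) * 1000
      # excludes pipeline processing and job dispatch overhead
      client.timing('zuul.tenant.%s.pipeline.%s.merge' % (tenant, pipeline),
                    elapsed_ms)
      return result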

Change-Id: I8a707b8453c7c9559d22c627292741972c47c7d7
2022-07-12 10:25:59 -07:00
James E. Blair 61cb275480 Report which repo failed initial merge ops
When the initial merge job for a queue item fails, users typically
see a message saying "this project or one of its dependencies failed
to merge".  To help users and/or administrators more quickly identify
the problem, include connection project and change information in
a warning message posted to the code review system.

Change-Id: If1bced80b87b908f63867083efb306ebe02ed1ee
2022-02-20 13:06:39 -08:00
James E. Blair a160484a86 Add zuul-scheduler tenant-reconfigure
This is a new reconfiguration command which behaves like full-reconfigure
but only for a single tenant.  This can be useful after connection issues
with code hosting systems, or potentially with Zuul cache bugs.

Because this is the first command-socket command with an argument, some
command-socket infrastructure changes are necessary.  Additionally, this
includes some minor changes to make the services more consistent around
socket commands.
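
A rough sketch of issuing a command that takes an argument over the
command socket (the wire format shown here is an assumption, not
necessarily Zuul's exact protocol):

  import socket

  def send_command(socket_path, command, *args):
      line = ' '.join([command, *args]) + '\n'
      with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
          s.connect(socket_path)
          s.sendall(line.encode('utf8'))

  # e.g. send_command('/var/lib/zuul/scheduler.socket',
  #                   'tenant-reconfigure', 'example-tenant')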

Change-Id: Ib695ab8e7ae54790a0a0e4ac04fdad96d60ee0c9
2022-02-08 14:14:17 -08:00
Clark Boylan 1d4a6e0b71 Add a merger graceful command
This command is an alias for merger stop as merger stop is already a
graceful stop. We add this command to make this more clear and
consistent with the executor.

Change-Id: Iffba56b0127575eaadf31753e2a64dfd95f12fa6
2022-02-07 09:39:44 -08:00
James E. Blair 704fef6cb9 Add readiness/liveness probes to prometheus server
To facilitate automation of rolling restarts, configure the prometheus
server to answer readiness and liveness probes.  We are 'live' if the
process is running, and we are 'ready' if our component state is
either running or paused (not initializing or stopped).

The prometheus_client library doesn't support this directly, so we need
to handle this ourselves.  We could create yet another HTTP server that
each component would need to start, or we could take advantage of the
fact that the prometheus_client is a standard WSGI service and just
wrap it in our own WSGI service that adds the extra endpoints needed.
Since that is far simpler and less resource intensive, that is what
this change does.

The prometheus_client will actually return the metrics on any path
given to it.  In order to reduce the chances of an operator configuring
a liveness probe with a typo (eg '/healthy/ready') and getting the
metrics page served with a 200 response, we restrict the metrics to
only the '/metrics' URI which is what we specified in our documentation,
and also '/' which is very likely accidentally used by users.
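
A simplified sketch of the wrapper (endpoint paths and the component
object are illustrative): serve metrics only on '/' and '/metrics',
always answer liveness, and answer readiness from the component state:

  from prometheus_client import make_wsgi_app

  class MonitoringApp:
      def __init__(self, component):
          self.component = component
          self.metrics_app = make_wsgi_app()

      def __call__(self, environ, start_response):
          path = environ['PATH_INFO']
          if path in ('/', '/metrics'):
              return self.metrics_app(environ, start_response)
          if path == '/health/live':
              start_response('200 OK', [('Content-Type', 'text/plain')])
              return [b'OK']
          if path == '/health/ready':
              if self.component.state in ('running', 'paused'):
                  start_response('200 OK', [('Content-Type', 'text/plain')])
                  return [b'OK']
              start_response('503 Service Unavailable',
                             [('Content-Type', 'text/plain')])
              return [b'not ready']
          start_response('404 Not Found', [('Content-Type', 'text/plain')])
          return [b'not found']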

Change-Id: I154ca4896b69fd52eda655209480a75c8d7dbac3
2021-12-09 07:37:29 -08:00
James E. Blair b7e2e49f7f Use sort_keys with json almost everywhere we write to ZK
For almost any data we write to ZK (except for long-standing nodepool
classes), add sort_keys=True so that we can more easily determine
whether an update is required.

This is in service of zkobject, and is not strictly necessary because
the json module follows dict insertion order, and our serialize methods
are obviously internally consistent (at least, if they're going to produce
the same data, which is all we care about).  But that hasn't always been
true and might not be true in the future, so this is good future-proofing.

Based on a similar thought, the argument is also added to several places
which do not use zkobject but which do write to ZK, in case we perform
a similar check in the future.  This seems like a good habit to use
throughout the code base.
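
A tiny example of why this matters: with sort_keys=True, semantically
equal data serializes to identical bytes, so a byte comparison is enough
to decide whether a ZK write is needed.

  import json

  old = json.dumps({"b": 2, "a": 1}, sort_keys=True).encode('utf8')
  new = json.dumps({"a": 1, "b": 2}, sort_keys=True).encode('utf8')
  assert old == new  # no spurious ZooKeeper update for unchanged data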

Change-Id: Idca67942c057ab0e6b629b50b9b3367ccc0e4ad7
2021-11-12 15:50:02 -08:00
Felix Edel 220534c0f7 Store version information in component registry
This stores the zuul version of each component in the component
registry and updates the API endpoint.

Change-Id: I1855b2a6db2bd330343cad69d9d6cf21ea35a1f5
2021-10-20 17:17:02 +02:00
James E. Blair 6fcde31c9e Try harder to unlock failed build requests
An OpenDev executor lost the ZK connection while trying to start
a build, specifically at the stage of reading the params from ZK.
In this case, it was also unable to unlock the build request
after the initial exception was raised.  The ZK connection
was resumed without losing the lock, which means that the build
request stayed in running+locked, so the cleanup method leaves
it alone.  There is no recovery path from this situation.

To correct this, we will try indefinitely to unlock a build request
after we are no longer working on it.  Further, we will also try
indefinitely to report the result to Zuul.  There is still a narrow
race condition noted inline, but this change should be a substantial
improvement until we can address that.

Also, fix a race that could run merge jobs twice and break their result

There is a race condition in the merger run loop that allows a merge job
to be run twice whereby the second run breaks the result because the job
parameters were deleted during the first run.

This can occur because the merger run loop is operating on cached data.
It could be that a merge request is taken into account because it's
unlocked but was already completed in a previous run.

To avoid running the request a second time, the lock() method now
updates the local request object with the current data from ZooKeeper
and the merger checks the request's state again after locking it.

This change also fixes the executor run loop as it uses the same
methods. Although we've never seen this issue there, it might be
hidden by other circumstances, as the executor API differs in some
aspects from the merger API (e.g. dealing with node requests and node
locking, no synchronous results).
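
A sketch of the lock-then-recheck pattern (names are illustrative, not
the real MergerApi): since the run loop iterates over cached requests,
the state is checked again after locking, using data refreshed from
ZooKeeper:

  def process_requests(api):
      for request in api.cached_requests():
          if not api.lock(request):   # lock() refreshes the request from ZK
              continue
          try:
              if request.state != 'requested':
                  continue            # already completed in an earlier run
              api.run_merge(request)
          finally:
              api.unlock(request)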

Change-Id: I167c0ceb757e50403532ece88a534c4412d11365
Co-Authored-By: Felix Edel <felix.edel@bmw.de>
2021-09-07 09:34:44 -07:00
James E. Blair 6a0b5c419c Several merger cleanups
This change contains several merger-related cleanups which seem
distinct but are intertwined.

* Ensure that the merger API space in ZK is empty at the end of all
  tests.  This assures us that we aren't leaking anything.
* Move some ZK utility functions into the base test class.
* Remove the extra zk_client created in the component registry test
  since we can use the normal ZK client.
* The null result value in the merger server is initialized earlier to
  make sure that it is initialized for use in the exception handler.
* The test_branch_delete_full_reconfiguration leaked a result node
  because one of the cat jobs fails, and later cat jobs are run but
  ignored.

To address the last point, we need to make a change to the cat job
handling.  Currently, when a cat job fails, the exception bubbles up
and we simply ignore all the remaining jobs.  The mergers will run
them, write results to ZK, but no one will see those results.  That
would be fine, except that we created a "waiter" node in ZK to
indicate we want to see those results, and as long as it exists, the
results won't be deleted by the garbage collector, yet we are no
longer waiting for them, so we won't delete them either.

To correct that, we store the merge job request path on the job
future.  Then, when the first cat job fails, we "cancel" all the cat
jobs.  That entails deleting the merge job request if we are able (to
save the mergers from having to do useless work), and regardless of
whether that succeeds, we delete the waiter node in ZK.  If a cat job
happens to be running (and if there's more than one, like in this test
case, it likely is), it will eventually complete and write its result
data.  But since we have removed the waiter node, the periodic cleanup
task will detect it as leaked data and delete it.
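
A sketch of the cancellation idea (the kazoo calls are real; the paths
stored on the job future are illustrative): try to delete the request to
spare the mergers the work, and drop the waiter node regardless so any
leaked result is garbage collected:

  from contextlib import suppress
  from kazoo.exceptions import NoNodeError

  def cancel_cat_jobs(zk_client, pending_futures):
      for future in pending_futures:
          with suppress(NoNodeError):
              zk_client.delete(future.request_path)  # may already be running
          with suppress(NoNodeError):
              zk_client.delete(future.waiter_path)   # stop waiting either way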

Change-Id: I49a459debf5a6c032adc60b66bbd8f6a5901bebe
2021-08-19 15:01:49 -07:00
James E. Blair 9fa3c6ec6e Send merge completed events even in case of error
The scheduler depends on merge completed events in order to advance
the lifecycle of a queue item.  Without them, items can be stuck in
the queue indefinitely.

In the case of certain merge errors, we may not have submitted a
result to the event queue.  This change corrects that.

Change-Id: I9527c79868ede31f1fa68faf93ff113ac786462b
2021-08-19 10:21:21 -07:00
James E. Blair 15b589c1e4 Merger related cleanup
* Include the merge request job uuid in the MergeCompletedEvent so
  that it can be associated with the originating request.
* repr() the MergeCompletedEvent with interesting information so
  the logs are more useful.
* Remove some unused methods from the scheduler that are no longer
  needed since merge complete events are submitted directly from
  the merge server.

Change-Id: I94db0d1cecfdcdb3745151f66b11749cd9850955
2021-08-19 09:58:02 -07:00
James E. Blair e79493c519 Streamline unlocking in merger and builder run loops
To help make the lock/unlock cycle a little easier to follow,
keep the unlock call as close to the lock call as possible
in the merger and executor run loops.

Change-Id: Ia4b86d2d23cf0f5e7102714adcf1be6d28d89d47
2021-08-06 15:40:47 -07:00
James E. Blair a729d6c6e8 Refactor Merger/Executor API
The Merger and executor APIs have a lot in common, but they behave
slightly differently.  A merger needs to sometimes return results.
An executor needs to have separate queues for zones and be able to
pause or cancel jobs.

This refactors them both into a common class which can handle job
state changes (like pause/cancel) and return results if requested.

The MergerApi can subclass this fairly trivially.

The ExecutorApi adds an intermediate layer which uses a
DefaultKeyDict to maintain a distinct queue for every zone and then
transparently dispatches method calls to the queue object for
that zone.

The ZK paths for both are significantly altered in this change.
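
A toy version of the per-zone dispatch (the real change uses a
DefaultKeyDict; this stand-in just creates a queue object per zone on
demand and forwards calls to it):

  class JobRequestQueue:
      def __init__(self, zone):
          self.zone = zone
          self.requests = []

      def submit(self, request):
          self.requests.append(request)

  class ExecutorApi:
      def __init__(self):
          self.zone_queues = {}

      def _queue(self, zone):
          if zone not in self.zone_queues:
              self.zone_queues[zone] = JobRequestQueue(zone)
          return self.zone_queues[zone]

      def submit(self, request, zone=None):
          self._queue(zone).submit(request)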

Change-Id: I3adedcc4ea293e43070ba6ef0fe29e7889a0b502
2021-08-06 15:40:46 -07:00
Felix Edel 8038f9f75c Execute merge jobs via ZooKeeper
This is the second part of I767c0b4c5473b2948487c3ae5bbc612c25a2a24a.
It uses the MergerAPI.

Note: since we no longer have a central gearman server where we can
record all of the merge jobs, some tests now consult the merger api
to get the list of merge jobs which were submitted by that scheduler.
This should generally be equivalent, but something to keep in mind
as we add multiple schedulers.

Change-Id: I1c694bcdc967283f1b1a4821df7700d93240690a
2021-08-06 15:40:41 -07:00
Simon Westphahl bd2aeec5eb Log result payload size of merger jobs
Change-Id: Ifb611c899edbc4978333a4da79248791816586cd
2021-07-21 08:42:08 +02:00
Felix Edel 040f403e7f Improve component registry
This improves the usage of the component registry in various ways:

1. It adds a tree cache to the registry. The cache is eventually
   consistent, which should be sufficient for most use cases like
   calculating stats in the scheduler and getting a list of components
   without the need to ask ZooKeeper every time for the list of
   components.

2. Components can now be used as classes rather than dictionaries, which
   makes using and updating them much easier and nicer.

3. Components can be used without a registry. This makes registering
   components easier and you only need to instantiate a registry when
   you need the registry itself (e.g. in the scheduler).

With that change the registry itself is not used anywhere in the
production code because it's not required at this point. I will add this
in the next commit.

Change-Id: Ia8efba26114119eecffb9a89264083e4b8a80de0
2021-05-17 16:47:13 -07:00
James E. Blair b9a6190a45 Support overlapping repos and a flat workspace scheme
This adds the concept of a 'scheme' to the merger.  Up to this point,
the merger has used the 'golang' scheme in all cases.  However it is
possible with Gerrit to create a set of git repositories which collide
with each other using that scheme:

  root/example.com/component
  root/example.com/component/subcomponent

The users who brought this to our attention intend to use their repos
in a flat layout, like:

  root/component
  root/subcomponent

To resolve this we need to do two things: avoid collisions in all cases
in the internal git repo caches of the mergers and executors, and give
users options to resolve collisions in workspace checkouts.

In this change, mergers are updated to support three schemes:

  * golang (the current behavior)
  * flat (new behavior described above)
  * unique

The unique scheme is not intended to be user-visible.  It produces a
truly unique and non-conflicting name by using urllib.quote_plus.  It
sacrifices legibility in order to obtain uniqueness.

The mergers and executors are updated to use the unique scheme in their
internal repo caches.

A new job attribute, 'workspace-scheme', is added to allow the user to
select between 'golang' and 'flat' when Zuul prepares the repos for
checkout.

There is one more kind of repo that Zuul prepares: the playbook repo.
Each project that supplies a playbook to a job gets a copy of its repo
checked out into a dedicated directory (with no sibling repos).  In that
case there is no risk of collision, and so we retain the current behavior
of using the golang scheme for these checkouts.  This allows the playbook
paths to continue to be self-explanatory.  For example:

  trusted/project_0/example.com/org/project/playbooks/run.yaml

Documentation and a release note are added as well.
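
A sketch of how the three schemes lay out a canonical project name such
as example.com/component/subcomponent (the exact path handling in Zuul
is more involved):

  import os
  import urllib.parse

  def workspace_path(root, canonical_name, scheme):
      hostname, project = canonical_name.split('/', 1)
      if scheme == 'golang':
          return os.path.join(root, hostname, project)
      if scheme == 'flat':
          return os.path.join(root, os.path.basename(project))
      if scheme == 'unique':
          return os.path.join(root, urllib.parse.quote_plus(canonical_name))
      raise ValueError('unknown scheme: %s' % scheme)

  # workspace_path('/ws', 'example.com/component/subcomponent', 'flat')
  #   -> '/ws/subcomponent'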

Change-Id: I3fa1fd3c04626bfb7159aefce0f4dcb10bbaf5d9
2021-04-29 17:56:24 -07:00
James E. Blair d4c7d29360 Clarify merger updates and resets
Several changes in an attempt to clarify exactly when updates and
resets should and do happen:

* Remove the repo_state argument from Merger.getRepo()

It was unclear under what circumstances the low-level repo object
honored repo_state (not much).  Remove it entirely and rely on
high-level Merger methods to deal with repo_state.

* Have merger.setRepoState() operate on one project instead of a
  list of items

Part of the reason we were passing repo_state to low-level
methods was to reset the state for required projects in the
executor.  Essentially there were three cases: projects of change
items, projects of non-change items, and projects of neither but
in required-projects.  The low-level repo_state usage only
handled the last, the first is easy, and the second we handled by
creating a list of non-change items and passing it to
setRepoState on the merger.

A simpler method of handling all of that is to reduce it to two
cases: projects of change items (which need to be merged) and the
rest (which need to be restored).  If we do that, we can maintain
a set of projects we've seen while merging in the first case,
then iterate over all the remaining projects and call
setRepoState on each in the second.

* Remove the update call from Repo.reset()

This lets us call Repo.reset() frequently (i.e., at the start of
any operation that writes to the merger's git repo working dir)
without performing a git fetch.  We need to make sure we call
Repo.update() where necessary.

* Remove the reset call from Merger.updateRepo()

This will now only call repo.update(), and even that will only
happen if the repo_state says we should.  So we can safely call
this before any significant operations and know that it will
update the repo if necessary.

* Add an update() call to getRepoState()

Because we removed the update() call from Repo.reset(), we need
to add one here next to the existing call to reset().

* Add a reset call to getFiles()

It relied on the reset in updateRepo.

* Set execution_context to False on the executor's main merger

The execution_context parameter determines whether we manipulate the
origin remotes to point at the previous commit.  This should be set
for mergers that operate on the build work dir, but it should not
be set for the main merger within the executor (so the main merger
behaves just like a standalone merger).  It was previously erroneously
set for the executor's main merger and this change corrects that.

* Add Merger.updateRepo() calls in the merger server merge method

The merger needs to update and reset each repo before merging changes.
Currently _mergeItem resets the repo the first time it encounters it.
But we still need to update the repo.  We don't want to update within
the merger method because the executor performs batch updates in
parallel before starting a merge and we don't want to re-do that work.
So instead we add it to the merger server invocation, so it's only
used in the merger:merge gearman function code path.

Change-Id: I740e958357dc7bf0a6506474c5991da12ab6264e
2021-04-21 14:53:54 -07:00
James E. Blair f7f689c87d Revert "Revert "Make repo state buildset global""
This reverts commit 02ca9aeb8f.

This makes a couple of changes to make sure we're passing in the
full repo_state to updateRepo rather than the project repo state.

Change-Id: Ifca2cd48f24b9cf8eec718034c879ffe75fb6ecc
2021-04-21 14:53:54 -07:00
Tobias Henkel 02ca9aeb8f
Revert "Make repo state buildset global"
We discovered a regression in the global repo state that can lead to
wrong commits checked out on required projects. Further, a fix for this
needs a slight re-design of the reconfiguration process. In order to
have some more time to do this, revert it for now.

This reverts commit 175990ec42.

Change-Id: Ibcf3758ab886a01468095a8c588cf78db209529e
2021-04-08 16:42:22 +02:00
Zuul ab9e808def Merge "Component Registry in ZooKeeper" 2021-03-13 14:35:06 +00:00
Jan Kubovy 22935c1177 Component Registry in ZooKeeper
This change adds a component registry which can be used by different
components, such as executors, mergers and others to register
themselves, report their state and store arbitrary runtime information.

This is needed, e.g., to monitor components or to share the
"accepting_work" state of executors later on.

Change-Id: I4b7197d6cb399513e30d314f8a5f4f55ad9266f8
2021-03-12 13:51:48 -08:00
Zuul 591f6c40dc Merge "Make repo state buildset global" 2021-03-09 18:22:26 +00:00
Felix Edel 2dfb34a818 Initialize ZooKeeper connection in server rather than in cmd classes
Currently, the ZooKeeper connection is initialized directly in the cmd
classes like zuul.cmd.scheduler or zuul.cmd.merger and then passed to
the server instance.

Although this makes it easy to reuse a single ZooKeeper connection for
multiple components in the tests, it's not very realistic.
A better approach would be to initialize the connection directly in the
server classes so that each component has its own connection to
ZooKeeper.

Those classes already get all necessary parameters, so we could get rid
of the additional "zk_client" parameter.

Furthermore it would allow us to use a dedicated ZooKeeper connection
for each component in the tests which is more realistic than sharing a
single connection between all components.

Change-Id: I12260d43be0897321cf47ef0c722ccd74599d43d
2021-03-08 07:15:32 -08:00
Jonas Sticha 175990ec42
Make repo state buildset global
Store repo state globally for whole buildset
including inherited and required projects.
This is necessary to avoid inconsistencies in case,
e.g., a required projects HEAD changes between two
dependent jobs executions in the same buildset.

Change-Id: I872d4272d8a594b2a40dee0c627f14c990399dd5
2021-03-05 13:28:22 +01:00
Guillaume Chauvel c0d46c2b37 merger cat: remove self._update duplicates
similar to https://review.opendev.org/c/zuul/zuul/+/776842
self._update is called in the try section.

Change-Id: I4d2991aac74b8ae4e5b8dc2c520c716ae9db645f
2021-02-24 17:48:08 +01:00
Guillaume Chauvel 73093e6d4b merger fileschanges: remove self._update duplicates
self._update is called in the try section.

Change-Id: I8347e1fb964f86a99c118452145ea10f776387e7
2021-02-21 23:10:04 +01:00
Jan Kubovy 7ae2805a5a Connect merger to Zookeeper
Part of point 5 in https://etherpad.openstack.org/p/zuulv4

Connection is idle for now.

Also update component documentation.

Change-Id: I97a97f61940fab2a555c3651e78fa7a929e8ebfb
2021-02-15 14:44:18 +01:00
Clark Boylan 0f7982fee0 Clean up stale git index.lock files on merger startup
We've noticed that if zuul executors (and presumably mergers) don't shut
down gracefully, they may leak git index.lock files in the .git dirs
of the merger repos. Since these repos should be dedicated to zuul's use
without outside interference we can reasonably safely remove any present
index.lock files when starting zuul mergers (and executors).

This implementation does an os.walk under the merger repos root looking
for .git dirs and once it has found them checks for any index.lock
files. This happens before starting the gearman worker which should
avoid any races with these resources.
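
The approximate shape of that startup cleanup (not the exact Zuul code):

  import os

  def clean_stale_index_locks(merger_root, log):
      for dirpath, dirnames, filenames in os.walk(merger_root):
          if os.path.basename(dirpath) != '.git':
              continue
          lock_path = os.path.join(dirpath, 'index.lock')
          if os.path.exists(lock_path):
              log.info("Removing stale git index.lock: %s", lock_path)
              os.unlink(lock_path)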

Change-Id: Ie043453bcdf4500a3718da6f705c882431acafdf
2020-09-17 15:19:16 -07:00
Simon Westphahl 48a64cfaa2 Correctly fail cat/fileschanges when update fails
cat and fileschanges jobs were reported as updated even in cases where
the repo update failed. The `Merger.updateRepo()` method will now let the
Exception bubble up so it can be dealt with in the merger server handlers.

The `ExecutorServer._innerUpdateLoop()` already handles exceptions
properly.

Change-Id: If2e44dc0449d427d16d6995b7cae9f4482984f48
2020-07-16 15:35:18 +02:00
Guillaume Chauvel ab08ae3c7a Fix quickstart gating, Add git name and email to executor
When using quickstart tutorial to test gating, a merge commit cannot be
created because git user.name and user.email are not set.

Change-Id: I62df8839e9637c10d3fd656cf6a3cb02cae40af1
Story: 2007603
Task: 39586
2020-05-31 15:01:18 +02:00
James E. Blair 04ac8287b6 Match tag items against containing branches
To try to approach a more intuitive behavior for jobs which apply
to tags but are defined in-repo (or even for centrally defined
jobs which should behave differently on tags from different branches),
look up which branches contain the commit referenced by a tag and
use that list in branch matchers.

If a tag item is enqueued, we look up the branches which contain
the commit referenced by the tag.  If any of those branches match a
branch matcher, the matcher is considered to have matched.

This means that if a release job is defined on multiple branches,
the branch variant from each branch the tagged commit is on will be
used.

A typical case is for a tagged commit to appear in exactly one branch.
In that case, the most intuitive behavior (the version of the job
defined on that branch) occurs.

A less typical but perfectly reasonable case is that there are two
identical branches (ie, stable has just branched from master but not
diverged).  In this case, if an identical commit is merged to both
branches, then both variants of a release job will run.  However, it's
likely that these variants are identical anyway, so the result is
apparently the same as the previous case.  However, if the variants
are defined centrally, then they may differ while the branch contents
are the same, causing unexpected behavior when both variants are
applied.

If two branches have diverged, it will not be possible for the same
commit to be added to both branches, so in that case, only one of
the variants will apply.  However, tags can be created retroactively,
so that even if a branch has diverged, if a commit in the history of
both branches is tagged, then both variants will apply, possibly
producing unexpected behavior.

Considering that the current behavior is to apply all variants of
jobs on tags all the time, the partial reduction of scope in the most
typical circumstances is probably a useful change.
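
A hedged sketch of the branch lookup (the real driver goes through its
own git layer): list the branches containing the commit a tag points at,
then let a branch matcher match if any of those branches matches:

  import subprocess

  def branches_containing(repo_path, commit_sha):
      out = subprocess.run(
          ['git', 'branch', '--format=%(refname:short)',
           '--contains', commit_sha],
          cwd=repo_path, capture_output=True, text=True, check=True)
      return [b for b in out.stdout.splitlines() if b]

  def tag_matches(repo_path, tagged_sha, branch_matcher):
      return any(branch_matcher(b)
                 for b in branches_containing(repo_path, tagged_sha))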

Change-Id: I5734ed8aeab90c1754e27dc792d39690f16ac70c
Co-Authored-By: Tobias Henkel <tobias.henkel@bmw.de>
2020-03-06 13:29:18 -08:00
Tobias Henkel 1d1da5ae50
Centralize merge handling
We have quite some duplicated code to support the merge functions on
the executor and merger. Merge the common functionality into a
BaseMergeServer class that can be used as a base class of MergeServer
and ExecuteServer.

Change-Id: I86d7053a5095baf32fc0da76af639667fb760c33
2020-02-14 13:20:55 +01:00
Tobias Henkel 130708b43c
Support pausing merge jobs
Currently an executor still executes merge jobs even when it's
paused. This is surprising to the user and an operational problem when
an executor misbehaves for some reason. Further, the merger can now
also be paused explicitly.

Change-Id: I7ebf2df9d6648789e6bb2d797edd5b67a0925cfc
2020-02-14 13:20:15 +01:00
Tobias Henkel 5d35195b65
Unify gearman worker handling
We currently have five gearman workers in the system which are all
similar but different.  In preparation for adding a sixth worker,
refactor them to all re-use a central class and the same config and
dispatch mechanism.

Change-Id: Ifbb4c5aec28fe5b044569d365a4e3fe31150eb3b
2019-07-15 10:09:15 +02:00
Tobias Henkel 5f423346aa
Filter out unprotected branches from builds if excluded
When working with GitHub Enterprise the recommended working model is
branch&pull within the same repo. This is especially necessary for
workflows that combine multiple repos in a single workspace. This has
the side effect that those repos can contain a large number of
branches that never will be part of a job. Having many branches in a
repo can have a large impact on executor performance, so exclude
them from the repo state if we exclude them in the tenant config. This
change only affects branches, not tags or other references.

Change-Id: Ic8e75fa8bf76d2e5a0b1779fa3538ee9a5c43411
2019-06-25 20:49:54 +02:00
Tobias Henkel 7639053905
Annotate merger logs with event id
If we have an event, we should also submit its id to the merger so
we're able to trace merge operations via an event id.
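
A minimal illustration with a stock logging.LoggerAdapter (Zuul has its
own helper for this); the formatter can then include the event id on
every merger log line:

  import logging

  def get_event_logger(logger, zuul_event_id):
      return logging.LoggerAdapter(logger, {'event': zuul_event_id})

  log = get_event_logger(logging.getLogger('zuul.Merger'), 'abc123')
  log.info("Updating repo")  # a '%(event)s' formatter field shows the id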

Change-Id: I12b3ab0dcb3ec1d146803006e0ef644e485a7afe
2019-05-17 06:11:04 +02:00
Tobias Henkel e69c9fe97b
Make git clone timeout configurable
When dealing with large repos or slow connections to the SCM, the
default clone timeout of 5 minutes may not be sufficient. Thus, a
configurable clone/fetch timeout makes it possible to handle those
repos.
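
A sketch of a configurable clone timeout using a plain subprocess call
(the merger's own git layer differs): the clone is killed if it exceeds
the configured number of seconds:

  import subprocess

  def clone_with_timeout(url, dest, timeout=300):
      # raises subprocess.TimeoutExpired if the clone takes too long
      subprocess.run(['git', 'clone', url, dest], check=True, timeout=timeout)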

Change-Id: I0711895806b7cbcc8b9fa3ba085bcf79d7fb6665
2019-01-31 11:17:05 +01:00
Zuul 91e7e680a1 Merge "Use gearman client keepalive" 2019-01-28 20:09:30 +00:00
Paul Belanger 47aa6b12b2 Ensure command_socket is last thing to close
This updates all services to match how zuul-scheduler works: we close the
command_socket at the last possible moment. This also means we can now
use the command socket on the filesystem as an indicator that zuul
shut down properly.

Change-Id: I5fe1bc96c87e1177a2b94d73a9cbe505a7807202
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
2019-01-07 10:19:48 -05:00
Tobias Henkel fb4c6402a4
Use gearman client keepalive
If the gearman server vanishes (e.g. due to a VM crash) some clients
like the merger may not notice that it is gone. They just wait forever
for data to be received on an inactive connection. In our case the VM
containing the zuul-scheduler crashed and after the restart of the
scheduler all mergers were waiting for data on the stale connection
which blocked a successful scheduler restart.  Using tcp keepalive we
can detect that situation and let broken inactive connections be
killed by the kernel.
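
For illustration, these are the underlying Linux TCP keepalive knobs the
gear client option turns on (parameter names and values here are
examples, not gear's API):

  import socket

  def enable_keepalive(sock, idle=60, interval=30, count=5):
      sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
      # Linux-specific options: start probing after `idle` seconds of
      # inactivity, probe every `interval` seconds, and let the kernel
      # kill the connection after `count` failed probes.
      sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
      sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
      sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)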

Depends-On: I8589cd45450245a25539c051355b38d16ee9f4b9
Change-Id: I30049d59d873d64f3b69c5587c775827e3545854
2018-12-11 21:28:59 +01:00
James E. Blair 4e70bebafb Map file comment line numbers
After a build finishes, if it returned file comments, the executor
will use the repo in the workspace (if it exists) to map the
supplied line numbers to the original lines in the change (in case
an intervening change has altered the files).

A new facility for reporting warning messages is added, and if the
executor is unable to perform the mapping, or the file comment syntax
is incorrect, a warning is reported.

Change-Id: Iad48168d41df034f575b66976744dbe94ec289bc
2018-08-15 14:38:03 -07:00
Fabien Boucher 194a2bf237 Git driver
This patch improves the existing git driver by adding
a refs watcher thread. This refs watcher looks at
refs added, deleted, or updated and triggers a ref-updated
event.

When a ref is updated and the related commits
from oldrev to newrev include a change to .zuul.yaml/zuul.yaml
or zuul.d/*.yaml, then tenants including that ref are reconfigured.

Furthermore, the patch includes a triggering model. Events are
sent to the scheduler so jobs can be attached to a pipeline and
run.
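
A very rough sketch of such a refs watcher loop (polling interval and
event shape are illustrative): poll the refs, diff against the previous
snapshot, and emit a ref-updated event for anything added, deleted, or
changed:

  import subprocess
  import time

  def poll_refs(url):
      out = subprocess.run(['git', 'ls-remote', url],
                           capture_output=True, text=True, check=True)
      refs = {}
      for line in out.stdout.splitlines():
          sha, ref = line.split('\t', 1)
          refs[ref] = sha
      return refs

  def watch(url, emit, interval=60):
      previous = poll_refs(url)
      while True:
          time.sleep(interval)
          current = poll_refs(url)
          for ref in set(previous) | set(current):
              if previous.get(ref) != current.get(ref):
                  emit({'ref': ref,
                        'oldrev': previous.get(ref),
                        'newrev': current.get(ref)})
          previous = current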

Change-Id: I529660cb20d011f36814abe64f837945dd3f1f33
2017-12-15 14:32:40 +01:00
Paul Belanger 765061143d
Add command socket support to zuul-merger
Like we have in zuul-executor, add command socket support for
zuul-merger.
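
A rough sketch of a command socket listener (the real implementation is
shared infrastructure in Zuul): listen on a UNIX socket and dispatch
simple text commands such as 'stop' or 'graceful':

  import socket

  def command_loop(socket_path, handlers):
      with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as server:
          server.bind(socket_path)
          server.listen(1)
          while True:
              conn, _ = server.accept()
              with conn:
                  command = conn.recv(1024).decode('utf8').strip()
                  handler = handlers.get(command)
                  if handler:
                      handler()
                  if command == 'stop':
                      return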

Change-Id: I66a2cb2ba3f55bdd03e884f47648278e30d2f6ab
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
2017-12-06 16:05:27 -05:00