Since jobs are no longer identified by name but by UUID we also need to
reference job dependencies in the MQTT payload by UUID.
For backward-compatibility we'll keep the old "dependencies" field and
add a new "job_dependencies" mapping with the job names and UUIDs.
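The resulting payload shape can be sketched as follows; this is an illustrative construction, not Zuul's actual reporter code, and the UUID values are placeholders:

```python
# Sketch of the backward-compatible payload: the legacy
# "dependencies" list of job names is kept, and the new
# "job_dependencies" mapping pairs each name with its UUID.
import json

def build_job_payload(job_name, deps):
    """deps: list of (name, uuid) tuples for the job's dependencies."""
    return {
        "job": job_name,
        # Backward-compatible list of dependency job names.
        "dependencies": [name for name, _ in deps],
        # New mapping of dependency job names to their UUIDs.
        "job_dependencies": {name: uuid for name, uuid in deps},
    }

payload = build_job_payload(
    "deploy", [("build", "3f2a-uuid"), ("test", "9c1b-uuid")])
print(json.dumps(payload, indent=2))
```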
Change-Id: Ib74b11faf72602e1708ea6364cc4a1000e3f0d3b
The characters '+' and '#' have a special meaning (wildcards) and are
not allowed when publishing messages.
ERROR zuul.MQTTConnection: Could not publish message to topic 'foobar/zuul/c++-test' via mqtt
Traceback (most recent call last):
  File "/opt/zuul/lib/python3.11/site-packages/zuul/driver/mqtt/mqttconnection.py", line 97, in publish
    self.client.publish(topic, payload=json.dumps(message), qos=qos)
  File "/opt/zuul/lib/python3.11/site-packages/paho/mqtt/client.py", line 1233, in publish
    raise ValueError('Publish topic cannot contain wildcards.')
ValueError: Publish topic cannot contain wildcards.
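A minimal sketch of stripping the MQTT wildcard characters from a topic before publishing; the replacement character is an assumption, not necessarily what Zuul chose:

```python
# '+' (single-level) and '#' (multi-level) are subscription
# wildcards in MQTT and are rejected by paho-mqtt on publish,
# so they must be removed or replaced first.
def sanitize_topic(topic, replacement="_"):
    for wildcard in ("+", "#"):
        topic = topic.replace(wildcard, replacement)
    return topic

sanitized = sanitize_topic("foobar/zuul/c++-test")
```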
Change-Id: Iad2ad551151284910de076cec15b3ac6b1cbda52
This change completes the circular dependency refactor.
The principal change is that queue items may now include
more than one change simultaneously in the case of circular
dependencies.
In dependent pipelines, the two-phase reporting process is
simplified because it happens during processing of a single
item.
In independent pipelines, non-live items are still used for
linear dependencies, but multi-change items are used for
circular dependencies.
Previously changes were enqueued recursively and then
bundles were made out of the resulting items. Since we now
need to enqueue entire cycles in one queue item, the
dependency graph generation is performed at the start of
enqueueing the first change in a cycle.
Some tests exercise situations where Zuul is processing
events for old patchsets of changes. The new change query
sequence mentioned in the previous paragraph necessitates
more accurate information about out-of-date patchsets than
the previous sequence; therefore, the Gerrit driver has been
updated to query and return more data about non-current
patchsets.
This change is not backwards compatible with the existing
ZK schema, and will require Zuul systems to delete all pipeline
states during the upgrade. A later change will implement
a helper command for this.
All backwards compatibility handling for the last several
model_api versions which were added to prepare for this
upgrade has been removed. In general, all model data
structures involving frozen jobs are now indexed by the
frozen job's uuid and no longer include the job name since
a job name no longer uniquely identifies a job in a buildset
(either the uuid or the (job name, change) tuple must be
used to identify it).
Job deduplication is simplified and now only needs to
consider jobs within the same buildset.
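The indexing change described above can be illustrated with a simplified stand-in (these are not Zuul's actual classes): builds are keyed by the frozen job's UUID, so two jobs with the same name but different changes can coexist in one buildset.

```python
# Illustrative sketch: a job name alone no longer uniquely
# identifies a build; the frozen job uuid does.
class Buildset:
    def __init__(self):
        self.builds = {}  # frozen job uuid -> build

    def add_build(self, job_uuid, build):
        self.builds[job_uuid] = build

    def find_builds(self, job_name):
        # A name may match several builds (one per change/ref).
        return [b for b in self.builds.values()
                if b["job_name"] == job_name]

bs = Buildset()
bs.add_build("uuid-1", {"job_name": "unit-tests", "change": 1001})
bs.add_build("uuid-2", {"job_name": "unit-tests", "change": 1002})
```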
The fake github driver had a bug (fakegithub.py line 694) where
it did not correctly increment the check run counter, so our
tests that verified that we closed out obsolete check runs
when re-enqueueing were not valid. This has been corrected, and
in doing so, has necessitated some changes around quiet dequeuing
when we re-enqueue a change.
The reporting in several drivers has been updated to support
reporting information about multiple changes in a queue item.
Change-Id: I0b9e4d3f9936b1e66a08142fc36866269dc287f1
Depends-On: https://review.opendev.org/907627
Currently, the log_url for retried builds in the MQTT payload always
points to the build result page in Zuul web. As MQTT is meant to be
consumed by machines, this breaks e.g. log post-processing for those
builds.
To fix this, we do the same as for non-retried builds and provide a
dedicated web_url and log_url [1].
[1]: https://review.opendev.org/c/zuul/zuul/+/703983
Change-Id: I139a80d616d59e262a4f21772d7712fda3b5c03b
This is part of the circular dependency refactor.
This updates the buildset object in memory (and zk) to store builds
indexed by frozen job uuid rather than job name. This also updates
several related fields and temporary dictionaries to do the same.
This will allow us, in the future, to have more than one job/build
in a buildset with the same name (for different changes/refs).
Change-Id: I70865ec8d70fb9105633f0d03ba7c7e3e6cd147d
When a build is paused or resumed, we now store this information on the
build together with the event time. Instead of additional attributes for
each timestamp, we add an "event" list attribute to the build which can
also be used for other events in the future.
The events are stored in the SQL database and added to the MQTT payload
so the information can be used by the zuul-web UI (e.g. in the "build
times" gantt chart) or provided to external services.
Change-Id: I789b4f69faf96e3b8fd090a2e389df3bb9efd602
This adds an option to include result data from a job in the MQTT
reporter. It is off by default since it may be quite large for
some jobs.
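The opt-in behaviour can be sketched as follows; the field and option names here are illustrative, not necessarily Zuul's exact schema:

```python
# Result data is only attached to the payload when the reporter
# is explicitly configured to include it, since it may be large.
def build_payload(build, include_returned_data=False):
    payload = {"job": build["job"], "result": build["result"]}
    if include_returned_data:
        payload["returned_data"] = build.get("returned_data", {})
    return payload

build = {"job": "unit-tests", "result": "SUCCESS",
         "returned_data": {"artifact_count": 3}}
```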
Change-Id: I802adee834b60256abd054eda2db834f8db82650
Previously support for Gerrit's submitWholeTopic feature was added
so that when it is enabled, changes that are submitted together are
treated as circular dependencies in Zuul. However, this feature did
not work in a gating pipeline because when that setting is enabled,
Gerrit requires all changes to be mergeable at once so that it can
attempt to atomically merge all of them. That means that every
change would need a Verified+2 vote before any change can be
submitted. Zuul leaves the vote and submits each change one at a
time.
(Note, this does not affect the emulated submitWholeTopic feature
in Zuul, since in that case, Gerrit will merge each change alone.)
To correct this, a two-phase option is added to reporters. In phase1,
reporters will report all information about the change but not submit.
In phase2, they will only submit. In the cases where we are about
to submit a successful bundle, we enable the two-phase option and
report the entire bundle without submitting first, then proceed to
submit each change in the bundle in sequence as normal. In Gerrit,
if submitWholeTopic is enabled, this will cause all changes to be
submitted as soon as the first one is, but that's okay because we
handle the case where we try to submit a change and it is already
submitted.
The fake Gerrit used in the Zuul unit tests is updated to match
the real Gerrit in these cases. If submitWholeTopic is enabled,
it will return a 409 to submit requests if the whole topic is not
able to be submitted.
One unit test of failed bundle reporting is adjusted since we will
now report the buildset result to all changes before potentially
reporting a second time if the bundle later fails to merge.
While this does mean that some changes will have extra report
messages, it actually makes the behavior consistent (before, some
changes would have 2 reports and some would have only 1; now all
changes will have 2 reports: the expected result and then a second
report of the unexpected merge failure).
All reporters are updated to handle the two-phase reporting. Since
all reporters have different API methods to leave comments and merge
changes, this won't actually cause any extra API calls even for
reporters which don't need two-phase reporting.
Non-merging reporters (MQTT, SMTP, etc) simply ignore phase2.
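The two-phase flow for a successful bundle can be sketched like this; the method names and flags are assumptions for illustration, not Zuul's actual reporter API:

```python
# Phase 1 reports comments/votes for every change without
# submitting; phase 2 then submits each change in sequence.
class Reporter:
    def __init__(self, can_merge):
        self.can_merge = can_merge
        self.log = []

    def report(self, change, phase1=True, phase2=True):
        if phase1:
            self.log.append(("comment", change))  # no submit yet
        if phase2 and self.can_merge:
            self.log.append(("submit", change))   # submit only

def report_bundle(reporter, changes):
    for change in changes:                       # phase 1
        reporter.report(change, phase1=True, phase2=False)
    for change in changes:                       # phase 2
        reporter.report(change, phase1=False, phase2=True)

r = Reporter(can_merge=True)
report_bundle(r, ["A", "B"])
```

A non-merging reporter (can_merge=False) records only the phase-1 comments, matching the "simply ignore phase2" behaviour above.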
Change-Id: Ibf377ab5b7141fe60ecfd5cbbb296bb4f9c24265
Since the MQTT reporter can be used to emit start or enqueue events,
it may be useful to match enqueue and dequeue events. That could
be done with tenant+pipeline+change+patchset, but we also have a
UUID for queue items, so to make it simpler for MQTT consumers,
let's expose that.
Change-Id: Iff88bcfd73e00f292e0cc947f548582a276a7975
The MQTT reporter now includes artifacts for completed builds. Systems
which watch for MQTT events can now directly consume those artifacts without
the intermediate step of looking them up via the API.
Change-Id: I9df9e1dfd6854518c110dd65d4f89dea449c6fc0
Currently the mqtt reporter uses the report url as log_url. This is
fine as long as report-build-page is disabled. As soon as
report-build-page is enabled on a tenant it reports the url to the
result page of the build. As MQTT is meant to be consumed by machines,
this breaks e.g. log post-processing.
Fix this by reporting the real log URL as log_url and adding the field
web_url for use cases where the human-facing URL is really required.
This also fixes a wrong indentation in the MQTT driver documentation,
which resulted in all buildset.builds.* attributes being listed as
buildset.* attributes.
Change-Id: I91ce93a7000ddd0d70ce504b70742262d8239a8f
If a build must be retried, the previous build information is lost.
To be informed that a build was retried, the retried builds are now
part of the MQTT message.
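A hypothetical payload sketch of this: retried attempts are carried alongside the final builds instead of being dropped. The field names are assumptions, not necessarily the exact MQTT schema:

```python
# Earlier, retried attempts of a job are kept in a separate list
# next to the final builds of the buildset.
def buildset_payload(final_builds, retry_builds):
    return {
        "builds": final_builds,        # final attempt of each job
        "retry_builds": retry_builds,  # previously lost attempts
    }

payload = buildset_payload(
    [{"job": "unit-tests", "result": "SUCCESS"}],
    [{"job": "unit-tests", "result": "RETRY"}],
)
```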
Change-Id: I8c93376f844c3d1c55c89a250384a7f835763677
Depends-On: https://review.opendev.org/704983
When trying to trace logs of builds it is often useful to search for log
messages via the event id of a specific build.
The event id is printed in (nearly) all log messages but is not provided
by the MQTT reporter, so one has to look it up first based on the build
id. To circumvent this extra step and make searching the logs more
straightforward, this patch makes sure the event id is provided in the
JSON message by the MQTT reporter.
Change-Id: I908dd7eca250825eed97bf8261fd33b69cc5f543
Sometimes, e.g. during reconfiguration, it can take quite some time
between the trigger event and when a change is enqueued.
This change allows tracking the time it takes from receiving the event
until it is processed by the scheduler.
Change-Id: I347acf56bc8d7671d96f6be444c71902563684be
We are using the MQTT reporter to create metrics about our job runtimes,
queueing times, etc. Knowing whether a job has a dependency is
valuable information for those calculations that was missing so far.
Change-Id: I4133487f98d458be65495b71c271316360dd982b
Reporting the execute_time helps to distinguish between the preparation
phase and the actual execution phase.
Change-Id: Ib578efb39cede37996c3516e1a9a251d6ed8a4b0
Having the enqueue and report timestamps allows for better reporting
regarding the lifecycle of a change.
Change-Id: Ia3915ea35853f007181d5660c361417175c29507
We had cases where Zuul used unmerged job definitions of a trusted
parent job (change A) in unrelated downstream jobs (change B) that
contained no zuul.yaml changes. This happened when the trusted parent
job was not defined in the same config repo as the pipeline. E.g. if
change A adds a new post playbook, an unrelated change B fails with
'post playbook not found'. This is caused by the scheduler using the
wrong, unmerged job definition of change A, while the final workspace
contains the correct state without change A.
In the case of change B there is no dynamic layout, so the currently
active layout should be taken. However, it is taken directly from the
pipeline object in getLayout (item.queue.pipeline.layout), which does
not reference the correct layout at any time, while the layout
referenced by the tenant object is correct.
Because the pipeline definition is in a different repository than the
proposed config repo change, when the dynamic layout is created for
the config repo change, the previously cached Pipeline objects are used
to build the layout. These objects are the actual live pipelines, and
when they are added to the layout, they have their Pipeline.layout
attributes set to the dynamic layout. This dynamic layout is then not
used further (it is only created for syntax validation), but the pipelines
remain altered.
We could go ahead and just change that to
item.queue.pipeline.layout.tenant.layout, but this feels awkward and
would leave the possibility of similar bugs that are hard to find and
debug. Further, pipeline.layout is almost everywhere used just to get
the tenant and not the layout. So this fix goes further and completely
rips the layout out of the Pipeline object, replacing it with the
tenant. Because the tenant object is never expected to change during
the lifetime of the pipeline object, holding a reference to the
tenant, rather than the layout, is safe.
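The before/after access pattern can be sketched with simplified stand-in classes (attribute names follow the commit text; everything else is illustrative):

```python
# The pipeline now holds the tenant, which is stable for the
# pipeline's lifetime, instead of a layout reference that could
# point at a stale dynamic layout.
class Tenant:
    def __init__(self, name):
        self.name = name
        self.layout = None  # always the currently active layout

class Pipeline:
    def __init__(self, tenant):
        self.tenant = tenant

tenant = Tenant("example")
pipeline = Pipeline(tenant)
# Old access: item.queue.pipeline.layout          (could be stale)
# New access: item.queue.pipeline.tenant.layout   (always current)
```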
Change-Id: I1e663f624db5e30a8f51b56134c37cc6e8217029