Commit Graph

92 Commits

Author SHA1 Message Date
Zuul 617bbb229c Merge "Fix validate-tenants isolation" 2024-02-28 02:46:55 +00:00
James E. Blair c531adacae Add --keep-config-cache option to delete-state command
The circular dependency refactor will require deleting all of the
pipeline states as well as the event queues from ZK while zuul
is offline during the upgrade.  This is fairly close to the existing
"delete-state" command, except that we can keep the config cache.
Doing so will allow for a faster recovery since we won't need to
issue all of the cat jobs again in order to fetch file contents.

To facilitate this, we add a "--keep-config-cache" argument to
the "delete-state" command which will then remove everything under
/zuul except /zuul/config.
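
For example (invocation shown for illustration):

 zuul-admin delete-state --keep-config-cache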

Also, speed up both operations by implementing a fast recursive
delete method which sends async delete ops depth first and only
checks their results at the end (as opposed to the standard kazoo
delete which checks each operation one at a time).
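
A minimal sketch of the idea against kazoo's async API (illustrative only,
not the actual implementation):

    from kazoo.client import KazooClient

    def fast_recursive_delete(client: KazooClient, path):
        results = []

        def _queue(node):
            # Depth first: queue deletes for all children before the parent.
            for child in client.get_children(node):
                _queue(node + "/" + child)
            # Send the delete without waiting for the server's reply.
            results.append(client.delete_async(node))

        _queue(path)
        # Only now check the outcome of every queued operation.
        for result in results:
            result.get()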

This is added without a release note since it's not widely useful
and the upcoming change which requires its use will have a release
note with usage instructions.

Change-Id: I4db43e00a73f5e5b796261ffe7236ed906e6b421
2024-02-02 12:09:52 -08:00
James E. Blair fb7d24b245 Fix validate-tenants isolation
The validate-tenants scheduler subcommand is supposed to perform
complete tenant validation, and in doing so, it interacts with zk.
It is supposed to isolate itself from the production data, but
it appears to accidentally use the same unparsed config cache
as the production system.  This is mostly okay, but if the loading
paths are different, it could lead to writing cache errors into
the production file cache.

The error is caused because the ConfigLoader creates an internal
reference to the unparsed config cache and therefore ignores the
temporary/isolated unparsed config cache created by the scheduler.

To correct this, we will always pass the unparsed config cache
into the configloader.
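
Illustrative sketch of the dependency change (class and argument names are
simplified, not the exact signatures):

    class ConfigLoader:
        def __init__(self, connections, zk_client, unparsed_config_cache):
            # Use whatever cache the caller supplies (the production cache
            # for the scheduler, or an isolated one for validate-tenants)
            # instead of reaching back into the scheduler's copy.
            self.connections = connections
            self.zk_client = zk_client
            self.unparsed_config_cache = unparsed_config_cache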

Change-Id: I40bdbef4b767e19e99f58cbb3aa690bcb840fcd7
2024-01-31 14:58:45 -08:00
James E. Blair ebb7986c6f Client (old): don't translate null to 0000000
Like I9886cd44f8b4bae6f4a5ce3644f0598a73ecfe0a, have the zuul client
send actual null values for oldrev/newrev instead of 0000000 which
could lead to unintended behavior.

Change-Id: I44994426493d05a039b5a1051504958b36729c9d
2024-01-12 06:49:17 -08:00
Simon Westphahl cc2ff9742c Fix delete-pipeline-state command
This change also extends the test to assert that the pipeline change
list was re-created by verifying that the node exists in ZooKeeper.

Traceback (most recent call last):
  File "/home/westphahl/src/opendev/zuul/zuul/.nox/tests/bin/zuul-admin", line 10, in <module>
    sys.exit(main())
  File "/home/westphahl/src/opendev/zuul/zuul/zuul/cmd/client.py", line 1066, in main
    Client().main()
  File "/home/westphahl/src/opendev/zuul/zuul/zuul/cmd/client.py", line 592, in main
    if self.args.func():
  File "/home/westphahl/src/opendev/zuul/zuul/zuul/cmd/client.py", line 1045, in delete_pipeline_state
    PipelineChangeList.new(context)
  File "/home/westphahl/src/opendev/zuul/zuul/zuul/zk/zkobject.py", line 225, in new
    obj._save(context, data, create=True)
  File "/home/westphahl/src/opendev/zuul/zuul/zuul/zk/zkobject.py", line 507, in _save
    path = self.getPath()
  File "/home/westphahl/src/opendev/zuul/zuul/zuul/model.py", line 982, in getPath
    return self.getChangeListPath(self.pipeline)
AttributeError: 'PipelineChangeList' object has no attribute 'pipeline'

Change-Id: I8d7bf2fdb3ebf4790ca9cf15519dff4b761fbf2e
2023-04-26 15:58:32 +02:00
Zuul 987fba9f28 Merge "Fix prune-database command" 2023-03-30 01:49:54 +00:00
James E. Blair 7153505cd5 Fix prune-database command
This command had two problems:

* It would only delete the first 50 buildsets
* Depending on DB configuration, it may not have deleted anything,
  or it may have left orphan data.

We did not tell sqlalchemy to cascade delete operations, meaning that
when we deleted the buildset, we didn't delete anything else.

If the database enforces foreign keys (innodb, psql) then the command
would have failed.  If it doesn't (myisam) then it would have deleted
the buildset rows but not anything else.
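
A minimal sketch of the kind of ORM-level cascade configuration involved,
using illustrative model and table names rather than the actual schema:

    from sqlalchemy import Column, ForeignKey, Integer
    from sqlalchemy.orm import declarative_base, relationship

    Base = declarative_base()

    class BuildSetModel(Base):
        __tablename__ = 'example_buildset'
        id = Column(Integer, primary_key=True)
        # Deleting a buildset now also deletes its builds via the ORM.
        builds = relationship("BuildModel", cascade="all, delete-orphan",
                              back_populates="buildset")

    class BuildModel(Base):
        __tablename__ = 'example_build'
        id = Column(Integer, primary_key=True)
        buildset_id = Column(Integer, ForeignKey('example_buildset.id'))
        buildset = relationship("BuildSetModel", back_populates="builds")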

The tests use myisam, so they ran without error and without deleting
the builds.  They check that the builds are deleted, but only through
the ORM via a joined load with the buildsets, and since the buildsets
are gone, the builds weren't returned.

To address this shortcoming, the tests now use distinct ORM methods
which return objects without any joins.  This would have caught
the error had it been in place before.

Additionally, the delete operation retained the default limit of 50
rows (set in place for the web UI), meaning that when it did run,
it would only delete the most recent 50 matching builds.

We now explicitly set the limit to a user-configurable batch size
(by default, 10,000 builds) so that we keep transaction sizes
manageable and avoid monopolizing database locks.  We continue deleting
buildsets in batches as long as any matching buildsets remain. This
should allow users to remove very large amounts of data without
affecting ongoing operations too much.
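
A rough sketch of the batched loop, with hypothetical helper names:

    def prune(db, cutoff, batch_size=10000):
        # Delete one batch per transaction until no matching buildsets remain.
        while True:
            buildsets = db.getOldBuildsets(before=cutoff, limit=batch_size)
            if not buildsets:
                break
            for buildset in buildsets:
                db.delete(buildset)
            db.commit()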

Change-Id: I4c678b294eeda25589b75ab1ce7c5d0b93a07df3
2023-03-29 17:12:13 -07:00
James E. Blair b1490b1d8e Avoid layout updates after delete-pipeline-state
The delete-pipeline-state command forces a layout update on every
scheduler, but that isn't strictly necessary.  While it may be helpful
for some issues, if it really is necessary, the operator can issue
a tenant reconfiguration after performing the delete-pipeline-state.

In most cases, where only the state information itself is causing a
problem, we can omit the layout updates and assume that the state reset
alone is sufficient.

To that end, this change removes the layout state changes from the
delete-pipeline-state command and instead simply empties and recreates
the pipeline state and change list objects.  This is very similar to
what happens in the pipeline manager _postConfig call, except in this
case, we have the tenant lock so we know we can write with impunity,
and we know we are creating objects in ZK from scratch, so we use
direct create calls.

We set the pipeline state's layout uuid to None, which will cause the
first scheduler that comes across it to (assuming its internal layout
is up to date) perform a pipeline reset (which is almost a noop on an
empty pipeline) and update the pipeline state layout to the current
tenant layout state.

Change-Id: I1c503280b516ffa7bbe4cf456d9c900b500e16b0
2023-03-01 13:54:46 -08:00
James E. Blair 7a8882c642 Set layout state event ltime in delete-pipeline-state
The delete-pipeline-state command updates the layout state in order
to force schedulers to update their local layout (essentially perform
a local-only reconfiguration).  In doing so, it sets the last event
ltime to -1.  This is reasonable for initializing a new system, but
in an existing system, when an event arrives at the tenant trigger
event queue it is assigned the last reconfiguration event ltime seen
by that trigger event queue.  Later, when a scheduler processes such
a trigger event after the delete-pipeline-state command has run, it
will refuse to handle the event since it arrived much later than
its local layout state.

This must then be corrected manually by the operator by forcing a
tenant reconfiguration.  This means that the system essentially suffers
the delay of two sequential reconfigurations before it can proceed.

To correct this, set the last event ltime for the layout state to
the ltime of the layout state itself.  This means that once a scheduler
has updated its local layout, it can proceed in processing old events.
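
Roughly, the change amounts to the following (attribute names are assumptions
based on the description above):

    def stamp_layout_state(layout_state):
        # Use the layout state's own ltime rather than -1 so schedulers can
        # resume processing older trigger events once they have caught up.
        layout_state.last_reconfigure_event_ltime = layout_state.ltime
        return layout_state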

Change-Id: I66e798adbbdd55ff1beb1ecee39c7f5a5351fc4b
2023-02-28 07:11:41 -08:00
James E. Blair 3780ed548c Unpin JWT and use integer IAT values
PyJWT 2.6.0 began performing validation of iat (issued at) claims
in 9cb9401cc5

I believe the intent of RFC7519 is to support any numeric values
(including floating point) for iat, nbf, and exp; however, the
PyJWT library has made the assumption that the values should be
integers, and therefore when we supply an iat with decimal seconds,
PyJWT will round down when validating the value. In our unit tests,
this can cause validation errors.

In order to avoid any issues, we will round down the times that
we supply when generating JWT tokens and supply them as integers
in accordance with the robustness principle.
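
For example, a token built with whole-second claims along these lines
validates cleanly (claims and secret are illustrative):

    import time
    import jwt

    now = int(time.time())  # round down to whole seconds
    token = jwt.encode({"iat": now, "exp": now + 600, "sub": "example"},
                       "example-secret", algorithm="HS256")
    jwt.decode(token, "example-secret", algorithms=["HS256"])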

Change-Id: Ia8341b4d5de827e2df8878f11f2d1f52a1243cd4
2022-11-15 13:52:53 -08:00
James E. Blair 3a981b89a8 Parallelize some pipeline refresh ops
We may be able to speed up pipeline refreshes in cases where there
are large numbers of items or jobs/builds by parallelizing ZK reads.

Quick refresher: the ZK protocol is async, and kazoo uses a queue to
send operations to a single thread which manages IO.  We typically
call synchronous kazoo client methods which wait for the async result
before returning.  Since this is all thread-safe, we can attempt to
fill the kazoo pipe by having multiple threads call the synchronous
kazoo methods.  If kazoo is waiting on IO for an earlier call, it
will be able to start a later request simultaneously.

Quick aside: it would be difficult for us to use the async methods
directly since our overall code structure is still ordered and
effectively single threaded (we need to load a QueueItem before we
can load the BuildSet and the Builds, etc).

Thus it makes the most sense for us to retain our ordering by using
a ThreadPoolExecutor to run some operations in parallel.
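
A rough sketch of the pattern, assuming a kazoo client and an illustrative
path layout (not the actual refresh code):

    from concurrent.futures import ThreadPoolExecutor
    from kazoo.client import KazooClient

    def read_children(client: KazooClient, parent):
        paths = [parent + "/" + c for c in client.get_children(parent)]
        # Several threads issue synchronous reads; kazoo's single IO thread
        # can then keep multiple requests in flight on the wire at once.
        with ThreadPoolExecutor(max_workers=4) as pool:
            return list(pool.map(client.get, paths))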

This change parallelizes loading QueueItems within a ChangeQueue,
and also Builds/Jobs within a BuildSet.  These are the points in
a pipeline refresh tree which potentially have the largest number
of children and could benefit the most from the change, especially
if the ZK server has some measurable latency.

Change-Id: I0871cc05a2d13e4ddc4ac284bd67e5e3003200ad
2022-11-09 10:51:29 -08:00
James E. Blair 9a279725f9 Strictly sequence reconfiguration events
In the before times when we only had a single scheduler, it was
naturally the case that reconfiguration events were processed as they
were encountered and no trigger events which arrived after them would
be processed until the reconfiguration was complete.  As we added more
event queues to support SOS, it became possible for trigger events
which arrived at the scheduler to be processed before a tenant
reconfiguration caused by a preceding event to be complete.  This is
now even possible with a single scheduler.

As a concrete example, imagine a change merges which updates the jobs
which should run on a tag, and then a tag is created.  A scheduler
will process both of those events in succession.  The first will cause
it to submit a tenant reconfiguration event, and then forward the
trigger event to any matching pipelines.  The second event will also
be forwarded to pipeline event queues.  The pipeline events will then
be processed, and then only at that point will the scheduler return to
the start of the run loop and process the reconfiguration event.

To correct this, we can take one of two approaches: make the
reconfiguration more synchronous, or make it safer to be
asynchronous.  To make reconfiguration more synchronous, we would need
to be able to upgrade a tenant read lock into a tenant write lock
without releasing it.  The lock recipes we use from kazoo do not
support this.  While it would be possible to extend them to do so, it
would lead us further from parity with the upstream kazoo recipes, so
this approach is not used.

Instead, we will make it safer for reconfiguration to be asynchronous
by annotating every trigger event we forward with the last
reconfiguration event that was seen before it.  This means that every
trigger event now specifies the minimum reconfiguration time for that
event.  If our local scheduler has not reached that time, we should
stop processing trigger events and wait for it to catch up.  This
means that schedulers may continue to process events up to the point
of a reconfiguration, but will then stop.  The already existing
short-circuit to abort processing once a scheduler is ready to
reconfigure a tenant (where we check the tenant write lock contenders
for a waiting reconfiguration) helps us get out of the way of pending
reconfigurations as well.  In short, once a reconfiguration is ready
to start, we won't start processing tenant events anymore because of
the existing lock check.  And up until that happens, we will process
as many events as possible until any further events require the
reconfiguration.

We will use the ltime of the tenant trigger event as our timestamp.
As we forward tenant trigger events to the pipeline trigger event
queues, we decide whether an event should cause a reconfiguration.
Whenever one does, we note the ltime of that event and store it as
metadata on the tenant trigger event queue so that we always know what
the most recent required minimum ltime is (ie, the ltime of the most
recently seen event that should cause a reconfiguration).  Every event
that we forward to the pipeline trigger queue will be annotated to
specify that its minimum required reconfiguration ltime is that most
recently seen ltime.  And each time we reconfigure a tenant, we store
the ltime of the event that prompted the reconfiguration in the layout
state.  If we later process a pipeline trigger event with a minimum
required reconfigure ltime greater than the current one, we know we
need to stop and wait for a reconfiguration, so we abort early.
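
Schematically (attribute and helper names are illustrative, following the
description above):

    def forward_trigger_event(event, trigger_queue, pipeline_queue,
                              causes_reconfiguration):
        if causes_reconfiguration:
            # Remember the ltime of the most recent reconfiguring event.
            trigger_queue.last_reconfigure_ltime = event.zuul_event_ltime
        # Every forwarded event carries the minimum layout ltime it requires.
        event.min_reconfigure_ltime = trigger_queue.last_reconfigure_ltime
        pipeline_queue.put(event)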

Because this system involves several event queues and objects each of
which may be serialized at any point during a rolling upgrade, every
involved object needs to have appropriate default value handling, and
a synchronized model api change is not helpful.  The remainder of this
commit message is a description of what happens with each object when
handled by either an old or new scheduler component during a rolling
upgrade.

When forwarding a trigger event and submitting a tenant
reconfiguration event:

The tenant trigger event zuul_event_ltime is initialized
from zk, so will always have a value.

The pipeline management event trigger_event_ltime is initialized to the
tenant trigger event zuul_event_ltime, so a new scheduler will write
out the value.  If an old scheduler creates the tenant reconfiguration
event, it will be missing the trigger_event_ltime.

The _reconfigureTenant method is called with a
last_reconfigure_event_ltime parameter, which is either the
trigger_event_ltime above in the case of a tenant reconfiguration
event forwarded by a new scheduler, or -1 in all other cases
(including other types of reconfiguration, or a tenant reconfiguration
event forwarded by an old scheduler).  If it is -1, it will use the
current ltime so that if we process an event from an old scheduler
which is missing the event ltime, or we are bootstrapping a tenant or
otherwise reconfiguring in a context where we don't have a triggering
event ltime, we will use an ltime which is very new so that we don't
defer processing trigger events.  We also ensure we never go backward,
so that if we process an event from an old scheduler (and thus use the
current ltime) then process an event from a new scheduler with an
older (than "now") ltime, we retain the newer ltime.

Each time a tenant reconfiguration event is submitted, the ltime of
that reconfiguration event is stored on the trigger event queue.  This
is then used as the min_reconfigure_ltime attribute on the forwarded
trigger events.  This is updated by new schedulers, and ignored by old
ones, so if an old scheduler processes a tenant trigger event queue it
won't update the min ltime.  That will just mean that any events
processed by a new scheduler may continue to use an older ltime as
their minimum, which should not cause a problem.  Any events forwarded
by an old scheduler will omit the min_reconfigure_ltime field; that
field will be initialized to -1 when loaded on a new scheduler.

When processing pipeline trigger events:

In process_pipeline_trigger_queue we compare two values: the
last_reconfigure_event_ltime on the layout state which is either set
to a value as above (by a new scheduler), or will be -1 if it was last
written by an old scheduler (including in the case it was overwritten
by an old scheduler; it will re-initialize to -1 in that case).  The
event.min_reconfigure_ltime field will either be the most recent
reconfiguration ltime seen by a new scheduler forwarding trigger
events, or -1 otherwise.  If the min_reconfigure_ltime of an event is
-1, we retain the old behavior of processing the event regardless.
Only if we have a min_reconfigure_ltime > -1 and it is greater than
the layout state last_reconfigure_event_ltime (which itself may be -1,
and thus less than the min_reconfigure_ltime) do we abort processing
the event.
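
The resulting check is roughly (attribute names as described above, not the
literal code):

    def should_defer(event, layout_state):
        # -1 means the value came from an old scheduler: keep the old behavior.
        if event.min_reconfigure_ltime == -1:
            return False
        # Defer only if the event needs a newer layout than we currently have.
        return (event.min_reconfigure_ltime >
                layout_state.last_reconfigure_event_ltime)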

(The test_config_update test for the Gerrit checks plugin is updated
to include an extra waitUntilSettled since a potential test race was
observed during development.)

Change-Id: Icb6a7858591ab867e7006c7c80bfffeb582b28ee
2022-07-18 10:51:59 -07:00
Vitaliy Lotorev ab68665f12 Replace 'web' section with 'webclient'
The 'web' section is used by the zuul-web component, while the zuul REST API
client uses the 'webclient' section.

Change-Id: I145c9270ca6676abd0d4977ce1c4c637d304a264
2022-06-05 17:47:17 +03:00
James E. Blair 3ffbf10f25 Add prune-database command
This adds a zuul-admin command which allows operators to delete old
database entries.

Change-Id: I4e277a07394aa4852a563f4c9cdc39b5801ab4ba
2022-05-30 07:31:16 -07:00
Matthieu Huin 57c78c08e1 Clarify zuul admin CLI scope
We have two CLIs: zuul-client for REST-related operations, which cover
tenant-scoped, workflow-modifying actions such as enqueue, dequeue and
promote; and zuul, which supersedes zuul-client and also covers true admin
operations like ZooKeeper maintenance, config checking and issuing auth tokens.
This is a bit confusing for users and operators, and can induce code
duplication.

* Rename the zuul CLI to zuul-admin. zuul is still a valid endpoint
  and will be removed after the next release.
* Print a deprecation warning when invoking the admin CLI as zuul
  instead of zuul-admin, and when running autohold-*, enqueue-*,
  dequeue and promote subcommands. These subcommands will need to be
  run with zuul-client after the next release.
* Clarify the scopes and deprecations in the documentation.

Change-Id: I90cf6f2be4e4c8180ad0f5e2696b7eaa7380b411
2022-05-19 15:35:30 +02:00
James E. Blair 864a2b7701 Make a global component registry
We generally try to avoid global variables, but in this case, it
may be helpful to set the component registry as a global variable.

We need the component registry to determine the ZK data model API
version.  It's relatively straightforward to pass it through the
zkcontext for zkobjects, but we also may need it in other places
where we might alter processing of data we previously got from zk
(eg, the semaphore cleanup).  Or we might need it in serialize or
deserialize methods of non-zkobjects (for example, ChangeKey).

To account for all potential future uses, instantiate a global
singleton object which holds a registry and use that instead of
local-scoped component registry objects.  We also add a clear
method so that we can be sure unit tests start with clean data.

Change-Id: Ib764dbc3a3fe39ad6d70d4807b8035777d727d93
2022-02-14 10:58:34 -08:00
James E. Blair 29fbee7375 Add a model API version
This is a framework for making upgrades to the ZooKeeper data model
in a manner that can support a rolling Zuul system upgrade.

Change-Id: Iff09c95878420e19234908c2a937e9444832a6ec
2022-01-27 12:19:11 -08:00
Zuul 4808bc025e Merge "Add "zuul delete-pipeline-state" command" 2022-01-27 11:26:26 +00:00
James E. Blair 65da4efdd4 Add "zuul delete-pipeline-state" command
This is intended to aid Zuul developers who are diagnosing a bug
with a running Zuul and who have determined that Zuul may be able to
correct the situation and resume if a pipeline is completely reset.

It is intrusive and not at all guaranteed to work.  It may make things
worse.  It's basically just a convenience method to avoid firing up
the REPL and issuing Python commands directly.  I can't enumerate the
requirements where it may or may not work.  Therefore the documentation
recommends against its use and there is no release note included.

Nevertheless, we may find it useful to have such a command during
a crisis in the future.

Change-Id: Ib637c31ff3ebbb2733a4ad9b903075e7b3dc349c
2022-01-26 16:36:04 -08:00
James E. Blair 3aa546da86 Remove the rpc client and listener
These are not used any more, remove them from the scheduler and
the "zuul" client.

Change-Id: I5a3217dde32c5f8fefbb0a7a8357a737494d2956
2022-01-25 06:44:09 -08:00
Tristan Cacqueray cb13bdb90c Remove ZooKeeperClient for tenant-conf-check
This change enables running the tenant-conf-check without access
to the ZooKeeper service.

Change-Id: I285cd44f86e5d900715b052b13bf7b2bc58e77a4
2022-01-10 20:04:02 +00:00
Clark Boylan 5b1ba567c8 Prevent duplicate config file entries
It is currently possible to list default zuul config file paths in the
extra-config-paths config directive. Doing so will duplicate the configs
in Zuul which can cause problems. Prevent this entirely via
configuration validation.
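
An illustrative voluptuous check of the idea (names and the default path list
are assumptions, not the real schema):

    import voluptuous as vs

    DEFAULT_CONFIG_PATHS = ('zuul.yaml', '.zuul.yaml', 'zuul.d/', '.zuul.d/')

    def no_default_paths(paths):
        for path in paths:
            if path in DEFAULT_CONFIG_PATHS:
                raise vs.Invalid(
                    "extra-config-paths may not duplicate a default "
                    "config path: %s" % path)
        return paths

    tenant_source = vs.Schema({'extra-config-paths': no_default_paths})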

Note: There has been a bit of refactoring to ensure that the voluptuous
schema is validated when reading the config. This ensures that an
invalid config doesn't produce hard to understand error messages because
loadTPCs() has attempted to process configuration that isn't valid.
Instead we can catch schema errors early and report them with human
friendly messages.

Change-Id: I07e9d4d3614cbc6cdee06b2866f7ae41d7779135
2021-11-15 15:16:25 -08:00
Felix Edel 3029b16489 Make the ConfigLoader work independently of the Scheduler
This is an early preparation step for removing the RPC calls between
zuul-web and the scheduler.

We want to format the status JSON and do the job freezing (job freezing
API) directly in zuul-web without utilising the scheduler via RPC. In
order to make this work, zuul-web must instantiate a ConfigLoader.
Currently this would require a scheduler instance, which is not available
in zuul-web, so we have to make this parameter optional.

Change-Id: I41214086aaa9d822ab888baf001972d2846528be
2021-11-10 09:15:53 +01:00
Clark Boylan d7bca47d35 Cleanup empty secrets dirs when deleting secrets
The zuul delete-keys command can leave us with empty org and project
dirs in zookeeper. When this happens the zuul export-keys command
complains about secrets not being present. Address this by checking if
the project dir and org dir should be cleaned up when calling
delete-keys.
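
A simplified sketch of that cleanup against kazoo (paths and helper name are
illustrative):

    from kazoo.client import KazooClient

    def cleanup_empty_parents(client: KazooClient, project_key_path):
        # After delete-keys, remove the project node and then the org node
        # if they no longer have any children.
        org_path = project_key_path.rsplit('/', 1)[0]
        for node in (project_key_path, org_path):
            if client.exists(node) and not client.get_children(node):
                client.delete(node)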

Note this happened to OpenDev after renaming all projects from foo/* to
bar/*, orphaning the org-level portion of the name.

Change-Id: I6bba5ea29a752593b76b8e58a0d84615cc639346
2021-10-19 09:38:21 -07:00
Albin Vass 6e96fcfc67 Exit successfully when manipulating project keys
Change-Id: Idb2918fab4d17aa611bf81f42d5b86abc865514f
2021-09-21 16:04:29 +02:00
James E. Blair e2dd49b5be Add delete-state command to delete everything from ZK
This will give operators a tool for manual recovery in case of
emergency.

Change-Id: Ia84beb08b685f59a24f76cb0b6adf518f6e64362
2021-08-24 10:07:41 -07:00
James E. Blair a0af6004de Add copy-keys and delete-keys zuul client commands
These can be used when renaming a project.

Change-Id: I98cf304914449622f9db48651b83e0744b676498
2021-08-24 10:07:41 -07:00
James E. Blair 49d945b5bd Add commands to export/import keys to/from ZK
This removes the filesystem-based keystore in favor of only using
ZooKeeper.  Zuul will no longer load missing keys from the filesystem,
nor will it write out decrypted copies of all keys to the filesystem.

This is more secure since it allows sites better control over when and
where secret data are written to disk.

To provide for system backups to aid in disaster recovery in the case
that the ZK data store is lost, two new scheduler commands are added:

* export-keys
* import-keys

These write the password-protected versions of the keys (in fact, a
raw dump of the ZK data) to the filesystem, and read the same data
back in.  An administrator can invoke export-keys before performing a
system backup, and run import-keys to restore the data.
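
For example, an administrator might run something along these lines (the file
path is illustrative):

 zuul export-keys /var/backup/zuul-keys
 zuul import-keys /var/backup/zuul-keys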

A minor doc change recommending the use of ``zuul-scheduler stop`` was
added as well; this is left over from a previous version of this change
but warrants updating.

This also removes the test_keystore test file; key generation is tested
in test_v3, and key usage is tested by tests which have encrypted secrets.

Change-Id: I5e6ea37c94ab73ec6f850591871c4127118414ed
2021-08-24 10:07:41 -07:00
James E. Blair 29234faf6c Add nodeset to build table
The name of the nodeset used by a job may be of interest to users
as they look at historical build results (did py38 run on focal or
bionic?).  Add a column for that purpose.

Meanwhile, the node_name column has not been used in a very long
time (at least v3).  It is universally null now.  As a singular value,
it doesn't make a lot of sense for a multinode system.  Drop it.

The build object "node_labels" field is also unused; drop it as well.

The MQTT schema is updated to match SQL.

Change-Id: Iae8444dfdd52561928c80448bc3e3158744e08e6
2021-07-08 15:47:47 -07:00
Tristan Cacqueray 6a8ca2d07b zuul tenant-conf-check: disable scheduler creation
This change prevents the tenant-conf-check from failing when
running without a ZooKeeper service.

Change-Id: Ib4f96268e40afd46eb531f84e0d20751bb985fc3
2021-06-11 16:00:39 +00:00
Ian Wienand 01efd92d2a client: fix REST autohold-list response
The current REST client code takes the autohold API response and
modifies it into a format that the display function can't work with.
I couldn't exactly pinpoint how it got this way, but it seems
empirically that the RPC response is the same as the API response;
i.e. we can remove this munging and the display function can handle
both.

 zuul --zuul-url=https://zuul.opendev.org/ autohold-list --tenant openstack

is an example of this response.

Change-Id: I1087041f1054244130f1b5a68a0282742d6581d7
2021-05-27 17:04:15 +10:00
Simon Westphahl 8b6a887336 Store tenants in unparsed abide as dict
Simplify the code here as preparation for the cross-scheduler config
loading.

Change-Id: I0afd8c814973199781598dbefd51498ec92d733a
2021-05-05 08:26:54 +02:00
Jan Kubovy e7e1fa2660 Instantiate executor client, merger, nodepool and app within Scheduler
The executor client, merger, nodepool and app were instantiated outside the
scheduler and then set using "setX" methods.

These components are considered mandatory and should therefore
be part of all scheduler instances.

This was useful for layout validation where the scheduler was not run
but just instantiated. Since the layout validation does not need to
instantiate a scheduler anymore, this can be simplified by instantiating
these components within the scheduler's constructor.

Change-Id: Ide96a85d17820e3950704577ca6fd0d082e26182
2021-03-09 16:06:29 -08:00
Jan Kubovy 5d1aeeffb5 Make ConnectionRegistry mandatory for Scheduler
So far the connection registry was added after the Scheduler was
instantiated.

We can make the ConnectionRegistry mandatory to simplify the
Scheduler instantiation.

Change-Id: Iff7b1a597c97f2cd13bea75f9f23585b0e7f76b3
2021-03-08 18:51:32 -08:00
Matthieu Huin daa51d766f Bump pyjwt to 2.0.0
The release of pyjwt 2.0.0 changed the behavior of some functions, which
caused errors. Fix the errors, use pyjwt 2.0.0's better handling of JWKS,
and pin requirement to 2.X to avoid future potential API breaking changes.

Change-Id: Ibef736e0f635dfaf4477cc2a90a22665da9f1959
2021-01-14 12:35:18 +00:00
Åsmund Østvold 442925a333 zuul client command 'autohold-list' requires argument --tenant
Change-Id: I2cf3907647d0821f1b5268879e5d52835c196d4c
2020-10-21 12:01:41 +02:00
Matthieu Huin 2f70e78ec0 REST API: remove deprecated trigger arg in enqueue endpoint
Do not use the trigger argument if passed to the endpoint.

Change-Id: If730a4f8ac55150c158be73b1fbfd13c6797a3e6
2020-08-18 12:25:14 +02:00
Tobias Henkel 79889887bc Support promote via tenant scoped rest api
The tenant scoped rest api already supports enqueue, dequeue and some
others. For project admins it's also useful to be able to use promote
to push high priority changes to the front of the gate.

Change-Id: I27e06ea2fc813c6f084fd01a2e9af284d3b15b89
2020-07-24 14:36:17 +02:00
Matthieu Huin 2793d27b12 CLI: Fix errors with the REST client
* Fix error in instantiating Session object: attributes "verify",
  "headers" must be passed on after creating the session
* Fix error in computing the API URL from the server's base URL

Change-Id: Id80956e7026db5ed7192fbc4c2dff3afbbd3c9a8
2020-06-17 07:44:44 +00:00
Fabien Boucher 31b83dd2e8 Remove unnecessary shebangs
The commands are managed as entry points, so remove the
unnecessary shebangs. lib/re2util.py does not
require a shebang either.

zuul_return.py does not have a main and is not supposed
to be run directly.

Unnecessary shebangs on non-executable scripts cause
rpmlint issues.

Change-Id: I6015daaa0fe35b6935fcbffca1907c01c9a26134
2020-05-18 19:10:33 +02:00
Antoine Musso 4aca407bd6 Add client_id to RPC client
A Gearman client can set a client id which is then used on the server
side to identify the connection. Lack of a client_id makes it harder to
follow the flow when looking at logs:

 gear.Connection.b'unknown'  INFO Connected to 127.0.0.1 port 4730
 gear.Server Accepted connection
  <gear.ServerConnection ... name: None ...>
                                   ^^^^

In RPCClient, introduce a client_id argument which is passed to
gear.Client().
Update callers to set a meaningful client_id.
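
For illustration, a descriptive client_id makes the connection identifiable
in the server logs:

    import gear

    client = gear.Client(client_id='zuul-web')
    client.addServer('127.0.0.1', 4730)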

Change-Id: Idbd63f15b0cde3d77fe969c7650f4eb18aec1ef6
2020-01-28 10:16:19 +01:00
Matthieu Huin b599c7249d authentication config: add optional max_validity_time, skew
The Zuul admin can configure authenticators with an optional
"max_validity_time" field, which is the maximum age in seconds
for a valid authentication token. By default there is no
maximum age set for tokens, except the one deduced from
the token's "exp" claim.
If "max_validity" is set, tokens without an "iat" claim will
be rejected.

This is meant as extra security to avoid accidentally issuing
very long-lived tokens through the CLI.

The "skew" field can be used to mitigate clocks discrepancies
between Zuul and a JWT emitter.
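
A hypothetical authenticator section showing the two options (driver and all
values are examples only):

    [auth example_auth]
    driver=HS256
    realm=zuul.example.com
    issuer_id=zuul_operator
    client_id=zuul_web
    secret=exampleSecret
    # Reject tokens issued more than two hours ago; allow 30s of clock skew.
    max_validity_time=7200
    skew=30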

Change-Id: I9351ca016b60050b5f3b3950b840d5f719e919ce
2019-12-10 16:39:29 +01:00
Matthieu Huin a0015014c9 enqueue: make trigger deprecated
The patchset or ref, pipeline and project should be enough to trigger an
enqueue. The trigger argument is not validated or used anymore when
enqueueing via RPC.

Change-Id: I9166e6d44291070f01baca9238f04feedcee7f5b
2019-12-10 07:33:30 +01:00
David Shrewsbury e16810d178 Remove --id option for autohold_delete/autohold_info
Since the hold request ID is the only argument, simplify things by
removing the need for using the --id option.
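
For example, instead of passing --id (request ID shown is illustrative):

 zuul autohold-info 0000000123
 zuul autohold-delete 0000000123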

Change-Id: I0c176dca3cd7e3007348b5aafae812ebb55556d2
2019-10-24 15:31:45 -04:00
Zuul 1b3b79be33 Merge "Auto-delete expired autohold requests" 2019-09-20 20:59:23 +00:00
Zuul e6ca06db0e Merge "Add scheduler config options for hold expiration" 2019-09-20 20:17:22 +00:00
Zuul 84f5e22385 Merge "Record held node IDs with autohold request" 2019-09-20 19:48:20 +00:00
Zuul b865ec81bd Merge "Add autohold-info CLI command" 2019-09-20 19:19:07 +00:00
Zuul 853c2e0834 Merge "Store autohold requests in zookeeper" 2019-09-20 15:58:49 +00:00
David Shrewsbury 9f5743366d Auto-delete expired autohold requests
When a request is created with a node expiration, set a request
expiration for 24 hours after the nodes expire.

Change-Id: I0fbf59eb00d047e5b066d2f7347b77a48f8fb0e7
2019-09-18 10:09:08 -04:00