Commit Graph

37 Commits

Author SHA1 Message Date
Jan Kubovy d518e56208 Prepare Zookeeper for scale-out scheduler
This change is a common root for other
ZooKeeper-related changes regarding the
scale-out scheduler. Because ZooKeeper is
becoming a central component,
"maxClientCnxns" must be increased.

Since the ZooKeeper class is expected to grow
significantly (ZooKeeper is becoming a central part
of Zuul), the ZooKeeper class (zk.py) is split into a
zk module here to avoid a god class.

Also the zookeeper log is copied to the "zuul_output_dir".

Change-Id: I714c06052b5e17269a6964892ad53b48cf65db19
Story: 2007192
2021-02-15 14:44:18 +01:00
James E. Blair 93ec3daf47 Add TLS support for ZooKeeper
This adds a script to generate TLS certs for zookeeper.

It also adds new config file options for specifying certs for a
TLS connection, and adds a howto document to advise admins on how
to configure ZK for TLS.

It also removes the 'required' flag for the SASL auth parameters,
since they are not actually required.

Include the default openssl.cnf file since some distros modify it
to specify paths that are incompatible with the zk-ca.sh script.

Change-Id: Icd976cc32dfd9f75f8cfb1c9ad11e08af31723d6
2020-03-18 14:47:37 -07:00
David Shrewsbury 951c405845 Sort autoholds by request ID
Return the hold request IDs from the ZooKeeper API as
a sorted list so that they will appear in sorted order in the
zuul CLI output.

Change-Id: I3a3d738ac2bebb8b446cb0710bf9f5452c232372
2019-12-09 15:16:43 -05:00
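A minimal sketch of the sorting described in 951c405845. The function name and the sample IDs are hypothetical, and it assumes the IDs are zero-padded strings of equal width, so that a plain string sort matches numeric order; the actual Zuul code may differ.

```python
def sorted_hold_request_ids(children):
    """Return hold request IDs in sorted order.

    Assumption: IDs are equal-width, zero-padded strings, so a plain
    string sort gives the same order as a numeric sort.
    """
    return sorted(children)


# Hypothetical IDs in the arbitrary order a child listing might return them.
ids = sorted_hold_request_ids(["0000000010", "0000000002", "0000000001"])
```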
David Shrewsbury fdbabc863c Handle upgrade of autohold held nodes
An earlier change alters the autohold held node IDs from a list
to a dict structure. If someone upgrades to this change, but has
existing autoholds with the old format, we will break them. This
change handles both cases.

Change-Id: Ib6381e459a9694cae61ba3229d48ddfe8f55f392
2019-10-25 08:49:40 -04:00
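The upgrade handling in fdbabc863c can be sketched roughly as follows. This is an illustration only: the function name and the "build" and "nodes" keys are hypothetical, and the real on-disk format may use different names.

```python
def normalize_held_nodes(nodes):
    """Accept both the old flat list of node IDs and the newer
    list-of-dicts layout, and always return the newer layout.

    Assumption: the key names "build" and "nodes" are illustrative.
    """
    normalized = []
    legacy_ids = []
    for entry in nodes:
        if isinstance(entry, dict):
            # Already in the new per-build format.
            normalized.append(entry)
        else:
            # Old format: a bare node ID with no build information.
            legacy_ids.append(entry)
    if legacy_ids:
        # Group legacy node IDs under an entry with an unknown build.
        normalized.insert(0, {"build": None, "nodes": legacy_ids})
    return normalized
```

With this shape, callers only ever deal with one structure, regardless of when the hold request was written.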
Tristan Cacqueray e85fb93d1d Store a list of held nodes per held build in hold request
Instead of storing a flat list of nodes per hold request, this
change updates the request nodes attribute to become a list of
dictionaries, each with the build UUID and the held node list.

Change-Id: I9e50e7ccadc58fb80d5e80d9f5aac70eb7501a36
2019-10-24 13:39:16 -04:00
David Shrewsbury 6bbf3609bb Mark nodes as USED when deleting autohold
Marking the nodes as USED will allow nodepool to delete them.
If we are unsuccessful in marking any of the held nodes as used,
we simply log the error and try again at some future point until
all nodes are eventually marked, allowing the hold request to be
deleted.

Change-Id: Idd41c58b5cce0aa9b6cd186fa5c33066012790b8
2019-09-18 10:08:46 -04:00
David Shrewsbury f6b6991af2 Add caching of autohold requests
Change-Id: I94d4a0d2e8630d360ad7c5d07690b6ed33b22f75
2019-09-16 10:46:36 -04:00
David Shrewsbury 716ac1f2e1 Store autohold requests in zookeeper
Storing autohold requests in ZooKeeper, rather than in-memory,
allows us to remember requests across restarts, and is a necessity
for future work to scale out the scheduler.

Future changes to build on this will allow us to store held node
information with the change for easy node identification, and to
delete any held nodes for a request using the zuul CLI.

A new 'zuul autohold-delete' command is added since hold requests
are no longer automatically deleted.

This makes the autohold API:
   zuul autohold: Create a new hold request
   zuul autohold-list: List current hold requests
   zuul autohold-delete: Delete a hold request

Change-Id: I6130175d1dc7d6c8ce8667f9b14ae9377737d280
2019-09-16 08:47:53 -04:00
Simon Westphahl e0321f0daa Ensure correct lexical sorting of node requests
The recently introduced change to request child nodes of paused jobs
with a higher priority actually causes the priority to be lower when
the pipeline precedence is set to 'high'.

Due to the lexical sorting of node requests in Nodepool, 99 will be
treated as the lowest prio in this case.

This is especially apparent when Nodepool is up against quota.

Change-Id: I094dee4f357c9974b6d9e95fcd70b02115d9de93
2019-03-14 15:39:23 +01:00
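The lexical-sorting problem described in e0321f0daa is easy to reproduce. In the sketch below, 99 comes from the commit message; the other priority values are hypothetical, and the zero-padding shown is only one possible remedy, not necessarily what the commit does.

```python
# Priorities as they might appear when embedded in a sortable string.
# 99 is taken from the commit message; 100, 200 and 300 are assumed.
priorities = [100, 200, 300, 99]

# Sorting the string forms compares character by character, so "99"
# sorts after "300" because "9" > "3".
lexical = sorted(str(p) for p in priorities)

# Zero-padding to a fixed width makes string order match numeric order.
padded = sorted("%03d" % p for p in priorities)

print(lexical)  # ['100', '200', '300', '99']
print(padded)   # ['099', '100', '200', '300']
```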
Zuul dcfeb3a42b Merge "web: add /{tenant}/nodes route" 2018-12-29 14:39:17 +00:00
Zuul 0fc3485454 Merge "web: add /{tenant}/labels route" 2018-12-29 14:35:28 +00:00
Zuul 9f849e1f99 Merge "Remove nodeid argument from updateNode" 2018-12-01 20:10:18 +00:00
James E. Blair 505c32e4b2 Fix updating relative priority
This code path was untested and had some typos.  Correct them and
ensure the path is tested.

Change-Id: Ib4a283f739b12295f480684b9b93ad8a60abf350
2018-12-01 15:28:05 +01:00
James E. Blair 5b5a161b71 Remove nodeid argument from updateNode
This function expects a node object with the .id attribute populated.

Change-Id: Ic9fcc74a873760f45b23e9af7345d9bf998a41f1
2018-11-30 12:11:47 +00:00
James E. Blair 0b00c4685b Set relative priority of node requests
Add a relative_priority field to node requests and continuously
adjust it for each queue item based on the contents of queues.

This allows for a more fair distribution of build resources between
different projects.  The first item in a pipeline from a given
project (or, in the case of a dependent pipeline, group of projects)
has equal priority to all other first-items of other projects in
the same pipeline.  Second items have a lower priority, etc.

Depends-On: https://review.openstack.org/620954
Change-Id: Id3799aeb2cec6d96a662bfa394a538050f7ea947
2018-11-30 12:50:34 +01:00
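The relative-priority idea in 0b00c4685b can be sketched as a per-project position count. This is a simplified illustration: the function name and sample project names are hypothetical, it assumes a lower number means a higher priority, and it ignores the grouping of projects in dependent pipelines.

```python
from collections import defaultdict


def relative_priorities(queue):
    """Given the project of each queue item, in queue order, return
    one relative priority per item.

    The first item from each project gets 0, the second gets 1, and
    so on (assumption: lower values are served first).
    """
    seen = defaultdict(int)
    result = []
    for project in queue:
        result.append(seen[project])
        seen[project] += 1
    return result


# Hypothetical pipeline contents.
queue = ["nova", "nova", "neutron", "nova", "neutron"]
print(relative_priorities(queue))  # [0, 1, 0, 2, 1]
```

The first items of both projects share priority 0, so a project with many queued changes cannot crowd out one with only a few.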
David Shrewsbury 95784a6630 Handle missing node during hold check
A node can get removed from underneath us when we iterate through
all of them. Handle that better.

Change-Id: I28eb479eba4d59ef15e53c99f2abb58fb669ad39
2018-10-16 09:31:47 -04:00
Tristan Cacqueray 8436ed38b7 web: add /{tenant}/nodes route
This change adds a /nodes route to return the nodes status.

Change-Id: I81b495d29659f9a130c75f4c3f32cfd0f47ef15f
2018-09-12 11:09:11 -06:00
Tristan Cacqueray 22efec8423 web: add /{tenant}/labels route
This change adds a zk client to the ZuulWeb service to implement a
labels route for returning the available labels.

Change-Id: Iecf2948e4895829da3f090734c93fcd0c8a497b5
2018-09-12 11:09:11 -06:00
Fabien Boucher bc20de95e5 Remove unnecessary shebang and exec bit
Change-Id: I54de68b11f055a9269ca5efb8a57f81d57f9d55f
2018-07-26 07:12:24 +00:00
Tristan Cacqueray f306a690a1 zk: retry initial zookeeper connection attempts
This change mitigates connection issues when Zuul starts before
ZooKeeper is ready, for example when the control plane is restarted.

Change-Id: Ia0b077b5e0429cfd6b4020254dc1b472ef2ecfbf
2018-07-01 01:00:44 +00:00
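A retry loop of the kind f306a690a1 describes might look like the sketch below. It is generic and does not reproduce the commit's actual implementation: connect_with_retry, its parameters, and the use of ConnectionError are all assumptions made for illustration.

```python
import time


def connect_with_retry(connect, attempts=5, delay=1.0, sleep=time.sleep):
    """Call connect() until it succeeds or attempts run out.

    connect is any zero-argument callable that raises ConnectionError
    while the server is not ready (hypothetical interface).
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except ConnectionError as exc:
            last_error = exc
            if attempt < attempts:
                # Wait before the next attempt; no wait after the last one.
                sleep(delay)
    raise last_error
```

Passing sleep as a parameter keeps the helper easy to test without real delays.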
David Shrewsbury f21bb2893a Better exception handling during autohold
Our autohold can linger longer than we requested if we get an
exception during node iteration. Let's handle that particular
exception better, and also handle ANY exceptions that may bubble
up by deleting the autohold if that occurs.

Change-Id: I9d64995406e86cbad7536b85a3206fda7faac253
2017-10-13 11:26:45 -04:00
Zuul b68d660844 Merge "Update node requests after nodes" into feature/zuulv3 2017-10-06 22:04:26 +00:00
James E. Blair 7a8f48df23 Update node requests after nodes
In case we lose the connection before fully updating all the nodes
associated with a node request, set the request attributes last.

Change-Id: Ib5099005ffb2990940672ce34623bc35b8903739
2017-10-06 14:31:21 -07:00
David Shrewsbury 94e95886e2 Handle double node locking snafu
If our queue processing is slow, and we lose the ZooKeeper session
after a node request has been fulfilled, but before we actually accept
the nodes, we need to be aware of this and not try to use the nodes
given to us.

Also pass the request ID along in the event queue since the actual
request object can have its ID changed out from underneath us on a
resubmit. Compare this ID with the request ID, and also verify that
the actual request still exists.

Furthermore, when we lock nodes, let's make sure that they are actually
allocated to the request we are processing.

Change-Id: Id89f6542afcf3f5d4a0b392b5cb8cf21ec3f6865
2017-10-05 17:26:59 -04:00
Monty Taylor 6dc5bc146b Map pipeline precedence to nodepool node priority
We set precedence in our pipeline configs but we do not pass it through
to the nodepool NodeRequest priority, which means that check can starve
gate.

Change-Id: Id3fa6f9ad6bdf23bf3af43c48289c4b918ea04f1
2017-09-29 18:10:06 -05:00
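The mapping described in 6dc5bc146b can be pictured as a small lookup table. The numeric values and names below are hypothetical, chosen only so that a lower number is served first; the pipeline names come from the commit message.

```python
# Hypothetical mapping from pipeline precedence to node request
# priority. Assumption: lower numbers are fulfilled first.
PRIORITY_BY_PRECEDENCE = {
    "high": 100,
    "normal": 200,
    "low": 300,
}


def node_request_priority(precedence):
    """Return the node request priority for a pipeline precedence."""
    return PRIORITY_BY_PRECEDENCE[precedence]


# With gate at 'high' and check at 'normal', gate requests sort first.
gate = node_request_priority("high")
check = node_request_priority("normal")
```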
James E. Blair e2f0a87ad8 Add ZK session timeout option
Change-Id: If804c18f2103baa12c9c3bd0344a166fac1ea749
2017-09-28 10:35:12 -07:00
David Shrewsbury 955799b377 Remove duplicated states from zk.py
Zuul defines the node states within model.py. The states in zk.py
are a holdover from copied code from the nodepool code base and
are not used. Let's just have a single place for the states.

Change-Id: I3f0cd40a04040baffbc04895049e7472ad30bf4b
2017-07-31 19:43:15 +00:00
David Shrewsbury ffab07a844 Implement autohold
Adds the 'autohold' client option, the scheduler implementation
of it, and a unit test for it.

The autohold is automatically removed from the in-memory data
structure once we've reached the number of requested runs of
the job.

Story: 2000905
Change-Id: Ieac0b5fee6801313fa23cce69520eb348735ad99
2017-07-31 14:53:43 -04:00
Clint Byrum f322fe2eee Encoding changes in tests for py3
In many cases we do need to be explicit about bytes vs. strings for
python 3 compatibility.

Change-Id: I9cbc5c73004d03f711f8a6e5752a0865ae55fb9f
2017-05-19 06:45:31 -07:00
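The bytes-versus-strings distinction in f322fe2eee typically shows up when data is serialized for storage. The helper names below are hypothetical; the sketch only shows the explicit encode and decode steps that Python 3 requires.

```python
import json


def serialize(data):
    """Turn a dict into UTF-8 encoded bytes for storage."""
    return json.dumps(data).encode("utf8")


def deserialize(raw):
    """Turn stored bytes back into a dict."""
    return json.loads(raw.decode("utf8"))


payload = serialize({"state": "requested"})
```

In Python 2 the str type doubled as a byte string, so code often skipped these steps; in Python 3 mixing str and bytes raises TypeError.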
Clint Byrum 1d0c7d1941 view changes for py3
Python 3 changes several dict methods to return views, requiring us
to convert them to lists.

Change-Id: Ib08c03564f198d1c08142c44bf9baac6a73816dd
2017-05-19 06:45:31 -07:00
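The view change mentioned in 1d0c7d1941 can be illustrated with a short example. The dictionary contents and function name are hypothetical.

```python
labels = {"ubuntu": 2, "centos": 1}

# In Python 3, keys() returns a view, which cannot be indexed.
try:
    labels.keys()[0]
    indexable = True
except TypeError:
    indexable = False


def first_label(mapping):
    """Return the first key by converting the view to a list."""
    return list(mapping.keys())[0]
```

Converting to a list also gives a snapshot that stays stable if the dict is modified during iteration.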
Paul Belanger 9790c6add2 Remove ZooKeeperConnectionConfig class
This code was only used in our nodepool integration tests, so remove
it and update our documentation.

Change-Id: I5698321992f58064683a772720e1349742d96d25
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
2017-03-21 13:26:54 -04:00
James E. Blair 0d5a36e3ff Remove zk host list parsing
Just pass through the ZooKeeper host list string unparsed since
that's what we're going to expect from our config file.

Change-Id: Ife8fd97860ad35c793ef956adbb9d626569f60bf
2017-02-21 11:08:47 -05:00
James E. Blair 20c39d5c5b Remove unused classes from zk.py
We ended up using classes from model.py instead to better interact
with the rest of Zuul, so remove these.

Change-Id: I4fa4a06b27d9ef6cc7f7878f29a92aafd7ffe9d1
2017-01-05 08:55:02 -08:00
James E. Blair cacdf2b659 Mark nodes as 'in-use' before launching jobs
While we immediately lock a node given to us by nodepool, we delay
setting the node to 'in-use' until we actually request that the job
be launched so that if we end up canceling the job before it is
run, we might return the node unused to nodepool.

Change-Id: I2d2c0f9cdb4c199f2ed309e7b0cfc62e071037fa
2017-01-04 16:09:24 -08:00
James E. Blair a38c28efa3 Lock nodes when nodepool request is fulfilled
This is continuing work on implementing the Zuul<->Nodepool protocol
from the Zuulv3 spec.

Change-Id: Ic8477e607fd09b85a37f47cbee7da905c017c534
2017-01-04 16:08:43 -08:00
James E. Blair 15be0e1e11 Re-submit node requests on ZooKeeper disconnect
Change-Id: I689bf812c713fa6f5f37958b7001b0d5fb0a254b
2017-01-04 09:11:35 -08:00
James E. Blair dce6ceac8e Add FakeNodepool test fixture
Add a fake nodepool that immediately successfully fulfills all
requests, but actually uses the Nodepool ZooKeeper API.

Update the Zuul Nodepool facade to use the Nodepool ZooKeeper API.

Change-Id: If7859f0c6531439c3be38cc6ca6b699b3b5eade2
2016-12-21 14:16:51 -08:00