Commit Graph

47 Commits

Author SHA1 Message Date
James E. Blair 9c496e2939 Redact k8s connection info in node list
The node list (web and cli) displays the connection port for the
node, but the k8s drivers use that to send service account
credential info to zuul.

To avoid exposing this to users if operators have chosen to make
the nodepool-launcher webserver accessible, redact the connection
port if it is not an integer.

This also affects the command-line nodepool-list in the same way.

Change-Id: I7a309f95417d47612e40d983b3a2ec6ee4d0183a
2023-12-01 10:30:03 -08:00
James E. Blair 4ef3ebade8 Update references of build "number" to "id"
This follows the previous change and is intended to have little
or no behavior changes (only a few unit tests are updated to use
different placeholder values).  It updates all textual references
of build numbers to build ids to better reflect that they are
UUIDs instead of integers.

Change-Id: I04b5eec732918f5b9b712f8caab2ea4ec90e9a9f
2023-08-02 11:18:15 -07:00
James E. Blair 084c6d56c2 Add a cache for image lists
For sites with large numbers of image uploads, performing a web API
request of /image-list can be time-consuming.  To improve response
time, maintain a TreeCache of image uploads and use that when listing
images.

The "image-list" CLI is unable to use this cache, and is already as
fast as it can be given the number of ZK requests it needs to issue.

The same is true for "dib-image-list", and this change adds a cache
for that as well.

Finally, since we otherwise would have three or four nearly identical
cache implementations, the TreeCache system is refactored into a base
class that can be used for all the caches.  It has some hook points
to deal with the very slight behavior differences.

Change-Id: Ibff0a9016936b461eccb1b48dcf42f5ad8d8434e
2023-01-31 16:43:43 -08:00
Dr. Jens Harbott abe6ba9759 Add username to detailed node list output
With the static driver it is possible to have multiple static nodes
defined that only differ by their username. In order to be able to
distinguish them, include the username in the output of the
"nodepool list --detail" output.

Signed-off-by: Dr. Jens Harbott <harbott@osism.tech>
Change-Id: I702ccbf412731ef3400eb722386629356a244334
2022-11-14 10:24:50 +01:00
Zuul 2455b4de12 Merge "Include user/driver data in node detail list" 2022-08-05 23:47:17 +00:00
James E. Blair cf5f63bd6f Speed up node listing
This speeds up node listing in the CLI ("nodepool list") and the
webapp ("/node-list") significantly.

The main culprit is that filling in the "locked" field is expensive
because we attempt to lock each node.  To make that faster, we
now just query the lock contenders to determine whether it is locked
(if there are contenders, it's locked).

Further, in the webapp, we can use the cache more aggressively.  First,
we update the cache listener to watch lock contenders and cache those
values on our Node objects.  That means the webapp doesn't even need
to use the optmization above.  Further, we can have the webapp use
cached node ids, at which point it doesn't need to make any ZK queries
at all.

With a local setup of 6000 nodes and a localhost ZK connection (real
world times will be much higher due to network delays), this
takes the web server node list from 3 seconds to 0.009 seconds.

The CLI node list improves from 2.1 seconds to 0.8 seconds (excluding
startup time).

Change-Id: Id857556865b6ad75b9ec404bd7ef0c45e2a527bd
2022-07-12 14:05:48 -07:00
James E. Blair eb9121a733 Include user/driver data in node detail list
This lets operators see the user_data and driver_data in the node
list when passing --detail.  That can be useful for identifying
issues with the metastatic driver, or any other debugging which
would benefit from seeing that data.

Change-Id: I2f36ce98a183b7a8e289376f2228b6370900a057
2022-06-30 15:25:25 -07:00
James E. Blair 138b68a5a7 Convert dib-request-list to image-status command
This augments the dib-request list (which shows what images have
manual build requests) with information about whether the image
is paused.  The resulting command is renamed to "image-status".

Change-Id: If75a8757b4ec93563e47bfdf0a239a9c21660c45
2022-06-21 14:12:22 -07:00
Simon Westphahl d6e8bd72df Expose image build requests in web UI and cli
Image build requests can now be retrieved through the /dib-request-list
endpoint or via the dib-request-list sub-command. The list will show the
age of the request and if it is still pending or if there is already a
build in progress.

Change-Id: If73d6c9fcd5bd94318f389771248604a7f51c449
2022-06-21 13:32:35 -07:00
James E. Blair 1323d0b556 Update some variable names
Now that the component we registered is a "pool" change the call
sites to use "launcher_pools" instead of "launchers".  This may
reduce some ambiguity.

(s/launcher/pool/ might still be ambiguous since it may not be clear
whethere we're talking about our own pools or other pools; thus the
choice of "launcher_pool" for the variable name.)

Also, remove a redundant test assertion.

Change-Id: I865883cdb115bf72a3bd034d9290f60666d64b66
2022-05-23 13:30:50 -07:00
James E. Blair a612aa603c Add the component registry from Zuul
This uses a cache and lets us update metadata about components
and act on changes quickly (as compared to the current launcher
registry which doesn't have provision for live updates).

This removes the launcher registry, so operators should take care
to update all launchers within a short period of time since the
functionality to yield to a specific provider depends on it.

Change-Id: I6409db0edf022d711f4e825e2b3eb487e7a79922
2022-05-23 07:41:27 -07:00
Clark Boylan dfcf770260 Ignore unparsable/empty image build ZNode data
As with the parent it seems that we can end up with empty image build
records as well as empty image upload records. Handle the build case as
well so that we can still list image builds properly when in this state.
This logs the problem as well so that you can debug it further.

Change-Id: Ic5521f5e2d2b65db94c2a5794f474913631a7a1b
2020-10-06 12:44:18 +02:00
Zuul 0bf8712c80 Merge "Ignore unparsable/empty image upload ZNode data" 2020-09-03 05:41:59 +00:00
James E. Blair 00d62af3c3 Add image-pause CLI command
This adds a CLI commend to set a flag in ZK for images indicating
that the image should be paused.  This can be used to quickly pause
the building and uploading of one or more images globally.  This
will effectively be boolean OR'd with the pause value for diskimage
builds in the config file.

In particular, this can be used to pause images for short durations,
either because a fix is imminent, or to allow the system to remain
stable while a configuration change goes through the CI/CD workflow.

Change-Id: I21a573dfc337c51f319afe3695d5446b2c91d70b
2020-08-20 15:48:03 -07:00
Simon Westphahl 638c6dbb1f Ignore unparsable/empty image upload ZNode data
Multiple builders running cleanup of uploads in parallel can lead to a
race condition which results in empty image upload nodes in Zookeeper.

The problem is the creation of the image upload number lock inside the
image upload. The lock creation can re-created an already deleted image
upload w/o any data.

Later iterations will fail to parse the empty znode data.

It's not 100% clear how we arrive at this race condition, but we
definitely see multiple builders in the 'guarded' section trying to
delete an upload.

The following exception shows that there is some kind of race:

    2020-06-23 11:00:45,225 ERROR nodepool.builder.CleanupWorker.0:
        Exception cleaning up build <ImageBuild {'state': 'ready',
        'state_time': 1592718469.360345, 'builder':
        'nodepool-builder-2', 'builder_id':
        '8e2c599b-fbfd-4d68-b8df-2fd2efdaa705', 'formats': 'qcow2,raw',
        'username': 'Administrator', 'python_path': 'auto', 'id':
        '0000013010', 'stat': ZnodeStat(czxid=129130944630,
        mzxid=129131204208, ctime=1592712393474, mtime=1592718469363,
        version=1, cversion=1, aversion=0, ephemeralOwner=0,
        dataLength=214, numChildren=1, pzxid=129131204230)}> of image
        example-image in provider <Provider example-provider>:
    Traceback (most recent call last):
      File "/opt/nodepool/lib/python3.6/site-packages/nodepool/builder.py", line 499, in _cleanupImage
        self._cleanupProvider(provider, image, build.id)
      File "/opt/nodepool/lib/python3.6/site-packages/nodepool/builder.py", line 292, in _cleanupProvider
        self._deleteUpload(upload)
      File "/opt/nodepool/lib/python3.6/site-packages/nodepool/builder.py", line 345, in _deleteUpload
        upload.provider_name, upload.id)
      File "/opt/nodepool/lib/python3.6/site-packages/nodepool/zk.py", line 1572, in deleteUpload
        self.client.delete(path, recursive=True)
      File "/opt/nodepool/lib/python3.6/site-packages/kazoo/client.py", line 1341, in delete
        return self._delete_recursive(path)
      File "/opt/nodepool/lib/python3.6/site-packages/kazoo/client.py", line 1374, in _delete_recursive
        self._delete_recursive(child_path)
      File "/opt/nodepool/lib/python3.6/site-packages/kazoo/client.py", line 1376, in _delete_recursive
        self.delete(path)
      File "/opt/nodepool/lib/python3.6/site-packages/kazoo/client.py", line 1343, in delete
        return self.delete_async(path, version).get()
      File "/opt/nodepool/lib/python3.6/site-packages/kazoo/handlers/utils.py", line 75, in get
        raise self._exception
    kazoo.exceptions.NotEmptyError

Order of events that might have led to the above exception:
  - A locks upload
  - A deletes children of upload (including lock of A)
  - B locks upload (possible, because lock was deleted by A)
  - A tries do delete the parent node, which leads to the above exception
  - ...

Related change: https://review.opendev.org/#/c/716566/
Change-Id: I5989c9f5f4c9f1615ae7372250e2e5fdad720256
2020-08-14 07:32:50 +02:00
Simon Westphahl 1b28a77284 Include event id in node request listings
The event id will now be included in the command output of
'request-list' as well as the HTTP endpoint for outstanding node
requests.

Change-Id: I2475ae879fd0321bac020c84935a5a1d52b8b916
2020-01-27 15:27:58 +01:00
Fabien Boucher f57ac1881a
Remove uneeded shebang and exec bit on some files
Having python files with exec bit and shebang defined in
/usr/lib/python-*/site-package/ is not fine in a RPM package.

Instead of carrying a patch in nodepool RPM packaging better
to fix this directly upstream.

Change-Id: I5a01e21243f175d28c67376941149e357cdacd26
2019-12-13 19:30:03 +01:00
James E. Blair 5fee6be810 Add relative priority to request list
Change-Id: Ief6dfc800097aa239d07a50c34bc72dcb328d4c5
2018-11-30 15:05:51 -08:00
Tristan Cacqueray 5745c807c9 status: add pool column to nodepool list output
This helps debug pool quota usage.

Change-Id: Id6cdef978e1ba80a4fea5e5bfa862755fb6d7e40
2018-07-15 23:07:22 +00:00
Ian Wienand 4074b0f0f9 Add label-list webapp endpoint
This is useful for getting the list of all available labels.
Originally implemented in Iafff02d546abb34affa88310f6a97918166cbf47,
this is based on the new info available from
Icfb73fbe3b67321235a78ea7ed9bf4319567eb1a

Co-Authored-By: Tristan Cacqueray <tdecacqu@redhat.com>
Change-Id: I4b43ac0e2ba44516ff289e93dbf553033fc9e130
Depends-On: https://review.openstack.org/548376
2018-03-01 11:14:03 +11:00
Ian Wienand 89790013f3 Consolidate node_list, add generic filter
node_list takes an argument "detail" which adds a rather arbitrary
list of results to the output.  This comes from the command-line,
where we're trying to keep width under a certain length; but doesn't
make as much sense here (especially for json).

For dashboard type applications, replace this with a simple "fields"
parameter which, if set, will only return those fields it sees in the
common text output function.

Note, this purposely doesn't apply to the JSON output, as it expected
client-side filtering is more appropriate there.  We could also add
generic field support to the command-line tools, if considered
worthwhile.

Add some documentation on all the end-points, and add info about these
parameters.

Change-Id: Ifbf1019b77368124961e7aa28dae403cabe50de1
2018-03-01 11:14:03 +11:00
Ian Wienand 2d60240fec Consolidate status results
All the _list functions return the same thing; save the results and
use a common output function to generate the json or text output.

Change-Id: I9cb44b09de2cb948e7381ef10302b21040433a2c
2018-03-01 11:13:55 +11:00
Ian Wienand cc95d679bd Revert "Add /label-list to the webapp"
This reverts commit b737889250.

As discussed in [1] the label list is intended to provide a list of
available labels so users don't have to dig into configuration files
to determine what is available to run jobs on.  As written it possibly
misses nodes without a min-ready or in-use nodes, so an alternative
implementation is probably called for.  This is also currently returns
values in a plain list, rather than a list of dicts, which prevents
some de-deuplication refactoring.

Thus remove it for now, pending some more work.

[1] http://lists.zuul-ci.org/pipermail/zuul-discuss/2018-February/000039.html

Change-Id: Ic88da24dd5d140697d4af8e6f5ee0936c44f59c5
2018-02-28 11:31:13 +11:00
mhuin a4f87b6270 Refactor status functions, add web endpoints, allow params
This patch refactors status functions so that instead of having one
function per output format, the output format is simply a parameter.

New status endpoints are added:

* image-list.json
* request-list(.json)

The endpoint node-list(.json) accepts a node_id parameter for filtering.

Change-Id: I44f386754d343239fc047701cc3cab2dc8b1572a
2018-02-28 10:13:27 +11:00
Tristan Cacqueray b737889250 Add /label-list to the webapp
This change adds /label-list and /label-list.json that returns available
labels over http.

Change-Id: Iafff02d546abb34affa88310f6a97918166cbf47
2018-02-22 05:35:29 +00:00
Tristan Cacqueray f2ad9ee0bc Add /node-list to the webapp
This change adds /node-list and /node-list.json to enable listing
current node list over http.

Change-Id: I256ebbba8692e78827a9104b08b433aafa26c71c
2018-02-06 13:21:08 +01:00
ChangBo Guo(gcb) 13ac2512ba Refactor method node_list in status.py
Extract same logics into one internal function, and call it
when get one node's values or iterate some nodes' value.

Change-Id: Idd536931a40f2369d3bfc4c623e94844c95f78a0
2018-02-02 17:41:48 +08:00
Tobias Henkel 7d79770840 Do pep8 housekeeping according to zuul rules
The pep8 rules used in nodepool are somewhat broken. In preparation to
use the pep8 ruleset from zuul we need to fix the findings upfront.

Change-Id: I9fb2a80db7671c590cdb8effbd1a1102aaa3aff8
2018-01-17 02:17:45 +00:00
Tobias Henkel d81c35249a Rename ssh_port to connection_port
With the upcoming windows support we don't have ssh as the only
connection type. As a preparation for this generalize ssh_port to
connection_port.

Change-Id: Ic1939054f0604411e0122db8dbd7e9886ceaa974
2017-12-06 20:52:34 +01:00
Clark Boylan 185f5c9914 Reorg non detailed instance listing columns
"AZ" is an almost always unused piece of info so move it from non
detailed info to detailed info when listing instances. Public IP addrs
are used all the time though so move from from detailed to non detailed
listings.

This should give better balance to the default usage of the list command
for day to day ops.

Change-Id: I5bb6255ea76b78b8427d1f145d3f8e077f11013d
2017-11-27 07:59:34 +01:00
David Shrewsbury d125538023 Fix node list output
Missed a place for the new allocated_to field output.

Change-Id: I9ff2166cb49542b71e9dbe9fb6b5f50498a6bfab
2017-09-11 11:24:58 -06:00
David Shrewsbury d399b3ac4f Add node.allocated_to to node detail output
Change-Id: Ia17a8821365f3a7bae6f874c0b46e2016b44595a
2017-09-11 10:15:16 -06:00
Tobias Henkel c440df4189 Fix detail headers order for nodepool list
When running 'nodepool list --detail' the ordering of the headings is
wrong. The 'Hold Job' and 'Launcher' headings need to be swapped.

Change-Id: I195392744d5b596c2d51335af9eba68e0a1ad671
2017-08-08 08:25:09 +02:00
David Shrewsbury 5837004e56 Add hold job to nodepool list output
This is set by Zuul when a node is held. The comment
field is also filled in by Zuul, but we already output
that.

Change-Id: I7af5c6c12d94ba9be15daa13dca138ee3785a972
2017-08-02 11:24:34 -04:00
David Shrewsbury 7383896833 Add --detail option to nodepool list command
The output of the 'list' command has gotten quite large. Simplify
the default output, and output the complete info when the --detail
option is given.

Also adds/modifies the list_nodes tests to assert that the column
output changes when the new option is used.

Change-Id: I60f849225c8c7fce1c524f132e54da58b25ae752
2017-08-02 09:42:46 -04:00
Tristan Cacqueray a5077fc344 Add support for custom ssh port
This change adds 'ssh_port' to the Node class.

Change-Id: I5e6d3969ae04f90abd1a3fd908c160cda4791bad
2017-07-06 06:34:28 +00:00
David Shrewsbury 6595344a48 Add request-list nodepool command
Command to list all node requests in ZooKeeper.

Change-Id: I2437aa69b3aacf8a0c71466af55aab539271789c
2017-03-14 12:09:54 -04:00
David Shrewsbury 4e6dedfd0f Add lock state to node listing
Also make waitfornodes in the devstack script to wait for unlocked
nodes.

Change-Id: I76f35d46f66f9e3beafe9f5f5a93049256a8df44
2017-02-15 19:47:54 -05:00
David Shrewsbury 59fa7781c7 Implement node cleanup
To prove it works, this also reworks the 'delete' command to use
ZooKeeper.

Summary:
  - Re-enables the 'delete' command
  - Adds waitForNodeDeletion() for testing.
  - Re-enables tests:
     - test_node_delete_success
     - test_delete
     - test_delete_now
  - Fixes a bug in Node.__eq__ causing it to fail.

Change-Id: I539bca7d2d3d3b90f8e04e9098065e8b6797b194
2017-02-15 19:47:32 -05:00
David Shrewsbury 672612deaa Update nodepool 'list' command
Enables the nodepool 'list' command to speak ZooKeeper.
Re-enables the test_node_list test as well. Needed to
individually skip failing tests in test_commands.py.

Adds 'comment' attribute to the Node model since this is
output by the 'list' command.

Update waitForNodes() to use zookeeper syntax.

Change-Id: I61a92470054985c974f3c20d5be358b399925795
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
2017-02-11 11:10:31 -05:00
Ian Wienand 6c4ba1493d Add dib-image-list.json webapp target
I have monitoring scripts that push the build status to me.  This
became a little harder with the split to multiple builders, as the
logs for each build are kept on the builder.

Rather than parsing tables, add a .json target to the dib-image-list
so my script can find the most recent build, see the builder and
retrieve the latest logs from there.

Future work is to do the image-list and make a nicer status page, if
this approach is OK.

Change-Id: I3410a4e4efd649732747b0502d66d212f50fa1bb
2016-12-22 14:35:51 +11:00
David Shrewsbury d48053b91c Re-enable test: test_image_delete_invalid
Change-Id: I0774bf8886cf9c6cefc98971cfc200000fa8fe2d
2016-11-21 11:16:09 -05:00
David Shrewsbury ed62ff9909 Use image name in DIB image IDs
DIB image IDs used to be unique across all images. This is no longer
the case as each image uses sequence numbers within each image build.
DIB files on disk are named with <image-name>-<seq> format for the
base name. Change the dib-image-list output for the ID field to be
the combined image name/ID since that uniquely identfies DIB builds.

Change-Id: I78906272126ac33f504d905f056247cd17af78e6
2016-11-15 17:47:40 +00:00
David Shrewsbury ad3e22a622 Transition ZK API from dict to object model
The methods to get and store builds and uploads will now take and/or
return objects of ImageBuild or ImageUpload types.

ZooKeeper API methods that used to return tuples (to include a build
or upload ID along with the corresponding data) are now simplified to
return a list of objects which will contain the ID.

Since our object model validates state values, we no longer need to
test for empty state values (which are not allowed), so two tests
accounting for that are removed.

Story: 2000767

Change-Id: I4e97f8d0d1e3095a6c3e988a8266805a82a0b520
2016-11-10 12:51:50 -08:00
James E. Blair 551d86a887 Update 'image-list' cli to ZK
This updates image-list to use ZK instead of the db.

It also adds the 'external_name' attribute to image uploads in
zookeeper.  This displays the image name as it would appear in
a glance image list to the user.

Change-Id: I41b2b80243bec0e5152494e2744106a874cf6804
2016-10-07 09:46:14 -07:00
James E. Blair f9c2bb92c3 Update 'nodepool dib-image-list' command to ZK
This updates the nodepool dib-image-list command to use
Zookeeper.

It adds a couple of methods to the ZK facade to assist.

Also, log kazoo at WARNING level in the CLI (because
connect/disconnect are logged at INFO, and are unecessary for a
normal CLI invocation).

Change-Id: I0c7fd3a96e00ae853638f982df5c8e0b036cca7c
2016-10-07 09:46:09 -07:00
James E. Blair 9aacffe24a Add a simple status webapp
Change-Id: I8f3cb28d9c5136b1da54637d5fec82baf2348a24
2016-07-14 15:10:38 -07:00