Commit Graph

33 Commits

Author SHA1 Message Date
Monty Taylor 34aae137fa Remove TaskManager and just use keystoneauth
Support for concurrency and rate limiting has been added to keystoneauth,
which is the library openstacksdk uses to talk to OpenStack. Instead
of managing concurrency in nodepool using the TaskManager and pool of
worker threads, let keystoneauth take over. This also means we no longer
have a hook into the request process, so we defer statsd reporting to
the openstacksdk layer as well.

Change-Id: If21a10c56f43a121d30aa802f2c89d31df97f121
2019-04-02 09:36:13 +00:00
Ian Wienand d1326df6ee Use openstacksdk submit_task
Task manager stats sending was broken with openstacksdk updates and
fixed in I8617ab2895d1544a6902ae5a3d6a97b87bfd2ec9.

What used to happen is that openstacksdk run_task() would call
post_run_task(); however with changes to use a threadpool this became
an async call so was not reflecting the post-run state but just the
post-insert-into-threadpool-queue state.

The referenced change moved the post_run_task() call into
submit_task(), where it is called after the task has been wait()ed
for (and actually run).

Unfortunatley, since this file overrides submit_task, it means that
post_run_task is not being called any more and we are not producing
timing stats.

I believe we can use the openstacksdk task manager's submit_task
directly.  There are two differences; the "raw" argument was removed
from upstream with I7b46e263a76d84573bdfbbece57b1048764ed939 and is no
longer necessary.  The ManagerStoppedException is private to this file
and doesn't appear to have an external dependencies; thus the
openstacksdk's TaskManagerStopped exception will work just the same.

Change-Id: I427e6ae9e4beae6d551427fc12a3cde2c1d03aba
2018-11-09 07:28:38 +11:00
Ian Wienand f18e2e8c76 Normalise more of the API stats calls
We currently have keys like "ComputePostOs-volumes_boot" for providers
using boot-from-volume and other various "os-" keys depending on the
provider.  Normalise all these to regular CamelCase.  A basic
test-case is added.

Additionally add some documentation on the API call stats, pointing
out they reflect internal details so are subject to change.  A release
note is added for the updated stats.

Change-Id: If8398906a5a7ad3d96e985263b0c841e5dcaf7b5
2018-09-28 18:49:30 +10:00
Monty Taylor 67824d8e64
Make statsd key look like the keys from shade
openstacksdk produces task names that look like compute.DELETE.servers,
which are {service_type}.{METHOD}.{url}.{parts} but shade and nodepool
have been reporting to statsd with {Service_type}{Method}{Url}{Parts}.

Translate the task.name produced by sdk into the key nodepool expects to
report as.

Also, emit 'Manager ran task' log lines

The super class emits these, but on the openstacksdk logger. We don't
really want to turn that on at debug level for normal usage, so rather
than calling super(post_run_task) just straight-up override it.

Make sure we log using the transformed name in all cases.

Change-Id: I7f21aefc204366f7621643fdba76b7e70ce4caf5
2018-07-24 13:57:05 -05:00
Artem Goncharov fc1f80b6d1
Replace shade and os-client-config with openstacksdk.
os-client-config is now just a wrapper around openstacksdk. The shade
code has been imported into openstacksdk. To reduce complexity, just use
openstacksdk directly.

openstacksdk's TaskManager has had to grow some features to deal with
SwiftService. Making nodepool's TaskManager a subclass of openstacksdk's
TaskManager ensures that we get the thread pool set up properly.

Change-Id: I3a01eb18ae31cc3b61509984f3817378db832b47
2018-07-14 08:44:03 -05:00
Monty Taylor b8aa756515
Remove Task class
This is not use in the nodepool codebase anymore. The tasks submitted to
the TaskManager all exist inside of shade/sdk now.

Change-Id: Ie914be9f8687f4634996fb880e5c383dab7de6f0
2018-07-14 08:44:03 -05:00
Monty Taylor d8407027cf
Change TaskManager from is-a to has-a thread
As a small stepping stone towards consolidating the
shade/sdk/nodepool TaskManager implementations, change TaskManager from
being a thread to having a thread. This should allow us to make
nodepool's TaskManager inherit from sdk's TaskManager without losing our
minds trying to understand how multi-inheritance works with Thread as
one of the base classes.

Change-Id: Ie89eb9e68e5e3f0a7a55cdd555c683631f305b8e
2018-07-05 13:31:42 -05:00
Monty Taylor 6ae554162c Remove use of six
nodepool is python3 only now, so there is no need to continue to use the
six library. Remove its use and shift to just using python3 versions of
things.

Change-Id: Ic1d1b4b736deba1b8325adf0b446e2121eaa6b20
2018-05-08 09:50:59 +10:00
Tobias Henkel 7d79770840 Do pep8 housekeeping according to zuul rules
The pep8 rules used in nodepool are somewhat broken. In preparation to
use the pep8 ruleset from zuul we need to fix the findings upfront.

Change-Id: I9fb2a80db7671c590cdb8effbd1a1102aaa3aff8
2018-01-17 02:17:45 +00:00
Paul Belanger 927c79c4a2
Use six.reraise for python3
This currently is invalid syntax in python3. Use six so we can support
both python2 and python3.

Change-Id: I7eb664908bdc14551ffac3bb3665cbc5cb84bd84
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
2017-05-18 12:47:41 -04:00
Paul Belanger d892837cad
Fix imports for python3
The syntax for imports has changed for python3, lets use the new
syntax.

Change-Id: Ia985424bf23b44e492f51182179d2e476cdcccbb
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
2017-05-17 15:19:48 -04:00
James E. Blair fe153656df Don't use taskmanagers in builder
ProviderManager is a TaskManager, and TaskManagers are intended
to serialize API requests to a single cloud from multiple threads.
Currently each worker in the builder has its own set of
ProviderManagers.  That means that we are performing cloud API calls
in parallel.  That's probably okay since we perform very few of them,
mostly image uploads and deletes.  And in fact, we probably want
to avoid blocking on image uploads.

However, there is a thread associated with each of these
ProviderManagers, and even though they are idle, in aggregate they
add up to a significant CPU cost.

This makes the use of a TaskManager by a ProviderManager optional
and sets the builder not to use it in order to avoid spawning these
useless threads.

Change-Id: Iaf6498c34a38c384b85d3ab568c43dab0bcdd3d5
2016-12-07 11:58:24 -08:00
Monty Taylor 6a0fd67cbe Log task name more succinctly
This changes the log line from:

  DEBUG nodepool.ProviderManager: Manager bluebox-sjc running task <nodepool.provider_manager.ListFlavorsTask object at 0x7f122c025a50> (queue: 0)
  DEBUG nodepool.ProviderManager: Manager bluebox-sjc ran task <nodepool.provider_manager.ListFlavorsTask object at 0x7f122c025a50> in 0.0167989730835s

to:

  DEBUG nodepool.ProviderManager: Manager bluebox-sjc running task ListFlavorsTask (queue: 0)
  DEBUG nodepool.ProviderManager: Manager bluebox-sjc ran task ListFlavorsTask in 0.0167989730835s

Change-Id: Ice64fe11c4ceb104ae171a8d7590ada0ed450bc6
2016-01-19 14:56:03 -05:00
Ian Wienand d5a2bb2c05 Don't use global statsd
Some history to motivate the change : the original statsd 2.0 era
worked by just doing "import statsd" and it setup a global "statsd"
object based on env variables for you, or set it to None if they
didn't exist.

With statsd 3.0 this changed semantics slightly to provide default
values, so to maintain the status-quo we added the "stats" wrapper
(I4791c9d26f2309f78a556de42af5b9945005aa46) which did the same thing.

However, having a global object set at import time like this isn't the
best idea.  For example, when running unit tests, you may want to set
the statsd host to your fake logger, but if it is already setup at
import time your unit test can't override it when it calls the various
classes.  It creates other ordering problems as we are splitting up
nodepool into more components.

Thus we move to creating a separate client in each object as it is
instantiated.  To maintain the existing behaviour of returning "None"
if the env variables aren't set we keep it in stats.py behind a new
function.  All stats callers are modified to get a client in their
__init__()

See also: Ib84655378bdb7c7c3c66bf6187b462b3be2f908d -- similar changes
for zuul

Change-Id: I6d339a8c631f8508a60e9ef890173780157adefd
2016-01-15 12:38:21 +11:00
Jenkins da34aba534 Merge "Convert timing metrics to milliseconds" 2015-07-04 03:10:12 +00:00
Jenkins c462906905 Merge "Convert to use latest statsd version" 2015-06-23 17:57:33 +00:00
Timothy Chavez 1fc087f83a Convert timing metrics to milliseconds
Timing metrics which are sent to statsd must be in milliseconds[1].

[1] http://statsd.readthedocs.org/en/latest/types.html#timers

Change-Id: Iaaa40b88aa670aaef41c4dab0231c6cb8f5498bc
2015-06-02 11:28:49 -05:00
Clark Boylan e9072a3313 Debug dying task managers
Change-Id: I81684026071b5abd6162e9f117e7fad08da61ea2
2015-05-28 11:37:39 -04:00
Ian Wienand 5662105de4 Convert to use latest statsd version
statsd >= 3.0 changed the way it initializes itself; to start-up from
environment variables you need to import from 'statsd.defaults.env'.
It is also slightly different in that it provides default values; so
we check if the environment variable is set and avoid importing it if
statsd isn't configured.

This moves the statsd object creation into a common module so it can
be shared rather than create multiple clients.

Documentation is also updated to describe how to configure this

Change-Id: I4791c9d26f2309f78a556de42af5b9945005aa46
2015-05-26 15:53:08 +10:00
Monty Taylor 8b87890b97 Reset the client object after proxy timeouts
In situations where the connection between nodepool and the cloud
it manages may be through an evil proxy, the fact that the
requests library in python-novaclient keeps port connections open
means that the evil proxy might timeout the connection during the
period of time that the image is being built.

If the Task object encounters a ProxyError it needs to be propagated
up in the calling thread to the ProviderManager rather than caught
and handled in TaskManager in the main thread. Once caught in the
ProviderManager, create a new client object and retry once. If a second
ProxyError is raised, then it will be passed to the main thread the
same as any other Exception.

Change-Id: I3d69de48be1993c04356d209f622b0a17e6e8d03
2015-04-07 15:24:00 -07:00
Ramy Asselin 2766a5a126 Protect when statsd is not enabled
Fix for this: Id5186a288db3d03efd01d9328d69061dd746e9cb

Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 551, in __bootstrap_inner
    self.run()
  File "/usr/local/lib/python2.7/dist-packages/nodepool/task_manager.py", line 101, in run
    statsd.timing(key, dt)
AttributeError: 'NoneType' object has no attribute 'timing'

Change-Id: I23970d20d91b7a95fb1f3b4a8dd8f99dc6df1897
2015-04-01 16:17:33 -07:00
Monty Taylor 1ffcf0079b Add statsd counters to the TaskManager
Collect times for every task we perform, also grouped by provider.

Change-Id: Id5186a288db3d03efd01d9328d69061dd746e9cb
2015-04-01 17:49:08 -04:00
James E. Blair df5049f2e2 Don't retry forever when a provider is stopped
When we stop a provider, it does not accept any new tasks.  In that
case, we should raise a specific exception so that loops that retry
for a very long time can exit early since the "error" will never
recover.

Change-Id: I7387849ffdb96c487bae4c31ca7cbdd07b8afd8a
2014-09-03 22:15:54 +00:00
James E. Blair 8ccc227b2d Handle task manager shutdown more correctly
If a task manager was stopped with tasks in queue, they would
not be processed.  This could result in deadlocked threads if
one of the long-running threads was waiting on a task while
the main thread shut down a manager.

Change-Id: I3f58c599d472d134984e63b41e9d493be9e9d70b
2014-06-17 08:26:33 -07:00
Christian Berendt 8bbfde2199 Use import from six.moves to import the queue module
The name of the synchronized queue class is queue instead of
Queue in Python3.

Change-Id: I508268561f95c9fed2d39fb45731aab5d9d74111
2014-06-07 21:07:02 +02:00
James E. Blair b6539f9cdd Log task durations
Change-Id: I87c6f870ccb806d3484b38ac123ac89201a854cd
2014-06-03 14:31:15 -07:00
James E. Blair 734435b772 Log task manager queue length
A long queue isn't necessarily bad, but more information could be
helpful.

Change-Id: I34d6bb5c1627af1fbc700458d8950add296bc1bc
2014-06-03 14:31:15 -07:00
James E. Blair e829bcb6b1 Don't accept tasks for stopped managers
In case a provider manager is being replaced due to a configuration
change, raise exceptions for any tasks submitted after it is stopped.

This will probably cause many errors, but they should already be
handled and servers will eventually be deleted.

We can minimize the disruption by making the provider managers more
adaptable to changes, but this stopgap measure should at least fix
the current problem we are observing with threads that get stuck
and never complete.

Change-Id: I3d190881ede30480d7c4ae970a0cb2dd07c3e160
2014-05-22 15:57:49 -07:00
James E. Blair 3a1072cc2c Revert "Provide diagnostics when task rate limiting."
This reverts commit a01956ed70.

This outputs a huge amount of not very useful information.

Change-Id: Ie72a207ced8bb64ae1a01c88ca396f9df633e79c
2014-01-21 11:34:12 -08:00
Robert Collins a01956ed70 Provide diagnostics when task rate limiting.
Rate limiting is necessary for some clouds, but it can be helpful to
be able to see when it is kicking in. We expect it to kick in when
making API requests from public cloud providers with ratelimits (all
our current ones).

Change-Id: Ieca7b49f1719ac4b4a326ae6d3682f1bca08ad54
2014-01-21 14:31:05 +13:00
qianlin b9634b8bc4 Fix the logprint in task_manager.py
The log must been print "nodepool.TaskManager" rather than
"nodepool.ProviderManager".

Change-Id: I71382f9cfad929b8a011310fe9a30b518100a5ff
Closes-bug: #1254734
2013-11-25 22:20:23 +08:00
James E. Blair 04206cd8a0 Add a thread dump signal handler
Based on the one in Zuul, but output thread names.  Also, update
Thread constructors to provide useful names.

Change-Id: Ic710a729a1986b7cf67cd7f7812a396a020c4936
2013-10-11 10:49:53 -07:00
James E. Blair 0ec2246514 Add JenkinsManager
Same idea as a ProviderManager: serialize changes to each jenkins
server (with a rate limit).

Change-Id: I631d50dcfd13c29d2802c192d6e1ac7889256a90
2013-08-22 10:43:33 -07:00