Properly handle TaskManagerStopped exception

When we lose a task manager, we won't be able to create any instances.
Rather than continuing to retry until the retry limit is reached, we
raise the exception early.

In the case below, the retry limit is very high, which results in the
logs being spammed with the following:

  2019-02-12 16:41:15,628 ERROR nodepool.NodeLauncher-0001616109: Request 200-0000443406: Launch attempt 39047/999999999 failed for node 0001616109:
  Traceback (most recent call last):
    File "/opt/rh/rh-python35/root/usr/lib/python3.5/site-packages/nodepool/driver/openstack/handler.py", line 241, in launch
      self._launchNode()
    File "/opt/rh/rh-python35/root/usr/lib/python3.5/site-packages/nodepool/driver/openstack/handler.py", line 142, in _launchNode
      instance_properties=self.label.instance_properties)
    File "/opt/rh/rh-python35/root/usr/lib/python3.5/site-packages/nodepool/driver/openstack/provider.py", line 340, in createServer
      return self._client.create_server(wait=False, **create_args)
    File "<decorator-gen-32>", line 2, in create_server
    File "/opt/rh/rh-python35/root/usr/lib/python3.5/site-packages/openstack/cloud/_utils.py", line 377, in func_wrapper
      return func(*args, **kwargs)
    File "/opt/rh/rh-python35/root/usr/lib/python3.5/site-packages/openstack/cloud/openstackcloud.py", line 7020, in create_server
      self.compute.post(endpoint, json=server_json))
    File "/opt/rh/rh-python35/root/usr/lib/python3.5/site-packages/keystoneauth1/adapter.py", line 357, in post
      return self.request(url, 'POST', **kwargs)
    File "/opt/rh/rh-python35/root/usr/lib/python3.5/site-packages/openstack/_adapter.py", line 154, in request
      **kwargs)
    File "/opt/rh/rh-python35/root/usr/lib/python3.5/site-packages/openstack/task_manager.py", line 219, in submit_function
      return self.submit_task(task)
    File "/opt/rh/rh-python35/root/usr/lib/python3.5/site-packages/openstack/task_manager.py", line 185, in submit_task
      name=self.name))
  openstack.exceptions.TaskManagerStopped: TaskManager rdo-cloud-tripleo is no longer running

Change-Id: I5f907d19ec1e637defe90eb944f4e5bd759e8a74
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
Paul Belanger 2019-02-12 12:25:54 -05:00
parent 9c7843d28a
commit e8ac13027e
1 changed file with 4 additions and 0 deletions

@@ -241,6 +241,10 @@ class OpenStackNodeLauncher(NodeLauncher):
             try:
                 self._launchNode()
                 break
+            except openstack.exceptions.TaskManagerStopped:
+                # If we lost our TaskManager session, we won't be able to
+                # launch an instance, so there's no need to continue.
+                raise
             except kze.SessionExpiredError:
                 # If we lost our ZooKeeper session, we've lost our node lock
                 # so there's no need to continue.
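
The effect of the added handler can be illustrated in isolation. What follows is a minimal, self-contained sketch, not nodepool's actual launcher: `TaskManagerStopped` is a local stand-in for `openstack.exceptions.TaskManagerStopped` so the example runs without openstacksdk installed, and `launch_with_retries` and `launch_once` are hypothetical names introduced only for illustration.

```python
class TaskManagerStopped(Exception):
    """Local stand-in for openstack.exceptions.TaskManagerStopped (assumption)."""


def launch_with_retries(launch_once, retries):
    """Call launch_once() until it succeeds or the retry limit is reached.

    Returns the number of the attempt that succeeded.
    """
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            launch_once()
            return attempt
        except TaskManagerStopped:
            # Fatal: with the task manager gone, no later attempt can
            # succeed, so propagate immediately instead of retrying.
            raise
        except Exception as e:
            # Possibly transient: remember the error and try again.
            last_error = e
    raise RuntimeError("launch failed after %d attempts" % retries) from last_error


if __name__ == "__main__":
    def stopped():
        raise TaskManagerStopped("TaskManager is no longer running")

    try:
        # Even with a very high retry limit, this gives up on the first attempt.
        launch_with_retries(stopped, 999999999)
    except TaskManagerStopped as e:
        print("gave up early:", e)
```

The ordering of the `except` clauses matters: the specific, non-recoverable exception has to be caught before the generic handler, otherwise it would be treated as just another failed attempt and retried.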