Commit Graph

4 Commits

Author SHA1 Message Date
Tobias Henkel 2dde4404e4
Fix missing semaphore release on node failure
Currently, when a node failure occurs on a job that holds a semaphore,
the semaphore is not released properly. This is only recoverable by a
scheduler restart.

Change-Id: Ifa463824f4a394e015a6ee11fcd51bee163492f8
2019-01-18 08:34:58 +01:00
Tobias Henkel ae887dab58
Improve resource usage with semaphores
Currently, jobs that use semaphores first get and lock the build
nodes and then acquire the semaphore. If many jobs are waiting for
the semaphore, this can block a substantial part of the available
resources. To make this safe, default to acquiring the semaphore
before requesting the nodes.

However, when jobs with a semaphore should run back to back as fast
as possible, it might be preferable to accept some waste of
resources. To support this use case, a job using a semaphore can
override this behavior and still acquire the semaphore after getting
the nodes.
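
The resource trade-off described above can be illustrated with a toy model. This is a minimal sketch, not Zuul code: the function name, its parameters, and the assumption that every job needs exactly one node are hypothetical and chosen only for illustration.

```python
def nodes_held_while_waiting(jobs, semaphore_max, total_nodes, resources_first):
    """Count nodes locked by jobs that are still waiting for the semaphore.

    Hypothetical model: each job needs one node and one semaphore slot.
    """
    holders = 0              # jobs currently holding the semaphore
    free_nodes = total_nodes
    idle_nodes = 0           # nodes locked by jobs that cannot run yet
    for _ in range(jobs):
        if resources_first:
            # Previous behavior: lock a node first, then try the semaphore.
            if free_nodes == 0:
                break
            free_nodes -= 1
            if holders < semaphore_max:
                holders += 1
            else:
                idle_nodes += 1
        else:
            # New default: take the semaphore first and request a node
            # only once the semaphore has been acquired.
            if holders < semaphore_max and free_nodes > 0:
                holders += 1
                free_nodes -= 1
    return idle_nodes


if __name__ == "__main__":
    # Five jobs compete for a semaphore with a single slot.
    print(nodes_held_while_waiting(5, 1, 10, resources_first=True))
    print(nodes_held_while_waiting(5, 1, 10, resources_first=False))
```

In this model, acquiring the nodes first leaves one locked node idle per waiting job, while acquiring the semaphore first leaves none idle, at the cost of waiting for nodes after the semaphore is taken.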

Change-Id: Id6f582ec29219d280d05319d1b822c7934437b7a
2018-11-20 15:20:59 +01:00
Tobias Henkel c5e6f5cefe
Fix missing semaphore release on zk error
During problems with zk connectivity, jobs can fail while locking nodes
[1]. In this case the build doesn't get created and attached to the
queue item. However, the semaphores are already acquired at this point
and don't get released. Fix this by releasing the semaphore
when hitting this exception.

[1] Trace:
2018-04-05 10:56:56,936 ERROR zuul.Pipeline.example.check: Exception while executing job example-test for change <Change 0x7f65e9dd59e8 14,55692b4a936fff57e33036399927332849a53a92>:
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/zuul/manager/__init__.py", line 396, in _executeJobs
    self.sched.nodepool.useNodeSet(nodeset)
  File "/usr/lib/python3.6/site-packages/zuul/nodepool.py", line 117, in useNodeSet
    self.sched.zk.storeNode(node)
  File "/usr/lib/python3.6/site-packages/zuul/zk.py", line 213, in storeNode
    self.client.set(path, self._dictToStr(node.toDict()))
  File "/usr/lib/python3.6/site-packages/kazoo/client.py", line 1242, in set
    return self.set_async(path, value, version).get()
  File "/usr/lib/python3.6/site-packages/kazoo/handlers/utils.py", line 79, in get
    raise self._exception
kazoo.exceptions.NoNodeError
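
The release-on-exception pattern described in this commit message can be sketched as follows. This is an illustrative model only, not the actual change: `execute_job`, the locally defined `NoNodeError` stand-in, and the use of `threading.BoundedSemaphore` are assumptions made so the example is self-contained.

```python
import threading


class NoNodeError(Exception):
    """Local stand-in for kazoo.exceptions.NoNodeError (illustration only)."""


def execute_job(semaphore, use_node_set):
    """Acquire the semaphore, then lock nodes; release it if locking fails.

    `use_node_set` is a hypothetical callable that may raise, like the
    node locking step in the traceback above.
    """
    if not semaphore.acquire(blocking=False):
        return "waiting"
    try:
        use_node_set()
    except Exception:
        # Without this release the slot would stay taken until a restart.
        semaphore.release()
        return "failed"
    return "running"


def failing_lock():
    raise NoNodeError()


if __name__ == "__main__":
    sem = threading.BoundedSemaphore(1)
    print(execute_job(sem, failing_lock))   # locking fails, slot is released
    print(execute_job(sem, lambda: None))   # slot is free again, job starts
```

Because the semaphore is released in the exception handler, a later job can still acquire it after a failed node lock.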

Change-Id: I851876ece318aa047e523c50f4c721417d1af6b7
2018-04-10 18:49:07 +02:00
James E. Blair 9ea0d0b937
Move semaphore tests to their own class
Create a dedicated config directory for the semaphore tests and
remove them from the single-tenant configuration.

Create a simplified form of commitLayoutUpdate which accepts a
path to a replacement zuul.yaml and commits it to the specified
config repository to aid in reconfiguration tests.  The existing
similar methods rely on an entire shadow git repository which
requires additional git filesystem operations in tests.

Change-Id: I0f8e99b6ad262ece5a5649a480e0393872761903
2017-04-20 10:48:56 -07:00