Currently, when a node failure occurs on a job holding a semaphore, the
semaphore is not released properly. This is only recoverable by a
scheduler restart.
Change-Id: Ifa463824f4a394e015a6ee11fcd51bee163492f8
Currently, jobs that use semaphores first get and lock the build
nodes and then acquire the semaphore. If many jobs are waiting
for the semaphore, this can block a substantial part of the available
resources. To make this safe, default to acquiring the semaphore
before requesting the nodes.
However, in some cases where jobs with a semaphore should run
consecutively as fast as possible, it may be preferable to accept
some wasted resources. To support this use case, a job using a
semaphore can override this behavior and still acquire the semaphore
after getting the nodes.
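The two acquisition orders described above can be sketched as follows. This is an illustrative sketch, not Zuul's actual code: the names FakeNodepool, start_job, and the acquire_after_nodes parameter are all hypothetical stand-ins.

```python
import threading


class FakeNodepool:
    """Hypothetical stand-in for a node provider."""

    def __init__(self):
        self.requested = []

    def request(self, job):
        self.requested.append(job)
        return ["node-for-" + job]


def start_job(job, semaphore, nodepool, acquire_after_nodes=False):
    """Sketch of the two orderings; parameter names are assumed."""
    if acquire_after_nodes:
        # Opt-in: lock nodes first so consecutive runs start
        # immediately, at the cost of nodes sitting idle while
        # the job waits on the semaphore.
        nodes = nodepool.request(job)
        semaphore.acquire()
    else:
        # Safe default: hold the semaphore before tying up nodes,
        # so queued jobs do not block available resources.
        semaphore.acquire()
        nodes = nodepool.request(job)
    return nodes


# Demo of the default (semaphore-first) ordering.
sem = threading.Semaphore(1)
pool = FakeNodepool()
nodes = start_job("example-test", sem, pool)
```

The default keeps resource usage bounded by the semaphore; the override trades idle nodes for lower latency between consecutive runs.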
Change-Id: Id6f582ec29219d280d05319d1b822c7934437b7a
During problems with ZooKeeper connectivity, jobs can fail to lock
nodes [1]. In this case the build doesn't get created and attached to
the queue item, but the semaphore has already been acquired at this
point and is never released. Fix this by releasing the semaphore
when hitting this exception.
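The fix amounts to releasing the semaphore when node use raises. A minimal sketch, assuming a hypothetical execute_job wrapper and use_nodeset callable (not the actual Zuul code paths):

```python
import threading


def execute_job(job, semaphore, use_nodeset):
    """Sketch: release the semaphore if using the nodeset fails."""
    semaphore.acquire()
    try:
        # May raise, e.g. kazoo.exceptions.NoNodeError when the
        # ZooKeeper connection is unhealthy.
        return use_nodeset(job)
    except Exception:
        # Without this release the semaphore stayed held forever,
        # and only a scheduler restart could recover it.
        semaphore.release()
        raise
```

Note the exception is re-raised after cleanup so the caller still sees the failure.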
[1] Trace:
2018-04-05 10:56:56,936 ERROR zuul.Pipeline.example.check: Exception while executing job example-test for change <Change 0x7f65e9dd59e8 14,55692b4a936fff57e33036399927332849a53a92>:
Traceback (most recent call last):
File "/usr/lib/python3.6/site-packages/zuul/manager/__init__.py", line 396, in _executeJobs
self.sched.nodepool.useNodeSet(nodeset)
File "/usr/lib/python3.6/site-packages/zuul/nodepool.py", line 117, in useNodeSet
self.sched.zk.storeNode(node)
File "/usr/lib/python3.6/site-packages/zuul/zk.py", line 213, in storeNode
self.client.set(path, self._dictToStr(node.toDict()))
File "/usr/lib/python3.6/site-packages/kazoo/client.py", line 1242, in set
return self.set_async(path, value, version).get()
File "/usr/lib/python3.6/site-packages/kazoo/handlers/utils.py", line 79, in get
raise self._exception
kazoo.exceptions.NoNodeError
Change-Id: I851876ece318aa047e523c50f4c721417d1af6b7
After upgrading Gerrit to 2.13 our gate stopped working. The reason
is that after a successful gate run Zuul executes something like
'gerrit review --label verified=2 --submit'. The verified label in
Gerrit is configured as 'Verified' by default. The newer version of
Gerrit behaves differently: it accepts the +2 vote on 'verified' but
no longer submits the patch if the casing is not correct. This
forces us to specify the label with the same casing as Gerrit
expects, but then the tolower() in canMerge prevents the patch
from entering the gate.
In order to avoid confusion and be consistent, avoid any case
conversions and use the labels exactly as defined in Gerrit.
Note that this patch requires changes to the pipelines such that the
labels are spelled exactly as defined in Gerrit.
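The change boils down to case-sensitive label matching. A minimal sketch, assuming a hypothetical can_merge helper and a plain dict of label votes (not Gerrit's actual data model):

```python
def can_merge(change_labels, required_label):
    # Compare the label name exactly as defined in Gerrit: no
    # tolower()/case conversion, since newer Gerrit treats
    # 'verified' and 'Verified' as distinct when submitting.
    return change_labels.get(required_label, 0) >= 2
```

With exact matching, a pipeline configured with the wrong casing fails fast instead of voting successfully but never submitting.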
Change-Id: I9713a075e07b268e4f2620c0862c128158283c7c
Create a dedicated config directory for the semaphore tests and
remove them from the single-tenant configuration.
Create a simplified form of commitLayoutUpdate which accepts a
path to a replacement zuul.yaml and commits it to the specified
config repository, to aid in reconfiguration tests. The existing
similar methods rely on an entire shadow git repository, which
requires additional git filesystem operations in tests.
Change-Id: I0f8e99b6ad262ece5a5649a480e0393872761903