Retry jobs on winrm send_input failed

In some cases winrm tasks fail with the message 'winrm send_input
failed'. This error is unrelated to the jobs and is caused by some
underlying problems with the winrm connection or powershell on the
node. Unfortunately ansible treats this failure as a task failure
rather than like an unreachable host and thus returns with its
standard error code 2 for failed tasks so zuul doesn't catch this and
retry the job.

However this error can easily be spotted by checking the ansible
output for the string 'FATAL ERROR DURING FILE TRANSFER' which is
unique to this failure in ansible. By catching this and returning
RESULT_UNREACHABLE we can automatically retry these jobs.

Change-Id: Iceba265e9783a96b8f50dfcf269468ddee1df810
This commit is contained in:
Tobias Henkel 2019-01-20 10:50:39 +01:00
parent 691b1bc17c
commit 85c70008d6
No known key found for this signature in database
GPG Key ID: 03750DEC158E5FA2
1 changed files with 9 additions and 0 deletions

View File

@ -1947,6 +1947,15 @@ class AnsibleJob(object):
job_output.write("{now} | {line}\n".format(
now=datetime.datetime.now(),
line=line.decode('utf-8').rstrip()))
elif ret == 2:
# This is a workaround to detect winrm connection failures that are
# not detected by ansible. These can be detected if the string
# 'FATAL ERROR DURING FILE TRANSFER' is in the ansible output.
# In this case we should treat the host as unreachable and retry
# the job.
for line in syntax_buffer:
if b'FATAL ERROR DURING FILE TRANSFER' in line:
return self.RESULT_UNREACHABLE, None
return (self.RESULT_NORMAL, ret)