As a first step towards supporting multiple ansible versions we need
tooling to manage ansible installations. This moves the installation
of ansible from requirements.txt into zuul. The tooling is called as a
setup hook to install the ansible versions into
<prefix>/lib/zuul/ansible. Further, it abstracts away the knowledge
the executor needs in order to run the correct version of ansible.
The actual usage of multiple ansible versions will be done in
follow-ups.
For better maintainability the ansible plugins live in
zuul/ansible/base where plugins can be kept in different versions if
necessary. For each supported ansible version there is a specific
folder that symlinks the corresponding plugins.
Change-Id: I5ce1385245c76818777aa34230786a9dbaf723e5
Depends-On: https://review.openstack.org/623927
In the zuul_stream callback we check for 'MODULE FAILURE' and then log
stdout, exception or stderr, whichever we find first. However, in some
cases stdout exists but is empty. In that case we don't log the error
reason and the user has to look into the json log, which contains the
correct information. This can easily be fixed by checking whether
stdout is empty.
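The fixed selection logic can be sketched as follows; the helper name is invented and the field names mirror the keys commonly found in an Ansible result dict:

```python
# Hypothetical sketch of the fix: prefer stdout only when it is
# non-empty, otherwise fall back to exception, then stderr.
def pick_failure_reason(result):
    for key in ('module_stdout', 'exception', 'module_stderr'):
        value = result.get(key)
        if value:  # skips both missing and empty-string values
            return value
    return None
```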
Change-Id: I8019d40e2e99310eaeab03f652368dac66e1ee17
This change adds executor support for generic build resources.
Using the connection_port, a kube config file is generated to enable ansible
tasks to use the resource. The context and pod name are also exported in a
zuul.resources variable to give the job direct access to the build resource.
Currently this change supports 'kubectl' resource for containers that behave
like a machine, and 'project'/'namespace' resource for native containers.
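A rough sketch of the exported structure; the key names here are assumptions for illustration, not Zuul's exact schema:

```python
# Hedged sketch: one zuul.resources entry per build resource, carrying
# the kube context and, depending on the resource type, the pod or
# namespace. All key names are illustrative assumptions.
def resource_vars(name, context, pod=None, namespace=None):
    data = {'context': context}
    if pod:
        data['pod'] = pod
    if namespace:
        data['namespace'] = namespace
    return {'zuul': {'resources': {name: data}}}
```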
Change-Id: Icdb9e800177dc770c58f65b02456a6b904be0666
Currently we don't get any stdout logs for windows in the build log at
all, only in the json log. This is a bit inconvenient. As live
streaming doesn't work for windows at the moment, just don't live
stream win_shell and win_command tasks but treat them like localhost
tasks and loop items. With this we at least get after-the-fact stdout
logs of these tasks.
Change-Id: Ieda6cb51dd6af0d5fb4d0a1f6f7d60d66c9fd7cb
Currently we do live streaming of command and shell tasks in
loops. However this is severely broken as we only get the output of
the first task that ran. This is hard to solve, as per-task streaming
in loops is not possible because of missing start events. Further,
streaming several tasks at once is difficult too because we don't know
upfront how many tasks we need to stream.
So the intermediate solution until the whole log streaming is reworked
is to just not live stream tasks in loops.
While we're at it, also fix a broken log statement and remove duplicate
exit code prints of action tasks.
Change-Id: Ic1358d5b9f939549ffdd5d770fe4eafc047be6f1
There is an extra unexpected argument when streaming the timeout
exception if we time out waiting for the logger.
Change-Id: I7186a77267b811635b74b28f7619bdc976573bd4
After reworking the injection of zuul_log_id this is no longer
necessary.
This reverts commit dec1900178.
This reverts commit d6f1fd13fa.
Change-Id: I2c491e0f643638ef57ef86f505974fb012939569
Currently when delegating multiple hosts to a single host we don't
handle that case correctly during streaming. In this case we only
stream one log instead of all logs. Fix that by streaming the log for
each inventory host.
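The corrected behaviour can be sketched as follows; the names are invented for illustration:

```python
# Minimal sketch: set up one log stream per inventory host, keyed by
# the inventory hostname, even when every host delegates to the same
# target node, so no host's log is dropped.
def streams_for(inventory_hosts, delegated_node):
    return {host: delegated_node for host in inventory_hosts}
```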
Change-Id: I2d488df17b20899324030c90075f2fa6e7cd1256
The log streaming callback is not being called in the same way
in Ansible 2.5 as it was in 2.3. In particular, in some cases
different Task objects are used for different hosts. This,
combined with the fact that the callback is only called once for
a given task, means that in these cases we are unable to supply
the zuul_log_id to the Task object for the second host on a task.
This can be resolved by injecting the zuul_log_id within the command
action plugin based on the task uuid directly.
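The idea can be sketched like this; the "<uuid>-<host>" shape is an illustration, not necessarily Zuul's exact log id format:

```python
# Hedged sketch: derive the log id deterministically from the task's
# uuid (and host) inside the action plugin itself, instead of relying
# on the callback to mutate a Task object that may be shared across
# hosts. The id shape here is an assumption for illustration.
def zuul_log_id(task_uuid, host_name):
    return '%s-%s' % (task_uuid, host_name)
```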
Change-Id: I7ff35263c52d93aeabe915532230964994c30850
The log streaming callback is not being called in the same way
in Ansible 2.5 as it was in 2.3. In particular, in some cases
different Task objects are used for different hosts. This,
combined with the fact that the callback is only called once for
a given task, means that in these cases we are unable to supply
the zuul_log_id to the Task object for the second host on a task.
To correct this, a local copy of the linear strategy plugin is
added, with the change that for every host-task, it calls either
the normal on_task_start callback, or a new zuul_task_start
callback. This ensures that we are able to set up log streaming
on every host-task.
We plan to move to a different system for establishing log streaming
soon so that we don't have to keep carrying this patched plugin.
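The patched dispatch can be sketched as follows; the function and stub classes are illustrative, with method names taken from the commit text and the Ansible callback API:

```python
# Sketch: for each host-task pair, call the new zuul_task_start
# callback when the plugin provides one, otherwise fall back to the
# normal v2_playbook_on_task_start. This lets zuul_stream initialize
# log streaming once per host-task.
def send_task_start(callback, task):
    handler = getattr(callback, 'zuul_task_start', None)
    if handler:
        handler(task)  # zuul_stream sets up per-host streaming here
    else:
        callback.v2_playbook_on_task_start(task, is_conditional=False)
```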
Story: 2002528
Task: 22067
Change-Id: Ifd17d799fc28174df5194a6cd09e0e25f3ea75ac
Co-Authored-By: Tobias Henkel <tobias.henkel@bmw-carit.de>
Co-Authored-By: Clark Boylan <clark.boylan@gmail.com>
Currently we can leak secrets if we encounter unreachable nodes
combined with a task using with_items and no_log. In this case the
item variables are written to both the job-output.json and
job-output.txt. Upstream Ansible has the same issue [1].
The text log can be fixed by defining the v2_runner_on_unreachable
callback the same as v2_runner_on_failed.
The json log can be fixed the same way as the upstream Ansible issue.
[1] https://bugzilla.redhat.com/show_bug.cgi?id=1588855
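The text-log fix amounts to aliasing the two callbacks; the method names follow the Ansible callback API, while the body is a placeholder:

```python
# Sketch: route unreachable results through the same code path as
# failures so the no_log censoring of item variables applies there
# too. The failure handler body is a placeholder.
class CallbackModule:
    def v2_runner_on_failed(self, result, ignore_errors=False):
        pass  # censored failure logging would happen here

    # unreachable hosts now get identical (censored) treatment
    v2_runner_on_unreachable = v2_runner_on_failed
```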
Change-Id: Ie5dd2a6b11e8e276da65fe470f364107f3dd07ef
When using the free play strategy the command and shell tasks fail
logging due to an undefined variable. Fix that and add tests for it.
Change-Id: Ib869b12c54201b59f3de852b502ff62183db7266
With the update to Ansible 2.5 command and shell tasks used in
handlers are broken and fail with [1]. The reason for this is that the
callback v2_playbook_on_task_start is not called anymore for
handlers. Instead the callback v2_playbook_on_handler_task_start is
called for them. This leads to a missing zuul_log_id in the handler
task and attempts to log to /tmp/console-None.log. In case this file was
already created by a handler using sudo, it may not be accessible, which
leads to the exception.
This can be fixed by also defining v2_playbook_on_handler_task_start
in zuul_stream.
Also add a validation of zuul_log_id in the command module. This
should make it easier to spot such errors next time.
[1] Trace
Exception in thread Thread-1:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 754, in run
self.__target(*self.__args, **self.__kwargs)
File "/tmp/ansible_X_j_M1/ansible_module_command.py", line 185, in follow
with Console(log_uuid) as console:
File "/tmp/ansible_X_j_M1/ansible_module_command.py", line 162, in __enter__
self.logfile = open(self.logfile_name, 'ab', buffering=0)
IOError: [Errno 13] Permission denied: '/tmp/console-None.log'
Change-Id: Ib9dd7fe09e4e7734f7a9ada876e6ce450ebc5038
Story: 2002528
Task: 22067
We suspect there are more fixes required, and the additional check
these changes added may cause more harm in the interim.
This reverts commit 3512a5608c.
This reverts commit d94a0d6f06.
Change-Id: I6e874cd68a1cf6f974e36ab1870573bddf4e647b
When using the free play strategy the command and shell tasks fail
logging due to an undefined variable. Fix that and add tests for it.
Change-Id: I830d64d6089f869dc5a0e3d4fe346e6480884793
With the update to Ansible 2.5 command and shell tasks used in
handlers are broken and fail with [1]. The reason for this is that the
callback v2_playbook_on_task_start is not called anymore for
handlers. Instead the callback v2_playbook_on_handler_task_start is
called for them. This leads to a missing zuul_log_id in the handler
task and attempts to log to /tmp/console-None.log. In case this file was
already created by a handler using sudo, it may not be accessible, which
leads to the exception.
This can be fixed by also defining v2_playbook_on_handler_task_start
in zuul_stream.
Also add a validation of zuul_log_id in the command module. This
should make it easier to spot such errors next time.
[1] Trace
Exception in thread Thread-1:
Traceback (most recent call last):
File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
self.run()
File "/usr/lib/python2.7/threading.py", line 754, in run
self.__target(*self.__args, **self.__kwargs)
File "/tmp/ansible_X_j_M1/ansible_module_command.py", line 185, in follow
with Console(log_uuid) as console:
File "/tmp/ansible_X_j_M1/ansible_module_command.py", line 162, in __enter__
self.logfile = open(self.logfile_name, 'ab', buffering=0)
IOError: [Errno 13] Permission denied: '/tmp/console-None.log'
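The fix can be sketched as follows; the callback method names are from the Ansible callback API, while _log_task_start and the id value are placeholders (a plain dict stands in for the Task object):

```python
# Sketch: share one setup path between normal tasks and handlers so
# handlers also receive a zuul_log_id instead of logging to
# /tmp/console-None.log.
class CallbackModule:
    def _log_task_start(self, task):
        task.setdefault('zuul_log_id', 'example-id')  # placeholder id

    def v2_playbook_on_task_start(self, task, is_conditional):
        self._log_task_start(task)

    def v2_playbook_on_handler_task_start(self, task):
        # Ansible 2.5 fires this for handlers instead of the one above
        self._log_task_start(task)
```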
Change-Id: I1978aa8faa488fec87406e1481d455d49731f867
Story: 2002528
Task: 22067
We special case ansible debug tasks with msg parameters to bypass normal
logging. This meant that multiline debug messages were passed through
verbatim and didn't get per-line timestamps and log formatting. This is
not consistent with the rest of our logging, so we update it to log each
line of a multiline debug message separately.
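The per-line logging can be sketched as follows, assuming _log is the existing line-oriented logger that prepends a timestamp:

```python
# Sketch: split a multiline debug msg and log each line separately so
# every line gets its own timestamp and formatting.
def log_debug_msg(msg, _log=print):
    for line in msg.split('\n'):
        _log(line)
```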
Change-Id: Iacb5a101b2a2b971f631e9a14caa51899495ea2d
Previously we didn't reraise, continue, or return after hitting timeout
exceptions while connecting to the job log streaming executor. This
would result in subsequent exceptions when trying to use the socket
object that doesn't exist. Clean this up by returning when handling
this error; this will nicely shut down the streaming thread.
We also log this state to the job's log so that end users and ops are
aware of the problem.
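A minimal sketch of the control flow; the function name is invented, the connect function is injectable for testability, and the log call stands in for writing the notice to the job log:

```python
import socket

# Sketch: on a connect timeout, log a notice and return None, ending
# the streaming thread cleanly instead of falling through to use a
# socket that was never created.
def connect_streamer(host, port, connect=socket.create_connection,
                     log=print):
    try:
        return connect((host, port), timeout=5)
    except socket.timeout:
        log('Timeout exception waiting for the logger. '
            'Please check connectivity to [%s:%s]' % (host, port))
        return None  # returning shuts down the streaming thread
```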
Change-Id: I7fcfe68197d8afdb54f134bdf0e0e8053ce4faf2
We set a 5 second timeout on our log callback connection from the
executor to the job node. This has created problems when there are quiet
periods during logging as there may be no data sent on the stream for
more than 5 seconds. The timeout is valuable when setting up the
initial connection as it quickly catches underlying network problems,
but is less useful when streaming the logs after connecting.
Address this by disabling the timeout once we are connected.
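The change boils down to clearing the socket timeout after the connect succeeds; a minimal sketch with an assumed function name:

```python
import socket

# Sketch: keep the 5s timeout for the initial connect (it catches
# network problems quickly) but clear it once connected, since a
# healthy log stream may legitimately be silent for longer than 5s.
def open_log_stream(host, port=19885):
    s = socket.create_connection((host, port), timeout=5)
    s.settimeout(None)  # blocking reads from here on, no read timeout
    return s
```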
Change-Id: I66ab17c55d54a3ae0e3f4572bed72826315b4b22
If a security group or firewall on the target node silently drops the
connection to the zuul_console daemon, do not wait infinitely and
instead show a message.
After spending days figuring out the reason for zuul_console/stream not
working properly, and figuring out that the default group of the node
does not allow 19885 connections from zuul-master (in the default and
zuul security groups), it became clear that some extra information
needs to reach the user. I am open to making the timeout configurable,
but 5 sec should be enough initially.
Change-Id: I4e395b650e83a78d66aa362a420ef0d836c80d35
Our custom command.py Ansible module is updated to match the
version from 2.5, plus our additions.
strip_internal_keys() is moved within Ansible yet again.
Change-Id: Iab951c11b23a24757cf5334b36bc8f7d12e19db0
Depends-On: https://review.openstack.org/567007
We currently rely on detecting debug tasks by checking if there is
only a single variable other than variables starting with
'_ansible'. This doesn't work anymore with ansible 2.4. Detecting them
by the action should work better and more reliably.
Further also strip the 'changed' variable in this case as this has no
meaning for a debug task.
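The new detection and stripping can be sketched as follows; the helper names are invented:

```python
# Sketch: key off the task's action name instead of inferring from the
# result's variable set, and strip keys that carry no meaning for a
# debug task ('changed' plus the internal '_ansible*' variables).
def is_debug_task(action):
    return action == 'debug'

def clean_debug_result(result):
    return {k: v for k, v in result.items()
            if k != 'changed' and not k.startswith('_ansible')}
```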
Change-Id: I562329ce68c279965afc32ba5fda6615423de43a
In case ansible crashes while executing a command module we lose the
last line of the log. This happens because we stream the last line we
split out of the buffer instead of the remaining buffer.
I wasn't able to construct a test case which makes the original code
break thus this change is without a test case.
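The intended behaviour can be sketched as follows; the function name is invented:

```python
# Sketch: when the stream ends, emit each complete line and then flush
# whatever is left in the buffer, rather than re-emitting the last
# complete line and losing the partial remainder.
def drain(buffer, emit):
    while b'\n' in buffer:
        line, buffer = buffer.split(b'\n', 1)
        emit(line)
    if buffer:
        emit(buffer)  # the partial last line, previously lost
    return b''
```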
Change-Id: I10ffe248c756e0a77f39f6e075dd3615c53f1b03
The log stream is read in chunked blocks. When there are multi-byte
unicode characters in the log stream, it can happen that such a
character is split across buffers. This can break the decode step with
an exception [1]. This can be fixed by treating the buffer as binary
and decoding the final lines.
Further, we must expect that the data also contains binary data. In
order to cope with this, harden the final decoding by adding
'backslashreplace'. This replaces every occurrence of an undecodable
character with an appropriate escape sequence. This way we retain all
the information (even binary) while still being able to decode the
stream.
[1]: Log output
Ansible output: b'Exception in thread Thread-10:'
Ansible output: b'Traceback (most recent call last):'
Ansible output: b' File "/usr/lib/python3.5/threading.py", line 914, in _bootstrap_inner'
Ansible output: b' self.run()'
Ansible output: b' File "/usr/lib/python3.5/threading.py", line 862, in run'
Ansible output: b' self._target(*self._args, **self._kwargs)'
Ansible output: b' File "/var/lib/zuul/ansible/zuul/ansible/callback/zuul_stream.py", line 140, in _read_log'
Ansible output: b' more = s.recv(4096).decode("utf-8")'
Ansible output: b"UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 4094-4095: unexpected end of data"
Ansible output: b''
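The hardened read path can be sketched as follows; the helper name is invented:

```python
# Sketch: keep the buffer binary, split complete lines on the bytes,
# and decode each line with 'backslashreplace' so a multi-byte
# character split across chunks (or raw binary data) can never raise
# UnicodeDecodeError.
def decode_lines(buffer):
    lines = buffer.split(b'\n')
    remainder = lines.pop()  # possibly incomplete last line: keep it
    decoded = [line.decode('utf-8', errors='backslashreplace')
               for line in lines]
    return decoded, remainder
```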
Change-Id: I568ede2a2a4a64fd3a98480cebcbc2e86c54a2cf
We are moving _emit_playbook_banner from zuul_stream to the
executor because the data that is meant to be printed inside
the banner is not available until v2_playbook_on_play_start, which
is triggered on every play. This ensures that the banner is only
printed once per playbook, even if the playbook has multiple plays.
Additionally, we're also printing a 'footer' banner at the end of
each playbook in order to provide an explicit and easily parsable
result from the perspective of Ansible.
This will make it easier for people and systems to parse the job
output to find the result of particular playbooks.
For example, in Zuul v2, Elastic-Recheck would rely on the last lines
of the console to reliably tell when the job was completed and what
its status was.
This new v3 approach allows increased granularity by providing the
ability to filter down to specific playbooks or even on a phase basis.
Change-Id: I7e34206d7968bf128e140468b9a222ecbce3a8f1
Logs may have whitespace prefixes used to properly left-align content.
Preserve this and strip only trailing whitespace from the logs. Also,
to simplify the process of stripping whitespace, consolidate it all to
just before we write the logs rather than earlier in the process.
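The whole change reduces to a one-liner applied just before the write; a minimal sketch:

```python
# Sketch: strip only the right-hand side, immediately before writing,
# so left-alignment whitespace in command output survives into the log.
def clean_log_line(line):
    return line.rstrip()  # trailing whitespace/newline only
```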
Change-Id: I6ca957902a9ea4a5f0dfdddeda4bd398a81979d9
It is possible the IP address passed into zuul_stream is IPv6. Switch
to using socket.create_connection as it will resolve both ipv4 / ipv6.
Change-Id: I4a4fc5e9d49b66851cc4e5806670b8ee70fa9012
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
We try one version of this with module_stdout, but in other cases
'exception' is returned. Give it a try too.
Change-Id: I47b2361d92f34972e276221007fce347bc36cec4
It's really noisy and doesn't provide much value given that we have
timestamps already. The elapsed time is useful, so keep that.
Change-Id: I30093affcdc1a02637205f42433698d4d70eab15
Emitting spacer lines at the end of tasks was actually resulting in
spacer lines internal to a task when loops or multiple hosts were in
use.
Instead, emit them at the start of a play and before the stats so that
task output is consistent and grouped.
Change-Id: Ia5bdb214f78e59854985a628fd408c7fc505b953
When iterating over complex objects, printing them out in the task
summary line is very noisy and hard to follow.
Change-Id: I4ae748f167ee03ba97a4bcbf2417dae0271e24a5
If an exception is thrown from ansible, we don't log it properly.
Add a playbook to the functional test of logging that doesn't mask
failures (it lets the tasks report as failed) so that we cover all four
possibilities for ansible module failure exceptions:
* failed task
* failed task in item
* failed task that reports as success
* failed task in item that reports as success
Change-Id: I9e355350f5ec34251f5030eaaa3f4bb7b4438a6e
We need to pass a working logging config to zuul_stream and ara so that
alembic migrations don't step on pre-playbook output.
Write a logging config using json, then pass its location in env vars so that
zuul_stream and ara can pick it up and pass it to dictConfig.
In support of this, create a LoggingConfig class so that we don't have
to copy key names and logic between executor.server, zuul_stream and
zuul_json. Since we have one, go ahead and use it for the server logging
config too, providing them with a slightly richer default logging
config for folks who don't provide a logging config file of their own.
The log config processing has to go into zuul.ansible because it's
needed in the bubblewrap and we don't have it in the python path
otherwise.
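The hand-off can be sketched as follows; the env var name and function names are assumptions for illustration:

```python
import json
import os
import tempfile
from logging.config import dictConfig

# Sketch: the executor serializes the logging config to a json file
# and passes its path through an env var (name assumed) so zuul_stream
# and ara can load it into dictConfig before producing any output.
def write_logging_config(config):
    fd, path = tempfile.mkstemp(suffix='.json')
    with os.fdopen(fd, 'w') as f:
        json.dump(config, f)
    os.environ['ZUUL_JOB_LOG_CONFIG'] = path  # assumed variable name
    return path

def load_logging_config():
    path = os.environ['ZUUL_JOB_LOG_CONFIG']
    with open(path) as f:
        dictConfig(json.load(f))
```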
Change-Id: I3d7ac797fd2ee2c53f5fbd79d3ee048be6ca9366