Commit Graph

117 Commits

Author SHA1 Message Date
Tobias Henkel cd9827e664
Manage ansible installations within zuul
As a first step towards supporting multiple ansible versions we need
tooling to manage ansible installations. This moves the installation
of ansible from the requirements.txt into zuul. This is called as a
setup hook to install the ansible versions into
<prefix>/lib/zuul/ansible. Further this tooling abstracts knowledge
that the executor must know in order to actually run the correct
version of ansible.

The actual usage of multiple ansible versions will be done in
follow-ups.

For better maintainability the ansible plugins live in
zuul/ansible/base where plugins can be kept in different versions if
necessary. For each supported ansible version there is a specific
folder that symlinks the according plugins.

Change-Id: I5ce1385245c76818777aa34230786a9dbaf723e5
Depends-On: https://review.openstack.org/623927
2019-03-15 09:09:16 +01:00
Tobias Henkel 6f754dd878
Log exception on module failure with empty stdout
In the zuul_stream callback we check for 'MODULE FAILURE' and then log
stdout, exception or stderr whichever we find first. However in some
cases stdout exists but is empty. In this case we don't log the error
reason and the user needs to look into the json log which contains the
correct information. This can be easily fixed by checking if stdout is
empty.

Change-Id: I8019d40e2e99310eaeab03f652368dac66e1ee17
2019-03-06 21:51:33 +01:00
Zuul 6bb3f9d9ea Merge "dict_object.keys() is not required for *in* operator" 2019-01-07 10:27:39 +00:00
Tristan Cacqueray 3795280382 executor: add support for generic build resource
This change adds executor support for generic build resources.
Using the connection_port, a kube config file is generated to enable ansible
tasks to use the resource. Context and pod name is also exported in a
zuul.resources variable to enable job direct access to the build resource.

Currently this change supports 'kubectl' resource for containers that behave
like a machine, and 'project'/'namespace' resource for native containers.

Change-Id: Icdb9e800177dc770c58f65b02456a6b904be0666
2018-12-11 04:06:49 +00:00
confi-surya e2bcc15c8d dict_object.keys() is not required for *in* operator
cleanup of .keys() from dict_object.keys() *in* operator

Change-Id: Ic79633899db628a7572840392933293426d0c187
2018-12-03 12:04:55 +05:30
Tobias Henkel 265803dd56
Improve windows logging
Currently we don't have any stdout logs with windows at all in the
buildlog but only in the json log. This is a little bit
inconvenient. As live streaming doesn't work at the moment for windows
just don't live stream win_shell and win_command tasks but treat them
similar to localhost tasks and loop items. With this we at least get
after the fact stdout logs of these tasks.

Change-Id: Ieda6cb51dd6af0d5fb4d0a1f6f7d60d66c9fd7cb
2018-11-06 10:50:40 +01:00
Tobias Henkel 0ae7a157e2
Don't do live streaming in loops
Currently we do live streaming of command and shell tasks in
loops. However this is severely broken as we only get the output of
the first task that ran. This is hard to solve as a per task streaming
in loops is not possible because of missing start events. Further
streaming several tasks at once is difficult too because we don't know
how many tasks we need to stream upfront.

So the intermediate solution until the whole log streaming is reworked
is to just not live stream tasks in loops.

While we're at it also fix a broken log statement and remove duplicate
exit code prints of action tasks.

Change-Id: Ic1358d5b9f939549ffdd5d770fe4eafc047be6f1
2018-11-05 22:21:38 +01:00
Zuul 09df5f33f2 Merge "Treat ::1 as localhost in addition to 127.0.0.1" 2018-06-19 09:23:09 +00:00
Tobias Henkel 3d889e1307
Remove extra argument when logging logger timeout
There is an extra unexpected argument when streaming the timeout
exception if we timeout waiting for the logger.

Change-Id: I7186a77267b811635b74b28f7619bdc976573bd4
2018-06-14 15:07:13 +02:00
Tobias Henkel 686f500cac
Revert "Temporarily override Ansible linear strategy"
After reworking the injection of zuul_log_id this is no longer
necessary.

This reverts commit dec1900178.
This reverts commit d6f1fd13fa.

Change-Id: I2c491e0f643638ef57ef86f505974fb012939569
2018-06-14 12:27:59 +02:00
Tobias Henkel cfbd075b12
Fix log streaming for delegated hosts
Currently when delegating multiple hosts to a single host we don't
handle that case correctly during streaming. In this case we only
stream one log instead of all logs. Fix that by stream the log for
each inventory host.

Change-Id: I2d488df17b20899324030c90075f2fa6e7cd1256
2018-06-14 12:27:59 +02:00
Tobias Henkel 72dd0e82c2
Move zuul_log_id injection to command action plugin
The log streaming callback is not being called in the same way
in Ansible 2.5 as it was in 2.3.  In particular, in some cases
different Task objects are used for different hosts.  This,
combined with the fact that the callback is only called once for
a given task means that in these cases we are unable to supply
the zuul_log_id to the Task object for the second host on a task.

This can be resolved by injecting the zuul_log_id within the command
action plugin based on the task uuid directly.

Change-Id: I7ff35263c52d93aeabe915532230964994c30850
2018-06-14 12:27:59 +02:00
Zuul 8b34ffc21c Merge "Temporarily override Ansible linear strategy (2/2)" 2018-06-13 23:21:13 +00:00
Zuul 3ab25403ec Merge "Fix command tasks with free play strategy" 2018-06-13 23:13:37 +00:00
Zuul 16d46796dc Merge "Fix broken command tasks in handlers" 2018-06-13 23:13:34 +00:00
James E. Blair dec1900178 Temporarily override Ansible linear strategy (2/2)
The log streaming callback is not being called in the same way
in Ansible 2.5 as it was in 2.3.  In particular, in some cases
different Task objects are used for different hosts.  This,
combined with the fact that the callback is only called once for
a given task means that in these cases we are unable to supply
the zuul_log_id to the Task object for the second host on a task.

To correct this, a local copy of the linear strategy plugin is
added, with the change that for every host-task, it calls either
the normal on_task_start callback, or a new zuul_task_start
callback.  This ensures that we are able to set up log streaming
on every host-task.

We plan to move to a different system for establishing log streaming
soon so that we don't have to keep carrying this patched plugin.

Story: 2002528
Task: 22067

Change-Id: Ifd17d799fc28174df5194a6cd09e0e25f3ea75ac
Co-Authored-By: Tobias Henkel <tobias.henkel@bmw-carit.de>
Co-Authored-By: Clark Boylan <clark.boylan@gmail.com>
2018-06-13 15:38:36 -07:00
Tobias Henkel ffe7278c08 Fix information disclosure caused by unreachable nodes
Currently we can leak secrets if we encounter unreachable nodes
combined with a task using with_items and no_log. In this case the
item variables are written to both the job-output.json and
job-output.txt. Upstream Ansible has the same issue [1].

The text log can be fixed by defining the v2_runner_on_unreachable
callback the same as v2_runner_on_failed.

The json log can be fixed the same way as the upstream Ansible issue.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1588855

Change-Id: Ie5dd2a6b11e8e276da65fe470f364107f3dd07ef
2018-06-12 16:02:03 +00:00
Tobias Henkel 1310aa592b Fix command tasks with free play strategy
When using the free play strategy the command and shell tasks fail
logging due to an undefined variable. Fix that and add tests for it.

Change-Id: Ib869b12c54201b59f3de852b502ff62183db7266
2018-06-12 08:54:08 -07:00
Tobias Henkel bfcdff9118 Fix broken command tasks in handlers
With the update to Ansible 2.5 command and shell tasks used in
handlers are broken and fail with [1]. The reason for this is that the
callback v2_playbook_on_task_start is not called anymore for
handlers. Instead the callback v2_playbook_on_handler_task_start is
called for them. This leads to a missing zuul_log_id in the handler
task and trying to log to /tmp/console-None.log. In case this file was
already created by a handler using sudo it may not be accessible which
leads to the exception.

This can be fixed by also defining v2_playbook_on_handler_task_start
in zuul_stream.

Also add a validation of zuul_log_id in the command module. This
should make it easier to spot such errors next time.

[1] Trace
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/tmp/ansible_X_j_M1/ansible_module_command.py", line 185, in follow
    with Console(log_uuid) as console:
  File "/tmp/ansible_X_j_M1/ansible_module_command.py", line 162, in __enter__
    self.logfile = open(self.logfile_name, 'ab', buffering=0)
IOError: [Errno 13] Permission denied: '/tmp/console-None.log'

Change-Id: Ib9dd7fe09e4e7734f7a9ada876e6ce450ebc5038
Story: 2002528
Task: 22067
2018-06-12 08:53:59 -07:00
James E. Blair 896df11638 Revert callback fixes
We suspect there are more fixes required, and the additional check
these changes added may cause more harm in the interim.

This reverts commit 3512a5608c.
This reverts commit d94a0d6f06.

Change-Id: I6e874cd68a1cf6f974e36ab1870573bddf4e647b
2018-06-12 08:53:19 -07:00
Tobias Henkel 3512a5608c
Fix command tasks with free play strategy
When using the free play strategy the command and shell tasks fail
logging due to an undefined variable. Fix that and add tests for it.

Change-Id: I830d64d6089f869dc5a0e3d4fe346e6480884793
2018-06-12 12:30:52 +02:00
Tobias Henkel d94a0d6f06
Fix broken command tasks in handlers
With the update to Ansible 2.5 command and shell tasks used in
handlers are broken and fail with [1]. The reason for this is that the
callback v2_playbook_on_task_start is not called anymore for
handlers. Instead the callback v2_playbook_on_handler_task_start is
called for them. This leads to a missing zuul_log_id in the handler
task and trying to log to /tmp/console-None.log. In case this file was
already created by a handler using sudo it may not be accessible which
leads to the exception.

This can be fixed by also defining v2_playbook_on_handler_task_start
in zuul_stream.

Also add a validation of zuul_log_id in the command module. This
should make it easier to spot such errors next time.

[1] Trace
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 754, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/tmp/ansible_X_j_M1/ansible_module_command.py", line 185, in follow
    with Console(log_uuid) as console:
  File "/tmp/ansible_X_j_M1/ansible_module_command.py", line 162, in __enter__
    self.logfile = open(self.logfile_name, 'ab', buffering=0)
IOError: [Errno 13] Permission denied: '/tmp/console-None.log'

Change-Id: I1978aa8faa488fec87406e1481d455d49731f867
Story: 2002528
Task: 22067
2018-06-12 12:30:49 +02:00
Zuul 03d11709f6 Merge "Add timestamps to multiline debug message logs" 2018-06-11 06:06:02 +00:00
Zuul 1919f05189 Merge "Cleanup log streaming error handling" 2018-06-10 14:27:44 +00:00
Clark Boylan 10fa3339a5 Add timestamps to multiline debug message logs
We special case ansible debug tasks with msg parameters to bypass normal
logging. This meant that multiline debug messages were passed through
verbatim and didn't get per line timestamps and log formatting. This is
not consistent with the rest of our logging so we update it to log each
line in a multi line debug message separately.

Change-Id: Iacb5a101b2a2b971f631e9a14caa51899495ea2d
2018-06-07 09:23:08 -07:00
Zuul bbb44ff95d Merge "Update to Ansible 2.5" 2018-06-07 15:02:27 +00:00
Clark Boylan 8a13313e47 Treat ::1 as localhost in addition to 127.0.0.1
For ipv6 consistency treat ::1 as localhost in the log streaming
callback.

Change-Id: Idc323e29093ec63af431740896c59e7e5bd07290
2018-06-06 10:34:22 -07:00
Clark Boylan 39239b8049 Cleanup log streaming error handling
Previously we didn't reraise, continue or return after having timeout
exceptions connecting to the job log streaming executor. This would
result in subsequent exceptions trying to use the socket object that
doesn't exist. Clean this up by returning when handling this error this
will nicely shutdown the streaming thread.

We also log this state to the job's log so that end users and ops are
aware of the problem.

Change-Id: I7fcfe68197d8afdb54f134bdf0e0e8053ce4faf2
2018-06-05 11:16:24 -07:00
Clark Boylan fdf82d6418 Disable timeouts on the log streaming connection
We set a 5 second timeout on our log callback connection from the
executor to the job node. This has created problems when there are quiet
periods during logging as there may be no data sent on the stream for
more than 5 seconds. The timeout is valuable when setting up the
initial connection as it quickly catches underlying network problems,
but is less useful when streaming the logs after connecting.

Address this by disabling the timeout once we are connected.

Change-Id: I66ab17c55d54a3ae0e3f4572bed72826315b4b22
2018-06-05 10:55:16 -07:00
Artem Goncharov 179a966655 dont wait infinitely for the connection to zuul_console server
If security group or firewall on the target node silently drop
connection to zuul_console daemon - do not wait infinitely and instead
show a message.

After spending days figuring out the reason for zuul_console/stream not
working properly and figuring out that the default group of the node
does not allow 19885 connection from zuul-master (in default and zuul
security group) it became clear, that some extra information is required
back to user. I am open for making it timeout configurable, but 5 sec
should be enough initially.

Change-Id: I4e395b650e83a78d66aa362a420ef0d836c80d35
2018-05-11 15:42:08 +02:00
David Shrewsbury 7bced14e8c Update to Ansible 2.5
Our custom command.py Ansible module is updated to match the
version from 2.5, plus our additions.

strip_internal_keys() is moved within Ansible yet again.

Change-Id: Iab951c11b23a24757cf5334b36bc8f7d12e19db0
Depends-On: https://review.openstack.org/567007
2018-05-08 21:20:13 +02:00
Tobias Henkel 36924466e9
Handle debug tasks more robust in zuul_stream
We currently rely on detecting debug tasks by checking if there is
only a single variable other than variables starting with
'_ansible'. This doesn't work anymore with ansible 2.4. Detecting them
by the action should work better and more reliably.

Further also strip the 'changed' variable in this case as this has no
meaning for a debug task.

Change-Id: I562329ce68c279965afc32ba5fda6615423de43a
2018-02-08 16:34:26 +01:00
Tobias Henkel 62a56d100e Correctly stream the remaining buffer
In case ansible crashes while executing a command module we loose the
last line of the log. This is caused because we don't stream the
remaining buffer but the last line we split out of the buffer.

I wasn't able to construct a test case which makes the original code
break thus this change is without a test case.

Change-Id: I10ffe248c756e0a77f39f6e075dd3615c53f1b03
2017-11-26 08:39:28 +01:00
Tobias Henkel e771b6eeec Do late decoding of log stream buffer
The log stream is read in chunked blocks. When having multi byte
unicode characters in the log stream it can happen that this character
is split into different buffers. This can break the decode step with
an exception [1]. This can be fixed by treating the buffer as binary
and decoding the final lines.

Further we must expect that the data also contains binary data. In
order to cope with this further harden the final decoding by adding
'backslashreplace'. This will replace every occurrence of an
undecodable character by an appropriate escape sequence. This way we
can retain all the information (even binary) without being unable to
decode the stream.

[1]: Log output
Ansible output: b'Exception in thread Thread-10:'
Ansible output: b'Traceback (most recent call last):'
Ansible output: b'  File "/usr/lib/python3.5/threading.py", line 914, in _bootstrap_inner'
Ansible output: b'    self.run()'
Ansible output: b'  File "/usr/lib/python3.5/threading.py", line 862, in run'
Ansible output: b'    self._target(*self._args, **self._kwargs)'
Ansible output: b'  File "/var/lib/zuul/ansible/zuul/ansible/callback/zuul_stream.py", line 140, in _read_log'
Ansible output: b'    more = s.recv(4096).decode("utf-8")'
Ansible output: b"UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 4094-4095: unexpected end of data"
Ansible output: b''

Change-Id: I568ede2a2a4a64fd3a98480cebcbc2e86c54a2cf
2017-10-27 07:59:32 +02:00
David Moreau-Simard d4c47855ce
Explicitely print the result status of each playbook run
We are moving _emit_playbook_banner from zuul_stream to the
executor because the data that is meant to be printed inside
the banner is not available until v2_playbook_on_play_start which
is triggered on every play. This ensures that the banner is only
printed once per playbook, even if the playbook has multiple plays.

Additionally, we're also printing a 'footer' banner at the end of
each playbook in order to provide an explicit and easily parsable
result from the perspective of Ansible.

This will make it easier for people and systems to parse the job
output to find the result of particular playbooks.

For example, in Zuul v2, Elastic-Recheck would rely on the last lines
of the console to reliably tell when the job was completed and what was
it's status.
This new v3 approach allows increased granularity by providing the
ability to filter down to specific playbooks or even on a phase basis.

Change-Id: I7e34206d7968bf128e140468b9a222ecbce3a8f1
2017-10-03 16:44:21 -04:00
Zuul 850035eba5 Merge "Log exeptions for module failure cases" into feature/zuulv3 2017-09-22 16:55:51 +00:00
Monty Taylor 1764d2582d
Shift zuul_stream socket reading to use a for loop
The generator seems to be crashing python.

Change-Id: I40381d3677cc5613b522eb004b633696718e5a78
2017-09-11 17:42:50 -06:00
Zuul c0bd5e3b99 Merge "Handle debug messages cleanly" into feature/zuulv3 2017-09-07 20:31:03 +00:00
Zuul 24931edc7a Merge "Stop output start and end times for each task" into feature/zuulv3 2017-09-07 20:28:05 +00:00
Zuul ae488c92e2 Merge "Fix spacer-lines to work with multi-node and items" into feature/zuulv3 2017-09-07 15:46:34 +00:00
Monty Taylor 7b871b47f3
Handle debug messages cleanly
This changes from:

  2017-09-06 18:17:37.173853 | TASK [debug]
  2017-09-06 18:17:37.216661 | localhost | ok:
  2017-09-06 18:17:37.216882 | localhost | {
  2017-09-06 18:17:37.216967 | localhost |   "msg": "Job triggered from https://review.openstack.org/#/c/501485"
  2017-09-06 18:17:37.217047 | localhost | }

to:

  2017-09-06 18:20:34.383995 | TASK [debug]
  2017-09-06 18:20:34.429648 | Job triggered from https://review.openstack.org/#/c/501485

Change-Id: Iac39124c2a3b286215d175f98742d00ec5a12778
2017-09-06 18:22:40 -05:00
Clark Boylan 3d4e466e33 Only strip trailing whitespace from console logs
Logs may have whitespace prefixing used to properly left align content.
Preserve this and only strip trailing whitespace from the logs. Also to
simplify the process of stripping whitespace consolidate it all to just
before we write the logs rather than earlier in the process.

Change-Id: I6ca957902a9ea4a5f0dfdddeda4bd398a81979d9
2017-09-06 13:45:57 -07:00
Paul Belanger 974fe0ddbc
Support IPv6 with zuul_stream
It is possible the IP address passed into zuul_stream is IPv6. Switch
to using socket.create_connection as it will resolve both ipv4 / ipv6.

Change-Id: I4a4fc5e9d49b66851cc4e5806670b8ee70fa9012
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
2017-09-03 18:54:03 -04:00
Monty Taylor 4e335a7c53
Log exeptions for module failure cases
We try one verison of this with module_stdout, but in other cases
'exception' is returned. Give it a try too.

Change-Id: I47b2361d92f34972e276221007fce347bc36cec4
2017-09-02 18:42:26 -05:00
Monty Taylor 34ac31f186
Stop output start and end times for each task
It's really noisy and doesn't provide much value given that we have
timestamps already. The elapsed time is useful, so keep that.

Change-Id: I30093affcdc1a02637205f42433698d4d70eab15
2017-09-01 13:13:20 -05:00
Monty Taylor 849dc84431
Fix spacer-lines to work with multi-node and items
Emitting spacer lines at the end of tasks was actually resulting in
spacer lines internal to a task when loops or multiple hosts were in
use.

Instead, emit them at the start of a play and before the stats so that
task output is consistent and grouped.

Change-Id: Ia5bdb214f78e59854985a628fd408c7fc505b953
2017-09-01 13:10:13 -05:00
Monty Taylor a36b3b314c
Don't output complex items in the summary line
When iterating over complex objects, printing them out in the task
summary line is very noisy and hard to follow.

Change-Id: I4ae748f167ee03ba97a4bcbf2417dae0271e24a5
2017-08-31 15:50:03 -05:00
Monty Taylor c9f5675749
Add support for debug statements
We were eating these before, which is unfortunate.

Change-Id: I3d95599f66344c75403f680d72b91bfcfe5e29fb
2017-08-31 15:50:03 -05:00
Monty Taylor eb09398678
Handle logging ansible python errors properly
If an exception is thrown from ansible, we don't log it properly.

Add a playbook to the functional test of logging that doesn't mask
failures (it lets the tasks report as failed) So that we cover all four
possibilities for ansible module failure exception:

* failed task
* failed task in item
* failed task that reports as success
* failed task in item that reports as success

Change-Id: I9e355350f5ec34251f5030eaaa3f4bb7b4438a6e
2017-08-31 15:50:00 -05:00
Monty Taylor 7093b49f04
Write a logging config file and pass it to callbacks
We need to pass a working logging config to zuul_stream and ara so that
alembic migrations don't step on pre-playbook output.

Write a logging config using json, then pass its location in env vars so that
zuul_stream and ara can pick it up and pass it to dictConfig.

In support of this, create a LoggingConfig class so that we don't have
to copy key names and logic between executor.server, zuul_stream and
zuul_json. Since we have one, go ahead and use it for the server logging
config too, providing them with a slightly richer default logging
config for folks who don't provide a logging config file of their own.

The log config processing has to go into zuul.ansible because it's
needed in the bubblewrap and we don't have it in the python path
otherwise.

Change-Id: I3d7ac797fd2ee2c53f5fbd79d3ee048be6ca9366
2017-08-28 20:06:10 -05:00