Commit Graph

407 Commits

Author SHA1 Message Date
Zuul d1953295a1 Merge "Enable TCP keepalives in zuul_stream" 2024-02-27 18:14:12 +00:00
James E. Blair 5a8e373c3b Replace Ansible 6 with Ansible 9
Ansible 6 is EOL and Ansible 9 is available.  Remove 6 and add 9.

This is usually done in two changes, but this time it's in one
since we can just rotate the 6 around to make it a 9.

command.py has been updated for ansible 9.

Change-Id: I537667f66ba321d057b6637aa4885e48c8b96f04
2024-02-15 16:20:45 -08:00
James E. Blair 556b9917a0 Enable TCP keepalives in zuul_stream
Some remote commands can run for some time without producing output.
Some network devices/firewalls/pnats/etc may decide to drop the
connection since it looks idle.  Try to avoid that by sending TCP
keepalives, and if that doesn't work, that should at least cause
us to be notified that the connection is dead.  In that case, the
log streamer will receive a timeout error on the socket recv call.
We will log it and stop streaming.

Change-Id: Ib8e2aae0de07ff2dcc9de8e244f09e0cf9796e20
2024-02-14 11:15:04 -08:00
James E. Blair b4e49fd8a1 Fix 10 second delay after skipped command task
The log streaming system suffers a 10 second delay after a skipped
command task because:

* when processing the results for the next task, we ensure that the
  log streams for all of the previous tasks have stopped
* when stopping a log stream, we wait for the remote command process
  to finish before we close the stream
* if a log file has not appeared yet, we can't determine if the
  stream has finished
* so we wait 10 seconds for the log to appear before we proceed and
  terminate the stream regardless.

The underlying issue is that the code that terminates the stream does
not know whether to expect a log file (the normal case) or not (the
case for a skipped task).

To correct that, we make a new Streamer class which bundles all of
the information about a particular log streamer and which is
individually addressable.  This way we can stop an individual
streamer immediately if it's corresponding task is skipped, or stop
all of the streamers with a 10 second grace period in the normal
case.

Since the behavioral difference is a 10 second delay (but otherwise
no change in job output), we can't test the behavioral outcome of
this change, but we can exercise the code by ensuring that there are
skipped command tasks in the remote stream tests.

Change-Id: Id6ee13e5a82aa8fa3a2a0dd293cf99ed5b84347a
2024-02-02 09:43:37 -08:00
Clark Boylan 2846fe2e0d Replace python imp module with importlib
The imp module has been deprecated for quite some time. The importlib
module has been present since python 3.6 and replaces the imp module.
This is important because python3.12 removes the imp module entirely and
any code using imp needs to be rewritten to be python3.12 compatible.

The rewrite here comes from the suggestions in an upstream discussion
about this transition [0].

[0] https://discuss.python.org/t/how-do-i-migrate-from-imp/27885

Change-Id: Iaa742c71bbb208225726c2b8fa5af111f48667c9
2024-01-10 13:09:39 -08:00
James E. Blair e55748ba69 Make Ansible variable freezing more efficient
We currently iterate over every job/host/etc variable in the freeze
playbook.  The reason is because if any value in any variable is
Undefined according to jinja, the Ansible combine filter throws
an error.  What we want to do in Zuul is merge any variable we can,
but if any is undefined, we skip it.  Thus, the process of combining
the variables one at a time in a task and ignoring errors.

This process can be slow, especially if we have start with a large
amount of data in one of the early variables.  The combine filter
needs to reprocess the large data repeatedly for each additional
variable.

To improve the process, we create a new action plugin, "zuul_freeze"
which takes a list of variables we want to freeze, then templates
them one at a time and stores the result in a cacheable fact.  This
is the essence of what we were trying to accomplish with the combine
filter.

Change-Id: Ie41f404762daa1b1a5ae47f6ec1aa1954ad36a39
2023-09-14 14:00:45 -07:00
James E. Blair 1b042ba4ab Add job failure output detection regexes
This allows users to trigger the new early failure detection by
matching regexes in the streaming job output.

For example, if a unit test job outputs something sufficiently
unique on failure, one could write a regex that matches that and
triggers the early failure detection before the playbook completes.

For hour-long unit test jobs, this could save a considerable amount
of time.

Note that this adds the google-re2 library to the Ansible venvs.  It
has manylinux wheels available, so is easy to install with
zuul-manage-ansible.  In Zuul itself, we use the fb-re2 library which
requires compilation and is therefore more difficult to use with
zuul-manage-ansible.  Presumably using fb-re2 to validate the syntax
and then later actually using google-re2 to run the regexes is
sufficient.  We may want to switch Zuul to use google-re2 later for
consistency.

Change-Id: Ifc9454767385de4c96e6da6d6f41bcb936aa24cd
2023-08-21 16:41:21 -07:00
James E. Blair f04d80912e Fix issue with early failure and blocks
Ansible tasks that fail within block tasks call the failure
callback, which means that they triggered Zuul early failure
detection even if they were later rescued.  To avoid this,
ignore block tasks for purposes of early failure detection.

Also make early failure detection "sticky".  This helps
uncover errors like this (the "remote" tests uncover this
particular failure if the result is made sticky), and also
ensures consistent behavior in dependent pipelines.

Change-Id: I505667678e7384386819b5389036e4fb4f108afd
2023-08-21 16:41:08 -07:00
Simon Westphahl 3b011296e6 Keep task stdout/stderr separate in result object
Combining stdout/stderr in the result can lead to problems when e.g.
the stdout of a task is used as an input for another task.

This is also different from the normal Ansible behavior and can be
surprising and hard to debug for users.

The new behavior is configurable and off by default to retain backward
compatibility.

Change-Id: Icaced970650913f9632a8db75a5970a38d3a6bc4
Co-Authored-By: James E. Blair <jim@acmegating.com>
2023-08-17 16:22:41 -07:00
Zuul 6c0ffe565f Merge "Report early failure from Ansible task failures" 2023-07-29 18:28:08 +00:00
James E. Blair 60a8dfd451 Add Ansible 8
This is the currently supported version of Ansible.  Since 7 is out
of support, let's skip it.

Change-Id: I1d13c23189dce7fd9db291ee03a452089b92a421
2023-07-19 15:46:48 -07:00
James E. Blair 1170b91bd8 Report early failure from Ansible task failures
We can have the Ansible callback plugin tell the executor to tell
the scheduler that a task has failed and therefore the job will
fail.  This will allow the scheduler to begin a gate reset before
the failing job has finished and potentially save much developer
and CPU time.

We take some extra precautions to try to avoid sending a pre-fail
notification where we think we might end up retrying the job
(either due to a failure in a pre-run playbook, or an unreachable
host).  If that does happen then a gate pipeline might end up
flapping between two different NNFI configurations (ie, it may
perform unecessary gate resets behind the change with the retrying
job), but should still not produce an incorrect result.  Presumably
the detections here should catch that case sufficiently early, but
due to the nature of these errors, we may need to observe it in
production to be sure.

Change-Id: Ic40b8826f2d54e45fb6c4d2478761a89ef4549e4
2023-06-29 13:40:34 -07:00
James E. Blair 70077ee6d5 Do not wait for streamer when disabled
When a user sets zuul_console_disabled, we don't need to try to
connect to the streaming daemon.  In fact, they may have set it
because they know it won't be running.  Check for this and avoid
the connection step in that case and therefore avoid the extraneous
"Waiting on logger" messages and extra 30 second delay at the end
of each task.

Change-Id: I86af231f1ca1c5b54b21daae29387a8798190a58
2023-04-10 11:14:00 -07:00
Clark Boylan ee3339c8e6 Fix more file opening ResourceWarnings
I've managed to get better at grepping for this and this finds some of
the stragglers. They are all file opens without closes fixed by using a
with open() context manager.

Change-Id: I7b8c8516a86558e2027cb0f01aefe2dd1849069c
2023-02-07 17:12:15 -08:00
James E. Blair f9eb499870 Remove Ansible 5
Change-Id: Icd8c33dfe1c8ffd21a717a1a94f1783c244a6b82
2022-10-11 17:03:57 -07:00
Zuul 4ea2a19fe8 Merge "zuul-stream: Add variable to disable writing streaming files" 2022-09-27 02:29:44 +00:00
James E. Blair 2d6b5c19ba Remove support for Ansible 2
Versions 2.8 and 2.9 are no longer supported by the Ansible project.

Change-Id: I888ddcbecadd56ced83a27ae5a6e70377dc3bf8c
2022-09-14 17:14:10 -07:00
Ian Wienand c54ccda128
zuul-stream: Add variable to disable writing streaming files
Upon further discussion we recently found another case of leaking
console streaming files; if the zuul_console is not running on port
19885, or can not be reached, the streaming spool files will still be
leaked.

The prior work in I823156dc2bcae91bd6d9770bd1520aa55ad875b4 has the
receiving side indicate to the zuul_console daemon that it should
remove the spool file.

If this doesn't happen, either because the daemon was never there, or
it is firewalled off, the streaming spool files are left behind.

This modifies the command action plugin to look for a variable
"zuul_console_disable" which will indicate to the library running the
shell/command task not to write out the spool file at all, as it will
not be consumed.

It is expected this would be set at a host level in inventory for
nodes that you know can not or will not have access to zuul_console
daemon.

We already have a mechanism to disable this for commands running in a
loop; we expand this with a new string type.  The advantage of this is
it leaves the library/command.py side basically untouched.

Documentation is updated, and we cover this with a new test.

Change-Id: I0273993c3ece4363098e4bf30bfc4308bb69a8b4
2022-09-14 11:19:40 +10:00
Zuul 3d60914a0e Merge "Add Ansible 6" 2022-09-08 21:30:00 +00:00
Zuul 47227b743f Merge "zuul_stream : Use !127.0.0.1 for loopback" 2022-09-08 21:05:02 +00:00
Zuul 2bc60aaf8b Merge "zuul-stream : Test against a Python 2.7 container" 2022-09-08 07:17:40 +00:00
Zuul 9302e57682 Merge "zuul_stream: handle non-string msg value" 2022-09-07 22:32:54 +00:00
Zuul 1db6725641 Merge "zuul_stream: handle failed_when tasks" 2022-09-07 20:57:07 +00:00
Ian Wienand 78050adb51 zuul_stream : Use !127.0.0.1 for loopback
This is a follow-on to Ia78ad9e3ec51bc47bf68c9ff38c0fcd16ba2e728 to
use a different loopback address for the local connection to the
Python 2.7 container.  This way, we don't have to override the
existing localhost/127.0.0.1 matches that avoid the executor trying to
talk to a zuul_console daemon.  These bits are removed.

The comment around the port settings is updated while we're here.

Change-Id: I33b2198baba13ea348052e998b1a5a362c165479
2022-09-07 10:48:25 +10:00
Ian Wienand 34543b8ac5 zuul-stream : Test against a Python 2.7 container
Change Ief366c092e05fb88351782f6d9cd280bfae96237 intoduced a bug in
the streaming daemons because it was using Python 3.6 features.  The
streaming console needs to work on all Ansible managed nodes, which
includes back to Python 2.7 nodes (while Ansible supports that).

This introduces a regression test by building about the smallest
Python 2.7 container that can be managed by Ansbile.  We start this
container and modify the test inventory to include it, then run the
stream tests against it.

The existing testing runs against the "new" console but also tests
against the console OpenDev's Zuul starts to ensure
backwards-compatability.  Since this container wasn't started by Zuul
it doesn't have this, so that testing is skipped for this node.

It might be good to abstract all testing of the console daemons into
separate containers for each Ansible supported managed-node Python
version -- it's a bit more work than I want to take on right now.
This should ensure the lower-bound though and prevent regressions for
older platforms.

Change-Id: Ia78ad9e3ec51bc47bf68c9ff38c0fcd16ba2e728
2022-09-07 10:48:25 +10:00
James E. Blair 7949efd255 Add Ansible 6
Change-Id: I0d450d9385b9aaab22d2d87fb47798bf56525f50
2022-09-02 10:12:55 -07:00
James E. Blair 36de4939a8 Do not use _display outside the main thread in zuul_stream
With the defaulh "linear" strategy (and likely others), Ansible will
send the on_task_start callback, and then fork a worker process to
execute that task.  Since we spawn a thread in the on_task_start
callback, we can end up emitting a log message in this method while
Ansible is forking.  If a forked process inherits a Python file object
(i.e., stdout) that is locked by a thread that doesn't exist in the
fork (i.e., this one), it can deadlock when trying to flush the file
object.  To minimize the chances of that happening, we should avoid
using _display outside the main thread.

The Python logging module is supposed to use internal locks which are
automatically aqcuired and released across a fork.  Assuming this is
(still) true and functioning correctly, we should be okay to issue
our Python logging module calls at any time.  If there is a fault
in this system, however, it could have a potential to cause a similar
problem.

If we can convince the Ansible maintainers to lock _display across
forks, we may be able to revert this change in the future.

Change-Id: Ifc6b835c151539e6209284728ccad467bef8be6f
2022-08-31 13:26:58 -07:00
Ian Wienand 0e9f474241 zuul_stream: handle non-string msg value
I noticed in some of our testing a construct like

 debug:
   msg: '{{ ansible_version }}'

was actually erroring out; you'll see in the console output if you're
looking

 Ansible output: b'TASK [Print ansible version msg={{ ansible_version }}] *************************'
 Ansible output: b'[WARNING]: Failure using method (v2_runner_on_ok) in callback plugin'
 Ansible output: b'(<ansible.plugins.callback.zuul_stream.CallbackModule object at'
 Ansible output: b"0x7f502760b490>): 'dict' object has no attribute 'startswith'"

and the job-output.txt will be empty for this task (this is detected
by by I9f569a411729f8a067de17d99ef6b9d74fc21543).

This is because the msg value here comes in as a dict, and in several
places we assume it is a string.  This changes places we inspect the
msg variable to use the standard Ansible way to make a text string
(to_text function) and ensures in the logging function it converts the
input to a string.

We test for this with updated tasks in the remote_zuul_stream tests.
It is slightly refactored to do partial matches so we can use the
version strings, which is where we saw the issue.

Change-Id: I6e6ed8dba2ba1fc74e7fc8361e8439ea6139279e
2022-08-24 13:32:37 +10:00
Ian Wienand 98296dd5f6 zuul_stream: handle failed_when tasks
Currently the task in the test playbook

  - hosts: compute1
    tasks:
    - name: Command Not Found
      command: command-not-found
      failed_when: false

is failing in the zuul_stream callback with an exception trying to
fill out the "delta" value in the message here.  The result dict
(taken from the new output) shows us why:

 2022-08-24 07:19:27.079961 | TASK [Command Not Found]
 2022-08-24 07:19:28.578380 | compute1 | ok: ERROR (ignored)
 2022-08-24 07:19:28.578622 | compute1 | {
 2022-08-24 07:19:28.578672 | compute1 |   "failed_when_result": false,
 2022-08-24 07:19:28.578700 | compute1 |   "msg": "[Errno 2] No such file or directory: b'command-not-found'",
 2022-08-24 07:19:28.578726 | compute1 |   "rc": 2
 2022-08-24 07:19:28.578750 | compute1 | }

i.e. it has no start/stop/delta in the result (it did run and fail, so
you'd think it might ... but this is what Ansible gives us).

This checks for this path; as mentioned the output will now look like
above in this case.

This was found by the prior change
I9f569a411729f8a067de17d99ef6b9d74fc21543.  This fixes the current
warning, so we invert the test to prevent further regressions.

Change-Id: I106b2bbe626ed5af8ca739d354ba41eca2f08f77
2022-08-24 13:29:45 +10:00
Zuul 6d0d04560c Merge "zuul-stream : don't write out logfile for tasks in loops" 2022-08-17 01:54:22 +00:00
Zuul bb1b0d5c11 Merge "zuul_console: do not use f-strings" 2022-08-15 02:47:55 +00:00
Ian Wienand 3037ce79a1 zuul_console: do not use f-strings
In Ief366c092e05fb88351782f6d9cd280bfae96237 I missed that this runs
in the context of the remote node; meaning that it must support all
the Python versions that might run there.  f-strings are not 3.5
compatible.

I'm thinking about how to lint this better (a syntax check run?)

Change-Id: Ia4133b061800791196cd631f2e6836cb77347664
2022-08-15 08:37:04 +10:00
Ian Wienand df3f9dcd30 zuul-stream: automatically remove streaming files
When using protocol version 1, send a finalise message when streaming
is complete so that the zuul_console daemon can delete the temporary
file.

We test this by inspecting the Ansible console output, which logs a
message with the UUID of the streaming job.  We dump the temporary
files on the remote side and make sure a console file for that job
isn't present.

Change-Id: I823156dc2bcae91bd6d9770bd1520aa55ad875b4
2022-08-09 17:04:46 +10:00
Ian Wienand c1b2fa5598 zuul-stream: implement a protocol and version
A refresher on how this works, to the best of my knowledge

 1 Firstly, Zuul's Ansible has a library task "zuul_console:" which is
   run against the remote node; this forks a console daemon, listening
   on a default port.

 2 We have a action plugin that runs for each task, and if that task
   is a command/shell task, assigns it a unique id

 3 We then override with library/command.py (which backs command/shell
   tasks) with a version that forks and runs the process on the target
   node as usual, but also saves the stdout/stderr output to a
   temporary file named per the unique uuid from the step above.

 4 At the same time we have the callback plugin zuul_stream.py, which
   Ansible is calling as it moves through starting, running and
   finishing the tasks.  This looks at the task, and if it has a UUID
   [2], sends a request to the zuul_console [1], which opens the
   temporary file [3] and starts streaming it back.

 5 We loop reading this until the connection is closed by [1],
   eventually outputting each line.

In this way, the console log is effectively streamed and saved into
our job output.

We have established that we expect the console [1] is updated
asynchronously to the command/streaming [3,4] in situation such as
static nodes.  This poses a problem if we ever want to update either
part -- for example we can not change the file-name that the
command.py file logs to, because an old zuul_console: will not know to
open the new file.  You could imagine other fantasy things you might
like to do; e.g. negotiate compression etc. that would have similar
issues.

To provide the flexibility for these types of changes, implement a
simple protocol where the zuul_stream and zuul_console sides exchange
their respective version numbers before sending the log files.  This
way they can both decide what operations are compatible both ways.

Luckily the extant protocol, which is really just sending a plain
uuid, can be adapted to this.  When an old zuul_console server gets
the protocol request it will just look like an invalid log file, which
zuul_stream can handle and thus assume the remote end doesn't know
about protocols.

This bumps the testing timeout; it seems that the extra calls make for
random failures.  The failures are random and not in the same place,
I've run this separately in 850719 several times and not seen any
problems with the higher timeout.  This test is already has a settle
timeout slightly higher so I think it must have just been on the edge.

Change-Id: Ief366c092e05fb88351782f6d9cd280bfae96237
2022-08-09 17:04:40 +10:00
Ian Wienand f8677cddb9 zuul-stream : don't write out logfile for tasks in loops
In zuul_stream.py:v2_playbook_on_task_start() it checks for
"task.loop" and exits if the task is part of a loop.

However the library/command.py override still writes out the console
log despite it never being read.  To avoid leaving this file around,
mark a sentinel uuid in the action plugin if the command is part of a
loop.  In that case, for simplicity we just write to /dev/null -- that
way no other assumptions in the library/command.py have to change; it
just doesn't leave a file on disk.

This is currently difficult to test as the infrastructure zuul_console
leaves /tmp/console-* files and we do not know what comes from that,
or testing.  After this and the related change
I823156dc2bcae91bd6d9770bd1520aa55ad875b4 are deployed to the
infrastructure executors, we can make a simple and complete test for
the future by just ensuring no /tmp/console-* files are left behind
afer testing.  I have tested this locally and do not see files from
loops, which I was before this change.

Change-Id: I4f4660c3c0b0f170561c14940cc159dc43eadc79
2022-07-23 07:30:06 +10:00
Clark Boylan 1cdf491a2e Handle non default loopvars in Ansible callback stream plugin
The Zuul Ansible callback stream plugin assumed that the ansible loop
var was always called 'item' in the result_dict. You can override this
value (and it is often necessary to do so to avoid collisions) to
something less generic. In those cases we would get errors like:

  b'[WARNING]: Failure using method (v2_runner_item_on_ok) in callback plugin'
  b'(<ansible.plugins.callback.zuul_stream.CallbackModule object at'
  b"0x7fbecc97c910>): 'item'"

And stream output would not include the info typically logged.

Address this by checking if ansible_loop_var is in the results_dict and
using that value for the loop var name instead. We still fall back to
'item' as I'm not sure that ansible_loop_var is always present.

Change-Id: I408e6d4af632f8097d63c04cbcb611d843086f6c
2022-07-06 15:59:17 -07:00
Guillaume Chauvel 6a53da691f
Fix zuul_run_command ret when command is not found
If the command didn't exist:
    - Popen would throw an exception
    - 't' would not be updated (t is None)
    - return code would not be written to the console
    - zuul_stream would wait unecessary for 10 seconds

As rc is defined in normal case or in both exceptions, it can be written
in each case to the console.

Change-Id: I77a4e1bdc6cd163143eacda06555b62c9195ee38
2022-06-17 21:26:08 +02:00
James E. Blair 096e47dc43 Fix console log streaming with duplicated roles
If a role is applied to a host more than once (via either play
roles or include_roles, but not via an include_role loop), it will
have the same task UUID from ansible which means Zuul's command
plugin will write the streaming output to the same filename, and
the log streaming will request the same file.  That means the file
might look this after the second invocation:

2022-05-19 17:06:23.673625 | one
2022-05-19 17:06:23.673781 | [Zuul] Task exit code: 0
2022-05-19 17:06:29.226463 | two
2022-05-19 17:06:29.226605 | [Zuul] Task exit code: 0

But since we stop reading the log after "Task exit code", the user
would see "one" twice, and never see "two".

Here are some potential fixes for this that don't work:

* Accessing the task vars from zuul_stream to store any additional
  information: the callback plugins are not given the task vars.
* Setting the log id on the task args in zuul_stream instead of
  command: the same Task object is used for each host and therefore
  the command module might see the task object after it has been
  further modified (in other words, nothing host-specific can be
  set on the task object).
* Setting an even more unique uuid than Task._uuid on the Task
  object in zuul_stream and using that in the command module instead
  of Task._uuid: in some rare cases, the actual task Python object
  may be different between the callback and command plugin, yet still
  have the same _uuid; therefore the new attribute would be missing.

Instead, a global variable is used in order to transfer data between
zuul_stream and command.  This variable holds a counter for each
task+host combination.  Most of the time it will be 1, but if we run
the same task on the same host again, it will increment.  Since Ansible
will not run more than one task on a host simultaneously, so there is
no race between the counter being incremented in zuul_stream and used
in command.

Because Ansible is re-invoked for each playbook, the memory usage is
not a concern.

There may be a fork between zuul_stream and command, but that's fine
as long as we treat it as read-only in the command plugin.  It will
have the data for our current task+host from the most recent zuul_stream
callback invocation.

This change also includes a somewhat unrelated change to the test
infrastructure.  Because we were not setting the log stream port on
the executor in tests, we were actually relying on the "real" OpenDev
Zuul starting zuul_console on the test nodes rather than the
zuul_console we set up for each specific Ansible version from the tests.
This corrects that and uses the correct zuul_console port, so that if we
make any changes to zuul_console in the future, we will test the
changed version, not the one from the Zuul which actually runs the
tox-remote job.

Change-Id: Ia656db5f3dade52c8dbd0505b24049fe0fff67a5
2022-05-21 08:42:03 -07:00
Zuul 4151f91684 Merge "zuul_json: remove no_log handling" 2022-05-03 05:37:49 +00:00
Zuul 7130197e79 Merge "ansible zuul_json callback : handle string results" 2022-05-03 05:37:41 +00:00
Zuul 5b281e76a9 Merge "Add support for Ansible 5" 2022-04-28 09:01:09 +00:00
Zuul 2fd6e8fd5d Merge "Remove actiongeneral / actiontrusted dirs" 2022-04-28 06:31:49 +00:00
Zuul 8b24154448 Merge "Remove unused functions from zuul.ansible.paths" 2022-04-28 06:14:04 +00:00
Zuul b053fb759c Merge "Remove workspace path check from zuul_return" 2022-04-28 00:45:28 +00:00
Ian Wienand 7996f634fa zuul_json: remove no_log handling
The current no_log handling here was added in the following steps

Ic1becaf2f3ab345da22fa62314f1296d76777fec was the original zuul_json
and added the check of the result object

  if result._result.get('_ansible_no_log', False):

This happened in July, 2017.  However, soon after in October 2017,
Ansible merged [1] which changed things so that the callbacks
(v2_runner_on_ok) get "clean copy" of the TaskResult object; e.g.

   if isinstance(arg, TaskResult):
       new_args.append(arg.clean_copy())

It can be seen at [2] that this is where results are censored.  This
change made it into Ansible 2.5.  Ergo this callback will never see
uncensored results here.

The second part of the no_log processing here was added with
I9e8d08f75207b362ca23457c44cc2f38ff43ac23 in March 2018.  This was to
work around an issue with loops, where the uncensored values would
"leak" into the results under certain circumstances when using loops,
e.g.

 - name: A templated no_log loop
   a_task:
     a_arg: 'foo'
   no_log: '{{ item }}'
   loop:
     - True
     - False

Ansible merged a few changes related to fixing this.  [3] was merged
in June 2018 for this exact issue of unreachable hosts.  [4] was
merged in August 2018, which makes it so that if any loop item is
"no_log", the whole task is.  Both of these changes made it into
Ansible 2.7.

Ansible has merged test-cases for these, but we have also merged
test-cases, so we have double coverage against regression in the
future.

Ergo, I believe we can revert both of these checks.  This also means
we don't have to worry about special-casing string results as done in
I02bcd307bcfad8d99dd0db13d979ce7ba3d5e0e4.

[1] 01b6c7c9c6
[2] f7c2b1986c/lib/ansible/executor/task_result.py (L13)
[3] 336b3762b2
[4] bda074d34e

Change-Id: I00ef08869f3a8f08a1affa5e15e3386a1891f11e
2022-04-28 09:57:14 +10:00
Ian Wienand bc45371fdd ansible zuul_json callback : handle string results
We have noted a problem with tasks such as "yum:" or "package:" that
output is not being logged in the job json file.

After investigation, I have found that this is failing with an
exception in the results parsing at

   for index, item_result in enumerate(
           clean_result.get('results', [])):
       if not item_result.get('_ansible_no_log', False):
           continue

When we are *not* processing a task loop, "results" is a simple list
of strings about the package results; thus when it walks each result
the "item_result.get('_ansible_no_log', False)" step fails because a
string has no ".get()".  This causes the plugin to exit and the
information in the resulting .json to be incomplete, leading to the
tasks being missing from the UI.  I believe this was a regression
introduced with I9e8d08f75207b362ca23457c44cc2f38ff43ac23.

When we *are* processing a task loop, "results" is a list where each
entry is a loop-item result.  In this case, we are always walking a
list of dictionaries, so the existing ".get()" call works.

Change-Id: I02bcd307bcfad8d99dd0db13d979ce7ba3d5e0e4
2022-04-28 09:55:24 +10:00
Zuul 71453dd7f2 Merge "Remove localhost restriction from command.py" 2022-04-27 23:31:45 +00:00
Zuul 04a62a18dc Merge "Remove unused ansible modules" 2022-04-27 21:46:54 +00:00
James E. Blair ebf5c96d57 Add support for Ansible 5
This adds support for Ansible 5.  As mentioned in the reno, only
the major version is specified; that corresponds to major.minor in
Ansible core, so is approximately equivalent to our current regime.

The command module is updated to be based on the current code in
ansible core 2.12.4 (corresponding to community 5.6.0).  The previous
version is un-symlinked and copied to the 2.8 and 2.8 directories
for easy deletion as they age out.

The new command module has corrected a code path we used to test
that the zuul_stream module handles python exceptions in modules,
so instead we now take advantage of the ability to load
playbook-adjacent modules to add a test fixture module that always
raises an exception.  The zuul stream functional test validation is
adjusted to match the new values.

Similarly, in test_command in the remote tests, we relied on that
behavior, but there is already a test for module exceptions in
test_module_exception, so that check is simply removed.

Among our Ansible version tests, we occasionally had tests which
exercised 2.8 but not 2.9 because it is the default and is otherwise
tested.  This change adds explicit tests for 2.9 even if they are
redundant in order to make future Ansible version updates easier and
more mechanical (we don't need to remember to add 2.9 later when
we change the default).

This is our first version of Ansible where the value of
job.ansible-version could be interpreted as an integer, so the
configloader is updated to handle that possibility transparently,
as it already does for floating point values.

Change-Id: I694b979077d7944b4b365dbd8c72aba3f9807329
2022-04-14 13:33:53 -07:00
James E. Blair 0a1abf8ead Remove actiongeneral / actiontrusted dirs
We no longer need these directories since there is no deffirence
between which Ansible plugins we load for the trusted vs untrusted
context.

Change-Id: Ibd460d89ebd75a0b58ce715284916e1e1628b518
2022-04-13 14:17:04 -07:00