zuul/zuul - zuul - OpenDev: Free Software Needs Free Tools

Commit Graph

Author	SHA1	Message	Date
James E. Blair	b4e49fd8a1	Fix 10 second delay after skipped command task The log streaming system suffers a 10 second delay after a skipped command task because: * when processing the results for the next task, we ensure that the log streams for all of the previous tasks have stopped * when stopping a log stream, we wait for the remote command process to finish before we close the stream * if a log file has not appeared yet, we can't determine if the stream has finished * so we wait 10 seconds for the log to appear before we proceed and terminate the stream regardless. The underlying issue is that the code that terminates the stream does not know whether to expect a log file (the normal case) or not (the case for a skipped task). To correct that, we make a new Streamer class which bundles all of the information about a particular log streamer and which is individually addressable. This way we can stop an individual streamer immediately if it's corresponding task is skipped, or stop all of the streamers with a 10 second grace period in the normal case. Since the behavioral difference is a 10 second delay (but otherwise no change in job output), we can't test the behavioral outcome of this change, but we can exercise the code by ensuring that there are skipped command tasks in the remote stream tests. Change-Id: Id6ee13e5a82aa8fa3a2a0dd293cf99ed5b84347a	2024-02-02 09:43:37 -08:00
Simon Westphahl	3b011296e6	Keep task stdout/stderr separate in result object Combining stdout/stderr in the result can lead to problems when e.g. the stdout of a task is used as an input for another task. This is also different from the normal Ansible behavior and can be surprising and hard to debug for users. The new behavior is configurable and off by default to retain backward compatibility. Change-Id: Icaced970650913f9632a8db75a5970a38d3a6bc4 Co-Authored-By: James E. Blair <jim@acmegating.com>	2023-08-17 16:22:41 -07:00
Ian Wienand	0e9f474241	zuul_stream: handle non-string msg value I noticed in some of our testing a construct like debug: msg: '{{ ansible_version }}' was actually erroring out; you'll see in the console output if you're looking Ansible output: b'TASK [Print ansible version msg={{ ansible_version }}] *************************' Ansible output: b'[WARNING]: Failure using method (v2_runner_on_ok) in callback plugin' Ansible output: b'(<ansible.plugins.callback.zuul_stream.CallbackModule object at' Ansible output: b"0x7f502760b490>): 'dict' object has no attribute 'startswith'" and the job-output.txt will be empty for this task (this is detected by by I9f569a411729f8a067de17d99ef6b9d74fc21543). This is because the msg value here comes in as a dict, and in several places we assume it is a string. This changes places we inspect the msg variable to use the standard Ansible way to make a text string (to_text function) and ensures in the logging function it converts the input to a string. We test for this with updated tasks in the remote_zuul_stream tests. It is slightly refactored to do partial matches so we can use the version strings, which is where we saw the issue. Change-Id: I6e6ed8dba2ba1fc74e7fc8361e8439ea6139279e	2022-08-24 13:32:37 +10:00
Guillaume Chauvel	6a53da691f	Fix zuul_run_command ret when command is not found If the command didn't exist: - Popen would throw an exception - 't' would not be updated (t is None) - return code would not be written to the console - zuul_stream would wait unecessary for 10 seconds As rc is defined in normal case or in both exceptions, it can be written in each case to the console. Change-Id: I77a4e1bdc6cd163143eacda06555b62c9195ee38	2022-06-17 21:26:08 +02:00
James E. Blair	096e47dc43	Fix console log streaming with duplicated roles If a role is applied to a host more than once (via either play roles or include_roles, but not via an include_role loop), it will have the same task UUID from ansible which means Zuul's command plugin will write the streaming output to the same filename, and the log streaming will request the same file. That means the file might look this after the second invocation: 2022-05-19 17:06:23.673625 \| one 2022-05-19 17:06:23.673781 \| [Zuul] Task exit code: 0 2022-05-19 17:06:29.226463 \| two 2022-05-19 17:06:29.226605 \| [Zuul] Task exit code: 0 But since we stop reading the log after "Task exit code", the user would see "one" twice, and never see "two". Here are some potential fixes for this that don't work: * Accessing the task vars from zuul_stream to store any additional information: the callback plugins are not given the task vars. * Setting the log id on the task args in zuul_stream instead of command: the same Task object is used for each host and therefore the command module might see the task object after it has been further modified (in other words, nothing host-specific can be set on the task object). * Setting an even more unique uuid than Task._uuid on the Task object in zuul_stream and using that in the command module instead of Task._uuid: in some rare cases, the actual task Python object may be different between the callback and command plugin, yet still have the same _uuid; therefore the new attribute would be missing. Instead, a global variable is used in order to transfer data between zuul_stream and command. This variable holds a counter for each task+host combination. Most of the time it will be 1, but if we run the same task on the same host again, it will increment. Since Ansible will not run more than one task on a host simultaneously, so there is no race between the counter being incremented in zuul_stream and used in command. Because Ansible is re-invoked for each playbook, the memory usage is not a concern. There may be a fork between zuul_stream and command, but that's fine as long as we treat it as read-only in the command plugin. It will have the data for our current task+host from the most recent zuul_stream callback invocation. This change also includes a somewhat unrelated change to the test infrastructure. Because we were not setting the log stream port on the executor in tests, we were actually relying on the "real" OpenDev Zuul starting zuul_console on the test nodes rather than the zuul_console we set up for each specific Ansible version from the tests. This corrects that and uses the correct zuul_console port, so that if we make any changes to zuul_console in the future, we will test the changed version, not the one from the Zuul which actually runs the tox-remote job. Change-Id: Ia656db5f3dade52c8dbd0505b24049fe0fff67a5	2022-05-21 08:42:03 -07:00
James E. Blair	ebf5c96d57	Add support for Ansible 5 This adds support for Ansible 5. As mentioned in the reno, only the major version is specified; that corresponds to major.minor in Ansible core, so is approximately equivalent to our current regime. The command module is updated to be based on the current code in ansible core 2.12.4 (corresponding to community 5.6.0). The previous version is un-symlinked and copied to the 2.8 and 2.8 directories for easy deletion as they age out. The new command module has corrected a code path we used to test that the zuul_stream module handles python exceptions in modules, so instead we now take advantage of the ability to load playbook-adjacent modules to add a test fixture module that always raises an exception. The zuul stream functional test validation is adjusted to match the new values. Similarly, in test_command in the remote tests, we relied on that behavior, but there is already a test for module exceptions in test_module_exception, so that check is simply removed. Among our Ansible version tests, we occasionally had tests which exercised 2.8 but not 2.9 because it is the default and is otherwise tested. This change adds explicit tests for 2.9 even if they are redundant in order to make future Ansible version updates easier and more mechanical (we don't need to remember to add 2.9 later when we change the default). This is our first version of Ansible where the value of job.ansible-version could be interpreted as an integer, so the configloader is updated to handle that possibility transparently, as it already does for floating point values. Change-Id: I694b979077d7944b4b365dbd8c72aba3f9807329	2022-04-14 13:33:53 -07:00
Tobias Henkel	bf4e9893d0	Block localhost shell tasks in untrusted playbooks Zuul was designed to block local code execution in untrusted environments to not only rely on bwrap to contain a job. This got broken since the creation of a command plugin that injects the zuul_job_id which is required for log streaming. However this plugin doesn't do a check if the task is a localhost task. Further it is required in trusted and untrusted environments due to log streaming. Thus we need to fork this plugin and restrict the variant that is used in untrusted environments. We do this by moving actiongeneral/command.py back to action/*. We further introduce a new catecory actiontrusted which gets the unrestricted version of this plugin. Change-Id: If81cc46bcae466f4c071badf09a8a88469ae6779 Story: 2007935 Task: 40391	2020-07-21 19:18:10 +02:00
Tristan Cacqueray	053e6388cd	zuul-tox-remote: use unique zuul_console service This change ensures the zuul-tox-remote job is testing the unique zuul_console service instead of the one setup by the job. Change-Id: If2e2324235a6608985168d606d95adeade36afe0	2019-05-17 07:56:58 +00:00
Tobias Henkel	5c2b61e638	Make ansible version configurable Currently the default ansible version is selected by the version of zuul itself. However we want to make this configurable per deployment (zuul.conf), tenant and job. Change-Id: Iccbb124ac7f7a8260c730fbc109ccfc1dec09f8b	2019-03-15 09:09:16 +01:00
Tobias Henkel	6f754dd878	Log exception on module failure with empty stdout In the zuul_stream callback we check for 'MODULE FAILURE' and then log stdout, exception or stderr whichever we find first. However in some cases stdout exists but is empty. In this case we don't log the error reason and the user needs to look into the json log which contains the correct information. This can be easily fixed by checking if stdout is empty. Change-Id: I8019d40e2e99310eaeab03f652368dac66e1ee17	2019-03-06 21:51:33 +01:00
Tobias Henkel	0ae7a157e2	Don't do live streaming in loops Currently we do live streaming of command and shell tasks in loops. However this is severely broken as we only get the output of the first task that ran. This is hard to solve as a per task streaming in loops is not possible because of missing start events. Further streaming several tasks at once is difficult too because we don't know how many tasks we need to stream upfront. So the intermediate solution until the whole log streaming is reworked is to just not live stream tasks in loops. While we're at it also fix a broken log statement and remove duplicate exit code prints of action tasks. Change-Id: Ic1358d5b9f939549ffdd5d770fe4eafc047be6f1	2018-11-05 22:21:38 +01:00
Zuul	a5c6c31043	Merge "Fix race in tox-remote test_command"	2018-06-21 20:20:18 +00:00
Tobias Henkel	d2f67f497d	Fix race in tox-remote test_command In test_command we create the file /tmp/console-None.log. We now have two hosts in that test that are mapped to the same host. This could lead to failures [1] during creation of that file. Fix that by adding run_once. [1] Test failure 2018-06-20 14:12:00.349980 \| TASK [Create unwritable /tmp/console-None.log] 2018-06-20 14:12:00.727614 \| controller \| changed 2018-06-20 14:12:00.734316 \| compute1 \| ERROR 2018-06-20 14:12:00.734612 \| compute1 \| { 2018-06-20 14:12:00.734734 \| compute1 \| "gid": 1000, 2018-06-20 14:12:00.734828 \| compute1 \| "group": "zuul", 2018-06-20 14:12:00.734910 \| compute1 \| "mode": "0444", 2018-06-20 14:12:00.734987 \| compute1 \| "msg": "Error, could not touch target: [Errno 13] Permission denied: \'/tmp/console-None.log\'", 2018-06-20 14:12:00.735063 \| compute1 \| "owner": "zuul", 2018-06-20 14:12:00.735137 \| compute1 \| "path": "/tmp/console-None.log", 2018-06-20 14:12:00.735211 \| compute1 \| "size": 0, 2018-06-20 14:12:00.735284 \| compute1 \| "state": "file", 2018-06-20 14:12:00.735357 \| compute1 \| "uid": 1000 2018-06-20 14:12:00.735430 \| compute1 \| } Change-Id: Ib1d48fea57447131d7018e6a2772f289a668299a	2018-06-21 19:31:57 +00:00
Clark Boylan	7451fdd1e8	Make streaming test run longer than connection timeout This is an attempt to make the existing streaming log test more robust based on observed behavior in the wild. Basically we need to make sure that if nothing is logged for longer than our streaming connection timeout that we still get all the data. Change-Id: I0a60c92d8a75dac195f6b85de379b612a2148e37	2018-06-20 15:19:31 +02:00
Tobias Henkel	1ef438c450	Remove failed_when when creating /tmp/console-None.log This should make the streaming test more robust once zuul is updated with the fix for this. Change-Id: I0568a7abc59d3823e7e87678550e6a61575aad4b	2018-06-15 04:53:46 +00:00
James E. Blair	6526ce8282	Add test for shell after include_role We have observed that if a shell task runs on multiple hosts after an include_role, the task will report a missing zuul_log_id on the second host. This adds a test for that. Change-Id: I59787285674947c24b8919f1c4d5ced9439da9f8 Story: 2002528 Task: 22067	2018-06-13 15:38:36 -07:00
Tobias Henkel	28dc87d4ee	Add blocks to the zuul stream test A further edge case which could potentially break in the future is using blocks with rescue and always tasks. Add them to the test to make sure they won't break. Change-Id: I2a45ba303825dff8ed9f64a68a6b71c0e5afa08c	2018-06-12 08:58:30 -07:00
Tobias Henkel	1310aa592b	Fix command tasks with free play strategy When using the free play strategy the command and shell tasks fail logging due to an undefined variable. Fix that and add tests for it. Change-Id: Ib869b12c54201b59f3de852b502ff62183db7266	2018-06-12 08:54:08 -07:00
Tobias Henkel	bfcdff9118	Fix broken command tasks in handlers With the update to Ansible 2.5 command and shell tasks used in handlers are broken and fail with [1]. The reason for this is that the callback v2_playbook_on_task_start is not called anymore for handlers. Instead the callback v2_playbook_on_handler_task_start is called for them. This leads to a missing zuul_log_id in the handler task and trying to log to /tmp/console-None.log. In case this file was already created by a handler using sudo it may not be accessible which leads to the exception. This can be fixed by also defining v2_playbook_on_handler_task_start in zuul_stream. Also add a validation of zuul_log_id in the command module. This should make it easier to spot such errors next time. [1] Trace Exception in thread Thread-1: Traceback (most recent call last): File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner self.run() File "/usr/lib/python2.7/threading.py", line 754, in run self.__target(self.__args, *self.__kwargs) File "/tmp/ansible_X_j_M1/ansible_module_command.py", line 185, in follow with Console(log_uuid) as console: File "/tmp/ansible_X_j_M1/ansible_module_command.py", line 162, in __enter__ self.logfile = open(self.logfile_name, 'ab', buffering=0) IOError: [Errno 13] Permission denied: '/tmp/console-None.log' Change-Id: Ib9dd7fe09e4e7734f7a9ada876e6ce450ebc5038 Story: 2002528 Task: 22067	2018-06-12 08:53:59 -07:00
James E. Blair	896df11638	Revert callback fixes We suspect there are more fixes required, and the additional check these changes added may cause more harm in the interim. This reverts commit `3512a5608c`. This reverts commit `d94a0d6f06`. Change-Id: I6e874cd68a1cf6f974e36ab1870573bddf4e647b	2018-06-12 08:53:19 -07:00
Tobias Henkel	3512a5608c	Fix command tasks with free play strategy When using the free play strategy the command and shell tasks fail logging due to an undefined variable. Fix that and add tests for it. Change-Id: I830d64d6089f869dc5a0e3d4fe346e6480884793	2018-06-12 12:30:52 +02:00
Tobias Henkel	d94a0d6f06	Fix broken command tasks in handlers With the update to Ansible 2.5 command and shell tasks used in handlers are broken and fail with [1]. The reason for this is that the callback v2_playbook_on_task_start is not called anymore for handlers. Instead the callback v2_playbook_on_handler_task_start is called for them. This leads to a missing zuul_log_id in the handler task and trying to log to /tmp/console-None.log. In case this file was already created by a handler using sudo it may not be accessible which leads to the exception. This can be fixed by also defining v2_playbook_on_handler_task_start in zuul_stream. Also add a validation of zuul_log_id in the command module. This should make it easier to spot such errors next time. [1] Trace Exception in thread Thread-1: Traceback (most recent call last): File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner self.run() File "/usr/lib/python2.7/threading.py", line 754, in run self.__target(self.__args, *self.__kwargs) File "/tmp/ansible_X_j_M1/ansible_module_command.py", line 185, in follow with Console(log_uuid) as console: File "/tmp/ansible_X_j_M1/ansible_module_command.py", line 162, in __enter__ self.logfile = open(self.logfile_name, 'ab', buffering=0) IOError: [Errno 13] Permission denied: '/tmp/console-None.log' Change-Id: I1978aa8faa488fec87406e1481d455d49731f867 Story: 2002528 Task: 22067	2018-06-12 12:30:49 +02:00
James E. Blair	a51a9e7b8b	Add zuul-stream remote tests This tests zuul-stream using the 'remote' tests since many aspects of zuul-stream require a remote node to be exercised. Change-Id: I08edee6f9388faf13615c6dc310df7a94d0419b2	2018-03-20 16:23:41 -07:00

23 Commits