Do late decoding of log stream buffer

The log stream is read in chunked blocks. When having multi byte
unicode characters in the log stream it can happen that this character
is split into different buffers. This can break the decode step with
an exception [1]. This can be fixed by treating the buffer as binary
and decoding the final lines.

Further we must expect that the data also contains binary data. In
order to cope with this further harden the final decoding by adding
'backslashreplace'. This will replace every occurrence of an
undecodable character by an appropriate escape sequence. This way we
can retain all the information (even binary) without being unable to
decode the stream.

[1]: Log output
Ansible output: b'Exception in thread Thread-10:'
Ansible output: b'Traceback (most recent call last):'
Ansible output: b'  File "/usr/lib/python3.5/threading.py", line 914, in _bootstrap_inner'
Ansible output: b'    self.run()'
Ansible output: b'  File "/usr/lib/python3.5/threading.py", line 862, in run'
Ansible output: b'    self._target(*self._args, **self._kwargs)'
Ansible output: b'  File "/var/lib/zuul/ansible/zuul/ansible/callback/zuul_stream.py", line 140, in _read_log'
Ansible output: b'    more = s.recv(4096).decode("utf-8")'
Ansible output: b"UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 4094-4095: unexpected end of data"
Ansible output: b''

Change-Id: I568ede2a2a4a64fd3a98480cebcbc2e86c54a2cf
This commit is contained in:
Tobias Henkel 2017-10-25 14:23:31 +02:00
parent 6949c24c59
commit e771b6eeec
3 changed files with 21 additions and 6 deletions

View File

@ -46,3 +46,6 @@
args:
chdir: /itemloop/somewhere/that/does/not/exist
failed_when: false
- name: Print binary data
command: echo -e '\x80abc'

View File

@ -58,3 +58,8 @@
shell: |
egrep "^.+\| node1 \| OSError.+\/failure-itemloop\/" job-output.txt
egrep "^.+\| node2 \| OSError.+\/failure-itemloop\/" job-output.txt
- name: Validate output - binary data
shell: |
egrep "^.*\| node1 \| \\\\x80abc" job-output.txt
egrep "^.*\| node2 \| \\\\x80abc" job-output.txt

View File

@ -128,22 +128,29 @@ class CallbackModule(default.CallbackModule):
continue
msg = "%s\n" % log_id
s.send(msg.encode("utf-8"))
buff = s.recv(4096).decode("utf-8")
buff = s.recv(4096)
buffering = True
while buffering:
if "\n" in buff:
(line, buff) = buff.split("\n", 1)
done = self._log_streamline(host, line)
if b'\n' in buff:
(line, buff) = buff.split(b'\n', 1)
# We can potentially get binary data here. In order to
# being able to handle that use the backslashreplace
# error handling method. This decodes unknown utf-8
# code points to escape sequences which exactly represent
# the correct data without throwing a decoding exception.
done = self._log_streamline(
host, line.decode("utf-8", "backslashreplace"))
if done:
return
else:
more = s.recv(4096).decode("utf-8")
more = s.recv(4096)
if not more:
buffering = False
else:
buff += more
if buff:
self._log_streamline(host, line)
self._log_streamline(
host, line.decode("utf-8", "backslashreplace"))
def _log_streamline(self, host, line):
if "[Zuul] Task exit code" in line: