Commit Graph

45 Commits

Author SHA1 Message Date
James E. Blair 179fa02ed0 Build a new skopeo for the zuul-executor container image
New versions of docker are no longer compatible with old versions
of skopeo.  To correct this, build a new version of skopeo for
the container images.  We need 1.14+, which is not available in
Debian yet, so we build 1.15 (the latest tagged release) from
source.

Change-Id: I5a5c351e90b06d3acdd02f3117aa29eafb72445e
2024-03-21 12:48:32 -07:00
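To illustrate the 1.14+ requirement above, here is a small Python sketch that checks a `skopeo --version` string against that minimum. Two things are assumptions made for illustration and are not part of the build change: the output format ("skopeo version X.Y.Z") and the helper names.

```python
import re
from typing import Tuple

# Minimum version named in the commit message above.
MIN_SKOPEO: Tuple[int, int] = (1, 14)


def parse_skopeo_version(output: str) -> Tuple[int, int, int]:
    """Extract (major, minor, patch) from text such as 'skopeo version 1.15.0'.

    The exact `skopeo --version` output format is an assumption.
    """
    match = re.search(r"version\s+(\d+)\.(\d+)(?:\.(\d+))?", output)
    if match is None:
        raise ValueError(f"unrecognized skopeo version output: {output!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def skopeo_is_new_enough(output: str) -> bool:
    """True if the reported version is at least MIN_SKOPEO."""
    return parse_skopeo_version(output)[:2] >= MIN_SKOPEO
```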
Simon Westphahl e3104f3e5c Prevent exception when getting namespace PIDs
ERROR zuul.AnsibleJob: [e: ...] [build: ...] Unable to list namespace pids
Traceback (most recent call last):
  File "/opt/zuul/lib/python3.11/site-packages/zuul/executor/server.py", line 2868, in runAnsible
    ns, pids = context.getNamespacePids(self.proc)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/zuul/lib/python3.11/site-packages/zuul/driver/bubblewrap/__init__.py", line 89, in getNamespacePids
    for child in pid_to_child_list.get(proc.pid):
TypeError: 'NoneType' object is not iterable

Change-Id: Ic8f4daac064da20b921189774d31859424a21fa0
2023-09-15 10:22:29 +02:00
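The traceback above comes from iterating the result of `pid_to_child_list.get(proc.pid)`. That lookup returns None when the process has no recorded children, for example because it has already exited. The safe pattern is to default the lookup to an empty list. The minimal sketch below uses hypothetical helper names and a plain list of (pid, ppid) pairs in place of real /proc data.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def build_child_map(pid_ppid_pairs: Iterable[Tuple[int, int]]) -> Dict[int, List[int]]:
    """Map each parent pid to the list of its direct children."""
    children: Dict[int, List[int]] = defaultdict(list)
    for pid, ppid in pid_ppid_pairs:
        children[ppid].append(pid)
    # Return a plain dict so lookups of unknown pids do not create entries.
    return dict(children)


def descendants(children: Dict[int, List[int]], root: int) -> List[int]:
    """Return every descendant of root; a pid with no children yields []."""
    result: List[int] = []
    stack = [root]
    while stack:
        pid = stack.pop()
        # .get(pid, []) rather than .get(pid): iterating None would raise
        # "TypeError: 'NoneType' object is not iterable".
        for child in children.get(pid, []):
            result.append(child)
            stack.append(child)
    return result
```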
James E. Blair 1c92165ab7 List process ids in bwrap namespace
If the kernel kills a process due to an out of memory error, it
can be difficult to track the process back to the build that triggered
it.  The kernel error just gives us a PID, but we don't know any of
the Ansible process ids.  Further, since they are in bwrap, Ansible
only knows its namespaced pid rather than the host pid, so we can't
simply output it in one of our callback plugins.

To aid in debugging, output all of the process ids within a namespace
right at the start of an ansible-playbook execution.  At this time,
it is certain that the Ansible process will have started, and it is
very likely that it is still running.  That should provide a way
to map from an OOM message back to an Ansible process id.

(Note that Ansible forks and this is unlikely to catch any forked
processes, so we will only see the main Ansible process id.  Typically
this is what the kernel should elect to kill, but if it does not,
we may need a further change to repeat this process each time Ansible
forks.  Since that is more costly, let's see if we can avoid it.)

Change-Id: I9f262c3a3c5410427b0fb301cb4f1697b033ba2f
2023-06-28 13:31:06 -07:00
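The following sketch maps a host PID to the PID Ansible sees inside bwrap. It assumes a Linux kernel (4.1 or later) that exposes the NSpid field in /proc/<pid>/status. The helper names are hypothetical, and this is not necessarily how Zuul's getNamespacePids works.

```python
from typing import Dict, Iterable, List


def parse_nspid(status_text: str) -> List[int]:
    """Return the NSpid field of a /proc/<pid>/status file.

    The kernel lists the PID from the outermost namespace (as seen by this
    procfs) to the innermost one, so [0] is the host PID and [-1] is the PID
    seen inside the sandbox. Returns [] if the field is absent.
    """
    for line in status_text.splitlines():
        if line.startswith("NSpid:"):
            return [int(field) for field in line.split()[1:]]
    return []


def namespace_pid_map(pids: Iterable[int]) -> Dict[int, int]:
    """Map host PID -> innermost namespace PID for the given host PIDs."""
    mapping: Dict[int, int] = {}
    for pid in pids:
        try:
            with open(f"/proc/{pid}/status") as f:
                ids = parse_nspid(f.read())
        except OSError:
            continue  # the process exited before we could read its status
        if ids:
            mapping[ids[0]] = ids[-1]
    return mapping
```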
Clark Boylan ff166e3ea9 Document the source of the afs 0x40084301 ioctl magic number
During debugging of ioctl failures one of the things we explored was
that this magic number may no longer be correct. Turns out it is
correct, but documenting the source of this value may aid future
debugging.

Change-Id: I87504ee5763bbdc819e68f9defee3df5277eec51
2023-06-05 10:54:46 -07:00
James E. Blair eb550597b0 Use os.open with setpag
When we open the ioctl file to run the openafs setpag syscall,
we previously used the high-level open method, which apparently
issues an unwanted TCGETS ioctl which crashes the program with
a kernel error under certain versions of python+openafs+linux
(3.10.6, 1.8.9, 5.15.0, respectively).

Switch to a low-level open to avoid this call.

Change-Id: I5e08a6020cf6cd4ad2a0084effb697aa39dae9c6
2023-06-05 10:25:00 -07:00
Clark Boylan c1b0a00c60 Only check bwrap execution under the executor
The reason for this is that containers for zuul services need to run
privileged in order to successfully run bwrap. We currently only expect
users to run the executor as privileged, and the new bwrap execution
checks have broken other services as a result. (Other services load the
bwrap system because it is a normal zuul driver and all drivers are
loaded by all services.)

This works around the problem by adding a check_bwrap flag to connection
setup and only setting it to true on the executor. A better longer-term
fix would be to only instantiate the bwrap driver on the executor in
the first place. This can probably be accomplished by overriding the
ZuulApp configure_connections method in the executor and dropping bwrap
creation in ZuulApp.

Temporarily stop running the quick-start job since it's apparently not
using speculative images.

Change-Id: Ibadac0450e2879ef1ccc4b308ebd65de6e5a75ab
2023-05-17 13:45:23 -07:00
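A minimal sketch of the check_bwrap idea follows. Every service constructs the driver, but only the executor asks it to verify bwrap. The class and function names are hypothetical stand-ins. The check itself is injected so that the sketch runs without privileges.

```python
from typing import Callable, Optional


def default_bwrap_check() -> None:
    """Placeholder: a real check might run a trivial command under bwrap
    and raise if it fails. Left empty here because that needs privileges."""


class BubblewrapDriverSketch:
    """Hypothetical driver; not Zuul's actual class."""

    def __init__(self, check_bwrap: bool = False,
                 checker: Optional[Callable[[], None]] = None) -> None:
        self.checked = False
        if check_bwrap:
            # Only the executor passes check_bwrap=True, so services that
            # merely load the driver never need privileges to run bwrap.
            (checker or default_bwrap_check)()
            self.checked = True


def configure_connections(is_executor: bool) -> BubblewrapDriverSketch:
    """Every service loads the driver, but only the executor checks bwrap."""
    return BubblewrapDriverSketch(check_bwrap=is_executor)
```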
Clark Boylan 0937872119 Use bwrap --disable-userns if possible
Newer bwrap has added the ability to disable additional nested user
namespace creation from within the bwrap execution context. Take advantage
of this feature in Zuul when possible in order to fortify Zuul's
security position.

In particular we need two conditions to take advantage of this. 1) bwrap
must be new enough to support the feature (>=0.8.0) and 2) we must be
running with user namespaces enabled. We explicitly check for both
conditions and add the appropriate invocation flags to bwrap when the
conditions are met.

Change-Id: Idf933a0847cb8570b551892186ca9c0057be127f
2023-05-16 10:12:21 -07:00
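The two conditions can be sketched as a pure function that decides which flags to add. Several details are assumptions:

* the version string format ("bubblewrap 0.8.0");
* treating a positive /proc/sys/user/max_user_namespaces as "user namespaces enabled";
* the exact flag list.

bwrap documents --disable-userns as requiring --unshare-user.

```python
import re
from typing import List, Optional, Tuple

MIN_BWRAP_FOR_DISABLE_USERNS = (0, 8, 0)


def parse_bwrap_version(output: str) -> Optional[Tuple[int, ...]]:
    """Parse text such as 'bubblewrap 0.8.0' (format assumed)."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def read_max_user_namespaces(path: str = "/proc/sys/user/max_user_namespaces") -> int:
    """Return the sysctl value, or 0 if it cannot be read."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return 0


def userns_flags(version_output: str, max_user_namespaces: int) -> List[str]:
    """Extra bwrap flags, added only when both conditions hold."""
    version = parse_bwrap_version(version_output)
    if version is None or version < MIN_BWRAP_FOR_DISABLE_USERNS:
        return []  # condition 1: bwrap too old (or unknown)
    if max_user_namespaces <= 0:
        return []  # condition 2: user namespaces are disabled
    # --disable-userns only works inside a new user namespace.
    return ["--unshare-user", "--disable-userns"]
```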
Clark Boylan 4ea5c621b9 Set default SSH_AUTH_SOCK in zuul-bwrap command
The zuul-bwrap command is useful for debugging things under the zuul
bwrap environment. Unfortunately, the way things are written, it assumes
there will be an SSH_AUTH_SOCK. For much of the debugging you might do
manually in this environment, an SSH_AUTH_SOCK is unnecessary. Instead of
throwing an obtuse error, simply set the value to /dev/null if not otherwise set.

Change-Id: Iec0ee93c6e6b1b647a27c9a7fdf280d14d5d2596
2022-09-29 08:41:56 -07:00
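A minimal sketch of the fallback follows. It assumes the sandbox environment is built as a dict, and the function name is hypothetical.

```python
import os
from typing import Dict, Mapping, Optional


def bwrap_environment(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Copy the environment, defaulting SSH_AUTH_SOCK to /dev/null."""
    env = dict(os.environ if base is None else base)
    # Only fills the value in when it is missing; a real agent socket wins.
    env.setdefault("SSH_AUTH_SOCK", "/dev/null")
    return env
```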
James E. Blair a190e35bb8 Add a note about bwrap and setsid
https://github.com/containers/bubblewrap/issues/142 is relevant to
us; however, our use of start_new_session in popen effectively
avoids the issue.  Add a note to that effect so that we don't
accidentally open a vulnerability later.

Also, clean up some py2-only code.

Change-Id: Icd4adee32f35c478661dc2d657cf6c9e55e1f7b5
2022-03-28 15:44:19 -07:00
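The following sketch shows what start_new_session does. The child calls setsid() before exec, so it leads a new session with no controlling terminal. Our assumption is that this is what keeps a sandboxed process away from the parent's terminal, which is presumably the concern in the linked issue. The helper name is hypothetical.

```python
import subprocess
import sys
from typing import List


def run_in_new_session(argv: List[str]) -> subprocess.CompletedProcess:
    """Run argv as the leader of a new session (setsid() in the child)."""
    return subprocess.run(argv, start_new_session=True,
                          capture_output=True, text=True, check=True)
```

After setsid() the child is a session leader, so its session id equals its own pid:

```python
result = run_in_new_session(
    [sys.executable, "-c", "import os; print(os.getsid(0) == os.getpid())"])
print(result.stdout.strip())
```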
Albin Vass 39305393c0 Drop ambient capabilities when running bwrap
Having ambient capabilities causes bwrap to error on start [1]
unless the bwrap executable also has the setuid bit set or is run as
root.

This can cause issues in OpenShift or Podman unless ambient
capabilities are dropped [2].

[1] - bae85baf72/bubblewrap.c (L742)
[2] - https://github.com/containers/bubblewrap/issues/380

Change-Id: I15455fb400448d7672638f911d6cf045fa683a9b
2021-11-01 19:13:37 +01:00
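Below is one possible way to inspect and drop ambient capabilities from Python. It calls the Linux prctl(PR_CAP_AMBIENT, PR_CAP_AMBIENT_CLEAR_ALL) interface through ctypes. This is a sketch that assumes Linux 4.3 or later and a glibc-style prctl. It is not necessarily how Zuul implements the change.

```python
import ctypes
import ctypes.util
import os

# Constants from <linux/prctl.h>.
PR_CAP_AMBIENT = 47
PR_CAP_AMBIENT_CLEAR_ALL = 4


def ambient_caps(status_text: str) -> int:
    """Return the CapAmb bitmask from /proc/<pid>/status text (0 if absent)."""
    for line in status_text.splitlines():
        if line.startswith("CapAmb:"):
            return int(line.split()[1], 16)
    return 0


def clear_ambient_caps() -> None:
    """Drop all ambient capabilities of the calling thread."""
    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    result = libc.prctl(ctypes.c_int(PR_CAP_AMBIENT),
                        ctypes.c_ulong(PR_CAP_AMBIENT_CLEAR_ALL),
                        ctypes.c_ulong(0), ctypes.c_ulong(0), ctypes.c_ulong(0))
    if result != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
```

clear_ambient_caps could, for example, be passed as preexec_fn to subprocess.Popen. That way only the forked child that execs bwrap loses the capabilities.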
Paul Belanger 927857082b Stop bind mounting zuul dir into bwrap
Once we landed the multi-ansible spec, we no longer need to include the
zuul directory where zuul-executor is run from. This is because we now
install ansible into its own virtualenv.

Change-Id: I35c66d7249841e32478b26b60d6e840fe3f2750d
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
2019-06-22 18:36:21 -04:00
Tobias Henkel 74c1ba73ba Mount tmpfs on ansible tmp dir
We explicitly set the ansible local_tmp dir to {work}/tmp. Since
ansible writes many small files there, we should mount a tmpfs on it
to save IOPS.

Change-Id: Ia17d9dac8e7f5d8fb8e294c37a7b0a6621ee7c7c
2019-06-04 14:09:15 +02:00
Tristan Cacqueray 6fd6b6b57d bubblewrap: bind mount /etc/subuid
This file may be required by recent container tools when doing unshare
actions.

Also make the image build jobs non-voting temporarily since they are
broken by the issue this change fixes.

Also, pin docker image to 2.16.8 for quick-start (squashed in here
to be able to merge again):

The new version 3.0.0 needs some configuration adjustment, git-review is
failing with:
  remote: error: branch refs/publish/master:
  remote: You need 'Create' rights to create new references.
  remote: User: user
  remote: Contact an administrator to fix the permissions

Change-Id: Iab45bf2322edf8a10d2d41a1fc9a098e17a39ea7
2019-05-16 09:33:16 +02:00
Monty Taylor 7fe0e780cf Build zuul containers with dockerfile not pbrx
While pbrx is nice and all, it's quite the divergence from how
the rest of the container ecosystem works. Switch to using
Dockerfile and the python-builder image.

Bind mount ld.so.cache into bwrap context

When using images based on the python:slim base image, python
is installed in /usr/local and the linker needs to know to look
in /usr/local/lib for shared libraries.

Depends-On: https://review.openstack.org/632187
Change-Id: I84f6dd2a8e3222f7807103dcbb61bdadedfdd22d
2019-01-24 16:11:31 +00:00
Andreas Jaeger d9059524e0 Fix flake 3.6.0 warnings
flake8 3.6.0 introduces a couple of new checks; handle them in the zuul
code base:

* Disable "W504 line break after binary operator", this is a new warning
  with different coding style.
* Fix "F841 local variable 'e' is assigned to but never used"
* Fix "W605 invalid escape sequence" - use raw strings for regexes.
* Fix "F901 'raise NotImplemented' should be 'raise
  NotImplementedError'"
* Ignore "E252 missing whitespace around parameter equals" since it
  reports on parameters like:
  def makeNewJobs(self, old_job, parent: Job=None):

Change "flake8: noqa" to "noqa", since "flake8: noqa" is a file-level
noqa and gets ignored with flake8 3.6.0 if it's not at the beginning of a
line. This results in many warnings for the files ./zuul/driver/bubblewrap/__init__.py
and ./zuul/cmd/migrate.py. Fix any issues there.

Change-Id: Ia79bbc8ac0cd8e4819f61bda0091f4398464c5dc
2018-10-28 16:39:30 +01:00
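Here are two of the fixes listed above, applied to hypothetical code. The first uses a raw string for a regex (W605). The second raises the NotImplementedError exception class rather than the NotImplemented constant (F901).

```python
import re

# W605: "\d" in a normal string literal is an invalid escape sequence;
# a raw string passes the backslash through to the regex engine.
VERSION_RE = re.compile(r"(\d+)\.(\d+)")


class WrapperBase:
    def wrap(self, command):
        # F901: NotImplemented is a constant for binary-operator methods;
        # abstract methods should raise the NotImplementedError exception.
        raise NotImplementedError
```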
Tobias Henkel 5a4db84e5a Log cpu times of ansible executions
We need to be able to discover and compare ansible performance
regressions or improvements. Currently we have no way of
detecting changes there other than observing the overall system load
of executors. One way to get some metrics is to log the cpu times used
by individual ansible runs and their sum over the whole job
execution. With this, one could grab that data from the log and analyse
it.

Change-Id: Ib0b62299c741533f0d1615f67eced9601498f00d
2018-07-14 10:32:06 +02:00
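The following sketch measures the CPU time of one child process with the standard resource module. The helper is hypothetical. Note the caveat in its docstring about concurrent children.

```python
import resource
import subprocess
import sys
from typing import List, Tuple


def run_and_measure_cpu(argv: List[str]) -> Tuple[int, float, float]:
    """Run argv and return (returncode, user_seconds, system_seconds).

    RUSAGE_CHILDREN accumulates over all waited-for children of this
    process, so in a multi-threaded executor the delta would also include
    other jobs' children; os.wait4() gives exact per-child figures.
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    proc = subprocess.run(argv)
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return (proc.returncode,
            after.ru_utime - before.ru_utime,
            after.ru_stime - before.ru_stime)
```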
Fabien Boucher 0e01048069 Add /etc/localtime to bubblewrap default ro bind
This change lets programs running on the executor discover
the system default timezone.

Change-Id: Icc28d2103fe663b27a0842cd36efc6eeb38caa2b
2018-06-26 13:41:32 +02:00
Tobias Henkel ee9c392b40 Add standard ca certificate paths
When using the uri module in a base job, it cannot validate SSL certs
unless you add the CA certificate paths to (un)trusted-ro-paths. This
seems a common use case, so it makes sense to mount them into the bwrap
context by default if they exist.

Change-Id: I2277374cdb8455dd9e39222ef0ecbab4c8ac786e
2018-03-16 16:34:18 +01:00
James E. Blair 1b22179d20 Add /etc/alternatives to bwrap
On some systems, some fairly fundamental binaries route through
here.

Change-Id: I6258fbe8e7a4728bf85a6b918cf6518d2643d5ed
2017-08-31 10:10:38 -07:00
James E. Blair d5f7b74588 Add proc to bubblewrap
And set the AFS pag.

We would like to use AFS within our playbooks (generally in trusted
jobs on the executor).  Ideally, such usage should be, like everything
else in bubblewrap, completely separate from any other processes.
However, by default OpenAFS stores authentication credentials by UID,
meaning that once any process obtained tokens, any other process on
the executor would be able to use them.

Fortunately, the concept of a PAG (process authentication group) helps
us out here.  That scopes tokens to a single process and its children.

Normally this is done by PAM when a user logs in, but there is an ioctl
that we can use to request a new PAG at any time.  It is this method that
we use to ensure each ansible process runs in its own PAG.

When a new PAG is created, it is actually bound to the *thread* that
created it.  Because of this, we don't need to be concerned with thread
synchronization around PAG creation.  This is useful in the executor which
has potentially hundreds of threads in various stages of preparing to
execute a subprocess.  It is sufficient to request the new PAG at any time
before the Popen call, and that thread will use it during the next
invocation.

The --proc argument is added to the bubblewrap invocation in order to
permit aklog to run (it needs to access /proc/fs/openafs/afs_ioctl
in order to store the tokens).

Change-Id: I2687629f964af11c9da261875f2ec735082b8836
2017-08-24 16:37:54 -07:00
James E. Blair d6a71ca2b4 Write secrets to tmpfs
So that we may avoid writing the decrypted contents of secrets to
disk, write them to a file in a tmpfs.

Change-Id: I7c029b67d0fc2fa3827dc811137dd4f3a90706d8
2017-08-19 08:08:19 -07:00
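The following sketch writes a decrypted secret with owner-only permissions into a directory that is assumed to be tmpfs-backed. The function is hypothetical and does not verify the filesystem type. Zuul's actual mechanism may differ.

```python
import os
import tempfile


def write_secret(directory: str, name: str, content: bytes) -> str:
    """Write content to directory/name, readable only by the owner.

    `directory` is assumed to be a tmpfs mount; this is not checked.
    """
    path = os.path.join(directory, name)
    # O_EXCL refuses to reuse an existing file, and mode 0o600 is applied
    # at creation time, so the file is never readable by other users.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as f:
        f.write(content)
    return path


if __name__ == "__main__":
    # /dev/shm is a tmpfs on most Linux systems; fall back to a temp dir.
    base = "/dev/shm" if os.path.isdir("/dev/shm") else tempfile.gettempdir()
    secret_dir = tempfile.mkdtemp(dir=base)
    print(write_secret(secret_dir, "secrets.yaml", b"token: example\n"))
```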
James E. Blair ce56ff9756 Add wrapper driver execution context
We recently began altering the mount map used by the wrapper driver
for each execution run (so that we can only include the current
playbook).  However, the setMountsMap method operates on the global
driver object rather than an object more closely bound to the lifetime
of the playbook run.  The fact that this works at all is just luck
(executing a process is slow enough that hitting a race condition where
the wrong directories are mounted is unlikely).

To correct this, add a new layer which contains the context for the
current playbook execution.

Change-Id: I3a06f19e88435a49c7b9aea4e1221b812f5a43d0
2017-08-18 16:35:12 -07:00
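The sketch below illustrates the design. The long-lived driver hands out a fresh context object for each playbook run, so concurrent runs never share mutable mount state. Class and method names are hypothetical. The bwrap arguments are limited to --ro-bind (read-only) and --bind (read-write).

```python
from typing import List, Optional


class BwrapExecutionContext:
    """Holds the mounts for exactly one playbook run."""

    def __init__(self, ro_paths: Optional[List[str]] = None,
                 rw_paths: Optional[List[str]] = None) -> None:
        self.ro_paths = list(ro_paths or [])
        self.rw_paths = list(rw_paths or [])

    def mount_args(self) -> List[str]:
        args: List[str] = []
        for path in self.ro_paths:
            args += ["--ro-bind", path, path]
        for path in self.rw_paths:
            args += ["--bind", path, path]
        return args


class BwrapDriver:
    """Global driver object; it keeps no per-run state."""

    def getExecutionContext(self, ro_paths: Optional[List[str]] = None,
                            rw_paths: Optional[List[str]] = None) -> BwrapExecutionContext:
        return BwrapExecutionContext(ro_paths, rw_paths)
```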
Paul Belanger 5d993ed71d Bindmount /etc/lsb-release into bubblewrap
Things like pip use lsb_release, so it is helpful to include this in
bubblewrap.  This conditionally includes similar files on both
debuntu and fedora.

Change-Id: Ibfed3ace26163da6484966e348e757f7268811f0
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
Co-Authored-By: James E. Blair <jeblair@redhat.com>
2017-08-10 16:45:31 -07:00
James E. Blair 892cca6afa Bind secrets to their playbooks
Secrets are proving less useful than originally hoped because they
cannot be used effectively in any jobs with untrusted children.

This change binds the secrets to the playbooks which use them, so
that child jobs are unable to access the secrets.  This allows us
to create jobs with pre/post playbooks which use secrets which
are suitable for other jobs to inherit from.

Change-Id: I67dd12563f3abd242d6356675afed1de0cb144cf
2017-08-10 09:13:46 -07:00
Monty Taylor 01380dd885 Change name and document the bind_mount config paths
The content in these can be a file or a directory, so _dirs is
confusing. Change it to _paths and document it.

Change-Id: Ida38766cd3d440d75a6dc55035a54e0804e03760
2017-07-28 17:30:45 -05:00
Monty Taylor b41a5d9e8f Replace singleton lists with None defaults
I hit an issue trying to run zuul-bwrap locally: I didn't pass
--ro-bind or --rw-bind, which meant setMountsMap was being passed None
values. In fixing that, it seems to me that perhaps the list values were
not intentionally singletons.

However - it's possible they were and I'm breaking some intended logic
here.

Change-Id: I3a30bd3d4439c27483c45f86d3d9ae1741a40a38
2017-07-28 16:04:19 -05:00
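Here is the general Python pattern behind this change, applied to a hypothetical stand-in for setMountsMap. The parameters default to None, and the function creates fresh lists internally.

```python
from typing import Dict, List, Optional


def set_mounts_map(ro_paths: Optional[List[str]] = None,
                   rw_paths: Optional[List[str]] = None) -> Dict[str, List[str]]:
    """Hypothetical stand-in for setMountsMap.

    Defaults are None rather than [] because a default list is created once
    and shared by every call, and because callers such as a CLI that received
    no bind options naturally pass None.
    """
    return {
        "ro": list(ro_paths or []),
        "rw": list(rw_paths or []),
    }
```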
Jenkins 6d9385829b Merge "Use mypy to do static type checking" into feature/zuulv3 2017-07-28 03:58:33 +00:00
Monty Taylor fb8f5a44bd Use mypy to do static type checking
python3 includes support for optional type annotations which can be used by
static analysis tools to perform type checking. The mypy tool is a
static type checking tool that can also infer type information in many
cases, but which will use explicit type information if it is present.

Add mypy to test-requirements and to the pep8 job so that our pep8 job
does more analysis work beyond checking code style.

To support this, there were a few places in the current codebase that
needed an explicit type hint. For variables/attributes in 3.5 this is done via
comments. A conditional import that was confusing mypy just got
marked with an 'ignore'.

Our ansible action and lookup plugins confuse mypy with the way they
import the ansible base classes. That's ok - they confuse us with that
too. The .pyi files are 'typeshed' files, which are a way that one can
provide static type annotations without putting the information into the
file itself. mypy will always prefer a .pyi file over a .py file (since
the point of them is to be external annotation/interface descriptions), so
in order to get mypy to not barf on the ansible import weirdness, just
add a corresponding empty .pyi file. We could potentially actually put
interface descriptions in them - but I don't think there is very much
value in that.

It should be amusing to at least someone that we have to flake8: noqa
an import from typing that was done to provide a type hint in a comment.

Change-Id: I6c4ac3dcfc6fd990e6c6886749de147ad28389d1
2017-07-27 14:34:07 -05:00
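The example below shows the comment-based annotations described above, on a hypothetical function. This is the style used before variable annotations arrived in Python 3.6. The typing import is referenced only inside comments, hence the noqa.

```python
from typing import Dict, List  # noqa: F401  (used only in type comments)


def group_by_initial(words):
    # type: (List[str]) -> Dict[str, List[str]]
    """Group words by their first character."""
    groups = {}  # type: Dict[str, List[str]]
    for word in words:
        groups.setdefault(word[:1], []).append(word)
    return groups
```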
Jamie Lennox 7655b5550f Allow loading additional variables file for site config
It would be useful to allow deployment specific configuration that can
be fed into the project-config deployments so that we can customize
things like host ip without having to change job definitions for each
site.

Also, add a method to display the build log from a failed assertion in
the Ansible test (this was used in the development of the tests for
this change).

Change-Id: I87e8bffc540bcafab543c46244f3d5327b56fcae
Co-Authored-By: James E. Blair <jeblair@redhat.com>
2017-07-25 07:27:19 -07:00
James E. Blair 69eab24d1d Remove state_dir from setMountsMap
The setMountsMap command required the state_dir argument, presumably
so that the zuul ansible path (i.e., our custom modules) is available.

Unfortunately, it set it as a read-write bind, not read-only.  We
certainly don't want jobs (even trusted jobs) modifying the ansible
code that we run.

Switch it to a read-only bind mount.

Also, remove it from special handling inside of the setMountsMap
method and instead handle it on the executor side for increased
visibility.

Finally, add options to the zuul-bwrap command to set the ro and
rw binds to make interactive testing easier.

Change-Id: I4a0fdae546a2307d78a5c29b5a62a6d223ecb9e9
2017-07-24 14:45:31 -07:00
Tristan Cacqueray a19e8c57c7 Add /etc/hosts and /etc/nsswitch.conf to the bubblewrap
This change adds DNS resolution helpers to the bubblewrap so that
locally defined hosts are resolvable in executor playbooks.

Change-Id: I5efad8749ff25cdbe6a142f9616422d96b7bbf33
2017-07-13 06:29:34 +00:00
Tobias Henkel 7206a511e4 Optionally bind /lib64
On some systems, like Alpine, the /lib64 directory doesn't exist. Bind
it conditionally.

Change-Id: I504f140524421770b2512182e83c7da1e89e3378
2017-07-07 15:33:54 +02:00
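A minimal sketch of conditional binding follows. The hypothetical helper emits --ro-bind only for paths that exist. Recent bubblewrap versions also provide --ro-bind-try, which skips a missing source on its own.

```python
import os
from typing import Iterable, List


def optional_ro_binds(paths: Iterable[str]) -> List[str]:
    """Build --ro-bind arguments, skipping paths absent on this system."""
    args: List[str] = []
    for path in paths:
        # e.g. /lib64 does not exist on Alpine, and binding it would fail.
        if os.path.exists(path):
            args += ["--ro-bind", path, path]
    return args
```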
James E. Blair 2ee4770337 Don't automatically mount user home in executor
We're starting to treat the work directory as a substitute home
directory (we put .ssh/ into it, for example), and we set $HOME
to that directory.  Complete this process by updating our bwrap
passwd entry to point to that as the home directory and stop
mounting the real home dir.

Change-Id: I0fdb1913634d3902cac58112c5d683f12675c6f7
2017-06-28 17:39:18 -07:00
Jenkins 34de171669 Merge "executor: run trusted playbook in a bubblewrap" into feature/zuulv3 2017-06-26 21:25:48 +00:00
Jenkins a7516afe58 Merge "bubblewrap: adds --die-with-parent option" into feature/zuulv3 2017-06-26 21:25:22 +00:00
Jenkins e9c12ee0ce Merge "Remove use of six library" into feature/zuulv3 2017-06-20 16:33:21 +00:00
Tobias Henkel 88e0305d52 Add linebreak to generated passwd/group file
For running in bwrap, the /etc/passwd and /etc/group files are
generated on the fly to only show the executing user. These files need
a linebreak at the end; otherwise ssh (as well as getent) cannot
read them. In the case of ssh, this results in the error 'No user
exists for uid x'.

Change-Id: I0e75dd423f2ffb93da1de4dfc064ff22991f1793
2017-06-20 12:09:06 +02:00
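The following sketch generates the passwd and group records. The home directory points at the work directory, as in the "Don't automatically mount user home" change. Each record ends with the trailing newline that this change adds. The helper names and example paths are hypothetical.

```python
import pwd


def passwd_line(user: pwd.struct_passwd, home: str) -> str:
    """Render one /etc/passwd record for the sandboxed user."""
    fields = [user.pw_name, "x", str(user.pw_uid), str(user.pw_gid),
              user.pw_gecos, home, user.pw_shell]
    # The trailing newline matters: without it ssh reports
    # "No user exists for uid ..." and getent cannot read the entry.
    return ":".join(fields) + "\n"


def group_line(name: str, gid: int) -> str:
    """Render one /etc/group record with no supplementary members."""
    return f"{name}:x:{gid}:\n"
```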
Monty Taylor b934c1a052 Remove use of six library
It exists only for py2/py3 compat. We do not need it any more.

This will explicitly break Zuul v3 for python2, which is different than
simply ceasing to test it and no longer declaring we support it. Since
we're not testing it any longer, it's bound to degrade over time without
us noticing, so hopefully a clean and explicit break will prevent people
from running under python2 and it working for a minute, then breaking
later.

Change-Id: Ia16bb399a2869ab37a183f3f2197275bb3acafee
2017-06-19 10:34:57 -05:00
Tristan Cacqueray 44aef15d6e executor: run trusted playbook in a bubblewrap
This change renames untrusted_wrapper to execution_wrapper and uses
bubblewrap for both trusted and untrusted playbooks by default.

This change adds new options to the zuul.conf executor section to let
operators define what directories to mount ro or rw for both contexts:
* trusted_ro_dirs/trusted_rw_dirs, and
* untrusted_ro_dirs/untrusted_rw_dirs

Change-Id: I9a8a74a338a8a837913db5e2effeef1bd949a49c
Story: 2001070
Task: 4687
2017-06-17 02:43:19 +00:00
Tristan Cacqueray 2438860823 bubblewrap: adds --die-with-parent option
This change ensures that no processes leak from the bubblewrap driver.

Change-Id: Ica388ad2595cbd237d074fd54cc99d1685f6e729
2017-06-17 02:43:19 +00:00
Jenkins c95cf7fb80 Merge "Default bubblewrap to work_root" into feature/zuulv3 2017-06-15 17:10:57 +00:00
Jamie Lennox 1ef9ca67ef Show debug logging when running zuul-bwrap
If you've gotten to the point of running zuul-bwrap manually you're
almost certainly debugging a problem and so having the debug output here
helps a lot.

Change-Id: I770b5466ad15356570572b50dd64a0252ebb3b06
2017-06-14 11:08:16 +10:00
Paul Belanger bcdc4d0939 Default bubblewrap to work_root
Default chdir to jobdir.work_dir for bubblewrap and start
running our commands from there.

Change-Id: Ied3d13bc4257c669a6bbb30750f154dcf5e3b970
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
2017-06-12 17:22:40 -04:00
Paul Belanger 9d9023f254 Add untrusted-projects ansible test
We want to properly flex our bubblewrap implementation; this job does
so.

Change-Id: I6647d71434a8d8f6621d3fd34883683ef149775a
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
2017-06-01 18:47:18 -07:00
Clint Byrum 5870ccae62 Add support for bwrap
This will be the minimum "batteries included" bubblewrap driver. It does
not do any MAC configuration, since these vary by system. Operators
may wish to wrap it further in a MAC wrapper driver.

Because we set bubblewrap as the default wrapper, test_playbooks tests
it. However, it lacks a negative test, so we won't know if we're not
actually containing things.

Users who don't have bubblewrap or don't wish to use it can set the
untrusted_wrapper to 'nullwrap', which will just execute things as
they were executed before this change.

Change-Id: I84dd7c8cc55d2110b58609784007ffda0d135716
Story: 2000910
Task: 3540
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
2017-06-01 09:26:45 -07:00
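The following sketch selects an execution wrapper by name, where 'nullwrap' returns the command unchanged. The function shapes are hypothetical. The bubblewrap arguments are greatly simplified and only meant to show where the wrapping happens.

```python
from typing import Callable, Dict, List

Wrapper = Callable[[List[str]], List[str]]


def nullwrap(command: List[str]) -> List[str]:
    """Run the command as-is, with no sandbox."""
    return list(command)


def bubblewrap(command: List[str]) -> List[str]:
    """Greatly simplified: a real invocation also sets up mounts,
    namespaces, a working directory and more."""
    return ["bwrap", "--die-with-parent", "--ro-bind", "/", "/"] + list(command)


WRAPPERS: Dict[str, Wrapper] = {"nullwrap": nullwrap, "bubblewrap": bubblewrap}


def get_wrapper(name: str) -> Wrapper:
    """Look up a wrapper by its configured name."""
    try:
        return WRAPPERS[name]
    except KeyError:
        raise ValueError(f"unknown execution wrapper: {name!r}") from None
```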