Since we are working towards Python 3 support, let's rename nodepool.py
to launcher.py to make relative imports nicer; otherwise we'd have to
use:
from . import foo
Change-Id: Ic38b6a8c2bf25d53625e159cb135b71d383b700c
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
In order to support putting fewer things into images via puppet in Infra,
we'd like to be able to pre-populate our clouds with keypairs for the
infra-root accounts and have nova add those at boot time.
Change-Id: I9e2c990040342de722f68de09f273005f57a699f
It may be easier for a nodepool user to just specify the name or id
of a flavor in their config instead of the combo of min-ram and
name-filter.
In order to avoid two name-related items, and also to avoid having the
pure flavor-name case use a term called "name-filter", change
name-filter to flavor-name, and introduce the semantics that if
flavor-name is given by itself, it will look for an exact match on
flavor name or id, and if it's given with min-ram it will behave as
name-filter did already.
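A minimal sketch of the matching semantics described above, using
plain dicts as stand-ins for flavor records (this is illustrative,
not nodepool's actual provider code):

```python
def pick_flavor(flavors, flavor_name, min_ram=None):
    """Pick a flavor from a list of {'id', 'name', 'ram'} dicts.

    With only flavor_name: exact match on flavor name or id.
    With min_ram too: flavor_name acts as a substring filter,
    as the old name-filter did.
    """
    if min_ram is None:
        for f in flavors:
            if flavor_name in (f['name'], f['id']):
                return f
        return None
    candidates = [f for f in flavors
                  if f['ram'] >= min_ram and flavor_name in f['name']]
    # Prefer the smallest flavor that satisfies min-ram.
    return min(candidates, key=lambda f: f['ram']) if candidates else None
```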
Change-Id: I8b98314958d03818ceca5abf4e3b537c8998f248
This was a temporary measure to keep production nodepool from
deleting nodes created by v3 nodepool. We don't need to carry
it over.
This is an alternative to: https://review.openstack.org/449375
Change-Id: Ib24395e30a118c0ea57f8958a8dca4407fe1b55b
The nodepool_id feature may need to be removed. I've kept it to simplify
merging both now and if we do it again later.
A couple of the tests are disabled and need reworking in a subsequent
commit.
Change-Id: I948f9f69ad911778fabb1c498aebd23acce8c89c
Nova has an API call that can fetch the list of available AZs. Use it to
provide a default list so that we can provide sane choices to the
scheduler related to multi-node requests rather than just letting nova
pick on a per-request basis.
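A rough sketch of how a fetched AZ list could feed multi-node
placement; list_azs() stands in for the Nova availability-zone API
call and is not nodepool's real method name:

```python
import random

def list_azs():
    # Stand-in for the Nova API call that lists available AZs.
    return ['az1', 'az2', 'az3']

def pick_az(configured_azs=None):
    """Pick one AZ for a whole multi-node request, so its nodes land
    together instead of letting nova choose per-request."""
    azs = configured_azs or list_azs()
    return random.choice(azs)
```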
Change-Id: I1418ab8a513280318bc1fe6e59301fda5cf7b890
shade/occ have a force-ipv4 setting which can be used to change
autodetected behavior, but also have detection for ipv6 viability.
This makes us aggressively use IPv6, and only use v4 if v6 is not
available or has been explicitly disabled. Yay us.
Incidentally, this should also help people use zuul in places that are
completely non-public: a zuul running in a cloud with a private
network, spinning up nodes that only have private networks, means
public_v4 won't really have anything in it. However, clouds.yaml
supports a private=True setting which will cause the private IP to be
listed as the desired IP.
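A hedged sketch of the selection order described above; the dict keys
mirror shade-style server fields but the function itself is
illustrative, not shade/occ's actual implementation:

```python
def pick_interface_ip(server, force_ipv4=False, private=False):
    """Prefer IPv6; fall back to v4 when v6 is absent or force-ipv4 is
    set; honor a clouds.yaml-style private=True by returning the
    private address. `server` is a plain dict stand-in."""
    if private:
        return server.get('private_v4')
    if not force_ipv4 and server.get('public_v6'):
        return server['public_v6']
    return server.get('public_v4')
```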
Change-Id: I2b4d992e3b21c00cefe98023267347c02dd961dc
We weren't doing anything with statsd in tests. Port over the
fake statsd from Zuul and use it to verify that we emit some
stats.
Fix parts of the stats emission that were broken.
Change-Id: I027e67b928bd28372bef8ab147c7ed5841009caf
As we move forward with zuulv3, we no longer need the ability to SSH
into a node from nodepool-launcher. This means we can remove SSH
private keys from production server. Now we only keyscan the node and
pass the info to zuul to do SSH operations.
We also create our own socket now for paramiko, so we can better
control the exception handling.
Change-Id: I123631aa41fd3db374ef78cf97a8b8afde93f699
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
We currently don't read anything from the secure file, so requiring
it seems pointless and confusing.
Change-Id: I1ab809d41bbfe709cd4ee34cbc9c481eed993868
There is a bug in the request handler at quota: if the request handler
runs but must pause due to quota while still needing more than one
node, and then a single node becomes available and the handler runs
again, launching a node but then having to wait for yet another node
to become available, the handler will never unpause.
This is because nodes that it launches are not added to the handler's nodeset
until after the entire request is handled (they are added by the poll method).
However, nodes that are allocated to the request from ready node stock are
added to the nodeset. The current nodeset is used to determine whether more
nodes are needed. Because the nodes from the recent launches are not part of
the nodeset, they are still counted as being "needed", and so the request
handler continues to wait for more slots to become available.
The fix is to add the newly requested node to the node set immediately
when it is requested rather than when it becomes READY in the poll()
method. This should be safe since any node failure causes the entire
request to be failed.
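The counting problem above can be sketched with a toy handler
(simplified stand-in, not nodepool's real handler class): because the
launched node joins the nodeset immediately, the "how many more do we
need" calculation sees in-flight launches.

```python
class Handler:
    """Toy request handler tracking which node types are still needed."""

    def __init__(self, requested_types):
        self.requested = list(requested_types)
        self.nodeset = []

    def needed_types(self):
        # Subtract everything already in the nodeset, launched or not.
        needed = list(self.requested)
        for node in self.nodeset:
            if node['type'] in needed:
                needed.remove(node['type'])
        return needed

    def launch(self, node_type):
        node = {'type': node_type, 'state': 'building'}
        # The fix: add at launch time. The old code only added the node
        # in poll() once READY, so a paused handler kept counting it as
        # still needed and never unpaused.
        self.nodeset.append(node)
        return node
```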
Co-Authored-By: David Shrewsbury <shrewsbury.dave@gmail.com>
Change-Id: I88c682807b395fc549f7c698d0c42c888dab2bc2
Found an issue where we were not unlocking the node request if it
disappeared on us. This caused the request lock cleanup to fail b/c
it remained locked.
Also, let's catch cleanup errors individually so that each phase has
a chance to run, independent of errors from other phases.
Also add recursive=True to the request lock delete.
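The per-phase error handling amounts to wrapping each cleanup phase in
its own try/except, so one failure cannot skip the rest. A minimal
sketch (phase names and the runner are illustrative):

```python
import logging

log = logging.getLogger("cleanup")

def run_cleanup(phases):
    """Run each (name, callable) cleanup phase independently,
    logging failures instead of aborting the remaining phases."""
    completed = []
    for name, fn in phases:
        try:
            fn()
            completed.append(name)
        except Exception:
            log.exception("Exception in cleanup phase %s", name)
    return completed
```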
Change-Id: I12c79b7725460eae5a27063523f3fa2e19e6bc59
After some discussion, it was decided to create a 2nd thread
specifically to clean up our nodes, which could be less aggressive than
our DeleteNodeWorker interval. This will reduce the pressure we place
on clouds when looking for leaked nodes.
Change-Id: I3f1a482eaa43ea7943cfa5d8b74530cd34d251b3
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
In a follow-up patch, we'll be splitting DeleteNodeWorker into 2 threads,
one more aggressive than the other. BaseCleanupWorker allows us to share
functions between them.
Change-Id: I82016e98cb6fc1a8f024dfe30938eb0097e8ce98
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
The AZ may not be available immediately after the create request,
so fill it in after the server becomes active.
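A minimal sketch of the behavior, using plain dicts as stand-ins for
the node record and the server (not nodepool's real types):

```python
def update_az(node, server):
    """Copy the AZ onto the node record once known; right after the
    create request the field may still be empty."""
    if not node.get('az') and server.get('az'):
        node['az'] = server['az']
    return node
```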
Change-Id: Id88c23b73ef6e28872c9083e57e70f9b23064422
We were only unpausing the paused handler if we created a new
node. We should also unpause when we grab an existing ready node.
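A toy sketch of the fix, assuming a simple paused flag (the class is
illustrative, not nodepool's actual handler):

```python
class PausedHandler:
    """Toy model of a paused request handler."""

    def __init__(self):
        self.paused = True

    def assign_node(self, node):
        # Clear the paused flag on *any* node assignment, whether the
        # node was freshly launched or grabbed from the ready stock.
        # Previously only the launch path cleared it.
        self.paused = False
        return node
```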
Change-Id: Ida416a0cf50572b3f9510d74e52efef958c3af5b
It's entirely possible we could end up in a situation where a node has
been allocated to a request (its allocated_to attribute is set), but the
request has gone missing. This would leave the node as unavailable for
other requests. Add a cleanup phase that resets the allocation.
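A sketch of that cleanup phase, with plain dicts standing in for the
ZooKeeper node records (names are illustrative):

```python
def deallocate_orphans(nodes, live_request_ids):
    """Reset allocated_to on nodes whose request no longer exists,
    returning the nodes that were freed."""
    freed = []
    for node in nodes:
        req = node.get('allocated_to')
        if req and req not in live_request_ids:
            node['allocated_to'] = None
            freed.append(node)
    return freed
```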
Change-Id: Ie0e1799c97f0d0e1b69d8d5d8551a831f1ca1bbc
Terminating nodepool-launcher could leave requests in the PENDING
state. We were never attempting to rehandle these, so they were
effectively lost. This adds code to reset them to REQUESTED and
allows them to be processed as new requests. Any nodes allocated
to them from the previous handling will be deallocated and will
effectively become available for any requests.
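The recovery pass can be sketched as follows, with plain dicts
modeling the request and node records (a simplified illustration, not
nodepool's actual code):

```python
def recover_requests(requests, nodes):
    """Reset PENDING requests left by a terminated launcher back to
    REQUESTED, and deallocate any nodes tied to them."""
    recovered = []
    for req in requests:
        if req['state'] == 'PENDING':
            req['state'] = 'REQUESTED'
            for node in nodes:
                if node.get('allocated_to') == req['id']:
                    node['allocated_to'] = None
            recovered.append(req['id'])
    return recovered
```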
Change-Id: I977e3a695130e7d229fbd49292852ab7e2d75018
This logging line is very noisy when instances without this attribute
exist in the provider. Since we don't really care about those instances,
don't bother logging this.
Change-Id: I6c4811b574e32356c755db5ecdda9e18113d6786
We've been seeing some random test failures where paused handlers
never unpause. I believe this may be the cause. When looping through
the request's node types, we never took into consideration nodes that
we've already put into our node set (if it had paused to wait for
nodes). This would cause the handler code to try to grab more nodes
than were required to satisfy the request. Since some of the tests
limit max-servers to a very low number, this could cause the test to
hang.
Change-Id: Ifb87563061de152ee2407b02845044ab06648a7c
Like the previous commit, we can move this process into zuulv3 and use
ansible.
Change-Id: I49f84c3e633a601f05977cc9dca5a5b37769ed2f
Signed-off-by: Paul Belanger <pabelanger@redhat.com>