Commit Graph

462 Commits

Author SHA1 Message Date
Paul Belanger 841f120ff6 Rename nodepool.py to launcher.py
Since we are working towards python3 support, let's rename nodepool.py
to launcher.py to make relative imports nicer, otherwise we'd have to
use:

  from . import foo

Change-Id: Ic38b6a8c2bf25d53625e159cb135b71d383b700c
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
2017-05-17 15:19:47 -04:00
Jenkins dabb6d0a96 Merge "Fix CleanupWorker exception messages" into feature/zuulv3 2017-05-12 14:33:16 +00:00
David Shrewsbury 566a690e9e Fix CleanupWorker exception messages
Change-Id: Ia996adf642e39aa9a0ec7ee54e3a35ac8875d85b
2017-04-28 08:44:37 -04:00
Monty Taylor 8037855400 Add support for specifying key-name per label
In order to support putting fewer things into images via puppet in Infra,
we'd like to be able to pre-populate our clouds with keypairs for the
infra-root accounts and have nova add those at boot time.

Change-Id: I9e2c990040342de722f68de09f273005f57a699f
2017-04-27 13:49:37 -07:00
Monty Taylor 642f14c076 Add ability to select flavor by name or id
It's possible that it's easier for a nodepool user to just specify a
name or id of a flavor in their config instead of the combo of min-ram
and name-filter.

In order to not have two name related items, and also to not have the
pure flavor-name case use a term called "name-filter" - change
name-filter to flavor-name, and introduce the semantics that if
flavor-name is given by itself, it will look for an exact match on
flavor name or id, and if it's given with min-ram it will behave as
name-filter did already.

Change-Id: I8b98314958d03818ceca5abf4e3b537c8998f248
2017-04-27 13:44:25 -07:00
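
The flavor-name semantics described in 642f14c076 could be sketched roughly as follows. This is a minimal illustration, not nodepool's actual code: the `Flavor` record and `find_flavor` helper are hypothetical, and the substring match used when min-ram is present is an assumption about how name-filter behaved.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Flavor:
    # Hypothetical stand-in for a cloud flavor record.
    id: str
    name: str
    ram: int  # MB


def find_flavor(flavors: List[Flavor],
                flavor_name: Optional[str] = None,
                min_ram: Optional[int] = None) -> Flavor:
    """Pick a flavor using the rules from the commit message."""
    if flavor_name is None and min_ram is None:
        raise ValueError("need flavor_name, min_ram, or both")

    if min_ram is None:
        # flavor-name by itself: exact match on flavor name or id.
        for flavor in flavors:
            if flavor_name in (flavor.name, flavor.id):
                return flavor
        raise LookupError("no flavor matches %r" % flavor_name)

    # min-ram given: smallest flavor with enough RAM. If flavor-name is
    # also given it narrows the candidates (assumed: substring match,
    # mirroring the old name-filter option).
    for flavor in sorted(flavors, key=lambda f: f.ram):
        if flavor.ram >= min_ram and (
                flavor_name is None or flavor_name in flavor.name):
            return flavor
    raise LookupError("no flavor with at least %d MB of RAM" % min_ram)
```

With flavors named "small" (2048 MB) and "big-performance" (8192 MB), `find_flavor(flavors, flavor_name="small")` returns the first by exact name, while `find_flavor(flavors, flavor_name="performance", min_ram=4096)` returns the second through the filter path.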
Jenkins ced07f8d43 Merge "Remove unused timing constants" into feature/zuulv3 2017-04-03 14:07:14 +00:00
David Shrewsbury 92f375c70b Remove support for nodepool_id
This was a temporary measure to keep production nodepool from
deleting nodes created by v3 nodepool. We don't need to carry
it over.

This is an alternative to: https://review.openstack.org/449375

Change-Id: Ib24395e30a118c0ea57f8958a8dca4407fe1b55b
2017-03-30 12:08:04 -04:00
Jenkins 73f3b56376 Merge "Merge branch 'master' into feature/zuulv3" into feature/zuulv3 2017-03-30 16:03:36 +00:00
Jenkins 0e3eeeb1a5 Merge "Fetch list of AZs from nova if it's not configured" into feature/zuulv3 2017-03-30 15:31:39 +00:00
Joshua Hesketh 94f33cb666 Merge branch 'master' into feature/zuulv3
The nodepool_id feature may need to be removed. I've kept it to simplify
merging both now and if we do it again later.

A couple of the tests are disabled and need reworking in a subsequent
commit.

Change-Id: I948f9f69ad911778fabb1c498aebd23acce8c89c
2017-03-30 21:46:15 +11:00
Monty Taylor 19e8f2788c Fetch list of AZs from nova if it's not configured
Nova has an API call that can fetch the list of available AZs. Use it to
provide a default list so that we can provide sane choices to the
scheduler related to multi-node requests rather than just letting nova
pick on a per-request basis.

Change-Id: I1418ab8a513280318bc1fe6e59301fda5cf7b890
2017-03-29 13:09:50 -05:00
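
The idea in 19e8f2788c, falling back to the cloud's own AZ list so that one zone can be chosen for a whole multi-node request, might look something like the sketch below. The `pick_request_az` helper and its `fetch_azs` callable are hypothetical; the callable stands in for whatever call retrieves the zone names from nova.

```python
import random
from typing import Callable, List, Optional


def pick_request_az(configured_azs: Optional[List[str]],
                    fetch_azs: Callable[[], List[str]],
                    rng=random) -> Optional[str]:
    """Choose one availability zone for every node in a request.

    configured_azs: zones listed in the config, if any.
    fetch_azs: hypothetical callable returning the zones the cloud reports.
    """
    # Prefer the configured list; otherwise ask the cloud for a default.
    azs = configured_azs or fetch_azs()
    if not azs:
        # Nothing known: leave the choice to the cloud.
        return None
    return rng.choice(azs)
```

Choosing the zone once per request, rather than once per node, is what keeps the nodes of a multi-node request together.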
Tobias Henkel 8d572b28bd Remove unused timing constants
Change-Id: I83b846a15e1f680409af1966a86a75b6cde4e0db
2017-03-28 14:59:27 +02:00
Jenkins bd10cff49c Merge "Remove ipv6-preferred and rely on interface_ip" into feature/zuulv3 2017-03-27 20:45:24 +00:00
Jenkins 440147835d Merge "Exercise statsd in tests and fix" into feature/zuulv3 2017-03-27 20:45:17 +00:00
Monty Taylor 34cabe207a Remove ipv6-preferred and rely on interface_ip
shade/occ have a force-ipv4 setting which can be used to change
autodetected behavior, but also have detection for ipv6 viability.
This makes us aggressively use IPv6 and only use v4 if v6 is not
available or has been explicitly disabled. Yay us.

Incidentally, this should also help people use zuul in places that are
completely non-public - as a zuul running in a cloud with a private
network on it and spinning up nodes that only have private networks
means public_v4 won't really have anything in it - but clouds.yaml
supports a private=True setting which will cause the private ip to be
listed as the ip that is desired.

Change-Id: I2b4d992e3b21c00cefe98023267347c02dd961dc
2017-03-27 11:35:25 -07:00
James E. Blair 1a1521b489 Exercise statsd in tests and fix
We weren't doing anything with statsd in tests.  Port over the
fake statsd from Zuul and use it to verify that we emit some
stats.

Fix parts of the stats emission that were broken.

Change-Id: I027e67b928bd28372bef8ab147c7ed5841009caf
2017-03-27 11:35:24 -07:00
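
A fake statsd of the kind 1a1521b489 mentions can be as small as a UDP socket that records what it receives. The sketch below is not the fixture ported from Zuul; `FakeStatsd`, `emit`, and the stat name in the usage note are all made up for illustration. It relies only on statsd's plain-text counter format, `<name>:<value>|c`.

```python
import socket


class FakeStatsd:
    """Minimal UDP listener that records statsd packets (test helper sketch)."""

    def __init__(self):
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        # Port 0 lets the OS pick a free port for the test run.
        self.sock.bind(("127.0.0.1", 0))
        self.sock.settimeout(2.0)
        self.port = self.sock.getsockname()[1]
        self.stats = []

    def receive(self):
        """Block until one packet arrives and record it."""
        data, _addr = self.sock.recvfrom(1024)
        self.stats.append(data.decode("utf-8"))

    def close(self):
        self.sock.close()


def emit(port, name, value=1):
    """Send a counter to 127.0.0.1:<port> in statsd wire format."""
    packet = ("%s:%d|c" % (name, value)).encode("utf-8")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(packet, ("127.0.0.1", port))
```

A test would point the code under test at `fake.port`, call `fake.receive()`, and then assert on `fake.stats`, for example that a hypothetical `nodepool.launch.ready` counter arrived.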
Jenkins ec44794836 Merge "Update nodepool config syntax" into feature/zuulv3 2017-03-27 18:33:26 +00:00
James E. Blair dcc3b5e071 Update nodepool config syntax
This implements the changes described in:

http://lists.openstack.org/pipermail/openstack-infra/2017-January/005018.html

It also removes some, but not all, extraneous keys from test config files.

Change-Id: Iebc941b4505d6ad46c882799b6230eb23545e5c0
2017-03-27 09:34:02 -07:00
Jenkins bf57e0f69a Merge "Do not require secure file for nodepoold" into feature/zuulv3 2017-03-27 16:23:01 +00:00
Jenkins a7887180dc Merge "Add check for valid zk attribute before disconnect" into feature/zuulv3 2017-03-27 16:22:59 +00:00
Jenkins a953afd751 Merge "Remove SSH support from nodepool" into feature/zuulv3 2017-03-27 15:22:34 +00:00
Paul Belanger d0c25fc333 Remove SSH support from nodepool
As we move forward with zuulv3, we no longer need the ability to SSH
into a node from nodepool-launcher. This means we can remove SSH
private keys from the production server. Now we only keyscan the node and
pass the info to zuul to do SSH operations.

We also create our own socket now for paramiko, so we can better
control the exception handling.

Change-Id: I123631aa41fd3db374ef78cf97a8b8afde93f699
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
2017-03-24 11:44:58 -04:00
David Shrewsbury 7c1c7ed0c6 Do not require secure file for nodepoold
We currently don't read anything from the secure file, so requiring
it seems pointless and confusing.

Change-Id: I1ab809d41bbfe709cd4ee34cbc9c481eed993868
2017-03-23 08:37:06 -04:00
David Shrewsbury e9272a8b98 Add check for valid zk attribute before disconnect
Change-Id: Iaecdd25c23d6524986adeb9a4edfffbbf2c014e5
2017-03-23 08:25:09 -04:00
James E. Blair b48e8ad4ec Fix test_node_assignment_at_quota
There is a bug in the request handler at quota where if: the request handler
runs but must pause due to quota while still needing more than one node, and
then a single node becomes available and the handler runs again and causes
a node to be launched but then must wait for another node to become available,
the handler will never unpause.

This is because nodes that it launches are not added to the handler's nodeset
until after the entire request is handled (they are added by the poll method).
However, nodes that are allocated to the request from ready node stock are
added to the nodeset.  The current nodeset is used to determine whether more
nodes are needed.  Because the nodes from the recent launches are not part of
the nodeset, they are still counted as being "needed", and so the request
handler continues to wait for more slots to become available.

The fix is to add the newly requested node to the node set immediately
when it is requested rather than when it becomes READY in the poll()
method. This should be safe since any node failure causes the entire
request to be failed.

Co-Authored-By: David Shrewsbury <shrewsbury.dave@gmail.com>
Change-Id: I88c682807b395fc549f7c698d0c42c888dab2bc2
2017-03-21 12:55:47 +00:00
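
The quota bug fixed in b48e8ad4ec can be reproduced with a toy model. The `HandlerSketch` class below is purely illustrative and is not nodepool's request handler; the `add_on_launch` flag switches between the old behaviour (launched nodes are not counted until later) and the fix (launched nodes join the nodeset immediately).

```python
class HandlerSketch:
    """Toy model of a request handler that pauses at quota."""

    def __init__(self, wanted, add_on_launch):
        self.wanted = wanted                # nodes the request asks for
        self.add_on_launch = add_on_launch  # True models the fix
        self.nodeset = []                   # nodes counted toward the request
        self.launched = []                  # every node we have launched
        self.paused = False

    def run(self, free_slots):
        """Launch as many still-needed nodes as the quota allows."""
        needed = self.wanted - len(self.nodeset)
        for _ in range(min(needed, free_slots)):
            node = "node-%d" % (len(self.launched) + 1)
            self.launched.append(node)
            if self.add_on_launch:
                # The fix: count the node as soon as it is requested.
                self.nodeset.append(node)
        # Stay paused while the nodeset looks incomplete.
        self.paused = len(self.nodeset) < self.wanted
```

For a two-node request served one quota slot at a time, two calls to `run(1)` leave the old behaviour paused with two nodes already launched, whereas the fixed behaviour unpauses after the second call.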
David Shrewsbury 0e9188d1b5 Unlock request if it disappears
Found an issue where we were not unlocking the node request if it
disappeared on us. This caused the request lock cleanup to fail because
it remained locked.

Also, let's catch cleanup errors individually so that each phase has
a chance to run, independent of errors from other phases.

Also add recursive=True to the request lock delete.

Change-Id: I12c79b7725460eae5a27063523f3fa2e19e6bc59
2017-03-20 09:35:16 -04:00
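
Catching cleanup errors per phase, as 0e9188d1b5 describes, is a small pattern worth spelling out. The sketch below assumes each phase is a plain callable; `run_cleanup` and the logger name are hypothetical, not names from nodepool.

```python
import logging

log = logging.getLogger("cleanup-sketch")


def run_cleanup(phases):
    """Run every cleanup phase, even if an earlier one raises.

    Returns the names of the phases that failed.
    """
    failed = []
    for phase in phases:
        try:
            phase()
        except Exception:
            # Log and move on so later phases still get their turn.
            log.exception("Exception in cleanup phase %s", phase.__name__)
            failed.append(phase.__name__)
    return failed
```

Wrapping the whole sequence in a single try block would skip every phase after the first failure, which is the behaviour the commit moves away from.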
Jenkins 44acc6e5f6 Merge "Split DeleteNodeWorker into two threads" into feature/zuulv3 2017-03-17 21:30:04 +00:00
Jenkins 61aa026981 Merge "Create BaseCleanupWorker class" into feature/zuulv3 2017-03-17 21:28:40 +00:00
Jenkins 1b0b0e6ad4 Merge "Rename NodeCleanupWorker to DeletedNodeWorker" into feature/zuulv3 2017-03-17 21:26:03 +00:00
Paul Belanger 7d2c51f164 Split DeleteNodeWorker into two threads
After some discussion, it was decided to create a 2nd thread
specifically to clean up our nodes, which could be less aggressive than
our DeleteNodeWorker interval.  This will reduce the pressure we place
on clouds looking for leaked nodes.

Change-Id: I3f1a482eaa43ea7943cfa5d8b74530cd34d251b3
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
2017-03-17 10:04:22 -04:00
Paul Belanger 3f8c35397f Create BaseCleanupWorker class
In a follow-up patch, we'll be splitting DeleteNodeWorker into 2 threads,
one more aggressive than the other. BaseCleanupWorker allows us to share
functions between them.

Change-Id: I82016e98cb6fc1a8f024dfe30938eb0097e8ce98
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
2017-03-17 09:57:58 -04:00
Paul Belanger f244f23f96 Rename NodeCleanupWorker to DeletedNodeWorker
Change-Id: I9916ffac393571da164161db6fd377b15fbc76c6
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
2017-03-17 09:57:32 -04:00
Jenkins 02d81cb6b8 Merge "Don't try and delete nodes with no external_id" into feature/zuulv3 2017-03-16 23:59:47 +00:00
David Shrewsbury 774f38ef35 Set node AZ after we're done waiting for it.
The AZ may not be available immediately after the create request, so
fill it in after it becomes active.

Change-Id: Id88c23b73ef6e28872c9083e57e70f9b23064422
2017-03-16 17:46:23 -04:00
David Shrewsbury 61e01cd291 Unpause when we grab a pre-ready node
We were only unpausing the paused handler if we created a new
node. We should also unpause when we grab an existing ready node.

Change-Id: Ida416a0cf50572b3f9510d74e52efef958c3af5b
2017-03-16 13:19:27 -04:00
Jenkins e1c26d56b5 Merge "Deallocate ready nodes with no requests" into feature/zuulv3 2017-03-16 14:42:07 +00:00
Jenkins 08e25490bf Merge "Reset lost requests" into feature/zuulv3 2017-03-16 14:40:54 +00:00
Jenkins 7be911c167 Merge "Populate requestor for min-ready requests" into feature/zuulv3 2017-03-15 19:44:53 +00:00
David Shrewsbury 4de321c0a5 Deallocate ready nodes with no requests
Entirely possible we could end up in a situation where a node has been
allocated to a request (its allocated_to attribute is set), but the
request has gone missing. This would leave the node as unavailable for
other requests. Add a cleanup phase that resets the allocation.

Change-Id: Ie0e1799c97f0d0e1b69d8d5d8551a831f1ca1bbc
2017-03-15 14:31:01 -04:00
David Shrewsbury 6fec0c71a0 Reset lost requests
Terminating nodepool-launcher could leave requests in the PENDING
state. We were never attempting to rehandle these, so they were
effectively lost. This adds code to reset them to REQUESTED and
allows them to be processed as new requests. Any nodes allocated
to them from the previous handling will be deallocated and will
effectively become available for any requests.

Change-Id: I977e3a695130e7d229fbd49292852ab7e2d75018
2017-03-15 14:04:22 -04:00
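
The reset described in 6fec0c71a0 amounts to a state transition plus a deallocation pass. The sketch below uses hypothetical `Request` and `Node` records, and it assumes that a request counts as lost when it is PENDING and no launcher holds its lock; the commit message does not say how lost requests are detected.

```python
from dataclasses import dataclass
from typing import List, Optional

REQUESTED = "requested"
PENDING = "pending"


@dataclass
class Request:
    id: str
    state: str
    locked: bool = False  # assumption: a live handler holds a lock


@dataclass
class Node:
    id: str
    allocated_to: Optional[str] = None


def reset_lost_requests(requests: List[Request],
                        nodes: List[Node]) -> List[str]:
    """Return PENDING requests nobody is handling to REQUESTED."""
    reset = []
    for req in requests:
        if req.state != PENDING or req.locked:
            continue
        # Free any nodes the previous handling had set aside.
        for node in nodes:
            if node.allocated_to == req.id:
                node.allocated_to = None
        req.state = REQUESTED
        reset.append(req.id)
    return reset
```

After the reset, the request is indistinguishable from a new one, and the freed nodes can satisfy it or any other request.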
Jenkins f03920ea44 Merge "Stop writing nodepool bash variable on nodes" into feature/zuulv3 2017-03-15 14:52:20 +00:00
Jenkins c1b6c49aa1 Merge "Remove ready-script support" into feature/zuulv3 2017-03-15 14:52:14 +00:00
David Shrewsbury 6e0a65ac4a Populate requestor for min-ready requests
It sometimes helps to be able to easily identify these requests.

Change-Id: I3c33c5bf7a984c95c954724443472bf9f354b474
2017-03-15 09:55:39 -04:00
David Shrewsbury bb8ac0bd89 Remove noisy log line
This logging line is very noisy when instances without this attribute
exist in the provider. Since we don't really care about those instances,
don't bother logging this.

Change-Id: I6c4811b574e32356c755db5ecdda9e18113d6786
2017-03-15 09:48:42 -04:00
Jamie Lennox f63cac138e Remove remaining apscheduler variables
Remove the last apsched variable, as it is no longer used.

Change-Id: I3155197de1b3e5763e05893de46eee7ed3043f93
2017-03-15 13:12:20 +11:00
Jenkins 99e8b872a4 Merge "Fix for unpaused request handlers" into feature/zuulv3 2017-03-14 20:58:55 +00:00
Jenkins 98ab8df294 Merge "Fix race on node state check in node cleanup" into feature/zuulv3 2017-03-14 20:58:18 +00:00
Jenkins 232c8b9c9a Merge "Remove the --no-delete option from nodepool" into feature/zuulv3 2017-03-14 20:24:19 +00:00
David Shrewsbury c4bcfd8538 Fix for unpaused request handlers
We've been seeing some random test failures where paused handlers
never unpause. I believe this may be the cause. When looping through
the request's node types, we never took into consideration nodes that
we've already put into our node set (if it had paused to wait for
nodes). This would cause the handler code to try to grab more nodes
than were required to satisfy the request. Since some of the tests
limit max-servers to a very low number, this could cause the test to
hang.

Change-Id: Ifb87563061de152ee2407b02845044ab06648a7c
2017-03-14 15:56:03 -04:00
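
The accounting that c4bcfd8538 adds, subtracting nodes already in the nodeset before grabbing more, can be expressed with a multiset difference. `types_still_needed` is a hypothetical helper written for this illustration, with node types represented as plain strings.

```python
from collections import Counter
from typing import List


def types_still_needed(requested_types: List[str],
                       nodeset_types: List[str]) -> List[str]:
    """Node types the request still lacks, given what is in the nodeset."""
    # Counter subtraction drops types whose count reaches zero, so a
    # type that is already satisfied is never requested again.
    remaining = Counter(requested_types) - Counter(nodeset_types)
    return sorted(remaining.elements())
```

For a request of two "trusty" nodes and one "xenial" node with one "trusty" node already in the nodeset (label names chosen arbitrarily), the helper reports one "trusty" and one "xenial" outstanding rather than all three.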
Paul Belanger ad68eb827d Stop writing nodepool bash variable on nodes
Like the previous commit, we can move this process into zuulv3 and use
ansible.

Change-Id: I49f84c3e633a601f05977cc9dca5a5b37769ed2f
Signed-off-by: Paul Belanger <pabelanger@redhat.com>
2017-03-14 12:40:44 -04:00