Commit Graph

119 Commits

Author SHA1 Message Date
Brad Davidson
e61fde93c1 Fix MemberList error handling and incorrect etcd-arg passthrough
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2023-04-28 22:04:30 -07:00
Brad Davidson
91afb38799 Retry cluster join on "too many learners" error
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2023-04-28 11:28:33 -07:00
Brad Davidson
d95980bba3 Lock bootstrap data with empty key to prevent conflicts
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2023-04-05 10:56:57 -07:00
Brad Davidson
b010db0cff Ensure that loopback is used for the advertised address when resetting
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2023-04-03 17:01:43 -07:00
Brad Davidson
0c302f4341 Fix etcd member deletion
Turns out etcd-only nodes were never running **any** of the controllers,
so allowing multiple controllers didn't really fix things.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2023-02-14 09:39:41 -08:00
Brad Davidson
3d146d2f1b Allow for multiple sets of leader-elected controllers
Addresses an issue where etcd controllers did not run on etcd-only nodes

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2023-02-10 10:46:48 -08:00
Brad Davidson
a298bfdb18 Add jitter to scheduled snapshots and retry harder on conflicts
Also ensure that the snapshot job does not attempt to trigger multiple concurrent runs, as this is not supported.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2023-01-11 14:32:03 -08:00
Derek Nola
06d81cb936
Replace deprecated ioutil package (#6230)
* Replace ioutil package
* check integration test null pointer
* Remove rotate retries

Signed-off-by: Derek Nola <derek.nola@suse.com>
2022-10-07 17:36:57 -07:00
Derek Nola
4c0bc8c046
Update etcd error to match correct url (#5909)
Signed-off-by: Derek Nola <derek.nola@suse.com>
2022-07-29 09:40:53 -07:00
Brad Davidson
5eaa0a9422 Replace getLocalhostIP with Loopback helper method
Requires tweaking existing method signature to allow specifying whether or not IPv6 addresses should be return URL-safe.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-07-21 16:51:57 -07:00
Brad Davidson
1674b9d640 Raise etcd connection test timeout to 30 seconds
Addressess issue where the compact may take more than 10 seconds on slower disks. These disks probably aren't really suitable for etcd, but apparently run fine otherwise.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-07-21 13:23:19 -07:00
Brad Davidson
ffe72eecc4 Address issues with etcd snapshots
* Increase the default snapshot timeout. The timeout is not currently
  configurable from Rancher, and larger clusters are frequently seeing
  uploads fail at 30 seconds.
* Enable compression for scheduled snapshots if enabled on the
  command-line. The CLI flag was not being passed into the etcd config.
* Only set the S3 content-type to application/zip if the file is zipped.
* Don't run more than one snapshot at once, to prevent misconfigured
  etcd snapshot cron schedules from stacking up.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-07-12 14:41:38 -07:00
Brad Davidson
6fad63583b Only listen on loopback when resetting
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-06-15 11:25:54 -07:00
Brad Davidson
fb0a342a20 Sanitize filenames for use in configmap keys
If the user points S3 backups at a bucket containing other files, those
file names may not be valid configmap keys.

For example, RKE1 generates backup files with names like
`s3-c-zrjnb-rs-6hxpk_2022-05-05T12:05:15Z.zip`; the semicolons in the
timestamp portion of the name are not allowed for use in configmap keys.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-06-15 10:54:26 -07:00
Brad Davidson
ce5b9347c9 Replace DefaultProxyDialerFn dialer injection with EgressSelector support
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-04-29 17:54:36 -07:00
Brad Davidson
418c3fa858
Fix issue with datastore corruption on cluster-reset (#5515)
* Bump etcd to v3.5.4-k3s1
* Fix issue with datastore corruption on cluster-reset
* Disable unnecessary components during cluster reset

Disable control-plane components and the tunnel setup during
cluster-reset, even when not doing a restore. This reduces the amount of
log clutter during cluster reset/restore, making any errors encountered
more obvious.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-04-27 13:44:15 -07:00
Brad Davidson
f2ceeb01d9
Fix issue with long-running apiserver endpoints watch (#5478)
Use ListWatch helpers to retry when the watch channel is closed.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-04-21 09:24:34 -07:00
Brad Davidson
7760e2177a Bump etcd to 3.5.3-k3s1
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-04-15 01:53:18 -07:00
Brad Davidson
b12cd62935 Move IPv4/v6 selection into helpers
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-04-15 01:02:42 -07:00
Roberto Bonafiglia
9c9adda61b Added default endpoint for IPv6
Signed-off-by: Roberto Bonafiglia <roberto.bonafiglia@suse.com>
2022-04-14 09:58:40 +02:00
Brad Davidson
f37e7565b8 Move the apiserver addresses controller into the etcd package
This controller only needs to run when using managed etcd, so move it in
with the rest of the etcd stuff. This change also modifies the
controller to only watch the Kubernetes service endpoint, instead of
watching all endpoints in the entire cluster.

Fixes an error message revealed by use of a newer grpc client in
Kubernetes 1.24, which logs an error when the Put to etcd failed because
kine doesn't support the etcd Put operation. The controller shouldn't
have been running without etcd in the first place.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-04-07 11:28:15 -07:00
Brad Davidson
2a429aac65 Fix crash on early snapshot
Don't attempt to retrieve snapshot metadata configmap if the apiserver
isn't available. This could be triggered if the cron expression caused a
snapshot to be triggered before the apiserver is up.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-04-07 09:23:34 -07:00
Roberto Bonafiglia
4afeb9c5c7
Merge pull request #5325 from rbrtbnfgl/fix-etcd-ipv6-url
Fixed etcd URL in case of IPv6 address
2022-04-05 09:55:42 +02:00
Roberto Bonafiglia
0746dde758 Fixed http URL on etcd
Signed-off-by: Roberto Bonafiglia <roberto.bonafiglia@suse.com>
2022-03-31 14:24:59 +02:00
Roberto Bonafiglia
06c779c57d Fixed loadbalancer in case of IPv6 addresses
Signed-off-by: Roberto Bonafiglia <roberto.bonafiglia@suse.com>
2022-03-31 11:49:30 +02:00
Brad Davidson
62cc1ed24f Skip setting up client tls when etcd server does not have tls enabled
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-03-30 01:03:41 -07:00
Roberto Bonafiglia
dda409b041 Updated localhost address on IPv6 only setup
Signed-off-by: Roberto Bonafiglia <roberto.bonafiglia@suse.com>
2022-03-29 09:35:54 +02:00
Brad Davidson
1339626a5b Defragment etcd datastore before clearing alarms
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-03-28 09:27:59 -07:00
Roberto Bonafiglia
2285aa699b Fixed etcd URL in case of IPv6 address
Signed-off-by: Roberto Bonafiglia <roberto.bonafiglia@suse.com>
2022-03-23 15:35:51 +01:00
Brad Davidson
078da46532 Close additional leaked GPRC clients
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-03-15 18:07:55 -07:00
Derek Nola
1f7abe5dbb
Testing directory and documentation rework. (#5256)
* Removed vagrant folder
* Fix comments around E2E ENVs
* Eliminate testutil folder
* Convert flock integration test to unit test
* Point to other READMEs

Signed-off-by: Derek Nola <derek.nola@suse.com>
2022-03-15 10:29:56 -07:00
Luther Monson
9a849b1bb7
[master] changing package to k3s-io (#4846)
* changing package to k3s-io

Signed-off-by: Luther Monson <luther.monson@gmail.com>

Co-authored-by: Derek Nola <derek.nola@suse.com>
2022-03-02 15:47:27 -08:00
Brad Davidson
9a48086524 Ignore cluster membership errors when reconciling from temp etcd
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-03-01 20:25:20 -08:00
Brad Davidson
e4846c92b4 Move temporary etcd startup into etcd module
Reuse the existing etcd library code to start up the temporary etcd
server for bootstrap reconcile. This allows us to do proper
health-checking of the datastore on startup, including handling of
alarms.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-03-01 20:25:20 -08:00
Brad Davidson
555087b9b8 Add function to clear local alarms on etcd startup
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-03-01 11:56:52 -08:00
Brad Davidson
5014c9e0e8 Fix adding etcd-only node to existing cluster
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-02-28 19:56:08 -08:00
Brad Davidson
2989b8b2c5 Remove unnecessary copies of runtime struct
Several types contained redundant references to ControlRuntime data. Switch to consistently accessing this via config.Runtime instead.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-02-28 12:05:16 -08:00
Derek Nola
e28be2912c
Migrate Ginkgo testing framework to V2, consolidate integration tests (#5097)
* Upgrade and convert ginkgo from v1 to v2
* Move all integration tests into integration folder
* Update TESTING.md

Signed-off-by: Derek Nola <derek.nola@suse.com>
2022-02-09 08:22:53 -08:00
Hussein Galal
13728058a4
Add k3s etcd restoration integration test (#5014)
* Add k3s etcd restoration test

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* Fix tests and rebase

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* Reorganizing the tests

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* Fixing comments

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* Fix etcd restore

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* dont check for errors when restoring

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* use eventually to test for restoration

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* fix tests

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* fix golint

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2022-02-08 21:24:34 +02:00
Brian Downs
effcb15adb
Adds the ability to compress etcd snapshots (#4866) 2022-01-14 10:31:22 -07:00
Derek Nola
2ac8df3602
Integration tests utilities improvements (#4832)
* Remove sudo commands from integration tests

Signed-off-by: Derek Nola <derek.nola@suse.com>

* Added cleanup fucntion

Signed-off-by: Derek Nola <derek.nola@suse.com>

* Implement better int cleanup

Signed-off-by: Derek Nola <derek.nola@suse.com>

* Rename test utils

Signed-off-by: Derek Nola <derek.nola@suse.com>

* Enable K3sCmd to be a single string

Signed-off-by: Derek Nola <derek.nola@suse.com>

* Removed parsePod function

Signed-off-by: Derek Nola <derek.nola@suse.com>

* codespell

Signed-off-by: Derek Nola <derek.nola@suse.com>

* Revert startup timeout

Signed-off-by: Derek Nola <derek.nola@suse.com>

* Reorder sonobuoy tests, drop concurrent tests to 3

Signed-off-by: Derek Nola <derek.nola@suse.com>

* Disable etcd

Signed-off-by: Derek Nola <derek.nola@suse.com>

* Skip parallel testing for etcd

Signed-off-by: Derek Nola <derek.nola@suse.com>
2022-01-06 08:05:56 -08:00
Brad Davidson
a5c6e6a68a Fix panic checking name of uninitialized etcd member
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2021-12-21 23:38:20 -08:00
Hussein Galal
d71b335871
Fix snapshot restoration on fresh nodes (#4737)
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2021-12-14 02:04:39 +02:00
Brian Downs
a6fe2c0bc5
Resolve restore bootstrap (#4704) 2021-12-09 14:54:27 -07:00
Manuel Buil
1b3187ea07 Check HA network parameters
Signed-off-by: Manuel Buil <mbuil@suse.com>
2021-12-07 23:09:05 +01:00
Derek Nola
d05c334a78
Improved cleanup for etcd unit test (#4537)
* Improved cleanup for etcd unit test

Signed-off-by: Derek Nola <derek.nola@suse.com>
2021-11-29 14:46:58 -08:00
Chris Kim
ae4a1a144a
etcd snapshot functionality enhancements (#4453)
Signed-off-by: Chris Kim <oats87g@gmail.com>
2021-11-29 10:30:04 -08:00
Chris Kim
f18b3252c0
[master] Add etcd extra args support for K3s (#4463)
* Add etcd extra args support for K3s

Signed-off-by: Chris Kim <oats87g@gmail.com>

* Add etcd custom argument integration test

Signed-off-by: Chris Kim <oats87g@gmail.com>

* go generate

Signed-off-by: Chris Kim <oats87g@gmail.com>
2021-11-11 21:03:15 -08:00
Brian Downs
adaeae351c
update bootstrap logic (#4438)
* update bootstrap logic resolving a startup bug and account for etcd
2021-11-10 05:33:42 -07:00
Derek Nola
7c3f21e581
K3s Integration test fixes (#4341)
* Move tests into sub folders
* Updated documentation
* Prevent infinite loop is user has not made k3s

Signed-off-by: dereknola <derek.nola@suse.com>
2021-10-28 12:35:28 -07:00