Commit Graph

93 Commits

Author SHA1 Message Date
Brad Davidson
0c302f4341 Fix etcd member deletion
Turns out etcd-only nodes were never running **any** of the controllers,
so allowing multiple controllers didn't really fix things.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2023-02-14 09:39:41 -08:00
Brad Davidson
3d146d2f1b Allow for multiple sets of leader-elected controllers
Addresses an issue where etcd controllers did not run on etcd-only nodes

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2023-02-10 10:46:48 -08:00
Brad Davidson
a298bfdb18 Add jitter to scheduled snapshots and retry harder on conflicts
Also ensure that the snapshot job does not attempt to trigger multiple concurrent runs, as this is not supported.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2023-01-11 14:32:03 -08:00
Derek Nola
06d81cb936
Replace deprecated ioutil package (#6230)
* Replace ioutil package
* check integration test null pointer
* Remove rotate retries

Signed-off-by: Derek Nola <derek.nola@suse.com>
2022-10-07 17:36:57 -07:00
Derek Nola
4c0bc8c046
Update etcd error to match correct url (#5909)
Signed-off-by: Derek Nola <derek.nola@suse.com>
2022-07-29 09:40:53 -07:00
Brad Davidson
5eaa0a9422 Replace getLocalhostIP with Loopback helper method
Requires tweaking existing method signature to allow specifying whether or not IPv6 addresses should be return URL-safe.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-07-21 16:51:57 -07:00
Brad Davidson
1674b9d640 Raise etcd connection test timeout to 30 seconds
Addressess issue where the compact may take more than 10 seconds on slower disks. These disks probably aren't really suitable for etcd, but apparently run fine otherwise.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-07-21 13:23:19 -07:00
Brad Davidson
ffe72eecc4 Address issues with etcd snapshots
* Increase the default snapshot timeout. The timeout is not currently
  configurable from Rancher, and larger clusters are frequently seeing
  uploads fail at 30 seconds.
* Enable compression for scheduled snapshots if enabled on the
  command-line. The CLI flag was not being passed into the etcd config.
* Only set the S3 content-type to application/zip if the file is zipped.
* Don't run more than one snapshot at once, to prevent misconfigured
  etcd snapshot cron schedules from stacking up.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-07-12 14:41:38 -07:00
Brad Davidson
6fad63583b Only listen on loopback when resetting
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-06-15 11:25:54 -07:00
Brad Davidson
fb0a342a20 Sanitize filenames for use in configmap keys
If the user points S3 backups at a bucket containing other files, those
file names may not be valid configmap keys.

For example, RKE1 generates backup files with names like
`s3-c-zrjnb-rs-6hxpk_2022-05-05T12:05:15Z.zip`; the semicolons in the
timestamp portion of the name are not allowed for use in configmap keys.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-06-15 10:54:26 -07:00
Brad Davidson
ce5b9347c9 Replace DefaultProxyDialerFn dialer injection with EgressSelector support
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-04-29 17:54:36 -07:00
Brad Davidson
418c3fa858
Fix issue with datastore corruption on cluster-reset (#5515)
* Bump etcd to v3.5.4-k3s1
* Fix issue with datastore corruption on cluster-reset
* Disable unnecessary components during cluster reset

Disable control-plane components and the tunnel setup during
cluster-reset, even when not doing a restore. This reduces the amount of
log clutter during cluster reset/restore, making any errors encountered
more obvious.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-04-27 13:44:15 -07:00
Brad Davidson
7760e2177a Bump etcd to 3.5.3-k3s1
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-04-15 01:53:18 -07:00
Brad Davidson
b12cd62935 Move IPv4/v6 selection into helpers
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-04-15 01:02:42 -07:00
Roberto Bonafiglia
9c9adda61b Added default endpoint for IPv6
Signed-off-by: Roberto Bonafiglia <roberto.bonafiglia@suse.com>
2022-04-14 09:58:40 +02:00
Brad Davidson
f37e7565b8 Move the apiserver addresses controller into the etcd package
This controller only needs to run when using managed etcd, so move it in
with the rest of the etcd stuff. This change also modifies the
controller to only watch the Kubernetes service endpoint, instead of
watching all endpoints in the entire cluster.

Fixes an error message revealed by use of a newer grpc client in
Kubernetes 1.24, which logs an error when the Put to etcd failed because
kine doesn't support the etcd Put operation. The controller shouldn't
have been running without etcd in the first place.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-04-07 11:28:15 -07:00
Brad Davidson
2a429aac65 Fix crash on early snapshot
Don't attempt to retrieve snapshot metadata configmap if the apiserver
isn't available. This could be triggered if the cron expression caused a
snapshot to be triggered before the apiserver is up.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-04-07 09:23:34 -07:00
Roberto Bonafiglia
4afeb9c5c7
Merge pull request #5325 from rbrtbnfgl/fix-etcd-ipv6-url
Fixed etcd URL in case of IPv6 address
2022-04-05 09:55:42 +02:00
Roberto Bonafiglia
0746dde758 Fixed http URL on etcd
Signed-off-by: Roberto Bonafiglia <roberto.bonafiglia@suse.com>
2022-03-31 14:24:59 +02:00
Brad Davidson
62cc1ed24f Skip setting up client tls when etcd server does not have tls enabled
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-03-30 01:03:41 -07:00
Roberto Bonafiglia
dda409b041 Updated localhost address on IPv6 only setup
Signed-off-by: Roberto Bonafiglia <roberto.bonafiglia@suse.com>
2022-03-29 09:35:54 +02:00
Brad Davidson
1339626a5b Defragment etcd datastore before clearing alarms
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-03-28 09:27:59 -07:00
Roberto Bonafiglia
2285aa699b Fixed etcd URL in case of IPv6 address
Signed-off-by: Roberto Bonafiglia <roberto.bonafiglia@suse.com>
2022-03-23 15:35:51 +01:00
Brad Davidson
078da46532 Close additional leaked GPRC clients
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-03-15 18:07:55 -07:00
Luther Monson
9a849b1bb7
[master] changing package to k3s-io (#4846)
* changing package to k3s-io

Signed-off-by: Luther Monson <luther.monson@gmail.com>

Co-authored-by: Derek Nola <derek.nola@suse.com>
2022-03-02 15:47:27 -08:00
Brad Davidson
9a48086524 Ignore cluster membership errors when reconciling from temp etcd
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-03-01 20:25:20 -08:00
Brad Davidson
e4846c92b4 Move temporary etcd startup into etcd module
Reuse the existing etcd library code to start up the temporary etcd
server for bootstrap reconcile. This allows us to do proper
health-checking of the datastore on startup, including handling of
alarms.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-03-01 20:25:20 -08:00
Brad Davidson
555087b9b8 Add function to clear local alarms on etcd startup
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-03-01 11:56:52 -08:00
Brad Davidson
5014c9e0e8 Fix adding etcd-only node to existing cluster
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-02-28 19:56:08 -08:00
Brad Davidson
2989b8b2c5 Remove unnecessary copies of runtime struct
Several types contained redundant references to ControlRuntime data. Switch to consistently accessing this via config.Runtime instead.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-02-28 12:05:16 -08:00
Brian Downs
effcb15adb
Adds the ability to compress etcd snapshots (#4866) 2022-01-14 10:31:22 -07:00
Brad Davidson
a5c6e6a68a Fix panic checking name of uninitialized etcd member
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2021-12-21 23:38:20 -08:00
Hussein Galal
d71b335871
Fix snapshot restoration on fresh nodes (#4737)
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2021-12-14 02:04:39 +02:00
Brian Downs
a6fe2c0bc5
Resolve restore bootstrap (#4704) 2021-12-09 14:54:27 -07:00
Chris Kim
ae4a1a144a
etcd snapshot functionality enhancements (#4453)
Signed-off-by: Chris Kim <oats87g@gmail.com>
2021-11-29 10:30:04 -08:00
Chris Kim
f18b3252c0
[master] Add etcd extra args support for K3s (#4463)
* Add etcd extra args support for K3s

Signed-off-by: Chris Kim <oats87g@gmail.com>

* Add etcd custom argument integration test

Signed-off-by: Chris Kim <oats87g@gmail.com>

* go generate

Signed-off-by: Chris Kim <oats87g@gmail.com>
2021-11-11 21:03:15 -08:00
Brian Downs
adaeae351c
update bootstrap logic (#4438)
* update bootstrap logic resolving a startup bug and account for etcd
2021-11-10 05:33:42 -07:00
galal-hussein
ab3d25a2c5 Update peer address when running cluster-reset
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2021-10-25 15:43:27 -07:00
Brian Downs
0452f017c1
Add etcd s3 timeout (#4207) 2021-10-15 10:24:14 -07:00
Brad Davidson
5a923ab8dc Add containerd ready channel to delay etcd node join
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2021-10-14 14:03:52 -07:00
Brian Downs
ac7a8d89c6
Add ability to reconcile bootstrap data between datastore and disk (#3398) 2021-10-07 12:47:00 -07:00
Hussein Galal
7826407a2e
Make sure there are no duplicates in etcd member list (#4025)
* Make sure there are no duplicates in etcd member list

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* fix node names with hyphens

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* use full server name for etcd node name

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2021-09-18 00:51:18 +02:00
Brad Davidson
086ca8ba6a Fix premature etcd shutdown when joining an existing cluster
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2021-09-15 10:35:07 -07:00
Chris Kim
928b8531c3
[master] Add etcd-member-management controller to K3s (#4001)
* Initial leader elected etcd member management controller
* Bump etcd to v3.5.0-k3s2

Signed-off-by: Chris Kim <oats87g@gmail.com>
2021-09-14 08:20:38 -07:00
Brad Davidson
b4d8c641c6 Add exposed metrics listener instead of replacing loopback listener
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2021-09-10 09:39:39 -07:00
Brad Davidson
29c8b238e5 Replace klog with non-exiting fork
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2021-09-10 09:36:16 -07:00
Darren Shepherd
741ba95b04 Migrate sqlite data to etcd when initializing the cluster
Signed-off-by: Darren Shepherd <darren@rancher.com>
2021-09-09 10:24:02 -07:00
Devin Buhl
a1ec43e0b7
feat: add option to disable s3 over https
Signed-off-by: Devin Buhl <devin.kray@gmail.com>
2021-09-05 12:03:49 -04:00
Brad Davidson
b8add39b07 Bump kine for metrics/tls changes
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2021-09-01 01:51:30 -07:00
Brad Davidson
e95b75409a Fix lint failures
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2021-08-20 18:47:16 -07:00