Commit Graph

84 Commits

Author SHA1 Message Date
Brad Davidson
5ca206ad3b Fix handling of agent-token fallback to token
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-01-07 09:56:37 -08:00
Brad Davidson
e7464a17f7 Fix use of agent creds for secrets-encrypt and config validate
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-01-06 12:55:18 -08:00
Brian Downs
3ae550ae51
Update bootstrap logic to output all changed files on disk (#4800) 2021-12-21 14:28:32 -07:00
Brad Davidson
8ad7d141e8 Close etcd clients to avoid leaking GRPC connections
If you don't explicitly close the etcd client when you're done with it,
the GRPC connection hangs around in the background. Normally this is
harmelss, but in the case of the temporary etcd we start up on 2399 to
reconcile bootstrap data, the client will start logging errors
afterwards when the server goes away.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2021-12-17 23:55:17 -08:00
Derek Nola
17eebe0563
Fix cold boot and reconcilation on secondary servers (#4747)
* Enable reconcilation on secondary servers

Signed-off-by: Derek Nola <derek.nola@suse.com>

* Remove unused code

Signed-off-by: Derek Nola <derek.nola@suse.com>

* Attempt to reconcile with datastore first

Signed-off-by: Derek Nola <derek.nola@suse.com>

* Added warning on failure

Signed-off-by: Derek Nola <derek.nola@suse.com>

* Update warning

Signed-off-by: Derek Nola <derek.nola@suse.com>

* golangci-lint fix

Signed-off-by: Derek Nola <derek.nola@suse.com>
2021-12-15 15:38:50 -08:00
Hussein Galal
d71b335871
Fix snapshot restoration on fresh nodes (#4737)
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2021-12-14 02:04:39 +02:00
Brian Downs
bf4e037fcf
Resolve Bootstrap Migration Edge Case (#4730) 2021-12-13 13:02:30 -07:00
Brian Downs
a6fe2c0bc5
Resolve restore bootstrap (#4704) 2021-12-09 14:54:27 -07:00
Manuel Buil
1e0696628e
Merge pull request #4581 from manuelbuil/checking-HA-parameters
Verify new control plane nodes joining the cluster share the same config as cluster members
2021-12-08 10:49:28 +01:00
Derek Nola
bcb662926d
Secrets-encryption rotation (#4372)
* Regular CLI framework for encrypt commands
* New secrets-encryption feature
* New integration test
* fixes for flaky integration test CI
* Fix to bootstrap on restart of existing nodes
* Consolidate event recorder

Signed-off-by: Derek Nola <derek.nola@suse.com>
2021-12-07 14:31:32 -08:00
Manuel Buil
1b3187ea07 Check HA network parameters
Signed-off-by: Manuel Buil <mbuil@suse.com>
2021-12-07 23:09:05 +01:00
Hussein Galal
77fd3e99ec
Add cert rotation command (#4495)
* Add cert rotation command

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* add function to check for dynamic listener file

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* Add dynamiclistener cert rotation support

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* fixes to the cert rotation

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* fix ci tests

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* fixes to certificate rotation command

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* more fixes

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

Co-authored-by: Brian Downs <brian.downs@gmail.com>
2021-12-02 23:19:16 +02:00
Chris Kim
ae4a1a144a
etcd snapshot functionality enhancements (#4453)
Signed-off-by: Chris Kim <oats87g@gmail.com>
2021-11-29 10:30:04 -08:00
Chris Kim
f18b3252c0
[master] Add etcd extra args support for K3s (#4463)
* Add etcd extra args support for K3s

Signed-off-by: Chris Kim <oats87g@gmail.com>

* Add etcd custom argument integration test

Signed-off-by: Chris Kim <oats87g@gmail.com>

* go generate

Signed-off-by: Chris Kim <oats87g@gmail.com>
2021-11-11 21:03:15 -08:00
Brian Downs
adaeae351c
update bootstrap logic (#4438)
* update bootstrap logic resolving a startup bug and account for etcd
2021-11-10 05:33:42 -07:00
Brian Downs
0a0b915921
reset buffer after use (#4279) 2021-10-22 15:56:01 -07:00
Brian Downs
34080b23b1
Copy old bootstrap buffer data for use during migration (#4215) 2021-10-15 10:17:29 -07:00
Hussein Galal
b282528ee2
Display cluster tls error only in debug mode (#4124)
* Display cluster tls error only in debug mode

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* fix

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2021-10-13 00:00:28 +02:00
Derek Nola
feec44572d
Improve error message when using a "K10" prefixed token (#4180)
* Add new error message with a K10 prefixed secret token

Signed-off-by: dereknola <derek.nola@suse.com>
2021-10-11 10:00:22 -07:00
Brian Downs
ac7a8d89c6
Add ability to reconcile bootstrap data between datastore and disk (#3398) 2021-10-07 12:47:00 -07:00
Brad Davidson
29c8b238e5 Replace klog with non-exiting fork
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2021-09-10 09:36:16 -07:00
Brad Davidson
cf12a13175 Add missing node name entry to apiserver SAN list
Also honor node-ip when adding the node address to the SAN list, instead
of hardcoding the autodetected IP address.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2021-09-01 13:22:32 -07:00
Brad Davidson
b8add39b07 Bump kine for metrics/tls changes
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2021-09-01 01:51:30 -07:00
Brad Davidson
dc14f370c4 Update wrangler to v0.8.5
Required to support apiextensions.v1 as v1beta1 has been deleted. Also
update helm-controller and dynamiclistener to track wrangler versions.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2021-08-20 18:47:16 -07:00
galal-hussein
20a48734c2 more fixes
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2021-07-21 22:42:05 +02:00
galal-hussein
7ebcc4b134 more fixes
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2021-07-21 22:39:44 +02:00
galal-hussein
b4401296ec replace error with warn in delete
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2021-07-21 22:18:56 +02:00
galal-hussein
2f82bfcf67 fix warning msg
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2021-07-21 22:05:43 +02:00
galal-hussein
b377839148 migrate old token key format
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2021-07-21 20:59:57 +02:00
galal-hussein
997ed7b9b4 simplifying the code
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2021-07-21 19:56:19 +02:00
galal-hussein
ad17292fa8 migrate empty string key properly
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2021-07-21 19:21:38 +02:00
galal-hussein
a65e5b6466 Fix multiple bootstrap keys found
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2021-07-21 02:50:42 +02:00
Hussein Galal
a939decf01
fix a runtime core panic (#3627)
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2021-07-13 23:33:07 +02:00
Brian Downs
238dc2086e
prevent snapshot save when snapshots are disabled (#3475)
* prevent snapshot save when snapshots are disabled
2021-07-09 10:22:49 -07:00
Brad Davidson
cbfe673c43 Fix spelling to satisfy codespell check
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2021-07-01 13:29:03 -07:00
Brad Davidson
246b378a27 Bump kine to resolve race condition and unrevisioned delete
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2021-06-30 09:54:46 -07:00
Hussein Galal
136dddca11
Fix storing bootstrap data with empty token string (#3422)
* Fix storing bootstrap data with empty token string

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* delete node password secret after restoration

fixes to bootstrap key

vendor update

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* fix comment

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* fix typo

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* more fixes

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* fixes

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* fixes

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* typos

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* Removing dynamic listener file after restoration

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* go mod tidy

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2021-06-22 22:42:34 +02:00
Brad Davidson
f6cec4e75d Add kubernetes.default.svc to serving certs
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2021-06-08 12:55:20 -07:00
Brian Downs
afd506a595 fix possible race where bootstrap data might not save
Signed-off-by: Brian Downs <brian.downs@gmail.com>
2021-06-04 15:05:47 -07:00
Hussein Galal
948295e8e8
Fix cluster restoration in rke2 (#3295)
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2021-05-11 00:06:33 +02:00
Hussein Galal
f410fc7d1e
Invoke cluster reset function when only reset flag is passed (#3276)
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2021-05-05 17:40:04 +02:00
Brian Downs
c5ad71ce0b
Collect and Store etcd Snapshots and Metadata (#3239)
* Add the ability to store local etcd snapshots and etcd snapshots stored in an S3 compatible object store in a ConfigMap.
2021-04-30 18:26:39 -07:00
Brian Downs
4a49b9e40b delete nocluster file and remove build tag
Signed-off-by: Brian Downs <brian.downs@gmail.com>
2021-04-07 12:16:28 -07:00
Brian Downs
400a632666 put etcd bootstrap save call in goroutine and update comment
Signed-off-by: Brian Downs <brian.downs@gmail.com>
2021-03-17 14:33:00 -07:00
Hussein Galal
73df65d93a
remove etcd data dir when etcd is disabled (#3059)
* remove etcd data dir when etcd is disabled

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* fix comment

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* more fixes

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* use debug instead of info logs

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2021-03-16 18:14:43 +02:00
Brian Downs
7c99f8645d
Have Bootstrap Data Stored in etcd at Completed Start (#3038)
* have state stored in etcd at completed start and remove unneeded code
2021-03-11 13:07:40 -07:00
Brad Davidson
c0d129003b Handle loadbalancer port in TIME_WAIT
If the port wanted by the client load balancer is in TIME_WAIT, startup
will fail. Set SO_REUSEPORT so that it can be listened on again
immediately.

The configurable Listen call wants a context, so plumb that through as
well.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2021-03-08 17:05:25 -08:00
Brad Davidson
7cdfaad6ce
Always use static ports for client load-balancers (#3026)
* Always use static ports for the load-balancers

This fixes an issue where RKE2 kube-proxy daemonset pods were failing to
communicate with the apiserver when RKE2 was restarted because the
load-balancer used a different port every time it started up.

This also changes the apiserver load-balancer port to be 1 below the
supervisor port instead of 1 above it. This makes the apiserver port
consistent at 6443 across servers and agents on RKE2.

Additional fixes below were required to successfully test and use this change
on etcd-only nodes.

* Actually add lb-server-port flag to CLI
* Fix nil pointer when starting server with --disable-etcd but no --server
* Don't try to use full URI as initial load-balancer endpoint
* Fix etcd load-balancer pool updates
* Update dynamiclistener to fix cert updates on etcd-only nodes
* Handle recursive initial server URL in load balancer
* Don't run the deploy controller on etcd-only nodes
2021-03-06 02:29:57 -08:00
Brian Downs
4d1f9eda9d
Etcd Snapshot/Restore to/from S3 Compatible Backends (#2902)
* Add functionality for etcd snapshot/restore to and from S3 compatible backends.
* Update etcd restore functionality to extract and write certificates and configs from snapshot.
2021-03-03 11:14:12 -07:00
galal-hussein
ef999f0b4f change error to warn when removing self from etcd members
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2021-03-02 00:19:57 +02:00