Commit Graph

87 Commits

Author SHA1 Message Date
Brian Downs
254b52077e add retention default and wire in s3 prune
Signed-off-by: Brian Downs <brian.downs@gmail.com>
2021-05-18 13:57:40 -07:00
Brian Downs
6ee28214fa
Add the ability to prune etcd snapshots (#3310)
* add prune subcommand to force rentention policy enforcement
2021-05-13 13:36:33 -07:00
Brian Downs
bcd8b67db4
Add the ability to list etcd snapshots (#3303)
* add ability to list local and s3 etcd snapshots
2021-05-11 16:59:33 -07:00
Brian Downs
e998cd110d
Add the ability to delete an etcd snapshot locally or from S3 (#3277)
* Add the ability to delete a given set of etcd snapshots from the CLI for locally stored and S3 store snapshots.
2021-05-07 16:10:04 -07:00
Brian Downs
beb0d8397a reference node name when needed
Signed-off-by: Brian Downs <brian.downs@gmail.com>
2021-05-04 10:03:28 -07:00
Brian Downs
c5ad71ce0b
Collect and Store etcd Snapshots and Metadata (#3239)
* Add the ability to store local etcd snapshots and etcd snapshots stored in an S3 compatible object store in a ConfigMap.
2021-04-30 18:26:39 -07:00
Brian Downs
66ed6efd57 Resolve local retention issue when S3 in use.
Remove early return preventing local retention policy to be enforced
resulting in N number of snapshots being stored.

Signed-off-by: Brian Downs <brian.downs@gmail.com>
2021-04-14 10:40:08 -07:00
Hussein Galal
73df65d93a
remove etcd data dir when etcd is disabled (#3059)
* remove etcd data dir when etcd is disabled

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* fix comment

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* more fixes

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* use debug instead of info logs

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2021-03-16 18:14:43 +02:00
Brian Downs
7c99f8645d
Have Bootstrap Data Stored in etcd at Completed Start (#3038)
* have state stored in etcd at completed start and remove unneeded code
2021-03-11 13:07:40 -07:00
Brad Davidson
c0d129003b Handle loadbalancer port in TIME_WAIT
If the port wanted by the client load balancer is in TIME_WAIT, startup
will fail. Set SO_REUSEPORT so that it can be listened on again
immediately.

The configurable Listen call wants a context, so plumb that through as
well.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2021-03-08 17:05:25 -08:00
Brad Davidson
7cdfaad6ce
Always use static ports for client load-balancers (#3026)
* Always use static ports for the load-balancers

This fixes an issue where RKE2 kube-proxy daemonset pods were failing to
communicate with the apiserver when RKE2 was restarted because the
load-balancer used a different port every time it started up.

This also changes the apiserver load-balancer port to be 1 below the
supervisor port instead of 1 above it. This makes the apiserver port
consistent at 6443 across servers and agents on RKE2.

Additional fixes below were required to successfully test and use this change
on etcd-only nodes.

* Actually add lb-server-port flag to CLI
* Fix nil pointer when starting server with --disable-etcd but no --server
* Don't try to use full URI as initial load-balancer endpoint
* Fix etcd load-balancer pool updates
* Update dynamiclistener to fix cert updates on etcd-only nodes
* Handle recursive initial server URL in load balancer
* Don't run the deploy controller on etcd-only nodes
2021-03-06 02:29:57 -08:00
Brian Downs
4d1f9eda9d
Etcd Snapshot/Restore to/from S3 Compatible Backends (#2902)
* Add functionality for etcd snapshot/restore to and from S3 compatible backends.
* Update etcd restore functionality to extract and write certificates and configs from snapshot.
2021-03-03 11:14:12 -07:00
galal-hussein
d6124981d5 remove etcd member if disable etcd is passed
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2021-03-01 23:50:50 +02:00
Hussein Galal
5749f66aa3
Add disable flags for control components (#2900)
* Add disable flags to control components

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* golint

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* more fixes

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* fixes to disable flags

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* Add comments to functions

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* Fix joining problem

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* more fixes

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* golint

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* fix ticker

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* fix role labels

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* more fixes

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2021-02-12 17:35:57 +02:00
Brad Davidson
071de833ae Fix typo in field tag
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2021-01-22 19:38:37 -08:00
Yuriy
06fda7accf
Add functionality to bind custom IP address for Etcd metrics endpoint (#2750)
* Add functionality to bind custom IP address for Etcd metrics endpoint

Signed-off-by: yuriydzobak <yurii.dzobak@lotusflare.com>
2021-01-22 17:40:48 -08:00
Brian Downs
13229019f8
Add ability to perform an etcd on-demand snapshot via cli (#2819)
* add ability to perform an etcd on-demand snapshot via cli
2021-01-21 14:09:15 -07:00
MonzElmasry
86f68d5d62
change etcd dir permission if it exists
Signed-off-by: MonzElmasry <menna.elmasry@rancher.com>
2021-01-08 23:47:36 +02:00
Brad Davidson
8e4d3e645b Restore legacy master role for etcd nodes
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2020-12-15 15:15:46 -08:00
Brad Davidson
63f2211b31 deprecate the "node-role.kubernetes.io/master" label / taint
Related to https://github.com/kubernetes/kubernetes/pull/95382

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2020-12-08 22:51:34 -08:00
Hussein Galal
fadc5a8057
Add tombstone file to etcd and catch errc etcd channel (#2592)
* Add tombstone file to embedded etcd

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* go mod update

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* fixes

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* more fixes

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* more changes

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* gofmt and goimports

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* go mod update

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* go lint

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* go lint

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* go mod tidy

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2020-12-07 22:30:44 +02:00
Menna Elmasry
523ccaf3f2
Merge pull request #2448 from MonzElmasry/new_b
Make etcd use node private ip
2020-10-29 00:23:56 +02:00
MonzElmasry
e8436cc76b
Make etcd use node private ip
Signed-off-by: MonzElmasry <menna.elmasry@rancher.com>
2020-10-28 23:45:24 +02:00
Hussein Galal
fcd18d1b6e
skip node delete from removed member (#2413)
* skip node delete from removed member

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* use grpc errors

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* go imports

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* exit if node is the etcd that being removed

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2020-10-28 18:32:51 +02:00
Brad Davidson
de18528412
Make etcd voting members responsible for managing learners (#2399)
* Set etcd timeouts using values from k8s instead of etcdctl
  Fix for one of the warnings from #2303
* Use etcd zap logger instead of deprecated capsnlog
  Fix for one of the warnings from #2303
* Remove member self-promotion code paths
* Add learner promotion tracking code
* Fix RaftAppliedIndex progress check
* Remove ErrGRPCKeyNotFound check
  This is not used by v3 API - it just returns a response with 0 KVs.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2020-10-27 11:06:26 -07:00
Hussein Galal
373449ec0a
Allow for multiple etcd snapshot restoration (#2307)
* add reset tmp file

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* go imports

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* fix multiple lines string

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* fix typo

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* use resetFile function

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2020-09-30 02:53:31 +02:00
Brad Davidson
703ba5cde7 Add a bunch of doc comments
Also change identical error messages to clarify where problems are
occurring.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2020-09-27 03:10:00 -07:00
Brad Davidson
f59e8fc21b Fix etcd directory permissions
Silences warning on startup about insecure directory permissions

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2020-09-27 03:10:00 -07:00
Brad Davidson
ee99660a96 Rename etcd directory helpers to reduce confusion about which datadir we're talking about
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2020-09-27 03:10:00 -07:00
Brad Davidson
97eb28a01a Remove unnecessary listener arg from managed DB setup
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2020-09-27 03:09:45 -07:00
Brad Davidson
42bba04651
Skip etcd snapshots if the local endpoint is still a learner (#2295)
* Don't take snapshots if the local endpoint is still a learner
* Configure timeouts for etcd client dialer
2020-09-21 20:23:18 -07:00
Brian Downs
ba70c41cce
Initial Logging Output Update (#2246)
This attempts to update logging statements to make them consistent
through out the code base. It also adds additional context to messages
where possible, simplifies messages, and updates level where necessary.
2020-09-21 09:56:03 -07:00
Hussein Galal
46fe57d7e9
reset etcd name on cluster reset (#2284)
* reset etcd name on cluster reset

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* gofmt

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2020-09-19 03:09:36 +02:00
Brian Downs
866dc94cea
Galal hussein etcd backup restore (#2154)
* Add etcd snapshot and restore

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* fix error logs

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* goimports

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* fix flag describtion

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* Add disable snapshot and retention

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* use creation time for snapshot retention

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* unexport method, update var name

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* adjust snapshot flags

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* update var name, string concat

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* revert previous change, create constants

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* update

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* updates

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* type assertion error checking

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* update

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* update

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* update

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* pr remediation

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* pr remediation

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* pr remediation

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* pr remediation

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* pr remediation

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* updates

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* updates

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* simplify logic, remove unneeded function

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* update flags

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* update flags

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* add comment

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* exit on restore completion, update flag names, move retention check

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* exit on restore completion, update flag names, move retention check

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* exit on restore completion, update flag names, move retention check

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* update disable snapshots flag and field names

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* move function

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* update field names

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* update var and field names

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* update var and field names

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* update defaultSnapshotIntervalMinutes to 12 like rke

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* update directory perms

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* update etc-snapshot-dir usage

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* update interval to 12 hours

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* fix usage typo

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* add cron

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* add cron

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* add cron

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* wire in cron

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* wire in cron

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* wire in cron

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* wire in cron

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* wire in cron

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* wire in cron

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* wire in cron

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* update deps target to work, add build/data target for creation, and generate

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* remove dead make targets

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* error handling, cluster reset functionality

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* error handling, cluster reset functionality

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* update

Signed-off-by: Brian Downs <brian.downs@gmail.com>

* remove intermediate dapper file

Signed-off-by: Brian Downs <brian.downs@gmail.com>

Co-authored-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2020-08-28 16:57:40 -07:00
Hussein Galal
169ee63907
Add etcd members as learners (#2066)
* Add etcd members as learners

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>

* Ignore errors in promote member

Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
2020-07-29 22:52:49 +02:00
galal-hussein
c580a8b528 Add heartbeat interval and election timeout 2020-06-06 16:39:42 -07:00
Darren Shepherd
6b5b69378f Add embedded etcd support
This is replaces dqlite with etcd.  The each same UX of dqlite is
followed so there is no change to the CLI args for this.
2020-06-06 16:39:41 -07:00