Reduces code complexity a bit and ensures we don't have to handle closed watch channels on our own
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Before this change, kube-router was always assuming that IPv4 is
enabled, which is not the case in IPv6-only clusters. To enable network
policies in IPv6-only, we need to explicitly let kube-router know when
to disable IPv4.
Signed-off-by: Michal Rostecki <vadorovsky@gmail.com>
This gives nicer errors from Kubernetes components during startup, and
reduces LOC a bit by using the upstream responsewriters module instead
of writing the headers and body by hand.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* Bump etcd to v3.5.4-k3s1
* Fix issue with datastore corruption on cluster-reset
* Disable unnecessary components during cluster reset
Disable control-plane components and the tunnel setup during
cluster-reset, even when not doing a restore. This reduces the amount of
log clutter during cluster reset/restore, making any errors encountered
more obvious.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* Update docs to include s390x arch
Signed-off-by: Venkata Krishna Rohit Sakala <rohitsakala@gmail.com>
* Add s390x drone pipeline
Signed-off-by: Venkata Krishna Rohit Sakala <rohitsakala@gmail.com>
* Install trivy linux arch only for amd64
This is done so that trivy is not installed for s390x arch
Signed-off-by: Venkata Krishna Rohit Sakala <rohitsakala@gmail.com>
* Add s390x arch if condition for Dockerfile.test
Signed-off-by: Venkata Krishna Rohit Sakala <rohitsakala@gmail.com>
* Add s390x arch in install script
Signed-off-by: Venkata Krishna Rohit Sakala <rohitsakala@gmail.com>
* Add s390x GOARCH in build script
Signed-off-by: Venkata Krishna Rohit Sakala <rohitsakala@gmail.com>
* Add SUFFIX s390x in scripts
Signed-off-by: Venkata Krishna Rohit Sakala <rohitsakala@gmail.com>
* Skip image scan for s390x arch
Signed-off-by: Venkata Krishna Rohit Sakala <rohitsakala@gmail.com>
* Update klipper-lb to version v0.3.5
Signed-off-by: Venkata Krishna Rohit Sakala <rohitsakala@gmail.com>
* Update traefik version to v2.6.2
Signed-off-by: Venkata Krishna Rohit Sakala <rohitsakala@gmail.com>
* Update registry to v2.8.1 in tests which supports s390x
Signed-off-by: Venkata Krishna Rohit Sakala <rohitsakala@gmail.com>
* Skip compact tests for s390x arch
This is done because compact test require a previous k3s version which supports s390x and it is not available
Signed-off-by: Venkata Krishna Rohit Sakala <rohitsakala@gmail.com>
Problem:
Specifying extra arguments for the API server for example is not supported as
the arguments get stored in a map before being passed to the API server.
Solution:
Updated the GetArgs function to store the arguments in a map that can have
multiple values. Some more logic is added so that repeated extra arguments
retain their order when sorted whilst overall the arguments can still be
sorted for improved readability when logged.
Support has been added for prefixing and suffixing default argument values
by using -= and += when specifying extra arguments.
Signed-off-by: Terry Cain <terry@terrys-home.co.uk>
This requires a further set of gofmt -s improvements to the
code, but nothing major. golangci-lint 1.45.2 brings golang 1.18
support which might be needed in the future.
Signed-off-by: Dirk Müller <dirk@dmllr.de>
This controller only needs to run when using managed etcd, so move it in
with the rest of the etcd stuff. This change also modifies the
controller to only watch the Kubernetes service endpoint, instead of
watching all endpoints in the entire cluster.
Fixes an error message revealed by use of a newer grpc client in
Kubernetes 1.24, which logs an error when the Put to etcd failed because
kine doesn't support the etcd Put operation. The controller shouldn't
have been running without etcd in the first place.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Don't attempt to retrieve snapshot metadata configmap if the apiserver
isn't available. This could be triggered if the cron expression caused a
snapshot to be triggered before the apiserver is up.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
This is required to make the websocket tunnel server functional on
etcd-only nodes, and will save some code on the RKE2 side once pulled
through.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
This change allows to define two cluster CIDRs for compatibility with
Kubernetes dual-stuck, with an assumption that two CIDRs are usually
IPv4 and IPv6.
It does that by levearaging changes in out kube-router fork, with the
following downstream release:
https://github.com/k3s-io/kube-router/releases/tag/v1.3.2%2Bk3s
Signed-off-by: Michal Rostecki <vadorovsky@gmail.com>
Ideally we'd have fully fleshed out support for it (i.e. #5011), but
that's a potentially breaking change and taking a little while to merge.
This is a much simpler change which won't break anything, but will allow
a "Type": "wireguard" reference in the "--flannel-conf" custom config
file to work.
Signed-off-by: Euan Kemp <euank@euank.com>
Improve feedback when running secrets-encrypt commands on etcd-only nodes, and
allow etcd-only nodes to properly restart when effecting rotation.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Don't hardcode the event namespace when creating event recorders; some controllers want to create events in other namespaces.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Closing idle connections isn't guaranteed to close out a pooled connection to a
loadbalancer endpoint that has been removed. Instead, ensure that requests used
to wait for the apiserver to become ready aren't reused.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
This allows secondary etcd nodes to bootstrap the kubelet before an
apiserver joins the cluster. Rancher waits for all the etcd nodes to
come up before adding the control-plane nodes, so this needs to be
handled properly.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* Removed vagrant folder
* Fix comments around E2E ENVs
* Eliminate testutil folder
* Convert flock integration test to unit test
* Point to other READMEs
Signed-off-by: Derek Nola <derek.nola@suse.com>
Fixes issue with secrets-encrypt rotate not having any etcd endpoints
available on nodes without a local etcd server.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
adds a new optional node label
"svccontroller.k3s.cattle.io/lbpool=<pool>" that can be set on nodes.
ServiceType: LoadBalancer services can then specify a matching label,
which will schedule the DaemonSet only on specified nodes. This allows
operators to specify different pools of nodes that can serve different
LoadBalancer services on the same ports.
Signed-off-by: robertlestak <robert.lestak@umusic.com>
Reuse the existing etcd library code to start up the temporary etcd
server for bootstrap reconcile. This allows us to do proper
health-checking of the datastore on startup, including handling of
alarms.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Several types contained redundant references to ControlRuntime data. Switch to consistently accessing this via config.Runtime instead.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* Upgrade and convert ginkgo from v1 to v2
* Move all integration tests into integration folder
* Update TESTING.md
Signed-off-by: Derek Nola <derek.nola@suse.com>
Before this change, we were copying a part of kube-router code to
pkg/agent/netpol directory with modifications, from which the biggest
one was consumption of k3s node config instead of kube-router config.
However, that approach made it hard to follow new upstream versions.
It's possible to use kube-router as a library, so it seems like a better
way to do that.
Instead of modifying kube-router network policy controller to comsume
k3s configuration, this change just converts k3s node config into
kube-router config. All the functionality of kube-router except netpol
is still disabled.
Signed-off-by: Michal Rostecki <mrostecki@opensuse.org>
Signed-off-by: Manuel Buil <mbuil@suse.com>
* Remove sudo commands from integration tests
Signed-off-by: Derek Nola <derek.nola@suse.com>
* Added cleanup fucntion
Signed-off-by: Derek Nola <derek.nola@suse.com>
* Implement better int cleanup
Signed-off-by: Derek Nola <derek.nola@suse.com>
* Rename test utils
Signed-off-by: Derek Nola <derek.nola@suse.com>
* Enable K3sCmd to be a single string
Signed-off-by: Derek Nola <derek.nola@suse.com>
* Removed parsePod function
Signed-off-by: Derek Nola <derek.nola@suse.com>
* codespell
Signed-off-by: Derek Nola <derek.nola@suse.com>
* Revert startup timeout
Signed-off-by: Derek Nola <derek.nola@suse.com>
* Reorder sonobuoy tests, drop concurrent tests to 3
Signed-off-by: Derek Nola <derek.nola@suse.com>
* Disable etcd
Signed-off-by: Derek Nola <derek.nola@suse.com>
* Skip parallel testing for etcd
Signed-off-by: Derek Nola <derek.nola@suse.com>
If you don't explicitly close the etcd client when you're done with it,
the GRPC connection hangs around in the background. Normally this is
harmelss, but in the case of the temporary etcd we start up on 2399 to
reconcile bootstrap data, the client will start logging errors
afterwards when the server goes away.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* Enable reconcilation on secondary servers
Signed-off-by: Derek Nola <derek.nola@suse.com>
* Remove unused code
Signed-off-by: Derek Nola <derek.nola@suse.com>
* Attempt to reconcile with datastore first
Signed-off-by: Derek Nola <derek.nola@suse.com>
* Added warning on failure
Signed-off-by: Derek Nola <derek.nola@suse.com>
* Update warning
Signed-off-by: Derek Nola <derek.nola@suse.com>
* golangci-lint fix
Signed-off-by: Derek Nola <derek.nola@suse.com>
* Regular CLI framework for encrypt commands
* New secrets-encryption feature
* New integration test
* fixes for flaky integration test CI
* Fix to bootstrap on restart of existing nodes
* Consolidate event recorder
Signed-off-by: Derek Nola <derek.nola@suse.com>
* Enabled skipping of unkown flags from config in parser
* Added new unit test, expanded existing
* Add warning back in, without value
Signed-off-by: Derek Nola <derek.nola@suse.com>
* Add etcd extra args support for K3s
Signed-off-by: Chris Kim <oats87g@gmail.com>
* Add etcd custom argument integration test
Signed-off-by: Chris Kim <oats87g@gmail.com>
* go generate
Signed-off-by: Chris Kim <oats87g@gmail.com>
Problem:
Before, to customize CoreDNS, one had to edit the default configmap,
which gets re-written on every K3s server restart.
Solution:
Mount an additional coredns-custom configmap into the CoreDNS container
and import overrides and additional server blocks from the included
files.
Signed-off-by: Thorsten Klein <iwilltry42@gmail.com>
Since we now start the server's agent sooner and in the background, we
may need to wait longer than 30 seconds for the apiserver to become
ready on downstream projects such as RKE2.
Since this essentially just serves as an analogue for the server's
apiReady channel, there's little danger in setting it to something
relatively high.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Using MAINPID breaks systemd's exit detection, as it stops watching the
original pid, but is unable to watch the new pid as it is not a child
of systemd itself. The best we can do is just notify when execing the child
process.
We also need to consolidate forking into a sigle place so that we don't
end up with multiple levels of child processes if both redirecting log
output and reaping child processes.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* Make sure there are no duplicates in etcd member list
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
* fix node names with hyphens
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
* use full server name for etcd node name
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
Works around issue with Job controller not tracking job pods that
are in CrashloopBackoff during upgrade from 1.21 to 1.22.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* Update the default containerd config template with support for adding extra container runtimes. Add logic to discover nvidia container runtimes installed via the the gpu operator or package manager.
Signed-off-by: Joe Kralicky <joe.kralicky@suse.com>
Also honor node-ip when adding the node address to the SAN list, instead
of hardcoding the autodetected IP address.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* Added test runner and build files
* Changes to int test to output junit results.
* Updated documentation, removed comments
Signed-off-by: dereknola <derek.nola@suse.com>
Required to support apiextensions.v1 as v1beta1 has been deleted. Also
update helm-controller and dynamiclistener to track wrangler versions.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
* Reset load balancer state during restoraion
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
* Reset load balancer state during restoraion
Signed-off-by: galal-hussein <hussein.galal.ahmed.11@gmail.com>
* Fix URL pruning when joining an etcd member
Problem:
Existing member clientURLs were checked if they contain the joining
node's IP. In some edge cases this would prune valid URLs when the
joining IP is a substring match of the only existing member's IP.
Because of this, it was impossible to e.g. join 10.0.0.2 to an existing
node that has an IP of 10.0.0.2X or 10.0.0.2XX:
level=fatal msg="starting kubernetes: preparing server: start managed database:
joining etcd cluster: etcdclient: no available endpoints"
Solution:
Fixed by properly parsing the URLs and comparing the IPs for equality
instead of substring match.
Signed-off-by: Malte Starostik <info@stellaware.de>
* switch image names to the ones with the prefix mirrored
* bump rancher/mirrored-coredns-coredns to 1.8.4
Signed-off-by: Jiaqi Luo <6218999+jiaqiluo@users.noreply.github.com>
Sync DisableKubeProxy from cfg into control before sending control to clients,
as it may have been modified by a startup hook.
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>