Commit Graph

1005 Commits

Author SHA1 Message Date
Brad Davidson
cf66559940 Print stack on panic
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-08-05 02:39:25 -07:00
Roberto Bonafiglia
abdf0c7319 Fix comments and add check in case of IPv6 only node
Signed-off-by: Roberto Bonafiglia <roberto.bonafiglia@suse.com>
2022-08-04 09:54:45 +02:00
Roberto Bonafiglia
d90ba30353 Added NodeIP autodect in case of dualstack connection
Signed-off-by: Roberto Bonafiglia <roberto.bonafiglia@suse.com>
2022-08-04 09:54:45 +02:00
Derek Nola
1c17f05b8e
Fix secrets reencryption for 8K+ secrets (#5936)
Signed-off-by: Derek Nola <derek.nola@suse.com>
2022-08-02 14:08:06 -07:00
Derek Nola
118a68c913
Updates to CLI flag grouping + deprecated flag warnings. (#5937)
* Consolidate data dir flag
* Group cluster flags together
* Reorder and group agent flags
* Add additional info around vmodule flag
* Hide deprecated flags, and add warning about their removal

Signed-off-by: Derek Nola <derek.nola@suse.com>
2022-08-02 13:51:16 -07:00
Vladimir Kochnev
13af0b1d88 Save agent token to /var/lib/rancher/k3s/server/agent-token
Having separate tokens for server and agent nodes is a nice feature.

However, passing server's plain `K3S_AGENT_TOKEN` value
to `k3s agent --token` without CA hash is insecure when CA is
self-signed, and k3s warns about it in the logs:

```
Cluster CA certificate is not trusted by the host CA bundle, but the token does not include a CA hash.
Use the full token from the server's node-token file to enable Cluster CA validation.
```

Okay so I need CA hash but where should I get it?

This commit attempts to fix this issue by saving agent token value to
`agent-token` file with CA hash appended.

Signed-off-by: Vladimir Kochnev <hashtable@yandex.ru>
2022-08-01 14:11:50 -07:00
Derek Nola
4c0bc8c046
Update etcd error to match correct url (#5909)
Signed-off-by: Derek Nola <derek.nola@suse.com>
2022-07-29 09:40:53 -07:00
Brad Davidson
db2ba7b61d Don't enable unprivileged ports and icmp on old kernels
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-07-28 14:33:20 -07:00
Brad Davidson
5eaa0a9422 Replace getLocalhostIP with Loopback helper method
Requires tweaking existing method signature to allow specifying whether or not IPv6 addresses should be return URL-safe.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-07-21 16:51:57 -07:00
Brad Davidson
84fb8787f2 Add service-cluster-ip-range to controller-manager args
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-07-21 16:51:57 -07:00
Brad Davidson
bd5fdfce33 Fix server systemd detection
* Use INVOCATION_ID to detect execution under systemd, since as of a9b5a1933f NOTIFY_SOCKET is now cleared by the server code.
* Set the unit type to notify by default for both server and agent, which is what Rancher-managed installs have done for a while.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-07-21 13:42:20 -07:00
Brad Davidson
1674b9d640 Raise etcd connection test timeout to 30 seconds
Addressess issue where the compact may take more than 10 seconds on slower disks. These disks probably aren't really suitable for etcd, but apparently run fine otherwise.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-07-21 13:23:19 -07:00
Brad Davidson
ffe72eecc4 Address issues with etcd snapshots
* Increase the default snapshot timeout. The timeout is not currently
  configurable from Rancher, and larger clusters are frequently seeing
  uploads fail at 30 seconds.
* Enable compression for scheduled snapshots if enabled on the
  command-line. The CLI flag was not being passed into the etcd config.
* Only set the S3 content-type to application/zip if the file is zipped.
* Don't run more than one snapshot at once, to prevent misconfigured
  etcd snapshot cron schedules from stacking up.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-07-12 14:41:38 -07:00
Brad Davidson
167ed19d22 Fix deletion of svclb DaemonSet when Service is deleted
87e1806697 removed the OwnerReferences
field from the DaemonSet, which makes sense since the Service may now be
in a different namespace than the DaemonSet and cross-namespace owner
references are not supported.  Unfortunately, we were relying on
garbage collection to delete the DameonSet, so this started leaving
orphaned DaemonSets when Services were deleted.

We don't want to add an a Service OnRemove handler, since this will add
finalizers to all Services, not just LoadBalancers services, causing
conformance tests to fail. Instead, manage our own finalizers, and
restore the DaemonSet removal Event that was removed by the same commit.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-07-12 12:27:55 -07:00
Brad Davidson
fc1c100ffd Remove legacy bidirectional datastore sync code
Since #4438 removed 2-way sync and treats any changed+newer files on disk as an error, we no longer need to determine if files are newer on disk/db or if there is a conflicting mix of both. Any changed+newer file is an error, unless we're doing a cluster reset in which case everything is unconditionally replaced.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-07-12 12:10:30 -07:00
Brad Davidson
83420ef78e Fix fatal error when reconciling bootstrap data
Properly skip restoring bootstrap data for files that don't have a path
set because the feature that would set it isn't enabled.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-07-12 12:10:30 -07:00
Brad Davidson
d2089872bb Fix issue with containerd stats missing from cadvisor metrics
cadvisor still doesn't pull stats via CRI yet, so we have to continue to use the deprecated arg.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-07-08 11:03:02 -07:00
Brad Davidson
afee83dda2 Bump remotedialer
Includes fix for recently identified memory leak.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-07-07 12:22:37 -07:00
Brad Davidson
961c8274a9 Don't crash when service IPFamiliyPolicy is not set
Service.Spec.IPFamilyPolicy may be a nil pointer on freshly upgraded clusters.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-07-01 00:07:50 -07:00
Brad Davidson
ff6c233e41 Fix egress selector proxy/bind-address support
Use same kubelet-preferred-address-types setting as RKE2 to improve reliability of the egress selector when using a HTTP proxy. Also, use BindAddressOrLoopback to ensure that the correct supervisor address is used when --bind-address is set.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-07-01 00:07:35 -07:00
Brad Davidson
96162c07c5 Handle egress-selector-mode change during upgrade
Properly handle unset egress-selector-mode from existing servers during cluster upgrade.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-06-30 11:57:41 -07:00
Olli Janatuinen
2968a83bc0 containerd: Enable enable_unprivileged_ports and enable_unprivileged_icmp by default
Signed-off-by: Olli Janatuinen <olli.janatuinen@gmail.com>
2022-06-15 14:49:51 -07:00
Brad Davidson
6fad63583b Only listen on loopback when resetting
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-06-15 11:25:54 -07:00
Brad Davidson
3399afed83 Ensure that CONTAINERD_ variables are not shadowed by later entries
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-06-15 10:58:12 -07:00
Brad Davidson
fb0a342a20 Sanitize filenames for use in configmap keys
If the user points S3 backups at a bucket containing other files, those
file names may not be valid configmap keys.

For example, RKE1 generates backup files with names like
`s3-c-zrjnb-rs-6hxpk_2022-05-05T12:05:15Z.zip`; the semicolons in the
timestamp portion of the name are not allowed for use in configmap keys.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-06-15 10:54:26 -07:00
Derek Nola
a9b5a1933f
Delay service readiness until after startuphooks have finished (#5649)
* Move startup hooks wg into a runtime pointer, check before notifying systemd
* Switch default systemd notification to server
* Add 1 sec delay to allow etcd to write to disk
Signed-off-by: Derek Nola <derek.nola@suse.com>
2022-06-15 09:00:52 -07:00
Roberto Bonafiglia
a693071c74
Merge pull request #5552 from sjoerdsimons/sjoerd/flannel-wireguard-mode
Add cli flag for flannel wireguard mode
2022-06-15 14:28:21 +02:00
Darren Shepherd
e6009b1edf Introduce servicelb-namespace parameter
This parameter controls which namespace the klipper-lb pods will be create.
It defaults to kube-system so that k3s does not by default create a new
namespace. It can be changed if users wish to isolate the pods and apply
some policy to them.

Signed-off-by: Darren Shepherd <darren@acorn.io>
2022-06-14 15:48:58 -07:00
Darren Shepherd
f4cc1b8788 Move all klipper-lb daemonset to common namespace for PodSecurity
The baseline PodSecurity profile will reject klipper-lb pods from running.
Since klipper-lb pods are put in the same namespace as the Service this
means users can not use PodSecurity baseline profile in combination with
the k3s servicelb.

The solution is to move all klipper-lb pods to a klipper-lb-system where
the security policy of the klipper-lb pods can be different an uniformly
managed.

Signed-off-by: Darren Shepherd <darren@acorn.io>
2022-06-14 15:48:58 -07:00
Manuel Buil
d4522de06a
Merge pull request #5656 from manuelbuil/AddFlannelCniConfFile
Add FlannelCNIConf flag
2022-06-14 10:23:51 +02:00
Igor
2999289e68
add support for pprof server (#5527)
Signed-off-by: igor <igor@igor.io>
2022-06-13 22:06:55 -07:00
Brad Davidson
0581808f5c Set default egress-selector-mode to agent
... until QA flakes can be addressed.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-06-10 10:14:15 -07:00
Brad Davidson
b550e1183a Remove control-plane egress context and fix agent mode.
The control-plane context handles requests outside the cluster and
should not be sent to the proxy.

In agent mode, we don't watch pods and just direct-dial any request for
a non-node address, which is the original behavior.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-06-10 10:14:15 -07:00
Brad Davidson
d3242bea3c Refactor egress-selector pods mode to watch pods
Watching pods appears to be the most reliable way to ensure that the
proxy routes and authorizes connections.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-06-08 09:34:53 -07:00
Manuel Buil
c705d34804 Add FlannelConfCNI flag
Signed-off-by: Manuel Buil <mbuil@suse.com>
2022-06-08 11:03:17 +02:00
Sjoerd Simons
8643576985 Add ability to pass configuration options to flannel backend
Allow the flannel backend to be specified as
backend=option=val,option2=val2 to select a given backend with extra options.

In particular this adds the following options to wireguard-native
backend:
* Mode - flannel wireguard tunnel mode
* PersistentKeepaliveInterval- wireguard persistent keepalive interval

Signed-off-by: Sjoerd Simons <sjoerd@collabora.com>
2022-06-07 20:13:28 +02:00
Brad Davidson
491aa11e10 Revert "Give kubelet the node-ip value (#5579)"
This reverts commit aa9065749c.

Setting dual-stack node-ip does not work when --cloud-provider is set
to anything, including 'external'. Just set node-ip to the first IP, and
let the cloud provider add the other address.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-06-02 17:36:55 -07:00
Brad Davidson
29397b4e68 Re-add --cloud-provider=external kubelet arg
The cloud-provider arg is deprecated and cannot be set to anything other than external, but must still be used or node addresses are not set properly.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-06-01 14:23:53 -07:00
Brad Davidson
9d7230496d Add support for configuring the EgressSelector mode
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-05-18 13:26:10 -07:00
Manuel Buil
aa9065749c
Give kubelet the node-ip value (#5579)
* Give kubelet all node-ips

Signed-off-by: Manuel Buil <mbuil@suse.com>
Co-authored-by: Brad Davidson <brad.davidson@rancher.com>
2022-05-18 13:21:15 -07:00
Donnie Adams
c38a8c3b43
Remove objects when removed from manifests (#5560)
* Remove objects when removed from manifests

If a user puts a file in /var/lib/rancher/k3s/server/manifests/ then the
objects contained therein are deployed to the cluster. If the objects
are removed from that file, they are not removed from the cluster.

This change tracks the GVKs in the files and will remove objects when
there are removed from the cluster.

Signed-off-by: Donnie Adams <donnie.adams@suse.com>
2022-05-18 11:05:03 -07:00
Brad Davidson
4a3d283bc1 Remove --docker/dockershim support
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-05-11 14:39:07 -07:00
Brad Davidson
360f18d1cf Always set pod-infra-container-image to protect it from image GC
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-05-11 14:39:07 -07:00
Brad Davidson
0710a7198a Remove deprecated flags from cloud-controller-manager
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-05-11 14:39:07 -07:00
Brad Davidson
703779c32f Remove deprecated flags from kube-apiserver
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-05-11 14:39:07 -07:00
Brad Davidson
551f2fa00a Remove deprecated flags from kubelet
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-05-11 14:39:07 -07:00
Brad Davidson
c8447dca56 Bump golang to 1.18.1
Also update all use of 'go get' => 'go install', update CI tooling for
1.18 compatibility, and gofmt everything so lint passes.

Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-05-11 14:39:07 -07:00
Brad Davidson
e6385b2341 Update CNI version in config file
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-05-11 14:39:07 -07:00
Manuel Buil
a3b35d21e9 Add "ipFamilyPolicy: PreferDualStack" to have dual-stack ingress support
Signed-off-by: Manuel Buil <mbuil@suse.com>
2022-05-04 17:32:34 +02:00
Brad Davidson
1d4f995edd Move auto-generated resolv.conf out of /tmp to prevent accidental cleanup
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2022-05-03 20:33:32 -07:00