3 K3s Cert Rotation
Justin J. Janes edited this page 2022-01-07 16:23:16 -08:00

[design doc Certificate Rotation]

Feature

Certificate rotation

$ k3s certificate -h
NAME:
   k3s certificate - Certificates management

USAGE:
   k3s certificate command [command options] [arguments...]

COMMANDS:
   rotate  Certificate rotation

OPTIONS:
   --debug                     (logging) Turn on debug logs [$K3S_DEBUG]
   --config FILE, -c FILE      (config) Load configuration from FILE (default: "/etc/rancher/k3s/config.yaml") [$K3S_CONFIG_FILE]
   --log value, -l value       (logging) Log to file
   --alsologtostderr           (logging) Log to standard error as well as file (if set)
   --data-dir value, -d value  (data) Folder to hold state default /var/lib/rancher/k3s or ${HOME}/.rancher/k3s if not root
   --service value, -s value   List of services to rotate certificates for. Options include (admin, api-server, controller-manager, scheduler, k3s-controller, k3s-server, cloud-controller, etcd, auth-proxy, kubelet, kube-proxy)
   --help, -h                  show help

Test Cases

(P1) | Areas | Expected Result | Docs Needed? | Test Cases | Pass or Fail or Warn | Notes / GH Issue
1 All certs to be rotated automatically on startup before they expire. Any certificate with 90 days or fewer left before expiration should be rotated successfully, along with its key. Docs needed: Yes. https://github.com/k3s-io/k3s/issues/4271
PREREQUISITES: You need a custom binary from a developer that generates certs with an expiration of less than 90 days (15 minutes is a good time-frame for testing). NOTE: While testing you may see CA certs; these are currently expected to always be valid for 10 years.
Test on server:
  1. Start k3s server with a custom binary that generates certs with an expiration of less than 90 days.
  2. Before testing, confirm that the certs are about to expire using the following commands:
cd /var/lib/rancher/k3s/server/tls
for i in *.crt; do echo "$i"; openssl x509 -enddate -noout -in "$i"; done
  3. Stop k3s server, replace the binary with the release that contains the cert rotation feature, and start the server again.
  4. Validate by rerunning the commands from step 2; all certs should have been rotated successfully.

Test on agent:

  1. Start k3s server with a custom binary that generates certs with an expiration of less than 90 days.
  2. Join a k3s agent.
  3. On the agent, confirm that the certs are about to expire using the following commands (IGNORE any CA certs):
cd /var/lib/rancher/k3s/agent
for i in *.crt; do echo "$i"; openssl x509 -enddate -noout -in "$i"; done
  4. Stop k3s server, replace the binary with the release that contains the cert rotation feature, and rerun the server.
  5. Restart the agent to pull the new certs.
  6. Validate by rerunning the commands from step 3; all certs should have been rotated successfully.
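The expiry checks in the server and agent steps above can be scripted. The following is a minimal sketch, not part of k3s: the helper names `is_ca_cert` and `expiring_certs` are hypothetical, and treating files named `*-ca.crt` as CA certs is an assumption based on the file names used on this page. It relies on `openssl x509 -checkend`, which exits non-zero when a certificate expires within the given number of seconds.

```shell
# Sketch only: helper names and the "*-ca.crt" naming rule are assumptions.

# is_ca_cert NAME: succeed if NAME looks like a CA certificate file.
is_ca_cert() {
    case "$1" in
        *-ca.crt) return 0 ;;
        *) return 1 ;;
    esac
}

# expiring_certs DIR [DAYS]: print every non-CA cert in DIR that expires
# within DAYS days (default 90), followed by its notAfter date.
expiring_certs() {
    ec_dir=$1
    ec_seconds=$(( ${2:-90} * 86400 ))
    for ec_crt in "$ec_dir"/*.crt; do
        [ -e "$ec_crt" ] || continue        # directory had no .crt files
        ec_name=$(basename "$ec_crt")
        is_ca_cert "$ec_name" && continue   # CA certs are ignored
        # -checkend exits non-zero if the cert expires within ec_seconds.
        if ! openssl x509 -checkend "$ec_seconds" -noout -in "$ec_crt" >/dev/null 2>&1; then
            echo "$ec_name $(openssl x509 -enddate -noout -in "$ec_crt" 2>/dev/null)"
        fi
    done
}
```

For example, `expiring_certs /var/lib/rancher/k3s/server/tls 90` run as root before the upgrade should list the short-lived certs; if rotation succeeded, the same call afterwards should print nothing. The agent directory can be checked the same way.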


2 Recover from expired certs. Make sure that clusters with expired certs recover after restarting k3s server.
PREREQUISITES: You need a custom binary from a developer that generates certs with an expiration of less than 90 days (10 minutes is a good time-frame). NOTE: While testing you may see CA certs; these are currently expected to always be valid for 10 years.
Test on server and agent:
  1. Start k3s server with a custom binary that generates certs with a 10-minute expiration.
  2. Join a k3s agent.
  3. Wait until the certs have expired, then validate that they are expired using kubectl get nodes and by checking the expiration date on both the server and agent certs.
 On server (IGNORE any CA certs):
cd /var/lib/rancher/k3s/server/tls
for i in *.crt; do echo "$i"; openssl x509 -enddate -noout -in "$i"; done
 On agent (IGNORE any CA certs):
cd /var/lib/rancher/k3s/agent
for i in *.crt; do echo "$i"; openssl x509 -enddate -noout -in "$i"; done
  4. Stop k3s server, replace the binary with the release that contains the cert rotation feature, and start the server again; also restart the k3s agent to allow it to pull the new certs.
  5. Make sure the server is functional and no longer reporting cert-expired errors.
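Step 5 can be checked by scanning recent service logs for expiry errors. A minimal sketch under stated assumptions: `count_cert_expiry_errors` is a hypothetical helper, the matched text is taken from the standard Go x509 error message ("certificate has expired or is not yet valid"), and reading logs with `journalctl -u k3s` assumes k3s runs as the systemd unit `k3s.service`.

```shell
# count_cert_expiry_errors: read log lines on stdin and print how many
# mention an expired certificate. Hypothetical helper, not part of k3s.
count_cert_expiry_errors() {
    # grep -c prints 0 but exits 1 when nothing matches, so mask the status.
    grep -c 'certificate has expired' || true
}

# Example (assumes a systemd-managed server):
#   journalctl -u k3s --since "5 minutes ago" --no-pager | count_cert_expiry_errors
```

A count of 0 for the period after the restart is consistent with the server having recovered; a non-zero count points at certs that were not rotated.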



3 Rotate ALL certs with CLI. Expect that existing certs have been backed up adjacent to the existing tls certs directory, and that a new certs directory is created with all relevant files.

k3s operates normally without expired certificates in the cluster

https://github.com/k3s-io/k3s/issues/4271 Yes - needs documentation
Test on server:
  1. Start k3s server and wait for the single node to reach the Ready state.
  2. Stop k3s; verify this by checking the service status if necessary.
  3. Run k3s certificate rotate --debug
    This command backs up the tls certs to an adjacent directory /var/lib/rancher/k3s/server/tls-*
  4. The following files should be identical in both directories:
    • client-ca.crt
    • client-ca.key
    • request-header-ca.crt
    • request-header-ca.key
    • server-ca.crt
    • server-ca.key
    • service.key
  5. After verifying that the certs have rotated, restart the server.
  6. Verify service health and that the node comes back up to the Ready state.
 On server:
$ k3s server
$ sudo systemctl stop k3s.service
$ sudo systemctl status k3s.service
$ sudo k3s --debug certificate rotate 
$ sudo diff -sr /var/lib/rancher/k3s/server/tls /var/lib/rancher/k3s/server/tls-<GENERATED-DURING-ROTATE>/ | grep -i identical | awk '{print $2}' | xargs basename -a | awk 'BEGIN{print "Identical Files:  "}; {print $1}' 

$ sudo systemctl start k3s.service
$ sudo systemctl status k3s.service
$ sudo chmod 644 /etc/rancher/k3s/k3s.yaml
$ kubectl get nodes


pass on v1.22.5-rc1+k3s1
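The backup directory name is generated during rotation, so the diff command above needs the `<GENERATED-DURING-ROTATE>` placeholder filled in by hand. A sketch of automating that check: `latest_backup_dir` picks the most recently modified `tls-*` directory and `check_preserved` compares the files listed in step 4 with `cmp`. Both helper names are hypothetical, and selecting the backup by modification time is an assumption.

```shell
# Files that step 4 expects to be identical before and after rotation.
PRESERVED_FILES="client-ca.crt client-ca.key request-header-ca.crt request-header-ca.key server-ca.crt server-ca.key service.key"

# latest_backup_dir SERVER_DIR: print the most recently modified tls-* directory.
latest_backup_dir() {
    ls -1dt "$1"/tls-*/ 2>/dev/null | head -n 1
}

# check_preserved LIVE_DIR BACKUP_DIR: report each expected file and
# return non-zero if any of them differs or is missing.
check_preserved() {
    cp_rc=0
    for cp_file in $PRESERVED_FILES; do
        if cmp -s "$1/$cp_file" "$2/$cp_file" 2>/dev/null; then
            echo "identical: $cp_file"
        else
            echo "DIFFERENT OR MISSING: $cp_file"
            cp_rc=1
        fi
    done
    return "$cp_rc"
}

# Example (run as root, since the key files are not world-readable):
#   backup=$(latest_backup_dir /var/lib/rancher/k3s/server)
#   check_preserved /var/lib/rancher/k3s/server/tls "$backup"
```

Returning a non-zero status lets the check fail a scripted test run instead of relying on someone reading the diff output.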
4 Rotate specific certs with CLI - k3s certificate rotate --service <VALUE>, where <VALUE> is one of: admin, api-server, controller-manager, scheduler, k3s-controller, k3s-server, cloud-controller, etcd, auth-proxy, kubelet, kube-proxy. Expect that the existing targeted cert is backed up adjacent to the existing tls directory and that the new cert is in the correct location. Yes - needs documentation
Test on server:
  1. Start k3s server and wait for the single node to reach the Ready state.
  2. Stop k3s; verify this by checking the service status if necessary.
  3. Run k3s certificate rotate --service <VALUE> --debug
    This command backs up the related tls certs to an adjacent directory /var/lib/rancher/k3s/server/tls-*
  4. After verifying that the certs have rotated, restart the server.
  5. Verify service health and that the node comes back up to the Ready state.
 On server:
$ k3s server
$ sudo systemctl stop k3s.service
$ sudo systemctl status k3s.service
$ sudo k3s --debug certificate rotate --service <VALUE> 
$ sudo diff -sr /var/lib/rancher/k3s/server/tls /var/lib/rancher/k3s/server/tls-<GENERATED-DURING-ROTATE>/ | grep -i identical | awk '{print $2}' | xargs basename -a | awk 'BEGIN{print "Identical Files:  "}; {print $1}'

$ sudo systemctl start k3s.service
$ sudo systemctl status k3s.service
$ sudo chmod 644 /etc/rancher/k3s/k3s.yaml
$ kubectl get nodes


pass on v1.22.5-rc1+k3s1
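To exercise every `--service` value in turn, the per-service commands can be generated from the list in the `k3s certificate -h` output. A minimal sketch: `print_rotate_commands` is a hypothetical helper that only prints the commands, so they can be reviewed before being run on a stopped server.

```shell
# Service names copied from the `k3s certificate -h` output.
SERVICES="admin api-server controller-manager scheduler k3s-controller k3s-server cloud-controller etcd auth-proxy kubelet kube-proxy"

# print_rotate_commands: print one rotate command per service (dry run only).
print_rotate_commands() {
    for prc_svc in $SERVICES; do
        echo "sudo k3s --debug certificate rotate --service $prc_svc"
    done
}

# Example:
#   print_rotate_commands
```

Printing rather than executing keeps the sketch safe to run anywhere; each printed command still needs the stop, verify, and restart steps around it.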
5 rotate certs with a live workload through the CLI - k3s certificate rotate

Deploy a workload to a cluster with 3 servers and 1 agent node.
Test on server:
  1. Start k3s server and wait for the single node to reach the Ready state.
  2. Stop k3s; verify this by checking the service status if necessary.
  3. Run k3s certificate rotate --debug
    This command backs up the tls certs to an adjacent directory /var/lib/rancher/k3s/server/tls-*
  4. The following files should be identical in both directories:
    • client-ca.crt
    • client-ca.key
    • request-header-ca.crt
    • request-header-ca.key
    • server-ca.crt
    • server-ca.key
    • service.key
  5. After verifying that the certs have rotated, restart the server.
  6. Verify service health and that the node comes back up to the Ready state.
 On server:
$ k3s server
$ sudo systemctl stop k3s.service
$ sudo systemctl status k3s.service
$ sudo k3s --debug certificate rotate
$ sudo diff -sr /var/lib/rancher/k3s/server/tls /var/lib/rancher/k3s/server/tls-<GENERATED-DURING-ROTATE>/ | grep -i identical | awk '{print $2}' | xargs basename -a | awk 'BEGIN{print "Identical Files:  "}; {print $1}'

$ sudo systemctl start k3s.service
$ sudo systemctl status k3s.service
$ sudo chmod 644 /etc/rancher/k3s/k3s.yaml
$ kubectl get nodes

pass on
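With a live workload, the final readiness check has to cover every node in the cluster. A sketch of polling for it: `not_ready_nodes` and `wait_for_nodes_ready` are hypothetical helpers, the 5-second poll interval and 120-second default timeout are arbitrary, and the parsing assumes the default column layout of `kubectl get nodes --no-headers` (NAME, then STATUS).

```shell
# not_ready_nodes: read `kubectl get nodes --no-headers` output on stdin and
# print the names of nodes whose STATUS does not start with "Ready".
# Note: "Ready,SchedulingDisabled" is treated as ready by this check.
not_ready_nodes() {
    awk 'NF && $2 !~ /^Ready/ { print $1 }'
}

# wait_for_nodes_ready [TIMEOUT_SECONDS]: poll until every node is Ready.
# Returns 0 on success, 1 if the timeout (default 120) is reached.
wait_for_nodes_ready() {
    wf_timeout=${1:-120}
    wf_elapsed=0
    while [ "$wf_elapsed" -lt "$wf_timeout" ]; do
        # An empty result means the API server is not answering yet.
        wf_nodes=$(kubectl get nodes --no-headers 2>/dev/null) || wf_nodes=""
        if [ -n "$wf_nodes" ] && [ -z "$(printf '%s\n' "$wf_nodes" | not_ready_nodes)" ]; then
            return 0
        fi
        sleep 5
        wf_elapsed=$((wf_elapsed + 5))
    done
    return 1
}
```

Calling `wait_for_nodes_ready 300` after `systemctl start k3s.service` gives a pass or fail exit status instead of a manual look at the node list.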


6 rotate specific cert with a live workload on the cluster

PREREQUISITES: Deploy a workload to a cluster with 3 servers and 1 agent node.
Test on server:
  1. Start k3s server and wait for the single node to reach the Ready state.
  2. Stop k3s; verify this by checking the service status if necessary.
  3. Run k3s certificate rotate --service <VALUE> --debug
    This command backs up the related tls certs to an adjacent directory /var/lib/rancher/k3s/server/tls-*
  4. After verifying that the certs have rotated, restart the server.
  5. Verify service health and that the node comes back up to the Ready state.
 On server:
$ k3s server
$ sudo systemctl stop k3s.service
$ sudo systemctl status k3s.service
$ sudo k3s --debug certificate rotate --service <VALUE> 
$ sudo diff -sr /var/lib/rancher/k3s/server/tls /var/lib/rancher/k3s/server/tls-<GENERATED-DURING-ROTATE>/ | grep -i identical | awk '{print $2}' | xargs basename -a | awk 'BEGIN{print "Identical Files:  "}; {print $1}'

$ sudo systemctl start k3s.service
$ sudo systemctl status k3s.service
$ sudo chmod 644 /etc/rancher/k3s/k3s.yaml
$ kubectl get nodes


pass on

Functionality to be added

[design doc Certificate Rotation]

k3s certificate generate-csr
k3s certificate extend-expiry kube-scheduler
k3s certificate extend-expiry --all

Additional cases to be aware of: a malformed individual-service certificate rotation still triggered the certs to rotate, for example: k3s certificate rotate --service --help
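One way to guard against the malformed case in a test script is to validate the `--service` value before calling k3s. A minimal sketch: `is_valid_service` is a hypothetical helper, not a k3s feature, and it accepts a single service name only.

```shell
# Service names copied from the `k3s certificate -h` output.
VALID_SERVICES="admin api-server controller-manager scheduler k3s-controller k3s-server cloud-controller etcd auth-proxy kubelet kube-proxy"

# is_valid_service NAME: succeed only if NAME is exactly one listed service.
is_valid_service() {
    case "$1" in
        ""|*" "*) return 1 ;;   # empty or multi-word input
    esac
    case " $VALID_SERVICES " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example:
#   if is_valid_service "$svc"; then
#       sudo k3s certificate rotate --service "$svc"
#   fi
```

With this guard, a value such as `--help` is rejected before any rotation command is issued.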