Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
Support CA Certificate Renewal / Rotation, Signing by External Root
Date: 2022-12-19
Status
Accepted
Context
On the first startup of a new cluster, K3s currently autogenerates a number of self-signed cluster CAs and keys:
- Cluster Server CA + Key (used to sign server certificates)
- Cluster Client CA + Key (used to sign client certificates)
- Request Header CA + Key (used to sign certificates for apiserver aggregation)
- etcd Peer CA + Key (used to sign certificates for authentication between etcd peer servers)
- etcd Client CA + Key (used to sign certificates for etcd clients, i.e. the apiserver)
- ServiceAccount Token Signing Key (used to sign ServiceAccount JSON Web Tokens)
These CAs are all self-signed, without any cross-signing, common root, or intermediates, and are valid for 10 years. When any of these CA certificates expires, every certificate it has issued becomes invalid, causing a significant outage to the cluster.
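To illustrate why an expired CA takes its issued certificates down with it, here is a minimal sketch. It assumes a chain is usable only while every certificate in it is inside its validity window. The `Validity` type, the `chain_valid_at` helper, the leaf dates, and the approximation of 10 years as 3650 days are all hypothetical; real verification also checks signatures and other constraints.

```python
from datetime import datetime, timedelta, timezone
from typing import NamedTuple, Sequence


class Validity(NamedTuple):
    """Hypothetical stand-in for a certificate's notBefore/notAfter fields."""
    not_before: datetime
    not_after: datetime


def chain_valid_at(chain: Sequence[Validity], when: datetime) -> bool:
    """Return True only if every certificate in the chain is valid at `when`."""
    return all(c.not_before <= when <= c.not_after for c in chain)


# Self-signed CA created at first startup, valid for roughly 10 years (assumed 3650 days).
ca_issued = datetime(2022, 12, 19, tzinfo=timezone.utc)
ca = Validity(ca_issued, ca_issued + timedelta(days=3650))

# Leaf issued six months before the CA expires, nominally valid for one year.
leaf_issued = ca.not_after - timedelta(days=180)
leaf = Validity(leaf_issued, leaf_issued + timedelta(days=365))

before_ca_expiry = ca.not_after - timedelta(days=1)
after_ca_expiry = ca.not_after + timedelta(days=1)

print(chain_valid_at([leaf, ca], before_ca_expiry))
print(chain_valid_at([leaf, ca], after_ca_expiry))
```

In this sketch the leaf is still inside its own validity window after the CA expires, yet the chain as a whole no longer validates.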
Server CA Pinning
The Cluster Server CA is used in node bootstrapping. The full K10 format token includes a SHA256 sum of the Cluster Server CA file's on-disk PEM representation. Nodes that join the cluster using a full token perform a set of checks when starting up:
- Download the cluster server CA bundle from /v1-k3s/cacert on the server they are joining.
- Validate that the hash of the bytes in the CA bundle matches the hash string following the K10 prefix in the token.
- Validate that the certificate presented by the server they are joining can be validated using the roots and intermediates present in the CA bundle.
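The hash check in the second step can be sketched as follows. This is an illustration, not the K3s implementation: it assumes the token has the shape `K10<hex hash>::<credentials>`, where the `::` separator and the hex encoding are assumptions not stated in this document, and the function names are hypothetical. The download and the TLS chain validation steps are not shown.

```python
import hashlib

TOKEN_PREFIX = "K10"


def ca_hash_from_token(token: str) -> str:
    """Extract the pinned CA hash from a full token (assumed format: K10<hash>::<credentials>)."""
    if not token.startswith(TOKEN_PREFIX):
        raise ValueError("token does not start with the K10 prefix")
    body = token[len(TOKEN_PREFIX):]
    ca_hash, separator, _credentials = body.partition("::")
    if not separator or not ca_hash:
        raise ValueError("token does not contain a CA hash")
    return ca_hash.lower()


def bundle_matches_token(ca_bundle_pem: bytes, token: str) -> bool:
    """Compare the SHA256 of the raw PEM bytes with the hash pinned in the token."""
    expected = ca_hash_from_token(token)
    actual = hashlib.sha256(ca_bundle_pem).hexdigest()
    return actual == expected


# Placeholder bundle; the payload is not a real certificate.
bundle = b"-----BEGIN CERTIFICATE-----\nAAAA\n-----END CERTIFICATE-----\n"
token = TOKEN_PREFIX + hashlib.sha256(bundle).hexdigest() + "::server:example-secret"

print(bundle_matches_token(bundle, token))
print(bundle_matches_token(bundle + b"\n", token))
```

Because the hash covers the raw bytes of the file, even a single extra trailing newline in the bundle is enough to make the comparison fail.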
Realistically, this hash should have instead been derived from the DER encoding of the root certificate in that bundle, as PEM format allows for variable padding, line lengths, and so on. Only DER format is guaranteed to be stable, and hashing only the root of the chain would have allowed for rotating or renewing intermediate CAs without breaking trust between cluster nodes.
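The difference in stability between the two encodings can be demonstrated with a small sketch. The helpers `pem_encode` and `pem_decode` are hypothetical and deliberately simplified, and the payload is arbitrary bytes standing in for a DER-encoded certificate. The same payload wrapped at two different line lengths yields different PEM hashes but an identical DER hash.

```python
import base64
import hashlib

HEADER = "-----BEGIN CERTIFICATE-----"
FOOTER = "-----END CERTIFICATE-----"


def pem_encode(der: bytes, width: int) -> bytes:
    """Wrap base64-encoded DER bytes in PEM armor with the given line width."""
    b64 = base64.b64encode(der).decode("ascii")
    lines = [b64[i:i + width] for i in range(0, len(b64), width)]
    return "\n".join([HEADER, *lines, FOOTER, ""]).encode("ascii")


def pem_decode(pem: bytes) -> bytes:
    """Recover the DER bytes from a single-certificate PEM block."""
    lines = [line.strip() for line in pem.decode("ascii").splitlines()]
    payload = "".join(line for line in lines if line and not line.startswith("-----"))
    return base64.b64decode(payload)


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


der = bytes(range(256))  # placeholder bytes, not a real certificate
pem_64 = pem_encode(der, 64)
pem_76 = pem_encode(der, 76)

print(sha256_hex(pem_64) == sha256_hex(pem_76))
print(sha256_hex(pem_decode(pem_64)) == sha256_hex(pem_decode(pem_76)))
```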
Bootstrap Data Immutability
There is currently no way to write new certificates to the datastore. The certificates and keys are written to disk once on initial startup, and from there written to the cluster datastore. From that point on, the copies in the datastore are considered authoritative; replacing the files on disk will result in those files either being replaced again or causing an error, depending on whether or not the files on disk are newer than those in the datastore.
The secrets-encrypt subcommand does currently mutate the bootstrap data, but it only touches the secrets encryption configuration, not the CA certs or keys.
Summary
For both of the above reasons (hash pinning and the immutability of the bootstrap data), it is not currently possible to renew or replace the cluster CA certs or keys.
Additional Considerations
Aggressive Certificate Rotation
Some users (particularly government or financial customers) attempt to implement the guidance from NIST SP 800-57 Part 1 Rev. 5. Following this guidance would see users signing cluster CAs with a set of organizational root and intermediate certificates, and rotating both the intermediate and cluster CA certificates and keys on at least a yearly basis.
ServiceAccount Signing Key Rolling Replacement
While the ServiceAccount signing key is not signed by any CA, rotation of the key must be done carefully so as to avoid causing an outage. The apiserver and controller-manager must be updated to use a new key, while still trusting the old key for a period of time. The old key can then be removed at a later date, once all clients using tokens signed by the old key have received new tokens.
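The rolling replacement described above can be sketched as three phases of a trusted-key set. This is only an illustration: HMAC is used here as a stdlib stand-in for the asymmetric signatures that real ServiceAccount tokens carry, and the key values, payload, and function names are hypothetical.

```python
import hashlib
import hmac
from typing import Sequence


def sign(key: bytes, payload: bytes) -> bytes:
    """Stand-in signature: HMAC-SHA256 instead of a real asymmetric signature."""
    return hmac.new(key, payload, hashlib.sha256).digest()


def verify(trusted_keys: Sequence[bytes], payload: bytes, signature: bytes) -> bool:
    """Accept the token if any currently trusted key produced its signature."""
    return any(hmac.compare_digest(sign(key, payload), signature) for key in trusted_keys)


old_key = b"old-signing-key"
new_key = b"new-signing-key"
payload = b"system:serviceaccount:default:example"

old_token = sign(old_key, payload)  # issued before the rotation
new_token = sign(new_key, payload)  # issued after switching the signing key

phase_1 = [old_key]           # before rotation: only the old key is trusted
phase_2 = [new_key, old_key]  # during rotation: sign with new, trust both
phase_3 = [new_key]           # after all clients hold new tokens: old key removed

print(verify(phase_2, payload, old_token), verify(phase_2, payload, new_token))
print(verify(phase_3, payload, old_token), verify(phase_3, payload, new_token))
```

Skipping the second phase corresponds to the outage the paragraph warns about: tokens signed by the old key stop validating before their holders have received replacements.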
Decision
- K3s will allow for use of CA certificates signed by an arbitrary set of external root/intermediate CAs.
- K3s will allow for non-disruptive [1] renewal or replacement of the CA certificates and keys, if the cluster was originally started using user-provided certificates signed by an external CA.
- K3s will allow for disruptive [2] renewal or replacement of cluster CA certificates and keys, if the cluster was originally started with autogenerated self-signed CAs.
- K3s will provide example tooling to allow users to generate cluster CA certificates and keys prior to initial cluster startup, and provide tooling and process documentation to update the bootstrap data and prepare agents to trust the new certificates (if necessary).
Consequences
This will require additional documentation, CLI subcommands, and QA work to validate the process steps.
[1] Non-disruptive renewal requires no change to node configuration. The service only needs to be restarted.
[2] Disruptive renewal requires changes to the K3s CLI flags, configuration file, or environment variables prior to restarting the service. Additionally, the cluster may experience a temporary outage while the configuration change is being applied to all nodes, due to cluster nodes temporarily not sharing a common root of trust.