k3s/docs/adrs/ca-cert-rotation.md
Brad Davidson 9b6b72941f Clarify ADR based on design review feedback
Signed-off-by: Brad Davidson <brad.davidson@rancher.com>
2023-02-06 15:09:31 -08:00


# Support CA Certificate Renewal / Rotation, Signing by External Root
Date: 2022-12-19
## Status
Accepted
## Context
On the first startup of a new cluster, K3s currently autogenerates a number of self-signed cluster CAs and keys:
* Cluster Server CA + Key (used to sign server certificates)
* Cluster Client CA + Key (used to sign client certificates)
* Request Header CA + Key (used to sign certificates for apiserver aggregation)
* etcd Peer CA + Key (used to sign certificates for authentication between etcd peer servers)
* etcd Client CA + Key (used to sign certificates for etcd clients, i.e. the apiserver)
* ServiceAccount Token Signing Key (used to sign ServiceAccount JSON Web Tokens)
These CAs are all self-signed, without any cross-signing, common root, or intermediates, and are valid for 10
years. When any of these CA certificates expires, every certificate it has issued becomes invalid, causing a
significant outage to the cluster.
### Server CA Pinning
The Cluster Server CA is used in node bootstrapping. The full `K10` format token includes a SHA256 sum of the
Cluster Server CA file's on-disk PEM representation. Nodes that join the cluster using a full token perform a
set of checks when starting up:
1. Download the cluster server CA bundle from `/v1-k3s/cacert` on the server they are joining.
2. Validate that the hash of the bytes in the CA bundle matches the hash string following the `K10` prefix in the
token.
3. Validate that the certificate presented by the server they are joining can be validated using the roots and
intermediates present in the CA bundle.
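
The hash comparison in step 2 can be sketched as follows. This is a minimal illustration rather than the K3s implementation: the function names are hypothetical, and the token layout (`K10`, then a 64-character hex digest, then a `::` separator before the credentials) is an assumption made for the example. Step 3 (chain validation) is omitted because it needs a full X.509 library.

```python
import hashlib
import hmac

TOKEN_PREFIX = "K10"


def ca_hash_from_token(token: str) -> str:
    """Extract the hex digest that follows the K10 prefix.

    Assumed layout (for illustration only): K10<64 hex chars>::<credentials>
    """
    if not token.startswith(TOKEN_PREFIX):
        raise ValueError("not a full K10 token")
    hash_part, sep, _credentials = token[len(TOKEN_PREFIX):].partition("::")
    if not sep or len(hash_part) != 64:
        raise ValueError("token does not carry a SHA256 CA hash")
    return hash_part.lower()


def ca_bundle_matches_token(ca_bundle_pem: bytes, token: str) -> bool:
    """Hash the exact bytes of the downloaded bundle and compare to the token."""
    actual = hashlib.sha256(ca_bundle_pem).hexdigest()
    # Constant-time comparison; both operands are ASCII hex strings.
    return hmac.compare_digest(actual, ca_hash_from_token(token))
```

Because the digest covers the raw bytes of the bundle, even a single added newline in the file served from `/v1-k3s/cacert` makes the check fail.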
Realistically, this hash should have instead been derived from the DER encoding of the root certificate in
that bundle, as PEM format allows for variable padding, line lengths, and so on. Only DER format is guaranteed
to be stable, and hashing only the root of the chain would have allowed for rotating or renewing intermediate
CAs without breaking trust between cluster nodes.
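
The difference between hashing PEM and hashing DER can be shown with a small sketch. The helper names are hypothetical, and the `der` value is placeholder bytes standing in for a real certificate; only the encoding behaviour matters here.

```python
import base64
import hashlib

HEADER = "-----BEGIN CERTIFICATE-----"
FOOTER = "-----END CERTIFICATE-----"


def to_pem(der: bytes, width: int) -> bytes:
    """Wrap DER bytes as PEM, breaking the base64 body every `width` characters."""
    body = base64.b64encode(der).decode("ascii")
    lines = [body[i:i + width] for i in range(0, len(body), width)]
    return ("\n".join([HEADER, *lines, FOOTER]) + "\n").encode("ascii")


def to_der(pem: bytes) -> bytes:
    """Recover the DER bytes from a single-certificate PEM block."""
    text = pem.decode("ascii")
    body = text.split(HEADER, 1)[1].split(FOOTER, 1)[0]
    return base64.b64decode("".join(body.split()))


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


# Placeholder bytes, not a real certificate.
der = bytes(range(256)) * 3

# The same certificate bytes, serialized with two different line lengths.
pem_64 = to_pem(der, 64)
pem_76 = to_pem(der, 76)

pem_hashes_equal = sha256_hex(pem_64) == sha256_hex(pem_76)
der_hashes_equal = sha256_hex(to_der(pem_64)) == sha256_hex(to_der(pem_76))
```

Here `pem_hashes_equal` is `False` while `der_hashes_equal` is `True`: re-wrapping the file changes the PEM hash even though the certificate itself is unchanged.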
### Bootstrap Data Immutability
There is not currently any way to write new certificates to the datastore. The certificates and keys are
written to disk once on initial startup, and from there written to the cluster datastore. From that point on,
the files in the datastore are considered authoritative; replacing the files on disk will result in either
the on-disk files being replaced, or an error, depending on whether or not the files on disk are newer than
those in the datastore.
The `secrets-encrypt` subcommand does currently mutate the bootstrap data, but it only touches the secrets
encryption configuration, not the CA certs or keys.
### Summary
For both of the above reasons (hash pinning, and lack of rewriteability) it is not currently possible to
renew or replace the cluster CA certs or keys.
### Additional Considerations
#### Aggressive Certificate Rotation
Some users (particularly government or financial customers) attempt to implement the guidance from [NIST SP 800-57
Part 1 Rev. 5](https://csrc.nist.gov/publications/detail/sp/800-57-part-1/rev-5/final). Following this guidance
would see users signing cluster CAs with a set of organizational root and intermediate certificates, and rotating
both the intermediate and cluster CA certificates and keys on at least a yearly basis.
#### ServiceAccount Signing Key Rolling Replacement
While the ServiceAccount signing key is not signed by any CA, rotation of the key must be done carefully so
as to avoid causing an outage. The apiserver and controller-manager must be updated to use a new key, while
still trusting the old key for a period of time. The old key can then be removed at a later date, once all
clients using tokens signed by the old key have received new tokens.
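
The rolling replacement can be modelled with a toy sketch. To stay within the standard library it uses HMAC with shared secrets as a stand-in for signing; real ServiceAccount tokens are JWTs signed with an asymmetric key pair, and all names and values below are hypothetical.

```python
import hashlib
import hmac


def sign(key: bytes, payload: bytes) -> bytes:
    """Stand-in for token signing (HMAC, not the real JWT signature scheme)."""
    return hmac.new(key, payload, hashlib.sha256).digest()


def verify(trusted_keys: list[bytes], payload: bytes, signature: bytes) -> bool:
    """Accept the token if any currently trusted key produced the signature."""
    return any(hmac.compare_digest(sign(k, payload), signature) for k in trusted_keys)


old_key, new_key = b"old-signing-key", b"new-signing-key"
payload = b"system:serviceaccount:default:example"

# Token issued before the rotation started.
old_token = sign(old_key, payload)

# Phase 1: new tokens are signed with the new key, but both keys are trusted.
trusted = [new_key, old_key]
new_token = sign(new_key, payload)
phase1 = (verify(trusted, payload, old_token), verify(trusted, payload, new_token))

# Phase 2: the old key is removed once clients have received new tokens.
trusted = [new_key]
phase2 = (verify(trusted, payload, old_token), verify(trusted, payload, new_token))
```

In this model `phase1` is `(True, True)` and `phase2` is `(False, True)`: removing the old key before clients have picked up new tokens is what would cause the outage.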
## Decision
* K3s will allow for use of CA certificates signed by an arbitrary set of external root/intermediate CAs.
* K3s will allow for non-disruptive[^1] renewal or replacement of the CA certificates and keys, if the cluster was
originally started using user-provided certificates signed by an external CA.
* K3s will allow for disruptive[^2] renewal or replacement of cluster CA certificates and keys, if the cluster was
originally started with autogenerated self-signed CAs.
* K3s will provide example tooling to allow users to generate cluster CA certificates and keys prior to initial
cluster startup, and provide tooling and process documentation to update the bootstrap data and prepare agents
to trust the new certificates (if necessary).
[^1]: Non-disruptive renewal requires no change to node configuration. The service only needs to be restarted.
[^2]: Disruptive renewal requires changes to the K3s CLI flags, configuration file, or environment variables
prior to restarting the service. Additionally, the cluster may experience a temporary outage while the
configuration change is being applied to all nodes, because cluster nodes temporarily do not share a common
root of trust.
## Consequences
This will require additional documentation, CLI subcommands, and QA work to validate the process steps.