The Salesforce Platform-as-a-Service Security Assurance team continuously assesses modern compute platforms for their security level and features. We use the insights from these research efforts to provide fast and comprehensive support to engineering teams who explore platform options that adequately support their security requirements. Unsurprisingly, Kubernetes is one of the platforms that we deal with on a regular basis. And while Kubernetes multi-tenancy has been a hot topic for a while now, both from an engineering and a security perspective, we were missing one aspect when looking at its isolation capabilities.
There are many great publications available that focus on container runtimes; however, far fewer take a look at the Kubernetes control plane with multi-tenancy in mind. We want to start closing this gap by sharing our analysis results with the community, as we believe that knowledge must be shared to do good and build trust.
In this post, we will
- define the scope of our work and why the control plane was relevant to assess in the context of multi-tenancy,
- share how node-based multi-tenancy might be implemented with Kubernetes,
- provide an architecture overview, and
- describe relevant threats and the corresponding vulnerability analysis results.
This work was supported by Atredis Partners, who performed a good part of the architecture and vulnerability review. Their full report is linked below in Conclusions.
Scope & Definitions
Our working scenario comprised an attacker who gained access to a cluster and was able to break out of a container. While there are a lot of options to harden and monitor the container runtime to quite a high security standard, we wanted to evaluate the Kubernetes control plane as a second line of defense, which led us to the following questions:
- Is it possible to schedule pods of one tenant exclusively onto a defined set of worker nodes?
- Does the Kubernetes control plane contain vulnerabilities that disqualify it as a second line of isolation?
CVE-2020-8559 and the following quote from Kubernetes’ own 2019 security assessment emphasized the need for the more thorough analysis that we were about to perform:
This quote also introduces the need to define the term hostile multi-tenancy, where the platform provider must assume that adversaries will gain access to one or more tenant environments at some point. While most public cloud providers must operate under this assumption, it is a relevant differentiation from common intra-organizational multi-tenancy requirements.
When exploring options to restrict tenants to specific worker nodes, it is important to understand whether your tenants interact directly with the Kubernetes API, e.g. using a restricted user account to manage resources in their namespaces. If tenants have API access, it must also be ensured that they cannot modify the controls that restrict pod scheduling to their assigned nodes. The following list describes options to restrict pods to nodes as well as considerations for preventing user accounts from tampering with those restrictions.
- Taints are applied to nodes and tolerations are part of the pod spec, defining which taints they tolerate for scheduling.
- As they are part of the pod spec, tolerations defined per pod cannot be restricted via Kubernetes’ RBAC. Preventing a user from scheduling pods onto nodes of other tenants would require a (custom) admission controller or a policy engine such as OPA Gatekeeper or Kyverno, the latter of which has sample policies for node selectors/affinities.
- The PodTolerationRestriction admission controller is in alpha state and provides the option to enforce tolerations on namespaces, thus effectively providing a built-in option to restrict customer namespaces to defined nodes.
- Node selectors allow pods to select nodes to run on based on node labels. As the selectors are defined as part of the pod spec, strong restriction faces the same challenges as described in Taints & Tolerations.
- For NodeSelectors, the admission controller PodNodeSelector can be used to enforce a mapping between namespaces and nodes.
- PodNodeSelector is still in alpha state but should stay around even though previous discussion indicated that it might be deprecated at some point after Node Affinity is stable and feature complete.
- The concept of node (anti-)affinity provides more fine-grained scheduling control; however, the challenges from our security point of view are comparable to Taints & Tolerations: affinity is part of the pod spec, and no admission controller is readily available to enforce affinity beyond the control of API users.
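To make the challenge above more concrete, the following is a minimal sketch of the kind of check a custom admission controller or policy engine would have to perform on each pod spec. It is not taken from the report or from any existing policy: the namespace-to-tenant mapping, the `tenant` label/taint key, and the function name are all hypothetical, and the rule set is deliberately incomplete. The `nodeName` check is an addition of this sketch, since a pod that sets `spec.nodeName` is not placed by the scheduler at all.

```python
from __future__ import annotations

# Hypothetical convention: every tenant node carries the label tenant=<id>
# and the taint tenant=<id>:NoSchedule. All names below are assumptions.
TENANT_KEY = "tenant"
NAMESPACE_TO_TENANT = {"tenant-a": "a", "tenant-b": "b"}


def validate_pod(namespace: str, pod_spec: dict) -> list[str]:
    """Return a list of policy violations; an empty list means 'allow'."""
    tenant = NAMESPACE_TO_TENANT.get(namespace)
    if tenant is None:
        return [f"namespace {namespace!r} is not mapped to a tenant"]

    violations = []

    # A pod with spec.nodeName set is bound directly, without the scheduler.
    if pod_spec.get("nodeName"):
        violations.append("spec.nodeName is set, which bypasses the scheduler")

    # The node selector must pin the pod to the tenant's own nodes.
    selector = pod_spec.get("nodeSelector") or {}
    if selector.get(TENANT_KEY) != tenant:
        violations.append(f"nodeSelector must contain {TENANT_KEY}={tenant}")

    # Tolerations must not match the taints of other tenants' nodes.
    for tol in pod_spec.get("tolerations") or []:
        key = tol.get("key", "")
        operator = tol.get("operator", "Equal")
        if key == "":
            # An empty key with operator Exists tolerates every taint.
            violations.append("wildcard toleration (empty key) is not allowed")
        elif key == TENANT_KEY and (operator != "Equal" or tol.get("value") != tenant):
            violations.append(f"toleration for {TENANT_KEY!r} must equal {tenant!r}")

    return violations
```

A real deployment would wire such a check into a validating admission webhook or express it as a policy for the engine of choice, and would have to cover every pod-creating resource the tenant can submit.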
The considerations described above illustrate two things:
- Leveraging dedicated nodes for isolation requires careful selection and design of controls that enforce the pod/node mapping for the user.
- If you are designing a custom control plane, you should make it a goal to not give users direct Kubernetes API access — this will help protect against various other threats as well.
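As one illustration of a built-in pod/node mapping, the sketch below generates a Namespace manifest carrying the annotations that the PodNodeSelector and PodTolerationRestriction admission controllers read. The annotation keys are the ones documented for these two controllers; the `tenant` label/taint convention and the helper name are assumptions made for this example, and both admission plugins would need to be enabled on the kube-apiserver for the annotations to have any effect.

```python
import json


def tenant_namespace(name: str, tenant: str, key: str = "tenant") -> dict:
    """Build a Namespace manifest that maps a namespace to one tenant's nodes.

    Assumes (hypothetically) that tenant nodes are labeled <key>=<tenant>
    and tainted <key>=<tenant>:NoSchedule.
    """
    toleration = {
        "key": key,
        "operator": "Equal",
        "value": tenant,
        "effect": "NoSchedule",
    }
    return {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {
            "name": name,
            "annotations": {
                # Read by PodNodeSelector: selector applied to all pods.
                "scheduler.alpha.kubernetes.io/node-selector": f"{key}={tenant}",
                # Read by PodTolerationRestriction: tolerations added by
                # default, and the only tolerations pods may carry.
                "scheduler.alpha.kubernetes.io/defaultTolerations": json.dumps([toleration]),
                "scheduler.alpha.kubernetes.io/tolerationsWhitelist": json.dumps([toleration]),
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(tenant_namespace("tenant-a", "a"), indent=2))
```

This mapping only holds if tenants cannot modify their own Namespace object, which is one more reason to avoid handing out direct Kubernetes API access.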
We also published a repository with code samples to provide a starting point for experimenting with the options above.
Architecture & Data Flow
The following diagram was developed by Atredis Partners to identify the most relevant attack surface:
The report linked in Conclusions provides a detailed threat analysis for the Kubernetes control plane. The following list describes control plane aspects (as illustrated in Architecture & Data Flow) with regard to the overall threat of an adversary with node access attacking the Kubernetes control plane:
- kube-apiserver is the central component of the Kubernetes control plane and thus a particularly interesting target as potential vulnerabilities can result in a manager node compromise, bypassing scheduling restrictions, or extracting sensitive tenant data.
- kube-scheduler and kube-controller-manager are not directly exposed to nodes but process data that can be tampered with by compromised worker nodes — thus making vulnerabilities in them critical as well.
- kubelet and kube-proxy may be attacked to achieve worker-to-worker horizontal privilege escalation (if a relevant vulnerability is identified) in addition to worker-to-manager vertical privilege escalation.
- DNS, etcd, CNI/Overlay Network: The implementations of CoreDNS, etcd, and the network plugin of choice were out of scope of this assessment; configuration issues were in scope. The Results also emphasize the need for a more thorough analysis of the multi-tenancy capabilities of those components.
Results
The vulnerability assessment did not identify any severe vulnerabilities in the Kubernetes control plane. The following list summarizes a few important insights to consider when engineering a secure control plane:
- kubelet: If deployed with kubeadm, kubelet is configured to support token-based authentication. These tokens can be intercepted by a node-level attacker and be used to authenticate with other nodes. This authentication mode is not used by default within the control plane; however, if not disabled, an additional component might use it.
- kube-apiserver: If you are using bootstrap tokens to join additional nodes, ensure that they are created with a short-lived expiration date and that the bootstrap tokens are not leaked onto worker nodes (e.g. via shell history or temporary files). While we were looking into bootstrap tokens, a bug was closed that may have had security implications as well, which emphasizes the need to properly protect those tokens.
- DNS: If your cluster is leveraging the CoreDNS kubernetes plugin, DNS queries can be used to query a variety of entities (such as services and endpoints) in your cluster. Depending on your service names, this can leak relevant information to an attacker.
- CNI/Overlay Network: The overlay network was out of scope of this assessment; however, you must assess the overlay networking plugin of your choice for its multi-tenancy capabilities. The recently published research on BGP hijacking within a cluster illustrates the need for a multi-tenancy-aware networking plugin: a node-level attacker can simply take the place of, e.g., a legitimate BGP peer without the need for network-level hijacking. The popular overlay plugin Calico, for example, uses no BGP password by default and, even if a password were set, an attacker could retrieve it by authenticating with Calico’s service account token.
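The bootstrap token recommendation above lends itself to a simple automated check. The following sketch inspects a Secret object (as a dictionary, e.g. parsed from `kubectl get secret -n kube-system -o json`) and reports bootstrap tokens that never expire or that live longer than a chosen limit. The 24-hour limit and the function name are assumptions for illustration; the sketch also assumes the `expiration` field uses the UTC `Z` timestamp form.

```python
from __future__ import annotations

import base64
from datetime import datetime, timedelta, timezone

# Assumed policy for this sketch: tokens may live at most 24 hours.
MAX_LIFETIME = timedelta(hours=24)


def bootstrap_token_findings(secret: dict, now: datetime) -> list[str]:
    """Return findings for one Secret; non-bootstrap Secrets yield none."""
    if secret.get("type") != "bootstrap.kubernetes.io/token":
        return []

    name = secret.get("metadata", {}).get("name", "<unnamed>")
    raw = (secret.get("data") or {}).get("expiration")
    if raw is None:
        return [f"{name}: no expiration set, token never expires"]

    # Secret data is base64-encoded; the value is an RFC 3339 timestamp.
    # Only the UTC 'Z' form is handled here (an assumption of this sketch).
    text = base64.b64decode(raw).decode()
    expires = datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)

    if expires - now > MAX_LIFETIME:
        return [f"{name}: expires at {text}, beyond the allowed lifetime"]
    return []
```

Such a check complements, but does not replace, keeping token values out of shell history and temporary files on worker nodes.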
Conclusions
Our analysis shows that node-based tenant isolation can be used as a hardening layer. This hardening layer should only be used in addition to thorough container runtime hardening and monitoring, which is a crucial first step in securing the cluster and for which we included a few references below for completeness' sake.
Node-based isolation is not a one-size-fits-all solution and requires careful engineering of your custom control plane (e.g. when enforcing the node-based isolation) and overall ecosystem (DNS, overlay network). Virtual cluster approaches like vcluster or Cluster API could be explored to see whether a worker node-based isolation of the different virtual/per-tenant control planes offers advantages over existing cluster-as-a-service offerings.
We want to thank Atredis Partners as well as Kubernetes SIG Security and the Multi-Tenancy WG for the great collaboration, and we are looking forward to seeing other control plane analysis projects being started based on our data!
The full report as well as configuration samples can be found here.