In this chapter we will focus on networking aspects of your workloads. We will first review the defaults that Kubernetes proper comes equipped with and what else is readily available due to integrations. We cover networking topics including East-West and North-South traffic—that is, intra-pod and inter-pod communication, communication with the worker node (hosts), cluster-external communication, workload identity, and encryption on the wire.
In the second part of this chapter we look at two more recent additions to the Kubernetes networking toolbox: service meshes and the Linux kernel extension mechanism eBPF. We try to give you a rough idea of whether, how, and where you can benefit from both going forward.
As you can see in Figure 5-1, there are many moving parts in the networking space.
The good news is that most if not all of the protocols should be familiar to you, since Kubernetes uses the standard Internet Engineering Task Force (IETF) suite of networking protocols, from the Internet Protocol to the Domain Name System (DNS). What changes, really, is the scope and generally the assumptions about how the protocols are used. For example, when deployed on a worldwide scale, it makes sense to make the time-to-live (TTL) of a DNS record months or longer.
In the context of a container that may run for hours or days at best, this assumption doesn’t hold anymore. Clever adversaries can exploit such assumptions and as you should know by now, that’s exactly what the Captain would do.
In this chapter we will focus on the protocols most often used in Kubernetes—and their weak points with respect to workloads. As Captain Hashjack likes to say, “loose lips sink ships,” so we’ll first explore the permissive networking defaults, then show how to attack them, and finally discuss the controls you can implement to detect and mitigate these attacks.
By defaults we mean the default configuration values of the components that you get when you use Kubernetes from source, in an unmodified manner. From a networking perspective, workloads in Kubernetes have the following setup:
Flat topology. Every pod can see and talk to every other pod in the cluster.
No securityContext. Workloads can escalate to host network interface controller (NIC).
No environmental restrictions. Workloads can query their host and cloud metadata.
No identity for workloads.
No encryption on the wire (between pods and cluster-external traffic).
While the preceding list might look scary, a different way to look at it might make it easier to assess the risks present. As depicted in Figure 5-2, the main communication paths in Kubernetes are as follows:
Intra-pod traffic: containers within a pod communicating (see the next section)
Inter-pod traffic: pods in the same cluster communicating (see “Inter-Pod Traffic”)
Pod-to-worker node traffic (see “Pod-to-Worker Node Traffic”)
Cluster-external traffic: communication of pods with the outside world (see “Cluster-External Traffic”)
Let’s now have a closer look at the communication paths and other networking-relevant defaults in Kubernetes. Among other things, we’ll discuss “The State of the ARP”, “No securityContext”, “No Workload Identity”, and “No Encryption on the Wire”.
There are some aspects of the networking space that depend heavily on the environment in which Kubernetes is used. For example, when using hosted Kubernetes from one of the cloud providers, the control plane and/or data plane may or may not be publicly available. If you are interested in learning more about how the big three handle this, have a look at:
Amazon EKS private clusters
Azure AKS private clusters
Google GKE private clusters
Since this is not an intrinsic property of Kubernetes and many combinations are possible, we decided to exclude this topic from our discussion in this chapter.
So, are you ready to learn about the Kubernetes networking defaults?
The way intra-pod networking in Kubernetes works is as follows. An implicit, so-called pause container in a pod (cp in Figure 5-3) spans a Linux network namespace. Other containers in the pod, such as init containers (like ci1 and ci2), the main application container, and sidecars such as proxies or logging containers (for example, c1 to c3), then join the pause container’s network and IPC namespaces. The pause container has the network bridge mode enabled, and all the other containers in the pod share its namespace via container mode.
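To make this tangible, here is a minimal sketch of a two-container pod (the image choices and names are ours, purely for illustration). Because both containers join the pause container’s network namespace, the sidecar can reach the main container over localhost, and both report the same pod IP:

apiVersion: v1
kind: Pod
metadata:
  name: shared-netns-demo
spec:
  containers:
  - name: main
    image: nginx:alpine            # listens on port 80
  - name: sidecar
    image: curlimages/curl:latest
    # reaches the main container via the shared loopback interface
    command: ["sh", "-c", "while true; do curl -s http://127.0.0.1 > /dev/null; sleep 5; done"]

If you kubectl exec into either container and inspect the network interfaces, you should see the identical pod IP, since the interfaces live in the pause container’s namespace.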
As discussed in Chapter 2, pods were designed to make it easy to lift and shift existing applications into Kubernetes, which has sobering security implications. Ideally, you rewrite the application so that the tight coupling of containers in a pod is not necessary, or you deploy traditional tooling in the context of a pod.
While the latter seems like a good idea initially, do remember that this is a stopgap measure at best. Once the boundaries are clear and effectively every microservice is deployed in its own pod, you can go ahead and use the techniques discussed in the next sections.
In addition, no matter if you’re looking at defense in depth in the context of a pod or cluster-wide, you can employ a range of dedicated open source and commercial container security offerings. See also the section “Networking” in Appendix B.
In a Kubernetes cluster, by default every pod can see and talk to every other pod. This default is a nightmare from a security perspective (or a free ride, depending on which side you’re on) and we cannot emphasize enough how dangerous this fact is.
No matter what your threat model is, this “all traffic is allowed” policy for both inter-pod and external traffic represents one giant attack vector. In other words, you should never rely on the Kubernetes defaults in the networking space, and you should never, ever run a Kubernetes cluster without restricting network traffic in some shape or form. For a practical example of how you can go about this, have a look at “Traffic Flow Control”.
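To convince yourself of this default, a quick experiment along the following lines does the trick (the namespaces, names, and pod IP shown are made up for illustration); any pod can reach any other pod by its IP, even across namespaces:

$ kubectl create ns victim
$ kubectl -n victim run web --image=nginx:alpine --port=80
$ kubectl -n victim get pod web -o jsonpath='{.status.podIP}'
10.0.1.23

$ kubectl create ns attacker
$ kubectl -n attacker run probe --rm -it --image=curlimages/curl -- \
    curl -s -o /dev/null -w "%{http_code}\n" http://10.0.1.23
200

Unless a network policy (and an enforcing CNI) says otherwise, the request succeeds.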
If not disabled, workloads can query the worker node (host) they are running on as well as the (cloud) environments they are deployed into.
By default no protection exists for worker nodes, which are routable from the pod network (CNI). Further, the worker nodes may be able to access cloud resources, datastores, and API servers. Some cloud providers, notably Google, offer solutions for this issue; see, for example, Shielded GKE Nodes.
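For example, unless the provider’s metadata protections are in place, a pod can often reach the cloud metadata endpoint directly. A rough sketch of such a check (the endpoints shown are the well-known AWS EC2 and GCE paths; your environment may block them or require session tokens):

$ kubectl run metaprobe --rm -it --image=curlimages/curl -- sh
# AWS EC2 instance metadata (IMDSv1-style request):
/ $ curl -s http://169.254.169.254/latest/meta-data/
# GCE/GKE instance metadata (requires the Metadata-Flavor header):
/ $ curl -s -H "Metadata-Flavor: Google" http://169.254.169.254/computeMetadata/v1/

If either call returns data, an attacker in a compromised pod can likely harvest credentials or node details from it.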
For cloud environments in general, good practices exist. For example, Amazon EKS recommends restricting access to instance metadata, and GKE equally documents how to protect cluster metadata.
Further, commercial offerings like Nirmata’s Virtual Clusters and Workload Policies can be used in this context.
To allow pods to communicate with cluster-external endpoints, Kubernetes has added a number of mechanisms over time. The most recent and widely used is called an Ingress. This allows for layer 7 routing (HTTP), whereas for other use cases such as layer 3/4 routing you would need to use older, less convenient methods. See also Publishing Services (ServiceTypes) in the docs.
In order for you to use the Ingress resource, you will need to pick an ingress controller. You have many choices, oftentimes open source-based; Ambassador, which we use later in this chapter, is one example.
In addition, cloud providers usually provide their own solutions, integrated with their managed load-balancing services.
Encryption on the wire (TLS) is almost the default nowadays, and most Ingress solutions support it out of the box. Alternatively, you can use a service mesh for securing your North-South traffic (see “Service Meshes”).
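For illustration, terminating TLS at the Ingress typically boils down to referencing a certificate stored in a Kubernetes Secret of type kubernetes.io/tls; a sketch (the host and secret names are placeholders):

kind: Ingress
apiVersion: extensions/v1beta1
metadata:
  name: example-tls
spec:
  tls:
  - hosts:
    - app.example.com
    secretName: app-example-com-tls   # Secret holding tls.crt and tls.key
  rules:
  - host: app.example.com
    http:
      paths:
      - path: /
        backend:
          serviceName: app
          servicePort: 80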
Last but not least, on the application level you might want to consider using a web application firewall (WAF) such as offered by most cloud providers, or you can use a standalone offering such as Wallarm.
More and more practitioners are sharing their experiences in this space, so keep an eye out for blog posts and CNCF webinars covering this topic. See, for example, “Shaping Chick-fil-A One Traffic in a Multi-Region Active-Active Architecture”.
Address Resolution Protocol (ARP) is a link layer protocol used by the Internet Protocol (IP) to map IP network addresses to the hardware (MAC) addresses. Liz Rice showed in her KubeCon NA 2019 talk, “CAP_NET_RAW and ARP Spoofing in Your Cluster: It’s Going Downhill From Here”, how defaults allow us to open raw network sockets and how this can lead to issues.
It involves using ARP and DNS to fool a victim pod into visiting a fake URL, which is possible due to the way Kubernetes handles local FQDNs, and it requires that CAP_NET_RAW is available to a pod.
For more details, see the Aqua Security blog post “DNS Spoofing on Kubernetes Clusters”.
The good news is, there are defenses available to mitigate the ARP-based attacks and spoil the Captain’s mood:
Using Pod Security Policies (PSPs), as discussed in “Runtime Policies”, to drop CAP_NET_RAW (a pod-level snippet follows this list).
Using generic policy engines as described in “Generic Policy Engines” such as Open Policy Agent/Gatekeeper, or by using Kyverno to convert PSPs (also see this video).
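At the pod level, dropping the capability looks roughly like the following, regardless of whether a PSP, a policy engine, or the pod author enforces it:

apiVersion: v1
kind: Pod
metadata:
  name: no-raw-sockets
spec:
  containers:
  - name: main
    image: nginx:alpine
    securityContext:
      capabilities:
        drop:
        - NET_RAW   # no raw sockets, so no hand-crafted ARP or DNS packets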
How can you tell if you’re affected? Use kube-hunter, for example.
By default, workloads can escalate to the NIC of the worker node they are running on. For example, when running privileged containers, one can escape from the container using kernel modules. Further, as the Microsoft Azure team pointed out in its “Threat matrix for Kubernetes” blog post:
Attackers with network access to the host (for example, via running code on a compromised container) can send API requests to the Kubelet API. Specifically, querying https://[NODE IP]:10255/pods/ retrieves the running pods on the node, and https://[NODE IP]:10255/spec/ retrieves information about the node itself, such as CPU and memory consumption.
Naturally, one wants to avoid these scenarios and one way to go about this is to apply PSPs, as discussed in “Runtime Policies”.
For example, the Baseline/default policy defines the following (a minimal conforming pod spec follows the list):

Host namespaces must be disallowed: spec.hostNetwork, spec.hostPID, spec.hostIPC

Privileged containers must be disallowed: spec.containers[*].securityContext.privileged, spec.initContainers[*].securityContext.privileged

HostPorts should be disallowed or at minimum restricted to a known list: spec.containers[*].ports[*].hostPort, spec.initContainers[*].ports[*].hostPort
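Put differently, a workload that stays within these Baseline boundaries looks roughly like the following sketch, with host namespaces and privileged mode left at their (safe) defaults and no hostPort requested:

apiVersion: v1
kind: Pod
metadata:
  name: baseline-ok
spec:
  hostNetwork: false     # the default, spelled out for emphasis
  hostPID: false
  hostIPC: false
  containers:
  - name: main
    image: nginx:alpine
    securityContext:
      privileged: false  # must not be true under Baseline
    ports:
    - containerPort: 80  # note: no hostPort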
In addition, there are a number of commercial offerings, such as Palo Alto Networks Prisma Cloud (formerly Twistlock), that you can use to harden your worker nodes in this context.
By default, Kubernetes does not assign an identity to services. SPIFFE/SPIRE can be used to manage workload identities and enable mTLS. SPIFFE (Secure Production Identity Framework for Everyone) is a collection of specifications for securely identifying workloads. It provides a framework enabling you to dynamically issue an identity to a service across environments by defining short-lived cryptographic identity documents—called SPIFFE Verifiable Identity Documents (SVIDs)—via an API. Your workloads in turn can use these SVIDs when authenticating to other workloads. For example, an SVID can be used to establish a TLS connection or to validate a JWT.
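To give you a feel for how this looks in practice, a SPIFFE ID is simply a URI such as spiffe://example.org/frontend, and with SPIRE you register a workload by binding such an ID to selectors. A hedged sketch (the trust domain, IDs, and selectors are made up):

$ spire-server entry create \
    -parentID spiffe://example.org/k8s-node \
    -spiffeID spiffe://example.org/frontend \
    -selector k8s:ns:prod \
    -selector k8s:sa:frontend

The SPIRE agent on each node then issues short-lived X.509 (or JWT) SVIDs for that identity via the Workload API, and your services present them to each other when establishing mTLS.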
For workloads in regulated industries, that is, any kind of app that is required to conform to a (government issued) regulation, encryption on the wire—or encryption in transit, as it’s sometimes called—is typically one of the requirements. For example, if you have a Payment Card Industry Data Security Standard (PCI DSS)–compliant app as a bank, or a Health Insurance Portability and Accountability Act (HIPAA)–compliant app as a health care provider, you will want to make sure that the communication between your containerized microservices is protected against sniffing and person-in-the-middle attacks.
These days, the Transport Layer Security (TLS) protocol as defined in RFC 8446 and older IETF paperwork is usually used to encrypt traffic on the wire. It uses asymmetric encryption to agree on a shared secret negotiated at the beginning of the session (“handshake”) and in turn symmetric encryption to encrypt the workload data. This setup is a nice performance versus security trade-off.
While control plane components such as the API server, etcd, or a kubelet can rely on a PKI infrastructure out of the box, providing APIs and good practices for certificates, the same is sadly not true for your workloads.
You can see the API server’s hostname, and any IPs encoded into its TLS certificate, with openssl. You can find example code for this in “Control Plane”.
By default, the traffic between pods and to the outside world is not encrypted. To mitigate this, enable workload encryption on the wire, for example with Calico using WireGuard VPN, or with Cilium, which supports both WireGuard and IPsec. Service meshes are another option that provides not only this sort of encryption but also workload identity, as discussed in “No Workload Identity”. With the defaults out of the way, let’s move on to the threat modeling for the networking space.
The threat model in the networking space (see “Starting to Threat Model”), that is, the collection of identified networking vulnerabilities ranked according to the risk they pose, is what we’re focusing on in this section.
So, what is the threat model we consider in the networking space, with respect to workloads? What are our assumptions about what attackers could do to our precious workloads and beyond to the infrastructure?
The following observations should give you an idea about potential threat models. We illustrate these scenarios with some examples of past attacks, covering the 2018–2020 time frame:
Using the front door, for example via an ingress controller or a load balancer, and then either pivoting or performing a denial-of-service attack, such as observed in CVE-2020-15127
Using developer access paths like kubectl cp (CVE-2019-11249) or developer environments such as Minikube, witnessed in CVE-2018-1002103
Launching a pod with access to host networking or unnecessary capabilities, as we will further discuss in “The State of the ARP”
Leveraging a compromised workload to connect to another workload
Port scanning CNI plug-ins and using this information to identify vulnerabilities; for example, CVE-2019-9946
Attacking a control plane component such as the API server and etcd, or a kubelet or kube-proxy on the worker; for example, CVE-2020-8558, CVE-2019-11248, CVE-2019-11247, and CVE-2018-1002105
Performing server-side request forgery (SSRF); for example, concerning the hosting environment, like a cloud provider’s VMs
Performing person-in-the-middle attacks, such as seen in the context of IPv6 routing; see also CVE-2020-10749
Now that we have a basic idea of the potential threat model, let’s go through and see how the defaults can be exploited and defended against, in turn.
We’ve seen the networking defaults and what kind of communication paths are present in Kubernetes. In this section, we walk you through an end-to-end setup and show you how to secure the external traffic using network policies.
To demonstrate the networking defaults in action, let’s use kind, a tool for running local Kubernetes clusters using Docker containers.
Let’s create a kind cluster with networking prepared for Calico and enable Ingress (also see the documentation). We are using the following config:
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  kubeadmConfigPatches:
  - |
    kind: InitConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        node-labels: "ingress-ready=true"
  extraPortMappings:
  - containerPort: 80
    hostPort: 80
    protocol: TCP
  - containerPort: 443
    hostPort: 443
    protocol: TCP
- role: worker
networking:
  disableDefaultCNI: true
  podSubnet: 192.168.0.0/16
Enable Ingress for the cluster (extraPortMappings plus the ingress-ready node label).
Disable the native kindnet CNI.
In preparation for installing Calico, set the pod subnet to Calico’s default.
Assuming the preceding YAML snippet is stored in a file called cluster-config.yaml, you can now create the kind cluster as follows:
$ kind create cluster --name cnnp \
      --config cluster-config.yaml
Creating cluster "cnnp" ...
Note that if you do this for the first time, the preceding output might look different and it can take several minutes to pull the respective container images.
Next we install and patch Calico to make it work with kind. Kudos to Alex Brand for putting together the necessary patch instructions:
$ kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml
configmap/calico-config created
customresourcedefinition.apiextensions.k8s.io/bgpconfigurations.crd.projectcalico.org created
...
serviceaccount/calico-kube-controllers created

$ kubectl -n kube-system set env daemonset/calico-node FELIX_IGNORELOOSERPF=true
daemonset.apps/calico-node env updated
And to verify that everything is up and running as expected:

$ kubectl -n kube-system get pods | grep calico-node
calico-node-2j2wd   0/1   Running   0   18s
calico-node-4hx46   0/1   Running   0   18s
calico-node-qnvs6   0/1   Running   0   18s
Before we can deploy our app, we need one last bit of infrastructure in place, a load balancer, making the pods available to the outside world (your machine).
For this we use Ambassador as an ingress controller:
$ kubectl apply -f https://github.com/datawire/ambassador-operator/releases/latest/download/ambassador-operator-crds.yaml && \
  kubectl apply -n ambassador -f https://github.com/datawire/ambassador-operator/releases/latest/download/ambassador-operator-kind.yaml && \
  kubectl wait --timeout=180s \
      -n ambassador --for=condition=deployed \
      ambassadorinstallations/ambassador
customresourcedefinition.apiextensions.k8s.io/ambassadorinstallations.getambassador.io created
namespace/ambassador created
configmap/static-helm-values created
serviceaccount/ambassador-operator created
clusterrole.rbac.authorization.k8s.io/ambassador-operator-cluster created
clusterrolebinding.rbac.authorization.k8s.io/ambassador-operator-cluster created
role.rbac.authorization.k8s.io/ambassador-operator created
rolebinding.rbac.authorization.k8s.io/ambassador-operator created
deployment.apps/ambassador-operator created
ambassadorinstallation.getambassador.io/ambassador created
ambassadorinstallation.getambassador.io/ambassador condition met
Now we can launch the application, a web server. First off, we want to do all of the following in a dedicated namespace called npdemo, so let’s create one:

$ kubectl create ns npdemo
namespace/npdemo created
Next, create a YAML file called workload.yaml that defines a deployment, a service, and an ingress resource, in total representing our workload application:
kind: Deployment
apiVersion: apps/v1
metadata:
  labels:
    app: nginx
  name: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - image: nginx:alpine
        name: main
        ports:
        - containerPort: 80
---
kind: Service
apiVersion: v1
metadata:
  name: nginx
spec:
  selector:
    app: nginx
  ports:
  - port: 80
---
kind: Ingress
apiVersion: extensions/v1beta1
metadata:
  name: mainig
  annotations:
    kubernetes.io/ingress.class: ambassador
spec:
  rules:
  - http:
      paths:
      - path: /api
        backend:
          serviceName: nginx
          servicePort: 80
We configure the ingress in a way that if we hit the /api URL path, we expect it to route traffic to our nginx service.
Next, you want to create the resources defined in workload.yaml by using:
$ kubectl -n npdemo apply -f workload.yaml
deployment.apps/nginx created
service/nginx created
ingress.extensions/mainig created
When you now try to access the app as exposed in the Ingress resource you should be able to do the following (note that we’re only counting the lines returned to verify we get something back):
$ curl -s 127.0.0.1/api | wc -l
25
Wait. What just happened? We put an Ingress in front of the NGINX service and it happily receives traffic from outside? That can’t be good.
So, how can we keep the Captain and their crew from getting their dirty paws on our cluster? Network policies are coming to our rescue. While we will cover policies in a dedicated chapter (see Chapter 8), we point out network policies and their usage here since they are so useful and, given the “by default all traffic is allowed” attitude of Kubernetes, one can argue almost necessary.
While Kubernetes allows you to define and apply network policies out-of-the-box, you need something that enforces the policies you define and that’s the job of a provider.
For example, in the following walkthrough we will be using Calico; however, there are many more options available, such as the eBPF-based solutions discussed in “eBPF”.
We shut down all traffic with the following Kubernetes network policy in a file called, fittingly, np-deny-all.yaml:
kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: deny-all
spec:
  podSelector: {}
  policyTypes:
  - Ingress
Network policies are notoriously difficult to get right, so in this context, you may want to check out the following:
To help you edit and visualize network policies, check out the tool available from networkpolicy.io.
To debug network policy you can use krew-net-forward.
To test policies, have a look at netassert.
So let’s apply the preceding network policy and see if we can still access the app from outside of the cluster:
$ kubectl -n npdemo apply -f np-deny-all.yaml
networkpolicy.networking.k8s.io/deny-all created

$ kubectl -n npdemo describe netpol deny-all
Name:         deny-all
Namespace:    npdemo
Created on:   2020-09-22 10:39:27 +0100 IST
Labels:       <none>
Annotations:  <none>
Spec:
  PodSelector:     <none> (Allowing the specific traffic to all pods in this namespace)
  Allowing ingress traffic:
    <none> (Selected pods are isolated for ingress connectivity)
  Not affecting egress traffic
  Policy Types: Ingress
And this should fail now, based on our network policy (giving it a 3-second time out, just to be sure):
$ curl --max-time 3 127.0.0.1/api
curl: (28) Operation timed out after 3005 milliseconds with 0 bytes received
If you only have kubectl available, you can still make raw network requests, as Rory McCune pointed out:
kubectl --insecure-skip-tls-verify -s bbc.co.uk get --raw /
Of course, it shouldn’t be in your container image in the first place!
We hope by now you get an idea how dangerous the defaults are—all network traffic to and from pods is allowed—and how you can defend against it.
Learn more about network policies, including recipes, tips, and tricks, via the resources we put together in Appendix B.
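As a small taste of such a recipe, the following sketch would re-admit ingress traffic to the nginx pods, but only from pods in the ambassador namespace (it assumes you have labeled that namespace, for example with name: ambassador, which you would need to set up and verify yourself):

kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: allow-from-ingress-controller
spec:
  podSelector:
    matchLabels:
      app: nginx
  policyTypes:
  - Ingress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          name: ambassador   # label you attach to the ambassador namespace
    ports:
    - protocol: TCP
      port: 80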
In addition to network policies, some cloud providers offer other native mechanisms to restrict traffic from/to pods; for example, see AWS security groups for pods.
Finally, don’t forget to clean up your Kubernetes cluster using kind delete cluster --name cnnp once you’re done exploring the topic of network policies.
Now that we’ve seen a concrete networking setup in action, let’s move on to a different topic: service meshes. This relatively recent technology can help you in addressing some of the not-so-secure defaults discussed earlier, including workload identity and encryption on the wire.
A somewhat advanced topic, a service mesh is in a sense complementary to Kubernetes and can be beneficial in a number of use cases. Let’s have a look at how the most important workload-level networking issues can be addressed using a service mesh.
A service mesh, as conceptually shown in Figure 5-4, is, per its creators, a collection of userspace proxies in front of your apps, along with a management process to configure said proxies.
The proxies are referred to as the service mesh’s data plane, and the management process as its control plane. The proxies intercept calls between services and do something interesting with or to these calls, such as disallowing a certain communication path or collecting metrics from the call. The control plane, on the other hand, coordinates the behavior of the proxies and provides the administrator an API.
At the time of writing, a number of service meshes exist, as well as proposed quasi-standards for interoperability, such as the CNCF project Service Mesh Interface or the work of the Envoy-based Universal Data Plane API Working Group (UDPA-WG).
While it is early days, we are witnessing a certain uptake, especially driven by security considerations (see Figure 5-5). For example, The New Stack (TNS) reports in its 2020 Service Mesh survey:
A third of respondents’ organizations are using service meshes to control communications traffic between microservices in production Kubernetes environments. Another 34% use service mesh technology in a test environment, or are piloting or actively evaluating solutions.
Going forward, many exciting application areas and nifty defense mechanisms based on service meshes are possible—for example, Identity Federation for Multi-Cluster Kubernetes and Service Mesh or using OPA in Istio. That said, many end users are not yet ready to go all in and/or are in a holding pattern, waiting for cloud and platform providers to make the data plane of the service mesh part of the underlying infrastructure. Alternatively, the data plane may be implemented on the operating system level, for example, using eBPF.
Linkerd is a graduated CNCF project, originally created by Buoyant.
Linkerd automatically enables mutual Transport Layer Security (mTLS) for most HTTP-based communication between meshed pods. Let’s see that in action.
To follow along, install Linkerd in a test cluster. We’re using kind in the following example and assume you have both the Kubernetes cluster set up and configured as well as the Linkerd CLI installed:
$ linkerd check --pre
kubernetes-api
...
Status check results are √
Now that we know that we’re in a position to install Linkerd, let’s go ahead and do it:
$ linkerd install | kubectl apply -f -
namespace/linkerd created
clusterrole.rbac.authorization.k8s.io/linkerd-linkerd-identity created
...
deployment.apps/linkerd-grafana created
And finally verify the install:
$ linkerd check
kubernetes-api
...
Status check results are √
Great! All up and running. You could have a quick look at the Linkerd dashboard using linkerd dashboard &, which should show something similar to what’s depicted in Figure 5-6.
OK, back to mTLS: once we have enabled the mesh in the respective namespaces, it should be impossible for us, even from within the cluster, to directly talk to a service using, say, curl and doing an HTTP query. Let’s see how that works.
In the following example, we’re reusing the setup from “Inter-Pod Traffic” but you can really use any workload that exposes an HTTP service within the cluster.
First, we need to enable the mesh, or meshify, as the good folks from Buoyant call it:
$ kubectl get -n npdemo deploy -o yaml | \
    linkerd inject - | kubectl apply -f -

$ kubectl get -n ambassador deploy -o yaml | \
    linkerd inject - | kubectl apply -f -
To validate our mTLS setup with tshark, we first deploy Linkerd’s emojivoto sample app with the debug sidecar enabled:

$ curl -sL https://run.linkerd.io/emojivoto.yml | \
    linkerd inject --enable-debug-sidecar - | \
    kubectl apply -f -
namespace "emojivoto" injected
...
deployment.apps/web created
Once the sample app is up and running, we can open a remote shell into the attached debug container that Linkerd kindly put there for us:
$ kubectl -n emojivoto exec -it \
    $(kubectl -n emojivoto get po -o name | grep voting) \
    -c linkerd-debug -- /bin/bash
Connect to the pod for interactive (terminal) use.
Provide the pod name for the exec command.
Target the linkerd-debug container in the pod.
Now, from within the debug container we use tshark to inspect the packets on the NIC, expecting to see TLS traffic (output edited to fit):
root@voting-57bc56-s4l:/# tshark -i any \
    -d tcp.port==8080,ssl | grep -v 127.0.0.1
Running as user "root" and group "root." This could be dangerous.
Capturing on 'any'
    1 0.000000000 192.168.49.192 → 192.168.49.231 TCP 76 41704 → 4191 [SYN] Seq=0...
    2 0.000023419 192.168.49.231 → 192.168.49.192 TCP 76 4191 → 41704 [SYN, ACK] ...
    3 0.000041904 192.168.49.192 → 192.168.49.231 TCP 68 41704 → 4191 [ACK] Seq=1...
    4 0.000356637 192.168.49.192 → 192.168.49.231 HTTP 189 GET /ready HTTP/1.1
    5 0.000397207 192.168.49.231 → 192.168.49.192 TCP 68 4191 → 41704 [ACK] Seq=1...
    6 0.000483689 192.168.49.231 → 192.168.49.192 HTTP 149 HTTP/1.1 200 OK
...
Listen on all available network interfaces for live packet capture.
Decode any traffic running over port 8080 as TLS.
Ignore 127.0.0.1 (localhost), as this traffic will always be unencrypted.
Yay, it works, encryption on the wire for free! And with this we’ve completed the mTLS case study.
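As a side note, besides raw packet captures, Linkerd itself can report whether the connections between meshed workloads are mTLS-secured. Depending on your Linkerd version, a command along the lines of the following should do (linkerd viz edges on 2.10 and later, plain linkerd edges on older releases; output abbreviated and purely illustrative):

$ linkerd viz edges deployment -n emojivoto
SRC    DST      SRC_NS      DST_NS      SECURED
web    voting   emojivoto   emojivoto   √
...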
If you want to learn more about how to use service meshes to secure your East-West communication, we have put together some suggested further reading in “Networking” in Appendix B.
While service meshes certainly can help you with networking-related security challenges, fending off the Captain and their crew, you should be aware of their weaknesses. For example, in Envoy-based systems, if you run a container with UID 1337 it bypasses the Istio/Envoy sidecar, and by default the Envoy admin dashboard is accessible from within the application container because they share a network namespace. For more background on this topic, check out the in-depth Istio Security Assessment.
Now it’s time to move on to the last part of the workload networking topic: what happens on a single worker node.
After the service mesh adventure, we now focus our attention on a topic that is, on the one hand, of an entirely different character and, on the other hand, can also be used to implement a service mesh data plane. We have a look at eBPF, a modern and powerful way to extend the Linux kernel, with which you can address a number of networking-related security challenges.
Originally, this piece of Linux kernel technology was known under the name Berkeley Packet Filter (BPF). Then it experienced a number of enhancements, mainly driven by Google, Facebook, and Netflix, and to distinguish it from the original implementation it was called eBPF. Nowadays, the kernel project and technology is commonly known as eBPF, which is a term in itself and does not stand for anything per se; that is, it’s no longer considered an acronym.
Technically, eBPF is a feature of the Linux kernel and you’ll need Linux kernel version 3.18 or above to benefit from it. It enables you to safely and efficiently extend Linux kernel functionality by using the bpf(2) syscall (see also the man pages for details). eBPF is implemented as an in-kernel virtual machine using a custom 64-bit RISC instruction set.
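To get a feel for what safely extending the kernel means in practice, without writing any C, a one-liner with bpftrace (one of the frontends mentioned below) attaches a small eBPF program to the execve tracepoint and prints every program started on the node; this is an illustrative example, not taken from any specific project:

# requires root and a bpftrace installation on the node
$ bpftrace -e 'tracepoint:syscalls:sys_enter_execve { printf("%s -> %s\n", comm, str(args->filename)); }'
Attaching 1 probe...
bash -> /usr/bin/ls
...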
In Figure 5-7 you see a high-level overview taken from Brendan Gregg’s Linux Extended BPF (eBPF) Tracing Tools (Addison-Wesley).
This all looks promising, but is eBPF already used in the wild, and also, which options are available? Let’s take a look.
In 2021, eBPF is already used in a number of places and for use cases such as:
In Kubernetes, as a CNI plug-in to enable, for example, pod networking in Cilium and Project Calico, as well as for service scalability (in the context of kube-proxy)
For observability, like for Linux kernel tracing such as with iovisor/bpftrace as well as in a clustered setup with Hubble
As a security control, for example to perform container runtime scanning as you can use with projects such as CNCF Falco, but also for enforcing network policies in Kubernetes (via Cilium, Calico, etc.) as discussed in “Traffic Flow Control”
Network load balancing like Facebook’s L4 katran library
Low-level intrusion detection systems (IDS) for Kubernetes (see Chapter 9 for details)
We see an increasing number of players entering the eBPF field, with Isovalent leading the charge. While it’s still early days from an adoption perspective, eBPF has huge potential. Coming back to the service mesh data plane: it is perfectly feasible to implement the Envoy APIs as a set of eBPF programs and push the handling from the userspace sidecar proxy into the kernel.
Extending the kernel from userspace sounds interesting, but how does that look in practice?
Let’s have a look at an example from the Cilium project. The following is a Go program, available in main.go, that demonstrates how you can attach an eBPF program (written in C) to a kernel symbol. The overall result of the exercise is that whenever the sys_execve syscall is invoked, a kernel counter is increased, which the Go program then reads and prints out as the number of times the probed symbol has been called per second.
The following line in main.go (edited to fit the page; should all be on the same line) instructs the Go toolchain to include the compiled C program that contains our eBPF code:
//go:generate go run github.com/cilium/ebpf/cmd/bpf2go -cc clang-11 KProbeExample ./bpf/kprobe_example.c -- -I../headers
In kprobe_example.c we find the eBPF program itself:
#include "common.h"
#include "bpf_helpers.h"

char __license[] SEC("license") = "Dual MIT/GPL";

struct bpf_map_def SEC("maps") kprobe_map = {
	.type        = BPF_MAP_TYPE_ARRAY,
	.key_size    = sizeof(u32),
	.value_size  = sizeof(u64),
	.max_entries = 1,
};

SEC("kprobe/sys_execve")
int kprobe_execve() {
	u32 key     = 0;
	u64 initval = 1, *valp;

	valp = bpf_map_lookup_elem(&kprobe_map, &key);
	if (!valp) {
		bpf_map_update_elem(&kprobe_map, &key, &initval, BPF_ANY);
		return 0;
	}
	__sync_fetch_and_add(valp, 1);

	return 0;
}
You must define a license.
Enables exchange of data between kernel and userspace.
The entry point of our eBPF probe (program).
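The userspace side is not reproduced here in full. Roughly, the Go program loads the compiled object, attaches the probe, and polls the map once per second; the following is a hedged sketch using the cilium/ebpf library directly rather than the bpf2go-generated helpers (not the exact main.go from the repository, and details such as the link.Kprobe signature vary between library versions):

package main

import (
	"log"
	"time"

	"github.com/cilium/ebpf"
	"github.com/cilium/ebpf/link"
)

func main() {
	// Load the compiled eBPF object file produced by clang/bpf2go.
	spec, err := ebpf.LoadCollectionSpec("kprobe_example.o")
	if err != nil {
		log.Fatal(err)
	}
	coll, err := ebpf.NewCollection(spec)
	if err != nil {
		log.Fatal(err)
	}
	defer coll.Close()

	// Attach the eBPF program to the sys_execve kernel symbol.
	kp, err := link.Kprobe("sys_execve", coll.Programs["kprobe_execve"], nil)
	if err != nil {
		log.Fatal(err)
	}
	defer kp.Close()

	// Read the counter from the shared BPF map once per second.
	for range time.Tick(time.Second) {
		var count uint64
		if err := coll.Maps["kprobe_map"].Lookup(uint32(0), &count); err != nil {
			log.Fatal(err)
		}
		log.Printf("sys_execve called %d times", count)
	}
}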
As you can guess, writing eBPF by hand is not fun. Luckily there are a number of great tools and environments available that take care of the low-level stuff for you.
Just as we were wrapping up the writing of this book, the Linux Foundation announced that Facebook, Google, Isovalent, Microsoft, and Netflix joined together to create the eBPF Foundation, giving the eBPF project a vendor-neutral home. Stay tuned!
To dive deeper into the eBPF topic we suggest you read Linux Observability with BPF by David Calavera and Lorenzo Fontana (O’Reilly). If you’re looking for a quick overview, Matt Oswalt has a nice Introduction to eBPF.
To stay on top of things, have a look at ebpf.io and check out what the community publishes on the YouTube channel for this topic.
Further, have a look at Pixie, an open source, eBPF-based observability tool with an active community and broad industry support (see Figure 5-8).
Summing up, there are a number of defaults in the Kubernetes networking space you want to be aware of. As a baseline, you can apply the good practices you know from a noncontainerized environment in combination with intrusion detection tooling as shown in Chapter 9. In addition, you want to use native resources such as network policies potentially in combination with other CNCF projects such as SPIFFE for workload identity to strengthen your security posture.
Service meshes, while still in their early days, are another promising option to enforce policies and gain insights into what is going on. Last but not least, eBPF is the up-and-coming star in the networking arena, enabling a number of security-related use cases.
Now that we have the networking secured, we are ready for the Captain to move on to more “solid” grounds: storage.