Chapter 5. Networking

In this chapter we will focus on networking aspects of your workloads. We will first review the defaults that Kubernetes proper comes equipped with and what else is readily available due to integrations. We cover networking topics including East-West and North-South traffic—that is, intra-pod and inter-pod communication, communication with the worker node (hosts), cluster-external communication, workload identity, and encryption on the wire.

In the second part of this chapter we have a look at two more recent additions to the Kubernetes networking toolbox: service meshes and the Linux kernel extension mechanism eBPF. We try to give you a rough idea of whether, how, and where you can benefit from both going forward.

As you can see in Figure 5-1, there are many moving parts in the networking space.

Network layer model
Figure 5-1. Network layer model

The good news is that most if not all of the protocols should be familiar to you, since Kubernetes uses the standard Internet Engineering Task Force (IETF) suite of networking protocols, from the Internet Protocol to the Domain Name System (DNS). What changes, really, is the scope and generally the assumptions about how the protocols are used. For example, when deployed on a worldwide scale, it makes sense to make the time-to-live (TTL) of a DNS record months or longer.

In the context of a container that may run for hours or days at best, this assumption doesn’t hold anymore. Clever adversaries can exploit such assumptions and, as you should know by now, that’s exactly what the Captain would do.
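
Kubernetes itself reflects this short-lived mindset: the cluster DNS (CoreDNS in most current distributions) hands out records with TTLs measured in seconds rather than days. If you want to inspect or tune this, the ttl option of CoreDNS’s kubernetes plugin lives in the coredns ConfigMap. The following is a minimal sketch of such a ConfigMap; the value of 30 seconds is an illustration, not a recommendation:

apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        errors
        health
        kubernetes cluster.local in-addr.arpa ip6.arpa {
            pods insecure
            fallthrough in-addr.arpa ip6.arpa
            ttl 30                      # keep records short-lived
        }
        forward . /etc/resolv.conf
        cache 30
    }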

In this chapter we will focus on the protocols most often used in Kubernetes—and their weak points with respect to workloads. As Captain Hashjack likes to say, “loose lips sink ships,” so we’ll first explore the permissive networking defaults, then show how to attack them, and discuss the controls you can implement to detect and mitigate these attacks.

Defaults

By defaults we mean the default configuration values of the components that you get when you use Kubernetes from source, in an unmodified manner. From a networking perspective, workloads in Kubernetes have the following setup:

  • Flat topology. Every pod can see and talk to every other pod in the cluster.

  • No securityContext. Workloads can escalate to the host network interface controller (NIC).

  • No environmental restrictions. Workloads can query their host and cloud metadata (see the egress policy sketch after this list).

  • No identity for workloads.

  • No encryption on the wire (neither between pods nor for cluster-external traffic).
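
To give you a first taste of what a countermeasure can look like, the following network policy sketch (we cover network policies in detail later in this chapter) blocks pod egress to the ubiquitous cloud metadata endpoint at 169.254.169.254 while leaving all other egress traffic alone; treat it as an illustration rather than a drop-in solution:

kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: deny-cloud-metadata
spec:
  podSelector: {}              # applies to all pods in this namespace
  policyTypes:
  - Egress
  egress:
  - to:
    - ipBlock:
        cidr: 0.0.0.0/0
        except:
        - 169.254.169.254/32   # the cloud metadata endpoint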

While the preceding list might look scary, a different way to look at it might make it easier to assess the risks present. As depicted in Figure 5-2, the main communication paths in Kubernetes are as follows:

Kubernetes networking overview
Figure 5-2. Kubernetes networking overview

Let’s now have a closer look at the communication paths and other networking-relevant defaults in Kubernetes. Among other things, we’ll discuss “The State of the ARP”, “No securityContext”, “No Workload Identity”, and “No Encryption on the Wire”.

Note

There are some aspects of the networking space that depend heavily on the environment in which Kubernetes is used. For example, when using hosted Kubernetes from one of the cloud providers, the control plane and/or data plane may or may not be publicly available. If you are interested in learning more about how the big three handle this, have a look at their respective documentation.

Since this is not an intrinsic property of Kubernetes and many combinations are possible, we decided to exclude this topic from our discussion in this chapter.

So, are you ready to learn about the Kubernetes networking defaults?

Intra-Pod Networking

The way intra-pod networking in Kubernetes works is as follows: an implicit, so-called pause container in a pod (cp in Figure 5-3) creates and holds open a Linux network namespace.

Internals of a Kubernetes pod
Figure 5-3. Internals of a Kubernetes pod

Other containers in the pod, such as init containers (like ci1 and ci2) and the main application container and sidecars such as proxies or logging containers (for example, c1 to c3), then join the pause container’s network and IPC namespaces.

The pause container has network bridge mode enabled, and all the other containers in the pod share its network namespace via container mode.
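
To see this sharing in action, consider the following sketch of a pod with two containers (the names and images are ours, for illustration only): because both containers join the pause container’s network namespace, the sidecar can reach the main container on localhost, and both share the same pod IP.

apiVersion: v1
kind: Pod
metadata:
  name: shared-netns-demo
spec:
  containers:
  - name: main
    image: nginx:alpine              # serves HTTP on port 80
  - name: sidecar
    image: curlimages/curl:latest
    # same network namespace as "main", so localhost:80 reaches nginx
    command: ["sh", "-c", "while true; do curl -s http://localhost:80 >/dev/null; sleep 10; done"]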

As discussed in Chapter 2, pods were designed to make it easy to lift and shift existing applications into Kubernetes, which has sobering security implications. Ideally, you rewrite the application so that the tight coupling of containers in a pod is not necessary, or you deploy traditional tooling in the context of a pod.

While the latter seems like a good idea initially, do remember that this is a stopgap measure at best. Once the boundaries are clear and effectively every microservice is deployed in its own pod, you can go ahead and use the techniques discussed in the next sections.

In addition, no matter if you’re looking at defense in depth in the context of a single pod or cluster-wide, you can employ a range of dedicated open source and commercial container security offerings. See also the respective “Networking” section in Appendix B.

Cluster-External Traffic

To allow pods to communicate with cluster-external endpoints, Kubernetes has added a number of mechanisms over time. The most recent and widely used is called an Ingress. This allows for layer 7 routing (HTTP), whereas for other use cases such as layer 3/4 routing you would need to use older, less convenient methods. See also Publishing Services (ServiceTypes) in the docs.

In order for you to use the Ingress resource, you will need to pick an ingress controller. You have many choices, oftentimes open source-based.

In addition, cloud providers usually provide their own solutions, integrated with their managed load-balancing services.

Encryption on the wire (TLS) is almost the default nowadays, and most Ingress solutions support it out of the box. Alternatively, you can use a service mesh for securing your North-South traffic (see “Service Meshes”).
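
In practice, terminating TLS at the ingress usually boils down to referencing a Kubernetes TLS secret from the Ingress resource. A sketch, with the host and secret name as placeholders and using the same API version as the example later in this chapter, might look as follows:

kind: Ingress
apiVersion: extensions/v1beta1
metadata:
  name: tls-example
spec:
  tls:
  - hosts:
    - example.com
    secretName: example-com-tls   # a kubernetes.io/tls secret holding cert and key
  rules:
  - host: example.com
    http:
      paths:
      - path: /
        backend:
          serviceName: nginx
          servicePort: 80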

Last but not least, on the application level you might want to consider using a web application firewall (WAF), such as those offered by most cloud providers, or you can use a standalone offering such as Wallarm.

More and more practitioners are sharing their experiences in this space, so keep an eye out for blog posts and CNCF webinars covering this topic. See, for example, “Shaping Chick-fil-A One Traffic in a Multi-Region Active-Active Architecture”.

The State of the ARP

Address Resolution Protocol (ARP) is a link layer protocol used by the Internet Protocol (IP) to map IP network addresses to the hardware (MAC) addresses. Liz Rice showed in her KubeCon NA 2019 talk, “CAP_NET_RAW and ARP Spoofing in Your Cluster: It’s Going Downhill From Here”, how defaults allow us to open raw network sockets and how this can lead to issues.

The attack involves using ARP and DNS to fool a victim pod into visiting a fake URL, which is possible due to the way Kubernetes handles local FQDNs, and it requires that CAP_NET_RAW is available to the pod.

For more details, see the Aqua Security blog post “DNS Spoofing on Kubernetes Clusters”.

The good news is that there are defenses available to mitigate these ARP-based attacks and spoil the Captain’s mood.
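
The most direct defense is to drop the CAP_NET_RAW capability from your containers (or drop all capabilities and add back only what you really need), so that a compromised pod cannot open raw sockets in the first place. A minimal sketch of such a securityContext (the pod and image names are ours):

apiVersion: v1
kind: Pod
metadata:
  name: no-arp-spoofing
spec:
  containers:
  - name: main
    image: nginx:alpine
    securityContext:
      capabilities:
        drop:
        - NET_RAW        # no raw sockets, no ARP/DNS spoofing from this container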

How can you tell if you’re affected? Use kube-hunter, for example.

No securityContext

By default, workloads can escalate to the NIC of the worker node they are running on. For example, when running privileged containers, one can escape from the container using kernel modules. Further, as the Microsoft Azure team pointed out in its “Threat matrix for Kubernetes” blog post:

Attackers with network access to the host (for example, via running code on a compromised container) can send API requests to the Kubelet API. Specifically querying https://[NODE IP]:10255/pods/ retrieves the running pods on the node. https://[NODE IP]:10255/spec/ retrieves information about the node itself, such as CPU and memory consumption.

Naturally, one wants to avoid these scenarios, and one way to go about this is to apply PSPs, as discussed in “Runtime Policies”.

For example, the Baseline/default policy has the following defined:

Sharing the host namespaces must be disallowed
  • spec.hostNetwork

  • spec.hostPID

  • spec.hostIPC

Privileged pods disable most security mechanisms and must be disallowed
  • spec.containers[*].securityContext.privileged

  • spec.initContainers[*].securityContext.privileged

HostPorts should be disallowed or at minimum restricted to a known list
  • spec.containers[*].ports[*].hostPort

  • spec.initContainers[*].ports[*].hostPort
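
Translated into a workload manifest, a pod that satisfies these Baseline requirements might look like the following sketch; the image and names are ours, and the point is what is absent or explicitly disabled:

apiVersion: v1
kind: Pod
metadata:
  name: baseline-friendly
spec:
  hostNetwork: false       # do not share the host's network namespace
  hostPID: false           # do not share the host's PID namespace
  hostIPC: false           # do not share the host's IPC namespace
  containers:
  - name: main
    image: nginx:alpine
    securityContext:
      privileged: false    # no privileged mode
    ports:
    - containerPort: 80    # note: no hostPort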

In addition, there are a number of commercial offerings, such as Palo Alto Networks Prisma Cloud (formerly Twistlock), that you can use to harden your worker nodes in this context.

No Encryption on the Wire

For workloads in regulated industries, that is, any kind of app that is required to conform to a (government-issued) regulation, encryption on the wire—or encryption in transit, as it’s sometimes called—is typically one of the requirements. For example, if you have a Payment Card Industry Data Security Standard (PCI DSS)–compliant app as a bank, or a Health Insurance Portability and Accountability Act (HIPAA)–compliant app as a health care provider, you will want to make sure that the communication between your containerized microservices is protected against sniffing and person-in-the-middle attacks.

These days, the Transport Layer Security (TLS) protocol as defined in RFC 8446 and older IETF paperwork is usually used to encrypt traffic on the wire. It uses asymmetric cryptography to agree on a shared secret, negotiated at the beginning of the session (the “handshake”), and in turn symmetric encryption to encrypt the workload data. This setup is a nice trade-off between performance and security.

While control plane components such as the API server, etcd, or a kubelet can rely on a PKI infrastructure out of the box, providing APIs and good practices for certificates, the same is sadly not true for your workloads.

Tip

You can see the API server’s hostname, and any IPs encoded into its TLS certificate, with openssl. You can find example code for this in “Control Plane”.

By default, the traffic between pods and to the outside world is not encrypted. To mitigate this, enable workload encryption on the wire, for example with Calico, with a WireGuard VPN, or with Cilium, which supports both WireGuard and IPsec. Another option that provides not only this sort of encryption but also workload identity, as discussed in “No Workload Identity”, is a service mesh.
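
For example, with Calico, enabling WireGuard-based encryption of pod traffic between nodes boils down to flipping a single switch in the FelixConfiguration, which you would typically apply with calicoctl. This sketch assumes Calico is installed and your nodes have WireGuard support in their kernels:

apiVersion: projectcalico.org/v3
kind: FelixConfiguration
metadata:
  name: default
spec:
  wireguardEnabled: true   # encrypt pod-to-pod traffic between nodes

With the defaults out of the way, let’s move on to the threat modeling for the networking space.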

Threat Model

In this section we focus on the threat model in the networking space (see “Starting to Threat Model”): the collection of identified networking vulnerabilities, ranked according to the risk they pose.

So, what is the threat model we consider in the networking space, with respect to workloads? What are our assumptions about what attackers could do to our precious workloads and beyond to the infrastructure?

The following observations should give you an idea about potential threat models. We illustrate these scenarios with some examples of past attacks, covering the 2018–2020 time frame:

  • Using the front door, for example via an ingress controller or a load balancer, and then either pivoting or performing a denial-of-service attack, such as observed in CVE-2020-15127

  • Using developer access paths like kubectl cp (CVE-2019-11249) or developer environments such as Minikube, witnessed in CVE-2018-1002103

  • Launching a pod with access to host networking or unnecessary capabilities, as we will further discuss in “The State of the ARP”

  • Leveraging a compromised workload to connect to another workload

  • Port scanning CNI plug-ins and using this information to identify vulnerabilities; for example, CVE-2019-9946

  • Attacking a control plane component such as the API server and etcd or a kubelet or kube-proxy on the worker; for example, CVE-2020-8558, CVE-2019-11248, CVE-2019-11247, and CVE-2018-1002105

  • Performing server-side request forgery (SSRF) against the hosting environment; for example, a cloud provider’s VMs

  • Performing person-in-the-middle attacks, such as seen in the context of IPv6 routing; see also CVE-2020-10749

Now that we have a basic idea of the potential threat model, let’s go through and see how the defaults can be exploited and defended against, in turn.

Traffic Flow Control

We’ve seen the networking defaults and what kind of communication paths are present in Kubernetes. In this section, we walk you through an end-to-end setup and show you how to secure the external traffic using network policies.

The Setup

To demonstrate the networking defaults in action, let’s use kind, a tool for running local Kubernetes clusters using Docker containers.

Let’s create a kind cluster with networking prepared for Calico and enable Ingress (also see the documentation). We are using the following config:

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  kubeadmConfigPatches:
  - |
    kind: InitConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        node-labels: "ingress-ready=true" 1
  extraPortMappings:
  - containerPort: 80
    hostPort: 80
    protocol: TCP
  - containerPort: 443
    hostPort: 443
    protocol: TCP
- role: worker
networking:
  disableDefaultCNI: true 2
  podSubnet: 192.168.0.0/16 3
1

Enable Ingress for cluster.

2

Disable the native kindnet.

3

In preparation to install Calico, set to its default subnet.

Assuming the preceding YAML snippet is stored in a file called cluster-config.yaml, you can now create the kind cluster as follows:

$ kind create cluster --name cnnp \
  --config cluster-config.yaml

Creating cluster "cnnp" ...

Note that if you are doing this for the first time, the preceding output might look different, and it can take several minutes to pull the respective container images.

Next we install and patch Calico to make it work with kind. Kudos to Alex Brand for putting together the necessary patch instructions:

$ kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml
configmap/calico-config created
customresourcedefinition.apiextensions.k8s.io/bgpconfigurations.crd.projectcalico.org created
...
serviceaccount/calico-kube-controllers created

$ kubectl -n kube-system set env daemonset/calico-node FELIX_IGNORELOOSERPF=true
daemonset.apps/calico-node env updated

And to verify that everything is up and running as expected:

$ kubectl -n kube-system get pods | grep calico-node
calico-node-2j2wd     0/1     Running     0     18s
calico-node-4hx46     0/1     Running     0     18s
calico-node-qnvs6     0/1     Running     0     18s

Before we can deploy our app, we need one last bit of infrastructure in place: a load balancer that makes the pods available to the outside world (your machine).

For this we use Ambassador as an ingress controller:

$ kubectl apply -f https://github.com/datawire/ambassador-operator/releases/latest/download/ambassador-operator-crds.yaml && \
  kubectl apply -n ambassador -f https://github.com/datawire/ambassador-operator/releases/latest/download/ambassador-operator-kind.yaml && \
  kubectl wait --timeout=180s \
    -n ambassador \
    --for=condition=deployed \
    ambassadorinstallations/ambassador
customresourcedefinition.apiextensions.k8s.io/ambassadorinstallations.getambassador.io created
namespace/ambassador created
configmap/static-helm-values created
serviceaccount/ambassador-operator created
clusterrole.rbac.authorization.k8s.io/ambassador-operator-cluster created
clusterrolebinding.rbac.authorization.k8s.io/ambassador-operator-cluster created
role.rbac.authorization.k8s.io/ambassador-operator created
rolebinding.rbac.authorization.k8s.io/ambassador-operator created
deployment.apps/ambassador-operator created
ambassadorinstallation.getambassador.io/ambassador created
ambassadorinstallation.getambassador.io/ambassador condition met

Now we can launch the application, a web server. First off, we want to do all of the following in a dedicated namespace called npdemo, so let’s create one:

$ kubectl create ns npdemo
namespace/npdemo created

Next, create a YAML file called workload.yaml that defines a deployment, a service, and an ingress resource, in total representing our workload application:

kind: Deployment
apiVersion: apps/v1
metadata:
  labels:
    app: nginx
  name: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - image: nginx:alpine
        name: main
        ports:
        - containerPort: 80
---
kind: Service
apiVersion: v1
metadata:
  name: nginx
spec:
  selector:
    app: nginx
  ports:
  - port: 80
---
kind: Ingress 1
apiVersion: extensions/v1beta1
metadata:
  name: mainig
  annotations:
    kubernetes.io/ingress.class: ambassador
spec:
  rules:
  - http:
      paths:
      - path: /api
        backend:
          serviceName: nginx
          servicePort: 80
1

We configure the Ingress so that traffic hitting the /api URL path is routed to our nginx service.

Next, you want to create the resources defined in workload.yaml by using:

$ kubectl -n npdemo apply -f workload.yaml
deployment.apps/nginx created
service/nginx created
ingress.extensions/mainig created

When you now try to access the app as exposed in the Ingress resource you should be able to do the following (note that we’re only counting the lines returned to verify we get something back):

$ curl -s 127.0.0.1/api | wc -l

  25

Wait. What just happened? We put an Ingress in front of the NGINX service and it happily receives traffic from outside? That can’t be good.

Network Policies to the Rescue!

So, how can we keep the Captain and their crew from getting their dirty paws on our cluster? Network policies come to our rescue. While we will cover policies in a dedicated chapter (see Chapter 8), we point out network policies and their usage here since they are so useful and, given the “by default all traffic is allowed” attitude of Kubernetes, arguably almost necessary.

While Kubernetes allows you to define and apply network policies out of the box, you need something that enforces the policies you define, and that’s the job of the network policy provider, typically your CNI plug-in.

For example, in the following walkthrough we will be using Calico; however, there are many more options available, such as the eBPF-based solutions discussed in “eBPF”.

We shut down all traffic with the following Kubernetes network policy in a file called, fittingly, np-deny-all.yaml:

kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: deny-all
spec:
  podSelector: {} 1
  policyTypes:
  - Ingress 2
1

Selects the pods in the same namespace; the empty selector matches all of them.

2

Disallows any ingress traffic, as no ingress rules are specified.

Tip

Network policies are notoriously difficult to get right, so in this context you may want to use one of the available network policy editors or validation tools; see also the resources we put together in Appendix B.

So let’s apply the preceding network policy and see if we can still access the app from outside of the cluster:

$ kubectl -n npdemo apply -f np-deny-all.yaml
networkpolicy.networking.k8s.io/deny-all created

$ kubectl -n npdemo describe netpol deny-all
Name:         deny-all
Namespace:    npdemo
Created on:   2020-09-22 10:39:27 +0100 IST
Labels:       <none>
Annotations:  <none>
Spec:
PodSelector:  <none> (Allowing the specific traffic to all pods in this namespace)
  Allowing ingress traffic:
    <none> (Selected pods are isolated for ingress connectivity)
  Not affecting egress traffic
  Policy Types: Ingress

And this should fail now, based on our network policy (giving it a 3-second timeout, just to be sure):

$ curl --max-time 3 127.0.0.1/api
curl: (28) Operation timed out after 3005 milliseconds with 0 bytes received
Tip

If you only have kubectl available, you can still make raw network requests, as Rory McCune pointed out:

kubectl --insecure-skip-tls-verify -s bbc.co.uk get --raw /

Of course, kubectl shouldn’t be in your container image in the first place!

We hope by now you get an idea of how dangerous the defaults are—all network traffic to and from pods is allowed—and how you can defend against them.
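
Of course, in practice you rarely want to block everything forever; the typical next step is to selectively re-allow legitimate traffic. The following sketch permits ingress to our nginx pods only from pods in the ambassador namespace, assuming you have labeled that namespace with name: ambassador (a label you may have to add yourself):

kind: NetworkPolicy
apiVersion: networking.k8s.io/v1
metadata:
  name: allow-from-ambassador
spec:
  podSelector:
    matchLabels:
      app: nginx             # only applies to our nginx pods
  policyTypes:
  - Ingress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          name: ambassador   # assumes the namespace carries this label
    ports:
    - protocol: TCP
      port: 80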

Learn more about network policies, including recipes, tips, and tricks, via the resources we put together in Appendix B.

Tip

In addition to network policies, some cloud providers offer other native mechanisms to restrict traffic from/to pods; for example, see AWS security groups for pods.

Finally, don’t forget to clean up your Kubernetes cluster using kind delete cluster --name cnnp, once you’re done exploring the topic of network policies.

Now that we’ve seen a concrete networking setup in action, let’s move on to a different topic: service meshes. This relatively recent technology can help you in addressing some of the not-so-secure defaults discussed earlier, including workload identity and encryption on the wire.

Service Meshes

Service meshes are a somewhat advanced topic; in a sense they are complementary to Kubernetes and can be beneficial in a number of use cases. Let’s have a look at how the most important workload-level networking issues can be addressed using a service mesh.

Options and Uptake

At the time of writing, a number of service meshes exist as well as proposed quasi-standards for interoperability, such as the CNCF project Service Mesh Interface or work of the Envoy-based Universal Data Plane API Working Group (UDPA-WG).

While it is early days, we see a certain uptake, especially out of security considerations (see Figure 5-5). For example, The New Stack (TNS) reports in its 2020 service mesh survey:

A third of respondents’ organizations are using service meshes to control communications traffic between microservices in production Kubernetes environments. Another 34% use service mesh technology in a test environment, or are piloting or actively evaluating solutions.

TNS 2020 service mesh survey excerpt
Figure 5-5. TNS 2020 service mesh survey excerpt

Going forward, many exciting application areas and nifty defense mechanisms based on service meshes are possible—for example, Identity Federation for Multi-Cluster Kubernetes and Service Mesh or using OPA in Istio. That said, many end users are not yet ready to go all in and/or are in a holding pattern, waiting for cloud and platform providers to make the data plane of the service mesh part of the underlying infrastructure. Alternatively, the data plane may be implemented on the operating system level, for example, using eBPF.

Case Study: mTLS with Linkerd

Linkerd is a graduated CNCF project, originally created by Buoyant.

Linkerd automatically enables mutual Transport Layer Security (mTLS) for most HTTP-based communication between meshed pods. Let’s see that in action.

To follow along, install Linkerd in a test cluster. We’re using kind in the following example and assume you have the Kubernetes cluster set up and configured, as well as the Linkerd CLI installed:

$ linkerd check --pre
kubernetes-api
...
Status check results are √

Now that we know that we’re in a position to install Linkerd, let’s go ahead and do it:

$ linkerd install | kubectl apply -f -
namespace/linkerd created
clusterrole.rbac.authorization.k8s.io/linkerd-linkerd-identity created
...
deployment.apps/linkerd-grafana created

And finally verify the install:

$ linkerd check
kubernetes-api
...
Status check results are √

Great! All up and running. You could have a quick look at the Linkerd dashboard using linkerd dashboard &, which should show something similar to what’s depicted in Figure 5-6.

OK, back to mTLS: once we have enabled the mesh in the respective namespaces, it should be impossible for us, even from within the cluster, to talk directly to a service using, say, curl and a plain HTTP query. Let’s see how that works.

In the following example, we’re reusing the setup from “Inter-Pod Traffic” but you can really use any workload that exposes an HTTP service within the cluster.

Linkerd dashboard showing example traffic stats
Figure 5-6. Linkerd dashboard showing example traffic stats

First, we need to enable the mesh, or meshify, as the good folks from Buoyant call it:

$ kubectl get -n npdemo deploy -o yaml | \
          linkerd inject - | kubectl apply -f -


$ kubectl get -n ambassador deploy -o yaml | \
          linkerd inject - | kubectl apply -f -
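
As an alternative to piping manifests through linkerd inject, you can annotate a namespace so that Linkerd’s proxy injector meshifies newly created pods automatically; existing pods still need to be restarted to pick up the proxy. A sketch for our npdemo namespace:

apiVersion: v1
kind: Namespace
metadata:
  name: npdemo
  annotations:
    linkerd.io/inject: enabled   # auto-inject the Linkerd proxy into new pods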

Now we can validate our mTLS setup using tshark. First, we deploy Linkerd’s emojivoto sample application, injected with a debug sidecar:

$ curl -sL https://run.linkerd.io/emojivoto.yml |
  linkerd inject --enable-debug-sidecar - |
  kubectl apply -f -
namespace "emojivoto" injected
...
deployment.apps/web created

Once the sample app is up and running, we can open a remote shell into the attached debug container that Linkerd kindly put there for us:

$ kubectl -n emojivoto exec -it \ 1
  $(kubectl -n emojivoto get po -o name | grep voting) \ 2
  -c linkerd-debug -- /bin/bash 3
1

Connect to pod for interactive (terminal) use.

2

Provide pod name for the exec command.

3

Target the linkerd-debug container in the pod.

Now, from within the debug container we use tshark to inspect the packets on the NIC and expect to see TLS traffic (output edited to fit):

root@voting-57bc56-s4l:/# tshark -i any \ 1
                                 -d tcp.port==8080,ssl | 2
                          grep -v 127.0.0.1 3

Running as user "root" and group "root." This could be dangerous.
Capturing on 'any'

 1 0.000000000 192.168.49.192 → 192.168.49.231 TCP 76 41704 → 4191 [SYN] Seq=0...
 2 0.000023419 192.168.49.231 → 192.168.49.192 TCP 76 4191 → 41704 [SYN, ACK]...
 3 0.000041904 192.168.49.192 → 192.168.49.231 TCP 68 41704 → 4191 [ACK] Seq=1...
 4 0.000356637 192.168.49.192 → 192.168.49.231 HTTP 189 GET /ready HTTP/1.1
 5 0.000397207 192.168.49.231 → 192.168.49.192 TCP 68 4191 → 41704 [ACK] Seq=1...
 6 0.000483689 192.168.49.231 → 192.168.49.192 HTTP 149 HTTP/1.1 200 OK
 ...
1

Listen on all available network interfaces for live packet capture.

2

Decode any traffic running over port 8080 as TLS.

3

Ignore 127.0.0.1 (localhost), as this traffic will always be unencrypted.

Yay, it works, encryption on the wire for free! And with this we’ve completed the mTLS case study.

If you want to learn more about how to use service meshes to secure your East-West communication, we have put together some suggested further reading in “Networking” in Appendix B.

While service meshes certainly can help you with networking-related security challenges, fending off the Captain and their crew, you should be aware of their weaknesses. For example, in Envoy-based systems, a container running with UID 1337 bypasses the Istio/Envoy sidecar, and by default the Envoy admin dashboard is accessible from within the workload container because the two share a network namespace. For more background on this topic, check out the in-depth Istio Security Assessment.

Now it’s time to move on to the last part of the workload networking topic: what happens on a single worker node.

eBPF

After the service mesh adventure, we now turn our attention to a topic that is, on the one hand, of an entirely different character and, on the other hand, can also be viewed as a way to implement the service mesh data plane. We have a look at eBPF, a modern and powerful way to extend the Linux kernel, with which you can address a number of networking-related security challenges.

Concept

Originally, this piece of Linux kernel technology was known under the name Berkeley Packet Filter (BPF). It then experienced a number of enhancements, mainly driven by Google, Facebook, and Netflix, and to distinguish it from the original implementation it was called eBPF. Nowadays, the kernel project and technology is commonly known as eBPF, which is a term in itself and does not stand for anything per se; that is, it’s not considered an acronym any longer.

Technically, eBPF is a feature of the Linux kernel, and you’ll need Linux kernel version 3.18 or above to benefit from it. It enables you to safely and efficiently extend Linux kernel functionality by using the bpf(2) syscall (see also the man pages for details). eBPF is implemented as an in-kernel virtual machine using a custom 64-bit RISC instruction set.

In Figure 5-7 you see a high-level overview taken from Brendan Gregg’s Linux Extended BPF (eBPF) Tracing Tools (Addison-Wesley).

eBPF overview in the Linux kernel
Figure 5-7. eBPF overview in the Linux kernel

This all looks promising, but is eBPF already used in the wild, and also, which options are available? Let’s take a look.

Options and Uptake

In 2021, eBPF is already used in a number of places, for use cases ranging from networking, load balancing, and observability to security monitoring and enforcement.

We see an increasing number of players entering the eBPF field, with Isovalent leading the charge. While it’s still early days from an adoption perspective, eBPF has huge potential. Coming back to the service mesh data plane: it is perfectly doable and thinkable to implement the Envoy APIs as a set of eBPF programs and push the handling from the user-space sidecar proxy into the kernel.

Extending the kernel from user space sounds interesting, but what does that look like in practice?

Case Study: Attaching a Probe to a Go Program

Let’s have a look at an example from the Cilium project. The following Go program, available in main.go, demonstrates how you can attach an eBPF program (written in C) to a kernel symbol. The overall result of the exercise is that whenever the sys_execve syscall is invoked, a counter in a kernel-side map is increased; the Go program reads this counter and prints out the number of times the probed symbol has been called per second.

The following line in main.go (edited to fit the page; should all be on the same line) instructs the Go toolchain to include the compiled C program that contains our eBPF code:

//go:generate go run github.com/cilium/ebpf/cmd/bpf2go
  -cc clang-11 KProbeExample ./bpf/kprobe_example.c -- -I../headers

In kprobe_example.c we find the eBPF program itself:

#include "common.h"
#include "bpf_helpers.h"

char __license[] SEC("license") = "Dual MIT/GPL"; 1

struct bpf_map_def SEC("maps") kprobe_map = { 2
    .type = BPF_MAP_TYPE_ARRAY,
    .key_size = sizeof(u32),
    .value_size = sizeof(u64),
    .max_entries = 1,
};

SEC("kprobe/sys_execve")
int kprobe_execve() { 3
    u32 key = 0;
    u64 initval = 1, *valp;

    valp = bpf_map_lookup_elem(&kprobe_map, &key);
    if (!valp) {
        bpf_map_update_elem(&kprobe_map, &key, &initval, BPF_ANY);
        return 0;
    }
    __sync_fetch_and_add(valp, 1);

    return 0;
}
1

You must define a license.

2

Enables exchange of data between kernel and userspace.

3

The entry point of our eBPF probe (program).

As you can guess, writing eBPF by hand is not fun. Luckily there are a number of great tools and environments available that take care of the low-level stuff for you.

Note

Just as we were wrapping up writing this book, the Linux Foundation announced that Facebook, Google, Isovalent, Microsoft, and Netflix had joined together to create the eBPF Foundation, giving the eBPF project a vendor-neutral home in the process. Stay tuned!

To dive deeper into the eBPF topic we suggest you read Linux Observability with BPF by David Calavera and Lorenzo Fontana (O’Reilly). If you’re looking for a quick overview, Matt Oswalt has a nice Introduction to eBPF.

To stay on top of things, have a look at ebpf.io and check out what the community publishes on the YouTube channel for this topic.

Further, have a look at Pixie, an open source, eBPF-based observability tool with an active community and broad industry support (see Figure 5-8).

Pixie in action
Figure 5-8. Pixie in action

Conclusion

Summing up, there are a number of defaults in the Kubernetes networking space you want to be aware of. As a baseline, you can apply the good practices you know from a noncontainerized environment in combination with intrusion detection tooling, as shown in Chapter 9. In addition, you want to use native resources such as network policies, potentially in combination with other CNCF projects such as SPIFFE for workload identity, to strengthen your security posture.

Service meshes, while still in their early days, are another promising option to enforce policies and gain insights into what is going on. Last but not least, eBPF is the up-and-coming star in the networking arena, enabling a number of security-related use cases.

Now that we have the networking secured, we are ready for the Captain to move on to more “solid” grounds: storage.