Vid Bregar
How to Debug a Crash Looping Kubernetes StatefulSet Without Downtime


When a Kubernetes StatefulSet pod is stuck in a crash loop, you can't shell into it, and its volume can't be mounted by another pod, which makes debugging difficult. Learn how to work around this without causing downtime.

The Problem

  • You can’t shell into the pod, either due to the crash loop or the container lacking a shell.
  • You can’t spin up a debug pod with the same volume mounted since only one pod at a time can mount it.
  • Scaling down the StatefulSet could cause downtime, especially if the affected pod is pod-0.

The Solution

  • Back up the StatefulSet, then delete it while orphaning its pods (the pods keep running; only the controller object is removed):
kubectl get statefulset <statefulset_name> -n <namespace> -o yaml > /tmp/statefulset-backup.yaml
kubectl delete statefulset <statefulset_name> -n <namespace> --cascade=orphan
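To confirm the pod survived the orphaning delete, you can check that it still exists and that its owner reference to the StatefulSet is gone. The names below (`web`, `web-0`, `prod`) are placeholders for illustration:

```shell
# The pod should still be listed, likely in CrashLoopBackOff.
kubectl get pods -n prod

# After a --cascade=orphan delete, the ownerReferences entry that
# pointed at the StatefulSet is cleared; an empty result confirms
# the pod is no longer managed and is safe to replace.
kubectl get pod web-0 -n prod -o jsonpath='{.metadata.ownerReferences}'
```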
  • Create a debug pod that is almost identical to the original, but with a debug image or a different command. Then delete the original pod, freeing its volume for the debug pod to mount:
kubectl debug <statefulset-pod-name> -n <namespace> --copy-to=<statefulset-pod-name>-debug \
  --container=<container_name_of_pod> \
  --image=ubuntu \
  -- /bin/sh -c "tail -f /dev/null"
kubectl delete pod <statefulset-pod-name> -n <namespace>
  • Shell into the debug pod and resolve the issue.
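Assuming the debug container keeps the original container's name (as set via `--container` above), shelling in could look like this:

```shell
# Open an interactive shell in the debug copy of the pod.
# The ubuntu image includes /bin/sh, unlike many minimal app images.
kubectl exec -it <statefulset-pod-name>-debug -n <namespace> -- /bin/sh

# Inside, the original pod's volume is mounted at its usual path,
# so you can inspect data, fix permissions, or remove a corrupt
# file before exiting.
```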
  • Clean up and restore the StatefulSet:
kubectl delete pod <statefulset-pod-name>-debug -n <namespace>
kubectl apply -f /tmp/statefulset-backup.yaml
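Once the StatefulSet is re-applied, its controller notices the missing pod and recreates it. You can watch it come back up (same placeholders as above):

```shell
# Wait for the recreated pod(s) to become Ready.
kubectl rollout status statefulset <statefulset_name> -n <namespace>

# Verify the pod is Running and no longer crash looping.
kubectl get pods -n <namespace>
```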

That's it 🙂.

