Vid Bregar
How to Debug a Crash Looping Kubernetes StatefulSet Without Downtime


When a Kubernetes StatefulSet pod is stuck in a crash loop, you can't shell into it, and its volume can't be mounted by another pod, which makes debugging difficult. Learn how to work around this without causing downtime.

The Problem

  • You can’t shell into the pod, either due to the crash loop or the container lacking a shell.
  • You can’t spin up a debug pod with the same volume mounted since only one pod at a time can mount it.
  • Scaling down the StatefulSet could cause downtime, especially if the affected pod is pod-0.

The Solution

  • Back up the StatefulSet, then delete it while orphaning its pods (the pods keep running; only the controller object is removed):
kubectl get statefulset <statefulset_name> -n <namespace> -o yaml > /tmp/statefulset-backup.yaml
kubectl delete statefulset <statefulset_name> -n <namespace> --cascade=orphan
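To confirm the pod survived the orphaning delete, you can check that it still exists and that its owner reference to the StatefulSet is gone. The names below (`web`, `web-0`, `prod`) are placeholders for illustration:

```shell
# The pod should still be listed, likely in CrashLoopBackOff.
kubectl get pods -n prod

# After a --cascade=orphan delete, the ownerReferences entry that
# pointed at the StatefulSet is cleared; an empty result confirms
# the pod is no longer managed and is safe to replace.
kubectl get pod web-0 -n prod -o jsonpath='{.metadata.ownerReferences}'
```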
  • Create a debug pod that is almost identical to the original, but with a debug image or a different command. Then delete the original pod, freeing its volume for the debug pod to mount:
kubectl debug <statefulset-pod-name> -n <namespace> --copy-to=<statefulset-pod-name>-debug \
  --container=<container_name_of_pod> \
  --image=ubuntu \
  -- /bin/sh -c "tail -f /dev/null"
kubectl delete pod <statefulset-pod-name> -n <namespace>
  • Shell into the debug pod and resolve the issue.
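Assuming the debug container keeps the original container's name (as set via `--container` above), shelling in could look like this:

```shell
# Open an interactive shell in the debug copy of the pod.
# The ubuntu image includes /bin/sh, unlike many minimal app images.
kubectl exec -it <statefulset-pod-name>-debug -n <namespace> -- /bin/sh

# Inside, the original pod's volume is mounted at its usual path,
# so you can inspect data, fix permissions, or remove a corrupt
# file before exiting.
```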
  • Clean up and restore the StatefulSet:
kubectl delete pod <statefulset-pod-name>-debug -n <namespace>
kubectl apply -f /tmp/statefulset-backup.yaml
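Once the StatefulSet is re-applied, its controller notices the missing pod and recreates it. You can watch it come back up (same placeholders as above):

```shell
# Wait for the recreated pod(s) to become Ready.
kubectl rollout status statefulset <statefulset_name> -n <namespace>

# Verify the pod is Running and no longer crash looping.
kubectl get pods -n <namespace>
```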

That's it 🙂.

