Kubernetes Troubleshooting Guide: Debug Like a Pro
Troubleshooting accounts for 30% of the CKA exam, the largest single domain. Whether you’re preparing for the exam or fixing production issues, this systematic approach will help you narrow down most Kubernetes problems.
The Troubleshooting Framework
Step 1: Identify the Problem Layer
Kubernetes issues typically fall into these categories:
- Pod-level issues (CrashLoopBackOff, Pending, ImagePullBackOff)
- Service/Networking issues (unreachable service, DNS failures)
- Node-level issues (NotReady, resource exhaustion)
- Cluster-level issues (API server, etcd, scheduler)
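As a rough first pass at locating the layer, the two filters below are a minimal sketch. They assume the default column layout of `kubectl get pods --no-headers` (NAME, READY, STATUS, ...) and `kubectl get nodes --no-headers` (NAME, STATUS, ...). The names `unhealthy_pods` and `unhealthy_nodes` are hypothetical helpers, not part of kubectl.

```shell
# Hypothetical helpers (not part of kubectl).

# Reads `kubectl get pods --no-headers` output on stdin (single namespace,
# so STATUS is the third column) and prints pods that are neither Running
# nor Completed, as "name<TAB>status".
unhealthy_pods() {
  awk '$3 != "Running" && $3 != "Completed" { print $1 "\t" $3 }'
}

# Reads `kubectl get nodes --no-headers` output on stdin (STATUS is the
# second column) and prints nodes whose status is not exactly "Ready".
# A cordoned node ("Ready,SchedulingDisabled") is therefore listed too.
unhealthy_nodes() {
  awk '$2 != "Ready" { print $1 "\t" $2 }'
}
```

For example, running `kubectl get nodes --no-headers | unhealthy_nodes` and then `kubectl get pods --no-headers | unhealthy_pods` checks the node layer before the pod layer. Empty output from both suggests looking at services or the control plane next.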
Step 2: Gather Information
# Check pod status
kubectl get pods -o wide
# Describe the problematic resource
kubectl describe pod my-pod
# Check events
kubectl get events --sort-by='.lastTimestamp'
# View logs
kubectl logs my-pod --previous # for crashed containers
Common Issues and Solutions
Pod Stuck in Pending
Possible causes:
- Insufficient cluster resources
- Node selector/affinity not matching
- PVC not bound
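The scheduler usually records why it could not place a pod in a `FailedScheduling` event, and that message often points at one of the causes listed above. The sketch below maps such a message to a cause. The substrings it matches are assumptions based on commonly seen scheduler messages, whose exact wording varies between Kubernetes versions, and `pending_cause` and `pending_cause_for_pod` are hypothetical helper names.

```shell
# Hypothetical helper: map a FailedScheduling event message to one of the
# causes listed above. The matched substrings are assumptions; scheduler
# wording differs between Kubernetes versions.
pending_cause() {
  case "$1" in
    *Insufficient*)
      echo "insufficient cluster resources" ;;
    *"node affinity"*|*"node selector"*)
      echo "node selector/affinity not matching" ;;
    *unbound*PersistentVolumeClaim*)
      echo "PVC not bound" ;;
    *)
      echo "unknown" ;;
  esac
}

# Fetch the FailedScheduling messages for a pod in the current namespace
# and classify them. Prints "unknown" when no such event exists.
pending_cause_for_pod() {
  local pod="$1" msg
  msg=$(kubectl get events \
    --field-selector "involvedObject.name=${pod},reason=FailedScheduling" \
    -o jsonpath='{.items[*].message}')
  pending_cause "$msg"
}
```

Calling `pending_cause_for_pod my-pod` prints a single line naming the likely cause. Treat the result as a hint and confirm it against the full event text.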
Debug commands:
kubectl describe pod my-pod | grep -A 10 Events
kubectl get nodes -o wide
kubectl describe nodes | grep -A 5 "Allocated resources"
CrashLoopBackOff
Possible causes:
- Application error
- Missing configuration
- Incorrect container command
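The exit code of the last terminated container can help tell these causes apart. The sketch below reads it from the pod status and translates a few common values. The interpretations follow general shell and signal conventions (128 plus the signal number), so they are hints and not guarantees: a container runtime may report a bad command differently. `last_exit_code` and `explain_exit_code` are hypothetical helper names.

```shell
# Hypothetical helper: print the exit code of the previous run of the
# pod's first container (empty if it has not terminated yet).
last_exit_code() {
  kubectl get pod "$1" \
    -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'
}

# Hypothetical helper: translate an exit code using common shell and
# signal conventions. These are hints, not definitive diagnoses.
explain_exit_code() {
  case "$1" in
    0)   echo "clean exit: the command finished instead of staying up" ;;
    1)   echo "application error: read the previous logs" ;;
    126) echo "command not executable: check command/args" ;;
    127) echo "command not found: check command/args" ;;
    137) echo "SIGKILL: possibly OOMKilled, check memory limits" ;;
    143) echo "SIGTERM: the container was asked to stop" ;;
    *)   echo "exit code $1: see the application documentation" ;;
  esac
}
```

A typical call is `explain_exit_code "$(last_exit_code my-pod)"`. Codes 126 and 127 point toward an incorrect container command, while code 1 is more consistent with an application error or missing configuration.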
Debug commands:
kubectl logs my-pod --previous
kubectl describe pod my-pod
kubectl get pod my-pod -o yaml | grep -A 20 containers
ImagePullBackOff
Possible causes:
- Wrong image name/tag
- Private registry authentication
- Network issues
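The waiting message on the container status usually carries the registry's error text, which can be matched against the causes above. The following is a minimal sketch. The substrings are assumptions drawn from typical registry and runtime errors and differ between container runtimes and registries, and `image_pull_message` and `image_pull_cause` are hypothetical helper names.

```shell
# Hypothetical helper: print the waiting message of the pod's first
# container, which normally contains the image pull error text.
image_pull_message() {
  kubectl get pod "$1" \
    -o jsonpath='{.status.containerStatuses[0].state.waiting.message}'
}

# Hypothetical helper: map an image pull error message to one of the
# causes listed above. The matched substrings are assumptions; wording
# depends on the container runtime and the registry.
image_pull_cause() {
  case "$1" in
    *"i/o timeout"*|*"no such host"*|*"connection refused"*)
      echo "network issue" ;;
    *[Uu]nauthorized*|*"access denied"*|*"authentication required"*)
      echo "registry authentication" ;;
    *"not found"*|*"manifest unknown"*)
      echo "wrong image name/tag" ;;
    *)
      echo "unknown" ;;
  esac
}
```

Running `image_pull_cause "$(image_pull_message my-pod)"` prints one of the three causes or `unknown`. Some registries answer with an access-denied error for a repository that does not exist, so an authentication result is still worth checking against the image name.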
Debug commands:
kubectl describe pod my-pod | grep -A 5 "Events"
kubectl get pod my-pod -o yaml | grep image:
Service Not Reachable
Debugging steps:
# Check service endpoints
kubectl get endpoints my-svc
# Check if pods have correct labels
kubectl get pods --show-labels
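# Test DNS resolution (busybox:1.28 is pinned because nslookup in some newer busybox images is unreliable)
kubectl run tmp --image=busybox:1.28 --rm -it --restart=Never -- nslookup my-svc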
# Test connectivity
kubectl run tmp --image=busybox --rm -it -- wget -O- my-svc:80
Node NotReady
Debug commands:
# Check node conditions
kubectl describe node worker-1
# SSH to node and check kubelet
systemctl status kubelet
journalctl -u kubelet -f
# Check container runtime
systemctl status containerd
CKA Troubleshooting Scenarios
The exam often includes:
- Fix a broken kubelet
- Restore a failed etcd
- Debug networking issues
- Fix a misconfigured pod
- Recover crashed control plane components
Practice Troubleshooting Scenarios
Reading about troubleshooting isn’t enough. You need hands-on practice with broken clusters. Sailor.sh provides real troubleshooting scenarios where things are intentionally broken for you to fix.
Start practicing today: Sailor.sh Mock Exams