Kubernetes Troubleshooting Guide: Debug Like a Pro

Troubleshooting accounts for 30% of the CKA exam, the largest single domain. Whether you’re preparing for the exam or fixing production issues, the systematic approach below gives you a repeatable way to narrow down most Kubernetes problems.

The Troubleshooting Framework

Step 1: Identify the Problem Layer

Kubernetes issues typically fall into these categories:

Pod-level issues (CrashLoopBackOff, Pending, ImagePullBackOff)
Service/Networking issues (unreachable Service, DNS failures)
Node-level issues (NotReady, resource exhaustion)
Cluster-level issues (API server, etcd, scheduler)
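
As a rough illustration of Step 1, the categories above can be written as a small lookup that maps a symptom to the layer worth inspecting first. The function name `problem_layer` and the symptom keywords are assumptions made for this sketch; they are not part of kubectl.

```shell
# Hypothetical helper: map an observed symptom to the layer to inspect first.
# The keywords mirror the categories listed above and are illustrative only.
problem_layer() {
  case "$1" in
    CrashLoopBackOff|Pending|ImagePullBackOff) echo "pod" ;;
    ServiceUnreachable|DNSFailure)             echo "service/networking" ;;
    NotReady|ResourceExhaustion)               echo "node" ;;
    APIServer|etcd|Scheduler)                  echo "cluster" ;;
    *)                                         echo "unknown" ;;
  esac
}
```

For example, `problem_layer CrashLoopBackOff` prints `pod`, which points you at the pod-level commands in the sections that follow.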

Step 2: Gather Information

# Check pod status
kubectl get pods -o wide

# Describe the problematic resource
kubectl describe pod my-pod

# Check events
kubectl get events --sort-by='.lastTimestamp'

# View logs (add --previous to read the last crashed container instance)
kubectl logs my-pod
kubectl logs my-pod --previous
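
The commands above can be bundled into one function so the same information is collected every time. This is a minimal sketch: `gather_pod_info` and its optional namespace argument are hypothetical names introduced here, and the function only wraps the kubectl commands already shown.

```shell
# Hypothetical helper: run the information-gathering commands for one pod.
# Usage: gather_pod_info <pod> [namespace]
gather_pod_info() {
  pod=$1
  ns=${2:-default}

  echo "== status =="
  kubectl get pod "$pod" -n "$ns" -o wide

  echo "== describe =="
  kubectl describe pod "$pod" -n "$ns"

  echo "== recent events =="
  kubectl get events -n "$ns" --sort-by='.lastTimestamp'

  echo "== logs (current container) =="
  kubectl logs "$pod" -n "$ns"

  echo "== logs (previous container, if any) =="
  # --previous fails when the container has never restarted, so tolerate that.
  kubectl logs "$pod" -n "$ns" --previous || echo "no previous container logs"
}
```

Redirecting the output to a file, for example `gather_pod_info my-pod > my-pod-debug.txt`, keeps a snapshot you can compare against after applying a fix.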

Common Issues and Solutions

Pod Stuck in Pending

Possible causes:

Insufficient cluster resources
Node selector/affinity not matching
PVC not bound

Debug commands:

kubectl describe pod my-pod | grep -A 10 Events
kubectl get nodes -o wide
kubectl describe nodes | grep -A 5 "Allocated resources"
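
To connect the event text to the three causes listed above, a simple pattern match can serve as a first pass. This is only a sketch: `pending_cause` is a hypothetical helper, and the phrases it matches are typical FailedScheduling messages whose exact wording may differ between Kubernetes versions.

```shell
# Hypothetical helper: read event text on stdin and name the likely cause.
# Assumption: the scheduler messages contain the phrases matched below.
pending_cause() {
  events=$(cat)
  case "$events" in
    *Insufficient*)
      echo "insufficient cluster resources" ;;
    *"node affinity"*|*"node selector"*)
      echo "node selector/affinity not matching" ;;
    *"unbound immediate PersistentVolumeClaims"*)
      echo "PVC not bound" ;;
    *)
      echo "cause not recognized; read the events manually" ;;
  esac
}
```

A possible invocation is `kubectl describe pod my-pod | pending_cause`. Treat the result as a hint and confirm it against the full event output.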

CrashLoopBackOff

Possible causes:

Application error
Missing configuration
Incorrect container command

Debug commands:

kubectl logs my-pod --previous
kubectl describe pod my-pod
kubectl get pod my-pod -o yaml | grep -A 20 containers
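
The exit code of the last terminated container often narrows the cause before you read any logs. The sketch below assumes a single-container pod (index `0`), and both function names are hypothetical. The interpretations follow common shell and signal conventions and are first guesses, not diagnoses.

```shell
# Hypothetical helper: print the exit code of the last terminated instance
# of the first container in the pod.
last_exit_code() {
  kubectl get pod "$1" \
    -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'
}

# Hypothetical helper: translate common exit codes into a first guess.
explain_exit_code() {
  case "$1" in
    0)   echo "exited cleanly; the command may finish instead of running as a service" ;;
    1)   echo "application error; read the previous logs" ;;
    126) echo "command found but not executable" ;;
    127) echo "command not found; check command and args" ;;
    137) echo "killed with SIGKILL; often an out-of-memory kill" ;;
    143) echo "terminated with SIGTERM" ;;
    *)   echo "exit code $1; consult the application's documentation" ;;
  esac
}
```

Combined, `explain_exit_code "$(last_exit_code my-pod)"` gives a one-line starting point that maps roughly onto the causes above: codes 126 and 127 suggest an incorrect container command, while code 1 more often points to an application or configuration error.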

ImagePullBackOff

Possible causes:

Wrong image name/tag
Private registry authentication
Network issues

Debug commands:

kubectl describe pod my-pod | grep -A 5 "Events"
kubectl get pod my-pod -o yaml | grep image:
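
The pull error in the pod events usually indicates which of the three causes applies. The following sketch is an assumption-heavy first pass: `pull_secrets` and `pull_failure_cause` are hypothetical helpers, and the matched phrases are examples of messages that registries and container runtimes commonly emit, so actual wording may differ.

```shell
# Hypothetical helper: list the image pull secrets referenced by the pod.
# Empty output means the pod spec names no pull secret.
pull_secrets() {
  kubectl get pod "$1" -o jsonpath='{.spec.imagePullSecrets[*].name}'
}

# Hypothetical helper: classify a pull error message passed as an argument.
# Note: some registries report "access denied" for missing images as well,
# so an authentication verdict can also mean a misspelled repository.
pull_failure_cause() {
  case "$1" in
    *"manifest unknown"*|*"not found"*)
      echo "wrong image name/tag" ;;
    *unauthorized*|*"access denied"*|*"authentication required"*)
      echo "private registry authentication" ;;
    *"i/o timeout"*|*"no such host"*|*"connection refused"*)
      echo "network issues" ;;
    *)
      echo "cause not recognized" ;;
  esac
}
```

Copy the failure message from the `kubectl describe pod` events into `pull_failure_cause`, and run `pull_secrets my-pod` when authentication is the suspect.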

Service Not Reachable

Debugging steps:

# Check service endpoints
kubectl get endpoints my-svc

# Check if pods have correct labels
kubectl get pods --show-labels

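# Test DNS resolution (busybox:1.28 is pinned because nslookup in newer busybox images is unreliable)
kubectl run tmp --image=busybox:1.28 --restart=Never --rm -it -- nslookup my-svc

# Test connectivity (5-second timeout so a dead service fails fast)
kubectl run tmp --image=busybox:1.28 --restart=Never --rm -it -- wget -T 5 -O- my-svc:80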
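
An empty endpoints list usually means the Service selector matches no pods. Rather than comparing labels by eye, you can feed the Service's own selector back into a pod query. This is a sketch under assumptions: the three function names are hypothetical, and the go-template assumes the Service defines `.spec.selector`.

```shell
# Remove the trailing comma left by the go-template below.
strip_trailing_comma() { printf '%s' "${1%,}"; }

# Hypothetical helper: print the Service selector as "key=value," pairs.
service_selector() {
  kubectl get svc "$1" \
    -o go-template='{{range $k, $v := .spec.selector}}{{$k}}={{$v}},{{end}}'
}

# Hypothetical helper: list the pods that the Service selector matches.
pods_behind_service() {
  selector=$(strip_trailing_comma "$(service_selector "$1")")
  if [ -z "$selector" ]; then
    echo "service $1 has no selector" >&2
    return 1
  fi
  kubectl get pods -l "$selector" -o wide
}
```

If `pods_behind_service my-svc` returns no pods, the selector and the pod labels disagree. If it returns pods that are not Ready, the readiness probe is the next thing to check.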

Node NotReady

Debug commands:

# Check node conditions
kubectl describe node worker-1

# On the node itself (via SSH): check kubelet status and follow its logs
systemctl status kubelet
journalctl -u kubelet -f

# Check container runtime
systemctl status containerd
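
The node conditions in the `kubectl describe node` output can also be pulled out in a compact form and filtered for the ones that look wrong. This is a minimal sketch with hypothetical function names. It assumes the usual convention that `Ready` should be `True` while the pressure and unavailability conditions should be `False`.

```shell
# Hypothetical helper: print each node condition as TYPE=STATUS, one per line.
node_conditions() {
  kubectl get node "$1" \
    -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}'
}

# Hypothetical helper: read TYPE=STATUS lines on stdin and print the
# conditions that look unhealthy.
unhealthy_conditions() {
  while IFS='=' read -r cond status; do
    case "$cond" in
      "")    ;;  # skip blank lines
      Ready) [ "$status" = "True" ]  || echo "$cond=$status" ;;
      *)     [ "$status" = "False" ] || echo "$cond=$status" ;;
    esac
  done
}
```

Running `node_conditions worker-1 | unhealthy_conditions` prints nothing for a healthy node. A line such as `Ready=Unknown` typically means the kubelet has stopped reporting, which is where the `systemctl` and `journalctl` checks above come in.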

CKA Troubleshooting Scenarios

The exam often includes:

Fix a broken kubelet
Restore a failed etcd
Debug networking issues
Fix a misconfigured pod
Recover crashed control plane components

Practice Troubleshooting Scenarios

Reading about troubleshooting isn’t enough. You need hands-on practice with broken clusters. Sailor.sh provides real troubleshooting scenarios where things are intentionally broken for you to fix.

Start practicing today: Sailor.sh Mock Exams