Kubernetes Troubleshooting Guide: Debug Like a Pro

Troubleshooting accounts for 30% of the CKA exam, making it the largest single domain. Whether you’re preparing for the exam or fixing production issues, a systematic approach helps you narrow down most Kubernetes problems quickly.

The Troubleshooting Framework

Step 1: Identify the Problem Layer

Kubernetes issues typically fall into these categories:

  1. Pod-level issues (CrashLoopBackOff, Pending, ImagePullBackOff)
  2. Service/Networking issues (service unreachable, DNS failures)
  3. Node-level issues (NotReady, resource exhaustion)
  4. Cluster-level issues (API server, etcd, scheduler)
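
As a rough illustration of Step 1, the mapping from symptom to layer can be written down as a small shell function. This is only a sketch: the function name `problem_layer` and the "unknown" fallback are hypothetical, and it covers only the status strings listed above. Service and cluster-level problems rarely show up as a single status string, so they fall through to the fallback.

```shell
# Hypothetical helper: map an observed status string to the layer to inspect first.
# Only the statuses named in the list above are covered.
problem_layer() {
  case "$1" in
    CrashLoopBackOff|Pending|ImagePullBackOff) echo "pod" ;;
    NotReady) echo "node" ;;
    *) echo "unknown" ;;
  esac
}

# Example (needs a live cluster; STATUS is the third column of `kubectl get pod`):
#   problem_layer "$(kubectl get pod my-pod --no-headers | awk '{print $3}')"
```

Keep in mind that one symptom can point at more than one layer. A Pending pod, for example, may be caused by node-level resource exhaustion.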

Step 2: Gather Information

# Check pod status
kubectl get pods -o wide

# Describe the problematic resource
kubectl describe pod my-pod

# Check events (sorted oldest first, so the most recent are at the bottom)
kubectl get events --sort-by='.lastTimestamp'

# View logs
kubectl logs my-pod             # current container
kubectl logs my-pod --previous  # previous instance, for crashed containers
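
During an incident it can help to save these outputs to files so you can compare them after a fix. The sketch below wraps the commands above in one function. The name `collect_pod_info`, its arguments, and the output file names are hypothetical choices, not a standard tool.

```shell
# Hypothetical helper: save the Step 2 outputs for one pod into a directory.
# Usage: collect_pod_info <pod> [namespace] [output-dir]
collect_pod_info() {
  local pod="$1" ns="${2:-default}"
  local out="${3:-./debug-$pod}"
  mkdir -p "$out"
  kubectl get pod "$pod" -n "$ns" -o wide > "$out/status.txt" 2>&1
  kubectl describe pod "$pod" -n "$ns" > "$out/describe.txt" 2>&1
  # Only events that reference this pod, oldest first.
  kubectl get events -n "$ns" \
    --field-selector "involvedObject.name=$pod" \
    --sort-by='.lastTimestamp' > "$out/events.txt" 2>&1
  # Current logs, then logs from the previous container instance.
  # The second command fails if the container never restarted, so ignore its status.
  kubectl logs "$pod" -n "$ns" > "$out/logs.txt" 2>&1
  kubectl logs "$pod" -n "$ns" --previous > "$out/logs-previous.txt" 2>&1 || true
  echo "$out"
}
```

Calling `collect_pod_info my-pod` prints the directory it wrote to, which by default is `./debug-my-pod`.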

Common Issues and Solutions

Pod Stuck in Pending

Possible causes:

  • Insufficient cluster resources
  • Node selector/affinity not matching
  • PVC not bound

Debug commands:

kubectl describe pod my-pod | grep -A 10 Events
kubectl get nodes -o wide
kubectl describe nodes | grep -A 5 "Allocated resources"
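
The commands above mostly cover the resource cause. For the other two causes, a sketch like the following prints the pod's scheduling constraints next to the node labels and lists the PVCs in the namespace. The function name `pending_checks` is hypothetical, and the section headers it prints are only for readability.

```shell
# Hypothetical helper: look at node selector/affinity and PVC binding for a Pending pod.
# Usage: pending_checks <pod> [namespace]
pending_checks() {
  local pod="$1" ns="${2:-default}"
  echo "== nodeSelector / affinity on the pod (blank means none set) =="
  kubectl get pod "$pod" -n "$ns" \
    -o jsonpath='{.spec.nodeSelector}{"\n"}{.spec.affinity}{"\n"}'
  echo "== node labels to compare against =="
  kubectl get nodes --show-labels
  echo "== PVCs in namespace $ns (STATUS should be Bound) =="
  kubectl get pvc -n "$ns"
}
```

If the pod's nodeSelector names a label that no node carries, or a PVC it mounts is still Pending, the scheduler cannot place the pod.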

CrashLoopBackOff

Possible causes:

  • Application error
  • Missing configuration
  • Incorrect container command

Debug commands:

kubectl logs my-pod --previous
kubectl describe pod my-pod
kubectl get pod my-pod -o yaml | grep -A 20 containers
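
The exit code of the last terminated container often narrows down which of these causes applies. The sketch below reads it with a jsonpath query and interprets it using common conventions. Both function names are hypothetical. The 126 and 127 meanings are shell conventions, and a container runtime may report start failures differently, so treat the explanations as hints.

```shell
# Hypothetical helper: print name, restart count, last termination reason and exit code
# for each container in the pod.
last_termination() {
  local pod="$1" ns="${2:-default}"
  kubectl get pod "$pod" -n "$ns" -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.restartCount}{"\t"}{.lastState.terminated.reason}{"\t"}{.lastState.terminated.exitCode}{"\n"}{end}'
}

# Hypothetical helper: interpret an exit code using common conventions.
# Codes above 128 conventionally mean "killed by signal (code - 128)".
explain_exit_code() {
  local code="$1"
  case "$code" in
    0)   echo "exited cleanly (check why the process does not stay running)" ;;
    126) echo "command found but not executable (shell convention)" ;;
    127) echo "command not found (shell convention)" ;;
    137) echo "killed by SIGKILL (128+9), often an OOM kill" ;;
    143) echo "terminated by SIGTERM (128+15)" ;;
    *)
      if [ "$code" -gt 128 ] 2>/dev/null; then
        echo "killed by signal $((code - 128))"
      else
        echo "application-defined error"
      fi
      ;;
  esac
}
```

An exit code of 127 usually points at an incorrect container command, while a small non-zero code such as 1 usually means the application itself failed, so the logs are the next place to look.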

ImagePullBackOff

Possible causes:

  • Wrong image name/tag
  • Private registry authentication
  • Network issues

Debug commands:

kubectl describe pod my-pod | grep -A 5 "Events"
kubectl get pod my-pod -o yaml | grep image:
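
For the private registry case, it is worth confirming that the pod references a pull secret and that the secret exists in the same namespace. The following is a minimal sketch with a hypothetical function name, `check_pull_secrets`. It checks only that the secret exists, not that its credentials are valid.

```shell
# Hypothetical helper: list the pod's imagePullSecrets and check that each one exists.
# Usage: check_pull_secrets <pod> [namespace]
check_pull_secrets() {
  local pod="$1" ns="${2:-default}" names name
  names="$(kubectl get pod "$pod" -n "$ns" \
    -o jsonpath='{.spec.imagePullSecrets[*].name}')"
  if [ -z "$names" ]; then
    echo "no imagePullSecrets set on pod $pod"
    return 0
  fi
  # jsonpath prints the names separated by spaces, so word splitting is intended here.
  for name in $names; do
    if kubectl get secret "$name" -n "$ns" >/dev/null 2>&1; then
      echo "secret $name: found"
    else
      echo "secret $name: MISSING"
    fi
  done
}
```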

Service Not Reachable

Debugging steps:

# Check service endpoints
kubectl get endpoints my-svc

# Check if pods have correct labels
kubectl get pods --show-labels

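# Test DNS resolution (busybox:1.28 is pinned because nslookup in some newer busybox images is known to give misleading results)
kubectl run tmp --image=busybox:1.28 --rm -it --restart=Never -- nslookup my-svc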

# Test connectivity (-T 5 sets a 5-second timeout so an unreachable service fails fast)
kubectl run tmp --image=busybox --rm -it --restart=Never -- wget -qO- -T 5 my-svc:80
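
An empty endpoints list usually means the service selector matches no pods. Rather than comparing labels by eye, you can turn the selector into a label query and ask which pods it matches. The sketch below does this with a Go template. The function name `pods_for_service` is hypothetical.

```shell
# Hypothetical helper: list the pods that a service's selector actually matches.
# Usage: pods_for_service <service> [namespace]
pods_for_service() {
  local svc="$1" ns="${2:-default}" sel
  # Render the selector map as "key=value,key=value," using a Go template.
  sel="$(kubectl get svc "$svc" -n "$ns" \
    -o go-template='{{range $k, $v := .spec.selector}}{{$k}}={{$v}},{{end}}')"
  sel="${sel%,}"   # drop the trailing comma
  if [ -z "$sel" ]; then
    echo "service $svc has no selector"
    return 1
  fi
  echo "selector: $sel"
  kubectl get pods -n "$ns" -l "$sel" -o wide
}
```

If this returns no pods, fix the labels or the selector. If it returns pods but the service is still unreachable, compare the service's targetPort with the port the container listens on.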

Node NotReady

Debug commands:

# Check node conditions
kubectl describe node worker-1

# SSH to the node, then check the kubelet service and follow its logs
systemctl status kubelet
journalctl -u kubelet -f

# Check container runtime
systemctl status containerd
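
Before you SSH anywhere, it helps to know which nodes are affected and which condition is failing. The sketch below shows one way to do that. The function names are hypothetical, and the node name `worker-1` follows the example above.

```shell
# Hypothetical helper: print the names of nodes whose STATUS does not start with "Ready".
# "Ready,SchedulingDisabled" (a cordoned node) still counts as Ready here.
not_ready_nodes() {
  kubectl get nodes --no-headers | awk '$2 !~ /^Ready/ {print $1}'
}

# Hypothetical helper: print each condition of a node as TYPE=STATUS plus its reason.
# Usage: node_conditions <node>
node_conditions() {
  kubectl get node "$1" -o jsonpath='{range .status.conditions[*]}{.type}{"="}{.status}{"\t"}{.reason}{"\n"}{end}'
}
```

A Ready condition with status Unknown usually means the kubelet has stopped reporting to the API server, which is when the kubelet and container runtime checks on the node itself become relevant.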

CKA Troubleshooting Scenarios

The exam often includes:

  1. Fix a broken kubelet
  2. Restore a failed etcd
  3. Debug networking issues
  4. Fix a misconfigured pod
  5. Recover crashed control plane components

Practice Troubleshooting Scenarios

Reading about troubleshooting isn’t enough. You need hands-on practice with broken clusters. Sailor.sh provides real troubleshooting scenarios where things are intentionally broken for you to fix.

Start practicing today: Sailor.sh Mock Exams