How to Fix the Kubernetes Node Not Ready Error

Kubernetes is a powerful platform for managing containerized applications, but it can be challenging to troubleshoot. One error that Kubernetes administrators often run into is "Node Not Ready". It is important to resolve this issue quickly, because it can affect the stability and performance of your Kubernetes cluster.

We'll go over how to detect and fix the "Node Not Ready" issue in this blog, so you can have your Kubernetes cluster operating at peak efficiency once again.

What is the Kubernetes Node NotReady Error?

The "Node NotReady" error indicates that a node is currently unavailable or not ready to run workloads or pods. To maintain the intended state of the cluster, the Kubernetes control plane stops scheduling new pods onto the NotReady node and, if the node stays unhealthy, evicts its existing pods so that they can be rescheduled onto other healthy nodes. The node controller monitors each node's health and marks it as NotReady if it doesn't report back within a given grace period.

In the output of kubectl get nodes, an affected node shows NotReady in the STATUS column:

NAME     STATUS     AGE   VERSION
node-1   Ready      18d   v1.29.1
node-2   NotReady   18d   v1.29.1

Understanding the Kubernetes Node States

The status and health of Kubernetes nodes can be indicated by their different states. One of the following could apply:

Ready: The node is healthy and prepared to run pods. It is functional and capable of supporting workloads.

NotReady: The node is unhealthy and new pods cannot be scheduled on it. Pods already running on this node might be evicted and recreated on another node.

SchedulingDisabled: The node has been marked unschedulable (for example with kubectl cordon), so no new pods may be scheduled on it. It appears alongside the readiness status, as in Ready,SchedulingDisabled.

Unknown: The node status is displayed as unknown if the node controller has not heard from the node within the most recent node-monitor-grace-period (40s by default in older releases and, as far as we know, 50s in recent ones; check the kube-controller-manager flags for your version).

What are the causes of the Kubernetes node NotReady error?

Lack of Resources: For a Kubernetes cluster's nodes to work well, they need enough CPU, memory, and disk space. When these resources run out, the node may become unresponsive or unable to handle its workloads efficiently, resulting in a NotReady status. Pods may fail to start or may be evicted due to excessive CPU or memory utilization. When disk usage goes beyond a predetermined threshold, the node reports the DiskPressure condition, which can likewise end with the node being marked NotReady.
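As a rough illustration of the disk check described above, the sketch below reads df -P output and prints the mount points whose usage is over a limit. The function name disks_over_threshold and the default limit of 90 percent are assumptions made for this example; the threshold your kubelet actually applies depends on its eviction settings.

```shell
# Print mount points whose usage exceeds a limit, given `df -P` output on stdin.
# The limit is a percentage; 90 is an illustrative default, not a kubelet setting.
disks_over_threshold() {
  awk -v limit="${1:-90}" '
    NR > 1 {
      use = $5               # "Capacity" column, for example "95%"
      sub(/%/, "", use)      # strip the percent sign
      if (use + 0 > limit + 0) print $6 " " use "%"
    }
  '
}

# On the node itself:
#   df -P | disks_over_threshold 90
```

Mount points that contain spaces are not handled by this simple column split.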

Problems with the Kubelet: The kubelet is the agent that runs on every node and manages its pods and containers. A crashed or incorrectly configured kubelet cannot connect to the API server, which can leave the node NotReady. In that case the node's Ready condition shows a reason such as KubeletNotReady.

Problems with Kube-Proxy: Kube-proxy is in charge of keeping each node's network rules up to date. If kube-proxy malfunctions or fails, network communication on the node is disrupted, which can result in the node being marked as NotReady.

Problems with connectivity: In order for nodes to interact with the control plane and other nodes in the cluster, they must have network connectivity. This communication can be disrupted by incorrect network setups, which results in nodes failing to communicate their status and entering a NotReady condition.

How to diagnose Kubernetes Node NotReady Error

Verify the Node Status: Run kubectl get nodes to see the current state of every node. A node with the status "NotReady" is not operating properly and cannot accept additional pods, so it is the likely source of the error you are seeing.

kubectl get nodes
NAME     STATUS     AGE   VERSION
node-1   Ready      18d   v1.29.1
node-2   NotReady   12d   v1.29.1
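On a large cluster it can help to filter that listing down to the unhealthy nodes. The following is a minimal sketch, assuming the output comes from kubectl get nodes --no-headers; the function name not_ready_nodes is made up for this example.

```shell
# Print the names of nodes whose readiness status is anything other than "Ready".
# Reads `kubectl get nodes --no-headers` output on stdin.
not_ready_nodes() {
  awk '{
    # STATUS can carry a suffix such as "Ready,SchedulingDisabled";
    # only the part before the first comma describes readiness.
    split($2, status, ",")
    if (status[1] != "Ready") print $1
  }'
}

# Against a live cluster (needs kubectl access):
#   kubectl get nodes --no-headers | not_ready_nodes
```

A cordoned but healthy node (Ready,SchedulingDisabled) is deliberately not reported, since cordoning is an administrative choice rather than a fault.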

Examine the conditions and details of the node: You can use the kubectl describe node command to learn more about the node. This command offers detailed information on the node, covering its events and conditions. You can pinpoint specific problems that might be causing the node to be NotReady, such as memory pressure, disk pressure, or network issues, by looking through the conditions section.

kubectl describe node node-2

Conditions:
  Type            Status  Reason                        Message
  ----            ------  ------                        -------
  MemoryPressure  True    KubeletHasInsufficientMemory  kubelet has insufficient memory available
  DiskPressure    False   KubeletHasNoDiskPressure      kubelet has no disk pressure
  PIDPressure     False   KubeletHasSufficientPID       kubelet has sufficient PID available
  Ready           False   KubeletNotReady               Node is under memory pressure
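The same conditions can be checked from a script. The sketch below assumes the conditions have been printed one per line as Type=Status, which the jsonpath expression in the trailing comment should produce; the function name unhealthy_conditions is hypothetical.

```shell
# Read "Type=Status" lines on stdin and print the conditions that look unhealthy:
# Ready is expected to be True, every other condition is expected to be False.
unhealthy_conditions() {
  awk -F= '
    NF == 0 { next }                                        # skip blank lines
    $1 == "Ready" && $2 != "True"  { print $1 " is " $2; next }
    $1 != "Ready" && $2 != "False" { print $1 " is " $2 }
  '
}

# Against a live cluster (node-2 is the example node used in this article):
#   kubectl get node node-2 \
#     -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}' \
#     | unhealthy_conditions
```

For the node-2 conditions shown above, this would report MemoryPressure is True and Ready is False.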

Kubelet Issue: When all conditions are Unknown, the node is in a NotReady state because the kubelet is unavailable.

The kubelet is the node's only point of contact with the rest of the Kubernetes cluster. It controls the lifecycle of containers on the node, and when it is not operating the node cannot report its status accurately. You can confirm this by looking for errors in the kubelet logs on the node.

Conditions:
  Type            Status   Reason
  ----            ------   ------
  MemoryPressure  Unknown  NodeStatusUnknown
  DiskPressure    Unknown  NodeStatusUnknown
  PIDPressure     Unknown  NodeStatusUnknown
  Ready           Unknown  NodeStatusUnknown
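This all-Unknown pattern is easy to test for in a script. Here is a small sketch, assuming the conditions arrive one per line as Type=Status (the jsonpath expression in the comment is one way to get that); all_conditions_unknown is a name invented for this example.

```shell
# Exit 0 only if every condition read from stdin has the status Unknown,
# the pattern that points at a stopped or unreachable kubelet.
all_conditions_unknown() {
  awk -F= '
    NF == 0 { next }                 # skip blank lines
    { seen = 1 }                     # at least one condition was read
    $2 != "Unknown" { bad = 1 }      # found a condition with a known status
    END { if (seen && !bad) exit 0; exit 1 }
  '
}

# Against a live cluster:
#   kubectl get node node-2 \
#     -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}' \
#     | all_conditions_unknown && echo "kubelet is probably not reporting"
```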

Check Kubernetes System Pods: Use the kubectl get pods -n kube-system command to see the current state of the Kubernetes system pods. These pods are important to the cluster's functioning, and if any of them is not working properly, it may affect the state of the nodes.

kubectl get pods -n kube-system
NAME                             READY   STATUS             RESTARTS
coredns-558bd4d5db-7x8k4         1/1     Running            0
coredns-558bd4d5db-8x9k5         1/1     Running            0
etcd-master                      1/1     Running            0
kube-apiserver-master            1/1     Running            0
kube-controller-manager-master   1/1     Running            0
kube-proxy-4x8k4                 0/1     CrashLoopBackOff   5
kube-scheduler-master            1/1     Running            0
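To pick out the failing pods from a longer listing, something like the following may be enough. It is a sketch that assumes the NAME, READY, STATUS column order shown above and input produced with --no-headers; unhealthy_system_pods is a hypothetical helper name.

```shell
# Print "name status" for pods that are not healthy.
# Reads `kubectl get pods -n kube-system --no-headers` output on stdin.
unhealthy_system_pods() {
  awk '{
    split($2, ready, "/")            # READY column, for example "0/1"
    if ($3 == "Completed") next      # finished jobs are not a problem
    if ($3 != "Running" || ready[1] != ready[2]) print $1 " " $3
  }'
}

# Against a live cluster:
#   kubectl get pods -n kube-system --no-headers | unhealthy_system_pods
```

For the listing above this would print kube-proxy-4x8k4 CrashLoopBackOff.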

Check Connectivity: To diagnose connectivity issues, use the following command to list the node's conditions and look for NetworkUnavailable. If this condition is set to True, the node is experiencing a connectivity problem.

kubectl get node "${NODE_NAME}" -o json | jq -r '.status.conditions[] | .type + " is " + .status'

Conditions:
Type Status
---- ------
NetworkUnavailable True

How to fix Kubernetes Node NotReady issue

Fix Kubelet Problems: To fix kubelet problems, log in to the node via SSH and run systemctl status kubelet. The state may be active (running), active (exited), or inactive (dead).

active (running): The kubelet is up and running, so the problem might be elsewhere.
active (exited): The kubelet exited, possibly because of an error. Use sudo systemctl restart kubelet to restart it.
inactive (dead): The kubelet is stopped or has failed. Use journalctl -u kubelet to inspect the logs and determine the cause.

It also makes sense to check the kubelet logs if the node is NotReady but the kubelet service is running and has the required permissions; it might be erroring but not crashing.
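The decision list above can be written down as a small helper. This is only a sketch: kubelet_next_step is a made-up name, and it expects the text that follows "Active:" in the systemctl status kubelet output.

```shell
# Map the kubelet's systemd state to the next step suggested in this article.
# Argument: the text after "Active:" in `systemctl status kubelet`,
# for example "active (running) since ...".
kubelet_next_step() {
  case "$1" in
    "active (running)"*)
      echo "kubelet is running; check its logs with: journalctl -u kubelet" ;;
    "active (exited)"*)
      echo "kubelet exited; restart it with: sudo systemctl restart kubelet" ;;
    inactive*|dead*|failed*)
      echo "kubelet is down; inspect logs with: journalctl -u kubelet" ;;
    *)
      echo "unrecognized state: $1" ;;
  esac
}

# On the node itself:
#   kubelet_next_step "$(systemctl status kubelet | sed -n 's/^ *Active: //p')"
```

The helper only prints a suggestion; it does not restart anything on its own.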

Fix Kube-Proxy Problems: Look for errors or warnings in the kube-proxy pod's logs, and verify that the kube-proxy DaemonSet is configured correctly. If you see any problems, you can force a restart by deleting the kube-proxy pod. The DaemonSet controller will automatically create a new pod.
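The sketch below prints, without running them, the kubectl commands for locating and deleting the kube-proxy pod on one node. It assumes the pods carry the label k8s-app=kube-proxy, which is common on kubeadm-style clusters but may differ on yours; kube_proxy_restart_plan is a hypothetical helper name.

```shell
# Print (do not execute) the commands for finding and restarting kube-proxy on a node.
# Argument: the node name, for example node-2.
kube_proxy_restart_plan() {
  node="$1"
  selector="k8s-app=kube-proxy"   # assumption: adjust to the labels on your DaemonSet
  # 1. Find the kube-proxy pod scheduled on that node.
  echo "kubectl -n kube-system get pods -l $selector --field-selector spec.nodeName=$node"
  # 2. Delete it so that the DaemonSet controller creates a replacement.
  echo "kubectl -n kube-system delete pod -l $selector --field-selector spec.nodeName=$node"
}

# Review the plan first, then run the printed commands by hand:
#   kube_proxy_restart_plan node-2
```

Printing the commands rather than executing them leaves room to read the pod's logs with kubectl -n kube-system logs before anything is deleted.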

Checking Connectivity: Make sure that the required ports are open, examine the node's network setup, and confirm that the network plugins are installed and configured correctly. To check the node's network connectivity to other nodes or external endpoints, use commands like ping or traceroute.
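Alongside ping and traceroute, a quick TCP probe can show whether a specific port is reachable. The sketch below relies on bash's /dev/tcp feature and the coreutils timeout command being available; check_port is a made-up name, and the ports in the usage comment (6443 for the API server, 10250 for the kubelet) are the usual defaults, which your cluster may override.

```shell
# Report whether a TCP port on a host accepts a connection within 3 seconds.
# Arguments: host, port. Returns non-zero when the port cannot be reached.
check_port() {
  if timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null; then
    echo "$1:$2 open"
  else
    echo "$1:$2 unreachable"
    return 1
  fi
}

# From a worker node, with a placeholder for the control plane address:
#   check_port <control-plane-ip> 6443
# From the control plane, with a placeholder for the worker address:
#   check_port <node-ip> 10250
```

An unreachable result does not say why the connection failed; a firewall rule, a stopped service, and a routing problem all look the same from here.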

Why choose SupportFly for Kubernetes Consulting services

SupportFly plays a key role in managing Kubernetes deployments, which can be a difficult and time-consuming task. We handle all aspects of Kubernetes implementation and optimization, from planning and evaluation to cluster architecture, application containerization, security implementation, monitoring, and CI/CD integration.

SupportFly provides complete consulting services for Kubernetes. Our group of experts in Kubernetes can help you with:

Kubernetes Assessment And Planning
Kubernetes Cluster Design And Deployment
Application Containerization And Orchestration
Kubernetes Security And Governance
Kubernetes Monitoring And Performance Optimization
Continuous Integration And Delivery (CI/CD) With Kubernetes

Conclusion

It can be difficult to diagnose and address problems on a Kubernetes node that is in the "Not Ready" state, but with the appropriate strategy, you can resolve the issue quickly. Before applying any remedy, follow the diagnostic procedures thoroughly and investigate all potential root causes. Once the problem has been fixed, your node should return to the "Ready" state and your Kubernetes cluster should resume normal operation. Remember to keep a close eye on your cluster's logs and metrics so that you can identify and address problems before they get out of hand.