

Kubernetes NetworkPolicy debugging: when traffic is denied silently

A pod that could reach its dependency yesterday now times out today. There is no TCP RST, no ICMP unreachable, and often no application log. The packet is dropped in the CNI data plane. If a policy change, namespace reorganization, or cluster upgrade preceded the outage, you are likely dealing with silent NetworkPolicy denial. This guide shows how to confirm it, find the rule or semantic gap responsible, and restore connectivity without opening up the whole cluster.

What this means

Kubernetes NetworkPolicy is an allow-list isolation mechanism, not a general-purpose firewall with explicit deny rules. A pod is non-isolated by default: all ingress and egress traffic is allowed. Once any NetworkPolicy with Ingress in its policyTypes selects a pod, that pod becomes isolated for ingress, and only traffic matching an allow rule in some policy that selects the pod is permitted. Reply traffic for an allowed connection is implicitly allowed, but the initial connection must be explicitly permitted. The same logic applies to egress when Egress appears in policyTypes.

Effects of multiple policies are additive. Order does not matter. For a connection to succeed, both the egress policy on the source pod and the ingress policy on the destination pod must allow it.
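The allow-list and additivity rules fit in a few lines of code. The Python sketch below is a simplified model, not how any CNI implements enforcement: it ignores namespaces, ports, and ipBlock, represents each from/to entry as a plain predicate over the remote pod's labels, and uses hypothetical names (Policy, direction_allows, connection_allowed). It shows that a pod with no selecting policy allows everything, that rules from every selecting policy are unioned regardless of order, and that a connection needs both the source's egress and the destination's ingress to agree.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Labels = Dict[str, str]
# Hypothetical simplification: a "peer" is any predicate over the remote pod's labels.
Peer = Callable[[Labels], bool]


@dataclass
class Policy:
    selects: Callable[[Labels], bool]  # stands in for spec.podSelector
    policy_types: List[str]            # effective policyTypes, e.g. ["Ingress"]
    ingress_from: List[Peer] = field(default_factory=list)
    egress_to: List[Peer] = field(default_factory=list)


def direction_allows(pod: Labels, peer: Labels, policies: List[Policy], direction: str) -> bool:
    """Allow-list check for one pod in one direction ("Ingress" or "Egress")."""
    selecting = [p for p in policies if p.selects(pod) and direction in p.policy_types]
    if not selecting:
        return True  # not isolated in this direction: all traffic is allowed
    for p in selecting:  # union of every selecting policy's rules; order is irrelevant
        rules = p.ingress_from if direction == "Ingress" else p.egress_to
        if any(rule(peer) for rule in rules):
            return True
    return False  # isolated and no rule matched: the packet is silently dropped


def connection_allowed(src: Labels, dst: Labels, policies: List[Policy]) -> bool:
    """A new connection needs the source's egress AND the destination's ingress."""
    return (direction_allows(src, dst, policies, "Egress")
            and direction_allows(dst, src, policies, "Ingress"))


if __name__ == "__main__":
    web, db = {"app": "web"}, {"app": "db"}
    is_db = lambda labels: labels.get("app") == "db"
    deny_db = Policy(selects=is_db, policy_types=["Ingress"])  # no rules: deny all ingress
    allow_web = Policy(selects=is_db, policy_types=["Ingress"],
                       ingress_from=[lambda labels: labels.get("app") == "web"])
    print(connection_allowed(web, db, [deny_db]))             # False
    print(connection_allowed(web, db, [deny_db, allow_web]))  # True
    print(connection_allowed(web, db, [allow_web, deny_db]))  # True: order does not matter
```

Adding a policy can only widen what an already-isolated pod accepts; the only way a new policy causes a denial is by isolating a pod that was previously non-isolated.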

When a deny-all NetworkPolicy is defined, it is only guaranteed to deny TCP, UDP, and SCTP connections. Behavior for other protocols, such as ICMP, is undefined and varies by CNI plugin. Cilium, for example, blocks ICMP unless explicitly permitted.

Because enforcement happens in the CNI data plane, the sender typically sees a connection timeout. This silence makes NetworkPolicy a common cause of “mystery” connectivity outages.

Common causes

| Cause | What it looks like | First thing to check |
|---|---|---|
| Missing DNS egress rule | Service discovery fails; curl by hostname hangs; no application error | Whether UDP and TCP port 53 to kube-dns are explicitly allowed |
| namespaceSelector targets unlabeled namespace | Cross-namespace traffic fails despite a policy “allowing namespace X” | `kubectl get namespace <name> --show-labels` |
| Empty `podSelector: {}` scope confusion | Operator assumes `{}` grants cross-namespace access, but it only matches pods in the policy’s own namespace | Whether a `namespaceSelector` is present alongside the `podSelector` |
| Omitted `policyTypes` | An egress-only policy also blocks all ingress, or a default-deny egress policy restricts nothing | The defaulted `policyTypes` in the stored object (`kubectl get networkpolicy <name> -o yaml`) |
| AWS VPC CNI port limit exceeded | Silent pod-to-pod failures after migrating from Calico to VPC CNI on EKS 1.30+ | Number of port entries per selector; consolidate with `endPort` |
| hostNetwork pod bypass | Policies appear to have no effect for specific workloads | Whether the affected pod uses `hostNetwork: true` |
| ICMP denied by Cilium | Ping fails between pods even though TCP/UDP on the same path works | Whether a CiliumNetworkPolicy allows ICMP (`icmps` rules) |

Quick checks

Use these checks to confirm silent NetworkPolicy denial.

```shell
# List all NetworkPolicies in the destination namespace
kubectl get networkpolicy -n <destination-ns> -o yaml

# Check which policies select the destination pod by its labels
kubectl get pods -n <destination-ns> --show-labels
# Then match the labels against each policy's spec.podSelector
# (a policy only ever selects pods in its own namespace)

# Test connectivity by pod IP to bypass Service DNAT
kubectl exec -n <source-ns> <source-pod> -- wget -qO- --timeout=5 http://<dest-pod-ip>:<port>

# Test DNS resolution from the source pod
kubectl exec -n <source-ns> <source-pod> -- nslookup <target-service>

# Verify namespace labels (namespaceSelector matches these, not pod labels)
kubectl get namespace <name> --show-labels

# For Calico: check Felix metrics (calico-node runs in calico-system on operator-based installs;
# the endpoint only answers if prometheusMetricsEnabled is set in FelixConfiguration)
kubectl exec -n kube-system <calico-node-pod> -- wget -qO- http://localhost:9091/metrics | grep felix_

# For Cilium: observe drops in real time
# (on Cilium 1.15 and later, the in-agent CLI binary is named cilium-dbg)
kubectl exec -n kube-system <cilium-pod> -- cilium monitor --type drop
```
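Matching labels against policies by eye gets error-prone once a namespace holds more than a few policies. The following Python sketch evaluates each policy's spec.podSelector (matchLabels and matchExpressions) against a pod's labels, using the JSON that kubectl returns with -o json. The script and function names are hypothetical, and it answers only the selection question: whether the pod is isolated in a given direction still depends on each selecting policy's policyTypes.

```python
import json
import subprocess
import sys
from typing import Any, Dict, List


def selector_matches(selector: Dict[str, Any], labels: Dict[str, str]) -> bool:
    """Evaluate a Kubernetes LabelSelector; an empty selector {} matches everything."""
    for key, value in (selector.get("matchLabels") or {}).items():
        if labels.get(key) != value:
            return False
    for expr in selector.get("matchExpressions") or []:
        key, op, values = expr["key"], expr["operator"], expr.get("values") or []
        if op == "In" and labels.get(key) not in values:
            return False
        if op == "NotIn" and key in labels and labels[key] in values:
            return False
        if op == "Exists" and key not in labels:
            return False
        if op == "DoesNotExist" and key in labels:
            return False
    return True


def policies_selecting(pod: Dict[str, Any], policies: List[Dict[str, Any]]) -> List[str]:
    """Names of policies whose spec.podSelector selects the pod (same namespace only)."""
    ns = pod["metadata"]["namespace"]
    labels = pod["metadata"].get("labels") or {}
    return [
        p["metadata"]["name"]
        for p in policies
        if p["metadata"]["namespace"] == ns
        and selector_matches(p["spec"].get("podSelector") or {}, labels)
    ]


def kubectl_json(*args: str) -> Dict[str, Any]:
    """Run kubectl with -o json and parse the result."""
    out = subprocess.run(["kubectl", *args, "-o", "json"],
                         check=True, capture_output=True, text=True).stdout
    return json.loads(out)


if __name__ == "__main__" and len(sys.argv) == 3:
    namespace, pod_name = sys.argv[1], sys.argv[2]
    pod = kubectl_json("get", "pod", pod_name, "-n", namespace)
    items = kubectl_json("get", "networkpolicy", "-n", namespace)["items"]
    names = policies_selecting(pod, items)
    print("\n".join(names) if names else "no NetworkPolicy selects this pod (non-isolated)")
```

Saved under a hypothetical name such as selecting.py, it runs as `python3 selecting.py <destination-ns> <destination-pod>` with the same kubectl credentials as the checks above.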

What good looks like: For cross-namespace traffic, the destination pod is selected by at least one policy whose from rules include the source, and any namespaceSelector in those rules matches labels that actually exist on the source namespace object. If no NetworkPolicy selects the destination pod, it is non-isolated for ingress; an egress policy on the source pod can still block the connection.

How to diagnose it

Follow this flow to isolate the offending policy or CNI behavior.

1. Confirm the symptom is a silent drop. If the client reports “Connection refused,” the target port is not listening or the Service has no endpoints. If a lookup returns “NXDOMAIN,” the DNS server answered and the name does not exist. A NetworkPolicy denial produces a timeout or hang with no response; when DNS egress is the blocked path, lookups time out (for example, “no servers could be reached”) rather than returning NXDOMAIN.
2. Determine if the destination pod is isolated. List all NetworkPolicies in the destination namespace. A policy selects pods only through its spec.podSelector and only within its own namespace; namespaceSelector appears in from and to rules, not in pod selection. If any policy selects the destination pod, the pod is isolated in the directions declared in policyTypes. If none selects it, the destination accepts all ingress, so continue with the egress check on the source side.
3. Verify ingress allows the source. For an isolated destination, inspect every policy that selects it and check whether any ingress rule permits the source. Within a single from entry, namespaceSelector and podSelector are combined with AND; separate entries are combined with OR. A podSelector without a namespaceSelector, including a bare podSelector: {}, only matches pods in the same namespace as the policy.
4. Verify egress allows the destination. Inspect policies in the source namespace. If the source is isolated for egress, check whether an egress rule permits the destination by pod labels, namespace labels, or an ipBlock CIDR. A connection requires both sides to agree.
5. Check the DNS egress trap. If the failure involves hostnames or Kubernetes Services, test DNS resolution from the source pod. Restrictive egress policies frequently omit port 53. See the fix below.
6. Inspect namespace labels. If a rule uses namespaceSelector, verify that the namespace object itself carries the expected labels. Since Kubernetes 1.21, every namespace automatically gets the kubernetes.io/metadata.name label; any other label, such as a team or environment label, must be applied explicitly.
7. Validate policyTypes. If policyTypes is omitted, the API server fills it in: Ingress is always included, and Egress is included only when the policy has egress rules. An egress-only policy written without policyTypes therefore also isolates the selected pods for ingress, and a default-deny egress policy with no egress rules does not restrict egress at all. Check the stored object with kubectl get networkpolicy <name> -o yaml and declare the intended directions explicitly.
8. Test CNI-specific behavior. If the above checks are correct but traffic still fails, verify your CNI. Flannel does not enforce NetworkPolicy. For hostNetwork pods, enforcement varies: some CNIs cannot distinguish hostNetwork traffic from node traffic and ignore selectors for those pods. In Cilium, fromCIDR/toCIDR rules only match non-pod endpoints; pod-to-pod traffic must use label selectors.
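Ingress rules are where most selector mistakes hide. In a from rule, a namespaceSelector and a podSelector inside the same entry must both match, separate entries are alternatives, an empty or missing from matches every source, and a podSelector with no namespaceSelector only matches pods in the policy's own namespace. The Python sketch below encodes those rules for a single from list. It is a simplification under stated assumptions: it supports matchLabels only, treats ipBlock peers as out of scope, and uses hypothetical function names.

```python
from typing import Any, Dict, List

Labels = Dict[str, str]


def labels_match(selector: Dict[str, Any], labels: Labels) -> bool:
    # matchLabels only; matchExpressions are ignored in this sketch.
    return all(labels.get(k) == v for k, v in (selector.get("matchLabels") or {}).items())


def peer_matches(peer: Dict[str, Any], policy_ns: str, src_ns: str,
                 src_ns_labels: Labels, src_pod_labels: Labels) -> bool:
    """Does one `from` entry (a NetworkPolicyPeer) match the source pod?"""
    if "ipBlock" in peer:
        return False  # CIDR peers are out of scope for this sketch
    ns_sel = peer.get("namespaceSelector")
    if ns_sel is None:
        if src_ns != policy_ns:  # no namespaceSelector: policy's own namespace only
            return False
    elif not labels_match(ns_sel, src_ns_labels):
        return False
    pod_sel = peer.get("podSelector")  # ANDed with namespaceSelector in the same entry
    return pod_sel is None or labels_match(pod_sel, src_pod_labels)


def rule_allows(from_entries: List[Dict[str, Any]], policy_ns: str, src_ns: str,
                src_ns_labels: Labels, src_pod_labels: Labels) -> bool:
    """Entries are ORed; an empty or missing `from` matches all sources."""
    if not from_entries:
        return True
    return any(peer_matches(p, policy_ns, src_ns, src_ns_labels, src_pod_labels)
               for p in from_entries)
```

With the source pod app=api in a namespace labeled team=a, a single entry holding both selectors (team=a AND app=web) rejects it, while the same two selectors written as two entries (team=a OR app=web) admit it. One leading dash in YAML is the whole difference.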

```mermaid
flowchart TD
    A[Pod cannot reach target] --> B{Connection refused or timeout?}
    B -->|Timeout| C[Check NetworkPolicies selecting source and destination]
    B -->|Refused| Z[Check Service endpoints and port binding]
    C --> D{Destination isolated?}
    D -->|Yes| F{Ingress allows source?}
    F -->|No| G[Fix ingress rule or namespace labels]
    F -->|Yes| H{Egress allows destination?}
    D -->|No| H
    H -->|No| I[Fix egress rule or CIDR scope]
    H -->|Yes| J{Failure involves hostnames?}
    J -->|Yes| K[Test DNS from source pod]
    K -->|Fails| L[Add DNS egress rule for port 53]
    K -->|Passes| M[Review CNI-specific behavior]
    J -->|No| M
```
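The first branch of the flowchart depends on reading the client's error correctly. This Python sketch maps common error strings to the branch they suggest. The substrings are assumptions based on typical busybox, curl, and DNS client output; exact wording varies by client and version, and the category names and hints are hypothetical. The DNS check runs first because a DNS timeout message also contains the words "timed out".

```python
from typing import List, Tuple

# Ordered checks: the DNS-specific timeout must be tested before the generic one.
RULES: List[Tuple[str, Tuple[str, ...], str]] = [
    ("dns-timeout", ("no servers could be reached",),
     "DNS queries get no answer: check egress to kube-dns on UDP and TCP 53"),
    ("nxdomain", ("nxdomain",),
     "DNS answered but the name does not exist: check the Service name and namespace"),
    ("refused", ("connection refused",),
     "The target actively rejected the connection: check Service endpoints and the port"),
    ("timeout", ("timed out", "timeout"),
     "Silent drop: check egress on the source and ingress on the destination"),
]


def classify_failure(message: str) -> str:
    """Return the first category whose substrings appear in the message."""
    text = message.lower()
    for category, needles, _hint in RULES:
        if any(needle in text for needle in needles):
            return category
    return "unknown"


def next_step(message: str) -> str:
    """Human-readable next step for a client error message."""
    category = classify_failure(message)
    for name, _needles, hint in RULES:
        if name == category:
            return hint
    return "No known pattern: inspect the client output manually"
```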

Metrics and signals to monitor

| Signal | Why it matters | Warning sign |
|---|---|---|
| CNI plugin health (DaemonSet pod restarts) | New or changed policies are not programmed, and enforcement may be stale, while the CNI agent is crashed or OOM killed | CNI pods restarting or stuck in `CrashLoopBackOff` |
| Pod-to-pod connectivity test failures | Direct confirmation of NetworkPolicy-like drops | Timeouts between known-healthy pods on specific nodes |
| DNS resolution latency/failures from workloads | The most common symptom of missing DNS egress rules | `nslookup` failures correlated with policy rollout |
| `felix_*` metrics (Calico) | Felix programs the policy rules into the dataplane; metric changes that line up with a rollout corroborate a policy cause | Increasing `felix_iptables_*` counters, or drop counters where your Calico version exposes them |
| Cilium policy-denied drops | Cilium records a reason for every drop; “Policy denied” confirms a policy cause | `cilium monitor --type drop` showing policy-denied drops between source and destination |
| Cluster NetworkPolicy object count | Rapid growth increases rule overlap and the debugging surface | Sudden spikes in policy count without change management |
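For the Felix metrics signal, one practical approach is to scrape the metrics endpoint from the quick checks twice, once before and once after a policy rollout, and compare the counters. The Python sketch below is a simplified parser for the Prometheus text exposition format (it assumes label values contain no closing brace) and reports every metric with the given prefix whose value increased. Function names are hypothetical, and the metric names used in any example are placeholders rather than real Felix metrics.

```python
import re
from typing import Dict

# One sample line: metric_name{optional labels} value [optional timestamp]
SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*(?:\{[^}]*\})?)\s+(\S+)')


def parse_samples(text: str, prefix: str) -> Dict[str, float]:
    """Map 'name{labels}' to its value for every sample whose name has the prefix."""
    samples: Dict[str, float] = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):  # skip blank lines and HELP/TYPE comments
            continue
        m = SAMPLE.match(line)
        if m and m.group(1).startswith(prefix):
            try:
                samples[m.group(1)] = float(m.group(2))
            except ValueError:
                continue
    return samples


def increased(before: str, after: str, prefix: str = "felix_") -> Dict[str, float]:
    """Metrics whose value went up between two scrapes, with the size of the increase."""
    old, new = parse_samples(before, prefix), parse_samples(after, prefix)
    return {k: v - old.get(k, 0.0) for k, v in new.items() if v > old.get(k, 0.0)}
```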

Fixes

If the cause is missing DNS egress

Add an explicit egress rule to your default-deny or restrictive policy:

```yaml
- to:
  - namespaceSelector:
      matchLabels:
        kubernetes.io/metadata.name: kube-system
    podSelector:
      matchLabels:
        k8s-app: kube-dns
  ports:
  - protocol: UDP
    port: 53
  - protocol: TCP
    port: 53
```

Allow both UDP and TCP. DNS clients retry over TCP when a UDP response is truncated, so a UDP-only rule can fail intermittently on large responses.
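To audit existing policies for this rule, the Python sketch below reports which of UDP and TCP port 53 an egress spec allows, following NetworkPolicy port semantics: a rule without ports allows all ports, protocol defaults to TCP, and endPort extends a numeric port into a range. It deliberately does not evaluate the to peers, so it cannot confirm that the rule actually reaches kube-dns, and it treats named ports as not matching because resolving them requires the target pod's spec. The function names are hypothetical.

```python
from typing import Any, Dict, Set

DNS_PORT = 53


def port_covers(entry: Dict[str, Any], protocol: str, port: int) -> bool:
    """Does one NetworkPolicyPort entry allow protocol/port?"""
    if entry.get("protocol", "TCP") != protocol:  # protocol defaults to TCP
        return False
    p = entry.get("port")
    if p is None:
        return True   # no port given: every port for this protocol
    if isinstance(p, str):
        return False  # named port: cannot be resolved without the target pod spec
    return p <= port <= entry.get("endPort", p)


def dns_protocols_allowed(spec: Dict[str, Any]) -> Set[str]:
    """Protocols for which some egress rule allows port 53 (peers are not checked)."""
    allowed: Set[str] = set()
    for rule in spec.get("egress") or []:
        ports = rule.get("ports")
        for proto in ("UDP", "TCP"):
            if not ports or any(port_covers(e, proto, DNS_PORT) for e in ports):
                allowed.add(proto)
    return allowed
```

For any policy whose effective policyTypes include Egress, a result other than {"UDP", "TCP"} is worth a second look.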

If the cause is namespaceSelector mismatch

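Apply the expected label to the namespace object itself (kubectl label namespace <name> <key>=<value>), or change the policy to match labels the namespace already has. Selectors match metadata labels, not namespace names; to select a namespace by name, match the automatic kubernetes.io/metadata.name label.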
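A quick audit catches selectors that can never match. This Python sketch, with hypothetical function names, collects every namespaceSelector in a set of policies and reports those whose matchLabels match no existing namespace. It assumes its inputs are the items lists from kubectl get networkpolicy -A -o json and kubectl get namespace -o json, and it ignores matchExpressions for brevity.

```python
from typing import Any, Dict, Iterator, List, Tuple


def namespace_selectors(policy: Dict[str, Any]) -> Iterator[Dict[str, str]]:
    """Yield the matchLabels of every namespaceSelector in ingress and egress rules."""
    spec = policy["spec"]
    for direction, key in (("ingress", "from"), ("egress", "to")):
        for rule in spec.get(direction) or []:
            for peer in rule.get(key) or []:
                sel = peer.get("namespaceSelector")
                if sel is not None:
                    yield sel.get("matchLabels") or {}


def unmatched_selectors(policies: List[Dict[str, Any]],
                        namespaces: List[Dict[str, Any]]) -> List[Tuple[str, Dict[str, str]]]:
    """(policy name, matchLabels) pairs that match no existing namespace."""
    ns_labels = [ns["metadata"].get("labels") or {} for ns in namespaces]
    problems = []
    for policy in policies:
        for match in namespace_selectors(policy):
            if not any(all(l.get(k) == v for k, v in match.items()) for l in ns_labels):
                problems.append((policy["metadata"]["name"], match))
    return problems
```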

If the cause is omitted policyTypes

Explicitly declare every direction the policy is meant to control. For a policy that isolates both directions:

```yaml
policyTypes:
- Ingress
- Egress
```

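When policyTypes is omitted, the API server fills it in: Ingress is always set, and Egress is set only if the policy has egress rules. An egress-only policy then also isolates the selected pods for ingress, and a default-deny egress policy with no egress rules denies nothing. Declaring the field explicitly removes the guesswork.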
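The defaulting rule is small enough to encode directly: when policyTypes is omitted or empty, Ingress is always included, and Egress is included only if the policy has at least one egress rule. The Python sketch below reproduces that rule and flags the two surprises that follow from it. Function names and message strings are hypothetical; the authoritative value is always the one in the stored object.

```python
from typing import Any, Dict, List


def effective_policy_types(spec: Dict[str, Any]) -> List[str]:
    """policyTypes as the API server fills it in when the field is omitted or empty."""
    if spec.get("policyTypes"):
        return list(spec["policyTypes"])
    types = ["Ingress"]        # Ingress is always defaulted in
    if spec.get("egress"):     # Egress only when at least one egress rule exists
        types.append("Egress")
    return types


def surprises(spec: Dict[str, Any]) -> List[str]:
    """Flag the common surprises caused by relying on the default."""
    if spec.get("policyTypes"):
        return []
    found = []
    if not spec.get("ingress"):
        found.append("ingress isolated with no ingress rules: all ingress is denied")
    if not spec.get("egress"):
        found.append("Egress not in effective policyTypes: egress is not restricted")
    return found
```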

If the cause is AWS VPC CNI port limits

AWS VPC CNI limits each protocol in each selector to 24 unique port combinations. Reduce the port list or use endPort to specify ranges. If you migrated from Calico, audit existing policies for large port lists.
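Finding rules over the limit and collapsing port lists can be scripted. This Python sketch, with hypothetical names, counts unique port entries per protocol in one ingress or egress rule and merges consecutive numeric ports into port/endPort entries. Treating the limit as per protocol within a single rule is an interpretation of the limit described above, not a statement of how the VPC CNI counts internally. endPort is stable since Kubernetes 1.25 and only applies to numeric ports.

```python
from collections import defaultdict
from typing import Any, Dict, List, Optional

AWS_VPC_CNI_PORT_LIMIT = 24  # per protocol, per ingress/egress rule (assumed interpretation)


def over_limit(rule: Dict[str, Any], limit: int = AWS_VPC_CNI_PORT_LIMIT) -> Dict[str, int]:
    """Protocols whose unique (port, endPort) count in this rule exceeds the limit."""
    counts = defaultdict(set)
    for entry in rule.get("ports") or []:
        counts[entry.get("protocol", "TCP")].add((entry.get("port"), entry.get("endPort")))
    return {proto: len(c) for proto, c in counts.items() if len(c) > limit}


def port_entry(protocol: str, start: int, end: int) -> Dict[str, Any]:
    """One NetworkPolicyPort, using endPort only for real ranges."""
    entry: Dict[str, Any] = {"protocol": protocol, "port": start}
    if end != start:
        entry["endPort"] = end
    return entry


def consolidate(ports: List[int], protocol: str = "TCP") -> List[Dict[str, Any]]:
    """Collapse numeric ports into as few NetworkPolicyPort entries as possible."""
    entries: List[Dict[str, Any]] = []
    run_start: Optional[int] = None
    prev: Optional[int] = None
    for p in sorted(set(ports)):
        if prev is not None and p == prev + 1:
            prev = p  # still inside a consecutive run
            continue
        if run_start is not None:
            entries.append(port_entry(protocol, run_start, prev))
        run_start = prev = p
    if run_start is not None:
        entries.append(port_entry(protocol, run_start, prev))
    return entries
```

For example, consolidate([443, 80, 81, 82]) yields one entry for 80 through 82 and one for 443, turning four port combinations into two.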

If the cause is hostNetwork or CNI bypass

For hostNetwork pods, do not rely solely on NetworkPolicy for isolation. Add node-level firewall rules or run the workload as a normal pod. If you use Flannel, be aware that NetworkPolicy objects are accepted by the API but never enforced.
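Listing hostNetwork pods is a quick way to see which workloads this applies to. This minimal Python sketch, with a hypothetical function name, filters the parsed output of kubectl get pods -A -o json.

```python
from typing import Any, Dict, List, Tuple


def host_network_pods(pod_list: Dict[str, Any]) -> List[Tuple[str, str]]:
    """(namespace, name) of every pod running with hostNetwork: true."""
    return [
        (p["metadata"]["namespace"], p["metadata"]["name"])
        for p in pod_list.get("items", [])
        if p.get("spec", {}).get("hostNetwork", False)
    ]
```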

If the cause is ICMP under Cilium

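Standard NetworkPolicy can only match TCP, UDP, and SCTP, so it cannot express an ICMP allow rule. If ICMP is required for your operational health checks, add a CiliumNetworkPolicy with icmps rules, or switch the health check to a TCP probe.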

Prevention

Always include DNS egress in any default-deny or restrictive egress policy. Service discovery depends on it, and its absence is one of the most common causes of silent breakage.
Validate namespace labels before deploying policies that rely on namespaceSelector. Add labels to namespaces as part of namespace provisioning.
Declare policyTypes explicitly in every NetworkPolicy, even if the default behavior appears correct in testing.
Stage policies with real traffic before production. A policy that passes a YAML linter can still deny critical control plane or sidecar traffic.
Monitor CNI health alongside application metrics. If the CNI agent is unhealthy, policy enforcement is inconsistent or absent.
Document cross-cluster behavior if you use Cilium Cluster Mesh. Cilium may restrict label-based selectors to the local cluster by default; remote cluster traffic may require explicit cluster label selectors.

How Netdata helps

Netdata surfaces these silent failures by correlating signals that application logs miss:

Correlate sudden drops in inter-pod network throughput with CNI plugin CPU, memory, or restart events.
Monitor DNS resolution latency at the node level to catch the DNS egress trap.
Track kernel conntrack utilization and drop rates when policy changes increase connection churn.
Map per-node network anomalies alongside Kubernetes workload events to identify the policy rollout that coincided with the first timeouts.

The Netdata solution

Kubernetes monitoring with Netdata

Netdata monitors Kubernetes with per-second metrics across the control plane, nodes, and every pod, with ML anomaly detection and zero per-pod configuration. Correlate API-server and etcd latency, kubelet PLEG stalls, scheduling pressure, and OOMKills in one place.

