Kubernetes NetworkPolicy debugging: when traffic is denied silently

A pod that could reach its dependency yesterday now times out. There is no TCP RST, no ICMP unreachable, and often no application log. The packet is dropped in the CNI data plane. If a policy change, namespace reorganization, or cluster upgrade preceded the outage, you are likely dealing with silent NetworkPolicy denial. This guide shows how to confirm it, find the rule or semantic gap responsible, and restore connectivity without opening the whole cluster to all traffic.

What this means

Kubernetes NetworkPolicy is an allow-list isolation mechanism, not a firewall with ordered allow and deny rules. A pod is non-isolated by default: all ingress and egress traffic is allowed. Once any NetworkPolicy with Ingress in policyTypes selects a pod, that pod becomes isolated for ingress, and only traffic matching an explicit allow rule in a policy that selects the pod is permitted. Reply traffic on an allowed connection is implicitly allowed, but the initial connection must be explicitly permitted. The same logic applies to egress when Egress appears in policyTypes.
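As a minimal sketch of how isolation is switched on, the manifest below selects every pod in one namespace and lists Ingress in policyTypes without defining any ingress rules. Under the semantics described above, every inbound connection to those pods is then dropped. The namespace name shop and the policy name are hypothetical.

```yaml
# Hypothetical example: isolate all pods in the "shop" namespace for ingress.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress   # assumed name
  namespace: shop              # assumed namespace
spec:
  podSelector: {}              # empty selector: every pod in this namespace
  policyTypes:
  - Ingress                    # pods become isolated for ingress only
  # No "ingress:" rules, so nothing is allowed in.
  # Egress is untouched because Egress is not listed.
```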

Effects of multiple policies are additive. Order does not matter. For a connection to succeed, both the egress policy on the source pod and the ingress policy on the destination pod must allow it.
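The additive behavior can be illustrated with two policies that select the same pods. This is a sketch with hypothetical labels (app: api, app: web, app: prometheus) and ports; the effective allow list is the union of both rules, regardless of the order in which the objects were created.

```yaml
# Policy 1: pods labeled app=web may reach the api pods on TCP 8080.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-allow-web          # assumed name
  namespace: shop              # assumed namespace
spec:
  podSelector:
    matchLabels:
      app: api
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: web
    ports:
    - protocol: TCP
      port: 8080
---
# Policy 2: pods labeled app=prometheus may reach the same api pods on TCP 9090.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-allow-metrics      # assumed name
  namespace: shop
spec:
  podSelector:
    matchLabels:
      app: api
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: prometheus
    ports:
    - protocol: TCP
      port: 9090
# Combined effect: web -> 8080 and prometheus -> 9090 are allowed.
# web -> 9090 is still denied, because each rule pairs its own peers with its own ports.
```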

When a deny-all NetworkPolicy is defined, it is only guaranteed to deny TCP, UDP, and SCTP connections. Behavior for other protocols, such as ICMP, is undefined and varies by CNI plugin. Cilium, for example, blocks ICMP unless explicitly permitted.

Because enforcement happens in the CNI data plane, the sender typically sees a connection timeout. This silence makes NetworkPolicy a common cause of “mystery” connectivity outages.

Common causes

| Cause | What it looks like | First thing to check |
| --- | --- | --- |
| Missing DNS egress rule | Service discovery fails; curl by hostname hangs; no application error | Whether UDP/TCP port 53 to kube-dns is explicitly allowed |
| namespaceSelector targets unlabeled namespace | Cross-namespace traffic fails despite a policy “allowing namespace X” | kubectl get namespace <name> --show-labels |
| Empty podSelector: {} scope confusion | Operator assumes {} grants cross-namespace access, but it only matches pods in the policy’s own namespace | Whether a namespaceSelector is present alongside the podSelector |
| Egress missing from policyTypes | Egress rules exist but are ignored; traffic behavior does not match the manifest | The policyTypes field in the NetworkPolicy manifest |
| AWS VPC CNI port limit exceeded | Silent pod-to-pod failures after migrating from Calico to VPC CNI on EKS 1.30+ | Number of port entries per selector; consolidate with endPort |
| hostNetwork pod bypass | Policies appear to have no effect for specific workloads | Whether the affected pod uses hostNetwork: true |
| ICMP denied by Cilium | Ping fails between pods even though TCP/UDP on the same path works | Cilium-specific ICMP allow rules |

Quick checks

Use these checks to confirm silent NetworkPolicy denial.

# List all NetworkPolicies in the destination namespace
kubectl get networkpolicy -n <destination-ns> -o yaml

# Check which policies select the destination pod by its labels
kubectl get pods -n <destination-ns> --show-labels
# Then match those labels against each policy's spec.podSelector
# (namespaceSelector only appears in from/to peers; it never selects the pods a policy applies to)

# Test connectivity by pod IP to bypass Service DNAT
kubectl exec -n <source-ns> <source-pod> -- wget -qO- --timeout=5 http://<dest-pod-ip>:<port>

# Test DNS resolution from the source pod
kubectl exec -n <source-ns> <source-pod> -- nslookup <target-service>

# Verify namespace labels (namespaceSelector matches these, not pod labels)
kubectl get namespace <name> --show-labels

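# For Calico: check Felix metrics for policy drops
# (Felix serves Prometheus metrics only when prometheusMetricsEnabled is set; 9091 is its default port)
kubectl exec -n kube-system <calico-node-pod> -- wget -qO- http://localhost:9091/metrics | grep felix_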

# For Cilium: observe drops in real time (filter the event stream to drop notifications)
kubectl exec -n kube-system <cilium-pod> -- cilium monitor --type drop

What good looks like: For cross-namespace ingress, the from rule carries a namespaceSelector whose labels are actually present on the source namespace object, and at least one policy that selects the destination pod lists the source in its from rules. If no NetworkPolicy selects the destination pod, it is not isolated for ingress and the drop is not coming from the destination side; check egress policies on the source before ruling NetworkPolicy out.
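A sketch of what such a cross-namespace ingress policy might look like follows. The namespaces (shop, storefront), the labels (team: frontend, app: web, app: api), and the port are hypothetical; what matters is where the two selectors sit relative to each other.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-allow-storefront-web   # assumed name
  namespace: shop                  # destination namespace (assumed)
spec:
  podSelector:
    matchLabels:
      app: api                     # destination pods this policy applies to
  policyTypes:
  - Ingress
  ingress:
  - from:
    # One list element with BOTH selectors: the source must be a pod
    # labeled app=web AND live in a namespace labeled team=frontend.
    - namespaceSelector:
        matchLabels:
          team: frontend           # must exist on the Namespace object itself
      podSelector:
        matchLabels:
          app: web
    ports:
    - protocol: TCP
      port: 8080
# Writing "- podSelector:" as a second list element instead would turn the
# AND into an OR: any pod in team=frontend namespaces, or app=web pods in
# the policy's own namespace.
```

If the storefront namespace does not carry team: frontend, this policy matches nothing and the connection times out exactly as described above.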

How to diagnose it

Follow this flow to isolate the offending policy or CNI behavior.

  1. Confirm the symptom is a silent drop. If application logs show “Connection refused,” the target port is not listening or a Service has no endpoints. If they show “NXDOMAIN,” the issue is DNS. A NetworkPolicy denial produces a timeout or hang with no response.

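  2. Determine if the destination pod is isolated. List all NetworkPolicies in the destination namespace. A policy selects pods only through its top-level spec.podSelector, and only within its own namespace; namespaceSelector appears solely inside from and to peers. If any policy selects the destination pod, the pod is isolated in the directions declared in policyTypes. If none does, the destination is not isolated for ingress; skip to the source's egress policies in step 4.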

  3. Verify ingress allows the source. For an isolated destination, inspect every policy that selects it. Check whether any ingress rule permits the source. Remember that from requires both the source pod labels and, if cross-namespace, the namespace labels to match. A bare podSelector: {} inside an ingress rule only matches pods in the same namespace as the policy.

  4. Verify egress allows the destination. Inspect policies in the source namespace. If the source is isolated for egress, check whether an egress rule permits the destination IP, pod labels, namespace labels, or CIDR. A connection requires both sides to agree.

  5. Check the DNS egress trap. If the failure involves hostnames or Kubernetes Services, test DNS resolution from the source pod. Most default-deny or restrictive egress policies omit port 53. See the fix below.

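  6. Inspect namespace labels. If you use namespaceSelector in a rule, verify that the namespace object itself carries the expected labels. The only label Kubernetes itself sets on every namespace (1.22 and later) is kubernetes.io/metadata.name; custom labels such as a team or environment label must be added explicitly.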

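  7. Validate policyTypes. If policyTypes is set but omits Egress, any egress rules in the policy have no effect. If policyTypes is omitted entirely, Ingress is always assumed and Egress is assumed only when the policy contains egress rules, so a deny-all egress policy with no rules must list Egress explicitly. Declare policyTypes in every policy and list each direction the policy is meant to govern.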

  8. Test CNI-specific behavior. If the above checks are correct but traffic still fails, verify your CNI. Flannel does not enforce NetworkPolicy. For hostNetwork pods, enforcement varies: some CNIs cannot distinguish hostNetwork traffic from node traffic and ignore selectors for those pods. In Cilium, fromCIDR/toCIDR rules only match non-pod endpoints; pod-to-pod traffic must use label selectors.
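Steps 4 and 8 can be illustrated from the source side. The sketch below assumes a source pod labeled app: web in a hypothetical storefront namespace that needs to reach app: api pods in the shop namespace and one external address range (203.0.113.0/24, a documentation range used here as a placeholder). Pod destinations are expressed with label selectors and the external destination with ipBlock, which is intended for cluster-external addresses.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: web-egress               # assumed name
  namespace: storefront          # source namespace (assumed)
spec:
  podSelector:
    matchLabels:
      app: web                   # source pods this policy applies to
  policyTypes:
  - Egress
  egress:
  # Pod-to-pod: select the destination by namespace and pod labels.
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: shop
      podSelector:
        matchLabels:
          app: api
    ports:
    - protocol: TCP
      port: 8080
  # External service: select by CIDR (placeholder range).
  - to:
    - ipBlock:
        cidr: 203.0.113.0/24
    ports:
    - protocol: TCP
      port: 443
  # No DNS rule here: with only these rules, lookups by hostname would
  # still hang until port 53 to kube-dns is allowed as well.
```

The destination's ingress policy must allow the same connection; this manifest covers only the source half of the agreement.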

flowchart TD
    A[Pod cannot reach target] --> B{Connection refused or timeout?}
    B -->|Timeout| C[Check NetworkPolicies selecting source and destination]
    B -->|Refused| Z[Check Service endpoints and port binding]
    C --> D{Destination isolated?}
    D -->|No| E[Check CNI plugin enforcement capability]
    D -->|Yes| F{Ingress allows source?}
    F -->|No| G[Fix ingress rule or namespace labels]
    F -->|Yes| H{Egress allows destination?}
    H -->|No| I[Fix egress rule or CIDR scope]
    H -->|Yes| J{Failure involves hostnames?}
    J -->|Yes| K[Test DNS from source pod]
    K -->|Fails| L[Add DNS egress rule for port 53]
    J -->|No| M[Review CNI-specific behavior]

Metrics and signals to monitor

| Signal | Why it matters | Warning sign |
| --- | --- | --- |
| CNI plugin health (DaemonSet pod restarts) | Policy enforcement stops if the CNI agent crashes or is OOM killed | CNI pods restarting or stuck in CrashLoopBackOff |
| Pod-to-pod connectivity test failures | Direct confirmation of NetworkPolicy-like drops | Timeouts between known-healthy pods on specific nodes |
| DNS resolution latency/failures from workloads | The most common symptom of missing DNS egress rules | nslookup failures correlated with policy rollout |
| felix_* metrics (Calico) | Felix programs the rules; elevated drop metrics confirm policy denial | Increasing felix_iptables_* or policy-related drop counters |
| Cilium DROP_POLICY_DENIED events | Cilium annotates drops with a reason; this one confirms policy | cilium monitor output showing policy drops between source and dest |
| Cluster NetworkPolicy object count | Rapid growth increases collision risk and debugging surface | Sudden spikes in policy count without change management |

Fixes

If the cause is missing DNS egress

Add an explicit egress rule to your default-deny or restrictive policy:

- to:
  - namespaceSelector:
      matchLabels:
        kubernetes.io/metadata.name: kube-system
    podSelector:
      matchLabels:
        k8s-app: kube-dns
  ports:
  - protocol: UDP
    port: 53
  - protocol: TCP
    port: 53

Allow both TCP and UDP. Resolvers fall back to TCP when a response is too large for a UDP datagram and comes back truncated.
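For context, here is a sketch of the rule above embedded in a complete namespace-wide policy: default-deny egress with DNS as the only exception. The namespace and policy names are hypothetical, and the k8s-app: kube-dns label is assumed to match the cluster's DNS pods, which should be verified with kubectl get pods -n kube-system --show-labels.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-egress-allow-dns   # assumed name
  namespace: shop                       # assumed namespace
spec:
  podSelector: {}                       # every pod in the namespace
  policyTypes:
  - Egress
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
      podSelector:
        matchLabels:
          k8s-app: kube-dns             # assumption: label used by the DNS pods
    ports:
    - protocol: UDP
      port: 53
    - protocol: TCP
      port: 53
  # All other egress from pods in this namespace remains denied until
  # further policies add allow rules.
```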

If the cause is namespaceSelector mismatch

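Apply the expected label to the namespace object itself, or change the policy to match existing labels. Selectors match only metadata labels, never the name field directly; to select a namespace by name, match the kubernetes.io/metadata.name label that Kubernetes applies automatically.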
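One way to keep labels and policies in step is to declare the label in the Namespace manifest that provisions the namespace. The sketch below uses a hypothetical namespace storefront and a hypothetical label team: frontend; the key and value must match whatever the policy's namespaceSelector expects.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: storefront        # assumed namespace name
  labels:
    team: frontend        # assumed label; must equal the selector's matchLabels
    # kubernetes.io/metadata.name: storefront is added by Kubernetes itself
    # and does not need to be declared here.
```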

If the cause is omitted policyTypes

Explicitly declare every direction the policy is meant to govern. For a policy that carries both ingress and egress rules:

policyTypes:
- Ingress
- Egress

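A direction left out of policyTypes is not governed by that policy: rules written for it have no effect, and the pod stays open in that direction unless another policy isolates it. Conversely, a direction that is listed without any allow rules denies all traffic in that direction, so list only what the policy actually regulates.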
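The failure mode can be seen by comparing a broken policy with its corrected form. Both manifests are sketches with hypothetical names, labels, and ports; in practice only one of them would be applied.

```yaml
# Broken: egress rules are present, but policyTypes lists only Ingress,
# so the egress section has no effect and egress stays unrestricted.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-policy-broken      # assumed name
  namespace: shop              # assumed namespace
spec:
  podSelector:
    matchLabels:
      app: api
  policyTypes:
  - Ingress
  egress:
  - to:
    - podSelector:
        matchLabels:
          app: db
    ports:
    - protocol: TCP
      port: 5432
---
# Corrected: Egress is declared, so the api pods are isolated for egress
# and may only open connections to app=db pods on TCP 5432.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-policy-fixed       # assumed name
  namespace: shop
spec:
  podSelector:
    matchLabels:
      app: api
  policyTypes:
  - Egress
  egress:
  - to:
    - podSelector:
        matchLabels:
          app: db
    ports:
    - protocol: TCP
      port: 5432
```

Note that the corrected policy also cuts off DNS for the api pods unless a port 53 rule is added alongside it.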

If the cause is AWS VPC CNI port limits

AWS VPC CNI limits each protocol in each selector to 24 unique port combinations. Reduce the port list or use endPort to specify ranges. If you migrated from Calico, audit existing policies for large port lists.
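As a sketch of the consolidation, a long list of individual ports can be collapsed into a single range entry with endPort. The labels and port numbers below are hypothetical. endPort requires a numeric port, must be greater than or equal to it, and depends on the CNI plugin implementing the field.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-allow-worker-range   # assumed name
  namespace: shop                # assumed namespace
spec:
  podSelector:
    matchLabels:
      app: api
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: worker
    ports:
    # One entry covering TCP 8000 through 8100 inclusive, instead of
    # listing each port as its own entry.
    - protocol: TCP
      port: 8000
      endPort: 8100
```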

If the cause is hostNetwork or CNI bypass

For hostNetwork pods, do not rely solely on NetworkPolicy for isolation. Add node-level firewall rules or run the workload as a normal pod. If you use Flannel, be aware that NetworkPolicy objects are accepted by the API but never enforced.

If the cause is ICMP under Cilium

Standard NetworkPolicy can only name TCP, UDP, and SCTP, so it cannot express an ICMP allow rule. If ICMP is required for your operational health checks, use a CiliumNetworkPolicy with icmps rules.
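A minimal sketch of such a rule, assuming a Cilium version that supports the icmps field, is shown below. It allows IPv4 echo requests (ICMP type 8) from pods labeled app: web to pods labeled app: api; the names, namespace, and labels are hypothetical.

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: api-allow-ping         # assumed name
  namespace: shop              # assumed namespace
spec:
  endpointSelector:
    matchLabels:
      app: api                 # destination pods
  ingress:
  - fromEndpoints:
    - matchLabels:
        app: web               # source pods
    icmps:
    - fields:
      - type: 8                # ICMP echo request
        family: IPv4
```

Like a standard NetworkPolicy, this policy makes the selected pods subject to ingress enforcement, so it should be reviewed together with the TCP and UDP rules that already apply to them.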

Prevention

  • Always include DNS egress in any default-deny or restrictive egress policy. Service discovery depends on it, and its absence is the top cause of silent breakage.
  • Validate namespace labels before deploying policies that rely on namespaceSelector. Add labels to namespaces as part of namespace provisioning.
  • Declare policyTypes explicitly in every NetworkPolicy, even if the default behavior appears correct in testing.
  • Stage policies with real traffic before production. A policy that passes a YAML linter can still deny critical control plane or sidecar traffic.
  • Monitor CNI health alongside application metrics. If the CNI agent is unhealthy, policy enforcement is inconsistent or absent.
  • Document cross-cluster behavior if you use Cilium Cluster Mesh. Cilium may restrict label-based selectors to the local cluster by default; remote cluster traffic may require explicit cluster label selectors.

How Netdata helps

Netdata surfaces the silent nature of these failures by correlating signals that application logs miss:

  • Correlate sudden drops in inter-pod network throughput with CNI plugin CPU, memory, or restart events.
  • Monitor DNS resolution latency at the node level to catch the DNS egress trap.
  • Track kernel conntrack utilization and drop rates when policy changes increase connection churn.
  • Map per-node network anomalies alongside Kubernetes workload events to identify the policy rollout that coincided with the first timeouts.