Operations Guides

Correlating cloud VPC flow logs with on-prem NetFlow

Cloud flow logs and on-prem flow records share the 5-tuple concept but diverge in nearly every dimension that matters for correlation: transport, latency, sampling, timestamps, topology, and NAT visibility. Cloud providers emit VPC flow logs via push to object storage with implicit sampling and aggregation intervals measured in minutes. On-premises devices export NetFlow v5/v9, IPFIX, or sFlow over UDP with configurable sampling and near-real-time delivery.

This gap is a recurring contributor to operational incidents. An attacker pivoting from a compromised cloud workload to on-prem via VPN is invisible across the boundary if no join exists between the two telemetry sources. The same gap hides legitimate operational issues: cross-boundary packet loss, asymmetric routing through cloud transit gateways, and NAT translation mismatches between cloud NAT and on-prem firewalls.

This reference covers the format incompatibilities, sampling semantics, timestamp behavior, and NAT opacity that make cross-environment flow correlation operationally difficult. It is aimed at operators building or maintaining a normalization layer across these two domains.

Why cloud and on-prem flow data resist correlation

Cloud flow logs are not flow records in the NetFlow/IPFIX sense. They are aggregated, sampled, and delivered asynchronously. There is no SNMP, no CDP, no LLDP. The flow record is the only native signal. Topology is the cloud provider’s graph. Delivery is push-based to object storage. Lag is minutes, not seconds. Sampling is implicit.

Dimension	On-prem NetFlow/IPFIX	Cloud flow logs
Transport	UDP push to collector	Push to object storage, polled by consumer
Latency	Near real-time (seconds)	Minutes (aggregation plus delivery)
Sampling	Explicit, configurable per exporter	Implicit, provider-controlled
Timestamps	Exporter clock (NTP-dependent)	Provider-aggregated start and end epochs
Topology	CDP/LLDP/FDB/ARP available	Cloud graph only
NAT visibility	Post-NAT at perimeter	Pre-NAT inside VPC (if pkt-* fields enabled)
Template or cache	NetFlow v9/IPFIX template cache	No template concept

flowchart LR
    CL["Cloud flow logs
AWS / GCP / Azure
push to object storage
minutes of lag"] -->|"timestamp skew
up to 60s (AWS)"| NORM["Normalization layer
windowed 5-tuple join
NAT translation logs
direction inference"]
    OP["On-prem NetFlow / IPFIX
UDP export
near real-time"] -->|"NTP-dependent
sampling-aware"| NORM
    NORM --> CORR["Cross-boundary
correlation
same conversation
different vantage points"]

Provider reference: cloud flow log semantics

Each cloud provider’s flow log format has distinct semantics that affect how records can be joined with on-prem data.

AWS VPC Flow Logs

AWS VPC Flow Logs aggregate captured packets into intervals. The default aggregation interval is 10 minutes, reducible to 1 minute. Format versions 2 through 11 exist, each adding fields without removing prior ones.

Core 5-tuple fields: srcaddr, dstaddr, srcport, dstport, protocol (IANA number). Additional fields include packets, bytes, start and end (Unix epoch seconds), action (ACCEPT or REJECT), and log-status.

For correlation through NAT gateways or EKS pods, pkt-srcaddr and pkt-dstaddr are essential. Without them, srcaddr and dstaddr reflect the translated IP, not the original. EKS pods have separate pod IPs from node ENI IPs. pkt-srcaddr exposes the pod IP while srcaddr shows the node ENI IP.

The log-status field distinguishes data gaps:

SKIPDATA: records were dropped internally by AWS due to capacity constraints. One SKIPDATA record can represent multiple uncaptured flows.
NODATA: no traffic on that ENI during the interval. Not a gap, but operators frequently confuse it with SKIPDATA.
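
As a minimal sketch of the distinction above, the following parses a record in the AWS default (version 2) field order and classifies its log-status. It assumes space-separated records with "-" placeholders for absent values; the function names parse_default_record and classify_gap are illustrative, not part of any AWS tooling, and custom log formats would need a different field list.

```python
# Field order of the AWS default (version 2) flow log format.
DEFAULT_FIELDS = [
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
]
INT_FIELDS = {"version", "srcport", "dstport", "protocol",
              "packets", "bytes", "start", "end"}


def parse_default_record(line: str) -> dict:
    """Parse one space-separated record; '-' placeholders become None."""
    values = line.split()
    if len(values) != len(DEFAULT_FIELDS):
        raise ValueError(
            f"expected {len(DEFAULT_FIELDS)} fields, got {len(values)}"
        )
    record = {}
    for name, raw in zip(DEFAULT_FIELDS, values):
        if raw == "-":
            record[name] = None
        elif name in INT_FIELDS:
            record[name] = int(raw)
        else:
            record[name] = raw
    return record


def classify_gap(record: dict) -> str:
    """Hypothetical labels: SKIPDATA is data loss, NODATA is an idle interface."""
    status = record["log_status"]
    if status == "SKIPDATA":
        return "data-loss"
    if status == "NODATA":
        return "idle"
    return "ok"
```

Keeping the two statuses as separate labels lets a downstream completeness check count only SKIPDATA as a gap.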

The tcp-flags field is a bitmask aggregated across the entire aggregation interval: FIN=1, SYN=2, RST=4, SYN-ACK=18. A short-lived connection that opens and closes within a single aggregation interval may appear as a single record with combined flags (SYN+FIN = 3, SYN-ACK+FIN = 19), so the order in which flags arrived cannot be recovered. On-prem NetFlow v5 has a similar limit: it records a single OR’d flags value per flow, representing the union of all flags seen during that flow’s lifetime. The difference is scope: the AWS value is bounded by the aggregation interval, so one long-lived connection can be split across several records with different flag sets. For packet-level TCP handshake analysis, neither cloud nor NetFlow records are a substitute for a packet capture.
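
The bitmask values above can be decoded with a few lines. This is a sketch: the FIN, SYN, and RST bits come from the values listed above, while PSH, ACK, and URG follow the standard TCP header bit layout and are included on the assumption that they may appear in a record. The function names are illustrative.

```python
# Standard TCP header flag bits, lowest bit first.
TCP_FLAG_BITS = {1: "FIN", 2: "SYN", 4: "RST", 8: "PSH", 16: "ACK", 32: "URG"}


def decode_tcp_flags(value: int) -> list[str]:
    """Return the names of every flag set in an OR'd tcp-flags value."""
    return [name for bit, name in TCP_FLAG_BITS.items() if value & bit]


def saw_open_and_close(value: int) -> bool:
    """True when both SYN and FIN were seen within the same record."""
    return bool(value & 2) and bool(value & 1)
```

For example, decode_tcp_flags(19) yields FIN, SYN, and ACK, which matches the SYN-ACK+FIN case described above. The result says which flags were seen, not in what order.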

The flow-direction field (added in version 5) resolves initiator ambiguity for AWS-side flows.

GCP VPC Flow Logs

GCP uses dual-stage sampling. The primary sampling rate is opaque and dynamic, varying with host load. The secondary rate is configurable from 0.0 to 1.0. Primary sampling is uncontrollable by the operator.

Aggregation intervals are configurable: 5 seconds, 30 seconds, 1 minute, 5 minutes, 10 minutes, or 15 minutes.

GCP flow logs do not indicate which endpoint initiated a flow. They identify packet direction relative to the interface only. This complicates correlation with on-prem NetFlow, which also lacks initiator context natively.

A critical asymmetry exists in firewall interaction:

Egress packets are sampled before egress firewall rules are evaluated. Denied packets can still appear in logs.
Ingress packets are sampled after ingress firewall rules are evaluated. Dropped packets are not logged.

This means GCP flow logs can include egress traffic that was subsequently denied and omit ingress traffic that was blocked, whereas on-prem firewalls typically log both directions.

Azure NSG Flow Logs and VNet Flow Logs

Azure NSG Flow Logs version 2 introduces flow state tracking with B (Begin), C (Continue), and E (End) states, plus bidirectional byte and packet counters.

Byte and packet counts are not recorded for flows affected by non-default (user-defined) inbound rules, and Azure reports those flows as non-terminating. Reported totals will be lower than actual traffic for those flows. This affects chargeback correlation and volumetric validation.
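
A version 2 flow tuple can be parsed into a typed record so that missing counters stay visible as None rather than being read as zero. This is a sketch that assumes the comma-separated version 2 tuple layout (timestamp, addresses, ports, protocol, direction, decision, state, then four counters); verify the field order against the current Azure schema before relying on it. AzureFlowTuple and parse_flow_tuple are illustrative names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AzureFlowTuple:
    timestamp: int
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str    # "T" (TCP) or "U" (UDP)
    direction: str   # "I" (inbound) or "O" (outbound)
    decision: str    # "A" (allowed) or "D" (denied)
    state: str       # "B" (Begin), "C" (Continue), or "E" (End)
    packets_src_to_dst: Optional[int]
    bytes_src_to_dst: Optional[int]
    packets_dst_to_src: Optional[int]
    bytes_dst_to_src: Optional[int]


def _opt_int(raw: str) -> Optional[int]:
    """Empty counter fields mean 'not recorded', which is not the same as 0."""
    return int(raw) if raw else None


def parse_flow_tuple(raw: str) -> AzureFlowTuple:
    parts = raw.split(",")
    if len(parts) != 13:
        raise ValueError(f"expected 13 fields, got {len(parts)}")
    return AzureFlowTuple(
        timestamp=int(parts[0]),
        src_ip=parts[1],
        dst_ip=parts[2],
        src_port=int(parts[3]),
        dst_port=int(parts[4]),
        protocol=parts[5],
        direction=parts[6],
        decision=parts[7],
        state=parts[8],
        packets_src_to_dst=_opt_int(parts[9]),
        bytes_src_to_dst=_opt_int(parts[10]),
        packets_dst_to_src=_opt_int(parts[11]),
        bytes_dst_to_src=_opt_int(parts[12]),
    )
```

Preserving None for absent counters matters for volumetric comparison: summing them as zero would hide the undercount instead of flagging it.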

Azure NSG Flow Logs are officially retiring on 30 September 2027. The successor is VNet Flow Logs, which operates at the virtual network level rather than per-NSG and captures platform-rule traffic that NSG flow logs miss. Teams staying on NSG flow logs should use version 2 only. Version 1 lacks byte and packet counters entirely.

Timestamp skew: the biggest correlation killer

Timestamp alignment is the single hardest problem in cross-environment flow correlation. Each source introduces skew differently.

AWS VPC Flow Logs: the start and end timestamps can be up to 60 seconds off from actual packet receipt or transmission. AWS documentation states that each value might be up to 60 seconds after the packet was transmitted or received on the network interface.

GCP VPC Flow Logs: the primary sampling stage adds interpolation layers. Missed packets are compensated by interpolation from captured packets, which introduces additional timestamp uncertainty.

On-prem NetFlow/IPFIX: timestamps depend on exporter clock accuracy. Juniper has documented IPFIX timestamp inaccuracy on certain platforms where timestamps diverge from system time despite apparent NTP sync.

NetFlow v5: flow records do not carry absolute timestamps. Flow timing is derived from system uptime offsets (the FIRST and LAST fields relative to export uptime), which means accuracy depends on the exporter’s clock and uptime counter. NetFlow v9, IPFIX, and sFlow include timestamps that can be used directly to compute export-to-ingest latency.
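
For NetFlow v5, an absolute flow time can be reconstructed from the export packet header, which carries the exporter's uptime (SysUptime, in milliseconds) alongside its wall-clock time (unix_secs and unix_nsecs). The sketch below shows the arithmetic; the function name and parameter names are illustrative, and the result is only as accurate as the exporter's clock.

```python
def v5_flow_epoch(unix_secs: int, unix_nsecs: int,
                  sys_uptime_ms: int, flow_uptime_ms: int) -> float:
    """Convert a v5 First or Last value (ms of uptime) to Unix epoch seconds.

    unix_secs, unix_nsecs, and sys_uptime_ms come from the export packet
    header; flow_uptime_ms is the record's First or Last field.
    """
    export_epoch = unix_secs + unix_nsecs / 1e9
    # How long before the export the event happened. The modulo handles
    # the 32-bit uptime counter wrapping between the event and the export.
    age_ms = (sys_uptime_ms - flow_uptime_ms) % 2**32
    return export_epoch - age_ms / 1000.0
```

A flow whose Last value is 10,000 ms behind the header's SysUptime ended 10 seconds before the export timestamp. Any error in the exporter's wall clock carries straight through to the result.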

The operational rule: always use a windowed join, never an exact timestamp match. A 2-minute tolerance is conservative for most paths. Five minutes is safer for high-latency cross-VPN paths where aggregation and delivery lag compound. Note that NTP drift between collectors is a separate problem from flow-log aggregation lag. A sub-second NTP offset between a cloud flow log epoch and an on-prem NetFlow exporter clock is harmless when the join window is measured in minutes, but larger drift adds to aggregation lag and can push records outside the window.
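
A windowed join can be sketched as an interval-overlap test with tolerance, keyed on the 5-tuple. The record layout here (dicts with five_tuple, start, and end keys in epoch seconds) is a hypothetical normalized form, not a provider format, and the sketch assumes both sides have already been normalized to the same address view.

```python
from collections import defaultdict


def windowed_join(cloud: list[dict], onprem: list[dict],
                  tolerance_s: int = 120) -> list[tuple[dict, dict]]:
    """Pair records that share a 5-tuple and whose time ranges overlap
    once each range is widened by tolerance_s seconds."""
    index = defaultdict(list)
    for rec in onprem:
        index[rec["five_tuple"]].append(rec)

    matches = []
    for c in cloud:
        for o in index.get(c["five_tuple"], []):
            overlaps = (c["start"] - tolerance_s <= o["end"]
                        and o["start"] <= c["end"] + tolerance_s)
            if overlaps:
                matches.append((c, o))
    return matches
```

Indexing one side by 5-tuple keeps the comparison from becoming a full cross product. A wider tolerance raises recall but also the chance of pairing unrelated flows that reuse the same ports.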

Sampling and completeness gaps

Cloud flow logs introduce data gaps that have no direct equivalent in on-prem flow collection.

AWS SKIPDATA: one SKIPDATA record can represent multiple uncaptured flows. The on-prem side shows traffic. The cloud side shows nothing. Treat any SKIPDATA record during a known traffic window as a data-quality finding.

GCP secondary sampling: when set below 1.0, flow entries are discarded randomly. Correlation against on-prem NetFlow will show missing entries proportional to the secondary sample rate. At 0.5 secondary sampling, expect roughly half the cloud-side flows to be absent.
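
The arithmetic for sampling correction is simple but easy to invert by mistake. As a sketch: a fractional rate such as GCP's secondary rate is a probability, so observed counts are divided by it, while a 1-in-N exporter rate is multiplied by N. The function names are illustrative, and this does not correct for GCP's opaque primary sampling.

```python
def scale_by_probability(observed: float, probability: float) -> float:
    """Estimate the true count when each item is kept with this probability."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return observed / probability


def scale_one_in_n(observed: int, n: int) -> int:
    """Estimate the true count for a 1-in-N packet sampler."""
    if n < 1:
        raise ValueError("n must be >= 1")
    return observed * n
```

At a secondary rate of 0.5, 500 observed cloud-side entries correspond to an estimate of about 1,000. The estimate is statistical, so individual missing flows still cannot be recovered.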

Cloud delivery lag: cloud flow logs arrive via object storage poll, not real-time UDP push. The cloud side of a correlated view is always delayed relative to the on-prem side. Real-time alerting on cross-boundary patterns is not feasible with cloud flow logs alone. Retrospective correlation is the realistic use case.

NAT and identity correlation

NAT boundaries break 5-tuple joins. The endpoint IP inside the flow record is the NAT’d address. Security teams investigate the wrong host. Topology inference places the endpoint at the NAT device port, not the actual endpoint.

Inside the VPC: cloud flow logs may show pre-NAT IPs. AWS pkt-srcaddr and pkt-dstaddr expose the original pod or instance IP before NAT gateway translation. Without these fields, only the translated IP is visible.

At the perimeter: on-prem NetFlow typically reflects post-NAT IPs at the perimeter device. The cloud side may show pre-NAT IPs inside the VPC. The on-prem side shows the translated address as traffic exits the cloud.

To correlate across the NAT boundary, operators must normalize to whichever vantage point they are correlating from, and integrate NAT translation logs (cloud NAT logs, firewall session logs) into the enrichment pipeline. Without translation logs, identity recovery is impossible for flows that crossed the NAT boundary.
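
Identity recovery from translation logs amounts to a lookup keyed on the post-NAT address, port, and protocol, constrained by time. The sketch below assumes translation logs have already been parsed into session records; NatSession and recover_pre_nat are hypothetical names, and real cloud NAT and firewall logs each have their own formats.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NatSession:
    pre_ip: str
    pre_port: int
    post_ip: str
    post_port: int
    protocol: int   # IANA protocol number
    start: int      # epoch seconds
    end: int        # epoch seconds


def recover_pre_nat(sessions: list[NatSession], post_ip: str, post_port: int,
                    protocol: int, ts: int,
                    slack_s: int = 0) -> Optional[NatSession]:
    """Find the translation that was active for a post-NAT endpoint at ts.

    slack_s widens the session's time range to absorb timestamp skew.
    Returns None when no translation log covers the flow.
    """
    for s in sessions:
        same_endpoint = (s.post_ip, s.post_port, s.protocol) == (
            post_ip, post_port, protocol)
        if same_endpoint and s.start - slack_s <= ts <= s.end + slack_s:
            return s
    return None
```

The time constraint matters because NAT ports are reused: the same post-NAT address and port can map to different internal hosts at different times.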

See locating endpoints behind NAT and wireless for the related problem of placing endpoints whose IP appears only behind a NAT device.

Direction ambiguity

Direction is ambiguous in both cloud and on-prem flow data, but in different ways.

GCP: no initiator direction at all. Flow logs identify packet direction relative to the interface, not which endpoint started the conversation.

AWS: the flow-direction field (version 5 and later) resolves this for AWS-side flows. Earlier versions lack it.

On-prem NetFlow: lacks initiator context natively. Flow direction is typically inferred from port assignment (low port = server) or from template metadata, not from packet inspection.

For cross-environment correlation, a flow initiated from cloud to on-prem appears as ingress on the on-prem side and egress on the cloud side. Without explicit direction fields, the join must rely on 5-tuple symmetry (same source and destination pair, ports swapped) rather than directional matching.
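
One way to implement a symmetry-based join is to derive a direction-agnostic key, so that both halves of a conversation map to the same value regardless of which side is recorded as the source. This is a sketch; conversation_key is an illustrative name, and the ordering is plain lexicographic sorting used only for canonicalization.

```python
def conversation_key(src_ip: str, src_port: int,
                     dst_ip: str, dst_port: int, protocol: int) -> tuple:
    """Return the same key for A->B and B->A records of one conversation."""
    # Sorting the two endpoints removes the dependence on direction.
    low, high = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    return (low, high, protocol)
```

Records from both environments can then be grouped by this key before the time window is applied. Direction, where a provider field supplies it, can still be carried as a separate attribute on the record.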

The normalization layer

Cross-domain correlation requires a normalization layer that translates cloud flow log formats and on-prem flow records into a common schema. Commercial platforms exist for this. For teams building their own normalization, the minimum requirements are:

Common timestamp field: normalize all timestamps to UTC epoch seconds. Apply a windowed join with tolerance appropriate to the path. Two to five minutes is the practical range for cloud-to-on-prem paths.
Sampling-rate awareness: scale sampled counts by the inverse of the sampling probability for both cloud and on-prem sources (multiply by N for a 1-in-N sampler, divide by the rate for a fractional rate such as GCP’s 0.0 to 1.0 setting). GCP’s opaque primary sampling rate means cloud-side byte and packet counts may be inherently unreliable for volumetric comparison against on-prem data.
NAT translation integration: join cloud NAT logs and on-prem firewall session logs into the enrichment pipeline to recover pre-NAT and post-NAT identity.
Direction normalization: infer conversation direction from 5-tuple symmetry rather than relying on provider-specific direction fields.
Completeness tracking: track SKIPDATA (AWS), sampling gaps (GCP), and template-cache misses (NetFlow v9/IPFIX) as data-quality signals, not just as absent records. A gap on one side with traffic on the other is itself a finding.
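
Completeness tracking from the list above can be sketched as an overlap check between cloud-side gap intervals and on-prem flows. The input shapes are hypothetical: gap intervals as (start, end) epoch pairs taken from SKIPDATA records, and on-prem flows as dicts with start and end keys. A production version would also scope the check to the interface or path involved.

```python
def find_gap_findings(skipdata_intervals: list[tuple[int, int]],
                      onprem_flows: list[dict]) -> list[dict]:
    """Return on-prem flows that overlap a cloud-side SKIPDATA interval.

    Each finding pairs the flow with the first gap it overlaps.
    """
    findings = []
    for flow in onprem_flows:
        for gap_start, gap_end in skipdata_intervals:
            if flow["start"] <= gap_end and gap_start <= flow["end"]:
                findings.append({"flow": flow, "gap": (gap_start, gap_end)})
                break  # one finding per flow is enough
    return findings
```

Each finding marks traffic that the on-prem side saw while the cloud side was known to be dropping records, so a missing cloud match for that flow should not be read as absence of traffic.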

Signals to watch across the boundary

Signal	Why it matters	Warning sign
Timestamp offset between sources	Windowed joins fail silently when skew exceeds tolerance	Flows present on one side, absent on the other despite known traffic
Cloud log delivery lag	Prevents real-time cross-boundary alerting	Cloud records arrive 5 to 10 minutes after on-prem records for the same conversation
SKIPDATA frequency (tracked separately from NODATA)	SKIPDATA indicates cloud-side data loss that creates false gaps in correlation; NODATA only means the interface was idle	Sudden increase in SKIPDATA records during a traffic spike
GCP secondary sampling rate	Below 1.0, random flow entries are discarded	Correlation shows missing cloud entries proportional to the configured rate
NAT translation log retention	Without translation logs, pre-NAT identity is unrecoverable	Investigation window exceeds NAT log retention period
NTP offset on on-prem exporters	Drift compounds with aggregation lag to shift flow records outside the join window	Flow records from different devices do not align for the same event
Azure non-terminating flow counts	Byte and packet totals are silently absent for affected flows	Cloud-side volumetric totals consistently lower than on-prem for same path

How Netdata helps

Netdata can serve as the on-prem half of the correlation equation:

On-prem flow collection: Netdata collects NetFlow v5/v9, IPFIX, and sFlow data from network devices, providing the on-prem telemetry that cloud flow logs must be joined against.
Per-second metric resolution: Netdata’s collection frequency allows tight temporal correlation between on-prem flow data and contextual signals such as interface counters, BGP state, and syslog events.
Gap corroboration: when cloud flow logs show a gap, Netdata’s on-prem signals (interface utilization, error counters, discard counters) can confirm whether traffic actually flowed during the gap or whether the cloud-side absence reflects a real outage.
NTP monitoring: Netdata tracks NTP offset on collectors and can alert when clock skew exceeds thresholds that would compound with cloud-side aggregation lag.
UDP buffer health: for on-prem flow collectors, Netdata monitors Udp_RcvbufErrors and NIC RX drops, ensuring the on-prem half of the correlation is not silently losing data before it reaches storage.
