Understanding Network and Internet Latency
Network operators, engineers, and administrators often overlook several critical factors when troubleshooting throughput and latency issues. A common way to test throughput is to transfer a large file and measure how long the transfer takes. However, this typically measures goodput, the useful data actually delivered, rather than the maximum theoretical throughput. Goodput is inherently lower because of protocol overhead and other contributing variables, which can mislead operators into believing the link is underperforming.
Introduction
At first glance, network performance analysis might seem straightforward. Consider a client and a server connected via a high-performance Layer 2 switch (e.g., a Cisco Catalyst 9300 or equivalent modern switch). One might expect file transfers between the two devices to achieve near line-rate speeds—such as 1Gbps, 10Gbps, or 100Gbps—especially on an idle link. While this expectation is logical, real-world results often fall short due to a variety of underlying factors.
In this discussion, we’ll explore the foundational concepts impacting throughput and latency, focusing on Ethernet, TCP, Bandwidth-Delay Product (BDP), and Long Fat Networks (LFNs). With over a decade of experience in a company that transitioned from an ISP (supporting dial-up, DSL, and web hosting) to a managed hosting and colocation provider, I’ve handled numerous support cases involving misinterpreted traceroutes, latency anomalies, and confusion over bandwidth and throughput performance.
Key Factors Affecting TCP/IP Throughput
The actual throughput achievable over a network link—particularly one utilizing TCP/IP—is influenced by several interdependent variables:
- End-to-End Latency
- Link Speed (1Gbps, 10Gbps, 100Gbps, or 1Tbps)
- Propagation, Serialization, and Queuing Delay
- TCP Window Size (RWIN), Window Scaling, and Congestion Control (e.g., CTCP)
- Network Congestion
- Maximum Transmission Unit (MTU) and Maximum Segment Size (MSS)
- Protocol Overhead (Layers 2, 3, and 4)
- Ethernet Frame Efficiency
- Link Reliability and Error Rates
Accurately diagnosing performance bottlenecks requires understanding how each of these factors interacts across the entire data path—not just at a single hop. In the sections that follow, we’ll break down each component and examine its role in shaping end-to-end performance.
Layer 2, 3, and 4 Overhead
Ethernet remains the dominant networking standard in modern infrastructures. While we won’t delve into the origins of the protocol, it’s important to understand that Ethernet frame size plays a key role in performance. Larger frames are more efficient than smaller ones, particularly as network speeds increase. This is why, where possible, configuring for a higher MTU is beneficial. Frames exceeding 1,500 bytes of payload are considered jumbo frames, and while the conventional jumbo size is 9,000 bytes, variations exist depending on vendor and environment.
As we’ll explore in the TCP section, the actual data size within a TCP segment is influenced by both the MTU and protocol overhead. For instance, if the MTU is set to 1500 bytes, then accounting for 20 bytes of IP header and 20 bytes of TCP header (assuming no options or timestamps), the Maximum Segment Size (MSS) becomes 1460 bytes.
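If you want a quick way to sanity-check that arithmetic, here is a minimal Python sketch; it assumes plain IPv4 and TCP headers with no options, matching the example above:

```python
# Minimal MSS calculation, assuming IPv4 and TCP headers with no options.
MTU = 1500        # largest IP packet the link will carry
IP_HEADER = 20    # IPv4 header, no options
TCP_HEADER = 20   # TCP header, no options or timestamps

mss = MTU - IP_HEADER - TCP_HEADER
print(f"MSS = {mss} bytes")  # -> MSS = 1460 bytes
```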
The Good, the Bad, and the Ugly of Jumbo Frames
The topic of jumbo frames has sparked countless technical discussions. While jumbo frames offer performance benefits in certain environments—like high-speed LANs, database replication, or NFS-based storage—they provide limited value for typical Internet traffic. The majority of web traffic (e.g., HTTP/HTTPS) consists of smaller payloads, often in the 64–512 byte range, meaning jumbo frames won’t offer a noticeable boost.
That said, many modern ISPs do utilize jumbo frames on their high-capacity 10Gbps, 100Gbps, or even 1Tbps backbone networks. Similarly, backend environments in datacenters—especially those optimized for performance—may use jumbo frames to reduce CPU overhead and improve throughput.
Ethernet Frame Overhead
A typical Ethernet frame, excluding VLAN tagging, contains:
- 6 bytes Destination MAC
- 6 bytes Source MAC
- 2 bytes EtherType/Length
- 4 bytes Frame Check Sequence (FCS)
This adds up to 18 bytes of Layer 2 overhead. When a VLAN tag is used, an additional 4 bytes are included, bringing the total to 22 bytes.
Physical Layer Considerations: Preamble and IFG
The preamble and start frame delimiter (8 bytes) plus the inter-frame gap (12 bytes) are mandated by the Ethernet specification and add another 20 bytes of overhead per frame. They must be included when calculating Ethernet frame efficiency, because they apply at every speed: 1Gbps, 10Gbps, 100Gbps, and 1Tbps.
MTU and MSS Efficiency
Before continuing further, we need to briefly cover MTU and MSS. Again, I won’t go too deep here; this assumes you already have a baseline understanding. The MTU (Maximum Transmission Unit) is often loosely called the packet size; it defines the largest IP packet that can be carried in a single frame. The MSS (Maximum Segment Size) is the largest amount of TCP payload data that can be carried in a single segment at Layer 4.
Let’s now assume you’re sending only 64-byte frames across a 1Gbps network. These are minimum-sized Ethernet frames, and they already include the Ethernet header/trailer. The IP payload in this case is only 46 bytes.
How many 64-byte frames can I send per second on a 1Gbps link?
- 1Gbps = 1,000,000,000 bits/sec
- Convert to bytes: 1,000,000,000 / 8 = 125,000,000 bytes/sec
- Include preamble + IFG (20 bytes), so each frame consumes 64 + 20 = 84 bytes on the wire
- 125,000,000 / 84 = 1,488,095.2 pps
Overhead breakdown:
- Preamble and IFG (20 bytes): 1,488,095.2 * 20 * 8 = 238,095,232 bps → 23.8%
- Ethernet header/trailer (18 bytes): 1,488,095.2 * 18 * 8 = 214,285,680 bps → 21.4%
So roughly 45% of the bandwidth is consumed by overhead, leaving only about 55% for the 46-byte payloads. And since each of those payloads is itself 40 bytes of IP and TCP headers, the share left for actual application data is far smaller still.
Now consider 1500-byte frames:
- Frame size with overhead: 1500 + 38 = 1538 bytes
- 125,000,000 / 1538 = 81,274.5 pps
- Effective data per frame (MSS): 1460 bytes
- Throughput = 81,274.5 * 1460 * 8 = ~949 Mbps
Overhead:
- Layer 2 overhead (38 bytes): ~2.5%
- Layer 3 & 4 overhead (40 bytes): ~2.6%
Which means ~5% total protocol overhead and 95% wire efficiency. That’s a massive difference in performance depending on frame size—even more so as you scale to 10Gbps and beyond.
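To make those two worked examples repeatable, here is a short Python sketch (the `frame_stats` helper is just an illustrative name, not a standard API). It assumes untagged Ethernet plus IPv4/TCP headers with no options, using the overhead figures discussed above:

```python
ETH_OVERHEAD = 18         # dst MAC (6) + src MAC (6) + EtherType (2) + FCS (4)
WIRE_OVERHEAD = 20        # preamble + SFD (8) + inter-frame gap (12)
L3L4_OVERHEAD = 40        # IPv4 header (20) + TCP header (20), no options
LINK_BPS = 1_000_000_000  # 1Gbps

def frame_stats(l2_payload):
    """Frames/sec, L2-payload bps, and TCP-payload bps on a 1Gbps link."""
    on_wire = l2_payload + ETH_OVERHEAD + WIRE_OVERHEAD   # bytes consumed per frame
    pps = LINK_BPS / 8 / on_wire
    l2_bps = pps * l2_payload * 8
    tcp_bps = pps * max(l2_payload - L3L4_OVERHEAD, 0) * 8
    return pps, l2_bps, tcp_bps

for payload in (46, 1500):   # minimum-sized frame vs. full 1500-byte MTU
    pps, l2_bps, tcp_bps = frame_stats(payload)
    print(f"{payload + ETH_OVERHEAD}-byte frame: {pps:,.0f} pps, "
          f"{l2_bps / 1e6:.0f} Mbps payload, {tcp_bps / 1e6:.0f} Mbps TCP data")
# 64-byte frames:   ~1,488,095 pps, ~548 Mbps payload, ~71 Mbps TCP data
# 1518-byte frames: ~81,274 pps,    ~975 Mbps payload, ~949 Mbps TCP data
```

The 1518-byte case reproduces the ~949 Mbps of usable TCP data calculated above, while the minimum-frame case shows just how little of the line rate survives as application data.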
Propagation, Serialization, and Queuing Delay
The combination of propagation, serialization, and queuing often creates a complex and variable latency profile. On high-speed links (1Gbps and above), serialization delay is negligible, and queuing delay is usually negligible as well unless the link is congested, so both are rarely factored into real-world latency calculations.
Serialization delay refers to the time required for a networking device to place a packet onto the transmission medium. For example, a 1500-byte packet (12,000 bits) transmitted over a legacy T1 line (1.536Mbps) would have a serialization delay of:
Serialization Delay = (MSS + headers) × 8 / Link Speed (bps)
= (1460 + 40) × 8 / 1,536,000 = 12,000 / 1,536,000 ≈ 0.0078s ≈ 7.8ms
The same packet on a 1Gbps link:
12,000 / 1,000,000,000 = 0.000012s ≈ 0.012ms
As shown, serialization is significant on low-speed links, but trivial on high-speed modern infrastructure.
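The same calculation as a tiny Python sketch (packet size in bits divided by link speed; the function name is illustrative):

```python
def serialization_delay_ms(packet_bytes, link_bps):
    """Time to clock one packet onto the wire, in milliseconds."""
    return packet_bytes * 8 / link_bps * 1000

print(serialization_delay_ms(1500, 1_536_000))      # ~7.8ms on a T1
print(serialization_delay_ms(1500, 1_000_000_000))  # ~0.012ms on a 1Gbps link
```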
Routers process packets sequentially. If a router receives packets faster than it can forward them, it queues them—this is queuing delay. Circuit utilization should be viewed as an average over time. A 10Gbps link at 70% average utilization still means the interface is either fully in use or idle at any given microsecond. If a packet arrives when the egress interface is occupied, it waits in the queue until the previous transmission completes.
In most backbone networks, queuing delay only becomes measurable at sustained utilization above 90–95%, where congestion occurs.
Propagation Delay (PD)
This is the most important fixed contributor to latency. Propagation delay is a function of physics—specifically the speed of light through a medium. Light travels fastest in a vacuum (~186,000 miles/sec). In fiber, it slows to about 68% of that due to the refractive index of the core material (~1.48):
Speed in fiber = 186,000 * 0.68 = 126,480 miles/sec
This means approximately:
1ms of delay ≈ every 125 miles of fiber
Example: Denver to Seattle
- Distance: 1327 miles
- Speed in fiber: ~126,480 miles/sec
- One-way delay: 1327 / 126,480 ≈ 10.5ms
- Round-trip time: ~21ms theoretical
Actual ping latency: ~28ms. The additional delay is mostly due to router processing, ICMP rate-limiting, or minimal queuing.
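Here is that propagation math as a small Python sketch, using the ~186,000 miles/sec vacuum speed and ~68% fiber factor from above; it treats the 1,327 miles as actual fiber mileage, which real fiber routes rarely match exactly:

```python
SPEED_OF_LIGHT_MI_S = 186_000   # miles per second in a vacuum (approx.)
FIBER_FACTOR = 0.68             # refractive index of ~1.48 slows light to ~68%

def fiber_rtt_ms(miles):
    """Theoretical round-trip propagation delay over a fiber path."""
    speed = SPEED_OF_LIGHT_MI_S * FIBER_FACTOR   # ~126,480 miles/sec
    return 2 * miles / speed * 1000

print(f"Denver-Seattle (1327 mi): {fiber_rtt_ms(1327):.1f}ms RTT")  # ~21.0ms
```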
A key point: If traceroute shows high latency on one hop but not the next, it doesn't always indicate a fault. That router may deprioritize ICMP or be momentarily busy. True latency problems persist or worsen with each successive hop.
Summary Rule of Thumb:
Every 125 miles of fiber ≈ 1ms latency (excluding congestion or device processing delays)
End-to-End Network Latency
In simple terms, end-to-end latency is the total delay a packet encounters traveling from one host to another and back. This round-trip time (RTT) is critical because TCP uses the RTT together with the window size to determine the sender’s transmission rate. A 10Gbps link, for example, does not guarantee you will see the roughly 9.4Gbps of goodput that remains after protocol and Ethernet overhead; latency and window size matter just as much.
End-to-end latency is the cumulative delay across all hops, including:
- Serialization Delay (fixed)
- Queuing and Processing Delay (variable)
- Propagation Delay (fixed, physics-based)
All of these influence the effective throughput, especially for TCP.
For high-latency paths, the Bandwidth-Delay Product (BDP) becomes the limiting factor if TCP window sizes are too small. Fortunately, modern operating systems, such as Windows 10/11, macOS, and current Linux distributions, implement window scaling, receive window auto-tuning, and modern congestion control algorithms (e.g., CUBIC, and CTCP on some Windows releases). These features dynamically scale the TCP window based on path latency and capacity. Legacy systems, including outdated Linux kernels (e.g., 2.4) and pre-Vista Windows releases (e.g., XP, 2000), lack receive window auto-tuning and require manual RWIN tuning.
Window Size and Scaling
TCP has evolved to include enhancements for high-latency, high-bandwidth networks, commonly called Long Fat Networks (LFNs). Satellite and RF-based networks also fall into this category due to inherently long delays.
The TCP window defines how much unacknowledged data can be in flight. If the window is too small relative to the latency of the path, throughput suffers. For example, a 32KB to 128KB receive window allows roughly 21–85 full-size packets (assuming a 1500-byte MTU) to be in flight before an acknowledgment must be received:
32,000 / 1500 ≈ 21 packets
128,000 / 1500 ≈ 85 packets
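A two-line Python check of those figures, using integer division and full 1500-byte packets:

```python
for window in (32_000, 128_000):
    print(f"{window:,}-byte window ≈ {window // 1500} full-size packets in flight")
# 32,000-byte window  -> 21 packets
# 128,000-byte window -> 85 packets
```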
What’s the optimal window size? It depends on latency.
Bandwidth-Delay Product (BDP)
BDP is the amount of data required to fill a network path at full capacity. It’s calculated as:
BDP = Bandwidth (bytes/sec) × Latency (RTT, in seconds)
Example 1: Two hosts on a 1Gbps switch, 1ms latency
1,000,000,000 / 8 = 125,000,000 bytes/sec
BDP = 125,000,000 × 0.001 = 125,000 bytes
Example 2: 1Gbps over a long, high-latency path, 192ms RTT
BDP = 125,000,000 × 0.192 = 24,000,000 bytes = 24MB
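Both examples reduce to a single multiplication; here is a minimal Python sketch (the bdp_bytes name is just illustrative):

```python
def bdp_bytes(link_bps, rtt_seconds):
    """Bandwidth-Delay Product: bytes that must be in flight to fill the path."""
    return link_bps / 8 * rtt_seconds

print(bdp_bytes(1_000_000_000, 0.001))  # -> 125,000 bytes     (1Gbps, 1ms RTT)
print(bdp_bytes(1_000_000_000, 0.192))  # -> 24,000,000 bytes  (1Gbps, 192ms RTT)
```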
To fully utilize the path, the TCP window size needs to match the BDP. But if a host has a fixed RWIN (like 17,520 bytes in legacy systems), the max throughput would be:
Throughput = RWIN / RTT
= 17,520 / 0.192 ≈ 91,250 bytes/sec ≈ 730Kbps
Even without packet loss, TCP won’t exceed this throughput due to window limitations. This is why understanding latency is critical for tuning performance.
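The window-limited ceiling is the same relationship rearranged; a short Python sketch using the legacy 17,520-byte RWIN from above:

```python
def window_limited_bps(rwin_bytes, rtt_seconds):
    """Maximum TCP throughput when the receive window is the bottleneck."""
    return rwin_bytes / rtt_seconds * 8

print(window_limited_bps(17_520, 0.192))  # -> 730,000 bps, i.e. ~730Kbps
```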
Lastly, remember: TCP interprets packet loss as congestion, triggering window size reduction and further throttling.
There are many articles online about tuning TCP on various OS platforms. The information I’ve provided here should give you a foundational understanding of network throughput behavior over high-latency, high-bandwidth links—with a continued focus on Ethernet and TCP performance.