Infrastructure

How We Achieved Sub-20ms Latency Across 14 Global Regions

Sub-20ms isn't a marketing number. It's an architecture target that requires specific decisions at every layer of the stack.

Every hosting provider claims low latency. Most of them measure it from their data center to their own probes, under ideal conditions, on days when the network isn't congested. That's not the number that matters to your players.

The number that matters is one-way server latency from a player's machine to your game server, at 8pm on Saturday, during a seasonal event spike. That's what we optimized for, and here's the architecture that gets us there.

Start with node placement, not hardware

Raw hardware speed has diminishing returns past a certain point. Network distance — actual physical distance between your server and the player — doesn't improve with better CPUs. Light travels at roughly 200km/ms through fiber, so 1,200km of cable between your server and your player costs about 6ms each way from propagation alone. There is no software optimization that overcomes that.

Our 14-region network was designed from geographic population density maps and internet exchange point locations, not from whatever regions a cloud provider happened to offer. We specifically identified underserved markets — Southeast Asia, Brazil, Middle East — where major infrastructure tends to be concentrated in a single city, and deployed PoPs closer to where players actually are.

This single decision — optimizing node placement rather than node specs — cut baseline latency for roughly 22% of our player population by 15–40ms compared to the prior architecture. No code changes. Just geography.

UDP and protocol optimization

TCP is wrong for real-time game state. It's reliable delivery, ordered packets, retransmission on loss. For web content, that's fine. For game state, a retransmitted packet from 80ms ago is worse than a dropped packet — it holds up all subsequent data (head-of-line blocking) while the TCP stack waits for the retransmission to complete.

All game traffic on our network runs UDP with a custom reliability layer on top. We implement selective reliability: match-critical events (damage, kills, ability activations) get acknowledged and retransmitted. Position updates don't. If a position packet drops, the next one will arrive within one tick interval anyway. The cost of retransmitting it is higher than the cost of interpolating over the gap.
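To make the selective-reliability split concrete, here is a stripped-down sketch of the sending side in Go: only reliable-channel packets are tracked and retransmitted. The names, header fields, and retry handling are simplified for illustration and omit the acks, timestamps, and congestion signals a real protocol carries.

```go
package netcode

import "time"

// Channel selects per-message delivery semantics.
type Channel uint8

const (
	Unreliable Channel = iota // position updates: fire and forget
	Reliable                  // damage, kills, ability activations: acked and retransmitted
)

// packet is one outbound payload plus retransmission bookkeeping.
type packet struct {
	seq     uint32
	channel Channel
	payload []byte
	sentAt  time.Time
}

// Sender owns sequencing and selective retransmission for one connection.
type Sender struct {
	nextSeq  uint32
	unacked  map[uint32]*packet // reliable packets awaiting an ack
	transmit func(seq uint32, ch Channel, payload []byte)
	rto      time.Duration // retransmit timeout
}

func NewSender(transmit func(uint32, Channel, []byte), rto time.Duration) *Sender {
	return &Sender{unacked: make(map[uint32]*packet), transmit: transmit, rto: rto}
}

// Send transmits a message and tracks it only if it's on the reliable channel.
func (s *Sender) Send(ch Channel, payload []byte) {
	p := &packet{seq: s.nextSeq, channel: ch, payload: payload, sentAt: time.Now()}
	s.nextSeq++
	s.transmit(p.seq, p.channel, p.payload)
	if ch == Reliable {
		s.unacked[p.seq] = p
	}
}

// OnAck stops tracking a reliable packet once the peer confirms it.
func (s *Sender) OnAck(seq uint32) { delete(s.unacked, seq) }

// Tick retransmits reliable packets outstanding longer than the RTO.
// Unreliable packets never enter unacked, so a dropped position update is
// simply superseded by the next tick's update.
func (s *Sender) Tick(now time.Time) {
	for _, p := range s.unacked {
		if now.Sub(p.sentAt) >= s.rto {
			s.transmit(p.seq, p.channel, p.payload)
			p.sentAt = now
		}
	}
}
```

A dropped Unreliable packet never re-enters the pipeline; the receiver just interpolates until the next update lands.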

On top of this, we run sequence numbering and delta compression on game state. Each tick carries only the changes since the last snapshot the other side acknowledged, not full state. A typical position + rotation + velocity update compresses to under 40 bytes with this approach. At 64 ticks per second with 20 players, that's roughly 50KB/s of outbound game state per connected client (40 bytes × 20 players × 64 ticks ≈ 51KB/s) — well within any reasonable bandwidth budget.
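As an illustration of what a quantized delta can look like, here is a per-entity update with a change bitmask in the same style. The field ranges and quantization steps are assumptions, not our actual wire format, but even a full update lands well under the 40-byte figure.

```go
package netcode

import "encoding/binary"

// EntityDelta is an illustrative quantized update for one entity: position in
// centimeters as a delta from the last acknowledged snapshot, rotation in
// 1/1024-turn steps, velocity in cm/s. The ranges are assumptions.
type EntityDelta struct {
	EntityID uint16
	Fields   uint8 // bitmask: which of the groups below changed this tick
	Position [3]int32
	Rotation [3]uint16
	Velocity [3]int16
}

const (
	FieldPosition = 1 << iota
	FieldRotation
	FieldVelocity
)

// Encode appends only the changed field groups, so an entity that merely moved
// costs 15 bytes and a full update costs 27 bytes.
func (d *EntityDelta) Encode(buf []byte) []byte {
	buf = binary.LittleEndian.AppendUint16(buf, d.EntityID)
	buf = append(buf, d.Fields)
	if d.Fields&FieldPosition != 0 {
		for _, v := range d.Position {
			buf = binary.LittleEndian.AppendUint32(buf, uint32(v))
		}
	}
	if d.Fields&FieldRotation != 0 {
		for _, v := range d.Rotation {
			buf = binary.LittleEndian.AppendUint16(buf, v)
		}
	}
	if d.Fields&FieldVelocity != 0 {
		for _, v := range d.Velocity {
			buf = binary.LittleEndian.AppendUint16(buf, uint16(v))
		}
	}
	return buf
}
```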

The routing problem at scale

BGP routing was never designed for latency-sensitive real-time traffic. It's optimized for capacity and resilience, not speed. Standard internet routing between a player in São Paulo and a server in Miami might route through New York, adding 30–40ms that the physical distance doesn't require.

We solved this with dedicated backbone interconnects between regional nodes and private peering arrangements at major internet exchanges. Traffic between our PoPs doesn't touch the public internet — it goes over controlled paths with known, consistent latency characteristics.

The difference matters most for cross-region traffic and for regions with historically poor BGP routing. Southeast Asia in particular has public routing that often adds 60–80ms of unnecessary latency. Private peering in Singapore, Tokyo, and Mumbai knocked that down to under 30ms additional for cross-region matches.

Kernel-level network tuning

Application-level optimization can only go so far. The Linux kernel's default network stack is tuned for general-purpose workloads. Game server traffic — high packet rate, small payloads, strict latency requirements — benefits significantly from kernel tuning.

Specific changes we run on all game server nodes:

Reduced socket buffer sizes. Default kernel buffers are sized for throughput, not latency. Oversized buffers mean packets queue longer before being processed. For game traffic, smaller buffers with immediate processing reduce queueing delay by 2–4ms.

CPU affinity pinning for network interrupt handlers. By default, interrupt handling can move between CPU cores as the scheduler sees fit. Pinning network interrupts to dedicated cores eliminates cross-core latency from cache misses during interrupt context switches.

Disabled Nagle's algorithm. TCP_NODELAY is standard for game servers, but many developers don't realize the UDP equivalent — delayed send coalescing — also needs to be disabled explicitly on some kernel versions.

Together, these changes typically reduce 95th percentile processing latency by 3–7ms on busy servers. It's not dramatic, but when you're targeting sub-20ms, every millisecond in the budget matters.
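For reference, the per-socket side of this tuning looks roughly like the following in a Go game server. The buffer sizes and port are placeholders, and the host-level pieces (the sysctls and the IRQ affinity masks) are applied outside the process.

```go
package main

import (
	"log"
	"net"
)

// tuneGameSocket applies the per-socket side of the tuning above. The 256KB
// figure is a placeholder, not a production value.
func tuneGameSocket(conn *net.UDPConn) error {
	// Smaller kernel buffers keep small, frequent datagrams from queueing deeply.
	if err := conn.SetReadBuffer(256 << 10); err != nil {
		return err
	}
	return conn.SetWriteBuffer(256 << 10)
}

// tuneControlConn makes TCP_NODELAY explicit on control-plane connections
// (matchmaking, session setup). Go enables it by default, but stating it
// documents the intent and guards against it being turned off elsewhere.
func tuneControlConn(conn *net.TCPConn) error {
	return conn.SetNoDelay(true)
}

func main() {
	addr, err := net.ResolveUDPAddr("udp", ":27015") // assumed game port
	if err != nil {
		log.Fatal(err)
	}
	game, err := net.ListenUDP("udp", addr)
	if err != nil {
		log.Fatal(err)
	}
	defer game.Close()
	if err := tuneGameSocket(game); err != nil {
		log.Fatal(err)
	}
	log.Println("game socket tuned on", game.LocalAddr())
}
```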

What sub-20ms actually means in practice

We measure latency at the 95th percentile, not average. Averages hide the tail behavior that players actually experience. A server averaging 12ms with P95 at 47ms is worse than a server averaging 16ms with P95 at 22ms, from a player experience standpoint.
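A quick nearest-rank percentile sketch over made-up samples shows the gap: the mean of the samples below is about 20ms, while the P95 a player actually feels is 47ms.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// percentile returns the p-th percentile (0 < p <= 100) of the samples using
// the nearest-rank method on a sorted copy; it does not modify the input.
func percentile(samples []time.Duration, p float64) time.Duration {
	if len(samples) == 0 {
		return 0
	}
	sorted := append([]time.Duration(nil), samples...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	rank := int(float64(len(sorted))*p/100+0.5) - 1
	if rank < 0 {
		rank = 0
	}
	if rank >= len(sorted) {
		rank = len(sorted) - 1
	}
	return sorted[rank]
}

func main() {
	// Made-up one-way latency samples; the mean is 19.9ms but the tail is 47ms.
	samples := []time.Duration{
		11 * time.Millisecond, 12 * time.Millisecond, 13 * time.Millisecond,
		14 * time.Millisecond, 15 * time.Millisecond, 16 * time.Millisecond,
		18 * time.Millisecond, 22 * time.Millisecond, 31 * time.Millisecond,
		47 * time.Millisecond,
	}
	fmt.Println("p95:", percentile(samples, 95)) // the number reported per region
	fmt.Println("p99:", percentile(samples, 99)) // gated at 50ms under load
}
```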

Our current P95 numbers by region: NA-West at 17ms, NA-East at 19ms, EU-West at 16ms, EU-Central at 18ms, APAC-Japan at 15ms, APAC-SEA at 23ms, Brazil at 21ms. SEA and Brazil are above 20ms P95 — we're still working on those. The remaining regions all sit under 20ms at P95 under normal load.

Under a load spike — which we define as 3x normal concurrent players for a given region — P95 numbers typically increase 3–6ms as queueing delays grow. That's acceptable. What's not acceptable is unbounded P99 behavior, which we gate at 50ms maximum under any load condition short of hardware failure.

The honest answer

Sub-20ms latency requires infrastructure investment, geographic distribution, and protocol-level discipline. You can't buy your way there with better hardware alone. And you can't claim it without measuring honestly — at the percentile that matters, under load, from actual player locations, not from your monitoring stack in the same data center as the server.

14 regions. Sub-20ms P95. Verify it yourself.

Run our latency test from your location. If the numbers don't hold up, we'll tell you why and what we're doing about it.

Start your free trial