GPU fabrics don't forgive jitter

A training job is only as fast as its slowest GPU finishing a collective. That single sentence explains most of what makes AI networking different from everything that came before it.

Collectives turn the network into a barrier

When 256 GPUs run an all-reduce, every rank waits for the last byte before the next step begins. There is no “mostly done.” A microburst that delays one flow by two milliseconds doesn’t just slow that flow; it stalls the entire job for two milliseconds, on every device.
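A toy model makes the barrier concrete. This is a sketch, not a simulator: the 10 ms compute time, the 1 ms baseline all-reduce, and the helper name `step_time_ms` are all assumptions picked for round numbers. The only thing taken from the text is the shape of the problem: 256 ranks, one flow delayed by 2 ms.

```python
def step_time_ms(compute_ms, comm_ms_per_rank):
    """Synchronous step: everyone waits for the slowest rank's collective."""
    return compute_ms + max(comm_ms_per_rank)


RANKS = 256
COMPUTE_MS = 10.0                 # assumed per-step compute time

baseline = [1.0] * RANKS          # assumed 1 ms of all-reduce time per rank
delayed = list(baseline)
delayed[17] += 2.0                # one flow (rank 17, arbitrary) hits a 2 ms microburst

clean = step_time_ms(COMPUTE_MS, baseline)
stalled = step_time_ms(COMPUTE_MS, delayed)

# Every rank idles for the difference, so the cost scales with the job size.
wasted_gpu_ms = (stalled - clean) * RANKS

print(f"clean step:   {clean} ms")
print(f"stalled step: {stalled} ms")
print(f"GPU time lost in one step: {wasted_gpu_ms} ms")
```

The `max()` is the whole story: 255 well-behaved flows change nothing about the result.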

The uncomfortable truth: tail latency is the only latency that matters on a GPU fabric. Your p50 is a vanity metric.
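A back-of-the-envelope way to see why the median tells you so little: if a step completes only when every flow completes, what matters is the chance that at least one flow lands in the tail. The sketch below assumes one flow per rank and flows that are statistically independent. Real congestion is correlated, so treat this as intuition rather than prediction. The function name is made up for the example.

```python
def p_step_hits_tail(flows: int, tail_fraction: float) -> float:
    """Probability that at least one of `flows` independent flows is in the tail."""
    return 1.0 - (1.0 - tail_fraction) ** flows


FLOWS = 256  # assumed: one flow per rank in the collective

p99_hit = p_step_hits_tail(FLOWS, 0.01)     # slowest 1% of flows
p999_hit = p_step_hits_tail(FLOWS, 0.001)   # slowest 0.1% of flows

print(f"steps that see a p99-or-worse flow:   {p99_hit:.1%}")
print(f"steps that see a p99.9-or-worse flow: {p999_hit:.1%}")
```

Under these assumptions roughly nine steps in ten include at least one flow from the slowest 1%, so the "rare" case is effectively the normal step time.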

Where the buffers go

The math is unforgiving. To absorb a burst without dropping, a switch needs buffer proportional to bandwidth × round-trip time — and at 400G that number gets large fast:

Shallow-buffer switches run out of room quickly. On a lossless fabric that means pause frames instead of drops, and pause spreads upstream.
Deep-buffer switches add latency, which is the thing you were trying to avoid.
The answer is almost never “more buffer.” It’s PFC, ECN, and a congestion-control loop tuned for the fabric.
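To put a number on bandwidth × round-trip time, here is the arithmetic as a short sketch. The 10 µs RTT and the 64-port switch are assumptions for illustration, not measurements from any particular fabric. Only the 400G link speed comes from the text.

```python
def bdp_bytes(link_gbps: int, rtt_us: int) -> int:
    """Bandwidth-delay product in bytes.

    Gb/s * microseconds = kilobits, so multiply by 1000 for bits, then divide by 8.
    """
    return link_gbps * rtt_us * 1000 // 8


RTT_US = 10   # assumed fabric round-trip time
PORTS = 64    # assumed port count on one switch

per_port_100g = bdp_bytes(100, RTT_US)
per_port_400g = bdp_bytes(400, RTT_US)
whole_switch_400g = per_port_400g * PORTS

print(f"100G port: {per_port_100g:,} bytes in flight")
print(f"400G port: {per_port_400g:,} bytes in flight")
print(f"{PORTS} x 400G ports: {whole_switch_400g:,} bytes")
```

With these inputs a single 400G port has about 500 KB in flight per round trip, four times the 100G figure, and the requirement grows linearly if the RTT stretches under congestion. That is the case against buying your way out with buffer alone.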

Get the buffer and back-pressure story right and the fabric disappears into the background — which is exactly where it belongs.

Written by the 2× CCIE

Enterprise → cloud → AI networking. I write the breakdowns I wish I’d had. New field notes roughly twice a month.

