GPU fabrics don't forgive jitter

A training job is only as fast as its slowest GPU finishing a collective. That single sentence explains most of what makes AI networking different from everything that came before it.

Collectives turn the network into a barrier

When 256 GPUs run an all-reduce, every rank waits for the last byte before the next step begins. There is no “mostly done.” A microburst that delays one flow by two milliseconds doesn’t slow that flow — it stalls the entire job for two milliseconds, on every device.
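The barrier behavior is easy to sketch. The model below is a toy, not a measurement: the 10 ms baseline completion time is a made-up number, and the only things taken from the paragraph above are the 256 ranks and the 2 ms delay. The step time is simply the maximum over all flows.

```python
# Minimal barrier model: a collective completes when its slowest flow completes.
RANKS = 256            # from the text
BASELINE_MS = 10.0     # hypothetical per-flow completion time (assumption)
BURST_DELAY_MS = 2.0   # microburst delay from the text


def step_time_ms(flow_times_ms):
    """Every rank waits for the last byte, so the step takes the max."""
    return max(flow_times_ms)


clean = [BASELINE_MS] * RANKS
one_slow = list(clean)
one_slow[17] += BURST_DELAY_MS   # exactly one flow is delayed (index is arbitrary)

stall_ms = step_time_ms(one_slow) - step_time_ms(clean)
gpu_ms_lost = stall_ms * RANKS   # idle time summed across all ranks

print(f"step stall: {stall_ms} ms, aggregate idle GPU time: {gpu_ms_lost} ms")
```

One delayed flow out of 256 moves the whole step by the full 2 ms, which adds up to 512 GPU-milliseconds of idle time for a single microburst.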

The uncomfortable truth: tail latency is the only latency that matters on a GPU fabric. Your p50 is a vanity metric.
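A rough way to see why the median misleads: if each flow independently has a small chance of landing in the tail, the chance that at least one of 256 flows does so is large. The sketch below assumes independent flows and one flow per rank, which real fabrics do not guarantee, so treat it as an illustration rather than a model of any specific network.

```python
def p_step_hits_tail(n_flows, p_tail):
    """Probability that at least one of n_flows lands in the tail.

    Assumes flows are independent (a simplifying assumption).
    """
    return 1.0 - (1.0 - p_tail) ** n_flows


N_FLOWS = 256  # one flow per rank, matching the 256-GPU example

for label, p_tail in (("p99", 0.01), ("p99.9", 0.001)):
    share = p_step_hits_tail(N_FLOWS, p_tail)
    print(f"{label} event per flow -> {share:.1%} of steps wait on it")
```

Under these assumptions a p99 event on a single flow shows up in roughly 92% of steps, and even a p99.9 event shows up in more than one step in five. The "rare" case is the common case once the barrier is 256 wide.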

Where the buffers go

The math is unforgiving. To absorb a burst without dropping, a switch needs buffer proportional to bandwidth × round-trip time, and at 400G that number gets large fast.
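A quick worked version of bandwidth × round-trip time, as a sketch. The RTT values and the 64-port count are hypothetical placeholders chosen for illustration; substitute figures measured on your own fabric. Integer arithmetic is used so the results are exact.

```python
def bdp_bytes(link_gbps, rtt_us):
    """Bandwidth-delay product in bytes.

    Gb/s x microseconds = 1e9 * 1e-6 = 1000 bits per unit, and 1000 / 8 = 125,
    so the product in bytes is 125 * link_gbps * rtt_us.
    """
    return 125 * link_gbps * rtt_us


PORTS = 64  # hypothetical switch port count (assumption)

for link_gbps in (100, 400):
    for rtt_us in (5, 10, 20):  # hypothetical in-fabric RTTs (assumption)
        per_port = bdp_bytes(link_gbps, rtt_us)
        total = per_port * PORTS
        print(f"{link_gbps}G, RTT {rtt_us} us: "
              f"{per_port / 1e3:.1f} KB per port, "
              f"{total / 1e6:.1f} MB across {PORTS} ports")
```

At 400G with an assumed 10 µs round trip, one port needs 500 KB just to cover a single bandwidth-delay product, and 64 such ports need 32 MB. Going from 100G to 400G at the same RTT multiplies the requirement by four.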

Get the buffer and back-pressure story right and the fabric disappears into the background — which is exactly where it belongs.

Written by the 2× CCIE

Enterprise → cloud → AI networking. I write the breakdowns I wish I’d had. New field notes roughly twice a month.
