
Rethinking Ethernet for AI

Key Takeaways:

  • AI Demands a New Ethernet Fabric: As AI models scale, traditional Ethernet struggles with micro-packet inefficiencies and jitter. Broadcom’s Tomahawk Ultra redefines Ethernet with 250 ns per-hop switch latency, streamlined headers and in-network collective offloading.

  • Optimized for Circuit Designers: Tomahawk Ultra enables circuit designers to minimize redundant data movement, reduce buffer size and streamline ASIC/NIC development by leveraging compressed headers, lossless fabric and unified Ethernet-based infrastructure.

  • McKinsey Electronics Accelerates Adoption: By offering access to Tomahawk Ultra-compatible silicon, Smart NICs and FPGA solutions, McKinsey Electronics equips engineers across the Middle East, Africa and Türkiye with the tools to prototype, validate and scale next-gen AI interconnects efficiently and reliably.


As AI models grow, so does the number of accelerators (GPUs, XPUs) that must coordinate closely across tens or hundreds of chips. These systems rely on ultra-fast, fine-grained communication patterns (e.g., all‑reduce, broadcast), where latency, even at nanosecond scales, can bottleneck performance. Traditional networks compensate with large buffers and raw throughput, but struggle with:

  • Micro‑packet overhead: small 64‑byte messages burdened by ~46 bytes of header overhead each

  • Traffic jitter: congestion and packet loss adding unpredictable latency

  • Communication bloat: collective patterns like MPI/SHMEM pushing data back and forth inefficiently
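To put numbers on the micro-packet problem, here is a back-of-the-envelope sketch using the 64-byte message and ~46-byte header figures above, plus standard Ethernet framing overhead (the exact header stack varies by transport):

```python
# Back-of-the-envelope wire efficiency for small AI messages.
# Uses the 64-byte payloads and ~46-byte legacy header stack cited
# above; the 8-byte preamble and 12-byte minimum inter-packet gap
# add 20 more bytes of per-packet wire overhead.

PREAMBLE_IPG = 20  # preamble (8 B) + minimum inter-packet gap (12 B)

def wire_efficiency(payload: int, header: int) -> float:
    """Fraction of link bandwidth carrying useful payload."""
    return payload / (payload + header + PREAMBLE_IPG)

print(f"legacy ~46 B headers: {wire_efficiency(64, 46):.1%}")  # 49.2%
print(f"compressed 10 B:      {wire_efficiency(64, 10):.1%}")  # 68.1%
```

At 64-byte message sizes, roughly half the link is spent on overhead with a legacy header stack; shrinking headers to ~10 bytes recovers close to 20 percentage points of usable bandwidth.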


Circuit designers, particularly those working on AI NICs, on‑chip interconnects or co‑packaged optics, grapple with these constraints. Their goal? To engineer systems that reduce redundant data motion and latency while maximizing bandwidth utilization.



Tomahawk Ultra

Broadcom’s Tomahawk Ultra (BCM78920 series) flips the script by treating Ethernet not just as a best‑effort backbone, but as a near-instant transport channel tailored for scale‑up AI clusters.


Key Innovations

  1. Ultra‑low latency, line‑rate

    Delivers 250 ns per-hop switch latency at a full 51.2 Tbps line rate, even with 64-byte packets, handling ~77 billion packets per second (Bpps)

  2. Streamlined headers

    Slashes header size from ~46 B to as little as 6–10 B using compressed headers fully compliant with Scale-up Ethernet (SUE) specs

  3. Lossless fabric via LLR + CBFC

    Implements hardware-level link retries and credit-based flow control, ensuring every packet arrives—similar reliability to InfiniBand and proprietary fabrics, but on Ethernet

  4. In‑network collectives

    Planned support for offloading operations such as all-reduce and broadcast from XPUs into the switch fabric itself—reducing redundant traffic and lowering compute overhead

  5. Pin‑compatible with Tomahawk 5

    Enables seamless upgrades in existing data center systems
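The ~77 Bpps figure in item 1 follows directly from the wire format; a quick sanity check (assuming the standard 8-byte preamble and 12-byte minimum inter-packet gap per Ethernet frame):

```python
# Sanity-check the ~77 Bpps figure: at line rate, each 64-byte frame
# also occupies 8 B of preamble and a 12 B inter-packet gap on the wire.

LINE_RATE_BPS = 51.2e12   # 51.2 Tbps aggregate switch bandwidth
FRAME = 64                # minimum Ethernet frame size in bytes
PREAMBLE_IPG = 8 + 12     # per-packet wire overhead in bytes

packets_per_sec = LINE_RATE_BPS / ((FRAME + PREAMBLE_IPG) * 8)
print(f"{packets_per_sec / 1e9:.1f} billion packets per second")
# prints 76.2 billion packets per second, in line with the ~77 Bpps above
```
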


Why Designers Should Care

  1. Efficient Small‑Packet Handling

    Circuit designers working on AI NICs and DPU fabrics can optimize for these compressed headers and high packet-per-second rates, reducing data bus stress and accelerating ASIC buffer design.

  2. Offload‑Friendly In‑Network Compute

    In‑network collectives let systems reduce redundant multipoint traffic. Designers can build NICs supporting SUE‑Lite header compression and flow‑control signaling to enable efficient coordination with Tomahawk Ultra fabrics.

  3. Enhanced Reliability

    Thanks to LLR and CBFC, less buffering is needed at NIC endpoints and accelerators, helping designers reduce die area and power spent on SerDes and local buffers.

  4. Unified Infrastructure

    Rather than maintaining separate Ethernet and proprietary fabrics (InfiniBand/NVLink), circuit designers can standardize on Ethernet interconnects, simplifying validation and boosting interoperability.
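The buffer savings in item 3 follow from how credit-based flow control works: the sender may only transmit while it holds credits, so the receiver's buffer never needs to be larger than the credits it advertises. A minimal model (credit counts here are illustrative, not Tomahawk Ultra's actual parameters):

```python
# Minimal credit-based flow control model: the receiver advertises a
# credit pool equal to its buffer slots; the sender spends one credit
# per packet and stalls at zero, so the buffer can never overflow and
# no packet is ever dropped. Illustrative sketch only.

class CreditLink:
    def __init__(self, buffer_slots: int):
        self.credits = buffer_slots   # receiver-advertised credits
        self.rx_buffer = []

    def send(self, pkt) -> bool:
        """Transmit only if a credit is available; otherwise stall."""
        if self.credits == 0:
            return False              # sender waits: lossless by design
        self.credits -= 1
        self.rx_buffer.append(pkt)
        return True

    def drain(self):
        """Receiver consumes a packet and returns its credit."""
        pkt = self.rx_buffer.pop(0)
        self.credits += 1
        return pkt

link = CreditLink(buffer_slots=2)
assert link.send("p1") and link.send("p2")
assert not link.send("p3")            # stalled, never dropped
link.drain()
assert link.send("p3")                # credit returned, sending resumes
```

Because overflow is impossible by construction, endpoint buffers can be sized to the advertised credit pool rather than to worst-case bursts.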


A Circuit Designer’s Practical Example: FPGA‑Smart NIC + Tomahawk Ultra

Recall the FPGA‑based AI Smart NIC from Rui Ma et al., which offloaded all‑reduce traffic to the NIC, boosting 6‑node performance by 1.6× with projected gains of 2.5× at 32 nodes. Pairing that concept with Tomahawk Ultra enables:

  • Compressed header offloading: NIC FPGA preprocesses SUE headers at wire speed

  • Switch‑side collective offload: The switch's in‑network collective (INC) engine handles high‑order collectives instead of the NIC

  • Error resilience: Link retries and credit flow control relieve NIC from packet‑loss handling


This hybrid design lets circuit teams split functions intelligently between endpoint logic and fabric capabilities, thus minimizing latency and maximizing throughput for AI workloads.
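The benefit of pushing collectives into the fabric can be illustrated with a rough step-count model, comparing a classic ring all-reduce against an idealized in-switch reduction (a sketch, not Tomahawk Ultra's actual algorithm; real gains depend on topology and implementation):

```python
# Illustrative step counts: a ring all-reduce serializes 2*(N-1)
# communication steps across N nodes, while an idealized in-network
# reduction needs only one gather pass into the switch and one
# broadcast back out, independent of N. Sketch only.

def ring_allreduce_steps(nodes: int) -> int:
    """Serialized communication steps in a classic ring all-reduce."""
    return 2 * (nodes - 1)

IN_NETWORK_STEPS = 2  # one reduce pass up, one broadcast down (idealized)

for n in (6, 32, 128):
    print(f"{n:>3} nodes: ring = {ring_allreduce_steps(n):>3} steps, "
          f"in-network = {IN_NETWORK_STEPS}")
```

The serialized step count, and with it the latency-sensitive critical path, grows linearly with node count for endpoint-driven collectives but stays flat when the switch performs the reduction.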


Tomahawk Ultra positions Ethernet as a strong contender for AI interconnects alongside InfiniBand and NVLink, pushing interconnect design into a unified, open‑standards ecosystem. Circuit designers can now:

  • Embed SUE‑Lite support in NIC ASICs and DPUs

  • Reduce complex memory buffers in accelerators

  • Simplify design flows with streamlined Ethernet logic

  • Leverage hardware‑based INC patterns to boost overall system efficiency


Tomahawk Ultra transforms a ubiquitous protocol into a powerful, low-latency, lossless fabric for AI and HPC. Circuit designers gain a new lever, steering efforts toward compact, efficient and interoperable designs that weren't possible with older Ethernet models. It opens the door to designing next-gen accelerators and NICs built to thrive in this smarter, faster and more integrated networking world.


McKinsey Electronics supports this shift toward AI-centric networking by providing access to cutting-edge Ethernet switch silicon, Smart NIC components and embedded FPGA solutions from top-tier manufacturers. Through its expansive line card, McKinsey Electronics empowers circuit designers and system architects across the Middle East, Africa and Türkiye to prototype, validate and scale high-performance interconnects such as those enabled by Tomahawk Ultra.

Whether you're designing next-gen DPUs, AI accelerators or edge inference platforms, McKinsey Electronics ensures reliable sourcing, technical guidance and regional supply chain continuity for AI-native infrastructure.


Sources

  • Broadcom press materials on Tomahawk Ultra features: header compression, LLR, CBFC, INC, latency figures.

  • Reuters and other coverage of Tomahawk Ultra launch and positioning vs NVIDIA/NVLink.

  • FPGA‑based Smart NICs for distributed training (all‑reduce offload).
