
Rethinking Ethernet for AI

Key Takeaways:

  • AI Demands a New Ethernet Fabric: As AI models scale, traditional Ethernet struggles with micro-packet inefficiencies and jitter. Broadcom’s Tomahawk Ultra redefines Ethernet with 250 ns per-hop switch latency, streamlined headers and in-network collective offloading.

  • Optimized for Circuit Designers: Tomahawk Ultra enables circuit designers to minimize redundant data movement, reduce buffer size and streamline ASIC/NIC development by leveraging compressed headers, lossless fabric and unified Ethernet-based infrastructure.

  • McKinsey Electronics Accelerates Adoption: By offering access to Tomahawk Ultra-compatible silicon, Smart NICs and FPGA solutions, McKinsey Electronics equips engineers across the Middle East, Africa and Türkiye with the tools to prototype, validate and scale next-gen AI interconnects efficiently and reliably.


As AI models grow, so does the number of accelerators (GPUs, XPUs) that must coordinate closely across tens or hundreds of chips. These systems rely on ultra-fast, fine-grained communication patterns (e.g., all‑reduce, broadcast), where latency, even at nanosecond scales, can bottleneck performance. Traditional networks compensate with large buffers and raw throughput, but struggle with:

  • Micro‑packet overhead: small 64‑byte messages burdened by ~46 bytes of header overhead each

  • Traffic jitter: congestion and packet loss adding unpredictable latency

  • Communication bloat: collective patterns like MPI/SHMEM pushing data back and forth inefficiently
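To put numbers on the micro-packet problem, here is a back-of-the-envelope sketch using the 64-byte message and ~46-byte header figures above, plus standard Ethernet framing overhead (the exact header stack varies by transport):

```python
# Back-of-the-envelope wire efficiency for small AI messages.
# Uses the 64-byte payloads and ~46-byte legacy header stack cited
# above; the 8-byte preamble and 12-byte minimum inter-packet gap
# add 20 more bytes of per-packet wire overhead.

PREAMBLE_IPG = 20  # preamble (8 B) + minimum inter-packet gap (12 B)

def wire_efficiency(payload: int, header: int) -> float:
    """Fraction of link bandwidth carrying useful payload."""
    return payload / (payload + header + PREAMBLE_IPG)

print(f"legacy ~46 B headers: {wire_efficiency(64, 46):.1%}")  # 49.2%
print(f"compressed 10 B:      {wire_efficiency(64, 10):.1%}")  # 68.1%
```

At 64-byte message sizes, roughly half the link is spent on overhead with a legacy header stack; shrinking headers to ~10 bytes recovers close to 20 percentage points of usable bandwidth.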


Circuit designers, particularly those working on AI NICs, on‑chip interconnects or co‑packaged optics, grapple with these constraints. Their goal? To engineer systems that reduce redundant data motion and latency while maximizing bandwidth utilization.



Tomahawk Ultra

Broadcom’s Tomahawk Ultra (BCM78920 series) flips the script by treating Ethernet not just as a best‑effort backbone, but as a near-instant transport channel tailored for scale‑up AI clusters.


Key Innovations

  1. Ultra‑low latency, line‑rate

    Delivers 250 ns per-hop switch latency at a full 51.2 Tbps line rate, even with 64-byte packets, handling ~77 billion packets per second (Bpps)

  2. Streamlined headers

    Slashes header size from ~46 B to as little as 6–10 B using compressed headers fully compliant with Scale-up Ethernet (SUE) specs

  3. Lossless fabric via LLR + CBFC

    Implements hardware-level link retries and credit-based flow control, ensuring every packet arrives—similar reliability to InfiniBand and proprietary fabrics, but on Ethernet

  4. In‑network collectives

    Planned support for offloading operations such as all-reduce and broadcast from XPUs into the switch fabric itself—reducing redundant traffic and lowering compute overhead

  5. Pin‑compatible with Tomahawk 5

    Enables seamless upgrades in existing data center systems
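The ~77 Bpps figure in item 1 follows directly from the wire format; a quick sanity check (assuming the standard 8-byte preamble and 12-byte minimum inter-packet gap per Ethernet frame):

```python
# Sanity-check the ~77 Bpps figure: at line rate, each 64-byte frame
# also occupies 8 B of preamble and a 12 B inter-packet gap on the wire.

LINE_RATE_BPS = 51.2e12   # 51.2 Tbps aggregate switch bandwidth
FRAME = 64                # minimum Ethernet frame size in bytes
PREAMBLE_IPG = 8 + 12     # per-packet wire overhead in bytes

packets_per_sec = LINE_RATE_BPS / ((FRAME + PREAMBLE_IPG) * 8)
print(f"{packets_per_sec / 1e9:.1f} billion packets per second")
# prints 76.2 billion packets per second, in line with the ~77 Bpps above
```
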


Why Designers Should Care

  1. Efficient Small‑Packet Handling

    Circuit designers working on AI NICs and DPU fabrics can optimize for these compressed headers and high packet-per-second rates, reducing data bus stress and accelerating ASIC buffer design.

  2. Offload‑Friendly In‑Network Compute

    In‑network collectives let systems reduce redundant multipoint traffic. Designers can build NICs supporting SUE‑Lite header compression and flow‑control signaling to enable efficient coordination with Tomahawk Ultra fabrics.

  3. Enhanced Reliability

    Thanks to LLR and CBFC, less buffering is needed at NIC endpoints and accelerators, helping designers reduce die area and power spent on SerDes and local buffers.

  4. Unified Infrastructure

    Rather than maintaining separate Ethernet and proprietary fabrics (InfiniBand/NVLink), circuit designers can standardize on Ethernet interconnects, simplifying validation and boosting interoperability.
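The buffer savings in item 3 follow from how credit-based flow control works: the sender may only transmit while it holds credits, so the receiver's buffer never needs to be larger than the credits it advertises. A minimal model (credit counts here are illustrative, not Tomahawk Ultra's actual parameters):

```python
# Minimal credit-based flow control model: the receiver advertises a
# credit pool equal to its buffer slots; the sender spends one credit
# per packet and stalls at zero, so the buffer can never overflow and
# no packet is ever dropped. Illustrative sketch only.

class CreditLink:
    def __init__(self, buffer_slots: int):
        self.credits = buffer_slots   # receiver-advertised credits
        self.rx_buffer = []

    def send(self, pkt) -> bool:
        """Transmit only if a credit is available; otherwise stall."""
        if self.credits == 0:
            return False              # sender waits: lossless by design
        self.credits -= 1
        self.rx_buffer.append(pkt)
        return True

    def drain(self):
        """Receiver consumes a packet and returns its credit."""
        pkt = self.rx_buffer.pop(0)
        self.credits += 1
        return pkt

link = CreditLink(buffer_slots=2)
assert link.send("p1") and link.send("p2")
assert not link.send("p3")            # stalled, never dropped
link.drain()
assert link.send("p3")                # credit returned, sending resumes
```

Because overflow is impossible by construction, endpoint buffers can be sized to the advertised credit pool rather than to worst-case bursts.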


A Circuit Designer’s Practical Example: FPGA‑Smart NIC + Tomahawk Ultra

Recall the FPGA‑based AI Smart NIC from Rui Ma et al., which offloaded all‑reduce traffic to the NIC, boosting 6‑node performance by 1.6× with projected gains of 2.5× at 32 nodes. Pairing that concept with Tomahawk Ultra enables:

  • Compressed header offloading: NIC FPGA preprocesses SUE headers at wire speed

  • Switch‑side collective offload: The switch's in‑network collective (INC) engine handles high‑order collectives instead of the NIC

  • Error resilience: Link retries and credit flow control relieve NIC from packet‑loss handling


This hybrid design lets circuit teams split functions intelligently between endpoint logic and fabric capabilities, thus minimizing latency and maximizing throughput for AI workloads.
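The benefit of pushing collectives into the fabric can be illustrated with a rough step-count model, comparing a classic ring all-reduce against an idealized in-switch reduction (a sketch, not Tomahawk Ultra's actual algorithm; real gains depend on topology and implementation):

```python
# Illustrative step counts: a ring all-reduce serializes 2*(N-1)
# communication steps across N nodes, while an idealized in-network
# reduction needs only one gather pass into the switch and one
# broadcast back out, independent of N. Sketch only.

def ring_allreduce_steps(nodes: int) -> int:
    """Serialized communication steps in a classic ring all-reduce."""
    return 2 * (nodes - 1)

IN_NETWORK_STEPS = 2  # one reduce pass up, one broadcast down (idealized)

for n in (6, 32, 128):
    print(f"{n:>3} nodes: ring = {ring_allreduce_steps(n):>3} steps, "
          f"in-network = {IN_NETWORK_STEPS}")
```

The serialized step count, and with it the latency-sensitive critical path, grows linearly with node count for endpoint-driven collectives but stays flat when the switch performs the reduction.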


Tomahawk Ultra positions Ethernet as a strong contender for AI interconnects alongside InfiniBand and NVLink, pushing interconnect design into a unified, open‑standards ecosystem. Circuit designers can now:

  • Embed SUE‑Lite support in NIC ASICs and DPUs

  • Reduce complex memory buffers in accelerators

  • Simplify design flows with streamlined Ethernet logic

  • Leverage hardware‑based INC patterns to boost overall system efficiency


Tomahawk Ultra transforms a ubiquitous protocol into a powerful, low-latency, lossless fabric for AI and HPC. Circuit designers gain a new lever, steering efforts toward compact, efficient and interoperable designs that weren't possible with older Ethernet models. It opens the door to designing next-gen accelerators and NICs built to thrive in this smarter, faster and more integrated networking world.


McKinsey Electronics supports this shift toward AI-centric networking by providing access to cutting-edge Ethernet switch silicon, Smart NIC components and embedded FPGA solutions from top-tier manufacturers. Through its expansive line card, McKinsey Electronics empowers circuit designers and system architects across the Middle East, Africa and Türkiye to prototype, validate and scale high-performance interconnects such as those enabled by Tomahawk Ultra.

Whether you're designing next-gen DPUs, AI accelerators or edge inference platforms, McKinsey Electronics ensures reliable sourcing, technical guidance and regional supply chain continuity for AI-native infrastructure.


Sources

  • Broadcom press materials on Tomahawk Ultra features: header compression, LLR, CBFC, INC, latency figures.

  • Reuters and other coverage of Tomahawk Ultra launch and positioning vs NVIDIA/NVLink.

  • FPGA‑based Smart NICs for distributed training (all‑reduce offload).
