Ethernet supports data transfer rates from 10Mbps up
to 800Gbps (Gigabits per second), with 1.6Tbps (Terabits per second)
arriving soon. These speeds are crucial for handling the massive
datasets that AI workloads typically use.
- Real-Time Responsiveness: Low latency is essential for AI
systems. Ethernet minimizes delays, ensuring timely interactions
between components like GPUs, CPUs, and storage devices.
- Real-Time Decision-Making: Ethernet enables real-time
AI-driven decision-making. Its high bandwidth ensures efficient
communication between AI nodes.
- Lossless Networking: Traditional Ethernet may drop packets
during congestion, affecting AI model accuracy. However, emerging
technologies promise “lossless” transmission, ensuring data
integrity even under heavy loads.
- Scalability: As AI models grow in complexity, scalable
infrastructure becomes vital. Ethernet allows seamless expansion by
connecting additional servers and devices, accommodating that
growth while maintaining efficient connectivity and data
exchange.
- Standards-based Interoperability: Ensuring low-latency and
lossless performance is essential for AI applications wanting to
maximize the benefits of terabit Ethernet. Teledyne LeCroy therefore
supports both the Ultra Ethernet Consortium (UEC)
specification and IEEE standards.
- AI & UE Solution Track:
A dedicated set of licensed features delivering the most advanced test
capabilities for verifying Ultra Ethernet in AI
applications.
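To put the line rates quoted above in perspective, the following minimal sketch computes the ideal time to move a dataset at each speed. The 1 TB dataset size is a hypothetical figure chosen for illustration, and the calculation ignores protocol overhead, congestion, and storage limits, so real transfers will take longer.

```python
# Line rates mentioned in the text, in bits per second.
LINE_RATES_BPS = {
    "10Mbps": 10e6,
    "800Gbps": 800e9,
    "1.6Tbps": 1.6e12,
}


def transfer_seconds(size_bytes: float, rate_bps: float) -> float:
    """Ideal time to move size_bytes at rate_bps, ignoring all overhead."""
    return size_bytes * 8 / rate_bps


# Hypothetical dataset size (decimal terabyte), for illustration only.
DATASET_BYTES = 1e12

for name, rate in LINE_RATES_BPS.items():
    seconds = transfer_seconds(DATASET_BYTES, rate)
    print(f"{name:>8}: {seconds:,.1f} s")
```

Under these assumptions, the same dataset that would occupy a 10Mbps link for more than nine days moves in seconds at 800Gbps.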
Ultra Ethernet is designed to meet the special needs of AI and
HPC environments, including synchronized traffic bursts, ultra‑low
latency, fast loss recovery, and predictable performance at speeds up to
1.6Tbps (using 224G SerDes), with 3.2Tbps (using 448G SerDes) on the way.
Ultra Ethernet enhances the architecture of standard Ethernet with
mechanisms such as link‑layer retry, advanced congestion handling, and
AI‑optimized transport behavior, to ensure more deterministic
performance under real AI traffic conditions.
Validating UE networks requires deep, protocol‑aware testing at the
frame, symbol, and fabric level. From capability discovery using
UE‑specific LLDP extensions, to localized loss recovery and intelligent
flow control, Ultra Ethernet testing focuses on ensuring that devices
interoperate correctly and consistently in demanding, large‑scale AI and
HPC deployments.
Teledyne LeCroy offers advanced hardware and software solutions for
testing Ultra Ethernet in networks running AI applications.
These include the Z800 Freya and Z1608 Edun traffic generators, which can
generate traffic at speeds up to 800Gbps and 1.6Tbps using both 112G
SerDes and 224G SerDes, and the SierraNet M1288 protocol analyzer for
full line-rate capture and advanced jamming capabilities up to 800Gbps.
In addition to the comprehensive test features provided as standard with
these devices, the Xena AI & UE Solution Track adds the
UE‑specific Link Layer behaviors required for validating UEC‑enabled
switches and xPUs. These include:
- Link Layer Retry (LLR)
- Credit‑Based Flow Control (CBFC)
- Link Layer Negotiation (LLDP for UE extensions)
- Stateful UE protocol control and message exchange
- UE error injection, message inspection, and capture
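As an illustration of the general idea behind credit-based flow control named in the list above, here is a minimal toy model. It is a sketch of the generic technique only, not the UEC-specified CBFC behavior or any Teledyne LeCroy API; the class name `CreditedLink` and its methods are hypothetical. The sender holds one credit per free receiver buffer slot and may transmit only while it has credits, so the receiver buffer cannot overflow.

```python
from collections import deque


class CreditedLink:
    """Toy sender/receiver pair using generic credit-based flow control."""

    def __init__(self, receiver_buffer_slots: int):
        self.capacity = receiver_buffer_slots
        # The sender starts with one credit per free receiver slot.
        self.credits = receiver_buffer_slots
        self.rx_buffer = deque()

    def try_send(self, frame) -> bool:
        """Send a frame only if a credit is available; never drop."""
        if self.credits == 0:
            return False  # sender must wait for a credit to be returned
        self.credits -= 1
        self.rx_buffer.append(frame)
        return True

    def drain_one(self):
        """Receiver consumes one frame and returns a credit to the sender."""
        if not self.rx_buffer:
            return None
        frame = self.rx_buffer.popleft()
        self.credits += 1
        return frame


link = CreditedLink(receiver_buffer_slots=2)
print(link.try_send("a"), link.try_send("b"), link.try_send("c"))
print(link.drain_one(), link.try_send("c"))
```

In this sketch the third send is refused rather than dropped, and it succeeds once the receiver has drained a frame and returned a credit.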
To learn more about how scale‑up and scale‑out Ethernet architectures
differ, why Ultra Ethernet matters, and what it takes to test
next‑generation AI networks with confidence, see
https://xenanetworks.com/ultra-ethernet-testing/
Data center architectures for AI workloads
often adopt a spine-and-leaf structure, connecting thousands of AI
accelerators and storage solutions through low-latency L2/L3 networking
infrastructure at 400-800Gbps port speeds. RDMA over Converged Ethernet
(RoCE) is a promising choice of transport protocol for storage data.
- Data Center Bridging (DCB): facilitates high-throughput,
low-latency, zero-packet-loss transport of RDMA packets
(lossless traffic) alongside regular best-effort traffic (lossy
traffic).
- Priority Flow Control (PFC): prevents packet loss by
prompting a sender to temporarily pause sending packets when a
buffer becomes filled beyond a certain threshold.
- Congestion Notification (CN): RoCEv1 and RoCEv2 implement
congestion signaling between network devices that can be used
to reduce congestion spreading in lossless networks, as well as
to decrease latency and improve burst tolerance.
- Enhanced Transmission Selection (ETS): enables the allocation of
a minimum guaranteed bandwidth to each Class of Service (CoS).
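The Priority Flow Control behavior described in the list above can be sketched as a simple threshold check. This is a toy model under stated assumptions, not a switch implementation: the function name, the sampled queue depths, and the two thresholds (`xoff` to pause, `xon` to resume) are hypothetical, and real PFC operates per priority class using pause frames on the wire.

```python
def pfc_pause_states(queue_depths, xoff, xon):
    """Return, for each sampled buffer depth, whether the sender is paused.

    The sender is paused when the depth rises beyond xoff and is resumed
    only once the depth has fallen to xon or below (hysteresis), which
    avoids rapid pause/resume toggling around a single threshold.
    """
    paused = False
    states = []
    for depth in queue_depths:
        if not paused and depth > xoff:
            paused = True   # buffer filled beyond the threshold: pause
        elif paused and depth <= xon:
            paused = False  # buffer drained enough: resume
        states.append(paused)
    return states


# Hypothetical buffer occupancy samples, in percent of buffer size.
depths = [10, 50, 90, 85, 60, 30, 20]
print(pfc_pause_states(depths, xoff=80, xon=40))
```

Using two thresholds rather than one is a design choice in this sketch; the gap between them leaves headroom for frames already in flight when the pause is signaled.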