Future-proofing Ethernet for AI is a priority for Cisco, which is positioning its Nexus data center switches as core elements of AI networking infrastructure.

Cisco is on a mission to make sure Ethernet is the chief underpinning for artificial-intelligence networks now and in the future. The company has been a major contributor to Ethernet development in the IEEE and other industry groups over the years, and it is now one of the core vendors driving the Ultra Ethernet Consortium (UEC), a group working to advance Ethernet's physical, link, transport and software layers to make the technology more capable of supporting AI infrastructure.

“Organizations are sitting on massive amounts of data that they are trying to make more accessible and gain value from faster, and they are looking at AI technology now,” said Thomas Scheibe, vice president of product management for Cisco’s cloud networking, Nexus & ACI product line. “Customers want to know what they need to do now on the networking side to be able to run the huge clusters of GPUs they need and handle the volumes of data they create. And for most customers, it’s going to be Ethernet.”

To that end, Cisco has put together a blueprint defining how organizations can use existing data center Ethernet networks to support AI workloads today.

Advancing Nexus 9000 features

A core component of Cisco’s AI blueprint is its Nexus 9000 family of data center switches, which support up to 25.6Tbps of bandwidth per ASIC and “have the hardware and software capabilities available today to provide the right latency, congestion management mechanisms, and telemetry to meet the requirements of AI/ML applications,” Cisco wrote in its Data Center Networking Blueprint for AI/ML Applications.
“Coupled with tools such as Cisco Nexus Dashboard Insights for visibility and Nexus Dashboard Fabric Controller for automation, Cisco Nexus 9000 switches become ideal platforms to build a high-performance AI/ML network fabric.”

Two technologies that enable AI networking on the Nexus are the switch’s NX-OS operating-system support for Remote Direct Memory Access over Converged Ethernet version 2 (RoCEv2) and Explicit Congestion Notification (ECN), Scheibe said.

RoCEv2 is a high-performance networking technology that lets data transfer directly between the memory of two devices without involving either host’s CPU. It allows multiple packets to be transferred or routed simultaneously over a single connection, reducing latency and complexity while boosting throughput.

ECN helps enable a lossless Ethernet network by monitoring for congestion or other conditions in which packets could be dropped and signaling senders to throttle back before that happens. Lossless Ethernet is a key requirement not only for AI networking but also for today’s VoIP and video environments, Scheibe noted. Another tool, Priority Flow Control (PFC), helps control congestion in Layer 3-based networks and plays an important role in overall congestion management.

Taken together, these technologies give an Ethernet network the ability to prioritize certain sets of workloads, such as AI workloads that cannot tolerate dropped packets, so that they get network priority even when there is congestion, Scheibe said.

“These technologies can be implemented in Nexus networks today, and customers can tune their environments to handle their workload mix,” Scheibe said.
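To make the ECN behavior described above concrete, here is a minimal, illustrative sketch in Python of WRED-style probabilistic marking, the general mechanism switches use to mark ECN-capable packets as queue depth grows instead of dropping them. The thresholds and probabilities are hypothetical values for illustration, not Cisco defaults.

```python
import random

# Hypothetical thresholds (illustrative, not Cisco defaults): below
# MIN_THRESHOLD nothing is marked; between MIN and MAX the marking
# probability ramps up linearly; at or above MAX every ECN-capable
# packet is marked, telling senders to slow down before drops occur.
MIN_THRESHOLD = 20   # queue depth (packets) where marking starts
MAX_THRESHOLD = 80   # queue depth where marking becomes certain
MAX_MARK_PROB = 0.5  # marking probability just below MAX_THRESHOLD

def ecn_mark(queue_depth: int, ecn_capable: bool) -> bool:
    """Return True if the packet should carry a Congestion
    Experienced (CE) mark instead of being dropped."""
    if not ecn_capable or queue_depth < MIN_THRESHOLD:
        return False
    if queue_depth >= MAX_THRESHOLD:
        return True
    # Linear ramp between the two thresholds (WRED-style).
    ramp = (queue_depth - MIN_THRESHOLD) / (MAX_THRESHOLD - MIN_THRESHOLD)
    return random.random() < ramp * MAX_MARK_PROB

# A shallow queue never marks; a saturated queue always marks.
print(ecn_mark(10, True))   # False
print(ecn_mark(100, True))  # True
```

The receiver echoes the CE mark back to the sender, which reduces its rate; that feedback loop is what keeps queues from overflowing and makes the fabric effectively lossless for RoCEv2 traffic.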
“There is ongoing work to handle larger and more AI workloads, and there are other techniques that can be used to make sure customers can easily distribute them across available bandwidth.”

Cisco has also published scripts that let customers automate specific settings across the network to set up this fabric and simplify configuration, Scheibe said.

In addition, Nexus 9000 switches come with built-in telemetry capabilities that can be used to correlate issues in the network and optimize it for RoCEv2 transport, Cisco stated. “The Cisco Nexus 9000 family of switches provides hardware flow telemetry information through flow table and flow table events. With these features, every packet traversing the switch can be accounted for, observed, and correlated with behavior such as micro-bursts or packet drops,” Cisco wrote. Customers can export this data to the Cisco Nexus Dashboard Insights management package and view it per device, per interface, and down to per-flow granularity, according to Cisco.

Beyond the Nexus 9000

Another element of Cisco’s AI network infrastructure is its new high-end programmable Silicon One processors, which are aimed at large-scale AI/ML infrastructure for enterprises and hyperscalers. Cisco added the 5nm, 51.2Tbps Silicon One G200 and the 25.6Tbps G202 to its Silicon One family, which now has 13 members. The processors can be customized for routing or switching from a single chipset, eliminating the need for different silicon architectures for each network function; this is accomplished with a common operating system, P4 programmable forwarding code, and an SDK.

The new devices, positioned at the top of the Silicon One family, bring networking enhancements that make them well suited to demanding AI/ML deployments and other highly distributed applications, Cisco said. Core to the Silicon One system is its support for enhanced Ethernet features such as improved flow control, congestion awareness, and congestion avoidance.
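The flow-table telemetry described earlier amounts to per-flow accounting: every packet is attributed to a flow record that tracks counters a collector such as a dashboard can ingest. A minimal sketch of that idea, with all class, field, and flow names hypothetical:

```python
from collections import defaultdict

# Hypothetical flow-table sketch: keys are 5-tuples, values track the
# per-flow counters a switch could export (packets, bytes, drops).
FlowKey = tuple  # (src_ip, dst_ip, proto, src_port, dst_port)

class FlowTable:
    def __init__(self):
        self.flows = defaultdict(
            lambda: {"packets": 0, "bytes": 0, "drops": 0})

    def account(self, key: FlowKey, size: int, dropped: bool = False):
        """Account one packet of `size` bytes against its flow."""
        rec = self.flows[key]
        rec["packets"] += 1
        rec["bytes"] += size
        if dropped:
            rec["drops"] += 1

    def export(self):
        """Emit per-flow records for an external collector."""
        return [{"flow": k, **v} for k, v in self.flows.items()]

table = FlowTable()
# Two packets of one RoCEv2 flow (UDP port 4791), one of them dropped.
key = ("10.0.0.1", "10.0.0.2", "udp", 4791, 4791)
table.account(key, 4096)
table.account(key, 4096, dropped=True)
print(table.export())
```

Correlating drop counters like these with timestamps is what lets an operator spot micro-bursts and tune ECN and PFC settings for the RoCEv2 traffic class.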
The Silicon One system also includes advanced load-balancing capabilities and “packet spraying,” which spreads traffic across multiple GPUs or switches to avoid congestion and improve latency. Hardware-based link-failure recovery also helps ensure the network operates at peak efficiency, according to Cisco.

Combining these enhanced Ethernet technologies and taking them a step further ultimately lets customers set up what Cisco calls a Scheduled Fabric. In a Scheduled Fabric, the physical components (chips, optics, switches) are tied together like one big modular chassis and communicate with one another to provide optimal scheduling behavior and much higher bandwidth throughput, especially for flows like AI/ML, Cisco said.

Data-center sustainability focus

While AI seems all-encompassing these days, other topics are also challenging data center network operators. For example, customers looking to efficiently expand existing data center networks to handle larger workloads want to find the best way to integrate 400G into the network, Scheibe said. Two other major challenges are reducing data center power consumption and increasing sustainability practices, he said.

“Organizations are looking for help on getting a baseline on how much power they are using and learning what their current carbon footprint is so they can make informed decisions on how to move forward,” Scheibe said.

Cisco Nexus Cloud offers a Network Energy Utilization service that gives customers an idea of a data center’s environmental impact. Recently, Cisco announced that the Nexus Dashboard will provide real-time and historical insights into power consumption for all IT equipment in the data center and estimate the energy footprint of data center operations.

Nexus Dashboard will also provide an AI Data Center Blueprint for Networking, which will offer enterprises looking to develop AI-based applications a way to set up their networks to handle the additional transaction load.
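The packet spraying described earlier contrasts with classic ECMP load balancing, which hashes a flow's 5-tuple and pins the whole flow to one uplink, so a single large AI flow can saturate one link while others sit idle. A minimal sketch of the difference, with hypothetical uplink names and flow tuples:

```python
import itertools

# Four hypothetical uplinks out of a leaf switch.
UPLINKS = ["eth1/1", "eth1/2", "eth1/3", "eth1/4"]

def ecmp_pick(flow: tuple) -> str:
    """Flow-hash (ECMP-style) balancing: same flow -> same uplink,
    so one elephant flow never spreads across links."""
    return UPLINKS[hash(flow) % len(UPLINKS)]

_spray = itertools.cycle(UPLINKS)

def spray_pick() -> str:
    """Per-packet spraying: successive packets rotate across all
    uplinks regardless of which flow they belong to."""
    return next(_spray)

flow = ("10.0.0.1", "10.0.0.2", "udp", 4791, 4791)
# Ten packets of one flow: ECMP uses one link, spraying uses all four.
print({ecmp_pick(flow) for _ in range(10)})  # set with one element
print({spray_pick() for _ in range(10)})     # set with four elements
```

Spraying uses the full fabric bandwidth but can reorder packets within a flow, which is why it is paired with hardware that reassembles or tolerates out-of-order delivery, as in the Scheduled Fabric approach described above.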
For example, the blueprint will detail how to implement InfiniBand-to-Ethernet network migrations and large-scale machine-learning fabrics.