Cisco adds two new high-end programmable Silicon One devices that can support massive GPU clusters for AI/ML workloads.

Cisco is taking the wraps off new high-end programmable Silicon One processors aimed at underpinning large-scale artificial intelligence (AI) and machine learning (ML) infrastructure for enterprises and hyperscalers.

The company has added the 5nm, 51.2Tbps Silicon One G200 and 25.6Tbps G202 to its now 13-member Silicon One family, which can be customized for routing or switching from a single chipset, eliminating the need for different silicon architectures for each network function. This is accomplished with a common operating system, P4 programmable forwarding code, and an SDK.

The new devices, positioned at the top of the Silicon One family, bring networking enhancements that make them ideal for demanding AI/ML deployments or other highly distributed applications, according to Rakesh Chopra, a Cisco Fellow in the vendor’s Common Hardware Group.

“We are going through this huge shift in the industry where we used to build these sorts of reasonably small high-performance compute clusters that seemed large at the time but nothing compared to the absolutely huge deployments required for AI/ML,” Chopra said. AI/ML models have grown from needing a few GPUs to needing tens of thousands linked in parallel and in series. “The number of GPUs and the scale of the network is unheard of.”

The new Silicon One enhancements include a P4-programmable parallel packet processor capable of launching more than 435 billion lookups per second.

“We have a fully shared packet buffer where every port has full access to the packet buffer regardless of what’s going on,” Chopra said. This is in contrast with allocating buffers to individual input and output ports, where the buffer available depends on which port the packets traverse. “That means that you’re less capable of riding through traffic bursts and more likely to drop a packet, which really decreases AI/ML performance,” he said.

In addition, each Silicon One device can support 512 Ethernet ports, letting customers build a 32K 400G GPU AI/ML cluster with 40% fewer switches than would be needed with other silicon devices, Chopra said.

Core to the Silicon One system is its support for enhanced Ethernet features such as improved flow control, congestion awareness, and congestion avoidance. The system also includes advanced load balancing and “packet spraying,” which spreads traffic across multiple GPUs or switches to avoid congestion and improve latency. Hardware-based link-failure recovery also helps ensure the network operates at peak efficiency, the company stated.

Combining these enhanced Ethernet technologies and taking them a step further ultimately lets customers set up what Cisco calls a Scheduled Fabric. In a Scheduled Fabric, the physical components (chips, optics, switches) are tied together like one big modular chassis and communicate with each other to provide optimal scheduling behavior, Chopra said. “Ultimately what it translates to is much higher bandwidth throughput, especially for flows like AI/ML, which lets you get much lower job-completion time, which means that your GPUs run much more efficiently.”

With Silicon One devices and software, customers can deploy as many or as few of these features as they need, Chopra said.
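The packet-spraying idea is easier to see with a toy model. The sketch below is illustrative Python only, not Cisco code, and the link counts and flow sizes are made-up assumptions; it contrasts conventional per-flow ECMP hashing, where a few large AI/ML flows can land on the same links, with per-packet spraying, which distributes every packet across all available links.

```python
import zlib
from collections import Counter

NUM_LINKS = 8            # hypothetical number of parallel uplinks between switches
NUM_FLOWS = 4            # a few large "elephant" flows, typical of AI/ML training traffic
PACKETS_PER_FLOW = 10_000

flows = [f"flow-{i}" for i in range(NUM_FLOWS)]

# Per-flow ECMP: every packet of a flow hashes to the same link,
# so a handful of large flows can pile onto a few links while others sit idle.
ecmp_load = Counter()
for flow in flows:
    link = zlib.crc32(flow.encode()) % NUM_LINKS
    ecmp_load[link] += PACKETS_PER_FLOW

# Per-packet spraying: each packet is distributed across all links
# (round-robin here), evening out utilization regardless of flow count.
spray_load = Counter()
next_link = 0
for flow in flows:
    for _ in range(PACKETS_PER_FLOW):
        spray_load[next_link] += 1
        next_link = (next_link + 1) % NUM_LINKS

print("Per-flow ECMP load per link: ", dict(sorted(ecmp_load.items())))
print("Per-packet spray load per link:", dict(sorted(spray_load.items())))
```

With only a handful of high-bandwidth flows, hash-based placement can saturate some links while leaving others empty; spraying keeps utilization even, which is the behavior Cisco is targeting, though in practice the fabric or endpoints must then tolerate out-of-order packet delivery.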
Cisco is part of a growing AI networking market, alongside Broadcom, Marvell, Arista and others, that is expected to hit $10 billion by 2027, up from about $2 billion today, according to a recent blog from 650 Group. “AI networks have already been thriving for the past two years. In fact, we have been tracking AI/ML networking for nearly two years and see AI/ML as a massive opportunity for networking and one of the main drivers for data-center networking growth in our forecasts,” the 650 blog stated. “The key to AI/ML’s impact on networking is the tremendous amount of bandwidth AI models need to train, new workloads, and the powerful inference solutions that appear in the market. In addition, many verticals will go through multiple digitization efforts because of AI during the next 10 years.”

The Cisco Silicon One G200 and G202 are being tested by unnamed customers and are available now on a sampling basis, according to Chopra.