RoCE Likely Marks The Beginning Of The End For Nvidia’s InfiniBand Business (NASDAQ:NVDA)

One of Nvidia’s (NASDAQ:NVDA) strategic aims when it acquired Mellanox for $6.9B in 2020 was to tighten its lock on data center customers by bundling Mellanox’s InfiniBand solutions with Nvidia GPUs and accelerators (hereafter referred to as accelerators for simplicity). Once customers committed to Mellanox’s InfiniBand solution, they were effectively locked into Nvidia accelerators, and Nvidia could charge what amounted to a ransom in these situations.

But customers, especially powerful customers, like Amazon (AMZN), Microsoft (MSFT), Alphabet (GOOG) (GOOGL), and Meta (META), do not like to be tied to proprietary solutions as doing so is not in their economic or strategic self-interest. Consequently, and due to their general dislike for non-industry standard solutions, data center customers have been pushing the industry toward more standardized network solutions based on Ethernet.

To be sure, Mellanox also offers Ethernet-based solutions, but these solutions are not competitive with InfiniBand solutions for AI/ML workloads with a large amount of data movement. This is because InfiniBand is optimized for this class of workloads, has special features such as Remote Direct Memory Access, or RDMA, and is tailored to be a Software Defined Network.

The challenge for customers was that Nvidia, with the Mellanox acquisition, became the dominant supplier of both accelerators and InfiniBand interconnect, which limited customer options for AI/ML workloads. The combination of Mellanox and Nvidia was so strong that some customers have moved away from industry-standard Ethernet interconnect to Mellanox InfiniBand solutions. Consequently, InfiniBand has become the de facto interconnect standard for high-performance systems, especially in the deep learning space.

The industry is attempting to undo Nvidia’s lock on the space with a technology called RDMA over Converged Ethernet, or RoCE. RoCE incorporates InfiniBand-like features into Ethernet and is a standards-driven answer to the data center customer dilemma. RoCE features allow InfiniBand-equivalent performance within the confines of Ethernet infrastructure. This in turn means that solutions from companies other than Nvidia, including in-house solutions from cloud service providers (CSPs), can compete for either the accelerator or the network sockets.

While RoCE itself is not new, a class of solutions that addresses data center use cases has been missing to date. Broadcom’s (AVGO) Tomahawk 5 is the first notable switching solution for high-traffic data center applications and AI/ML workloads. Going into the technical details of this solution is neither within the scope of this article nor productive for investors, but interested readers can find relevant information at sites such as Next Platform.

Impact On Nvidia

The Tomahawk 5 chip is already sampling and is expected to be widely deployed next year. Broadcom has lined up a strong array of infrastructure partners and claims to have design wins at Amazon, Google, Meta, Microsoft, and others. This widely supported solution (note the breadth of customer and partner support at the bottom of the linked Tomahawk PR) puts Broadcom in the catbird seat for the next-generation interconnect deployments.

Nvidia’s alternative Spectrum-4 switch, though announced earlier this year, is not yet in the market and, even if it becomes reasonably successful, will set Nvidia back a bit as the Company will lose its proprietary lock for this class of applications. Beyond The Hype would go as far as to say that we are starting to witness the beginning of the end of Mellanox InfiniBand deployments in the data center. The setback for Nvidia here is not just the networking piece but also the accelerator piece as CSPs, decoupled from InfiniBand, will now be able to more freely adopt non-Nvidia accelerators.

While Nvidia does not disclose Mellanox InfiniBand revenues, Beyond The Hype estimates that Mellanox contributes approximately a third of Nvidia’s Datacenter revenues, or about $1B in revenue a quarter. The lion’s share of these revenues is from InfiniBand. This is a major headwind for Nvidia: a segment that investors expect to keep growing will actually be shrinking.
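For readers who want to see what the estimate above implies, the arithmetic can be sketched in a few lines of Python. This is only a back-of-the-envelope illustration: the two inputs are the estimates quoted above, not disclosed figures, and reading "lion's share" as "more than half" is an assumption made here for illustration. The function and variable names are hypothetical.

```python
def implied_total(segment_revenue_b: float, segment_share: float) -> float:
    """Total revenue implied by one segment's revenue and its share of the total."""
    if not 0 < segment_share <= 1:
        raise ValueError("segment_share must be in (0, 1]")
    return segment_revenue_b / segment_share


# Inputs taken from the estimate in the text (not company-disclosed numbers).
mellanox_quarterly_b = 1.0   # about $1B a quarter
mellanox_share = 1 / 3       # about a third of Datacenter revenue

# Datacenter revenue implied by those two estimates.
datacenter_quarterly_b = implied_total(mellanox_quarterly_b, mellanox_share)

# Assumption: "lion's share" is read as "more than half" of Mellanox revenue.
infiniband_floor_b = 0.5 * mellanox_quarterly_b

print(f"Implied Datacenter revenue: about ${datacenter_quarterly_b:.1f}B a quarter")
print(f"InfiniBand revenue under that reading: above ${infiniband_floor_b:.1f}B a quarter")
```

Under these assumptions, the revenue at risk is on the order of half a billion dollars or more per quarter, which is the scale of the headwind discussed here.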

The impact of this is likely to become more evident in 2023 as other networking and accelerator solutions gain at the expense of Nvidia.

Closing Thoughts

To be sure, there has been perennial talk of Ethernet killing off InfiniBand for the last decade or more, but that has never happened. Instead, InfiniBand has grown into a comfortable and lucrative niche. There were good reasons for this growth: InfiniBand offered a strong value proposition for the niche, and Nvidia exploited its accelerator strength to drive that further. But finally, it appears that the time has come for InfiniBand’s trajectory to reverse. With RoCE taking away one of the biggest advantages of InfiniBand, and with data center customers showing little tolerance for high-priced proprietary or near-proprietary solutions, the time has come for Ethernet to start replacing InfiniBand in many data center applications. This does not mean that InfiniBand will die immediately, but it does mean that its growth is about to reverse and, with it, drive down Nvidia’s fortunes.

This is a significant and gathering storm that works against Nvidia’s growth narrative.
