AI Supercycle: Picks and Shovels Winner == Nvidia
The Three Pillars of AI: Compute, Data, and Algorithms - Nvidia's Dominance and the Future of AI Compute
COMPUTE, DATA, and ALGORITHMS
The three key pillars of modern artificial intelligence are COMPUTE, DATA, and ALGORITHMS. The winner of each will be crowned during a different epoch. Nvidia is the clear leader and likely the runaway winner of the COMPUTE pillar.

The single greatest beneficiary of the current AI supercycle is Nvidia. Demand for Nvidia's accelerated compute products continues to outstrip supply and manufacturing capacity through the end of 2024. Nvidia plans to triple H100 output in 2025 as B100 production begins. Despite winter clouds looming over AI, Nvidia's blistering growth will likely continue as demand from data center and edge deployments drives massive capital expenditure. The amazing part is that Nvidia's price-earnings ratio decreased between 2023 and 2024, demonstrating a semi-rational company valuation, with significant revenue dollars driven by hockey-stick growth in data center GPU systems. Nvidia commanded 98% of the data center GPU market in 2023.
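As a quick illustration of how a multiple can compress during a rally, the toy arithmetic below uses hypothetical prices and earnings per share (not Nvidia's actual figures): a P/E ratio falls whenever earnings grow faster than the share price.

```python
# Illustrative arithmetic with hypothetical numbers (not Nvidia's actual figures):
# a P/E ratio can fall even as the share price rises, as long as earnings grow faster.
price_2023, eps_2023 = 400.0, 8.0     # hypothetical share price and earnings per share
price_2024, eps_2024 = 800.0, 20.0    # price doubles, earnings grow 2.5x

pe_2023 = price_2023 / eps_2023       # 50.0
pe_2024 = price_2024 / eps_2024       # 40.0
print(pe_2023, pe_2024)               # multiple compresses from 50x to 40x despite the rally
```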
CUDA Moat
We are in the first or second inning of the AI buildout's $1T capex cycle. This capex cycle is dominated by infrastructure: data center construction, compute and network hardware, power, and cooling. Nvidia benefits from the tailwinds of the replacement cycle from traditional (CPU) to accelerated (GPU) computing and from the current AI hype. Nvidia's dominance is protected by a software moat: the Compute Unified Device Architecture (CUDA). While many are awed by Nvidia's "overnight" meteoric rise in the equities market, CUDA is an 18-year software investment spanning gaming, crypto, and AI. For example, AMD's MI100 posts stronger raw hardware numbers than Nvidia's A100 in terms of TFLOPS (Table 1). Yet CUDA optimizes deep learning operations, notably mixed-precision training, which uses FP16 for computation and FP32 for accumulation and runs highly efficiently on CUDA. This tightly integrated software-hardware combination delivers significant performance gains on AI workloads despite a smaller compute footprint in GPU clock speed and tensor cores.
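To make the mixed-precision point concrete, here is a minimal sketch of FP16/FP32 mixed-precision training using PyTorch's automatic mixed precision (AMP) on a CUDA device; the model, data, and hyperparameters are hypothetical stand-ins, not anything from the original post.

```python
# Minimal sketch of CUDA mixed-precision training in PyTorch (toy model, hypothetical sizes).
# autocast runs eligible ops in FP16; GradScaler scales the loss to avoid FP16 underflow.
import torch
import torch.nn as nn

device = "cuda"  # assumes an Nvidia GPU with CUDA available
model = nn.Linear(1024, 1024).to(device)            # stand-in for a real network
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()

x = torch.randn(32, 1024, device=device)
target = torch.randn(32, 1024, device=device)

for step in range(10):
    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast(dtype=torch.float16):   # FP16 compute for eligible ops
        loss = nn.functional.mse_loss(model(x), target)
    scaler.scale(loss).backward()   # backward pass on the scaled loss
    scaler.step(optimizer)          # unscales gradients, then applies the update
    scaler.update()
```

Under this setup, autocast routes eligible matrix multiplies to FP16 tensor-core kernels while the parameters and optimizer state remain in FP32, which is the software-hardware co-design the CUDA moat argument rests on.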
Training and Inference Hardware Competition
Within Nvidia's 98% market share, its biggest customers are, in this order, Microsoft, Meta, Google, Amazon, and Oracle (source). Microsoft is expected to hold a total of 1.8 million H100 GPUs by the end of 2024, and Microsoft alone will consume the energy equivalent of Norway when all of those GPUs come online. Despite the massive capex outlay, Microsoft, Meta, Google, and Amazon all have inference and training AI hardware ambitions of their own. Google's Tensor Processing Units (TPUs) launched in 2015 to run inference workloads on ASIC hardware and are now in their 7th generation. Meta launched v1 of its Meta Training and Inference Accelerator (MTIA) in 2024 after years of R&D. These efforts are a clear signal that Microsoft, Meta, Google, and Amazon all see outsized risk in relying on Nvidia's GPU systems. Each is de-risking by integrating down to the metal and owning every layer along the way.
Startups like Cerebras, Groq, and Fireworks challenge Nvidia's hardware dominance by introducing novel architectures that accelerate inference by a factor of 10x or more. Speed matters and speed wins: a 0.1-second improvement can result in a 10% increase in sales. High-performing AI inference will see significant adoption and uplift when it operates at or near human reading or speaking rates.
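For rough intuition on what "human reading rate" implies for serving throughput, the back-of-the-envelope sketch below uses assumed figures (roughly 250 words per minute and about 1.3 tokens per English word, neither of which comes from the original post).

```python
# Back-of-the-envelope: translating human reading speed into inference throughput.
# Assumed figures (not from the post): ~250 words/min reading speed, ~1.3 tokens per word.
words_per_minute = 250
tokens_per_word = 1.3

tokens_per_second = words_per_minute * tokens_per_word / 60
print(f"Human reading rate ~= {tokens_per_second:.1f} tokens/sec")  # ~5.4 tokens/sec

# An accelerator claiming a 10x inference speedup would serve roughly 54 tokens/sec
# per stream, staying an order of magnitude ahead of a human reader.
```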
Even Intel, which missed the mobile infrastructure buildout, has gotten into the game with its Gaudi chips. There will be no shortage of competition coming after Nvidia's cheese. In Q4 2024, Nvidia reported a gross margin of 76.1% in its data center business. Jeff Bezos famously stated, "your margin is my opportunity." I don't see Nvidia sitting idle as its largest customers become competitors; there's a world where Nvidia moves further up the stack into the algorithm layer.
No Emergence of a Killer App Yet
Despite the AI hype, we have yet to see any killer apps for AI. The magic of generative AI has so far manifested only as features within existing apps; it has not yet produced a standalone app that accretes value. Case in point: Uber and Airbnb grew from startups into industry disruptors by harnessing GPS, cellular data, and mobile technology to build asset-light companies. Two and a half years into the AI renaissance, nothing substantive has emerged. One telling data point: even as companies continue to set piles of cash alight building out AI infrastructure, Nvidia reported that the split between training (60%) and inference (40%) workloads stayed the same over the six months between August 2023 and February 2024.
The only way to build a true business is for a consumer or an enterprise to pay you. Nvidia has captured the majority of the enterprise spend from hyperscalers building out infrastructure for training and inference workloads. We have not yet seen revenue growth stemming from AI-infused features at software companies such as ServiceNow, Salesforce, SAP, and Oracle. Enterprise customers are reticent to pay for additional features with limited returns. OpenAI is one of the few companies to surpass $1B in revenue (up to $3.4B) in 2024; since the company is private, its profitability is difficult to gauge.
Playing to Win
I see investment continuing to flow into AI infrastructure as the acute race to build open- and closed-source frontier models plays out. Infrastructure overbuild is a historical pattern that repeats, as evidenced by the extensive overbuild of terrestrial and undersea fiber networks, estimated at $500B. That buildout paved the way for today's internet infrastructure. The AI overbuild is no different; the cyclical pricing of semiconductors over the last 20 years supports this thesis.
"There's a meaningful chance that a lot of the companies are overbuilding now, and you look back, and you're like, 'Oh, maybe we all spent maybe some billions of dollars more than we had to.' On the flip side, we are all making a rational decision because the downside of being behind is that you're out of position for the most important technology for the next 10 to 15 years." Zuck shared this insight a day after the Llama 3.1 launch.
Despite the risks of overinvesting, many companies, including Meta, see a far greater downside in falling behind, given AI's transformative potential and power. They are making big bets and playing to win!
For the record, this post is not investment advice. Picking Nvidia is also a “no, duh!” selection.