The Real Challenge to Nvidia
The hyperscalers are building their own chips, the software moat is being crossed, and a war in the Middle East is making Nvidia's power-hungry GPUs more expensive to run by the day. The empire is intact. The foundations are shifting.

Three weeks ago, Jensen Huang walked onto the floor of the SAP Center in San Jose — home of the San Jose Sharks, temporarily redecorated as a throne room — and told a crowd of 30,000 that he now sees a trillion dollars in purchase orders for Nvidia's Blackwell and Vera Rubin chips through 2027. The number had doubled since last year. A robotic Olaf the snowman waddled onstage. There was a cartoon campfire singalong with robots. The leather jacket was immaculate. The show is getting bigger every year, and so, for now, is the company it represents.
But beneath the spectacle of GTC 2026 was something more interesting than another record revenue forecast. Huang spent considerable time this year talking about inference — the process by which trained AI models actually respond to queries, generate text, power agents, return results. He said the word 36 times in his keynote. That emphasis was not accidental. Training, the phase of AI development that made Nvidia what it is, is becoming a smaller part of the picture. Inference is where the volume lives. And inference is where Nvidia's position is genuinely, materially, and increasingly vulnerable.
Nvidia's dominance in AI training was built on two things: its GPU architecture, which happened to be well-suited to the matrix multiplication at the heart of neural network computation, and CUDA, the proprietary software layer it built over twenty years that locked researchers, developers, and eventually entire industries into its ecosystem. The H100 and now the Blackwell architecture became the de facto currency of the AI build-out. To train a large model in 2023 or 2024, you needed Nvidia. There was no serious alternative. The company achieved gross margins of 88 percent on its H100 chips — chips that cost roughly $3,300 to manufacture and sold for $28,000. It locked in 60 percent of TSMC's advanced CoWoS packaging capacity. It was, and remains, one of the most complete monopolies the technology industry has ever produced.
The easy narrative — AI equals GPUs equals upside — is fading. What matters now is where AI workloads actually land, how durable capital spending proves to be, and which vendors retain pricing power as inference, efficiency, and deployment take centre stage. Huang's answer at GTC was characteristic: don't defend the GPU in isolation; swallow the whole stack. Vera Rubin, Nvidia's new platform, can train large models with one-quarter the GPUs that Blackwell required and deliver ten times higher inference throughput per watt at one-tenth the cost per token. Nvidia also unveiled the Groq 3 LPU — the first chip from the startup whose intellectual property and key personnel Nvidia acquired for $20 billion in December 2025, its largest deal ever. Paired with Vera Rubin, the Groq 3 LPU rack can increase throughput for a one-trillion-parameter model by 35 times compared with the previous Blackwell generation. Huang is not merely selling chips. He is pitching a vertically integrated operating layer for the agentic AI economy — silicon, networking, software, factory design tools, robotics simulation, and now orbital data centers. The ambition is to make switching costs so high that the question of an alternative never seems worth asking.
The problem is that the most powerful organisations in the world have already decided to ask it anyway.
Google, Amazon, Meta, and Microsoft — the four hyperscalers whose collective capital expenditure on AI infrastructure is projected to exceed $700 billion in 2026 — have each concluded that dependence on a single external chip supplier is a structural business risk they cannot sustain. Their shift to custom silicon is no longer a hedge or an experiment. Its depth and maturity are the real story of 2026.
Google was first. Its Tensor Processing Units, designed in-house and built by Broadcom and TSMC, have been running Google's internal AI workloads for nearly a decade. The latest generation, codenamed Ironwood, runs on a 3-nanometre process and uses Optical Circuit Switching to dynamically reconfigure its Superpods, allowing for ten times faster collective operations than equivalent Ethernet-based clusters. Google now reports that over 75 percent of its Gemini model computations are handled by its internal TPU fleet. Sundar Pichai's company is not simply building an alternative to Nvidia — it has largely replaced Nvidia for its core inference workloads, and it is beginning to market that capacity externally. Anthropic, Midjourney, Salesforce, and Safe Superintelligence have all signed agreements to run workloads on Google TPUs. Anthropic announced a landmark expansion in October 2025: access to up to one million TPU chips, a deal worth tens of billions of dollars that will bring over a gigawatt of compute capacity online in 2026. The significance is not lost on anyone. Anthropic — the company whose Claude models run on Nvidia hardware in most enterprise deployments — is betting a substantial portion of its compute future on Google's silicon.
Amazon has followed a parallel path. AWS's Trainium and Inferentia chips, designed by its Annapurna Labs division, offer 30 to 40 percent better price performance than competing hardware, according to Ron Diamant, Trainium's head architect. Trainium3 entered volume production in early 2026, delivering 2.5 petaflops on a 3-nanometre process; its UltraServer configuration interconnects 144 chips into a single liquid-cooled rack that matches Nvidia's Blackwell architecture in rack-level performance with a significantly more efficient power profile. At the same time, Andy Jassy's company continues to fill its data centres with Nvidia GPUs for customer-facing workloads, because the flexibility, software support, and multi-cloud portability that Nvidia offers cannot yet be replicated for the full range of enterprise use cases. The strategy is not replacement but diversification — reducing dependency, gaining bargaining power, and offering cheaper alternatives in segments where the workloads are predictable enough for custom silicon to excel.
Meta is doing something similar but with different logic. Mark Zuckerberg's company has deployed its Meta Training and Inference Accelerator — MTIA — primarily to offload high-volume recommendation engine workloads from Nvidia H100s, allowing Meta to reserve its Nvidia GPUs for advanced AI research. Meta's 2026 roadmap includes its first dedicated in-house training chip, designed to support the development of Llama 4 and beyond within its massive Titan clusters — gigawatt-scale campus projects that are raising questions about the long-term sustainability of the AI infrastructure arms race. Reports that Meta is exploring a partial shift of training workloads to Google's TPUs have added another dimension. Whether that materialises or not, the signal is clear: a hyperscaler the size of Meta is now willing to entertain serious alternatives to Nvidia in its most compute-intensive operations.
Microsoft, under Satya Nadella, has had a more complicated path. Its in-house Maia chip programme has been dogged by delays and internal difficulties; the next-generation chip, codenamed Braga, slipped to 2026, leaving the company buying Nvidia Blackwell GPUs at high prices to meet OpenAI's computing demands. The company's CTO Kevin Scott has been publicly pragmatic about this: Nvidia offers the best price-performance, and Microsoft will use what works. But Maia 200 is now powering a significant portion of ChatGPT's inference workloads, and the partnership with Intel's 18A foundry for its next chip generation shows that Microsoft is not abandoning its custom silicon ambitions. It is merely behind schedule.
Beyond the hyperscalers, the competitive landscape is also fragmenting at the startup level, in ways that pose a different kind of threat. Cerebras Systems has built chips the size of entire wafers that handle inference workloads at speeds Nvidia cannot match for certain model sizes. SambaNova unveiled the SN50 chip in early 2026, claiming five times faster performance than competitive chips and three times lower total cost of ownership than GPUs for agentic AI workloads. The Groq LPU — from the company whose technology Nvidia just paid $20 billion to absorb — was generating 800 tokens per second on latency-sensitive inference tasks where Nvidia's GPUs manage a fraction of that speed. By acquiring Groq, Nvidia has neutralised one challenger while simultaneously signalling that it recognises inference-optimised architectures as the next battleground.
The structural question underneath all of this is software. Nvidia's CUDA ecosystem — twenty years of development, four million developers, every major machine learning framework optimised for CUDA first — has always been the real moat. The hardware competes; the software traps. But that moat is being crossed. OpenAI's Triton has emerged as the industry's primary off-ramp, allowing developers to write high-performance kernels in Python that are hardware-agnostic, with mature backends now available for Google's TPU, AWS Trainium, and AMD's MI350 series. The OpenXLA compiler, backed by Google, Amazon, and others, allows PyTorch models to be deployed across hardware architectures with minimal modification. The software advantage that once made Nvidia indispensable is not gone, but it is eroding at a pace the company would have found unthinkable five years ago.
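What "hardware-agnostic kernels in Python" means in practice is easy to show. The sketch below is essentially the canonical vector-addition example from Triton's own tutorials: illustrative, not production code. Nothing in it names a vendor; lowering it to a particular accelerator is the job of whichever compiler backend is installed.

```python
# Minimal Triton kernel: elementwise vector addition.
# The same Python source is compiled for the accelerator at hand;
# there is no CUDA C anywhere in the programmer's code.
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)                    # which block this instance owns
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements                    # guard the ragged final block
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = out.numel()
    grid = (triton.cdiv(n, 1024),)                 # one program instance per 1024 elements
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out
```

A kernel written this way is a compiler target, not a CUDA artefact, which is precisely what makes Triton an off-ramp.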
Lisa Su's Advanced Micro Devices occupies the most straightforwardly competitive position in the merchant chip market. AMD's Instinct accelerators — the MI300X and now MI350 series — offer genuine hardware advantages in memory bandwidth that matter specifically for inference workloads. The MI300X carries 192 gigabytes of memory and 5.3 terabytes per second of bandwidth, a configuration that benefits large language model inference where memory is the binding constraint. Su has been consistent and disciplined in targeting breadth over dominance — positioning AMD across PCs, industrial systems, and embedded use cases, while its software stack ROCm continues to close the gap with CUDA. AMD is unlikely to dethrone Nvidia in training. It does not need to. A ten-to-fifteen percent shift in inference market share would represent tens of billions of dollars in annual revenue, and AMD is the only merchant-chip alternative that enterprise customers without hyperscaler resources can actually deploy.
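Why memory is the binding constraint comes down to one line of arithmetic: in autoregressive decoding, generating each token requires streaming the model's weights through the chip once, so peak single-stream throughput is roughly memory bandwidth divided by model size in bytes. A back-of-envelope sketch, in which the model size and precision are assumptions and the bandwidth figure is the MI300X's:

```python
# Bandwidth-bound decode ceiling for a hypothetical 70B-parameter model
# served in 8-bit weights on a single MI300X. Illustrative arithmetic only;
# real serving is batched and also moves KV-cache traffic.
params = 70e9                    # model parameters (assumed)
bytes_per_param = 1              # 8-bit quantised weights (assumed)
bandwidth = 5.3e12               # MI300X: 5.3 TB/s, from the spec above

weight_bytes = params * bytes_per_param        # 70 GB -- fits in 192 GB of HBM
tokens_per_sec = bandwidth / weight_bytes      # each token streams all weights once
print(f"~{tokens_per_sec:.0f} tokens/sec upper bound")   # ~76 tokens/sec
```

More compute does not raise that ceiling; more bandwidth does, which is why the MI300X's memory configuration matters for inference in a way raw flops do not.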
Then there is energy. And this is where the Middle East enters the picture, not as background noise but as a structural constraint that reshapes the competitive calculus in ways the financial markets have not yet fully absorbed.
Energy can account for as much as 60 percent of a data centre's operating costs. The sector was already navigating rising electricity prices — US electricity prices jumped 6.9 percent in 2025, more than double the headline inflation rate, according to Goldman Sachs. A single Nvidia B200 Blackwell chip draws 1,200 watts — nearly double its predecessor's draw. Hyperscaler AI capital expenditure is projected at $700 billion in 2026, with over a trillion dollars in total private AI infrastructure capital planned. The energy requirement embedded in that spend is immense, and it is vulnerable.
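The per-chip numbers compound quickly. A rough annualised electricity bill for one B200-class accelerator; the utilisation, overhead multiplier, and power price below are illustrative assumptions, not reported figures:

```python
# Rough annual electricity cost of a single 1,200 W accelerator.
chip_watts = 1200        # B200 draw, from the text
pue = 1.3                # assumed power-usage-effectiveness overhead
utilisation = 0.8        # assumed average load
usd_per_kwh = 0.10       # assumed industrial power rate

kwh_per_year = chip_watts * pue * utilisation * 24 * 365 / 1000
print(f"{kwh_per_year:,.0f} kWh/yr -> ${kwh_per_year * usd_per_kwh:,.0f}/chip/yr")
# ~10,932 kWh/yr -> ~$1,093 per chip per year. A 10 percent energy-price
# shock adds ~$109 per chip -- multiplied across hundreds of thousands of chips.
```

Against those economics, every percentage point of efficiency a custom accelerator can claim translates directly into operating margin.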
The Strait of Hormuz carries roughly 20 percent of the world's daily oil consumption and a significant share of its liquefied natural gas. Since the US-Israel operation against Iran began on 28 February 2026, that chokepoint has been effectively closed. Wood Mackenzie analysts warned that a disruption to LNG flows through the Strait would be comparable in scale to the curtailment of Russian gas to Europe in 2022, when prices at European hubs briefly touched the equivalent of nearly $600 a barrel of oil. Data centres across North America and Europe have been pivoting toward gas-fired generation as a primary power source. The war has placed that bet under direct pressure.
A prolonged conflict in the Middle East could also disrupt supplies of helium and bromine, key materials in semiconductor manufacturing. Qatar produces over a third of the world's helium supply. QatarEnergy's Ras Laffan Industrial City was hit by an Iranian drone attack, taking the site offline. Phil Kornbluth, president of Kornbluth Helium Consulting, has said it is hard to imagine the industry avoiding a shutdown of helium production lasting at least two to three months, followed by another four to six months before the supply chain returns to normal. Helium is used in lithography and heat transfer in chip fabrication. There is no viable substitute. TSMC, which manufactures the overwhelming majority of advanced AI chips, operates almost entirely from Taiwan, an island with significant energy insecurity of its own. Add a global helium shortage to already-constrained TSMC capacity, and the supply chain for Nvidia's products tightens from two directions simultaneously.
Morningstar equity analyst Jing Jie Yu told CNBC that heavy dependence on crude oil points to significantly higher costs for AI data centres, which draw roughly three to five times more power than conventional ones, and that the resulting rise in total cost of ownership poses a direct threat to AI infrastructure adoption. The logic is uncomfortable but straightforward: soaring oil prices could drive AI companies to throttle their purchases of Nvidia's pricey GPUs, pushing more data centre operators toward AMD's cheaper GPUs or toward developing their own custom accelerators. The war is not just raising energy costs. It is accelerating the economic incentive to find alternatives.
Iranian-affiliated forces have named regional offices, cloud infrastructure, and data centres linked to Google, Amazon, Microsoft, Nvidia, IBM, Oracle, and Palantir as targets. Iranian drone strikes have already hit three AWS data centres in the UAE and Bahrain — the first military strikes on US hyperscaler infrastructure in history — causing fires, power outages, and knock-on disruptions to banking and payments services across the region. Nvidia temporarily closed its Dubai offices after nearby strikes. The war has also placed the Gulf's ambitions to become a sovereign AI superpower in direct question. The UAE, Saudi Arabia, and Qatar had collectively attracted billions in hyperscaler investment on the basis of abundant cheap energy, political stability, and data sovereignty requirements. The Gulf Cooperation Council expects the region's data centre market to grow to nearly $9.5 billion by 2030, with PwC predicting capacity to more than triple, from one gigawatt in 2025 to 3.3 gigawatts over the next five years. Those projections were made before the missiles started landing.
The picture that emerges from all of this is not a story of Nvidia's collapse. The trillion-dollar order pipeline is real. The CUDA ecosystem will not dissolve in an earnings cycle. The Vera Rubin platform is genuinely advanced, and Huang's move to absorb Groq — neutralising the most credible inference-only challenger while simultaneously claiming inference efficiency leadership — is the kind of strategic manoeuvre that has kept Nvidia ahead for twenty years. The leather jacket is not just theatre. The man inside it is a very good competitor.
But the structural forces now aligned against Nvidia's current form of dominance are not cyclical. Hyperscalers are spending tens of billions of dollars per year to reduce their dependence on a single supplier, and they are years into programmes that are now producing chips competitive enough to run a significant fraction of their own workloads. The software moat that made Nvidia's position seem permanent is being crossed by open-source tooling that the entire industry is motivated to accelerate. The energy economics that underpinned the GPU's brute-force approach are tightening, structurally and geopolitically, in ways that favour efficiency-optimised custom silicon over general-purpose raw compute. And a war in the Middle East is raising the cost and risk of the entire infrastructure build-out in ways that compress margins and accelerate the search for cheaper alternatives.
Industry projections for AI server shipments put custom ASIC growth from cloud providers at 44.6 percent in 2026, against 16.1 percent for GPUs. Nvidia will grow. The market will grow faster without it. That is the definition of losing ground.
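A toy calculation makes "losing ground" concrete. The 80/20 baseline split below is a hypothetical assumption; the growth rates are the projections above:

```python
# How differential growth erodes share, starting from an assumed 80/20 split.
gpu, asic = 0.80, 0.20                   # hypothetical baseline shipment shares
gpu_next, asic_next = gpu * 1.161, asic * 1.446
total = gpu_next + asic_next
print(f"GPU share: {gpu:.0%} -> {gpu_next / total:.0%}")
# GPU share: 80% -> 76% -- absolute growth, relative decline.
```

Run that forward a few years at the same rates and the share keeps sliding, even as unit volumes rise.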
The era when Nvidia was the only serious answer to the question of how to run AI is over. What comes next is a more fragmented, more competitive, more energy-constrained landscape in which Nvidia remains formidable but no longer irreplaceable. The question is not whether its dominance will erode. It is whether the company's pivot to become the operating layer of the entire AI economy — factories, agents, robots, space — can move fast enough to stay ahead of the floor collapsing beneath the GPU business that funded it all.
Jensen Huang built an empire. Now he is building the infrastructure empire needs to run on. Whether those are the same thing remains, for the moment, genuinely open.