Why OpenAI has to go full-stack and why Nvidia has to bet big on OpenAI
Detangling the "circular financing" partnerships between OpenAI-Nvidia-Oracle, and two words on why this might be a win for developers and enterprises: subsidized compute.
Three companies just agreed to the most bizarre financial arrangement in tech history: Nvidia invests $100 billion in OpenAI (more than OpenAI has raised in its entire existence), OpenAI gives it back by leasing the latest and greatest Nvidia chips, and Oracle houses them all for $300 billion.
On paper, it’s a play for infrastructure. In reality, it makes sense only if you understand what they are really buying: a hedge against a future where Gemini on TPUs makes Nvidia obsolete.
The stakes are binary.
If OpenAI reaches AGI first using Nvidia’s CUDA architecture, they define how all future intelligence runs. If Google gets there on TPUs (its in-house chip, which now powers Gemini at an estimated 70% lower cost per token), Nvidia’s tight grip on the AI ecosystem loosens overnight.
This is why the three companies are willing to structure the largest infrastructure deal in history as circular vendor financing: Nvidia’s money goes to OpenAI, which goes back to Nvidia chips, which Oracle houses in data centers that OpenAI rents.
It’s such a capital-intensive endeavor that they had to raise from Wall Street, and it worked. On news of the multi-billion-dollar partnerships, Oracle’s stock surged an astonishing 43% in a day. The company then raised an oversubscribed $18 billion in corporate bonds (cheap financing) that can be used to build out these AI data centers.
It only makes sense if the winner takes everything. Let’s break it down.
Why this alliance, why now:
1. Model layer economics are brutal. Vertical integration is survival strategy.
OpenAI raced to $12 billion in revenue this year but is running roughly $8 billion in the red (spending about $20 billion a year). They estimate burning $115 billion over the next five years in their pursuit of frontier models. That means every dollar of revenue they earn costs roughly $1.67 to serve, most of it compute. In traditional SaaS, $1 of revenue might cost $0.20 to serve.
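A back-of-the-envelope sketch of those unit economics (the figures are the rough public estimates cited above, not audited numbers):

```python
# Back-of-the-envelope unit economics from the figures above.
# All numbers are rough public estimates, in billions of USD per year.
revenue = 12.0   # OpenAI's reported ~$12B annual revenue
spend = 20.0     # estimated ~$20B annual spend

net_burn = spend - revenue
cost_per_revenue_dollar = spend / revenue
saas_benchmark = 0.20  # typical SaaS cost-to-serve per $1 of revenue

print(f"Net burn: ${net_burn:.0f}B/year")                      # $8B/year
print(f"Cost per $1 of revenue: ${cost_per_revenue_dollar:.2f}")  # $1.67
print(f"Traditional SaaS benchmark: ${saas_benchmark:.2f}")       # $0.20
```

The gap between $1.67 and $0.20 is the structural problem the rest of this section is about: layered infrastructure margins that only vertical integration (or AGI-scale pricing power) can compress.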
Why? Right now, OpenAI is paying Microsoft’s data center margin on top of Nvidia’s expensive GPU margin for every query served.
It must sting: OpenAI is doing all the expensive R&D and taking the risks with market validation while everyone else captures the value. They train the models, debug the safety issues, handle the regulatory scrutiny. And then watch as cloud providers take infrastructure margins, and large platforms capture distribution value. To rub salt in the wound, the hyperscalers (Google, Meta, Microsoft, Amazon) are all running massive data centers, and likely prioritizing their own workloads.
The model layer is structurally unprofitable without either AGI (winner takes all) or going vertical to own compute (increase margins by lowering costs). This is why OpenAI must go full-stack or bust.
2. Google/TPUs threaten to make CUDA irrelevant
The urgency behind the N-O-O partnership becomes clear when you understand what Google has already built: a fully integrated AI stack from custom silicon (TPUs) to global distribution (Android, Search, Workspace). Gemini trains on TPUs, runs on TPUs, and soon could be integrated across billions of devices where the marginal cost of inference approaches zero.
This isn’t hypothetical. Gemini 2.5 already demonstrates that TPU-native models can match GPU-trained models. Google’s internal cost per token is estimated at 70-80% less than OpenAI’s costs on rented Azure infrastructure.
All the hyperscalers are working on their own models (or routing between several model providers) and designing their own chips. And while those aren’t yet as top-of-the-line as Nvidia’s GPUs or OpenAI’s models, the intent is to reduce reliance on both over time.
3. Nvidia’s Defensive Play
Nvidia is in a rare position to play kingmaker right now.
Even as it becomes the world’s first $4 trillion company, Nvidia’s position hinges on a few heavyweights: 40-50% of its $46 billion in revenue comes from 2-3 unnamed customers. China, representing 15-20% of revenue, is now closed off due to geopolitical tensions. And the major hyperscalers are all developing custom chips.
Nvidia needs OpenAI to push the frontier so aggressively that nobody’s custom silicon can keep up. They need training runs that require millions of GPUs, inference workloads that consume gigawatts. They need AGI to require so much raw compute that architectural efficiency doesn’t matter.
The investment “locks in” OpenAI to the Nvidia ecosystem they’ve been using since day one. But more importantly, it ensures that if AGI happens, it happens on CUDA. Because whoever achieves AGI first doesn’t just win the model race. They define the infrastructure standard for all intelligence that follows.
When one model wins, enterprises, countries, and developers will rush to build on that infrastructure. And those building the compute stack will be able to charge top dollar for that demand.
On Oracle’s play to be neutral
Oracle’s role is the shrewdest position of all. They’re not betting on any specific chip architecture or model winner. They’re positioning to become the “neutral infrastructure operator”. So while hyperscalers like AWS or Azure are conflicted (building their own models while serving competitors), Oracle can be the pure-play infrastructure provider.
It is telling that Microsoft, previously OpenAI’s largest investor with right of first refusal, has chosen to sit out the Oracle-OpenAI data center partnership, even as it announced one of the largest data center build-outs in history.
Oracle CEO Larry Ellison understands something critical: they don’t need to win the AGI race. They just need to be indispensable to whoever does. And he’s proven that Oracle can amass huge amounts of capital for these ambitious buildouts, at magnitudes beyond AI-native data center operators like CoreWeave.
The Binary Outcome
If AGI arrives:
- OpenAI’s infrastructure investment pays off infinitely
- Nvidia’s compute demand goes parabolic. Think 100x or 1000x current requirements
- Oracle becomes the utility provider for intelligence itself
- The entire partnership validates as the move that defined the next era
If AGI doesn’t arrive:
- OpenAI might collapse under unsustainable burn unless something like a technical breakthrough changes their economics
- The model layer commoditizes, and open-source alternatives become more attractive as the cheaper option
- Nvidia watches as custom chips slowly erode its margins (maybe)
- The data centers would still serve inference demand as AI gets embedded into workflows and daily life
How about other players like Anthropic?
Why isn’t Anthropic doing this? With $3-5 billion in revenue (roughly a quarter of OpenAI’s), they’re staying deliberately capital-light. They’ve secured compute partnerships with both Google ($2B for TPU access) and AWS ($4B deal), avoiding the massive capital commitments of building their own data centers.
Either they’ve found a different solution to the margin problem… or they’re betting the compute intensity curve breaks before vertical integration pays off.
A bet worth making?
The thing about AGI is that the payoff is so asymmetric that even a small probability might justify the bet. If artificial general intelligence emerges, whoever controls it reshapes civilization. Next to that prize, $300 billion looks like a reasonable ante.
We’re watching three companies push all their chips to the center of the table, betting everything on a single hand. Either AGI arrives and retroactively justifies everything, or we’re about to witness the most spectacular cascade of write-offs in corporate history.
What this means for builders
While titans battle over AGI infrastructure, developers get: subsidized compute. My prediction is that this race means GPU prices will stay artificially low as the hyperscalers and infrastructure players compete for developer and enterprise mindshare. Models will get cheaper and more powerful as companies burn capital to prove their stack’s superiority.
The smart play is to stay agnostic. Route between models based on performance. Build abstraction layers that can swap between GPT, Claude, Gemini, and open source alternatives. Use the subsidized compute while it lasts, but don’t architect assuming any single provider survives.
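One minimal sketch of such an abstraction layer, assuming nothing about any vendor’s SDK (the provider names and stub functions here are purely illustrative):

```python
from typing import Callable

# A "provider" here is just a function from prompt to completion text.
# In a real app each one would wrap an actual vendor SDK call; these are stubs.
Provider = Callable[[str], str]

class ModelRouter:
    """Route prompts to interchangeable model providers, with fallback."""

    def __init__(self) -> None:
        self._providers: dict[str, Provider] = {}
        self._priority: list[str] = []

    def register(self, name: str, provider: Provider) -> None:
        """Add a provider; registration order sets fallback priority."""
        self._providers[name] = provider
        self._priority.append(name)

    def complete(self, prompt: str) -> str:
        """Try each provider in priority order until one succeeds."""
        errors: list[str] = []
        for name in self._priority:
            try:
                return self._providers[name](prompt)
            except Exception as exc:
                errors.append(f"{name}: {exc}")
        raise RuntimeError("all providers failed: " + "; ".join(errors))

def flaky_provider(prompt: str) -> str:
    raise TimeoutError("provider down")  # simulate a vendor outage

router = ModelRouter()
router.register("primary", flaky_provider)
router.register("fallback", lambda prompt: f"echo: {prompt}")
print(router.complete("hello"))  # primary fails, fallback answers: "echo: hello"
```

The design point is that the application only ever talks to `complete()`, so swapping GPT for Claude, Gemini, or an open-source model is a one-line registration change rather than a rewrite.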
The winners of the last platform war weren’t those who picked sides early, but those who stayed flexible enough to ride whatever wave ultimately won.