When Jensen Huang, the founder and CEO of NVIDIA, takes the stage, the industry listens. Not merely because NVIDIA’s market capitalization briefly exceeded $5 trillion in late 2025 — a figure that would have seemed hallucinatory just three years ago — but because Huang has an unusual gift: the ability to articulate, with the precision of an engineer and the cadence of a prophet, exactly where the technology is going next. In March 2026, he published a concise but dense post on the NVIDIA blog titled simply “AI Is a 5-Layer Cake”, crystallizing a framework he had been road-testing at Davos, CES, and in private briefings over the preceding months. The piece is short. But the implications are vast.
This article expands on Huang’s framework, stress-tests it against current data, and explains why — whether you are a policymaker in Brussels, a sovereign wealth fund manager in Abu Dhabi, a startup founder in Lagos, or an electrical engineer in Ohio — understanding this stack is arguably the most important thing you can do to orient yourself in the next decade of economic and technological history.
The Break From the Past: From Retrieval to Real-Time Intelligence
Before dissecting the five layers, it is worth understanding the fundamental shift in computing that makes them necessary in the first place.
For the better part of five decades, computing operated on a model of retrieval. A human being wrote an algorithm. That algorithm was stored. A computer executed the stored algorithm when called. Data had to be carefully structured — organized into tables, normalized, indexed — so that queries could find it. The database, the SQL query, the spreadsheet: these were the dominant artifacts of this paradigm, because they made a fundamentally brittle system workable.
As Huang writes in his post, “AI breaks that model.” For the first time in computing history, a machine can take unstructured information — a photograph, a sentence in natural language, a protein sequence, the waveform of a human voice — and derive meaning from it. More importantly, it generates a novel output in real time. Each response is not retrieved from a database; it is synthesized from patterns learned during training, conditioned on the context of the moment. The machine is not a filing clerk retrieving a document. It is something closer to a reasoning engine generating an answer from first principles.
This transition has an enormous physical consequence, which is the central insight of Huang’s framework: if intelligence is produced in real time, rather than retrieved, then the entire computing stack beneath it must be rebuilt from scratch. You cannot run real-time generative intelligence on the infrastructure optimized for query-and-retrieval. The economics do not work. The physics do not work. The architecture does not work.
This is what makes the five-layer cake not just a descriptive metaphor but a predictive and prescriptive one. Each layer represents a bottleneck that must be solved, and each bottleneck creates an enormous economic opportunity.
Layer One: Energy — The Binding Constraint
Huang is unambiguous: “Energy is the first principle of AI infrastructure and the binding constraint on how much intelligence the system can produce.” This is not rhetorical flourish. It is a statement about hard physics.
Every token generated by a large language model — every word in this article, if it were written by an AI — represents an actual movement of electrons through silicon, the dissipation of heat, the conversion of electrical power into computation. There is no abstraction layer beneath this reality. The intelligence is, at its most fundamental level, a function of how much energy can be delivered to the chips, managed through cooling systems, and converted into useful computation without melting the hardware.
The numbers here are staggering and accelerating. According to the International Energy Agency’s Energy and AI report, global data center electricity consumption was approximately 415 terawatt-hours (TWh) in 2024, representing around 1.5% of global electricity consumption. In its base case scenario, the IEA projects this figure will roughly double to approximately 945 TWh by 2030 — just under 3% of total global electricity consumption. Other forecasts are more aggressive. Analysis published in early 2026 projected that global data center electricity consumption could exceed 1,000 TWh by the end of 2026 alone, an amount roughly equivalent to Japan’s entire annual electricity usage.
Gartner puts total worldwide data center electricity consumption at 448 TWh in 2025, growing to 980 TWh by 2030, with AI-optimized servers alone projected to rise nearly fivefold — from 93 TWh in 2025 to 432 TWh in 2030. By 2030, Gartner estimates AI-optimized servers will represent 44% of total data center power usage, up from 21% in 2025.
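For readers who want to sanity-check those trajectories, the short sketch below converts the quoted endpoints into the annual growth rates they imply. It uses only the figures cited above; the calculation itself is the standard compound-growth formula.

```python
# Back-of-envelope check on the growth rates implied by the cited forecasts.
# All figures are the ones quoted above (IEA base case and Gartner); the
# formula is the usual CAGR: (end / start) ** (1 / years) - 1.

def cagr(start_twh: float, end_twh: float, years: int) -> float:
    """Compound annual growth rate implied by two endpoint values."""
    return (end_twh / start_twh) ** (1 / years) - 1

forecasts = {
    "IEA base case, total data centers (2024-2030)": (415, 945, 6),
    "Gartner, total data centers (2025-2030)": (448, 980, 5),
    "Gartner, AI-optimized servers only (2025-2030)": (93, 432, 5),
}

for label, (start, end, years) in forecasts.items():
    print(f"{label}: {cagr(start, end, years):.1%} per year")

# Roughly 14-17% annual growth for data centers overall, and about 36% per
# year for AI-optimized servers -- the AI share compounds far faster than
# the rest of the fleet.
```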
The geographic concentration of this demand is already creating visible infrastructure crises. In the United States, data centers consumed 183 terawatt-hours of electricity in 2024 — more than 4% of total U.S. electricity consumption, roughly equivalent to the annual electricity demand of all of Pakistan. In Virginia’s data center corridor, facilities now consume roughly one in every four kilowatt-hours produced by the state’s largest utility. A Belfer Center analysis from February 2026 noted that utilities in the PJM electricity market — covering a swath from Illinois to North Carolina — attributed an estimated $9.3 billion increase in capacity market costs to data centers, translating to higher household electricity bills across multiple states.
The physical density problem is intensifying. Between 2021 and 2024, average data center rack power densities rose from 8 kilowatts to 17 kilowatts. By early 2026, AI-driven racks frequently exceed 50 kilowatts per rack — nearly triple what they were just three years ago — forcing operators to abandon traditional air cooling and adopt liquid-based cooling systems at significant capital expense. A single large-scale AI training facility now requires between 100 megawatts and 1,000 megawatts of dedicated power.
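A rough illustration of why density matters, using the rack figures quoted above and an assumed 100-megawatt IT budget (a round number within the cited 100 to 1,000 megawatt range):

```python
# Illustrative arithmetic only: how rack power density translates into
# facility-scale numbers. The densities (8, 17, 50 kW per rack) are the
# figures quoted above; the 100 MW IT budget is an assumed round number.

IT_POWER_BUDGET_MW = 100  # assumed IT load for a single large training site
DENSITIES_KW = {"2021 average": 8, "2024 average": 17, "2026 AI rack": 50}

for era, kw_per_rack in DENSITIES_KW.items():
    racks = IT_POWER_BUDGET_MW * 1_000 / kw_per_rack
    print(f"{era}: {kw_per_rack} kW/rack -> ~{racks:,.0f} racks for {IT_POWER_BUDGET_MW} MW")

# The same 100 MW budget shrinks from ~12,500 racks at 8 kW to ~2,000 racks
# at 50 kW. The heat that once spread across a large hall is concentrated
# into far fewer racks, which is why air cooling gives way to liquid cooling.
```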
This creates a multiplier effect throughout the energy economy. Nuclear power — long politically dormant — is suddenly being dusted off, with major technology companies signing power purchase agreements directly with nuclear operators. Microsoft famously signed a 20-year power purchase agreement with Constellation Energy to restart a reactor at Three Mile Island in Pennsylvania, specifically to power its AI data centers. Geothermal, offshore wind, long-duration battery storage, and even small modular reactors (SMRs) are now mainstream topics in data center planning conversations, not engineering curiosities.
The implications for geopolitics are significant. Countries with abundant, cheap, and clean energy — whether Norwegian hydropower, Icelandic geothermal, Saudi solar, or Canadian nuclear — suddenly possess an entirely new form of strategic advantage. The map of AI power may well follow the map of energy abundance.
Layer Two: Chips — The Translation Layer
If energy is the raw material, chips are the machinery that transforms it into computation. And no company better embodies this layer than NVIDIA itself, which makes Huang’s framework simultaneously a strategic analysis and, critics might argue, a very polished advertisement.
But the underlying substance is real. AI workloads differ fundamentally from traditional computing in their demands on hardware. They require massive parallelism — the ability to perform millions of mathematical operations simultaneously, rather than serially. They require high-bandwidth memory to feed those operations quickly enough to keep the processing units from starving. They require fast interconnects to synchronize across thousands of chips operating in concert. General-purpose CPUs, designed for sequential, branching workloads, were never built for this. The GPU — originally designed to render the parallel geometry of video game graphics — turned out to be far better suited to the matrix mathematics of neural network training, which is why NVIDIA, a company that spent its first decade making graphics cards for gamers, became the most valuable company on earth.
NVIDIA’s dominance in this layer is difficult to overstate. Its market share in discrete GPUs stood at approximately 92% in early 2025, and its data center GPU market share remains above 80% by most analyst estimates. The company’s data center revenue reached $51.2 billion in a single quarter in late 2025 — a 66% year-over-year increase. Analysts at the time projected fiscal 2026 annual data center revenue in the range of $175–200 billion. The CUDA software platform, with more than five million developers building on top of it, creates switching costs so high that one semiconductor analyst described the hardware as “almost secondary” to the software moat.
The current workhorse is the Blackwell architecture, which began shipping in volume in late 2024 and ramped aggressively through 2025. At CES 2026, NVIDIA unveiled its next generation: the Rubin platform. According to the official NVIDIA press release, the Rubin platform is built around six specialized chips working as a unified system, delivering up to a 10x reduction in inference token cost and a 4x reduction in the number of GPUs required to train certain model types, compared with Blackwell. The platform also achieves 40% higher energy efficiency per watt — a critical improvement given the energy constraints at layer one. Rubin-based products are expected from partners in the second half of 2026, with AWS, Google Cloud, Microsoft, and Oracle among the first cloud providers to deploy Rubin-based instances.
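Taken at face value, those headline claims compound in a way a quick sketch makes concrete. The baseline numbers below are invented placeholders, not NVIDIA figures; only the ratios (10x lower token cost, 4x fewer training GPUs, 40% better energy efficiency) come from the press release cited above.

```python
# Hedged illustration of how the quoted Rubin-vs-Blackwell ratios would play
# out against an arbitrary baseline. Baseline values are hypothetical.

BASELINE_COST_PER_M_TOKENS = 1.00  # hypothetical Blackwell-era cost, $/1M tokens
BASELINE_TRAINING_GPUS = 10_000    # hypothetical cluster size for a given model
BASELINE_ENERGY_PER_TOKEN = 1.00   # hypothetical energy per token, arbitrary units

rubin_cost = BASELINE_COST_PER_M_TOKENS / 10    # "up to 10x" lower inference token cost
rubin_gpus = BASELINE_TRAINING_GPUS / 4          # "4x" fewer GPUs for certain model types
rubin_energy = BASELINE_ENERGY_PER_TOKEN / 1.4   # 40% more useful work per watt

print(f"Inference cost:   ${BASELINE_COST_PER_M_TOKENS:.2f} -> ${rubin_cost:.2f} per 1M tokens")
print(f"Training GPUs:    {BASELINE_TRAINING_GPUS:,} -> {rubin_gpus:,.0f}")
print(f"Energy per token: {BASELINE_ENERGY_PER_TOKEN:.2f} -> {rubin_energy:.2f} (arbitrary units)")
```

The point of the exercise is not the specific numbers but the coupling: a cost improvement at layer two changes the economics of every layer above it, and the energy improvement relaxes the constraint at layer one.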
Competition is intensifying but not yet threatening NVIDIA’s structural position. AMD’s MI series accelerators are gaining traction, and the company’s MI350 was described as its “fastest ramping product in company history.” Google’s TPUs, Amazon’s Trainium and Inferentia chips, and Microsoft’s custom Maia silicon are all examples of hyperscaler custom silicon attempting to reduce dependence on NVIDIA. But as of early 2026, custom silicon is capturing perhaps 15–25% of internal hyperscaler workloads — supplementing rather than replacing NVIDIA hardware for third-party enterprise customers who value the flexibility and software compatibility of the CUDA ecosystem.
The deeper story at this layer is what Huang frames as the end of “general-purpose” computing. The era in which a single chip architecture optimized for one workload (sequential logic, in the CPU’s case) could be pressed into service for all tasks is over. AI has created demand for specialized silicon at every point in the stack: training accelerators, inference chips, networking ASICs, memory controllers, and cooling management processors. The chip layer is fragmenting into a rich ecosystem of specialized hardware, all oriented around the same fundamental problem: converting energy into intelligence as efficiently as possible.
Layer Three: Infrastructure — Building the AI Factory
The third layer is where abstraction meets concrete. Huang uses the term “AI factories” deliberately, and it repays careful attention. A traditional data center was designed to store and retrieve information. An AI factory is designed to manufacture intelligence. The distinction is not merely semantic; it implies entirely different design constraints, supply chains, workflows, and operational models.
The scale of the current buildout is difficult to comprehend without analogy. At Davos in January 2026, Huang noted that TSMC had announced 20 new chip fabrication plants, that Foxconn, Wistron, and Quanta were building 30 new computer assembly plants, and that Micron had committed to investing $200 billion in the United States alone. These are not incremental expansions; they are the simultaneous construction of a new industrial base.
The World Resources Institute has noted that projections for U.S. data center electricity demand through 2030 range from 200 to over 1,050 TWh per year depending on assumptions — a spread so wide as to constitute genuine uncertainty about the future shape of the American grid. The Lawrence Berkeley National Laboratory projects U.S. data center demand growing from 176 TWh in 2023 to between 325 and 580 TWh by 2028.
Crucially, Huang argues that infrastructure development is where geopolitical divergence is most visible. In conversations about competition between the United States and China, he has consistently highlighted a striking asymmetry in construction velocity. In the United States, the journey from breaking ground to a fully operational AI supercomputer can take around three years — a function of permitting requirements, grid interconnection queues, and supply chain complexity. In China, by contrast, comparable facilities are being constructed in a fraction of that time.
This is not merely a point about engineering efficiency. It reflects fundamentally different approaches to industrial policy, regulatory streamlining, and the prioritization of national infrastructure investment. For the United States, the implications are uncomfortable: the country that dominates chip design and AI model development may find itself constrained by the pace at which it can build the physical factories to run them.
The workforce implications are also significant, and Huang returns to them repeatedly. Building and operating AI factories requires electricians, pipefitters, steelworkers, network technicians, cooling engineers, and construction workers — none of whom require a PhD in computer science. Huang has framed this as an economic opportunity, not merely a constraint: the AI industrial buildout is creating demand for well-paid, skilled, physical labor at a moment when many commentators are focused only on the white-collar disruption AI might cause.
Layer Four: Models — The Intelligence Engine
Above infrastructure sit the models — the trained neural networks that constitute AI’s reasoning capability. This is the layer that most people associate with “AI,” and it is the layer that has undergone the most dramatic public evolution over the past three years. But Huang’s framework insists on placing it in context: models are not the end product; they are one layer in a stack, sitting atop enormous physical infrastructure and below the applications where economic value is ultimately captured.
The model layer is itself fragmenting. Language models are the most visible category, but Huang explicitly names “protein AI, chemical AI, physical simulation, robotics and autonomous systems” as areas where some of the most transformative work is happening. AlphaFold, DeepMind’s protein structure prediction system, is perhaps the most dramatic example: a model that has effectively solved a problem — predicting how proteins fold — that biologists had worked on for fifty years. AlphaFold’s predictions have now been used to accelerate drug discovery pipelines at pharmaceutical companies worldwide, collapsing years of experimental work into hours of computation.
The geopolitical dynamics of this layer received enormous attention in January 2025 when DeepSeek released its R1 model. DeepSeek-R1, a Chinese open-source reasoning model with approximately 670 billion parameters, achieved performance competitive with OpenAI’s then-best models while reportedly being trained at a fraction of the cost — an initial figure of around $5.6 million for the final training run was widely cited, though independent analysis placed the full development cost considerably higher. The release caused a single-day drop of nearly 17% in NVIDIA’s stock price, wiping out roughly $589 billion in market capitalization — at the time, the largest single-day market cap loss in history. Investors briefly panicked that cheaper models meant less demand for expensive chips.
The panic was short-lived, and arguably based on a misreading of Huang’s framework itself. As the World Economic Forum noted, by the end of January 2025 DeepSeek’s app had overtaken ChatGPT as the most downloaded free app on Apple’s App Store in the United States. Rather than reducing demand for infrastructure, the availability of a powerful open-source model accelerated adoption at the application layer — and therefore increased aggregate demand for training, infrastructure, chips, and energy throughout the stack. Huang himself called DeepSeek “a powerful example” of how open-source models activate demand across the entire stack.
Marc Andreessen hailed DeepSeek-R1 as “one of the most amazing and impressive breakthroughs I’ve ever seen — and as open source, a profound gift to the world.” The AI scholar Kai-Fu Lee noted that “the biggest revelation from DeepSeek is that open-source has won,” with implications for OpenAI’s pricing power and the entire proprietary model business. CNBC reported that Chinese companies from Baidu to smaller startups were moving toward open-source licensing in response, with Counterpoint Research analyst Wei Sun describing DeepSeek’s success as proof that “open-source strategies can lead to faster innovation and broad adoption.”
The model layer is also where the concept of “physical AI” — arguably the most important frontier of the next decade — begins to emerge. Physical AI refers to models that understand the laws of physics: fluid dynamics, material science, protein chemistry, particle interactions. Unlike language models, which manipulate tokens representing meaning, physical AI models manipulate representations of reality. They can simulate how a wing will behave under stress at different temperatures, predict how a new drug molecule will bind to a receptor, or generate the control signals for a robotic arm to perform a delicate assembly task. The implications for manufacturing, medicine, materials science, and infrastructure design are genuinely transformative — and are still largely unrealized.
Layer Five: Applications — Where Value Is Captured
At the apex of the stack are applications — the software products and services where economic value is ultimately created and where AI’s societal impact is most visibly felt. Drug discovery platforms. Industrial robotics. Legal research tools. Autonomous vehicles. Medical imaging analysis. Financial risk modeling. Customer service automation. Educational tutoring systems. These are not theoretical possibilities; they are products in active deployment, generating revenue, reshaping workflows, and beginning to demonstrate measurable economic returns.
Huang’s framework insists on understanding applications as the culmination of everything beneath them. A self-driving car is not primarily a software product; it is an AI application that pulls on every layer of the stack, from the power grid that charged the training clusters to the chips in the onboard computer to the models trained on billions of miles of driving data. A humanoid robot is “an AI application embodied in a body.” The same stack, different form factors, different outcomes.
The venture capital markets have noticed. Huang noted at Davos that 2025 was among the largest years for VC funding on record, with the majority of capital flowing into “AI-native companies” — firms built from the ground up on AI capabilities rather than retrofitting AI onto legacy architectures. These companies span healthcare, robotics, manufacturing, financial services, and legal technology. They represent the application layer of Huang’s cake being constructed in real time, by thousands of entrepreneurs simultaneously.
Perhaps the most instructive example is healthcare. AI in radiology — the analysis of medical imaging — was one of the first clinical applications to achieve genuine product-market fit. The intuitive fear was that AI would replace radiologists. The observed reality has been the opposite: as Huang describes, the number of radiologists has actually increased as AI tools have proliferated. The explanation is closer to what economists call the Jevons paradox than to simple substitution: when a tool dramatically increases productivity in one aspect of a job, demand for the underlying service expands, and the human practitioner is freed to do more of the high-value work that only humans can do — communicating with patients, exercising clinical judgment, managing complex cases. Hospitals can serve more patients, which requires more radiologists, not fewer.
Huang extends the same logic to nursing. Health systems worldwide face a shortage of roughly five million nurses, in part because nurses spend nearly half their time on documentation and administrative charting. AI tools that automate charting — like those developed by Abridge, among others — do not eliminate nurses; they restore nurses to their core function of patient care, while simultaneously making it possible to serve more patients with existing staff. The labor economics of the AI era are more complicated, and arguably more optimistic, than the simple displacement narrative suggests.
The Cascade Effect: Why Every Layer Reinforces the Others
The most important structural insight in Huang’s framework is what might be called the cascade effect: every successful application at layer five increases demand for everything beneath it, all the way down to the energy at layer one.
When DeepSeek-R1 became available as an open-source model, developers around the world began building applications on top of it. Those applications generated inference workloads. Those workloads consumed compute. That compute required chips. Those chips required power. Every download of DeepSeek-R1 was, in a small but real sense, a signal propagating down through all five layers of the cake.
This cascade operates in the other direction too. Progress at layer two — better chips — makes inference cheaper, which enables more applications to achieve economic viability, which drives more usage, which creates more revenue, which funds more R&D, which produces better chips. Progress at layer one — cheaper, cleaner energy — reduces the operating cost of data centers, which expands the economic frontier of the application layer. The layers are not independent; they are a tightly coupled system with positive feedback loops operating at every interface.
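A toy model makes the feedback arithmetic concrete. The parameters below (the annual decline in cost per token and the demand elasticity) are hypothetical, chosen only to illustrate the dynamic described above, not drawn from any forecast.

```python
# Toy illustration of the cascade: cheaper inference expands the set of
# economically viable applications, which grows total token demand, which
# grows aggregate compute (and therefore energy) demand. All parameters
# are hypothetical.

cost_per_token = 1.0       # starting cost, arbitrary units
tokens_demanded = 1.0      # starting demand, arbitrary units
COST_DECLINE_PER_YEAR = 0.40   # assume hardware/software gains cut unit cost 40%/yr
DEMAND_ELASTICITY = 1.5        # assume demand grows faster than cost falls

for year in range(1, 6):
    cost_per_token *= (1 - COST_DECLINE_PER_YEAR)
    # Demand responds more than proportionally to the price decline.
    tokens_demanded *= (1 / (1 - COST_DECLINE_PER_YEAR)) ** DEMAND_ELASTICITY
    total_spend = cost_per_token * tokens_demanded
    print(f"Year {year}: cost {cost_per_token:.2f}, demand {tokens_demanded:.1f}, "
          f"total compute spend {total_spend:.2f}")

# With elasticity above 1, total spend rises even as unit cost collapses --
# the demand-expansion dynamic behind the claim that cheaper models increase,
# rather than decrease, aggregate demand for chips and energy.
```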
This is why Huang argues that the buildout will not be confined to a single country or a single sector. The positive feedback dynamics of the stack mean that any country that invests seriously in any layer creates demand and capability that propagates upward and downward. A country that builds out clean energy infrastructure for reasons entirely unrelated to AI will find itself with a competitive advantage at layer three. A country that invests in semiconductor manufacturing will find demand for its products pulled by the explosive growth at layers four and five. A country that cultivates strong application developers will create demand that echoes down to layer one.
The Trillion-Dollar Buildout and Its Discontents
No article about Huang’s framework would be complete without acknowledging the scale of capital being deployed — and the genuine questions about whether it is being deployed wisely.
The buildout is already historic. Microsoft alone signaled capital expenditure plans exceeding $80 billion for fiscal 2025, with a substantial share directed at AI infrastructure. Google planned roughly $75 billion of capital expenditure for 2025, up from about $52 billion in 2024 and more than double its 2023 outlay. Amazon, Meta, and Oracle are each committing comparable sums. Huang described the total as “a few hundred billion dollars” already deployed, with “trillions of dollars of infrastructure still needed.” This is, as he says, the largest infrastructure buildout in human history.
But not everyone is convinced the economics justify the investment. Skeptics point to the gap between AI capability and AI revenue, arguing that the application layer has not yet generated returns proportionate to the capital being deployed at layers one through three. Some observers, including prominent financial analysts, have questioned whether the buildout resembles previous technology infrastructure bubbles — the fiber optic overinvestment of the late 1990s being the most frequently cited cautionary tale.
Huang’s counter-argument, made at Davos and elsewhere, is structural rather than cyclical: “We are not merely building software; we are executing the largest infrastructure construction project in human history.” The application layer lag is real, but it is a function of the models only recently crossing the threshold of genuine usefulness at scale. As he noted in his blog post, “Models became good enough to be useful at scale. Reasoning improved. Hallucinations dropped. Grounding improved dramatically. For the first time, applications built on AI began generating real economic value.”
There is also an important environmental dimension to this buildout that demands honest acknowledgment. Data centers and AI workloads have significant carbon footprints: estimates suggest AI’s annual carbon footprint could reach 32.6–79.7 million tons of CO₂ by 2025. Google consumed 6.1 billion gallons of water across its data center portfolio in 2023, a figure that has only grown. The energy transition at layer one is not merely an economic necessity; it is an environmental imperative. The AI industry cannot sustain its growth trajectory on fossil fuel power without exacerbating the climate crisis it is simultaneously being asked to help solve.
What This Means for Nations, Companies, and Individuals
Huang’s closing argument in his March 2026 post is almost philosophical in its scope: “AI is becoming the foundational infrastructure of the modern world. And the choices we make now — how fast we build, how broadly we participate, and how responsibly we deploy it — will shape what this era becomes.”
For nations, the framework suggests that AI sovereignty is not primarily a software question. It is an energy question, a chip manufacturing question, an infrastructure construction question, and a workforce development question. Countries that treat AI as a consumer product — something to be purchased from American or Chinese companies — will find themselves structurally dependent in ways that mirror energy dependence. Countries that invest in building their own capabilities at every layer will have genuine strategic agency. As Huang told the WEF audience: “I really believe that every country should get involved to build AI infrastructure, build your own AI, take advantage of your fundamental natural resources, which is your language and culture, and have your national intelligence be part of your ecosystem.”
For companies, the framework offers a diagnostic tool. Where in the stack does your business create value? Where are you vulnerable to commoditization? The intelligence layer — models — is rapidly commoditizing as open-source models approach frontier performance. Competitive advantage is migrating toward proprietary data, domain-specific fine-tuning, workflow integration, and trust-building with end customers. Companies that mistake access to a model for a sustainable moat will be disappointed.
For individuals, the framework is simultaneously reassuring and challenging. Reassuring, because the buildout is creating enormous demand for skilled physical labor — the electricians, pipefitters, and construction workers of the AI industrial age — that is fundamentally immune to AI displacement. Challenging, because the knowledge workers whose tasks can be automated by layer four and layer five AI will need to evolve: not toward tasks that AI does better, but toward tasks that humans do irreplaceably — judgment, empathy, creativity, ethical reasoning, and the navigation of genuine complexity.
The five-layer cake is, in the end, a framework for understanding not just an industry but an era. The Industrial Revolution had its own stack: coal, steam engines, railways, factories, manufactured goods. Each layer enabled and amplified the others. Each created new economic categories, new geopolitical dynamics, new social disruptions, new opportunities. The people who understood the stack — who saw that railways were not merely faster horses, but a completely new relationship between geography and commerce — were positioned to act on it.
Jensen Huang’s insight is that we are living inside a comparable stack transition right now. Energy, chips, infrastructure, models, applications. Everything else follows from these five layers, and from the choices we make about how to build them.
Sources and further reading:
- Jensen Huang, “AI Is a 5-Layer Cake”, NVIDIA Blog, March 10, 2026
- NVIDIA, “‘Largest Infrastructure Buildout in Human History’: Jensen Huang on AI’s ‘Five-Layer Cake’ at Davos”, January 2026
- International Energy Agency, “Energy and AI”, April 2025
- Gartner, “Gartner Says Electricity Demand for Data Centers to Grow 16% in 2025 and Double by 2030”, November 2025
- Pew Research Center, “What we know about energy use at U.S. data centers amid the AI boom”, October 2025
- Belfer Center for Science and International Affairs, “AI, Data Centers, and the U.S. Electric Grid: A Watershed Moment”, February 2026
- World Economic Forum, “What is open-source AI and how could DeepSeek change the industry?”, February 2025
- MIT CSAIL, “DeepSeek: What You Need to Know”, 2025
- CNBC, “China’s open-source embrace upends conventional wisdom around artificial intelligence”, March 2025
- NVIDIA Investor Relations, “NVIDIA Kicks Off the Next Generation of AI With Rubin”, January 2026
- World Resources Institute, “Powering the US Data Center Boom: The Challenge of Forecasting Electricity Needs”
- Bernard Marr, “Davos 2026: Jensen Huang On The Five Layer AI Cake, The AI Bubble And Key AI Breakthroughs”, 2026
