The Arithmetic of Intelligence: AI Training Compute Constraints and the Geopolitics of the H100/B200 Allocation Crisis

The race to train ever-larger AI models has collided with the physical limits of GPU supply, cooling infrastructure, and power grid capacity — constraints that are now functioning as a de facto industrial policy, concentrating frontier AI capability in the hands of a small number of well-capitalised actors while foreclosing meaningful participation by the Global South.

By Moemedi Michael Poncana · Global · 9-minute read · 18 Apr 2026

In the first quarter of 2026, every major AI laboratory with ambitions to train a frontier model confronted the same constraint: there were not enough NVIDIA Blackwell B200 GPUs available, at any price, to meet demand. The waiting list for B200 clusters through NVIDIA's direct allocation programme stretched eighteen months; spot-market pricing for equivalent compute on hyperscale cloud platforms reached $8 per GPU-hour at peak, compared with $2.50 in mid-2024. Google DeepMind, which had hedged by building out its own TPU v5p infrastructure, reported in February that TPU capacity was oversubscribed internally across Gemini training runs. Anthropic's filing with the SEC noted that compute costs represented the single largest line item in its operating expenditure, exceeding headcount by a ratio of three to one. The model that runs in your browser or answers your queries was trained, at extraordinary cost, on infrastructure so scarce that its allocation has become a matter of national industrial strategy.

The compute constraint is not a temporary supply-chain dislocation of the kind that characterised the 2020–2022 automotive chip shortage. It is, as Kate Crawford observed in Atlas of AI, a structural feature of AI systems: they are not software artefacts floating free of material infrastructure but physical phenomena requiring rare earths, precision manufacturing, water cooling, and electrical power at scales that rival aluminium smelting. The H100 and B200 shortage reveals what was always true but rarely stated plainly: the frontier of AI capability is determined not primarily by algorithmic insight or research talent but by access to physical compute — and that access is being rationed, geopolitically as much as commercially, in ways that will define which actors can participate in frontier AI development for a generation.

The Physical Arithmetic of Model Training

Training a frontier large language model at the scale of GPT-4 or its successors requires on the order of 10²⁵ to 10²⁶ floating-point operations, a number so large that even expressing it in GPU-hours requires scientific notation. At a B200 utilisation rate of 65% MFU (model FLOPs utilisation, the standard measure of training efficiency), a single frontier training run requires approximately 10,000 to 30,000 B200 GPUs running continuously for six to twelve weeks. Each B200 GPU draws roughly 1,000 watts under training load; a 20,000-GPU cluster therefore requires 20 megawatts of sustained power for the GPUs alone, the equivalent of a small city's residential load, plus comparable cooling overhead.
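The power side of this arithmetic can be sketched in a few lines. The snippet below is a rough illustration, not a measurement: it takes the 20,000-GPU and 1,000-watt figures from the paragraph above, tries run lengths inside the six-to-twelve-week range, and treats `cooling_overhead` as a hypothetical parameter, where 1.0 is one reading of "comparable cooling overhead" (cooling drawing as much power as the GPUs themselves).

```python
# Rough power and energy arithmetic for a training cluster.
# Inputs come from the figures quoted in the text; cooling_overhead is a
# hypothetical knob (1.0 = cooling draws as much power as the GPUs do).


def cluster_power_mw(gpus: int, watts_per_gpu: float) -> float:
    """Sustained GPU power draw in megawatts."""
    return gpus * watts_per_gpu / 1_000_000


def run_energy_gwh(power_mw: float, weeks: float,
                   cooling_overhead: float = 0.0) -> float:
    """Energy for a continuous run in gigawatt-hours, optionally with cooling."""
    hours = weeks * 7 * 24
    facility_mw = power_mw * (1.0 + cooling_overhead)
    return facility_mw * hours / 1000  # MWh -> GWh


if __name__ == "__main__":
    gpu_mw = cluster_power_mw(gpus=20_000, watts_per_gpu=1_000)
    print(f"GPU power: {gpu_mw:.0f} MW")
    for weeks in (6, 8, 12):
        gpus_only = run_energy_gwh(gpu_mw, weeks)
        with_cooling = run_energy_gwh(gpu_mw, weeks, cooling_overhead=1.0)
        print(f"{weeks:>2} weeks: {gpus_only:.1f} GWh GPUs only, "
              f"{with_cooling:.1f} GWh with cooling")
```

With these inputs the GPU draw comes out at 20 MW, matching the figure in the text, and an eight-week run consumes about 27 GWh before any cooling is counted.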

This arithmetic produces a structural exclusion that operates independently of talent, data access, or algorithm quality. A university research group, a national AI institute in a middle-income country, or a startup without access to hundreds of millions in compute credit cannot participate in frontier model training. They can fine-tune, they can distil, they can deploy — but the frontier training run that sets the capability ceiling is accessible only to actors who can simultaneously command multibillion-dollar capital, navigate NVIDIA's allocation queue, secure grid-scale power purchase agreements, and build or lease purpose-built liquid-cooled data centre infrastructure. The claim that markets are producing this infrastructure through private innovation obscures the degree to which public power grids, public research universities, and publicly funded chip architecture research (DARPA's long history with GPU-adjacent compute) have underwritten the entire edifice.
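The scale of the capital barrier can be illustrated by pricing a single run at the cloud rates quoted earlier in this article. This is a sketch under loud assumptions: it applies the $2.50 (mid-2024) and $8 (peak 2026) per-GPU-hour figures to the cluster sizes and run lengths given above, ignores reserved-capacity discounts, failed runs, and experimentation, and the function names are hypothetical.

```python
# Illustrative cost of renting the GPU time for one training run.
# Rates are the per-GPU-hour figures quoted in the article; everything
# else (no discounts, no restarts, no storage or networking) is assumed.


def gpu_hours(gpus: int, weeks: float) -> float:
    """Total GPU-hours for a cluster running continuously."""
    return gpus * weeks * 7 * 24


def run_cost_usd(gpus: int, weeks: float, usd_per_gpu_hour: float) -> float:
    """Rental cost of the run at a flat hourly rate."""
    return gpu_hours(gpus, weeks) * usd_per_gpu_hour


if __name__ == "__main__":
    scenarios = [(10_000, 6), (20_000, 8), (30_000, 12)]
    for gpus, weeks in scenarios:
        for rate in (2.50, 8.00):
            cost = run_cost_usd(gpus, weeks, rate)
            print(f"{gpus:>6} GPUs x {weeks:>2} weeks at ${rate:.2f}/h: "
                  f"${cost / 1e6:,.1f}M")
```

At the $8 peak rate, the three scenarios come to between about $81 million and $484 million for the GPU time of a single run.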

The B200 shortage has, moreover, been deliberately shaped by the US export control regime. NVIDIA's H100 and B200 are subject to export licensing requirements that effectively prohibit their sale to Chinese entities and to buyers in a long list of countries the Commerce Department designates as posing diversion risk. Under successive rounds of rule tightening, that category has expanded beyond Russia and China to cover entities in the UAE, Saudi Arabia, and India, depending on the end-user certification. The effect is a two-tier global compute market: a licensed tier, dominated by American and European hyperscalers and their closest allies, and an unlicensed tier where Chinese labs, academic institutions in the Global South, and sanctioned-country actors compete for older-generation hardware, domestically produced substitutes, and whatever smuggled or misrouted B200s evade customs enforcement.

Who Gets the Compute: Allocation as Industrial Policy

NVIDIA's GPU allocation process has never been transparent, and under the current supply constraint it has become a site of active geopolitical negotiation. Reports from Q1 2026 indicate that the US government's AI Safety Institute has been in discussions with NVIDIA about establishing a priority allocation track for safety-oriented research — a development that, whatever its stated purpose, would further concentrate leading-edge compute access in Washington's preferred ecosystem. Simultaneously, the White House's AI Action Plan, published in February 2026, established a framework for "compute access partnerships" with allied nations, effectively creating a geopolitical tier system in which Japan, the UK, South Korea, and Australia receive preferential access to American compute infrastructure in exchange for regulatory alignment with US AI governance norms.

The allocation architecture is opaque by design. The criteria by which NVIDIA, AWS, Google Cloud, and Azure allocate scarce compute capacity are not public. They reflect commercial relationships, national security considerations, and political alignments that operate without democratic oversight or transparency. A country that wishes to develop sovereign AI capability — to train models on its own data, in its own language, reflecting its own epistemic priorities — must either accept the terms of American compute provision or invest, as China has done, in the expensive and technically demanding project of domestic GPU development. The Huawei Ascend 910B and its successors represent the most advanced Chinese alternative; independent benchmarks suggest they reach approximately 60–70% of H100 performance at comparable power draw — sufficient for serious inference and fine-tuning, but still constrained for frontier training runs.

Technological systems encode and reproduce social hierarchies even when presented as neutral — and the compute allocation question is a clear case. The countries best positioned to access frontier compute are those with the deepest historical integration into the American technology ecosystem: Western Europe, Japan, South Korea, Israel, Australia. The countries most constrained are those in the Global South, where AI development ambitions — India's national AI mission, Kenya's emerging AI policy, Brazil's sectoral AI programme — run directly into the hardware bottleneck that the export control regime has reinforced and that the allocation architecture perpetuates.

The Power Grid as the Real Constraint

Beyond the GPU shortage lies a constraint that is harder to address through policy intervention: electricity. The International Energy Agency's 2026 electricity market report projects that AI data centre load growth will add 150 to 300 terawatt-hours of annual global power demand by 2028, a range whose upper end is roughly two-thirds of France's annual electricity consumption. This demand is heavily concentrated: Virginia's data centre corridor, which hosts approximately 70% of US hyperscale compute, is consuming power at a rate that has forced Dominion Energy to accelerate natural gas peaking plant construction and delay decommissioning of coal units. The perverse outcome is that the AI capability race is extending the life of fossil fuel infrastructure in the most compute-dense region of the world.
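Annual terawatt-hours are easier to compare with cluster-level figures once converted to average power. The sketch below does that conversion for the 150 to 300 TWh range, assuming (as a simplification) that the load is spread evenly across the year, and uses the 20 MW GPU draw of a 20,000-GPU cluster cited earlier in this article as a hypothetical yardstick.

```python
# Convert annual energy demand into average power, then into a count of
# 20 MW cluster-equivalents. Flat year-round load is an assumption.

HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years ignored


def average_gw(twh_per_year: float) -> float:
    """Average power in gigawatts implied by an annual energy figure."""
    return twh_per_year * 1000 / HOURS_PER_YEAR  # TWh -> GWh, then per hour


def equivalent_clusters(twh_per_year: float, cluster_mw: float = 20.0) -> float:
    """How many clusters of cluster_mw would draw that average power."""
    return average_gw(twh_per_year) * 1000 / cluster_mw  # GW -> MW


if __name__ == "__main__":
    for twh in (150, 300):
        print(f"{twh} TWh/yr = {average_gw(twh):.1f} GW average, "
              f"about {equivalent_clusters(twh):.0f} clusters of 20 MW")
```

On these assumptions the projected range corresponds to roughly 17 to 34 GW of continuous demand.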

The power constraint functions as a second-order allocation mechanism, one that reinforces the first-order GPU allocation dynamic. Regions with abundant renewable power (the Nordic countries, the Canadian province of Quebec, parts of sub-Saharan Africa with hydropower resources) have the energy resources to host AI training compute but lack the capital, the regulatory frameworks for foreign direct investment, and the geopolitical standing to attract hyperscale data centre construction on terms that would benefit domestic economies rather than merely hosting American or Chinese infrastructure. The compute constraint, in this reading, is not a temporary market dislocation. It is a structural feature of the emerging AI political economy, one that concentrates capability where capital and geopolitical alignment already concentrate, and forecloses it where they do not.

Stakes: The Capability Cliff and What Follows

The compute constraint matters beyond the immediate race to train the next frontier model because it is shaping the long-term distribution of AI capability in ways that will be difficult to reverse. Techniques such as mixture-of-experts architectures, inference-time compute scaling (the "thinking" mode implemented by models including Anthropic's Claude and OpenAI's o-series), and distillation have somewhat democratised deployment — allowing smaller models to approximate frontier capability for specific tasks. But the frontier itself continues to advance, and the gap between what a well-resourced lab can achieve with 30,000 B200s and what a national AI institute can achieve with 500 H100s is not narrowing. It is, by most technical assessments, widening.
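A back-of-envelope comparison gives a sense of the size of that gap. The sketch below is illustrative only: the per-GPU peak throughput constants are approximate vendor figures for dense BF16 and are assumptions not taken from this article, both clusters are given the same 65% MFU cited earlier, and the calculation ignores memory limits, interconnect, and whether a frontier model would fit on the smaller cluster at all.

```python
# Compare the effective training throughput of two clusters.
# Peak FLOP/s values are approximate, assumed dense-BF16 vendor figures.

B200_PEAK_FLOPS = 2.25e15  # assumption, not from the article
H100_PEAK_FLOPS = 0.99e15  # assumption, not from the article


def effective_flops(gpus: int, peak_flops: float, mfu: float = 0.65) -> float:
    """Sustained FLOP/s for a cluster at a given model FLOPs utilisation."""
    return gpus * peak_flops * mfu


def throughput_gap(big: float, small: float) -> float:
    """Ratio of the larger cluster's throughput to the smaller one's."""
    return big / small


if __name__ == "__main__":
    lab = effective_flops(30_000, B200_PEAK_FLOPS)
    institute = effective_flops(500, H100_PEAK_FLOPS)
    gap = throughput_gap(lab, institute)
    years = 8 * gap / 52  # time for the small cluster to match an 8-week run
    print(f"Throughput gap: {gap:.0f}x")
    print(f"An 8-week run on the large cluster takes ~{years:.0f} years "
          f"on the small one")
```

Under these assumptions the larger cluster delivers roughly 136 times the throughput, so an eight-week run on it would occupy the smaller cluster for on the order of two decades.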

The geopolitical consequence is a bifurcating world: on one side, a small number of actors — OpenAI, Google DeepMind, Anthropic, Meta, xAI, Mistral, and their Chinese counterparts at Baidu, Zhipu, and DeepSeek — who can train at the frontier and set the capability agenda; on the other, the vast majority of the world's population, whose AI experience will be mediated through APIs, fine-tuned models, and whatever compute access geopolitical alignment permits. This is not merely a commercial disparity. It is a question of who shapes the values, the languages, the epistemic frameworks, and the political assumptions encoded in the AI systems that will increasingly mediate access to information, credit, healthcare, education, and employment across the world. The compute constraint is, ultimately, a power constraint — in both the electrical and political senses of the word.

The Monexus tech desk situates the GPU shortage within political economy rather than treating it as a neutral supply-chain story — because who controls compute controls the terms of AI development for the next decade.