Comment As Jensen Huang likes to say, Moore's Law is dead – and at Nvidia GTC this month, the GPU-slinger's chief executive let slip just how hard the laws of compute scaling are biting.
Standing on stage, Huang revealed not only the chip designer's new generation of Blackwell Ultra processors, but a surprising amount of detail about its next two generations of accelerated computing platforms, including a 600 kW rack-scale system packing 576 GPUs. We also learned that a future GPU family, due to arrive in 2028, will be named after Richard Feynman. Surely you're joking!
It's not that unusual for chipmakers to tease their roadmaps from time to time, but we don't normally get this much information at once. And that's because Nvidia is stuck. It has run into not just one roadblock, but several. Worse, beyond throwing money at the problem, they're all largely out of Nvidia's control.
These challenges won't come as a big surprise to anyone paying attention. Distributed computing has always been a game of whack-a-mole, and AI may well be the ultimate mole hunt.
It's all downhill from here
The first and most obvious of these challenges revolves around scaling compute.
Process advances have slowed in recent years. While there are still knobs left to turn, they're becoming exponentially harder to move.
Faced with these limits, Nvidia's strategy is simple: cram as much silicon into each compute node as possible. Today, Nvidia's densest systems – really racks – marshal 72 GPUs into a single compute domain using its 1.8 TB/s high-speed NVLink fabric. Eight or more of these racks are then stitched together using InfiniBand or Ethernet to reach the desired compute and memory capacity.
At GTC, Nvidia revealed its intention to move to 144 and eventually 576 GPUs per rack. But the scaling isn't limited to racks; it's also happening at the chip package.
This became clear with the launch of Nvidia's Blackwell accelerators a year ago. The chips boasted a 5x performance uplift over Hopper, which sounded great until you realized it took twice the number of dies, a new 4-bit datatype, and 500 watts more power to get there.
The reality was that, normalized to FP16, Nvidia's beefiest Blackwell dies are only about 1.25x faster than a GH100 – 1,250 dense teraflops versus 989 – there just happen to be two of them per package.
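If you want to sanity-check that arithmetic yourself, here's a minimal back-of-the-envelope sketch in Python using the dense-FP16 figures cited above. The decomposition of the headline 5x number is our own reading of the math, not an official Nvidia breakdown.

```python
# Back-of-the-envelope check of the Blackwell-vs-Hopper uplift discussed above.
# The teraflop figures are the dense-FP16 numbers cited in the article;
# everything else is simple arithmetic.

hopper_fp16_tflops = 989        # GH100, dense FP16
blackwell_fp16_tflops = 1250    # per Blackwell die, dense FP16
dies_per_package = 2            # Blackwell pairs two reticle-sized dies

per_die_speedup = blackwell_fp16_tflops / hopper_fp16_tflops
package_speedup = per_die_speedup * dies_per_package

print(f"Per-die FP16 speedup:     {per_die_speedup:.2f}x")   # ~1.26x
print(f"Per-package FP16 speedup: {package_speedup:.2f}x")   # ~2.53x

# The headline "5x" only appears once you also halve precision (FP8 -> FP4),
# which doubles nominal throughput again: ~2.53x * 2 ≈ 5x.
```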


By 2027, Nvidia CEO Jensen Huang expects racks to reach 600 kW with the debut of the Rubin Ultra NVL576
We don't yet know what process tech Nvidia plans to use for its next-generation chips, but we do know that Rubin Ultra will continue this trend, going from two reticle-limited dies to four. Even with the roughly 20 percent efficiency gain Huang expects to wring out of TSMC's 2nm, it's still going to be one toasty package.
It's not just compute, either; it's memory too. The eagle-eyed among you may have noticed a rather large jump in capacity and bandwidth between Rubin and Rubin Ultra – 288 GB per package versus 1 TB. About half of that comes from faster, higher-capacity memory modules, but the other half comes from doubling the amount of silicon dedicated to memory, from eight modules on Blackwell and Rubin to 16 on Rubin Ultra.
Higher capacity means Nvidia can cram more model parameters – around 2 trillion at FP4 – into a single package, or 500 billion per "GPU," since it now counts individual dies rather than sockets. HBM4e also looks set to effectively double memory bandwidth over HBM3e, which should jump from roughly 4 TB/s per Blackwell die today to around 8 TB/s on Rubin Ultra.
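For the curious, the capacity math works out roughly like this – a quick sketch assuming 4 bits (half a byte) per FP4 parameter and ignoring real-world overheads like the KV cache and activations:

```python
# Rough sketch of how many FP4 parameters fit in a given amount of HBM.
# Assumes 0.5 bytes per parameter; ignores KV cache and other runtime overheads.

def params_at_fp4(hbm_bytes: float) -> float:
    bytes_per_param = 0.5  # FP4 = 4 bits
    return hbm_bytes / bytes_per_param

TB = 1e12
GB = 1e9

rubin_ultra_package = params_at_fp4(1 * TB)    # ~2 trillion parameters
per_die = rubin_ultra_package / 4              # four compute dies per package
blackwell_package = params_at_fp4(288 * GB)    # ~576 billion parameters

print(f"Rubin Ultra package: ~{rubin_ultra_package / 1e12:.1f}T params at FP4")
print(f"Per 'GPU' (die):     ~{per_die / 1e9:.0f}B params at FP4")
print(f"Blackwell package:   ~{blackwell_package / 1e9:.0f}B params at FP4")
```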
Unfortunately, barring a major breakthrough in process technology, future Nvidia GPU packages will likely have to pack in even more silicon.
The good news is that process advances aren't the only way to scale compute or memory. Generally speaking, dropping from 16-bit to 8-bit precision effectively doubles throughput while halving the memory footprint of a given model. The problem is that Nvidia is running out of bits to drop to juice its performance gains. From Hopper to Blackwell, Nvidia dropped four bits, doubled the silicon, and claimed a 5x floating-point gain.
But below four-bit precision, LLM inference gets pretty dicey, with perplexity scores climbing quickly. That said, there is interesting research being done around super-low-precision quantization – as low as 1.58 bits – while maintaining accuracy.
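To see why each dropped bit matters, here's a quick illustration of weight storage for a hypothetical 70-billion-parameter model at various bit-widths. The model size is our own example, chosen purely for illustration; real deployments add overheads for scaling factors, KV cache, and so on.

```python
# Minimal illustration of why lower precision stretches both memory and
# bandwidth: weight storage for a hypothetical 70B-parameter model at
# different bit-widths, including the ~1.58-bit ternary formats mentioned
# in the research above. Pure arithmetic, no overheads included.

params = 70e9  # hypothetical model size, for illustration only

for name, bits in [("FP16", 16), ("FP8", 8), ("FP4", 4), ("ternary", 1.58)]:
    gigabytes = params * bits / 8 / 1e9
    print(f"{name:>7}: {gigabytes:7.1f} GB of weights")

# FP16 -> FP8 -> FP4 halves the footprint at each step, which is also roughly
# how the nominal FLOPS numbers keep doubling - until you run out of bits.
```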
Mind you, reduced precision isn't the only way to squeeze out more FLOPS. You can also devote less die area to the higher-precision datatypes that AI workloads don't need.
We saw this with Blackwell Ultra. Ian Buck, VP of Nvidia's accelerated computing business unit, told us in an interview that the company had effectively dialed back the chip's double-precision (FP64) tensor core performance in exchange for 50 percent more 4-bit FLOPS.
Whether that's a sign FP64 is on its way out at Nvidia remains to be seen, but if you really care about double-precision grunt, AMD's GPUs and APUs should probably be at the top of your list anyway.
In any case, Nvidia's path forward is clear: its compute platforms are only going to get bigger, denser, hotter, and more power-hungry from here on out. As a calorie-deprived Huang put it during his press Q&A last week, the practical limit for a rack is however much power you can feed it.
"A datacenter is now 250 megawatts. That's kind of the limit per rack. I think the rest of it is just details," Huang said. "If you said that a datacenter is a gigawatt, then I would say a gigawatt per rack sounds like a good limit."
No escaping the power problem
Naturally, 600 kW racks pose a helluva headache for datacenter operators.
To be clear, powering and cooling megawatts' worth of ultra-dense compute is not a new problem. The folks at Cray, Eviden, and Lenovo have had it figured out for years. What's changed is that we're no longer talking about a handful of boutique compute clusters a year. We're talking about dozens of clusters, some large enough to dethrone the most powerful supers on the Top500 – if tying up 200,000 Hopper GPUs with Linpack made anyone any money, that is.
At these scales, highly specialized, low-volume thermal management and power delivery systems simply won't cut it. Unfortunately, datacenter vendors – you know, the folks who sell the not-so-sexy bits and bobs you need to actually run these multi-million-dollar NVL72 racks – are only now catching up with demand.
We suspect this is why so many of the Blackwell deployments announced so far have been for the air-cooled HGX B200 rather than the NVL72 that Huang keeps hyping. These eight-GPU HGX systems can be dropped into many existing H100 environments. Racks running 30 to 40 kW have been around for years, so jumping to 60 kW isn't that much of a stretch, and dropping down to two or three servers per rack is always an option.
The NVL72 is a rack-scale design heavily inspired by the hyperscalers, with DC bus bars, power sleds, and front-facing networking. At 120 kW of liquid-cooled compute, deploying more than a few of these things in existing facilities gets problematic in a hurry. And that's only going to get harder once Nvidia's 600 kW monster racks make their debut in late 2027.
This is where those "AI factories" Huang keeps banging on about come into play – purpose-built datacenters designed in collaboration with partners like Schneider Electric to cope with the power and thermal demands of AI.
And surprise, surprise, a week after Huang detailed Nvidia's GPU roadmap for the next three years, Schneider announced a $700 million expansion in the United States to boost production of all the power and cooling kit needed to support them.
Of course, having the infrastructure to power and cool these ultra-dense systems isn't the only problem. You also need to get the power to the datacenter in the first place, and once again, that's largely out of Nvidia's control.
Whenever Meta, Oracle, Microsoft, or anyone else announces another AI bit barn, a juicy power purchase agreement usually follows. Meta's bayou-based mega DC was announced alongside a 2.2 GW gas generator plant – so much for those sustainability and carbon-neutrality commitments.
And as much as we'd like to see a nuclear renaissance, it's hard to take small modular reactors seriously when even the rosiest predictions put deployments somewhere in the 2030s.
Follow the leader
To be clear, none of these roadblocks is unique to Nvidia. AMD, Intel, and every other chip designer and cloud provider vying for a slice of Nvidia's market share are bound to run into these same challenges before long. Nvidia just happens to be among the first to bump up against them.
While that certainly has its downsides, it also puts Nvidia in a somewhat unique position to shape the direction of future datacenter power and thermal designs.
As we mentioned earlier, the reason Huang was willing to reveal Nvidia's next three generations of GPU tech and tease a fourth is so its infrastructure partners will be ready to support them when they arrive.
"The reason I communicated to the world what Nvidia's roadmap is for the next three to four years is so that now everyone else can plan," Huang said.
On the flip side, these efforts also serve to pave the way for competing chipmakers. If Nvidia designs a 120 kW – or now 600 kW – rack, and colocation providers and cloud operators are ready to support it, AMD and Intel are free to pack just as much compute into their own rack-scale platforms without having to worry about where customers are going to put them. ®