NVIDIA and AMD have just submitted their latest MLPerf inference benchmark results from their newest GPUs, including the Blackwell B200 and Instinct MI325X.
NVIDIA Blackwell B200, AMD Instinct MI325X and more join the latest MLPerf inference benchmarks, with the green team miles ahead in the raw performance race
The MLPerf Inference v5.0 benchmark results have been released, and the GPU giants submitted their latest numbers powered by their newest chips. As we have seen in the past, it is not just a matter of brute GPU power; software optimizations and support for new AI ecosystems and workloads also count for a lot.
NVIDIA Blackwell sets new records
The GB200 NVL72 system – which connects 72 NVIDIA Blackwell GPUs to act as a single massive GPU – delivered up to 30x higher throughput on the Llama 3.1 405B benchmark compared to the NVIDIA H200 NVL8 submission. This feat was achieved through more than triple the per-GPU performance and a 9x larger NVIDIA NVLink interconnect domain.
While many companies run MLPerf benchmarks on their hardware to assess performance, only NVIDIA and its partners have submitted and published results on the Llama 3.1 405B benchmark.
Production inference deployments often face latency constraints on two key metrics. The first is time to first token (TTFT), or how long it takes a user to begin seeing a response to a query sent to a large language model. The second is time per output token (TPOT), or how quickly tokens are delivered to the user.
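To make these two metrics concrete, here is a minimal sketch (not from MLPerf or any vendor tooling; the function name and inputs are hypothetical) of how TTFT and average TPOT could be computed from the timestamps at which tokens arrive for one request:

```python
def latency_metrics(request_time, token_times):
    """Return (TTFT, average TPOT) in seconds for a single request.

    request_time: when the query was sent to the model.
    token_times:  arrival times of each generated token, in order.
    """
    # TTFT: delay between sending the request and seeing the first token
    ttft = token_times[0] - request_time
    if len(token_times) > 1:
        # TPOT: average gap between consecutive output tokens
        tpot = (token_times[-1] - token_times[0]) / (len(token_times) - 1)
    else:
        tpot = 0.0
    return ttft, tpot

# Example: query sent at t=0, first token at 0.5 s, then one every 0.1 s
ttft, tpot = latency_metrics(0.0, [0.5, 0.6, 0.7, 0.8])
print(ttft, tpot)  # TTFT ≈ 0.5 s, TPOT ≈ 0.1 s
```

The interactive Llama 2 70B benchmark discussed below tightens exactly these two numbers, which is why it models a more responsive user experience.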
The new Llama 2 70B Interactive benchmark requires a 5x shorter TPOT and a 4.4x lower TTFT – modeling a more responsive user experience. On this test, NVIDIA's submission using an NVIDIA DGX B200 system with eight Blackwell GPUs tripled the performance of eight NVIDIA H200 GPUs, setting a high bar for this harder version of the Llama 2 70B benchmark.
The combination of the Blackwell architecture and its optimized software stack delivers new levels of inference performance, paving the way for AI factories to provide higher intelligence, increased throughput, and faster token rates.
That said, let's start with the green giant, which has once again taken the lead and set impressive records with its latest Blackwell GPUs such as the B200. The GB200 NVL72 rack with a total of 72 B200 chips leads the pack, offering up to 30x higher performance in the Llama 3.1 405B benchmark compared to the last-generation NVIDIA H200. NVIDIA also saw a tripling in the Llama 2 70B benchmark when comparing an 8-GPU B200 system against an 8-GPU H200 system.
[Chart: MLPerf inference performance comparing Blackwell B200 180 GB (x8 @ 1000W), Hopper H200 141 GB (x8 @ 700W), Instinct MI325X 256 GB (x8 @ 1000W), and Hopper H100 80 GB (x8 @ 700W)]
AMD also submitted its new Instinct MI325X accelerator with 256 GB of memory, which appears in an x8 configuration.
There are also benchmarks for the Hopper H200 series, which has seen continuous optimizations. Compared to last year, inference performance has increased by 50%, a substantial gain for companies that continue to rely on the platform.