DeepSeek-R1 is an open model with state-of-the-art reasoning capabilities. Instead of offering direct responses, reasoning models like DeepSeek-R1 perform multiple inference passes over a query, conducting chain-of-thought, consensus and search methods to generate the best answer.
Performing this sequence of inference passes, using reasoning to arrive at the best answer, is known as test-time scaling. DeepSeek-R1 is a perfect example of this scaling law, demonstrating why accelerated computing is critical for the demands of agentic AI inference.
As models are allowed to iteratively "think" through a problem, they create more output tokens and longer generation cycles, so model quality continues to scale. Significant test-time compute is critical to enable both real-time inference and higher-quality responses from reasoning models like DeepSeek-R1, requiring larger inference deployments.
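To make this concrete, R1's intermediate reasoning appears directly in its output, so the effect of a larger token budget is visible in the generated text. Below is a minimal Python sketch for separating the reasoning trace from the final answer. It assumes the model wraps its reasoning in `<think>...</think>` tags, the convention used by the publicly released DeepSeek-R1 weights; the helper name is ours:

```python
import re

def split_reasoning(completion: str) -> tuple[str, str]:
    """Split an R1-style completion into (reasoning_trace, final_answer).

    Assumes the reasoning is wrapped in <think>...</think> tags; if the
    tags are absent, the whole completion is treated as the answer.
    """
    match = re.search(r"<think>(.*?)</think>", completion, flags=re.DOTALL)
    if match is None:
        return "", completion.strip()
    reasoning = match.group(1).strip()
    answer = completion[match.end():].strip()
    return reasoning, answer

reasoning, answer = split_reasoning(
    "<think>2 + 2: add the units digits, giving 4.</think>The answer is 4."
)
print(len(reasoning.split()), "reasoning words;", answer)
```

With a larger token budget, the share of the completion spent inside the reasoning trace grows, which is exactly the test-time scaling behavior described above.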
R1 delivers leading accuracy for tasks demanding logical inference, reasoning, math, coding and language understanding, while also delivering high inference efficiency.
To help developers securely experiment with these capabilities and build their own specialized agents, the 671-billion-parameter DeepSeek-R1 model is now available as an NVIDIA NIM microservice preview on build.nvidia.com. The DeepSeek-R1 NIM microservice can deliver up to 3,872 tokens per second on a single NVIDIA HGX H200 system.
Developers can test and experiment with the application programming interface (API), which is expected to become available soon as a downloadable NIM microservice, part of the NVIDIA AI Enterprise software platform.
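For example, the hosted preview exposes an OpenAI-compatible API. The sketch below shows one way to call it from Python; the base URL and model identifier are our assumptions based on build.nvidia.com catalog conventions, so check the model's page for the exact values, and NVIDIA_API_KEY is assumed to hold a key generated there:

```python
import os
from openai import OpenAI  # pip install openai

# The hosted NIM preview exposes an OpenAI-compatible API (assumed endpoint).
client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",
    api_key=os.environ["NVIDIA_API_KEY"],  # key generated on build.nvidia.com
)

stream = client.chat.completions.create(
    model="deepseek-ai/deepseek-r1",  # catalog model id (assumed)
    messages=[{"role": "user", "content": "Which is larger, 9.11 or 9.8?"}],
    temperature=0.6,
    max_tokens=4096,  # reasoning models need headroom for the thinking trace
    stream=True,
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```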
The DeepSeek-R1 NIM microservice simplifies deployments with support for industry-standard APIs. Enterprises can maximize security and data privacy by running the NIM microservice on their preferred accelerated computing infrastructure. Using NVIDIA AI Foundry with NVIDIA NeMo software, enterprises will also be able to create customized DeepSeek-R1 NIM microservices for specialized AI agents.
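Because the downloadable microservice serves the same industry-standard API, moving from the hosted preview to private infrastructure is, in principle, a one-line change to the client above. A sketch under the assumption that a local NIM instance is listening on its default port (commonly 8000):

```python
from openai import OpenAI

# Point the same client at a self-hosted NIM instead of the hosted preview;
# prompts and completions then never leave your own infrastructure.
client = OpenAI(
    base_url="http://localhost:8000/v1",  # default NIM serving port (assumed)
    api_key="not-needed-locally",          # a local NIM typically needs no key
)

response = client.chat.completions.create(
    model="deepseek-ai/deepseek-r1",
    messages=[{"role": "user", "content": "Summarize test-time scaling."}],
    max_tokens=2048,
)
print(response.choices[0].message.content)
```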
DeepSeek-R1: A Perfect Example of Test-Time Scaling
DeepSeek-R1 is a large mixture-of-experts (MoE) model. It incorporates an impressive 671 billion parameters, 10x more than many other popular open-source LLMs, and supports a large input context length of 128,000 tokens. The model also uses an extreme number of experts per layer: each R1 layer has 256 experts, with each token routed to eight separate experts in parallel for evaluation.
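The toy sketch below illustrates that routing step: score a token against all 256 experts, keep the top eight, and renormalize their weights. It illustrates top-k MoE routing in general, not DeepSeek-R1's actual gating code, and omits details such as load balancing and capacity limits:

```python
import numpy as np

NUM_EXPERTS = 256   # experts per DeepSeek-R1 layer
TOP_K = 8           # experts evaluated per token

def route_token(hidden: np.ndarray, router_w: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Toy router: score all experts, keep the top 8, renormalize weights.

    `hidden` is one token's activation (d,); `router_w` is (d, NUM_EXPERTS).
    """
    logits = hidden @ router_w                       # (NUM_EXPERTS,) scores
    top = np.argsort(logits)[-TOP_K:]                # indices of the 8 winners
    weights = np.exp(logits[top] - logits[top].max())  # stable softmax
    return top, weights / weights.sum()

rng = np.random.default_rng(0)
experts, weights = route_token(
    rng.normal(size=64), rng.normal(size=(64, NUM_EXPERTS))
)
print(experts, weights.round(3))
# Only 8 of 256 experts run per token per layer, i.e. 1/32 of a layer's
# expert parameters are active for any given token.
```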
Delivering real-time answers for R1 requires many GPUs with high compute performance, connected with high-bandwidth, low-latency communication to route prompt tokens to all the experts for inference. Combined with the software optimizations available in the NVIDIA NIM microservice, a single server with eight H200 GPUs connected using NVLink and NVLink Switch can run the full 671-billion-parameter DeepSeek-R1 model at up to 3,872 tokens per second. This throughput is made possible by using the NVIDIA Hopper architecture's FP8 Transformer Engine at every layer, and the 900 GB/s of NVLink bandwidth for MoE expert communication.
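As a back-of-envelope check on those figures:

```python
# Simple arithmetic from the numbers quoted in this post.
node_throughput_tps = 3872      # tokens/s on one NVIDIA HGX H200 system
gpus_per_node = 8
nvlink_bw_gb_s = 900            # GB/s NVLink bandwidth per GPU

per_gpu_tps = node_throughput_tps / gpus_per_node
print(f"{per_gpu_tps:.0f} tokens/s per GPU")  # 484 tokens/s
# Each token is scattered to its 8 experts and gathered back on every
# layer, which is why NVLink bandwidth matters as much as raw FLOPS
# for MoE inference.
```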
Getting every floating point operation per second (FLOPS) of performance out of a GPU is critical for real-time inference. The next-generation NVIDIA Blackwell architecture will give test-time scaling on reasoning models like DeepSeek-R1 a giant boost with fifth-generation Tensor Cores that can deliver up to 20 petaflops of peak FP4 compute performance and a 72-GPU NVLink domain specifically optimized for inference.
Get Started Now With the DeepSeek-R1 NIM Microservice
Developers can experience the DeepSeek-R1 NIM microservice, now available on build.nvidia.com. Watch how it works:
https://www.youtube.com/watch?v=47dwcezg1cg
With NVIDIA NIM, enterprises can deploy DeepSeek-R1 with ease and ensure they get the high efficiency needed for agentic AI systems.
See notice regarding software product information.