The real cost of developing DeepSeek's new models remains unknown, because a figure cited in a single research paper may not capture the full picture of its expenses. “I don’t think it’s $6 million, but even if it’s $60 million, it changes the game,” said Umesh Padval, managing director of Thomvest Ventures, a firm that has invested in Cohere and other AI companies. “It will put pressure on the profitability of companies that are focused on consumer AI.”
Shortly after DeepSeek revealed the details of its latest model, Databricks’ Ghodsi says, customers began asking whether they could use it, along with DeepSeek’s underlying techniques, to cut costs inside their own organizations. He adds that one approach used by DeepSeek’s engineers, known as distillation, which involves using the output of one large language model to train another model, is relatively cheap and straightforward.
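For illustration only, the sketch below shows the general idea of distillation as described above: a small “student” network is trained to match the output distribution of a larger, frozen “teacher” network. The model sizes, dummy data, and temperature value are placeholder assumptions chosen for the example, not details of DeepSeek’s actual pipeline.

```python
# Minimal distillation sketch (toy models and data, not DeepSeek's method):
# the student is trained to imitate the teacher's softened output distribution.
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size, hidden_teacher, hidden_student = 1000, 512, 128

teacher = nn.Sequential(nn.Embedding(vocab_size, hidden_teacher),
                        nn.Linear(hidden_teacher, vocab_size))
student = nn.Sequential(nn.Embedding(vocab_size, hidden_student),
                        nn.Linear(hidden_student, vocab_size))
teacher.eval()  # the teacher is frozen; only its outputs are used as targets

optimizer = torch.optim.Adam(student.parameters(), lr=1e-4)
temperature = 2.0  # softens both distributions before comparing them

tokens = torch.randint(0, vocab_size, (32,))  # dummy batch of token ids

with torch.no_grad():
    teacher_logits = teacher(tokens)
student_logits = student(tokens)

# KL divergence between the softened teacher and student distributions
loss = F.kl_div(
    F.log_softmax(student_logits / temperature, dim=-1),
    F.softmax(teacher_logits / temperature, dim=-1),
    reduction="batchmean",
) * temperature ** 2

optimizer.zero_grad()
loss.backward()
optimizer.step()
```

In practice the “teacher outputs” can simply be text generated by the larger model and used as training data for the smaller one, which is part of why the approach is comparatively cheap.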
Padval says the existence of models like DeepSeek’s will ultimately benefit companies looking to spend less on AI, but he adds that many firms may have reservations about relying on a Chinese model for sensitive tasks. So far, at least one prominent AI company, Perplexity, has publicly announced that it is using DeepSeek’s R1 model, though it says the model is hosted “completely independent of China.”
Amjad Masad, the CEO of Replit, a startup that provides AI coding tools, told WIRED that he thinks the latest DeepSeek models are impressive. While he still finds Anthropic’s Sonnet better at many software engineering tasks, he has found that R1 is especially good at turning text commands into code that can be run on a computer. “We are exploring it especially for agentic reasoning,” he adds.
DeepSeek’s two latest offerings, DeepSeek-R1 and DeepSeek-R1-Zero, are capable of the same kind of simulated reasoning as the most advanced systems from OpenAI and Google. They all work by breaking problems down into constituent parts in order to tackle them more effectively, a process that requires a considerable amount of additional training so that the AI reliably reaches the correct answer.
A paper published by DeepSeek researchers last week describes the approach the company used to create its R1 models, which it claims perform on some benchmarks about as well as OpenAI’s groundbreaking reasoning model known as o1. The tactics DeepSeek used include a more automated method for learning how to solve problems correctly, as well as a strategy for transferring skills from larger models to smaller ones.
One of the hottest topics of speculation about DeepSeek is the hardware it may have used. The question is especially notable because the US government has introduced a series of export controls and other trade restrictions in recent years aimed at limiting China’s ability to acquire and manufacture the advanced chips needed to build advanced AI.
In a research paper from August 2024, DeepSeek indicated that it had access to a cluster of 10,000 Nvidia A100 chips, which were placed under US restrictions announced in October 2022. In a separate paper from June of that year, DeepSeek said that an earlier model it created, called DeepSeek-V2, was developed using clusters of Nvidia H800 computer chips, a less capable component developed by Nvidia to comply with US export controls.
A source at one AI company that trains large AI models, who asked to remain anonymous to protect their professional relationships, estimates that DeepSeek likely used around 50,000 Nvidia chips to build its technology.
Nvidia declined to comment directly on which of its chips DeepSeek may have relied on. “DeepSeek is an excellent AI advancement,” a spokesperson for Nvidia said in a statement, adding that the startup’s reasoning approach “requires significant numbers of Nvidia GPUs and high-performance networking.”
However the DeepSeek models were built, they appear to show that a less closed approach to developing AI is gaining momentum. In December, Clem Delangue, the CEO of Hugging Face, a platform that hosts artificial intelligence models, predicted that a Chinese company would take the lead in AI because of the speed of innovation happening in open source models, which China has largely embraced. “This happened faster than I thought,” he says.