Cerebras Systems, an AI hardware startup challenging Nvidia's dominance of the artificial intelligence market, announced on Tuesday a major expansion of its data center footprint and two key enterprise partnerships that position the company to become the leading provider of high-speed AI inference services.
The company will add six new AI data centers across North America and Europe, increasing its inference capacity twentyfold to more than 40 million tokens per second. The expansion includes facilities in Dallas, Minneapolis, Oklahoma City, Montreal, New York and France, with 85% of the total capacity located in the United States.
“This year, our goal is to satisfy all the demand and all the new demand we expect will come online as a result of new models like Llama 4 and new DeepSeek models,” said James Wang, director of product marketing at Cerebras, in an interview with VentureBeat. “This is our huge growth initiative this year to satisfy the almost unlimited demand we're seeing across the board for inference tokens.”
The data center expansion represents the company's ambitious bet that the market for high-speed AI inference (the process where trained AI models generate outputs for real-world applications) will grow dramatically as companies seek faster alternatives to Nvidia's GPU-based solutions.

Strategic partnerships that bring high-speed AI to developers and financial analysts
In addition to the infrastructure expansion, Cerebras announced partnerships with Hugging Face, the popular AI developer platform, and AlphaSense, a market intelligence platform widely used in the financial services industry.
The Hugging Face integration will allow its five million developers to access Cerebras Inference with a single click, without having to sign up with Cerebras separately. This represents a major distribution channel for Cerebras, particularly for developers working with open-source models such as Llama 3.3 70B.
“Hugging Face is kind of the GitHub of AI and the center of all open source AI development,” Wang said. “The integration is super nice and native. You just appear in their inference providers list. You just check the box and then you can use Cerebras right away.”
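For developers, that flow is worth seeing concretely. Below is a minimal sketch of what the one-click integration can look like in code, assuming the huggingface_hub inference client with its provider-routing feature; the exact provider string and model ID are illustrative assumptions, not details confirmed in the article:

```python
# Minimal sketch: calling Llama 3.3 70B through Hugging Face's inference
# client with Cerebras selected as the provider. The provider string and
# model ID are assumptions for illustration; check Hugging Face's docs.
import os
from huggingface_hub import InferenceClient

client = InferenceClient(
    provider="cerebras",             # assumed provider identifier
    api_key=os.environ["HF_TOKEN"],  # a Hugging Face access token
)

completion = client.chat.completions.create(
    model="meta-llama/Llama-3.3-70B-Instruct",  # assumed model ID
    messages=[{"role": "user", "content": "Explain wafer-scale chips in one paragraph."}],
    max_tokens=300,
)
print(completion.choices[0].message.content)
```

The point of the integration is visible in the sketch: the developer authenticates with Hugging Face only, and routing to Cerebras is a single parameter rather than a separate signup.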
The AlphaSense partnership represents a significant customer win, with the financial intelligence platform switching to Cerebras from what Wang described as a “global, top three closed-source AI model vendor.” The company, which serves approximately 85% of Fortune 100 companies, is using Cerebras to accelerate its AI-powered search capabilities for market intelligence.
“This is a huge customer win and a very large contract for us,” Wang said. “We sped them up by 10x, so what used to take five seconds or longer is basically instant on Cerebras.”


How Cerebras is winning the AI inference speed race as reasoning models slow things down
Cerebras has positioned itself as a high-speed inference specialist, claiming its Wafer-Scale Engine (WSE-3) processor can run AI models 10 to 70 times faster than GPU-based solutions. This speed advantage has become increasingly valuable as AI models evolve toward more complex reasoning capabilities.
“If you listen to Jensen's remarks, reasoning is the next big thing, even according to Nvidia,” Wang said, referring to Nvidia CEO Jensen Huang. “But what he's not telling you is that reasoning makes the whole thing much slower, because the model has to think and generate a bunch of internal monologue before it gives you the final answer.”
This slowdown creates an opportunity for Cerebras, whose specialized hardware is designed to accelerate these more complex AI workloads. The company has already secured high-profile customers including Perplexity AI and Mistral AI, which use Cerebras to power their AI search products and assistants, respectively.
“We help Perplexity become the world's fastest AI search engine. This just isn't possible otherwise,” Wang said. “We help Mistral achieve the same feat. Now they have a reason for people to subscribe to Le Chat Pro, whereas before, your model is probably not at the same cutting-edge level as GPT-4.”


The compelling economics behind Cerebras' challenge to OpenAI and Nvidia
Cerebras is betting that the combination of speed and cost will make its inference services attractive even to companies already using leading models like GPT-4.
Wang pointed out that Meta's Llama 3.3 70B, an open-source model that Cerebras has optimized for its hardware, now scores similarly to OpenAI's GPT-4 on intelligence benchmarks while costing significantly less to run.
“Anyone who is using GPT-4 today can just move to Llama 3.3 70B as a drop-in replacement,” he said. “The price for GPT-4 is [about] $4.40 in blended terms. And Llama 3.3 is like 60 cents. We're about 60 cents, right? So you reduce cost by almost an order of magnitude. And if you use Cerebras, you increase speed by another order of magnitude.”
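At the quoted figures, $4.40 versus roughly $0.60 works out to about a 7x cost reduction, which Wang rounds to “almost an order of magnitude.” And because inference providers commonly expose OpenAI-compatible chat APIs, the “drop-in replacement” can be as small as changing a base URL and a model name. Here is a minimal sketch under that assumption; the endpoint URL, environment variable, and model ID are illustrative, not details from the article:

```python
# Minimal sketch of the "drop-in replacement": identical OpenAI client
# code, re-pointed at an assumed OpenAI-compatible Cerebras endpoint.
# Base URL, env var name, and model ID are illustrative assumptions.
import os
from openai import OpenAI

# Before: client = OpenAI(api_key=...), model="gpt-4"
client = OpenAI(
    base_url="https://api.cerebras.ai/v1",   # assumed endpoint
    api_key=os.environ["CEREBRAS_API_KEY"],  # provider key instead of OpenAI's
)

response = client.chat.completions.create(
    model="llama-3.3-70b",  # assumed model ID, swapped in for "gpt-4"
    messages=[{"role": "user", "content": "Draft a one-line release note."}],
)
print(response.choices[0].message.content)
```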

Inside the tornado-proof data centers built for AI resilience
The company is making substantial investments in resilient infrastructure as part of its expansion. Its Oklahoma City facility, scheduled to come online in June 2025, is designed to withstand extreme weather events.
“Oklahoma, as you know, is kind of a tornado zone. So this data center is actually rated and designed to be fully resistant to tornadoes and seismic activity,” Wang said. “It will withstand the strongest tornado ever recorded. If that thing comes through, this thing will just keep sending Llama tokens to developers.”
The Oklahoma City facility, operated in partnership with Scale Datacenter, will house more than 300 Cerebras CS-3 systems and features triple-redundant power stations and custom water-cooling solutions specifically designed for Cerebras' wafer-scale systems.


From skepticism to market leadership: how Cerebras is proving its value
The expansion and partnerships announced today represent a significant milestone for Cerebras, which has worked to prove itself in an AI hardware market dominated by Nvidia.
“I think the healthy skepticism about customer adoption, maybe back when we first launched, is now fully put to bed, just given the diversity of logos we have,” Wang said.
The company is targeting three specific areas where fast inference provides the most value: real-time voice and video processing, reasoning models, and coding applications.
“Coding is one of these kinds of in-between reasoning and regular Q&A that takes maybe 30 seconds to a minute to generate all the code,” Wang said. “Speed is directly proportional to developer productivity. So having speed there matters.”
By focusing on high-speed inference rather than competing across all AI workloads, Cerebras has found a niche where it can claim leadership over even the largest cloud providers.
“Nobody generally competes against AWS and Azure on their scale. We're obviously not reaching full scale like them, but being able to replicate a key segment… on the high-speed inference front, we will have more capacity than them,” Wang said.

Why Cerebras' US-centric expansion matters for AI sovereignty and future workloads
The expansion comes at a time when the AI industry is increasingly focused on inference capabilities, as businesses move from experimenting with generative AI to deploying it in production applications where speed and cost-efficiency are critical.
With 85% of its inference capacity located in the United States, Cerebras is also positioning itself as a key player in advancing domestic AI infrastructure at a time when technological sovereignty has become a national priority.
“Cerebras is turbocharging the future of U.S. AI leadership with unmatched performance, scale and efficiency,” said Dhiraj Mallick, COO of Cerebras Systems, in the company's announcement. “These new global data centers will serve as the backbone for the next wave of AI innovation.”
As reasoning models like DeepSeek R1 and OpenAI's o3 become more widespread, demand for faster inference solutions is expected to grow. These models, which can take minutes to generate answers on conventional hardware, run near-instantaneously on Cerebras systems, according to the company.
For technical decision-makers evaluating AI infrastructure options, Cerebras' expansion represents a significant new alternative to GPU-based solutions, particularly for applications where response time is critical to user experience.
Whether the company can truly challenge Nvidia's dominance of the broader AI market remains to be seen, but its focus on high-speed inference and substantial infrastructure investment demonstrates a clear strategy to carve out a valuable segment of the rapidly evolving AI landscape.