- AWS has launched new AI chips that compete with Nvidia’s GPUs.
- AWS says its goal is to offer more choices to customers, not to dethrone Nvidia in the AI chip market.
- Gadi Hutt, senior director at AWS, also talked about partnerships with Intel, Anthropic and AMD.
Amazon Web Services launched an upgraded line of AI chips this week, putting the company squarely in competition with Nvidia.
Except AWS doesn’t see it that way.
AWS’s new AI chips are not meant to eat Nvidia’s lunch, said Gadi Hutt, senior director of customer and product engineering at Annapurna Labs, the company’s chip design subsidiary. The goal is to give customers a lower-cost option because the market is large enough to accommodate multiple vendors, Hutt told Business Insider in an interview at AWS’s re:Invent conference.
“It’s not about overthrowing Nvidia,” Hutt said, adding, “It’s really about giving customers choice.”
AWS has spent tens of billions of dollars on generative AI. This week, the company unveiled its most advanced AI chip, called Trainium 2, which can cost about 40% less than Nvidia’s GPUs, as well as a new supercomputer cluster using those chips, called Project Rainier. Earlier versions of AWS’s AI chips had mixed results.
Hutt insists this is not a competition but a joint effort to increase the overall size of the market. The customer profiles and AI workloads they target are also different. He added that Nvidia GPUs would remain dominant for the foreseeable future.
In the interview, Hutt discussed AWS’s partnership with Anthropic, which is expected to be Project Rainier’s first customer. The two companies have worked closely together over the past year, and Amazon recently invested an additional $4 billion in the AI startup.
He also shared his thoughts on AWS’s partnership with Intel, whose CEO, Pat Gelsinger, just retired. He said AWS would continue to work with the struggling chip giant because customer demand for Intel server chips remained high.
Last year, AWS announced plans to sell AMD’s new AI chips. But Hutt said these chips were still not available on AWS because customers had not shown high demand.
These questions and answers have been edited for clarity and length.
There were a lot of headlines saying Amazon is going after Nvidia with its new AI chips. Can you talk about that?
I usually look at these headlines and laugh a little because, in reality, it’s not about toppling Nvidia. Nvidia is a very important partner for us. It’s really about giving customers choices.
We have a lot of work ahead of us to ensure that we continually give more customers the ability to use these chips. And Nvidia isn’t going anywhere. They have a good solution and a solid roadmap. We just announced P6 instances (AWS servers powered by Nvidia’s latest Blackwell GPUs), so there is also continued investment in the Nvidia product line. It’s really about giving customers options. Nothing more.
Nvidia is a great partner for AWS, and our customers love Nvidia. I would not discount Nvidia in any way.
So you want to see Nvidia’s use case grow on AWS?
If customers think this is the way to go, they will do it. Of course, if it’s good for customers, it’s good for us.
The market is very large, so there’s room for multiple vendors. We’re not forcing anyone to use these chips, but we’re working very hard to make sure our core principles of high performance and low cost pay off for our customers.
Does this mean AWS is content with second place?
It’s not a competition. There is no machine learning awards ceremony every year.
In the case of a client like Anthropic, there is very clear scientific evidence that larger compute infrastructure allows you to build larger models with more data. And if you do that, you get greater accuracy and more performance.
Our ability to scale capacity to hundreds of thousands of Trainium 2 chips gives them the ability to innovate in ways they couldn’t before. Their productivity increases fivefold.
Is being #1 important?
The market is big enough. No. 2 is a very good position.
I’m not saying I’m No. 2 or No. 1, for that matter. But it’s really not something I think about. We’re so early in our journey in machine learning in general, industry in general, and also chips in particular, that we’re just serving customers like Anthropic, Apple and all the others.
We don’t even do competitive analysis with Nvidia. I don’t benchmark against Nvidia. I don’t need it.
For example, there is MLPerf, an industry performance benchmark. Companies that participate in MLPerf have performance engineers who work solely to improve MLPerf numbers.
It’s a complete distraction for us. We don’t participate because we don’t want to waste time on a benchmark that isn’t customer-focused.
At first glance, it seems that helping businesses grow on AWS isn’t always beneficial to AWS’s own products because you’re competing against them.
We’re the same company that’s the best place for Netflix to run, and we also have Prime Video. It’s part of our culture.
I will say that there are a lot of customers that are still using GPUs. Many customers love GPUs and have no plans to upgrade to Trainium anytime soon. And that’s great, because, again, we give them a choice and they decide what they want to do.
Do you think these AI tools will become increasingly commonplace in the future?
I really hope so.
When we started this in 2016, the problem was that there was no operating system for machine learning. So we really had to invent all the tools around these chips to make them work as seamlessly as possible for our customers.
If machine learning becomes mainstream on both the software and hardware side, that’s a good thing for everyone. This means that it is easier to use these solutions. But executing machine learning in a meaningful way remains an art.
What are the different types of workloads customers might want to run on GPUs rather than Trainium?
GPUs are more of a general-purpose machine learning processor. All the researchers and data scientists in the world know how to use Nvidia quite well. If you invent something new, if you do it on GPU, then things will work.
If you invent something new on specialized chips, you’ll either need to make sure the compiler technology understands what you just built, or you’ll need to create your own compute kernel for that workload. We’re primarily focused on use cases where our customers tell us, “Hey, this is what we need.” The customers who come to us are usually those who see rising costs as a problem and are looking for alternatives.
So the most advanced workloads are generally reserved for Nvidia chips?
Generally. If data scientists need to run experiments constantly, they will likely do so on a GPU cluster. When they know what they want to do, that’s when they have more options. This is where Trainium really shines, as it delivers high performance at a lower cost.
AWS CEO Matt Garman has previously said that the vast majority of workloads will continue to be on Nvidia.
That makes sense. We value high-spending customers and try to see how they can better control costs. When Matt talks about the majority of workloads, that means medical imaging, speech recognition, weather forecasting, and all kinds of workloads that we’re not really focused on right now, because we have big customers asking us to do bigger things. So that statement is 100% correct.
In a nutshell, we want to continue to be the best place for GPUs and, of course, Trainium when customers need it.
What has Anthropic done to help AWS in AI?
They have very strong opinions about what they need, and they come back to us and say, “Hey, can we add feature A to your future chip?” It’s a dialogue. Some ideas they came up with weren’t even feasible to implement in a piece of silicon. We actually implemented some ideas, and for others we came back with a better solution.
Because they are such experts at building foundation models, it really helps us build chips that are really good at what they do.
We just announced Project Rainier together. This is a customer that wants to use a lot of these chips as quickly as possible. It’s not just an idea; we’re actually building it.
Can you talk about Intel? AWS’s Graviton chips are replacing many Intel chips in AWS data centers.
I’ll correct you there. Graviton is not replacing x86. It’s not that we’re removing x86 and putting Graviton in its place. But, again, driven by customer demand, more than 50% of our recent CPU capacity landed has been Graviton.
This means that customer demand for Graviton is increasing. But we still sell a lot of x86 cores to our customers, and we think we’re the best place to do it. We don’t compete with these companies, but we treat them as good suppliers and we have a lot of business to do together.
How important is Intel to the future?
They will definitely continue to be a great partner for AWS. There are many use cases that work very well on Intel cores. We are still in the process of rolling them out. There is no intention of stopping. It’s really about following customer demand.
Is AWS still considering selling AMD’s AI chips?
AMD is a great partner for AWS. We sell many AMD processors to our customers as instances.
The machine learning product line is still under review. If customers make it clear they need it, there’s no reason not to deploy it.
And you don’t see that yet for AMD’s AI chips?
Not yet.
How supportive are Amazon CEO Andy Jassy and Garman of the AI chip business?
They are very supportive. We meet them regularly. The company’s management is working to ensure that customers who need ML solutions get them.
There is also a lot of collaboration within the company, with science and service teams building solutions on these chips. Other Amazon products, like Rufus, the AI assistant available to all Amazon customers, run entirely on Inferentia and Trainium chips.
Do you work at Amazon? Have a tip?
Contact reporter Eugene Kim via the encrypted messaging apps Signal or Telegram (+1-650-942-3061) or by email (ekim@businessinsider.com). Use a nonwork device. Check out Business Insider’s source guide for more tips on sharing information securely.