Amazon is investing more in its own AI chips as demand for faster, cheaper computing grows. Its Inferentia chips were designed with exactly that in mind: they power Amazon EC2 Inf1 and Inf2 instances, which are built to run deep learning and generative AI workloads at scale.
The company said that the first Inferentia generation can reach up to 2.3x higher throughput and up to 70% lower cost per inference than comparable EC2 instances. Customers such as Finch AI, Sprinklr, Money Forward and Amazon Alexa have already used these instances to cut running costs.
Inferentia2, the newer version, lifts performance once again. Amazon said it delivers up to 4x higher throughput and up to 10x lower latency than the earlier model. It also comes with stronger memory support, reaching 32GB of HBM per chip, which is four times the previous amount. This helps customers run far larger models, from language systems to image generators.
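For context on what adopting the chips involves: Inf1 and Inf2 capacity is requested like any other EC2 instance type. Below is a minimal sketch using boto3; the AMI ID is a placeholder (in practice you would pick a Neuron-enabled Deep Learning AMI for your region), and the region and instance size are assumptions for illustration.

```python
import boto3

# Minimal sketch: launch a single Inf2 instance with boto3.
ec2 = boto3.client("ec2", region_name="us-east-1")  # assumed region

response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder Neuron-enabled DLAMI
    InstanceType="inf2.xlarge",       # smallest Inf2 size
    MinCount=1,
    MaxCount=1,
)
print(response["Instances"][0]["InstanceId"])
```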
How Do Inferentia Chips Support Developers?
Amazon pairs the hardware with its Neuron SDK so developers can run existing models without rebuilding everything from scratch. Neuron works with PyTorch and TensorFlow, which lets teams keep their usual workflows.
The toolkit automatically casts high-precision FP32 models into lower-precision formats such as FP16, BF16, INT8 and, on Inferentia2, the newer FP8 option. Amazon said this shortens the path to production because teams do not need to manually retrain every model for lower precision. The chips also support dynamic input sizes, custom C++ operators and stochastic rounding, which helps preserve accuracy at reduced precision.
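As a rough sketch of that workflow, the example below compiles a stock PyTorch model with torch-neuronx, the Neuron package that targets Inferentia2 (Inf1 uses the older torch-neuron package). The model choice and input shape here are arbitrary assumptions, and details may vary across Neuron releases.

```python
import torch
import torch_neuronx  # part of the AWS Neuron SDK (Inf2/Trn1)
from torchvision.models import resnet50

# An ordinary PyTorch model -- no Neuron-specific rewrites needed.
model = resnet50(weights=None).eval()
example = torch.rand(1, 3, 224, 224)  # assumed input shape

# Compile for the NeuronCores; the result behaves like a TorchScript module.
neuron_model = torch_neuronx.trace(model, example)
torch.jit.save(neuron_model, "resnet50_neuron.pt")
```

On an Inf2 instance, the saved file can then be loaded with torch.jit.load and called like the original model.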
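Stochastic rounding deserves a brief explanation. When values are squeezed into a lower-precision format, always rounding to the nearest representable number biases results in one direction; rounding up or down at random, weighted by the fractional part, keeps errors centred on zero. A toy NumPy sketch of the idea (not Neuron's actual implementation):

```python
import numpy as np

def stochastic_round(x: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Round down or up at random, weighted by the fractional part,
    so rounding errors average out instead of accumulating."""
    lower = np.floor(x)
    frac = x - lower
    return lower + (rng.random(x.shape) < frac)

rng = np.random.default_rng(seed=0)
x = np.full(100_000, 0.3)
# Round-to-nearest would map every 0.3 to 0.0, shifting the mean by 0.3;
# stochastic rounding keeps the mean of the rounded values near 0.3.
print(stochastic_round(x, rng).mean())  # ~0.3
```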
This support broadens the range of tasks the chips can handle. Amazon said customers use Inferentia for language work, speech recognition, image generation, fraud detection and many other real-world uses.
How Does Trainium3 Connect To Amazon’s Chip Plans?
Alongside Inferentia, Amazon is raising the ceiling on training power with its Trainium3 UltraServers. At its re:Invent event, Amazon announced that the new system packs up to 144 Trainium3 chips built on a 3nm process. The company said this delivers up to 4.4x more compute performance and 4x better energy efficiency than Trainium2 UltraServers.
Customers such as Anthropic, Karakuri, Metagenomi, NetoAI, Ricoh and Splash Music are using Trainium chips to trim training and inference costs by up to 50%. Decart is running real-time generative video four times faster and at half the price of GPU-based instances. Amazon Bedrock is already serving live workloads on Trainium3.
Trainium3 also delivers 3x higher throughput per chip and 4x faster response times. Amazon said these gains cut training times from months to weeks, which can bring new AI products to customers far sooner.
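To put the months-to-weeks claim in rough perspective, a back-of-the-envelope sketch: the baseline figure below is hypothetical, and the calculation assumes the headline 4.4x compute gain applies end to end, which real workloads rarely achieve.

```python
# Hypothetical back-of-the-envelope only: real training jobs are rarely
# perfectly compute-bound, so the headline speedup is an upper bound.
baseline_weeks = 16   # assumed Trainium2 training run (~4 months)
speedup = 4.4         # Amazon's headline compute gain for Trainium3
print(f"~{baseline_weeks / speedup:.1f} weeks")  # ~3.6 weeks
```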
What Else Came From Amazon’s AI Announcements?
Amazon framed its AI chip progress as part of a larger push during re:Invent. The company unveiled new Nova models and opened Nova Forge, which gives customers access to model checkpoints so they can mix their own data with Amazon’s curated sets. Reddit and Hertz are already using these tools to speed up automation and development work.
It also introduced frontier agents built to work for hours or days without human input. These agents cover software development, security and DevOps tasks. Early users such as Commonwealth Bank of Australia and SmugMug have already used them to streamline work inside their teams.
The company also announced AWS AI Factories to bring Trainium chips, NVIDIA GPUs, and Bedrock services directly into customer data centres. HUMAIN in Saudi Arabia plans to build an “AI Zone” with up to 150,000 AI chips using this setup, which shows how far Amazon wants its hardware to reach.