Amid Shortage, AWS Renting Out Machine Learning GPUs

Amazon Web Services is now letting customers rent clusters of machine learning-capable GPUs from its cloud, giving businesses the opportunity to run short-term AI workloads at a time when AI processing power is in short supply.

Hitting general availability this week, the new Elastic Compute Cloud (EC2) Capacity Blocks for ML offering lets AWS customers "reserve hundreds of NVIDIA GPUs colocated in Amazon EC2 UltraClusters designed for high-performance ML workloads," the cloud giant said in its announcement Tuesday. 

Users can make reservations to use these specialized GPUs up to two months in advance, for time slots as short as one day or as long as 14 days. They can also specify the cluster size they need, anywhere from one to 64 instances.
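The reservation constraints above (one to 64 instances, one- to 14-day slots, booked up to two months ahead) can be sketched as a small validation helper. This is an illustrative sketch only, not AWS's actual API; the function and parameter names are assumptions, though the dictionary keys loosely mirror the style of EC2 request parameters.

```python
from datetime import datetime, timedelta, timezone

# Roughly two months, per the reservation window AWS describes (assumption).
MAX_ADVANCE = timedelta(days=62)

def build_capacity_block_request(instance_count: int, duration_days: int,
                                 start: datetime) -> dict:
    """Validate and assemble parameters for a hypothetical Capacity Block
    reservation, enforcing the limits described in the announcement."""
    if not 1 <= instance_count <= 64:
        raise ValueError("cluster size must be between 1 and 64 instances")
    if not 1 <= duration_days <= 14:
        raise ValueError("duration must be between 1 and 14 days")
    if start - datetime.now(timezone.utc) > MAX_ADVANCE:
        raise ValueError("reservations open at most ~2 months in advance")
    return {
        # P5 instances back the offering; exact type string is an assumption.
        "InstanceType": "p5.48xlarge",
        "InstanceCount": instance_count,
        "CapacityDurationHours": duration_days * 24,
        "StartDate": start.isoformat(),
    }
```

A request for an eight-GPU-instance cluster for one week, starting in three days, would pass validation; asking for 100 instances or a 30-day slot would be rejected.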

This model means businesses can avoid committing to months-long contracts to access the kind of compute capacity that AI and machine learning workloads require; they only pay for the compute power they need to use, when they need to use it.

These rentable machine learning GPUs are "ideal for completing training and fine tuning ML models, short experimentation runs, and handling temporary future surges in inference demand to support customers' upcoming product launches as generative applications become mainstream," AWS says.

Under the hood, the GPUs run on AWS' high-capacity P5 compute instances, and are "interconnected with second-generation Elastic Fabric Adapter (EFA) petabit-scale networking, delivering low-latency, high-throughput connectivity, enabling customers to scale up to hundreds of GPUs." 

Chipmakers have struggled to make enough GPUs to meet skyrocketing demand for generative AI workloads, leading to a months-long industrywide chip shortage. AWS' partnership with Nvidia, whose share of the entire GPU market is north of 80 percent, makes it well-positioned to extend machine learning capabilities to customers that would otherwise have to wait an indefinite amount of time.

"With AWS's new EC2 Capacity Blocks for ML, the world's AI companies can now rent H100 not just one server at a time but at a dedicated scale uniquely available on AWS," said Nvidia HPC chief Ian Buck, "enabling them to quickly and cost-efficiently train large language models and run inference in the cloud exactly when they need it."

More information on EC2 Capacity Blocks for ML, including pricing, is available on AWS' website.

About the Author

Gladys Rama (@GladysRama3) is the editorial director of Converge360.

