AWS-Nvidia Partnership Takes Multiple Steps Forward at GTC

Amazon Web Services this week described plans to bolster its AI infrastructure using new Nvidia technologies announced at the chip giant's ongoing GTC conference.

"AI is driving breakthroughs at an unprecedented pace, leading to new applications, business models, and innovation across industries," said Nvidia CEO Jensen Huang in a prepared statement. "Our collaboration with AWS is accelerating new generative AI capabilities and providing customers with unprecedented computing power to push the boundaries of what's possible."

The two companies have been partners for many years, but their recent efforts have centered on building out their respective AI and machine learning infrastructures by integrating each other's technologies. With the launch of Nvidia's new Blackwell GPU platform at GTC this week, AWS stands to gain significantly more compute power to drive its AI efforts.

Project Ceiba
For instance, AWS' supercomputer project, dubbed "Ceiba," will run on the new GB200 NVL72 technology from Nvidia. AWS first unveiled Ceiba at last year's re:Invent conference, touting it as the "world's fastest GPU-powered AI supercomputer." Ceiba is targeted for heavy AI workloads, including those used for weather forecasting, robotics, advanced LLMs, autonomous cars and more.

Originally, Ceiba was intended to run on Nvidia's older Hopper chips. The use of the newer Blackwell chips, however, promises to increase performance sixfold.

A "first-of-its-kind supercomputer with 20,736 B200 GPUs is being built using the new NVIDIA GB200 NVL72, a system featuring fifth-generation NVLink connected to 10,368 NVIDIA Grace CPUs," AWS said of Ceiba in its announcement Tuesday. "The system scales out using fourth-generation EFA networking, providing up to 800 Gbps per Superchip of low-latency, high-bandwidth networking throughput -- capable of processing a massive 414 exaflops of AI."
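
For a rough sense of how those figures relate, the short Python sketch below runs the arithmetic. The GPU, CPU and exaflop counts come from the AWS announcement; the per-system GPU count of 72 is an assumption based on the "NVL72" product name and is not stated in the announcement, and the function name is purely illustrative.

```python
# Figures quoted in the AWS announcement.
B200_GPUS = 20_736
GRACE_CPUS = 10_368
TOTAL_AI_EXAFLOPS = 414

# Assumption (not stated in the announcement): the "72" in NVL72
# is the number of GPUs in one GB200 NVL72 system.
GPUS_PER_NVL72 = 72


def ceiba_breakdown():
    """Return (GPUs per Grace CPU, implied NVL72 systems, petaflops per GPU)."""
    gpus_per_cpu = B200_GPUS / GRACE_CPUS
    nvl72_systems = B200_GPUS / GPUS_PER_NVL72
    # 1 exaflop = 1,000 petaflops
    petaflops_per_gpu = TOTAL_AI_EXAFLOPS * 1000 / B200_GPUS
    return gpus_per_cpu, nvl72_systems, petaflops_per_gpu


if __name__ == "__main__":
    gpus_per_cpu, systems, pf_per_gpu = ceiba_breakdown()
    print(f"GPUs per Grace CPU:    {gpus_per_cpu:.0f}")
    print(f"Implied NVL72 systems: {systems:.0f}")
    print(f"Petaflops per GPU:     {pf_per_gpu:.2f}")
```

Under that assumption, the quoted totals work out to two B200 GPUs per Grace CPU, 288 NVL72 systems and roughly 20 petaflops of AI compute per GPU. The announcement does not say which numeric precision the 414-exaflop figure refers to, so the per-GPU number is only a back-of-the-envelope estimate.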

AWS customers will also be able to tap into the new Blackwell chips via Elastic Compute Cloud (EC2) instances.

"AWS plans to offer EC2 instances featuring the new B100 GPUs deployed in EC2 UltraClusters for accelerating generative AI training and inference at massive scale," said AWS. "GB200s will also be available on NVIDIA DGX Cloud, an AI platform co-engineered on AWS, that gives enterprise developers dedicated access to the infrastructure and software needed to build and deploy advanced generative AI models."

At re:Invent last year, AWS announced it would host Nvidia's DGX Cloud AI-training-as-a-service platform on its cloud.

Nvidia's new Blackwell technology will also enable more secure AI workloads in AWS by combining the GB200 chip with Amazon's Nitro hypervisor technology.

"The combination of the AWS Nitro System and the NVIDIA GB200 takes AI security even further by preventing unauthorized individuals from accessing model weights," said AWS. "The GB200 allows inline encryption of the NVLink connections between GPUs, and encrypts data transfers, while EFA encrypts data across servers for distributed training and inference."

AWS CEO Adam Selipsky touted his company's GTC announcements as the natural extension of its partnership with Nvidia, which has spanned more than a decade.

"Today we offer the widest range of NVIDIA GPU solutions for customers," he said. "NVIDIA's next-generation Grace Blackwell processor marks a significant step forward in generative AI and GPU computing."

About the Author

Gladys Rama (@GladysRama3) is the editorial director of Converge360.

