AWS Announces Tools To Manage Machine Learning Workloads
Amazon Web Services (AWS) announced new tools at re:Invent this week in a bid to reduce the time, compute resources and costs required to train machine learning models.
For starters, the company is preparing new GPU instances optimized for large-scale machine learning training. The upcoming P3dn.24xlarge instances, expected to launch in the first week of December, will be powered by eight NVIDIA Tesla V100 GPUs, each with 32GB of memory, and will support network throughput of up to 100Gbps. They will also have 96 Intel Xeon Skylake vCPUs.
"The faster networking, new processors, doubling of GPU memory, and additional vCPUs enable developers to significantly lower the time to train their ML models or run more HPC simulations by scaling out their jobs across several instances (e.g., 16, 32 or 64 instances)," according to AWS. "[I]n addition to increasing the throughput of passing data between instances, the additional network throughput of P3dn.24xlarge instances can also be used to speed up access to large amounts of training data by connecting to Amazon S3 or shared file systems solutions such as Amazon EFS."
Additionally, for developers working with TensorFlow, AWS said it has made changes to the framework that improve its ability to scale across multiple GPUs, making machine learning training tasks more resource-efficient. The changes are now generally available.
"By improving the way in which TensorFlow distributes training tasks across those GPUs, the new AWS-Optimized TensorFlow achieves close to linear scalability when training multiple types of neural networks (90 percent efficiency across 256 GPUs, compared to the prior norm of 65 percent)," AWS said in an announcement.
Another new service that's now generally available is Amazon Elastic Inference. Inference refers to the process in which a machine learning model makes predictions about new data using what it learned during the earlier training stage. The resource requirements and costs of inference can be significantly higher than those for training, according to AWS. Elastic Inference gives developers more options to manage the amount of compute power they buy, potentially cutting their inference spending by as much as 75 percent, according to the AWS announcement.
"Instead of running on a whole Amazon EC2 P2 or P3 instance with relatively low utilization, developers can run on a smaller, general-purpose Amazon EC2 instance and provision just the right amount of GPU performance from Amazon Elastic Inference," the company said. Starting at just 1 TFLOP, developers can elastically increase or decrease the amount of inference performance, and only pay for what they use."
For heavier workloads, AWS is developing a new inference processor called AWS Inferentia. Due in 2019, Inferentia is a dedicated inference chip that promises to be cost-effective, with low latency and high throughput capacity.
"Each chip provides hundreds of TOPS (tera operations per second) of inference throughput to allow complex models to make fast predictions," according to the AWS product page. "For even more performance, multiple AWS Inferentia chips can be used together to drive thousands of TOPS of throughput."