AWS Debuts Glue To Automate ETL Jobs
A new Amazon Web Services (AWS) solution promises to reduce the time it takes an organization to sort through its data for analytics projects.
The company announced the general availability of AWS Glue on Monday at the AWS Summit event in New York City. During the keynote presentation, Matt Wood, general manager of artificial intelligence at AWS, described the new service as an extract, transform and load (ETL) solution that's fully managed and serverless.
Glue is designed to drastically reduce the amount of time organizations might typically spend refining their data before they're able to analyze it. What is usually a months-long process could take just minutes using Glue, AWS said in its announcement.
"Data integration -- extracting data from various sources, normalizing it, and loading it into data stores -- often represents as much as 75 percent of the time required to implement an analytics project," according to the company. "Customers can spend months hand coding and editing ETL scripts, which frequently become more complex and error prone as data volumes grow, and new data sources are added."
Glue simplifies this process by automating data integration. It first deploys "crawlers" across an organization's AWS resources to discover and categorize its data and metadata, then builds an editable, sortable data catalog from the information gathered.
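To make the crawler-and-catalog step more concrete, here is a minimal plain-Python sketch of the underlying idea: inspect sample records from each data source, infer a column-to-type schema and collect the results into a catalog. It does not use any AWS API. The functions `infer_schema` and `build_catalog`, the type names and the "fall back to string on conflicting types" rule are all hypothetical simplifications, not how Glue's crawlers actually classify data.

```python
from typing import Any

# Hypothetical mapping from Python value types to catalog type names
# (an assumption for illustration; real crawlers use their own classifiers).
TYPE_NAMES = {bool: "boolean", int: "bigint", float: "double", str: "string"}


def infer_schema(records: list[dict[str, Any]]) -> dict[str, str]:
    """Infer a column -> type mapping from sample records, as a crawler might."""
    schema: dict[str, str] = {}
    for record in records:
        for column, value in record.items():
            # Unknown value types (e.g. None) are treated as strings.
            type_name = TYPE_NAMES.get(type(value), "string")
            previous = schema.get(column)
            if previous is None:
                schema[column] = type_name
            elif previous != type_name:
                # Conflicting types across records: fall back to string.
                schema[column] = "string"
    return schema


def build_catalog(sources: dict[str, list[dict[str, Any]]]) -> dict[str, dict[str, str]]:
    """Build a simple catalog: one table entry (name -> schema) per data source."""
    return {name: infer_schema(rows) for name, rows in sorted(sources.items())}


# Usage: two hypothetical data sources with a few sample records each.
catalog = build_catalog({
    "orders": [{"id": 1, "total": 9.5}, {"id": 2, "total": 12.0}],
    "customers": [{"id": 7, "name": "Ana", "active": True}],
})
print(catalog)
```

Keeping the catalog as a plain dictionary of table schemas mirrors the point in the article: once metadata has been gathered in one place, it can be browsed and edited before any transformation code is written.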
Next, the service generates customizable transformation code for that data. "Glue can automatically generate ETL scripts (in Python!) to translate your data from your source formats to your target formats," explained AWS developer evangelist Randall Hunt in a blog post.
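Glue-generated Python scripts rely on AWS's `awsglue` library (for example, its `ApplyMapping` transform), which only runs inside the Glue environment. The sketch below is a plain-Python stand-in for that kind of mapping step, assuming each mapping is a `(source_column, source_type, target_column, target_type)` tuple: it renames columns, casts values to the target type and drops unlisted columns. The `apply_mapping` function, the `CONVERTERS` table and the sample data are hypothetical and are not part of Glue's API.

```python
from typing import Any, Callable

# Converters for the target types used below (an assumption for illustration).
CONVERTERS: dict[str, Callable[[Any], Any]] = {
    "string": str,
    "bigint": int,
    "double": float,
}

# (source_column, source_type, target_column, target_type)
ColumnMapping = tuple[str, str, str, str]


def apply_mapping(rows: list[dict[str, Any]],
                  mappings: list[ColumnMapping]) -> list[dict[str, Any]]:
    """Rename, cast and project columns; columns not listed are dropped."""
    result = []
    for row in rows:
        new_row: dict[str, Any] = {}
        for src_col, _src_type, dst_col, dst_type in mappings:
            if src_col in row:
                # Cast the source value to the target type and store it
                # under the target column name.
                new_row[dst_col] = CONVERTERS[dst_type](row[src_col])
        result.append(new_row)
    return result


# Usage: rows read from a hypothetical CSV source arrive as strings.
source_rows = [
    {"order_id": "1", "amt": "9.50", "note": "gift"},
    {"order_id": "2", "amt": "12", "note": ""},
]
mappings = [
    ("order_id", "string", "id", "bigint"),
    ("amt", "string", "total", "double"),
]
target_rows = apply_mapping(source_rows, mappings)
print(target_rows)
```

Expressing the transformation as a list of mappings, rather than hand-written conversion code, is what makes a generated script easy to customize: editing one tuple changes a column's name or type.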
Users can then schedule one or more ETL jobs to run consecutively, on a recurring basis or on demand. Glue automatically scales resources up or down as the workload requires.
Pricing details for Glue are available on the AWS website.
Gladys Rama is the senior site producer for Redmondmag.com, RCPmag.com and MCPmag.com.