AWS Lake Formation Simplifies, Automates Data Lakes for Analytics

Amazon Web Services Inc. (AWS) has made AWS Lake Formation generally available, helping organizations simplify and automate the creation and management of data lakes.

Part of the Big Data analytics movement, data lakes provide a location to store data of various types -- structured or unstructured, in different formats, and so on -- for business-driven analytics, increasingly aided by machine learning.

In typical scenarios, however, creating and managing a data lake takes many manual steps. AWS Lake Formation is designed to handle tasks such as collecting, cleaning, and cataloging data, while also securely making it available for analytics.

"AWS Lake Formation significantly simplifies the process and removes the heavy lifting from setting up a data lake," AWS said in a news release. "AWS Lake Formation automates manual, time-consuming steps, like provisioning and configuring storage, crawling the data to extract schema and metadata tags, automatically optimizing the partitioning of the data, and transforming the data into formats like Apache Parquet and ORC that are ideal for analytics. AWS Lake Formation cleans and deduplicates data using machine learning to improve data consistency and quality."

Figure: AWS Lake Formation (source: AWS)

The new service, now graduating from preview, works with several other AWS services for analytics and other tasks, including Amazon Redshift (data warehouse), Amazon Athena (serverless interactive query service) and AWS Glue (extract, transform, and load [ETL] service). Amazon S3 buckets are commonly used for storage, for example. Support for Apache Spark analytics with Amazon EMR will follow over the next few months, along with Amazon QuickSight and Amazon SageMaker support.
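To make the access-control side of this concrete, here is a minimal Python sketch, not taken from AWS's announcement, of how an S3 location might be registered with the service and a query role granted read access to one catalog table. It assumes the boto3 `lakeformation` client and its `register_resource` and `grant_permissions` calls. The bucket, role, database, and table names are hypothetical placeholders, and the `apply` helper needs boto3 and AWS credentials, so only the request-building functions run locally.

```python
"""Sketch: register an S3 location and grant table access via Lake Formation.

All ARNs, database names, and table names below are hypothetical placeholders.
"""


def build_register_request(bucket_arn):
    """Build arguments for register_resource (put an S3 location under Lake Formation)."""
    return {"ResourceArn": bucket_arn, "UseServiceLinkedRole": True}


def build_grant_request(principal_arn, database, table, permissions=("SELECT",)):
    """Build arguments for grant_permissions (let a principal use one catalog table)."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {"Table": {"DatabaseName": database, "Name": table}},
        "Permissions": list(permissions),
    }


def apply(register_request, grant_request, region="us-east-1"):
    """Send both requests. Requires boto3 and AWS credentials; not called below."""
    import boto3  # imported lazily so the builders above work without it

    client = boto3.client("lakeformation", region_name=region)
    client.register_resource(**register_request)
    client.grant_permissions(**grant_request)


if __name__ == "__main__":
    register = build_register_request("arn:aws:s3:::example-data-lake-bucket")
    grant = build_grant_request(
        "arn:aws:iam::123456789012:role/ExampleAnalystRole",
        database="example_sales_db",
        table="orders",
    )
    # Inspect the requests before sending them with apply(register, grant).
    print(register)
    print(grant)
```

Keeping the request builders separate from the call that contacts AWS lets the permissions be reviewed or unit-tested before anything changes in a real account.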

AWS Lake Formation, which doesn't incur any extra charges beyond the AWS services used with it, is available in the US East (N. Virginia), US East (Ohio), US West (Oregon), Europe (Ireland) and Asia Pacific (Tokyo) regions.

A new blog post details how to create and set up a data lake with the new service. More information is available in the "Data Lakes and Analytics on AWS" site, the "What Is a Data Lake?" article, a "Data Lake Foundation on AWS" quick start, and the AWS Lake Formation site, which includes a FAQ and other resources.

About the Author

David Ramel is an editor and writer for Converge360.

