Amazon QuickSight Gets Presto, Apache Spark Connectors for Big Data Visualization

Amazon Web Services Inc. (AWS) beefed up its Big Data visualization capabilities by adding two new connectors, for Presto and Apache Spark, to its Amazon QuickSight service.

Amazon QuickSight is a business analytics service that provides visualization, ad-hoc analysis and other business insight functionality. AWS claims it delivers business intelligence (BI) capabilities at one-tenth the cost of traditional solutions.

It lets developers tie into data sources of various types, including CSV and Excel files; popular databases such as SQL Server, MySQL and PostgreSQL; and a host of AWS services, including Amazon Redshift, Amazon RDS, Amazon Aurora, Amazon Athena, Amazon S3 and more.

With yesterday's announcement of the Presto and Spark connectors, those AWS sources now include Amazon EMR, a managed Hadoop framework for Big Data analytics across Amazon EC2 (compute) instances.

Apache Spark is a wildly popular open source tool famous for adding new capabilities to the original Hadoop-based ecosystem. The lesser-known Presto is an open source, distributed SQL query engine that data developers use to run interactive analytic queries against data sources ranging from gigabytes to petabytes in size.

Presto supports ANSI SQL standard operations such as complex queries, aggregations, joins and window functions.
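To give a feel for the kind of standard SQL involved, the sketch below runs an aggregation feeding a window function (a per-region running total). It uses Python's built-in sqlite3 module and an in-memory database purely as a local stand-in, since it needs no cluster; it is not Presto, and the `sales` table and its columns are hypothetical. The query itself sticks to standard SQL constructs (a CTE, `GROUP BY` and `SUM ... OVER`) of the sort the announcement says Presto supports.

```python
import sqlite3

# In-memory SQLite database used only as a local stand-in for a Presto catalog.
# The "sales" table is a hypothetical example, not from the AWS post.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, month INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("east", 1, 100.0),
        ("east", 1, 50.0),
        ("east", 2, 150.0),
        ("west", 1, 80.0),
        ("west", 2, 120.0),
        ("west", 2, 30.0),
    ],
)

# Aggregation (SUM ... GROUP BY) in a CTE, then a window function
# (SUM ... OVER) that computes a running total within each region.
query = """
WITH monthly AS (
    SELECT region, month, SUM(amount) AS monthly_total
    FROM sales
    GROUP BY region, month
)
SELECT region,
       month,
       monthly_total,
       SUM(monthly_total) OVER (PARTITION BY region ORDER BY month) AS running_total
FROM monthly
ORDER BY region, month
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
conn.close()
```

Because the window function runs over the already-aggregated monthly totals, each region's running total resets at the partition boundary: east reaches 300.0 by month 2, while west starts over at 80.0.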

"Presto's execution framework is fundamentally different from that of Hive/MapReduce," AWS said in a blog post yesterday. "Presto has a custom query and execution engine where the stages of execution are pipelined, similar to a directed acyclic graph (DAG), and all processing occurs in memory to reduce disk I/O. This pipelined execution model can run multiple stages in parallel and streams data from one stage to another as the data becomes available. This reduces end-to-end latency and makes Presto a great tool for ad hoc data exploration over large data sets. Presto can run on multiple data sources, including Amazon S3."
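One rough way to picture the pipelined, streaming behavior AWS describes is a chain of Python generators, where each stage starts working on a row as soon as the previous stage produces it instead of waiting for the full intermediate result. This is a single-process sketch of the idea only, not how Presto is built: Presto is a distributed Java engine whose stages also run in parallel across workers, which generators do not model. The stage names `scan`, `filter_positive` and `aggregate` are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# Shared trace so we can observe how work interleaves across stages.
trace: List[str] = []


def scan(rows: Iterable[Tuple[str, int]]) -> Iterator[Tuple[str, int]]:
    """Stage 1: emit source rows one at a time (a stand-in for reading from storage)."""
    for row in rows:
        trace.append(f"scan {row}")
        yield row


def filter_positive(rows: Iterator[Tuple[str, int]]) -> Iterator[Tuple[str, int]]:
    """Stage 2: forward each row as soon as it arrives, dropping non-positive values."""
    for key, value in rows:
        if value > 0:
            trace.append(f"filter {key}")
            yield key, value


def aggregate(rows: Iterator[Tuple[str, int]]) -> Dict[str, int]:
    """Stage 3: fold the incoming stream into per-key sums kept in memory."""
    totals: Dict[str, int] = {}
    for key, value in rows:
        totals[key] = totals.get(key, 0) + value
        trace.append(f"agg {key}")
    return totals


source = [("a", 3), ("b", -1), ("a", 4), ("b", 2)]
# Nothing runs until aggregate() starts pulling rows through the chain.
result = aggregate(filter_positive(scan(source)))
print(result)
print(trace)
```

The trace shows the first row reaching the aggregation stage before the scan has read the rest of the input, which is the latency benefit the blog post attributes to pipelining. A blocking, stage-at-a-time model would instead record every `scan` entry before any `filter` or `agg` entry.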

The post detailed how to create an EMR cluster, set up Presto and Lightweight Directory Access Protocol (LDAP) with Secure Sockets Layer (SSL), and use QuickSight to visualize the data.

About the Author

David Ramel is an editor and writer for Converge360.