Amazon Kinesis Analytics Debuts for SQL-Based Big Data Querying

Amazon Web Services Inc. (AWS) yesterday introduced Amazon Kinesis Analytics, a service that uses standard SQL to query streaming data.

Spokesperson Jeff Barr said the reason for the new service is simple. "We want you, whether you are a procedural developer, a data scientist, or a SQL developer, to be able to process voluminous clickstreams from Web applications, telemetry and sensor reports from connected devices, server logs, and more using a standard query language, all in real time!" Barr said in a blog post yesterday.

"Today I am happy to be able to announce the availability of Amazon Kinesis Analytics," Barr continued. "You can now run continuous SQL queries against your streaming data, filtering, transforming and summarizing the data as it arrives. You can focus on processing the data and extracting business value from it instead of wasting your time on infrastructure. You can build a powerful, end-to-end stream processing pipeline in 5 minutes without having to write anything more complex than a SQL query."

The tool uses the Kinesis Firehose and Kinesis Streams components to provide real-time analysis via SQL queries. Firehose automatically loads streaming data into AWS services such as S3 (cloud storage), Redshift (data warehouse) and Amazon Elasticsearch Service (a search and analytics engine). Streams, meanwhile, is used to build custom applications that work with streaming data for a variety of needs.

Kinesis Analytics follows a three-step workflow: configure an input stream from a console; write SQL queries with a built-in SQL editor and templates; and configure an output stream, specifying where the processed results should be loaded, such as the aforementioned S3, Redshift or Elasticsearch Service.
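
The shape of that three-step workflow can be sketched in a few lines of plain Python. This is only an illustration of the input, query, output pattern, not the Kinesis Analytics API: the function names, the sample sensor records and the temperature threshold are all hypothetical, and the "query" step is an ordinary generator standing in for a continuous SQL statement.

```python
from typing import Dict, Iterable, Iterator, List

Record = Dict[str, object]


def input_stream() -> Iterator[Record]:
    """Step 1 (hypothetical source): stands in for a configured input stream."""
    yield {"device": "sensor-1", "temp_c": 21.5}
    yield {"device": "sensor-2", "temp_c": 48.0}
    yield {"device": "sensor-1", "temp_c": 51.2}
    yield {"device": "sensor-3", "temp_c": 19.9}


def continuous_query(records: Iterable[Record], threshold_c: float) -> Iterator[Record]:
    """Step 2: filter and transform each record as it arrives.

    Roughly the idea behind a query such as
    SELECT device, temp_c * 9 / 5 + 32 AS temp_f ... WHERE temp_c > threshold.
    """
    for record in records:
        temp_c = float(record["temp_c"])
        if temp_c > threshold_c:
            yield {"device": record["device"], "temp_f": round(temp_c * 9 / 5 + 32, 1)}


def output_stream(results: Iterable[Record], sink: List[Record]) -> None:
    """Step 3 (hypothetical destination): a list stands in for S3 or Redshift."""
    for row in results:
        sink.append(row)


def run_pipeline(threshold_c: float = 45.0) -> List[Record]:
    """Wire the three steps together and return whatever reached the sink."""
    sink: List[Record] = []
    output_stream(continuous_query(input_stream(), threshold_c), sink)
    return sink


if __name__ == "__main__":
    for row in run_pipeline():
        print(row)
```

Because each step is a generator or a consumer of one, records flow through one at a time rather than being collected first, which is the property that distinguishes a continuous query from a batch query.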

Analytics tools can then be used to create alerts and respond to changing data, which is useful for IoT applications, for example. Built-in machine learning algorithms, exposed as SQL functions, support this with stream processing functionality such as anomaly detection, top-K analysis and approximate distinct items.
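
To make those three kinds of function concrete, here is a minimal Python sketch of what each one computes. It is not how the service implements them: the top-K and distinct counts below are exact rather than approximate, and the anomaly check is a simple z-score test swapped in for illustration. All function names and the sample data are hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev
from typing import Iterable, List, Sequence, Tuple


def top_k(items: Iterable[str], k: int) -> List[Tuple[str, int]]:
    """Return the k most frequent items with their counts (exact counting)."""
    return Counter(items).most_common(k)


def distinct_count(items: Iterable[str]) -> int:
    """Return the exact number of distinct items seen."""
    return len(set(items))


def anomalies(values: Sequence[float], z_threshold: float = 2.0) -> List[int]:
    """Return indexes of values more than z_threshold standard deviations from the mean."""
    if not values:
        return []
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > z_threshold]


if __name__ == "__main__":
    pages = ["/home", "/cart", "/home", "/help", "/home", "/cart"]
    print(top_k(pages, 2))
    print(distinct_count(pages))
    print(anomalies([10, 11, 9, 10, 10, 11, 9, 50]))
```

Exact versions like these need memory proportional to the number of distinct items, which is why streaming systems generally prefer approximate algorithms for the same questions.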

Like other AWS services, Kinesis Analytics infrastructure can be scaled up and down as needed and users pay for what they use.

Along with IoT scenarios, Kinesis Analytics can serve up personalized content for Web surfers based on clickstream data, or place appropriate ads in real time. The most common usage patterns, AWS said, are time-series analytics, real-time dashboards, and real-time alerts and notifications.
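
Time-series analytics over a stream usually means grouping events into fixed time windows and aggregating within each one. The Python sketch below shows one common form of that idea, a tumbling (non-overlapping) window that counts page views per window. It is an assumption-laden illustration rather than anything taken from the service: the function name, the event layout and the 10-second window are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Event = Tuple[float, str]  # (timestamp in seconds, page path)


def tumbling_window_counts(
    events: Iterable[Event], window_seconds: int
) -> Dict[Tuple[int, str], int]:
    """Count events per (window start, page) using non-overlapping windows."""
    counts: Dict[Tuple[int, str], int] = defaultdict(int)
    for timestamp, page in events:
        # Round the timestamp down to the start of its window.
        window_start = int(timestamp // window_seconds) * window_seconds
        counts[(window_start, page)] += 1
    return dict(counts)


if __name__ == "__main__":
    clicks = [
        (0.5, "/home"),
        (3.0, "/home"),
        (9.9, "/cart"),
        (10.0, "/home"),
        (14.2, "/cart"),
        (19.0, "/cart"),
    ]
    for key, count in sorted(tumbling_window_counts(clicks, 10).items()):
        print(key, count)
```

The per-window counts are the kind of result that could feed a real-time dashboard, or trigger an alert when a count crosses a chosen limit.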

Barr provides a Kinesis Analytics "getting started" example in his blog post, and Ryan Nienhuis provides more in-depth guidance in a blog post published yesterday -- the first of a two-part series -- titled "Writing SQL on Streaming Data with Amazon Kinesis Analytics – Part 1."

"Previously, real-time stream data processing was only accessible to those with the technical skills to build and manage a complex application," Nienhuis concluded. "With Amazon Kinesis Analytics, anyone familiar with the ANSI SQL standard can build and deploy a stream data processing application in minutes.

"This application you just built provides a managed and elastic data processing pipeline using Analytics that calculates useful results over streaming data. Results are calculated as they arrive, and you can configure a destination to deliver them to a persistent store like Amazon S3."

Nienhuis said part two of his blog series will delve into more advanced stream processing concepts.

About the Author

David Ramel is the editor of Visual Studio Magazine.