PartiQL: Introducing Amazon's New Query Language for 'All' Data
On Thursday Amazon Web Services (AWS) announced PartiQL, a new query language that the company says is designed to work with all data types, structures and storage situations.
"Every different type and flavor of data store may suit a particular use case, but each also comes with its own query language. The result is tight coupling between the query language and the format in which data is stored. ... This is a very large obstacle to the agility and flexibility needed to effectively use data lakes," the company commented in its explanation of why it created PartiQL.
"As long as your query engine supports PartiQL, you can process structured data from relational databases (both transactional and analytical), semi-structured and nested data in open data formats (such as an Amazon S3 data lake), and even schema-less data in NoSQL or document databases that allow different attributes for different rows," it continued. "We are open sourcing the PartiQL tutorial, specification, and a reference implementation of the language under the Apache2.0 license, so that everyone can participate, contribute, and use it to drive widespread adoption for this unifying query language."
AWS said it already uses PartiQL in its S3 Select, Glacier Select, Redshift Spectrum and Quantum Ledger Database (QLDB) products, among others. The company said that Couchbase has also signed on to support the query language.
Data lakes are large storage repositories, often used by enterprises, that house data in its "raw" or "natural" format (whatever that might be) in a flat structure, with each item given a unique identifier, tags and/or metadata. Data warehouses, by contrast, are generally very hierarchical and store data using folders or files. The data in a lake can then be put to a variety of uses, whether data-mining applications, machine learning, analytics or something else.
The centralized but schema-less structure of a data lake is becoming increasingly attractive to enterprises given the large amounts of data they have coming in from a wide variety of sources. However, data lake projects can easily fail for a variety of reasons, one of which is not having clear, consistent access to the data stored.
AWS explained in its blog post announcing PartiQL that it developed the solution because of its need to query/transform a wide variety of data types -- including tabular, nested and semi-structured -- located in numerous formats and storage devices.
"We therefore set out to create a language that offers strict SQL compatibility, achieves nested and semi-structured processing with minimal extensions, treats nested data as a first-class citizen, allows optional schema, and is independent of physical formats and data stores," it commented. "PartiQL...provides a simple and consistent way to query data across a variety of formats and services. This gives you the freedom to move your data across data sources, without having to change your queries. It is backwards-compatible with SQL, and provides extensions for multi-valued, nested, and schema-less data, which blend seamlessly with the join, filtering, and aggregation capabilities of standard SQL."
More information on PartiQL can be found here. A tutorial is located here.
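To make the nested-data extensions described in the quote above more concrete, here is a minimal sketch in Python. It does not run PartiQL; it only emulates, with an ordinary list comprehension, what a PartiQL-style query over nested records is meant to return. The sample data, the table name employeesNest, and the function name security_projects are all illustrative assumptions, and the query text is shown only as a string for comparison.

```python
# Hypothetical sample data: nested, partly schema-less records of the kind
# that might sit in a data lake. Names and values are illustrative only.
employees_nest = [
    {
        "id": 3,
        "name": "Bob Smith",
        "projects": [
            {"name": "AWS Redshift Spectrum querying"},
            {"name": "AWS Redshift security"},
            {"name": "AWS Aurora security"},
        ],
    },
    {"id": 4, "name": "Susan Smith", "title": "Dev Mgr", "projects": []},
    {
        "id": 6,
        "name": "Jane Smith",
        "title": "Software Eng 2",
        "projects": [{"name": "AWS Redshift security"}],
    },
]

# A PartiQL-style query that this sketch emulates (not executed here).
# The second FROM item ranges over each employee's nested project list.
PARTIQL_QUERY = """
SELECT e.name AS employeeName, p.name AS projectName
FROM employeesNest AS e, e.projects AS p
WHERE p.name LIKE '%security%'
"""


def security_projects(employees):
    """Unnest each employee's projects, then keep the security-related ones."""
    return [
        {"employeeName": e["name"], "projectName": p["name"]}
        for e in employees                # FROM employeesNest AS e
        for p in e.get("projects", [])    # , e.projects AS p
        if "security" in p["name"]        # WHERE p.name LIKE '%security%'
    ]


rows = security_projects(employees_nest)
for row in rows:
    print(row)
```

The point of the sketch is the shape of the query: the nested list is joined to its parent row in the FROM clause and then filtered like any other column, which is the kind of "minimal extension" to SQL that AWS describes.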
About the Author
Becky Nagel is the vice president of Web & Digital Strategy for 1105's Converge360 Group, where she oversees the front-end Web team and deals with all aspects of digital projects at the company, including launching and running the group's popular virtual summit and Coffee talk series. She is an experienced tech journalist (20 years) and, before her current position, was the editorial director of the group's sites. A few years ago she gave a talk at a leading technical publishers conference about how changes in Web browser technology would impact online advertising for publishers. Follow her on Twitter @beckynagel.