PartiQL: Introducing Amazon's New Query Language for 'All' Data
On Thursday Amazon Web Services (AWS) announced PartiQL, a new query language that the company says is designed to work with all data types, structures and storage situations.
"Every different type and flavor of data store may suit a particular use case, but each also comes with its own query language. The result is tight coupling between the query language and the format in which data is stored. ... This is a very large obstacle to the agility and flexibility needed to effectively use data lakes," the company commented in its explanation of why it created PartiQL.
"As long as your query engine supports PartiQL, you can process structured data from relational databases (both transactional and analytical), semi-structured and nested data in open data formats (such as an Amazon S3 data lake), and even schema-less data in NoSQL or document databases that allow different attributes for different rows," it continued. "We are open sourcing the PartiQL tutorial, specification, and a reference implementation of the language under the Apache2.0 license, so that everyone can participate, contribute, and use it to drive widespread adoption for this unifying query language."
AWS said it is already using PartiQL internally with its S3 Select, Glacier Select, Redshift Spectrum and Quantum Ledger Database (QLDB) services, among others. The company said that Couchbase has also signed on to support the query language.
Data lakes are large storage repositories, often used by enterprises, that house data in its "raw" or "natural" format (whatever that might be) in a flat structure, with each item assigned a unique identifier and tagged with metadata. Data warehouses, by contrast, are generally hierarchical and store data in files or folders. Data in a lake can then be put to a variety of uses, whether data-mining applications, machine learning, analytics or something else.
The centralized but schema-less structure of a data lake is becoming increasingly attractive to enterprises given the large amounts of data they have coming in from a wide variety of sources. However, data lake projects can easily fail for a variety of reasons, one of which is not having clear, consistent access to the data stored.
AWS explained in its blog post announcing PartiQL that it developed the language because it needed to query and transform a wide variety of data, including tabular, nested and semi-structured data, held in numerous formats and data stores.
"We therefore set out to create a language that offers strict SQL compatibility, achieves nested and semi-structured processing with minimal extensions, treats nested data as a first-class citizen, allows optional schema, and is independent of physical formats and data stores," it commented. "PartiQL...provides a simple and consistent way to query data across a variety of formats and services. This gives you the freedom to move your data across data sources, without having to change your queries. It is backwards-compatible with SQL, and provides extensions for multi-valued, nested, and schema-less data, which blend seamlessly with the join, filtering, and aggregation capabilities of standard SQL."
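To make the idea of treating nested data as a "first-class citizen" concrete, here is a minimal sketch in Python. It is not the PartiQL reference implementation and does not run any PartiQL engine; it only emulates, with plain loops, what a PartiQL-style query over nested data would return. The dataset, the `employees_nest` name and the helper function are hypothetical, and the query text is included only for comparison.

```python
# Hypothetical nested data, loosely modeled on what a document store might hold.
# Each employee carries a nested list of projects; one employee has none.
employees_nest = [
    {"id": 3, "name": "Bob Smith", "title": None,
     "projects": [{"name": "AWS Redshift Spectrum querying"},
                  {"name": "AWS Redshift security"},
                  {"name": "AWS Aurora security"}]},
    {"id": 4, "name": "Susan Smith", "title": "Dev Mgr", "projects": []},
    {"id": 6, "name": "Jane Smith", "title": "Software Eng 2",
     "projects": [{"name": "AWS Redshift security"}]},
]

# A PartiQL-style query of the kind being emulated. It is shown for comparison
# only and is never executed here. Note that the FROM clause ranges over the
# nested `projects` collection of each employee, alongside ordinary SQL syntax.
PARTIQL_QUERY = """
SELECT e.name AS employeeName, p.name AS projectName
FROM employees_nest AS e, e.projects AS p
WHERE p.name LIKE '%security%'
"""


def employees_on_security_projects(employees):
    """Emulate the query above with plain loops (an illustration, not an engine)."""
    rows = []
    for e in employees:                      # FROM employees_nest AS e
        for p in e.get("projects", []):      # , e.projects AS p (unnest the list)
            if "security" in p["name"]:      # WHERE p.name LIKE '%security%'
                rows.append({"employeeName": e["name"],
                             "projectName": p["name"]})
    return rows


if __name__ == "__main__":
    for row in employees_on_security_projects(employees_nest):
        print(row)
```

The point of the sketch is that the query itself never mentions how or where the data is stored. In principle, the same query text could be pointed at a relational table, a file in a data lake or a document database, which is the decoupling AWS describes.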
More information on PartiQL, including a tutorial, is available from the project's website.
About the Author
Becky Nagel serves as vice president of AI for 1105 Media specializing in developing media, events and training for companies around AI and generative AI technology. She also regularly writes and reports on AI news, and is the founding editor of PureAI.com. She's the author of "ChatGPT Prompt 101 Guide for Business Users" and other popular AI resources with a real-world business perspective. She regularly speaks, writes and develops content around AI, generative AI and other business tech. Find her on X/Twitter @beckynagel.