Amazon EMR 5.0 Supports 16 Open Source Hadoop-Related Projects

Amazon Elastic MapReduce (Amazon EMR) has been updated to version 5.0, a major release that brings user interface improvements, enhanced debugging, application upgrades and support for 16 open source Hadoop ecosystem projects.

"Today the team is announcing and releasing EMR 5.0.0," says a blog post penned today by Amazon Web Services Inc. (AWS) spokesman Jeff Barr. "This is a major release that includes support for 16 open source Hadoop ecosystem projects, major version upgrades for Spark and Hive, use of Tez by default for Hive and Pig, user interface improvements to Hue and Zeppelin, and enhanced debugging functionality."

To meet the goal of providing the newest technology generally available, "EMR 5.0 includes support for 16 Hadoop ecosystem projects including Apache Hadoop, Apache Spark, Presto, Apache Hive, Apache HBase and Apache Tez," Barr further explained.

According to the AWS Web site, "Amazon EMR is a Web service that makes it easy to quickly and cost-effectively process vast amounts of data."

Furthermore, AWS says on the site, "Amazon EMR simplifies Big Data processing, providing a managed Hadoop framework that makes it easy, fast and cost-effective for you to distribute and process vast amounts of your data across dynamically scalable Amazon EC2 instances. You can also run other popular distributed frameworks such as Apache Spark and Presto in Amazon EMR, and interact with data in other AWS data stores such as Amazon S3 and Amazon DynamoDB."

Spark and Hive were singled out by Barr for their upgrades.

"This release of EMR updates Hive (a SQL-like interface for Tez and Hadoop MapReduce) from 1.0 to 2.1, accompanied by a move to Java 8," he said. "It also updates Spark (an engine for large-scale data processing) from 1.6.2 to 2.0, with a similar move to Scala 2.11. The Spark and Hive updates are both major releases and include new features, performance enhancements, and bug fixes. For example, Spark now includes a Structured Streaming API, better SQL support and more."

The UI improvements come in updates to Zeppelin -- "a notebook for interactive data analytics" -- and Hue, "an interface for analyzing data with Hadoop." For example, Hue now features notebooks that facilitate multiple queries from one page.

Of special interest to data developers, debugging has been improved, Barr said, with functionality such as partial stack traces and links to S3-based log files accessible from a console.

Barr said EMR 5.0 clusters can now be spun up in any AWS Region, and an Aug. 23 Webinar is planned to introduce features of the new upgrade.
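
For readers who want to try the release, here is a minimal sketch of how a cluster request might be put together with the AWS SDK for Python (boto3) and its EMR run_job_flow call. It is not taken from the announcement. The cluster name, instance type, instance count, application list and the helper names build_emr5_request and launch_cluster are all illustrative assumptions, and the role names assume the default EMR roles already exist in the account.

```python
def build_emr5_request(cluster_name="emr-5-demo"):
    """Build keyword arguments for boto3's EMR run_job_flow call.

    Everything here is an assumption made for illustration: the cluster
    name, the instance type and count, and the application list.
    """
    return {
        "Name": cluster_name,
        # Release label selecting EMR 5.0.0
        "ReleaseLabel": "emr-5.0.0",
        # A subset of the Hadoop ecosystem projects named in the article
        "Applications": [
            {"Name": name} for name in ("Hadoop", "Hive", "Spark", "Tez")
        ],
        "Instances": {
            "MasterInstanceType": "m3.xlarge",  # assumed instance type
            "SlaveInstanceType": "m3.xlarge",   # assumed instance type
            "InstanceCount": 3,                 # one master, two core nodes
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # Assumes the default EMR roles have been created in the account
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
        "VisibleToAllUsers": True,
    }


def launch_cluster(region_name, params):
    """Submit the request in the given Region and return the cluster ID.

    Calling this needs AWS credentials and creates billable resources.
    """
    import boto3  # imported lazily so the request can be built without the SDK

    emr = boto3.client("emr", region_name=region_name)
    response = emr.run_job_flow(**params)
    return response["JobFlowId"]


if __name__ == "__main__":
    # Only builds and prints the request; nothing is sent to AWS here.
    request = build_emr5_request()
    print(request["ReleaseLabel"])
    print([app["Name"] for app in request["Applications"]])
```

Building the request separately from submitting it lets the parameters be inspected before any resources are created. Passing a Region name and the request to launch_cluster would then start the cluster in that Region.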

About the Author

David Ramel is an editor and writer for Converge360.

