BlueData Puts 'Big Data-as-a-Service' on AWS

Following a "directed availability" trial period, BlueData Software Inc. today announced that its Big Data-as-a-Service (BDaaS) platform is now generally available on the Amazon Web Services Inc. (AWS) cloud.

In June, the company announced that its BDaaS platform, previously available only for on-premises installations, would expand to public cloud services. AWS came first, under a directed availability program that targeted a select group of customers to ensure the offering met expectations.

Those expectations were apparently met: the company's EPIC software is now available for a free two-week trial. It's not available in the AWS Marketplace, and the company didn't say when, or whether, it will be.

BlueData touted the ability of EPIC to be deployed both on-premises and in the cloud, saying its "first and only" BDaaS offering accomplishes this by using embedded and managed Docker containers that provide portability across different infrastructure. Identical Docker-based images leveraging projects such as Apache Hadoop and Apache Spark can be used in-house or in the cloud.

The company said its AWS offering can tap into AWS' S3 storage or local storage, such as a Hadoop Distributed File System (HDFS) data lake, and provides enterprise-class security and cost controls for customers running multi-tenant deployments.

In a blog post today, the company said the new AWS offering was improved through insights gleaned from the directed availability program with select customers.

For example, "The QA team at a leading data integration software vendor wanted to test their new product on AWS with different commercial versions of Hadoop (such as Cloudera CDH, Hortonworks HDP, and MapR)," said exec Anant Chintamaneni in the blog post. "With BlueData EPIC, they could quickly create Docker application images with their own code and other Hadoop artifacts; this helped to significantly reduce their QA cycle times and improve team productivity."

Chintamaneni said the key benefits of the product as identified through the directed availability program include:

  • Simplified user experience for both administrators and data science teams, abstracting the AWS-specific infrastructure so they can focus on their Big Data needs.
  • Faster AWS onboarding for multiple teams and Big Data workloads, eliminating the need for DevOps expertise and reducing the cost and time involved.
  • Greater agility and flexibility, with self-service clusters pre-configured on Amazon EC2 for Spark, Hadoop, Kafka, Cassandra, and other Big Data applications.
  • Reduced AWS costs through the use of fine-grained resource quotas, start/stop controls, and cost reporting in a multi-tenant environment.
  • Faster time to insights with pre-built cluster integrations to Amazon S3 and in-place analytics against on-premises data.
  • Improved data governance with integrations to Amazon VPC (including site-to-site VPN), Active Directory, and Kerberos for authentication.

"BlueData is also the only BDaaS solution that allows data scientists, developers, and analysts to work with their data frameworks and tools of choice including Spark standalone; Hadoop distributions from Cloudera, Hortonworks, and MapR; other frameworks such as Kafka and Cassandra; Jupyter and Zeppelin notebooks; Python and R libraries; as well as other data science and analytics tools," the company said in today's statement.

About the Author

David Ramel is an editor and writer for Converge360.

