Databricks Tunes Apache Spark-Based Cloud Platform for Data Engineers -- AWSInsider

By David Ramel
04/12/2017

Big Data company Databricks Inc. launched a new edition of its Apache Spark-based platform -- specially tuned for data engineers -- on the Amazon Web Services Inc. (AWS) cloud.

Called Databricks for Data Engineering, the platform is designed to help data and machine learning (ML) engineers create and deploy highly optimized infrastructure for data processing in the cloud, the company said.

Those engineers, when tasked with business use cases such as real-time dashboards or fraud detection, carry out mission-critical operations such as cleansing, transforming and manipulating data, Databricks said. The company added that data engineering is crucial for processing data in order to make business decisions or automate business processes with intelligent algorithms.

Powering that data processing in the Databricks platform is Apache Spark, an open source data processing engine that exploded in popularity based on capabilities that improved on the MapReduce component introduced with Apache Hadoop.

"The new offering enables more cost-effective data engineering using Spark while empowering data engineers to easily combine SQL, structured streaming, Extract, Transform, Load (ETL), and machine learning workloads running on Spark to rapidly and securely deploy data pipelines into production," the company said in a statement today. "Databricks for Data Engineering will complement the company's existing cloud platform by providing all enterprises with a unified data analytics platform that fosters seamless collaboration to accelerate data-driven decisions across the organization."

Specifically, the company said the optimized platform offers:

Performance optimization: Databricks I/O technology (DBIO) improves processing speeds with a tuned and optimized version of Spark for a wide variety of instance types, in addition to an optimized AWS S3 access layer -- accelerating data exploration by up to 10x.
Cost management: Cluster management capabilities such as auto-scaling and AWS Spot instances reduce operational costs by avoiding time-consuming tasks to build, configure and maintain complex Spark infrastructure.
Optimized integration: Comprehensive REST APIs to programmatically launch clusters and jobs and integrate tools or services -- such as Redshift, Kinesis and ML frameworks such as TensorFlow -- with the Databricks platform. An integrated data sources catalog makes every data source immediately available to all Databricks users without duplicating data ingest work.
Enterprise security: Turnkey security standards including SOC 2 Type 1 certification and HIPAA compliance, data encryption, detailed logs easily accessible in AWS S3 for debugging, and IT admin capabilities such as Single Sign-On with SAML 2.0 support and role-based access controls for clusters, jobs, and notebooks.
Collaboration with data science: Integration with the data science workspaces in Databricks, enabling a seamless transition between data engineering and interactive data science workloads.
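
To give a rough idea of what launching a cluster programmatically with auto-scaling and Spot instances might involve, the following Python sketch builds a JSON request body. It is illustrative only: the function name build_cluster_request is hypothetical, the field names are modeled on Databricks' publicly documented Clusters API and should be treated as assumptions to check against the current documentation, and the Spark version and instance type are caller-supplied placeholders. The sketch sends nothing over the network.

```python
import json


def build_cluster_request(name, spark_version, node_type_id,
                          min_workers, max_workers, use_spot=True):
    """Build a JSON-serializable body for a cluster-create request.

    Assumption: field names follow the publicly documented Databricks
    Clusters API; verify them against current docs before real use.
    """
    if min_workers < 0 or max_workers < min_workers:
        raise ValueError("need 0 <= min_workers <= max_workers")

    return {
        "cluster_name": name,
        "spark_version": spark_version,   # placeholder supplied by caller
        "node_type_id": node_type_id,     # AWS instance type, caller's choice
        # Auto-scaling: the platform picks a worker count in this range.
        "autoscale": {
            "min_workers": min_workers,
            "max_workers": max_workers,
        },
        # Spot instances with on-demand fallback, or on-demand only.
        "aws_attributes": {
            "availability": "SPOT_WITH_FALLBACK" if use_spot else "ON_DEMAND",
        },
    }


if __name__ == "__main__":
    body = build_cluster_request(
        "etl-nightly", "<spark-version>", "r3.xlarge", 2, 8
    )
    print(json.dumps(body, indent=2))
```

In practice a body like this would be sent in an authenticated HTTP POST to the workspace's cluster-creation endpoint; the endpoint path and authentication scheme are left out here because the article does not describe them.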

Pricing for the optimized platform is based on data engineering workloads such as ETL and automated jobs ($0.20 per Databricks Unit plus the cost of AWS), the company said.
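
As a back-of-the-envelope illustration of that pricing, the Python sketch below adds the Databricks charge to the AWS bill. The $0.20 rate comes from the company's statement; the function name estimate_job_cost and the sample figures (50 Databricks Units, $12.00 of AWS charges) are hypothetical, and the article does not say how many Databricks Units a given workload consumes.

```python
from decimal import Decimal

# Rate quoted in the announcement for data engineering workloads.
DBU_RATE_USD = Decimal("0.20")


def estimate_job_cost(dbus_consumed, aws_cost_usd):
    """Return the total cost: DBUs times the quoted rate, plus AWS charges.

    Both arguments are caller-supplied Decimals; how many DBUs a job
    consumes is not specified in the article, so it is an input here.
    """
    if dbus_consumed < 0 or aws_cost_usd < 0:
        raise ValueError("inputs must be non-negative")
    return dbus_consumed * DBU_RATE_USD + aws_cost_usd


if __name__ == "__main__":
    # Hypothetical job: 50 DBUs plus $12.00 of AWS infrastructure charges.
    total = estimate_job_cost(Decimal("50"), Decimal("12.00"))
    print(f"Estimated total: ${total}")
```

With those made-up inputs, the Databricks portion is $10.00 and the total comes to $22.00.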

About the Author

David Ramel is an editor and writer at Converge 360.
