Confluent Cloud Bringing Apache Kafka-as-a-Service to AWS

Confluent Inc. is bringing its open source-based streaming data platform to the Amazon Web Services Inc. (AWS) cloud.

The company, headed by the original creators of Kafka, today announced an early access program to provide the streaming technology to AWS users before the new offering -- called Confluent Cloud -- is available to everyone. While AWS is the first public cloud host for the managed service, Microsoft Azure and Google Cloud support is planned for later.

Apache Kafka is a hot topic in the Big Data analytics world, serving as a publish/subscribe system adept at streaming records. That capability makes it similar to a message queue or enterprise messaging system, but its streaming capabilities also make it a great fit for setting up data pipelines to ingest streams of records for Big Data analytics and also for applications that process, store or work with that data in other ways.
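To make the publish/subscribe idea concrete, here is a minimal in-memory sketch of the pattern described above. It is a toy illustration only, not Kafka's or Confluent's API: the `ToyLog` class and its `publish` and `poll` methods are hypothetical names invented for this example. It assumes the simplest possible model, in which records are appended to a log and each subscriber tracks its own read position, so several applications can consume the same stream independently.

```python
from collections import defaultdict


class ToyLog:
    """Hypothetical in-memory stand-in for a topic: an append-only list of records."""

    def __init__(self):
        self._records = []
        # Next offset to read, tracked separately for each subscriber.
        self._offsets = defaultdict(int)

    def publish(self, record):
        """Append a record to the log and return its offset."""
        self._records.append(record)
        return len(self._records) - 1

    def poll(self, subscriber, max_records=10):
        """Return up to max_records unread records and advance this subscriber's offset."""
        start = self._offsets[subscriber]
        batch = self._records[start:start + max_records]
        self._offsets[subscriber] = start + len(batch)
        return batch


log = ToyLog()
for event in ("page_view", "click", "purchase"):
    log.publish(event)

# Two independent subscribers read the same records at their own pace.
analytics = log.poll("analytics")
billing = log.poll("billing", max_records=2)
```

Unlike a traditional queue, reading a record here does not remove it: the `billing` subscriber can pick up the remaining record on its next `poll` even though `analytics` has already read everything.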

According to a blog post yesterday from developer-focused research firm RedMonk, "Usage of Kafka continues to grow at an extremely fast pace across multiple industry segments. Kafka is becoming a core part of data pipelines at scale."

With Kafka usage growing, Confluent, the primary commercial backer of the technology, wants to make it easier for developers to use in the cloud without worrying about setup and operational maintenance.

Company exec Neha Narkhede explained the reasoning for moving Kafka to the public cloud as a managed service in a blog post today. "It is clear to me that developers love Kafka; we want to use Kafka to serve as the company's central nervous system, a streaming backbone for all our microservices, a commit log to serve database changelogs, a basis for stream processing, but we don't want to operate Kafka," she said. "Why? Because running distributed systems in the cloud on your own is really, really hard. As developers, we expect distributed systems to just be available as managed services so we can focus on building applications; not manage infrastructure and data services."

While Kafka has been open source since its early days and Confluent provides it as a free download along with its enterprise offering, the company claims the fully managed, AWS-hosted service provides advantages over the base technology, featuring expanded integration capabilities, security, and additional tools to optimize and manage Kafka clusters.

Key features of Confluent Cloud as listed by the company include:

  • Access to the Kafka ecosystem: Unlike proprietary streaming services from cloud providers, Confluent Cloud offers the same open source Apache Kafka APIs that developers are familiar with, making it easy to leverage clients, connectors and tools supported by the Kafka and Confluent communities.
  • Cloud optionality: In a recent survey of over 350 Apache Kafka users, 52 percent are using Kafka in the public cloud. Nearly one-third (32 percent) of respondents who use Kafka in the cloud are running at least six applications with Kafka. Confluent Cloud is vendor agnostic, so developers can "lift and shift" Kafka infrastructure from any location into -- or out of -- any cloud.
  • Unleash developer velocity: Confluent Cloud is a hosted streaming data service that takes away the operational burden of running Kafka and lets developers focus on building streaming applications. Because the service is simple, resilient, secure and performant, teams can focus on what matters most -- building the company.
  • Reduce the ops burden: Troubleshooting or building distributed systems is difficult; operating them at scale is even harder. Kafka is no exception. As cluster usage grows, failures are more likely. With Confluent Cloud, it's easy to get Kafka up and running.

When using Confluent clients, the company said, developers can program in Java, Python, C/C++, Go and .NET.

When Confluent Cloud hits general availability -- expected later this year -- it won't be the first Kafka-based offering on the AWS cloud, as the AWS Marketplace currently lists such services from other providers, including Bitnami and Alooma.

Also, AWS has its own streaming data service, called Amazon Kinesis.

As RedMonk noted in its blog post yesterday: "Much of the enterprise momentum is all with Kafka for now, and on that front Confluent are the primary provider of a commercial offering. The most significant threat is from the cloud providers, namely Google Cloud Dataflow, Amazon Kinesis and Azure Event Hub from Microsoft and the various integration points which they offer."

About the Author

David Ramel is the editor of Visual Studio Magazine.