AWS Publishes New Guidance on Big Data Using Apache Kafka, Apache Cassandra

The dizzying array of Big Data services available on the AWS cloud can be hard to sort through, a problem the company is addressing with continuing guidance that outlines best practices and other tips for using the various offerings.

Most recently, the cloud platform published "Best Practices for Running Apache Kafka on AWS" and "Best Practices for Running Apache Cassandra on Amazon EC2."

The Kafka post, published Friday (March 2), was written in partnership with Intuit, which shared lessons learned from two years of running large-scale Kafka clusters. Kafka is an open source, distributed streaming platform that lets developers and other users build real-time streaming applications.

The post from Intuit, a specialist in business and financial management solutions, includes details on the following aspects of running Kafka clusters on AWS:

  • Deployment considerations and patterns
  • Storage options
  • Instance types
  • Networking
  • Upgrades
  • Performance tuning
  • Monitoring
  • Security
  • Backup and restore

With security always top-of-mind in enterprise cloud implementations relying on corporate data, the post covers encryption at rest, encryption in transit, authentication, authorization and more.

"Like most distributed systems, Kafka provides the mechanisms to transfer data with relatively high security across the components involved," the post said. "Depending on your setup, security might involve different services such as encryption, Kerberos, Transport Layer Security (TLS) certificates, and advanced access control list (ACL) setup in brokers and ZooKeeper." It then details how Intuit approached security around the above topics.
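To make the encryption-in-transit item concrete, here is a minimal sketch of the client-side settings a Kafka application might use to talk to brokers over TLS. It is not taken from the AWS/Intuit post: the property keys are standard Kafka client configuration names, while the helper functions, file paths and passwords are hypothetical placeholders chosen for illustration.

```python
# Sketch only: builds Kafka client TLS settings as a properties file body.
# The keys are standard Kafka client config names; every path and
# password below is a hypothetical placeholder.

def kafka_tls_client_properties(truststore_path, truststore_password,
                                keystore_path=None, keystore_password=None):
    """Return client settings for TLS, optionally with a client certificate."""
    props = {
        # "SSL" is the Kafka config value that selects TLS for the connection.
        "security.protocol": "SSL",
        # The truststore holds the CA certificates used to verify brokers.
        "ssl.truststore.location": truststore_path,
        "ssl.truststore.password": truststore_password,
    }
    if keystore_path is not None:
        # A keystore is only needed when brokers require client certificates.
        props["ssl.keystore.location"] = keystore_path
        props["ssl.keystore.password"] = keystore_password
    return props


def render_properties(props):
    """Render the settings in key=value form, one setting per line."""
    return "\n".join(f"{key}={value}" for key, value in props.items())


if __name__ == "__main__":
    settings = kafka_tls_client_properties(
        "/etc/kafka/client.truststore.jks",  # hypothetical path
        "changeit",                          # hypothetical password
    )
    print(render_properties(settings))
```

In practice the passwords would come from a secrets store rather than source code, and authorization through ACLs is configured separately on the brokers.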

Meanwhile, the guidance for using Apache Cassandra, published Feb. 28, applies specifically to using the Amazon EC2 (cloud computing) service. Cassandra is a high-performance NoSQL database.

"Amazon EC2 and Amazon Elastic Block Store (Amazon EBS) provide secure, resizable compute capacity and storage in the AWS Cloud," AWS said. "When combined, you can deploy Cassandra, allowing you to scale capacity according to your requirements. Given the number of possible deployment topologies, it’s not always trivial to select the most appropriate strategy suitable for your use case."

AWS goes on to outline three Cassandra deployment options and associated best practices guidance touching on:

  • Cassandra resource overview
  • Deployment considerations
  • Storage options
  • Networking
  • High availability and resiliency
  • Maintenance
  • Security

Like the Kafka post, the guidance emphasizes security, also covering data encryption at rest and in transit, along with authentication and authorization.

"We recommend that you think about security in all aspects of deployment," AWS said. "The first step is to ensure that the data is encrypted at rest and in transit. The second step is to restrict access to unauthorized users. For more information about security, see the Cassandra documentation."

About the Author

David Ramel is an editor and writer for Converge360.

