Apache Kafka and its APIs make building data-driven apps and managing complex back-end systems simple. It serves as the backbone of many companies’ architecture, and as a platform it lets you store enormous amounts of data.

Kafka is fault tolerant because it is distributed: it runs on multiple machines, so if one machine fails, the data is still intact. Kafka is also horizontally scalable, meaning you can add more machines as your needs grow, with practically no limit to how many machines a cluster can contain.

Kafka is essentially a commit log: a data structure that supports only appends. You can’t modify or delete records from it, which is why the records within each partition Kafka stores are ordered. Kafka persists all of its records to disk rather than holding them in application memory, relying on the operating system’s page cache for fast reads. It stores its metadata related to data distribution and replication in a service called ZooKeeper.
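To make the idea concrete, here is a toy sketch of an append-only commit log in Python. The names (`CommitLog`, `append`, `read`) are illustrative, not Kafka’s actual API; the point is simply that records are only ever added at the end and addressed by offset.

```python
class CommitLog:
    """A minimal append-only log: records can be added, never changed."""

    def __init__(self):
        self._records = []  # records live in insertion order

    def append(self, record):
        """Append a record to the end of the log and return its offset."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset):
        """Read the record at a given offset; existing records are immutable."""
        return self._records[offset]


log = CommitLog()
first = log.append("user_signed_up")   # offset 0
second = log.append("order_placed")    # offset 1
print(log.read(first))                 # → user_signed_up
```

Because nothing is ever inserted in the middle or rewritten, the offset alone tells you exactly where a record sits in the history, which is what makes Kafka’s ordering guarantee cheap to provide.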

Kafka is perfect for use as the backbone of a system’s architecture: a centralized medium that connects different applications. It supports an event-driven architecture, which many companies nowadays benefit from greatly. Kafka can handle trillions of events each day, and thousands of companies use it, including a third of the Fortune 500. For example, Uber uses Kafka to manage passenger and driver matching.

Kafka has four major APIs: the Producer API, Consumer API, Connector API, and Streams API. The Producer API permits an application to publish streams of records. The Consumer API permits an application to subscribe to and consume streams of records. The Connector API uses the Producer and Consumer APIs internally to import and export data to and from other systems. Lastly, the Streams API processes the record streams themselves, performing operations on the data as it flows through.
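To make the producer and consumer roles concrete, here is a minimal in-memory sketch in plain Python. This is not the real Kafka client library; `ToyBroker`, `publish`, and `consume` are invented names standing in for a real broker and its clients, which would communicate over the network.

```python
from collections import defaultdict


class ToyBroker:
    """An in-memory stand-in for a broker, keyed by topic name."""

    def __init__(self):
        self._topics = defaultdict(list)  # topic name -> ordered records

    def publish(self, topic, record):
        """What a producer does: append a record to a topic."""
        self._topics[topic].append(record)

    def consume(self, topic, offset=0):
        """What a consumer does: read a topic's records from an offset."""
        return self._topics[topic][offset:]


broker = ToyBroker()
broker.publish("rides", {"event": "driver_matched", "ride_id": 1})
broker.publish("rides", {"event": "ride_completed", "ride_id": 1})

for record in broker.consume("rides"):
    print(record["event"])
# → driver_matched
# → ride_completed
```

Note the asymmetry: producers only ever append, while consumers track their own position via an offset. That separation is what lets many independent consumers read the same topic at their own pace.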