ksqlDB is a database specialized for stream processing. You write queries in a SQL-like language and interact with it through a REST API or a command line interface, so applications written in any programming language can integrate with it. For development and simple deployments, it can run in Docker or on a single node.
Here’s some example ksqlDB code that does substantially the same thing as the Kafka Streams code we looked at previously:
CREATE TABLE rated_movies AS
  SELECT title,
         release_year,
         sum(rating) / count(rating) AS avg_rating
  FROM ratings
  INNER JOIN movies ON ratings.movie_id = movies.movie_id
  GROUP BY title,
           release_year;
This query creates a table keyed by movie title and release year, with the average rating as the value. You can query that table through ksqlDB's REST API, and ksqlDB integrates with Kafka Connect to reach external data sources. In short, ksqlDB is a standalone, SQL-based stream processing engine that gives you a unified approach to stream processing on Kafka.
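As a rough illustration of querying the table over REST, here is a minimal Python sketch that builds a request for ksqlDB's `/query` endpoint. The server address (`localhost:8088`), the helper name `build_query_request`, and the movie title in the lookup are assumptions made for this example. Whether this exact lookup is accepted also depends on your ksqlDB version and configuration.

```python
import json
import urllib.request

# Assumption: a ksqlDB server reachable on this machine at port 8088.
KSQLDB_URL = "http://localhost:8088"


def build_query_request(statement, base_url=KSQLDB_URL):
    """Build (but do not send) a POST request for the ksqlDB /query endpoint."""
    payload = {"ksql": statement, "streamsProperties": {}}
    return urllib.request.Request(
        url=f"{base_url}/query",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/vnd.ksql.v1+json"},
        method="POST",
    )


# Hypothetical lookup against the rated_movies table; the title is a placeholder.
request = build_query_request(
    "SELECT * FROM rated_movies WHERE title = 'Example Movie';"
)

# To run it against a live server, uncomment the lines below:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

Because the request is plain HTTP with a JSON body, the same call can be made from any language with an HTTP client, which is what makes ksqlDB easy to integrate regardless of programming language.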
☝️ ksqlDB is not a database in the traditional sense of storing data persistently for long-term retrieval. It works more like a stream processing engine that processes data in real time. It does manage internal state for its stream processing tasks, but it typically relies on an underlying store such as Apache Kafka for durable storage.
For a more detailed introduction to ksqlDB, check out the ksqlDB 101 course.