In this guide, we will learn how to install and configure ZooKeeper and Kafka on Ubuntu instances.
What is Kafka?
Apache Kafka is a distributed streaming platform.
It is a popular distributed message broker designed to efficiently handle large volumes of real-time data.
A Kafka cluster is highly scalable and fault-tolerant, and it typically achieves much higher throughput than traditional message brokers such as ActiveMQ and RabbitMQ.
It is generally used as a publish/subscribe messaging system. Many organizations also use it for log aggregation because it offers persistent storage for published messages.
Three capabilities of a streaming platform
- Publish and subscribe to streams of records
- Store streams of records in a fault-tolerant, durable way
- Process streams of records
Kafka is used for two types of applications
- Building real-time streaming data pipelines that move data between applications.
- Building real-time streaming applications that react to streams of data.
5 core APIs of Kafka
Here is the list of APIs in the Kafka system.
- Producer API: allows an application to publish a stream of records to one or more topics.
- Consumer API: allows an application to subscribe to topics and process the streams of records.
- Streams API: allows an application to act as a stream processor, consuming input streams from topics, transforming them, and producing output streams to output topics.
- Connect API: helps to build and run reusable producers or consumers that connect Kafka topics to existing applications.
- Admin API: helps to manage and inspect Kafka objects such as topics and brokers.
Prerequisites
- A running Ubuntu server. You can check this article to spin up an Ubuntu server on AWS.
- Kafka requires a server with at least 4 GB of RAM to run.
- Java should be installed on the server.
First, we need to install Java, because Kafka depends on it to run.
Let's install OpenJDK 8 on the server.
Run the command below to install Java.
sudo apt install openjdk-8-jre-headless
Once the package is installed, check the installed Java version by running java -version.
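As a minimal sketch, the version check can be wrapped so that it also reports a missing installation. The variable name JAVA_STATUS is only illustrative. With OpenJDK 8 installed, the first line of the output should mention version 1.8.

```shell
# java -version prints to stderr, so redirect it to stdout before capturing.
if command -v java >/dev/null 2>&1; then
    JAVA_STATUS="$(java -version 2>&1 | head -n 1)"
else
    JAVA_STATUS="java not found on PATH"
fi
echo "$JAVA_STATUS"
```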
If multiple versions of Java are installed on the same system and you wish to switch between them, run the command below.
sudo update-alternatives --config java
It asks you to choose the Java version you require from the available installations.
Type the selection number and press Enter.
The system now uses the Java version you selected.
Creating the Kafka user
We should create a separate user for the Kafka service. Kafka accepts requests over the network, so running it as a dedicated, unprivileged user limits the damage if the service is ever compromised.
Let's go ahead and create a kafka user.
sudo adduser --system --no-create-home --disabled-password --disabled-login kafka
The above command creates a system user with no home directory that cannot log in interactively.
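A quick way to confirm that the account exists is the id command. This is only a sketch, and the variable name KAFKA_USER_INFO is illustrative.

```shell
# id prints the uid, gid and groups of the account if it exists.
if id kafka >/dev/null 2>&1; then
    KAFKA_USER_INFO="$(id kafka)"
else
    KAFKA_USER_INFO="user kafka does not exist yet"
fi
echo "$KAFKA_USER_INFO"
```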
Once the user is created, we can move on to the Kafka installation.
Download & extract Kafka
We need to download the Kafka binary release, extract it into a folder, and then configure Kafka.
Let's create a folder named kafka under /opt.
cd /opt/
sudo mkdir kafka
Let's download the Kafka release into the /opt directory. This guide uses version 2.5.0 built for Scala 2.13.
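One way to fetch the tarball is with wget, sketched below. The file name matches the one used in the extract step. The download location is an assumption: the Apache archive, which keeps older releases under a per-version directory. The helper name download_kafka is hypothetical.

```shell
KAFKA_VERSION="2.5.0"   # matches the tarball name used in the extract step
SCALA_VERSION="2.13"
KAFKA_TGZ="kafka_${SCALA_VERSION}-${KAFKA_VERSION}.tgz"
# Assumed location: the Apache archive of past Kafka releases.
KAFKA_URL="https://archive.apache.org/dist/kafka/${KAFKA_VERSION}/${KAFKA_TGZ}"

# Download the tarball into /opt (needs root and network access).
download_kafka() {
    sudo wget -P /opt "$KAFKA_URL"
}

echo "Will download: $KAFKA_URL"
```

Call download_kafka once the printed URL looks right.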
Once the package is downloaded, extract it with the command below, which stores all the extracted files in the kafka folder.
sudo tar xvzf kafka_2.13-2.5.0.tgz --directory kafka --strip-components 1
Now we have the required files and dependencies for Kafka.
We need to create directories to store the Kafka data: a folder named kafka under /var/lib, and a folder named data under /var/lib/kafka. A single command creates both.
sudo mkdir -p /var/lib/kafka/data
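Older Kafka releases do not allow topics to be deleted by default. Since Kafka 1.0 the delete.topic.enable setting defaults to true, so on recent versions this step is optional. To make sure topics can be deleted, edit the server.properties file and add the required configuration.
Open the file /opt/kafka/config/server.properties
and add the line below at the end of the file.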
delete.topic.enable = true
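Also change the log directory, which is set by the log.dirs property, to /var/lib/kafka/data. Despite the name, this is where Kafka stores topic data, not application logs.
Now save and close the file.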
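The same two changes can also be scripted. The sketch below works on a scratch copy (the name server.properties.draft is hypothetical) so that nothing under /opt/kafka is modified until you have reviewed the result. It assumes the shipped file already contains a log.dirs line, as the stock server.properties does.

```shell
SRC=/opt/kafka/config/server.properties
DRAFT=./server.properties.draft   # hypothetical scratch file name

# Copy the shipped config while rewriting log.dirs; if Kafka is not unpacked
# yet, start from an empty draft so the sketch still runs.
if [ -r "$SRC" ]; then
    sed 's|^log\.dirs=.*|log.dirs=/var/lib/kafka/data|' "$SRC" > "$DRAFT"
else
    : > "$DRAFT"
fi

# Append either setting only if the draft does not already define it.
grep -q '^log\.dirs=' "$DRAFT" || echo 'log.dirs=/var/lib/kafka/data' >> "$DRAFT"
grep -q '^delete\.topic\.enable' "$DRAFT" || echo 'delete.topic.enable=true' >> "$DRAFT"

# Show the two lines that matter.
grep -E '^(log\.dirs|delete\.topic\.enable)' "$DRAFT"
```

If the draft looks right, copy it into place with sudo cp server.properties.draft /opt/kafka/config/server.properties.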
Changing Kafka directory permissions
Remember that we created a kafka user for the Kafka service.
We need to give that user ownership of the folders related to Kafka.
sudo chown -R kafka: /var/lib/kafka
sudo chown -R kafka: /opt/kafka
Now it's time to set up a service to manage Kafka. But before that, we need to install the ZooKeeper package.
ZooKeeper is a top-level Apache project that acts as a centralized service for maintaining naming and configuration data and for providing flexible and robust synchronization within distributed systems.
Run the command below to install ZooKeeper.
sudo apt-get install zookeeperd
We have installed ZooKeeper on the server.
Creating service files for Kafka & ZooKeeper
We need to set up systemd unit files for the Kafka and ZooKeeper services so that we can start, stop, and restart them and enable them on system boot.
First, we will create the service file for ZooKeeper under the /etc/systemd/system directory.
Create a file named zookeeper.service and add the contents below.
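A possible unit file is sketched here, written to a local draft first. It rests on several assumptions: ZooKeeper is launched with the scripts bundled in the Kafka tarball (zookeeper-server-start.sh and config/zookeeper.properties), it runs as the kafka user, and the description and restart policy are illustrative choices. The zookeeperd package normally registers its own zookeeper service, so adjust ExecStart if you prefer to run the packaged ZooKeeper instead.

```shell
# Write a draft unit file into the current directory for review.
cat > zookeeper.service <<'EOF'
[Unit]
Description=Apache ZooKeeper server
After=network.target

[Service]
Type=simple
# Assumption: reuse the kafka user, which owns /opt/kafka.
User=kafka
ExecStart=/opt/kafka/bin/zookeeper-server-start.sh /opt/kafka/config/zookeeper.properties
ExecStop=/opt/kafka/bin/zookeeper-server-stop.sh
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
```

After reviewing the draft, copy it into place with sudo cp zookeeper.service /etc/systemd/system/.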
Save and close the file.
Now we will create the service file for Kafka.
Create a file named kafka.service in the same directory and add the contents below.
[Unit]
Description=High-available, distributed message broker
[Service]
ExecStart=/bin/sh -c '/opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/server.properties'
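The lines above are the core of the unit. A fuller version is sketched below, again written to a local draft. Everything beyond Description and ExecStart is an assumption: the dependency on zookeeper.service, running as the kafka user created earlier, the restart policy, and the [Install] section that systemctl enable needs in order to hook the service into boot.

```shell
# Write a draft unit file into the current directory for review.
cat > kafka.service <<'EOF'
[Unit]
Description=High-available, distributed message broker
# The broker needs ZooKeeper at startup, so order it after that service.
Requires=zookeeper.service
After=network.target zookeeper.service

[Service]
Type=simple
# Run as the dedicated kafka user created earlier.
User=kafka
ExecStart=/bin/sh -c '/opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/server.properties'
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
```

After reviewing the draft, copy it into place with sudo cp kafka.service /etc/systemd/system/.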
Save and close the file.
Starting ZooKeeper & Kafka
We have created service files to easily manage the Kafka and ZooKeeper services. Reload systemd so that it picks up the new unit files.
sudo systemctl daemon-reload
Kafka needs a running ZooKeeper when it starts, so start ZooKeeper first. Run the commands below to start the ZooKeeper service and enable it on system boot.
sudo systemctl start zookeeper
sudo systemctl enable zookeeper
Check the status of the ZooKeeper service.
sudo systemctl status zookeeper
Run the commands below to start the Kafka service and enable it on system boot.
sudo systemctl start kafka
sudo systemctl enable kafka
Check the status of the Kafka service.
sudo systemctl status kafka
Now the Kafka and ZooKeeper services are up and running.
Kafka listens on port 9092 and ZooKeeper listens on port 2181.
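To confirm that both ports are actually open, a small check with ss can help. This is a sketch that assumes the ss tool from iproute2 is available, as it is on stock Ubuntu. The variable name PORT_REPORT is illustrative.

```shell
PORT_REPORT=""
for port in 2181 9092; do
    # ss -ltn lists listening TCP sockets with numeric ports.
    if ss -ltn 2>/dev/null | grep -q ":${port} "; then
        state="listening"
    else
        state="not listening"
    fi
    PORT_REPORT="${PORT_REPORT}port ${port}: ${state}
"
done
printf '%s' "$PORT_REPORT"
```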
Testing the Kafka setup
Now that the Kafka and ZooKeeper services are running, let's test the Kafka installation.
For this test, we are going to create a topic and then publish and consume messages to ensure that Kafka is working as expected.
Two core components are involved when exchanging messages through Kafka.
Producer: publishes messages to topics
Consumer: reads messages from topics
Let's create a topic named Test using the command below.
/opt/kafka/bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic Test
You will get the message: Created topic Test.
We can create a producer using the script below.
Using the producer script, we are going to publish the message "Testing Kafka" to the Test topic.
echo "Testing Kafka" | /opt/kafka/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic Test > /dev/null
We can create a consumer using the script below.
The command below consumes messages from the Test topic.
We will use the --from-beginning flag, which makes the consumer read all the messages that were published to the topic before the consumer was created.
/opt/kafka/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic Test --from-beginning
You will receive the message that was published to the topic as output when you run the above command. The consumer keeps running and waiting for new messages; press Ctrl+C to stop it.
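For a check that does not need Ctrl+C, the two console tools can be combined into one round trip. The helper below is a hypothetical sketch: it publishes a unique message, then scans the topic for it, using the consumer's --timeout-ms option so the consumer exits on its own once no further messages arrive. The 10-second timeout is an arbitrary choice.

```shell
KAFKA_BIN="/opt/kafka/bin"
TOPIC="Test"

# Hypothetical helper: publish a unique message, then scan the topic for it.
kafka_smoke_test() {
    msg="smoke-test-$(date +%s)"
    echo "$msg" | "$KAFKA_BIN/kafka-console-producer.sh" \
        --broker-list localhost:9092 --topic "$TOPIC" > /dev/null || return 1
    # --timeout-ms makes the consumer exit once no message arrives for 10 s.
    if "$KAFKA_BIN/kafka-console-consumer.sh" --bootstrap-server localhost:9092 \
        --topic "$TOPIC" --from-beginning --timeout-ms 10000 2>/dev/null \
        | grep -qx "$msg"; then
        echo "OK: message round-tripped through topic $TOPIC"
    else
        echo "FAILED: message not seen on topic $TOPIC"
        return 1
    fi
}
```

Run kafka_smoke_test after both services are up. It returns a non-zero status if the message never shows up.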
We have successfully set up Kafka and ZooKeeper on the Ubuntu server.
Hope you find it helpful. Please do check out my other articles.