In this guide we will learn how to install and configure ZooKeeper and Kafka on an Ubuntu instance.

What is Kafka?

Apache Kafka is a distributed streaming platform.

It is a popular distributed message broker designed to handle large volumes of real-time data efficiently.

A Kafka cluster is highly scalable and fault-tolerant, and it is designed for much higher throughput than traditional message brokers such as ActiveMQ and RabbitMQ.

It is generally used as a publish/subscribe messaging system. Many organizations also use it for log aggregation, because it offers persistent storage for published messages.

Three capabilities of a streaming platform

  • Publish and subscribe to streams of records
  • Store streams of records in a fault-tolerant, durable way
  • Process streams of records

Kafka is used for two types of applications

  • Real-time streaming data pipelines that move data between applications.
  • Real-time streaming applications that react to streams of data.

5 Core APIs of Kafka

Here is the list of APIs in the Kafka system.

PRODUCER API

It allows an application to publish a stream of records to one or more topics.

CONSUMER API

It allows an application to subscribe to topics and process the streams of records.

STREAMS API

It allows an application to act as a stream processor, consuming input streams from topics and producing output streams to output topics. In effect, it transforms input streams into output streams.

CONNECTOR API

It helps to build and run reusable producers or consumers that connect Kafka topics to existing applications.

ADMIN API

It helps to manage and inspect Kafka objects such as topics and brokers.

Prerequisites

  • A running Ubuntu server. You can check this article to spin up an Ubuntu server on AWS.
  • A server with a minimum of 4GB of RAM for Kafka to run.
  • Java installed on the server.
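The RAM and Java prerequisites can be checked from the shell before going further. The following is only a minimal sketch: the function names has_min_ram and has_java are our own (not part of any tool), and the threshold in the usage comment is set a little below 4GB on the assumption that the kernel reports slightly less than the nominal memory size.

```shell
# Return 0 if MemTotal in a meminfo-style file is at least min_kb kilobytes.
# Usage: has_min_ram <min_kb> [meminfo_file]   (defaults to /proc/meminfo)
has_min_ram() {
    min_kb=$1
    meminfo=${2:-/proc/meminfo}
    total_kb=$(awk '/^MemTotal:/ {print $2; exit}' "$meminfo")
    [ -n "$total_kb" ] && [ "$total_kb" -ge "$min_kb" ]
}

# Return 0 if a java binary can be found on the PATH.
has_java() {
    command -v java >/dev/null 2>&1
}

# Example usage (the threshold is an assumption, adjust it as needed):
# has_min_ram 3800000 || echo "This server has less than about 4GB of RAM" >&2
# has_java || echo "Java is not installed yet" >&2
```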

Installing Java

First we need to install Java on the system, as Kafka depends on it to run.

Let’s install OpenJDK 8 on the server.

Run the below command to install Java.

sudo apt install openjdk-8-jre-headless

Once the package is installed, check the installed Java version using the below command.

java -version
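If you want to check the Java version from a script rather than by eye, the version string can be parsed. This is a rough sketch: java_major_version is a hypothetical helper of ours, and it assumes the usual output layout in which the first line contains the word version followed by a quoted version string. Note that java -version writes to standard error, hence the 2>&1 in the usage line.

```shell
# Read the output of `java -version` on stdin and print the major version.
# Old-style strings such as "1.8.0_252" map to 8; new-style "11.0.7" maps to 11.
java_major_version() {
    sed -n 's/.* version "\([^"]*\)".*/\1/p' |
        head -n 1 |
        awk -F. '{ v = ($1 == "1") ? $2 : $1; sub(/[^0-9].*$/, "", v); print v }'
}

# Example usage:
# java -version 2>&1 | java_major_version
```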

If you have multiple versions of Java installed on the same system and wish to switch between them, run the below command.

sudo update-alternatives --config java

It lists the available installations and asks you to choose the version of Java you require.

Type the selection number and press Enter.


You have now switched the version of Java as required.

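Creating Kafka user

We should create a separate, unprivileged user for the Kafka service. Kafka accepts requests over the network, so running it under a dedicated account limits the damage if the service is ever compromised.

Let’s go ahead and create a kafka user.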

sudo adduser --system --no-create-home --disabled-password --disabled-login kafka

The above command creates a system user without a home directory and with password and login disabled.
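If you run these steps more than once (for example from a provisioning script), it helps to create the user only when it is missing. A small sketch, where user_exists is a helper name we made up for illustration:

```shell
# Return 0 if the given user name is known to the system.
user_exists() {
    id -u "$1" >/dev/null 2>&1
}

# Example usage: create the kafka user only if it does not exist yet.
# user_exists kafka || sudo adduser --system --no-create-home \
#     --disabled-password --disabled-login kafka
```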

Once the user is created, we can move on to the Kafka installation.

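Download & Extract Kafka

We need to download the Kafka binary release, extract it into a folder and then start configuring Kafka.

Let’s create a folder named kafka under /opt

cd /opt/
sudo mkdir kafka

Let’s download Kafka 2.5.0 (built for Scala 2.13), the latest version at the time of writing, into the /opt directory. If the link no longer resolves, older releases are kept under https://archive.apache.org/dist/kafka/.

sudo wget https://downloads.apache.org/kafka/2.5.0/kafka_2.13-2.5.0.tgz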
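Before extracting, it is worth checking that the download is not corrupted. The sketch below compares the SHA-512 digest of a file with the digest stored in a checksum file. It makes some assumptions: verify_sha512 is our own helper name, a .sha512 file is published next to the archive, and that file holds the hex digest either on its own or after a "filename:" prefix, possibly in upper case and split by spaces. If the checksum file uses a different layout, the parsing line needs to be adapted.

```shell
# Return 0 if the SHA-512 digest of <file> matches the one in <digest_file>.
# Usage: verify_sha512 <file> <digest_file>
verify_sha512() {
    file=$1
    digest_file=$2
    # Drop an optional "name:" prefix, remove whitespace, lower-case the hex.
    expected=$(sed 's/^[^:]*://' "$digest_file" | tr -d ' \t\r\n' | tr 'A-F' 'a-f')
    actual=$(sha512sum "$file" | cut -d' ' -f1)
    [ -n "$expected" ] && [ "$expected" = "$actual" ]
}

# Example usage (the .sha512 URL is an assumption based on the archive URL):
# sudo wget https://downloads.apache.org/kafka/2.5.0/kafka_2.13-2.5.0.tgz.sha512
# verify_sha512 kafka_2.13-2.5.0.tgz kafka_2.13-2.5.0.tgz.sha512 || echo "Checksum mismatch" >&2
```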

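Once the package is downloaded, we extract it with the below command. The --directory option stores the extracted files in the kafka folder, and --strip-components 1 drops the top-level directory of the archive so the files land directly in /opt/kafka.

sudo tar xvzf kafka_2.13-2.5.0.tgz --directory kafka --strip-components 1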
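To see what --strip-components 1 does, here is a small sketch you can try on any archive that has a single top-level directory. The helper name extract_flat is ours and is used only for illustration.

```shell
# Extract a gzip-compressed tarball into <dest>, dropping the first path
# component of every entry (the archive's top-level directory).
# Usage: extract_flat <archive> <dest>
extract_flat() {
    archive=$1
    dest=$2
    mkdir -p "$dest" && tar -xzf "$archive" --directory "$dest" --strip-components 1
}

# Example usage: an entry stored as kafka_2.13-2.5.0/bin/kafka-topics.sh
# in the archive ends up as <dest>/bin/kafka-topics.sh.
# extract_flat kafka_2.13-2.5.0.tgz /tmp/kafka-preview
```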

Now we have the required files and dependencies for Kafka.

Configuring Kafka

We need to create directories to store the Kafka data.

Create a folder named kafka under /var/lib

And a folder named data under /var/lib/kafka

sudo mkdir /var/lib/kafka
sudo mkdir /var/lib/kafka/data

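Since Kafka 1.0, topic deletion is enabled by default; older releases disabled it. To make the behaviour explicit, we will edit the server.properties file and set the option ourselves.

Open the below file

sudo vi /opt/kafka/config/server.properties

Add the below line at the end of the file.

delete.topic.enable = true

Also change the data directory to /var/lib/kafka/data. Despite its name, log.dirs is where Kafka stores the topic data (its commit log), not application log files. The key already exists in the file, so edit the existing line instead of adding a second one.

log.dirs=/var/lib/kafka/data

Now save and close the file.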
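The same two settings can also be applied without opening an editor. The following is a sketch, not the only way to do it: set_property is a helper we wrote for illustration. It replaces an existing uncommented entry for the key or appends one if the key is missing, so running it twice does not create duplicates. It assumes the value contains no backslashes, since awk would interpret them.

```shell
# Set key=value in a Java-style properties file.
# Usage: set_property <file> <key> <value>
set_property() {
    file=$1
    key=$2
    value=$3
    tmp=$(mktemp) || return 1
    awk -v k="$key" -v v="$value" '
        {
            # Take the text before the first "=" and trim spaces around it.
            pos = index($0, "=")
            name = (pos > 0) ? substr($0, 1, pos - 1) : ""
            gsub(/^[ \t]+|[ \t]+$/, "", name)
            if (name == k) {
                # Replace the first matching entry and drop later duplicates.
                if (!done) { print k "=" v; done = 1 }
                next
            }
            print
        }
        END { if (!done) print k "=" v }
    ' "$file" > "$tmp" && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return $status
}

# Example usage (run as a user that may write to the file):
# set_property /opt/kafka/config/server.properties delete.topic.enable true
# set_property /opt/kafka/config/server.properties log.dirs /var/lib/kafka/data
```

Writing the result back with cat keeps the ownership and permissions of the original file.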

Changing Kafka Directory permissions

Remember that we created a kafka user for the Kafka service.

We need to change the ownership of the folders related to Kafka, so that the service can read its installation files and write its data.

sudo chown -R kafka: /var/lib/kafka
sudo chown -R kafka: /opt/kafka
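To confirm that the ownership change reached every file, we can look for anything that is still owned by someone else. A minimal sketch; all_owned_by is a hypothetical helper name, and it accepts either a user name or a numeric user ID.

```shell
# Return 0 if everything under <path> is owned by <user>.
# Returns 1 and prints up to five offending paths otherwise,
# or 2 if find itself fails (unknown user, missing path).
# Usage: all_owned_by <user> <path>
all_owned_by() {
    user=$1
    path=$2
    offenders=$(find "$path" ! -user "$user") || return 2
    [ -z "$offenders" ] && return 0
    printf '%s\n' "$offenders" | head -n 5 >&2
    return 1
}

# Example usage:
# all_owned_by kafka /opt/kafka && all_owned_by kafka /var/lib/kafka
```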

Now it’s time to set up a service to manage Kafka. But before that we need to install the ZooKeeper package.

Installing ZooKeeper

ZooKeeper is a top-level Apache project that acts as a centralized service for maintaining naming and configuration data and for providing flexible and robust synchronization within distributed systems. Kafka uses it to coordinate the brokers in a cluster.

Run the below command to install ZooKeeper.

sudo apt-get install zookeeperd

We have now installed ZooKeeper on the server.

Creating Service files for Kafka & ZooKeeper

We have to set up systemd unit files for the Kafka and ZooKeeper services, so that we can start, stop and restart them with systemctl and enable them on system boot.

First we will create the service file for ZooKeeper under the /etc/systemd/system directory. It starts ZooKeeper using the scripts and configuration bundled with the Kafka download.

Create a file named zookeeper.service and add the below contents.

[Unit]
Requires=network.target remote-fs.target
After=network.target remote-fs.target

[Service]
Type=simple
User=kafka
ExecStart=/opt/kafka/bin/zookeeper-server-start.sh /opt/kafka/config/zookeeper.properties
ExecStop=/opt/kafka/bin/zookeeper-server-stop.sh
Restart=on-abnormal

[Install]
WantedBy=multi-user.target

Save and close the file.

Now we will create the service file for Kafka.

Create a file named kafka.service in the same directory and add the below contents.

[Unit]
Description=High-available, distributed message broker
Requires=zookeeper.service
After=network.target zookeeper.service

[Service]
Type=simple
User=kafka
ExecStart=/bin/sh -c '/opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/server.properties'
ExecStop=/opt/kafka/bin/kafka-server-stop.sh
Restart=on-abnormal

[Install]
WantedBy=multi-user.target

Save and close the file.
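systemd expects each section header such as [Unit], [Service] and [Install] to sit on a line of its own, and a copy-paste that glues a header to the first setting is easy to miss. The sketch below is a quick sanity check for that; unit_headers_ok is a name we chose for illustration, and it only checks the three sections used in this guide. On a real host, systemd-analyze verify gives a much more thorough check.

```shell
# Return 0 if [Unit], [Service] and [Install] each appear alone on a line
# and no section header is followed by other text on the same line.
# Usage: unit_headers_ok <unit_file>
unit_headers_ok() {
    unit=$1
    for section in Unit Service Install; do
        if ! grep -qx "\[$section\]" "$unit"; then
            echo "Missing or malformed [$section] header in $unit" >&2
            return 1
        fi
    done
    # Catches lines like "[Unit]Requires=network.target".
    if grep -q '^\[[A-Za-z]*\].' "$unit"; then
        echo "A section header shares its line with a setting in $unit" >&2
        return 1
    fi
    return 0
}

# Example usage:
# unit_headers_ok /etc/systemd/system/zookeeper.service
# unit_headers_ok /etc/systemd/system/kafka.service
# systemd-analyze verify /etc/systemd/system/kafka.service
```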

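Starting ZooKeeper & Kafka

We have created service files to easily manage the Kafka and ZooKeeper services.

First reload systemd so that it picks up the new unit files. Then run the below commands to start the Kafka service and enable it on system boot. Because kafka.service declares Requires=zookeeper.service, starting Kafka also starts ZooKeeper.

sudo systemctl daemon-reload
sudo systemctl start kafka
sudo systemctl enable kafka

To check the status of the Kafka service: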

systemctl status kafka

Run the below commands to make sure the ZooKeeper service is started and to enable it on system boot.

sudo systemctl start zookeeper
sudo systemctl enable zookeeper

Check the status of ZooKeeper:

systemctl status zookeeper

Now the Kafka and ZooKeeper services are up and running.

By default, Kafka listens on port 9092 and ZooKeeper listens on port 2181.
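A service can report as started a few seconds before it actually accepts connections on its port, so scripts often wait and retry. Below is a small generic retry helper as a sketch. The name retry is ours, and the nc command in the usage comment is an assumption: it is only one possible way to probe a port and may need to be installed separately.

```shell
# Run a command until it succeeds, at most <max> times,
# sleeping <delay> seconds between attempts.
# Usage: retry <max> <delay> <command> [args...]
retry() {
    max=$1
    delay=$2
    shift 2
    attempt=1
    while ! "$@"; do
        [ "$attempt" -ge "$max" ] && return 1
        attempt=$((attempt + 1))
        sleep "$delay"
    done
    return 0
}

# Example usage: wait up to 30 seconds for each port to accept connections.
# retry 30 1 nc -z localhost 2181 || echo "ZooKeeper is not reachable" >&2
# retry 30 1 nc -z localhost 9092 || echo "Kafka is not reachable" >&2
```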

Testing Kafka Setup

Now that the Kafka and ZooKeeper services are running, let’s test the Kafka installation.

For this test, we are going to create a topic, then publish and consume a message to ensure that Kafka is working as expected.

There are two core components involved when passing messages through Kafka.

Producer: publishes messages to topics

Consumer: reads messages from topics

Let’s create a topic named Test, using the below command.

/opt/kafka/bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic Test

You will get the message: Created topic Test.
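Kafka rejects topic names that break its naming rules: a name may contain only ASCII letters, digits, dots, underscores and hyphens, may be at most 249 characters long, and may not be "." or "..". If topic names come from user input or a script, a check like the sketch below (valid_topic_name is our own helper name) catches mistakes before the create command is called.

```shell
# Return 0 if the argument is a legal Kafka topic name.
# Usage: valid_topic_name <name>
valid_topic_name() {
    name=$1
    [ -n "$name" ] || return 1
    [ "$name" != "." ] && [ "$name" != ".." ] || return 1
    [ "${#name}" -le 249 ] || return 1
    case $name in
        # Any character outside the allowed set makes the name invalid.
        *[!A-Za-z0-9._-]*) return 1 ;;
    esac
    return 0
}

# Example usage:
# valid_topic_name "Test" && /opt/kafka/bin/kafka-topics.sh --create \
#     --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic Test
```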

We can publish messages using the console producer script.

Using the producer script we are going to publish the message “Testing Kafka” into the Test topic.

echo "Testing Kafka" | /opt/kafka/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic Test > /dev/null

We can read messages using the console consumer script.

The below command will consume messages from the Test topic.

We will use the --from-beginning flag, which makes the consumer read all the messages that were published to the topic before the consumer was started. The consumer keeps running and waits for new messages; press Ctrl+C to stop it.

/opt/kafka/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic Test --from-beginning

You will see the message that was published to the topic in the output of the above command.
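The publish and consume steps can be combined into one automated round-trip check. The sketch below is only an illustration: kafka_round_trip is a helper name we made up, and it assumes the broker listens on localhost:9092 as in this guide. It uses the console consumer’s --timeout-ms option so that the consumer exits by itself once no more messages arrive, instead of waiting for Ctrl+C.

```shell
# Publish <message> to <topic> and return 0 if it can be read back.
# <bin_dir> is the directory holding Kafka's console scripts
# (/opt/kafka/bin in this guide).
# Usage: kafka_round_trip <bin_dir> <topic> <message>
kafka_round_trip() {
    bin_dir=$1
    topic=$2
    message=$3
    printf '%s\n' "$message" |
        "$bin_dir/kafka-console-producer.sh" --broker-list localhost:9092 \
            --topic "$topic" >/dev/null || return 1
    # Read the whole topic and look for a line equal to the message.
    "$bin_dir/kafka-console-consumer.sh" --bootstrap-server localhost:9092 \
        --topic "$topic" --from-beginning --timeout-ms 10000 2>/dev/null |
        grep -Fxq -- "$message"
}

# Example usage:
# kafka_round_trip /opt/kafka/bin Test "Testing Kafka" && echo "Kafka is working"
```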

Conclusion

We have successfully installed and configured Kafka and ZooKeeper on the Ubuntu server.

Hope you find it helpful. Please do check out my other articles.