How To Set Up Kafka And Zookeeper On CentOS 7

In this guide we will learn how to install and configure Kafka and Zookeeper on CentOS 7.

What Is Kafka?

  • Apache Kafka is a distributed streaming platform.
  • It is a popular distributed message broker designed to efficiently handle large volumes of real-time data.
  • A Kafka cluster is not only highly scalable and fault-tolerant, but it also has a much higher throughput compared to other message brokers such as ActiveMQ and RabbitMQ.
  • It is generally used as a publish/subscribe messaging system; many organizations also use it for log aggregation because it offers persistent storage for published messages.

Three Capabilities Of A Streaming Platform

  • Publish and subscribe to streams of records
  • Stores streams of records in a fault-tolerant durable way
  • Process streams of records

Kafka Is Used For 2 Types Of Applications

  • Helps to build real-time streaming data pipelines that reliably move data between systems or applications.
  • Helps to build real-time streaming applications that react to the streams of data.

5 Core APIs Of Kafka

Here is the list of APIs in the Kafka system.

PRODUCER API

It allows an application to publish a stream of records to one or more topics.

CONSUMER API

It allows an application to subscribe to topics and process the streams of records.

STREAMS API

It allows applications to act as stream processors, consuming input streams from one or more topics and producing output streams to one or more output topics, effectively transforming input streams into output streams.

CONNECTOR API

Helps to build and run reusable producers or consumers that connect Kafka topics to existing applications or data systems.

ADMIN API

It helps to manage and inspect Kafka objects such as topics and brokers.

Prerequisites

  • A CentOS 7 server with sudo or root privileges.
  • A minimum of 4 GB of RAM for Kafka to run.
  • Java installed on the server (we will install it in the next step).

Installing Java

First we need to install Java on the system, as Java is a dependency for Kafka.

Let's install OpenJDK 11 on the server by running the below command.

sudo yum install java-11-openjdk.x86_64 -y

Once the package is installed, check the installed Java version using the below command.

java -version

If you have multiple versions of Java installed on the same system and wish to switch between them, run the below command.

sudo update-alternatives --config java

It asks you to choose the version of Java you require from the available installations.

Provide the selection number and press Enter.

Java is now switched to the version you require.
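
To confirm which JDK the java command now resolves to, you can follow the alternatives symlink chain:

readlink -f $(which java)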

Creating Kafka User

We should create a separate user for the Kafka service, since the service communicates over the network and is safer run as a dedicated, unprivileged account.

Let's go ahead and create a kafka user.

sudo useradd -r -s /bin/false kafka

The above command creates a system user with no home directory and no login shell.

Once the user is created, we can move on to the Kafka installation.
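
If you want to double-check the account, id confirms it exists and getent passwd shows the /bin/false shell set above:

id kafka
getent passwd kafka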

Download & Extract Kafka

We need to download the Kafka binary release into a folder, extract it, and then start configuring Kafka.

Let's create a folder named kafka under /opt.

cd /opt/
mkdir kafka

Let's download Kafka (version 2.5.0 at the time of writing) into the /opt directory.

wget https://downloads.apache.org/kafka/2.5.0/kafka_2.13-2.5.0.tgz

Once the package is downloaded, extract it using the below command, which places all the extracted files into the kafka folder.

tar xvzf kafka_2.13-2.5.0.tgz --directory kafka --strip-components 1

Now we have the required files and dependencies for Kafka.
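
As a quick sanity check, the bin and config directories should now sit directly under /opt/kafka:

ls /opt/kafka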

Configuring Kafka

We need to create directories to store the Kafka data.

Create a folder named kafka under /var/lib, and a folder named data inside it.

mkdir /var/lib/kafka
mkdir /var/lib/kafka/data

By default Kafka doesn't allow us to delete topics. To be able to delete topics we need to edit the server.properties file and add the required configuration.

Open the below file

vi /opt/kafka/config/server.properties

Add the below line at the end of the file.

delete.topic.enable = true

Also change log.dirs, the directory where Kafka stores its topic data (partition logs), to /var/lib/kafka/data:

log.dirs=/var/lib/kafka/data

Now, save and close the file.
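
If you prefer to script these edits instead of using vi, here is a minimal sketch with tee and sed; it assumes log.dirs is still present as a single line in the default server.properties:

echo "delete.topic.enable = true" | sudo tee -a /opt/kafka/config/server.properties
sudo sed -i 's|^log.dirs=.*|log.dirs=/var/lib/kafka/data|' /opt/kafka/config/server.properties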

Changing Kafka Directory Permissions

Remember that we created a kafka user for the Kafka service.

We need to change the ownership of the Kafka-related folders to that user.

chown -R kafka: /var/lib/kafka
chown -R kafka: /opt/kafka
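
A quick listing should now show kafka as the owner of both trees:

ls -ld /opt/kafka /var/lib/kafka/data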

Now it's time to set up a systemd service to manage Kafka. But before that, we need to install the Zookeeper package.

Installing Zookeeper

Zookeeper is a top-level Apache project that acts as a centralized service; it is used to maintain naming and configuration data and to provide flexible and robust synchronization within distributed systems.

Let's download Zookeeper (version 3.6.1 at the time of writing) from an Apache mirror into the /opt directory.

wget http://apachemirror.wuchna.com/zookeeper/zookeeper-3.6.1/apache-zookeeper-3.6.1-bin.tar.gz

Extract the downloaded package using the below command.

tar -xvzf apache-zookeeper-3.6.1-bin.tar.gz

You will find a folder named apache-zookeeper-3.6.1-bin.

Let's set up Zookeeper using the below commands.

cd apache-zookeeper-3.6.1-bin
mkdir data
cp conf/zoo_sample.cfg conf/zoo.cfg
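
The sample configuration points dataDir at /tmp/zookeeper, which does not survive a reboot. Assuming you want Zookeeper to use the data directory created above (this also matches the PID file path used in the systemd unit later), edit conf/zoo.cfg and set:

dataDir=/opt/apache-zookeeper-3.6.1-bin/data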

Start the Zookeeper server using the below command.

bin/zkServer.sh start

Zookeeper is started.
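
To verify it came up cleanly, you can ask the server for its status; in this single-node setup it should report Mode: standalone:

bin/zkServer.sh status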

We also need to create a zookeeper user and make it the owner of the Zookeeper files and folders.

sudo useradd -r -s /bin/false zookeeper
chown zookeeper: -R apache-zookeeper-3.6.1-bin

Creating Service File For Kafka & Zookeeper

We have to set up systemd unit files for the Kafka and Zookeeper services so that we can manage them (start, stop, restart) and enable them on system boot.

First we will create the service file for Zookeeper under the /usr/lib/systemd/system directory.

Create a file named zookeeper.service and add the below contents.

vi /usr/lib/systemd/system/zookeeper.service
[Unit]
Description=Zookeeper Service

[Service]
Type=simple
WorkingDirectory=/opt/apache-zookeeper-3.6.1-bin/
PIDFile=/opt/apache-zookeeper-3.6.1-bin/data/zookeeper_server.pid
SyslogIdentifier=zookeeper
User=zookeeper
Group=zookeeper
ExecStart=/opt/apache-zookeeper-3.6.1-bin/bin/zkServer.sh start
ExecStop=/opt/apache-zookeeper-3.6.1-bin/bin/zkServer.sh stop
TimeoutSec=20
SuccessExitStatus=130 143
Restart=on-failure

[Install]
WantedBy=multi-user.target

Run the below commands to start the Zookeeper service. If Zookeeper is still running from the manual start earlier, stop that instance first (bin/zkServer.sh stop), then run the below commands.

systemctl daemon-reload
systemctl start zookeeper

Check the status of the zookeeper service,

systemctl status zookeeper
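
If the status shows a failure, the recent journal entries usually explain why (a leftover manually started instance still holding port 2181 is a common cause here):

journalctl -u zookeeper --no-pager -n 20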

Enable the service to auto start on system boot up,

systemctl enable zookeeper

Let's set up a systemd service file for Kafka.

Create a file named kafka.service and add the below contents to it.

vi /usr/lib/systemd/system/kafka.service
[Unit]
Requires=zookeeper.service
After=zookeeper.service
Description=Highly available, distributed message broker
After=network.target

[Service]
Type=simple
User=kafka
ExecStart=/bin/sh -c '/opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/server.properties'
ExecStop=/opt/kafka/bin/kafka-server-stop.sh
Restart=on-abnormal

[Install]
WantedBy=multi-user.target

Save and close the file.

Run the below commands to start the service,

systemctl daemon-reload
systemctl start kafka

Check the status of kafka service using the below command.

systemctl status kafka

Enable the service to auto start on system boot up,

systemctl enable kafka

Now the Kafka and Zookeeper services are up and running.

Kafka listens on port 9092 and Zookeeper listens on port 2181.
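
You can confirm both ports are listening with ss (part of the iproute package on CentOS 7):

ss -ltn | grep -E '2181|9092'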

Testing Kafka Setup

Now that the Kafka and Zookeeper services are in place, let's test the Kafka installation.

For this test, we're going to create a topic, then publish and consume messages to ensure that Kafka is working as expected.

There are two core components involved in moving messages through Kafka.

Producer : publishes messages to topics.

Consumer : reads messages from topics.

Let's create a topic named Test using the below command.

/opt/kafka/bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic Test

You will get the message: Created topic Test.
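
To confirm the topic really exists, you can list the topics the broker knows about (with Kafka 2.5 this also works via --bootstrap-server localhost:9092):

/opt/kafka/bin/kafka-topics.sh --list --zookeeper localhost:2181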

We can publish messages using the console producer script.

Using the producer script, we are going to publish the message "Testing Kafka" to the Test topic.

echo "Testing Kafka" | /opt/kafka/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic Test > /dev/null

We can consume messages using the console consumer script.

The below command will consume messages from the Test topic.

We will use the --from-beginning flag, which consumes all the messages that were published to the topic before the consumer was started.

/opt/kafka/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic Test --from-beginning

You will receive the message that was published to the topic as output while running the above command. Press Ctrl+C to stop the consumer.
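
Since we enabled delete.topic.enable in server.properties earlier, you can also clean up the test topic once you are done:

/opt/kafka/bin/kafka-topics.sh --delete --zookeeper localhost:2181 --topic Test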

Conclusion

We have successfully set up Kafka and Zookeeper on CentOS 7.

Hope you found this helpful. Please do check out my other articles.