Building a Zookeeper ensemble

Some notes on setting up and configuring a Zookeeper ensemble. Many guides use some shortcuts to setting up the ensemble such as running multiple Zookeeper instances on a single machine or running them in Docker. Both require some workarounds to make it work. As a better example, this guide runs three Zookeeper instances across three different VMs.

It’s not quite production ready. There are still some caveats for this development grade setup:

  • My servers are Google Compute Engine (GCE) instances on the default network with a public IP. In production, consider using a custom network, internal IP only and firewall rules to restrict access.
  • My Zookeeper instances are not secured. Zookeeper supports secured encrypted connections for server-server (leader election) and client-server connections. We’ll look at Zookeeper security in a future post.
  • Zookeeper data is written to the Operating System disk. In production, consider attaching a fast dedicated disk for this. You’ll need more than the 7GB or so unused space on the OS disk.
  • Servers are GKE e2-small instances with 2 GB memory and 2 vCPUs. On production, a minimum of 4 GB available memory is recommended to avoid swapping to disk.
  • I run Zookeeper from command line under the current user account. In production, run it as a service using a dedicated service account.

My starting point is three Debian GNU / Linux 10 (buster) instances, one in each of the three Availability Zones of europe-north1 Region. This is a sensible production configuration as it provides resilience against failure of a zone. If you’re running your ensemble on-premises, run each instance in a different data centre (if possible) to provide resilience against loss of a data centre.

Install and configure a single Zookeeper instance

Starting from a fresh Debian GNU / Linux 10 (buster) instance we need to install Java Runtime Environment (JRE). This is easy enough with apt-get. I’ve also installed wget and netcat utilities as we’ll need them later.

sudo apt-get update
sudo apt-get install default-jre wget netcat

Next, download the current Zookeeper binary package (3.6.3 at time of writing), extract it and set the owner to the current user account.

wget https://archive.apache.org/dist/zookeeper/stable/apache-zookeeper-3.6.3-bin.tar.gz
tar -xzf apache-zookeeper-3.6.3-bin.tar.gz
sudo mv apache-zookeeper-3.6.3-bin /usr/local/zookeeper
sudo mkdir /var/lib/zookeeper
sudo chown -R $USER: /var/lib/zookeeper
rm apache-zookeeper-3.6.3-bin.tar.gz 

Now we need a basic Zookeeper configuration file. We’ll run this instance stand-alone for now so we can start from the provided example:

sudo cp /usr/local/zookeeper/conf/zoo_sample.cfg /usr/local/zookeeper/conf/zoo.cfg

Edit the zoo.conf and just set the dataDir to the directory we created earlier:

# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial 
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between 
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just 
# example sakes.
dataDir=/var/lib/zookeeper
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60

That’s all we need for a single instance. Start it up:

/usr/local/zookeeper/bin/zkServer.sh start

and verify it starts by tailing the logs:

tail -n 100 -f /usr/local/zookeeper/logs/zookeeper-$USER-server-zookeeper-1.out

Start 2 additional servers

We need two additional instances for our ensemble. For high availability we’ll run these instances in different GCE zones. We could create two new Debian GNU / Linux 10 (buster) instances and then repeat the steps above. However, it’s simpler just to clone the first instance.

In Google Compute Engine this is easily done by creating a machine image from the running instance and then creating two new instances from the image. Don’t forget to set the machine name and the region / zone of the new instances.

Once this is done, we now have three identical servers running in three zones of the same region. Assuming my GCE projectId is 12345 and I’m using europe-north1 region then the internal network hostnames of my three servers are:

  • zookeeper-1.europe-north1-a.c.zookeeper-12345.internal
  • zookeeper-2.europe-north1-b.c.zookeeper-12345.internal
  • zookeeper-3.europe-north1-c.c.zookeeper-12345.internal

Configure the Zookeeper ensemble

To configure the Zookeeper ensemble, each instance needs a unique ID in a file called myid. On each server, run

cat > /var/lib/zookeeper/myid

and then type in the unique ID number (1 for the first server, 2 for the second, 3 for the third). This creates a file called myid that contains a number and nothing else.

Then add the ensemble configuration to each server’s zoo.cfg. Create one line for each instance in the ensemble in this format:

server.x=fqdn:2888:3888

where:

  • x is the instance’s unique ID (1, 2 or 3 for our 3 node ensemble)
  • fqdn is the Fully Qualified Domain Name of the server it runs on
  • 2888 is the default peer connection port
  • 3888 is the default leader election port

So our configuration looks like

server.1=zookeeper-1.europe-north1-a.c.zookeeper-12345.internal:2888:3888
server.2=zookeeper-2.europe-north1-b.c.zookeeper-12345.internal:2888:3888
server.3=zookeeper-3.europe-north1-c.c.zookeeper-12345.internal:2888:3888

Copy this block to each server’s zoo.cfg

Try it!

With the new configuration in place, start the first server with the same command as before:

/usr/local/zookeeper/bin/zkServer.sh start

and again tail the logs:

tail -n 100 -f /usr/local/zookeeper/logs/zookeeper-$USER-server-zookeeper-1.out

You’ll see that the server fails to start:

021-10-13 16:43:09,550 [myid:1] - WARN  [QuorumConnectionThread-[myid=1]-3:[email protected]] - Cannot open channel to 2 at election address zookeeper-2.europe-north1-b.c.zookeeper-12345.internal/10.166.0.3:3888
java.net.ConnectException: Connection refused (Connection refused)
        at java.base/java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.base/java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:399)
        at java.base/java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:242)
        at java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:224)
        at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.base/java.net.Socket.connect(Socket.java:609)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.initiateConnection(QuorumCnxManager.java:383)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager$QuorumConnectionReqThread.run(QuorumCnxManager.java:457)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)

This is expected. The Zookeeper ensemble will not start until it reaches quorum (2 out of 3 servers) so this server will wait until one of the other servers starts. When you run the start command on the second, the error clears on the first instance and the ensemble is started.

021-10-13 16:44:54,491 [myid:1] - INFO  [LeaderConnector-zookeeper-2.europe-north1-b.c.zookeeper-12345.internal/1
0.166.0.3:2888:[email protected]] - Successfully connected to leader, using address: zookeeper-2.europe-north1-b.c.zookeeper-12345.internal/10.166.0.3:2888
2021-10-13 16:44:54,517 [myid:1] - INFO  [QuorumPeer[myid=1](plain=[0:0:0:0:0:0:0:0]:2181)(secure=disabled):QuorumP
[email protected]] - Peer state changed: following - synchronization

Run the start command on the third server to get a fully healthy three node Zookeeper ensemble.

Verify Zookeeper ensemble status

Unfortunately, there’s no single command to list all nodes in a Zookeeper ensemble. The best way is to use one of the Zookeeper four letter words to query the status of each server.

First, we need to whitelist the stat four letter word. Add this line to the zoo.cfg and restart each Zookeeper instance:

4lw.commands.whitelist=stat

Then send the stat four letter word to each server in the ensemble. Each server will return its status including mode: follower or leader. We expect one leader and all other servers to be followers.

The commands we’ll run are:

echo stat | nc zookeeper-1.europe-north1-a.c.zookeeper-12345.internal 2181
echo stat | nc zookeeper-2.europe-north1-b.c.zookeeper-12345.internal 2181
echo stat | nc zookeeper-3.europe-north1-c.c.zookeeper-12345.internal 2181

Here’s a healthy Zookeeper ensemble:

[email protected]:~$ echo stat | nc zookeeper-1.europe-north1-a.c.zookeeper-12345.internal 2181
Zookeeper version: 3.6.3--6401e4ad2087061bc6b9f80dec2d69f2e3c8660a, built on 04/08/2021 16:35 GMT
Clients:
 /10.166.0.3:53434[0](queued=0,recved=1,sent=0)

Latency min/avg/max: 0/1.4286/6
Received: 11
Sent: 10
Connections: 1
Outstanding: 0
Zxid: 0x400000006
Mode: follower
Node count: 5
[email protected]:~$ echo stat | nc zookeeper-2.europe-north1-b.c.zookeeper-12345.internal 2181
Zookeeper version: 3.6.3--6401e4ad2087061bc6b9f80dec2d69f2e3c8660a, built on 04/08/2021 16:35 GMT
Clients:
 /10.166.0.3:37170[0](queued=0,recved=1,sent=0)

Latency min/avg/max: 1/2.0/4
Received: 10
Sent: 9
Connections: 1
Outstanding: 0
Zxid: 0x400000006
Mode: leader
Node count: 5
Proposal sizes last/min/max: 48/48/48
[email protected]:~$ echo stat | nc zookeeper-3.europe-north1-c.c.zookeeper-12345.internal 2181
Zookeeper version: 3.6.3--6401e4ad2087061bc6b9f80dec2d69f2e3c8660a, built on 04/08/2021 16:35 GMT
Clients:
 /10.166.0.3:43686[0](queued=0,recved=1,sent=0)

Latency min/avg/max: 0/0.0/0
Received: 1
Sent: 0
Connections: 1
Outstanding: 0
Zxid: 0x400000006
Mode: follower
Node count: 5

Here’s an ensemble where one server is offline. Zookeeper is quorate with two out of three servers so this ensemble is still functioning:

[email protected]:~$ echo stat | nc zookeeper-1.europe-north1-a.c.zookeeper-12345.internal 2181
Zookeeper version: 3.6.3--6401e4ad2087061bc6b9f80dec2d69f2e3c8660a, built on 04/08/2021 16:35 GMT
Clients:
 /10.166.0.3:53468[0](queued=0,recved=1,sent=0)

Latency min/avg/max: 0/1.4286/6
Received: 12
Sent: 11
Connections: 1
Outstanding: 0
Zxid: 0x400000006
Mode: follower
Node count: 5
[email protected]:~$ echo stat | nc zookeeper-2.europe-north1-b.c.zookeeper-12345.internal 2181
Zookeeper version: 3.6.3--6401e4ad2087061bc6b9f80dec2d69f2e3c8660a, built on 04/08/2021 16:35 GMT
Clients:
 /10.166.0.3:37204[0](queued=0,recved=1,sent=0)

Latency min/avg/max: 1/2.0/4
Received: 11
Sent: 10
Connections: 1
Outstanding: 0
Zxid: 0x400000006
Mode: leader
Node count: 5
Proposal sizes last/min/max: 48/48/48
[email protected]:~$ echo stat | nc zookeeper-3.europe-north1-c.c.zookeeper-12345.internal 2181
zookeeper-3.europe-north1-c.c.zookeeper-12345.internal [10.166.0.4] 2181 (?) : Connection refused

Finally, here is a non-quorate ensemble. Two servers are offline and so the remaining server reports that it cannot serve requests:

[email protected]:~$ echo stat | nc zookeeper-1.europe-north1-a.c.zookeeper-12345.internal 2181
zookeeper-1.europe-north1-a.c.zookeeper-12345.internal [10.166.0.2] 2181 (?) : Connection refused
[email protected]:~$ echo stat | nc zookeeper-2.europe-north1-b.c.zookeeper-12345.internal 2181
This ZooKeeper instance is not currently serving requests
[email protected]:~$ echo stat | nc zookeeper-3.europe-north1-c.c.zookeeper-12345.internal 2181
zookeeper-3.europe-north1-c.c.zookeeper-12345.internal [10.166.0.4] 2181 (?) : Connection refused

Leave a Reply

Your email address will not be published. Required fields are marked *