April 26, 2026
More and more software projects are moving to an event-driven architecture. The biggest reason is maintainability, and let me tell you, they are right: monolithic architectures stop being maintainable once a team grows beyond 20 software engineers. Full stop.
One of the most popular ways to set up an event-driven architecture is Kafka. Kafka is an open-source project from the Apache Software Foundation. In essence, it is a distributed log that lets you produce and consume messages on topics. We could explore Kafka's theory further, but I'm not the best person to teach it, since there are books that already do a good job of explaining it.
However, I realized that to get the most out of those books, you need an isolated environment to experiment in. How else are you going to spin up a Kafka cluster with three brokers? This is where HashiCorp Vagrant and Ansible come into play. With these two technologies, we will create and configure a virtual machine (VM). It will run Debian as its operating system (OS) and have Kafka installed and ready to use. We will keep it simple, though, and only create one VM instead of three.
First of all, we need to create the Vagrantfile. I will use a base box that I created myself, as this gives me more control over and confidence in what I deploy on my machine. But you can of course use any base box that runs Debian, for example bento/debian-13 by Progress Chef.
Vagrant.configure("2") do |config|
config.vm.box = "debian-13"
config.vm.provision "ansible" do |ansible|
ansible.playbook = "playbook.yml"
end
end
So far, so good. You can always fine-tune the memory and CPU of the VM later if needed. But I like to keep things simple.
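If you ever do want to tweak those resources, a provider block along these lines would do it. This is only a sketch and not part of my Vagrantfile; it assumes VirtualBox as the provider, and the values are just examples:

  # Inside the Vagrant.configure block
  config.vm.provider "virtualbox" do |vb|
    vb.memory = 4096 # RAM in MB
    vb.cpus = 2      # number of virtual CPUs
  end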
The location of the playbook is also defined in the Vagrantfile; Vagrant will run it to configure the VM as soon as we spin it up. This is where it gets interesting.
Ansible is one of those technologies you don't fully appreciate until you actually use it. There is no other way to put it. It has a simple syntax, is very efficient, and takes a ton of work off your plate. And above all, the idea of idempotency in your configuration gives you a sense of stability and flexibility for testing.
Anyway, let’s get to work. What will we have to configure in the Ansible playbook for Kafka to work?
Kafka runs on Java, which means we need the JDK to run the Kafka binaries.
Kafka itself, downloaded from the official apache.org website; we also need to add Kafka's bin directory to PATH so the binaries work as expected.
We will use a Kafka folder in the Vagrant user’s home directory to store all our configuration files. This folder is also where Kafka will store its log data.
I’ll skip the earlier steps in Ansible to keep this blog post concise. If you’re interested in seeing those steps, you can find a link to the repository at the end of the article.
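To give a rough idea, though, those earlier steps boil down to something like the following. This is only a sketch: the module choices and the variables kafka_version and kafka_home are illustrative, not copied from the repository.

- name: Install the JDK
  ansible.builtin.apt:
    name: default-jdk
    state: present
    update_cache: true
  become: true

- name: Create the Kafka folder in the vagrant home directory
  ansible.builtin.file:
    path: "{{ kafka_home }}"
    state: directory

- name: Download and unpack Kafka from apache.org
  ansible.builtin.unarchive:
    src: "https://downloads.apache.org/kafka/{{ kafka_version }}/kafka_2.13-{{ kafka_version }}.tgz"
    dest: "{{ kafka_home }}"
    remote_src: true
    extra_opts: ["--strip-components=1"]

- name: Add the Kafka binaries to the vagrant user's PATH
  ansible.builtin.lineinfile:
    path: /home/vagrant/.bashrc
    line: 'export PATH="$PATH:{{ kafka_bin }}"'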
Once we have installed the dependencies and copied the configuration files to the guest OS, we can create our Kafka cluster ID. Of course, to keep everything idempotent, we should first check whether a cluster ID already exists on the guest, and only create a new one if it doesn't. This is the concept of idempotency in Ansible.
- name: Check if cluster ID exists
  ansible.builtin.stat:
    path: "{{ kafka_cluster_id }}"
  register: cluster_id_file

- name: Generate Kafka cluster ID if it does not exist
  block:
    - name: Generate Kafka cluster ID
      ansible.builtin.command: "{{ kafka_bin }}/kafka-storage.sh random-uuid"
      register: kafka_uuid

    - name: Save cluster ID
      ansible.builtin.copy:
        content: "{{ kafka_uuid.stdout }}"
        dest: "{{ kafka_cluster_id }}"
  when: not cluster_id_file.stat.exists
The syntax is pretty self-explanatory. I included the fully qualified collection names (FQCN) of the Ansible modules for better readability. If you want to check what the different parameters of each module do, have a look at the Ansible documentation: search for a module, for example the command module, and open its page. There you will find a detailed explanation of every parameter the module accepts.
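One thing these tasks don't show is where variables like kafka_bin and kafka_cluster_id come from: they are defined as playbook variables. The exact values live in the repository; roughly, they all point into the Kafka folder mentioned earlier, along these lines (illustrative values only):

vars:
  kafka_home: /home/vagrant/kafka
  kafka_bin: "{{ kafka_home }}/bin"
  kafka_config: "{{ kafka_home }}/config"
  kafka_data: "{{ kafka_home }}/data"
  kafka_cluster_id: "{{ kafka_home }}/cluster_id"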
The last step is to configure our cluster with three brokers.
- name: Read existing cluster ID
  ansible.builtin.command: "cat {{ kafka_cluster_id }}"
  register: kafka_uuid

- name: Configure Kafka brokers
  ansible.builtin.command: >
    {{ kafka_bin }}/kafka-storage.sh format
    -t {{ kafka_uuid.stdout }}
    -c {{ kafka_config }}/kafka{{ item }}.properties
  args:
    creates: "{{ kafka_data }}/kafka{{ item }}/meta.properties"
  loop:
    - 1
    - 2
    - 3
This Ansible task loops over the three configuration files in our config directory and formats the storage for one broker per file, using the cluster ID we created in the previous step. The command also writes a meta.properties file into the data directory we specify in the configuration files. The creates argument points at that file, so Ansible skips the step for any broker that is already configured.
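For completeness, each of those per-broker configuration files is a plain Kafka properties file for KRaft mode. The node IDs, ports, and paths below are illustrative, not the exact values from the repository:

# kafka1.properties (illustrative)
process.roles=broker,controller
node.id=1
controller.quorum.voters=1@localhost:9093,2@localhost:9095,3@localhost:9097
listeners=PLAINTEXT://:9092,CONTROLLER://:9093
advertised.listeners=PLAINTEXT://localhost:9092
controller.listener.names=CONTROLLER
inter.broker.listener.name=PLAINTEXT
log.dirs=/home/vagrant/kafka/data/kafka1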
That's it, and the best part is that we never have to do this configuration manually again, because it's all automated. We just create a new VM with Vagrant and get a brand new environment with our predefined settings, from scratch. This is great not only for testing but also for production, and we should use it whenever possible. The more we use it, the more our playbook grows, and the more manual work it takes off our plate.
If you’re interested in the full source code, you can find it on my GitHub.