· Pavel Tuma · Tools  · 9 min read

Test driving Kamal deploy

With the Ruby on Rails 8 release, Kamal deploy has been given prime time. Let's have a look at it.

What is Kamal

Kamal is a general-purpose tool to deploy web applications anywhere, from bare metal to cloud VMs. It was created at 37signals to help them move off expensive cloud services to their own servers. Although originally built for Ruby on Rails, it works for any application that can be containerized, which is basically anything.

How I got pulled in

One of Kamal’s propositions is to manage deployments while avoiding the complexity of Kubernetes. Kubernetes has become heavily over-hyped, with all the usual symptoms: organizations want to move to it regardless of whether it suits their workloads or whether they need that kind of scaling flexibility, and once they migrate, they often use it poorly and lack the resources or expertise to run it. We often see that our clients do not need Kubernetes for their apps, so I was curious whether Kamal could be a viable alternative to container services in public clouds such as AWS ECS or Azure Container Apps, or to platforms as a service such as Heroku or Render. There is one other scenario we see on the market where Kamal could provide the benefit of simplicity - running containerized applications on on-premise infrastructure.

Testing setup and scope

I decided to test drive Kamal in two scenarios:

  1. Two Ruby on Rails applications deployed to a fleet of AWS EC2 instances
  2. Deploying a static website to a VM hosted on a VMware hypervisor

The questions I wanted to answer through this testing were:

  • How easy is it to set up and use?
  • Is it able to manage multiple applications on the same “cluster” of servers?
  • Can it reliably manage TLS certificates automatically?
  • What are the options for built-in proxying and handling incoming traffic?
  • How can one-off tasks be performed?

I deliberately excluded testing one of Kamal’s features called accessories. It is the ability to deploy application support services such as databases, cache, etc. alongside the applications. A PostgreSQL database, Redis cache, and RabbitMQ queues were relevant for the Rails applications. I decided not to test this because Redis and RabbitMQ are no longer relevant with Ruby on Rails 8. For caching and queuing, Rails provides built-in replacements - Solid Cache and Solid Queue, both with a database backend. That leaves just the database on the table. And with the database, I consider using a managed service in most cases (such as AWS RDS) the best option.

The Smooth Parts

Installation and basic usage

There is not much to say here for the “human operator” part of the installation. With Rails, you have everything set up out of the box when you bootstrap a new application. For bringing your own application, just follow the super short get started guide and you’re good to go.

Once you have Kamal installed, the kamal command is available. Usage via the CLI is pretty straightforward, although the commands can be a little cryptic until you get familiar with them. One example: when an upgrade of the running kamal-proxy was required because of an upgrade of Kamal itself, I was faced with kamal proxy restart, kamal proxy boot, and kamal proxy reboot alongside the kamal proxy start and kamal proxy stop commands. You have to go to the docs to find out which one gives you a container based on the new image version.

Kamal claims you can throw a bunch of plain, unconfigured servers at it and it performs whatever configuration is necessary. I assume this means it can install and configure Docker Engine on the target system. A quick test confirmed that it does so using the official Docker Engine installation packages, which probably limits the usable distros to those Docker officially supports. On the other hand, you are free to use a system where Docker Engine is already installed, and that is what I did by using the latest Fedora CoreOS.

Workloads handling

At its core, Kamal turned out to be a wrapper around the docker command - when you execute a kamal command, it is “translated” into the appropriate Docker command or commands, and those are executed on a particular host server or on all of them. You can see the docker commands in the log output, which resembles Capistrano’s log output if you are familiar with that. This is nice and helps you debug your configuration if necessary.

I found it handy that there are particular kamal commands for steps in the deployment process as well as one aggregate command to perform the whole process. You can execute kamal deploy to do it all, or you can separately execute commands to build and push the application image to the repository, pull the image on servers, and provision the containers.

The deployments are zero-downtime, as kamal-proxy handles switching traffic from the old container to the new one once standard health checks pass. Configuration is easy in Kamal’s config file.
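For illustration, a minimal health check setup in config/deploy.yml could look like the sketch below - the host name is a placeholder and the exact keys may vary between Kamal versions, so treat it as an assumption to verify against the docs (Rails 8 apps expose the /up endpoint out of the box):

```yaml
# Hedged sketch of a kamal-proxy health check configuration; verify the
# exact keys against the Kamal docs for your version.
proxy:
  host: app.example.com   # placeholder domain
  healthcheck:
    path: /up             # Rails 8 provides this health endpoint by default
    interval: 3           # seconds between checks
    timeout: 30           # how long to wait for the app to become healthy
```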

Deploying multiple apps is not an issue - just include the same host servers in the configuration of multiple applications and Kamal deploys them side by side. The only minor drawback is that you end up with a certain amount of duplicated configuration across application repositories, and you can control each particular app via kamal app [subcommand] only from that app’s directory - unless, of course, you pass --config-file= and point it to the appropriate Kamal config file.

The Mixed Bag

Mandatory container registry login

When a private container registry is used, such as AWS ECR in my case, credentials are required for every Kamal command, including those that do not use the registry at all. I’m not sure whether this is required by Docker Engine, but it was a distraction to have to provide them when I just wanted to see the logs from kamal-proxy while troubleshooting.

TLS certificates handling

It quickly turned out that Kamal can automatically handle TLS certificates via Let’s Encrypt only when deploying to a single server. Fortunately, there is a way to tell kamal-proxy to use custom certificates when the app is deployed to multiple servers. With multiple servers, there has to be a load balancer in front of them, and I consider encryption between the load balancer and the Kamal hosts crucial for achieving a zero-trust architecture.
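As a sketch of what the custom-certificate setup can look like in config/deploy.yml (the secret names SSL_CERT and SSL_KEY are my own assumptions and would have to be defined in .kamal/secrets; verify the exact keys against the Kamal docs for your version):

```yaml
# Hedged sketch: serving custom TLS certificates from kamal-proxy instead of
# relying on Let's Encrypt. SSL_CERT and SSL_KEY are assumed secret names
# holding the PEM-encoded certificate and private key.
proxy:
  host: app.example.com   # placeholder domain
  ssl:
    certificate_pem: SSL_CERT
    private_key_pem: SSL_KEY
```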

Also, I experienced problems when issuing a certificate for a single server, but it turned out it could be an issue on the Let’s Encrypt side as these problems appeared and disappeared suddenly. They manifested in logs as:

2025-04-29T12:56:28.635918620Z {"time":"2025-04-29T12:56:28.635730333Z","level":"INFO","msg":"Request","host":"kpoc-app1.zonio.net","port":80,"path":"/.well-known/acme-challenge/i2_BMdRH6c1BZVkL1pYOjTRKxiwSYQpul0_CQhdTmaE","request_id":"71c6b7be-7d4d-4e65-a66f-a91bd34b1ffb","status":200,"service":"","target":"","duration":20586,"method":"GET","req_content_length":0,"req_content_type":"","resp_content_length":87,"resp_content_type":"","client_addr":"23.178.112.211","client_port":"55495","remote_addr":"23.178.112.211","user_agent":"Mozilla/5.0 (compatible; Let's Encrypt validation server; +https://www.letsencrypt.org)","proto":"HTTP/1.1","scheme":"http","query":""}
2025-04-29T12:56:33.735203318Z {"time":"2025-04-29T12:56:33.734788723Z","level":"INFO","msg":"http: TLS handshake error from 178.255.168.1:65264: acme/autocert: unable to satisfy \"https://acme-v02.api.letsencrypt.org/acme/authz/2369242787/512897904057\" for domain \"kpoc-app1.zonio.net\": no viable challenge type found"}

The HTTP-01 challenge request from Let’s Encrypt received HTTP 200 and at first seemed to be fulfilled successfully, but an error was thrown later when the app was accessed over HTTPS.

The Tough Spots

Host servers configuration

When configuring the Docker registry Kamal should use, there is an option to use Ruby ERB in the configuration file. The documentation shows an example of that for AWS ECR use:

registry:
  server: <your aws account id>.dkr.ecr.<your aws region id>.amazonaws.com
  username: AWS
  password: <%= %x(aws ecr get-login-password) %>

The values can be taken from executed commands (as shown for the password value), from environment variables, or elsewhere. I expected the same approach would work for configuring the host servers, so I could feed in the IP addresses from OpenTofu output. It turned out not to be possible (at least as of Kamal version 2.5.3). Hmmm, an idea for a PR 🤔
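Until something like that is supported, one workaround is to render the servers section from OpenTofu output with a small pre-deploy script. Below is a sketch; the tofu call is shown only in a comment and replaced with a static list so the script is self-contained, and the file and output names are my own assumptions:

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical pre-deploy step: render the servers section of a Kamal config
# from OpenTofu output. In real use you would do something like:
#   ips=$(tofu output -json web_ips | jq -r '.[]')
# Here we simulate it with a static list so the script runs standalone.
ips="54.236.57.179
54.236.57.180"

{
  echo "servers:"
  echo "  web:"
  echo "    hosts:"
  while read -r ip; do
    echo "      - ${ip}"
  done <<< "$ips"
} > servers.generated.yml

cat servers.generated.yml
```

The generated fragment could then be merged or included into config/deploy.yml as part of the deployment pipeline.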

Image for cron tasks

Maybe it was too high an expectation, or just a misunderstanding based on how jobs work in Kubernetes, but I assumed there would be a way to specify another image for one-off tasks. That is not the case: only a single image is built, and it is used for all Kamal roles defined in the configuration file. So, to run a rake task regularly, I ended up adding cron to the application image, creating a wrapper so that cron logs its output to Docker, and configuring it to run the rake task. The configuration can be taken from the diffs below if anyone is interested.

diff --git a/Dockerfile b/Dockerfile
index 4de88ee..c6f27ac 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -16,8 +16,9 @@ WORKDIR /rails

 # Install base packages
 RUN apt-get update -qq && \
-    apt-get install --no-install-recommends -y curl libjemalloc2 libvips postgresql-client && \
-    rm -rf /var/lib/apt/lists /var/cache/apt/archives
+    apt-get install --no-install-recommends -y curl libjemalloc2 libvips postgresql-client cron && \
+    rm -rf /var/lib/apt/lists /var/cache/apt/archives && \
+    rm -rf /etc/cron.*/*

 # Set production environment
 ENV RAILS_ENV="production" \
diff --git a/bin/cron-executor.sh b/bin/cron-executor.sh
new file mode 100755
index 0000000..2c547e9
--- /dev/null
+++ b/bin/cron-executor.sh
@@ -0,0 +1,8 @@
+#!/bin/bash -e
+
+PATH=$PATH:/usr/local/bin
+
+cd /rails
+
+echo "CRON: ${@}" >/proc/1/fd/1 2>/proc/1/fd/2
+exec "${@}" >/proc/1/fd/1 2>/proc/1/fd/2
diff --git a/config/crontab b/config/crontab
new file mode 100644
index 0000000..f6eba30
--- /dev/null
+++ b/config/crontab
@@ -0,0 +1 @@
+* * * * * /rails/bin/cron-executor.sh bin/rails db:version
diff --git a/config/deploy.yml b/config/deploy.yml
index f8260c6..41f1f11 100644
--- a/config/deploy.yml
+++ b/config/deploy.yml
@@ -7,7 +7,15 @@ image: remastermedia/rails8_kamal_poc
 # Deploy to these servers.
 servers:
   web:
-    - 54.236.57.179
+    hosts:
+      - 54.236.57.179
+  cron:
+    hosts:
+      - 54.236.57.179
+    cmd:
+      bash -c "(env && cat config/crontab) | crontab - && cron -f -L 2"
+    options:
+      user: root
   # job:
   #   hosts:
   #     - 192.168.0.1

Conclusion

Overall, Kamal seems to be a viable option for application deployments when the setup does not require the features and scale of more complex alternatives such as Kubernetes or public cloud container services. Some trade-offs may have to be made, but they can be offset by the lower price of the infrastructure. I cannot yet evaluate Kamal from a reliability standpoint, as there has been no production use so far - although by the time you read this post, it will be served from Kamal-deployed infrastructure. While preparing it, I experienced only a single failure, when kamal-proxy stopped for an unknown reason. The log shows only:

2025-05-29T22:11:46.809160748Z {"time":"2025-05-29T22:11:46.806896428Z","level":"INFO","msg":"Server stopped"}

right after a deployment had finished. Then, when I issued the kamal proxy logs command, it recovered by itself.

2025-05-30T09:47:26.874603248Z {"time":"2025-05-30T09:47:26.873943355Z","level":"INFO","msg":"Restored saved state","path":"/home/kamal-proxy/.config/kamal-proxy/kamal-proxy.state"}
2025-05-30T09:47:26.876984049Z {"time":"2025-05-30T09:47:26.876741477Z","level":"INFO","msg":"Server started","http":80,"https":443}

Anyway, Kamal is worth a trial.

2025-06-24 UPDATE

I found the cause of the sudden kamal-proxy stop without any log message described above. I used Fedora CoreOS, which updates itself automatically, and when an update requires a reboot, the reboot is performed by default. It simply sent SIGTERM to kamal-proxy, leaving no log message because everything was working correctly. After the reboot, the Docker containers with the Kamal components did not start automatically.

Fortunately, a pretty straightforward explanation 😅
