My Weird Homelab: Docker Swarm and Nix
Author | Max Niederman |
---|---|
Published | 2023-08-28 |
Tags | |
Description |
How I use Nix to manage a Docker Swarm cluster declaratively. |
I’ve been using Docker Swarm since I first started homelabbing in 2021. It’s overall quite good software, easy to set up, and simple to use. It may not be as powerful as Kubernetes, but it’s plenty for my usecase, which is self-hosting a few services like a media server, Discord bots, and game servers.
As the number of services I hosted grew, though, my Compose YAML files grew ever larger and more repetitive.
Having recently discovered the joys of NixOS’s declarative system configurations, I decided Nix would be a good choice to generate the Compose files.
Nix → YAML
The core of my solution is to take Nix expression and convert them to docker-compose.yaml
files:
let
format = nixpkgs.formats.yaml { };
in
format.generate "docker-compose.yml" {
# just a regular Docker Compose file
version = "3";
networks = {
internal.driver = "overlay";
};
services = {
app = {
image = "me/myapp";
networks = [ "internal" ];
};
db = {
image = "redis";
networks = [ "internal" ];
};
};
}
evaluates to a docker-compose.yml
file in the Nix store.
This already offers significant advantages over writing YAML directly. Repetition can be reduced by using programmatic abstractions. For example, if I needed to mount the same set of many volumes in many services, I could just bind the list of volumes to a variable, like this:
let
myVolumes = [
"/foo/bar:/foo/bar:ro"
"/foo/baz:/foo/baz:ro"
"/foo/qux:/foo/qux:ro"
];
in
{
services = {
svc1 = {
# ...
volumes = myVolumes;
};
svc2 = {
# ...
volumes = myVolumes;
};
svc3 = {
# ...
volumes = myVolumes;
};
};
}
NixOS and NixOps
In addition to the Docker Swarm stacks, there’s a bunch of software and configuration on each machine in the cluster:
- NFS Mounts/Exports
- Static IPs and Custom Nameservers
- Tailscale
- Prometheus Node Exporter
- SSH Keys
To configure these declaratively, I run NixOS on each node and deploy new system configurations to all of them at once using NixOps.
Using NixOps also allows me to test deployments on virtual machines, and if I ever decide to move my homelab to the cloud, it also supports deploying to AWS, GCP, and Hetzner.
Shared Networks
Most networks work exactly the same way as with writing Compose files manually:
just specify them in the networks
attribute of the stack and then reference them in services.<svc_name>.networks
.
However, my setup uses a few networks which are shared between multiple stacks.
Namely public
, which is used to expose services through Traefik, my reverse proxy,
and monitoring
, which is used primarily to expose metrics to my Prometheus database.
These networks can’t be can’t be declared using Docker Compose;
instead, they’re created imperatively and then referenced in the networks
attribute:
# create the network imperatively one time
docker network create --driver overlay --subnet 10.0.10.0/24 public
# reference it in each Compose files
networks:
public:
external: true
This is annoying, because you have to manually keep track of each existing network and their subnets.
I wanted a more declarative way to manage these networks,
so next to my stack specifications I created a file called networks.nix
:
{
public = {
subnet = "10.0.10.0/24";
};
monitoring = {
subnet = "10.0.20.0/24";
};
}
To deploy the networks,
I generate two shell scripts, homelab-networks-create
and homelab-networks-destroy
,
by iterating over the network definitions in networks.nix
.
The network definitions can also be referenced in any of the stack specifications, so there’s a single source of truth for the definition of each shared network.
Persistent Storage
Managing persistent state is perhaps my least favorite thing about writing software. It takes a fun task and turns it into a complex but ultimately boring dance of databases and redundancy.
With that in mind, I’ve tried to keep my clusters’ persistent storage as simple as possible: one machine exports an NFS filesystem which is mounted by every other machine. The lack of redundancy isn’t ideal, but so far it’s caused none of the many issues I’ve had.
One problem this boring approach doesn’t solve is organization: nearly every service needs to mount at least one directory, and it was very cumbersome to ensure every directory existed and had the correct permissions set. This was particularly annoying when I needed to delete the state of a stack and then recreate all the directories from scratch.
To get around this,
I added an extra property to each stack definition file called binds
:
rec {
binds =
let
gen = lib.stacks.getBindTarget "mystack";
in {
a = gen "a";
b = gen "b";
};
# the compose file is moved to an attribute of the stack definition
compose = {
services.foo = {
# ...
volumes = [
"${binds.a}:/a"
"${binds.b}:/b"
];
}
};
}
This way, the lib.stacks.getBindTarget
function imposes a particular directory structure for every stack.
Then, I generate a homelab-deploy
script which
creates all the bind directories and sets their permissions by reading from the binds
property,
as well as calling docker stack deploy
to start the service containers.
Was it Worth the Effort?
There are definitely a few things I would change about my implementation today, but overall I’m quite happy with it.
Most of all, it’s extremely nice to be able to deploy changes without
manually copying files over or even directly ssh
-ing into anything.
I also have all my configurations in a single Git repository,
and could easily change my hardware by just editing a few files.
I think there’s some real potential for integrating Nix with a purpose-built orchestrator, although I don’t really know enough to work on the problem myself.