My Weird Homelab: Docker Swarm and Nix

Author Max Niederman
Published 2023-08-28

How I use Nix to manage a Docker Swarm cluster declaratively.

I’ve been using Docker Swarm since I first started homelabbing in 2021. It’s overall quite good software, easy to set up, and simple to use. It may not be as powerful as Kubernetes, but it’s plenty for my use case, which is self-hosting a few services like a media server, Discord bots, and game servers.

As the number of services I hosted grew, though, my Compose YAML files grew ever larger and more repetitive.

Having recently discovered the joys of NixOS’s declarative system configurations, I decided Nix would be a good choice to generate the Compose files.

Nix → YAML

The core of my solution is to take Nix expressions and convert them to docker-compose.yml files. For example, the following expression:

let
    # nixpkgs here is an instantiated nixpkgs package set
    format = nixpkgs.formats.yaml { };
in
format.generate "docker-compose.yml" {
    # just a regular Docker Compose file

    version = "3";

    networks = { 
        internal.driver = "overlay";
    };

    services = {
        app = {
            image = "me/myapp";
            networks = [ "internal" ];
        };

        db = {
            image = "redis";
            networks = [ "internal" ];
        };
    };
}

evaluates to a docker-compose.yml file in the Nix store.

This already offers significant advantages over writing YAML directly. Repetition can be reduced with programmatic abstractions. For example, if I needed to mount the same large set of volumes in many services, I could just bind the list of volumes to a variable, like this:

let 
    myVolumes = [
        "/foo/bar:/foo/bar:ro"
        "/foo/baz:/foo/baz:ro"
        "/foo/qux:/foo/qux:ro"
    ];
in
{
    services = {
        svc1 = {
            # ...
            volumes = myVolumes;
        };
        svc2 = {
            # ...
            volumes = myVolumes;
        };
        svc3 = {
            # ...
            volumes = myVolumes;
        };
    };
}

NixOS and NixOps

In addition to the Docker Swarm stacks themselves, there’s a bunch of software and configuration on each machine in the cluster, like the Docker daemon itself and the NFS mounts described below.

To configure these declaratively, I run NixOS on each node and deploy new system configurations to all of them at once using NixOps.

Using NixOps also lets me test deployments on virtual machines, and if I ever decide to move my homelab to the cloud, it supports deploying to AWS, GCP, and Hetzner as well.
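
The NixOps side is just a network file mapping machine names to NixOS configurations; here’s a minimal sketch, with made-up machine names, addresses, and paths:

{
    network.description = "homelab";

    # each machine gets a normal NixOS configuration plus deployment options
    manager = { config, pkgs, ... }: {
        deployment.targetHost = "192.168.1.10";
        imports = [ ./hosts/manager.nix ];
    };

    worker1 = { config, pkgs, ... }: {
        deployment.targetHost = "192.168.1.11";
        imports = [ ./hosts/worker1.nix ];
    };
}

Deploying every machine is then a single nixops deploy.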

Shared Networks

Most networks work exactly the same way as with writing Compose files manually: just specify them in the networks attribute of the stack and then reference them in services.<svc_name>.networks.

However, my setup uses a few networks that are shared between multiple stacks: namely public, which is used to expose services through Traefik, my reverse proxy, and monitoring, which is used primarily to expose metrics to my Prometheus database.

These networks can’t be declared using Docker Compose; instead, they’re created imperatively and then referenced in the networks attribute:

# create the network imperatively one time
docker network create --driver overlay --subnet 10.0.10.0/24 public
# then reference it in each Compose file
networks:
    public:
        external: true

This is annoying, because you have to manually keep track of each existing network and its subnet. I wanted a more declarative way to manage these networks, so next to my stack specifications I created a file called networks.nix:

{
    public = {
        subnet = "10.0.10.0/24";
    };
    monitoring = {
        subnet = "10.0.20.0/24";
    };
}

To deploy the networks, I generate two shell scripts, homelab-networks-create and homelab-networks-destroy, by iterating over the network definitions in networks.nix.
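
Generating those scripts only takes a few lines. Here’s a sketch of the create side, assuming pkgs and lib from nixpkgs are in scope (the destroy script is analogous, using docker network rm):

let
    networks = import ./networks.nix;

    # one docker network create command per entry in networks.nix
    createCommands = lib.mapAttrsToList
        (name: net: "docker network create --driver overlay --subnet ${net.subnet} ${name}")
        networks;
in
pkgs.writeShellScriptBin "homelab-networks-create"
    (lib.concatStringsSep "\n" createCommands)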

The network definitions can also be referenced in any of the stack specifications, so there’s a single source of truth for the definition of each shared network.
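
For example, a stack’s Compose file can declare every shared network as external straight from networks.nix instead of repeating the names by hand (a sketch; the relative path and the lib binding are assumptions):

let
    shared = import ../networks.nix;
in {
    services.web = {
        # ...
        networks = [ "public" ];
    };

    # mark each shared network as externally managed
    networks = lib.mapAttrs (_name: _def: { external = true; }) shared;
}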

Persistent Storage

Managing persistent state is perhaps my least favorite thing about writing software. It takes a fun task and turns it into a complex but ultimately boring dance of databases and redundancy.

With that in mind, I’ve tried to keep my clusters’ persistent storage as simple as possible: one machine exports an NFS filesystem which is mounted by every other machine. The lack of redundancy isn’t ideal, but so far it’s caused none of the many issues I’ve had.
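
In NixOS terms that setup is only a few lines per machine; a sketch, with placeholder paths, hostnames, and subnet:

# on the storage node: export a directory to the rest of the cluster
services.nfs.server = {
    enable = true;
    exports = "/srv/storage 10.0.0.0/24(rw,no_subtree_check)";
};

# on every other node: mount the export
fileSystems."/srv/storage" = {
    device = "storage-node:/srv/storage";
    fsType = "nfs";
};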

One problem this boring approach doesn’t solve is organization: nearly every service needs to mount at least one directory, and it was very cumbersome to ensure every directory existed and had the correct permissions set. This was particularly annoying when I needed to delete the state of a stack and then recreate all the directories from scratch.

To get around this, I added an extra property to each stack definition file called binds:

rec {
    binds =
        let
            # lib.stacks is my own library of stack-related helper functions
            gen = lib.stacks.getBindTarget "mystack";
        in {
            a = gen "a";
            b = gen "b";
        };
    
    # the compose file is moved to an attribute of the stack definition
    compose = {
        services.foo = {
            # ...
            volumes = [
                "${binds.a}:/a"
                "${binds.b}:/b"
            ];
        };
    };
}

This way, the lib.stacks.getBindTarget function imposes a particular directory structure for every stack. Then, I generate a homelab-deploy script which creates all the bind directories and sets their permissions by reading from the binds property, as well as calling docker stack deploy to start the service containers.
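
To make that concrete, here’s a rough sketch of both pieces, assuming pkgs, lib, and the format helper from earlier are in scope (the storage root, stack name, and deploy-script wiring are simplified placeholders, and permission handling is elided):

let
    # getBindTarget just maps a stack name and bind name to a directory
    # under the shared NFS mount (the root path here is made up)
    getBindTarget = stackName: bindName:
        "/srv/storage/stacks/${stackName}/${bindName}";

    # normally these come from the stack definition's binds attribute
    binds = {
        a = getBindTarget "mystack" "a";
        b = getBindTarget "mystack" "b";
    };

    composeFile = format.generate "docker-compose.yml" {
        services.foo = {
            image = "me/myapp";
            volumes = [ "${binds.a}:/a" "${binds.b}:/b" ];
        };
    };
in
# create the bind directories, then deploy the stack
pkgs.writeShellScriptBin "homelab-deploy-mystack" ''
    ${lib.concatStringsSep "\n"
        (lib.mapAttrsToList (_name: dir: "mkdir -p ${dir}") binds)}
    docker stack deploy --compose-file ${composeFile} mystack
''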

Was it Worth the Effort?

There are definitely a few things I would change about my implementation today, but overall I’m quite happy with it.

Most of all, it’s extremely nice to be able to deploy changes without manually copying files over or even directly ssh-ing into anything. I also have all my configurations in a single Git repository, and could easily change my hardware by just editing a few files.

I think there’s some real potential for integrating Nix with a purpose-built orchestrator, although I don’t really know enough to work on the problem myself.