NixOS is a good server OS, except when it isn't

Ever since I built my first NixOS system (I started by building a custom image to upload to DigitalOcean), I’ve been bothered by one thing: the default installation size is large. To give you an idea, this simple system (using flakes):

nixpkgs.lib.nixosSystem {
    system = "x86_64-linux";
    modules = [
        (nixpkgs.outPath + "/nixos/modules/profiles/minimal.nix")
        (nixpkgs.outPath + "/nixos/modules/profiles/headless.nix")
        {
            fileSystems."/".device = "/dev/sda1";
            boot.loader.systemd-boot.enable = true;
        }
    ];
}

ends up taking ~900MB of disk space on my system [1]! Minimal and headless!

When I started working on improving this, I expected the eventual blog post to be very different than what it became, but you can’t win everything in life. There’s a bit of pain ahead.

The context

I really like Nix and NixOS (I wouldn’t be spending time helping with their documentation otherwise). After spending some time managing NixOS servers, I really can’t see myself going back to other systems unless required by some external factor. I’m also working on a system that has worker machines which will spin up a bunch of microVMs.

Naturally, I wanted to use NixOS both for the worker machines and the microVMs themselves. Currently, the system on the microVMs is taking ~210MB (including kernel) of disk space, but it’s based on Alpine. The worker machines are already using NixOS, but I’d like them to be as lean as possible.

NixOS makes it very simple to manage a server from the outside. You can push an entirely new system configuration without the server changing its behaviour, and then almost atomically switch the server to the new configuration. You can easily configure the whole thing deterministically, deploy the same configuration to multiple servers, and even deploy the same configuration under a VM so you can locally test things if you wish to.

I envisioned a world where all my worker machines ran the bare minimum software required for things to work, which would be an amazing help to lock the system down, prevent any escalations in case some piece of software was broken into, and would also make deployments and tests faster.

And if I could achieve something like that on those machines, why not extend this to the OS running on the microVMs and keep things really lean? This would be super helpful to cut boot times as much as possible, short of using a unikernel.

I knew from my previous experience with NixOS that it didn’t generate lean images by default, so a couple days ago I started looking into this to see if I could fix things, or at least significantly improve them.

Important: what follows is an analysis of NixOS specifically for the purposes stated here. I don’t really care about having some runtime-flexible server OS which lets me install packages and configure things ad-hoc; I want a thin, locked-down server with the single purpose of running the software I declare, and not a single extra tool in it. (Yes, I know I essentially want containers without containers. I comment on this at the end.)

Figuring out package dependencies and their sizes

A curious thing about the Nix ecosystem is that it has some pretty powerful tools, but they’re severely underdocumented, sometimes functionality is hidden by their naming, and/or some tools have some really specific assumptions, which makes it harder to use them more generally.

One such tool is nix-store --query (to be honest, it is far better known than some of the more “obscure” tools). More specifically, nix-store --query --tree prints a tree of packages [2]: starting from a package you specify, it shows the dependencies of that package, their dependencies, and so on. Running it gives output like this:

/nix/store/g4ppw7x76dyykj33x99xzf30zq5ym29z-nixos-system-nixos-24.05.20240323.44d0940
├───/nix/store/09fpwkb108ckhljahy7p84if7m8qh1wh-firmware
├───/nix/store/0v0wrr6ngh9d487lhwicwr5z61kz40zw-kmod-31
│   ├───/nix/store/1rm6sr6ixxzipv5358x0cmaw8rs84g2j-glibc-2.38-44
│   │   ├───/nix/store/3sxwxqzkkrgpgaibkm27ggb9kjbzdy31-xgcc-13.2.0-libgcc
│   │   ├───/nix/store/n9sq1bvghs9z0qg6cmwg27y4jmszwgqi-libidn2-2.3.7
│   │   │   ├───/nix/store/77yhmwrwism02371kzyda4d127kdwdnf-libunistring-1.1
│   │   │   │   └───/nix/store/77yhmwrwism02371kzyda4d127kdwdnf-libunistring-1.1 [...]
│   │   │   └───/nix/store/n9sq1bvghs9z0qg6cmwg27y4jmszwgqi-libidn2-2.3.7 [...]
...

To complement that, nix-store --query --size gives you (roughly) the size that a specific package takes on disk. It’s slightly more complicated than this, but for our purposes it will be enough to understand how much disk is used.
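The same closure information can also be collected from inside Nix instead of through nix-store. The following is only a sketch under assumptions, not what the code in the repository does: the file name closure.nix and the argument name nixos are made up for illustration, and it relies on the closureInfo helper from Nixpkgs, which writes a store-paths file (one path per line) and a total-nar-size file (bytes) for the closure of the given roots.

```nix
# closure.nix (hypothetical file name)
# Arguments:
#   pkgs  - an instantiated Nixpkgs package set
#   nixos - an evaluated NixOS system, e.g. the result of nixpkgs.lib.nixosSystem
{ pkgs, nixos }:

# closureInfo produces a small derivation containing, among other files:
#   store-paths     every store path in the closure, one per line
#   total-nar-size  the summed NAR size of those paths, in bytes
pkgs.closureInfo {
  # The top-level system derivation is the root whose closure we want.
  rootPaths = [ nixos.config.system.build.toplevel ];
}
```

Exposed as a flake output and built, the resulting total-nar-size should be close to the sum of what nix-store --query --size reports for each path in the closure, since both are based on NAR sizes.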

There are some tools which help visualise all this information in cool ways. Two of my favorites are nix-tree and nix-visualize. However, ideally I wanted an interactive graph so I could see each node in the graph by their size on disk, and inspect their dependencies, search things, and so on. nix-visualize was the closest of the tools to give me a graph, but it wasn’t interactive and the node sizes weren’t based on disk usage, so I decided to write my own.

It took me some hours to come up with code that generated a graphviz file, with node sizes based on disk usage. Coupled with vscode-interactive-graphviz, I felt like I had a good approach to interactively working with the graph, but the visualisations turned out to be too crowded. I tried to add some more space into things, but it was kind of a hack because graphviz likes to be the one to position elements. In the end, I gave up on that idea and decided to just generate a CSV, which worked way better than I expected. No wonder we still use spreadsheets for a lot of things!

The repository with the code and the final config of the NixOS system from this post is here.

An investigation of a minimal, headless NixOS system

With a way to see each package, its disk usage, and all its dependencies, let’s look at the minimal, headless system I mentioned in the beginning of the post. The one that takes ~900MB.

[Image: A list of the heaviest packages by disk usage, viewed with the Edit csv extension for VSCode]

Each subsection below will be a small report of me investigating some items in this CSV [3]. It starts “easy” and gets progressively more complicated. Feel free to skim and skip any part if you don’t feel like it.

Getting rid of Nix (~179MB reduction)

The heaviest item in that list is a mysterious “source” package. A quick look into what the heck could be taking 170MB of disk space shows it’s actually a complete copy of Nixpkgs!

❯ ls /nix/store/amxd2p02wx78nyaa4bkb0hjvgwhz1dq7-source
CONTRIBUTING.md  README.md    doc        lib          nixos
COPYING          default.nix  flake.nix  maintainers  pkgs

Searching for that package’s pos (just an identifier I used in the code that generates the CSV and the graphviz files) shows that it’s only used by this other package:

[Image: etc-nix-registry.json using that package]

That package is a single file which doesn’t have a lot in it other than a link to the “source” package. A search through Nixpkgs shows the file coming from here, the actual content of registry coming from here, and the source attribute being set here.

I’m building this system with flakes and I’m using that nixosSystem function from Nixpkgs’s flake.nix, which means by default I get this extra 170MB in the system. I think it would’ve been easy to just undo what Nixpkgs’s flake.nix is doing, but if you look at the list of the heaviest packages again, you’ll see that Nix itself is the 10th heaviest package in the system.

Nix also pulls a lot of dependencies, each one taking quite some space as well (for example, aws-sdk-cpp-1.11.207 eats another 5.7MB by itself, and is only used by Nix).

After some thinking, I realised that I don’t need Nix in any of these systems. I definitely don’t need it in a microVM, but I also don’t need it in my servers, because I’m building their configurations in an external machine and deploying the built bits directly. So let’s add this to the system configuration:

nix.enable = false;

After rebuilding the system, we’re at ~733MB.
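For a system that should keep Nix but drop the copy of Nixpkgs, undoing what Nixpkgs’s flake.nix sets up might look like the sketch below. It is untested here and rests on one assumption: that the Nixpkgs revision in use ships the nixpkgs.flake.setFlakeRegistry and nixpkgs.flake.setNixPath options, which control whether the flake’s source tree is pinned in the registry and in the Nix path.

```nix
{
  # Assumption: these options exist in the Nixpkgs revision being used.
  # Do not register the flake's own source tree as "nixpkgs" in the Nix
  # registry; this is what makes etc-nix-registry.json reference the
  # "source" store path.
  nixpkgs.flake.setFlakeRegistry = false;

  # The NIX_PATH entry points at the registry entry, so disable it as well.
  nixpkgs.flake.setNixPath = false;
}
```

With both options off, nothing in the system should reference the “source” store path anymore, but Nix itself and its dependencies remain in the closure, which is why nix.enable = false goes further.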

Getting rid of Perl, Python (~242MB reduction)

After removing Nix, the 2nd heaviest package is Python3, and 3rd is Perl.

Python only comes in because of install-systemd-boot.sh (truly a shame, why waste so much disk space like this!), and Perl comes in through a bunch of perl-envs (search for perl-5.38.2-env in the CSV and you’ll see them). Those perl-envs are all used in the top-level package, so let’s figure out where they’re being used there:

❯ grep -nr 'perl-5.38.2-env' /nix/store/7z0y5sscnpx4hczzkjh3jvjgn2mq3106-nixos-system-nixos-24.05.20240323.44d0940
/nix/store/7z0y5sscnpx4hczzkjh3jvjgn2mq3106-nixos-system-nixos-24.05.20240323.44d0940/dry-activate:23:/nix/store/d3qxgm4ffhi2ixx3n9clwqlr6z21dd8i-perl-5.38.2-env/bin/perl \
/nix/store/7z0y5sscnpx4hczzkjh3jvjgn2mq3106-nixos-system-nixos-24.05.20240323.44d0940/activate:43:/nix/store/d3qxgm4ffhi2ixx3n9clwqlr6z21dd8i-perl-5.38.2-env/bin/perl \
/nix/store/7z0y5sscnpx4hczzkjh3jvjgn2mq3106-nixos-system-nixos-24.05.20240323.44d0940/activate:63:/nix/store/zkmm5iha0rsm4ypwfc67byq52gz0jb8b-perl-5.38.2-env/bin/perl /nix/store/rg5rf512szdxmnj9qal3wfdnpfsx38qi-setup-etc.pl /nix/store/jq5a0yw04ichvggf7dx80xc438z2v1gv-etc/etc
/nix/store/7z0y5sscnpx4hczzkjh3jvjgn2mq3106-nixos-system-nixos-24.05.20240323.44d0940/bin/switch-to-configuration:1:#! /nix/store/8mlvyl3sab5hxpxz2naz5g2sfd42a40q-perl-5.38.2-env/bin/perl

To make it easier to parse this bunch of text: Perl is used in the dry-activate, activate, and bin/switch-to-configuration scripts. dry-activate only needs Perl to run the update-users-groups.pl script, while the activate script runs the same script and also setup-etc.pl, and bin/switch-to-configuration is a Perl script from the beginning.

Fun fact: the minimal profile disables man pages and most other documentation bits, but the perl man pages are the only thing that still gets included in the system because of the perl-envs!

I thought Perl was going to be hard to remove, but I was determined to at least take a look. After all, judging only by the naming, update-users-groups.pl doesn’t seem like the kind of thing I need - I don’t expect my servers to create any extra users or groups dynamically, so there’s nothing to update.

(note that I have no idea what update-users-groups.pl actually does, this was just my thinking from reading its name)

I decided to search Nixpkgs for that script name to get an idea of how it was being added to the system. It was through this search that I stumbled upon a Nixpkgs tracking issue called Perlless Activation - Tracking Issue.

Someone decided it wasn’t a good idea to have Perl in the base NixOS system for slightly different reasons, and they did a lot of work to get rid of it! Luckily for me, I could piggyback off their work and include the following module in my system configuration:

modules = [
    ...
    (nixpkgs.outPath + "/nixos/modules/profiles/perlless.nix")
];

After rebuilding the system, we’re at ~491MB. As a bonus, Python is now gone as well!

Deduplicating systemd (~14MB reduction)

systemd is now the 2nd heaviest package. It has some stuff inside that I think could be removed, but since it’s an integral part of the system, let’s overlook it for now. Going through the list of packages, what’s this in 5th place?

[Image: systemd-minimal is also in the list of packages!]

For some reason, our NixOS system has both systemd and systemd-minimal! A look through which packages use systemd-minimal shows that only dbus uses it. It comes from here.

Nixpkgs has a lot of packages, and sometimes due to circular dependencies or to keep the size of dependencies smaller, it introduces variants of packages/functions that have reduced functionality. If you contribute to Nixpkgs, chances are that at some point, someone will give some feedback on ways you can use variants that have a smaller dependency chain, or smaller size (a common example is using stdenvNoCC instead of stdenv). systemd-minimal probably exists to avoid certain circular dependencies, but I’m not sure. It’s defined here.

In any case, I’d like to get rid of systemd-minimal, since we already have the full systemd in our system anyway. There is no easy way to override the package used by the NixOS module that brings in dbus, so we’ll have to add a Nixpkgs overlay to change the dbus package directly:

nixpkgs.overlays = [
    (self: super: {
        dbus = super.dbus.override {
            systemdMinimal = self.systemd;
        };
    })
];

After rebuilding the system, we’re now at ~477MB.

Removing udev, lvm, sudo and security wrappers (~30MB reduction)

This is where things start to get very messy. While looking through the list of heaviest packages, I saw an “hwdb.bin” package which seems linked to udev. I don’t know much about udev, but it feels like it’s only needed for scenarios that won’t happen on the kind of servers I want to manage.

In case it is actually used for something important and this breaks the system, I have a feeling that a workaround could be hard-coded and wouldn’t require udev anyway. I’d gladly go into that rabbit hole, but (spoiler alert) you’ll see that I gave up well before that.

There’s an option to disable it:

services.udev.enable = false;

While looking through the stuff adjacent to udev, I noticed that lvm is also enabled by default. By similar reasoning, I don’t think I’d need lvm for these servers, so I disabled it.

services.lvm.enable = false;

Before proceeding with more of this, I took a break and looked at some more packages in the list. At that point, it became clear that I’d have to butcher a LOT of NixOS config to remove many packages in there. I made a decision to continue with this exercise, but make it less about keeping a working system throughout and more about understanding the efforts to just get rid of some of these packages. To get to a barebones system, I’d need to remove a lot of them.

While looking through the lvm stuff, I noticed fuse2 and fuse3 are hard-coded by default (and changing those gets complicated quickly). I saw they’re used by the security wrappers module, which also sets up wrappers for mount, umount, sudo, and a bunch of other binaries. This is needed because Nix doesn’t support setuid/setgid binaries by design, so NixOS has a binary that dynamically sets some capabilities and permissions, and then executes any other binary with the elevated bits.

I don’t like having this functionality. Instead of a single wrapper binary which receives an argument with the binary to execute with elevated permissions, I’d rather have X wrapper binaries with hardcoded paths and no parametrisation of any kind (one for each of the X things I want to execute), and that’s only IF I actually need this functionality.

For anything I want to run in these servers, I think I can configure the proper permissions through systemd unit configs.
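As a rough illustration of granting permissions through a unit instead of a wrapper, here is a minimal sketch. The service name my-worker and the use of pkgs.hello as the program are placeholders and not part of the system from this post; the serviceConfig keys are standard systemd directives.

```nix
{ pkgs, ... }: {
  # Hypothetical service; replace the name and ExecStart with the real workload.
  systemd.services.my-worker = {
    wantedBy = [ "multi-user.target" ];
    serviceConfig = {
      ExecStart = "${pkgs.hello}/bin/hello";

      # Run as a transient, unprivileged user allocated by systemd.
      DynamicUser = true;

      # Grant only the capability the service needs (here: binding to
      # ports below 1024) and make it impossible to acquire any other.
      AmbientCapabilities = [ "CAP_NET_BIND_SERVICE" ];
      CapabilityBoundingSet = [ "CAP_NET_BIND_SERVICE" ];

      # The process and its children can never gain new privileges,
      # so setuid-style wrappers would have no effect inside it anyway.
      NoNewPrivileges = true;

      # Mount the file system hierarchy read-only for this service.
      ProtectSystem = "strict";
    };
  };
}
```

The point of this design is that privileges are declared per service and enforced by systemd at start-up, so no elevated binary has to exist on disk.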

The security wrappers module doesn’t have an enable option to toggle it off, so one way to get rid of it completely is to add it to the disabledModules attribute. This requires me to provide dummy options that were provided by the security wrappers module earlier, because when building a NixOS system, by default every module gets evaluated (most of them just won’t do anything because they’re not enabled). Some of these modules set additional wrappers, so the dummy options are needed to make the module system happy.

({ lib, pkgs, ... }: {
    disabledModules = [ "security/wrappers/default.nix" ];

    options.security = {
        wrappers = lib.mkOption {
            type = lib.types.attrs;
            default = { };
        };
        wrapperDir = lib.mkOption {
            type = lib.types.path;
            default = "/run/wrappers/bin";
        };
    };

    config = {
        # ...
    };
})

I think doing this could break some script that calls mount or umount or fuse (because those are hardcoded in the security wrappers module), but I also think that most scripts that use those are being run directly as root, so I’m not sure.

To finish this section, let’s also disable sudo completely because it’s useless without its security wrapper.

security.sudo.enable = false;

We’re at ~447MB now.

Some other minimal shenanigans

At this point, the 10th and 11th heaviest packages are util-linux and util-linux-minimal, respectively. Well, this seems similar to that systemd-minimal thing from a while ago!

Let’s look at where these are being used:

  • util-linux
    • system-path, a bunch of systemd services and a “mount-pstore” shell script.
  • util-linux-minimal
    • fuse2 and fuse3, and etc-systemd-system.conf

Removing fuse is very annoying (although to be honest, with all the mess in the config so far, it wouldn’t even look that bad anymore). But we can at least try to make them use util-linux instead of util-linux-minimal, right?

To get there, let’s look at how these packages are declared in all-packages.nix. We’ll need an overlay, but trying to change fusePackages to use the normal util-linux will hit an infinite recursion error, so I’ll start by overlaying fuse3:

nixpkgs.overlays = [(
    self: super: {
        fuse3 = (self.lib.dontRecurseIntoAttrs (self.callPackage (nixpkgs.outPath + "/pkgs/os-specific/linux/fuse") { })).fuse_3;
    }
)];

This builds nicely, but the moment I try to do this with fuse2, the infinite recursion error is back. Sigh. Whatever.

Browsing through the list of packages, I also see systemd-minimal-libs sneaking in there. It’s being used by a bunch of other packages, and it’s equally difficult to add more overlays to get rid of it. Infinite recursions all around.

This is where I look at the current system config, look at all the notes I made of things to look into that I haven’t yet (the list is right there in the next section), think about how much worse it’ll get by trying to fix all of this, and give up.

Things I noted, but didn’t look at

  • With Nix gone, the heaviest package was the linux kernel at ~136MB. I know I can get it down to ~50MB easily (the kernel used by default on NixOS has a lot of modules and extra things that a server doesn’t need), so I left that for later because it was easy.

  • One disadvantage of a perlless system is that it can’t switch to a different configuration at runtime, because the script that does this is written in Perl. This isn’t an issue for most of the servers in my scenario. MicroVMs don’t need that, and I’d be perfectly ok just killing a bunch of other servers and starting new ones with the updated configuration.

    However, I made a note to look into that perl script and figure out how much work it would be to build a replacement. This is something that needs to happen anyway at some point, and would benefit the NixOS community at large.

  • A bunch of packages build with all locales and some internationalisation content. Those end up taking some good space, so I made a note to look at how to simplify this and get rid of most of the locales and files I wouldn’t need.

  • The system has both libressl and openssl packages. libressl is only used for netcat, but I don’t really need it in the servers. In fact, NixOS includes a lot of utilities by default (and marks them as required, making it super annoying to remove them) which aren’t really needed on the servers.

  • A bunch of extra default config files that aren’t needed (such as bashrc) could be removed. This would also remove some packages that they use, such as bash-completion, which won’t ever be needed in the servers.

  • coreutils and util-linux are both kind of heavy, but it’s very likely that the scripts and things that use them only really need a few binaries from each one. Perhaps an overlay that keeps only the binaries that are actually used would help free up quite a bit of disk space.

  • nix-store has a command (nix-store --optimise) to optimise disk space by finding identical files and hard-linking them. This could be helpful in some cases, but might not be possible in others, depending on how a server is imaged, or how new configuration gets pushed to it. It could be useful to decrease the disk space used by some of those “-minimal” packages (to the extent that they contain files identical to the ones in their full counterparts).

Leaving this to the future

There is a huge audience that uses NixOS as a personal OS, and a lot of the defaults and modules present in NixOS reflect that. NixOS can still be used as a server OS, but it requires a very different set of configurations, and it still ends up not being adequate in every situation.

I can (and will) still apply many of the configs I used in this post to my existing servers and make them leaner, which will already be useful, because it cuts ~300MB of stuff I don’t need. I got some experience and figured out some tools to help me investigate these issues more whenever I feel the need to.

But over the 2 days I spent looking at this, I concluded that trying to mold NixOS into the shape I envisioned just isn’t the way to go. I also don’t like the other option if I want to stick with it, which is creating a “fork” of NixOS that is very opinionated and completely focused on server scenarios.

I was trying to bring NixOS to a bare minimum, which is an exercise similar to building containers with the bare minimum required for the software in the container to run. I think this is a worthy endeavour. I think we have all the tools in regular non-docker, non-kubernetes linux to get to a similar outcome, except we won’t need docker or kubernetes or whatever in this new land, thus removing quite a bunch of complexity from the systems we build.

But doing it on top of NixOS currently feels like a bad path to take.


  1. Actually the top-level system as given by the .config.system.build.toplevel attribute, which covers essentially everything the system needs to run.

  2. I’m going to use the term “package” to mean “store object” as defined by the Nix manual, because for most people this is an easier way to reason about store objects.

  3. If you want to look at the same CSV I used, you can download it, but you won’t be able to inspect the store paths unless you happen to build the same configuration with the same Nixpkgs version.