Setting up a ZFS backup with Syncoid, discovering why it's slow

For many years I’ve been running a NAS with ZFS to store my files. Some time ago, I decided to stop being lazy and set up a backup ZFS system, because RAID is not a backup. For now, the backup machine is sitting a few feet away from the NAS, which doesn’t protect me against some catastrophic scenarios, but at least it’s a start. At some point I’ll figure out a way to have an off-site backup solution that I’m comfortable with (privacy, price, and ease of use are the things I care about).

With this context, I want to tell two short stories.

Doing ZFS things without needing sudo

This is about how to get syncoid running in a way that’s acceptable for me. Syncoid is pretty great for syncing ZFS datasets and snapshots between machines. I didn’t find anything easy enough that handled as many edge cases as syncoid does, so I decided to use it. The catch is that if you try to sync a ZFS dataset between two machines, with something like syncoid pool/dataset user@remote:pool/dataset, you’ll eventually see syncoid throw a sudo error: sudo: no tty present and no askpass program specified. That’s because it’s trying to run a sudo command on the remote, and sudo has no way to ask for a password given how syncoid runs commands on the remote.

Searching online, I found many people just saying to enable SSH as root, which might be fine on a local network, but I don’t really like this.

ZFS has a neat feature that lets you give permissions on certain operations to other users, which means you won’t need sudo to perform those operations. It’s pretty simple, and this is what I settled on for a syncoid + sanoid setup.

On the machine with the data you want to back up:

$ sudo zfs allow -u sidhion send,snapshot,hold,mount,destroy pool0

On the backup machine:

$ sudo zfs allow -u sidhion compression,mountpoint,create,mount,receive,rollback,destroy pool0

If it’s not obvious: sidhion is the name of the user and pool0 the name of the pool involved in the backup.
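To double-check the delegation and then run the sync without sudo, something like the sketch below should work. Running zfs allow with only a pool or dataset name prints the permissions delegated on it, and syncoid has a --no-privilege-elevation option meant for setups that rely on delegated permissions. The run helper and the two function names are my own illustration, and the dataset names in the usage note are placeholders.

```shell
#!/bin/sh
# Dry-run helper (hypothetical): set RUN=echo to print commands
# instead of executing them.
run() { ${RUN:-} "$@"; }

# Print the permissions currently delegated on a pool or dataset.
show_delegations() {
    run zfs allow "$1"
}

# Replicate a dataset without sudo, relying on the delegated permissions.
# $1 is the source dataset, $2 the target (user@host:pool/dataset).
sync_dataset() {
    run syncoid --no-privilege-elevation "$1" "$2"
}
```

For example, show_delegations pool0 on each machine should list the permissions granted to sidhion, and sync_dataset pool0/dataset sidhion@remote:pool0/dataset starts the transfer. Setting RUN=echo first prints the commands without touching anything.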

Original solution using passwordless sudo

Before settling on zfs allow, my original approach was to enable passwordless sudo for zfs commands on my user, which I’m more comfortable with than enabling SSH as root. Getting this done was very simple:

sudo visudo -f /etc/sudoers.d/zfs_receive_for_syncoid

And then fill it with the following:

<your user> ALL=NOPASSWD: /usr/sbin/zfs *

If you really want to put in the effort, you can look at which zfs commands syncoid actually invokes and restrict passwordless sudo to only those. It’s important that you cover all the commands syncoid uses: it runs a few zfs commands with sudo to list snapshots and get some other information on the remote machine before doing the transfer. I had initially limited passwordless sudo to zfs receive * only, and spent quite some time figuring out why syncoid was always trying to sync from the first snapshot. In reality it just wasn’t able to list snapshots on the remote machine, so it thought there were none!
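A restricted fragment could look roughly like the sketch below. The user name backupuser is a placeholder, and the list of subcommands (list, get, receive) is my assumption based on the behaviour described above, not a verified inventory of what syncoid runs. Running syncoid with --debug should show the commands it actually executes, so the list can be adjusted from there. The sketch drafts the fragment in a temporary file and syntax-checks it with visudo -c before anything is installed.

```shell
#!/bin/sh
# Draft the fragment in a temp file first, so a syntax error
# never lands in /etc/sudoers.d.
fragment="$(mktemp)"

cat > "$fragment" <<'EOF'
# "backupuser" is a placeholder for your own user.
# The subcommand list is an assumption; extend it to match what
# syncoid actually runs on your setup.
backupuser ALL=NOPASSWD: /usr/sbin/zfs list *, /usr/sbin/zfs get *, /usr/sbin/zfs receive *
EOF

# Check-only mode: validates the syntax without installing anything.
if command -v visudo >/dev/null 2>&1; then
    visudo -c -f "$fragment" || echo "sudoers syntax check failed" >&2
fi
```

Once the check passes, the same lines can be pasted into the file opened by sudo visudo -f /etc/sudoers.d/zfs_receive_for_syncoid.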

Debugging slow transfer speeds

After having some fun with the issue above, I noticed that the transfer speeds were really low, hovering around 11MiB/s on a gigabit link. My machines are somewhat old, but not so old that they can’t handle gigabit ethernet, so I decided to investigate.

I ran iperf -s on one of the machines and iperf -c <remote ip> -d on the other to check whether this was a networking problem or something else (syncoid does some compression and buffering to try to make things faster, so there could be something going on there). To my surprise, I got close to 100MiB/s in one direction, and about 20MiB/s in the other. So it looked network-related.

I ran ethtool on both ends to check if there was anything weird going on, and sure enough, the remote machine reported a speed of 100Mb/s, while the ZFS NAS reported 1000Mb/s. Both machines support gigabit ethernet, so after a bit of thinking, I suspected a bad cable. To quickly confirm this theory, I checked my router, which helpfully lights an extra LED when the link is gigabit. Only one LED was lit for the remote machine, so that was that. I replaced the cable with a different one, and the transfer speeds increased 6 to 7 times. Yay!
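The negotiated speed also explains the number I was seeing: a 100Mb/s link tops out at roughly 11.9MiB/s before any protocol overhead, which matches the 11MiB/s or so that syncoid was reporting. Here is a small sketch of the check and the arithmetic. The function names are mine, and the interface name in the usage note is a placeholder.

```shell
#!/bin/sh
# Extract the negotiated speed from `ethtool <iface>` output read on stdin.
parse_speed() {
    awk -F': ' '/Speed:/ { print $2 }'
}

# Convert a link speed in Mb/s to its theoretical maximum in MiB/s
# (1 Mb = 1,000,000 bits; 1 MiB = 1,048,576 bytes).
mbit_to_mib() {
    awk -v mbit="$1" 'BEGIN { printf "%.1f\n", mbit * 1000000 / 8 / 1048576 }'
}
```

Running ethtool eth0 | parse_speed on each machine prints the negotiated speed, for example 100Mb/s. mbit_to_mib 100 prints 11.9 and mbit_to_mib 1000 prints 119.2.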

As I write this, syncoid is still syncing the entire dataset to the other machine, but from what I’ve seen, it looks like I’ll be a happy user of this tool. I’ve been thinking about investigating Nix and NixOS and eventually migrating these two ZFS machines (which are currently on Ubuntu) to NixOS, to make my life easier whenever I need to set things up on another machine. Nix and NixOS kind of remind me of the Yocto project, something I worked with many years ago when developing firmware for some devices. I really enjoyed Yocto; it was likely one of the first open source projects that I thought was really well-polished. I might make a post about Nix and NixOS in the future if/when I get to explore them some more.