My Personal Backup System

24 Dec, 2022 · 5 min read · #backup #restic #rclone #b2

Computing devices and online services can fail catastrophically and take our data with them. It is crucial to have a robust system to back up and restore our data, to protect against such events. This post details what I wanted from the backup system for my personal data and the tools I use to achieve it. This system has served me well over the last 5 years, across fat-fingerings and disk failures.

I wanted my backup system to meet the following criteria.

  1. It must follow the 3-2-1 rule: 3 copies of the data, on 2 different storage media, with 1 copy off-site.
  2. It must support versioned backups.
  3. It must not be limited to specific types of data, like photos or documents.
  4. It must work on Debian Stable, my primary operating system. It would be nice if it also worked on Mac OS X and Windows.
  5. It must not be cloud-only. I must be able to recover quickly from local storage, without downloading GBs of data over the Internet.
  6. I must be able to change the cloud storage provider with minimal effort, ideally with minor changes to configuration.
  7. The backup must be encrypted on the client, without the cloud storage provider having access to the encryption key.
  8. The organization of encrypted files in the backup must not leak information about the organization of my original files.
  9. The storage must be charged per GB used. Pricing plans that offer ranges of storage don’t always fit my needs closely.

I do not care about backing up directly from my mobile. I use Syncthing to sync the directories I care about, like Camera, from my mobile to my desktop.

The Tools

I use Restic and Rclone to manage my backups. Restic handles chunking, encryption, deduplication and versioning. Rclone handles syncing the Restic repository to cloud storage. Rclone supports a large number of cloud storage providers and Restic can access them through Rclone. Between these, I’m almost completely cloud provider agnostic.
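As a small illustration of Restic reaching a repository through Rclone, here is a minimal sketch. It assumes an Rclone remote named `b2:` is already configured for the user running it (the bucket name is the one used later in this post), and the helper name `restic_on_remote` is hypothetical. Restic's rclone backend takes repository locations of the form `rclone:<remote>:<path>`.

```shell
#!/bin/bash
# RESTIC can be overridden (for example RESTIC=echo) to inspect the command
# without running it.
RESTIC=${RESTIC:-restic}

# Run any restic command against a repository stored on an rclone remote.
# $1 is the rclone remote and path; the remaining arguments go to restic as-is.
restic_on_remote() {
	remote="$1"; shift
	"$RESTIC" --repo "rclone:$remote" "$@"
}

# Example (not run here): list the snapshots in the cloud copy.
# restic_on_remote b2:cskr-backup-personal snapshots
```

Switching to another provider would then mostly be a matter of configuring a different Rclone remote and passing its name instead.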

I created a Restic repository on an external Hard Disk Drive (HDD) and wrote a shell script that does the following.

  1. Validate that it is run as root. It needs to run as root because it backs up data owned by multiple users on the computer.
  2. Read the password to the Restic repository.
  3. Copy the password database beside the Restic repository. This contains the password to the repository and must be outside it during recovery.
  4. Back up the data directories of the users to the repository on the HDD.
  5. Forget snapshots (versions), other than daily snapshots from the last 7 days, weekly snapshots from the last month, and monthly snapshots from the last 6 months.
  6. Prune the repository, if the --prune flag is provided.
  7. Sync the contents of the repository to the B2 bucket, from the HDD.
  8. Sync the password database to Google Drive. This keeps it outside B2, but allows restoration without any local source. Details of how I manage my passwords are in this post.
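#!/bin/bash
# Exit on the first failed command.
set -e

# The backup reads data owned by several users, so it must run as root.
if [ "$(id -u)" -ne 0 ]; then
	echo "This command must be run as root." 1>&2
	exit 1
fi

# Prompt for the repository password and export it for restic.
read -r -s -p 'Enter Password: ' RESTIC_PASSWORD; export RESTIC_PASSWORD; echo

cskr_home=/home/cskr
user2_home=/home/user2
primary=/media/cskr/backup

# Keep a copy of the password database beside the repository.
cp -a "$cskr_home/data/personal.kdbx" "$primary"

# Back up each user's data directory, then apply the retention policy to it.
for i in "$cskr_home" "$user2_home"; do
	echo "Backing up $i/data..."
	restic --quiet --repo "$primary/personal" backup --exclude 'lost+found' "$i/data"
	echo "Forgetting expired snapshots of $i/data..."
	restic --quiet --repo "$primary/personal" forget --path "$i/data" --keep-within-daily 7d --keep-within-weekly 1m --keep-within-monthly 6m
done

# Prune only when --prune is passed as the first argument.
if [ "$1" = '--prune' ]; then
	echo "Pruning the repository..."
	restic --quiet --repo "$primary/personal" prune
fi

# Make the B2 bucket match the repository on the HDD.
echo 'Syncing to B2...'
rclone -P sync "$primary/personal" b2:cskr-backup-personal

# Copy only the password database to Google Drive.
echo 'Syncing Passwords to Google Drive...'
rclone -P copy "$cskr_home/data" --include personal.kdbx gdrive: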
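
Before trusting a retention policy, it can help to preview what it would remove. The sketch below is not part of my script. It wraps the same `forget` invocation with Restic's `--dry-run` flag, which reports what would be removed without removing anything. The function name `preview_forget` is hypothetical.

```shell
#!/bin/bash
# RESTIC can be overridden (for example RESTIC=echo) to inspect the command
# without running it.
RESTIC=${RESTIC:-restic}

# Show which snapshots the retention policy would remove, without removing any.
# $1: repository location, $2: backed up path whose snapshots are considered.
preview_forget() {
	"$RESTIC" --repo "$1" forget --dry-run --path "$2" \
		--keep-within-daily 7d --keep-within-weekly 1m --keep-within-monthly 6m
}

# Example (not run here; needs RESTIC_PASSWORD or an interactive prompt):
# preview_forget /media/cskr/backup/personal /home/cskr/data
```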

While Restic can back up directly to B2, I found it to be extremely slow. It also used more chargeable B2 requests than syncing with Rclone. These were my observations about 5 years ago. It may be better now. I’m sticking to Restic+Rclone as it is nice to not chunk, encrypt, and deduplicate twice.

Automation

As it stands, I need to plug the HDD in and execute the script to take a backup. This has not been a problem as I do it anytime I add large amounts of data, say photos from a vacation. I have also managed to remember to execute the script fairly regularly, even when there is no large change in data. This is not ideal and you may want to automate backups. This requires solving 2 problems.

Firstly, your automation needs access to the password to your Restic repository. If you use a systemd timer, you can use the EnvironmentFile directive in the service it triggers to set the RESTIC_PASSWORD environment variable. Ensure that access to this file is appropriately restricted. Alternatively, recent versions of systemd (250+) support reading credentials from an encrypted file, with the key stored in the TPM. This makes securing your Restic password easier.
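
The EnvironmentFile approach could look roughly like the sketch below, which writes a root-only password file plus a service and timer unit. Everything named here is an assumption for illustration: the unit names, the script location `/usr/local/sbin/backup.sh`, the daily schedule and the `change-me` placeholder. It also assumes the backup script is changed to skip the password prompt when RESTIC_PASSWORD is already set, since a timer has no terminal to prompt on.

```shell
#!/bin/bash
# Write a root-only environment file, a service unit and a timer unit.
# $1: directory for the unit files, e.g. /etc/systemd/system
# $2: path of the environment file, e.g. /etc/restic/backup.env
write_backup_units() {
	unit_dir="$1"
	env_file="$2"

	# Create the password file so that only its owner can read it.
	( umask 077; printf 'RESTIC_PASSWORD=%s\n' 'change-me' > "$env_file" )
	chmod 600 "$env_file"

	# The service reads RESTIC_PASSWORD from the environment file.
	cat > "$unit_dir/restic-backup.service" <<EOF
[Unit]
Description=Restic backup

[Service]
Type=oneshot
EnvironmentFile=$env_file
ExecStart=/usr/local/sbin/backup.sh
EOF

	# The timer triggers the service of the same name once a day.
	cat > "$unit_dir/restic-backup.timer" <<EOF
[Unit]
Description=Run the Restic backup daily

[Timer]
OnCalendar=daily
Persistent=true

[Install]
WantedBy=timers.target
EOF
}

# Example (not run here, needs root):
# write_backup_units /etc/systemd/system /etc/restic/backup.env
# systemctl daemon-reload && systemctl enable --now restic-backup.timer
```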

Secondly, you need to work around having to plug the HDD in for backup. You can leave it plugged in, but that’s not convenient. As a compromise, the scheduled execution can perform the backup directly to B2, bypassing the HDD. While this compromises the 3-2-1 rule, it will ensure that you have at least one backup even if you forget to do it manually. You can continue to execute the script manually when you remember. Your HDD and B2 will get synchronized when you do that, but the automated snapshots made directly in B2 will be lost, because rclone sync deletes files in the destination that do not exist in the source. You can also modify the script to sync the HDD from B2, instead of the other way around. That’ll solve the problem of lost snapshots, but will incur a cost for the download.
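
A rough sketch of both halves of this compromise follows. It is not something I run myself. It assumes the `b2:` Rclone remote is configured for the user running the timer, uses Restic's rclone backend for the direct backup, and the function names are hypothetical.

```shell
#!/bin/bash
# RESTIC and RCLONE can be overridden (for example with echo) to inspect the
# commands without running them.
RESTIC=${RESTIC:-restic}
RCLONE=${RCLONE:-rclone}

# Scheduled run: back up a directory straight to the cloud copy of the
# repository, bypassing the HDD.
# $1: rclone remote and path of the repository, $2: directory to back up.
backup_to_cloud() {
	"$RESTIC" --quiet --repo "rclone:$1" backup --exclude 'lost+found' "$2"
}

# Manual run: make the HDD copy match the cloud copy, so that snapshots taken
# by the scheduled run are present on the HDD. This downloads data from B2.
# $1: rclone remote and path of the repository, $2: repository path on the HDD.
pull_from_cloud() {
	"$RCLONE" sync "$1" "$2"
}

# Examples (not run here):
# backup_to_cloud b2:cskr-backup-personal /home/cskr/data
# pull_from_cloud b2:cskr-backup-personal /media/cskr/backup/personal
```

Note that `pull_from_cloud` has the mirror-image hazard: anything that exists only on the HDD is removed, so it is only safe while every new snapshot is created in the cloud copy.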

Test Recovery

Your backup is only as good as your ability to recover from it. With a system like this, you should test recovery from both the HDD and the cloud storage provider. When recovering from B2 for testing, I restore a single file using --include, to avoid downloading all the data. I also run restic check occasionally, against both the HDD and B2.
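
These two checks might be wrapped as in the sketch below. The function names, the scratch directory and the file `notes.txt` are hypothetical. Because the script takes a separate snapshot per user, `--path` is used so that `latest` resolves to the newest snapshot of the right data directory.

```shell
#!/bin/bash
# RESTIC can be overridden (for example RESTIC=echo) to inspect the command
# without running it.
RESTIC=${RESTIC:-restic}

# Restore a single file from the latest snapshot of one backed up path.
# $1: repository location
# $2: backed up path that selects the snapshot, e.g. /home/cskr/data
# $3: absolute path of the file to restore, as it was backed up
# $4: scratch directory to restore into
restore_one_file() {
	"$RESTIC" --repo "$1" restore latest --path "$2" --target "$4" --include "$3"
}

# Verify the integrity of a repository.
check_repo() {
	"$RESTIC" --repo "$1" check
}

# Examples (not run here):
# restore_one_file rclone:b2:cskr-backup-personal /home/cskr/data \
#	/home/cskr/data/notes.txt /tmp/restore-test
# check_repo /media/cskr/backup/personal
```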
