As I mentioned in the post about my backup system, I run restic check
regularly to test the integrity of my backup. It has never reported any error
in my repository. However, it performs only a quick, shallow check and does not
verify that all the data is intact. I also restore a random file whenever I run
restic check, to test recoverability.
Shortly after I wrote the previous post, I decided to run restic check --read-data
for the first time ever. This reads every file in the repository
and simulates a full restore. To my utter horror, it reported many errors like
these!
Pack ID does not match, want 5e66c2ac, got e80051de
pack d64be86d contains 1 errors: [Blob ID does not match, want 8ebf2c10, got 350f6ba1]
The “Pack ID does not match” error occurs when the SHA256 hash of the contents of a file, in the “data” directory of the Restic repository, does not match its name. This generally indicates that the file is corrupt. I have successfully restored my entire data from this repository, many times in the past. Clearly the repository was healthy then. When did it go bad?
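The naming rule described above can be checked independently of Restic. The following is a minimal sketch, assuming a local repository whose "data" directory contains only pack files named after the hex SHA256 of their contents. The function names and the directory argument are hypothetical, chosen for illustration, and are not part of Restic.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatched_packs(data_dir):
    """Return (file name, actual digest) pairs for files whose name
    differs from the SHA256 of their contents.

    Assumption: every regular file under data_dir is a pack file.
    """
    mismatched = []
    for path in sorted(Path(data_dir).rglob("*")):
        if not path.is_file():
            continue
        actual = sha256_of_file(path)
        if actual != path.name:
            mismatched.append((path.name, actual))
    return mismatched
```

Calling `find_mismatched_packs` on the repository's "data" directory would list the files whose contents no longer match their names. An empty list means every file it read hashed to its own name on that run.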
Mysteriously, the set of broken pack and blob IDs changed each time I ran the
check! Why would a different set of files be corrupt each time? When I ran
sha256sum on the flagged files, the hashes matched their names. This made no
sense! At this point, I suspected that Restic had a bug, but I could not find
anything wrong in the code.
Some online discussions hinted that the hardware might be at fault, and then it struck me. I had built a new desktop PC in December 2021. Firefox tabs had been crashing occasionally when playing videos ever since I switched to this PC. I had assumed that this had something to do with the drivers or the DRM plugin. The crashes were rare enough that I was never motivated to dig deeper. Could a bad piece of hardware have been the issue all along?
I ran memtest86 and, sure enough, it reported errors within seconds. The faulty RAM broke the SHA256 computation at random points in the checks, resulting in different packs and blobs being flagged on each run. It may also have broken the encryption of any data backed up from the new desktop.
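A rough way to tell an intermittent fault from real on-disk corruption is to hash the same file several times and compare the results. The sketch below assumes pack files named after their hex SHA256. The helper names and the "ok" / "corrupt" / "unstable" labels are hypothetical. A stable result is not proof of healthy memory, since repeated reads may be served from the same cached copy.

```python
import hashlib
import os


def hash_repeatedly(path, runs=5, chunk_size=1 << 20):
    """Hash the same file several times and return every hex digest."""
    digests = []
    for _ in range(runs):
        digest = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                digest.update(chunk)
        digests.append(digest.hexdigest())
    return digests


def classify_file(path, runs=5):
    """Classify a file whose name is expected to be its SHA256.

    "unstable": digests differ between runs, which points at the
                machine (for example RAM) rather than the file.
    "corrupt":  digests agree with each other but not with the name.
    "ok":       digests agree with each other and with the name.
    """
    digests = set(hash_repeatedly(path, runs))
    if len(digests) > 1:
        return "unstable"
    if digests.pop() == os.path.basename(path):
        return "ok"
    return "corrupt"
```

Files that come back "unstable" are the ones whose reported errors should not be trusted until the hardware has been tested.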
The PC has a dual-channel memory kit. I was able to isolate the fault to one of
the sticks by running memtest86 against each individually. I removed the
faulty stick and ran restic check --read-data again a few times. It still
flagged a few packs and blobs, but the set was consistent across runs. Checking
the hashes manually confirmed that these files were indeed corrupt. I repaired
the repository by following this comment. I ran restic check --read-data
again and everything looked good. Subsequently, I raised a warranty claim to
replace the G.SKILL memory kit through Acro Engineering Company.
It is possible that the faulty RAM corrupted not just the backups, but also original files I created or modified on the desktop. I checked all the files I consider critical, and none of them appear to have been affected. I still do not know whether some file that I did not check is corrupt. 🤷
All opinions are my own. Copyright 2005 Chandra Sekar S.