Associate Teaching Professor of Linguistics at UC San Diego
Director of UCSD's Computational Social Science Program
How I lost my dissertation files (despite 7 different backup plans)
This was originally posted on my blog, Notes from a Linguistic Mystic, in 2015.
So, remember the dissertation I was working on? That little thing that took two years, 170 pages, 50+ participants and thousands of lines of code? The crowning achievement of 12 years of higher education?
Well, a big chunk of the work I did is gone, because I made some bad decisions, and had some very bad luck. I’d like to share what I did wrong, and how to not be me.
“Huh, that’s weird”
In early June, the logic board in my MacBook Pro failed and took the hard drive with it. I'd been having kernel panics and a few periodic drive read errors, but I caught it early. When I brought it to the Genius Bar, the diagnostic failed, and Apple replaced everything, as it was (barely) still under warranty. It came back to me with a new SSD and logic board.
I restored my data to the newly wiped computer from a two-day-old backup, and took this as an opportunity to clean up a bit. I got rid of some programs I wasn't really using anymore, threw out some files and bad music, and eventually felt pretty good about my computing life. My computer was lean and fast, with brand new parts, and I thought I'd recovered from a dead hard drive with no issues. But I never opened the dissertation folder.
Two weeks ago, a colleague asked me for a script I used to create some of the stimuli for my dissertation. Easy, I said. I’ve got that in my “dissertation” folder. I opened the folder, knowing just where it would be, but it contained nothing but a corrupted PDF with comments from my committee. Whether it was lost to the data corruption, lost in a bad restore, or just lost, it was gone. Everything else was gone.
“OK, this is why I have backups.”
I've had a number of hard drive failures in my life, so when it comes to data, I keep a hardcore backup scheme. At any given moment, I have:
- Three small portable backup drives using Apple’s “Time Machine”, which I swap out periodically
- A USB hard drive playing “Time Capsule”, attached to my wireless router and automatically backing up using Time Machine every few minutes
- Two “cold storage” time machine drives, one at home and one off site, which I only update every once in a while
- An offsite internet backup service (Crashplan), keeping copies of deleted files as well as past versions.
Theoretically speaking, in order to lose all of my data, I would have to experience 6 hard drive failures and lose access to the cloud.
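If drive failures really were independent, the odds of total loss work out to something astronomically small. Here's a rough back-of-the-envelope sketch in Python; the failure rates are assumptions for illustration, not measured numbers:

```python
# Assumed annual failure probabilities, purely for illustration:
drive_failure_rate = 0.05   # ~5% chance a given drive dies in a year
cloud_failure_rate = 0.01   # ~1% chance the cloud copy is lost too
num_drives = 6              # three portable + one Time Capsule + two cold

# If failures were truly independent, total loss would require all six
# drives *and* the cloud copy to fail at once.
p_total_loss = (drive_failure_rate ** num_drives) * cloud_failure_rate
print(f"{p_total_loss:.2e}")  # → 1.56e-10

# The catch: a single bad human decision (say, wiping several drives
# on the same afternoon) is a correlated failure that removes many
# copies in one stroke, and none of this math applies to it.
```

That vanishingly small number is exactly why the system felt bulletproof; the correlated-failure caveat in the last comment is the whole story of this post.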
Or, I’d just have to f*** up really badly.
How I f***ed up really badly, Part 1
I didn’t know when the data had disappeared, but it was gone, and I needed to get it back.
Over the next few hours, I went through every one of the backups above, and found that amazingly, each one had failed because of two really poor choices, and one bad stroke of luck.
Really poor choice #1: I “refreshed” most of my backups when I got my computer back
After the clean install, I was feeling cocky. My computer was clean, decluttered, and running great, and everything looked fine. So, given that my backup drives were already getting full of old data ("Who needs old data!?"), and I needed to repartition them anyway, I decided to wipe and restart every single backup drive except my offsite "cold storage" drive. I was confident that between Crashplan and the offsite drive I'd be fine even if something went missing, and "starting fresh" sounded like a great idea.
This meant that the oldest backup on any of these drives was from June 16th, the day after my "clean" install. So, instead of 2+ years of history, every single drive held the same corrupted folder as my hard drive.
This choice alone brought me down from seven backups to just two. But that's fine, two is enough. Unless I f***ed up really badly.
How I f***ed up really badly, Part 2
I've used Crashplan for a while now and liked it a lot. There are reasonable privacy controls, it's fast, easy, and reliable, and it even saves deleted files for a period you specify. It's also much faster and more dependable than SpiderOak, my previous solution.
So, once I realized my backups didn’t have my back, I logged in to the Crashplan interface, hoping to restore my files that way. But they weren’t there, either. For that matter, my entire year of deleted file and revision history was gone too. I couldn’t figure out why, until I realized that:
Really poor choice #2: I didn’t understand the nuances of how Crashplan worked
During that restore process, I changed my username on my Mac to fix a long-standing error. This shouldn't have mattered, except for one minor detail: Crashplan doesn't save deletion history for folders that are no longer being backed up, and the home folder's name includes your username.
When I set Crashplan up again on the newly wiped machine, I selected my new home folder. It matched all the files to the old folder, and since the data had already been uploaded, it was just a matter of minutes before my backup was up to date, and my old home folder was “gone” to the system.
That evening, at 1am, Crashplan's automated cleanup robots decided that since I no longer cared about the old username's home folder (which no longer existed), they could delete all of the deleted-file history for that folder and focus on the new username's folder, which had no file history at all.
Just like that, at the whim of a bot doing its job properly, my deleted file history disappeared, leaving only the same corrupted folder that I had everywhere else.
At this point, the data existed in just one place: my “offsite” cold storage drive. But I still had a copy, so I’d be fine.
Unless I was really unlucky.
How I was really unlucky
Know the saying “Two is one, one is none”?
Stroke of bad luck #1: One was none.
When I plugged in my offsite drive, I was greeted with the "Click-Click-Click" of death, and although my machine could see the drive, it couldn't decrypt the backup data, no matter what I tried. Whether it was the heat in storage or just my luck running out after four years of using the drive, my "just in case" drive was dead, and my data with it.
Learn from me, dammit
Even though I did a lot of things right (by having many backups in a few different forms), I made a few bad choices, and it burned me. In the name of helping my readers avoid these errors, I have a few suggestions, many of which are obvious, but still escaped me:
1) Phase out old backups over time, not all at once
This whole issue would have been avoided had I just kept more old backups. My desire to "clean up" and "start fresh" burned me badly. What I should have done, if I wanted a clean slate, was wipe one drive at a time, every six months or so. That way I'd always have had at least one set of historical backups, even as I cleaned things out and repartitioned.
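Concretely, a staggered phase-out might look like the small Python sketch below. The drive names and dates are hypothetical, not from my actual setup; the point is that wipe dates are spaced out so the drive wiped longest ago always still holds old history:

```python
from datetime import date, timedelta

# Hypothetical sketch: instead of wiping every backup drive on the same
# day, assign each one a wipe date roughly six months after the last.
def staggered_wipe_dates(drives, start, interval_days=182):
    """Assign each drive a wipe date, spaced interval_days apart."""
    return {name: start + timedelta(days=i * interval_days)
            for i, name in enumerate(drives)}

drives = ["portable-1", "portable-2", "portable-3"]
schedule = staggered_wipe_dates(drives, date(2015, 6, 16))

# At any moment, at least one drive still holds months of history, so
# one bad restore (or one overconfident afternoon) can never erase
# every copy at once.
for name, when in sorted(schedule.items(), key=lambda kv: kv[1]):
    print(name, when.isoformat())
```

The same idea works for any rotation scheme; what matters is that the "reset everything" operation is impossible by construction, because the resets never line up.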
2) Know the Details of your Backup Service
After reading the documentation, I saw that Crashplan worked exactly as it was supposed to here. I removed a folder from the scope of the backup, and it removed all old versions of that folder. This is the right behavior for privacy, for organization, and for minimizing space used. But because I didn't understand how it handled username changes, I thought I had old versions that I didn't, and made bad decisions because of it.
3) Keep a couple of “cold” backups
It’s a very good idea to have data someplace that you simply don’t touch very often. Sure, the data will be a bit out of date, but I would pay good money for a copy of my dissertation files circa November. The purpose of this is not to recover gracefully from a recent failure, but to save your bacon in case “the big one” hits. Whether these are DVDs, a hard drive left with a family member, or even an old computer left unwiped in your closet, it’s important to have a copy of your data that’s safe, offline, and immune to viruses, data corruption, and bad decisions. Had I not had a hard drive failure, I’d have been just fine thanks to my offsite backup.
4) Don’t trust your “perfect system”
All of this would have been avoided had I, shortly after finishing the dissertation, just burned everything to a DVD for archiving. That way nothing could have wiped it out short of a house-fire. I even thought about doing this, but I had enough confidence in my redundant backup system that I didn’t think I needed to bother digging out the DVDs.
Stupid, stupid, stupid.
Redundancy doesn’t prevent stupidity
Although a lot was lost, all is not lost. I'd stored the sound file data in a different folder, and over the following weeks I was able to recover copies of the text itself and all the data I'll need to reproduce my findings for publication, albeit with a fair amount of duplicated work: I searched lab computers and Google Drive backups, asked my advisor and colleagues for scripts I'd shared, and got lucky with a few "emailed to myself" and "copied to my website" moments. A few other folders were affected, but none of them were as important. I can't say I dodged the bullet, but I survived it.
Nevertheless, remember that no matter how redundant, well-formed, or multi-tiered your backup plan is, it can't save you from yourself. My biggest problem here was that I didn't fully understand the mechanisms I had in place; I made a stupid decision based on that bad information, and it cost me.
Don’t repeat my mistakes.