The blog of dlaa.me

A brief bit 'bout backups [My current backup strategy]

I've seen a few references to backup strategies on blogs and discussion lists lately and thought I'd write a bit about the strategy I recently decided on and implemented. Of course, everyone has their own approach to file management, their own comfort level for security, and their own ideas about what's "best". That's life and I'm not going to try to persuade anyone that my way is better than their way - but I will outline my way in case it's useful for others, too. :)

The setup: My machine runs Windows Server 2003, and I try to keep as much unnecessary stuff off it as possible (no games, no P2P programs, no weird drivers, etc.). Along the same lines, all user accounts on the server are members of the restricted-access Users group, not the Administrators group. The machine has one hard drive for the operating system and all programs (60 GB) and another for all data (320 GB). The data drive has a Mirror directory under which all data to be backed up is stored. The Mirror directory is ACLed to give the Users group read/write access, and its non-private subdirectories are shared out for read-only access by Users.

For backups, I have a 200 GB drive in an external USB 2.0 enclosure. It is normally powered off, and every couple of days or so I mirror the Mirror directory to it. The external drive is ACLed so that only members of the Backup Operators group can make changes.

My data consists of the usual personal stuff (email, source code, etc.), all digital photos I've ever taken, all digital video I've ever shot, sentimental stuff (like wedding videos, baby's ultrasound video, etc.), and some of my music collection in WMA Lossless format. Very little data changes day to day, so a simple tool like RoboCopy (free with the Windows Server 2003 Resource Kit Tools) is more than enough to keep the backup directory in sync; its /MIR switch makes this easy. Stored along with the rest of the data is a file that records the MD5 hash of every file in the backup.

As my data storage needs increase (which they do each time I take a picture or shoot a video!), I'll eventually buy a new, larger hard drive and swap it in for the smaller of the two data drives currently in use. As long as my storage needs don't grow too rapidly, I figure the cost of upgrading will be about $100 a year (the price of a mid-sized drive like the 320 GB one I bought a few months ago). I'm counting on storage capacity to keep increasing as it has, so that I'll always be able to buy $100 drives when I need more space.
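The post doesn't say which tool produces the MD5 hash file, so here is a minimal Python 3 sketch of one way to do it. It assumes a plain-text manifest with one "<md5>  <relative path>" line per file (similar to md5sum output) stored at the top of the Mirror directory; the file name md5-manifest.txt and the functions write_manifest and verify_manifest are hypothetical. write_manifest hashes everything under a directory, and verify_manifest re-hashes a copy of it and reports files that are missing, changed, or not in the manifest.

```python
import hashlib
import os

CHUNK_SIZE = 1024 * 1024  # read 1 MB at a time so large videos don't fill memory


def md5_of_file(path):
    """Return the hex MD5 digest of a single file."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK_SIZE), b""):
            digest.update(chunk)
    return digest.hexdigest()


def iter_files(root, skip=()):
    """Yield sorted relative paths (forward slashes) of every file under root."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # walk subdirectories in a stable order
        for name in sorted(filenames):
            rel = os.path.relpath(os.path.join(dirpath, name), root).replace(os.sep, "/")
            if rel not in skip:
                yield rel


def write_manifest(root, manifest_name="md5-manifest.txt"):
    """Hash every file under root (except the manifest) and write the manifest."""
    lines = [f"{md5_of_file(os.path.join(root, *rel.split('/')))}  {rel}"
             for rel in iter_files(root, skip={manifest_name})]
    with open(os.path.join(root, manifest_name), "w", encoding="utf-8") as out:
        out.write("\n".join(lines) + ("\n" if lines else ""))
    return len(lines)


def verify_manifest(root, manifest_name="md5-manifest.txt"):
    """Re-hash files under root and compare them with the manifest.

    Returns (missing, mismatched, unexpected) as sorted lists of relative paths.
    """
    expected = {}
    with open(os.path.join(root, manifest_name), encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if line:
                digest, rel = line.split("  ", 1)
                expected[rel] = digest
    actual = set(iter_files(root, skip={manifest_name}))
    missing = sorted(set(expected) - actual)
    unexpected = sorted(actual - set(expected))
    mismatched = sorted(
        rel for rel in set(expected) & actual
        if md5_of_file(os.path.join(root, *rel.split("/"))) != expected[rel]
    )
    return missing, mismatched, unexpected
```

If the manifest lives inside the Mirror directory, RoboCopy copies it along with everything else, so the same verify_manifest call can be pointed at the backup drive's copy (a hypothetical path such as E:\Mirror) to check it for corruption.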

Benefits provided by this approach:

  • All the data I care about is stored in two independent locations, so there's no single point of failure. (Duh, that's why it's a backup.)
  • Hard drive media doesn't suffer from the same "bit rot" problems that can render writable CDs/DVDs unreadable after just a couple of years.
  • The backup drive is completely separate from the primary drive, so if I ever make a mistake and delete something important, I can easily recover it from the backup. (Some RAID-based solutions immediately mirror all changes and therefore don't have this benefit.) Similarly, a destructive virus on my main machine can't immediately destroy all copies of any data.
  • I look over the list of changes whenever I perform the mirroring to the external drive, so I have an additional opportunity to catch accidental deletions, mysterious changes, etc.
  • I have immediate access to all of my data from any machine in my home. If I decide to look at old photos, I can access them just as easily as the photos I took yesterday.
  • All family members store their data under the Mirror directory (via appropriately ACLed shares), so everybody's data is automatically backed up.
  • In the event of a slow-moving catastrophe (ex: a flood) I can easily grab the external backup drive and take it with me wherever I go. All data will be accessible from any other Windows computer in the world.
  • The overall cost was minimal to set up (~$100) and should be minimal to maintain (~$100/year).
  • Data is separate from applications, so I can reinstall or upgrade the operating system whenever I want without worrying about the data itself.
  • User accounts have limited privileges and are therefore less likely to accidentally compromise the machine when reading email or surfing the web.
  • The MD5 hashes mean that it's easy to verify the contents of my backup drive and that I'll be able to detect data corruption problems if they ever happen.
  • The backup drive is ACLed so that I can't accidentally delete data on it.

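Reviewing the list of changes before mirroring is the step that catches accidental deletions before they propagate to the backup. As an illustration of what that list contains, here is a Python 3 sketch that compares two directory trees and reports what a mirror would add, update, and delete. It is a stand-in, not how RoboCopy works internally: it assumes a file has changed when its size or its modification time (to the whole second) differs, and the functions snapshot and preview_mirror are hypothetical names.

```python
import os


def snapshot(root):
    """Map each relative file path under root to (size, whole-second mtime)."""
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            st = os.stat(full)
            result[rel] = (st.st_size, int(st.st_mtime))
    return result


def preview_mirror(source, backup):
    """Return (new, changed, deleted): what mirroring source onto backup would do.

    'deleted' lists files in the backup that no longer exist in the source and
    would be removed by a mirror; it is the list worth reading most carefully.
    """
    src = snapshot(source)
    dst = snapshot(backup)
    new = sorted(set(src) - set(dst))
    changed = sorted(p for p in set(src) & set(dst) if src[p] != dst[p])
    deleted = sorted(set(dst) - set(src))
    return new, changed, deleted
```

RoboCopy can produce a similar preview itself: adding the /L (list only) switch to a /MIR command reports what would be copied and deleted without changing anything.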
Problems this approach does not solve:

  • Both drives are at the same physical location, so all data can be lost in the event of a sudden catastrophe (ex: fire, earthquake). Possible mitigation: Set up a third external drive (after the first upgrade) and keep that drive somewhere far away. It may not be big enough to hold everything, but I'm happy to exclude music from the off-site backup. Drawback: Inconvenience of updating the off-site drive.
  • "Old data" is lost quickly. For example: if I accidentally delete an important file, I need to detect that mistake at the time of the next mirroring or else that file is gone for good. Possible mitigation: Multiple backup drives at staged intervals (ex: 1 week, 1 month, 3 months). Drawback: Cost.
  • A thief who steals the computer or external drive might have access to personal data. Possible mitigation: Encryption. Drawback: Inconvenience of decrypting files to use them and/or backing up EFS keys.
  • This solution may not scale well if my data storage needs increase faster than storage technology does. Possible mitigation: Move to a different backup strategy. Drawback: That strategy will have its own problems.

I think this overview touches on pretty much all of the key points of this strategy. It's obviously not a perfect solution, but it meets most of my requirements and I'm pretty happy with how it's been working out so far. However, I'm always open to improvements - if you have any suggestions, I'd love to hear them!

Tags: Technical