The blog of

"You're gonna need a bigger boat." [A brief look at data storage requirements in today's world]

I've previously blogged about my data storage/backup strategy. Briefly, I've got one big drive in my home server that stores all the data my family cares about: mostly music, pictures, and videos (with a little bit of other stuff for good measure). To protect the data, I've got another equally big external drive that I connect occasionally and use for backups by simply mirroring the content of the internal drive.

As things stand today, the internal drive is 320GB and the external drive is 300GB, but I've hit the wall and am almost out of space to add new files. Looking at hard drive prices these days, the sweet spot (measured in $/GB) seems to be with 500GB drives at about $140 (PATA or SATA). Any smaller than that and the delta from 300GB isn't enough to be interesting - any larger than that and the cost really goes up.

I was already prepared to buy a new drive every year or so to allow for growth, so I was curious if getting a 500GB drive now would do the trick. I wrote a quick program to look at every file I backup and tally up the size according to the date the file was created. The C# program walks the whole directory tree, sums the sizes by date, and writes out a simple CSV file with the results. The idea here is to chart the rate at which I'm adding data in order to predict when I'd run out of space next. (Yes, it's easy to come up with more sophisticated heuristics, but this is really just a back-of-the-envelope calculation and doesn't need to be perfect to be meaningful.)

Last night I opened the CSV file in Excel and charted the data. The resulting chart looks like this:

Data Storage Space (GB)

The blue line represents the cumulative size of the data I had at each point in time (horizontal axis) measured in GB (vertical axis) - you can see that I'm just above 300GB today. The red line is Excel's exponential trend line for the same data - it matches the blue line almost perfectly, so it seems pretty safe to say that my data storage needs are increasing exponentially. I was kind of afraid of that, because it means the 500GB drives I've been considering are likely to fill up within the next 8 months!

Clearly, I need to be prepared to spend more on hard drives than I'd initially planned to - or else I'm going to need to significantly change how I do things. I've got some ideas I'm still considering, but charting this data was a good wake-up call that drive capacity isn't increasing as rapidly as I might like. :)

I think that data storage and backup are issues that will affect all of us pretty soon (if they're not already). Backing up to DVDs doesn't scale well once you need more than 10 or so DVDs, and backing up over the network just doesn't seem practical when you're talking about numbers this large. Even ignoring the need to backup, simply storing all the data you have is rapidly becoming an issue. With downloadable HD movie/TV content becoming popular, high megapixel still/video cameras being commonplace, and fast Internet connections becoming the norm, it seems to me that content is outpacing storage right now.

Here's hoping for a quantum leap in storage technology!

Updated on 2007/03/14: I've just posted the source code for the program I wrote to gather this data.

Tags: Technical