Are your photos, music, and documents at risk?

Tim Bray has an insightful article on how a user in charge of their own computing environment should go about ensuring the safety of their data. (“Data” sounds very stuffy, but how would you feel if the friend who fixes your computer turned round and told you all your photos and music were gone forever? What about that CV that took you hours to perfect? Your dissertation?)

Tim says, “The single greatest threat to my data is me”, and I totally agree with him. Hardware failures are thankfully rare (although my backups are strongly motivated by three of the machines on which I store data once suffering hard drive failures within the space of a few months), and so human error, whether through carelessness, accident, lack of knowledge, or a third party, is the most common reason I have had to attempt data recovery.

Tim’s rules for preventing catastrophe sound simple enough, but I fear that while they are easy enough for compscis to follow, they are less simple for other users.

  1. Don’t use proprietary file formats.
  2. Don’t erase anything.
  3. Store everything twice.
  4. Do occasional ad-hoc and regular full backups.

(2) and (3) are the most important: even if the second copy is on the same computer, you are at least protected from many calamitous events such as your word processor going haywire and replacing your British-spelt words with their American counterparts when you do a word count[1]. The low cost of USB memory sticks should also make it relatively easy for people to store backups of their important documents on something less prone to failure and corruption than a floppy disk. Of course, memory sticks are not large enough for photographic and audio data, but a CD or DVD writer can come to your aid there. Not deleting anything obviously takes away the possibility that you’ll lose data by deleting the wrong thing, and given today’s plentiful disk space and the relatively small size of documents, it seems sensible.
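For the simplest case, a second copy on the same computer, a few lines of Python are enough. This is only a minimal sketch: the function name `keep_second_copy` and the `copies` folder are made up for illustration, and it assumes the document lives in a folder you can write to.

```python
import shutil
from datetime import datetime
from pathlib import Path


def keep_second_copy(document: Path, copies_dir_name: str = "copies") -> Path:
    """Copy `document` into a sibling folder under a timestamped name.

    Hypothetical helper: the folder name and naming scheme are assumptions.
    Nothing is ever overwritten or deleted, so older copies accumulate.
    """
    copies_dir = document.parent / copies_dir_name
    copies_dir.mkdir(exist_ok=True)

    # A timestamp in the name keeps every copy distinct (rule 2: erase nothing).
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S-%f")
    copy_path = copies_dir / f"{document.stem}.{stamp}{document.suffix}"

    # copy2 copies the contents and also tries to preserve the modification time.
    shutil.copy2(document, copy_path)
    return copy_path
```

Calling `keep_second_copy(Path("thesis.txt"))` before a risky operation leaves an untouched copy next to the original. It does nothing for a failed disk, of course; for that the copy has to live on another device.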

Unfortunately I am not sure any of these rules are as easy for a non-compsci to achieve as they should be. The day-to-day tools used by computer scientists almost universally result in rules (1) and (3) being achieved as a by-product of their use: an IMAP client for email and revision control for source code and documents are the two that protect my most important data. But for someone who uses tools such as Office, or POP3 for email, making any sort of copy (in an open or a proprietary format) is a process they must initiate themselves, or else they must invest in an additional tool to perform it on their behalf.

So how can the tools of the “knowledge worker” be improved so that the rules are as easy to achieve as for a “computer worker”? Experience shows that tasks off the critical path, such as backups, are only performed on anything better than an ad hoc basis if the process is entirely automated. In fact, one of the best systems I have ever used is a hidden, read-only .snapshot directory that stores regular (hourly, daily, weekly, and monthly) copies of your home directory, using file system magic to avoid wasting space. Sadly, until that’s a standard feature on all computers, users are just going to have to invest some time, effort, and/or money in following Tim’s rules, or risk losing their data.
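The space-saving trick behind such snapshots can be roughly imitated without special file system support. The sketch below is not how the .snapshot directory above is implemented; it assumes hard links as a stand-in for the file system magic. The function `take_snapshot` is hypothetical, and it further assumes that snapshot names sort chronologically, that the snapshot folder sits outside the directory being copied, and that both live on the same file system.

```python
import filecmp
import os
import shutil
from pathlib import Path


def take_snapshot(source: Path, snapshot_root: Path, name: str) -> Path:
    """Create snapshot_root/name mirroring `source`.

    Files unchanged since the most recent snapshot are hard-linked to it,
    so they take up no extra space; new or changed files are copied.
    Assumption: snapshot names sort in chronological order.
    """
    snapshot_root.mkdir(parents=True, exist_ok=True)
    previous = sorted(p for p in snapshot_root.iterdir() if p.is_dir())
    latest = previous[-1] if previous else None

    target = snapshot_root / name
    target.mkdir()  # fails loudly if a snapshot with this name already exists

    for dirpath, _dirnames, filenames in os.walk(source):
        rel = Path(dirpath).relative_to(source)
        (target / rel).mkdir(parents=True, exist_ok=True)
        for filename in filenames:
            src = Path(dirpath) / filename
            dst = target / rel / filename
            old = latest / rel / filename if latest is not None else None
            if old is not None and old.is_file() and filecmp.cmp(src, old, shallow=False):
                os.link(old, dst)  # unchanged: share storage with the old snapshot
            else:
                shutil.copy2(src, dst)  # new or changed: store a fresh copy
    return target
```

Run from a scheduler with a timestamp as the name, this gives hourly or daily copies at little cost. Because linked files share their contents, editing a file inside one snapshot would change it in every snapshot that links to it, which is a good reason for snapshots to be read-only.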

[1] This actually happened to a friend of mine while he was writing his 60,000-word thesis.