The Dead Disk Blues, part 2

Sigh. It’s not like I didn’t have any warning there was something wrong with the disk. A few weeks ago, I had got a “delayed write failure” error message when I was downloading some stuff. Windows suggested that this might be due to a faulty network connection or faulty hardware. Because I had been performing a network operation at the time, and because traditionally our wireless network has always been a bit tricky, I was inclined to think that the network was the problem.

When I rebooted the PC at that point, the checkdisk program kicked in and reported a problem with the file I had been working with. It was able to fix it, though, and the problem didn’t recur. Fine, I thought. The network problem must have screwed up a couple of sectors on the disk, but everything’s okay now.

Despite having read several web pages that spoke of imminent disk failure, did I even consider this as a possibility? Uh-uh. Did I take the opportunity to do a complete data backup just in case? Nope.

Then yesterday evening I was performing some very disk-intensive tasks: converting a bundle of SHN files (Shorten audio files, a lossless way of compressing sound) into WAV files I could play. There was about 400MB of SHN data, which, when extracted, turned into about 750MB of WAV.

This was the first time I had used the SHN format, and the first time I had used the program for extracting them (mkwACT). So when I started getting more “delayed write failure” messages, I blamed the application. I thought that maybe the program was producing data faster than the hard disk could write it; hence, a “delayed write”. Perhaps that write was “failing” because of a bug in the application code.

Bzzt. Wrong!

Repeat after me: “Delayed write failure” means back up your data NOW because your hard disk is about to DIE.

In the process of extracting all the WAV files, I had to reboot my PC four times because of complete system lock-ups. That’s more crashes in one evening than I’ve had in the whole year I’ve been running Windows XP at home.

On two of those reboots, checkdisk ran and reported a couple of disk errors, which it said it had repaired. I went into the Disk Management section of “My Computer”, but it told me that the drive was “healthy”. It wouldn’t complete a scan of it, though. And then there were the funny clicking sounds…

So repeat after me: Funny clicking noises mean back up your data NOW because your hard disk is about to DIE.

But did I take the opportunity to back up my data? Nope.

How many warning signs did I need???

Repeat after me: AAARGH.

So what have I lost? Well, fortunately or unfortunately (I haven’t decided yet), it was my data drive that died. I have (or had) two hard disks in my computer. The first one contains my Windows installation, all program files, and most programs’ immediate data and settings. (Which is why I’m still able to write and post this from my PC.) It also holds my email, and my address books. The second hard disk holds (held) all of my music, photos, software, backups (!), documents, spreadsheets, and other “stuff.”

Very fortunately, a few months ago we invested in a brand-new external hard drive, which we are using as a sort of nearline storage unit. This was just before I started my annual Linux experiment, and as a precaution I backed up both internal drives onto this external unit. So only two months of data is gone.

The software I had on the drive was all downloaded from the net, and is all replaceable. (Our broadband connection will be very handy here.) Likewise, the music on my drive was all ripped from my CD collection, and can be ripped again without too much effort.

In terms of documents and spreadsheets, I haven’t actually done much in the last few months. Most of what I have written in that time has gone up on the Sunpig web site or been sent by email, so it isn’t actually lost. And most of the spreadsheets we update regularly (the book catalogue and the accounts spreadsheet) live on our server, not on my hard disk. So document-wise, I’m not too badly off either.

What really hurts is the photos. All of the photos we’d taken during October and November were on the disk that died, and those were the only copies. A very small number of them had been uploaded to the Sunpig web site, and a few more are currently serving as our desktop background pictures, but we have probably lost about 150-200 photos, and maybe 5-10 video clips.

It’s possible that we could get these back. There are companies that specialise in recovering data from dead hard disks. However, this can be a very expensive process. An optimistic estimate would probably be a couple of hundred pounds. And are they really worth that much? It’s very sad to lose the photos, but this is probably too expensive. Instead, we’re just going to try to learn a lesson from this disaster, and move on.

The lesson is that if you insist on using computers, sooner or later you will suffer a catastrophic hardware failure. The question you have to ask yourself is: how much data are you comfortable losing?

A day, I can put up with. A week would be a nuisance, but the chances are I wouldn’t have actually made very many file changes, or saved new photographs. In any given month, there will usually be a bundle of photos of Alex that would be a shame to lose. In two months, that shame doubles, and becomes actively painful.

Amongst the photos we lost, there are some from a day trip to Glasgow, a bunch we took when Andy came up to visit, and a whole big stack of pictures that Abi took on a trip to Hewit’s. It makes me sad to know that we’ve lost them.

So what am I going to do?

Well, the first thing is to define a new backup strategy. I think I’m going to get a new, large disk (60GB or so) and use it as my main PC disk (Disk A). I’ll put everything on it: Windows, apps, and data. Then I’ll repurpose my current main disk (13GB) as a “staging” backup unit (Disk B). The nearline unit will serve as our primary backup unit (Disk C), and then we’ll have the usual variety of CD-Rs for longer-term, offline archiving.

The process will be: every night, a backup script will copy fresh data from Disk A to Disk B. Once a month, I’ll do a manual copy from Disk B to Disk C. Quarterly (probably), I’ll burn new CD-Rs from Disk C. Disks B and C will only be cleaned up when they reach 90% capacity, so there will generally be some overlap between them.
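For the nightly Disk A to Disk B step, something along these lines might do. This is only a sketch in Python, under a few assumptions of my own: “fresh data” means any file that is missing from the backup, or whose modification time or size differs from the copy already there; the backup never deletes anything (old files stay until the 90% cleanup); and the function names and example drive paths are hypothetical.

```python
import os
import shutil
from pathlib import Path

CLEANUP_THRESHOLD = 0.90  # clean up the backup disk only once it is 90% full
MTIME_TOLERANCE_S = 2.0   # FAT disks store file times to 2-second precision


def needs_copy(src: Path, dst: Path) -> bool:
    """Assumed definition of "fresh": missing from the backup, newer, or resized."""
    if not dst.exists():
        return True
    s, d = src.stat(), dst.stat()
    newer = s.st_mtime - d.st_mtime > MTIME_TOLERANCE_S
    return newer or s.st_size != d.st_size


def incremental_copy(source_root: Path, dest_root: Path) -> list[Path]:
    """Copy new or changed files from source_root into dest_root.

    The directory layout is mirrored. Nothing is ever deleted from
    dest_root, so files removed from the source survive in the backup.
    Returns the relative paths of the files that were copied.
    """
    dest_root.mkdir(parents=True, exist_ok=True)
    copied = []
    for dirpath, _dirnames, filenames in os.walk(source_root):
        for name in filenames:
            src = Path(dirpath) / name
            rel = src.relative_to(source_root)
            dst = dest_root / rel
            if needs_copy(src, dst):
                dst.parent.mkdir(parents=True, exist_ok=True)
                shutil.copy2(src, dst)  # copy2 also preserves the modification time
                copied.append(rel)
    return copied


def over_capacity(path: Path, threshold: float = CLEANUP_THRESHOLD) -> bool:
    """True once the disk holding `path` is at or above the cleanup threshold."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total >= threshold


def nightly_backup(source_root: Path, dest_root: Path) -> None:
    """One nightly run: copy fresh files, then warn if cleanup is due."""
    for rel in incremental_copy(source_root, dest_root):
        print(f"copied {rel}")
    if over_capacity(dest_root):
        print("Backup disk is at 90% capacity or more: time to clean up")
```

Scheduled to run each night (for example via the Windows Task Scheduler), a call such as `nightly_backup(Path(r"D:\Data"), Path(r"E:\Backup\Data"))` would copy only what changed since the previous night. The drive letters there are made up. Because `shutil.copy2` keeps the original modification times, a second run with no changes copies nothing.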

And most importantly: stick to this procedure. It’s worthless unless I actually do the backups on the intended schedule.

On the positive side, when I had my PC open, I noticed that the light thrumming noise it makes is not coming from the CPU fan, as I’d previously thought. It’s being produced by the fan on my graphics card. This is a good thing, because it means I can shut my PC up completely by paying another visit to the nice folks at QuietPC.com, and buying one of their silent video card heatsinks. Nifty!

Also, I had been planning to buy a bunch of new computer components in the new year. After this whole disaster, I’m going to have to add a new hard disk into the mix as well. And I might just be forced to buy them all before Christmas… 🙂


2 comments

  1. I said it in person, and I’ll say it on the web as well.

    My informal lack of survey indicates that 100% of hard drive failures come to light after switching on your computer. That’s right. You switched it on! You should have known the hard drive was going to fail!

    Hindsight is always 20/20.

  2. For the end user there are two types of HDD failure. To understand them you have to know a little about HDDs. If you look at the bottom of the HDD, there is a green PCB (printed circuit board). Inside the HDD are platters, much like old 45s, and there is also a head, much like the needle on a record player. If the head hits the platter, there is physical damage to the drive. BUT the PCB could go bad instead, and this could cause the drive not to boot, or cause write errors, etc. If you go to eBay and find a drive of the same type, you can replace the PCB and recover the data. If you get a grinding noise or banging sound, it could be the heads crashing on the platter, and then replacing the PCB will not help.
