59 Rescue Files from Damaged Hard Drives

When your hard drive is damaged or is on its last legs, use Knoppix to recover what's left on the drive and attempt to restore it.

Hard drives continue to get larger and more complicated, and at least in the desktop IDE market, they seem to be getting less and less reliable. If you don't believe me, search the Internet for "IBM Deathstar" (referring to problems in the 60GXP and 75GXP series of hard drives). A three-year warranty guarantees you a replacement drive if yours fails, but there is no way to receive replacement data. When your hard drive starts to fail, you might notice that it becomes much louder than it used to be and makes a loud clicking noise that sounds a bit like your hard drive is crushing ice. This is the click of death. Along with general file-access failures, the click of death is the main indicator that your hard drive is dying and should be backed up immediately.

Unfortunately, most backup and imaging utilities operate on the assumption that they are running on fully functioning hardware. When a hard drive is dying, many backup utilities can't handle the resulting access errors. If your drive has gotten so bad that you can't even boot from it, your best chance of creating a backup is to image the drive [Hack #48]. But even the faithful dd program exits with an error if it hits a bad block, so if you try to image a failing hard drive, you end up with an incomplete image.

Knoppix comes with a tool called dd_rescue (http://www.garloff.de/kurt/linux/ddrescue) that aims to pick up where dd leaves off when reading from questionable drives. When dd_rescue comes across a bad block, it simply skips it and moves on by default, or it can be set to move on after a certain number of failures. On a failing drive, this means you can create an image of a full partition with some holes here and there, and then use fsck to try to repair some of the damage on the filesystem. By using Knoppix for this recovery, you access the drives as little as possible, so you are only putting strain on the bad drive long enough to make a single copy, and then you can browse around the image from a fully functioning drive.
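The skip-and-continue idea can be sketched with plain dd on an ordinary file. This is only a toy model under stated assumptions: real read errors cannot be produced on demand, so the "bad" block numbers are passed in by hand, and the function name copy_skipping_bad and the scratch files are made up for the illustration. It is not how dd_rescue itself is implemented.

```shell
#!/bin/sh
# Toy model of skip-on-error imaging. The list of "bad" block numbers is
# supplied by hand (an assumption); a real tool learns them from read errors.
copy_skipping_bad() {
    src=$1; img=$2; bs=$3; bad_list=$4
    size=$(wc -c < "$src")
    blocks=$(( (size + bs - 1) / bs ))
    # Pre-fill the image with zeros so skipped blocks stay behind as holes.
    dd if=/dev/zero of="$img" bs="$bs" count="$blocks" 2>/dev/null
    i=0
    while [ "$i" -lt "$blocks" ]; do
        case " $bad_list " in
            *" $i "*)
                echo "block $i: simulated read error, skipped" ;;
            *)
                # Copy one block to the same offset without truncating the image.
                dd if="$src" of="$img" bs="$bs" skip="$i" seek="$i" \
                   count=1 conv=notrunc 2>/dev/null ;;
        esac
        i=$((i + 1))
    done
}

# Demonstration on scratch files: 8 blocks of 512 bytes, blocks 3 and 4 "bad".
tmp=$(mktemp -d)
yes 'knoppix rescue test data' | head -c 4096 > "$tmp/src.bin"
copy_skipping_bad "$tmp/src.bin" "$tmp/img.bin" 512 "3 4"
```

The resulting image has the same size as the source, with zero-filled gaps where the skipped blocks would have been. That is the shape of image that fsck is later asked to repair.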

While you can do the complete drive rescue with dd_rescue alone, a helper frontend called dd_rhelp automates and speeds up much of the process. The dd_rescue tool doesn't stop when it hits bad sectors, but it does slow down significantly. If your drive has a number of bad blocks in a row, it can take dd_rescue a long time to move past them into recoverable data. On a drive that is failing quickly, this means the drive can die while dd_rescue is still waiting on bad blocks. The dd_rhelp script speeds up this process by assuming that bad blocks generally occur in groups. When dd_rhelp sees that dd_rescue has hit a bad block, it skips ahead a number of blocks and reads from that point in reverse until it hits another bad block. It uses this method to map out sections of bad blocks on the drive and recovers the good blocks first. Then, when it has recovered the good blocks, it goes back and tries to recover data from the groups of bad blocks.
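To make the read order concrete, here is a small simulation of the strategy as described above. It is a sketch, not dd_rhelp's actual code: the drive is just a range of block numbers, and N, JUMP, and BAD are made-up illustration values.

```shell
#!/bin/sh
# Toy model of the read order described in the text; not dd_rhelp's real logic.
N=12         # number of blocks on the pretend drive (illustrative)
JUMP=5       # how far to skip ahead after a read error (illustrative)
BAD="3 4 5"  # block numbers treated as unreadable (illustrative)

is_bad() {
    case " $BAD " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

order=""     # good blocks, in the order they were read
deferred=""  # suspect ranges left for the final pass
pos=0
while [ "$pos" -lt "$N" ]; do
    if ! is_bad "$pos"; then
        order="$order $pos"
        pos=$((pos + 1))
        continue
    fi
    # Read error at $pos: jump ahead, then walk backward toward the error
    # until another bad block (or the original one) is reached.
    first=$pos
    next=$((pos + JUMP))
    if [ "$next" -gt "$N" ]; then
        next=$N
    fi
    k=$((next - 1))
    while [ "$k" -gt "$first" ] && ! is_bad "$k"; do
        order="$order $k"
        k=$((k - 1))
    done
    deferred="$deferred $first-$k"
    pos=$next
done

echo "good blocks read in order:$order"
echo "ranges deferred to the final pass:$deferred"
```

With these values, blocks 0 through 2 are read forward. The error at block 3 triggers a jump, blocks 7 and 6 are read in reverse until the bad block 5 stops the walk, and reading resumes at block 8. The range 3-5 is left for the final pass.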

Time is precious when a drive is failing, so dd_rhelp tries to spend more time recovering good data, and then goes back to recover questionable data if it can. There are other benefits to dd_rhelp. For example, it can use the logs that dd_rescue generates to resume a rescue operation that you have stopped with Ctrl-C. Also, dd_rhelp generates nice ASCII output that shows you where it is on your drive and which bad blocks it has discovered.

So your drive has the click of death, and some files are missing. Don't panic. You should still be able to recover most or all of your data. First, you need something to store the disk image on. You are using Knoppix, so you can save the image to any drive that Knoppix supports, including locally mounted drives, USB drives, and remote file servers. This drive must be large enough to hold a complete image of the failing partition, free space included. Even if 7 GB of a 10 GB partition is free, you still need 10 GB of space on a second drive to hold the image.
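A quick way to check the space requirement before starting is to compare the size of the source partition with the space available on the destination. The helper below is a minimal sketch: the name has_room_for is made up for this example, and the commented real-world usage assumes that blockdev is available on your Knoppix disc.

```shell
#!/bin/sh
# Succeeds when directory $2 has at least $1 bytes available.
has_room_for() {
    need_bytes=$1; dest_dir=$2
    # df -P prints one POSIX-format line per filesystem; -k reports 1024-byte units.
    avail_kb=$(df -Pk "$dest_dir" | awk 'NR == 2 { print $4 }')
    need_kb=$(( (need_bytes + 1023) / 1024 ))
    [ "$avail_kb" -ge "$need_kb" ]
}

# Harmless demonstration against /tmp:
if has_room_for 1024 /tmp; then
    echo "/tmp can hold a 1 KB file"
fi

# Real use needs root and the actual devices, so it is shown as comments:
#   size=$(sudo blockdev --getsize64 /dev/hda1)
#   has_room_for "$size" /mnt/hdb1 && echo "enough room for the image"
```

The comparison uses the size of the whole partition rather than the space its files occupy, because the image is a block-for-block copy.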

Boot Knoppix. Open a browser and go to http://www.garloff.de/kurt/linux/ddrescue/. Knoppix includes dd_rescue v1.02, but dd_rhelp requires v1.03. Download version 1.03 or greater to your home directory, create a local bin directory to hold the binaries (so the new dd_rescue is run instead of the one shipped with Knoppix), and extract dd_rescue to that directory:

knoppix@ttyp0[knoppix]$ mkdir -p ~/.dist/bin
knoppix@ttyp0[knoppix]$ tar xzf dd_rescue-1.03.tar.gz dd_rescue
knoppix@ttyp0[knoppix]$ mv dd_rescue ~/.dist/bin
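Whether the new binary really takes precedence depends on the order of directories in PATH. The sketch below demonstrates the check with a throwaway directory and a stand-in script, so it can be run anywhere. The stand-in is not the real dd_rescue, and the directory name is generated by mktemp.

```shell
#!/bin/sh
# Demonstrates PATH precedence with a scratch directory and a stand-in script.
demo_bin=$(mktemp -d)
printf '#!/bin/sh\necho "stand-in dd_rescue"\n' > "$demo_bin/dd_rescue"
chmod +x "$demo_bin/dd_rescue"

# Directories listed earlier in PATH win, so prepend the local bin directory.
PATH="$demo_bin:$PATH"
export PATH
resolved=$(command -v dd_rescue)
echo "dd_rescue resolves to: $resolved"

# The equivalent check on the Knoppix desktop would be:
#   PATH="$HOME/.dist/bin:$PATH"; command -v dd_rescue
```

If command -v still reports the system copy, the local directory is either missing from PATH or listed too late. Depending on how sudo is configured it may use a different PATH, so it can be worth repeating the check as `sudo sh -c 'command -v dd_rescue'`.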

Now browse to http://www.kalysto.ath.cx/utilities/dd_rhelp/index.en.html and download the latest version of the dd_rhelp tool to your home directory. Open a terminal, extract the files from the dd_rhelp-version.tar.gz file that you have downloaded, and change to the directory it creates. Then compile the program and copy the new dd_rhelp binary to your local bin directory with dd_rescue:

knoppix@ttyp0[knoppix]$ tar xzf dd_rhelp-0.0.5.tar.gz
knoppix@ttyp0[knoppix]$ cd dd_rhelp-0.0.5
knoppix@ttyp0[dd_rhelp-0.0.5]$ ./configure && make
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for gawk... gawk
checking whether make sets $(MAKE)... yes
checking for a BSD-compatible install... /usr/bin/install -c
checking for bash... /bin/sh
configure: creating ./config.status
config.status: creating Makefile
config.status: creating src/include/begin-sh
config.status: creating src/include/copyright-sh
config.status: creating src/include/end-sh
config.status: creating src/include/vars-sh
rm -f dd_rhelp
echo "#!/bin/sh" > dd_rhelp
cat ./src/include/begin-sh >> dd_rhelp
cat ./src/include/copyright-sh >> dd_rhelp
cat ./src/include/GPL-sh >> dd_rhelp
echo "# TODO : " >> dd_rhelp
cat ./TODO | sed 's/^/# /g' >> dd_rhelp
cat ./src/include/vars-sh >> dd_rhelp
echo "# Including 'libcolor.sh'" >> dd_rhelp
cat ./src/include/libcolor.sh >> dd_rhelp
echo "# Including 'libcommon.sh'" >> dd_rhelp
cat ./src/include/libcommon.sh >> dd_rhelp
cat ./src/dd_rhelp-sh >> dd_rhelp
cat ./src/include/end-sh >> dd_rhelp
chmod ugo+x dd_rhelp
knoppix@ttyp0[dd_rhelp-0.0.5]$ cp dd_rhelp ~/.dist/bin/

Mount the drive to which you are saving the image with read/write access. You don't need to mount the problem drive (if the drive is far enough gone, you won't be able to mount it anyway). Then run dd_rhelp:

knoppix@ttyp0[knoppix]$ sudo mount -o rw /dev/hdb1 /mnt/hdb1
knoppix@ttyp0[knoppix]$ sudo dd_rhelp /dev/hda1 /mnt/hdb1/hda1_rescue.img
=== launched via 'dd_rhelp' at 0k, 0 >>> ===
dd_rescue: (info): ipos:   1048444.0k, opos:   1048444.0k, xferd:   1048444.0k
                *  errs:      0, errxfer:         0.0k, succxfer:   1048444.0k
             +curr.rate:     8339kB/s, avg.rate:     7564kB/s, avg.load:  7.9%
dd_rescue: (warning): /dev/hda1 (1048444.0k): Input/output error!
dd_rescue: (info): ipos:   1048444.5k, opos:   1048444.5k, xferd:   1048444.5k
                *  errs:      1, errxfer:         0.5k, succxfer:   1048444.0k
             +curr.rate:      812kB/s, avg.rate:     7564kB/s, avg.load:  7.9%
dd_rescue: (warning): /dev/hda1 (1048444.5k): Input/output error!
dd_rescue: (info): ipos:   1048445.0k, opos:   1048445.0k, xferd:   1048445.0k
                *  errs:      2, errxfer:         1.0k, succxfer:   1048444.0k
             +curr.rate:     1057kB/s, avg.rate:     7564kB/s, avg.load:  7.9%
dd_rescue: (warning): /dev/hda1 (1048445.0k): Input/output error!
dd_rescue: (info): ipos:   1048445.5k, opos:   1048445.5k, xferd:   1048445.5k
                *  errs:      3, errxfer:         1.5k, succxfer:   1048444.0k
             +curr.rate:      994kB/s, avg.rate:     7564kB/s, avg.load:  7.9%
dd_rescue: (warning): /dev/hda1 (1048445.5k): Input/output error!
dd_rescue: (info): /dev/hda1 (1048446.0k): EOF
Summary for /dev/hda1 -> /mnt/hdb1/hda1_rescue.img:
dd_rescue: (info): ipos:   1048446.0k, opos:   1048446.0k, xferd:   1048446.0k
                   errs:      4, errxfer:         2.0k, succxfer:   1048444.0k
             +curr.rate:     1042kB/s, avg.rate:     7564kB/s, avg.load:  7.9%
knoppix@ttyp0[knoppix]$

Replace /dev/hda1 with the partition that you are recovering, and /mnt/hdb1 with the mount point where you are saving the image. As dd_rhelp scans the drive, it prints out all of its progress, including any errors it finds. When it finishes, you should have two files on your recovery drive: the image and a log from dd_rescue, in case you want to audit its progress.
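If you want a quick figure for how much data was lost, the summary line that dd_rescue prints can be picked apart with standard tools. The sketch below works on the summary line from the sample session, pasted in as a string. The format of the separate log file is not shown here, so the snippet makes no assumptions about it, and the variable names are made up for the example.

```shell
#!/bin/sh
# Summary line copied from the sample session in the text.
summary='                   errs:      4, errxfer:         2.0k, succxfer:   1048444.0k'

# Drop commas, the k suffixes, and the labels, leaving three bare numbers.
set -- $(printf '%s\n' "$summary" | tr -d ',k' | sed 's/[a-z]*://g')
errs=$1
lost_kb=$2
good_kb=$3

# awk handles the fractional kilobyte values that sh arithmetic cannot.
lost_pct=$(awk -v l="$lost_kb" -v g="$good_kb" \
    'BEGIN { printf "%.5f", 100 * l / (l + g) }')

echo "read errors: $errs"
echo "unreadable:  ${lost_kb}k"
echo "recovered:   ${good_kb}k"
echo "share lost:  ${lost_pct}%"
```

In the sample session only 2 KB out of roughly 1 GB could not be read, which is why the fsck step that follows reports a clean filesystem.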

Now, run fsck on the image to attempt to repair any filesystem errors that might have occurred [Hack #57] by typing this command:

knoppix@ttyp0[knoppix]$ sudo fsck -y /mnt/hdb1/hda1_rescue.img
fsck 1.35 (28-Feb-2004)
e2fsck 1.35 (28-Feb-2004)
/mnt/hdb1/hda1_rescue.img: clean, 12/131072 files, 187767/262111 blocks

The -y option tells fsck to answer yes to every repair prompt, so it fixes any filesystem errors it finds without stopping to ask. Mount the image with the -o loop option, and you should be able to access your files at that mount point as if it were a hard drive:

knoppix@ttyp0[knoppix]$ sudo mount -o loop /mnt/hdb1/hda1_rescue.img /mnt/hda1
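Because fsck -y rewrites the image in place, and rereading a failing drive may not be possible, one optional precaution before the fsck step is to keep an untouched copy of the image. The helper below is a minimal sketch: the name preserve_image and the .orig suffix are made up for this example, and the demonstration runs on a small scratch file rather than a real image.

```shell
#!/bin/sh
# Copy an image and confirm the copy is byte-identical to the original.
preserve_image() {
    img=$1
    cp -- "$img" "$img.orig" && cmp -s "$img" "$img.orig"
}

# Stand-in demonstration on a 64 KB scratch file:
scratch=$(mktemp -d)
dd if=/dev/zero of="$scratch/demo.img" bs=1024 count=64 2>/dev/null
if preserve_image "$scratch/demo.img"; then
    echo "pristine copy saved as $scratch/demo.img.orig"
fi

# Real use, before the fsck step (needs free space equal to the image size):
#   preserve_image /mnt/hdb1/hda1_rescue.img
#   sudo fsck -y /mnt/hdb1/hda1_rescue.img
```

If a repair pass makes things worse, the .orig file lets you start over without touching the damaged drive again.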
