
Resuscitating A Hard Drive For Fun And Profit

This is the story of a failing 40 GB ATA hard disk with Linux ext3 filesystems. Some of the facts may apply to other incidents, but in part this description is specific to the Linux ext3 file system and the utilities that work on partitions of that format. The partition table was still readable, so no recovery had to be done there; other documents detailing faulty master boot records (MBR) and partition tables are likely to be available. As a rule of thumb, dd_rescue called with an appropriate offset should work fine in that case.
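To make the offset idea concrete, here is a minimal sketch for the lost-partition-table case. All specifics below are assumptions for illustration: a first partition starting at the classic sector 63 of a 512-byte-sector disk, and a made-up device name.

```shell
# Assumed: first partition begins at sector 63, 512-byte sectors.
START_SECTOR=63
OFFSET=$(( START_SECTOR * 512 ))   # byte offset of the partition start
echo "$OFFSET"                     # 32256
# dd_rescue -s "$OFFSET" /dev/hdc hdc1.img   # -s: input start position
```

The actual dd_rescue call is commented out since it needs a real (dying) device; the point is only that a known start offset substitutes for a readable partition table.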

After making a few quick changes to the index.cgi for the radio broadcast recording archive, I wanted to test whether the ogg->mp3 transcoder worked better now. Strangely enough, the web browser hung. I suspected the local gateway (this was at the house of my parents) and tried with lynx from somewhere else: same result. Soon enough, ssh sessions also slowed down and a login was almost impossible; when one finally succeeded, it only told me that /bin/bash could not be read. I found a screen with a root shell, and after confirming that the boot/root disk was showing read errors, I constructed a tar command line to back up the system. I wrote a mail to my girlfriend, the admin of the server, and went to play scrabble.

end_request: I/O error, dev 16:06 (hdc), sector 30052776
hdc: read_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
hdc: read_intr: error=0x40 { UncorrectableError }, LBAsect=59774729, sector=30052778

A few hours later I got a phone call from her, telling me that there was something wrong with the server and that it was being rebooted. In panic I answered that that was a very bad idea, but it was already too late: lilo still worked and the kernel would boot, but it could not find init any more. There was nothing left to do but hope that the backup job had copied all the important data ...

No init found.  Try passing init= option to kernel.

The next day, after I had managed to boot from a Debian GNU/Linux Sarge netinstall boot CD, it became clear that it had been a bad idea to try to back up almost the entire system: only a few hundred ko [1] had been saved, and those were from /var/lib/apt and /var/lib/dpkg, where Debian stores package metadata, alternatives symlinks and (de)installation scripts. We were fresh out of a backup.

$ cat /backupdir/tar.in
tar: Removing leading `/' from member names
tar: /var/lib/apt/lists: Cannot stat: Input/output error

It was a long night, with G. setting up a brand new system and me trying to get our data off the unresponsive disk. My googling was relatively futile and dd conv=noerror did not do me a lot of good; fortunately G. found a link to dd_rescue, a program which can decrease its read block size on errors and continue reading. This particular incarnation, by Kurt Garloff, writes sparse files, which also let me see the progress of the operation; here only 1972 ko of the 2 Go have been recovered (no, the size of the file is not 2 Go: calling dd_rescue with -r to start from the end and read backwards would create a sparse file of that size):

1972 -rw-r--r--    1 root     root      2043904 Nov  3 00:08 hdc5
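That listing shows both the allocated size (1972 ko, first column of ls -s) and the apparent size (2043904 bytes); with sparse files the two can differ wildly. A throwaway demonstration (path made up) of the same effect, writing a single byte at a large offset:

```shell
# Write one byte at offset 2043903: the apparent size becomes 2043904
# bytes, while almost nothing is actually allocated on disk.
dd if=/dev/zero of=/tmp/sparse.demo bs=1 count=1 seek=2043903 2>/dev/null
ls -ls /tmp/sparse.demo    # first column (allocated blocks) stays tiny
wc -c < /tmp/sparse.demo   # apparent size: 2043904
```

This is why a partially rescued image can "show" the full extent reached so far without eating the corresponding disk space.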

It was unnerving to see dd_rescue wait an eternity for every bad sector. It would have taken days to restore the whole 40 Go that way, so in the end I stopped the dd_rescue process whenever it encountered an error (which it can do by itself with -e 1), and after two hours I had as many gigaoctets.

dd_rescue: (info): ipos:         0.0k, opos:         0.0k, xferd:         0.0k
                *  errs:      0, errxfer:         0.0k, succxfer:         0.0k
             +curr.rate:        0kB/s, avg.rate:        0kB/s, avg.load:  0.1%
dd_rescue: (warning): /dev/hdc6 (0.0k): Input/output error!
dd_rescue: (fatal): maxerr reached!
Summary for /dev/hdc6 -> hdc6:
dd_rescue: (info): ipos:         0.0k, opos:         0.0k, xferd:         0.0k
                   errs:      1, errxfer:         0.0k, succxfer:         0.0k
             +curr.rate:        0kB/s, avg.rate:        0kB/s, avg.load:  0.1%

The only solution I could conceive of was to wrap dd_rescue in a perl script that would skip a configurable amount of data whenever an error was encountered. The script should try to read backwards from that position in order to determine the end position of the bad area. After some swearing I saved the script under the name of ddrs. I also tried reducing the block size and the drive readahead in order to avoid running into bad sectors; this might or might not have helped, I could not see much of a difference:

hdparm -m 0 -a 0 -A 0 -d 0 /dev/hdc

/usr/local/bin/ddrs /dev/partition [startpos] [blocksize in ko]

ddrs takes the last component of the partition name as the result filename;
if this does not suit you, make a symlink
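The perl script itself is not reproduced here, but its logic can be sketched in shell. Everything below is an assumption for illustration, not the author's original: the variable names, the defaults, and in particular the trick of taking the resume position from the sparse output file's apparent size.

```shell
#!/bin/sh
# Hypothetical sketch of the ddrs idea: read with dd_rescue until the
# first error (-e 1), then resume a configurable number of ko further on.
DEV=${1:-/dev/hdc6}     # partition to rescue (made-up default)
POS=${2:-0}             # start position in ko
SKIP=${3:-1024}         # distance to skip past an error, in ko
OUT=$(basename "$DEV")  # result file named after the partition

# Position to resume at: where the last run got to, plus the skip distance.
next_pos() {            # usage: next_pos REACHED_KO SKIP_KO
    echo $(( $1 + $2 ))
}

# The main loop needs the real device and dd_rescue, so it is only sketched:
# while [ "$POS" -lt "$END" ]; do
#     dd_rescue -s ${POS}k -S ${POS}k -e 1 "$DEV" "$OUT"
#     REACHED=$(( $(wc -c < "$OUT") / 1024 ))  # sparse apparent size
#     POS=$(next_pos "$REACHED" "$SKIP")
# done

next_pos 2000 1024      # a run that died at 2000 ko resumes at 3024
```

The backwards probing of the bad area's end that the text mentions is omitted here; it would be a second, reverse (-r) dd_rescue pass per error.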

With a skip size of 1 Mo (1024 ko), the process still took over a day for 40 Go, but more than 90 % of the "dead" disk could be read. A pattern emerged: the more frequently used blocks were generally less likely to be recoverable, while rarely written blocks could often be read. To find that out, of course, I still had to access the damaged filesystem.

Disk Content Analysis

After dd_rescuing and ddrs'ing, the filesystems still would not pass an e2fsck (I tried, read-only of course). I had expected this, but it was unnerving that even giving the alternate superblocks 8193, 16385, 32769 etc. did not work. I had already moved on to my good friend debugfs when I discovered that recent versions of mke2fs default to -s 1, the sparse-super flag. mke2fs -n /dev/device (-n is the pretend option, it does not really overwrite the file system) told me where to find the backup superblocks.

# mke2fs -n /dev/hdc5
mke2fs 1.27 (8-Mar-2002)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
128256 inodes, 256024 blocks
12801 blocks (5.00%) reserved for the super user
First data block=0
8 block groups
32768 blocks per group, 32768 fragments per group
16032 inodes per group
Superblock backups stored on blocks: 
        32768, 98304, 163840, 229376
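Those four numbers are not arbitrary. With the sparse-super flag, backups are kept only in block group 0 and in groups whose number is a power of 3, 5 or 7; with 4096-byte blocks there are 32768 blocks per group, so on this 8-group filesystem the qualifying groups 1, 3, 5 and 7 give exactly the blocks mke2fs lists:

```shell
# Backup superblock locations for a sparse_super ext2/ext3 filesystem
# with 4k blocks (32768 blocks per group) and 8 block groups.
# Group 0 holds the primary superblock; backups sit in groups 1, 3, 5, 7.
BPG=32768
for g in 1 3 5 7; do
    echo $(( g * BPG ))
done
# prints 32768 98304 163840 229376
```

On a larger filesystem the series simply continues with groups 9, 25, 27, 49 and so on.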

Disclaimer: an e2fsck call with an appropriate backup superblock (see above) might deliver all the files you will ever want, but it will modify the image, so I would recommend making a copy first. My approach was slightly different: I tried finding directories in all inodes of the disk (tedious), ran strings (or rather od) over the disk to find ASCII data, and searched for gzip/bzip2 headers at supposed block boundaries (the latter proved hardly successful, but ymmv).
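The header search works because a gzip stream always starts with the magic bytes 1f 8b, so the test reduces to checking those two bytes at each 4096-byte boundary of the image. A tiny self-contained illustration (throwaway file, not the real partition):

```shell
# Any gzip stream starts with the two magic bytes 1f 8b.
printf 'some lost file' | gzip -c > /tmp/gz.demo
head -c 2 /tmp/gz.demo | od -An -tx1   # prints " 1f 8b"
# On the image, the same two bytes would be tested at offsets
# 0, 4096, 8192, ... (e.g. via dd bs=4096 skip=N count=1).
```

bzip2 works analogously with its "BZh" signature.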

# debugfs -s 163840 -b 4096 -c hdc6
debugfs 1.27 (8-Mar-2002)
163840: Bad superblock number - debugfs
Segmentation fault (core dumped)
# debugfs  -c hdc6
debugfs 1.27 (8-Mar-2002)
hdc6: Bad magic number in super-block while opening filesystem
debugfs:  cd /
cd: Filesystem not open
debugfs:  open -s 163840 -b 4096 -c hdc6
hdc6: catastrophic mode - not reading inode or group bitmaps
debugfs:  cd /
debugfs:  ls -l
      2   40755 (2)      0      0    4096 28-Jun-2004 21:59 .
      2   40755 (2)      0      0    4096 28-Jun-2004 21:59 ..
     11   40755 (2)      0      0   16384 20-Dec-2001 10:43 lost+found
  32705       0 (2)      0      0       0  1-Jan-1970 01:00 aoe
debugfs:  cd aoe   
aoe: Ext2 inode is not a directory 

After working around what looked like a Debian Woody bug which kept me from passing a backup superblock on the command line (see above), the debugfs-internal open command with the same syntax worked. Unfortunately many of the directories were unreadable. I brute-forced all the inodes looking for directory content; someone might want to suggest a better solution. The number of inodes was taken from debugfs' stats command, but mke2fs -n, dumpe2fs, e2fsck and a number of other tools would lead to the same result.

perl -e '$i=0;print "open -s 4096000 -b 4096 -c hdc6\n";for ($i=0;$i<
    3172288 ;$i++){print "ls <$i>\n"}' \
 | DEBUGFS_PAGER=cat /sbin/debugfs -c hdc6 2>&1 \
 | grep -vE "^debugfs:  \$|Ext2 inode is not a directory"  > hdc6.ls

Now I had lots of directories to walk through. With a list generated from a command like the one above, the lost directories could be accessed and rdumped to the filesystem. What remained were the files whose directories were unreadable (and the unreadable files themselves, but short of repeated dd_rescue runs or professional data restoration there is not much hope for those). strings -8 -t d partition > partition.od8 or od.sarge -s8 -Ad partition > partition.od8 generated a string index which was about two orders of magnitude smaller than the original disk, more for the root fs, less for data and home filesystems. 200 Mo files were greppable enough compared with partitions of many gigaoctets. Caveat: old versions of od use 32 bit signed long ints, thus wrapping to -2^31 after 2 Go; strings is therefore the more stable solution.

# time grep \\.profile partition.ls partition.od8
partition.ls: 196230  (20) .bash_logout    196231  (16) .tclshrc    196232  (16) .profile
partition.ls: 196230  (20) .bash_logout    196231  (16) .tclshrc    196232  (16) .profile
partition.od8:75089812 ${HOME:-.}/.profile
partition.od8:80743184 .profile
partition.od8:25947525617 .profile

real    0m7.072s
user    0m0.620s
sys     0m1.670s

If it is not obvious: the offsets from the strings file can be used to copy blocks, since at least for small files the probability of an entire file surviving in one piece is high.

dd bs=4096 skip=`perl -e 'print int(75089812/4096)'` count=10 if=partition of=outfile

In this case that yields an entirely unusable part of a binary (${HOME:-.}/.profile was part of an environment string, so ...), but it's the spirit that counts. A second ddrs run with a smaller block size did not yield much:

for i in hdc2 hdc3 hdc6; do ddrs /dev/$i 0 100 | tee $i.bad; done
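The perl snippet in the dd command above just converts a byte offset from the strings index into a block number; spelled out with plain shell arithmetic, using the same numbers:

```shell
OFFSET=75089812   # byte offset reported by strings -t d
BS=4096           # filesystem block size
BLOCK=$(( OFFSET / BS ))
echo "$BLOCK"     # 18332 -- the block to pass to dd's skip=
# dd bs=4096 skip=18332 count=10 if=partition of=outfile
```

Rounding down to the block boundary matters: ext2/ext3 file data always starts at the beginning of a block, so the string's containing block is where a small file would begin.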


In the end, I got almost all I wanted. The mysql directories were lost and unrecoverable, as disk thrashing seems to have ruined that part of the disk entirely, perhaps during its last powered-on hours, but the backups on the same disk could fortunately be read without a hitch. My script could only be found with the strings method; interestingly enough it was saved twice, in two successive blocks, and the copies were identical. The mailing list config files were easily identified by some of the known recipients (and the db format mailman uses), and the mailing list archives were accessible with debugfs (cd <inode>).

$ grep eptember partition.ls
 130056  (24) 2002-June.txt    829057  (24) 2002-September
# debugfs hdc3
debugfs 1.27 (8-Mar-2002)
hdc3: Bad magic number in super-block while opening filesystem
debugfs:  ls
ls: Filesystem not open
debugfs:  open -b 4096 -s 163840 -c hdc3
hdc3: catastrophic mode - not reading inode or group bitmaps
debugfs:  cd <829057>
debugfs:  cd ..
debugfs:  cd ..
debugfs:  ls
 178861  (12) .    991661  (12) ..
 325136  (20) mailman.mbox    585487  (16) mailman
debugfs:  rdump mailman .
rdump: EXT2 directory corrupted while dumping ./mailman/2002-November
rdump: EXT2 directory corrupted while dumping ./mailman/attachments/20040921
rdump: EXT2 directory corrupted while dumping ./mailman/2004-September

What I Learned Today

  1. Always make a backup on a different disk, as a backup on a failing disk is not worth a lot.
  2. Pesky peecee BIOSes do not boot from disabled drives, so activate the CD-ROM drive you want to boot your rescue system from.
  3. If you get a chance to copy data from a dying disk, save the important stuff (cgi-scripts, mysql-dirs, sql dumps, source trees, ...) first, as there is a lot more garbage than saveworthy data.
  4. On a badly damaged disk, dd_rescue itself is too slow, as it tries to read each bad sector again and again, each try blocking for minutes.
  5. Backup superblocks are not located at 8193 any more, despite a range of e2* tools telling you so, but at 32768, 98304, 163840, 229376, ...
  6. Woody debugfs does not accept superblock and block size arguments on the command line, therefore use the internal command: open -s superblock -b blocksize -c filesystemcopy.
  7. strings -td helps create an index that is far easier to grep than the original disk image.
  8. debugfs is a nifty tool, albeit less so since the unerase functionality has been defeated with ext3.

  1. kilooctet, as in 1024 octets, better known as bytes, or KiB in this case