Resuscitating A Hard Drive For Fun And Profit
This is the story of a failing 40 GB ATA hard disk with Linux ext3 filesystems. Some of the facts may be applicable to other incidents; however, this description is in part specific to the Linux ext3 file system and the utilities working on partitions of that format. The partition table was still readable, so no recovery had to be done there, but other documents detailing faulty master boot records (MBR) and partition tables are likely to be available. As a rule of thumb, with an appropriate offset dd_rescue should work fine in that case.
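As a very rough sketch of that (the device name and offsets here are assumptions: the MBR is the first 512 octets, and on the classic DOS layout the first partition starts at sector 63, i.e. octet 63*512 = 32256; -s is dd_rescue's start position, -m the maximal amount to transfer):

dd_rescue -s 0 -m 512 /dev/hdc mbr.img
dd_rescue -s 32256 /dev/hdc hdc1.img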
After making a few quick changes to the index.cgi for the radio broadcast recording archive, I wanted to test whether the ogg->mp3 transcoder worked better now. Strangely enough, the web browser hung. I suspected the local gateway (this was at my parents' house) and tried with lynx from somewhere else: same result. Soon enough, ssh sessions also slowed down and a login was almost impossible, only telling me that /bin/bash could not be read when it finally succeeded. I found a screen with a root shell, and after confirming that the boot/root disk was showing read errors, I constructed a tar command line to back up the system. I wrote a mail to my girlfriend, the admin of the server, and went to play Scrabble.
end_request: I/O error, dev 16:06 (hdc), sector 30052776
hdc: read_intr: status=0x59 { DriveReady SeekComplete DataRequest Error }
hdc: read_intr: error=0x40 { UncorrectableError }, LBAsect=59774729, sector=30052778
A few hours later I got a phone call from her, telling me that there was something wrong with the server and that it was being rebooted. In a panic I answered that that was a very bad idea, but it was already too late: lilo still worked and the kernel would boot, but it could not find init any more. There was nothing left to do but hope that the backup job had copied all the important data ...
No init found. Try passing init= option to kernel.
The next day, after I had managed to boot from a Debian GNU/Linux Sarge netinstall boot CD, it became clear that it had been a bad idea to try and back up almost the entire system: only a few hundred ko had been saved, and those were from /var/lib/apt and /var/lib/dpkg, where Debian stores package metadata, alternatives symlinks and (de)installation scripts. We were fresh out of a backup.
$ cat /backupdir/tar.in
tar: Removing leading `/' from member names
/
/lost+found/
/root/
/var/
/var/lost+found/
/var/lib/
/var/lib/apt/
tar: /var/lib/apt/lists: Cannot stat: Input/output error
...
It was a long night, with G. setting up a brand new system and me trying to get our data off the unresponsive disk. My googling was relatively futile and dd conv=noerror did not do me a lot of good; fortunately G. found a link to dd_rescue, a program which can decrease its read block size in case of error and continue reading. This particular incarnation by Kurt Garloff writes sparse files, which also let me see the progress of the operation; here only 1972 ko of the 2 Go have been recovered (no, the size of the file is not 2 Go, but calling dd_rescue with -r to start from the end and read backwards would create a sparse file of that size):
1972 -rw-r--r-- 1 root root 2043904 Nov 3 00:08 hdc5
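For the record, the image above came from plain invocations along these lines (a sketch, not a verbatim transcript; -r is the backwards mode just mentioned):

dd_rescue /dev/hdc5 hdc5
dd_rescue -r /dev/hdc5 hdc5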
It was unnerving to see dd_rescue wait an eternity for every bad sector. It would have taken days to restore the whole 40 Go that way, so in the end I stopped the dd_rescue process whenever it encountered an error (which it can do by itself with -e 1), and after two hours I had as many gigaoctets.
dd_rescue: (info): ipos: 0.0k, opos: 0.0k, xferd: 0.0k
           errs: 0, errxfer: 0.0k, succxfer: 0.0k
           +curr.rate: 0kB/s, avg.rate: 0kB/s, avg.load: 0.1%
dd_rescue: (warning): /dev/hdc6 (0.0k): Input/output error!
dd_rescue: (fatal): maxerr reached!
Summary for /dev/hdc6 -> hdc6:
dd_rescue: (info): ipos: 0.0k, opos: 0.0k, xferd: 0.0k
           errs: 1, errxfer: 0.0k, succxfer: 0.0k
           +curr.rate: 0kB/s, avg.rate: 0kB/s, avg.load: 0.1%
The only solution I could conceive of was to wrap dd_rescue in a perl script that would skip a configurable amount of data when an error was encountered. The script should try to read backwards from that position in order to determine the end position of the bad area. After some swearing I saved the script under the name of ddrs. I also tried reducing the block size and drive readahead in order to avoid running into bad sectors; this might or might not have helped, I could not see much of a difference.

hdparm -m 0 -a 0 -A 0 -d 0 /dev/hdc
/usr/local/bin/ddrs /dev/partition [startpos] [blocksize in ko]

ddrs takes the last component of the partition name as the result filename; if this does not suit you, make a symlink.
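The script itself is essentially a loop around dd_rescue. As a rough illustration, a minimal sh re-creation could look like the following (this is not the original perl ddrs; it assumes dd_rescue accepts k-suffixed start positions, exits non-zero when aborting on an error, and that scraping the final ipos: figure from its summary works, which is approximate at best):

#!/bin/sh
# ddrs-like sketch: stop dd_rescue at the first error (-e 1), then
# restart a configurable distance past the reported failure position.
dev=$1; pos=${2:-0}; skip=${3:-1024}     # start position and skip size in ko
out=$(basename "$dev")                   # result file named after the partition
while :; do
    dd_rescue -s "${pos}k" -S "${pos}k" -e 1 "$dev" "$out" 2>ddrs.log \
        && break                         # clean exit: end of device reached
    # scrape the final input position ("ipos: NNNN.Nk") from the summary
    ipos=$(sed -n 's/.*ipos:[ ]*\([0-9]*\).*/\1/p' ddrs.log | tail -1)
    [ -n "$ipos" ] || break              # nothing to parse, give up
    echo "bad area near ${ipos}k, skipping ${skip} ko"
    pos=$((ipos + skip))
done

A real version would also read backwards from the failure position to find the end of the bad area, as described above.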
With a skip size of 1 Mo (1024 ko), the process still took over a day for 40 Go, but more than 90 % of the "dead" disk could be read. A pattern emerged: generally, the most frequently used blocks were the least likely to be recoverable, while blocks rarely written to could often be read. To find that out, of course, I still had to access the damaged filesystem.
Disk Content Analysis
After dd_rescuing and ddrs'ing, the filesystems still would not pass an e2fsck (I tried, read-only of course). I had expected this, but it was unnerving that even giving the alternate superblocks 8193, 16385, 32769 etc. did not work. I had already moved on to my good friend debugfs when I discovered that recent versions of mke2fs set the sparse superblock flag by default (-s 1), which moves the backup superblocks. mke2fs -n /dev/device (-n is the pretend option; it does not really overwrite the file system) told me where to find them:
# mke2fs -n /dev/hdc5
mke2fs 1.27 (8-Mar-2002)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
128256 inodes, 256024 blocks
12801 blocks (5.00%) reserved for the super user
First data block=0
8 block groups
32768 blocks per group, 32768 fragments per group
16032 inodes per group
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376
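With those locations known, a repair attempt can be pointed at a backup superblock on a copy of the image (e2fsck's -b selects the superblock, -B the block size; a sketch using the names from above):

e2fsck -b 32768 -B 4096 hdc5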
Disclaimer: an e2fsck call with an appropriate backup superblock (see above) might deliver all the files you will ever want, but it will modify the image, so I would recommend making a copy first. My approach was slightly different: I tried finding directories in all inodes of the disk (tedious), ran strings (or rather od) over the disk to find ASCII data, and searched for gzip/bzip2 headers at supposed block boundaries (the latter proved hardly successful, but ymmv).
# debugfs -s 163840 -b 4096 -c hdc6
debugfs 1.27 (8-Mar-2002)
163840: Bad superblock number - debugfs
Segmentation fault (core dumped)
# debugfs -c hdc6
debugfs 1.27 (8-Mar-2002)
hdc6: Bad magic number in super-block while opening filesystem
debugfs:  cd /
cd: Filesystem not open
debugfs:  open -s 163840 -b 4096 -c hdc6
hdc6: catastrophic mode - not reading inode or group bitmaps
debugfs:
debugfs:  cd /
debugfs:  ls -l
      2  40755 (2)      0      0    4096 28-Jun-2004 21:59 .
      2  40755 (2)      0      0    4096 28-Jun-2004 21:59 ..
     11  40755 (2)      0      0   16384 20-Dec-2001 10:43 lost+found
  32705      0 (2)      0      0       0  1-Jan-1970 01:00 aoe
...
debugfs:  cd aoe
aoe: Ext2 inode is not a directory
After working around what looked like a Debian Woody bug which kept me from passing a backup superblock on the command line (see above), the debugfs-internal open command with the same syntax worked. Unfortunately many of the directories were unreadable. I brute-force tried all the inodes for directory content; someone might want to suggest a better solution. The number of inodes was taken from debugfs' stats command, but mke2fs -n, dumpe2fs, e2fsck and a number of other tools would lead to the same result.
perl -e '$i=0;print "open -s 4096000 -b 4096 -c hdc6\n";for ($i=0;$i<3172288;$i++){print "ls <$i>\n"}' \
  | DEBUGFS_PAGER=cat /sbin/debugfs -c hdc6 2>&1 \
  | grep -vE "^debugfs: \$|Ext2 inode is not a directory" > hdc6.ls
Now I had lots of directories to walk through. With a list generated from a command like the one above, the lost directories could be accessed and rdumped to the filesystem.
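Such a session might look like this (the inode number and target directory are made up for illustration; the open arguments are the ones that worked above):

# debugfs -c hdc6
debugfs:  open -s 163840 -b 4096 -c hdc6
debugfs:  cd <49153>
debugfs:  cd ..
debugfs:  ls
debugfs:  rdump somedir /recovered/hdc6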
What remained were the files whose directories were unreadable (and the unreadable files themselves, but except for repeated dd_rescue or professional data restoration there is not much hope for those).
strings -8 -t d partition > partition.od8
or
od.sarge -s8 -Ad partition > partition.od8
generated a string index which was about two orders of magnitude smaller than the original disk, more for the root fs, less for data and home filesystems. 200 Mo files were greppable enough compared with partitions of many gigaoctets. Caveat: old versions of od use 32-bit signed long ints for the offset, thus wrapping to -2^31 after 2 Go; strings is therefore the more stable solution.
# time grep \\.profile partition.ls partition.od8
partition.ls: 196230  (20) .bash_logout    196231  (16) .tclshrc    196232  (16) .profile
partition.ls: 196230  (20) .bash_logout    196231  (16) .tclshrc    196232  (16) .profile
partition.od8:75089812 ${HOME:-.}/.profile
partition.od8:80743184 .profile
...
partition.od8:25947525617 .profile

real    0m7.072s
user    0m0.620s
sys     0m1.670s
If it is not obvious, the data from the strings file can be used to copy blocks, as at least for small files the probability of entire files being saved in one piece is high.

dd bs=4096 skip=`perl -e 'print int(75089812/4096)'` count=10 if=partition of=outfile

In this case this yields an entirely unusable part of a binary (${HOME:-.}/.profile was part of an environment string, so ...), but it's the spirit that counts. A second ddrs run with a smaller block size did not yield much:

for i in hdc2 hdc3 hdc6; do ddrs /dev/$i 0 100 | tee $i.bad; done
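When carving many offsets this way, the block arithmetic above is worth a small helper; a hypothetical sh function (the 4096 matches the block size reported by mke2fs -n earlier):

carve () {  # usage: carve image byte-offset nblocks outfile
    dd if="$1" of="$4" bs=4096 skip=$(( $2 / 4096 )) count="$3"
}
carve partition 75089812 10 outfile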
Conclusion
In the end, I got almost all I wanted. The mysql directories were lost and unrecoverable, as disk thrashing seems to have ruined that part of the disk entirely, maybe in the last powered-on hours of its life, but the backups on the same disk could fortunately be read without a hitch. My script could only be found with the strings method; interestingly enough it was saved twice in two successive blocks, and the copies were identical. The mailing list config files were easily identified by some of the known recipients (and the db format mailman uses), and the mailing list archives were accessible with debugfs (cd <inode>).
$ grep eptember partition.ls
130056  (24) 2002-June.txt    829057  (24) 2002-September
...
# debugfs hdc3
debugfs 1.27 (8-Mar-2002)
hdc3: Bad magic number in super-block while opening filesystem
debugfs:  ls
ls: Filesystem not open
debugfs:  open -b 4096 -s 163840 -c hdc3
hdc3: catastrophic mode - not reading inode or group bitmaps
debugfs:  cd <829057>
debugfs:  cd ..
debugfs:  cd ..
debugfs:  ls
178861  (12) .    991661  (12) ..    325136  (20) mailman.mbox    585487  (16) mailman
debugfs:  rdump mailman .
rdump: EXT2 directory corrupted while dumping ./mailman/2002-November
rdump: EXT2 directory corrupted while dumping ./mailman/attachments/20040921
rdump: EXT2 directory corrupted while dumping ./mailman/2004-September
debugfs:
What I Learned Today
- Always make a backup on a different disk, as a backup on a failing disk is not worth a lot.
- Pesky peecee BIOSes do not boot from disabled drives, so you should activate the CD-ROM drive you want to boot your rescue system from.
- If you get a chance to copy data from a dying disk, save the important stuff (cgi-scripts, mysql-dirs, sql dumps, source trees, ...) first, as there is a lot more garbage than saveworthy data.
- On a badly damaged disk, dd_rescue itself is too slow, as it tries to read each bad sector again and again, each try blocking for minutes.
- Backup superblocks are not located at 8193 any more, despite a range of e2* tools telling you so, but at 32768, 98304, 163840, 229376, ...
- Woody debugfs does not accept superblock and block size arguments on the command line; use open -s superblock -b blocksize -c filesystemcopy inside debugfs instead.
- strings -td helps create an index that is far easier to grep than the original disk image.
- debugfs is a nifty tool, albeit less so since the unerase functionality has been defeated with ext3.