Hard Disk Repair Tricks on a Two-Drive System
Craig Johnson
vince at tc.fluke.COM
Tue Mar 26 17:49:33 AEST 1991
This posting covers several tricks I came up with while trying to repair
a 20M Miniscribe hard disk by connecting it to a two-drive 3B1 with the
bad disk set up as the second drive. Most or all of these tricks can be
applied to hard disk repairs on a single-drive system if you boot from a
floppy file system and have the appropriate utilities available on it.
Remember to Set The Drive Select Jumpers
----------------------------------------
When using a two-drive system for hard disk repairs, don't forget to
change the drive select jumpers or DIP switches so that the bad drive
is selected as the second drive.
Partition Assignments
---------------------
Those of you with two-drive systems should already know this, but
the bad drive appears as the /dev entries [r]fp010 through [r]fp012.
These devices are used as follows:
rfp010 - Use with iv commands to do formats, surface checks, etc.
rfp011 - Can build tmp file system here, e.g. "mkfs /dev/rfp011"
rfp012 - Use with fsck, fsdb, and ncheck
fp010  - (not used)
fp011  - Use to mount tmp file system, e.g. "mount /dev/fp011 /mnta"
fp012  - Use to mount the broken file system and recover files
Mounting the Bad Disk to Access its Binaries
--------------------------------------------
You may find you want to execute programs still on the bad hard disk,
particularly if you are trying to repair a disk on a single-drive
system by booting up from a floppy disk. If you mount the disk and
are careful to only read from it, you should not do any additional
damage, and most of the files should still be readable unless the
disk is really in bad shape. You may want to execute something like,
PATH=/mntb/usr/bin:$PATH
to add the hard disk binary directories to your PATH.
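As a rough sketch of the same idea for more than one directory, the
function below prepends whichever binary directories actually exist
under the mount point. The name add_disk_bins and the choice of
/usr/bin and /bin are my own assumptions for illustration, and it is
written for a POSIX-style shell.

```shell
# Prepend the binary directories found under a mounted bad disk to PATH.
# The function name and the directory list are illustrative assumptions.
add_disk_bins() {
    root=$1
    for dir in "$root/usr/bin" "$root/bin"; do
        # Only add directories that are really present on the mounted disk.
        if [ -d "$dir" ]; then
            PATH=$dir:$PATH
        fi
    done
    export PATH
}

# /mntb is the mount point used in the PATH example above.
add_disk_bins /mntb
```

Directories that are missing or unreadable on the damaged disk are
skipped, so PATH is not cluttered with dead entries.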
Disk Error Logging
------------------
When accessing the second drive through shell commands (and iv in
particular) disk errors are logged to /usr/adm/unix.log. Look for bit
0x0008 in the "SDH:" value to see if errors are associated with drive 0
or drive 1. I found this information invaluable, and I know of no way
to get the same information when exercising the disk using the
diagnostic disk commands.
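The bit test can be sketched as a small shell function. This assumes
that bit 0x0008 set means drive 1 and clear means drive 0; the function
name sdh_drive and the sample values are made up for illustration, and
the arithmetic syntax needs a POSIX-style shell (run it on another
machine if your 3B1 shell lacks it).

```shell
# Report which drive an SDH value refers to by testing bit 0x0008.
# Assumption: bit set means drive 1 (second drive), clear means drive 0.
sdh_drive() {
    value=$1    # SDH value copied by hand from the log, e.g. 0x28
    if [ $(( $value & 0x0008 )) -ne 0 ]; then
        echo 1
    else
        echo 0
    fi
}

# Made-up sample values, not taken from a real log:
sdh_drive 0x28
sdh_drive 0x20
```

The value is passed in by hand because the layout of the log lines is
not shown here, so no attempt is made to parse the log itself.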
Formatting the Second Drive
---------------------------
Iv knows how to format the second drive, and even says "formatting
second drive" when doing it. However, the disk drive descriptor file
is required to have "HD2" specified on the "type" line. For example,
for the 20M Miniscribe 3425,
type HD2
name WINCHE
cylinders 612
heads 4
sectors 17
steprate 0
$
badblocktable 1
loader /usr/lib/iv/loader
$
$
0
4
504
$
$
Reformatting Just the Swap Partition
------------------------------------
If you have a disk whose swap partition has gone bad, but is otherwise
in good shape, you can choose to just reformat the swap partition. This
takes a couple of steps, because you are going to fool iv into thinking
the disk is smaller than it really is while it is formatting. Start by
obtaining the disk descriptor using "iv -d". You get a descriptor that
looks similar to the one above, but perhaps with some bad block
entries. Next, figure out how many cylinders are actually used up
through the end of the swap partition. For the example above, 504
tracks divided by 4 tracks per cylinder equals 126 cylinders. Edit the
descriptor so that the smaller number of cylinders is specified and so
that only two partitions are defined. I chose to decrement my cylinder
count by one in case there was a boundary condition bug in my (or
iv's) reasoning. This simply results in one cylinder not being
reformatted (it keeps its original format), which shouldn't be a
problem unless you customarily use up all your swap space. Using the
same disk example from above,
type HD2
name WINCHE
cylinders 125
heads 4
sectors 17
steprate 0
$
badblocktable 1
loader /usr/lib/iv/loader
$
$
0
4
$
$
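The cylinder count in the shortened descriptor can be checked with a
little shell arithmetic. The numbers come from the example descriptor;
the variable names are only illustrative, and the syntax assumes a
POSIX-style shell.

```shell
# Work out the cylinder count for the shortened descriptor.
tracks_through_swap=504   # tracks used through the end of swap (example)
heads=4                   # tracks per cylinder
safety_margin=1           # cylinders deliberately left unformatted

cylinders_used=$(( tracks_through_swap / heads ))
fake_cylinders=$(( cylinders_used - safety_margin ))

echo "cylinders through end of swap: $cylinders_used"
echo "cylinders for fake descriptor: $fake_cylinders"
```

For the example disk this gives 126 cylinders used and 125 for the
edited descriptor.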
Now when you run format, specify "iv -i /dev/rfp010 fake_desc", where
"fake_desc" is the one you just edited. After formatting, iv will
rewrite the VHB and loader tracks. If you run iv -t, you will see that
iv now thinks the disk is smaller. To complete the process and regain
use of the whole disk, you run iv -u with the original descriptor
specified (the 612 cylinder descriptor in the example). This simply
causes the correct descriptor to be rewritten to the VHB and that
redefines the disk size and partitions properly.
You can use the same method of formatting a small portion of the disk
to just reformat the VHB and loader tracks too. This might be useful
if you want to change your loader to be the verbose loader,
/usr/lib/iv/s4load.verbose.
Building a Temporary File System on /dev/rfp011
-----------------------------------------------
While the bad disk is connected as the second drive, its swap partition
is unused. You can reformat it as described above, then build a file
system on it to gain several megabytes of temporary storage. Simply run
"mkfs /dev/rfp011", then "mount /dev/fp011 /mnta" to build the file
system and mount it. I found it very useful to copy junky, hard to
read stuff from the bad partition to the temporary partition prior to
writing it out to floppies. Note that if cpio fails on a read while
archiving a file, the archive is not usable beyond the point of the
failing file. Cpio apparently commits itself by writing a file header
to the output stream before testing whether it can actually read
the file, and then isn't smart enough to write out a null file body.
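One possible way around the cpio problem is to test-read each file
first and only pass the readable names on to cpio. The filter below is
a sketch of that idea, not something from my repair session; the name
readable_only and the archive path in the usage comment are
hypothetical.

```shell
# Print only the file names (read from stdin) whose contents can be read
# end to end, so files that fail on a read never reach cpio.
readable_only() {
    while read -r name; do
        if cat "$name" > /dev/null 2>&1; then
            echo "$name"
        else
            echo "skipping unreadable: $name" >&2
        fi
    done
}

# Hypothetical usage, run from the directory being rescued:
#   find . -type f -print | readable_only | cpio -o > /mnta/rescue.cpio
```

This reads every file twice, and a marginal sector could still pass the
test read and then fail when cpio reads it, so treat it as a way to
reduce the odds rather than a guarantee.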
Surface Checks
--------------
Once you've copied all the files you want (or can get) off the bad disk
you can reformat the whole disk and run surface tests on it. The
iv command "iv -sw[l]" works great, and again any disk errors occurring
get logged to /usr/adm/unix.log. The -l (long) option causes the test
to repeat 10 times. It takes about 3 hours on a 20M disk.
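To see only the errors logged by one test run, you can note the length
of the log beforehand and print just the new lines afterwards. This is
a sketch for a POSIX-style shell; the function names log_mark and
log_since are my own, and the 3B1's stock tools may need adjustments.

```shell
# Show only the log lines added since a saved line count.
# LOG defaults to the path named above; override it for testing.
LOG=${LOG:-/usr/adm/unix.log}

log_mark() {
    # Current number of lines in the log (0 if it does not exist yet).
    if [ -f "$LOG" ]; then
        wc -l < "$LOG" | tr -d ' '
    else
        echo 0
    fi
}

log_since() {
    # Print the lines that follow line number $1.
    awk -v n="$1" 'NR > n' "$LOG"
}

# Hypothetical usage around a surface check:
#   mark=$(log_mark)
#   ... run the surface test ...
#   log_since "$mark"
```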
DRUN Patch
----------
As I mentioned in a previous posting, if you are getting disk errors
appearing in unix.log after formatting a disk and running surface
checks on it, you may need to install the DRUN rework on your system.
This applies regardless of whether you have one or two drives, and a
WD1010 or a WD2010 disk controller chip. I was getting about a dozen
disk errors each time I ran a surface check until I installed the DRUN
rework. Since then I haven't seen a single disk error appear in the
log.
I hope you find this information helpful and can find it to refer to when
you need it. I know it would have saved me some time. Also, I'd like to
see someone post a decent tutorial on the use of fsck and fsdb to repair
bad disks. I mean a disk with unreadable blocks, some of which might be
directory blocks. I've found a way that didn't work. I'd like to see
if someone knows a better way.
---
Craig V. Johnson ...!fluke!vince
John Fluke Mfg. Co. or
Everett, WA vince at tc.fluke.com
DISCLAIMER (I suppose it's necessary): Muck with your hard disks at your
own risk. Don't believe anything I said, and legally we will both be
happy. Nothing stated in this posting has anything to do with John Fluke
Mfg. Co., so leave them out of it.