Hard Disk Repair Tricks on a Two-Drive System

Tue Mar 26 17:49:33 AEST 1991

This posting covers several tricks I came up with while trying to repair
a 20M Miniscribe hard disk by connecting it to a two-drive 3B1 with the
bad disk set up as the second drive.  Most or all of these tricks can be
applied to hard disk repairs on a single-drive system if you boot from a
floppy file system and have the appropriate utilities available on it.

Remember to Set the Drive Select Jumpers
----------------------------------------
   When using a two-drive system for hard disk repairs, don't forget to
   change the drive select jumpers or DIP switches so that the bad drive
   is selected as the second drive.

Partition Assignments
---------------------
   Those of you with two-drive systems should already know this, but
   the bad drive appears as the /dev entries [r]fp010 through
   [r]fp012.  These devices are used as follows:

	rfp010	- Use with iv commands to do formats, surface checks, etc.
	rfp011	- Can build a tmp file system here, e.g. "mkfs /dev/rfp011"
	rfp012	- Use with fsck, fsdb, and ncheck
	fp010	- (not used)
	fp011	- Use to mount the tmp file system, e.g. "mount /dev/fp011 /mnta"
	fp012	- Use to mount the broken file system and recover files

Mounting the Bad Disk to Access its Binaries
--------------------------------------------
   You may find you want to execute programs still on the bad hard disk,
   particularly if you are trying to repair a disk on a single-drive
   system by booting up from a floppy disk.  If you mount the disk and
   are careful to only read from it, you should not do any additional
   damage, and most of the files should still be readable unless the
   disk is really in bad shape.  Assuming the bad disk's file system is
   mounted on /mntb, you may want to execute something like,

	PATH=/mntb/usr/bin:$PATH

   to add the hard disk binary directories to your PATH.

Disk Error Logging
------------------
   When accessing the second drive through shell commands (and iv in
   particular) disk errors are logged to /usr/adm/unix.log.  Look for bit
   0x0008 in the "SDH:" value to see if errors are associated with drive 0
   or drive 1.  I found this information invaluable, and I know of no way
   to get the same information when exercising the disk using the
   diagnostic disk commands.
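
   The bit test described above can be sketched as a small shell
   function.  This is only an illustration: the name sdh_drive is made up,
   the posting does not show the exact layout of a unix.log line (so the
   SDH value is passed in by hand as a hex argument), and the sketch
   assumes that a set 0x08 bit means drive 1 and a clear bit means drive 0.
   It also uses $(( )) arithmetic from a modern POSIX shell, which an older
   Bourne shell may not support.

```shell
#!/bin/sh
# Hypothetical helper: report which drive an SDH value refers to.
#   $1 = SDH value in hex, copied by hand from the log (e.g. 0x28)
# Assumption: bit 0x08 set means drive 1, bit clear means drive 0.
sdh_drive() {
    if [ $(( $1 & 0x08 )) -ne 0 ]; then
        echo 1
    else
        echo 0
    fi
}

sdh_drive 0x28    # bit 0x08 is set
sdh_drive 0x20    # bit 0x08 is clear
```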

Formatting the Second Drive
---------------------------
   Iv knows how to format the second drive, and even says "formatting
   second drive" when doing it.  However, the disk drive descriptor file
   is required to have "HD2" specified on the "type" line.  For example,
   for the 20M Miniscribe 3425,

	type            HD2
	name            WINCHE
	cylinders       612
	heads           4
	sectors         17
	steprate        0
	$
	badblocktable   1
	loader          /usr/lib/iv/loader
	$
	$
	0
	4
	504
	$
	$

Reformatting Just the Swap Partition
------------------------------------
   If you have a disk whose swap partition has gone bad but is otherwise
   in good shape, you can choose to reformat just the swap partition.  This
   takes a couple of steps, because you are going to fool iv into thinking
   the disk is smaller than it really is while it is formatting.  Start by
   obtaining the disk descriptor using "iv -d".  You get a descriptor that
   looks similar to the one above, but perhaps with some bad block
   entries.  Next, figure out how many cylinders are actually used up
   through the end of the swap partition.  For the example above, 504
   tracks divided by 4 tracks per cylinder equals 126 cylinders.  Edit the
   descriptor so that the smaller number of cylinders is specified and so
   that only two partitions are defined.  I chose to decrement my cylinder
   count by one just in case there was a boundary condition bug in my (or
   iv's) reasoning.  This simply results in one cylinder not being
   reformatted (it keeps its original format), which shouldn't be a
   problem unless you customarily use up all your swap space.  Using the
   same disk example from above,

	type            HD2
	name            WINCHE
	cylinders       125
	heads           4
	sectors         17
	steprate        0
	$
	badblocktable   1
	loader          /usr/lib/iv/loader
	$
	$
	0
	4
	$
	$
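
   The cylinder arithmetic behind the shortened descriptor can be sketched
   as follows.  The function name swap_cylinders is made up for
   illustration, and the $(( )) arithmetic assumes a modern POSIX shell
   rather than the shell on the machine under repair.  The inputs are the
   504 and the heads value from the original descriptor.

```shell
#!/bin/sh
# Hypothetical helper: cylinder count for the shortened descriptor.
#   $1 = track count through the end of the swap partition (504 above)
#   $2 = heads, i.e. tracks per cylinder (4 above)
swap_cylinders() {
    end_track=$1
    heads=$2
    # Cylinders used up through the end of the swap partition.
    full=$(( end_track / heads ))
    # One less, as a guard against a boundary condition bug.
    echo $(( full - 1 ))
}

swap_cylinders 504 4    # prints 125
```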

   Now when you run format, specify "iv -i /dev/rfp010 fake_desc", where
   "fake_desc" is the one you just edited.  After formatting, iv will
   rewrite the VHB and loader tracks.  If you run iv -t, you will see that
   iv now thinks the disk is smaller.  To complete the process and regain
   use of the whole disk, you run iv -u with the original descriptor
   specified (the 612 cylinder descriptor in the example).  This simply
   causes the correct descriptor to be rewritten to the VHB and that
   redefines the disk size and partitions properly.

   You can use the same method of formatting a small portion of the disk
   to reformat just the VHB and loader tracks too.  This might be useful
   if you want to change your loader to the verbose loader,
   /usr/lib/iv/s4load.verbose.

Building a Temporary File System on /dev/rfp011
-----------------------------------------------
   While the bad disk is connected as the second drive, its swap partition
   is unused.  You can reformat it as described above, then build a file
   system on it to gain several Meg. of temporary storage.  Simply run
   "mkfs /dev/rfp011", then "mount /dev/fp011 /mnta" to build the file
   system and mount it.  I found it very useful to copy junky, hard to
   read stuff from the bad partition to the temporary partition prior to
   writing it out to floppies.  Note that if cpio fails on a read while
   archiving a file, the archive is not usable beyond the point of the
   failing file.  Cpio apparently commits itself by writing a file header
   to the output stream before testing whether it can actually read the
   file, and then isn't smart enough to write out a null file body.
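
   One way around the cpio problem, sketched here rather than taken from
   the repair itself, is to read each file once before handing its name to
   cpio, so that files failing on a read never reach the archive.  The
   function name readable_files is made up, and the sketch assumes a shell
   and find that support the constructs used.  A file that reads cleanly
   once can still fail on the second read, so this only reduces the risk.

```shell
#!/bin/sh
# Hypothetical helper: print the names of regular files under
# directory $1 that can be read all the way through.
readable_files() {
    find "$1" -type f -print | while IFS= read -r f; do
        if cat "$f" > /dev/null 2>&1; then
            # The whole file was read without error; pass its name on.
            printf '%s\n' "$f"
        else
            # Report the failure on stderr and leave the file out.
            printf 'skipping unreadable file: %s\n' "$f" >&2
        fi
    done
}
```

   The resulting list can be fed to cpio, for example
   "readable_files /mntb/somedir | cpio -o > /mnta/somedir.cpio", where
   both paths are placeholders.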

Surface Checks
--------------
   Once you've copied all the files you want (or can get) off the bad disk
   you can reformat the whole disk and run surface tests on it.  The
   iv command "iv -sw[l]" works great, and again any disk errors occurring
   get logged to /usr/adm/unix.log.  The -l (long) option causes the test
   to repeat 10 times.  It takes about 3 hours on a 20M disk.

DRUN Patch
----------
   As I mentioned in a previous posting, if you are getting disk errors
   appearing in unix.log after formatting a disk and running surface
   checks on it, you may need to install the DRUN rework on your system.
   This applies regardless of whether you have one or two drives, and a
   WD1010 or a WD2010 disk controller chip.  I was getting about a dozen
   disk errors each time I ran a surface check until I installed the DRUN
   rework.  Since then I haven't seen a single disk error appear in the
   log.

I hope you find this information helpful and can find it again when you
need it.  I know it would have saved me some time.  Also, I'd like to
see someone post a decent tutorial on the use of fsck and fsdb to repair
bad disks.  I mean a disk with unreadable blocks, some of which might be
directory blocks.  I've found a way that didn't work.  I'd like to see
if someone knows a better way.

---
	Craig V. Johnson		...!fluke!vince
	John Fluke Mfg. Co.			or
	Everett, WA			vince at tc.fluke.com

DISCLAIMER (I suppose it's necessary): Muck with your hard disks at your
own risk.  Don't believe anything I said, and legally we will both be
happy.  Nothing stated in this posting has anything to do with John Fluke
Mfg. Co., so leave them out of it.