Sick 3B1
Kevin Darcy
kevin at cfctech.UUCP
Fri Dec 8 16:19:21 AEST 1989
For all you UNIXPC hardware-hackers out there sick of seeing "HELP ME!"
articles and mail, I apologize (I hate seeing it too, but I've been
struggling with this stuff for 3 weeks now and I am so frustrated I
have to do something), but...
My poor old 2-year-old 3B1 is sick. Specifically, its disk is sick.
The symptoms are system panics of the type "I/O error in push", with
accompanying hard disk (controller?) info of the form
HDERR ST:51 EF:10 CL:FF65 CH:FF02 SN:FF0B SC:FF01 SDH:FF26 DMACNT:FFFF DCRREG:96 MCRREG:8500 Thu Dec 7 22:07:38 1989
HDERR ST:51 EF:10 CL:FF65 CH:FF02 SN:FF0B SC:FF01 SDH:FF26 DMACNT:FFFF DCRREG:96 MCRREG:8500 Thu Dec 7 22:07:39 1989
HDERR ST:51 EF:10 CL:FF65 CH:FF02 SN:FF0B SC:FF01 SDH:FF26 DMACNT:FFFF DCRREG:96 MCRREG:8500 Thu Dec 7 22:07:39 1989
HDERR ST:51 EF:10 CL:FF65 CH:FF02 SN:FF0B SC:FF01 SDH:FF26 DMACNT:FFFF DCRREG:96 MCRREG:8500 Thu Dec 7 22:07:39 1989
which, of course, is also appearing in my /usr/adm/unix.log.
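For anyone who wants to pick those HDERR lines apart, here is a rough Python sketch that splits a line into its fields and decodes them. The bit names and the SDH layout are assumptions: I am guessing that ST, EF, CL, CH, SN, SC and SDH mirror a WD1010-style controller task file, and that the FF in the high byte of the 16-bit values is just bus noise. None of that is confirmed by the log itself, and the helper names (parse_hderr, decode) are made up for illustration.

```python
import re

# ASSUMPTION: bit meanings follow a WD1010-style status/error register pair.
STATUS_BITS = {0x80: "busy", 0x40: "ready", 0x20: "write fault",
               0x10: "seek complete", 0x08: "data request", 0x01: "error"}
ERROR_BITS = {0x80: "bad block mark", 0x40: "data CRC error",
              0x10: "ID not found", 0x04: "aborted command",
              0x02: "track 0 not found", 0x01: "data address mark not found"}

SAMPLE = ("HDERR ST:51 EF:10 CL:FF65 CH:FF02 SN:FF0B SC:FF01 SDH:FF26 "
          "DMACNT:FFFF DCRREG:96 MCRREG:8500 Thu Dec 7 22:07:38 1989")

def parse_hderr(line):
    """Return every NAME:HEX field of an HDERR line as a dict of ints."""
    pairs = re.findall(r"\b([A-Z]+):([0-9A-F]+)\b", line)
    return {name: int(value, 16) for name, value in pairs}

def flags(value, table):
    """List the names of the bits in `table` that are set in `value`."""
    return [name for bit, name in table.items() if value & bit]

def decode(line):
    """Decode one HDERR line under the WD1010-style assumption above."""
    fields = parse_hderr(line)
    # ASSUMPTION: only the low byte of each register is meaningful.
    low = {name: value & 0xFF for name, value in fields.items()}
    sdh = low["SDH"]
    return {
        "status": flags(low["ST"], STATUS_BITS),
        "error": flags(low["EF"], ERROR_BITS),
        "cylinder": (low["CH"] << 8) | low["CL"],  # high byte, then low byte
        "head": sdh & 0x07,           # assumed: bits 0-2 of SDH
        "drive": (sdh >> 3) & 0x03,   # assumed: bits 3-4 of SDH
        "sector": low["SN"],
    }

if __name__ == "__main__":
    for key, value in decode(SAMPLE).items():
        print(key, value)
```

If those assumptions hold, all four lines point at the same spot (cylinder 613, head 6, sector 11) with the "ID not found" bit set, but treat that reading as a guess until someone who knows the 3B1 controller confirms the register layout.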
When I first started getting these panics, I booted the diagnostic disk, and
ran hard disk tests. During the "recal" phase, many blocks showed up as
unreadable. Multiple passes of the recal phase would always show up bad blocks,
but not always the SAME bad blocks, although some would show up much more than
others. The "surface test" would rarely find any more bad blocks than the
"recal" would. After backing up my system, I went in and mapped a whole bunch
of the worst offending blocks, and everything seemed to be working just fine
(with the exception of the occasional buzzing of the drive which accompanies
the read errors). I also opened up the machine, cleaned out the dust bunnies,
checked everything visually, and disconnected and reconnected the disk drive
cables.
Now, two weeks later, the machine is starting to panic again. Last night I
had init croak because of a disk error.
I post this because, not being much of a hardware hacker, there are a lot of
things that I do not understand about this whole situation:
1) If a disk block is "unreadable" on one pass of the "recal" phase, and it is
indeed a media error, how can it pass on the next?
2) What could cause so many errors at once (I've mapped about 20 bad blocks
in the last 3 weeks; in the previous two years I've had the machine, only
3 had to be mapped, and the drive ran happy as a clam)?
3) Why does the "surface test" appear to be no more rigorous than the "recal"
phase?
4) How would I tell the difference between a controller problem and a bona fide
disk problem on the 3B1?
5) Is reformatting my next step?
I also have some related quasi-technical questions:
1) If the hard disk is in need of replacement, am I limited to the same
disk drive (I know that *second* disk drives can vary, but I haven't been
around the UNIXPC block enough to know whether its bootstrap stuff expects
a certain hardware configuration on the primary boot device)? I would love
to use something bigger...
2) If I have to use a Miniscribe 6085, where is a good source for them (I
expect AT&T wants a ridiculous amount)? I wouldn't mind buying a
whole 3B1 and using it completely for parts, either.
3) I'm also looking for a good source for tape drives. This after I sat down
over the long weekend and backed up my machine on 147 floppies. After that
experience, I will never in my life buy a computer with insufficient backup
capability.
4) Why is floppy disk I/O on the UNIXPC so hacked up? From what I can tell,
there are 2 /dev entries for the built-in floppy drive, one of which
(/dev/rfp020) gives me nothing but errors when I try to cpio to it, and the
other (/dev/rfp021) appears to work fine when writing, except whenever
I try to do a cpio file listing, it always gives the error "Out of phase -
get help" and stops on the second diskette of the series. I realize that
the floppy on the 3B1 is really "meant" to be used from the ua menus, but,
in the absence of being able to see what is on my diskettes, I'm really
scared that one or another of the diskettes in the backup I made from the
ua menu could turn up bad, and I will have no way of getting at the data
beyond that point without some serious hacking.
If it matters, the machine is a vanilla 2-Meg RAM, 67 MB hard disk fire-sale
3B1. OS=3.51. I would be greatly appreciative of replies via article or e-mail.
Please no "RTFM" replies unless you can cite specific references from the
manuals which come with the machine.
------------------------------------------------------------------------------
kevin at cfctech.UUCP | Kevin Darcy, Asst. Unix Systems Admin.
...[mailrus!]sharkey!cfctech!kevin | MIS, Technical Services
Voice: (313) 948-4863 | Chrysler Financial Corp.
948-4975 | 27777 Franklin, Southfield, MI 48034
------------------------------------------------------------------------------