perl-based UNIXPC disk error message browser

Sun Jun 25 03:18:41 AEST 1989

I suppose others have written awk scripts or whatever to glean clues from
the unix.log file, but I haven't seen anything posted, so here goes.

I put some of the high points from John Milton's "Hardware Notes #13" 
into a perl script, and it made deciphering the hard disk error messages
in my unix.log file much easier. John's discussion of the WD1010 error
register is included in the comment block at the beginning of the script.
Also, there are three variables, $HEADS, $BADLIST, and $SWAPSIZE, that you'll 
want to change for your particular disk.  (No provision as yet for
more than one drive.)
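To make the arithmetic concrete, here is a rough sketch of the same decoding written in POSIX sh rather than Perl, applied to the sample log line quoted in the script's comments. It assumes a shell with $((...)) arithmetic and hex constants, which the UNIXPC's own /bin/sh may well lack. The function names field and decode_hderr are made up for this illustration, and the three drive parameters are just the script's defaults.

```shell
#!/bin/sh
# Sketch only: mirrors the badblocks arithmetic in POSIX sh.
# HEADS, BADLIST and SWAPSIZE are the script's defaults; set them
# for your own drive.
HEADS=8
BADLIST=64
SWAPSIZE=5000

# field NAME LINE (hypothetical helper): print the hex digits after " NAME:"
field() {
    printf '%s\n' "$2" | sed -n "s/.* $1:\([0-9A-F]*\).*/\1/p"
}

# decode_hderr LINE (hypothetical helper): print cyl/head/sector, block, EF
decode_hderr() {
    cl=$(field CL "$1"); ch=$(field CH "$1")
    sn=$(field SN "$1"); sdh=$(field SDH "$1"); ef=$(field EF "$1")
    # Keep only the low byte (last two hex digits) of CL, CH and SN
    cl=${cl#??}; ch=${ch#??}; sn=${sn#??}
    # The last SDH hex digit carries the head number in its low 3 bits
    sdh=${sdh#???}
    cyl=$(( 0x$ch * 256 + 0x$cl ))
    hd=$(( 0x$sdh & 7 ))
    sec=$(( 0x$sn ))
    # 16 sectors per track and two sectors per block, as the script assumes
    sector=$(( (cyl * HEADS + hd) * 16 + sec ))
    block=$(( sector / 2 - (BADLIST + SWAPSIZE) ))
    printf '%d/%d/%d block %d EF:%s\n' "$cyl" "$hd" "$sec" "$block" "$ef"
}

decode_hderr 'pid 0: #HDERR ST:51 EF:10 CL:42D4 CH:4201 SN:4200 SC:4202 SDH:4223 DMACNT:FFFF DCRREG:93 MCRREG:9100 Fri May 19 19:25:14 1989'
```

With the defaults, that line prints `468/3/0 block 24912 EF:10`: cylinder 0x01*256 + 0xD4 = 468, head 3, sector 0 gives absolute sector (468*8 + 3)*16 = 59952, which is block 29976, or 24912 after subtracting BADLIST + SWAPSIZE = 5064.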

If you have perl, you might find this useful; otherwise, just hit
the 'n' key...

Mike Peterson                         Domain: mkp at mti.com
Micro Technology, Inc.                  UUCP: uunet!mti!mkp
5065 E. Hunter Ave., Anaheim, CA 92807  home: ...!{mti,hacgate}!taqwa!mkp

#! /bin/sh
# This is a shell archive.  Remove anything before this line, then unpack
# it by saving it into a file and typing "sh file".  To overwrite existing
# files, type "sh file -c".  You can also feed this as standard input via
# unshar, or by typing "sh <file", e.g.  If this archive is complete, you
# will see the following message at the end:
#		"End of shell archive."
# Contents:  badblocks
# Wrapped by mkp at taqwa on Sat Jun 24 11:09:08 1989
PATH=/bin:/usr/bin:/usr/ucb ; export PATH
if test -f 'badblocks' -a "${1}" != "-c" ; then 
  echo shar: Will not clobber existing file \"'badblocks'\"
else
echo shar: Extracting \"'badblocks'\" \(3457 characters\)
sed "s/^X//" >'badblocks' <<'END_OF_FILE'
Xeval "exec /usr/bin/perl -S $0 $*"
X	if $running_under_some_shell;
X
X#
X# perl hack for interpreting UNIXPC hard disk error messages
X#
X# example log line:
X#
X# pid 0: #HDERR ST:51 EF:10 CL:42D4 CH:4201 SN:4200 SC:4202 \
X# SDH:4223 DMACNT:FFFF DCRREG:93 MCRREG:9100 Fri May 19 19:25:14 1989
X#
X# Calculations courtesy of John B. Milton IV, "Hardware Notes # 13"
X# jbm at uncle.uucp 
X#
X# sez Milton:
X#
X# EF: The "error register" from the WD1010
X#   Bit 7 Bad Block Detect. From what I can tell about how things are done on
X#     our systems, this feature is not used. We use a direct mapping method where
X#     the position of bad blocks is determined by the bad block table. If this
X#     gets turned on, it is some kind of glitch on the disk.
X#   Bit 6 CRC Data Field. This one deserves a direct quote:
X#       "This bit is set when a CRC error occures in the data
X#        field. With Retry enabled, ten more attempts are made
X#        to read the sector correctly. If none of these attempts
X#        are successful, the Error Status is set also (bit 0 in
X#        the Status Register). If one of the attempts is suc-
X#        cessful, this bit remains set to inform the Host that
X#        a marginal condition exists. However, the Error Status
X#        bit is not set. Even if errors exist, the data can be read."
X#     On our machines, if bits 7, 5, 1 or 0 are set or if the error register is
X#     not zero!, or if there was DMA trouble, an HDERR message will be printed.
X#     This is extremely good. It means every time there is the slightest flicker
X#     in the data, you will get an error message. If you get only one, the error
X#     is probably transient and does not mean anything. You should NOT try to
X#     lock out the block! If you get a bunch of CRC errors, but a good read,
X#     this is probably a weak spot and should be locked out.
X#   Bit 5 Reserved. Always zero.
X#   Bit 4 ID not found. Like CRC, this bit is set when the ID field for the
X#     requested sector can not be found, or has a bad CRC.
X#   Bit 3 Reserved. Always zero.
X#   Bit 2 Aborted Command. Should never happen on our system. If you get it, it
X#     probably means BAD power line trouble.
X#   Bit 1 Track Zero Error. This is very bad, and usually indicates a very bad
X#     hardware failure in the drive, so you'll never see it until you get a
X#     second hard drive on your system :)
X#   Bit 0 Data Address Mark Not Found. Yet another thing not found.
X#
X
X# The following are drive specific; set for your drive.
X# You can get these values by running:  iv -tv /dev/rfp000
X# Don't screw up.
X$HEADS = 8;
X$BADLIST = 64;
X$SWAPSIZE = 5000;
X
Xopen(unixlog, 'grep HDERR /usr/adm/unix.log|') || 
X	die('cannot open & read unix.log');
X
Xwhile(<unixlog>)
X{
X	if(/EF:(..) CL:..(..) CH:..(..) SN:..(..) SC:..(..) SDH:...(.)/)
X	{
X		$err = hex($1);
X		$locyl = hex($2);
X		$hicyl = hex($3);
X		$secnum = hex($4);
X		$count = hex($5);
X		$head = hex($6)&0x7;
X		$cyl = $hicyl*256 + $locyl;
X
X		$sector = (((($cyl)*$HEADS)+($head))*16) + $secnum;
X		$absblock = $sector/2;
X		$block = $absblock - ($BADLIST + $SWAPSIZE);
X		printf("    %4d/%d/%d\t%6d\t", $cyl, $head, $secnum, $block);
X		if($err&0x040)
X		{
X			printf("<CRC>");
X		}
X		if($err&0x10)
X		{
X			printf("<ID>");
X		}
X		if($err&0x4)
X		{
X			printf("<ACMD>");
X		}
X		if($err&0x2)
X		{
X			printf("<TZE>");
X		}
X		if($err&0x1)
X		{
X			printf("<DAM>");
X		}
X		if(/MCRREG:.... (........................)/)
X		{
X			printf("\t%s", $1);
X		}
X		printf("\n");
X
X	}
X}
END_OF_FILE
if test 3457 -ne `wc -c <'badblocks'`; then
    echo shar: \"'badblocks'\" unpacked with wrong size!
fi
chmod +x 'badblocks'
# end of 'badblocks'
fi
echo shar: End of shell archive.
exit 0
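Milton's notes in the script's comment block map each bit of the EF (WD1010 error register) value to a condition, and the script prints a tag for each bit it checks. The same bit tests can be sketched in POSIX sh, again assuming $((...)) arithmetic with hex constants. ef_tags is a hypothetical name, and <BBD> is a tag invented here for bit 7 (Bad Block Detect), which the script itself does not test.

```shell
#!/bin/sh
# Sketch only: turn a two-digit EF value from an #HDERR line into the
# same tags the badblocks script prints. <BBD> (bit 7) is an extra,
# made-up tag; the script does not check that bit.
ef_tags() {
    ef=$(( 0x$1 ))
    tags=
    [ $(( ef & 0x80 )) -ne 0 ] && tags="$tags<BBD>"   # bit 7: bad block detect
    [ $(( ef & 0x40 )) -ne 0 ] && tags="$tags<CRC>"   # bit 6: CRC error in data field
    [ $(( ef & 0x10 )) -ne 0 ] && tags="$tags<ID>"    # bit 4: ID not found
    [ $(( ef & 0x04 )) -ne 0 ] && tags="$tags<ACMD>"  # bit 2: aborted command
    [ $(( ef & 0x02 )) -ne 0 ] && tags="$tags<TZE>"   # bit 1: track zero error
    [ $(( ef & 0x01 )) -ne 0 ] && tags="$tags<DAM>"   # bit 0: data address mark not found
    printf '%s\n' "$tags"
}

ef_tags 10    # EF value from the sample line; prints <ID>
ef_tags 41    # bits 6 and 0; prints <CRC><DAM>
```

Following Milton's advice, a single <CRC> on a block is probably transient; it is the same block turning up again and again that suggests a weak spot worth locking out.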