looking for sysv sum(1) algorithm
Tom Christiansen
tchrist at convex.COM
Mon Jun 3 15:57:11 AEST 1991
From the keyboard of herbie at dec07.cs.monash.edu.au (Andrew Herbert):
:Hello all.
:
:Can anyone tell me where I can find a description of the SysV sum(1)
:checksum algorithm, or some code which implements it? I am using
:SysVR4, but couldn't find anything to do this in the standard libraries.
I think I can tell you. I have no SysVr4 source code, so had to reverse
engineer what's going on by taking a working emulation of a SysV sum(1)
program written in perl (after confirming it really does give the same
output as sum(1)) and then looking to see what perl's doing inside. But I
get the same results, so something here must be right.
To start with, this perl code seems to emulate the sum(1) command fairly
well, as found on a SysV system I have lying around here:
    while (<>) {
        $checksum += unpack("%31C*", $_);
        $checksum %= 65535;
        $bytes += length;
        if (eof) {
            printf "%d %d %s\n", $checksum, ($bytes+511)/512, $ARGV;
            $checksum = $bytes = 0;
        }
    }
Speed freaks might take note that the following rendition is actually
faster than the C code! Big buffers pay off.
    while ($ARGV = shift) {
        warn("can't open $ARGV: $!"), next unless open ARGV;
        while (read(ARGV,$_,16 * 512)) {
            $checksum += unpack("%31C*", $_);
            $checksum %= 65535;
            $bytes += length;
        }
        printf "%d %d %s\n", $checksum, ($bytes+511)/512, $ARGV;
        $checksum = $bytes = 0;
    }
Of course, this doesn't really help you to know what's going on until you
know what unpack() is doing. Looking in perl/src/doio.c, in the function
do_unpack(), you find that what's happening is basically the following
(loosely transcribed):
    checksum = 31;      /* from the %31C in unpack */
    sum = 0;
    unsigned char *sp = string; /* string is a (char *) pointing to $_ */
    /* loosely: perl really walks the string's length, not up to a NUL */
    while (*sp) sum += *sp++;
    sum &= (1 << checksum) - 1;
    return sum;
That's what's happening for each record. If you look at the above perl
code, we add in this sum to our running $checksum variable each time
through the perl while loop, and then modulo it by 65535 each time (not
65536) to keep it small. Then when each file runs out, we output this
value, the number of 512-byte blocks, and the file's name.
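Since the original question asked for code, here is a rough C sketch of
the steps just described: sum the bytes of each buffer, keep the low 31
bits, fold that into a running total modulo 65535, and count 512-byte
blocks at the end. This is only my reading of the perl above, not the
real SysV source; the struct and function names (sysv_sum,
sysv_sum_update and so on) are made up for illustration.

```c
#include <stddef.h>
#include <stdio.h>

/* Running state for one file. Names are illustrative, not from SysV. */
struct sysv_sum {
    unsigned long checksum;  /* running sum, always below 65535 */
    unsigned long bytes;     /* total number of bytes seen so far */
};

/* Fold one buffer into the state, like one pass of the perl read loop. */
void sysv_sum_update(struct sysv_sum *st, const unsigned char *buf, size_t len)
{
    unsigned long sum = 0;
    size_t i;

    /* Bounded by len, so embedded NUL bytes are summed like any other. */
    for (i = 0; i < len; i++)
        sum += buf[i];

    sum &= 0x7fffffffUL;     /* the %31 in unpack: keep the low 31 bits */
    st->checksum = (st->checksum + sum) % 65535UL;   /* 65535, not 65536 */
    st->bytes += len;
}

/* Number of 512-byte blocks, rounded up, as printed by the perl code. */
unsigned long sysv_sum_blocks(const struct sysv_sum *st)
{
    return (st->bytes + 511) / 512;
}

/* Checksum a whole stream with a 16-block buffer; returns -1 on error. */
int sysv_sum_stream(FILE *fp, struct sysv_sum *st)
{
    unsigned char buf[16 * 512];
    size_t n;

    st->checksum = 0;
    st->bytes = 0;
    while ((n = fread(buf, 1, sizeof buf, fp)) > 0)
        sysv_sum_update(st, buf, n);
    return ferror(fp) ? -1 : 0;
}
```

For example, feeding the four bytes "abc\n" through sysv_sum_update()
leaves checksum at 304 (97+98+99+10) and sysv_sum_blocks() returns 1.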
Hope this helps.
--tom
--
Tom Christiansen tchrist at convex.com convex!tchrist
"Perl is to sed as C is to assembly language." -me
More information about the Comp.unix.programmer
mailing list