new survey to supplement arbitron. Please run this program.
Joe Buck
jbuck@epimass.EPI.COM
Sat May 20 08:08:10 AEST 1989
Brian, your program, if invoked in the way you request, will process
crossposted articles N times, where N is the number of groups present.
Please, let's not waste net resources by conducting a large-scale survey
with a basic error in it.
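To see the over-counting concretely: a crossposted article is typically stored once and hard-linked into each group's spool directory, so a find-based tree walk sees it once per group, while the history file lists it only once. A hedged sketch, simulating the spool layout with hard links in a throwaway directory (no real spool is touched; paths are invented for illustration):

```shell
# Simulated spool: one article crossposted to two groups via a hard link.
d=$(mktemp -d)
mkdir -p "$d/comp/misc" "$d/alt/sources"
echo "article body" > "$d/comp/misc/123"
ln "$d/comp/misc/123" "$d/alt/sources/456"
find "$d" -type f | wc -l     # the walk counts the same article twice
rm -rf "$d"
```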
Rather than doing a "find" to locate articles, you can count
crossposted articles only once by reading the history file to obtain
their filenames. Since this is going to alt.sources, I obviously
need to include a source: here is a perl program that eats a history
file and spits out a sorted list of host pairs, showing the links your
news has travelled through.
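The pair-counting itself is simple: split the Path: value on "!" and count each adjacent pair, leaving off the final component, which is the poster's name rather than a host. A hedged one-liner sketch of the same transformation (the sample path is invented for illustration):

```shell
# Split a sample Path: value into adjacent host pairs, skipping the
# last hop (host!poster), just as the perl program below does.
path="ucbvax!uunet!epimass!jbuck"
echo "$path" | awk -F'!' '{ for (i = 1; i < NF - 1; i++) print $i "!" $(i+1) }'
# prints "ucbvax!uunet" and "uunet!epimass"
```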
------------------------------ cut here ------------------------------
#! /usr/bin/perl
# This perl program scans through all the news on your spool
# (using the history file to find the articles) and prints
# out a sorted list of frequencies that each pair of hosts
# appears in the Path: headers. That is, it determines how,
# on average, your news gets to you.
#
# If an argument is given, it is the name of a previous output
# of this program. The figures are read in, and host pairs
# from articles newer than the input file are added in.
# So that this will work, the first line of the output of the
# program is of the form
# Last-ID: <5679@chinet.UUCP>
# (without the # sign). It records the last Message-ID in the
# history file; to add new articles, we skip in the history file
# until we find the message-ID that matches "Last-ID".
$skip = 0;
if ($#ARGV >= 0) {
    $ofile = $ARGV[0];
    die "Can't open $ofile!\n" unless open (of, $ofile);
    # First line must contain last msgid to use.
    $_ = <of>;
    ($key, $last_id) = split (' ');
    die "Invalid input file format!\n" if ($key ne "Last-ID:");
    $skip = 1;
    # Read in the old file.
    while (<of>) {
        ($cnt, $pair) = split (' ');
        $pcount{$pair} = $cnt;
    }
}
# Let's go.
die "Can't open history file!\n" unless open (hist, "/usr/lib/news/history");
die "Can't cd to news spool directory!\n" unless chdir ("/usr/spool/news");
$np = $nlocal = 0;
while (<hist>) {
    #
    # $_ contains a line from the history file.  Parse it.
    # Skip it if the article has been cancelled or expired.
    # If the $skip flag is true, we skip until we have the right msgid.
    #
    ($id, $date, $time, $file) = split (' ');
    next if ($file eq 'cancelled' || $file eq '');
    if ($skip) {
        if ($id eq $last_id) { $skip = 0; }
        next;
    }
    #
    # Format of the field is like comp.sources.unix/2345 .  Get ng and filename.
    #
    ($ng, $n) = split (/\//, $file);
    $file =~ tr%.%/%;
    #
    # The following may be used to skip any local groups.  Here, we
    # skip group names beginning with "epi" or "su".  Change to suit taste.
    #
    next if $ng =~ /^epi|^su/;
    next unless open (art, $file);      # skip if cannot open file
    #
    # Article OK.  Get its path.
    while (<art>) {
        last if ($_ eq "\n");           # end of headers, no Path: found
        ($htype, $hvalue) = split (' ');
        if ($htype eq "Path:") {
            # We have the path, in hvalue.
            $np++;
            @path = split (/!/, $hvalue);
            # Handle locally posted articles.
            if ($#path < 2) { $nlocal++; last; }
            # Create and count pairs.  The last component of the path
            # is the poster's name, not a host, so it is not counted.
            for ($i = 0; $i < $#path - 1; $i++) {
                $pair = $path[$i] . "!" . $path[$i+1];
                $pcount{$pair} += 1;
            }
            last;
        }
    }
}
# Make sure print message comes out before sort data.
$| = 1;
print "Last-ID: $id\n";
$| = 0;
# Write the data out, sorted.  Open a pipe.
die "Can't exec sort!\n" unless open (sortf, "|sort -nr");
while (($pair, $n) = each (%pcount)) {
    printf sortf ("%6d %s\n", $n, $pair);
}
close (sortf);
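After the Last-ID: line, the output is just counts and host pairs pushed through the same "sort -nr" the program uses, so the busiest links come out on top. A hedged sketch of that output stage, with made-up counts:

```shell
# Two fabricated count/pair records, sorted numerically, descending.
printf '%6d %s\n' 12 'ucbvax!uunet' 3 'uunet!epimass' | sort -nr
#     12 ucbvax!uunet
#      3 uunet!epimass
```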
--
-- Joe Buck	jbuck@epimass.epi.com, uunet!epimass.epi.com!jbuck