arbitron.pl
pat at kla
pat at kla
Sun Feb 11 14:24:49 AEST 1990
Here is a version of the usenet statistics gathering `arbitron'
program written in perl. The advantages are listed in the comments at
the beginning of the file. It should be run on a machine with read
access to all user's home directories (for access to their .newsrc
files.)
I consider this a Beta version, submitted for testing against a
variety configurations. Most of this program should be compatible
with perl 2.18, but for full functionality, perl 3.0 is required (for
socket and getpwent() support).
For reference, we are running a mix of Sun hardware (Sun3,
Sun4, Sun386i) and system versions (3.5, LeftCoast 4.0.3, RightCoast
4.0.1). We receive our news via uucp and maintain it under B-news
2.11 with relatively short expiration times. News is distributed
within our LAN via NNTP & rrn.
------------------------- Cut Here -------------------------------------------
#! /bin/perl
$VERSION = "pl-$1" if ( '$Revision: 1.6 $' =~ /([0-9]*\.[0-9]*) \$/ ) ;
#
#
# arbitron.pl -- this program produces rating sweeps for USENET
#
#
# Usage: arbitron.pl
# or: perl arbitron.pl
#
# This program is intended as a replacement for the sh/sed/awk/... based
# script in common use. This version offers the following advantages:
#
# 1) Greater portability.
# (Perl runs on a lot more systems than you can find compatible
# versions of Bourne shell, sed, awk, etc.)
# 2) More accurate reporting if the local expiration rate is very high.
# The shell script compared .newsrc contents against the active
# file. This version keeps an optional arb.last file to compare
# against.
# 3) No temporary files.
# 4) Can be run on systems which do not support multi-tasking.
# 5) If your mailer allows the specification of the subject as
# a command line parameter, this program will generate a
# subject line of the form `arbitron data for MonYYYY'.
# 6) Direct connection to NNTP socket in distributed environment.
# (Perl 3.0 is required for this option.)
# 7) Much easier to read. (Unless you are an awk & sed wizard :-)
#
# Disadvantages:
# 1) This version is slower than the Bourne shell version.
# (about 50% on a Sun3/60 running SunOS 3.5)
#
# Other differences:
# 1) Obtaining count of valid users:
#
# The script defaults to accepting all users within the specified
# range of userids; but provides optional mechanisms for counting
# actual userids stored in the wtmp(5) file.
#
# This program provides a generalized filter capability which can
# easily check the value of any of the fields from the password
# file. It was felt that this is a more useful mechanism in
# a networked environment.
#
#
# To use this program, edit the "configuration" section below so that the
# information is correct for your site, and then run it. It will produce a
# readership survey for your machine and mail that survey to decwrl.dec.com,
# with a cc to you.
#
# To participate in the international monthly ratings sweeps, run "arbitron"
# every month. The statistics program is run on the first day of each month;
# it will include any report that has reached it by that time. To make sure
# your site's data is included, run the survey program no later than the 20th
# day of each month.
#
# ---------------------------------------------------------------------------
# $Log: arbitron.pl,v $
# Revision 1.6 90/02/10 18:53:02 pat
# Added support for direct NNTP access of active group list.
# Extended comments.
# Removed several unnecessary variables.
# Added debugging flags $noSave and $noMail.
#
# Revision 1.5 90/01/11 20:27:32 pat
# Added get_user_info() subroutine to allow the use of getpwent() with
# perl3.0, but retain compatibility with perl2.18. Also allowed
# cleanup of code at beginning of user processing loop.
#
# Revision 1.4 90/01/11 19:01:40 pat
# Changed subroutine good_user and the loop that calls it to expect parameters
# as supplied by the perl-3.0 getpwent() function.
#
# Revision 1.3 90/01/11 17:26:41 pat
# Various clarifications in the comments.
# Moved date retrieval lines to just under initial comment cluster to obtain
# better grouping of the sections which are most likely to be customized.
# Changed default summary path to check for LOGNAME, then USER, to determine
# the local address to mail the report to. This is to accomidate
# alternate user names created purely for cron-activated jobs, etc.
# Removed intermediate variable used in determining host name.
# Changed generation of $destination to make it easier to remove/modify the
# subject parameters.
# Explicitly look for /usr/ucb/mail when building $destination, instead of
# relying upon path information.
#
# Revision 1.2 90/01/10 09:59:15 pat
# Fixed automatic extraction of $VERSION from RCS revision information.
# Added RCS update log.
#
# ---------------------------------------------------------------------------
# Debugging flags:
# Set $noSave to prevent writing the .arblast file.
# Set $noMail to write the report to stdout instead of mailing it.
$noSave = ;
$noMail = ;
# ---------------------------------------------------------------------------
#
# Find out the date. Sweeps are reported as MonYYYY.
@monthnames = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
"Jul", "Aug", "Sep", "Oct", "Nov", "Dec") ;
@tim = localtime(time) ;
$dat = $monthnames[$tim[4]] . $tim[5] ;
# ===========================================================================
# Configuration information. Edit this section to reflect your site data.
#
# $ARBLAST is the file to which the `last-reported' information
# is to be stored.
# $ACTIVE is the path to the news system's active file. It is only
# used if $NNTPserver is not set.
# $SORT_OUTPUT should be set if you want the report sorted by
# readership. Setting it to null will improve performance.
# $NNTPserver is the hostname or internet address of the local
# NNTP server. Direct NNTP connection requires a version
# of perl which supports sockets. If your version does
# not support sockets, or you do not use NNTP, you may
# simply set $NNTPserver to ''.
# $NNTPport is the port number to use to connect to your NNTP server.
# The standard allocation is 119. Check your /etc/services
# file or equivalent.
# $Me is the mail address of the local user to whom a copy of the
# report will be mailed. It may be null.
# $summarypath is a list of addresses to which a copy of the report
# will be mailed. It should be in the form expected by
# the mailer. To have your statistics included in the
# published network-wide reports, this list should include
# `netservey at decwrl.dec.com' (or the uucp equivalent
# {sun, hplabs, pyramid, decvax, ucbvax}!decwrl!netsurvey )
# $lowUID and $highUID delimit the range of /etc/passwd UID's which
# represent actual people (rather than maintenance accounts,
# daemons, or whatever.)
#
$ARBLAST = "/usr/lib/news/arb.last" ;
$ACTIVE = '/usr/lib/news/active' ;
$SORT_OUTPUT = 1 ; # Set to null if you don't want to sort
$SERVERfile = '/vol/local/lib/rn/server' ;
$NNTPserver = $ENV{'NNTPSERVER'} ;
if ((! $NNTPserver) && (open (server, "<$SERVERfile"))) {
chop ($NNTPserver = <server>) ;
close (server) ;
} ;
if (! $NNTPserver) { $NNTPserver = 'mailhost' ; } ;
$NNTPport = 119 ;
$Me = ( $ENV{'LOGNAME'} || $ENV{'USER'} || 'news' ) ;
$summarypath = "netsurvey at decwrl.dec.com $Me" ;
$lowUID = 100 ;
$highUID = 9999 ;
# This subroutine is passed an array consisting of a single entry from
# the password file. It should return true if that entry represents a
# valid user (to check for news usage), false otherwise.
# (Return value is value of last expression.)
#
# You may make this as complex as you deem appropriate. The default will
# reject any user with the password '*'; or with a uid less than $lowUID,
# or greater than $highUID; or with no home directory.
# The commented out lines show how to also reject specific users by name.
sub good_user {
local ($name, $passwd, $uid,
$gid, $quota, $comment,
$gcos, $homedir, $shell) = @_ ;
return ( ($passwd ne '*')
&& ($uid >= $lowUID) && ($uid <= $highUID)
&& ($homedir ne '')
&& ($name ne "realtime") && ($name ne "std_user")
&& ($name ne "eglbnch1")
) ;
}
# This is only used in the hostname stuff below.
# We need to find the uucp name of your host. If this code doesn't work,
# then just put it in literally like this:
# $hostname="kla"
#
# `uuname -n' doesn't work on bsd systems. `hostname' may not work on
# sysV or non-unix systems. For bsd, the hostname output is preferred
# because `uuname -l' will truncate to 7 chars. I'm not sure why it
# is not the preferred mechanism on sysV; but this is the sequence used
# in the script version.
if ( -d '/usr/ucb' ) { chop ($hostname = `hostname`) ; }
else { chop ($hostname = `uuname -n`) ; } ;
chop ($hostname = `uuname -l`) unless ($hostname) ;
chop ($hostname = `hostname`) unless ($hostname) ;
# The name of the base password file. This will not be referenced if
# passwords can be obtained via getpwent or `ypcat passwd'.
$PASSWD = '/etc/passwd' ;
# ---------------------------------------------------------------------------
# If your mailer can accept a subject on the command line, set $Subject
# to the appropriate sequence, otherwise comment out the assignment below.
# The default below works with /usr/ucb/mail on bsd derived systems.
$Subject = sprintf ('-s "arbitron data for %s"', $dat) ;
# If you are running on a system which does not support the creation
# of a mailer subtask to which the report can be piped, replace
# the $destination value with the name of a file which can be mailed
# later.
$Mailer = '/usr/ucb/mail' if ( -x '/usr/ucb/mail' ) ;
$Mailer = 'mail' unless ($Mailer) ;
$Mailer = $ENV{'MAILER'} || $Mailer ;
$destination = ($noMail ? ">-" : "| $Mailer $Subject $summarypath") ;
# End portion most likely to require site customization.
# ===========================================================================
#
# Determine what newsgroups are active
#
if ($NNTPserver) {
local ($sockaddr, $name, $aliases, $proto,
$type, $len, $thisaddr, $thataddr, $this, $that) ;
# If your perl does not support sockets, comment out the following
# code and make sure that $NNTPserver is set to '' above.
# Note: You may have to run makelib on sys/socket.h to create a
# perl version. Makelib should be in your perl distribution.
do 'sys/socket.h' || die "Can't do sys/socket.h: $@" ;
$sockaddr = 'S n a4 x8';
chop ($hostname = `hostname`);
($name, $aliases, $proto) = getprotobyname('tcp') ;
($name, $aliases, $NNTPport) = getservbyname ($NNTPport, 'tcp')
unless $NNTPport =~ /^\d+$/ ;
($name, $aliases, $type, $len, $thisaddr) = gethostbyname ($hostname) ;
($name, $aliases, $type, $len, $thataddr) = gethostbyname ($NNTPserver) ;
$this = pack ($sockaddr, &AF_INET, 0, $thisaddr) ;
$that = pack ($sockaddr, &AF_INET, $NNTPport, $thataddr) ;
socket (active, &PF_INET, &SOCK_STREAM, $proto) || die "socket: $!" ;
bind (active, $this) || die "bind: $!" ;
connect (active, $that) || die "connect: $!" ;
select (active) ; $| = 1 ; select (stdout) ;
$_ = <active> ;
die $_ unless m/^2/ ; # Extremely primitive result check.
print active "LIST\r\n" ; # Request list of newsgroups.
$_ = <active> ;
die $_ unless m/^215/ ; # Extremely primitive result check.
# End of socket-dependant stuff.
} else {
open (active, "$NEWS/active")
|| die "Could not open active file $NEWS/active" ;
} ;
sub get_active_line {
local ($line) ;
$line = <active> ;
if ($NNTPserver && ($line eq ".\r\n")) {
print active "QUIT\r\n" ;
$line = '' ;
} ;
return ($line) ;
}
for ($nactive = 0 ; $_ = do get_active_line() ; ) {
if (/(\S*)\s*([0-9]*)\s*([0-9]*)/) {
$1 =~ tr/A-Z/a-z/ ; # Canonicalize to lower case
$last{$1} = int ($2) ;
$first{$1} = int ($3) ;
$readers{$1} = 0 ;
$nactive++ ;
}
}
if ($nactive <= 0) { die "No active newsgroups" ; }
if (open (arbLast, $ARBLAST)) {
while ( <arbLast> ) {
if ( ($groupname,$limit) = /(\S*)\s([0-9]*)/ ) {
$first{$groupname} = int($limit) + 1 ;
}
}
close (arbLast) ;
}
# ---------------------------------------------------------------------------
# Grovel through the password file, counting users and processing
# .newsrc files.
$nusers = 0 ; # Total number of users
$newsusers = 0 ; # Number of users who read news.
# Return information about the current user. This routine is entirely to
# provide getpwent() emulation for older versions of perl.
#
sub get_user_info {
# Use this line if your perl has getpwent(); otherwise comment it out
return getpwent() ;
# Un-comment the following lines if your perl does not have getpwent().
# local ($name,$passwd,$uid,$gid,$gcos,$homedir,$shell) ;
#
#
# if ( ! $PasswdIsOpen) {
# open (passwd, 'ypcat passwd |') || open (passwd, $PASSWD) ;
# $PasswdIsOpen = 1 ;
# }
#
# return () if eof (passwd) ;
#
# ($name,$passwd,$uid,$gid,$gcos,$homedir,$shell)
# = (<passwd> =~ /(.*):(.*):(.*):(.*):(.*):(.*):(.*)\n/ ) ;
#
# return ($name,$passwd,$uid,$gid,'','',$gcos,$homedir,$shell) ;
}
# Grovel through the user list, eliminating those which do not appear
# to be `real users'. If a `real user' has a .newsrc file, process it.
while (( @UserInfo = do get_user_info()) && ($#UserInfo > 0))
{
if (do good_user (@UserInfo))
{
$nusers++ ; # Increment the count of valid users.
if (open (newsrc, $UserInfo[7] . "/.newsrc")) {
local ($groups_read, $group_name, $last_read) ;
$groups_read = 0 ;
while ( <newsrc> ) {
if ( /(\S*)[:!].*[,-]([0-9]+)$/ ) {
($group_name = $1) =~ tr/A-Z/a-z/ ; # Make lower case
$last_read = int ($2) ;
if ( ($last_read >= $first{$group_name}) &&
($last_read <= $last{$group_name}) )
{
$groups_read++ ;
$readers{$group_name}++ ;
}
}
}
$newsusers++ if ($groups_read) ;
}
}
}
# Now build the report
open (Report, "$destination") || die "Could not open mailer" ;
$noSave || open (arbLast, ">$ARBLAST") || die "Could not open arb.last" ;
print Report "Host\t\t$hostname\n" ;
print Report "Users\t\t$nusers\n" ;
print Report "NetReaders\t$newsusers\n" ;
print Report "ReportDate\t$dat\n" ;
print Report "SystemType\tnews-arbitron-$VERSION\n" ;
if ($SORT_OUTPUT) {
sub by_readership {
local ($val) ;
$val = $readers{$b} - $readers{$a} ;
return ( ($val != 0) ? $val : (($b le $a) ? -1 : 1) ) ;
}
@allgroups = keys (readers) ;
foreach $groupname (sort by_readership @allgroups) {
print Report $readers{$groupname} . " $groupname\n" ;
print arbLast "$groupname " . $last{$groupname} . "\n"
unless $noSave ;
}
} else {
while (($groupname, $Readers) = each (readers)) {
print Report "$Readers $groupname\n" ;
print arbLast "$groupname " . $last{$groupname} . "\n"
unless $noSave ;
}
}
exit ;
Copyright (c) 1989 PM Lashley under the terms of the GNU General Public License
PMLashley ...{sun | megatest | sts | zygot}!cohesive!kla!pat
<<< I haven't lost my mind. It's backed up on tape somewhere... >>>
More information about the Alt.sources
mailing list