Duplicate articles
William E. Davidsen Jr
davidsen at steinmetz.steinmetz.UUCP
Fri Mar 18 01:23:50 AEST 1988
At one time we got a batch of duplicate articles in some groups. I wrote
this little script to locate the articles, and optionally to prepare a
file of rm commands which could be fed to the shell. There was a reason not
to remove them on the fly, but I don't remember it.
I hope no one else has this problem and this posting is totally useless
(but I doubt that it's true).
:
#
# finddup - find duplicate entries in news
#
# enter the group name as a series of arguments, a list of dups
# will be output. Optionally a list of rm commands may be written
# to a file for execution.
#
# Example of find only:
# finddup.sh comp arch
#
# Example of find and delete:
# finddup.sh @r delfile comp arch
# sh delfile
#
# @(#)finddup.sh v1.3, by bill davidsen, modified 1/19/88
# this code tests if the first argument is "@r". If so the next
# argument is taken as the name of an output file for the remove
# commands.
if [ "$1" = "@r" ]
then
    # convert to absolute pathname
    case "$2" in
    /*) # absolute pathname (shell pattern, not a regular expression)
        rfile="$2";;
    *)  # relative pathname
        rfile=`pwd`
        rfile=$rfile/$2;;
    esac
    shift; shift
else
    rfile=""
fi
# build the directory name
dir=$NEWS
i=1
while [ $i -le $# ]
do
    eval dir=$dir/\$$i
    if [ $i -eq 1 ]
    then
        ngname=$1
    else
        eval ngname=$ngname.\$$i
    fi
    i=`expr $i + 1`
done
# change to the directory
if [ -d "$dir" ]
then
    cd "$dir"
    echo "Scanning newsgroup $ngname"
else
    echo "$ngname - no such group"
    exit 1
fi
# are we building a remove list?
if [ -n "$rfile" ]
then
    echo "Building a remove list in $rfile"
fi
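# build the topic list
for n in [1-9]*
do
    # see if any files found
    if [ ! -f "$n" ]
    then
        # report on stderr so the message does not enter the pipeline
        echo "No files in $ngname" >&2
        exit 0
    fi
    # scan for message id: print "article-number:message-id" for the
    # first Message-ID header, then stop reading the article
    sed -n "
    /^Message-ID:/{
    s//$n:/
    p
    q
    }
    " "$n"
done |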
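# sort on the message id (second colon-separated field);
# older versions of sort spell this key as +1
sort -t: -k2 |
awk '
BEGIN {
    indup = 0;
    oldmid = "";
    FS = ":";
}
{
    # lines with equal message ids are now adjacent
    if ($2 == oldmid) {
        printf("Msg %d duplicates %d\n", $1, oldmnum);
        if (rfile != "") {
            printf("rm %s/%d\n", dir, $1) > rfile
        }
    }
    else {
        oldmid = $2;
        oldmnum = $1+0;
    }
}' rfile="$rfile" dir="$dir" -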
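
The heart of the script is the sed | sort | awk pipeline, and it can be tried
on its own without a news spool. What follows is only a sketch under some
assumptions: find_dups is a made-up helper name, the three article files are
invented test data, mktemp -d is assumed to be available, and the sort key is
written in the newer -k2 form.

```shell
#!/bin/sh
# Sketch only: find_dups and the sample articles are hypothetical,
# they are not part of finddup.sh itself.

# find_dups DIR: print "Msg N duplicates M" for each numbered article
# in DIR whose Message-ID equals that of the previous sorted line.
find_dups() {
    (
        cd "$1" || exit 1
        for n in [1-9]*
        do
            [ -f "$n" ] || continue
            # emit "N:<message id>" for the first Message-ID header only
            sed -n "/^Message-ID:/{
s//$n:/
p
q
}" "$n"
        done
    ) |
    sort -t: -k2 |
    awk -F: '
    $2 == oldmid { printf("Msg %d duplicates %d\n", $1, oldmnum); next }
    { oldmid = $2; oldmnum = $1 + 0 }'
}

# throwaway directory with three articles, two sharing a Message-ID
spool=`mktemp -d`
printf 'From: a\nMessage-ID: <100@siteA>\n\nbody\n' > "$spool/1"
printf 'From: b\nMessage-ID: <200@siteB>\n\nbody\n' > "$spool/2"
printf 'From: a\nMessage-ID: <100@siteA>\n\nbody\n' > "$spool/3"

result=`find_dups "$spool"`
echo "$result"
rm -r "$spool"
```

Sorting brings equal message ids next to each other, so awk only has to
remember the previous line. No rm commands are generated here; the sketch
only reports what it finds.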
--
bill davidsen (wedu at ge-crd.arpa)
{uunet | philabs | seismo}!steinmetz!crdos1!davidsen
"Stupidity, like virtue, is its own reward" -me