My SPOOLNEWS patch for NNTP
David Herron -- One of the vertebrae
david at ms.uky.edu
Tue Aug 9 13:00:13 AEST 1988
As Matt pointed out before, I may lose big time by installing a patch
like the one I'd made. He was concerned that I'd have a lot of
articles arriving from many different directions at the same time, and
that since they weren't being inserted immediately upon arrival, I'd
have 2 or 3 copies arrive and 1 or 2 of them junked almost
immediately.
So I decided to give it a look-see, and the early results are discouraging.

I used to have my background scripts run every 15 minutes. I later
moved that up to 10 minutes, and later to 5, each time because I was
seeing duplicate articles in SPOOLDIR/.rnews. (I have one script for
unbatching news and another for batching, with locking to ensure that
only one is running at a time; the two scripts are offset from each
other by a couple of minutes.)
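For concreteness, here's roughly what that arrangement looks like --
the crontab times and the lock path below are illustrative, not my
exact setup:

	# crontab: unbatch every 5 minutes, batching offset by 2
	0,5,10,15,20,25,30,35,40,45,50,55 * * * * /usr/lib/news/unbatch.sh
	2,7,12,17,22,27,32,37,42,47,52,57 * * * * /usr/lib/news/batch.sh

	# at the top of each script: mkdir is atomic, so it makes
	# a cheap mutex between the two jobs
	LOCK=/usr/spool/news/.newslock
	if mkdir $LOCK 2>/dev/null
	then
		trap 'rmdir $LOCK' 0
		# ... do the unbatching (or batching) here ...
	else
		exit 0	# the other job has the lock; try again next run
	fi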
To examine the logs I ran the following script:
awk <log '$5 == "Duplicate" {print $7}' | sort -u
which gives me a list of message-ids which are duplicated. A different
version has just "print" and "sort +6 -7" instead, which gives me all
the lines about duplicate articles from the log file, sorted by message
id. I haven't taken the output of the above command to its logical
conclusion yet:
while read id; do
	grep "$id" log
done

to give me all the lines talking about the message-ids which had been
duplicated.
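Spelled out, that conclusion would presumably be the two hooked
together (I haven't actually run this end to end yet):

	awk <log '$5 == "Duplicate" {print $7}' | sort -u |
	while read id; do
		grep "$id" log
	done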
To make the rest of this story short ...
I saw what seems like a lot of articles arrive here 2 or more times (a
total of 515 message-ids were duplicated, with 653 total duplicates, on
the traffic from about 4 am til around midnight; I don't know offhand
how many total articles arrived in that time). Of those 653, 188 were
from psuvm.bitnet, a neighbor known to give us lots of duplicates and
which I'll have to deal with sometime Real Soon Now. I don't know how
many of the duplicates *only* came through psuvm.
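One way to find out, assuming the sending host shows up somewhere on
each "Duplicate" line of the log (which I haven't verified), would be
something like this, where /tmp/dups is just a scratch file:

	# pull out all the Duplicate lines once
	awk <log '$5 == "Duplicate"' >/tmp/dups
	# count the message-ids whose Duplicate lines all mention psuvm
	awk </tmp/dups '{print $7}' | sort -u |
	while read id; do
		if grep "$id" /tmp/dups | grep -v psuvm >/dev/null
		then
			:	# at least one copy came from elsewhere
		else
			echo "$id"	# duplicated only via psuvm
		fi
	done | wc -l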
A quick reading of the log file showed that, when we had duplicates,
we were most often getting two copies of the same article within 2-3
minutes of each other, less often getting 3, and even less often
getting 4. Since I don't know what the total traffic for that time
period was, I don't know how significant those numbers are. A look at
some slightly old reports (June) shows ~25000 articles per week, or
~3500 per day. If that's accurate then we're seeing a 10-20 percent
rejection rate (653 duplicates against ~3500 daily articles comes to
nearly 19%). Which maybe isn't so bad after all.
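If field 5 of the log really does carry the disposition, the total
traffic for the same period ought to fall out of the same file, which
would settle the percentage question. Something like

	awk <log '{print $5}' | sort | uniq -c | sort -nr

should show how many lines say "Duplicate" versus whatever accepted
articles get logged as (I'd have to check the exact wording the
server uses).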
How does this stack up against other people's experiences?
--
<---- David Herron -- The E-Mail guy <david at ms.uky.edu>
<---- ska: David le casse\*' {rutgers,uunet}!ukma!david, david at UKMA.BITNET
<----
<---- Looking forward to a particularly blatant, talkative and period bikini ...