Eliminating Duplicate Mail Headers
Gary Weimer 253-7796
weimer at garden.ssd.kodak.com
Tue May 7 05:35:38 AEST 1991
>I'm not able to fix the mailer myself, but can pass its output
>through standard filters--awk, sed, etc.--before it goes
>out the door. My first thought was to pass things through 'uniq',
>but this would also delete consecutive identical lines in the body (the
>mailer doesn't distinguish between header and body). The probability
>of consecutive, identical lines in the body of mail messages seems
>low, but not low enough to chance this.
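To make the problem with plain 'uniq' concrete, here is a small sketch. The sample message and the function name `sample` are made up for illustration; nothing here comes from the real mailer.

```sh
#!/bin/sh
# Made-up sample: one duplicated header line, a blank separator line,
# and a body that legitimately repeats a line.
sample() {
    printf 'Subject: hi\nSubject: hi\n\nla la la\nla la la\n'
}

# uniq collapses every run of identical adjacent lines, so the repeated
# body line is lost along with the duplicate header.
sample | uniq
```

The duplicate header goes away as wanted, but the second "la la la" in the body goes with it.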
Since I haven't seen a non-perl solution that works yet, here's mine.
Actually I have two (don't ask me why). The second is more robust and
handles all examples in the test file.
============ Start test file =======================
This is the first line
First continued
	line
Another continued
	line
Another continued
	line with extras
A repeated line
A repeated line
A repeated line
	with continuation
A repeated line
	with continuation
One more line

Body of message
Body of message
More lines
2nd paragraph
Body of message
Body of message
More lines
============ End test file =======================
============ Start 1st solution file =======================
#!/bin/awk -f
# assumes first line is not blank (doesn't modify header if it is)
# assumes continuation lines do not make a "line" unique, i.e.
#     A line followed by
#         a continuation line
# is a "duplicate" of:
#     A line followed by
#         a different continuation line
BEGIN { cont = "\t" }           # tab is continuation character
/^$/,0 {                        # from first blank line to end of input
    print $0;                   # (an end pattern of 0 never matches)
    next}
substr($0,1,1) == cont {        # don't print continuation line if first
    if (!del) {print $0}        # part of line was a repeat
    next}
prev == $0 {                    # this and any continuation is repeat
    del = 1;
    next}
{                               # print line since not repeat
    del = 0;
    print $0;
    prev = $0}
============ End 1st solution file =======================
============ Start 2nd solution file =======================
#!/bin/awk -f
# skips blank lines at start of file (can be printed)
# compares continuation lines
BEGIN { contflg = "\t" }        # tab is continuation character
{   if (!fndhdr) {              # handle blank lines before header
        if ($0 == "") {
#           print $0;           # print blank lines before header
            next}
        else {
            fndhdr = 1}}}
/^$/,0 {                        # from first blank line to end of input
    print $0;                   # (an end pattern of 0 never matches)
    next}
substr($0,1,1) == contflg {
    if (nm != 0 && nm < np && prev[nm+1] == $0) {   # still seems to be repeat
        nm++}
    else {                      # line is not a repeat
        if (nm == 0) {          # we already knew was not repeat
            np++}
        else {
            for (i=1; i<=nm; i++)   # print what we thought was a repeat
                print prev[i];
            np = nm + 1;
            nm = 0}
        print $0;
        prev[np] = $0}          # keep track of continuation lines
    next}
prev[1] == $0 {                 # assume line is repeat
    nm = 1;
    next}
{                               # print line since not repeat
    nm = 0;
    print $0;
    np = 1;
    prev[np] = $0}
============ End 2nd solution file =======================
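
For comparison, the same idea can be sketched by gluing each header line to its continuation lines and comparing the glued records as a whole. This is only a sketch under assumptions, not a replacement for the scripts above: the names `dedup_headers` and `emit` are made up, it assumes an awk with user-defined functions (nawk or any POSIX awk), it treats a leading space as well as a leading tab as a continuation, and it assumes the first line of the message is not blank.

```sh
#!/bin/sh
# dedup_headers (hypothetical name): drop consecutive duplicate header
# "lines", where a "line" is a header line plus its continuation lines.
# Assumes the header ends at the first blank line.
dedup_headers() {
    awk '
    function emit() {               # print pending record unless it repeats
        if (have && rec != last) { print rec; last = rec }
        have = 0
    }
    body { print; next }            # body: copy through untouched
    /^$/ {                          # first blank line: header is over
        emit(); body = 1; print; next
    }
    /^[ \t]/ && have {              # continuation: glue onto pending record
        rec = rec "\n" $0; next
    }
    {                               # new header line starts a new record
        emit(); rec = $0 ""; have = 1
    }
    END { if (!body) emit() }       # input that has no body at all
    '
}
```

The function reads the message on standard input and writes the filtered message to standard output, so it can sit in a pipeline the same way the two scripts do. Because a record is only printed once the next header line (or the blank line) shows up, a repeat whose continuation lines differ is kept whole, and one with fewer continuation lines than the line before it is also kept.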