Eliminating Duplicate Mail Headers

Tue May 7 05:35:38 AEST 1991

>I'm not able to fix the mailer myself, but can pass its output
>through standard filters--awk, sed, etc.--before it goes
>out the door.  My first thought was to pass things through 'uniq',
>but this would also delete consecutive identical lines in the body (the
>mailer doesn't distinguish between header and body).  The probability
>of consecutive, identical lines in the body of mail messages seems
>low, but not low enough to chance this.
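
To make the worry about 'uniq' concrete, here is a small sketch. The sample message and the function names are made up for illustration; only the behaviour of uniq itself is standard.

```shell
#!/bin/sh
# Hypothetical sample message: one duplicated header line, a blank
# separator, and a body that legitimately repeats a line.
sample_message() {
    printf 'Subject: test\nSubject: test\n\nla la la\nla la la\n'
}

# Plain uniq knows nothing about headers, so it collapses the
# duplicated header line AND the repeated body line.
naive_filter() {
    uniq
}

sample_message | naive_filter
```

The second "la la la" in the body is lost along with the duplicate Subject line, which is exactly the damage the filter has to avoid.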

Since I haven't seen a non-Perl solution that works yet, here's mine,
in awk. Actually I have two (don't ask me why). The second is more
robust and handles all examples in the test file.

============ Start test file =======================

This is the first line
First continued
	line
Another continued
	line
Another continued
	line with extras
A repeated line
A repeated line
A repeated line
	with continuation
A repeated line
	with continuation
One more line

Body of message
Body of message
More lines

2nd paragraph
Body of message
Body of message
More lines
============ End test file =======================

============ Start 1st solution file =======================
#!/bin/awk -f

# assumes first line is not blank (doesn't modify header if it is)
# assumes continuation lines do not make a "line" unique, i.e.
#     A line followed by
#         a continuation line
# is a "duplicate" of:
#     A line followed by
#         a different continuation line

BEGIN{cont = "	"}	# tab is continuation character

/^$/,0{		# /<caret><dollar>/,0 : first blank line through end of input (body)
    print $0;
    next}

substr($0,1,1) == cont {	# don't print continuation line if first
    if (!del) {print $0}	# part of line was a repeat
    next}

prev == $0 {	# this and any continuation is repeat
    del = 1;
    next}

{		# print line since not repeat
    del = 0;
    print $0;
    prev = $0}
============ End 1st solution file =======================

============ Start 2nd solution file =======================
#!/bin/awk -f

# skips blank lines at start of file (can be printed)
# compares continuation lines

BEGIN{contflg = "	"}	# tab is continuation character

{if (!fndhdr){		# handle blank lines before header
    if ($0 == ""){
#        print $0;	# print blank lines before header
        next}
    else{
        fndhdr = 1}}}

/^$/,0{		# /<caret><dollar>/,0 : first blank line through end of input (body)
    print $0;
    next}

substr($0,1,1) == contflg {
    if (nm != 0 && nm < np && prev[nm+1] == $0){ # still seems to be a repeat
        nm++}
    else{		# line is not a repeat
        if (nm == 0){	# we already knew it was not a repeat
            np++}
        else{
            for (i=1; i<=nm; i++) # print what we thought was a repeat
                print prev[i];
            np = nm + 1;
            nm = 0}
        print $0;
        prev[np] = $0}	# keep track of continuation lines
    next}

prev[1] == $0 {	# assume line is repeat
    nm = 1;
    next}

{		# print line since not repeat
    nm = 0;
    print $0;
    np = 1;
    prev[np] = $0}
============ End 2nd solution file =======================
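
For comparison, here is a rough sketch of a different way to get the same effect. It is not either of the scripts above. It glues each header line and its continuation lines into one string and compares whole entries. It assumes a POSIX awk (user-defined functions are not in the oldest /bin/awk), a tab as the continuation character, and the first blank line after the header as the start of the body. The names dedup_headers and emit are hypothetical.

```shell
#!/bin/sh
# dedup_headers: hypothetical helper, reads a message on stdin (or from
# files given as arguments) and drops consecutive duplicate header entries.
dedup_headers() {
    awk '
    inbody { print; next }              # body: copy through untouched

    !started && $0 == "" { next }       # skip blank lines before the header
    { started = 1 }

    $0 == "" {                          # first blank line ends the header
        emit(); inbody = 1; print; next
    }

    substr($0, 1, 1) == "\t" {          # continuation: append to current entry
        entry = entry "\n" $0; next
    }

    { emit(); entry = $0; have = 1 }    # a new header entry starts here

    END { if (!inbody) emit() }         # header with no body at all

    # Print the pending entry unless it equals the one before it.
    function emit() {
        if (!have) return
        if (entry != last) print entry
        last = entry
        have = 0
    }
    ' "$@"
}
```

Because an entry is only compared once it is complete, nothing ever has to be printed after the fact. A repeat that has fewer or different continuation lines counts as a different entry and is kept.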