Perl version of from (Was: Re: from.sed (v1.2))
Johan Vromans
jv at mh.nl
Thu Jan 4 08:59:19 AEST 1990
Note: I have redirected follow-ups to comp.lang.perl.
In article <JV.89Dec21221143 at mhres.mh.nl> jv at mh.nl (Johan Vromans) writes:
| In article <1989Dec20.222732.5633 at trigraph.uucp> john at trigraph.uucp (John Chew) writes:
| Here's a new version of from.sed, my sed script that does the job
| of from(1) better and faster. It now truncates long subjects,
| correctly handles messages without subjects and From lines with %
| or @foo: routing.
|
| Yes, I tried writing this in Perl. I'm not an expert Perl programmer,
| but I couldn't get it to run faster than about 70% slower than sed.
To which I replied:
| I've been using a perl version of 'from' for a long time, so I throw it
| in. [...]
| It runs about as fast as the sed version. Typical times for a large
| mailbox (46585 lines) real/user/sys 50/16/8 for sed, 50/22/7 for perl.
Script fragment:
    while ( $line = <> ) {
      chop ($line);
      # scan until "From_" header found
      next unless $line =~ /^From\s+(\S+)\s+.*(\w{3}\s+\d+\s+\d+:\d+)/;
John J. Chew <poslfit at gpu.UTCS.UToronto.CA> pointed out to me that
tightening the search for "From " would speed up the program by 30%.
He suggested:
    while ( <> ) {
      next unless /^From /;
      chop;    # the line is in $_ now; $line is no longer assigned
      next unless /^From\s+(\S+)\s+.*(\w{3}\s+\d+\s+\d+:\d+)/;
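For readers who want to run the suggested loop, here is a minimal
self-contained sketch of it. It is not the from.pl script itself: the
helper name scan_from_lines, the sample mailbox, and the output format are
made up for illustration, and it uses chomp (which current Perls provide)
where the fragment above uses chop.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): returns a
# [sender, date] pair for every "From " envelope line read from $fh.
sub scan_from_lines {
    my ($fh) = @_;
    my @hits;
    while (<$fh>) {
        # Cheap literal-prefix test first; most lines are rejected here.
        next unless /^From /;
        # Only the survivors are stripped of their newline and parsed.
        chomp;
        next unless /^From\s+(\S+)\s+.*(\w{3}\s+\d+\s+\d+:\d+)/;
        push @hits, [ $1, $2 ];
    }
    return @hits;
}

# Made-up sample mailbox, for illustration only.
my $sample = <<'EOF';
From alice@example.com Thu Jan  4 08:59:19 1990
Subject: Perl version of from

From the body of a message, not an envelope line
From bob@example.org Wed Dec 20 22:27:32 1989
Subject: from.sed (v1.2)
EOF

# Read the sample through an in-memory filehandle.
open(my $fh, '<', \$sample) or die "cannot open in-memory file: $!";
printf "%-24s %s\n", @$_ for scan_from_lines($fh);
```

The third line of the sample passes the cheap /^From / test but fails the
full pattern, which is why the second check is still needed.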
Well, I tried it, and -NOT to my surprise- I found out that the major
speedup is caused by leaving out the assignment to the variable $line
and postponing the chop. I couldn't imagine (knowing how Larry likes
optimisation) that

    next unless /^From\s+(\S+)\s+.*(\w{3}\s+\d+\s+\d+:\d+)/;

would take more time to fail than

    next unless /^From /;
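
The effect of the per-line copy can be checked with Perl's standard
Benchmark module. The sketch below is only an approximation under stated
assumptions: the input is generated (one made-up envelope line per 50
lines), the sub names with_copy and without_copy are hypothetical, and the
numbers it prints depend on the Perl version and machine, so they need not
resemble the figures in this post.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Made-up input: one envelope line per 50 lines, the rest body text.
my @lines;
for my $i (1 .. 5000) {
    push @lines, $i % 50 == 0
        ? "From user$i\@example.com Thu Jan  4 08:59:19 1990\n"
        : "body line number $i\n";
}

# First shape: copy every line into $line and chop it before matching.
sub with_copy {
    my $hits = 0;
    for (@lines) {
        my $line = $_;
        chop($line);
        next unless $line =~ /^From\s+(\S+)\s+.*(\w{3}\s+\d+\s+\d+:\d+)/;
        $hits++;
    }
    return $hits;
}

# Second shape: test $_ directly; copy and chop only after a cheap hit.
sub without_copy {
    my $hits = 0;
    for (@lines) {
        next unless /^From /;
        my $line = $_;
        chop($line);
        next unless $line =~ /^From\s+(\S+)\s+.*(\w{3}\s+\d+\s+\d+:\d+)/;
        $hits++;
    }
    return $hits;
}

# Both shapes must agree before their timings are worth comparing.
die "variants disagree\n" unless with_copy() == without_copy();

# Run each variant for at least one CPU second and print a comparison table.
cmpthese(-1, { with_copy => \&with_copy, without_copy => \&without_copy });
```

To isolate the regex question from the copy question, the /^From / line in
without_copy can be removed and the benchmark run again.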
With the speedups, the perl script beats the sed script on both large
and small mailboxes:
~ > wc -lc INBOX
163 6927 INBOX
~ > dotime 5 perl src/perl.pl INBOX
        Avg   Pass 1      2      3      4      5
      -----  -------  -----  -----  -----  -----
real    0.2      0.4    0.2    0.2    0.2    0.2
user    0.0      0.0    0.0    0.0    0.0    0.0
sys     0.1      0.1    0.1    0.1    0.1    0.1
~ > dotime 5 sed -f from.sed INBOX
        Avg   Pass 1      2      3      4      5
      -----  -------  -----  -----  -----  -----
real    0.5      0.7    0.4    0.5    0.4    0.4
user    0.1      0.1    0.1    0.1    0.1    0.1
sys     0.2      0.2    0.2    0.3    0.2    0.2
~ > wc -lc maildir/pax
46585 1240000 maildir/pax
~ > dotime 5 perl src/from.pl maildir/pax
        Avg   Pass 1      2      3      4      5
      -----  -------  -----  -----  -----  -----
real   21.9     21.9   20.3   21.1   25.7   20.7
user   14.0     14.4   14.3   14.1   13.7   13.6
sys     5.9      5.8    4.9    5.7    7.4    5.9
~ > dotime 5 sed -f from.sed maildir/pax
        Avg   Pass 1      2      3      4      5
      -----  -------  -----  -----  -----  -----
real   23.1     23.4   22.7   22.9   23.1   23.5
user   14.8     14.8   14.9   14.8   14.3   15.2
sys     7.4      7.4    7.1    7.3    7.8    7.2
I have posted the "dotime" program to alt.sources, for whoever thinks
she/he can use it.
Have fun!
Johan
--
Johan Vromans jv at mh.nl via internet backbones
Multihouse Automatisering bv uucp: ..!{uunet,hp4nl}!mh.nl!jv
Doesburgweg 7, 2803 PL Gouda, The Netherlands phone/fax: +31 1820 62944/62500
------------------------ "Arms are made for hugging" -------------------------