Perl version of from (Was: Re: from.sed (v1.2))

Johan Vromans jv at mh.nl
Thu Jan 4 08:59:19 AEST 1990


Note: I have redirected follow-ups to comp.lang.perl.

In article <JV.89Dec21221143 at mhres.mh.nl> jv at mh.nl (Johan Vromans) writes:

| In article <1989Dec20.222732.5633 at trigraph.uucp> john at trigraph.uucp (John Chew) writes:
|    Here's a new version of from.sed, my sed script that does the job
|    of from(1) better and faster.  It now truncates long subjects,
|    correctly handles messages without subjects and From lines with %
|    or @foo: routing.
| 
|    Yes, I tried writing this in Perl.  I'm not an expert Perl programmer,
|    but I couldn't get it to run faster than about 70% slower than sed.

To which I replied:

| I've been using a perl version of 'from' for a long time, so I'll throw
| it in. [...]

| It runs about as fast as the sed version. Typical times for a large
| mailbox (46585 lines) real/user/sys 50/16/8 for sed, 50/22/7 for perl.

Script fragment:

  while ( $line = <> ) {
    chop ($line);
    # scan until "From_" header found
    next unless $line =~ /^From\s+(\S+)\s+.*(\w{3}\s+\d+\s+\d+:\d+)/;
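
To make it concrete what that pattern picks out of a "From " separator
line, here is a small self-contained sketch. The helper name
parse_from_line and the sample address are made up for illustration;
only the pattern itself is taken from the script.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): returns
# (sender, date) for a "From " separator line, and an empty list
# for any other line.
sub parse_from_line {
    my ($line) = @_;
    return unless $line =~ /^From\s+(\S+)\s+.*(\w{3}\s+\d+\s+\d+:\d+)/;
    return ($1, $2);    # $1 = envelope sender, $2 = month, day, hh:mm
}

# Sample separator line with a made-up address.
my ($who, $when) =
    parse_from_line('From someone@example.com Thu Jan  4 08:59:19 1990');
print "$who  $when\n";
```

Note that a "From:" header line does not match, because the pattern
requires whitespace directly after "From".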

John J. Chew <poslfit at gpu.UTCS.UToronto.CA> pointed out to me that
tightening the search for "From " would speed up the program by 30%.
He suggested:

  while ( <> ) {
    next unless /^From /;
    chop;
    next unless /^From\s+(\S+)\s+.*(\w{3}\s+\d+\s+\d+:\d+)/;

Well, I tried it and, NOT to my surprise, found that most of the
speedup comes from leaving out the assignment to the variable $line
and postponing the chop. Knowing how Larry likes optimisation, I
couldn't imagine that

  next unless /^From\s+(\S+)\s+.*(\w{3}\s+\d+\s+\d+:\d+)/;

would take more time to fail than

  next unless /^From /;
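
For anyone who wants to check this on their own machine, the three loop
shapes can be timed with the standard Benchmark module. This is only a
sketch under stated assumptions: the mailbox is synthetic and held in
memory (so no file I/O is measured), the sub names are made up, and a
current perl may well give different ratios than the one I used.

```perl
use strict;
use warnings;
use Benchmark qw(timethese);

my $re = qr/^From\s+(\S+)\s+.*(\w{3}\s+\d+\s+\d+:\d+)/;

# Synthetic mailbox (an assumption, not real mail): 200 messages, each
# one separator line followed by 40 lines of body text.
my @mbox;
for my $n (1 .. 200) {
    push @mbox, "From user$n\@example.com Thu Jan  4 08:59:19 1990\n";
    push @mbox, ("Just an ordinary line of body text.\n") x 40;
}

# Shape 1: copy every line into a variable and chop it before matching.
sub copy_and_chop {
    my $hits = 0;
    for (@mbox) {
        my $line = $_;
        chop $line;
        next unless $line =~ $re;
        $hits++;
    }
    return $hits;
}

# Shape 2: cheap prefix test first, full pattern only on candidates.
sub prefix_first {
    my $hits = 0;
    for (@mbox) {
        next unless /^From /;
        next unless $_ =~ $re;
        $hits++;
    }
    return $hits;
}

# Shape 3: full pattern straight on $_, no copy and no early chop.
sub full_pattern_only {
    my $hits = 0;
    for (@mbox) {
        next unless $_ =~ $re;
        $hits++;
    }
    return $hits;
}

# Run each shape 50 times and print the timings.
timethese(50, {
    copy_and_chop     => \&copy_and_chop,
    prefix_first      => \&prefix_first,
    full_pattern_only => \&full_pattern_only,
});
```

Each sub returns the number of separator lines it found, so it is easy
to confirm that all three shapes agree before comparing their times.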

With the speedups, the perl script beats the sed script on both large
and small mailboxes:

~ > wc -lc INBOX
    163   6927 INBOX
~ > dotime 5 perl src/perl.pl INBOX
      Avg  Pass 1     2     3     4     5
     ----- ------- ----- ----- ----- -----
real   0.2     0.4   0.2   0.2   0.2   0.2
user   0.0     0.0   0.0   0.0   0.0   0.0
sys    0.1     0.1   0.1   0.1   0.1   0.1
~ > dotime 5 sed -f from.sed INBOX
      Avg  Pass 1     2     3     4     5
     ----- ------- ----- ----- ----- -----
real   0.5     0.7   0.4   0.5   0.4   0.4
user   0.1     0.1   0.1   0.1   0.1   0.1
sys    0.2     0.2   0.2   0.3   0.2   0.2
~ > wc -lc maildir/pax
  46585 1240000 maildir/pax
~ > dotime 5 perl src/from.pl maildir/pax
      Avg  Pass 1     2     3     4     5
     ----- ------- ----- ----- ----- -----
real  21.9    21.9  20.3  21.1  25.7  20.7
user  14.0    14.4  14.3  14.1  13.7  13.6
sys    5.9     5.8   4.9   5.7   7.4   5.9
~ > dotime 5 sed -f from.sed maildir/pax
      Avg  Pass 1     2     3     4     5
     ----- ------- ----- ----- ----- -----
real  23.1    23.4  22.7  22.9  23.1  23.5
user  14.8    14.8  14.9  14.8  14.3  15.2
sys    7.4     7.4   7.1   7.3   7.8   7.2

I have posted the "dotime" program to alt.sources, for whoever thinks
she/he can use it.

Have fun!

Johan
--
Johan Vromans				       jv at mh.nl via internet backbones
Multihouse Automatisering bv		       uucp: ..!{uunet,hp4nl}!mh.nl!jv
Doesburgweg 7, 2803 PL Gouda, The Netherlands  phone/fax: +31 1820 62944/62500
------------------------ "Arms are made for hugging" -------------------------


