what should egrep '|root' /etc/passwd print?

Mon Sep 19 13:55:58 AEST 1988

In article <8209 at alice.UUCP> andrew at alice.UUCP (Andrew Hume) writes:
>it sounds appealing to allow a missing RE to mean the empty string
>but i am unconvinced as to its utility.

If x, y, and z are regular expressions, then xyz matches those strings
which can be formed by concatenating any three strings X, Y, and Z
where x matches X, y matches Y, and z matches Z.  The expression 'x|y'
matches any string that is matched by x or y.

So, suppose y=''.  Let x='aa' and z='bb'.  Then xyz='aabb'.  'aa' is
the only string x matches, 'bb' is the only string z matches, and
'aabb' is the only string xyz matches.  The only thing left for y to
match is the null string between 'aa' and 'bb'.  Therefore, the null
expression matches the null string.
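
The argument above can be checked mechanically.  What follows is only a
minimal sketch, assuming Python's re module as a stand-in for a matcher
that accepts an empty expression; it is not egrep, and the variable
names are just the x, y, and z used above.

```python
import re

# The three expressions from the argument: y is the empty expression.
x, y, z = "aa", "", "bb"
xyz = x + y + z  # concatenating the expressions gives 'aabb'

# The empty expression matches the empty string in full, and nothing longer.
empty_matches_empty = re.fullmatch(y, "") is not None
empty_matches_a = re.fullmatch(y, "a") is not None

# The concatenation matches 'aabb', with y matching the gap in the middle.
xyz_matches_aabb = re.fullmatch(xyz, "aabb") is not None
xyz_matches_aab = re.fullmatch(xyz, "aab") is not None
```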

Let x='' and y='root', so that x|y = '|root'.  Then x|y matches the null
string (because x matches it) and the string 'root' (because y matches
it).  So the egrep command in the subject line should print out all of
/etc/passwd, since every line contains the null string.
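
To illustrate the behaviour I am arguing for, here is a small sketch of
a grep-like filter.  It assumes Python's re module, which happens to
accept an empty alternative; real egrep implementations may differ,
which is the point under discussion.  The grep() helper and the sample
lines are made up for the example and are not taken from any real
/etc/passwd.

```python
import re

def grep(pattern, lines):
    """Return the lines that contain a match for pattern anywhere."""
    rx = re.compile(pattern)
    return [line for line in lines if rx.search(line)]

# Hypothetical sample data, including an empty line.
sample = [
    "root:x:0:0:root:/root:/bin/sh",
    "daemon:x:1:1:daemon:/usr/sbin:/bin/false",
    "",
]

# The empty alternative matches at the start of every line,
# so every line is selected, even the ones without 'root'.
selected = grep("|root", sample)

# Without the empty alternative, only lines containing 'root' are selected.
only_root = grep("root", sample)
```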

This is intuitively obvious to me, but I tried to prove it because I'm
not sure other people's intuitions are similar to mine.

As for utility, consider the case, which I have actually run into,
where I wanted an expression like 'aa(|bb)cc' to match the strings
'aacc' and 'aabbcc'.  In this case, it's clear I want the expression
in parentheses to match the null string.  The program I was using
wouldn't let me do this, and I had to use something like 'a(a|abb)cc'
to get what I wanted.  If I had had a program generate that expression,
I would have had to add code to detect this special case and rewrite
the regular expression.  Yecch.
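
For what it's worth, the two expressions can be compared directly.  This
is a sketch under the assumption that the matcher allows an empty
alternative, as Python's re module does; the list of candidate strings
is my own choice for illustration.

```python
import re

# The expression I wanted to write, with an empty alternative.
wanted = re.compile(r"aa(|bb)cc")
# The workaround that avoids the empty alternative.
workaround = re.compile(r"a(a|abb)cc")

# Hypothetical test strings: two that should match, three that should not.
candidates = ["aacc", "aabbcc", "aabcc", "aabbbbcc", "acc"]

# Keep only the strings each expression matches in full.
accepted_wanted = [s for s in candidates if wanted.fullmatch(s)]
accepted_workaround = [s for s in candidates if workaround.fullmatch(s)]
```

On these candidates both expressions accept the same strings, 'aacc'
and 'aabbcc', but the first one says what is meant.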

-- 
David Canzi