POSIX Regular Expression Funnyness
Geoff Clare
gwc at root.co.uk
Wed Feb 1 21:02:22 AEST 1989
In article <4118f7b1.ae48 at apollo.COM> arnold at apollo.COM (Ken Arnold) writes:
>The POSIX proposal [] has a rework of regular expressions.
>(stuff deleted)
>
>They have added a new set of bracket expressions which stand for
>pre-defined sets of characters. For example, "[:alpha:]" is all
>alphabetic characters, "[.ch.]" is the character string ch treated as a
>single character (which is useful for sorting in many languages), and
>"[=a=]" refers to all variants of a, i.e., a, a with a circumflex, a
>with an umlaut, etc.
>
>(stuff deleted)... these new bracket expressions only have their new
>meaning inside outer brackets.
>
>Why? The only existing expressions you would break if you allowed "top
>level" [::] expressions (or [..] or [==] expressions) would be
>expressions which currently existed that contained *two* colons (or
>dots or equals), on either side. Since this is currently pointless
>redundancy, I can't believe this is a serious problem.
There are more serious problems with the new expressions than just the
obscure syntax. A short while ago I had to design some verification
tests for these new regular expressions as part of the X/Open verification
suite (the latest X/Open standard incorporates POSIX). I found some
ambiguity in the area of 2-to-1 character mappings (two characters that
collate as a single element). For example, if ch
collates between c and d, which of the following REs should match the
string "xchy"?

    x[a-[.ch.]]y
    x[a-[.ch.]]hy

The simple answer would be to create some rule about 2-to-1 character
mappings to eliminate the ambiguity. However, whichever rule is
chosen, there will be many cases where the actual behaviour is
non-intuitive, and users will not get the results they expect.
We have informed X/Open of the problem, and are waiting to see what they
come up with.
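
To make the ambiguity concrete, here is a rough sketch in Python of a toy
matcher, not any real regex implementation. Everything in it is an
assumption made for illustration: the five-element collation order, the
three candidate rules ("longest", "single", "any"), and the names
bracket_range, candidates and matches. Patterns are given as lists of
literals and element sets rather than parsed from RE syntax, and a match
must cover the whole string.

```python
# Hypothetical collation order in which "ch" sorts between "c" and "d".
COLLATION = ["a", "b", "c", "ch", "d"]

def bracket_range(lo, hi):
    """Collating elements covered by [lo-hi] under the toy order above."""
    i, j = COLLATION.index(lo), COLLATION.index(hi)
    return frozenset(COLLATION[i:j + 1])

def candidates(elements, text, pos, rule):
    """Elements the bracket may consume at pos under a given (assumed) rule."""
    found = [e for e in elements if text.startswith(e, pos)]
    if rule == "longest":      # always take the longest collating element
        return [max(found, key=len)] if found else []
    if rule == "single":       # bracket only ever consumes one character
        return [e for e in found if len(e) == 1]
    return found               # "any": try every possibility

def matches(pattern, text, rule, p=0, pos=0):
    """Backtracking match of the whole of text against pattern."""
    if p == len(pattern):
        return pos == len(text)
    item = pattern[p]
    if isinstance(item, str):  # literal
        return (text.startswith(item, pos)
                and matches(pattern, text, rule, p + 1, pos + len(item)))
    return any(matches(pattern, text, rule, p + 1, pos + len(e))
               for e in candidates(item, text, pos, rule))

A_TO_CH = bracket_range("a", "ch")
RE1 = ["x", A_TO_CH, "y"]        # x[a-[.ch.]]y
RE2 = ["x", A_TO_CH, "h", "y"]   # x[a-[.ch.]]hy

for rule in ("longest", "single", "any"):
    print(rule, matches(RE1, "xchy", rule), matches(RE2, "xchy", rule))
```

Under the "longest" rule only the first RE matches "xchy", under "single"
only the second does, and under "any" both do. Each rule is defensible,
which is the heart of the problem.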
Geoff.
--
Geoff Clare UniSoft Limited, Saunderson House, Hayne Street, London EC1A 9HH
gwc at root.co.uk ...!mcvax!ukc!root44!gwc +44-1-606-7799 FAX: +44-1-726-2750
More information about the Comp.unix.wizards mailing list