Isolating alphanumeric words with regular expressions
Jim Campbell
campbell at lotus.com
Tue Sep 4 06:13:29 AEST 1990
I have an editing script that should append ".o" to every word in an input
line, except that a word which already contains a "." should be left
unchanged.
I have been a UNIX enthusiast for several years now, but for the life of me,
I can't figure out how to solve what seems to be a simple problem.
Here is what I have tried:
s/\([^.A-Za-z0-9_]*\)\([^. ][^. ]*\)/\1\2.o/g
This doesn't do it, since if the input line looks like this:
abc bar foo.obj fooie baby
the regular expression will fail to match the entire word "foo.obj", but
will match "foo" and "obj" separately, yielding this:
abc.o bar.o foo.o.obj.o fooie.o baby.o
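The result above can be reproduced from a shell with the minimal sketch
below. It assumes a POSIX sed is on the PATH; the variable names "line" and
"out" are illustrative only and are not part of the original editing script.

```shell
# Sample input line from the post.
line='abc bar foo.obj fooie baby'

# First attempt: group 2 can never contain a ".", so "foo" and "obj"
# are matched as two separate words and each one gets its own ".o".
out=$(printf '%s\n' "$line" |
    sed 's/\([^.A-Za-z0-9_]*\)\([^. ][^. ]*\)/\1\2.o/g')

printf '%s\n' "$out"
```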
If you do this:
s/\([^.A-Za-z0-9_]*\)\([^. ][^. ]*\)\([^.]*\)/\1\2.o\3/g
the third "\(...\)" group, "[^.]*", greedily swallows everything up to the
next "." (spaces and whole words included), so only the first word of each
dot-free run gets a ".o", like this:
abc.o bar foo.obj.o fooie baby
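The same kind of sketch shows the second attempt in action. As before, it
assumes a POSIX sed, and "line" and "out" are illustrative names only.

```shell
# Sample input line from the post.
line='abc bar foo.obj fooie baby'

# Second attempt: the extra third group, [^.]*, matches any run of
# characters other than ".", so after the first word it also consumes
# the spaces and words that follow, up to the next "." or end of line.
out=$(printf '%s\n' "$line" |
    sed 's/\([^.A-Za-z0-9_]*\)\([^. ][^. ]*\)\([^.]*\)/\1\2.o\3/g')

printf '%s\n' "$out"
```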
I have spent a lot of time on this one little problem, and I am wondering if
anyone out there knows of a solution.
(Yes -- I know it can be solved with two substitution operations, but I
am looking for a way to do it with one.)
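For comparison, one possible form of the two-substitution route is sketched
below. Both expressions are an assumption about how it could be done, not the
only way: the first appends ".o" to every word, and the second strips the
".o" again from any word that already had a "." in it. The sketch assumes
that words are separated by spaces only; "line" and "out" are illustrative
names.

```shell
# Sample input line from the post.
line='abc bar foo.obj fooie baby'

# Step 1: append ".o" to every space-delimited word.
# Step 2: where a "." is followed later in the same word by the
#         appended ".o", drop that trailing ".o" again.
out=$(printf '%s\n' "$line" |
    sed -e 's/[^ ][^ ]*/&.o/g' \
        -e 's/\(\.[^ ]*\)\.o/\1/g')

printf '%s\n' "$out"
```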
--
Jim Campbell, Lotus Development Corporation |  harvard!ima \
1 Rogers St., Cambridge, MA 02142           |  ihnp4        >!lotus!campbell
617/693-5652                                |  uunet       /
More information about the Comp.unix.questions mailing list