Pattern matching with awk
Michael Nolan
nolan at tssi.UUCP
Tue Mar 5 03:56:54 AEST 1991
lin at CS.WMICH.EDU (Lite Lin) writes:
> This is a simple question, but I don't see it in "Frequently Asked
>Questions", so...
> I'm trying to identify all the email addresses in email messages, i.e.,
>patterns with the format user@node. Now I can use grep/sed/awk to find
>those lines containing user@node, but I can't figure out from the manual
>how or whether I can have access to the matching pattern (it can be
>anywhere in the line, and it doesn't have to be surrounded by spaces,
>i.e., it's not necessarily a separate "field" in awk).
If you have nawk or gawk, use the match function, which sets two variables:
RSTART - the position in the string where the match begins (0 if there is no match).
RLENGTH - the length of the matched substring (-1 if there is no match).
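As a rough sketch of how these two variables combine with substr to recover the matched text from anywhere in a line: the function name show_match and the deliberately simple pattern are illustrative assumptions, not part of the original question.

```shell
# Hypothetical helper: for each input line, print RSTART, RLENGTH and the
# matched text for a deliberately simple user@node pattern.
show_match() {
    awk '{
        # match() returns the start position (0 if nothing matched)
        # and sets RSTART and RLENGTH as a side effect.
        if (match($0, /[a-z]+@[a-z]+/))
            print RSTART, RLENGTH, substr($0, RSTART, RLENGTH)
    }'
}

printf 'send it to user@node today\n' | show_match
# prints: 12 9 user@node
```

The match does not need to be a separate field; substr cuts it out by position and length.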
A pattern to match any single mail address might be rather ugly, though.
If you assume all the following:
1. Upper case and lower case letters and digits are permitted
2. Dash, underscore, and period are permitted
3. There is only one @ [I'm not sure this assumption is valid, though!]
4. There may be several ! or % in the 'user' portion
5. No commas or spaces
Then that gives a pattern something like this:
[a-zA-Z0-9.\-_%!]+@[a-zA-Z0-9.\-_]+
I've escaped the dash; it might be necessary to escape other characters
as well. Have I left anything out that might occur in strange but
otherwise valid mail addresses?
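A minimal sketch of applying such a pattern repeatedly, so that every address on a line is printed and not just the first. It assumes an awk that provides match (nawk, gawk or similar). The name extract_addresses and the sample addresses are made up for illustration. The dash is placed last in each bracket expression, where it is taken literally, as an alternative to the backslash.

```shell
# Hypothetical helper: print every address-like substring, one per line.
extract_addresses() {
    awk '{
        rest = $0
        # The dash is last in each bracket expression, so it is literal
        # without needing a backslash.
        while (match(rest, /[a-zA-Z0-9._%!-]+@[a-zA-Z0-9._-]+/)) {
            print substr(rest, RSTART, RLENGTH)
            # Continue searching just past the end of this match.
            rest = substr(rest, RSTART + RLENGTH)
        }
    }'
}

printf 'To: host!user@node.example, cc <other_user@node-2.example>\n' | extract_addresses
# prints:
# host!user@node.example
# other_user@node-2.example
```

Because a period is allowed in the node part, an address that ends a sentence would pick up the trailing period; that case would need trimming afterwards.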
------------------------------------------------------------------------------
Michael Nolan                          "Software means never having
Tailored Software Services, Inc.        to say you're finished."
Lincoln, Nebraska (402) 423-1490        --J. D. Hildebrand in UNIX REVIEW
UUCP: tssi!nolan (or try sparky!dsndata!tssi!nolan)
Internet: nolan at helios.unl.edu (if you can't get the other address to work)