Comment recognition in Lex, again
Mark Plotnick
mp at whuxle.UUCP
Sun May 6 03:54:38 AEST 1984
The problem with
"/*"([^*]|("*"/[^/]))*"*/"
is that lex's handling of right context inside a nested regular
expression is a little nonintuitive. After lex recognizes the complete
expression, it backs up one character because of the ``/[^/]''
expression.
In case you still don't see the problem, run this lex program:
%%
a(b/c)c { printf("I saw this: %s\n", yytext); }
.       { printf("char: '%c'\n", yytext[0]); }
The first rule will NOT match ``abc'', but it will match ``abcc'',
sort of. It prints out ``I saw this: ab''.
To be safe, only use right context at the very end of your regular
expression.
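For comparison, here is a rough C sketch of what right context is supposed to mean when it sits at the end of a rule such as ``ab/c'': the context is examined but not consumed, so only ``ab'' ends up in yytext and the ``c'' is rescanned. The helper name match_with_right_context is made up for illustration, and it handles literal strings only, not regular expressions.

```c
#include <string.h>

/* Hypothetical helper, not part of lex.  For a rule of the form  r/s
   applied at the start of input, return how many characters would land
   in yytext, or 0 if the rule does not apply.  Both r and s are plain
   literal strings here, which is an assumption made to keep this short. */
size_t match_with_right_context(const char *input, const char *r, const char *s)
{
    size_t rlen = strlen(r);
    size_t slen = strlen(s);

    if (strncmp(input, r, rlen) != 0)
        return 0;               /* r itself does not match */
    if (strncmp(input + rlen, s, slen) != 0)
        return 0;               /* r matched, but the right context did not */
    return rlen;                /* s was only looked at; it stays in the input */
}
```

With the input ``abc'', r = ``ab'' and s = ``c'', this returns 2, which is the behaviour you get from lex when the right context is the last thing in the expression.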
Yet Another Way To Recognize Comments:
I really don't enjoy beating my head against a wall playing
with regular expressions and start conditions. When we
had to write a compiler a couple of years ago (any other
AM295 survivors out there?), we did something like:
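"/*"    {
#define LEXEOF 0
            /* input() returns 0 at end of file */
            int c, last_c = '\0';

            /* eat characters until a '*' is immediately followed by '/' */
            while ((c = input()) != LEXEOF) {
                if (last_c == '*' && c == '/')
                    break;
                else
                    last_c = c;
            }
            printf("comment seen\n");
            if (c == LEXEOF)
                printf("EOF within comment\n");
        }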
Moving some of the effort into the action routine allows you to easily
add more context-dependent features, such as printing a warning message
if there's a ';' within the comment, supporting nested comments, etc.
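As a rough sketch of those extensions, the loop can be pulled out into a plain C function that tracks nesting depth and counts semicolons. Everything named here (struct src, next_char, struct comment_info, skip_comment) is hypothetical: next_char stands in for lex's input() and reads from a string so the function can be tried outside of lex. It is one way to do it, not what we actually wrote back then.

```c
#include <stdio.h>

/* Hypothetical character source standing in for lex's input(). */
struct src { const char *p; };

/* Return the next character, or 0 at end of input (the LEXEOF role). */
int next_char(struct src *s)
{
    return *s->p ? (unsigned char)*s->p++ : 0;
}

/* What the scan found out about one comment. */
struct comment_info {
    int closed;      /* 1 if the outermost terminator was seen */
    int semicolons;  /* number of ';' characters inside the comment */
    int max_depth;   /* deepest nesting level reached, 1 = not nested */
};

/* Call this after the opening slash-star has already been consumed. */
struct comment_info skip_comment(struct src *s)
{
    struct comment_info info = { 0, 0, 1 };
    int depth = 1;
    int c, last_c = '\0';

    while ((c = next_char(s)) != 0) {
        if (c == ';')
            info.semicolons++;          /* caller may want to warn */

        if (last_c == '*' && c == '/') {
            if (--depth == 0) {         /* outermost comment ends here */
                info.closed = 1;
                break;
            }
            last_c = '\0';              /* do not reuse this '/' as an opener */
        } else if (last_c == '/' && c == '*') {
            depth++;                    /* nested comment opens */
            if (depth > info.max_depth)
                info.max_depth = depth;
            last_c = '\0';              /* do not reuse this '*' as a closer */
        } else {
            last_c = c;
        }
    }
    return info;
}
```

Inside a lex action you would replace next_char() with input() and, depending on taste, print the warnings directly instead of returning a struct. Resetting last_c after each opener or closer keeps a sequence like slash-star-slash from being counted twice.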
Mark Plotnick