Comment recognition in Lex, again
Chris Miller
chris at hwcs.UUCP
Fri May 18 20:13:45 AEST 1984
The following is a fully general comment recogniser for /* ... */
comments in 'lex' - I have used definitions to make it a little more
readable (I just can't cope with things like ("*"[^*]*)!).
It should be pointed out that I don't believe that this is the RIGHT
way to handle comments unless it is essential to retain their text;
comments can be very long, and trying to match them with 'lex' can
easily overflow buffers. I prefer solutions which match the opening
/* and then throw away the rest of the comment in the action routine,
using a bit of ad hoccery.
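As a rough illustration of that preference, here is a minimal sketch in C of a routine that discards the body of a comment once the opening /* has been matched. The names src, next_char and skip_comment are hypothetical and are not part of lex; the input is modelled as a string so that the fragment stands on its own, and in a real action routine the calls to next_char() would presumably be replaced by lex's own input().

```c
#include <stdio.h>

/* Hypothetical character source: a NUL-terminated string plus a cursor.
   This stands in for whatever the scanner really reads from. */
struct src {
    const char *s;
    size_t pos;
};

/* Return the next character, or EOF at the end of the string. */
static int next_char(struct src *in)
{
    int c = (unsigned char)in->s[in->pos];
    if (c == '\0')
        return EOF;
    in->pos++;
    return c;
}

/* Call after the opening slash-star has been consumed.
   Returns 0 once the closing star-slash has been consumed,
   or -1 if the input ends inside the comment. */
static int skip_comment(struct src *in)
{
    int c = next_char(in);

    while (c != EOF) {
        if (c != '*') {
            c = next_char(in);
            continue;
        }
        /* Saw a '*': fetch the following character but keep it in c,
           so that a second '*' is examined again on the next pass. */
        c = next_char(in);
        if (c == '/')
            return 0;
    }
    return -1;
}
```

Nothing is stored while skipping, so the length of the comment no longer matters to any buffer.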
____________________________________________________________________
STAR     "*"
SLASH    "/"
CSTART   ({SLASH}{STAR})
CUNIT    ([^*]|{STAR}+[^/*])
CBODY    ({CUNIT}*)
CEND     ({STAR}+{SLASH})
COMMENT  ({CSTART}{CBODY}{CEND})
%%
{COMMENT}    printf("COMMENT '%s'\n", yytext);
%%
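#include <stdlib.h>    /* for exit() */

/* Called by yylex() at end of input; there is nothing more to scan. */
int yywrap(void)
{
    exit(0);
}

/* Keep scanning until yywrap() terminates the program. */
int main(void)
{
    for (;;)
        yylex();
}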
____________________________________________________________________
One problem with the original non-working version is that it fails for
comments terminated by an EVEN number of asterisks and a /. This seems
to be a common bug in distributed compilers etc., even when they don't use
'lex' for token generation. I have encountered this bug in several C
compilers and their corresponding lints (of course, since lint usually uses
cpp), and also in the original distribution of CProlog - you may find it
entertaining to try out
    /** This is a legal comment **/
on any language systems which OUGHT to accept it. The fix is almost always
trivial - the problem comes from reading the character following an asterisk
without subsequently putting it back in the input if it happens to be
another asterisk.
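
To make the failure concrete, here is a small sketch in C of the kind of hand-written scanner loop described above, next to a corrected one. The function names end_buggy and end_fixed are made up for this illustration and are not taken from any of the compilers mentioned. Both take the text that follows the opening /* and return the index just past the closing delimiter, or -1 if they never find it.

```c
#include <stddef.h>

/* Buggy: the character after a '*' is always consumed, so when it is
   itself a '*' it is never considered as the start of the terminator. */
static long end_buggy(const char *s)
{
    size_t i = 0;

    while (s[i] != '\0') {
        if (s[i++] == '*') {
            if (s[i] == '\0')
                break;
            if (s[i++] == '/')      /* consumed whether or not it matched */
                return (long)i;
        }
    }
    return -1;
}

/* Fixed: the character after a '*' is only peeked at, so a second '*'
   is still there to be examined by the next iteration. */
static long end_fixed(const char *s)
{
    size_t i = 0;

    while (s[i] != '\0') {
        if (s[i++] == '*' && s[i] == '/')
            return (long)(i + 1);
    }
    return -1;
}
```

Given " odd ***/" both functions return 9, but given " even **/" end_buggy returns -1 (it runs off the end of the text looking for a terminator) while end_fixed returns 9.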
Chris Miller
Heriot-Watt Computer Science
...ukc!edcaad!hwcs