lex grammar for C comments
Michael Gwilliam
michael at nyit.UUCP
Tue Apr 5 02:49:45 AEST 1988
NOTE: Sorry this reply took so long, but our phone line was out for a long
time.
-----
Well, the information is back and I've summarized the replies. In case
you forgot, the question was, "Can C comments be filtered out with
LEX as regular expressions?"
The answer is, "Yes, but it may not be a good idea."
The reasons are...
o The regular expression is nearly impossible to read.
o A long comment could overflow the buffer (the entire comment becomes
  one token in yytext).
The correct way of doing this seems to be:
You could use states, something like this (I might have the syntax
a bit wrong):
"/*"            { BEGIN COMMENT; }
<COMMENT>"*/"   { BEGIN 0; }
<COMMENT>.      |
<COMMENT>\n     ;
The problem is that this requires you to set up states for everything,
which is a pain.
Here's what I did -- built my own little automaton inside the action
for the "/*" pattern. This is stripped out of working code.
"/*"    {
            /* Comment. */
            register enum { S_STAR, S_NORMAL, S_END } S;

            for (S = S_NORMAL; S != S_END; )
                switch (input()) {
                case '\0':
                    /* Complain about premature EOF? */
                    S = S_END;
                    break;
                case '*':
                    S = S_STAR;
                    break;
                case '/':
                    if (S == S_STAR) {
                        S = S_END;
                        break;
                    }
                    /* FALLTHROUGH */
                default:
                    S = S_NORMAL;
                    break;
                }
        }
(credit goes to rsalz)
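For anyone who wants to try that automaton outside of lex, here is a
minimal sketch of the same three-state loop as a standalone C function.
It is an illustration only, not rsalz's code: the name skip_comment and
the choice to walk a NUL-terminated string instead of calling input()
are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of the original posting.
 * p points just past an opening slash-star.  Returns a pointer just
 * past the closing star-slash, or NULL if the string ends first.
 */
static const char *skip_comment(const char *p)
{
    enum { S_NORMAL, S_STAR, S_END } s = S_NORMAL;

    assert(p != NULL);
    while (s != S_END) {
        switch (*p++) {
        case '\0':
            return NULL;        /* premature end of input */
        case '*':
            s = S_STAR;         /* a run of stars stays in S_STAR */
            break;
        case '/':
            if (s == S_STAR) {
                s = S_END;      /* star then slash: comment is closed */
                break;
            }
            /* FALLTHROUGH */
        default:
            s = S_NORMAL;
            break;
        }
    }
    return p;
}
```

Called with a pointer to the text just after a "/*", it hands back a
pointer to the first character following the matching "*/".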
Another method uses states.
%START Normal Comment
%%
{ BEGIN Normal; }
<Normal>"/*" { ECHO; BEGIN Comment; }
<Comment>"*/" { ECHO; printf("\n"); BEGIN Normal; }
<Comment>\ |
<Comment>[^ \t\n*]+ |
<Comment>"*"/[^/] |
<Comment>. |
<Comment>\n { ECHO; }
<Normal>. |
<Normal>\n { }
(credit goes to Tony Hansen)
If you're hard set on doing this, a good reference seems to be...
_Introduction_to_Compiler_Construction_with_Unix_, by Axel T. Schreiner and
H. George Friedman, Jr., Prentice-Hall, 1985, on page 25 gives:
"/*""/"*([^*/]|[^*]"/"|"*"[^/])*"*"*"*/".
The reason that the expression I used was accepting nested comments
is that lex tries to match the longest possible string.
Nested comments cannot be described by a regular expression, so they are
hopeless without writing a little C code. I never really wanted to do them
anyway; I guess I just didn't make myself clear. (Besides, I'm told they're
not ANSI.)
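For the curious, the "little C code" for nested comments can be as small
as a depth counter. The following is only a sketch under assumptions of
my own: the name skip_nested_comment is hypothetical, and it scans a
NUL-terminated string rather than reading through lex's input().

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper.  p points just past an opening slash-star.
 * Returns a pointer just past the star-slash that closes the outermost
 * comment, or NULL if the input ends while a comment is still open.
 */
static const char *skip_nested_comment(const char *p)
{
    int depth = 1;              /* the first opener was already consumed */

    assert(p != NULL);
    while (depth > 0) {
        if (p[0] == '\0')
            return NULL;        /* unterminated comment */
        if (p[0] == '/' && p[1] == '*') {
            depth++;            /* inner opener */
            p += 2;
        } else if (p[0] == '*' && p[1] == '/') {
            depth--;            /* closer for the current level */
            p += 2;
        } else {
            p++;
        }
    }
    return p;
}
```

The counter is exactly the part a regular expression cannot express: it
has to remember how many openers are still unmatched.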
Thanks for all the help from...
Erik Baalbergen <mcvax!cs.vu.nl!erikb at uunet>
Kjell Post <cmcl2!ida.liu.se!kpo>
MH Cox <rutgers!garage.nj.att.com!mhc at gatech>
R. Nigel Horspool <rutgers!uw-beaver!uvicctr!nigelh at gatech>
cmcl2!gondor!psuvax1!gondor!schmidt at uiucdcs (David E. Schmidt)
cmcl2!harvard!pineapple.bbn.com!rsalz
harvard!gsg!gsgpyr!lew at linus (Paul Lew)
harvard!ll-xn!ames!sdcsvax!sdcc6.UCSD.EDU!ix426 at linus (Tom Stockfisch)
sbcs!mmintl!franka at pwa-b
sbcs!pegasus!hansen at cbosgd
and I hope to goodness I gave proper credit to everyone.
michael