LEX behaviour when given "large" au
carroll at snail.CS.UIUC.EDU
carroll at snail.CS.UIUC.EDU
Thu Mar 24 02:32:00 AEST 1988
/* Written 11:30 pm Mar 19, 1988 by chris at mimsy.UUCP in snail:comp.lang.c */
Several people suggested replacing lex description such as
if return (IF);
then return (THEN);
else return (ELSE);
{id} { symenter(yytext); return (ID); }
with
{id} { /* first try it as a keyword */
k = keyword_index(yytext); if (k >= 0) return (k);
/* not a keyword, must be an identifier */
symenter(yytext); return (ID); }
This may (probably does) help in lex, but it is clearly the wrong way
around in almost all cases. The lexer has to traverse a DFA to decide
that something is an ID. While it is at it it could easily tell
whether it is a keyword instead. Obviously this makes the DFA table
larger, but the result should run faster.
( ... )
/* End of text from snail:comp.lang.c */
I disagree. For a compiler class, we had to write a lex file, and
the official way was to use the
IF {return T_IF }
etc. thing. It was big and slow. I changed mine to use the single rule for
identifier, and look it up in a table, and I got a speed up of roughly 4
in lex/compile time, and 40% reduction in compiled object size. I think that
the point is that the DFA that decides if something is a token or not is
so much simpler than one that has to check for all of the keywords, that
the gain is positive from removing the keyword cases. In addition, I would
think that for most cases, you have to find non-builtin "keywords" (e.g,
variable names, etc.) as you go, and so have to check for something
that is not a keyword but is still an identifier anyway.
Alan M. Carroll amc at woodshop.cs.uiuc.edu carroll at s.cs.uiuc.edu
...{ihnp4,convex}!uiucdcs!woodshop!amc
Quote of the day :
"Touch my soul, catch the very light
Hide the moment, from my eager eyes"
More information about the Comp.lang.c
mailing list