LEX behaviour when given "large" automata.
Rich Salz
rsalz at BBN.COM
Sat Mar 19 02:45:26 AEST 1988
In comp.compilers (<911 at ima.ISC.COM>), phs at lifia.imag.fr (Philippe Schnoebelen) writes:
> I'm having some problems with LEX. When my number of keywords/regexps is
> growing, the lexical analyzer begins to give strange, unexpected, (let's
> face it, wrong) results.
Lex ain't robust. As a work-around, you can get really big savings in
all of
	1. creation time
	2. running time
	3. executable size
by going from one pattern per keyword to a general pattern for identifiers
and doing a table lookup. That is, don't do this:
for		return(FOR);
if		return(IF);
foo		return(FOO);
[a-z]+		return(munchonit(yytext));
Do this:
%{
struct keyword { char *name; int value; } table[] =
	{ { "for", FOR }, { "if", IF }, { NULL, 0 } };
%}
%%
[a-z]+	{
		struct keyword *p;

		for (p = table; p->name; p++)
			if (strcmp(p->name, yytext) == 0)
				return(p->value);
		return(munchonit(yytext));
	}
(I left out the rest of the declarations and any optimizations on the
search loop.)
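For what it's worth, one such search-loop optimization is to keep the
table sorted by name and use bsearch() instead of the linear scan. The
sketch below is mine, not part of the original posting; FOR and IF are
assumed to come from the parser's token definitions (e.g. y.tab.h), and
kwcmp()/kwlookup() are made-up names. It also assumes 0 is never a real
token value, which is true for yacc-generated tokens.

#include <stdlib.h>
#include <string.h>

struct keyword { char *name; int value; };

/* Must stay sorted by name for bsearch() to work. */
static struct keyword table[] = { { "for", FOR }, { "if", IF } };

/* bsearch comparison: key is the identifier text, elem a table entry. */
static int
kwcmp(const void *key, const void *elem)
{
	return strcmp((const char *)key,
	    ((const struct keyword *)elem)->name);
}

/* Return the keyword's token value, or 0 for an ordinary identifier. */
static int
kwlookup(const char *text)
{
	struct keyword *p = bsearch(text, table,
	    sizeof(table) / sizeof(table[0]), sizeof(table[0]), kwcmp);

	return p ? p->value : 0;
}

The [a-z]+ action then shrinks to something like

	{ int v = kwlookup(yytext); return(v ? v : munchonit(yytext)); }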
This is a real fun thing to do: how often do you get to win on both sides of
the time-space tradeoff?
/r$
[Similar suggestion from Eddie Wyatt edw at ius1.cs.cmu.edu]
[From Rich Salz <rsalz at BBN.COM>]