Lex and initial start conditions

Jeff Barber jeff at samna.UUCP
Sat Jun 2 01:58:02 AEST 1990


In article <1990May30.174745.1161 at csrd.uiuc.edu> pommu at iis.ethz.ch (Claude Pommerell) writes:
>However, if you put such an insertion text after "%%" (in the rules
>section of your Lex source), it gets inserted at the start of the body
>of the function that performs the lexical analysis, so you can use it
>to specify an initial condition.

That's okay for this particular situation, but it won't work in general
if your lex program is the lexical analyzer for a larger program.
Placing "BEGIN start-condition;" after the first %% causes it to be
included at the beginning of the yylex() function.

This means that every time you call the lexical analyzer for a
new token, its start condition gets reset.

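If your actions are designed to return a token to a parser (a yacc
program, for example), they'll contain statements like:
	return TOK_IDENTIFIER;
so yylex() is re-entered for every token, and any start condition
set by an earlier action is lost.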

So, a better general-purpose solution is to define some function
after the *second* %% which contains the BEGIN statement and is
called once, before the first call to yylex(), to initialize the
analyzer.

In your case, we can just create a main() function with the
BEGIN in it (you've also got some unnecessary states in
here, so I've simplified a bit):

--------------------Cut Here----------------------------
%{
#include <stdlib.h>	/* for exit() */
/* nesting depth of recursive C-like comments */
static int commentLevel = 0;
%}
/* Starting conditions to support recursive C-like comments */
%START  Text InCCom
%%
\/\*		{ ++commentLevel; BEGIN InCCom; }
<InCCom>\*\/	{ if (--commentLevel == 0) BEGIN Text; }
<Text>\*\/	{ printf("Syntax error\n"); exit(1); }
<InCCom>.	|
<InCCom>\n	{ /* Ignore stuff inside of comments 
			everything else echoed by default. */ }
%%
int main(void)
{
	/* Set the initial condition */
	BEGIN Text;
	return yylex();
}
--------------------Cut Here----------------------------

One last thing: it is possible to refer to the initial state by name
("INITIAL"), so if INITIAL were substituted for Text, no start-condition
initialization would be necessary. Our main() function wouldn't be
either; the lex library would supply one [ cc ... -ll ].

(BTW, does anybody know whether this is portable? I don't recall reading
about this INITIAL state in the documentation; I just noticed it in the
lex.yy.c output and discovered by experimentation that lex recognizes it
in an <INITIAL> rule.)

I've directed followups out of comp.lang.c.

Jeff



More information about the Comp.lang.c mailing list