yacc and lex bugs
R. Curtis Jackson
rcj at burl.UUCP
Fri Apr 20 04:15:36 AEST 1984
FIRST OFF -- AN APOLOGY: I have been informed that the Unix Hotline
folks processed my MR on yacc(1) promptly, and that, after
sitting in Murray Hill for a year, it is now considered "Under
Investigation" with a status of "We'll postpone judgement until
a later date". The Hotline people did their job admirably, and I
am sorry I blasted them without having the MR checked first.
1) yacc
a) Problem (history):
In the 'good old days' (V6), yacc would not tell you in its
debug output that it had found 'token ADDOP'; it would tell
you that it had found 'token 426'; it was up to you to find
out (by running yacc with the -d option and looking at y.tab.h)
what token 426 really was. So it was beneficial to define your
own token numbers rather than letting yacc default them;
that way they were in your source file for easy access.
Even today, if you have one lexical analyzer feeding two or
more parsers with the same tokens, you want to make sure
that the token numbers are the same in both parsers, so this
feature of yacc (being able to define your own token numbers)
is still quite valid and useful.
b) Problem:
yacc uses tables of ints to transition from state to state. The
entries are negative numbers, formed either as the negative of a
token number or as ( -(the_next_desirable_state) - 1000 ). In other
words, if you are to transition to state 53, the number in the table
will be -1053. [ I am about 90% sure this is accurate -- regardless,
I do know the problem is related to this ]. If you use token
numbers > 1000, then yacc will run perfectly, generate proper
y.output if you use the -v option, but when y.tab.c is compiled
and executed, the results are totally unpredictable. yacc will
transition to wildly inappropriate states and start generating
'Syntax error's at a phenomenal rate.
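To make the suspected collision concrete, here is a minimal C sketch of
the encoding as described above. It is NOT taken from the yacc source;
the names STATE_OFFSET, encode_state, encode_token and looks_like_state
are hypothetical, and the whole thing rests on the same "90% sure"
description of the tables.

```c
/* Hypothetical constant: the offset described in the post. */
#define STATE_OFFSET 1000

/* Encode "transition to this state" as -(state) - 1000. */
static int encode_state(int state)
{
    return -state - STATE_OFFSET;
}

/* Encode a token as the negative of its token number. */
static int encode_token(int token)
{
    return -token;
}

/* Nonzero if a table entry falls in the range used for states. */
static int looks_like_state(int entry)
{
    return entry <= -STATE_OFFSET;
}
```

Under these assumptions encode_state(53) and encode_token(1053) both
give -1053, so a reader of the table could not tell the two apart,
while a token such as 426 stays safely outside the state range.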
c) Cure:
Let yacc default its token numbers unless you absolutely cannot
get around it. If you really need that feature, don't use token
numbers over 1000. NOTE: remember to start your token numbers
above the ASCII range, or yacc will think that your ADDOP, to which
you have assigned a token number of 040, is a space, and
vice versa. If you have to assign your own token numbers *AND* you
have so many tokens that you are running over 1000, then wade through
the yacc code, find the define for that number, and increase it.
(An extremely improbable situation.)
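The advice above boils down to a range check on any token number you
assign by hand. A minimal sketch in C follows; TOKEN_MIN, TOKEN_MAX and
token_number_ok are hypothetical names, and the bounds are assumptions
drawn from this post (257 being where yacc's own defaults usually
start, and 999 being a conservative reading of "not over 1000"), not
values read out of yacc.

```c
/* Assumed bounds, based on the advice above rather than on yacc itself. */
#define TOKEN_MIN 257   /* stay clear of single-character values */
#define TOKEN_MAX 999   /* conservative: stay below 1000 */

/* Nonzero if n looks safe to use as a hand-assigned token number. */
static int token_number_ok(int n)
{
    return n >= TOKEN_MIN && n <= TOKEN_MAX;
}
```

With these bounds, 426 passes, while 040 (a space) and 1053 are both
rejected.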
2) lex
a) Problem:
lex has an input character buffer called yysbuf that is
dimensioned to YYLMAX, defined to be 200. Unfortunately, the
routine that reads the input file [ yylook() ] does not, as
far as I can tell, check to make sure that it has not gathered
into yysbuf (or yytext, which is also dimensioned to YYLMAX)
more than YYLMAX characters. If the text it is matching runs to
more than YYLMAX characters, it writes them right past the
end of yysbuf and on into 'The Memory Zone', usually producing
Memory Faults or Bus Errors somewhere down the line.
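For comparison, this is roughly the kind of bounds check that appears
to be missing. It is only a sketch in C, not lex's actual code;
BUF_MAX and bounded_append are hypothetical names, and 200 is used
simply because it is the default YYLMAX quoted above.

```c
#include <stddef.h>

#define BUF_MAX 200  /* same size as the default YYLMAX quoted above */

/* Append c to buf if room remains, keeping one byte for the '\0'.
   Returns 1 on success, 0 if the buffer is already full. */
static int bounded_append(char *buf, size_t *len, int c)
{
    if (*len + 1 >= BUF_MAX)
        return 0;               /* refuse instead of writing past the end */
    buf[(*len)++] = (char)c;
    buf[*len] = '\0';
    return 1;
}
```

A caller that feeds 300 characters into a 200-byte buffer this way gets
199 of them stored and a failure return for the rest, rather than a
Memory Fault somewhere down the line.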
b) Cure:
If you get a Memory Fault or Bus Error, and cannot seem to
locate it, put the following lines into the declarations
section of your lex program:
%{
/* ... your other declarations ... */
# undef YYLMAX
# define YYLMAX 5000 /* or some other ridiculously large number */
/* ... more of your declarations ... */
%}
This will override lex's YYLMAX define (see the lex(1)
documentation concerning overriding lex's input() macro and also
look at the first 15 lines of any lex.yy.c for details).
If your Memory Fault/Bus Error goes away, then either:
1) Your pattern specs for lex are out of line -- you are not
matching what you think you are matching -- check for rules
containing things like [^x], where x is some character. Remember
that rules like these match ANY character but x, including
newlines.
2) Your pattern specs are OK, but you are simply trying to match
more than 200 characters. Use the above method to define YYLMAX
to a reasonable number for your application and go on.
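The effect described in case 1 can be seen without lex at all. The
sketch below uses the standard C function strcspn() to stand in for
the run of characters that patterns like [^x]* and [^x\n]* would
consume; the function names are hypothetical, and this only imitates
the matching behaviour, it is not how lex implements it.

```c
#include <string.h>

/* Length of the run that a pattern like [^x]* would consume:
   everything up to the first 'x', newlines included. */
static size_t run_any_but_x(const char *s)
{
    return strcspn(s, "x");
}

/* Length of the run for [^x\n]*, which also stops at a newline. */
static size_t run_any_but_x_or_newline(const char *s)
{
    return strcspn(s, "x\n");
}
```

On the input "ab\ncd\nefx" the first function runs across both
newlines, while the second stops at the end of the first line. On real
input with no 'x' for several lines, the first form is how a match
quietly grows past YYLMAX.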
Hope this helps some people. Please direct any questions/comments to
me at the address below.
--
The MAD Programmer -- 919-228-3313 (Cornet 291)
alias: Curtis Jackson ...![ ihnp4 ulysses cbosgd clyde ]!burl!rcj