LEX
Richard A. O'Keefe
ok at quintus.UUCP
Mon Feb 8 18:57:33 AEST 1988
In article <248 at goofy.megatest.UUCP>, djones at megatest.UUCP (Dave Jones) writes:
> In article <260 at nyit.UUCP>, michael at nyit.UUCP (Michael Gwilliam) writes:
> > I'm writing a C-like language to describe data structures. When I
> > was writing the tokenizer using LEX, I got intrigued by a little
> > problem. Is it possible to write a regular expression that will
> > transform a /* comment */ into nothing?
> It is indeed intriguing. I don't think you can write any
> LR(k) context-free grammar to "transform it" into anything.
It is quite straightforward to write a Yacc grammar which matches
comments, so LR(1) is not just possible, it's easy.
Here it is. (OTHER is any character but / or *.)
%token '/' '*' OTHER
%start comment
%%
comment : '/' '*' rest
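;
/* rest: inside the body; the last character read was not a star */
rest : OTHER rest
| '/' rest
| '*' hope
;
/* hope: the last character read was a star, so a slash closes the comment */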
hope : '/'
| '*' hope
| OTHER rest
;
It is possible to express this directly as a single RE:
{/}{*} ({OTHER}|{/})* {*}+ ({OTHER} ({OTHER}|{/})* {*}+)* {/}
but as I don't use LEX, I don't know how to say this in LEXese.
Note that *nested* comments cannot be expressed as a RE, but can
be handled by a simple Yacc grammar.
But if all you want to do is to skip C-like comments, why not RTM?
In "Chapter 5: LEX" of the "UNIX System V Programmer's Guide",
we find EXACTLY this example on page 207 of the 1987 edition.
...
"/*" skipcmnts();
...
%%
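skipcmnts()
{
    int c;

    for (;;) {
        while ((c = input()) != '*' && c != 0 && c != EOF)
            ;
        if (c != '*')
            return;                 /* end of file inside the comment */
        if ((c = input()) == '/' || c == 0 || c == EOF)
            return;                 /* closed, or end of file */
        unput(c);                   /* c might be another star */
    }
}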
{I've hacked it around a bit.} Note that this stores the comment in
the yytext[] buffer, which is a Good Thing if you want to write the
comment out somewhere, but it can overflow. yyless() can be used to
keep that buffer empty.
Frankly, I've always found it easier to transcribe the state machine
directly into C.