Re^2: Why nested comments not allowed?
Mike Coffin
mike at cs.arizona.edu
Tue Feb 20 07:29:50 AEST 1990
From article <4320 at daffy.cs.wisc.edu>, by schaut at cat9.cs.wisc.edu (Rick Schaut):
> I think you've missed the point. In compilers for languages that do not
> allow nested comments, the parser never sees the comment at all. The comments
> are eaten by the scanner (which is a much simpler part of the compiler than
> is a parser). Essentially, any language that requires balancing characters
> (e.g. the language of balanced parens) cannot be represented using regular
> expressions, and regular expressions are the construct upon which scanners
> are based. In short, a compiler for a language that doesn't allow nested
> comments is _much_ faster than a compiler for a language that allows them.
The last sentence doesn't follow from the rest of the paragraph.
Scanners may be *based* on regular expressions, but the popular
scanners (Lex, Flex, and friends) are not *restricted* to regular
expressions. In fact, as people have often pointed out, matching
whole comments with regular expressions can be dangerous with some
scanners, because long comments can overflow fixed-size buffers. A common
work-around is to detect the beginning of a comment by a regular
expression and call a function (in C, perhaps) to eat the rest of the
comment. This avoids the buffer-overflow problems and makes it
trivial to parse nested comments---just count the number of
<begin-comment> tokens and match them with <end-comment> tokens.
Nothing slow about that.
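A minimal sketch of such a comment-eating function follows. It assumes a hypothetical language whose comments are delimited by "/*" and "*/" and *do* nest (C's own comments do not), and that the scanner's regular expression has already consumed the opening delimiter. The name skip_nested_comment is made up for illustration, and it reads from an in-memory string only to keep the example self-contained. A real lex action would read characters from the scanner's input (for instance with lex's input() routine) instead. The only state is a depth counter, so no comment text is ever buffered, however long the comment is.

```c
/* Hypothetical helper: discard the rest of a nested comment.
 * Assumes the opening slash-star delimiter has already been matched
 * by the scanner, so we start one level deep.
 * s   : the input text (NUL-terminated)
 * pos : index just after the opening delimiter
 * Returns the index just past the matching star-slash delimiter,
 * or -1 if the input ends while a comment is still open. */
long skip_nested_comment(const char *s, long pos)
{
    int depth = 1;                      /* already inside one comment */

    while (s[pos] != '\0') {
        /* s[pos] is not NUL here, so reading s[pos + 1] is safe */
        if (s[pos] == '/' && s[pos + 1] == '*') {
            depth++;                    /* another <begin-comment> token */
            pos += 2;
        } else if (s[pos] == '*' && s[pos + 1] == '/') {
            depth--;                    /* closes the innermost open comment */
            pos += 2;
            if (depth == 0)
                return pos;             /* balanced: the comment is over */
        } else {
            pos++;                      /* ordinary comment text: skip it */
        }
    }
    return -1;                          /* end of input inside a comment */
}
```

For example, with the input "/* a /* b */ c */ x" and pos = 2, the function returns 17, the index of the text " x" that follows the outer comment. The work is one pass over the comment's characters with a single integer of state.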
--
Mike Coffin mike at arizona.edu
Univ. of Ariz. Dept. of Comp. Sci. {allegra,cmcl2}!arizona!mike
Tucson, AZ 85721 (602)621-2858
More information about the Comp.lang.c mailing list