Unixworld competition - SOLUTIONS
Col. G. L. Sicherman
gls at corona.ATT.COM
Tue Apr 2 11:26:39 AEST 1991
As I promised, here are the bugs in the prize-winning comment-
stripping programs. First the C program:
#include <stdio.h>
char *sccsID="@(#) cstrip.c 1.1 Bart J. Besseling, 8/90";
int m[9][8] = { /* finite-state machine */
    /* events:
       /    *    "    '    \    \n   sp   ch     states: */
    { 0x01,0x80,0x85,0x87,0x80,0x80,0x80,0x80 }, /* 0: hunt */
    { 0x02,0x33,0xc0,0xc0,0xc0,0xc0,0xc0,0xc0 }, /* 1: maybe */
    { 0x02,0x02,0x02,0x02,0x02,0x80,0x02,0x02 }, /* 2: c++ */
    { 0x13,0x14,0x13,0x13,0x13,0x83,0x83,0x13 }, /* 3: c */
    { 0x10,0x13,0x13,0x13,0x13,0x83,0x83,0x13 }, /* 4: end c */
    { 0x85,0x85,0x80,0x85,0x86,0x80,0x85,0x85 }, /* 5: string */
    { 0x85,0x85,0x85,0x85,0x85,0x85,0x85,0x85 }, /* 6: \ in str */
    { 0x87,0x87,0x87,0x80,0x88,0x80,0x87,0x87 }, /* 7: char */
    { 0x87,0x87,0x87,0x87,0x87,0x87,0x87,0x87 }, /* 8: \ in char */
};
int
main() /* Input parser and output generator */
{
    register int ch, event, state;
    for (state = 0; (ch = getchar()) != EOF;) {
        /* translate character into event */
        switch (ch) {
        case '/': event = 0; break;
        case '*': event = 1; break;
        case '"': event = 2; break;
        case '\'': event = 3; break;
        case '\\': event = 4; break;
        case '\n': event = 5; break;
        case '\t':
        case ' ': event = 6; break;
        default: event = 7; break;
        }
        /* obtain next state and operation from machine */
        state = m[state & 0x0f][event];
        /* perform operation */
        if (state & 0x10) putchar(' ');
        if (state & 0x20) putchar(' ');
        if (state & 0x40) putchar('/');
        if (state & 0x80) putchar(ch);
    }
    return 0;
}
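The transition matrix has an erroneous entry: in state 4 ("end c"), an
asterisk sends the automaton back to state 3 instead of leaving it in
state 4. The program therefore fails to terminate a comment whose closing
slash follows an even number of consecutive asterisks, such as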
/* This compiles, though it shouldn't. **/
IDENTIFICATION DIVISION.
/* What's a COBOL statement doing here? */
main() {printf("hello, world\n");}
Ian Collier found the bug and told me so. If you found the bug and
didn't tell me, that's all right too.
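For the curious, the fix appears to be a one-entry change. The sketch
below is not Besseling's program: it copies his table and event mapping,
but the names strip() and event_of() and the "patched" flag are mine.
With patched set, an asterisk in state 4 stays in state 4 (0x14 rather
than 0x13), which I assume is what the table intended.

```c
#include <assert.h>
#include <stddef.h>

/* Transition table copied from cstrip.c: the low nibble is the next
   state, the high bits select the output actions. */
static const int m[9][8] = {
    { 0x01,0x80,0x85,0x87,0x80,0x80,0x80,0x80 }, /* 0: hunt */
    { 0x02,0x33,0xc0,0xc0,0xc0,0xc0,0xc0,0xc0 }, /* 1: maybe */
    { 0x02,0x02,0x02,0x02,0x02,0x80,0x02,0x02 }, /* 2: c++ */
    { 0x13,0x14,0x13,0x13,0x13,0x83,0x83,0x13 }, /* 3: c */
    { 0x10,0x13,0x13,0x13,0x13,0x83,0x83,0x13 }, /* 4: end c */
    { 0x85,0x85,0x80,0x85,0x86,0x80,0x85,0x85 }, /* 5: string */
    { 0x85,0x85,0x85,0x85,0x85,0x85,0x85,0x85 }, /* 6: \ in str */
    { 0x87,0x87,0x87,0x80,0x88,0x80,0x87,0x87 }, /* 7: char */
    { 0x87,0x87,0x87,0x87,0x87,0x87,0x87,0x87 }, /* 8: \ in char */
};

/* Same character-to-event mapping as the switch in the original. */
static int event_of(int ch)
{
    switch (ch) {
    case '/':  return 0;
    case '*':  return 1;
    case '"':  return 2;
    case '\'': return 3;
    case '\\': return 4;
    case '\n': return 5;
    case '\t':
    case ' ':  return 6;
    default:   return 7;
    }
}

/* Run the machine over the string `in` and write the result to `out`,
   which must have room for 2 * strlen(in) + 1 bytes.  If `patched` is
   nonzero, the entry for '*' in state 4 is taken to be 0x14 (my
   assumed correction) instead of the published 0x13. */
static void strip(const char *in, char *out, int patched)
{
    int state = 0;

    assert(in != NULL && out != NULL);
    for (; *in != '\0'; in++) {
        int ch = (unsigned char)*in;
        int ev = event_of(ch);
        int cur = state & 0x0f;

        if (patched && cur == 4 && ev == 1)
            state = 0x14;           /* stay in "end c" on repeated '*' */
        else
            state = m[cur][ev];

        if (state & 0x10) *out++ = ' ';
        if (state & 0x20) *out++ = ' ';
        if (state & 0x40) *out++ = '/';
        if (state & 0x80) *out++ = (char)ch;
    }
    *out = '\0';
}
```

Calling strip("/* a **/x", buf, 0) leaves no x in buf, because the
comment never closes; with patched nonzero the x survives.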
Now the lex program:
%Start CODE CCOM STRING CHAR CPLUS
%%
%{
char *sccsID = "@(#) sc 1.0 Andre van Dalen, 6/90";
BEGIN CODE;
%}
<STRING>([^\\]\")|(\\\\\") |
<CHAR>([^.\\]\')|(\\\\\') |
<CPLUS>\n { ECHO; BEGIN CODE; }
<CCOM>"*/" { two_space(); BEGIN CODE; }
<CCOM,CPLUS>. { output(*yytext=='\t'?'\t':' ');}
<CODE>"/*" { two_space(); BEGIN CCOM ; }
<CODE>"//" { two_space(); BEGIN CPLUS ;}
<CODE>\" { ECHO; BEGIN STRING; }
<CODE>\' { ECHO; BEGIN CHAR; }
<STRING,CODE>. { ECHO; }
%%
two_space()
{
    output(' '); output(' ');
}
main(argc, argv)
int argc; char **argv;
{
    if (argc==1) yylex();
    else while (*++argv) {
        fclose(yyin);
        if (!(yyin=fopen(*argv,"r"))) {
            perror(*argv);
            exit(1);
        }
        yylex();
    }
    exit(0);
}
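This one doesn't handle multiple backslashes, though lex has the power
to do so easily. The STRING rule treats any two backslashes followed by
a quote as the end of the string, so in \\\" the escaped quote is taken
for the terminator. A program like this will break it: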
main()
{
    char *str = "This string has everything \\\" /* and more!\n";
    printf(str);
}
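A hand-written scanner gets this right by consuming each backslash
together with whatever character follows it, instead of looking
backward from the quote. A minimal sketch in C, not taken from any of
the entries; the name literal_end() is hypothetical, and the function
assumes the text starts at the opening quote and gives up at an
unescaped newline:

```c
#include <assert.h>
#include <stddef.h>

/* Given s pointing at the opening quote of a C string literal, return
   the index of the closing quote, or -1 if the literal is not
   terminated before an unescaped newline or the end of the text. */
static int literal_end(const char *s)
{
    int i;

    assert(s != NULL && s[0] == '"');
    for (i = 1; s[i] != '\0' && s[i] != '\n'; i++) {
        if (s[i] == '\\') {
            if (s[i + 1] == '\0')
                break;              /* dangling backslash at end of text */
            i++;                    /* skip the escaped character */
        } else if (s[i] == '"') {
            return i;               /* unescaped quote closes the literal */
        }
    }
    return -1;
}
```

Because backslashes are paired off from the left, \\\" is read as an
escaped backslash followed by an escaped quote, and the literal keeps
going until the real closing quote.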
Finally, the shell program:
# @(#) sc Strip comments from a C/C++ source file
# Author: Carl Bergerson, August 1990
# set -x # Uncomment for debugging
# Define correct usage message:
USAGE="Usage: $0 [sourcefile]"
case $# in
0) sed -e 's/^#/a#/' | /lib/cpp |
   sed -e '/^#/d' -e 's/^a#/#/';;
1) sed -e 's/^#/a#/' $1 | /lib/cpp |
   sed -e '/^#/d' -e 's/^a#/#/';;
*) echo $USAGE >&2
   exit 1 ;;
esac
Even assuming that /lib/cpp is a C++ preprocessor that strips // comments,
this can be broken with a little ingenuity:
/*
* play music on your home computer
*/
main() {
    printf("press the return key to hear Mozart's sonata in \
a# ");
    getchar();
    play();
}
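The script uses "a#" as a flag to protect preprocessor lines, but it is
not a safe flag: a source line can legitimately begin with "a#", as the
continued string above does, and the final sed then rewrites it to "#".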
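The collision can be modeled in a few lines of C. The sketch below
imitates the two sed passes on a single line and assumes that the
preprocessor hands the line through unchanged in between; mark(),
unmark() and round_trips() are names invented here, not part of the
script.

```c
#include <assert.h>
#include <string.h>

/* First sed pass, s/^#/a#/ : hide a leading '#' from the preprocessor.
   out must have room for strlen(line) + 2 bytes. */
static void mark(const char *line, char *out)
{
    assert(line != NULL && out != NULL);
    if (line[0] == '#') {
        out[0] = 'a';
        strcpy(out + 1, line);
    } else {
        strcpy(out, line);
    }
}

/* Second sed pass, /^#/d followed by s/^a#/#/ .
   Returns 0 if the line is deleted, 1 if it is kept (copied to out). */
static int unmark(const char *line, char *out)
{
    assert(line != NULL && out != NULL);
    if (line[0] == '#') {
        out[0] = '\0';
        return 0;                   /* preprocessor output line: dropped */
    }
    if (line[0] == 'a' && line[1] == '#')
        strcpy(out, line + 1);      /* strip the flag character */
    else
        strcpy(out, line);
    return 1;
}

/* Return 1 if the line comes out of both passes exactly as it went in. */
static int round_trips(const char *line)
{
    char marked[256], restored[256];

    assert(line != NULL && strlen(line) < sizeof marked - 1);
    mark(line, marked);
    if (!unmark(marked, restored))
        return 0;
    return strcmp(line, restored) == 0;
}
```

A line such as #include <stdio.h> round-trips, but the line a# "); from
the example does not: it was never marked, yet it is unmarked anyway
and loses its first character.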
--
G. L. Sicherman
gls at corona.att.COM