You, too, can look at strings.
Cameron Laird
cl at lgc.com
Thu Feb 21 02:02:04 AEST 1991
I asked for help extracting string constants from source code.
I summarize the responses I received:
1. my own was to write (approximately)
echo 's/"[^"]*$/"/
s/[^"]*"/"/' >/tmp/string_script
grep '".*"' | tee /tmp/string_list | \
sed -f /tmp/string_script | ...
rm /tmp/string_script
as part of a filter. The filter does these things:
a. puts a grep-listing (not egrep, not fgrep, but grep)
of all lines with at least two "-s into /tmp/string_list,
for my later convenience in examining the contexts where
the strings occur; and
b. copies what's left of those lines after throwing away
everything before the first " and after the last " to
stdout.
This was something I knew how to write in a few minutes,
and it works well enough, although it knows nothing about
the syntax of C beyond looking for a pair of "-s.
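The two sed substitutions can also be passed inline with -e, which avoids creating and removing /tmp/string_script. This is a minimal sketch assuming a POSIX shell. The function name string_filter and its optional list-file argument are hypothetical. The grep, tee and sed steps are the same ones described in a. and b.

```shell
# Minimal sketch of the filter from item 1, with the two sed
# expressions passed via -e instead of a temporary script file.
# string_filter and its optional argument are hypothetical names.
string_filter() {
    grep '".*"' |                        # lines with at least two "-s
    tee "${1:-/tmp/string_list}" |       # save them for later inspection
    sed -e 's/"[^"]*$/"/' \
        -e 's/[^"]*"/"/'                 # drop text after the last " and before the first "
}
```

For example, `cat *.c | string_filter | sort -u` would list each distinct quoted span once. Like the original, it treats everything between the first and last " on a line as one unit, so `x = "a" + "b";` yields `"a" + "b"`.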
2. various folks suggested combinations of
   {m,}xstr--available on uunet:bsd-sources/pgrm/{m,}xstr/*
      I thought this had possibilities, but didn't
      work with it much.
   cxref
      I didn't find any quick way to make this do
      something useful to me.
   strings--this was definitely not what I had in
      mind (I'm thinking about source code, and,
      as far as I'm concerned, strings is for working
      with object files), but I've invoked
      strings hundreds of times for other chores,
      and I'm happy to give it a bit of publicity.
3. a few folks wrote to say that perl could do it in
one line; no one delivered such a line, but I didn't
ask. Does perl remind anyone else of APL? That's not
entirely a bad thing ...
4. comp.compilers publishes a monthly list of sites that distribute
lexical analyzers and such. I haven't checked this
list. I also received the advice that, "At site
primost.cs.wisc.edu (128.105.2.115) in directory
/pub/comp.compilers are files called *grammar.Z
They contain grammars for lex/yacc for c, c++ ftn
and pascal. . . ."
5. a Swedish HPUX user reported that he relies on findstr,
in the NLS (Natural Language Support) package that is part
of HPUX.
6. William A. Hoffman posted the kind of lapidary answer I expected
from the net: a couple dozen lines, definitive (in some sense),
no-nonsense, functional, and a starting-point for yet more
refinements (or arguments).
... string.lex
--------------------------------------------------------
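%{
#include <stdio.h>
%}
string \"([^"\\\n]|\\(.|\n))*\"
%%
{string} { printf("%s\n", yytext); return 1; }
\n ;
. ;
%%
int main(void)
{
    /* yylex() returns 1 for each string literal, 0 at end of input */
    while (yylex())
        ;
    return 0;
}

int yywrap(void)
{
    return 1;   /* no further input files */
}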
------------------------------------------------------------
to run just:
lex string.lex
cc lex.yy.c -o string
cat *.c | ./string
The moderator noted that this deserved to be beefed up "... to
handle character constants and comments ..."
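The moderator's suggestion can be sketched without lex, as a small finite-state machine. This is a minimal sketch assuming a POSIX shell and awk. The function name c_strings is hypothetical. Skipping // comments is an added assumption, since those are C++ rather than 1991 C. The machine tracks four states (ordinary code, string literal, character constant, block comment) and prints each string literal whole, escapes included.

```shell
# Minimal sketch (hypothetical name c_strings): print each C string
# literal read from stdin, one per line, using a small state machine
# in POSIX awk. Quotes inside comments and character constants are
# ignored; well-formed input is assumed.
c_strings() {
    awk '
    BEGIN { sq = sprintf("%c", 39); state = "code" }   # sq holds a single quote
    {
        n = length($0)
        for (i = 1; i <= n; i++) {
            c = substr($0, i, 1)
            d = substr($0, i + 1, 1)    # one character of lookahead
            if (state == "code") {
                if (c == "\"") { state = "string"; buf = c }
                else if (c == sq) state = "char"
                else if (c == "/" && d == "*") { state = "comment"; i++ }
                else if (c == "/" && d == "/") break    # C++-style comment: skip rest of line
            } else if (state == "string") {
                if (c == "\\") { buf = buf c d; i++ }   # keep an escape pair intact
                else if (c == "\"") { print buf c; state = "code" }
                else buf = buf c
            } else if (state == "char") {
                if (c == "\\") i++                      # skip the escaped character
                else if (c == sq) state = "code"
            } else if (c == "*" && d == "/") {          # state is "comment"
                state = "code"; i++
            }
        }
    }'
}
```

For example, `cat *.c | c_strings | sort -u` would list each distinct literal once. Unlike the lex version, this sketch ignores quotes inside character constants and comments. It makes no attempt at trigraphs or other preprocessor subtleties.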
7. One reader wrote that he'd send a finite-state machine which
models C syntax as soon as he found his copy. I haven't heard
from him since. I'll pass it along when it arrives.
My apologies to Henry Spencer for misremembering his name as "Harry".
Thanks, all.
--
Cameron Laird USA 713-579-4613
cl at lgc.com USA 713-996-8546
--
Send compilers articles to compilers at iecc.cambridge.ma.us or
{ima | spdcc | world}!iecc!compilers. Meta-mail to compilers-request.