sed script to remove cr/lf except at paragraph breaks

Bill Fulton [Sys Admin] itwaf at dcatla.UUCP
Tue May 23 10:59:31 AEST 1989


In article <119 at sherpa.UUCP> rac at sherpa.UUCP (Roger A. Cornelius) writes:
> I'm in need of a sed script to remove MSDOS cr/lf (actually replace each
> cr/lf combination with one space) except at the start of a paragraph.
> i.e. only the cr/lf preceding a paragraph break should remain.  Paragraphs
> are marked only by four leading spaces and nothing else.
> Here's where I am now:
> [ sed script deleted]

How about lex, instead? I think the lex input between these lines:
----------
%%
\015\012"    "    ECHO;
\015\012          { putc(' ', yyout); /* cr/lf -> one space */ }
----------
should do what you want. Make it with 'lex <filename> ; cc lex.yy.c -ll',
then feed a.out your MSDOS file(s) on standard input; the result comes out
on standard output. You could append a functions section to do setup, or
you could drive it from a front-end script.
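
If you don't have lex handy, the effect of the two rules can be sketched in
plain C. This is only an illustration: the function name unwrap_crlf and
its in-memory string interface are made up for this example, and it assumes
the whole text fits in a buffer rather than being streamed the way the lex
scanner does it.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of the lex solution): copy src to dst,
 * keeping any CR LF that is followed by four spaces and turning every
 * other CR LF into a single space.  dst must have room for
 * strlen(src) + 1 bytes.  Returns the number of bytes written,
 * not counting the terminating NUL.
 */
size_t unwrap_crlf(const char *src, char *dst)
{
    size_t n = 0;

    while (*src != '\0') {
        if (src[0] == '\r' && src[1] == '\n') {
            if (strncmp(src + 2, "    ", 4) == 0) {
                /* Paragraph break: keep the CR LF and the indent. */
                memcpy(dst + n, src, 6);
                n += 6;
                src += 6;
            } else {
                /* Ordinary line end: replace CR LF with one space. */
                dst[n++] = ' ';
                src += 2;
            }
        } else {
            /* Anything else is copied through unchanged. */
            dst[n++] = *src++;
        }
    }
    dst[n] = '\0';
    return n;
}
```

The order of the tests mirrors the lex rules: the longer match (cr/lf plus
four spaces) is tried first, and a bare cr/lf is only rewritten when that
fails.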

I don't want to turn this into a lex vs. sed thing, but it does seem that
lex would be much more direct and easy. I agree that lex is "well ... a
little strange" if you don't work with it a lot, but once you start to mess
around with sed scripts such as you have, it starts to balance out.

Having played with it a little, I've decided that lex is pretty neat as a
standalone utility!

Bill Fulton
dcatla!itwaf at gatech.edu  OR  ..!gatech!dcatla!itwaf


