mawk0.97.shar 1 of 6 (6 pieces not 4)

Sun May 12 00:49:56 AEST 1991

------------------cut here----------------

# This is a shell archive.  Remove anything before this line,
# then unpack it by saving it in a file and typing "sh file".
#
# Wrapped by ssc-bee!brennan on Fri May 10 18:11:41 PDT 1991
# Contents:  mawk0.97/ mawk0.97/rexp/ mawk0.97/test/ mawk0.97/examples/
#	mawk0.97/msdos/ mawk0.97/packing.list mawk0.97/README
#	mawk0.97/LIMITATIONS mawk0.97/Makefile mawk0.97/mawk.manual
#	mawk0.97/array.c mawk0.97/bi_funct.c mawk0.97/bi_funct.h
#	mawk0.97/bi_vars.c mawk0.97/bi_vars.h mawk0.97/cast.c mawk0.97/code.c
#	mawk0.97/code.h mawk0.97/da.c mawk0.97/error.c mawk0.97/execute.c
#	mawk0.97/fcall.c mawk0.97/field.c mawk0.97/field.h mawk0.97/files.c
#	mawk0.97/files.h mawk0.97/fin.c mawk0.97/fin.h mawk0.97/hash.c
#	mawk0.97/init.c mawk0.97/init.h mawk0.97/jmp.c mawk0.97/jmp.h
#	mawk0.97/kw.c mawk0.97/machine.h mawk0.97/main.c mawk0.97/makescan.c
#	mawk0.97/matherr.c mawk0.97/mawk.h mawk0.97/memory.c mawk0.97/memory.h
#	mawk0.97/parse.y mawk0.97/print.c mawk0.97/re_cmpl.c mawk0.97/regexp.h
#	mawk0.97/repl.h mawk0.97/scan.c mawk0.97/scan.h mawk0.97/scancode.c
#	mawk0.97/sizes.h mawk0.97/split.c mawk0.97/symtype.h mawk0.97/types.h
#	mawk0.97/zmalloc.c mawk0.97/zmalloc.h mawk0.97/rexp/Makefile
#	mawk0.97/rexp/rexp.c mawk0.97/rexp/rexp.h mawk0.97/rexp/rexp0.c
#	mawk0.97/rexp/rexp1.c mawk0.97/rexp/rexp2.c mawk0.97/rexp/rexp3.c
#	mawk0.97/rexp/rexpdb.c mawk0.97/test/README mawk0.97/test/benchmarks
#	mawk0.97/test/cat.awk mawk0.97/test/concat.awk mawk0.97/test/fields.awk
#	mawk0.97/test/loops.awk mawk0.97/test/newton.awk
#	mawk0.97/test/primes.awk mawk0.97/test/qsort.awk mawk0.97/test/reg0.awk
#	mawk0.97/test/reg1.awk mawk0.97/test/reg2.awk mawk0.97/test/sample
#	mawk0.97/test/squeeze.awk mawk0.97/test/test.sh mawk0.97/test/wc.awk
#	mawk0.97/test/wfrq.awk mawk0.97/test/wfrq0.awk mawk0.97/test/words.awk
#	mawk0.97/test/words0.awk mawk0.97/examples/decl.awk
#	mawk0.97/examples/deps.awk mawk0.97/examples/gdecl.awk
#	mawk0.97/examples/nocomment.awk mawk0.97/msdos/INSTALL
#	mawk0.97/msdos/makefile mawk0.97/msdos/mklib.bat
#	mawk0.97/msdos/rand48.asm mawk0.97/msdos/rand48.h
#	mawk0.97/msdos/rand48_0.c mawk0.97/msdos/reargv.c

echo mkdir - mawk0.97
mkdir mawk0.97
chmod u=rwx,g=rx,o=rx mawk0.97

echo x - mawk0.97/packing.list
sed 's/^@//' > "mawk0.97/packing.list" <<'@//E*O*F mawk0.97/packing.list//'

################################################
# These files form the mawk distribution
#
# Mawk is an implementation of the AWK Programming Language as
# defined and described in Aho, Kernighan and Weinberger, The
# Awk Programming Language, Addison-Wesley, 1988.
#
################################################
# Source code written by Michael D. Brennan
# Copyright (C) 1991 , Michael D. Brennan
################################################
packing.list		this file
README			how to get started
LIMITATIONS		restrictions on use
Makefile		mawk makefile
mawk.manual		mock manual
######################
array.c			source files
bi_funct.c
bi_funct.h
bi_vars.c
bi_vars.h
cast.c
code.c
code.h
da.c
error.c
execute.c
fcall.c
field.c
field.h
files.c
files.h
fin.c
fin.h
hash.c
init.c
init.h
jmp.c
jmp.h
kw.c
machine.h
main.c
makescan.c
matherr.c
mawk.h
memory.c
memory.h
parse.y
print.c
re_cmpl.c
regexp.h
repl.h
scan.c
scan.h
scancode.c
sizes.h
split.c
symtype.h
types.h
zmalloc.c
zmalloc.h
########################
# directory:  rexp
rexp/Makefile		makefile for regexp.a
rexp/rexp.c		source for regular expression matching library
rexp/rexp.h
rexp/rexp0.c
rexp/rexp1.c
rexp/rexp2.c
rexp/rexp3.c
rexp/rexpdb.c
#######################
# directory:  test      benchmarking directory
test/README
test/benchmarks
test/cat.awk
test/concat.awk
test/fields.awk
test/loops.awk
test/newton.awk
test/primes.awk
test/qsort.awk
test/reg0.awk
test/reg1.awk
test/reg2.awk
test/sample			sample input file for test.sh
test/squeeze.awk
test/test.sh
test/wc.awk
test/wfrq.awk
test/wfrq0.awk
test/words.awk
test/words0.awk
######################
# directory:  examples       useful awk programs
examples/decl.awk
examples/deps.awk
examples/gdecl.awk
examples/nocomment.awk
######################
# directory:  msdos
msdos/INSTALL
msdos/makefile
msdos/mklib.bat
msdos/rand48.asm
msdos/rand48.h
msdos/rand48_0.c
msdos/reargv.c
@//E*O*F mawk0.97/packing.list//
chmod u=rw,g=r,o=r mawk0.97/packing.list

echo x - mawk0.97/README
sed 's/^@//' > "mawk0.97/README" <<'@//E*O*F mawk0.97/README//'

to build mawk:

make sure  there is an appropriate description of
your system in machine.h

set CFLAGS in the Makefile to pick the appropriate blob
in machine.h

run make

PS:
I expected to have bcopy() <-> memcpy()
hassles on 4.3BSD, but didn't
Is this right? or did someone add memcpy(), strchr() etc
   to that machine?
   If 4.3BSD in machine.h is wrong, let me know at
   brennan at bcsaic.boeing.com
@//E*O*F mawk0.97/README//
chmod u=r,g=r,o=r mawk0.97/README

echo x - mawk0.97/LIMITATIONS
sed 's/^@//' > "mawk0.97/LIMITATIONS" <<'@//E*O*F mawk0.97/LIMITATIONS//'

Mawk is an implementation of the AWK Programming Language
as defined in Aho, Kernighan and Weinberger, The AWK 
Programming Language, Addison-Wesley, 1988.

The source code is original work, in the sense that its
development relied only on the specification of the AWK
language in the book above.  Most of the algorithms and
data structures used in this code are not original --
but based on knowledge acquired from numerous sources.
Originality is claimed only for the aggregate work.  Any
ideas or techniques in this code can be freely copied and
used in other work.  

The source code may be modified provided the copyright
notices remain intact, and modifications are unambiguously
distinct from the original.  I want to retain credit for my
work and do not want credit for yours.

Redistribution in any form is permitted provided the built-in
variable VERSION is retained, and its initial value only
modified by appending extra lines.

    For example, if you modify a mawk with VERSION

	mawk x.xx Mon Year, Copyright (C) Michael D. Brennan

    then add an extra line to VERSION without modifying the
    first line.

	mawk x.xx Mon Year, Copyright (C) Michael D. Brennan
	mod y.yy  Mon Year, your name

Michael D. Brennan
16 Apr 1991

@//E*O*F mawk0.97/LIMITATIONS//
chmod u=r,g=r,o=r mawk0.97/LIMITATIONS

echo x - mawk0.97/Makefile
sed 's/^@//' > "mawk0.97/Makefile" <<'@//E*O*F mawk0.97/Makefile//'

# ###################################################
# This is a makefile for mawk,
# an implementation of The AWK Programming Language, 1988.
# 
# 

SHELL=/bin/sh

####################################
# CFLAGS needs to match a define in machine.h
# unless machine.h uses a built-in compiler flag
#

CFLAGS = -O -DULTRIX
#CFLAGS =  -O -DBSD43
YACC=yacc -dv
#YACC=bison -dvy

#######################################

O=parse.o scan.o memory.o main.o hash.o execute.o code.o\
  da.o error.o init.o bi_vars.o cast.o print.o bi_funct.o\
  kw.o jmp.o array.o field.o  split.o re_cmpl.o zmalloc.o\
  fin.o files.o  scancode.o matherr.o  fcall.o

REXP_C=rexp/rexp.c rexp/rexp0.c rexp/rexp1.c rexp/rexp2.c\
    rexp/rexp3.c rexp/rexpdb.c

mawk : $(O)  rexp/regexp.a
	cc $(CFLAGS) -o mawk $(O) -lm rexp/regexp.a

rexp/regexp.a :  $(REXP_C)
	cd  rexp ; make

parse.c  : parse.y
	@echo  expect 3 shift/reduce conflicts
	$(YACC)  parse.y
	mv y.tab.c parse.c
	-if cmp -s y.tab.h parse.h ;\
	   then rm y.tab.h ;\
	   else mv y.tab.h parse.h ; fi

scancode.c :  makescan.c  scan.h
	cc -o makescan.exe  makescan.c
	./makescan.exe > scancode.c
	rm makescan.exe

array.o : bi_vars.h sizes.h zmalloc.h memory.h types.h machine.h mawk.h symtype.h
bi_funct.o : fin.h bi_vars.h sizes.h memory.h zmalloc.h regexp.h types.h machine.h field.h repl.h files.h bi_funct.h mawk.h symtype.h init.h
bi_vars.o : bi_vars.h sizes.h memory.h zmalloc.h types.h machine.h field.h mawk.h symtype.h init.h
cast.o : parse.h sizes.h memory.h zmalloc.h types.h machine.h field.h scan.h repl.h mawk.h symtype.h
code.o : sizes.h memory.h zmalloc.h types.h machine.h code.h mawk.h init.h
da.o : sizes.h memory.h zmalloc.h types.h machine.h field.h repl.h code.h bi_funct.h mawk.h symtype.h
error.o : parse.h bi_vars.h sizes.h types.h machine.h scan.h mawk.h symtype.h
execute.o : sizes.h memory.h zmalloc.h regexp.h types.h machine.h field.h code.h repl.h bi_funct.h mawk.h symtype.h
fcall.o : sizes.h memory.h zmalloc.h types.h machine.h code.h mawk.h symtype.h
field.o : parse.h bi_vars.h sizes.h memory.h zmalloc.h regexp.h types.h machine.h field.h scan.h repl.h mawk.h symtype.h init.h
files.o : fin.h sizes.h memory.h zmalloc.h types.h machine.h files.h mawk.h
fin.o : parse.h fin.h bi_vars.h sizes.h memory.h zmalloc.h types.h machine.h field.h scan.h mawk.h symtype.h
hash.o : sizes.h memory.h zmalloc.h types.h machine.h mawk.h symtype.h
init.o : bi_vars.h sizes.h memory.h zmalloc.h types.h machine.h field.h code.h mawk.h symtype.h init.h
jmp.o : sizes.h memory.h zmalloc.h types.h machine.h code.h jmp.h mawk.h init.h
kw.o : parse.h sizes.h types.h machine.h mawk.h symtype.h init.h
main.o : fin.h bi_vars.h sizes.h memory.h zmalloc.h types.h machine.h field.h code.h files.h mawk.h init.h
makescan.o : parse.h scan.h symtype.h
matherr.o : sizes.h types.h machine.h mawk.h
memory.o : sizes.h memory.h zmalloc.h types.h machine.h mawk.h
parse.o : bi_vars.h sizes.h memory.h zmalloc.h types.h machine.h field.h code.h files.h bi_funct.h mawk.h jmp.h symtype.h
print.o : bi_vars.h parse.h sizes.h memory.h zmalloc.h types.h machine.h field.h scan.h files.h bi_funct.h mawk.h symtype.h
re_cmpl.o : parse.h sizes.h memory.h zmalloc.h regexp.h types.h machine.h scan.h repl.h mawk.h symtype.h
scan.o : parse.h fin.h sizes.h memory.h zmalloc.h types.h machine.h field.h scan.h repl.h files.h mawk.h symtype.h init.h
split.o : bi_vars.h parse.h sizes.h memory.h zmalloc.h regexp.h types.h machine.h field.h scan.h bi_funct.h mawk.h symtype.h
zmalloc.o : sizes.h zmalloc.h types.h machine.h mawk.h
@//E*O*F mawk0.97/Makefile//
chmod u=r,g=r,o=r mawk0.97/Makefile

echo x - mawk0.97/mawk.manual
sed 's/^@//' > "mawk0.97/mawk.manual" <<'@//E*O*F mawk0.97/mawk.manual//'

                              Mawk Manual 

Mawk implements the awk language as defined in Aho, Kernighan and 
Weinberger, The AWK Programming Language, Addison-Wesley, 1988, ISBN 
0-201-07981-X, hereafter called the AWK book.  Chapter 2 serves as a 
reference to the language and the rest (8 total chapters) provides a 
wide range of examples and applications.  This book is must reading to 
understand the versatility of the language.  

The 1988 version of the language is sometimes called new awk as opposed 
to the 1977 version (awk or old awk).  Virtually every Unix system has 
old awk; somewhere in the documentation will be an (old) awk tutorial 
(probably in support tools).  If you use (old) awk, the transition to 
new awk is easy.  The language has been extended and ambiguous points 
clarified, but old awk programs still run under new awk.  

This manual assumes you know (old) awk, and hence concentrates on the 
new features of awk.  Feature xxx is new means xxx was added to the 1988
version.  

Experienced new awk users should read sections 9 and 12, and skim 
sections 7 and 8.  

1. Command line

	mawk [-Fs] 'program'  optional_list_of_files
	mawk [-Fs] -f program_file  optional_list_of_files

2. Program blocks

    Program blocks are of the form:

	pattern { action }

    pattern can be:

	regular_expression
	expression
	( pattern )
	! pattern
	pattern || pattern
	pattern && pattern

	pattern , pattern  (range pattern)
	BEGIN
	END

Range, BEGIN and END patterns cannot be combined to form new patterns.  
BEGIN and END patterns require an action; for any other pattern, an 
omitted action is implicitly { print }.  

	NR==2    {  print }  # prints line number 2
	NR==2		     # also prints line number 2

If pattern is omitted then action is always applied.

	{ print $NF }

prints the last field of every record.

3. Statement format and loops

Statements are terminated by newlines, semi-colons or both.  Groups of 
statements are blocked via { ...  } as in C.  The last statement in a 
block doesn't need a terminator.  Blank lines have no meaning; an empty 
statement is terminated with a semi-colon.  Long statements can be 
continued with a backslash, \.  A statement can be broken without a 
backslash after a comma, left brace, &&, ||, do, else, the right 
parenthesis of an if, while or for statement, and the right parenthesis 
of a function definition.  

Loops are for(){}, while(){} and do{}while() as in C.  

4. Expression syntax

The expression syntax and grouping of the language is similar to C.  
Primary expressions are numeric constants, string constants, variables, 
arrays and functions.  Complex expressions are composed with the 
following operators in order of increasing precedence.  

    assignment: = += -= *= /= %= ^=
    conditional:  ? :
    logical or:   ||
    logical and:  &&
    array membership :   in
    matching :   ~   !~
    relational :  <  >   <=  >=  ==  !=
    concatenation:   (no explicit operator)
    add ops:  +  -
    mul ops:  *  /  % 
    unary  :  +  -
    logical not :  !
    exponentiation:  ^
    inc and dec:  ++ -- (both post and pre)
    field:  $
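
A few one-liners make the precedence table concrete.  This is only a 
sketch: it assumes some awk is installed under the name awk (mawk is 
not required), and the shell variable out is just for illustration.  

```shell
# Assumes an awk on PATH; substitute mawk if that is what you have.
out=$(awk 'BEGIN {
    print 1 " " 2 + 3     # + binds tighter than concatenation
    print -2 ^ 2          # ^ binds tighter than unary minus
    print 2 ^ 3 ^ 2       # ^ groups right to left: 2 ^ (3 ^ 2)
}')
echo "$out"
```

The three lines printed are 1 5, -4 and 512.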

5. Builtin variables.

The following variables are built-in and initialized before program 
execution.  

    ARGC	number of command line arguments
    ARGV	array of command line arguments, 0..ARGC-1
    FILENAME    name of the current input file
    FNR         current record number in the current input file
    FS		splits records into fields as a regular expression
    NF		number of fields in the current record, i.e., $0
    NR		current record number in the total input stream
    OFMT	format for printing numbers; initially = "%.6g"
    OFS		inserted between fields on output, initially = " "
    ORS		terminates each record on output, initially = "\n"
    RLENGTH     length of the string matched by the last call to match()
    RS		input record separator, initially = "\n"
    RSTART	index where the last match() match starts, 0 if no match
    SUBSEP	used to build multiple array subscripts, initially = "\034"
    VERSION     Mawk version, unique to mawk.

ARGC, ARGV, FNR, RLENGTH, RSTART and SUBSEP are new.  

The current input record is stored in the field, $0.  The fields of $0 
determined by splitting with FS are stored in $1, $2, ..., $NF.  
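
As a rough illustration of the variables above, the following feeds two 
records to awk and prints a few of them.  It assumes an awk on PATH; 
the shell variables out and args and the argument somefile are made up 
for the example (somefile is never opened because there is only a BEGIN 
block).  

```shell
# NR, NF, $1 and $NF for each record, joined by OFS.
out=$(printf 'a b c\nd e\n' | awk 'BEGIN { OFS = "-" } { print NR, NF, $1, $NF }')
# ARGV[0] is the program name, so one file argument gives ARGC = 2.
args=$(awk 'BEGIN { print ARGC, ARGV[1] }' somefile)
echo "$out"
echo "$args"
```

This prints 1-3-a-c, then 2-2-d-e, then 2 somefile.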

6. Built-in Functions

String functions

    index(s,t)
    length(s), length
    split(s, A, r), split(s, A)
    substr(s,i,n) , substr(s,i)
    sprintf(format, expr_list)

    match(s,r)		returns the index where string s matches
    			regular expression r or 0 if no match. As
			a side effect, sets RSTART and RLENGTH.

    gsub(r, s, t)       Global substitution, every match of regular
			expression r in variable t is replaced by s.
			The number of matches/replacements is returned.

    sub(r, s, t)	Like gsub(), except at most one replacement.

Match(), gsub() and sub() are new.  If r is an expr it is coerced to 
string and then treated as a regular expression.  In sub and gsub, t can
be a variable, field or array element, i.e., it must have storage to 
hold the modification.  Sub(r,s) and gsub(r,s) are the same as 
sub(r,s,$0) and gsub(r,s,$0).  In the replacement string s, an & is 
replaced by the matched piece and a literal & is obtained with \&.  
E.g., 

	    y = x = "abbc"
	    sub(/b+/, "B&B" , x)
	    sub(/b+/, "B\&B" , y)
	    print x, y

outputs:    aBbbBc aB&Bc
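
Match() can be tried the same way.  The sketch below assumes an awk on 
PATH; the names s, pos and out are only for illustration.  It shows the 
return value together with RSTART and RLENGTH, first for a match and 
then for a failed match.  

```shell
out=$(awk 'BEGIN {
    s = "foobar"
    pos = match(s, /o+/)          # leftmost longest match is "oo" at index 2
    print pos, RSTART, RLENGTH, substr(s, RSTART, RLENGTH)
    print match(s, /z/), RSTART, RLENGTH   # no match: 0, 0 and -1
}')
echo "$out"
```

The output is 2 2 2 oo followed by 0 0 -1.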

Arithmetic functions

    atan2(y,x)		arctan of y/x between -pi and pi.
    cos(x)
    exp(x)
    int(x)		x.dddd ->  x.0
    log(x)
    rand()		returns random number , 0 <= r < 1.
    sin(x)
    sqrt(x)
    srand(x) , srand()  seeds random number generator, uses clock
			if x is omitted.

Output functions

    print		writes  $0 ORS   to stdout.

    print expr1 , expr2 , ... , exprn
			writes expr1 OFS expr2 OFS ... OFS exprn ORS to
			stdout.

    printf format, expr_list
			Acts like the C library function, writing to
			stdout.  Supported conversions are
			%c, %d, %e, %f, %g, %o, %s and %x.  
			- , width and .prec are supported.
			Dynamic widths can be built using string operations
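
One way to build a dynamic width with string operations is to 
concatenate the width into the format before calling printf.  A small 
sketch, assuming an awk on PATH; w, fmt and out are illustrative names.  

```shell
out=$(awk 'BEGIN {
    w = 5
    fmt = "%" w "s|%-" w "s|\n"   # concatenation builds "%5s|%-5s|\n"
    printf fmt, "ab", "cd"
}')
echo "$out"
```

The first field is right justified in 5 columns and the second is left 
justified because of the - flag.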

Output can be redirected 

   print[f]  > file
	     >> file
	     | command

File and command are awk expressions that are interpreted as a filename 
or a shell command.  

Input functions

    getline		read $0, update NF, NR and FNR.

    getline < file      read $0 from file, update NF.
    getline var         read var from input stream, update NR, FNR.
    getline var < file  read var from next record of file

    command | getline   read $0 from piped command, update NF.
    command | getline var   read var from next record of piped command.

(Old) awk had getline; the redirection facilities are new.

    Files or commands are closed with

	close(expr)

where expr is command or file as a string.  Close returns 0 if expr was 
in fact an open file or command else -1.  Close is needed if you want to
reread a file, rerun a command, have a large number of output files 
without mawk running out of resources or wait for an output command to 
finish.  Here is an example of the last case: 

    { ....  do some processing on each input line
      #  send the processed line to sort
      print | "sort > temp_file"

    }

    END { # reread the sorted input
      close( "sort > temp_file")  # makes sure sort is finished

      cnt=0
      while ( (getline line[cnt+1] < "temp_file")  > 0 )  cnt++
      system( "rm temp_file" )  # cleanup

      ... process line[1], line[2] ... line[cnt]
    }

The system() function executes a command and returns the command's exit 
status.  Mawk uses the shell in the environment variable SHELL to 
execute system or command pipelines; defaulting to "/bin/sh" if SHELL is
not set.  
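
The effect of close() on a command can be seen with a short sketch 
(assuming an awk and an echo command on PATH; cmd, first, second, third 
and out are illustrative names).  Reading the same command twice 
without close() hits end of input, while reading after close() reruns 
the command.  

```shell
out=$(awk 'BEGIN {
    cmd = "echo hello"
    cmd | getline first     # runs cmd and reads its only record
    cmd | getline second    # same pipe, now at end of input: second stays empty
    close(cmd)
    cmd | getline third     # after close() the command is run again
    print first "," second "," third
}')
echo "$out"
```

This prints hello,,hello.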

7. String constants

String constants are written as in C.

	"This is a string with a newline at the end.\n"

Strings can be continued across a line by escaping (\) the newline.  The
following escape sequences are recognized.  

	\\		\
	\"		"
	\'		'
	\a		alert, ascii 7
	\b		backspace, ascii 8
	\t		tab, ascii 9
	\n		newline, ascii 10
	\v		vertical tab, ascii 11
	\f		formfeed, ascii 12
	\r		carriage return, ascii 13
	\ddd		1, 2 or 3 octal digits for ascii ddd

	\xhh		1 or 2 hex digits for ascii  hh

If you escape any other character \c, you get \c, i.e.  the escape is 
ignored.  Mawk is different than most awks here; the AWK book says \c is
c.  The reason mawk chooses to be different is for easier conversion of 
strings to regular expressions.  

8. Regular expressions

Awk notation for regular expressions is in the style of egrep(1).  In 
awk, regular expressions are enclosed in / ...  /.  A regular expression
/r/, is a set of strings.  

	    s ~ /r/

is an awk expression that evaluates to 1 if an element of /r/ is a 
substring of s and evaluates to 0 otherwise.  ~ is called the match 
operator and the expression is read "s matches r".  

	   s ~ /^r/   is 1 if some element of r is a prefix of s.
	   s ~ /r$/   is 1 if some element of r is a suffix of s.
	   s ~ /^r$/  is 1 if s is an element of r.

Replacing ~ by !~ , the not match operator, reverses the meanings.  In 
patterns, /r/ and !/r/ are shorthand for $0 ~ /r/ and $0 !~ /r/.  

Regular expressions are combined by the following rules.

	//  stands for the one element set "" (not the empty set).
	/c/ for a character c is the one element set "c".

	/rs/  is all elements of /r/ concatenated with all 
	      elements of /s/.

	/r|s/ is the set union of /r/ and /s/.

	/r*/  called the closure of r is // union /rr/ union /rrr/ ...
	      In words, r repeated zero or more times.

The above operations are sufficient to describe all regular expressions,
but for ease of notation awk defines additional operations and notation.

	/r?/  // union /r/.  In words r 0 or 1 time.
	/r+/  Positive closure of r.  R 1 or more times.
	(r)   Same as r -- allows grouping.
	.     Stands for any character (for mawk this means 
	      ascii 1 through ascii 255)
	[c1c2..cn]    A character class same as (c1|c2|...|cn) where
	      ci's are single characters.

	[^c1c2..cn]   Complement of the class [c1c2..cn].  For mawk
	      complement in the ascii character set 1 to 255.

Ranges c1-cn are allowed in character classes.  For example,

	/[_a-zA-Z][_a-zA-Z0-9]*/

expresses the set of possible identifiers in awk.

The operators have increasing precedence:

       |
       implicit concatenation
       + * ?

So /a|b+/ means a or (1 or more b's), and /(a|b)+/ means (a or b) one or
more times.  The so called regular expression metacharacters are \ ^ $ .
[ ] | ( ) * + ? .  To stand for themselves as characters they have to be
escaped.  (They don't have to be escaped in classes, inside classes the 
meta-meaning is off).  The same escape sequences that are recognized in 
strings (see above) are recognized in regular expressions.  In regular 
expressions, though, mawk treats any other escaped character \c as c.  

For example,

	/[ \t]*/   is optional space
	/^[-+]?([0-9]+\.?|\.[0-9])[0-9]*([eE][-+]?[0-9]+)?$/
		   is numbers in the Awk language.
		   Note,  . must be escaped to have
		   its meaning as decimal point.

For building regular expressions, you can think of ^ and $ as phantom 
characters at the front and back of every string.  So /(^a|b$|^A.*B$)/ 
is the set of strings that start with a or end with b or (start with A 
and end with B).  
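
The precedence rule and the phantom-character view of ^ and $ can be 
checked directly.  A sketch, assuming an awk on PATH; the variable 
names are only for illustration.  

```shell
out=$(awk 'BEGIN {
    # + binds tighter than |, so the grouping changes the set.
    p = ("abab" ~ /^(a|b)+$/)     # (a or b) one or more times
    q = ("abab" ~ /^(a|b+)$/)     # a, or one or more b
    r = ("bbb"  ~ /^(a|b+)$/)
    print p, q, r
    # starts with a, or ends with b, or starts with A and ends with B
    s = ("apple" ~ /(^a|b$|^A.*B$)/)
    t = ("cab"   ~ /(^a|b$|^A.*B$)/)
    u = ("AxB"   ~ /(^a|b$|^A.*B$)/)
    v = ("xABy"  ~ /(^a|b$|^A.*B$)/)
    print s, t, u, v
}')
echo "$out"
```

The output is 1 0 1 followed by 1 1 1 0.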

Dynamic regular expressions are new.  You can write 

	x ~ expr

and expr is interpreted as a regular expression.  The result of x ~ y 
can vary with the variable y; so 

	x ~ /a\+b/   and   x ~ "a\+b"

are the same, or are they? In mawk, they are; in some other awks they 
are not.  In the second expression, "a\+b" is scanned twice: once as a 
string constant and then again as a regular expression.  In mawk the 
first scan gives the four character string 'a' '\' '+' 'b' because mawk 
treats \+ as \+; the second scan gives a regular expression matched by 
the three character string 'a' '+' 'b' because on the second scan \+ 
becomes +.  

If \c becomes c in strings, you need to double escape metacharacters, 
i.e., write 

	x ~ "a\\+b".

Exercise: what happens if you double escape in mawk?

In strings if you only escape characters with defined escape sequences 
such as \t or \n or meta-characters when you expect to use a string as a
regular expression, then mawk's rules are intuitive and simple.  See 
examples/decl.awk and examples/gdecl.awk for the same program with single
and double escapes, the first is clearer.  
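
For programs that have to run under several awks, the double escaped 
form is the one that should mean the same thing everywhere.  A sketch, 
assuming an awk on PATH; lit, digits and out are illustrative names.  

```shell
out=$(awk 'BEGIN {
    lit = "a\\+b"            # string scan turns \\ into \, leaving the regex a\+b
    digits = "^[0-9]+$"      # a regular expression held in a variable
    a = ("a+b" ~ lit)        # literal plus sign is matched
    b = ("aab" ~ lit)        # no plus sign in the subject, so no match
    c = ("123" ~ digits)
    print a, b, c
}')
echo "$out"
```

This prints 1 0 1.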

9. How Mawk splits lines, records and files.

Mawk uses essentially the same algorithm to split lines into pieces 
with split(), records into fields on FS, and files into records on RS.  

Split( s, A, sep ) splits string s into array A with separator sep as 
follows: 

    Sep is interpreted as a regular expression.

    If s = "", there are no pieces and split returns 0.

    Otherwise s is split into pieces by the matches with sep
    of positive length treated as a separator between pieces,
    so the number of pieces is the number of matches + 1.
    Matches of the null string do not split.
    So sep = "b+" and sep = "b*" split the same although the
    latter executes more slowly.

    Split(s, A) is the same as split(s, A, FS).
    With mawk you can write sep as a regular expression, i.e.,
    split(s, A, "b+") and split(s, A, /b+/) are the same.

    Sep = " " (a single space) is special.  Before the algorithm is
    applied, white-space is trimmed from the front and back of s.
    Mawk defines white-space as SPACE, TAB, FORMFEED, VERTICAL TAB
    or NEWLINE, i.e., [ \t\f\v\n].  Usually this means SPACE or TAB
    because NEWLINE usually separates records, and the other
    characters are rare.  The above algorithm
    is then applied with sep = "[ \t\f\v\n]+".

    If length(sep) = 1, then regular expression metacharacters do
    not have to be escaped, i.e. split(s, A, "+") is the same as
    split(s, A, /\+/).
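
The rules above can be exercised with a few calls.  This is a sketch 
that assumes an awk on PATH and sticks to cases that should behave the 
same in any (new) awk; the array and variable names are illustrative.  

```shell
out=$(awk 'BEGIN {
    n1 = split("  a b  ", A, " ")   # single space: trim, then split on white-space runs
    n2 = split("a+b+c", B, "+")     # a length 1 separator is taken literally
    n3 = split("", C, ":")          # empty string: no pieces
    n4 = split("abbbc", D, /b+/)    # one match of b+ gives two pieces
    print n1, n2, n3, n4, A[1], A[2], D[1], D[2]
}')
echo "$out"
```

This prints 2 3 0 2 a b a c.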

Splitting records into fields works exactly the same except the pieces 
are loaded into $1, $2 ...  $NF.  

Splitting files into records is also the same: RS is treated as a 
regular expression.  But there is a slight difference: RS is really a 
record terminator (ORS is really a terminator also).  

    E.g., if FS = ":" and $0 = "a:b:" , then
    NF = 3 and $1 = "a", $2 = "b" and $3 = "", but
    if "a:b:" is the contents of an input file and RS = ":", then
    there are two records "a" and "b".

    RS = " " does not have special meaning as with FS.
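
The separator versus terminator difference shows up in the counts.  A 
sketch, assuming an awk and printf on PATH; nf and nr are illustrative 
shell variables.  

```shell
# FS separates: "a:b:" has three fields, the last one empty.
nf=$(printf 'a:b:\n' | awk -F: '{ print NF }')
# RS terminates: the same text as a file holds two records, "a" and "b".
nr=$(printf 'a:b:' | awk 'BEGIN { RS = ":" } END { print NR }')
echo "$nf $nr"
```

This prints 3 2.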

Not all versions of (new) awk support RS as a regular expression.  This 
feature of mawk is useful and improves performance.  

    BEGIN { RS = "[^a-zA-Z]+" 
	    getline
	    if ( $0 == "" ) NR = 0 
	    else word[1] = $0
    }

    { word[NR] = $0 }

    END { ... do something with word[1]...word[NR] }

isolates words in a document over twice as fast as reading one line at a
time and then examining each field with FS = "[^a-zA-Z]+".  

To remove comments from C code: 

    BEGIN { RS = "/\*([^*]|\*+[^*/])*\*+/"  # comment is RS
	    ORS = " "
    }

    { print }

    END { printf "\n" }

10. Multi-line records

Since mawk interprets RS as a regular expression, multi-line records are
easy.  Setting RS = "\n\n+", makes one or more blank lines separate 
records.  If FS = " " (the default), then single newlines, by the rules 
for space above, become space.  

   For example, if a file is "a b\nc\n\n", RS = "\n\n+" and
   FS = " ", then there is one record "a b\nc" with three
   fields "a", "b" and "c".  Changing FS = "\n", gives two
   fields "a b" and "c"; changing FS = "", gives one field
   identical to the record.

For compatibility with (old) awk, setting RS = "" has the same
effect on determining records as RS = "\n([ \t]*\n)+".

Most of the time when you change RS for multi-line records, you
will also want to change ORS to "\n\n".
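
A quick check of multi-line records, using RS = "" since that form 
works in old and new awk alike.  This is a sketch that assumes an awk 
on PATH; out is an illustrative shell variable.  

```shell
# Two records separated by blank lines; newline inside a record acts as space.
out=$(printf 'a b\nc\n\n\nd\n' |
      awk 'BEGIN { RS = "" } { print NR ": " NF " fields, first = " $1 }')
echo "$out"
```

The first record is "a b\nc" with three fields and the second is "d" 
with one.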

11. User functions.

User defined functions are new.  They can be passed expressions by value
or arrays by reference.  Function calls can be nested and support 
recursion.  The syntax is 

	function  funcname( args ) {

	.... body

	}

Newlines are ignored after the ')' so the '{' can start on a different 
line.  Inside the body, you can use a return statement 

	return expr
	return

As in C, there is no distinction between functions and procedures.  A 
function does not need an explicit return.  Extra arguments act as local
variables.  For example, csplit(s, A) puts each character of s in array 
A.  

	function  csplit(s, A,     i)
	{
	  for(i=1; i <= length(s) ; i++)
		A[i] = substr(s, i, 1)
	}

Putting lots of space between the passed arguments and the local 
variables is a convention that can be ignored if you don't like it.  

Dynamic regular expressions allow regular expressions to be passed to 
user defined functions.  The following function gobble() is the lexical 
scanner for a recursive descent parser, the whole program is in 
examples/decl.awk.  

	function gobble( r,   x) # eat regular expression 
	    #  r off the front of global variable line

	{
	  if ( match( line, "^(" r ")") )
	  {
	    x = substr(line, 1, RLENGTH)
	    line = substr(line, RLENGTH+1)
	  }
	  else  x = ""

	  return x
	}

You can call a function before it is defined, but the function name and 
the '(' must not be separated by white space to avoid confusion with 
concatenation.  
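
The points of this section fit in one small program.  It is a sketch 
that assumes an awk on PATH; fill(), fact() and the variable names are 
made up for the example.  

```shell
out=$(awk '
BEGIN {
    split("", A)              # make A an empty array before passing it
    n = fill(A, 3)            # fill() is called before it is defined
    print n, A[1], A[3], fact(5)
}
function fill(arr, k,    i) {    # arr is passed by reference, i is a local
    for (i = 1; i <= k; i++) arr[i] = i * i
    return k
}
function fact(n) { return n <= 1 ? 1 : n * fact(n - 1) }   # recursion
')
echo "$out"
```

This prints 3 1 9 120.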

12. Other differences in mawk

The main differences between mawk and other awks have been discussed: 
RS is a regular expression, and regular expression metacharacters don't 
have to be double escaped.  Here are some others: 

  VERSION  -- built-in variable holding version number of mawk.

	mawk 'BEGIN{print VERSION}'       shows it.

  -D  --  command line flag causes mawk to dump to stderr 
	  a mawk assembler listing of the current program.
	  The program is executed by a stack machine internal
	  to mawk.  The op codes are in code.h, the machine in
	  execute.c.

  srand() --    
      During initialization, mawk seeds the random number generator
      by silently calling srand(), so calling srand() yourself is
      unnecessary.  The main use of srand is to use srand(x) to get
      a repeatable stream of random numbers.  Srand(x) returns x
      and srand() returns the value of the system clock in some form
      of ticks.

13. MsDOS

For a number of reasons, entering a mawk program on the command line 
using command.com as your shell is an exercise in futility, so under 
MsDOS the command syntax is 

	mawk [-Fs] optional_list_of_files

You'll get a prompt, and then type in the program.  The -f option works 
as before.  

If you use a DOS shell that gives you a Unix style command line, to use 
it you'll need to provide a C function reargv() that retrieves argc and 
argv[] from your shell.  The details are in msdos/INSTALL.  

Some features are missing from the DOS version of mawk: No system(), and
no input or output pipes.  To provide a hook to stderr, I've added 

	errmsg( "string" )

which prints "string\n" to stderr which will be the console and only the
console under command.com.  A better solution would be to associate a 
file with handle 2, so print and printf would be available.  Consider 
the errmsg() feature as temporary.  

For compatibility with Unix, CR are silently stripped from input and LF 
silently become CRLF on output.  

WARNING: If you write an infinite loop that does not print to the 
screen, then you will have to reboot.  For example 

	x = 1 
	while( x < 10 )  A[x] = x
	x++

By mistake the x++ is outside the loop.  What you need to do is type 
control break and the keyboard hardware will generate an interrupt and 
the operating system will service that interrupt and terminate your 
program, but unfortunately MsDOS does not have such a feature.  

14. Bugs

Currently mawk cannot handle \0 (NUL) characters in input files; 
otherwise mawk is 8 bit clean.  Also "a\0b", doesn't work right -- you 
get "a".  You can't use \0 in regular expressions either.  

   printf "A string%c more string\n" , 0

does work, but more by luck than design since it doesn't work with 
sprintf().  

15. Releases

This release is version 0.97.  After a reasonable period of time, any 
bugs that appear will be fixed, and this release will become version 
1.0.  

Evidently features have been added to awk by Aho, Kernighan and 
Weinberger since the 1988 release of the AWK book.  Version 1.1 will add
whatever features are necessary to remain compatible with the language 
as defined by its designers.  

After that ...  ? 

16. Correspondence

Send bug reports or other correspondence to

Mike Brennan
brennan at bcsaic.boeing.com

If you have some interesting awk programs, contributions to the examples
directory would be appreciated.  
@//E*O*F mawk0.97/mawk.manual//
chmod u=rw,g=r,o=r mawk0.97/mawk.manual

echo x - mawk0.97/array.c
sed 's/^@//' > "mawk0.97/array.c" <<'@//E*O*F mawk0.97/array.c//'

/********************************************
array.c
copyright 1991, Michael D. Brennan

This is a source file for mawk, an implementation of
the Awk programming language as defined in
Aho, Kernighan and Weinberger, The AWK Programming Language,
Addison-Wesley, 1988.