Tim Long
tml at extro.ucc.su.OZ.AU
Wed Jun 12 03:39:07 AEST 1991
I read Byron's comments on perl and awk with some sympathy.
I have had thoughts along similar, although not identical, lines for
some time. By coincidence, I have just designed and implemented a language
to address similar issues, which I would be grateful to hear people's
opinions on. But first I'll just mention my own motivations:
1) To have a freely available general purpose interpretive language on
UNIX systems. (As opposed to the many more special purpose ones such
as awk and the shell). This can be re-phrased as: To have a UNIX
language like DOS has BASIC.
2) To have a freely available language suitable for embedding in other
programs and systems.
3) To allow programming on UNIX systems which do not have development
systems (which are becoming very common).
So I guess the design spec was to make a freely available general
purpose language suitable both for system supported, and embedded use.
By embedded use I mean both within stand-alone devices (like PostScript)
and as an adjunct to applications. The source is arranged to be amenable
to this.
Although I have been brooding on it for some time I have only actually
done it in the last month. I'm reasonably happy with the result at this
stage but welcome comment. There is a preliminary manual entry which
describes the language, but it's just a manual entry. I'll try to give
some more background here.
The language, which I am calling ICI for the time being, has dynamic
typing and object management, with all the flavour (flow control constructs,
operators and syntax) of C. You can write very C-like code if you wish
(pointers work), but you can take advantage of the more flexible data
handling to make things a lot easier.
I have tried to keep the design carefully divided into the language
and its fundamental functions and then other groups of functions which
relate to the operating environment. Naturally the UNIX shell level
version has almost all of these included.
I could try to convey the nature of the language here, but it
is probably better just to skim the manual entry. So I'll include
it here and continue the general discussion after that. It's
about 14 pages, but you can start skipping after you get to the
standard functions (it finishes after the next line of minuses)...
----------------------------------------------------------------------
ICI(1) ICI(1)
NAME
ici - General purpose interpretive programming language
SYNOPSIS
ici [ file ] [ -f file ] [ -i prog ] [ -digit ] [ -l lib ] [ args... ]
DESCRIPTION
Ici parses ICI program modules as indicated by its
arguments. They may or may not cause code to execute as
they are parsed. But after the modules have been read, if
main is defined as an external function it will be called
with the otherwise unused arguments (as an integer count and
a pointer to the first element of an array of strings).
The options are:
file If the first argument does not start with a hyphen
it is taken to be a program module as if specified
with the -f flag. This may be used to allow ICI
programs to execute directly with the #! facility.
-f file The file is parsed as an ICI module.
-i prog The prog argument is parsed directly as an ICI
module.
-digit An ICI module is read from the file descriptor
digit.
-l lib An ICI module is read from $ICILIB/liblib.ici. If
ICILIB is not defined as an environment variable,
/usr/ici will be used.
other Any argument not listed above is gathered into the
arguments which will be available to the program.
-- All further arguments are gathered into the
arguments which will be available to the program.
Note that argument parsing is two pass: all the "unused"
arguments are determined and assigned to argc and argv
before the first module is parsed.
If an error occurs which is not dealt with by the program
itself, a suitable error message will be printed and ici
will exit.
The remainder of this manual entry is a brief description of
the language.
OVERVIEW
ICI has dynamic typing and flexible data types with the flow
control constructs and operators of C. It is designed to
allow all types of programs to be written without the
programmer having to take responsibility for memory
management and error handling. There are standard functions
to provide the sort of support provided by the standard I/O
and the C libraries, as well as additional types and
functions to support common needs such as simple data bases
and character based screen handling.
A programmer familiar with C should be able to write ICI
programs after reading this document.
STATEMENTS
An ICI source module consists of a sequence of statements.
Statements may be any of the following:
expression ;
compound-statement
if ( expression ) statement
if ( expression ) statement else statement
while ( expression ) statement
do statement while ( expression ) ;
for ( exp(opt) ; exp(opt) ; exp(opt) ) statement
switch ( expression ) compound-statement
case constant-expression :
default :
break expression(opt) ;
continue expression(opt) ;
return expression(opt) ;
;
storage-class ident function-body
storage-class decl-list ;
In contrast to C, all statement forms are allowed at all
scopes. But in order to distinguish declarations and
function definitions from ordinary expressions, the storage
class (extern, static or auto) is compulsory.
There is no goto statement, but break and continue
statements may have an optional expression signifying how
many levels to affect. (Not in this version.)
The term constant-expression above refers to an expression
that is evaluated exactly once, at parse time. In other
respects it is unrestricted; it may call functions and have
side-effects.
Switch statements must be followed by a compound statement,
not just any statement as in C. Furthermore, each case-
label and the default must label statements at the top level
of this compound statement.
OBJECTS AND LVALUES
In ICI objects are dissociated from the storage locations
(variables, for instance) which refer to them. That is, any
place which stores a value actually stores a reference to
the value. The value itself, whether it is a simple integer
or a large structure, has an independent existence. The
type of an object is associated with the value, not with any
storage locations which may be referring to it. Thus ICI
variables are dynamically typed. The separation of storage
location and value is transparent in most situations, but in
some ways is distinguishable from the case in a language
such as C where an object is isomorphic with the storage it
occupies.
ICI assignment and function argument passing does not
transfer a copy of an object, but transfers a reference to
the object (that is, the new variable refers to the same
object). Thus it is straightforward to have two variables
referring to the same object; but this does not mean that
assigning to one affects the value of the other.
Assignment, even in its most heavily disguised forms, always
assigns a new object to a storage location. (Even an
operation such as "++i" makes the variable "i" refer to the
object whose value is one larger than the object which it
previously referred to.)
The normal storage locations are the elements of arrays and
structures. Simple variables are actually structure
elements, although this is not apparent in everyday
programming.
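As an illustrative sketch (not from the original manual entry) of the
distinction between rebinding a variable and modifying a shared object:

```ici
static a = array(1, 2, 3);
static b;

b = a;          /* b now refers to the same array object as a. */
b[0] = 99;      /* Modifies the shared object; a[0] is also 99. */
b = 42;         /* Rebinds b to a new object; a itself is unchanged. */
```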
Some object types are "atomic" (scalar), that is their
internal structure is not modifiable. Atomic data types
have the property that all objects with the same value are
in fact the same object. Integers, floating point numbers,
strings and functions are atomic by nature. The only
standard non-atomic data types are arrays and structures.
An atomic (constant) version of any aggregate type (array or
structure) can be obtained. Several of the intrinsically
atomic types do allow read-only access to their interior
through indexes, structure keys or pointers. (Strings for
example allow indexing to obtain one character sub-strings.)
TYPES
Each of the following paragraphs is tagged with the internal
name of the type, as returned by the typeof() function:
int Integers are 32 bit signed integers. All the usual C
integer operations work on them. When they are
combined with a float, a promoted value is used in the
usual C style. Integers are atomic.
float
All floating point is carried out in the host machine's
double precision format. All the usual C floating
point operations work. Floats are atomic.
string
Strings are atomic sequences of characters. Strings
may be indexed and have the address taken of internal
elements. The value of fetching a sub-element of a
string is the one character string at that position
unless the index is outside the bounds of the string,
in which case the result is the empty string. The
first character of a string has index 0.
Strings may be used with comparison operators, addition
operators (which concatenate) and regular expression
matching operators. The standard function sprintf is a
good way of generating and formatting strings from
mixed data.
NULL The NULL type only has one value, NULL (the same name
as the type). The NULL value is the general undefined
value. Anything uninitialised is generally NULL.
array
Arrays always start at 0 but extend to positive indexes
dynamically as elements are written to them. A read of
any element either not yet assigned to or outside the
bounds of the array will produce NULL. A write to
negative indexes will produce an error, while a write
to positive indexes will extend the array. Note that
arrays do not attract an implicit ampersand as in C.
Use &a[0] to obtain a pointer to the first element of
an array "a".
The function array() and array constants (see below)
can be used to create new arrays.
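For example, a sketch of the dynamic-extension behaviour described above:

```ici
static a;

a = array(1, 2);
a[4] = "four";               /* Extends the array; a[2] and a[3] read as NULL. */
printf("%d\n", sizeof(a));   /* The array now has 5 elements. */
```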
struct
Structures are collections of storage locations named
by arbitrary keys. Structures acquire storage
locations and member names as they are assigned to.
Elements which do not exist read as NULL. Pointers may
be taken to any member, but pointer arithmetic is only
possible amongst element names which are simple
integers.
Note that normal structure dereferencing with
struct.member is as per C, and the member name is a
string. Member names which are determined at run time
may be specified by enclosing the key in brackets as
per: struct.(expr), in which case the key may be any
object (derived from any expression). Thus
struct.("mem" + "ber") is the same as struct.member. An
"index" may also be used, as per: struct[expr], and has
the same meaning as struct.(expr). (This is true in
general, all data types which allow any indexing of
their internal structure operate through the same
mechanism and these are only notational variations.)
The function struct() and structure constants (see
below) can be used to create new structures.
From a theoretical standpoint structures are a more
general type than arrays. But in practice arrays have
some properties structures do not (intrinsic order,
length and different concatenation semantics, as well
as less storage overhead).
Note that by ignoring the value associated with a key,
structures are sets (and addition performs set union,
see below).
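A sketch of structure usage, including the run-time key forms described
above:

```ici
static s;

s = struct("name", "fred");
s.age = 42;                       /* Acquires a new member on assignment. */
printf("%d\n", s.("a" + "ge"));   /* Run-time key: the same member as s.age. */
printf("%d\n", s["age"]);         /* Index form: also the same member. */
```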
ptr Pointers point to places where things are stored, but a
pointer may be taken to any object and a nameless
storage location will be fabricated if necessary. They
allow all the usual C operations. Pointer arithmetic
works as long as the pointer points to an aggregate
element which is indexed by an integer (for instance
all elements of arrays, and amongst structure elements
which have integer keys). Pointers are atomic.
Note that pointers point to a storage location, not to
the value of an object itself. Thus if "a" is an
array, after "p = &a;", the expression "*p" will have
the same value as "a" even if "a" becomes a structure
(through assignment).
Note that it is not possible to generate pointers which
are in any way illegal or dangling. Also note that
because assignment and argument passing does not copy
values, pointers are not required as often as they are
in C.
func Functions are the result of a function declaration and
function constants. They are generally only applicable
to the function call operation and equality testing.
They do not attract an implicit ampersand as in C.
Functions are atomic. (Code fragments within functions
are also atomic and thus shared amongst all functions.)
regexp
Regular expressions are atomic items produced by either
regular expression constants (see below) or compiled at
run-time from a string. They are applicable to the
regular expression comparison operators described
below.
file Files are returned and used by some of the standard
functions. See below.
window
Windows are produced and used by some of the standard
functions. See below.
Other types (pc, catch, mark, op, module and src) are used
internally and are not likely to be encountered in ordinary
programming.
LEXICON
Lexicon is as per C, although there is no preprocessor yet,
with the following additions:
Adjacent string constants separated only by white space form
one concatenated string literal (as per ANSI C).
The sequence of a "#" character (not at the start of a line),
followed by any character except a newline up to the next
"#" is a compiled regular expression.
The sequences !~, ~~, ~~=, ~~~, $, @, [{, }], [<, and >] are
new tokens.
The names NULL and onerror are keywords.
EXPRESSIONS
Expressions are full C expressions (with standard precedence
and associativity) with some additions. The overall syntax
of an expression is:
expression:
primary
prefix-unary expression
expression postfix-unary
expression binop expression
primary:
NULL
int-literal
float-literal
char-literal
string-literal
regular-expression
[ expression-list ]
[< assignment-list >]
[{ function-body }]
ident
( expression )
primary ( expression-list(opt) )
primary [ expression ]
primary . struct-key
primary -> struct-key
struct-key:
ident
( expression )
prefix-unary:
* & + - ! ~ ++ -- $ @
postfix-unary:
++ --
binop:
* / % + - >> << < > <= >=
== != ~ !~ ~~ ~~~ & ^ | && || : ?
= += -= *= /= %= >>= <<= &= ^= |= ~~=
,
expression-list:
expression
expression , expression-list
assignment-list:
assignment
assignment , assignment-list
assignment:
struct-key = expression
The effect and properties of various expression elements are
discussed in groups below:
simple constants
integers and floats are recognised and interpreted as
they are in C. Character literals (such as 'a') have
the same meaning as in C (ie. they are integers, not
characters). String literals have the same lexicon as
C except that they produce strings (see Types above).
Both character and string literals allow the additional
ANSI C backslash escapes (\e \v \a \? \xhh). Regular
expressions are those of ed(1).
complex constants
[ expression-list ]
[< assignment-list >]
[{ function-body }]
Because variables are intrinsically typeless it is
necessary that initialisers, even of aggregates, be
completely self-describing. This is one of the reasons
these forms of constants have been introduced. The
first is an array initialised to the given values, the
second is a structure with the given keys initialised
to the given values. The third is a function. The
values in the first two are all computed as constant
expressions (not meaning that they are made atomic or
may only contain constants, just that they are computed
once when they are first parsed).
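A sketch of the three complex constant forms (the function-body inside
[{ }] is assumed here to take the same form as in a function declaration):

```ici
static nums = [1, 2, 3];                      /* An array constant. */
static point = [< x = 1, y = 2 >];            /* A structure constant. */
static twice = [{ (n) { return n * 2; } }];   /* A function constant. */
```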
primary ( expression-list(opt) )
Function calls have the usual semantics. But if there
are more actual parameters than there are formal
parameters in the function's definition, and the
function has an auto variable called "vargs", the
remaining actual parameters will be formed into an
array and assigned to this variable. If there is no
excess of actual parameters any "vargs" variable will
be undisturbed, in particular, any initialisation it
has will be effective.
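A sketch of the "vargs" convention described above:

```ici
static
sum()
{
    auto vargs = [];    /* Default when there are no excess actuals. */
    auto total = 0;
    auto i;

    for (i = 0; i < sizeof(vargs); ++i)
        total += vargs[i];
    return total;       /* sum(1, 2, 3) should return 6. */
}
```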
prefix-unary (* & + - ! ~ ++ -- $ @)
Apart from "$" and "@", the prefix unary operators have
the same meaning as they do in C. The "*" operator
requires a ptr as an argument. The "-" operator
requires an int or float. "!" and "~" require ints.
"++" and "--" work with any values which can be placed
on the left of a "+ 1" or "- 1" operation (see below).
The rest ("&", "+", "$", "@") work with any types. A
"+" always has no effect. If the operand of an "&" is
not an lvalue in the usual sense, a one element array
will be fabricated to hold the value and a pointer to
this element will result. The "$" operator causes the
affected expression to be evaluated at parse time (thus
"$sin(0.5)" will cause the value to be computed once no
matter how many times the term is used). The "@"
operator returns the "atomic" form of an object. This
is a no-op for simple types. When applied to an
aggregate the result is a read-only version of the
same, which will be the same object as all other atomic
forms of equal aggregates (as per ==).
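For example (a sketch of "$" and "@" as described above):

```ici
static angle = $ (355.0 / 113.0);   /* Computed once, when parsed. */
static a = [1, 2, 3];
static r;

r = @a;     /* Read-only atomic form; equal arrays share one atomic form. */
```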
regular expression matches (~ !~ ~~ ~~= ~~~)
These binary operators perform regular expression
matches. In all cases one operand must be a string and
the other a regular expression. The operator ~
performs the match and returns 1 or 0 depending on whether
the string did or didn't match the expression.
Likewise for !~ with opposite values.
The operator ~~ matches the string and regular
expression and returns the portion of the string
matched by the \(...\) enclosed portion of the regular
expression, or NULL if the match failed. The ~~=
operator is the equivalent assignment operator and
follows the usual rules.
The ~~~ operator matches the string and the regular
expression and returns an array of the portions of the
string matched by the \(...\) portions of the regular
expression, or NULL if the match failed. (This may
move to a function.)
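A sketch of the match operators (regular expressions as per ed(1)):

```ici
static path = "/usr/local/bin/ici";
static base;

if (path ~ #ici$#)
    printf("matched\n");
base = path ~~ #/\([^/]*\)$#;    /* The \(...\) portion: "ici". */
```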
assignment operators
As previously mentioned, assignment always sets a
storage location to a new object. The old value is
irrelevant (although it may have been used in the
process of a compound assignment operator). Thus there
is no implicit cast on assignment, so assigning an int
to what is currently a float will result in an int.
Assigning to a currently unknown variable will
implicitly declare the variable as static.
other binary operators
The usual C binary operators work as they do in C and
on the same range of types. In addition:
The == and != operators work on all types. Arrays and
structures are equal if they contain the same objects
in the same positions.
The + and += operators will concatenate strings, arrays
and structures (in the last case, where identical keys
occur the values of the right hand operand take
precedence).
The << and <<= operators will shift an array, losing
elements from the front and shortening the array as a
whole.
The <, >, <=, >= operators work on strings, making
lexical comparisons.
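A sketch of the aggregate forms of these operators:

```ici
static a = [1, 2, 3] + [4, 5];          /* Concatenation: [1, 2, 3, 4, 5]. */
static s = [< x = 1 >] + [< x = 2 >];   /* Union; the right side wins: s.x is 2. */

a <<= 2;        /* Loses 1 and 2 from the front; a is now [3, 4, 5]. */
```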
VARIABLES, SCOPES AND INITIALISERS
There are exactly three levels of scope. Extern (visible
globally by all code), static (visible by code in the
module), and auto (visible by code in the function). The
variables in the first two are persistent and static. Auto
variables have a fresh instantiation created each time a
function is entered, and lost on exit (unless there are
references to them). Implicitly declared variables are
static.
All types of declarations may occur anywhere; they are
simple statements, unlike in C. They have their effect
entirely at parse time and thus produce no code. But the
rules about scope still apply. No matter where an extern
declaration is made, once it is parsed that variable is
visible globally. Similarly once an auto declaration is
parsed that variable is visible throughout the scope of the
function.
Note that initialisers are constant expressions. They are
evaluated once at parse time, even initialisers of autos.
Every time a set of auto variables is instantiated (by
function entry) the variables are set to these initial
values, NULL if there is no initialiser.
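A sketch of the consequence for auto initialisers:

```ici
static
counter()
{
    auto n = 0;     /* Evaluated once at parse time... */
    return ++n;     /* ...but n is reset to 0 on each entry, so this is always 1. */
}
```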
STANDARD FUNCTIONS
The following functions form part of the language definition
and should be present in all implementations, including
embedded systems.
call(func, array)
Calls the function with arguments taken from the array.
Thus the statement call(func, ["a", "b"]); is
equivalent to func("a", "b");. Returns the return
value of the function.
array(...)
Returns a new array formed from the arguments, of which
there may be any number, including zero.
struct([key, value...])
Returns a structure initialised with the paired keys
and values given as arguments, of which there may be
any even number, including zero.
string = sprintf(format, args...)
Returns a string formatted as per printf(3S) from the
format and arguments. All flags and conversions are
supported up to System 5.3's. The new ANSI n and p
conversions are not provided. Precision and field
width * specifications are allowed. Type checking is
strict.
copy(any)
Returns a copy of its argument. A null operation for
all types except arrays and structures. To simulate
C's structure assignment use: "a = copy(b)" in place of
"a = b". Note that this is a "top level" copy. Sub-
aggregates are the same sub-aggregates in the copy as
in the original.
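A sketch of the top-level copy semantics:

```ici
static a = [< x = [1, 2] >];
static b;

b = copy(a);     /* A new top-level structure. */
b.x[0] = 99;     /* a.x[0] is also 99: sub-aggregates are shared. */
b.y = 3;         /* But a acquires no member y. */
```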
eval(any)
Evaluates its argument in the current scope. This is a
null operation for any type except strings. For these
it will return the value of the variable of that name
as looked up in the current scope.
exit(int)
Exits with the given status.
fail(str)
Generates a failure with the given message (see Error
handling above).
float(any)
Returns a floating point interpretation of its argument
(an int, string or float; otherwise it returns 0.0).
int(any)
Returns an integer interpretation of its argument (a
float, string or int; otherwise it returns 0).
string(any)
Returns a string interpretation of its argument (an
int, float or string, else it will return the type name
in angle brackets).
typeof(any)
Returns the type name of an object (a string).
parse(file/string [,module])
Parses the file or string in a new module, or the
context of the given module if supplied.
regexp(string)
Return the regular expression compiled from the string.
sizeof(any)
Returns the number of elements the object has (i.e.
elements of an array, key/value pairs in a struct or
characters in a string; returns 1 for all other types).
push(array, any)
Adds the object to the end of the array, extending it
in the process.
pop(array)
Returns the last object in the array and shortens the
array by one in the process. It will return NULL if
the array is already empty.
keys(struct)
Returns an array of the keys (ie. member names) of the
struct.
smash(string1, string2)
Returns an array of sub-strings from string1 which were
delimited by the first character of string2.
str = subst(string1, regexp, string2 [, flag])
(Coming soon.) Returns a copy of string1 with sections
that matched regexp replaced by string2, globally if
flag is given as 1.
str = tochar(int)
Returns a one character string made from the integer
character code.
int = toint(str)
Return the character code of the first character of the
string.
int = rand([int])
Returns a pseudo-random number in the range 0 .. 2^15 -
1. If an argument is supplied this is used to seed the
random number generator.
string/array = interval(string/array, start [,len])
Returns the interval of the string or array starting at
index start and continuing till the end, or for len
elements if len is supplied. Interval extraction outside the
bounds of the object will merely leave out the absent
elements.
array = explode(string)
Return an array of the integer character codes of the
characters in the string.
string = implode(array)
Returns a string formed from the concatenation of the
integer character codes and strings found in the array.
Objects of other types are ignored.
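A sketch of these string functions together:

```ici
static parts = smash("a:b:c", ":");      /* ["a", "b", "c"] */
static codes = explode("abc");           /* [97, 98, 99] */
static str = implode([104, 105, 33]);    /* "hi!" */
```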
file = sopen(string, mode)
Returns a file (read only) which when read will return
successive characters from the string.
module = module(string)
Return a new module with its name taken from the string
argument.
obj = waitfor(obj...)
Blocks (waits) until an event indicated by any of its
arguments occurs, then returns that argument. The
interpretation of an event depends on the nature of
each argument. A file argument is triggered when input
is available on the file. A float argument waits for
that many seconds to expire, an int for that many
milliseconds (they then return 0, not the argument
given). Other interpretations are implementation
dependent. Where several events occur simultaneously,
the first as listed in the arguments will be returned.
Note that in implementations that support many basic
file types, some file types may always appear ready for
input, despite the fact they are not.
unixfuncs()
When first called, will define as external functions
the unix system interface functions described below (if
available). Subsequent calls are ignored.
vstack()
Return a copy of the variable (scope) stack. Index 0
is the outermost scope. It will contain functions,
each optionally followed by a structure of the local
variables. (Only for debuggers obviously.)
STANDARD EXTERNAL VARIABLES
externs
A structure of all the extern variables.
argc A count of the otherwise unused arguments to the
interpreter.
argv An array of strings, which are the otherwise unused
arguments to the interpreter. (Note this is different
from the argument to main, which is a pointer to the
first element of this array as it is in C. It is
probably easier to use the globals in general.)
stdin
Standard input.
stdout
Standard output.
stderr
Standard error output.
OTHER FUNCTIONS
The following functions will be present on systems where the
environment permits. Missing file arguments are interpreted
as standard input or output as appropriate. Pretty obvious,
but more details later.
printf(fmt, args...)
fprintf(file, fmt, args...)
file = fopen(name, mode)
file = popen(cmd, mode) /* UNIX only. */
status = system(cmd)
str = getchar([file])
str = getline([file])
str = getfile([file])
put(str [,file])
fflush([file])
fclose(file)
UNIX FUNCTIONS
The following functions will be available on UNIX systems or
systems that can mimic UNIX. See unixfuncs() above. They
all return an integer. On failure they raise a failure with
the error set to the appropriate system error message
derived from errno. These interfaces are raw. Use at your
own risk.
access(), acct(), alarm(), chdir(),
chmod(), chown(), chroot(), close(), creat(),
dup(), _exit(), fork(), getpid(), getpgrp(),
getppid(), getuid(), geteuid(), getgid(), getegid(),
kill(), link(), lseek(), mkdir(), mknod(),
nice(), open(), pause(), rmdir(), setpgrp(),
setuid(), setgid(), signal(), sync(),
ulimit(), umask(), unlink(),
clock(), system(), lockf(), sleep(),
/* Rest on the way. */
DATA BASE FUNCTIONS
Simple non-indexed, but otherwise fully locked and
functional data base support. Not for speed. If your
application needs a serious data base, get one, don't use
this. Use this for configuration info and all that
peripheral stuff.
The arrays are arrays of strings, which are the fields of a
record. The "keyfieldno" is which field number of the
record is the key for this operation. The "dbname" is a
file name, one table per file. It will be created if it
does not exist, but an empty file is ok too. Use UNIX
permissions for access control. Read access on read-only
files is ok. db_get() returns NULL if not found. More
details later.
array = db_get(dbname, keyfieldno, value)
array = db_delete(dbname, keyfieldno, value) /* Returns old data. */
db_set(dbname, keyfieldno, array)
db_add(dbname, array)
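A sketch of typical use (the file name here is purely hypothetical):

```ici
static rec;

/* Field 0 (the login name) is the key in this hypothetical table. */
db_add("/usr/lib/myapp/users", ["fred", "Fred Smith", "/home/fred"]);
rec = db_get("/usr/lib/myapp/users", 0, "fred");
if (rec != NULL)
    printf("%s\n", rec[1]);    /* "Fred Smith" */
```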
WINDOWS
Upon first reference to any of the window routines standard
input is placed in the appropriate modes for non-echoing,
character-at-a-time input. All input from the terminal
should be fetched with w_getchar() and w_edit(). Upon exit
(including interrupt) all modes will be restored.
win = w_push(line, col, nlines, ncols)
Pushes an opaque rectangular window on the screen at
the given line and col, which are in screen
coordinates. But special values of -1 or -2 for line
or col indicate centering or right justification
(bottom justification for line) for that aspect of the
position. The window will have the given number of
lines and columns, unless line or col are less than or
equal to zero, in which case they will be that much
less than the full screen size. The window is
initially clear and on top of all previous windows.
w_pop(win)
"Pops" the window from the screen; re-exposing anything
which the window was hiding. Any window may be popped
from the screen, whether it is the top window or not.
After a window has been popped it is dead and cannot be
put back. Make a new window to do this. Note that if
a window is not referenced it will get popped when the
next garbage collection occurs, but windows should
always be popped explicitly.
w_paint(win, line, col, text [,tabs])
Paints the text on the window at the given line and
column (in the window's space), with auto-indent on
subsequent lines (indicated by a \n character in the
text).
A string tab specification reminiscent of troff (and
most word processors) may be given. If supplied it
must be a concatenation of tab-specs. Each tab-spec
consists of an optional "+" character, followed by a
decimal number, followed by an optional leader
character, followed by one of the letters "L", "C" or "R".
If the "+" is supplied the tab position is at a
relative offset from the previous one, else it is a
distance from the left margin of this text block. If a
leader character is given the distance between the
current column and the start of the next text will be
filled with that character, else a direct motion will
be used (use an explicit space leader to clear an
area). If an "L" tab is set, the next field of text
will start at the tab stop, if a "C" tab is set the
next field of text will be centered on the tab stop,
and if an "R" tab is set the next field of text will
end on the tab stop. The "next field of text" is the
text after the tab character up to the next tab,
newline or end of string.
The last tab-spec in the string will be used
repeatedly. Scanning of the tabs starts again on each
new line. If no tab specification is given multiple-
of-8 column tabs are used, but relative to the start
position.
For example, a three part title in an 80 column window
could be painted with the tab spec "40C80R".
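So a sketch of that call (assuming a window "win" from w_push()):

```ici
w_paint(win, 0, 0, "Left\tTitle\tRight", "40C80R");
/* "Left" starts at column 0, "Title" is centred on column 40,
   and "Right" ends at column 80. */
```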
win = w_textwin(line, col, text [,tabs])
Pushes a window in the same manner as w_push() (with
the same interpretation of line and col) of just
sufficient size to hold the given text as it is set by
w_paint() with a box around it. It is allowable for
column positions in the text being set to have negative
numbers during the sizing phase of this operation.
w_mesg(str)
Pushes a boxed one line window centred at the bottom of
the screen and containing the string. It will be
automatically removed after the next keystroke.
w_cursorat(win, line, col)
Sets the cursor position for this window (in the
window's space). When the window is the top window on
the screen, the real screen cursor will be at this
position.
str = w_getchar()
Returns the next character from the terminal, without
echo and without canonical input processing. For
ordinary ASCII characters a one character string is
returned. For special keys an appropriate multi
character string is returned (currently "F0", "F1" ...
"F32", "LEFT", "RIGHT", "UP", "DOWN", "HOME", "END",
"PGUP", "PGDOWN").
The screen is refreshed before waiting for user
input.
w_ungetchar(str)
Pushes a character back. Only one character of push-
back is allowed. Only the first 16 characters of the
string will be significant (all "characters" returned
by w_getchar() are shorter than this).
str = w_edit(win, line, col, width, str)
Allows traditional editing of an input field at the
given position and width and initially containing the
given string. Editing will proceed until any unusual
character is pressed (that is, not a printing ASCII
character or one of the field editing keys such as
backspace). At that point the character which caused
termination will be pushed back on the input stream and
the current text of the field returned. The next call
to w_getchar() will return the key which terminated
editing.
w_box(win)
Draws a box around the inside edge of the window.
w_clear(win)
w_refresh()
w_suspend()
Restores the terminal to normal modes and moves the
cursor to the bottom left. The next window operation
will revive the screen.
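Taken together, these calls might be combined along the following
lines. This is an untested sketch only; the window coordinates and
the handling of w_edit()'s pushed-back terminating key are my reading
of the descriptions above, not verified behaviour.
	win = w_textwin(5, 10, "Name:\t____________________");
	name = w_edit(win, 1, 7, 20, "");
	key = w_getchar();	/* the key which terminated editing */
	if (key == "F1")
		w_mesg("Sorry, no help yet.");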
EXAMPLES
The following shell command line will print Hello world.
ici -p 'printf("Hello world.\n");'
The following program prints the basename of its argument:
#!ici
printf("%s\n", argv[1] ~~ #\([^/]*\)$#);
The following example illustrates a simple grep like
program. The first line makes a Bourne shell pump the
program in through file descriptor 3, and passes any
arguments to the shell script on to the ICI program. File
descriptor 3 is used to avoid disturbing the standard input.
This works on all versions of UNIX, but of course 4.2+ and
5.4+ systems can use the #! mechanism. Note that errors
(such as those encountered upon failure to open a file) are
not checked for. The program can be expected to exit with
an appropriate message should they occur.
exec ici -3 -- "$0" "$@" 3<<'!'
extern
main(argc, argv)
{
	if (argc < 2)
		fail(sprintf("usage: %s pattern [files...]", argv[0]));
	pattern = regexp(argv[1]);
	if (argc == 2)
		grep("", stdin);
	else
	{
		for (i = 2; i < argc; ++i)
			grep(sprintf("%s:", argv[i]), fopen(argv[i], "r"));
	}
}

static
grep(prefix, file)
{
	while ((s = getline(file)) != NULL)
	{
		if (s ~ pattern)
			printf("%s%s\n", prefix, s);
	}
	if (file != stdin)
		fclose(file);
}
!
SEE ALSO
awk(1), ed(1), printf(3S), etc.
BUGS
There is a problem with the right-associativity of ? :
expressions. Use parentheses when nesting ? : operators
for the time being.
There is an occasional problem with the screen updating with
multiple windows.
A && or || expression may not result in exactly 0/1 if it
gets to the last clause.
AUTHOR
Tim Long, May '91.
----------------------------------------------------------------------
Returning to the general: my intention was not to replace any of the
special purpose tools like the shell, awk, sed etc, nor was it to
make a replacement for real programming languages like C. Rather, I
regard it as a casual programming tool filling much the same niche as
BASIC. As such it doesn't have specific language features dedicated
to special tasks (like doing something for each line of input text).
But it does (or will) have a broad base of simple primitives to make
most routine tasks easy. And of course it is extensible. But you will
notice that almost none of its "library" features are the ultimate
expression of that area of software technology.
In practice every major application has some principle, or piece of
software technology, or bit of hardware which is its reason
for existence as a product. But products can't run on one leg.
Inevitably the endless series of tack-on bits has to be supplied,
usually with a great deal of re-invention taking place. I have thought
of ICI as assisting in that area. The theory is that if something is
a major focus of an application, you won't be using these dicky little
features to do it. But for all those other bits, which aren't your
real business, you can just use the stuff provided and hack up the
rest in a somewhat more amenable programming environment than raw C.
Getting back to the language itself...
You can easily see from the above how it is like C. What is probably
not so obvious is how it is not like C. Here is a grab bag of things
to convey some of the flavour.
A lot of the usual messing around with strings can be handled
by the regular expression operators. The ~~= operator is particularly
useful. For example, to reduce a string s which holds a file name
to its basename:
s ~~= #\([^/]*\)$#;
I know it looks a bit insane, but regular expressions are
like that. I'm not going to apologise for using # rather than /
to delimit regular expressions. It was necessary to avoid lexical
ambiguity and you get used to it in no time.
I don't seem to have written the bit in the manual on error handling.
I'll quickly describe it here. The actual syntax of a compound
statement is:
compound-statement:
	{ statement(rpt) }
	{ statement(rpt) } onerror statement
In other words compound statements may have an optional "onerror"
followed by a statement. Errors work on the principle that the lower
levels of a program know what happened, but the higher levels know
what to do about it. When an error occurs, either raised by the
interpreter because of something the program did or explicitly
by the program, an error message is generated and stored in
the global variable "error".
The execution stack is then automatically unwound until an onerror
clause is found, and execution resumes there. The unwinding proceeds
past function calls, recursive calls to the interpreter (through
the parse function) etc.
If there is no onerror clause in the scope of the execution, the main
interpreter loop will bounce the error message all the way out to the
invoking system. In the UNIX case this will print the message along
with the source file name, the function name and the line number (which
is also available).
Although the manual entry doesn't go into that sort of detail it is
important to know what things raise errors in what circumstances.
But the basic philosophy is that the casual programmer can just
ignore the possibility of errors (like failure to open a file)
and expect the finished program to exit with a suitable message when
things go wrong. The grep program given in the manual is an example
of this. One error is checked for explicitly so it can give its
own usage message, but failures to open files or syntactically incorrect
regular expressions are allowed to fall out naturally.
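Since the manual doesn't yet show it, here is a sketch of what an
onerror clause looks like in practice (the function and variable
names here are mine, invented for illustration):
	static
	readfirst(name)
	{
		auto f, s;

		{
			f = fopen(name, "r");
			s = getline(f);
			fclose(f);
		}
		onerror
		{
			/* "error" holds the message from the failing level. */
			printf("can't read %s (%s), using default\n", name, error);
			s = "";
		}
		return s;
	}
If the fopen() fails, the stack unwinds to the onerror statement and
execution resumes there; without the onerror, the error would
propagate all the way out as described above.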
I seem to be wandering a bit here, back to some examples...
Functions are of course just another datum. A function called
"fred" is just a variable which has been assigned a function.
You could re-define the getchar function (even though it is an
intrinsic function coded in C) with either:
extern
getchar()
{
	return tochar(rand() % 256);
}
OR
extern getchar = [{(){return tochar(rand() % 256);}}];
The second is a little perverse, but function constants make more sense
in examples like:
sort(stuff, [{(a, b){return a < b ? -1 : (a > b ? 1 : 0);}}]);
Where the sort comparison function is given in-line so you don't have
to go chasing all over the code to find the two line function.
(There is a growing library which contains functions like sort, but
it is not in a fit state for discussion yet.)
They also make more sense when doing object oriented stuff.
Suppose you want to define a set of methods in a type. You can just
assign the functions directly into the type with:
static type = struct();

type.add =
[{
	(a, b)
	{
		return ....;
	}
}];

type.sub =
[{
	(a, b)
	{
		return ....;
	}
}];
Or you could build it in one hit like:
type =
[<
	add =
	[{
		(a, b)
		{
			return ....;
		}
	}],
	sub =
	[{
		(a, b)
		{
			return ....;
		}
	}],
>];
The variable argument support handles all possibilities. One nice
example of its use comes from the way libraries are done.
Because code is parsed at run-time, you don't want to have to parse
thousands of lines of libraries for every one line program. Instead,
a library will just define stub functions, which invoke a standard
(library) function called autoload(). They look like this:
extern sort() {auto vargs; return autoload("sort", sort, vargs);}
Because the function has an auto variable called "vargs", any
unused arguments (ie. all of them) are assigned to it. These are
then passed on to autoload. The arguments to autoload are a
file name (it will prefix it with the standard lib dir), the function
being re-defined and the arguments. It will parse the file, check
that it redefined the function and then call it with the arguments.
From then on of course the new function is defined and the old one gets
garbage collected like all lost data. The loaded file could define
several functions, and any autoload definitions they have will also
be replaced at the same time. The current version of autoload looks
like this:
/*
 * Parse the given file and transfer control to the newly loaded version
 * of the function as if that was what was called in the first place.
 * A loaded file can define more than one function. They will all
 * be replaced on the first load. See examples below.
 */
extern
autoload(file, func, args)
{
	auto stream;

	file = "/usr/ici/" + file;
	parse(stream = fopen(file, "r"));
	fclose(stream);
	if (func == eval(func.name))
		fail(sprintf("function %s was not found in %s", func.name, file));
	return call(eval(func.name), args);
}
Notice that it references a sub-field of the function like a structure
field. This is something that the manual entry doesn't go into details
about, but you can do things like that. A function, for instance, has
these sub-fields: "name", a name for the function (for simple declarations
this is the name the function was first declared as); "args", an atomic
array of the declared formal parameters; "autos", an atomic struct of all
the autos and their initial values; and there are a few other fields too.
Also notice how it uses the "eval" function to check the value of a
variable whose name is determined at run time, and then its use of the
call function to call a function with a run-time determined variable
argument list.
Again notice that it doesn't need to worry about any errors except
those it wants to check for explicitly. The others will happen
correctly automatically. This one feature can save a lot of code.
The sequence of operations on function entry is very deliberate and
you can do some neat things with it. In particular, formal parameters
are just auto variables which are initialised with the corresponding
actual parameter. But they are initialised with this after the
explicit initialisations have been applied. Thus you can use an
explicit initialisation to give a default value to an argument which
is optional, without messing about with the "vargs" variable.
For example:
static
getstuff(file)
{
	auto file = stdin;

	....
}
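So a caller can supply the file or not; a sketch of the two cases
(assuming the getstuff above, with f some already-open file):
	getstuff(f);	/* the actual parameter overrides the default */
	getstuff();	/* no actual parameter, so file remains stdin */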
Structure keys (and switch statements which use a struct) work on the
key being the same object as the tag. Thus switching on strings, ints,
floats, functions etc. is fine. But you can also use aggregate keys by
always using atomic versions of them:
switch (@array(this, that))
{
case @["one thing", "the other"]:
	...
case @[1, 2]:
	...
case @[x, y]:
	...
}
You will notice that because things refer to other things, rather
than actually holding them, you use pointers far less often than you
do in C. In fact you can start to treat structured data types in a
much more casual fashion.
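For instance (a sketch, using the struct() constructor shown earlier),
an assignment copies a reference, not the aggregate itself, so what C
would need a pointer for just falls out:
	s = struct();
	s.name = "fred";
	t = s;		/* t and s now refer to the same struct */
	t.name = "bill";	/* s.name is "bill" too; no & or * required */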
I have hardly scratched the surface here, but this is getting a bit long
so I'll terminate this section.
A few practicalities: on my 386 the initial load image (text+data)
comes in at around 110K (85K text + 25K data, of which a disconcerting
amount comes from curses, even though all I want it to do is read
a terminfo entry). After that, time and space are as proportional to
the needs of the program as I could make it. (These sorts of
interpretive languages often have nasty non-linear time or space
performance characteristics due to garbage collection and the like;
I have tried to be careful to avoid this sort of behaviour.)
For some tasks memory use can be better than expected, because of
object sharing...
Memory is only needed to hold distinct atomic objects, so although
technically there are reasonable memory overheads for, say, an integer,
in practice most programs don't have very many distinct integers at any
given point in time. After the first instance of a given number you
only pay the overhead of the storage location holding each
additional reference, which is 4 bytes for array elements and 8 bytes
for structure elements.
In fact it can happen that large arrays of floating point numbers
(which are 8 bytes each) can occupy less space than you would at first
expect. I have been thinking of shifting integers to 64 bits, because
there would be no overhead in memory use (they already use the same size data
block as floats) and I suspect the performance loss would be marginal.
But more to the point, 32 bits is just not enough. (A set of good portable
64 bit routines will be gratefully accepted.)
I think I have mentioned that it is also designed for embedded systems.
This means that:
a) It is easy to link the interpreter into other C programs; there
are as few external symbols as I could manage and it uses just a few
classic library functions.
b) It is easy to write intrinsic functions (ie. functions written in
C which can be called from ICI code).
c) It is easy to call ICI functions from C (although at the moment there
is slightly more overhead than the inverse direction).
d) Where necessary, additional types can be introduced without disturbing
the rest of the interpreter. (An example of this is the character
based screen handler. It is done in a single source module with only
one reference to it (in a configuration array), yet its "window" type
integrates fully with the rest of the interpreter.)
I think this will have to do for now. I'll post the source, the manual
and some sample programs somewhere soon.
By the way, I have always regarded designing a programming language
as the height of arrogance. And I can only defend this by saying I
did it for me.
--
Tim Long
tml at extro.ucc.su.OZ.AU