TRC - expert system building tool (part 3 of 8)
sources-request at panda.UUCP
sources-request at panda.UUCP
Sat Feb 8 22:22:12 AEST 1986
Mod.sources: Volume 3, Issue 111
Submitted by: ihnp4!dicomed!ndsuvax!nckary (Daniel D. Kary)
This is NOT a shell archive. Simply delete everything up to and including
the cut mark and save the result as reference.1.doc.
Dan Kary
ihnp4!dicomed!ndsuvax!nckary
-------------- cut here ---------------
The TRC Reference Manual
Daniel D. Kary
North Dakota State University
Computer Science Department
300 Minard Hall
Fargo, ND 58102
_A_B_S_T_R_A_C_T
The syntax of TRC is formally defined. The
output of TRC is elucidated.
TABLE OF CONTENTS
PART ONE - INPUT
1. INTRODUCTION
2. OVERVIEW
3. LEXICAL ELEMENTS
4. DEFINITIONS
5. SHORT TERM MEMORY
6. LONG TERM MEMORY
7. OPTIMIZER
PART TWO - OUTPUT
8. OVERVIEW
9. COMMON PROCEDURES
10. DATA OBJECTS
11. MANIPULATING THE DATA
12. TRANSLATING RULES
13. OPTIONS
APPENDICES
A. TRC GRAMMAR
B. ERROR MESSAGES
C. STYLE NOTES
D. SAMPLE PROGRAM
_1. _I_N_T_R_O_D_U_C_T_I_O_N
TRC is a programming language that is useful for build-
ing expert systems. It is presumed that the reader is fami-
liar with expert systems in general and has used at least
one expert system building tool. Some terms that are widely
- 2 -
used in describing expert systems have specific meanings
when used to describe TRC and will be defined now.
The set of situation-action rules that embody the
knowledge an expert uses to solve a problem are referred to
as Long Term Memory (LTM). The information that may vary
with each instance of the problem is referred to as Short
Term Memory (STM). The code which determines if the situa-
tion part of a rule is true will be called a pattern matcher
or a matcher. The code which determines which rule to
activate will be called an inference engine and includes
both the matcher and the LTM. The input to the TRC compiler
is called a specification.
The input to the TRC compiler is a rule based language.
The output is a set of C language files. The procedures in
the C language files output by the TRC compiler collectively
implement the inference engine. An inference engine is to
an expert system as a parser is to a compiler: it is of cen-
tral importance but it does not comprise a complete imple-
mentation. TRC does not provide code for interaction with
the user, but does permit the programmer to easily add this
code.
This document is divided into two parts and a set of
appendices. The first part presents a formal definition of
the input language with examples of each language feature.
The second part describes the output of the TRC compiler and
includes some important insight on integrating TRC generated
code with other C language code. The appendices include the
complete TRC grammar, a listing and explanation of all the
error messages that TRC might produce and a sample specifi-
cation.
PART ONE - INPUT
_2. _O_V_E_R_V_I_E_W
Every specification file consists of five sections, the
header, definitions, short term memory (data base), long
term memory (rules) and the trailer. These sections are
separated by double percent characters. The form of a full
specification is illustrated below. The header, STM and
trailer are optional so the minimum specification would con-
tain only the definitions and LTM. All of the "%%" marks
must be present in each specification file.
header
%%
definitions
%%
- 3 -
STM
%%
LTM
%%
trailer
The purpose of the header and trailer sections is to
permit the inclusion of C language code in the specifica-
tion. The header and trailer are each composed of a single
lexical element called a c-code which is defined in section
3. Separate sections are devoted to each of the remaining
parts of a TRC specification.
_3. _L_E_X_I_C_A_L _E_L_E_M_E_N_T_S
A program consists of a single file. A file is a
sequence of lexical elements composed of characters. Char-
acters may be one of these classes; (1) upper-case-letters,
(2) lower-case-letters, (3) digits, (4) special characters
(5) separators (6) embedded characters and (7) other charac-
ters.
(1) upper-case-letters
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
(2) lower-case-letters
a b c d e f g h i j k l m n o p q r s t u v w x y z
(3) digits
0 1 2 3 4 5 6 7 8 9
(4) special characters
" ( ) : ; = < > ! ^ _ $ . { } / * %
(5) separators
space tab newline
(6) embedded characters
\n embedded-newline
\t embedded-tab
\b embedded-backspace
\r embedded-carriage-return
\f embedded-form-feed
\\ embedded-back-slash
\" embedded-quote
(7) other characters
@ # & + [ ] ` ~ ' | , ?
The following names are used when referring to special characters:
Symbol Name Symbol Name
- 4 -
" quote : colon
; semicolon = equal
! exclamation ^ hat
_ underscore $ dollar
< less-than > greater-than
( left-paren ) right-paren
{ left-bracket } right-bracket
* asterisk % percent
- minus
A lexical element is a either a delimiter, an identif-
ier, an integer literal, a floating point literal, a string
literal, a comment or a c_code. In some cases a separator
is required between lexical elements, specifically when
adjacent lexical elements could be interpreted as a single
lexical element. A separator is any of space, tab or new-
line. One or more separators are permitted between any two
lexical element, before the first lexical element and after
the last lexical element.
_3._1. _D_E_L_I_M_I_T_E_R_S
A delimiter may be one of the following special charac-
ters:
: ; " . ^ - ( ) { } < >
A delimiter may be one of the following compound delim-
iters. Each compound delimiter is composed of two adjacent
special characters.
=> %% != == >= <=
The following names are used when referring to compound
delimiters:
Delimiter Name
=> arrow
%% delim
!= not-equal
== equality
>= greater-than-or-equal
>= less-than-or-equal
_3._2. _I_D_E_N_T_I_F_I_E_R_S
Identifiers are used as tokens and as reserved words.
Separators are not allowed in an identifier. The underscore
character is the only special character that may be part of
an identifier.
identifier ::= letter { underscore | letter | digit}
- 5 -
letter ::= upper-case-letter | lower-case-letter
Examples:
PENNY Get_Stuff x1
ThisOne WrZ_123 etc_
The identifiers that are reserved words are:
ADD MARK PROFILE
BACKTRACK NORECURS RECURS
DUMP NOT SAVE
EMPTY OPTIMIZE STRING
FLOAT POINTER TRACE
INT PREFIX ZERO
_3._3. _N_U_M_E_R_I_C _L_I_T_E_R_A_L_S
There are two classes of numeric literals, integer
literals and floating point literals. The presence of a
decimal point distinguishes floating point literals from
integer literals.
floating-point-literal ::= [ minus ] digits dot digits
integer-literal ::= [ minus ] digits
digits ::= digit { digit }
Example integer literals:
1 -33 187
Example floating point literals:
0.5 -3.14159 6.0
_3._4. _S_T_R_I_N_G _L_I_T_E_R_A_L_S
A string literal is formed by a sequence of characters
(possibly zero) enclosed between quote characters. Any of
the six classes of characters may be embedded in a string.
string-literal ::= quote { [ character ] } quote
Examples:
""
"\"recursion\""
"these characters can be in a string $, ! => etc_\n"
_3._5. _C_O_M_M_E_N_T_S
Comments may be included anywhere in the input file
that separators or delimiters may occur. Comments follow
the style of comments in the C language. Comments may not
be nested within comments. Any of the six classes of
- 6 -
characters may be embedded in a comment.
comment ::= slash asterisk { [ character ] } asterisk slash
Examples:
/* a simple comment */
/*
a multi-line
comment
*/
/*******************
* A Fancy Comment *
*******************/
_3._6. _C__C_O_D_E
A c_code is a fragment of C language code that is
embedded in the input file. A c_code is recognized by the
scanner as a single lexical item. The C language itself is
not parsed by TRC. A c-code may not contain a procedure or
function.
c_code ::= left-bracket { [character] | [c_code] } right-bracket
Example:
{
if(condition){
action(argument);
another_action();
}
}
_4. _D_E_F_I_N_I_T_I_O_N_S
Each entity that can be referred to by a TRC rule must
be defined in the definition section. Each entity that is
defined is called an _o_b_j_e_c_t. Objects may have numeric or
string values associated with them. These associated values
are called _e_l_e_m_e_n_t_s of the object. There are two forms of
definitions. There is a simple form for objects which have
no elements and an extended form for objects that have asso-
ciated elements.
definition ::= identifier
definition ::= identifier left-paren item-list right-paren
item-list ::= { [ item ] }
item ::= identifier colon type
type ::= INT | FLOAT | STRING | POINTER
For each object there will be an associated object
count in the output code, which represents the number of
- 7 -
objects of that type that exist at any point in time. For
each object with at least one element, there will be an
associated structure and list of objects of that type in the
output code. The set of object counts and object lists
defined in the definition section represent the STM that the
system of rules may refer to.
Each element that is defined for a given object must
have a data type specified. Strong type checking is
enforced throughout TRC. Comparisons and assignments of
elements must involve elements or literals of the same type.
There is no coercion of types. The INT, FLOAT, STRING and
POINTER types are described in section 10. The POINTER type
is a pointer to a structure of the type of the object that
contains it.
Examples:
A
b_b (This : INT)
CAST ( THAT : INT
The_Other : FLOAT)
_5. _S_H_O_R_T _T_E_R_M _M_E_M_O_R_Y
The short term memory (stm) section of the input is
where the initial state of the working memory is specified.
The intention of this section is to permit the working
memory to be initialized to some state that may be required
for each invocation of the expert system. It is also
intended to serve as a way of entering test data while the
expert system is being developed, before data entry pro-
cedures are developed.
stm ::= { [ entry ] }
entry ::= [ integer-literal ] identifier
entry ::= [ integer-literal ] identifier
left-paren { [ init-item ] } right-paren
init-item ::= identifier arrow value
value ::= integer-literal
value ::= floating-point-literal
value ::= string-literal
The short term memory section is a list of objects that
are to be entered into the working memory. If an object has
one or more elements, those elements can be initialized to
user specified values. Numeric values that are not speci-
fied are initialized to zero and string values that are not
specified are initialized to an empty string. The integer-
literal that may precede the name (identifier) of the object
specifies how many objects of that type are to be added to
working memory with the given element values, e.g.:
/* definition part */
- 8 -
A (A1 : STRING
A2 : FLOAT)
B (B1 : STRING
B2 : FLOAT)
%%
/* short term memory */
A ( A1 => "string") /* It is not necessary to
initialize all the elements
in an object.
*/
2 A ( A2 => 3.34
A1 => "thing") /* Nor is it necessary to
initialize elements in the
order they were declared.
*/
3 B /* It is not necessary to
initialize the elements
at all.
*/
%%
_6. _L_O_N_G _T_E_R_M _M_E_M_O_R_Y
Long term memory is the section where the
situation/action rules are enumerated. This section may
begin with a listing of options that are to be turned on.
All options in this section can also be specified by command
line flags. Since the syntax for the long term memory sec-
tion is more complex than for the other sections, it will be
presented in several parts.
_6._1. _O_P_T_I_O_N_S
The long term memory is composed of two sections, the
options and the rules.
ltm ::= { [option] } { rule }
option ::= ZERO | PROFILE | BACKTRACK
| DUMP | RECURS | NORECURS
| SAVE | TRACE | PREFIX identifier
_6._1._1. _Z_E_R_O
The ZERO option directs the compiler to generate a pro-
cedure that will free all the dynamic structures allocated
by TRC generated code. This feature is useful when develop-
ing inference engines that will be entered more than once.
It is often necessary to remove the 'leftovers' from a pre-
vious execution before beginning a new execution.
- 9 -
_6._1._2. _P_R_O_F_I_L_E
The PROFILE option directs the compiler to generate
code to profile the execution of the inference engine and a
procedure to print a summary of that profile. The profiling
code counts the number of times that each rule searches some
part of STM and how many times each rule is fired.
_6._1._3. _B_A_C_K_T_R_A_C_K
The BACKTRACK option directs the compiler to generate
an inference engine that will backtrack when no rule is
true. Backtracking is accomplished by undoing the actions
of the last rule that fired and continuing to test rules as
if the undone rule had never fired.
_6._1._4. _D_U_M_P
The DUMP option directs the compiler to generate pro-
cedures that will print the contents of STM on the standard
output. The intention of this option is to simplify the
process of developing and debugging rules. By having the
DUMP procedures generated automatically, the knowledge
engineer is freed of the mundane task of writing procedures
to display the current state of the STM. The DUMP pro-
cedures are not intended to serve as the output of an expert
system. Appropriate output routines will have to be
developed by the knowledge engineer after the rules have
been written.
_6._1._5. _R_E_C_U_R_S
TRC will generate code that uses one of two strategies
for searching STM. These strategies (detailed in section
6.3.3) are called LINEAR and RECURSIVE. The LINEAR strategy
is the default. The RECURS directive in the option part
directs the compiler to use the RECURSIVE strategy as the
default. It is possible to override the default on a per
rule basis. Overriding the default is discussed in section
6.3.3.
_6._1._6. _N_O_R_E_C_U_R_S
The NORECURS option directs the compiler to use the
LINEAR search strategy in all rules, unless otherwise
directed. Since this is the default condition, it is not
necessary to use this option.
_6._1._7. _S_A_V_E
The SAVE option directs the compiler to generate pro-
cedures to save all objects which are dynamically allocated
by TRC code on a file. The compiler will also generate pro-
cedures which can restore the dynamically allocated
- 10 -
structures from the previously written files. The intention
of this option is to simplify the development of expert sys-
tems with checkpointing and restarting capabilities. The
procedures generated by this option and the use of those
procedures is described in section 13.6 and Appendix C.
_6._1._8. _T_R_A_C_E
The TRACE option directs the compiler to trace the exe-
cution of the inference engine by maintaining a list of the
rules that have been fired in the order they were fired.
This list can be used to produce an explaination of the
actions taken by the expert system.
_6._1._9. _P_R_E_F_I_X
The PREFIX option directs the compiler to use the iden-
tifier that follows the reserved word 'PREFIX' as a prefix
for all data objects and procedures generated by TRC. The
intention of this option is to facilitate building expert
systems that have more than one inference engine. Supplying
different prefixes for each inference engine insures that
there will be no name conflicts between separate inference
engines, e.g.:
PREFIX X_
_6._2. _R_U_L_E_S
The second section if LTM is the list of rules. Each
rule has a label, which can be supplied or automatically
generated by TRC. The label is used whenever it is neces-
sary or convenient to refer to the rule by name. The label
is followed by the situation part (described in the next
section). The situation part is a list of statements fol-
lowed by the arrow delimiter. The action part (described in
the section following the description of the situation part)
follows the arrow delimiter and is itself a list of state-
ments. The action part is followed by a semicolon which
terminates the rule.
rule ::= label situation arrow action semicolon
label ::= identifier colon | colon
_6._3. _S_I_T_U_A_T_I_O_N
The situation part specifies how STM is to be searched
and what must be present in STM for the situation part to be
true.
situation ::= { [ s-option ] } { [ match ] }
s-option ::= EMPTY identifier identifier
- 11 -
s-option ::= RECURS | NORECURS
match ::= [ integer-literal ] identifier
match ::= NOT identifier
match ::= [ integer-literal ] left-paren name
match-list right-paren
match ::= c-code
name ::= hat identifier identifier
match-list ::= { match-item }
match-item ::= identifier dot identifier relop literal
match-item ::= identifier dot identifier relop
identifier dot identifier
relop ::= equality | not-equal | less-than
relop ::= greater-than | greater-than-or-equal
relop ::= less-than-or-equal
_6._3._1. _M_A_T_C_H_I_N_G
It is necessary to understand how matching is specified
before the s-option part can be explained. A match, which
will also be referred to as a test, is a statement of what
the inference engine is to search for in STM. Assume the
following objects were defined in the definition section:
%%
A (A1 : INT
A3 : INT
A2 : STRING)
B (B1 : INT
B2 : STRING)
%%
The simplest match specifies only the object that must
be present. A search for one object of type A and one
object of type B can be specified as follows:
A B
A search for two objects of type A and two objects of
type B can be specified in many ways, including these four
equivalent ways:
A A B B
2A 2 B
A B A B
A A 2B
The objects can be listed in any order and may be pre-
ceded by an integer literal. The integer literal specifys
how many objects of the named type are to be search for. In
one of the examples there is a space between the count and
the object name and in other examples there is no space
- 12 -
between the count and the object. Spaces are required only
when there would be a conflict without a space. Since the
string "2A" (for example) begins with a digit, it is
presumed to be a numeric literal. Since "A" is not a digit,
the numeric literal ends at that point. Since the numeric
literal contained no decimal point, it is an integer-
literal. The string is therefore lexically equivalent to "2
A".
The reserved word NOT is used to explicitly test for an
empty list. The following match statement will be true if
there are no objects of type A in STM:
NOT A
Any rule which contains a search for an object and a
test for that same list being empty can never be true. TRC
generates an error message in this situation because even
though it is syntactically correct, it is in fact meaning-
less, e.g.:
A
NOT A
Very often it is necessary to search STM based not only
on the type of object, but also based on the values of the
elements of the object. This is specified by placing a list
of the element values after the element name.
(A.A1 == 2)
(B.B1 != 3
B.B2 <= "THIS")
(A.A2 == "HERE" A.A1 > 6)
These three statements can be translated as follows:
First search the A list for an object whose element A1 is
equal to two, then search the B list for an object whose
element B1 is not equal to three and whose element B2 is
less than or equal to "THIS", finally search the A list for
an object whose element A2 is equal to "HERE" and whose ele-
ment A1 is greater than six. This situation part would be
true if all three objects were found in STM, otherwise it
would be false. In the first match only the value of A1 is
specified. Only the elements that are specified are tested,
the values of any other elements that the object may contain
are not considered. Association of parameters is by name,
so it is not necessary to list elements in the order they
were declared. The third match statement in the example
above lists the value of element A2 first, even though ele-
ment A1 was declared first.
The final case that must be considered is the case
where it is necessary to search STM for an object whose ele-
ments are to be tested against the result of some previous
- 13 -
search. To do this it is first necessary to name the object
that is being searched for so that it may later be referred
to, e.g.:
(^A FIRST
A.A1 == 2)
(B.B1 == FIRST.A3)
(A.A1 != A.A3)
The first statement begins with a hat character. This
indicates that this object is to be named. The hat charac-
ter is followed by the object type and it's name. The name
is followed by a list of the elements to search for, in this
case a search for element A1 equal to two is specified.
This statement can be translated as follows: Search the A
list for an object whose element A1 is equal to two and name
that object "FIRST". A name that is applied to an object is
called a free variable. The scope of a free variable is the
current rule. Free variable names can be reused in subse-
quent rules.
The second statement specifys that the B list is to be
searched for an element B1 whose value is equal to the value
of the element A3 found in the previous statement. The free
variable name makes it possible to refer to previously found
elements.
The third statement, while looking innocent enough, is
radically different from all previous examples. In all the
previous examples the exact value that was being searched
for was known before the search began. That value was
expressed as either a literal, or the value of some element
that was found in a previous test. In the third statement,
the A list is being searched for an object whose elements A1
and A3 are not equal to one another. The values these ele-
ments are to have are not specified, only their relationship
to one another. This can be further complicated:
(^A Second
A.A1 == 3)
(A.A1 < Second.A3
A.A3 < A.A1)
In the second match statement of this example A1 is
being compared to the value of A3 in the object named
"Second" and it is being compared to the value of the ele-
ment A3 in the object that contains it. An element may
appear on the left hand side of the relational operator only
once in a given match statement. It is now possible to con-
sider the effects of the options.
- 14 -
_6._3._2. _O_P_T_I_O_N_S
The situation part begins with a (possibly empty) list
of options. The reserved words RECURS or NORECURS may
appear in the option part of the situation. The appearance
of one of these words causes the named strategy to be used
rather than the current default strategy. It is not an error
to explicitly specify the default strategy, but it is
unnecessary. The option part of the situation may also
include EMPTY statements. An EMPTY statement is a static
object declaration. The intention of the EMPTY statement is
to provide a means of passing data from STM to embedded c-
code and from embedded c-code to STM. Examples in section
12 will illustrate these actions.
_6._3._3. _S_E_A_R_C_H _S_T_R_A_T_E_G_I_E_S
A small example provides an easy way to illustrate the
two search strategys. This example is a complete TRC
specification, though not useful for anything other than as
an example.
%%
PENNY (MINT : STRING DATE : INT)
%%
PENNY (MINT => "DENVER"
DATE => 1964)
PENNY (DATE => 1966)
%%
R1:
(^PENNY First)
(PENNY.DATE <= First.DATE)
=>
MARK PENNY
;
%%
STM will be initialized to contain two objects of type
PENNY, the first minted in Denver in 1964, the second minted
in an unspecified location in 1966. Since the reserved word
RECURS does not appear in either option section, the default
LINEAR search strategy will be used.
In the LINEAR strategy, STM is searched in a linear
fashion for each object specified in the situation part.
Objects are searched for in the order they are listed. In
this example, the object named "First" will be associated
with the first object in the list. Since the values of the
elements are not specified, any object of type PENNY will
match. This object is then temporarily marked as being "in
use" and can not be used to match any subsequent tests. The
list will then be searched for an object of type PENNY whose
DATE element is less than or equal to the DATE element of
the "First" object. The only other object in the list has a
- 15 -
DATE element of 1966 which is not less than or equal to
1964, so the rule fails. In the LINEAR strategy, when any
test in the situation part fails, the entire rule fails
immediately, no further tests are made.
It should be obvious that this rule would have been
true if "First" had been associated with the second object
in the PENNY list. This is precisely the purpose of the
RECURSIVE search strategy. In the RECURSIVE search stra-
tegy, when a test fails, the previous test is redone. To
redo a test, the object that was marked as "in use" is
unmarked, and the list is searched from that point for an
object that will match the test. The RECURSIVE search fails
when a single test fails and it is no longer possible to
undo the previous test (this occurs when there is no previ-
ous test). The RECURSIVE search strategy is a powerful pat-
tern matching tool, but it can be expensive in terms of exe-
cution time.
_6._3._4. _E_M_B_E_D_D_E_D _C_O_D_E
Arbitrary C language code may be embedded in the situa-
tion part anywhere a match may occur. Recall that embedded
code (c-code) is recognized as a single lexical element by
the scanner, the C language itself is not parsed by TRC.
Errors in embedded code will not be detected by TRC. The
intention of permitting embedded code in the situation part
is to make it possible to include tests that may not fit the
context of a match against STM.
In order to integrate an embedded code test with the
existing match statements, it is necessary to have a way to
refer to objects in embedded code. In order for embedded
code to have the same functionality as a match statement, it
is necessary to have a way to cause a rule to fail in the
embedded code. Each of these facilities are provided.
_6._3._5. _E_M_P_T_Y _O_B_J_E_C_T_S
The purpose of the EMPTY statement is to create a named
object that can be referred to by match statements and
embedded code, without having to exist in STM. One of the
capabilities that results is the ability to have STM
searched on the basis of the result of some external pro-
cedure, e.g.:
R1:
EMPTY PENNY SPARE /* this creates an object of
type PENNY that is named
SPARE. This object exists
separately from STM and it's
elements are not initialized. */
{
/* this embedded C code precedes
- 16 -
any search of STM
*/
if(($SPARE.DATE = external-procedure()) <= 1920){
$FAIL.
}
}
(PENNY.DATE == SPARE.DATE)
=>
MARK PENNY
;
Several things are happening in this example. First an
object of type PENNY is created and given the name SPARE.
This object exists separately from STM and will exist only
during the current rule. It is useful only as something
that can be referred to in other statements. A section of
embedded code precedes the only match statement in this
rule. When the code produced by TRC is compiled and run,
that embedded code will be executed before STM is searched,
by virtue of the fact that it precedes the match statement.
The embedded code contains an "if" statement which con-
tains an embedded assignment and function call as part of
it's condition. The left-hand-side of the embedded assign-
ment, "$SPARE.DATE" is not syntactically correct C language
code. The dollar character is a flag to TRC that indicates
a reference to a named object. The identifier that follows
the dollar character will be translated by TRC during the
output phase. This translation is described in section 12.
The statement "$FAIL." is translated by TRC into whatever
statements are required to make this rule fail. The defini-
tion of failure depends on the search strategy. If the
LINEAR strategy is being used, "$FAIL." will cause the rule
to stop searching STM and continue with the next rule. If
the RECURSIVE strategy is being used, "$FAIL." will cause
the rule to undo and then redo the previous match statement.
An object name preceded by the dollar character may
occur in the embedded code anywhere a variable name may
occur, since that is what it will actually be translated to.
Embedded code may also refer to objects that exist in STM
using the same dollar character translation technique:
R1:
RECURS
(^PENNY NEW
PENNY.MINT == "DENVER)
{
if(some-function($NEW.DATE))
$FAIL.
else
$NEW.DATE = 0;
}
(PENNY.DATE == 1921)
- 17 -
=>
MARK PENNY
;
The "else" part of the embedded code illustrates an
assignment to an element of the object named NEW. The
object that is being called NEW exists in STM and this
assignment to it's DATE element is permanent. Since this
rule is recursive, it is possible that this embedded code
will set the DATE element of every object in the PENNY list
to zero. These modifications are made before it is even
known that the situation part is true. Modifying STM in the
situation part of a rule would be a major departure from
traditional expert system implementation techniques. Furth-
ermore, the BACKTRACKing option is unaware of changes made
in STM by embedded code. The BACKTRACKing option is unable
to correctly undo this rule.
_6._4. _A_C_T_I_O_N
The ACTION part specifies what is to be done if the
situation part is true. The actions that can be taken pri-
marily involve adding objects to STM or deleting objects
from STM. Recall that the non-terminal 'entry' was defined
in section 4.
action ::= statements c-code
statements ::= { [statement] }
statement ::= MARK mark-list
statement ::= ADD add-list
statement ::= OPTIMIZE identifier
mark-list ::= { [ mark-item ] }
mark-item ::= [ integer-literal ] identifier
add-list ::= { [ entry ] }
_6._4._1. _M_A_R_K
The MARK statement is used to delete objects from STM.
Only objects that were found in the situation part may be
deleted. The reason for this constraint is that only the
objects found in the situation part are definitely known to
exist in STM. STM is searched only in the situation part,
there is no searching in the action part. Objects may be
deleted by name or in the order they were found, e.g. (using
the definitions from section 6.3.1):
R1:
(A.A1 != A.A3)
(^A FIRST
A.A1 == 2)
(B.B1 == FIRST.A3)
=>
MARK A
;
- 18 -
This MARK statement will delete the object in the A
list that met the test (A.A1 != A.A3). In some instances it
may be desirable to delete an object that was not the first
object that was found, e.g.:
R1:
(A.A1 != A.A3)
(^A FIRST
A.A1 == 2)
(B.B1 == FIRST.A3)
=>
MARK FIRST
;
The A list object named 'FIRST' is the second object of
type A to be found. It is specified as the object to delete
by using it's free variable name. A MARK statement can
specify a count of how many objects of a given type are to
be deleted. A MARK statement may list any number of objects
to delete, and each object to be deleted can have a separate
MARK statement if desired. In no case can more objects be
deleted than were found in the situation part. Each of the
following examples is equivalent:
R1:
(A.A1 != A.A3)
(^A FIRST
A.A1 == 2)
(B.B1 == FIRST.A3)
=>
MARK B FIRST A
;
R1:
(A.A1 != A.A3)
(^A FIRST
A.A1 == 2)
(B.B1 == FIRST.A3)
=>
MARK 2A
MARK B
;
R1:
(A.A1 != A.A3)
(^A FIRST
A.A1 == 2)
(B.B1 == FIRST.A3)
=>
MARK FIRST
MARK A
MARK B
;
- 19 -
_6._4._2. _A_D_D
The ADD statement is used to add new objects to STM.
As in the MARK statement, an ADD statement can specify one
or several objects to add to STM. The value of each element
of each object can be specified as in the STM section of the
specification. Each object is inserted at the head of the
appropriate list. The insertions are actually made in the
opposite order that they are listed, the net effect is that
the objects appear at the head of the list in the order they
are specified. ADD and MARK statements may be intermixed in
any order, e.g.:
R1:
(A.A1 != A.A3)
(^A FIRST
A.A1 == 2)
(B.B1 == FIRST.A3)
=>
MARK FIRST
ADD A (A.A1 => 6
A.A3 => FIRST.A3)
ADD B (B.B2 => FIRST.A2
B.B1 => 9)
MARK B
;
All the ADD statements will be executed before any MARK
statements are executed regardless of the order of the
statements in the action part. The statements are ordered
by the compiler to insure that an ADD statement does not
refer to an object that has already been MARKed. In the
example above, the first ADD statement refers to the object
named 'FIRST'. The object named 'FIRST' is MARKed in the
previous statement. If the code were executed in the speci-
fied order, the element 'FIRST.A3' would not exist when the
ADD statement was executed.
_6._4._3. _O_P_T_I_M_I_Z_E
The OPTIMIZE statement is named for it's primary func-
tion, hand optimization of LTM. There is also a built in
optimizer that can be invoked. Optimization is discussed in
detail in section 7. The OPTIMIZE statement can be thought
of as an unconditional GOTO statement. In normal execution,
after a rule fires the rules are tested from the beginning
of LTM for the next rule that will fire. The OPTIMIZE
statement can specify a point other than the start of LTM to
begin testing rules. In addition to optimization, it can be
used to impose a customized control structure on the set of
rules.
One example of the use of the OPTIMIZE statement is to
implement a search for the absence of some object(s) in STM,
- 20 -
which is not otherwise supported by the TRC language. To
search for the absence of some object(s), use two rules.
The first rule searches for the presence of the object(s) in
question, if the rule is true then the object(s) are not
absent. If the rule fails, the object(s) are absent, e.g.:
R1:
/* effectively search for the
absence of an object A with
element A1 == 2 */
(A.A1 == 2)
=>
/* If this rule is true, branch
around the next rule */
OPTIMIZE R3
;
R2:
/* If R1 fails, then there is
no object A with element
A1 == 2. An empty situation
part such as this always
evaluates to true */
=>
/* whatever you wish to do in response
to the absence of A1 == 2 */
;
R3:
/* continue here if R1 is true */
. . .
_6._4._4. _C-_C_O_D_E
A c-code may follow the MARK, ADD and OPTIMIZE state-
ments. This is user code that is to be executed when a rule
fires. C-code may not appear between MARK, ADD or OPTIMIZE
statements. If it is necessary to refer to an object that
is being MARKed in c-code, it should be done in the situa-
tion part. A c-code may precede the arrow symbol that
separates the situation and action parts. C-code in this
position is equivalent to c-code in the situation part
preceding the MARK, etc. statements, e.g.:
R1:
(^A FIRST)
{
/* this c-code follows all
situation tests. It is
effectively in the action
part since it will execute
only if the situation is
- 21 -
true */
some_procedure($FIRST.A1);
}
=>
MARK FIRST
;
_7. _O_P_T_I_M_I_Z_E_R
The optimizer does not produce code that is optimum in
any sense. What it does is to perform a single, very useful
code modification that can have a very positive impact on
execution time.
Consider the execution of an inference engine. Each
rule is tested until one who's situation part is true is
found. This rule's action part is then executed. When
rules are being tested the problem space is being searched.
When an action part is executed a step is taken in the solu-
tion of the problem. Searching the problem space is clearly
part of the solution, but the action part is where the the
results occur.
The goal, which is not attained, is to reduce the
search time to zero. To attain this goal it would be neces-
sary to know each time a rule fires which rule will fire
next. This is generally not known. In particular when the
inference engine begins execution, the contents of STM are
not known, any rule can be the first rule to fire. Once a
rule has fired and each time any rule fires a great deal of
implicit knowledge about the contents of STM is obtained.
It is known that no rule previous to the current rule is
true and no rule previous to the current rule can be true
after the execution of the current rule unless the current
rule modifies STM in such a way as to make some previous
rule true. This simple fact is the entire basis of the
optimizer, which attempts to reduce the number of rules that
are tested by deducing which rules can not possibly fire.
Three tests must be performed to determine a candidate
next rule, which is the first rule in LTM that can possibly
fire after the current rule fires. The three tests are
called the NOT test, the ADD test and the MARK test.
The first case to be considered is the case of a rule
which contains a NOT statement in the situation part. A NOT
test is an explicit test for an empty list. When a rule
that fires contains an ADD statement it will not be possible
for any previous rule with a NOT statement referring to that
list to be the next rule to fire. Likewise, if a rule that
fires contains a MARK statement and no ADD statement refer-
ring to that same list, it is possible that the list will
- 22 -
become empty making it possible for the rule with the NOT
statement that previously failed to become true. If it is
determined that it is possible for a rule to fire after the
NOT test, that rule becomes the candidate rule and no
further testing is done.
Consider the case of a rule with no NOT statements that
recursively searches STM for a situation. If this rule
fails, it will continue to fail until something is added to
STM to make it true. If all rules searched STM recursively
it would be known when a rule fires that of the rules that
precede the current rule, only those rules that search for
something added to STM by the current rule can possibly fire
in the next pass.
If the current rule adds something to STM, control
could continue with the first rule that searches for that
something rather than the first rule in LTM. If no rule
prior to the current rule searches for those things added to
STM by the current rule or if the current rule adds nothing
to STM then no prior rule can execute. Control could con-
tinue with the current rule rather than at the beginning of
LTM. By causing control to continue with a rule later than
the first rule the amount of searching is reduced.
The case of a rule that performs only a linear search
on STM must also be considered. The previous conclusion
about items being added to STM is still true; a rule that
adds something to STM can cause a linear search rule to
become true. With linear search it is also possible that a
rule will become true if something is removed from STM. If
a linear rule searches for several similar items which are
present but not correctly ordered it is possible for this
linear search to fail where a recursive search would not
have failed. If there were excess items, removing one or
more may cause a different linear assignment which could
make a linear rule true. This is the MARK test. Examples
of this situation are non-trivial, but where correctness is
an issue these cases can not be overlooked.
The TRC optimizer selects a continuation point for each
rule based on what the rule adds to or deletes from STM
rather than testing each rule from the beginning of LTM.
The continuation point is the first rule that could fire
based on the NOT and ADD tests for all rules, and the MARK
test for linear rules. The TRC optimizer is somewhat naive
in that it considers only items added or deleted with the
ADD and MARK statements. The optimizer is unaware of any
changes that may have been made to STM by user code. The
caveat is if STM is modified in user code the optimizer may
produce incorrect code. The optimizer, which can be invoked
with a command line option (-O), tests each rule individu-
ally and ignores those rules that were hand optimized in the
specification.
More information about the Mod.sources
mailing list