TRC - expert system building tool (part 2 of 8)
sources-request at panda.UUCP
sources-request at panda.UUCP
Sat Feb 8 08:51:14 AEST 1986
Mod.sources: Volume 3, Issue 110
Submitted by: ihnp4!dicomed!ndsuvax!nckary (Daniel D. Kary)
This is NOT a shell archive. Simply delete everything up to and including
the cut mark and save the result as tutorial.doc.
Dan Kary
ihnp4!dicomed!ndsuvax!nckary
-------------- cut here ---------------
An Introduction to TRC
Daniel D. Kary
North Dakota State University
Computer Science Department
300 Minard Hall
Fargo, ND 58102
_A_B_S_T_R_A_C_T
TRC is a compiler that is useful in building
expert systems. The input is a rule based system
whose input is syntactically similar to the input
to YACC[1]. The output is a set of C language
procedures.
While not all features of TRC are discussed,
the major features of the language are presented.
Example code is used to illustrate the features
and references to more detailed documentation are
included. This may be the best starting point for
first time users.
_1. _I_N_T_R_O_D_U_C_T_I_O_N
The fundamental notion that virtually the entire field
of expert systems is built upon is the situation/action rule
paradigm for problem solving. This paradigm is on the one
hand the embodyment of simplicity and on the other hand a
tool that is stunningly powerful.
The situation/action rule paradigm is a way of embody-
ing both information about a problem and the way the infor-
mation is applied to solving the problem in a single struc-
ture. Consider this trivial problem, you have a pile of
coins and wish to reduce it to the smallest number of coins
of equal value. One of the rules in a system to solve this
problem could be:
IF: there are five pennys in the pile
THEN: substitute a nickel for the five pennys.
A situation/action rule has the form of an IF...THEN...
statement, common to virtually every progamming language.
- 2 -
The IF part is the situation, the rule checks to see if this
situation exists, if it does the THEN or action part is exe-
cuted. In addition to the rules there is a pattern matcher,
which is invoked in the situation part to determine if the
situation exists. This pattern matcher is typically more
powerful that what is available in an IF statement in a pro-
gramming language. The pattern matcher searches a data
base. The data base contains all the information that is
specific to this instance of the problem, and in some cases
information that is germain to all or many instances of the
problem. In the trivial example given here, the data base
would contain the pile of coins. Finally there is a stra-
tegy for deciding which rule to test next. Usually the
rules are tested in a pre-specified order. When a rule
whose situation part is found to be true, its action part is
executed. The strategy for testing rules usually continues
with the first rule each time any rule fires.
This simple paradigm has been used to build systems
with expert levels of problem solving ability in areas as
diverse as elucidating the structure of hydrocarbons[2] to
diagnosing blood diseases[3] or pulmonary function
disorder[4]. In each case a system of rules is built up,
each rule embodying a 'rule of thumb' or a piece of 'common
wisdom' or 'accepted practice' specific to the problem
domain and used by a human expert in solving the problem.
The system of rules is referred to as a 'rule based system'
or an 'expert system'. The expert system attempts to solve
a problem by emulating the problem solving behavior of a
human expert. Expert systems are often easy to modify or
extend. Just as a human expert gains expertise, rules
representing new knowledge can be added to an expert system.
Since the situation/action rule is basically an
IF...THEN... construct, why is a special language needed?
Writing these rules in a traditional programming language
can be tedious, a single situation part may require multiple
conditional tests. In a system with a large number of
rules, the structure of the system may be difficult to see
and difficult to modify because the many details of the pro-
gramming language tend to hide the structure. Writing the
code that maintains and searches the data base that is
referred to in the situation part is tedious and repiti-
tious, making it an ideal subject for automation.
The rest of this tutorial is devoted to describing TRC,
a tool for building expert systems. The next section, sec-
tion 2, describes the overall format of the input to TRC.
Section 3 presents a sample set of rules to give an initial
overview of how TRC works. The remaining sections present
semi-formal descriptions of the syntax of TRC.
This document does not contain all the information
needed by advanced users. _T_h_e _T_R_C _L_a_n_g_u_a_g_e _R_e_f_e_r_e_n_c_e
- 3 -
_M_a_n_u_a_l[_5] contains a complete formal description of the
features of TRC.
_2. _B_A_S_I_C _S_P_E_C_I_F_I_C_A_T_I_O_N_S
Every TRC program consists of five sections, the
header, definitions, short term memory (data base), long
term memory (rules) and the trailer. These sections are
separated by double percent "%%" marks as in YACC specifica-
tions. The form of a full specification is illustrated in
figure 1. The header, short term memory and trailer are
optional so the minimum specification would contain only the
definitions and long term memory. All of the %% marks must
be present in each specification file.
header
%%
definitions
%%
short term memory
%%
long term memory
%%
trailer
Figure 1: TRC Specification
The purpose of the header and trailer sections is to
permit the inclusion of C language code in the program, much
as in YACC. One of the common features of compiled pro-
cedural languages is that data objects and data types used
in the program must be declared or defined before they are
used. This is also true for TRC. While TRC is not a pro-
cedural language, declaring objects before using them sim-
plifys the process of translating to C which is a procedural
language.
The remaining components of the TRC grammar are tradi-
tional components of expert systems. The short term memory
section, herinafter abbreviated STM, is sometimes called the
data base. It contains the data that is searched in the
situation part of a rule. Expert systems are usually
modeled after the problem solving behavior of a single
expert or a group of experts in solving a single problem or
class of problems. The information specific to the current
instance of the problem is usually gathered by the expert,
manipulated in the solving of the problem and then forgot-
ten. The way this information is remembered and then for-
gotten in the human brain and the area of the human brain
where it is stored is called short term memory by psycholo-
gists. Using that same name here refers to the similarity
of purpose.
- 4 -
The long term memory, herinafter referred to as LTM,
contains the rules and again is a reference to human brain
processes. This name is a reference to the part of the
human brain that remembers things for a long time. The
expert usually has a set of formal procedures and informal
rules of thumb that are used in solving the problem. The
data changes with each instance of the problem, but the
rules for solving it remain the same. Human experts have
the ability to gather more expertise, to learn more about
the problem. The experts learning behavior is imitated by
adding new rules or modifying existing rules in LTM.
_3. _W_R_I_T_I_N_G _R_U_L_E_S
The coin problem mentioned in the introduction will be
used to illustrate the syntax of writing a rule. This
presentation is made only as an illustration. A complete
description of the syntax of a rule will be given in section
[?]. A set of four rules that will reduce the pile of coins
will be given. The rule mentioned in the introduction
searched the data base for five pennys and replaced them
with a nickel. To express this as a TRC rule, write:
R1:
5 PENNY
=>
MARK 5 PENNY
ADD NICKEL
;
All rules begin with a label, in this case 'R1:'. A
label is a token followed by a colon, and a token is a
string of characters (upper case or lower case), digits
and/or the underscore character. The first character of a
token must be a character. Labels are used to refer to the
rule by name in optimizing and user specified control.
These issues are discussed in section 6.
The label is followed by the situation part, which
specifies what to search for in the data base (STM). In
this case we are specifying a search for five items of type
PENNY. The arrow symbol ( => ) marks the end of the situa-
tion part and the beginning of the action part. If the
situation part evaluates to true (there are 5 pennies in
STM) the action part will be executed. There are two state-
ments in the action part of this rule. The statement 'MARK 5
PENNY' specifys that the 5 pennies that were found in STM
should now be removed. The statement 'ADD NICKEL' specifys
that one more nickel should be added to STM (the pile of
coins). So this rule solves a small part of the problem by
performing a simple transaction.
The remaining three rules needed to reduce the pile of
- 5 -
coins to the minimum number of equal value (assuming only
pennies, nickels, dimes and quarters are used) are listed in
figure 2.
R2:
2 NICKEL
=>
MARK 2 NICKEL
ADD DIME
;
R3:
2 DIME
NICKEL
=>
MARK 2 DIME
NICKEL
ADD QUARTER
;
R4:
3 DIME
=>
MARK 3 DIME
ADD QUARTER
NICKEL
;
Figure 2: Coin Rules
This simple set of rules illustrates both the indepen-
dence of each rule and the interaction of the rules. Each
rule describes a transaction that will reduce the number of
coins in the pile without changing the total value of all
the coins in the pile. Suppose the pile of coins consists
of three dimes and one nickel. Initially R4 is the only
rule whose situation part is true. After the action part of
R4 is executed, R2 becomes true by virtue of the fact that
R4 added a nickel. After R2 is executed the pile is reduced
to a quarter and a dime, the minimum number of coins to make
thirty-five cents.
As was mentioned earlier, each of these rules is basi-
cally an IF...THEN.. construct. The meaning and purpose of
each of these transactions is quite evident when expressed
as a situation/action rule. The same may not be true of an
equivalent program written in a procedural language that
included IF...THEN.. statements, procedure calls for search-
ing the data base (STM) and procedure calls to remove coins
from the data base or add coins to the data base.
If we want to include other coins, a half dollar or
- 6 -
dollar coin, we can easily add another rule or two. Unusual
coins, perhaps a twenty cent piece and a thirty-five cent
piece, may force us to rewrite a previous rule or reorder
the rules. These changes are easily acomplished in a rule
based language like TRC, they may not be so easily accom-
plished in a procedural language.
All upper case letters were used for all the tokens, or
words, in this set of rules. All reserved words in TRC are
all upper case. MARK and ADD are reserved words that can be
used in the action part. The rule labels and the names of
the things that are being searched for in STM (PENNY, DIME
etc.) were also expressed in upper case. This is a sug-
gested convention. Either upper or lower or both may be
used. Later we will see how C language code can be embedded
in both the situation and action parts. Most of the C
language is written in lower case, writing TRC in upper case
will make it easier to distinguish the two.
_4. _D_A_T_A _D_E_F_I_N_I_T_I_O_N
Now that the basic idea of a rule based system has been
presented a more formal look at the syntax of TRC is in
order. TRC rules will request that a data base be searched,
or that things be added to or removed from the data base.
The code for searching the data base or adding new things to
the data base or removing things from the data base is gen-
erated by TRC. In order for TRC to generate this code it
must know what kinds of things are going to be in the data
base (STM).
Suppose an expert system that dealt with the real value
of coins, rather than just their face value was needed.
Information about each coin might include not only it's dom-
ination, but also the year and site of it's minting, it's
condition and it's numismatic value. The types of things
that are referred to in the rules (coins, in this case) will
be called objects. The attributes or values that are asso-
ciated with each object will be referred to as 'elements'.
All objects (and their elements) that will be referred to in
the rules must first be defined in the definition section of
the code.
The definition section of the code consists of a list-
ing of the definitions of each of the object types. A YACC
grammar for a single definition is given in figure 3.
Strong type checking is enforced by the compiler, the type
(INT, FLOAT or STRING) of items to be compared must be
identical. Each definition which contains an item_list
results in the declaration of a structure in the output
code. STM consists of lists for each of the types declared
and a count of the number of items in each list. For
objects which contain no elements, TRC generates no list and
no searching code, checking only the count.
- 7 -
definition : TOKEN
| TOKEN '(' item_list ')'
;
item_list : item_list item
;
item : TOKEN ':' type
;
type : INT
| FLOAT
| STRING
| POINTER
;
Figure 3: YACC Grammar for a Definition
The purpose of the POINTER type is to generate a
pointer to a structure of the type of the list. This is a
'hook' that permits building arbitrary structures in user
code. There is no direct support for this type.
Though it is not necessary in any sense, it may be a
good idea to use all upper case character for object and
element definitions. This will make these items much more
visible in the code output by TRC should it become necessary
to review that code. Some correct definitions include:
A (A1 : STRING
A2 : INT)
B (B1 : FLOAT B2 : INT)
C
These definitions create three classes of objects.
Objects in class A contain a string and an integer, those in
B contain a double precision floating point value and an
integer and those in object class C contain no elements.
_5. _S_H_O_R_T _T_E_R_M _M_E_M_O_R_Y
The purpose of the STM section of the code is to permit
the initial contents of STM to be specified. TRC will pro-
duce a single procedure, _i_n_i_t, which adds the listed objects
to STM. Objects are added to STM by insertion at the head
of the list. The objects listed in the STM section of the
code are inserted in the opposite order that they are listed
so the final result is that STM is initialized just as
listed.
This section of the code is intended to serve two pur-
poses. When an expert system is being developed it provides
- 8 -
a way to place data in STM without having to write input
routines. After the rules are written and debugged and a
separate input routine has been written, this section can be
used to specify an initial condition that may be needed for
every instance of the problem. Figure 5 gives a YACC gram-
mar for a single entry in STM.
entry : count TOKEN
| count TOKEN '(' init_list ')'
;
count : /* empty */
| INTEGER
;
init_list : /* empty */
| init_list init_item
;
init_item : TOKEN ARROW INTEGER
| TOKEN ARROW DOUBLE
| TOKEN ARROW STR
;
Figure 5: YACC Grammar for a STM Entry
The objects to be added are just listed, along with the
initial value of any elements the object may have. String
elements that are not explicitly initialized are set equal
to the null string, numeric elements that are not explicitly
initialized are set equal to zero. Suppose the following
objects were declared in a definition part:
A (A1 : STRING A2 : INT)
B (B1 : FLOAT B2 : INT)
C
Some correct entries would include:
10 A (A2 => 9)
B (B1 => 1.1)
2 B (B1 => 2.2
B2 => 6)
C
A
Some incorrect entries are:
10 (A2 => 9) /* the object name is missing */
B (B1 => 9) /* FLOAT literals MUST contain
a decimal point */
C (C2 => 1) /* object C does not include
- 9 -
element C2 */
_6. _L_O_N_G _T_E_R_M _M_E_M_O_R_Y
LTM is the section where the rules are enumerated. TRC
generates a loop which tests the situation part of each rule
in the order they are listed. When a rule is found who's
situation part is true, that rule's action part is executed.
Testing will normally resume at the beginning of the list of
rules. The grammar, or syntax, for a single rule will now
be presented. Code examples will illustrate the syntax.
LTM may begin with optional switches to turn on trac-
ing, profiling or backtracking. These options, which may
also be turned on with command line options, are discussed
in section 8. Following the option part is a listing of the
rules. Figure 5 gives a YACC grammar for a single rule.
Each rule begins with a label. This label is copied
unmodified to the output source code and is used as a label
in the main loop. It will aid the readability of the output
if labels are all upper case and as descriptive as possible.
Following the label is the left hand side or situation part
of the rule. This part of the rule is where the search
strategy is specified and the items to search for are
enumerated.
There are two search strategies, linear and recursive.
The linear strategy is the default. In the linear search
strategy each match causes a linear search from the begin-
ning of STM. The first object that matches the test is
marked as in use. Subsequent searches will ignore the pre-
viously marked item. As soon as any single match fails the
entire rule fails. Consider the following example:
A (A.A1 == "TEST")
This specification requests that two objects be
searched for, one where the elements of the object A can
have any value and one where element A1 of object A is the
string "TEST". The code generated by TRC for this rule will
first check that the list of A objects has at least two
objects, little sense in searching a list when it is known
that the search will fail. If the list has at least two
objects, the first object will match the first A, since no
values were specified for the elements any object will
match. The list of A objects is then searched for one in
which element A1 is equal to the string "TEST". The success
or failure of the rule depends on the success or failure of
this search.
Clearly is is possible that only one object in list A
had element A1 equal to "TEST". It is also possible that the
object whose element A1 was equal to "TEST" was the first
- 10 -
production : label lhs '=>' rhs ';'
;
label : TOKEN ':'
;
lhs : /* empty */
| lhs match
;
match : RECURS
| count TOKEN
| NOT TOKEN
| count '(' free_variable match_list ')'
;
free_variable : /* empty */
| HAT TOKEN TOKEN
;
match_list : /* empty */
| match_list match_element
;
match_element : TOKEN '.' TOKEN relop INTEGER
| TOKEN '.' TOKEN relop DOUBLE
| TOKEN '.' TOKEN relop STR
| TOKEN '.' TOKEN relop TOKEN '.' TOKEN
;
rhs : optional_part pass_part
;
optional_part : /* empty */
| optional_part option
;
option : MARK mark_list
| ADD add_list
| OPTIMIZE TOKEN
;
mark_list : mark_item
| mark_list mark_item
;
mark_item : count TOKEN
;
add_list : entry
| add_list entry
;
pass_part : /* empty */
| C_COCE
;
relop : "=="
| "!="
| "<="
| '<'
| ">="
| '>'
;
Figure 5: YACC Grammar for a Rule
- 11 -
object in the list. If that were in fact the case then the
rule would fail even though it could have been true with a
different search strategy. In this simple case the problem
could be avoided by simply reordering the situation part so
that the object with an element A1 equal to "TEST" was
searched for first.
Not all examples are this simple, and that is the rea-
son for the recursive search strategy. In the recursive
strategy if a given test fails, the previous test is undone
and redone from the point where the selection was made.
This process continues until a match is found or it is found
that a match is not possible. The recursive search is a
more powerful pattern matching tool, but it is much more
expensive in execution time. The search time for the linear
search is order N while for the recursive strategy it is
order N squared. For large N this is a very substantial
difference.
To select a recursive search, the reserved word RECURS
is included in the situation part. The clearest code will
result if RECURS immediately follows the rule's label. If a
rule is declared RECURS, the recursive search will apply to
all objects in the situation part. There is no way to
search recursively for some objects and linearly for others
in a single rule. The scope of the RECURS declaration is
one rule. Many expert system development tools use only the
powerful but time consuming recursive search technique.
Making this facility optional enables the user to exercise
some control on the search time. It is also possible that
the order that the objects occur in the list is important,
in this case the linear search would be required. TRC
always inserts new objects at the front of the list and
never reorders a list or drops an element from a list,
unless specifically directed to.
It is sometimes necessary to compare an element of one
object with an element of some previously found object,
rather than to some literal value. To do this a name for
the previously found object is needed. A name that is
assigned to an object is referred to as a free_variable.
The scope of a free_variable is the current rule. Using the
previous definitions some examples are:
(^A FOO)
(A.A1 == FOO.A1)
(^A BAR
A.A1 != "TEST")
(B.B2 != BAR.A2)
The first line in this example picks the first object
of the A list and assigns the free_variable FOO to that
object. In the second line the A list is searched for an
object whose element A1 is equal to the element A1 found in
- 12 -
the first line. The third line searches the A list for an
object whose element A1 is not equal to "TEST" and assigns
the free_variable BAR to this object. The final line
searches the B list for an object with an element B2 not
equal to the element A2 found in the previous search.
Notice what is happening here, elements from different lists
are being compared. This comparison is permitted because
both elements are integers, so the types of the elements
match. In complex matches like this it is frequently neces-
sary to use the recursive search.
A new definition is needed to consider yet another
case:
C (C1 : INT C2 : INT)
The final case to be considered is the case where two
elements of the same object are compared. There are two
equivalent ways to specify this:
(^C FOO
C.C1 == FOO.C2)
(C.C1 == C.C2)
TRC will generate identical code for either of these
statements. In each line a specification is made that the C
list be searched for an object where elements C1 and C2 are
equal. There is a subtle but important difference between
these similar examples and all previous examples. In all
previous cases the right hand side of the relational expres-
sion evaluated to an absolute value before the search began.
In this example the absolute value of the right hand side of
the relation changes with every object that is tested.
There is a small code overhead for this type of testing,
which is noticeable only if used on many different elements
of many different types of objects.
Finally the form NOT TOKEN, where TOKEN is some object
is an explicit test for an empty list. The case of search-
ing a list for the absence of some element is discussed
later.
The situation and action parts are separated by the
ARROW symbol, "=>". The action part can contain MARK, ADD
and OPTIMIZE statements. The MARK statement lists the
number and type of objects to remove from STM. The only
items that may be removed from STM by a MARK statement are
objects that were enumerated in the situation part. This
restriction is necessary because those objects are the only
ones that definitely are in STM.
The ADD statement lists objects that are to be added to
STM. Objects are inserted at the head of their respective
lists in the order they are listed in the action part.
- 13 -
Objects are always ADDed to STM before objects are removed,
regardless of the order of ADD and MARK statements. This is
necessary because ADD statements can refer to elements that
are about to be removed. Assume the previous definitions of
A and B for this example rule:
RULE:
(A.A1 == "BAR")
(^A FOO
A.A1 == "FOO")
=>
MARK 2 A
ADD B (B1 => 3.14159
B2 => FOO.A2)
{
printf("RULE is firing\n");
}
;
TRC will generate code that will search the A list
first for an object with element A1 equal to "BAR" and then
for an object with element A1 equal to "FOO". If both of
these searches succeed the action part will execute. The
MARK statement specifies that both A objects are to be
removed from STM. This could also have been specified:
MARK A A
or
MARK A
MARK A
The MARK statement causes objects to be removed in the
order they were found. It is possible for a situation to
exist where it is not desirable to remove the objects in the
order they were found. In the example above it may be
desirable to remove the second type A object, but not the
first. Objects may be MARKed based on their free_variable
name. The following statement will cause only the second
type A object from the sample rule to be MARKed:
MARK FOO
The example ADD statement adds an object of type B to
list B copying a value out of list A. The code generated by
TRC will do the ADD first then the MARK since the ADD state-
ment refers to a MARKed element.
The code section is executed after the ADD and MARK
code and simply prints a message. It is included here to
demonstrate what a code section looks like. Information on
techniques used to refer to objects in the code section is
presented in _T_h_e _T_R_C _L_a_n_g_u_a_g_e _R_e_f_e_r_e_n_c_e _M_a_n_u_a_l[_5]. The
final semicolon is included in the TRC syntax to give the
parser a point to sync on in case of syntax errors in the
- 14 -
source.
The OPTIMIZE statement is used to tell TRC that after
the current rule executes it is not necessary to search the
list of rules from the very beginning of LTM, rather the
search can begin with the named rule. The naming of the
OPTIMIZE statement refers to its primary, but not neces-
sarily only purpose. The OPTIMIZE statement can be used to
impose a control structure on LTM. For convenience the
label "Start" alway precedes the first rule and the label
"End" always follows the last rule.
The TRC grammar does not include a way to specify a
search for the absence of some element. This can be accom-
plished using the OPTIMIZE statement and a side effect of
the search strategy. The LTM section in Figure 7 demon-
strates this possibility.
RULE1:
(A.A1 == "FOO")
=>
OPTIMIZE RULE3
;
RULE2:
=>
{
/* do your thing here */
}
;
RULE3:
/* system continues here */
Figure 7: Testing for the Absence of a Pattern
Figure 7 illustrates a technique for testing for the
absence of some element. RULE1 tests for the presence of
the element and uses the OPTIMIZE statement to branch around
RULE2 if it is found. The situation part of RULE2 is empty.
An empty situation part always evaluates to true. RULE2
will always fire when RULE1 fails and never be tested when
RULE1 fires. The combination of RULE1 and RULE2 is a rule
that tests for the absence of an element.
_7. _H_E_A_D_E_R _a_n_d _T_R_A_I_L_E_R
The header and trailer are lexically identical and
serve similar purposes; the inclusion of C code that is not
related to a specific rule but is of a more global nature.
The syntax is identical for the header, trailer and code
section of long term memory. The code section must begin
with an open brace '{' and end with a closed brace '}'. A
code section is recognized by the input scanner using a very
- 15 -
trivial algorithm; when an open brace is encountered a code
section begins. The 'brace count' is set to one and each
time an open brace is encountered the 'brace count' is
incremented. Each time a closed brace is encountered the
'brace count' is decremented. When the 'brace count' is
zero the code section is presumed to have ended. All text
in the code section, except the initial open brace and final
closed brace, is passed through untouched.
This simple algorithm avoids the potential problem of
having to parse the C language within TRC, but it is very
easy to defeat. If the number of braces in a code section
is not balanced, the end of the section will not be deter-
mined correctly - this includes braces embedded in comments.
A single missing brace may cause the entire compilation to
be aborted. Worse yet would be two complimentary missing
braces in separate sections of the specification. Very
large pieces of the specification may be passed. These
problems are common in programming languages and not diffi-
cult to avoid. Sample valid headers and trailers are illus-
trated in figure 6.
{
/* this is a sample valid header section */
struct mystruct{
int a, b, c;
struct mystruct *next;
};
}
%%
definitions
%%
short term memory
%%
long term memory
%%
{
/* this is a sample valid trailer section */
myprocedure()
{
/* do my thing in here */
}
}
Figure 6: Sample Header and Trailer
On the somewhat related subject of comments, C style
comments may be included anywhere in the specification that
a space may occur. Comments that are not part of a code
section are recognized by the scanner and are discarded.
Nested comments outside of the code section are not
- 16 -
supported, comments occuring in a code section are passed
through.
The code in the header section is written on the output
file _h_e_a_d_e_r._h. This file is included in all output files.
The header section should be used to declare structures and
variables which may be used in the action part of a rule.
Since this code will be included in several files it should
not contain initialized data or procedures, which would
cause duplicate definition errors at compile time.
The code in the trailer section is written on the out-
put file _l_o_o_p._c. This is the code file which contains the
main loop, and includes the definitions of all the struc-
tures and global variables manipulated by the inference
engine. Including procedures in the trailer is a convenient
way of gaining visibility of those objects.
_8. _O_P_T_I_O_N_S
The option section occurs at the beginning of LTM. The
option section may be empty, but any options must precede
the first rule. Options may also be specified with command
line flags. There are several options; TRACE, PROFILE,
BACKTRACK, DUMP, RECURS, ZERO and SAVE.
The TRACE option causes a runtime trace to be created.
The primary purpose of this trace is to facilitate generat-
ing explanations. People seldom take the advice of an
expert without a satisfactory explanation of why the advice
should be followed. It may not be reasonable to expect peo-
ple to take advice of an expert system that can not explain
itself. The trace is a list of rules that were fired, or
inferences that were drawn. Having the trace facilitates
explanations of the type, "I found that a, b and c were true
and therefore concluded that d should be pursued". The gen-
eration of explanations is left to user code, only the trace
is provided.
Turning the TRACE option causes the procedure
_a_p_p_e_n_d__t_r_a_c_e and the structure _t_r_a_c_e to be generated. Each
time a rule is fired the procedure append_trace is called
with the number of the rule that fired. This number is
appended to the list of rules. The list structure is
defined:
struct trace{
int rule;
struct trace *next;
} *trace_front, *trace_back;
The pointer _t_r_a_c_e__f_r_o_n_t points to the head of the list
and _t_r_a_c_e__b_a_c_k points to the last item in the list. Rules
- 17 -
are numbered beginning with 1 from the start of LTM. if the
label of the rule would be more convenient it can be
obtained from the rule_names array, e.g. the label of rule
one is:
rule_names[1]
The PROFILE option generates two arrays in which counts
of the number of times each rule executed and each match was
searched are stored. The procedure _p_r_i_n_t__p_r_o_f_i_l_e will print
a summary of the execution of the inference engine.
The BACKTRACK option generates a structure and pro-
cedures needed to implement backtracking. These structures
and procedures are detailed in _T_h_e _T_R_C _R_e_f_e_r_e_n_c_e _M_a_n_u_a_l[_5].
Backtracking is a technique for searching a problem space.
When a dead end is reached, the last decision is undone and
the search continues. In the inference engine generated by
TRC one backtracking step is taken each time all the rules
in LTM fail. When backtracking is enabled, objects marked
for deletion from STM are saved in a back track structure
along with their former position. The number of items added
by the rule is saved in the same structure. To undo a rule
the formerly added objects are deleted and the formerly
deleted objects are restored to their original position in
the STM.
The backtrack procedures assume that the ADD and MARK
statements are the only way that STM is modified. If user
code modifies STM the backtrack save and restore procedures
will have to be modified to be cognizant of user code
changes to STM.
In a system with backtracking it is essential that some
rule recognizes when the problem is solved and returns to
the calling procedure. If no rule does this, the system
will perform every manipulation on STM that it can in every
order that it can and finally will return with STM fully
restored to it's original state. Thus vast resources can be
consumed to to obtain the same results that not calling the
inference engine at all would produce.
The DUMP option causes code to print the contents of
STM on the standard output to be generated. The procedure
_d_u_m_p__s_t_m will print out the contents of each list of objects
in STM on the standard output. There is also a procedure
_d_u_m_p_%_s__s_t_r_u_c_t generated for each object defined, where "%s"
is replaced by the object name. Calls to generate dumps of
the entire STM or specific lists may be embedded in the
C_Code part. TRC itself never generates calls to the dump
procedures.
The RECURS option is the only option that may be placed
in the option part of LTM or in the situation part of a
- 18 -
rule. If RECURS is used in the option part all rules will
default to the recursive search strategy. This option can
be turned off for a given rule by including NORECURS in the
situation part of the rule.
The ZERO option will generate a single procedure, _z_e_r_o.
This procedure will free all the structures that are dynami-
cally allocated by the TRC generated code. The structures
that are allocated dynamically include STM, the backtracking
stack (if backtracking is enabled), the profiling arrays (if
profiling is enabled) and the trace list (if tracing is
enabled). TRC will generate code for any combination of
options. This is useful in situations where the expert sys-
tem is called more than once. The zero procedure will clean
up anything left by a previous invocation.
The SAVE option will generate procedures to write all
dynamically allocated structures to a file and procedures to
restore those structures from a previously written file.
This option makes it easy to write expert systems which
checkpoint their own execution. It is then possible to res-
tart execution in the case of a crash without having to redo
all the work that has already been done. It is necessary to
save all dynamically allocated structures including STM, the
backtracking stack, profiling arrays and the trace list.
Separate procedures are generated for saving and reloading
each of these structures.
_9. _E_N_V_I_R_O_N_M_E_N_T
TRC is a compiler that translates a rule based system
to a set of C language procedures. It is useful in develop-
ing expert systems. TRC produces only an inference engine
and supporting structures, input and output processing must
be added with additional code, presumably in C. The minimum
external code is a main procedure that will initialize STM
and call the inference engine. Figure 8 is a minimal main
procedure that includes examples of calls to procedures to
dump STM, print the trace list and print the execution pro-
file. This assumes that the DUMP, TRACE and PROFILE options
were turned on.
The output of TRC is written on several files in the
current directory. The file names generated are; add.c,
dump.c, free.c, loop.c, misc.c, search.c, relink.c,
backtrack.c, profile.c, zero.c and save.c. A sample
Makefile is given in Figure 8. The reference to main.c
refers to user supplied code.
- 19 -
#include <stdio.h>
#include "loop.h"
extern char *rule_names[];
main()
{
struct trace *temp;
/* initialize STM */
init();
printf("Initial STM");
dump_stm();
/* call the inference engine */
loop();
printf("Final STM");
dump_stm();
/* dump the contents of the trace structure */
temp = trace_front;
while(temp){
printf("%s",rule_names[temp->rule]);
temp = temp->next;
}
print_profile();
}
Figure 7: Sample Main Procedure
# Makefile for expert systems generated by TRC
PROG = loop
OBJS = add.o backtrack.o dump.o free.o loop.o \
misc.o profile.o relink.o save.o \
search.o zero.o main.o
INCS = loop.h
CC = cc
all: $(PROG)
$(CC) -c -O $<
$(OBJS): $(INCS)
$(PROG): $(OBJS)
$(CC) -o $@ $(OBJS) $(LIBS)
Figure 8: Sample Makefile
- 20 -
BIBLIOGRAPHY
1. Johnson, S. C. (1975), "YACC: Yet Another Compiler Com-
piler", Computer Science Technical Report No. 32, Bell
Laboratories, Murray Hill, NJ.
2. Lindsay, Robert, et.al (1980), _A_p_p_l_i_c_a_t_i_o_n_s _o_f _A_r_t_i_f_i_c_i_a_l
_I_n_t_e_l_l_i_g_e_n_c_e _f_o_r _C_h_e_m_i_c_a_l _I_n_f_e_r_e_n_c_e: _T_h_e _D_E_N_D_R_A_L _P_r_o_j_e_c_t,
McGraw Hill, New York.
3. Davis, R., B.G. Buchanan and E.H. Shortliffe (1977),
"Production Rules as a Representation for a Knowledge-Based
Consultation Program", Artificial Intelligence, Volume 8,
Issue 1, February 1977.
4. Feigenbaum, E.A., (1978), "The Art of Artificial Intelli-
gence: Themes and case studies of knowledge engineering",
IJCAI.
5. Kary, Daniel D. (1985), "The TRC Reference Manual", North
Dakota State University, Division of Mathematical Sciences,
Fargo, ND.
More information about the Mod.sources
mailing list