Talking PC6300 For The Blind, part 1 of 11
eklhad at ihuxv.UUCP
Wed Feb 11 23:59:47 AEST 1987
A Talking Console Device Driver
For The AT&T PC6300
ABSTRACT
A new console device driver gives the AT&T PC6300 the power of
speech, allowing blind workers to use the micro computer
effectively. This device driver takes all standard output
generated by the PC6300, redirects it into an internal buffer, and
enables the blind worker to read the text in an efficient and
productive manner. Arbitrary line oriented applications can be run
without modifications. Using traditional software generation
tools, the blind worker can develop personalized software, or
modify the talking device driver itself. The software expects a
Votrax Type N Talk speech unit attached to the serial port, but a
modular design allows other synthesizers to be substituted without
much reprogramming. The source is in the public domain, and is
available on floppy disks or via electronic mail.
1. INTRODUCTION
Since computers have become an indispensable tool in many
professions, the blind worker's productivity often depends
critically on an efficient, user friendly human-machine interface.
This becomes even more important as CD-ROM peripherals enable the
micro computer to act as a reference library. Already, a PC6300
can be equipped with an electronic version of the encyclopedia for
only $1,000. For the first time, the blind researcher may have
access to inexpensive electronic information.
While a talking terminal provides an interface to any computer, the
combination is unnecessarily complex, and many cannot afford to buy
or rent both the talking terminal and the target computer. A
general purpose talking micro computer would allow blind workers to
exploit the capabilities latent in every micro computer, including
terminal emulation if desired. A new talking device driver for the
PC6300 brings this goal within easy reach.
The software has the following features:
1. A screen independent buffer to capture standard output.
2. Normal display for sighted co-workers.
3. Audio feedback accompanying displayed text or error
conditions.
4. Reading the text at interrupt level while application
programs monopolize the CPU.
5. User defined pronunciations for words or symbols.
6. User defined key/command correspondence maps.
Subsequent sections describe these features in detail.
2. ARCHITECTURE
2.1 Method And Machine
Most talking programs, by virtue of being programs, perform one
function, be it terminal emulation, word processing, or file
management. A talking "computer" must be much more flexible. This
requirement implies changes to the resident operating system. On
the PC6300, this amounts to replacing the keyboard/screen device
driver, a relatively simple task. This, along with low cost, is a
strong argument for the PC6300. Smaller machines (e.g. Apple IIe)
possess inflexible ROM resident operating systems, making major
modifications difficult. Memory constraints are also a factor
here. More powerful micros (e.g. Unix based) make device driver
modifications difficult due to the complexity of the operating
system.
Other important features of the PC6300 include a speaker for audio
feedback, a keyboard with full ASCII and function keys, and an
easily accessible interrupt system. Reading text at interrupt
level enhances productivity considerably since the user can review
output during program execution.
2.2 Speech Synthesizer
Internal speech peripherals that consume memory and CPU time are
not appropriate, since, by definition, other application programs
must run on the PC6300 along with the synthesizing software.
Instead, the speech unit should be a low cost, off-the-shelf
device, that is easily attached to the micro computer via the
serial or parallel port. The unit must convert an ASCII stream
into analogue speech, and provide appropriate control and feedback.
The speech unit should possess the following features:
@ Costs less than $500.
@ Wide range of speaking rates.
@ Appropriate X-on / X-off or RS232 flow control.
@ A flush (shut up) command to terminate speech and clear
internal buffers.
@ A small phase delay between the incoming ASCII stream and the
actual speech signal.
Many underestimate the importance of the last criterion. Extensive
buffering (as in the Echo speech peripheral) is unacceptable. The
micro computer's internal "cursor" should track the actual speech
as accurately as possible. Using the Votrax causes the cursor to
lead the speech by about fifteen words, but this will have to do.
Surprisingly, speech quality is relatively unimportant; the user
quickly adapts to the specific speech synthesizer. Cutting costs
is usually more important. Although the Votrax Type N Talk
incorporates relatively old technology, it is still the best
synthesizer for this application.
2.3 Screen Oriented World
The number of screen oriented application programs is monotonically
increasing. This device driver makes no concession to these
programs; in fact, it actively opposes them. Function and control
keys have been redefined, in order to simplify reading operations
by providing single key-stroke commands. The visible cursor and
the internal cursor (where the text is read) are completely
independent. Any cursor control escape sequences produced by
application software will be read literally, distracting the user.
These line oriented constraints were not introduced casually, but
after much thought and study. Supplementing a screen oriented
program with speech is irreparably inefficient. One can never
reproduce the benefits of a two dimensional visual search and scan.
Instead, one is left with only the inconveniences. I have
implemented talking screen oriented editors in the past, and the
efficiency doesn't begin to compare with line oriented editors.
The cost of hearing each letter or word that the cursor passes over
is just too high. To illustrate, cut a small hole in a large
sheet of paper, and hold it in front of your terminal. The hole
should be the size of a 5 letter word on the screen. Now try
running a visual editor, tracking the cursor with the hole in your
sheet of paper. Even simple edits become slow and error prone.
Therefore, all application programs are assumed to be line
oriented. For this reason, this system may not be optimal for
partially sighted workers, who may prefer a magnified screen
supplemented with speech. Never try to be all things to all
people.
2.4 Audio Feedback
Audio feedback is an important feature that differentiates this
system from commercial talking terminals. When the PC6300 displays
any characters, the system simulates the sounds of a 1500 baud
printer. As on most printers, whitespace is silent and carriage
return generates a unique sound. With this feedback, a programmer
always knows when the computer is producing output.
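The printer simulation amounts to choosing one sound per displayed character. A minimal sketch in C follows. The names `enum sound` and `sound_for_char` are hypothetical and are not taken from the driver source, and treating newline like carriage return is an assumption.

```c
#include <assert.h>

/* Hypothetical sound classes; the real driver's names and tones may differ. */
enum sound { SND_NONE, SND_CLICK, SND_RETURN, SND_BELL };

/* Choose the sound that accompanies one displayed character. */
enum sound sound_for_char(int c)
{
    assert(c >= 0 && c <= 255);
    if (c == ' ' || c == '\t')
        return SND_NONE;     /* whitespace is silent, as on a printer */
    if (c == '\r' || c == '\n')
        return SND_RETURN;   /* carriage return has its own sound
                                (newline treated alike: assumption) */
    if (c == '\a')
        return SND_BELL;     /* control-G produces the short beep */
    return SND_CLICK;        /* any other character: printer click */
}
```

A caller would invoke `sound_for_char` once per character written to the console and drive the speaker accordingly.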
Typically, commercial speech terminals run in one of two modes. In
the polling mode, no audible indication accompanies generated text.
This forces the user to constantly ask the terminal, via keyboard
driven speech directives, whether additional data has been
displayed. This is inefficient and frustrating. It is easy to
miss unexpected messages, such as "system going down in 3 minutes,
save all files now!!" In the second mode, all text is read
automatically. Here too, the correlation in time is lost. A
computer can generate screen after screen of text while the speech
synthesizer translates the first line. In addition, users rarely
want to read everything, word for word, and the blind user should
not be forced to wade through a deluge of data to determine whether
any unexpected messages were generated. Since speech is too slow
to provide timely feedback, simulating the sounds of
a printer via an audio device is essential.
These sounds provide information as well. Since characters,
spaces, and carriage returns generate different sounds, the user
often receives considerable information about the text being
displayed. Solid lines, English text, assembly language, high
level language, tables, and blank lines all sound different. Some
common messages can be inferred from the sound patterns produced,
eliminating the need to read the text, and improving productivity.
Along with printer simulation, the PC6300 produces several other
sounds, usually associated with error conditions. These sounds
include:
1. A long tone indicating an active or enabled mode.
2. A short beep indicating a command error. This sound also
accompanies control-G (industry standard).
3. A low buzz indicating a faulty or missing RS232 connection,
or an inactive speech unit.
4. A fast sequence of high notes indicating a boundary
condition, such as reading beyond the internal buffer.
3. EQUIPMENT AND SETUP
Along with the AT&T PC6300, the system requires a Votrax Type N
Talk unit, an RS232 serial cable, and a speaker with mini phone
jack.
The cable accompanying the Votrax unit cannot be used as is, since
the PC6300 has a non-standard RS232 pinout. Other IBM compatibles
tolerate standard RS232 cables. Unfortunately, PC6300 users must
construct a new cable. The following connections are required:
     Votrax          PC6300
      male           female
        1    <->        1
        7    <->        7
        2    <->        3
        3    <->        2
        4    <->        4
        5    <->        5
       20    <->        6
        8    <->       23
To configure the system, connect the PC6300 to the Votrax unit via
the constructed RS232 cable, and set the Votrax to 9600 baud
using the DIP switches on the back of the unit. Only
the switch nearest the speaker jack should be down. Finally,
connect the speaker and power supply to the Type N Talk unit. To
use the system, insert a disc containing the speech software
and turn on the Votrax and the PC6300; MS-DOS will automatically
incorporate the talking device driver. The entry in config.sys
specifies the size of the virtual screen. The line
"DEVICE=talkcon.dev 7" causes the device driver to allocate 7K for
its internal buffer. This is consistent with most ramdisc device
drivers.
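The command section describes the internal buffer as circular. How the driver lays out its storage is not stated here, so the following C fragment is only a sketch of one common arrangement, sized from the kilobyte count given in config.sys. The names `struct textbuf`, `tb_init`, `tb_put`, and `tb_get` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical circular text buffer; not the driver's actual layout. */
struct textbuf {
    char *data;     /* storage supplied by the caller */
    size_t size;    /* capacity in bytes, e.g. 7 * 1024 */
    size_t head;    /* index of the oldest character */
    size_t count;   /* characters currently held */
};

/* Prepare a buffer of kbytes kilobytes over the given storage. */
void tb_init(struct textbuf *tb, char *storage, size_t kbytes)
{
    assert(tb != NULL && storage != NULL && kbytes > 0);
    tb->data = storage;
    tb->size = kbytes * 1024;
    tb->head = 0;
    tb->count = 0;
}

/* Append one character; once full, the oldest character scrolls away. */
void tb_put(struct textbuf *tb, char c)
{
    size_t tail = (tb->head + tb->count) % tb->size;
    tb->data[tail] = c;
    if (tb->count < tb->size)
        tb->count++;
    else
        tb->head = (tb->head + 1) % tb->size;
}

/* Fetch the i-th character, counted from the oldest one. */
char tb_get(const struct textbuf *tb, size_t i)
{
    assert(i < tb->count);
    return tb->data[(tb->head + i) % tb->size];
}
```

With "DEVICE=talkcon.dev 7" the equivalent call would be `tb_init(&tb, storage, 7)`, giving 7168 bytes of scrollback.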
4. DEVICE DRIVER COMMANDS
The user reads the accumulated text by entering various control
characters and function keys. Associating appropriate commands to
these keys is a significant human factors problem. Some key
assignments are user friendly; others are disastrous.
By default, function keys control reading, allowing a user to read
text or programs with single key strokes. Control characters on the
home row examine individual letters and words, so the user can
verify text as it is being entered without leaving the home keys.
Less convenient control characters activate features that
are rarely used. The system doesn't interpret any <alt> keys,
since they are difficult to access quickly. The key assignments
are table driven, and easily modified. The file "talkcon.sys",
described in a later section, contains the key/command map.
The effect of each control character and function key is explained
below. In this section, the symbol '^' indicates a control
character, while F1 through F10 represent the ten function keys and
#0 through #9 represent the keys comprising the numeric keypad.
The term "cursor" always refers to the internal cursor, where text
is read.
F1: Positions the cursor at the start of the internal buffer.
This buffer is circular, and it "scrolls", like a large
character oriented screen.
F2: Moves up to the previous line and starts reading.
F3: Positions the cursor at the end of the internal buffer.
F4: Moves the cursor to the beginning of the current line and
starts reading.
F5: Reads the last complete line in the buffer. This allows the
user to skip blank lines and the prompt (if any), and read the
output from the previous command directly.
F6: Advances the cursor to the next line and starts reading.
F7: Clears the internal buffer.
F8: Moves the cursor down two lines and starts reading.
F9: Toggles the control character buffering mode. When enabled,
control characters in standard out are placed in the internal
buffer along with the text. By default, control characters
fall into the bit bucket. Newline and bell are always placed
in the buffer regardless of this parameter.
F10: Toggles the 1-line reading mode. When the mode is enabled,
the system stops reading after each line; otherwise it reads
to the end of the buffer. The user can always interrupt
reading by entering any command in this list.
^P: Announces the function of the next key entered. This allows a
new user to review the command keys in this list.
^S: Moves the cursor back one space and speaks the current
character.
^D: Speaks the character that the cursor is currently on.
^F: Moves the cursor forward one space and speaks the current
character.
^E: Moves the cursor up one row and speaks the current character.
^C: Moves the cursor down one row and speaks the current
character.
^R: Indicates the case of the letter pointed to by the cursor,
sounding the "enabled" tone if it is upper case.
^T: Speaks the word associated with the current character. This
prevents phonetic ambiguity, enabling the user to
differentiate letters easily. The NATO standard phonetic
alphabet is used (see table I).
^J: Moves the cursor back one token and speaks the current token.
A token is a sequence of letters or digits, or a punctuation
mark.
^K: Speaks the token that the cursor is currently on.
^L: Moves the cursor forward one token and speaks the current
token.
^W: Gives the cursor's location by column number. When entering
text, the sequence ^V ^W announces the current column (useful
for Fortran programming).
^Q: Takes the next character entered and passes it to MS-DOS
directly. This feature is used to send control characters
(e.g. ^S, ^Q, ^C) to the operating system.
^V: Same as F3.
^O: Same as F4.
^N: Same as F8.
^B: Same as F5.
#0: Toggles the transparent mode. When enabled, the talking
device driver is transparent, passing control and function
keys to MS-DOS directly. This allows a sighted co-worker to
run visual editors and the like without rebooting. The sundry
sounds that usually accompany output are suppressed. In
short, the new talking device driver emulates the original
MS-DOS console device driver.
#1: Same as ^S.
#2: Same as ^D.
#3: Same as ^F.
#4: Same as ^E.
#6: Same as ^C.
#7: Same as ^J.
#8: Same as ^K.
#9: Same as ^L.
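The token commands (^J, ^K, ^L) rest on the definition of a token as a run of letters or digits, or a single punctuation mark. The C sketch below illustrates that definition. The function names are hypothetical, and treating a mixed run such as "x1" as one token is an assumption, since the text does not say whether letters and digits are separated.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* True for characters that can belong to a token. */
static int is_token_char(int c)
{
    return isalnum(c) || ispunct(c);
}

/* Length of the token starting at s[pos]; 0 if no token starts there. */
size_t token_length(const char *s, size_t pos)
{
    size_t n = 0;
    assert(s != NULL);
    if (isalnum((unsigned char)s[pos])) {
        while (isalnum((unsigned char)s[pos + n]))
            n++;               /* a run of letters or digits */
    } else if (ispunct((unsigned char)s[pos])) {
        n = 1;                 /* a punctuation mark stands alone */
    }
    return n;
}

/* Index of the next token after the one at pos (the ^L motion).
   Returns the index of the terminating NUL when no token follows. */
size_t token_forward(const char *s, size_t pos)
{
    pos += token_length(s, pos);
    while (s[pos] != '\0' && !is_token_char((unsigned char)s[pos]))
        pos++;                 /* skip whitespace and control characters */
    return pos;
}
```

On the line "x1 = y+2;" repeated forward motions would stop at "x1", "=", "y", "+", "2", and ";" in turn.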
TABLE I
Phonetic Alphabet
     alpha        hotel        oscar        uniform
     bravo        india        papa         victor
     charlie      juliet       quebec       whiskey
     delta        kilo         romeo        x-ray
     echo         lima         sierra       yankee
     foxtrot      mike         tango        zulu
     golf         november
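The ^T command can be pictured as a lookup in this table. The following C sketch uses a hypothetical function name, `phonetic_word`, and assumes ASCII letter codes; it is not the driver's own routine, which is written in assembly.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* The words of Table I, indexed by letter. */
static const char *const phonetic[26] = {
    "alpha", "bravo", "charlie", "delta", "echo", "foxtrot", "golf",
    "hotel", "india", "juliet", "kilo", "lima", "mike", "november",
    "oscar", "papa", "quebec", "romeo", "sierra", "tango",
    "uniform", "victor", "whiskey", "x-ray", "yankee", "zulu"
};

/* Word spoken for a letter, or NULL when c is not a letter. */
const char *phonetic_word(int c)
{
    assert(c >= 0 && c <= 255);
    c = tolower(c);
    if (c < 'a' || c > 'z')
        return NULL;
    return phonetic[c - 'a'];   /* assumes contiguous ASCII letters */
}
```

Upper and lower case map to the same word; the ^R command reports case separately.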
5. PRONUNCIATION TABLE
The user can direct the PC6300 to deliberately misspell words, so
they will be pronounced correctly. The table containing these
substitutions is kept in memory. This table also contains user
defined pronunciations for each punctuation mark. The program
"tcset" reads an ASCII file containing word and punctuation
pronunciations, and constructs these tables for the talking device
driver. The autoexec.bat script should execute this program to
initialize the device driver's tables. Of course, the program can
be run again at any time.
The tcset program reads the ASCII file "talkcon.sys" to obtain the
user defined pronunciations. This text file is line oriented, and
can be modified using your favorite editor. The syntax of each
entry is: "old word", whitespace, "substituted text". The
substituted text consists of letters, numbers, or spaces. If a
line in the table contains "read reed", the PC6300 replaces the
word "READ" with the word "REED" in the speech stream. The line
"% percent" determines the word used for the symbol '%'. Lines
beginning with whitespace hold comments.
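The entry syntax is simple enough to parse by hand. The C sketch below splits one pronunciation line into its two parts. The function name `parse_entry` is hypothetical; the actual tcset.c may parse differently, and this sketch handles only pronunciation entries.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Split one line into the old word and its substituted text.
   The line is modified in place. Returns 1 for a valid entry,
   0 for a comment, an empty line, or a line with no substitution. */
int parse_entry(char *line, char **word, char **subst)
{
    char *p = line;
    assert(line != NULL && word != NULL && subst != NULL);
    line[strcspn(line, "\r\n")] = '\0';     /* drop the line ending */
    if (*p == '\0' || isspace((unsigned char)*p))
        return 0;                           /* leading whitespace: comment */
    while (*p != '\0' && !isspace((unsigned char)*p))
        p++;                                /* find the end of the old word */
    if (*p == '\0')
        return 0;                           /* no substituted text */
    *p++ = '\0';
    while (isspace((unsigned char)*p))
        p++;                                /* skip the separating whitespace */
    if (*p == '\0')
        return 0;
    *word = line;
    *subst = p;                             /* may itself contain spaces */
    return 1;
}
```

Given "read reed" it yields the pair ("read", "reed"); given "% percent" it yields ("%", "percent").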
The substitution table in the device driver is limited to 2K, so
don't expect to correct every mispronunciation under the sun. The
software understands a few simple suffixes, such as regular
plurals. If talkcon.sys contains "read reed", the words "reads"
and "reading" will be modified accordingly. The software
recognizes "s", "es", "ies" (plurals), "d", "ed", "ied" (past
tense), and "ing" (participle).
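One way to picture the suffix handling is: look the word up as is, and if that fails, strip a recognized suffix, look up the stem, and re-attach the suffix to the replacement. The C sketch below follows that reading. It is an assumption about the behavior and not the driver's code; the names `struct entry` and `substitute` are hypothetical, and input is assumed to be lower case.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct entry { const char *word; const char *subst; };

/* Longest suffixes first, so "ies" is tried before "es" and "s". */
static const char *const suffixes[] = {
    "ies", "ied", "ing", "es", "ed", "s", "d"
};

/* Find the first len characters of w in the table. */
static const char *find(const struct entry *tab, size_t n,
                        const char *w, size_t len)
{
    size_t i;
    for (i = 0; i < n; i++)
        if (strlen(tab[i].word) == len && strncmp(tab[i].word, w, len) == 0)
            return tab[i].subst;
    return NULL;
}

/* Write the spoken form of word into out.
   Returns 1 if a substitution was made, 0 if the word is unchanged. */
int substitute(const struct entry *tab, size_t n,
               const char *word, char *out, size_t outsz)
{
    size_t wl = strlen(word), i;
    const char *r = find(tab, n, word, wl);
    assert(outsz > 0);
    if (r != NULL) {
        snprintf(out, outsz, "%s", r);
        return 1;
    }
    for (i = 0; i < sizeof suffixes / sizeof suffixes[0]; i++) {
        size_t sl = strlen(suffixes[i]);
        if (wl > sl && strcmp(word + wl - sl, suffixes[i]) == 0) {
            r = find(tab, n, word, wl - sl);    /* look up the stem */
            if (r != NULL) {
                snprintf(out, outsz, "%s%s", r, suffixes[i]);
                return 1;
            }
        }
    }
    snprintf(out, outsz, "%s", word);
    return 0;
}
```

With the single entry "read reed", the word "reading" comes out as "reeding" and "reads" as "reeds", while "red" is left alone.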
The file talkcon.sys may also contain key/command assignments to
map particular functions to different keys. Again, entries are
line oriented, and they are of the form "key = command-number".
Keys are specified using the notation in the previous section (e.g.
^V, F3, #4). Available commands and their corresponding numbers
are documented in the example talkcon.sys file provided with this
software package.
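A key/command line can be recognized with a few sscanf patterns, one per key notation. The C sketch below is illustrative only: `struct keymap` and `parse_keymap` are hypothetical names, the control-code arithmetic assumes ASCII, and the command numbers are whatever the example talkcon.sys defines. It rejects #5, which the caveats section says cannot be assigned.

```c
#include <assert.h>
#include <stdio.h>

enum keykind { KEY_CTRL, KEY_FUNC, KEY_PAD };

struct keymap {
    enum keykind kind;
    int key;        /* control code, function key number, or keypad digit */
    int command;    /* command number as listed in talkcon.sys */
};

/* Parse "key = command-number"; returns 1 on success, 0 otherwise. */
int parse_keymap(const char *line, struct keymap *km)
{
    char letter;
    int num, cmd;
    assert(line != NULL && km != NULL);
    if (sscanf(line, "^%c = %d", &letter, &cmd) == 2
        && letter >= 'A' && letter <= 'Z') {
        km->kind = KEY_CTRL;
        km->key = letter & 0x1f;        /* ^V becomes ASCII code 22 */
    } else if (sscanf(line, "F%d = %d", &num, &cmd) == 2
               && num >= 1 && num <= 10) {
        km->kind = KEY_FUNC;
        km->key = num;
    } else if (sscanf(line, "#%d = %d", &num, &cmd) == 2
               && num >= 0 && num <= 9 && num != 5) {
        km->kind = KEY_PAD;
        km->key = num;
    } else {
        return 0;                       /* not a key/command line */
    }
    km->command = cmd;
    return 1;
}
```

A loader could try `parse_keymap` on each line first and treat lines it rejects as pronunciation entries or comments.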
6. ACRONYMS
While reading, the device driver expands (apparently)
unpronounceable words into their constituent letters. Thus, many
acronyms and obscure variable names will be spelled out. The
pronounceability test is quite simplistic, examining only the first
four letters of each word. If these letters are all vowels, or all
consonants, the word is spelled. If two or three vowels are
present, the word is pronounced. When exactly one vowel is
present, the word is spelled, unless the consonant cluster matches
a predefined English cluster (table lookup). This simple algorithm
usually works well. As always, the user can place specific
variables or acronyms in the replacement table.
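The test can be sketched in a few lines of C. Several details below are assumptions, since the text leaves them open: the cluster table is invented for illustration, 'y' is counted as a consonant, "the consonant cluster" is taken to be the longest unbroken run of consonants in the first four letters, and short words with no run of two or more consonants are simply pronounced. The name `pronounceable` is hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical cluster table; the driver's actual list is not given. */
static const char *const clusters[] = {
    "st", "tr", "th", "ch", "sh", "pl", "str", "thr"
};

static int is_vowel(int c)
{
    return c != '\0' && strchr("aeiou", c) != NULL;
}

/* Returns 1 if the word should be pronounced, 0 if it should be spelled. */
int pronounceable(const char *word)
{
    char run[5], best[5];
    size_t i, n = 0, vowels = 0, rl = 0, bl = 0;
    assert(word != NULL);
    best[0] = '\0';
    for (i = 0; i < 4 && isalpha((unsigned char)word[i]); i++) {
        int c = tolower((unsigned char)word[i]);
        n++;
        if (is_vowel(c)) {
            vowels++;
            rl = 0;                     /* a vowel ends the current run */
        } else {
            run[rl++] = (char)c;
            run[rl] = '\0';
            if (rl > bl) {              /* remember the longest run */
                bl = rl;
                strcpy(best, run);
            }
        }
    }
    if (vowels == 0 || vowels == n)
        return 0;                       /* all consonants or all vowels */
    if (vowels >= 2)
        return 1;                       /* two or three vowels */
    if (bl < 2)
        return 1;                       /* no cluster to test (assumption) */
    for (i = 0; i < sizeof clusters / sizeof clusters[0]; i++)
        if (strcmp(best, clusters[i]) == 0)
            return 1;                   /* known English cluster */
    return 0;
}
```

Under these assumptions "read" and "stop" are pronounced, while "mktalk" and "PC6300" are spelled out.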
7. TERMINAL EMULATION
In theory, any terminal emulator can be run unmodified,
transforming the talking PC6300 into a talking terminal.
Unfortunately, most terminal programs monopolize the function keys.
Furthermore, they often provide cursor control, paging, and many
other unwanted visual features. A simple, no frills terminal
emulator that avoids function and control keys would improve
productivity considerably. Such a program has been written, and is
included in this software package. When running, it simply
shuffles characters from stdin to the serial port, and from the
serial port to stdout. Since interrupt routines control serial
I/O, characters are not lost while the device driver reads the
accumulated text. X-on / X-off flow control is implemented in both
directions. Alt keys activate a few simple features.
alt-X: Exit the terminal program and return to MS-DOS. Data
terminal ready is disabled, equivalent to hanging up.
alt-L: Leave the terminal emulator temporarily, and return to MS-
DOS. Data terminal ready remains active; the user can
return to the terminal session at any time.
alt-B: Send a break.
alt-S: Display modem status. The characters A, C, and S represent
active (data set ready), carrier detect, and clear to send
respectively.
alt-R: Toggle the baud rate. The serial I/O data rate toggles
between 1200 baud and 300 baud. Since the talking console
device driver often runs with interrupts disabled for
several milliseconds, higher baud rates are not supported.
Except for file transfers, a higher baud rate would not
improve the productivity of the blind worker.
alt-D: Download a file. Characters from the serial port are
redirected into the named file. The path from stdin to the
serial port is unaffected. When the emulator receives ^Z
(MS-DOS EOF), it closes the file and sends subsequent
characters to stdout as before. The following sequence can
be used to download a text file from a Unix machine:
1. Enter "stty -echo tab0".
2. Hit alt-D, followed by the file name.
3. Enter "cat file ; echo '\032'".
4. Watch the progress display (one '.' per kilobyte
transferred), and wait for the Unix prompt.
5. Reset the stty parameters.
alt-U: Upload a file. Alt-U followed by a file name sends
characters from the named file to the serial port. The path
from the serial port to stdout is unaffected. Characters
entered at the keyboard are discarded, although the alt keys
in this list are still interpreted. If a disaster occurs,
the user can always exit the terminal program using alt-X.
As before, echoing and tab expansion should be disabled.
Use the Unix command "cat >file" to capture the uploaded
text. When the file is transferred (indicated by a carriage
return), enter ^D (Unix EOF) to close the Unix file.
Industry standard file transfer mechanisms (e.g. ctrm) might be
preferable. They are less flexible (not every host machine is so
equipped), but they detect and correct errors, and are more robust.
8. SOFTWARE
The software is written in Microsoft assembly, version 3.0 or
above. To build the driver, assemble the four source files, link
the resulting object files, and run exe2bin to produce the device
driver. Talkcon.obj must be loaded first. No external library or
startup routines are required. The programs tcset.c and savebuf.c
are written in Microsoft C, version 4.0 or above.
The software package consists of the following source files:
MKTALK.BAT: Batch script to build the device driver.
TALKCON.ASM: Device driver interface functions for MS-DOS.
EVENTS.ASM: Routines that process speech commands at keyboard
interrupt level.
READING.ASM: Routines that control continuous reading at real time
interrupt level.
SYNTH.ASM: Interface functions that control the specific speech
synthesizer.
PARMS.H: Header file containing parameters for the talking
device driver.
TCSET.C: Program that reads an ASCII file of pronunciation
corrections, and constructs the corresponding device
driver tables.
TALKCON.SYS: ASCII file containing the pronunciation corrections
and the key/command map.
SAVEBUF.C: Program that takes the accumulated output in the
device driver's buffer, and stores it in a text file.
TERMINAL.ASM: Simple terminal emulator.
9. CAVEATS
@ Some application programs bypass the device driver, displaying
output via the BIOS routines, or writing directly into screen
memory. There is no way to read output produced in this
manner.
@ The "transparent mode" command cannot be reassigned to another
key. However, another function, including the null function,
can be assigned to #0, eliminating the "transparent" option.
@ For some reason, #5 is not easily accessible. Therefore,
commands cannot be assigned to #5.
@ When the internal cursor is positioned at the top of the
circular buffer, the scrolling buffer drags the cursor along
as the PC6300 generates additional text. Thus, the cursor
remains at the top of the buffer. Since scrolling rates
exceed human speech, reading text at the top of the buffer
while the buffer scrolls can be quite an interesting
experience.
@ A few important real time functions are controlled by CPU
loops rather than timer interrupts. If this device driver is
ported to another IBM PC compatible with a different clock
rate, the "CLKRATE" macro in parms.h must be redefined before
building the software.
@ It is remotely possible to encounter a dangerous race
condition if text is being read while the "tcset" program
modifies the pronunciation tables.
@ Since the MS-DOS keyboard buffer is frustratingly small (15
characters), this device driver contains its own interrupt
level type ahead buffer. The KBSIZ parameter in parms.h
determines the size of this buffer (currently 120 characters).
Because keys are captured at interrupt level, function keys
remain operational even when the type ahead buffer is full.
@ The beeps, clicks, and constant chatter may drive your friends
crazy.
--
You know ... if it ain't patina, it's verdigris.
Karl Dahlke ihnp4!ihnet!eklhad