Problem with spell
Guy Harris
guy%gorodish at Sun.COM
Sat Mar 7 07:43:53 AEST 1987
> There is really no solution, short of a total (and perhaps needed) rewrite
> of spell, a program that originated in the dark ages on a PDP without
> separate I & D space.
It's worth noting that other approaches to spelling checkers have
been successful, even on small machines. A company called Proximity
Technology (formerly Proximity Devices; I presume they're still
around) build a spelling checker that just looked words up in a
dictionary. The first version made one pass over the document to
gather a sorted list of words with duplicates eliminated. The second pass
went through that list and eliminated words not found in the
dictionary; the dictionary was compressed using several techniques
(common prefix elimination, Huffman-coding, etc.) so an ~80,000 word
list took only (if I remember correctly) about 170KB. The third pass
went through the document and printed the lines containing words in
the remaining list of misspelled words.
The bulk of the time was spent in the decompression; the word list
was indexed by the first three characters of the word, so lookup was
reasonably quick. At CCI we incorporated this code into an
interactive spelling checker that was part of our word processor; you
could tell it to check from the current cursor position to the end of
the document, or mark a block of text and tell it to check that
block, and the third pass would stop at each potentially-misspelled
word and offer you the chance to correct it. The word list could
also be purchased with hyphenation points included; our word
processor could be told to insert discretionary hyphens at these
points. Other people, including vendors of word processors based on
micros (definitely 16-bit micros, perhaps 8-bit ones), also included
this code in their products.
Proximity then rewrote it completely to use one pass; it just took
every word in the document and looked it up in the word list. They
did this in order to make it work better when used interactively; the
old version took a while to complete passes 1 and 2, so you had to
sit there for a while before it started showing potential
misspellings. They also used different compressing techniques that
weren't so CPU-intensive; when incorporated into CCI's word
processor, the new one started showing the errors fairly quickly. I
believe this version traded space for time, by entering a match/nomatch
indication into an in-memory cache, so I don't know whether it would
have worked well on a small-address-space or small-physical-memory
machine.
More information about the Comp.unix.questions
mailing list