What does "spell" do wrong?
John Cowan
cowan at marob.MASA.COM
Thu May 18 04:31:16 AEST 1989
In article <1007 at kuling.UUCP> irf at kuling.UUCP (Bo Thide') writes:
>In article <7084 at saturn.ucsc.edu> jaap at chromo.UUCP (Jacob Wilbrink) writes:
>>I've been wondering what the program "spell" does, since it
>>seems to make very many errors. Some examples of words it thinks
>>are spelled correctly are
>>
>>utomsrr
>>mgdesou
>>aneorxx
>
>All these words are caught as misspelled by the HP-UX version of spell(1).
>
My version of 'spell' catches them also. However, in defense of the program,
it is not designed to be 100% reliable. 'Spell' uses a hashing scheme.
Each word is stripped of prefixes and suffixes, and the resulting base form
is hashed and looked up in a bit table. If the bit is 0, the word is
certainly misspelled; if the bit is 1, the word is assumed correct. There
are about 30,000 1-bits in a 2^27-bit table, so the probability that a
misspelled word hits a 1-bit and is wrongly accepted (a false positive) is
about 30,000/2^27, or roughly 1/4000.
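
To make the lookup concrete, here is a minimal sketch of a hashed bit table. It assumes a 27-bit hash (a table of 2^27 bits), which is what makes the 1/4000 figure work out. SHA-256 is only a stand-in for whatever hash 'spell' really uses, the names hash_word, build_table and looks_correct are made up for illustration, and the prefix/suffix stripping step is left out.

```python
import hashlib

TABLE_BITS = 27                 # assumed hash width, not taken from spell's source
TABLE_SIZE = 1 << TABLE_BITS    # 2**27 bit positions


def hash_word(word: str) -> int:
    """Map a word to a bit position. SHA-256 is a stand-in hash."""
    digest = hashlib.sha256(word.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % TABLE_SIZE


def build_table(words) -> bytearray:
    """Set one bit per dictionary word in a dense table (16 MiB here)."""
    table = bytearray(TABLE_SIZE // 8)
    for w in words:
        h = hash_word(w)
        table[h >> 3] |= 1 << (h & 7)
    return table


def looks_correct(table: bytearray, word: str) -> bool:
    """0-bit: certainly not in the dictionary. 1-bit: assumed correct."""
    h = hash_word(word)
    return bool(table[h >> 3] & (1 << (h & 7)))


if __name__ == "__main__":
    table = build_table(["spell", "hash", "table"])
    for w in ["spell", "utomsrr"]:
        print(w, looks_correct(table, w))
```

A dictionary word always finds its own bit set, so the only possible error is a misspelling that happens to land on a bit set by some other word. The dense bytearray is used here only for clarity.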
According to Doug McIlroy, the author of 'spell', a typical document contains
20 misspelled words or fewer. Therefore, at most about 20/4000 = 0.5% of
documents (under 1%) contain a misspelled word that is not reported.
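
The arithmetic can be checked in a few lines. This is only a sketch under the same assumptions as the figures quoted here: 30,000 bits set in a table of 2^27 bits, 20 misspellings per document, and each misspelling hashing independently and uniformly. The variable names are mine.

```python
dictionary_words = 30_000        # 1-bits in the table
table_bits = 2 ** 27             # assumed table size (27-bit hash)
misspellings_per_doc = 20        # upper figure for a typical document

# Chance that a single misspelled word lands on a 1-bit and is accepted.
p_miss = dictionary_words / table_bits

# Chance that at least one of the misspellings in a document slips through.
p_doc = 1 - (1 - p_miss) ** misspellings_per_doc

print(f"per word: 1 in {1 / p_miss:.0f}")
print(f"per document: {100 * p_doc:.2f}%")
```

The per-word figure lands between 1/4000 and 1/5000, and the per-document figure comes out somewhat under 1%.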
Source: Jon Bentley, >Programming Pearls<, ISBN 0-201-10331-1.