What does "spell" do wrong?

John Cowan cowan at marob.MASA.COM
Thu May 18 04:31:16 AEST 1989


In article <1007 at kuling.UUCP> irf at kuling.UUCP (Bo Thide') writes:
>In article <7084 at saturn.ucsc.edu> jaap at chromo.UUCP (Jacob Wilbrink) writes:
>>I've been wondering what the program "spell" does, since it
>>seems to make very many errors. Some examples of words it thinks
>>are spelled correctly are
>>
>>utomsrr
>>mgdesou
>>aneorxx
>
>All these words are caught as misspelled by the HP-UX version of spell(1).
>

My version of 'spell' catches them also.  However, in defense of the program,
it is not designed to be 100% reliable.  'Spell' uses a hashing scheme.
Each word is stripped of prefixes and suffixes, and the resulting base form
is hashed and looked up in a bit table.  If the bit is 0, the word is 
certainly misspelled; if the bit is 1, the word is assumed correct.  There
are 30,000 1-bits in a 2^27-bit table, so the probability that a misspelled
word hashes to a 1-bit and is wrongly accepted is about 30,000/2^27, or
roughly 1/4000.
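
For illustration only, here is a minimal Python sketch of the bit-table
lookup described above.  It is not the code 'spell' uses: the hash function
(MD5 from hashlib), the small demo table size, and every name in it are
assumptions of mine, and the prefix/suffix stripping step is left out.

```python
import hashlib

# Demo table size only (an assumption); it is kept small so the sketch runs
# instantly and is not the size of the table 'spell' uses.
TABLE_BITS = 1 << 16


def word_hash(word: str) -> int:
    """Map a word to a bit position. MD5 is a stand-in hash, not spell's."""
    digest = hashlib.md5(word.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % TABLE_BITS


def build_table(words) -> bytearray:
    """Set one bit per dictionary word."""
    table = bytearray(TABLE_BITS // 8)
    for w in words:
        h = word_hash(w)
        table[h >> 3] |= 1 << (h & 7)
    return table


def looks_correct(table: bytearray, word: str) -> bool:
    """Bit 0: certainly not in the dictionary. Bit 1: assumed correct."""
    h = word_hash(word)
    return bool(table[h >> 3] & (1 << (h & 7)))


def false_accept_rate(table: bytearray) -> float:
    """Fraction of 1-bits: the chance a random non-word is accepted."""
    ones = sum(bin(b).count("1") for b in table)
    return ones / TABLE_BITS
```

A dictionary word always finds its bit set, so a correct word is never
flagged; the only possible error is a non-word that happens to land on a
1-bit, and the fraction of 1-bits is the chance of that.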

According to Doug McIlroy, the author of 'spell', a typical document contains
20 misspelled words or fewer.  Therefore, at most about 20/4000, or 0.5%, of
documents contain a misspelled word that is not reported.
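
As a rough check of that estimate, the sketch below takes the 1/4000
per-word rate and the 20 misspelled words as given, and assumes (my
assumption, not stated in the source) that each misspelled word slips
through independently.  The function name is made up for the example.

```python
def p_any_missed(misspelled: int, per_word_rate: float) -> float:
    """Chance that at least one of `misspelled` bad words is accepted,
    assuming each is accepted independently with `per_word_rate`."""
    return 1.0 - (1.0 - per_word_rate) ** misspelled


# Worst case for a typical document: 20 misspelled words, 1/4000 each.
p = p_any_missed(20, 1 / 4000)
print(f"{p:.2%}")
```

Under those assumptions the result is very close to 20/4000, about half a
percent, since the per-word rate is small enough that the chances simply
add up.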

Source:  Jon Bentley, >Programming Pearls<, ISBN 0-201-10331-1.


