ispell repost (less dict) 01/02: enhanced, fixed
geoff at desint.UUCP
Sat Mar 14 20:01:00 AEST 1987
: This is a definitive integrated/enhanced ispell (except the dictionary).
: Everybody else's work has been installed, and many other bugs have
: been fixed. I have also written a spelling-list suffix muncher.
: See the first file in the shar (UPDATE) for more details.
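:
: For readers new to the root/suffix notation that the muncher works with,
: here is a rough sketch. It is not taken from the sources in this archive:
: the function name expand_rz is hypothetical, and it handles only the R and
: Z flags, following the rules quoted in the README. The real expansion is
: done by expand1.sed and expand2.sed.

```shell
# Hypothetical helper (not part of the posting): expand only the R and Z
# flags of "ROOT/FLAG/..." lines, following the rules quoted in the README.
expand_rz() {
    awk -F/ '
    {
        root = $1
        print root                          # the root itself is always a word
        n = length(root)
        last = substr(root, n, 1)           # final letter of the root
        prev = substr(root, n - 1, 1)       # letter before the final one
        for (i = 2; i <= NF; i++) {
            if ($i != "R" && $i != "Z")
                continue                    # all other flags are ignored here
            plural = ($i == "Z") ? "S" : ""
            if (last == "E")
                print root "R" plural       # SKATE becomes SKATER or SKATERS
            else if (last == "Y" && prev !~ /[AEIOU]/)
                print substr(root, 1, n - 1) "IER" plural   # MULTIPLY case
            else
                print root "ER" plural      # BUILD becomes BUILDER or BUILDERS
        }
    }'
}

# Small demonstration on three made-up dictionary lines.
printf 'BOTH/R/Z\nSKATE/R\nMULTIPLY/Z\n' | expand_rz
```

: The first input line should come out as BOTH, BOTHER, and BOTHERS, one
: word per line, which matches the example given in UPDATE.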
:
: Also, don't forget to pick up my three companion postings of dictionary
: diff's in net.sources.bugs.
:
: Geoff Kuenning
: {hplabs,ihnp4}!trwrb!desint!geoff
:
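: The unpacking steps listed in the archive header below can be sketched
: roughly as follows. This is only an illustration: strip_shar_header is a
: hypothetical helper name, and the demonstration runs on a throwaway
: stand-in file rather than on this posting.

```shell
# Hypothetical helper (not part of the posting): print a saved article
# starting at its first "#! /bin/sh" line, dropping the news header.
strip_shar_header() {
    sed -n '/^#! \/bin\/sh/,$p' "$1"
}

# Demonstration on a throwaway stand-in for a saved article.
demo=${TMPDIR:-/tmp}/shar_demo.$$
printf 'From: someone\n: header text\n#! /bin/sh\necho unpacked\n' > "$demo"
strip_shar_header "$demo" | /bin/sh     # only the archive part is executed
rm -f "$demo"
```

: With a real article you would save the output of the helper to a file and
: run that file with /bin/sh (not csh) in an empty directory.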
#! /bin/sh
# This is a shell archive, meaning:
# 1. Remove everything above the #! /bin/sh line.
# 2. Save the resulting text in a file.
# 3. Execute the file with /bin/sh (not csh) to create the files:
# UPDATE
# Makefile
# ispell.man
# README
# WISHES
# expand.awk
# expand1.sed
# expand2.sed
# munchlist.sh
# ispell.el
# buildhash.c
# This archive created: Sat Mar 14 00:58:44 1987
export PATH; PATH=/bin:$PATH
echo shar: extracting "'UPDATE'" '(5252 characters)'
if test -f 'UPDATE'
then
echo shar: will not over-write existing file "'UPDATE'"
else
sed 's/^X //' << \SHAR_EOF > 'UPDATE'
X Ispell enhancements - 3/13/87
X
X (See three companion postings in net.sources.bugs).
X
X Here are the enhancements to ispell that I mentioned a couple of days ago.
X Because of the number of changes, several of the context diff's are bigger
X than the original files. In addition, many people have gotten confused
X about versions, since enhancements/fixes have been made by six different
X people, counting myself (for the list, see the end of ispell.man). I
X have integrated all of these fixes and enhancements in one place.
X
X For these reasons, I have decided to repost all of the sources for ispell,
X with one exception -- the dictionary. (A couple of small files, such
X as ispell.el, are unchanged, but I decided to repost them anyway for
X completeness. If you didn't have ispell before, you now need only the
X dictionary).
X
X The dictionary is a special case: if you think about it, even ordinary
X diff's will always work with "patch" on that each-line-is-unique file.
X An out-of-place insertion can be corrected by sorting the dictionary
X after patching (something that is done anyway as a side effect of the
X new "munchlist" script). Because of this, I have decided not to repost
X the sizable dictionary. In the process of testing this code, it occurred
X to me to run dict.191 through UNIX "spell"; the results of that are
X given in three companion postings in net.sources.bugs, which seemed
X like a more appropriate place for the diffs. (The postings are not
X divided because of their size; see comments in the postings for my
X reasons).
X
X Now, here's what I've done:
X
X In ispell itself:
X
X - The personal dictionary is now hashed, just like the main one, and
X supports suffixes just like the main one. (It's not actually
X integrated with the main one, because expanding the main one
X is inefficient and poses a minor but troublesome technical
X problem). A personal dictionary of 28000+ words can be read in
X within a few minutes (hey, nobody's perfect -- whatcha doing
X with such a big dictionary anyway? :-).
X - New option "-c" is used by the new munchlist script to generate
X suggested root/suffix combinations.
X - The -d option can now specify /dev/null, if you want to use
X only your personal dictionary (this also saves startup time
X with -c, and is used by the "munchlist" script, which is why
X I put it in).
X - The -p option is now more flexible about its handling of pathnames.
X An absolute pathname is always interpreted literally. A
X relative pathname from WORDLIST is looked up in $HOME first,
X then in the current directory. The -p option behaves in the
X reverse fashion: current directory first, then $HOME. This
X behavior seems more intuitive to me; I'd be interested in
X opinions of others if you don't find it intuitive.
X - Perhaps most important, I have completely overhauled the logic
X in good.c, so that it (I think) matches what the README file
X says it should, no more, no less. The code has been extensively
X tested, notably by interaction with the new expansion scripts;
X nevertheless because of the extent of the changes and the
X nature of the logic, I'd suggest a bit of suspicion for a while.
X A technique we've found useful here is to do your normal work
X with ispell, and then do a final check with UNIX spell or some
X other slow, inconvenient program to make sure ispell didn't
X screw up.
X
X New scripts:
X
X - expand.awk: an obsolete (but correct) awk script that does
X the same thing as expand[12].sed, except slower. The awk
X script is also much easier to understand than the sed scripts.
X Superseded by the sed scripts, except for very short input.
X - expand[12].sed: the sed pipe
X
X "sed -f expand1.sed $file | sed -f expand2.sed"
X
X where "$file" is a raw dictionary file with suffixes
X (e.g., dict.191), generates a list of each root alone, plus
X the root expanded with each possible suffix (e.g.,
X "BOTH/R/Z" produces "BOTH", "BOTHER", and "BOTHERS"). The
X output should usually be sorted with the -u switch before
X further processing. These scripts are used by 'munchlist';
X they are also useful for (a) checking an ispell dictionary
X with some other spell-checking program and (b) figuring
X out what a particular suffix does to a certain word without
X reading the README file.
X - munchlist.sh: a slow, but effective, shell script that takes
X lists of expanded or unexpanded words as input and reduces
X them to a (usually smaller) list of roots and suffixes. The
X result is written to standard output. I think the documentation
X forgot to mention the input must be one word per line. I
X have successfully used this script to combine dict.191 with
X /usr/dict/words; it's also useful (and a lot faster) on
X private dictionaries. For private dictionaries, it will also
X remove any word that has since been added to the main dictionary.
X
X Oh yes, I almost forgot: the original documentation didn't mention
X that ispell is a long-name program. If your "File:" display on the
X top line actually contains the misspelled word, you have long-name problems.
X My fixes don't address long names, because I finally have a way to
X compile long-name programs, thanks to "hash8".
X
X Geoff Kuenning
X geoff at ITcorp.COM
X ...!trwrb!desint!geoff
SHAR_EOF
if test 5252 -ne "`wc -c < 'UPDATE'`"
then
echo shar: error transmitting "'UPDATE'" '(should have been 5252 characters)'
fi
fi # end of overwriting check
echo shar: extracting "'Makefile'" '(1198 characters)'
if test -f 'Makefile'
then
echo shar: will not over-write existing file "'Makefile'"
else
sed 's/^X //' << \SHAR_EOF > 'Makefile'
X # -*- Mode: Text -*-
X
X # Look over config.h before building.
X #
X # LIBDIR, DEFHASH, DEFDICT should match definitions in config.h.
X #
X # The ifdef NO8BIT may be used if 8 bit extended text characters
X # cause problems, or you simply don't wish to allow the feature.
X #
X # the argument syntax for buildhash to make alternate dictionary files
X # is simply:
X #
X # buildhash <infile> <outfile>
X
X CFLAGS = -O
X BINDIR = /usr/local/bin
X LIBDIR = /usr/local/lib
X DEFHASH = ispell.hash
X DEFDICT = dict.191
X
X # TERMLIB = -lcurses
X TERMLIB = -ltermlib
X
X all: buildhash ispell $(DEFHASH)
X
X ispell.hash: buildhash $(DEFDICT)
X 	buildhash
X
X install: buildhash ispell $(DEFHASH)
X 	cp ispell ${BINDIR}/ispell
X 	cp munchlist.sh $(BINDIR)/munchlist
X 	cp ispell.hash ${LIBDIR}/${DEFHASH}
X 	cp expand1.sed expand2.sed $(LIBDIR)
X 	chmod 755 ${BINDIR}/ispell $(BINDIR)/munchlist
X 	chmod 644 ${LIBDIR}/$(DEFHASH) $(LIBDIR)/expand1.sed \
X 		$(LIBDIR)/expand2.sed
X
X buildhash: buildhash.o hash.o
X 	cc -o buildhash buildhash.o hash.o
X
X ispell: ispell.o term.o good.o lookup.o hash.o tree.o
X 	cc $(CFLAGS) -o ispell ispell.o term.o good.o lookup.o \
X 		hash.o tree.o $(TERMLIB)
X
X clean:
X 	rm -f *.o buildhash ispell core a.out mon.out hash.out \
X 		*.stat *.cnt
SHAR_EOF
if test 1198 -ne "`wc -c < 'Makefile'`"
then
echo shar: error transmitting "'Makefile'" '(should have been 1198 characters)'
fi
fi # end of overwriting check
echo shar: extracting "'ispell.man'" '(8455 characters)'
if test -f 'ispell.man'
then
echo shar: will not over-write existing file "'ispell.man'"
else
sed 's/^X //' << \SHAR_EOF > 'ispell.man'
X .\" -*- Mode:Text -*-
X .\"
X .TH ISPELL local MIT
X .SH NAME
X ispell \- Correct spelling for a file
X .br
X munchlist \- Combine suffixes in a spelling list
X .SH SYNOPSIS
X .B ispell
X [
X .B \-x
X |
X .B \-d
X file |
X .B \-p
X file |
X .B \-w
X chars ] file .....
X .br
X .B ispell
X [
X .B \-d
X file |
X .B \-p
X file |
X .B \-w
X chars ]
X .B \-l
X .br
X .B ispell
X [
X .B \-d
X file |
X .B \-p
X file
X ]
X .B \-a
X .br
X .B ispell
X [
X .B \-d
X file |
X .B \-p
X file |
X .B \-w
X chars ]
X .B \-c
X .br
X .B munchlist
X [
X .B \-d
X file |
X .B \-e
X |
X .B \-w
X chars ]
X [ files ]
X .SH DESCRIPTION
X .PP
X .I Ispell
X is fashioned after the
X .I spell
X program from ITS (called
X .I ispell
X on Twenex systems.) The most common usage is "ispell filename". In this
X case,
X .I ispell
X will display each word which does not appear in the dictionary, and
X allow you to change it. If there are "near misses" in the dictionary
X (words which differ by only a single letter, a missing or extra letter,
X or a pair of transposed letters), then they are also displayed. If you
X think the word is correct as it stands, you can type either "Space" to
X accept it this one time, or "I" to accept it and put it in your private
X dictionary. If one of the near misses is the word you want, type the
X corresponding number. Finally, if none of these choices is right, you
X can type "R" and you will be prompted for a replacement word.
X If you want to see a list of words that might be close using wildcard
X characters, type "L" to look up a word in the system dictionary.
X .PP
X When a misspelled word is found, it is printed at the top of the screen.
X Any near misses will be printed on the following lines, and finally, two
X lines containing the word are printed at the bottom of the screen. If
X your terminal can display reverse video, the word itself is highlighted.
X .PP
X The
X .B \-l
X or "list" option to
X .I ispell
X is used to produce a list of misspelled words from the standard input.
X .PP
X The
X .B \-a
X option is intended to be used from other programs through a pipe. In this
X mode,
X .I ispell
X expects the standard input to consist of single words. Each word is
X read, and a single line is written to the standard output. If the word
X was found in the main dictionary, or your personal dictionary, then the
X line contains only a '*'. If the word was found through suffix removal,
X then the line contains a '+', a space, and the root word. If the word
X is not in the dictionary, but there are near misses, then the line
X contains an '&', a space, and a list of the near misses separated by
X spaces. Also, each near miss is capitalized the same as the input
X word. Finally, if the word does not appear in the dictionary, and
X there are no near misses, then the line contains only a '#'. This mode
X is also suitable for interactive use when you want to figure out the
X spelling of a single word. (These characters are the same as the codes
X that the real spell program uses.)
X .PP
X The
X .B \-x
X option causes
X .I ispell
X to remove the .bak file that it normally leaves. The .bak file contains
X the pre-corrected text. If there are file opening / writing errors,
X the .bak file may be left for recovery purposes even with the -x option.
X .PP
X The
X .B \-d
X option is used to specify an alternate hashed dictionary file,
X other than the default. If the filename does not begin with a "/",
X the library directory for the default dictionary file is prefixed.
X This is useful to allow dictionaries which prefer alternate British
X spellings ("centre", "tyre", etc), or add lists of special-purpose
X jargon and acronyms for subclasses of documents. There are some shortcomings
X in attempting to provide foreign-language dictionaries, but something
X like "-dfrench" could be made to work somewhat.
X The
X .B \-d
X option may specify
X .IR /dev/null ,
X in which case the dictionary is limited to the personal one.
X This may be useful for certain private dictionaries.
X .PP
X The
X .B \-p
X option is used to specify an alternate personal dictionary file.
X If the file name does not begin with "/", $HOME is prefixed. Also, the
X shell variable WORDLIST may be set, which renames the personal dictionary
X in the same manner. The command line overrides WORDLIST setting. If
X neither is present "ispell.words" is used.
X .PP
X The
X .B \-w
X option may be used to specify characters other than alphabetics
X which may also appear in words. For instance,
X .B \-w
X "&" will allow "AT&T"
X to be picked up. Underscores are useful in many technical documents.
X There is an admittedly crude provision in this option for 8-bit international
X characters. If "n" appears in the character string, the three characters
X following are a DECIMAL code 0 - 255, for the character. There must be
X three decimal characters in all cases, so you have to prepend with 0's,
X for instance, to include bells and formfeeds in your words (an admittedly
X silly thing to do, but aren't most pedagogical examples?):
X .PP
X n007n012
X .PP
X Numeric digits other than the three following "n" are simply numeric
X characters. Use of "n" does not conflict with anything because actual
X alphabetics have no meaning - alphabetics are already accepted.
X .I Ispell
X will typically be used with input from a file, meaning that preserving
X parity for possible 8 bit characters from the input text is OK. If you
X specify the -l option, and actually type text from the terminal, this may
X create problems if your stty settings preserve parity.
X .PP
X The
X .B \-c
X option is primarily intended for use by the
X .I munchlist
X shell script.
X In this mode, a list of words is read from the standard input.
X For each word, a list of possible root words and suffixes will be
X written to the standard output.
X Some of the root words will be illegal and must be filtered from the
X output by other means;
X the
X .I munchlist
X script does this.
X As an example, the command "echo BOTHER | ispell -c" produces:
X .PP
X .RS
X .nf
X BOTH
X BOTHE/R
X BOTH/R
X .fi
X .RE
X .PP
X The
X .I munchlist
X shell script is used to reduce the size of dictionary files,
X primarily personal dictionary files.
X It is also capable of combining dictionaries from various sources.
X The given
X .I files
X are read (standard input if no arguments are given),
X reduced to a minimal set of roots and suffixes that will match the
X same list of words, and written to standard output.
X .PP
X Normally, words that are in the default dictionary are removed by
X .I munchlist
X during processing.
X If the list is to be used with a different dictionary, the
X .B \-d
X option can be used to specify an alternate (hashed) dictionary file
X containing words to be removed from the output list.
X If a dictionary file of
X .I /dev/null
X is specified, no words will be removed from the output;
X this is useful when munching the primary dictionary file.
X .PP
X The
X .B \-w
X option is passed on to
X .IR ispell .
X The
X .B \-e
X ("efficient") option causes the script to use a slower algorithm that uses
X somewhat less space in TMPDIR (normally
X .IR /usr/tmp ")."
X .PP
X It is possible to install
X .I ispell
X in such a way as to only support ASCII range text if desired.
X .SH DEFAULT FILES
X /usr/public/lib/ispell.hash
X .br
X /usr/dict/web2 for the Lookup function
X .br
X $HOME/ispell.words user's private dictionary
X .br
X /usr/public/lib/expand[12].sed sed scripts for expanding suffixes
X .SH SEE ALSO
X spell(1), egrep(1), look(1)
X .SH BUGS
X It takes about five seconds for
X .I ispell
X to read in the hash table.
X .sp
X Perhaps more than ten choices should be allowed for near misses.
X .sp
X The hash table is stored as a quarter-megabyte array, so a PDP-11
X version does not seem likely.
X .sp
X .I Ispell
X should understand more
X .I troff
X syntax, and deal more intelligently with contractions.
X .sp
X While alternate dictionaries for foreign languages could be defined, and
X the international characters included in words, rules concerning
X word endings / pluralization accommodate English only.
X .sp
X .I Munchlist
X is very slow, and requires tremendous amounts of temporary file space for
X large dictionaries.
X It does respect the TMPDIR environment variable, so this space can be
X redirected.
X However, a lot of the temporary space it needs is for sorting, so TMPDIR
X is only a partial help on systems with an uncooperative
X .IR sort (1).
X As a benchmark, the 15000-word
X .I dict.191
X takes about 1200 blocks in TMPDIR, and 2000 in
X .IR sort "'s"
X temporary directories.
X On a 68000 workstation, it runs for the better part of an hour.
X Munching
X .I dict.191
X with
X .I /usr/dict/words
X (28000 words output)
X took another 1500 blocks or so, and ran for about three hours.
X .SH AUTHOR
X Pace Willisson (pace at mit-vax)
X .br
X Enhanced by James Woods, Bob McQueer, Bill Randle, Marc Ries, Rob McMahon,
X and Geoff Kuenning.
SHAR_EOF
if test 8455 -ne "`wc -c < 'ispell.man'`"
then
echo shar: error transmitting "'ispell.man'" '(should have been 8455 characters)'
fi
fi # end of overwriting check
echo shar: extracting "'README'" '(6256 characters)'
if test -f 'README'
then
echo shar: will not over-write existing file "'README'"
else
sed 's/^X //' << \SHAR_EOF > 'README'
X -*- Mode:Text -*-
X
X Ispell consists of two programs: the actual spelling checker, "ispell",
X and the hash table builder, "buildhash". Everything is set up so you
X can just say "make install" and have everything happen. You might want
X to edit the makefile, and ispell.h to change the destination of the
X program and the hash table.
X
X The dictionary comes from the ITS spell dictionary. I got it from
X "ml:wba;dict 191", although I don't know that this is the copy currently
X in use on the 20's around MIT.
X
X ----------------------------------------------------------------------
X
X Addendum:
X
X My eternal gratitude to the author of ispell -- I don't know how I
X ever lived without it. I received his permission to post ispell to
X the net and have added a GNU EMACS interface. Look in the file
X ispell.el for installation instructions.
X
X As far as I know, no one informally "supports" this program. If you
X would like to "adopt" it (collect fixes/enhancements and post a new
X version periodically), feel free to do so.
X
X I volunteer to collect dictionary diffs and post a composite diff
X periodically. If you add a lot of words to the main dictionary, send
X me the diffs between the modified dictionary and the posted one.
X Also, if you have access to a TOPS20 system with a more complete
X dictionary in ispell format, send me the diffs if you can. Just
X PLEASE don't dump an entire dictionary to our site!
X
X The dictionary posted is one I snarfed from around here -- after
X comparison with the one originally supplied, ours appears a tad more
X complete and accurate.
X
X Walt Buehring
X Texas Instruments - Computer Science Center
X
X ARPA: Buehring%TI-CSL at CSNet-Relay
X UUCP: {smu, texsun, im4u, rice} ! ti-csl ! buehring
X
X ----------------------------------------------------------------------
X
X The following is the only documentation I could find about the format
X of the dictionary. It was written for the TOPS20 speller that ispell
X mimics, so I believe most of the information is applicable. It should be
X useful if you want to add words to the main dictionary by hand. -WB
X
X ----------------------------------------------------------------------
X
X 11.6 Dictionary flags
X
X Words in SPELL's main dictionary (but not the other dictionaries) may
X have flags associated with them to indicate the legality of suffixes
X without the need to keep the full suffixed words in the dictionary. The
X flags have "names" consisting of single letters. Their meaning is as
X follows:
X
X Let # and @ be "variables" that can stand for any letter. Upper case
X letters are constants. "..." stands for any string of zero or more
X letters, but note that no word may exist in the dictionary which is not at
X least 2 letters long, so, for example, FLY may not be produced by placing
X the "Y" flag on "F". Also, no flag is effective unless the word that it
X creates is at least 4 letters long, so, for example, WED may not be
X produced by placing the "D" flag on "WE".
X
X "V" flag:
X ...E --> ...IVE as in CREATE --> CREATIVE
X if # .ne. E, ...# --> ...#IVE as in PREVENT --> PREVENTIVE
X
X "N" flag:
X ...E --> ...ION as in CREATE --> CREATION
X ...Y --> ...ICATION as in MULTIPLY --> MULTIPLICATION
X if # .ne. E or Y, ...# --> ...#EN as in FALL --> FALLEN
X
X "X" flag:
X ...E --> ...IONS as in CREATE --> CREATIONS
X ...Y --> ...ICATIONS as in MULTIPLY --> MULTIPLICATIONS
X if # .ne. E or Y, ...# --> ...#ENS as in WEAK --> WEAKENS
X
X "H" flag:
X ...Y --> ...IETH as in TWENTY --> TWENTIETH
X if # .ne. Y, ...# --> ...#TH as in HUNDRED --> HUNDREDTH
X
X "Y" FLAG:
X ... --> ...LY as in QUICK --> QUICKLY
X
X "G" FLAG:
X ...E --> ...ING as in FILE --> FILING
X if # .ne. E, ...# --> ...#ING as in CROSS --> CROSSING
X
X "J" FLAG:
X ...E --> ...INGS as in FILE --> FILINGS
X if # .ne. E, ...# --> ...#INGS as in CROSS --> CROSSINGS
X
X "D" FLAG:
X ...E --> ...ED as in CREATE --> CREATED
X if @ .ne. A, E, I, O, or U,
X ...@Y --> ...@IED as in IMPLY --> IMPLIED
X if # .ne. E or Y, or (# = Y and @ = A, E, I, O, or U)
X ...@# --> ...@#ED as in CROSS --> CROSSED
X or CONVEY --> CONVEYED
X
X "T" FLAG:
X ...E --> ...EST as in LATE --> LATEST
X if @ .ne. A, E, I, O, or U,
X ...@Y --> ...@IEST as in DIRTY --> DIRTIEST
X if # .ne. E or Y, or (# = Y and @ = A, E, I, O, or U)
X ...@# --> ...@#EST as in SMALL --> SMALLEST
X or GRAY --> GRAYEST
X
X "R" FLAG:
X ...E --> ...ER as in SKATE --> SKATER
X if @ .ne. A, E, I, O, or U,
X ...@Y --> ...@IER as in MULTIPLY --> MULTIPLIER
X if # .ne. E or Y, or (# = Y and @ = A, E, I, O, or U)
X ...@# --> ...@#ER as in BUILD --> BUILDER
X or CONVEY --> CONVEYER
X
X "Z" FLAG:
X ...E --> ...ERS as in SKATE --> SKATERS
X if @ .ne. A, E, I, O, or U,
X ...@Y --> ...@IERS as in MULTIPLY --> MULTIPLIERS
X if # .ne. E or Y, or (# = Y and @ = A, E, I, O, or U)
X ...@# --> ...@#ERS as in BUILD --> BUILDERS
X or SLAY --> SLAYERS
X
X "S" FLAG:
X if @ .ne. A, E, I, O, or U,
X ...@Y --> ...@IES as in IMPLY --> IMPLIES
X if # .eq. S, X, Z, or H,
X ...# --> ...#ES as in FIX --> FIXES
X if # .ne. S, X, Z, H, or Y, or (# = Y and @ = A, E, I, O, or U)
X ...@# --> ...@#S as in BAT --> BATS
X or CONVEY --> CONVEYS
X
X "P" FLAG:
X if @ .ne. A, E, I, O, or U,
X ...@Y --> ...@INESS as in CLOUDY --> CLOUDINESS
X if # .ne. Y, or @ = A, E, I, O, or U,
X ...@# --> ...@#NESS as in LATE --> LATENESS
X or GRAY --> GRAYNESS
X
X "M" FLAG:
X ... --> ...'S as in DOG --> DOG'S
X
X ----------------------------------------------------------------------
X
X [Whew! That's all very nice, but how about a quick reference... -WB]
X
X V - ive
X N - ion, ication, en
X X - ions, ications, ens
X H - th, ieth
X Y - ly
X G - ing
X J - ings
X D - ed
X T - est
X R - er
X Z - ers
X S - s, es, ies
X P - ness, iness
X M - 's
SHAR_EOF
if test 6256 -ne "`wc -c < 'README'`"
then
echo shar: error transmitting "'README'" '(should have been 6256 characters)'
fi
fi # end of overwriting check
echo shar: extracting "'WISHES'" '(1211 characters)'
if test -f 'WISHES'
then
echo shar: will not over-write existing file "'WISHES'"
else
sed 's/^X //' << \SHAR_EOF > 'WISHES'
X Things remaining to be done to ispell:
X
X - The single biggest remaining deficiency (in my opinion) is the
X extensive misuse of 'strlen'. Strlen is often called repeatedly
X on the same string within a few lines of code. Worse, many
X routines accept a "length" parameter (which is usually passed
X by running 'strlen' within the arglist) but ignore it and
X actually require the string to be null-terminated. Somebody
X should do a systematic edit and clean this up. I wouldn't
X be surprised to learn that ispell spends 50% of its time in
X strlen.
X - The "munchlist" script can actually increase the size of a
X dictionary. For example, munching dict.191 (after my bugfixes
X to it) reduced the number of words by about 40, but increased
X the number of characters by a small percentage. This is
X because munchlist doesn't recognize duplicate suffixes that
X generate the same result, except for the three special
X "s-ending" suffixes J, Z, and X. For example, right now
X munchlist will make BATHER by adding the R suffix to both
X BATH and BATHE. It would be nice if munchlist could recognize
X the redundancy and reduce its output so that each word was made
X in only one way.
SHAR_EOF
if test 1211 -ne "`wc -c < 'WISHES'`"
then
echo shar: error transmitting "'WISHES'" '(should have been 1211 characters)'
fi
fi # end of overwriting check
echo shar: extracting "'expand.awk'" '(5769 characters)'
if test -f 'expand.awk'
then
echo shar: will not over-write existing file "'expand.awk'"
else
sed 's/^X //' << \SHAR_EOF > 'expand.awk'
X BEGIN {FS = "/"}
X {
X print $1
X #Let # and @ be "variables" that can stand for any letter. Upper case
X #letters are constants. "..." stands for any string of zero or more
X #letters, but note that no word may exist in the dictionary which is not at
X #least 2 letters long, so, for example, FLY may not be produced by placing
X #the "Y" flag on "F". Also, no flag is effective unless the word that it
X #creates is at least 4 letters long, so, for example, WED may not be
X #produced by placing the "D" flag on "WE".
X size = length ($1)
X #
X # Break out the last two letters into "tail", and put
X # corresponding versions of the root with the tail trimmed
X # off into "trimmed". If they are vowels, set vowel[i].
X # (Actually, only vowel[2] is used).
X #
X for (i = 1; i < 3; i++)
X {
X tail[i] = substr ($1, size - i + 1, 1)
X if (tail[i] == "A" || tail[i] == "E" || tail[i] == "I" \
X || tail[i] == "O" || tail[i] == "U")
X vowel[i] = 1
X else
X vowel[i] = 0
X trimmed[i] = substr ($1, 1, size - i)
X }
X for (i = 2; i <= NF; i++)
X {
X if ($i == "V")
X {
X # ...E --> ...IVE as in CREATE --> CREATIVE
X # if # .ne. E, ...# --> ...#IVE as in PREVENT --> PREVENTIVE
X if (tail[1] == "E")
X print trimmed[1] "IVE"
X else
X print $1 "IVE"
X }
X else if ($i == "N" || $i == "X")
X {
X # ...E --> ...ION as in CREATE --> CREATION
X # ...Y --> ...ICATION as in MULTIPLY --> MULTIPLICATION
X # if # .ne. E or Y, ...# --> ...#EN as in FALL --> FALLEN
X # "X" flag:
X # ...E --> ...IONS as in CREATE --> CREATIONS
X # ...Y --> ...ICATIONS as in MULTIPLY --> MULTIPLICATIONS
X # if # .ne. E or Y, ...# --> ...#ENS as in WEAK --> WEAKENS
X if ($i == "N")
X plural = ""
X else
X plural = "S"
X if (tail[1] == "E")
X print trimmed[1] "ION" plural
X else if (tail[1] == "Y")
X print trimmed[1] "ICATION" plural
X else
X print $1 "EN" plural
X }
X else if ($i == "H")
X {
X # ...Y --> ...IETH as in TWENTY --> TWENTIETH
X # if # .ne. Y, ...# --> ...#TH as in HUNDRED --> HUNDREDTH
X if (tail[1] == "Y")
X print trimmed[1] "IETH"
X else
X print $1 "TH"
X }
X else if ($i == "Y")
X {
X # ... --> ...LY as in QUICK --> QUICKLY
X print $1 "LY"
X }
X else if ($i == "G" || $i == "J")
X {
X # ...E --> ...ING as in FILE --> FILING
X # if # .ne. E, ...# --> ...#ING as in CROSS --> CROSSING
X # "J" flag:
X # ...E --> ...INGS as in FILE --> FILINGS
X # if # .ne. E, ...# --> ...#INGS as in CROSS --> CROSSINGS
X if ($i == "G")
X plural = ""
X else
X plural = "S"
X if (tail[1] == "E")
X print trimmed[1] "ING" plural
X else
X print $1 "ING" plural
X }
X else if ($i == "D")
X {
X # ...E --> ...ED as in CREATE --> CREATED
X # if @ .ne. A, E, I, O, or U,
X # ...@Y --> ...@IED as in IMPLY --> IMPLIED
X # if # .ne. E or Y, or (# = Y and @ = A, E, I, O, or U)
X # ...@# --> ...@#ED as in CROSS --> CROSSED
X # or CONVEY --> CONVEYED
X if (tail[1] == "E")
X print $1 "D"
X else if (tail[1] == "Y" && !vowel[2])
X print trimmed[1] "IED"
X else
X print $1 "ED"
X }
X else if ($i == "T")
X {
X # ...E --> ...EST as in LATE --> LATEST
X # if @ .ne. A, E, I, O, or U,
X # ...@Y --> ...@IEST as in DIRTY --> DIRTIEST
X # if # .ne. E or Y, or (# = Y and @ = A, E, I, O, or U)
X # ...@# --> ...@#EST as in SMALL --> SMALLEST
X # or GRAY --> GRAYEST
X if (tail[1] == "E")
X print $1 "ST"
X else if (tail[1] == "Y" && !vowel[2])
X print trimmed[1] "IEST"
X else
X print $1 "EST"
X }
X else if ($i == "R" || $i == "Z")
X {
X # ...E --> ...ER as in SKATE --> SKATER
X # if @ .ne. A, E, I, O, or U,
X # ...@Y --> ...@IER as in MULTIPLY --> MULTIPLIER
X # if # .ne. E or Y, or (# = Y and @ = A, E, I, O, or U)
X # ...@# --> ...@#ER as in BUILD --> BUILDER
X # or CONVEY --> CONVEYER
X # "Z" flag:
X # ...E --> ...ERS as in SKATE --> SKATERS
X # if @ .ne. A, E, I, O, or U,
X # ...@Y --> ...@IERS as in MULTIPLY --> MULTIPLIERS
X # if # .ne. E or Y, or (# = Y and @ = A, E, I, O, or U)
X # ...@# --> ...@#ERS as in BUILD --> BUILDERS
X # or SLAY --> SLAYERS
X if ($i == "R")
X plural = ""
X else
X plural = "S"
X if (tail[1] == "E")
X print $1 "R" plural
X else if (tail[1] == "Y" && !vowel[2])
X print trimmed[1] "IER" plural
X else
X print $1 "ER" plural
X }
X else if ($i == "S")
X {
X # if @ .ne. A, E, I, O, or U,
X # ...@Y --> ...@IES as in IMPLY --> IMPLIES
X # if # .eq. S, X, Z, or H,
X # ...# --> ...#ES as in FIX --> FIXES
X # if # .ne. S, X, Z, H, or Y, or (# = Y and @ = A, E, I, O, or U)
X # ...@# --> ...@#S as in BAT --> BATS
X # or CONVEY --> CONVEYS
X if (tail[1] == "Y" && !vowel[2])
X print trimmed[1] "IES"
X else if (tail[1] == "S" || tail[1] == "X" \
X || tail[1] == "Z" || tail[1] == "H")
X print $1 "ES"
X else
X print $1 "S"
X }
X else if ($i == "P")
X {
X # if @ .ne. A, E, I, O, or U,
X # ...@Y --> ...@INESS as in CLOUDY --> CLOUDINESS
X # if # .ne. Y, or @ = A, E, I, O, or U,
X # ...@# --> ...@#NESS as in LATE --> LATENESS
X # or GRAY --> GRAYNESS
X if (tail[1] == "Y" && !vowel[2])
X print trimmed[1] "INESS"
X else
X print $1 "NESS"
X }
X else if ($i == "M")
X {
X # ... --> ...'S as in DOG --> DOG'S
X print $1 "'S"
X }
X }
X }
SHAR_EOF
if test 5769 -ne "`wc -c < 'expand.awk'`"
then
echo shar: error transmitting "'expand.awk'" '(should have been 5769 characters)'
fi
fi # end of overwriting check
echo shar: extracting "'expand1.sed'" '(1607 characters)'
if test -f 'expand1.sed'
then
echo shar: will not over-write existing file "'expand1.sed'"
else
sed 's/^X //' << \SHAR_EOF > 'expand1.sed'
X /^[^/]*$/n
X /\/V/ {
X /^[^/]*E\// {
X s@\([^/]*\)E\([/A-Z]*\)/V@\1IVE\
X \1E\2@; P; D
X }
X s@\([^/]*\)\([/A-Z]*\)/V@\1IVE\
X \1\2@; P; D
X }
X /\/N/ {
X /^[^/]*E\// {
X s@\([^/]*\)E\([/A-Z]*\)/N@\1ION\
X \1E\2@; P; D
X }
X /^[^/]*Y\// {
X s@\([^/]*\)Y\([/A-Z]*\)/N@\1ICATION\
X \1Y\2@; P; D
X }
X s@\([^/]*\)\([/A-Z]*\)/N@\1EN\
X \1\2@; P; D
X }
X /\/X/ {
X /^[^/]*E\// {
X s@\([^/]*\)E\([/A-Z]*\)/X@\1IONS\
X \1E\2@; P; D
X }
X /^[^/]*Y\// {
X s@\([^/]*\)Y\([/A-Z]*\)/X@\1ICATIONS\
X \1Y\2@; P; D
X }
X s@\([^/]*\)\([/A-Z]*\)/X@\1ENS\
X \1\2@; P; D
X }
X /\/H/ {
X /^[^/]*Y\// {
X s@\([^/]*\)Y\([/A-Z]*\)/H@\1IETH\
X \1Y\2@; P; D
X }
X s@\([^/]*\)\([/A-Z]*\)/H@\1TH\
X \1\2@; P; D
X }
X /\/Y/ {
X s@\([^/]*\)\([/A-Z]*\)/Y@\1LY\
X \1\2@; P; D
X }
X /\/G/ {
X /^[^/]*E\// {
X s@\([^/]*\)E\([/A-Z]*\)/G@\1ING\
X \1E\2@; P; D
X }
X s@\([^/]*\)\([/A-Z]*\)/G@\1ING\
X \1\2@; P; D
X }
X /\/J/ {
X /^[^/]*E\// {
X s@\([^/]*\)E\([/A-Z]*\)/J@\1INGS\
X \1E\2@; P; D
X }
X s@\([^/]*\)\([/A-Z]*\)/J@\1INGS\
X \1\2@; P; D
X }
X /\/D/ {
X /^[^/]*E\// {
X s@\([^/]*\)\([/A-Z]*\)/D@\1D\
X \1\2@; P; D
X }
X /^[^/]*[^AEIOU]Y\// {
X s@\([^/]*\)Y\([/A-Z]*\)/D@\1IED\
X \1Y\2@; P; D
X }
X s@\([^/]*\)\([/A-Z]*\)/D@\1ED\
X \1\2@; P; D
X }
X /\/T/ {
X /^[^/]*E\// {
X s@\([^/]*\)\([/A-Z]*\)/T@\1ST\
X \1\2@; P; D
X }
X /^[^/]*[^AEIOU]Y\// {
X s@\([^/]*\)Y\([/A-Z]*\)/T@\1IEST\
X \1Y\2@; P; D
X }
X s@\([^/]*\)\([/A-Z]*\)/T@\1EST\
X \1\2@; P; D
X }
X /\/R/ {
X /^[^/]*E\// {
X s@\([^/]*\)\([/A-Z]*\)/R@\1R\
X \1\2@; P; D
X }
X /^[^/]*[^AEIOU]Y\// {
X s@\([^/]*\)Y\([/A-Z]*\)/R@\1IER\
X \1Y\2@; P; D
X }
X s@\([^/]*\)\([/A-Z]*\)/R@\1ER\
X \1\2@; P; D
X }
SHAR_EOF
if test 1607 -ne "`wc -c < 'expand1.sed'`"
then
echo shar: error transmitting "'expand1.sed'" '(should have been 1607 characters)'
fi
fi # end of overwriting check
echo shar: extracting "'expand2.sed'" '(622 characters)'
if test -f 'expand2.sed'
then
echo shar: will not over-write existing file "'expand2.sed'"
else
sed 's/^X //' << \SHAR_EOF > 'expand2.sed'
X /^[^/]*$/n
X /\/Z/ {
X /^[^/]*E\// {
X s@\([^/]*\)\([/A-Z]*\)/Z@\1RS\
X \1\2@; P; D
X }
X /^[^/]*[^AEIOU]Y\// {
X s@\([^/]*\)Y\([/A-Z]*\)/Z@\1IERS\
X \1Y\2@; P; D
X }
X s@\([^/]*\)\([/A-Z]*\)/Z@\1ERS\
X \1\2@; P; D
X }
X /\/S/ {
X /^[^/]*[^AEIOU]Y\// {
X s@\([^/]*\)Y\([/A-Z]*\)/S@\1IES\
X \1Y\2@; P; D
X }
X /^[^/]*[SXZH]\// {
X s@\([^/]*\)\([/A-Z]*\)/S@\1ES\
X \1\2@; P; D
X }
X s@\([^/]*\)\([/A-Z]*\)/S@\1S\
X \1\2@; P; D
X }
X /\/P/ {
X /^[^/]*[^AEIOU]Y\// {
X s@\([^/]*\)Y\([/A-Z]*\)/P@\1INESS\
X \1Y\2@; P; D
X }
X s@\([^/]*\)\([/A-Z]*\)/P@\1NESS\
X \1\2@; P; D
X }
X /\/M/ {
X s@\([^/]*\)\([/A-Z]*\)/M@\1'S\
X \1\2@; P; D
X }
SHAR_EOF
if test 622 -ne "`wc -c < 'expand2.sed'`"
then
echo shar: error transmitting "'expand2.sed'" '(should have been 622 characters)'
fi
fi # end of overwriting check
echo shar: extracting "'munchlist.sh'" '(6218 characters)'
if test -f 'munchlist.sh'
then
echo shar: will not over-write existing file "'munchlist.sh'"
else
sed 's/^X //' << \SHAR_EOF > 'munchlist.sh'
X : Use /bin/sh
X #
X # Given a list of words for ispell, generate a reduced list
X # in which all possible suffixes have been collapsed. The reduced
X # list will match the same words as the original.
X #
X # Usage:
X #
X # munchlist [ -d hashfile ] [ -e ] [ -w chars ] [ file ] ...
X #
X # Options:
X #
X # -d hashfile
X # Remove any words that are covered by 'hashfile'. The
X # default is the default ispell dictionary. The words
X # will be removed only if all suffixes are covered by
X # the hash file. A hashfile of /dev/null should be
X # specified when the main dictionary is being munched.
X # -e Economical algorithm. This will use much less temporary
X # disk space, at the expense of time. Useful with large files
X # (such as complete dictionaries).
X # -w Passed on to ispell (specify chars that are part of a word)
X #
X # The given input files are merged, then processed by 'ispell -c'
X # to generate possible suffix lists; these are then combined
X # and reduced. The final result is written to standard output.
X #
X # For portability to older systems, I have avoided getopt.
X #
X # Geoff Kuenning
X # 2/28/87
X #
X LIBDIR=/tmp2/lib # Must match config.h
X DEFDICT=dict.191 # Must match config.h
X EXPAND1=${LIBDIR}/expand1.sed
X EXPAND2=${LIBDIR}/expand2.sed
X TDIR=${TMPDIR:-/usr/tmp}
X TMP=${TDIR}/munch$$
X
X cheap=no
X dictopt=
X wchars=
X while [ $# != 0 ]
X do
X case "$1" in
X -d)
X case "$2" in
X /dev/null)
X dictopt=NONE
X ;;
X *)
X dictopt="-d $2"
X ;;
X esac
X shift
X ;;
X -e)
X cheap=yes
X ;;
X -w)
X wchars="-w $2"
X shift
X ;;
X *)
X break
X esac
X shift
X done
X #
X # Awk program to combine suffixes onto one line
X #
X AWKMUNCH='
X {
X if ($1 != old1 && old1 != "")
X {
X print old1 suffixes
X suffixes = ""
X }
X old1 = $1
X for (i = 2; i <= NF; i++)
X suffixes = suffixes "/" $i
X }
X END { if (old1 != "") print old1 suffixes }'
X #
X # Awk program to break suffixes up into one per line
X #
X AWKUNMUNCH='
X {
X print $1
X for (i = 2; i <= NF; i++)
X print $1 "/" $i
X }'
X trap "/bin/rm -f ${TMP}*; exit 1" 1 2 15
X #
X # Collect all the input (cat), convert to uppercase (tr), expand all
X # the suffix options (two sed's), and preserve (sorted) for later
X # joining. Unless an explicitly null dictionary was specified, remove
X # all expanded words that are covered by the dictionary (ispell).
X #
X if [ "X$dictopt" = "XNONE" ]
X then
X cat "$@" | tr '[a-z]' '[A-Z]' \
X | sed -f $EXPAND1 | sed -f $EXPAND2 | sort -u > ${TMP}a
X else
X cat "$@" | tr '[a-z]' '[A-Z]' \
X | sed -f $EXPAND1 | sed -f $EXPAND2 | sort -u \
X | ispell -l $dictopt -p /dev/null > ${TMP}a
X fi
X #
X # Munch the input to generate roots and suffixes (ispell -c). We are
X # only interested in words that have at least one suffix (egrep /); the
X # next step will pick up the rest. Some of the roots are illegal. We
X # use join to restrict the output to those root words that are found
X # in the original dictionary.
X #
X # Note: one disadvantage of this pipeline is that for a large file,
X # the join and awk may be sitting around for a long time while ispell
X # and sort run. You can get rid of this by splitting the pipe, at
X # the expense of more temp file space.
X #
X if [ $cheap = yes ]
X then
X ispell $wchars -c -d /dev/null -p /dev/null < ${TMP}a \
X | egrep / | sort -u -t/ +0 -1 +1 \
X | join -t/ - ${TMP}a | awk -F/ "$AWKMUNCH" > ${TMP}b
X else
X ispell $wchars -c -d /dev/null -p /dev/null < ${TMP}a \
X | egrep / | sort -u -t/ +0 -1 +1 \
X | join -t/ - ${TMP}a > ${TMP}b
X fi
X #
X # There is now one slight problem: the suffix flags X, J, and Z
X # are simply the addition of an "S" to the suffixes N, G, and R,
X # respectively. This produces redundant entries in the output file;
X # for example, ABBREVIATE/N/X and ABBREVIATION/S. We must get rid
X # of the unnecessary duplicates. The candidates are those words that
X # have only an "S" flag (egrep). We strip off the "S" (sed), and
X # generate a list of roots that might have made these words (ispell -c).
X # Of these roots, we select those that have the N, G, or R flags,
X # replacing each with the plural equivalent X, J, or Z (sed -n).
X # Using join once again, we select those that have legal roots
X # and put them in ${TMP}c.
X #
X if [ $cheap = yes ]
X then
X egrep '^[^/]*/S$' ${TMP}b | sed 's@/S$@@' \
X | ispell -c -d /dev/null -p /dev/null \
X | sed -n -e '/\/N/s/N$/X/p' -e '/\/G/s/G$/J/p' -e '/\/R/s/R$/Z/p' \
X | sort -u -t/ +0 -1 +1 \
X | join -t/ - ${TMP}a \
X | awk -F/ "$AWKMUNCH" > ${TMP}c
X else
X egrep '^[^/]*/S$' ${TMP}b | sed 's@/S$@@' \
X | ispell -c -d /dev/null -p /dev/null \
X | sed -n -e '/\/N/s/N$/X/p' -e '/\/G/s/G$/J/p' -e '/\/R/s/R$/Z/p' \
X | sort -u -t/ +0 -1 +1 \
X | join -t/ - ${TMP}a > ${TMP}c
X fi
X #
X # Now we have to eliminate the stuff covered by ${TMP}c from ${TMP}b.
X # First, we re-expand the suffixes we just made (sed -f pair), and let
X # ispell re-create the /S version (ispell -c). We select the /S versions
X # only (egrep), sort them (sort) for comm, and use comm to delete these
X # from ${TMP}b. The output of comm (i.e., the trimmed version of
X # ${TMP}b) is combined with our special-suffixes file ${TMP}c (sort,
X # with preceding awk, if $cheap) and reduced in size (AWKMUNCH) to
X # produce a final list of all words that have at least one suffix.
X #
X if [ $cheap = yes ]
X then
X sed -f $EXPAND1 < ${TMP}c | sed -f $EXPAND2 \
X | ispell -c -d /dev/null -p /dev/null \
X | egrep '\/S$' | sort -u -t/ +0 -1 +1 | comm -13 - ${TMP}b \
X | awk -F/ "$AWKUNMUNCH" - ${TMP}c \
X | sort -u -t/ +0 -1 +1 - \
X | awk -F/ "$AWKMUNCH" > ${TMP}d
X else
X sed -f $EXPAND1 < ${TMP}c | sed -f $EXPAND2 \
X | ispell -c -d /dev/null -p /dev/null \
X | egrep '\/S$' | sort -u -t/ +0 -1 +1 | comm -13 - ${TMP}b \
X | sort -u -t/ +0 -1 +1 - ${TMP}c \
X | awk -F/ "$AWKMUNCH" > ${TMP}d
X fi
X /bin/rm -f ${TMP}[bc]
X #
X # Now a slick trick. Use ispell to select those (root) words from the original
X # list (${TMP}a) that are not covered by the suffix list (${TMP}d). Then we
X # merge these with the suffix list and sort it to produce the final output.
X #
X ispell $wchars -d /dev/null -p ${TMP}d -l < ${TMP}a | tr -d \\015 \
X | sort -u -t/ +0 -1 +1 - ${TMP}d
X /bin/rm -f ${TMP}*
SHAR_EOF
if test 6218 -ne "`wc -c < 'munchlist.sh'`"
then
echo shar: error transmitting "'munchlist.sh'" '(should have been 6218 characters)'
fi
chmod +x 'munchlist.sh'
fi # end of overwriting check
echo shar: extracting "'ispell.el'" '(6763 characters)'
if test -f 'ispell.el'
then
echo shar: will not over-write existing file "'ispell.el'"
else
sed 's/^X //' << \SHAR_EOF > 'ispell.el'
X ;;; Spelling correction interface for GNU EMACS using "ispell"
X
X ;;; Walt Buehring
X ;;; Texas Instruments - Computer Science Center
X ;;; ARPA: Buehring%TI-CSL@CSNet-Relay
X ;;; UUCP: {smu, texsun, im4u, rice} ! ti-csl ! buehring
X
X ;;; Depends on the ispell program snarfed from MIT-PREP in early
X ;;; 1986. The only interactive command is "ispell-word" which should be
X ;;; bound to M-$. If someone writes an "ispell-region" command,
X ;;; I would appreciate a copy.
X
X ;;; To fully install this, add this file to your GNU lisp directory and
X ;;; compile it with M-X byte-compile-file. Then add the following to the
X ;;; appropriate init file:
X
X ;;; (autoload 'ispell-word "ispell"
X ;;; "Check the spelling of word in buffer." t)
X ;;; (global-set-key "\e$" 'ispell-word)
X
X ;;; If run on a heavily loaded system, the timeout value in ispell-check
X ;;; and the initial sleep time in ispell-init-process may need to be increased.
X
X ;;; No warranty expressed or implied. All sales final. Void where prohibited.
X ;;; If you don't like it, change it.
X
X (defvar ispell-syntax-table nil)
X
X (if (null ispell-syntax-table)
X ;; The following assumes that the standard-syntax-table
X ;; is static. If you add words with funky characters
X ;; to your dictionary, the following may have to change.
X (progn
X (setq ispell-syntax-table (make-syntax-table))
X ;; Make certain characters word constituents
X (modify-syntax-entry ?' "w " ispell-syntax-table)
X (modify-syntax-entry ?- "w " ispell-syntax-table)
X ;; Get rid of existing word syntax on certain characters
X (modify-syntax-entry ?$ ". " ispell-syntax-table)
X (modify-syntax-entry ?% ". " ispell-syntax-table)))
X
X
X (defun ispell-word (&optional quietly)
X "Check spelling of word at or before dot.
X If word not found in dictionary, display possible corrections in a window
X and let user select."
X (interactive)
X (let* ((current-syntax (syntax-table))
X start end word poss replace)
X (unwind-protect
X (save-excursion
X ;; Ensure syntax table is reasonable
X (set-syntax-table ispell-syntax-table)
X ;; Move backward for word if not already on one.
X (if (not (looking-at "\\w"))
X (re-search-backward "\\w" (dot-min) 'stay))
X ;; Move to start of word
X (re-search-backward "\\W" (dot-min) 'stay)
X ;; Find start and end of word
X (or (re-search-forward "\\w+" nil t)
X (error "No word to check."))
X (setq start (match-beginning 0)
X end (match-end 0)
X word (buffer-substring start end)))
X (set-syntax-table current-syntax))
X (or quietly (message "Checking spelling of %s..." (upcase word)))
X (setq poss (ispell-check word))
X (cond ((eq poss t)
X (or quietly (message "Found %s" (upcase word))))
X ((stringp poss)
X (or quietly (message "Found it because of %s" (upcase poss))))
X ((null poss)
X (or quietly (message "Could Not Find %s" (upcase word))))
X (t (setq replace (ispell-choose poss))
X (if replace
X (progn
X (goto-char end)
X (delete-region start end)
X (insert-string replace)))))
X poss))
X
X
X (defun ispell-choose (choices)
X "Display possible corrections from list CHOICES. Return chosen word or nil
X if none chosen."
X (unwind-protect
X (save-window-excursion
X (let ((count 0)
X (words choices)
X (pick -1)
X (window-min-height 2))
X (overlay-window 3)
X (switch-to-buffer "*Choices*") (erase-buffer)
X (setq mode-line-format "-- %b --")
X (while words
X (if (> (+ 7 (current-column) (length (car words))) (window-width))
X (insert "\n"))
X (insert "(" (+ count ?a) ") " (car words) " ")
X (setq words (cdr words)
X count (1+ count)))
X (select-window (next-window))
X (while (eq pick -1)
X (message "Enter letter to replace word; Space to flush")
X (let* ((char (read-char))
X (num (1+ (- (upcase char) ?A))))
X (cond ((= char ? ) (setq pick 0))
X ((or (<= num 0) (> num count)) (ding))
X (t (setq pick num)))))
X (and (> pick 0) (nth (1- pick) choices))))
X ;; Protected forms...
X (bury-buffer "*Choices*")))
X
X
X (defun overlay-window (height)
X "Create a (usually small) window with HEIGHT lines and avoid
X recentering."
X (save-excursion
X (let ((oldot (save-excursion (beginning-of-line) (dot)))
X (top (save-excursion (move-to-window-line height) (dot)))
X newin)
X (if (< oldot top) (setq top oldot))
X (setq newin (split-window-vertically height))
X (set-window-start newin top))))
X
X
X (defvar ispell-process nil
X "Holds the process object for 'ispell'")
X
X ;;; create signal used by ispell-filter and ispell-check
X (put 'ispell-output 'error-conditions '(ispell-output))
X
X (defun ispell-check (word)
X "Check spelling of string WORD, return either t for an exact match, a string
X containing the root word for a match via suffix removal, a list of possible
X correct spellings, or nil for a complete miss."
X (ispell-init-process)
X (send-string ispell-process (concat word "\n"))
X (condition-case output
X (progn
X (sleep-for 20)
X (error "Timeout waiting for ispell process output"))
X (ispell-output (ispell-parse-output (car (cdr output))))))
X
X (defun ispell-parse-output (output)
X "Parse the OUTPUT string of 'ispell' and return a value as specified by the
X 'ispell-check' function."
X (cond
X ((string= output "*") t)
X ((string= output "#") nil)
X ((string= (substring output 0 1) "+")
X (substring output 2))
X (t
X (let ((choice-list '()))
X (while (not (string= output ""))
X (let* ((start (string-match "[A-Za-z]" output))
X (end (string-match " \\|$" output start)))
X (if start
X (setq choice-list (cons (substring output start end)
X choice-list)))
X (setq output (substring output (1+ end)))))
X choice-list))))
X
X
X (defvar ispell-process-output ""
X "Holds partial output from the 'ispell' process")
X
X (defun ispell-filter (process output)
X "The filter-function for 'ispell'. Signals complete line using the
X ispell-output signal"
X (if (string= "\n" (substring output (1- (length output))))
X (progn
X (setq output (concat ispell-process-output
X (substring output 0 (1- (length output))))
X ispell-process-output "")
X (signal 'ispell-output (list output)))
X (setq ispell-process-output (concat ispell-process-output output))))
X
X (defun ispell-init-process ()
X "Check status of 'ispell' process and start if necessary; set up
X filter function for output."
X (if (or (not ispell-process)
X (not (eq (process-status ispell-process) 'run)))
X (progn
X (message "Starting new ispell process...")
X (and (get-buffer "*ispell*") (kill-buffer "*ispell*"))
X (setq ispell-process (start-process "ispell" "*ispell*"
X "ispell" "-a"))
X (set-process-filter ispell-process 'ispell-filter)
X (process-kill-without-query ispell-process)
X (sit-for 3))))
X
SHAR_EOF
if test 6763 -ne "`wc -c < 'ispell.el'`"
then
echo shar: error transmitting "'ispell.el'" '(should have been 6763 characters)'
fi
fi # end of overwriting check
echo shar: extracting "'buildhash.c'" '(6459 characters)'
if test -f 'buildhash.c'
then
echo shar: will not over-write existing file "'buildhash.c'"
else
sed 's/^X //' << \SHAR_EOF > 'buildhash.c'
X /* -*- Mode: Text -*- */
X /*
X * buildhash.c - make a hash table for ispell
X *
X * Pace Willisson, 1983
X */
X
X #include <stdio.h>
X #include <sys/types.h>
X #include <sys/stat.h>
X #include <sys/param.h>
X #include "ispell.h"
X #include "config.h"
X
X #define NSTAT 100
X struct stat dstat, cstat;
X
X int numwords, hashsize;
X
X char *malloc(), *calloc();
X
X struct dent *hashtbl;
X
X char *Dfile;
X char *Hfile;
X
X char Cfile[MAXPATHLEN];
X char Sfile[MAXPATHLEN];
X
X main (argc,argv)
X int argc;
X char **argv;
X {
X FILE *countf;
X FILE *statf;
X int stats[NSTAT];
X int i;
X
X if (argc > 1) {
X ++argv;
X Dfile = *argv;
X if (argc > 2) {
X ++argv;
X Hfile = *argv;
X }
X else
X Hfile = DEFHASH;
X }
X else {
X Dfile = DEFDICT;
X Hfile = DEFHASH;
X }
X
X sprintf(Cfile,"%s.cnt",Dfile);
X sprintf(Sfile,"%s.stat",Dfile);
X
X if (stat (Dfile, &dstat) < 0) {
X fprintf (stderr, "No dictionary (%s)\n", Dfile);
X exit (1);
X }
X
X if (stat (Cfile, &cstat) < 0 || dstat.st_mtime > cstat.st_mtime)
X newcount ();
X
X if ((countf = fopen (Cfile, "r")) == NULL) {
X fprintf (stderr, "No count file\n");
X exit (1);
X }
X numwords = 0;
X fscanf (countf, "%d", &numwords);
X fclose (countf);
X if (numwords == 0) {
X fprintf (stderr, "Bad count file\n");
X exit (1);
X }
X hashsize = numwords;
X readdict ();
X
X if ((statf = fopen (Sfile, "w")) == NULL) {
X fprintf (stderr, "Can't create %s\n", Sfile);
X exit (1);
X }
X
X for (i = 0; i < NSTAT; i++)
X stats[i] = 0;
X for (i = 0; i < hashsize; i++) {
X struct dent *dp;
X int j;
X if (hashtbl[i].used == 0) {
X stats[0]++;
X } else {
X for (j = 1, dp = &hashtbl[i]; dp->next != NULL; j++, dp = dp->next)
X ;
X if (j >= NSTAT)
X j = NSTAT - 1;
X stats[j]++;
X }
X }
X for (i = 0; i < NSTAT; i++)
X fprintf (statf, "%d: %d\n", i, stats[i]);
X fclose (statf);
X
X filltable ();
X
X output ();
X exit(0);
X }
X
X output ()
X {
X FILE *outfile;
X struct hashheader hashheader;
X int strptr, n, i;
X
X if ((outfile = fopen (Hfile, "w")) == NULL) {
X fprintf (stderr, "can't create %s\n",Hfile);
X return;
X }
X hashheader.magic = MAGIC;
X hashheader.stringsize = 0;
X hashheader.tblsize = hashsize;
X fwrite (&hashheader, sizeof hashheader, 1, outfile);
X strptr = 0;
X for (i = 0; i < hashsize; i++) {
X n = strlen (hashtbl[i].word) + 1;
X fwrite (hashtbl[i].word, n, 1, outfile);
X hashtbl[i].word = (char *)strptr;
X strptr += n;
X }
X for (i = 0; i < hashsize; i++) {
X if (hashtbl[i].next != 0) {
X int x;
X x = hashtbl[i].next - hashtbl;
X hashtbl[i].next = (struct dent *)x;
X } else {
X hashtbl[i].next = (struct dent *)-1;
X }
X }
X fwrite (hashtbl, sizeof (struct dent), hashsize, outfile);
X hashheader.stringsize = strptr;
X rewind (outfile);
X fwrite (&hashheader, sizeof hashheader, 1, outfile);
X fclose (outfile);
X }
X
X filltable ()
X {
X struct dent *freepointer, *nextword, *dp;
X int i;
X
X for (freepointer = hashtbl; freepointer->used; freepointer++)
X ;
X for (nextword = hashtbl, i = numwords; i != 0; nextword++, i--) {
X if (nextword->used == 0) {
X continue;
X }
X if (nextword->next == NULL) {
X continue;
X }
X if (nextword->next >= hashtbl && nextword->next < hashtbl + hashsize) {
X continue;
X }
X dp = nextword;
X while (dp->next) {
X if (freepointer > hashtbl + hashsize) {
X fprintf (stderr, "table overflow\n");
X getchar ();
X break;
X }
X *freepointer = *(dp->next);
X dp->next = freepointer;
X dp = freepointer;
X
X while (freepointer->used)
X freepointer++;
X }
X }
X }
X
X
X readdict ()
X {
X struct dent d;
X char lbuf[100];
X FILE *dictf;
X int i;
X int h;
X char *p;
X
X if ((dictf = fopen (Dfile, "r")) == NULL) {
X fprintf (stderr, "Can't open dictionary\n");
X exit (1);
X }
X
X hashtbl = (struct dent *) calloc (numwords, sizeof (struct dent));
X if (hashtbl == NULL) {
X fprintf (stderr, "couldn't allocate hash table\n");
X exit (1);
X }
X
X i = 0;
X while (fgets (lbuf, sizeof lbuf, dictf) != NULL) {
X if (i % 1000 == 0) {
X printf ("%d ", i);
X fflush (stdout);
X }
X i++;
X
X p = &lbuf [ strlen (lbuf) - 1 ];
X if (*p == '\n')
X *p = 0;
X
X if (makedent (lbuf, &d) < 0)
X continue;
X
X d.word = malloc (strlen (lbuf) + 1);
X if (d.word == NULL) {
X fprintf (stderr, "couldn't allocate space for word %s\n", lbuf);
X exit (1);
X }
X strcpy (d.word, lbuf);
X
X h = hash (lbuf, strlen (lbuf), hashsize);
X
X if (hashtbl[h].used == 0) {
X hashtbl[h] = d;
X
X } else {
X struct dent *dp;
X
X dp = (struct dent *) malloc (sizeof (struct dent));
X if (dp == NULL) {
X fprintf (stderr, "couldn't allocate space for collision\n");
X exit (1);
X }
X *dp = d;
X dp->next = hashtbl[h].next;
X hashtbl[h].next = dp;
X }
X }
X printf ("\n");
X }
X
X /*
X * fill in the flags in d, and put a null after the word in lbuf
X */
X
X makedent (lbuf, d)
X char *lbuf;
X struct dent *d;
X {
X char *p, *index();
X
X d->next = NULL;
X d->used = 1;
X d->v_flag = 0;
X d->n_flag = 0;
X d->x_flag = 0;
X d->h_flag = 0;
X d->y_flag = 0;
X d->g_flag = 0;
X d->j_flag = 0;
X d->d_flag = 0;
X d->t_flag = 0;
X d->r_flag = 0;
X d->z_flag = 0;
X d->s_flag = 0;
X d->p_flag = 0;
X d->m_flag = 0;
X
X p = index (lbuf, '/');
X if (p != NULL)
X *p = 0;
X if (strlen (lbuf) > WORDLEN - 1) {
X printf ("%s: word too big\n", lbuf);
X return (-1);
X }
X
X if (p == NULL)
X return (0);
X
X p++;
X while (*p != '\0') {
X switch (*p) {
X case 'V': d->v_flag = 1; break;
X case 'N': d->n_flag = 1; break;
X case 'X': d->x_flag = 1; break;
X case 'H': d->h_flag = 1; break;
X case 'Y': d->y_flag = 1; break;
X case 'G': d->g_flag = 1; break;
X case 'J': d->j_flag = 1; break;
X case 'D': d->d_flag = 1; break;
X case 'T': d->t_flag = 1; break;
X case 'R': d->r_flag = 1; break;
X case 'Z': d->z_flag = 1; break;
X case 'S': d->s_flag = 1; break;
X case 'P': d->p_flag = 1; break;
X case 'M': d->m_flag = 1; break;
X case 0:
X fprintf (stderr, "no key word %s\n", lbuf);
X continue;
X default:
X fprintf (stderr, "unknown flag %c word %s\n",
X *p, lbuf);
X break;
X }
X p++;
X if (*p != '/' && *p != '\0' && *p != '\n') {
X fprintf (stderr, "bad format %s (%c 0%o)\n",
X lbuf, *p, *p);
X break;
X }
X if (*p)
X p++;
X
X }
X return (0);
X }
X
X newcount ()
X {
X char buf[200];
X FILE *d;
X int i;
X
X fprintf (stderr, "Counting words in dictionary ...\n");
X
X if ((d = fopen (Dfile, "r")) == NULL) {
X fprintf (stderr, "Can't open dictionary\n");
X exit (1);
X }
X
X i = 0;
X while (fgets (buf, sizeof buf, d) != NULL) {
X i++;
X if (i % 1000 == 0) {
X printf ("%d ", i);
X fflush (stdout);
X }
X }
X fclose (d);
X printf ("\n%d words\n", i);
X if ((d = fopen (Cfile, "w")) == NULL) {
X fprintf (stderr, "can't create %s\n", Cfile);
X exit (1);
X }
X fprintf (d, "%d\n", i);
X fclose (d);
X }
SHAR_EOF
if test 6459 -ne "`wc -c < 'buildhash.c'`"
then
echo shar: error transmitting "'buildhash.c'" '(should have been 6459 characters)'
fi
fi # end of overwriting check
# End of shell archive
exit 0
--
Geoff Kuenning
{hplabs,ihnp4}!trwrb!desint!geoff