ispell repost (less dict) 01/02: enhanced, fixed

Sat Mar 14 20:01:00 AEST 1987

: This is a definitive integrated/enhanced ispell (except the dictionary).
: Everybody else's work has been installed, and many other bugs have
: been fixed.  I have also written a spelling-list suffix muncher.
: See the first file in the shar (UPDATE) for more details.
:
: Also, don't forget to pick up my three companion postings of dictionary
: diff's in net.sources.bugs.
:
:	Geoff Kuenning
:	{hplabs,ihnp4}!trwrb!desint!geoff
:
#! /bin/sh
# This is a shell archive, meaning:
# 1. Remove everything above the #! /bin/sh line.
# 2. Save the resulting text in a file.
# 3. Execute the file with /bin/sh (not csh) to create the files:
#	UPDATE
#	Makefile
#	ispell.man
#	README
#	WISHES
#	expand.awk
#	expand1.sed
#	expand2.sed
#	munchlist.sh
#	ispell.el
#	buildhash.c
# This archive created: Sat Mar 14 00:58:44 1987
export PATH; PATH=/bin:$PATH
echo shar: extracting "'UPDATE'" '(5252 characters)'
if test -f 'UPDATE'
then
	echo shar: will not over-write existing file "'UPDATE'"
else
sed 's/^X //' << \SHAR_EOF > 'UPDATE'
X 		Ispell enhancements - 3/13/87
X 
X (See three companion postings in net.sources.bugs).
X 
X Here are the enhancements to ispell that I mentioned a couple of days ago.
X Because of the number of changes, several of the context diff's are bigger
X than the original files.  In addition, many people have gotten confused
X about versions, since enhancements/fixes have been made by six different
X people, counting myself (for the list, see the end of ispell.man).  I
X have integrated all of these fixes and enhancements in one place.
X 
X For these reasons, I have decided to repost all of the sources for ispell,
X with one exception -- the dictionary.  (A couple of small files, such
X as ispell.el, are unchanged, but I decided to repost them anyway for
X completeness.  If you didn't have ispell before, you now need only the
X dictionary).
X 
X The dictionary is a special case:  if you think about it, even ordinary
X diff's will always work with "patch" on that each-line-is-unique file.
X An out-of-place insertion can be corrected by sorting the dictionary
X after patching (something that is done anyway as a side effect of the
X new "munchlist" script).  Because of this, I have decided not to repost
X the sizable dictionary.  In the process of testing this code, it occurred
X to me to run dict.191 through UNIX "spell";  the results of that are
X given in three companion postings in net.sources.bugs, which seemed
X like a more appropriate place for the diffs.  (The postings are not
X divided because of their size;  see comments in the postings for my
X reasons).
X 
X Now, here's what I've done:
X 
X In ispell itself:
X 
X 	- The personal dictionary is now hashed, just like the main one, and
X 	  supports suffixes just like the main one.  (It's not actually
X 	  integrated with the main one, because expanding the main one
X 	  is inefficient and poses a minor but troublesome technical
X 	  problem).  A personal dictionary of 28000+ words can be read in
X 	  within a few minutes (hey, nobody's perfect -- whatcha doing
X 	  with such a big dictionary anyway? :-).
X 	- New option "-c" is used by the new munchlist script to generate
X 	  suggested root/suffix combinations.
X 	- The -d option can now specify /dev/null, if you want to use
X 	  only your personal dictionary (this also saves startup time
X 	  with -c, and is used by the "munchlist" script, which is why
X 	  I put it in).
X 	- The -p option is now more flexible about its handling of pathnames.
X 	  An absolute pathname is always interpreted literally.  A
X 	  relative pathname from WORDLIST is looked up in $HOME first,
X 	  then in the current directory.  The -p option behaves in the
X 	  reverse fashion:  current directory first, then $HOME.  This
X 	  behavior seems more intuitive to me;  I'd be interested in
X 	  opinions of others if you don't find it intuitive.
X 	- Perhaps most important, I have completely overhauled the logic
X 	  in good.c, so that it (I think) matches what the README file
X 	  says it should, no more, no less.  The code has been extensively
X 	  tested, notably by interaction with the new expansion scripts;
X 	  nevertheless because of the extent of the changes and the
X 	  nature of the logic, I'd suggest a bit of suspicion for a while.
X 	  A technique we've found useful here is to do your normal work
X 	  with ispell, and then do a final check with UNIX spell or some
X 	  other slow, inconvenient program to make sure ispell didn't
X 	  screw up.
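
X The -p and WORDLIST lookup order described above can be sketched in
X shell.  This is only an illustration, not code taken from ispell:  the
X function name "resolve_dict" and its "p"/"w" argument are made up
X here, and the fallback used when the file exists in neither place is
X an assumption.

```shell
# Hypothetical helper: resolve a personal-dictionary name.
#   $1 = "p" if the name came from -p, "w" if it came from WORDLIST
#   $2 = the name as given
resolve_dict() {
    case "$2" in
    /*) echo "$2"; return 0 ;;      # absolute pathname: taken literally
    esac
    if [ "$1" = p ]; then
        first=. second=$HOME        # -p: current directory, then $HOME
    else
        first=$HOME second=.        # WORDLIST: $HOME, then current directory
    fi
    for dir in "$first" "$second"; do
        if [ -f "$dir/$2" ]; then
            echo "$dir/$2"
            return 0
        fi
    done
    # Assumption: fall back to the first choice when neither file exists.
    echo "$first/$2"
}
```

X For example, "resolve_dict w ispell.words" prefers the copy in $HOME,
X while "resolve_dict p ispell.words" prefers the one in the current
X directory.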
X 
X New scripts:
X 
X 	- expand.awk:  an obsolete (but correct) awk script that does
X 	  the same thing as expand[12].sed, except slower.  The awk
X 	  script is also much easier to understand than the sed scripts.
X 	  Superseded by the sed scripts, except for very short input.
X 	- expand[12].sed:  the sed pipe
X 
X 	    "sed -f expand1.sed $file | sed -f expand2.sed"
X 
X 	  where "$file" is a raw dictionary file with suffixes
X 	  (e.g., dict.191), generates a list of each root alone, plus
X 	  the root expanded with each possible suffix (e.g.,
X 	  "BOTH/R/Z" produces "BOTH", "BOTHER", and "BOTHERS").  The
X 	  output should usually be sorted with the -u switch before
X 	  further processing.  These scripts are used by 'munchlist';
X 	  they are also useful for (a) checking an ispell dictionary
X 	  with some other spell-checking program and (b) figuring
X 	  out what a particular suffix does to a certain word without
X 	  reading the README file.
X 	- munchlist.sh:  a slow, but effective, shell script that takes
X 	  lists of expanded or unexpanded words as input and reduces
X 	  them to a (usually smaller) list of roots and suffixes.  The
X 	  result is written to standard output.  I think the documentation
X 	  forgot to mention the input must be one word per line.  I
X 	  have successfully used this script to combine dict.191 with
X 	  /usr/dict/words;  it's also useful (and a lot faster) on
X 	  private dictionaries.  For private dictionaries, it will also
X 	  remove any word that has since been added to the main dictionary.
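
X The effect of that last step on a private dictionary can be sketched
X with standard tools.  Munchlist itself works from the hashed
X dictionary;  the sketch below assumes instead two plain, already
X expanded word lists, one word per line, and the function name
X "drop_known" is hypothetical.

```shell
# Hypothetical helper: print the words of list $1 that are not in list $2.
# Both files hold one word per line; neither needs to be sorted.
drop_known() {
    tmp1=${TMPDIR:-/tmp}/dk1.$$
    tmp2=${TMPDIR:-/tmp}/dk2.$$
    sort -u "$1" > "$tmp1"
    sort -u "$2" > "$tmp2"
    # comm -23 keeps only the lines unique to the first sorted file
    comm -23 "$tmp1" "$tmp2"
    rm -f "$tmp1" "$tmp2"
}
```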
X 
X Oh yes, I almost forgot:  the original documentation didn't mention
X that ispell is a long-name program.  If your "File:" display on the
X top line actually contains the misspelled word, you have long-name problems.
X My fixes don't address long names, because I finally have a way to
X compile long-name programs, thanks to "hash8".
X 
X 	Geoff Kuenning
X 	geoff@ITcorp.COM
X 	...!trwrb!desint!geoff
SHAR_EOF
if test 5252 -ne "`wc -c < 'UPDATE'`"
then
	echo shar: error transmitting "'UPDATE'" '(should have been 5252 characters)'
fi
fi # end of overwriting check
echo shar: extracting "'Makefile'" '(1198 characters)'
if test -f 'Makefile'
then
	echo shar: will not over-write existing file "'Makefile'"
else
sed 's/^X //' << \SHAR_EOF > 'Makefile'
X # -*- Mode: Text -*-
X 
X # Look over config.h before building.
X #
X # LIBDIR, DEFHASH, DEFDICT should match definitions in config.h.
X #
X # The ifdef NO8BIT may be used if 8 bit extended text characters
X # cause problems, or you simply don't wish to allow the feature.
X #
X # the argument syntax for buildhash to make alternate dictionary files
X # is simply:
X #
X #   buildhash <infile> <outfile>
X 
X CFLAGS = -O
X BINDIR = /usr/local/bin
X LIBDIR = /usr/local/lib
X DEFHASH = ispell.hash
X DEFDICT = dict.191
X 
X # TERMLIB = -lcurses
X TERMLIB = -ltermlib
X all: buildhash ispell $(DEFHASH)
X 
X ispell.hash: buildhash $(DEFDICT)
X 	buildhash
X 
X install: buildhash ispell $(DEFHASH)
X 	cp ispell ${BINDIR}/ispell
X 	cp munchlist.sh $(BINDIR)/munchlist
X 	cp ispell.hash ${LIBDIR}/${DEFHASH}
X 	cp expand1.sed expand2.sed $(LIBDIR)
X 	chmod 755 ${BINDIR}/ispell $(BINDIR)/munchlist
X 	chmod 644 ${LIBDIR}/$(DEFHASH) $(LIBDIR)/expand1.sed \
X 	  $(LIBDIR)/expand2.sed
X 
X buildhash: buildhash.o hash.o
X 	cc -o buildhash buildhash.o hash.o
X 
X ispell: ispell.o term.o good.o lookup.o hash.o tree.o
X 	cc $(CFLAGS) -o ispell ispell.o term.o good.o lookup.o \
X 		hash.o tree.o $(TERMLIB)
X 
X clean:
X 	rm -f *.o buildhash ispell core a.out mon.out hash.out \
X 		*.stat *.cnt
SHAR_EOF
if test 1198 -ne "`wc -c < 'Makefile'`"
then
	echo shar: error transmitting "'Makefile'" '(should have been 1198 characters)'
fi
fi # end of overwriting check
echo shar: extracting "'ispell.man'" '(8455 characters)'
if test -f 'ispell.man'
then
	echo shar: will not over-write existing file "'ispell.man'"
else
sed 's/^X //' << \SHAR_EOF > 'ispell.man'
X .\" -*- Mode:Text -*-
X .\"
X .TH ISPELL local MIT
X .SH NAME
X ispell \- Correct spelling for a file
X .br
X munchlist \- Combine suffixes in a spelling list
X .SH SYNOPSIS
X .B ispell
X [
X .B \-x
X |
X .B \-d
X file |
X .B \-p
X file |
X .B \-w
X chars ] file .....
X .br
X .B ispell
X [
X .B \-d
X file |
X .B \-p
X file |
X .B \-w
X chars ]
X .B \-l
X .br
X .B ispell
X [
X .B \-d
X file |
X .B \-p
X file
X ]
X .B \-a
X .br
X .B ispell
X [
X .B \-d
X file |
X .B \-p
X file |
X .B \-w
X chars ]
X .B \-c
X .br
X .B munchlist
X [
X .B \-d
X file |
X .B \-e
X |
X .B \-w
X chars ]
X [ files ]
X .SH DESCRIPTION
X .PP
X .I Ispell
X is fashioned after the
X .I spell
X program from ITS (called
X .I ispell
X on Twenex systems.)  The most common usage is "ispell filename".  In this
X case,
X .I ispell
X will display each word which does not appear in the dictionary, and
X allow you to change it.  If there are "near misses" in the dictionary
X (words which differ by only a single letter, a missing or extra letter,
X or a pair of transposed letters), then they are also displayed.  If you
X think the word is correct as it stands, you can type either "Space" to
X accept it this one time, or "I" to accept it and put it in your private
X dictionary.  If one of the near misses is the word you want, type the
X corresponding number.  Finally, if none of these choices is right, you
X can type "R" and you will be prompted for a replacement word.
X If you want to see a list of words that might be close using wildcard
X characters, type "L" to look up a word in the system dictionary.
X .PP
X When a misspelled word is found, it is printed at the top of the screen.
X Any near misses will be printed on the following lines, and finally, two
X lines containing the word are printed at the bottom of the screen.  If
X your terminal can display reverse video, the word itself is highlighted.
X .PP
X The
X .B \-l
X or "list" option to
X .I ispell
X is used to produce a list of misspelled words from the standard input.
X .PP
X The
X .B \-a
X option is intended to be used from other programs through a pipe.  In this
X mode,
X .I ispell
X expects the standard input to consist of single words.  Each word is
X read, and a single line is written to the standard output.  If the word
X was found in the main dictionary, or your personal dictionary, then the
X line contains only a '*'.  If the word was found through suffix removal,
X then the line contains a '+', a space, and the root word.  If the word
X is not in the dictionary, but there are near misses, then the line
X contains an '&', a space, and a list of the near misses separated by
X spaces.  Also, each near miss is capitalized the same as the input
X word.  Finally, if the word does not appear in the dictionary and
X there are no near misses, then the line contains only a '#'.  This mode
X is also suitable for interactive use when you want to figure out the
X spelling of a single word.  (These characters are the same as the codes
X that the real spell program uses.)
X .PP
X The
X .B \-x
X option causes
X .I ispell
X to remove the .bak file that it normally leaves.  The .bak file contains
X the pre-corrected text.  If there are file opening / writing errors,
X the .bak file may be left for recovery purposes even with the -x option.
X .PP
X The
X .B \-d
X option is used to specify an alternate hashed dictionary file,
X other than the default.  If the filename does not begin with a "/",
X the library directory for the default dictionary file is prefixed.
X This is useful to allow dictionaries which prefer alternate British
X spellings ("centre", "tyre", etc), or add lists of special-purpose
X jargon and acronyms for subclasses of documents.  There are some shortcomings
X in attempting to provide foreign-language dictionaries, but something
X like "-dfrench" could be made to work somewhat.
X The
X .B \-d
X option may specify
X .IR /dev/null ,
X in which case the dictionary is limited to the personal one.
X This may be useful for certain private dictionaries.
X .PP
X The
X .B \-p
X option is used to specify an alternate personal dictionary file.
X If the file name does not begin with "/", $HOME is prefixed.  Also, the
X shell variable WORDLIST may be set, which renames the personal dictionary
X in the same manner.  The command line overrides any WORDLIST setting.  If
X neither is present, "ispell.words" is used.
X .PP
X The
X .B \-w
X option may be used to specify characters other than alphabetics
X which may also appear in words.  For instance,
X .B \-w
X "&" will allow "AT&T"
X to be picked up.  Underscores are useful in many technical documents.
X There is an admittedly crude provision in this option for 8-bit international
X characters.  If "n" appears in the character string, the three characters
X following are a DECIMAL code 0 - 255, for the character.  There must be
X three decimal characters in all cases, so you have to prepend with 0's,
X for instance, to include bells and formfeeds in your words (an admittedly
X silly thing to do, but aren't most pedagogical examples):
X .PP
X n007n012
X .PP
X Numeric digits other than the three following "n" are simply numeric
X characters.  Use of "n" does not conflict with anything because actual
X alphabetics have no meaning - alphabetics are already accepted.
X .I Ispell
X will typically be used with input from a file, meaning that preserving
X parity for possible 8 bit characters from the input text is OK.  If you
X specify the -l option, and actually type text from the terminal, this may
X create problems if your stty settings preserve parity.
X .PP
X The
X .B \-c
X option is primarily intended for use by the
X .I munchlist
X shell script.
X In this mode, a list of words is read from the standard input.
X For each word, a list of possible root words and suffixes will be
X written to the standard output.
X Some of the root words will be illegal and must be filtered from the
X output by other means;
X the
X .I munchlist
X script does this.
X As an example, the command "echo BOTHER | ispell -c" produces:
X .PP
X .RS
X .nf
X BOTH
X BOTHE/R
X BOTH/R
X .fi
X .RE
X .PP
X The
X .I munchlist
X shell script is used to reduce the size of dictionary files,
X primarily personal dictionary files.
X It is also capable of combining dictionaries from various sources.
X The given
X .I files
X are read (standard input if no arguments are given),
X reduced to a minimal set of roots and suffixes that will match the
X same list of words, and written to standard output.
X .PP
X Normally, words that are in the default dictionary are removed by
X .I munchlist
X during processing.
X If the list is to be used with a different dictionary, the
X .B \-d
X option can be used to specify an alternate (hashed) dictionary file
X containing words to be removed from the output list.
X If a dictionary file of
X .I /dev/null
X is specified, no words will be removed from the output;
X this is useful when munching the primary dictionary file.
X .PP
X The
X .B \-w
X option is passed on to
X .IR ispell .
X The
X .B \-e
X ("efficient") option causes the script to use a slower algorithm that uses
X somewhat less space in TMPDIR (normally
X .IR /usr/tmp ")."
X .PP
X It is possible to install
X .I ispell
X in such a way as to only support ASCII range text if desired.
X .SH DEFAULT FILES
X /usr/public/lib/ispell.hash
X .br
X /usr/dict/web2		for the Lookup function
X .br
X $HOME/ispell.words	user's private dictionary
X .br
X /usr/public/lib/expand[12].sed		sed scripts for expanding suffixes
X .SH SEE ALSO
X spell(1), egrep(1), look(1)
X .SH BUGS
X It takes about five seconds for
X .I ispell
X to read in the hash table.
X .sp
X Perhaps more than ten choices should be allowed for near misses.
X .sp
X The hash table is stored as a quarter-megabyte array, so a PDP-11
X version does not seem likely.
X .sp
X .I Ispell
X should understand more
X .I troff
X syntax, and deal more intelligently with contractions.
X .sp
X While alternate dictionaries for foreign languages could be defined, and
X the international characters included in words, rules concerning
X word endings / pluralization accommodate English only.
X .sp
X .I Munchlist
X is very slow, and requires tremendous amounts of temporary file space for
X large dictionaries.
X It does respect the TMPDIR environment variable, so this space can be
X redirected.
X However, a lot of the temporary space it needs is for sorting, so TMPDIR
X is only a partial help on systems with an uncooperative
X .IR sort (1).
X As a benchmark, the 15000-word
X .I dict.191
X takes about 1200 blocks in TMPDIR, and 2000 in
X .IR sort "'s"
X temporary directories.
X On a 68000 workstation, it runs for the better part of an hour.
X Munching
X .I dict.191
X with
X .I /usr/dict/words
X (28000 words output)
X took another 1500 blocks or so, and ran for about three hours.
X .SH AUTHOR
X Pace Willisson (pace@mit-vax)
X .br
X Enhanced by James Woods, Bob McQueer, Bill Randle, Marc Ries, Rob McMahon,
X and Geoff Kuenning.
SHAR_EOF
if test 8455 -ne "`wc -c < 'ispell.man'`"
then
	echo shar: error transmitting "'ispell.man'" '(should have been 8455 characters)'
fi
fi # end of overwriting check
echo shar: extracting "'README'" '(6256 characters)'
if test -f 'README'
then
	echo shar: will not over-write existing file "'README'"
else
sed 's/^X //' << \SHAR_EOF > 'README'
X -*- Mode:Text -*-
X 
X Ispell consists of two programs: the actual spelling checker, "ispell",
X and the hash table builder, "buildhash".  Everything is set up so you
X can just say "make install" and have everything happen.  You might want
X to edit the makefile, and ispell.h to change the destination of the
X program and the hash table.
X 
X The dictionary comes from the ITS spell dictionary.  I got it from
X "ml:wba;dict 191", although I don't know that this is the copy currently
X in use on the 20's around MIT.
X 
X ----------------------------------------------------------------------
X 
X Addendum:
X 
X My eternal gratitude to the author of ispell -- I don't know how I
X ever lived without it.  I received his permission to post ispell to
X the net and have added a GNU EMACS interface.  Look in the file
X ispell.el for installation instructions.
X 
X As far as I know, no one informally "supports" this program.  If you
X would like to "adopt" it (collect fixes/enhancements and post a new
X version periodically), feel free to do so.
X 
X I volunteer to collect dictionary diffs and post a composite diff
X periodically.  If you add a lot of words to the main dictionary, send
X me the diffs between the modified dictionary and the posted one.
X Also, if you have access to a TOPS20 system with a more complete
X dictionary in ispell format, send me the diffs if you can.  Just
X PLEASE don't dump an entire dictionary to our site!
X 
X The dictionary posted is one I snarfed from around here -- after
X comparison with the one originally supplied, ours appears a tad more
X complete and accurate.
X 
X Walt Buehring
X Texas Instruments - Computer Science Center
X 
X ARPA:  Buehring%TI-CSL@CSNet-Relay
X UUCP:  {smu, texsun, im4u, rice} ! ti-csl ! buehring
X 
X ----------------------------------------------------------------------
X 
X The following is the only documentation I could find about the format
X of the dictionary.  It was written for the TOPS20 speller that ispell
X mimics, so I believe most of the information is applicable.  It should be
X useful if you want to add words to the main dictionary by hand.  -WB
X 
X ----------------------------------------------------------------------
X 
X 11.6  Dictionary flags
X 
X      Words  in SPELL's main dictionary (but not the other dictionaries) may
X have flags associated with  them  to  indicate  the  legality  of  suffixes
X without  the  need  to keep the full suffixed words in the dictionary.  The
X flags have "names" consisting of single  letters.    Their  meaning  is  as
X follows:
X 
X Let  #  and  @  be  "variables"  that can stand for any letter.  Upper case
X letters are constants.  "..."  stands  for  any  string  of  zero  or  more
X letters,  but note that no word may exist in the dictionary which is not at
X least 2 letters long, so, for example, FLY may not be produced  by  placing
X the  "Y"  flag  on "F".  Also, no flag is effective unless the word that it
X creates is at least 4 letters  long,  so,  for  example,  WED  may  not  be
X produced by placing the "D" flag on "WE".
X 
X "V" flag:
X         ...E --> ...IVE  as in CREATE --> CREATIVE
X         if # .ne. E, ...# --> ...#IVE  as in PREVENT --> PREVENTIVE
X 
X "N" flag:
X         ...E --> ...ION  as in CREATE --> CREATION
X         ...Y --> ...ICATION  as in MULTIPLY --> MULTIPLICATION
X         if # .ne. E or Y, ...# --> ...#EN  as in FALL --> FALLEN
X 
X "X" flag:
X         ...E --> ...IONS  as in CREATE --> CREATIONS
X         ...Y --> ...ICATIONS  as in MULTIPLY --> MULTIPLICATIONS
X         if # .ne. E or Y, ...# --> ...#ENS  as in WEAK --> WEAKENS
X 
X "H" flag:
X         ...Y --> ...IETH  as in TWENTY --> TWENTIETH
X         if # .ne. Y, ...# --> ...#TH  as in HUNDRED --> HUNDREDTH
X 
X "Y" FLAG:
X         ... --> ...LY  as in QUICK --> QUICKLY
X 
X "G" FLAG:
X         ...E --> ...ING  as in FILE --> FILING
X         if # .ne. E, ...# --> ...#ING  as in CROSS --> CROSSING
X 
X "J" FLAG:
X         ...E --> ...INGS  as in FILE --> FILINGS
X         if # .ne. E, ...# --> ...#INGS  as in CROSS --> CROSSINGS
X 
X "D" FLAG:
X         ...E --> ...ED  as in CREATE --> CREATED
X         if @ .ne. A, E, I, O, or U,
X                 ...@Y --> ...@IED  as in IMPLY --> IMPLIED
X         if # .ne. E or Y, or (# = Y and @ = A, E, I, O, or U)
X                 ...@# --> ...@#ED  as in CROSS --> CROSSED
X                                 or CONVEY --> CONVEYED
X "T" FLAG:
X         ...E --> ...EST  as in LATE --> LATEST
X         if @ .ne. A, E, I, O, or U,
X                 ...@Y --> ...@IEST  as in DIRTY --> DIRTIEST
X         if # .ne. E or Y, or (# = Y and @ = A, E, I, O, or U)
X                 ...@# --> ...@#EST  as in SMALL --> SMALLEST
X                                 or GRAY --> GRAYEST
X 
X "R" FLAG:
X         ...E --> ...ER  as in SKATE --> SKATER
X         if @ .ne. A, E, I, O, or U,
X                 ...@Y --> ...@IER  as in MULTIPLY --> MULTIPLIER
X         if # .ne. E or Y, or (# = Y and @ = A, E, I, O, or U)
X                 ...@# --> ...@#ER  as in BUILD --> BUILDER
X                                 or CONVEY --> CONVEYER
X 
X "Z" FLAG:
X         ...E --> ...ERS  as in SKATE --> SKATERS
X         if @ .ne. A, E, I, O, or U,
X                 ...@Y --> ...@IERS  as in MULTIPLY --> MULTIPLIERS
X         if # .ne. E or Y, or (# = Y and @ = A, E, I, O, or U)
X                 ...@# --> ...@#ERS  as in BUILD --> BUILDERS
X                                 or SLAY --> SLAYERS
X 
X "S" FLAG:
X         if @ .ne. A, E, I, O, or U,
X                 ...@Y --> ...@IES  as in IMPLY --> IMPLIES
X         if # .eq. S, X, Z, or H,
X                 ...# --> ...#ES  as in FIX --> FIXES
X         if # .ne. S, X, Z, H, or Y, or (# = Y and @ = A, E, I, O, or U)
X                 ...@# --> ...@#S  as in BAT --> BATS
X                                 or CONVEY --> CONVEYS
X 
X "P" FLAG:
X         if @ .ne. A, E, I, O, or U,
X                 ...@Y --> ...@INESS  as in CLOUDY --> CLOUDINESS
X         if # .ne. Y, or @ = A, E, I, O, or U,
X                 ...@# --> ...@#NESS  as in LATE --> LATENESS
X                                 or GRAY --> GRAYNESS
X 
X "M" FLAG:
X         ... --> ...'S  as in DOG --> DOG'S
X 
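
X As an illustration, the "D" and "S" rules above can be written as
X shell pattern matches.  This is a sketch only, not the code ispell
X uses:  the function names "apply_d" and "apply_s" are made up, roots
X are assumed to be upper case, and the minimum-length restrictions
X described earlier are not checked.

```shell
# Hypothetical helper: apply the "D" flag to an upper-case root.
apply_d() {
    case "$1" in
    *E)        echo "${1}D" ;;        # CREATE --> CREATED
    *[AEIOU]Y) echo "${1}ED" ;;       # vowel + Y: CONVEY --> CONVEYED
    *Y)        echo "${1%Y}IED" ;;    # consonant + Y: IMPLY --> IMPLIED
    *)         echo "${1}ED" ;;       # CROSS --> CROSSED
    esac
}

# Hypothetical helper: apply the "S" flag to an upper-case root.
apply_s() {
    case "$1" in
    *[AEIOU]Y) echo "${1}S" ;;        # vowel + Y: CONVEY --> CONVEYS
    *Y)        echo "${1%Y}IES" ;;    # consonant + Y: IMPLY --> IMPLIES
    *[SXZH])   echo "${1}ES" ;;       # FIX --> FIXES
    *)         echo "${1}S" ;;        # BAT --> BATS
    esac
}
```

X The order of the patterns matters:  the vowel-plus-Y case has to be
X tested before the plain Y case.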
X ----------------------------------------------------------------------
X 
X [Whew!  That's all very nice, but how about a quick reference...  -WB]
X 
X V -  ive
X N -  ion, ication, en
X X -  ions, ications, ens
X H -  th, ieth
X Y -  ly
X G -  ing
X J -  ings
X D -  ed
X T -  est
X R -  er
X Z -  ers
X S -  s, es, ies
X P -  ness, iness
X M -  's
SHAR_EOF
if test 6256 -ne "`wc -c < 'README'`"
then
	echo shar: error transmitting "'README'" '(should have been 6256 characters)'
fi
fi # end of overwriting check
echo shar: extracting "'WISHES'" '(1211 characters)'
if test -f 'WISHES'
then
	echo shar: will not over-write existing file "'WISHES'"
else
sed 's/^X //' << \SHAR_EOF > 'WISHES'
X Things remaining to be done to ispell:
X 
X 	- The single biggest remaining deficiency (in my opinion) is the
X 	  extensive misuse of 'strlen'.  Strlen is often called repeatedly
X 	  on the same string within a few lines of code.  Worse, many
X 	  routines accept a "length" parameter (which is usually passed
X 	  by running 'strlen' within the arglist) but ignore it and
X 	  actually require the string to be null-terminated.  Somebody
X 	  should do a systematic edit and clean this up.  I wouldn't
X 	  be surprised to learn that ispell spends 50% of its time in
X 	  strlen.
X 	- The "munchlist" script can actually increase the size of a
X 	  dictionary.  For example, munching dict.191 (after my bugfixes
X 	  to it) reduced the number of words by about 40, but increased
X 	  the number of characters by a small percentage.  This is
X 	  because munchlist doesn't recognize duplicate suffixes that
X 	  generate the same result, except for the three special
X 	  "s-ending" suffixes J, Z, and X.  For example, right now
X 	  munchlist will make BATHER by adding the R suffix to both
X 	  BATH and BATHE.  It would be nice if munchlist could recognize
X 	  the redundancy and reduce its output so that each word was made
X 	  in only one way.
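
X One way to spot such redundancy, sketched in shell.  The input format
X is an assumption made for this example (one line per derivation, the
X root/flag pair followed by the word it generates), and the function
X name "find_redundant" is hypothetical;  munchlist does not produce
X this format itself.

```shell
# Hypothetical helper: read "<root/flag> <generated word>" lines on
# standard input and print each word that is generated more than once.
find_redundant() {
    awk '{ print $2 }' | sort | uniq -d
}
```

X Given the lines "BATH/R BATHER" and "BATHE/R BATHER", this prints
X BATHER once.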
SHAR_EOF
if test 1211 -ne "`wc -c < 'WISHES'`"
then
	echo shar: error transmitting "'WISHES'" '(should have been 1211 characters)'
fi
fi # end of overwriting check
echo shar: extracting "'expand.awk'" '(5769 characters)'
if test -f 'expand.awk'
then
	echo shar: will not over-write existing file "'expand.awk'"
else
sed 's/^X //' << \SHAR_EOF > 'expand.awk'
X BEGIN {FS = "/"}
X     {
X     print $1
X #Let  #  and  @  be  "variables"  that can stand for any letter.  Upper case
X #letters are constants.  "..."  stands  for  any  string  of  zero  or  more
X #letters,  but note that no word may exist in the dictionary which is not at
X #least 2 letters long, so, for example, FLY may not be produced  by  placing
X #the  "Y"  flag  on "F".  Also, no flag is effective unless the word that it
X #creates is at least 4 letters  long,  so,  for  example,  WED  may  not  be
X #produced by placing the "D" flag on "WE".
X     size = length ($1)
X     #
X     # Break out the last two letters into "tail", and put
X     # corresponding versions of the root with the tail trimmed
X     # off into "trimmed".  If they are vowels, set vowel[i].
X     # (Actually, only vowel[2] is used).
X     #
X     for (i = 1;  i < 3;  i++)
X 	{
X 	tail[i] = substr ($1, size - i + 1, 1)
X 	if (tail[i] == "A"  ||  tail[i] == "E" ||  tail[i] == "I" \
X 	  ||  tail[i] == "O"  ||  tail[i] == "U")
X 	    vowel[i] = 1
X 	else
X 	    vowel[i] = 0
X 	trimmed[i] = substr ($1, 1, size - i)
X 	}
X     for (i = 2;  i <= NF;  i++)
X 	{
X 	if ($i == "V")
X 	    {
X #		...E --> ...IVE  as in CREATE --> CREATIVE
X #		if # .ne. E, ...# --> ...#IVE  as in PREVENT --> PREVENTIVE
X 	    if (tail[1] == "E")
X 		print trimmed[1] "IVE"
X 	    else
X 		print $1 "IVE"
X 	    }
X 	else if ($i == "N"  ||  $i == "X")
X 	    {
X #	        ...E --> ...ION  as in CREATE --> CREATION
X #	        ...Y --> ...ICATION  as in MULTIPLY --> MULTIPLICATION
X #	        if # .ne. E or Y, ...# --> ...#EN  as in FALL --> FALLEN
X #	    "X" flag:
X #	        ...E --> ...IONS  as in CREATE --> CREATIONS
X #	        ...Y --> ...ICATIONS  as in MULTIPLY --> MULTIPLICATIONS
X #	        if # .ne. E or Y, ...# --> ...#ENS  as in WEAK --> WEAKENS
X 	    if ($i == "N")
X 		plural = ""
X 	    else
X 		plural = "S"
X 	    if (tail[1] == "E")
X 		print trimmed[1] "ION" plural
X 	    else if (tail[1] == "Y")
X 		print trimmed[1] "ICATION" plural
X 	    else
X 		print $1 "EN" plural
X 	    }
X 	else if ($i == "H")
X 	    {
X #	        ...Y --> ...IETH  as in TWENTY --> TWENTIETH
X #	        if # .ne. Y, ...# --> ...#TH  as in HUNDRED --> HUNDREDTH
X 	    if (tail[1] == "Y")
X 		print trimmed[1] "IETH"
X 	    else
X 		print $1 "TH"
X 	    }
X 	else if ($i == "Y")
X 	    {
X #	        ... --> ...LY  as in QUICK --> QUICKLY
X 	    print $1 "LY"
X 	    }
X 	else if ($i == "G"  ||  $i == "J")
X 	    {
X #	        ...E --> ...ING  as in FILE --> FILING
X #	        if # .ne. E, ...# --> ...#ING  as in CROSS --> CROSSING
X #	    "J" flag:
X #	        ...E --> ...INGS  as in FILE --> FILINGS
X #	        if # .ne. E, ...# --> ...#INGS  as in CROSS --> CROSSINGS
X 	    if ($i == "G")
X 		plural = ""
X 	    else
X 		plural = "S"
X 	    if (tail[1] == "E")
X 		print trimmed[1] "ING" plural
X 	    else
X 		print $1 "ING" plural
X 	    }
X 	else if ($i == "D")
X 	    {
X #	        ...E --> ...ED  as in CREATE --> CREATED
X #	        if @ .ne. A, E, I, O, or U,
X #	                ...@Y --> ...@IED  as in IMPLY --> IMPLIED
X #	        if # .ne. E or Y, or (# = Y and @ = A, E, I, O, or U)
X #	                ...@# --> ...@#ED  as in CROSS --> CROSSED
X #	                                or CONVEY --> CONVEYED
X 	    if (tail[1] == "E")
X 		print $1 "D"
X 	    else if (tail[1] == "Y"  && !vowel[2])
X 		print trimmed[1] "IED"
X 	    else
X 		print $1 "ED"
X 	    }
X 	else if ($i == "T")
X 	    {
X #	        ...E --> ...EST  as in LATE --> LATEST
X #	        if @ .ne. A, E, I, O, or U,
X #	                ...@Y --> ...@IEST  as in DIRTY --> DIRTIEST
X #	        if # .ne. E or Y, or (# = Y and @ = A, E, I, O, or U)
X #	                ...@# --> ...@#EST  as in SMALL --> SMALLEST
X #                                or GRAY --> GRAYEST
X 	    if (tail[1] == "E")
X 		print $1 "ST"
X 	    else if (tail[1] == "Y"  &&  !vowel[2])
X 		print trimmed[1] "IEST"
X 	    else
X 		print $1 "EST"
X 	    }
X 	else if ($i == "R"  ||  $i == "Z")
X 	    {
X #	        ...E --> ...ER  as in SKATE --> SKATER
X #	        if @ .ne. A, E, I, O, or U,
X #	                ...@Y --> ...@IER  as in MULTIPLY --> MULTIPLIER
X #	        if # .ne. E or Y, or (# = Y and @ = A, E, I, O, or U)
X #	                ...@# --> ...@#ER  as in BUILD --> BUILDER
X #	                                or CONVEY --> CONVEYER
X #	   "Z" flag:
X #	        ...E --> ...ERS  as in SKATE --> SKATERS
X #	        if @ .ne. A, E, I, O, or U,
X #	                ...@Y --> ...@IERS  as in MULTIPLY --> MULTIPLIERS
X #	        if # .ne. E or Y, or (# = Y and @ = A, E, I, O, or U)
X #	                ...@# --> ...@#ERS  as in BUILD --> BUILDERS
X #	                                or SLAY --> SLAYERS
X 	    if ($i == "R")
X 		plural = ""
X 	    else
X 		plural = "S"
X 	    if (tail[1] == "E")
X 		print $1 "R" plural
X 	    else if (tail[1] == "Y"  &&  !vowel[2])
X 		print trimmed[1] "IER" plural
X 	    else
X 		print $1 "ER" plural
X 	    }
X 	else if ($i == "S")
X 	    {
X #	        if @ .ne. A, E, I, O, or U,
X #	                ...@Y --> ...@IES  as in IMPLY --> IMPLIES
X #	        if # .eq. S, X, Z, or H,
X #	                ...# --> ...#ES  as in FIX --> FIXES
X #	        if # .ne. S, X, Z, H, or Y, or (# = Y and @ = A, E, I, O, or U)
X #	                ...@# --> ...@#S  as in BAT --> BATS
X #	                                or CONVEY --> CONVEYS
X 	    if (tail[1] == "Y"  &&  !vowel[2])
X 		print trimmed[1] "IES"
X 	    else if (tail[1] == "S"  ||  tail[1] == "X"  ||  tail[1] == "Z"  ||  tail[1] == "H")
X 		print $1 "ES"
X 	    else
X 		print $1 "S"
X 	    }
X 	else if ($i == "P")
X 	    {
X #	        if @ .ne. A, E, I, O, or U,
X #	                ...@Y --> ...@INESS  as in CLOUDY --> CLOUDINESS
X #	        if # .ne. Y, or @ = A, E, I, O, or U,
X #	                ...@# --> ...@#NESS  as in LATE --> LATENESS
X #	                                or GRAY --> GRAYNESS
X 	    if (tail[1] == "Y"  &&  !vowel[2])
X 		print trimmed[1] "INESS"
X 	    else
X 		print $1 "NESS"
X 	    }
X 	else if ($i == "M")
X 	    {
X #	        ... --> ...'S  as in DOG --> DOG'S
X 		print $1 "'S"
X 	    }
X 	}
X     }
SHAR_EOF
if test 5769 -ne "`wc -c < 'expand.awk'`"
then
	echo shar: error transmitting "'expand.awk'" '(should have been 5769 characters)'
fi
fi # end of overwriting check
echo shar: extracting "'expand1.sed'" '(1607 characters)'
if test -f 'expand1.sed'
then
	echo shar: will not over-write existing file "'expand1.sed'"
else
sed 's/^X //' << \SHAR_EOF > 'expand1.sed'
X /^[^/]*$/n
X /\/V/ {
X     /^[^/]*E\// {
X 	s@\([^/]*\)E\([/A-Z]*\)/V@\1IVE\
X \1E\2@; P; D
X     }
X     s@\([^/]*\)\([/A-Z]*\)/V@\1IVE\
X \1\2@; P; D
X }
X /\/N/ {
X     /^[^/]*E\// {
X 	s@\([^/]*\)E\([/A-Z]*\)/N@\1ION\
X \1E\2@; P; D
X     }
X     /^[^/]*Y\// {
X 	s@\([^/]*\)Y\([/A-Z]*\)/N@\1ICATION\
X \1Y\2@; P; D
X     }
X     s@\([^/]*\)\([/A-Z]*\)/N@\1EN\
X \1\2@; P; D
X }
X /\/X/ {
X     /^[^/]*E\// {
X 	s@\([^/]*\)E\([/A-Z]*\)/X@\1IONS\
X \1E\2@; P; D
X     }
X     /^[^/]*Y\// {
X 	s@\([^/]*\)Y\([/A-Z]*\)/X@\1ICATIONS\
X \1Y\2@; P; D
X     }
X     s@\([^/]*\)\([/A-Z]*\)/X@\1ENS\
X \1\2@; P; D
X }
X /\/H/ {
X     /^[^/]*Y\// {
X 	s@\([^/]*\)Y\([/A-Z]*\)/H@\1IETH\
X \1Y\2@; P; D
X     }
X     s@\([^/]*\)\([/A-Z]*\)/H@\1TH\
X \1\2@; P; D
X }
X /\/Y/ {
X     s@\([^/]*\)\([/A-Z]*\)/Y@\1LY\
X \1\2@; P; D
X }
X /\/G/ {
X     /^[^/]*E\// {
X 	s@\([^/]*\)E\([/A-Z]*\)/G@\1ING\
X \1E\2@; P; D
X     }
X     s@\([^/]*\)\([/A-Z]*\)/G@\1ING\
X \1\2@; P; D
X }
X /\/J/ {
X     /^[^/]*E\// {
X 	s@\([^/]*\)E\([/A-Z]*\)/J@\1INGS\
X \1E\2@; P; D
X     }
X     s@\([^/]*\)\([/A-Z]*\)/J@\1INGS\
X \1\2@; P; D
X }
X /\/D/ {
X     /^[^/]*E\// {
X 	s@\([^/]*\)\([/A-Z]*\)/D@\1D\
X \1\2@; P; D
X     }
X     /^[^/]*[^AEIOU]Y\// {
X 	s@\([^/]*\)Y\([/A-Z]*\)/D@\1IED\
X \1Y\2@; P; D
X     }
X     s@\([^/]*\)\([/A-Z]*\)/D@\1ED\
X \1\2@; P; D
X }
X /\/T/ {
X     /^[^/]*E\// {
X 	s@\([^/]*\)\([/A-Z]*\)/T@\1ST\
X \1\2@; P; D
X     }
X     /^[^/]*[^AEIOU]Y\// {
X 	s@\([^/]*\)Y\([/A-Z]*\)/T@\1IEST\
X \1Y\2@; P; D
X     }
X     s@\([^/]*\)\([/A-Z]*\)/T@\1EST\
X \1\2@; P; D
X }
X /\/R/ {
X     /^[^/]*E\// {
X 	s@\([^/]*\)\([/A-Z]*\)/R@\1R\
X \1\2@; P; D
X     }
X     /^[^/]*[^AEIOU]Y\// {
X 	s@\([^/]*\)Y\([/A-Z]*\)/R@\1IER\
X \1Y\2@; P; D
X     }
X     s@\([^/]*\)\([/A-Z]*\)/R@\1ER\
X \1\2@; P; D
X }
SHAR_EOF
if test 1607 -ne "`wc -c < 'expand1.sed'`"
then
	echo shar: error transmitting "'expand1.sed'" '(should have been 1607 characters)'
fi
fi # end of overwriting check
echo shar: extracting "'expand2.sed'" '(622 characters)'
if test -f 'expand2.sed'
then
	echo shar: will not over-write existing file "'expand2.sed'"
else
sed 's/^X //' << \SHAR_EOF > 'expand2.sed'
X /^[^/]*$/n
X /\/Z/ {
X     /^[^/]*E\// {
X 	s@\([^/]*\)\([/A-Z]*\)/Z@\1RS\
X \1\2@; P; D
X     }
X     /^[^/]*[^AEIOU]Y\// {
X 	s@\([^/]*\)Y\([/A-Z]*\)/Z@\1IERS\
X \1Y\2@; P; D
X     }
X     s@\([^/]*\)\([/A-Z]*\)/Z@\1ERS\
X \1\2@; P; D
X }
X /\/S/ {
X     /^[^/]*[^AEIOU]Y\// {
X 	s@\([^/]*\)Y\([/A-Z]*\)/S@\1IES\
X \1Y\2@; P; D
X     }
X     /^[^/]*[SXZH]\// {
X 	s@\([^/]*\)\([/A-Z]*\)/S@\1ES\
X \1\2@; P; D
X     }
X     s@\([^/]*\)\([/A-Z]*\)/S@\1S\
X \1\2@; P; D
X }
X /\/P/ {
X     /^[^/]*[^AEIOU]Y\// {
X 	s@\([^/]*\)Y\([/A-Z]*\)/P@\1INESS\
X \1Y\2@; P; D
X     }
X     s@\([^/]*\)\([/A-Z]*\)/P@\1NESS\
X \1\2@; P; D
X }
X /\/M/ {
X     s@\([^/]*\)\([/A-Z]*\)/M@\1'S\
X \1\2@; P; D
X }
SHAR_EOF
if test 622 -ne "`wc -c < 'expand2.sed'`"
then
	echo shar: error transmitting "'expand2.sed'" '(should have been 622 characters)'
fi
fi # end of overwriting check
echo shar: extracting "'munchlist.sh'" '(6218 characters)'
if test -f 'munchlist.sh'
then
	echo shar: will not over-write existing file "'munchlist.sh'"
else
sed 's/^X //' << \SHAR_EOF > 'munchlist.sh'
X : Use /bin/sh
X #
X #	Given a list of words for ispell, generate a reduced list
X #	in which all possible suffixes have been collapsed.  The reduced
X #	list will match the same words as the original.
X #
X #	Usage:
X #
X #	munchlist [ -d hashfile ] [ -e ] [ -w chars ] [ file ] ...
X #
X #	Options:
X #
X #	-d hashfile
X #		Remove any words that are covered by 'hashfile'.  The
X #		default is the default ispell dictionary.  The words
X #		will be removed only if all suffixes are covered by
X #		the hash file.  A hashfile of /dev/null should be
X #		specified when the main dictionary is being munched.
X #	-e	Economical algorithm.  This will use much less temporary
X #		disk space, at the expense of time.  Useful with large files
X #		(such as complete dictionaries).
X #	-w	Passed on to ispell (specify chars that are part of a word)
X #
X #	The given input files are merged, then processed by 'ispell -c'
X #	to generate possible suffix lists;  these are then combined
X #	and reduced.  The final result is written to standard output.
X #
X #	For portability to older systems, I have avoided getopt.
X #
X #		Geoff Kuenning
X #		2/28/87
X #
X LIBDIR=/tmp2/lib			# Must match config.h
X DEFDICT=dict.191			# Must match config.h
X EXPAND1=${LIBDIR}/expand1.sed
X EXPAND2=${LIBDIR}/expand2.sed
X TDIR=${TMPDIR:-/usr/tmp}
X TMP=${TDIR}/munch$$
X 
X cheap=no
X dictopt=
X wchars=
X while [ $# != 0 ]
X do
X     case "$1" in
X 	-d)
X 	    case "$2" in
X 		/dev/null)
X 		    dictopt=NONE
X 		    ;;
X 		*)
X 		    dictopt="-d $2"
X 		    ;;
X 	    esac
X 	    shift
X 	    ;;
X 	-e)
X 	    cheap=yes
X 	    ;;
X 	-w)
X 	    wchars="-w $2"
X 	    shift
X 	    ;;
X 	*)
X 	    break
X     esac
X     shift
X done
X #
X # Awk program to combine suffixes onto one line
X #
X AWKMUNCH='
X     {
X     if ($1 != old1  &&  old1 != "")
X 	{
X 	print old1 suffixes
X 	suffixes = ""
X 	}
X     old1 = $1
X     for (i = 2;  i <= NF;  i++)
X 	suffixes = suffixes "/" $i
X     }
X     END { if (old1 != "") print old1 suffixes }'
X #
X # Awk program to break suffixes up into one per line
X #
X AWKUNMUNCH='
X     {
X     print $1
X     for (i = 2;  i <= NF;  i++)
X 	print $1 "/" $i
X     }'
X trap "/bin/rm -f ${TMP}*; exit 1" 1 2 15
X #
X # Collect all the input (cat), convert to uppercase (tr), expand all
X # the suffix options (two sed's), and preserve (sorted) for later
X # joining.  Unless an explicitly null dictionary was specified, remove
X # all expanded words that are covered by the dictionary (ispell).
X #
X if [ "X$dictopt" = "XNONE" ]
X then
X     cat "$@" | tr '[a-z]' '[A-Z]' \
X       | sed -f $EXPAND1 | sed -f $EXPAND2 | sort -u > ${TMP}a
X else
X     cat "$@" | tr '[a-z]' '[A-Z]' \
X       | sed -f $EXPAND1 | sed -f $EXPAND2 | sort -u \
X       | ispell -l $dictopt -p /dev/null > ${TMP}a
X fi
X #
X # Munch the input to generate roots and suffixes (ispell -c).  We are
X # only interested in words that have at least one suffix (egrep /);  the
X # next step will pick up the rest.  Some of the roots are illegal.  We
X # use join to restrict the output to those root words that are found
X # in the original dictionary.
X #
X # Note:  one disadvantage of this pipeline is that for a large file,
X # the join and awk may be sitting around for a long time while ispell
X # and sort run.  You can get rid of this by splitting the pipe, at
X # the expense of more temp file space.
X #
X if [ $cheap = yes ]
X then
X     ispell $wchars -c -d /dev/null -p /dev/null < ${TMP}a \
X       | egrep / | sort -u -t/ +0 -1 +1 \
X       | join -t/ - ${TMP}a | awk -F/ "$AWKMUNCH" > ${TMP}b
X else
X     ispell $wchars -c -d /dev/null -p /dev/null < ${TMP}a \
X       | egrep / | sort -u -t/ +0 -1 +1 \
X       | join -t/ - ${TMP}a > ${TMP}b
X fi
X #
X # There is now one slight problem:  the suffix flags X, J, and Z
X # are simply the addition of an "S" to the suffixes N, G, and R,
X # respectively.  This produces redundant entries in the output file;
X # for example, ABBREVIATE/N/X and ABBREVIATION/S.  We must get rid
X # of the unnecessary duplicates.  The candidates are those words that
X # have only an "S" flag (egrep).  We strip off the "S" (sed), and
X # generate a list of roots that might have made these words (ispell -c).
X # Of these roots, we select those that have the N, G, or R flags,
X # replacing each with the plural equivalent X, J, or Z (sed -n).
X # Using join once again, we select those that have legal roots
X # and put them in ${TMP}c.
X #
X if [ $cheap = yes ]
X then
X     egrep '^[^/]*/S$' ${TMP}b | sed 's@/S$@@' \
X       | ispell -c -d /dev/null -p /dev/null \
X       | sed -n -e '/\/N/s/N$/X/p' -e '/\/G/s/G$/J/p' -e '/\/R/s/R$/Z/p' \
X       | sort -u -t/ +0 -1 +1 \
X       | join -t/ - ${TMP}a \
X       | awk -F/ "$AWKMUNCH" > ${TMP}c
X else
X     egrep '^[^/]*/S$' ${TMP}b | sed 's@/S$@@' \
X       | ispell -c -d /dev/null -p /dev/null \
X       | sed -n -e '/\/N/s/N$/X/p' -e '/\/G/s/G$/J/p' -e '/\/R/s/R$/Z/p' \
X       | sort -u -t/ +0 -1 +1 \
X       | join -t/ - ${TMP}a > ${TMP}c
X fi
X #
X # Now we have to eliminate the stuff covered by ${TMP}c from ${TMP}b.
X # First, we re-expand the suffixes we just made (sed -f pair), and let
X # ispell re-create the /S version (ispell -c).  We select the /S versions
X # only (egrep), sort them (sort) for comm, and use comm to delete these
X # from ${TMP}b.  The output of comm (i.e., the trimmed version of
X # ${TMP}b) is combined with our special-suffixes file ${TMP}c (sort,
X # with preceding awk, if $cheap) and reduced in size (AWKMUNCH) to
X # produce a final list of all words that have at least one suffix.
X #
X if [ $cheap = yes ]
X then
X     sed -f $EXPAND1 < ${TMP}c | sed -f $EXPAND2 \
X       | ispell -c -d /dev/null -p /dev/null \
X       | egrep '\/S$' | sort -u -t/ +0 -1 +1 | comm -13 - ${TMP}b \
X       | awk -F/ "$AWKUNMUNCH" - ${TMP}c \
X       | sort -u -t/ +0 -1 +1 - \
X       | awk -F/ "$AWKMUNCH" > ${TMP}d
X else
X     sed -f $EXPAND1 < ${TMP}c | sed -f $EXPAND2 \
X       | ispell -c -d /dev/null -p /dev/null \
X       | egrep '\/S$' | sort -u -t/ +0 -1 +1 | comm -13 - ${TMP}b \
X       | sort -u -t/ +0 -1 +1 - ${TMP}c \
X       | awk -F/ "$AWKMUNCH" > ${TMP}d
X fi
X /bin/rm -f ${TMP}[bc]
X #
X # Now a slick trick.  Use ispell to select those (root) words from the original
X # list (${TMP}a) that are not covered by the suffix list (${TMP}d).  Then we
X # merge these with the suffix list and sort it to produce the final output.
X #
X ispell $wchars -d /dev/null -p ${TMP}d -l < ${TMP}a | tr -d \\015 \
X   | sort -u -t/ +0 -1 +1 - ${TMP}d
X /bin/rm -f ${TMP}*
SHAR_EOF
if test 6218 -ne "`wc -c < 'munchlist.sh'`"
then
	echo shar: error transmitting "'munchlist.sh'" '(should have been 6218 characters)'
fi
chmod +x 'munchlist.sh'
fi # end of overwriting check
echo shar: extracting "'ispell.el'" '(6763 characters)'
if test -f 'ispell.el'
then
	echo shar: will not over-write existing file "'ispell.el'"
else
sed 's/^X //' << \SHAR_EOF > 'ispell.el'
X ;;; Spelling correction interface for GNU EMACS using "ispell"
X 
X ;;; Walt Buehring
X ;;; Texas Instruments - Computer Science Center
X ;;; ARPA:  Buehring%TI-CSL@CSNet-Relay
X ;;; UUCP:  {smu, texsun, im4u, rice} ! ti-csl ! buehring
X 
X ;;; Depends on the ispell program snarfed from MIT-PREP in early 
X ;;; 1986.  The only interactive command is "ispell-word" which should be
X ;;; bound to M-$.  If someone writes an "ispell-region" command, 
X ;;; I would appreciate a copy.
X 
X ;;; To fully install this, add this file to your GNU lisp directory and 
X ;;; compile it with M-X byte-compile-file.  Then add the following to the
X ;;; appropriate init file:
X 
X ;;;  (autoload 'ispell-word "ispell"
X ;;;    "Check the spelling of word in buffer." t)
X ;;;  (global-set-key "\e$" 'ispell-word)
X 
X ;;; If run on a heavily loaded system, the timeout value in ispell-check 
X ;;; and the initial sleep time in ispell-init-process may need to be increased.
X 
X ;;; No warranty expressed or implied.  All sales final.  Void where prohibited.
X ;;; If you don't like it, change it.
X 
X (defvar ispell-syntax-table nil)
X 
X (if (null ispell-syntax-table)
X     ;; The following assumes that the standard-syntax-table
X     ;; is static.  If you add words with funky characters
X     ;; to your dictionary, the following may have to change.
X     (progn
X       (setq ispell-syntax-table (make-syntax-table))
X       ;; Make certain characters word constituents
X       (modify-syntax-entry ?' "w   " ispell-syntax-table)
X       (modify-syntax-entry ?- "w   " ispell-syntax-table)
X       ;; Get rid of existing word syntax on certain characters 
X       (modify-syntax-entry ?$ ".   " ispell-syntax-table)
X       (modify-syntax-entry ?% ".   " ispell-syntax-table)))
X 
X 
X (defun ispell-word (&optional quietly)
X   "Check spelling of word at or before dot.
X If word not found in dictionary, display possible corrections in a window 
X and let user select."
X   (interactive)
X   (let* ((current-syntax (syntax-table))
X 	 start end word poss replace)
X     (unwind-protect
X 	(save-excursion
X 	  ;; Ensure syntax table is reasonable 
X 	  (set-syntax-table ispell-syntax-table)
X 	  ;; Move backward for word if not already on one.
X 	  (if (not (looking-at "\\w"))
X 	      (re-search-backward "\\w" (dot-min) 'stay))
X 	  ;; Move to start of word
X 	  (re-search-backward "\\W" (dot-min) 'stay)
X 	  ;; Find start and end of word
X 	  (or (re-search-forward "\\w+" nil t)
X 	      (error "No word to check."))
X 	  (setq start (match-beginning 0)
X 		end (match-end 0)
X 		word (buffer-substring start end)))
X       (set-syntax-table current-syntax))
X     (or quietly (message "Checking spelling of %s..." (upcase word)))
X     (setq poss (ispell-check word))
X     (cond ((eq poss t)
X 	   (or quietly (message "Found %s" (upcase word))))
X 	  ((stringp poss)
X 	   (or quietly (message "Found it because of %s" (upcase poss))))
X 	  ((null poss)
X 	   (or quietly (message "Could Not Find %s" (upcase word))))
X 	  (t (setq replace (ispell-choose poss))
X 	     (if replace
X 		 (progn
X 		   (goto-char end)
X 		   (delete-region start end)
X 		   (insert-string replace)))))
X     poss))
X 
X 
X (defun ispell-choose (choices)
X   "Display possible corrections from list CHOICES.  Return chosen word or nil 
X if none chosen."
X   (unwind-protect 
X       (save-window-excursion
X 	(let ((count 0)
X 	      (words choices)
X 	      (pick -1)
X 	      (window-min-height 2))
X 	  (overlay-window 3)
X 	  (switch-to-buffer "*Choices*") (erase-buffer)
X 	  (setq mode-line-format "--  %b  --")
X 	  (while words
X 	    (if (> (+ 7 (current-column) (length (car words))) (window-width))
X 		(insert "\n"))
X 	    (insert "(" (+ count ?a) ") " (car words) "  ")
X 	    (setq words (cdr words)
X 		  count (1+ count)))
X 	  (select-window (next-window))
X 	  (while (eq pick -1)
X 	    (message "Enter letter to replace word;  Space to flush")
X 	    (let* ((char (read-char))
X 		   (num (1+ (- (upcase char) ?A))))
X 	      (cond ((= char ? ) (setq pick 0))
X 		    ((or (<= num 0) (> num count)) (ding))
X 		    (t (setq pick num)))))
X 	  (and (> pick 0) (nth (1- pick) choices))))
X     ;; Protected forms...
X     (bury-buffer "*Choices*")))
X 
X 
X (defun overlay-window (height)
X   "Create a (usually small) window with HEIGHT lines and avoid
X recentering."
X   (save-excursion
X     (let ((oldot (save-excursion (beginning-of-line) (dot)))
X 	  (top (save-excursion (move-to-window-line height) (dot)))
X 	  newin)
X       (if (< oldot top) (setq top oldot))
X       (setq newin (split-window-vertically height))
X       (set-window-start newin top))))
X 
X 
X (defvar ispell-process nil
X   "Holds the process object for 'ispell'")
X 
X ;;; create signal used by ispell-filter and ispell-check
X (put 'ispell-output 'error-conditions '(ispell-output))
X 
X (defun ispell-check (word)
X "Check spelling of string WORD, return either t for an exact match, a string
X containing the root word for a match via suffix removal, a list of possible 
X correct spellings, or nil for a complete miss."
X   (ispell-init-process)
X   (send-string ispell-process (concat word "\n"))
X   (condition-case output
X       (progn
X 	(sleep-for 20)
X 	(error "Timeout waiting for ispell process output"))
X     (ispell-output (ispell-parse-output (car (cdr output))))))
X 
X (defun ispell-parse-output (output)
X "Parse the OUTPUT string of 'ispell' and return a value as specified by the 
X 'ispell-check' function."
X   (cond
X    ((string= output "*") t)
X    ((string= output "#") nil)
X    ((string= (substring output 0 1) "+")
X     (substring output 2))
X    (t
X     (let ((choice-list '()))
X       (while (not (string= output ""))
X 	(let* ((start (string-match "[A-z]" output))
X 	       (end (string-match " \\|$" output start)))
X 	  (if start
X 	      (setq choice-list (cons (substring output start end)
X 				      choice-list)))
X 	  (setq output (substring output (1+ end)))))
X       choice-list))))
X 
X 
X (defvar ispell-process-output ""
X   "Holds partial output from the 'ispell' process")
X 
X (defun ispell-filter (process output)
X   "The filter-function for 'ispell'.  Signals complete line using the 
X ispell-output signal"
X   (if (string= "\n" (substring output (1- (length output))))
X       (progn
X 	(setq output (concat ispell-process-output
X 			     (substring output 0 (1- (length output))))
X 	      ispell-process-output "")
X 	(signal 'ispell-output (list output)))
X       (setq ispell-process-output (concat ispell-process-output output))))
X 
X (defun ispell-init-process ()
X   "Check status of 'ispell' process and start if necessary; set up 
X filter function for output."
X   (if (or (not ispell-process)
X 	  (not (eq (process-status ispell-process) 'run)))
X       (progn
X 	(message "Starting new ispell process...")
X 	(and (get-buffer "*ispell*") (kill-buffer "*ispell*"))
X 	(setq ispell-process (start-process "ispell" "*ispell*"
X 					   "ispell" "-a"))
X 	(set-process-filter ispell-process 'ispell-filter)
X 	(process-kill-without-query ispell-process)
X 	(sit-for 3))))
X 
SHAR_EOF
if test 6763 -ne "`wc -c < 'ispell.el'`"
then
	echo shar: error transmitting "'ispell.el'" '(should have been 6763 characters)'
fi
fi # end of overwriting check
echo shar: extracting "'buildhash.c'" '(6459 characters)'
if test -f 'buildhash.c'
then
	echo shar: will not over-write existing file "'buildhash.c'"
else
sed 's/^X //' << \SHAR_EOF > 'buildhash.c'
X /* -*- Mode: Text -*- */
X /*
X  * buildhash.c - make a hash table for ispell
X  *
X  * Pace Willisson, 1983
X  */
X 
X #include <stdio.h>
X #include <sys/types.h>
X #include <sys/stat.h>
X #include <sys/param.h>
X #include "ispell.h"
X #include "config.h"
X 
X #define NSTAT 100
X struct stat dstat, cstat;
X 
X int numwords, hashsize;
X 
X char *malloc();
X 
X struct dent *hashtbl;
X 
X char *Dfile;
X char *Hfile;
X 
X char Cfile[MAXPATHLEN];
X char Sfile[MAXPATHLEN];
X 
X main (argc,argv)
X int argc;
X char **argv;
X {
X 	FILE *countf;
X 	FILE *statf;
X 	int stats[NSTAT];
X 	int i;
X 
X 	if (argc > 1) {
X 		++argv;
X 		Dfile = *argv;
X 		if (argc > 2) {
X 			++argv;
X 			Hfile = *argv;
X 		}
X 		else
X 			Hfile = DEFHASH;
X 	}
X 	else {
X 		Dfile = DEFDICT;
X 		Hfile = DEFHASH;
X 	}
X 
X 	sprintf(Cfile,"%s.cnt",Dfile);
X 	sprintf(Sfile,"%s.stat",Dfile);
X 
X 	if (stat (Dfile, &dstat) < 0) {
X 		fprintf (stderr, "No dictionary (%s)\n", Dfile);
X 		exit (1);
X 	}
X 
X 	if (stat (Cfile, &cstat) < 0 || dstat.st_mtime > cstat.st_mtime)
X 		newcount ();
X 
X 	if ((countf = fopen (Cfile, "r")) == NULL) {
X 		fprintf (stderr, "No count file\n");
X 		exit (1);
X 	}
X 	numwords = 0;
X 	fscanf (countf, "%d", &numwords);
X 	fclose (countf);
X 	if (numwords == 0) {
X 		fprintf (stderr, "Bad count file\n");
X 		exit (1);
X 	}
X 	hashsize = numwords;
X 	readdict ();
X 
X 	if ((statf = fopen (Sfile, "w")) == NULL) {
X 		fprintf (stderr, "Can't create %s\n", Sfile);
X 		exit (1);
X 	}
X 
X 	for (i = 0; i < NSTAT; i++)
X 		stats[i] = 0;
X 	for (i = 0; i < hashsize; i++) {
X 		struct dent *dp;
X 		int j;
X 		if (hashtbl[i].used == 0) {
X 			stats[0]++;
X 		} else {
X 			for (j = 1, dp = &hashtbl[i]; dp->next != NULL; j++, dp = dp->next)
X 				;
X 			if (j >= NSTAT)
X 				j = NSTAT - 1;
X 			stats[j]++;
X 		}
X 	}
X 	for (i = 0; i < NSTAT; i++)
X 		fprintf (statf, "%d: %d\n", i, stats[i]);
X 	fclose (statf);
X 
X 	filltable ();
X 
X 	output ();
X 	exit(0);
X }
X 
X output ()
X {
X 	FILE *outfile;
X 	struct hashheader hashheader;
X 	int strptr, n, i;
X 
X 	if ((outfile = fopen (Hfile, "w")) == NULL) {
X 		fprintf (stderr, "can't create %s\n",Hfile);
X 		return;
X 	}
X 	hashheader.magic = MAGIC;
X 	hashheader.stringsize = 0;
X 	hashheader.tblsize = hashsize;
X 	fwrite (&hashheader, sizeof hashheader, 1, outfile);
X 	strptr = 0;
X 	for (i = 0; i < hashsize; i++) {
X 		n = strlen (hashtbl[i].word) + 1;
X 		fwrite (hashtbl[i].word, n, 1, outfile);
X 		hashtbl[i].word = (char *)strptr;
X 		strptr += n;
X 	}
X 	for (i = 0; i < hashsize; i++) {
X 		if (hashtbl[i].next != 0) {
X 			int x;
X 			x = hashtbl[i].next - hashtbl;
X 			hashtbl[i].next = (struct dent *)x;
X 		} else {
X 			hashtbl[i].next = (struct dent *)-1;
X 		}
X 	}
X 	fwrite (hashtbl, sizeof (struct dent), hashsize, outfile);
X 	hashheader.stringsize = strptr;
X 	rewind (outfile);
X 	fwrite (&hashheader, sizeof hashheader, 1, outfile);
X 	fclose (outfile);
X }
X 
X filltable ()
X {
X 	struct dent *freepointer, *nextword, *dp;
X 	int i;
X 
X 	for (freepointer = hashtbl; freepointer->used; freepointer++)
X 		;
X 	for (nextword = hashtbl, i = numwords; i != 0; nextword++, i--) {
X 		if (nextword->used == 0) {
X 			continue;
X 		}
X 		if (nextword->next == NULL) {
X 			continue;
X 		}
X 		if (nextword->next >= hashtbl && nextword->next < hashtbl + hashsize) {
X 			continue;
X 		}
X 		dp = nextword;
X 		while (dp->next) {
X 			if (freepointer >= hashtbl + hashsize) {
X 				fprintf (stderr, "table overflow\n");
X 				getchar ();
X 				break;
X 			}
X 			*freepointer = *(dp->next);
X 			dp->next = freepointer;
X 			dp = freepointer;
X 
X 			while (freepointer->used)
X 				freepointer++;
X 		}
X 	}
X }
X 
X 
X readdict ()
X {
X 	struct dent d;
X 	char lbuf[100];
X 	FILE *dictf;
X 	int i;
X 	int h;
X 	char *p;
X 
X 	if ((dictf = fopen (Dfile, "r")) == NULL) {
X 		fprintf (stderr, "Can't open dictionary\n");
X 		exit (1);
X 	}
X 
X 	hashtbl = (struct dent *) calloc (numwords, sizeof (struct dent));
X 	if (hashtbl == NULL) {
X 		fprintf (stderr, "couldn't allocate hash table\n");
X 		exit (1);
X 	}
X 
X 	i = 0;
X 	while (fgets (lbuf, sizeof lbuf, dictf) != NULL) {
X 		if (i % 1000 == 0) {
X 			printf ("%d ", i);
X 			fflush (stdout);
X 		}
X 		i++;
X 
X 		p = &lbuf [ strlen (lbuf) - 1 ];
X 		if (*p == '\n')
X 			*p = 0;
X 
X 		if (makedent (lbuf, &d) < 0)
X 			continue;
X 
X 		d.word = malloc (strlen (lbuf) + 1);
X 		if (d.word == NULL) {
X 			fprintf (stderr, "couldn't allocate space for word %s\n", lbuf);
X 			exit (1);
X 		}
X 		strcpy (d.word, lbuf);
X 
X 		h = hash (lbuf, strlen (lbuf), hashsize);
X 
X 		if (hashtbl[h].used == 0) {
X 			hashtbl[h] = d;
X 
X 		} else {
X 			struct dent *dp;
X 
X 			dp = (struct dent *) malloc (sizeof (struct dent));
X 			if (dp == NULL) {
X 				fprintf (stderr, "couldn't allocate space for collision\n");
X 				exit (1);
X 			}
X 			*dp = d;
X 			dp->next = hashtbl[h].next;
X 			hashtbl[h].next = dp;
X 		}
X 	}
X 	printf ("\n");
X }
X 
X /*
X  * fill in the flags in d, and put a null after the word in s
X  */
X 
X makedent (lbuf, d)
X char *lbuf;
X struct dent *d;
X {
X 	char *p, *index();
X 
X 	d->next = NULL;
X 	d->used = 1;
X 	d->v_flag = 0;
X 	d->n_flag = 0;
X 	d->x_flag = 0;
X 	d->h_flag = 0;
X 	d->y_flag = 0;
X 	d->g_flag = 0;
X 	d->j_flag = 0;
X 	d->d_flag = 0;
X 	d->t_flag = 0;
X 	d->r_flag = 0;
X 	d->z_flag = 0;
X 	d->s_flag = 0;
X 	d->p_flag = 0;
X 	d->m_flag = 0;
X 
X 	p = index (lbuf, '/');
X 	if (p != NULL)
X 		*p = 0;
X 	if (strlen (lbuf) > WORDLEN - 1) {
X 		printf ("%s: word too big\n", lbuf);
X 		return (-1);
X 	}
X 
X 	if (p == NULL)
X 		return (0);
X 
X 	p++;
X 	while (*p != NULL) {
X 		switch (*p) {
X 		case 'V': d->v_flag = 1; break;
X 		case 'N': d->n_flag = 1; break;
X 		case 'X': d->x_flag = 1; break;
X 		case 'H': d->h_flag = 1; break;
X 		case 'Y': d->y_flag = 1; break;
X 		case 'G': d->g_flag = 1; break;
X 		case 'J': d->j_flag = 1; break;
X 		case 'D': d->d_flag = 1; break;
X 		case 'T': d->t_flag = 1; break;
X 		case 'R': d->r_flag = 1; break;
X 		case 'Z': d->z_flag = 1; break;
X 		case 'S': d->s_flag = 1; break;
X 		case 'P': d->p_flag = 1; break;
X 		case 'M': d->m_flag = 1; break;
X 		case 0:
X  			fprintf (stderr, "no key word %s\n", lbuf);
X 			continue;
X 		default:
X 			fprintf (stderr, "unknown flag %c word %s\n", 
X 					*p, lbuf);
X 			break;
X 		}
X 		p++;
X 		if (*p != '/' && *p != NULL && *p != '\n') {
X 			fprintf (stderr, "bad format %s (%c 0%o)\n", 
X 					lbuf, *p, *p);
X 			break;
X 		}
X 		if (*p)
X 			p++;
X 	
X 	}
X 	return (0);
X }
X 
X newcount ()
X {
X 	char buf[200];
X 	FILE *d;
X 	int i;
X 
X 	fprintf (stderr, "Counting words in dictionary ...\n");
X 
X 	if ((d = fopen (Dfile, "r")) == NULL) {
X 		fprintf (stderr, "Can't open dictionary\n");
X 		exit (1);
X 	}
X 
X 	i = 0;
X 	while (fgets (buf, sizeof buf, d) != NULL) {
X 		i++;
X 		if (i % 1000 == 0) {
X 			printf ("%d ", i);
X 			fflush (stdout);
X 		}
X 	}
X 	fclose (d);
X 	printf ("\n%d words\n", i);
X 	if ((d = fopen (Cfile, "w")) == NULL) {
X 		fprintf (stderr, "can't create %s\n", Cfile);
X 		exit (1);
X 	}
X 	fprintf (d, "%d\n", i);
X 	fclose (d);
X }
SHAR_EOF
if test 6459 -ne "`wc -c < 'buildhash.c'`"
then
	echo shar: error transmitting "'buildhash.c'" '(should have been 6459 characters)'
fi
fi # end of overwriting check
#	End of shell archive
exit 0
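
A note for readers following the munchlist.sh pipeline: the sketch below
isolates the two awk programs (AWKMUNCH and AWKUNMUNCH) so their effect can
be seen without running ispell.  The awk text is copied from munchlist.sh;
the function names "munch" and "unmunch" and the sample words are made up
for illustration, and the input is assumed to be sorted by root already, as
it is after the sort step in the script.

```shell
#!/bin/sh
# Sketch only: the awk programs are taken from munchlist.sh; the wrapper
# functions and sample data are hypothetical.

# Combine "ROOT/FLAG" lines (sorted by root) into one "ROOT/F1/F2..." line.
AWKMUNCH='
    {
    if ($1 != old1  &&  old1 != "")
        {
        print old1 suffixes
        suffixes = ""
        }
    old1 = $1
    for (i = 2;  i <= NF;  i++)
        suffixes = suffixes "/" $i
    }
    END { if (old1 != "") print old1 suffixes }'

# Break "ROOT/F1/F2..." up again: the bare root, then one flag per line.
AWKUNMUNCH='
    {
    print $1
    for (i = 2;  i <= NF;  i++)
        print $1 "/" $i
    }'

munch()   { awk -F/ "$AWKMUNCH"; }
unmunch() { awk -F/ "$AWKUNMUNCH"; }

# Five one-flag entries for two roots collapse to two lines.
munched=`printf 'BUILD/R\nBUILD/S\nBUILD/Z\nSKATE/D\nSKATE/R\n' | munch`
echo "$munched"

# Expanding the munched form gives each root followed by its flags.
unmunched=`echo "$munched" | unmunch`
echo "$unmunched"
```

With the sample input, the first echo should print BUILD/R/S/Z and
SKATE/D/R.  The second should print BUILD, BUILD/R, BUILD/S, BUILD/Z, SKATE,
SKATE/D and SKATE/R, one per line.  Note that AWKMUNCH only compares each
root with the previous line, so unsorted input would leave a root split
across several output lines.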
-- 

	Geoff Kuenning
	{hplabs,ihnp4}!trwrb!desint!geoff