Spelling utilities in C (Part 1 of 3)
Graham Toal
gtoal at tharr.UUCP
Thu Jun 28 17:01:49 AEST 1990
#!/bin/sh-----cut here-----cut here-----cut here-----cut here-----
# shar: Shell Archiver
# Run the following text with /bin/sh to create:
# README # copyr.h # dawg.h # dictlib.h # grope.h
cat - << \SHAR_EOF > README
This is a suite of programs for working with words. I am releasing
it to alt.sources primarily to invite people to test it on as many
platforms as possible (mail me back any bug reports please!) so that
the final code will be completely portable.
Unpack with unshar or sh < part1; sh < part2; sh < part3
Utilities for spelling checking and correction are supplied, but
the ultimate aim is to support all sorts of word-manipulation activities
such as a writers workbench and assorted word games (See PS).
This news posting is in three parts; part1: this file + headers;
part2: utility modules in .c files to be #include'd; part3: main
programs; just CC these and they should work -- no fancy makefiles
or link commands.
The code here doesn't have much of a user interface; I'm rather hoping
that people who pick it up will hook it in to their favourite operating
system as smoothly as they can. Perhaps someone with the time to
spare (who am I kidding :-( ) might integrate it with ispell for instance.
I've experimented with various interfaces already; I can accept TeX
input and write output suitable for correcting text in either emacs
or uEmacs. I'm not going to release these though, until I'm happy
with the internal code. Incidentally, the code will be totally
public domain - meaning that I've no objection to its being used
in commercial projects. HOWEVER I would appreciate if you didn't do so
until it is officially released (probably through comp.sources.misc)
because I would prefer to define portable file-formats and magic numbers
first.
OK, enough babble; here's how it all works:
1) acquire a list of words for yourself; I have one but at 690K it is
too big to post. (actually even these sources come to around 120K
so I hope they make it through OK. I don't think I have a shar
that splits, so I've divided into three shars myself.)
The word list should be sorted by ascii order.
2) cc dawg.c
I've deliberately made main programs into one source file, by
including *.c for utility modules; I know this is bad style but
it helps make compiling and porting easier (no worries about
long external names for instance, or non-case-sensitive linkers)
3) dawg yourdict dict
'yourdict' is your wordlist; but the second parameter should
be 'dict' at least while testing. This will generate dict.dwg
which is a compacted and *FAST* data structure for word access.
Building a dawg[See next section for meaning of name] from a word
list is a real memory pig; so I've written code which will let you
generate the dawg in chunks (traded off against a small loss of
compression density). (Interestingly enough it isn't a time-pig;
using a hash-table for node merging gives almost linear time - thanks
to Richard Hooker for that one).
If you don't have enough memory (say 2Mb?) you'll get a run-time
message suggesting that you recompile with cc -DSMALL_MEMORY dawg.c
(OK, so one day I'll make it select this mode itself at run-time)
IBM PC's and Mac's will get this mode by default. [See FOOTNOTE]
If you want to check that the data structure generated is ok,
3a) cc pdawg.c
3b) pdawg dict.dwg
This should decompile the data and print out the
original word list
4) cc dwgcheck.c
Compile a Mickey Mouse spelling checker
5) dwgcheck a few wurdz on the comand line ?
... and test it.
6) cc tell.c
Now try the 'advanced' stuff! - soundex correction... (I hates it :-()
7) tell me sume wrongli speld werds
and test it...
If you're getting into this stuff, here's another incidental
hack you might want to try...
7a) cc proot.c
7b) proot root
print out all words of form 'root*'
One day I'll hook this stuff into regexp; but not today :-)
----------
Enough examples for now. If that all worked you are doing well!
If it didn't, and you have the time, please find out what went wrong;
If you can fix it, mail me the fix please, else mail me a log of
the symptoms, such as compiler error messages.
By the way, I know that the spelling correction uses a tacky algorithm;
don't worry about it -- I'm working on some red hot stuff which will
put soundex to shame (which isn't hard :) ) [Late note; it now works
(apparently well) but I need a phonetic dictionary before it is useful :-(
Anyone got a phonetic dictionary? Alvey people out there?]
You might be interested in the data structures used; the main one is
a dawg - a 'Directed Acyclic Word Graph' -- it's essentially a
Directed Acyclic Graph implementing an FSM which describes the set of
all words in your lexicon. The name DAWG comes from Appel's in his paper
on Scrabble (although none of the code or algorithms do).
The second most important data structure is superimposed on this
one; it is a packing algorithm due to Liang (of TeX hyphenation
fame) which allows you to convert an implicitly linked trie
into an indexed trie *without* taking any more space. Liang is
a real smart cookie & it is a shame this technique of his is
not better known... (it should be up there among binary tree and
linked lists!)
OK, more examples:
8) cc pack.c
compile the packing code
9) pack dict.dwg dict.pck
this takes rather a long time; you'll get messages saying '1000 nodes'
'2000 nodes' etc roughly 1 every 20 seconds +/- 50%. There shouldn't
be more than a couple of screenfuls of these :-) I'll speed it up one
day.
If you want to check that this data structure generated is ok too,
9a) cc ppack.c
9b) ppack dict.pck
Just as you did for (3)
10) cc pckcheck.c
Just like dwgcheck but using the other type of data.
11) pckcheck sume moar possibly incorrectly speld woards
boring, huh?
12) Tha Tha Thats all folks!
---------------
Unless I've made some major cockup at the last minute, you should
actually have a reasonable chance of getting these working on your
machine, whatever it is. (Dec 20's and EBCDIC machines *possibly*
excepted - attempts welcome!)
As I said, please mail me your bugfixes, problems, suggestions etc.
By the way, the standard of coding isn't exceptionally high; this
implementation was always intended to be a prototype. A rewrite
of dawg.c and pack.c is definitely high on the priority list...
The actual application programs aren't too bad, but the interfaces
to the various utility routines are liable to change on an hour-by-hour
basis! (I've been trying to make them more consistent -- you should have
seen the last lot ;-) ).
However, the algorithms are all complete now; anyone who has ever needed
to convert a tree to a dag might be interested in the linear-time solution
in dawg.c - most other solutions I've seen have been NlogN or N^2.
If you're interested, there's a file dictlib.h which is a proposed interface
for the next iteration; comments are invited.
Share & Enjoy,
Graham Toal <gtoal at uk.ac.ed>
<FOOTNOTE>
I'm experimenting in this code with ways of finding out at compile
time various parameters needed for portability. There's a file
grope.h which does this. If things all work first time, great;
if not, you may have to change various #define's. The best way
to do that is to add code to grope.h which identifies your system,
and only make changes of the form:
#ifdef SYS_MYSYSTEM
#undef something
#define something (a different value)
#endif /* my system */
There are instructions for customising in grope.h itself. My overall
aim is to avoid special makefiles and special command line options.
(A bit ambitious I know, but nice idea if it works...)
(Steve Pemberton's config.c might be useful to you too if you are
hacking grope.h. It was reposted on usenet recently.)
</FOOTNOTE>
<PS>
Future hacks:
1) Anagrams
2) Crosswords
3) Scrabble
4) Anything where a text key can be used to lookup another
piece of text. This is implemented with the 'dawg_locate_prefix'
routine; it is effectively a cheap substitute for btrees etc.
(and better than a hash table because you can *do* things with it!)
4a) Reverse phone book 1234567=Fred Bloggs
4b) house style enforcer stupid=infelicitous
4c) uk/us spelling advisor sidewalk=pavement
4d) shorthand/macro expander f.y.i.=for your information
etc etc... (Most of wwb in fact)
5) Anything you can suggest! (or *implement* :-) )
</PS>
<MEMO>
Archive-name: dawgutils/part[1-3].shar
Reply-to: Graham Toal <gtoal%uk.ac.ed at nsfnet-relay.ac.uk>
1) Upload fixed version of dyntrie.c - remember bug with signed chars
being used as an index [comment rule in dawg.c?]
2) known design flaw: can't handle strings with 0 in them
3) known bug: dyntrie.c has problems if you enter a string
which *starts* with 255. This is due to sending 'end_of_node'
bit rather hackily. Can be fixed.
4) Test version to be posted to alt.sources by running on machine
with signed chars (eg MSDOS)
5) Remove hacky Malloc debugging macros in utils.c -- there might be
a problem if compiled on MSDOS but not under MICROSOFT C
6) Check that LIB_STRINGS is *not* defined when UX63 is set.
7) tweak the constants in pack.c to speed it up a lot
8) hack the pack.c heuristics into dyntrie.c to speed it up too.
</MEMO>
SHAR_EOF
cat - << \SHAR_EOF > copyr.h
/*
* Copyright 1989 Graham Toal & Richard Hooker
*
* Permission to use, copy, modify, and distribute this software and its
* documentation for any purpose with or without fee is hereby granted provided
* that the above copyright notice appear in all copies and that both that
* copyright notice and this permission notice appear in supporting
* documentation.
*
* This program is publicly available, but is NOT in the public domain. The
* difference is that copyrights granting rights for unrestricted use and
* redistribution have been placed on the software to identify its
* authors. You are allowed and encouraged to take this software and
* build commercial products, freeware products, shareware products, and
* any other kind of product you can think up, with the following restriction:
*
* If you redistribute the source to this program, or a derivitive of that
* source, you must include this copyright notice intact. If the system
* this source is distributed with is under a stricter license (such as
* a commercial license, the typical freeware "no commercial use" license,
* or the FSF copyleft) then you must provide the original source under the
* original terms.
*
* Edinburgh Software Products (E.S.P.) makes no representations about the
* suitability of this software for any purpose. It is provided "as is"
* without express or implied warranty.
*
* E.S.P. DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING
* ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT SHALL
* E.S.P. BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR
* ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER
* IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT
* OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
*
* Authors: Graham Toal & Richard Hooker, Edinburgh Software Products,
* 6 Pypers Wynd, Prestonpans, East Lothian, Scotland.
*/
SHAR_EOF
cat - << \SHAR_EOF > dawg.h
/**
*
* DAWG.H
*
* Header file for Directed Acyclic Word Graph access
*
* The format of a DAWG node (8-bit arbitrary data) is:
*
* 31 24 23 22 21 0
* +--------------------+---+---+--+-------------------------------------+
* | Letter | W | N |??| Node pointer |
* +--------------------+---+---+--+-------------------------------------+
*
* where N flags the last edge in a node and W flags an edge as the
* end of a word. 21 bits are used to store the node pointer, so the
* dawg can contain up to 262143 edges. (and ?? is reserved - all code
* generating dawgs should set this bit to 0 for now)
*
* The root node of the dawg is at address 1 (because node 0 is reserved
* for the node with no edges).
*
* **** PACKED tries do other things, still to be documented!
*
**/
#define TRUE (0==0)
#define FALSE (0!=0)
#define DAWG_MAGIC 0x01234567
#define PACK_MAGIC 0x34567890
#define TERMINAL_NODE 0
#define ROOT_NODE 1
#define V_END_OF_WORD 23
#define M_END_OF_WORD (1L << V_END_OF_WORD)
#define V_END_OF_NODE 22 /* Bit number of N */
#define M_END_OF_NODE (1L << V_END_OF_NODE) /* Can be tested for by <0 */
#define V_FREE 22 /* Overloading this bit, as
packed tries need no end-of-node */
#define M_FREE (1L << V_FREE)
#define V_LETTER 24
#define M_LETTER 0xFF
#define M_NODE_POINTER 0x1FFFFFL /* Bit mask for node pointer */
/* max_chars and base_chars are a hangover from the days when this
was trying to save space, and dawgs were only able to contain
characters in the range 'a'..'z' 'A'..'Z' (squeezed into 6 bits).
Now that we're '8-bit clean' we no longer need these. Later code
in fact doesn't support the old style; but some procedures still
subtract 'BASE_CHAR' from the character to normalize it. Since it
is now 0 it is a no-op... */
#define MAX_CHARS 256
#define BASE_CHAR 0
/* Enough? */
#define MAX_WORD_LEN 256
#define MAX_LINE 256
typedef long NODE;
typedef long INDEX;
SHAR_EOF
cat - << \SHAR_EOF > dictlib.h
/* This header file does not describe any of the code in the
accompanying files; it is a rough sketch of the NEXT iteration
of the spelling/word utilities. Comments are invited. Missing
functionality or bits that won't work should be pointed out
please! Send mail to gtoal at ed.ac.uk (via nsfnet-relay.ac.uk
from US sites)
many thanks.
Graham Toal 23/06/90
*/
typedef long INDEX;
typedef long NODE;
#define ROOTNODE 1L
typedef struct DICT {
/* Although primitive procedures will be provided for handling the
basic dawg array, by using this interface applications can
remain independent of the implementation details of the
dictionary. */
/* So far there are three forms of dict; 1: a simple dawg with edges
as 4-byte integers; each node (a set of branches) is stored
sequentially. 2: An *indexed* form of the above -- much faster
to lookup in, but slower to walk through. Although indexing
would normally be heavy on space, this uses Liang's packed
overlapping tries, thus using almost exactly the same space as
method 1. 3: A form of 1, but with all the stops pulled out
to get as much bit-level compaction as possible; edges can
be two bytes instead of 4. (by Keith Bostic) */
/* This struct contains both access procedures and information;
most fields will be filled in by our code when the dictionary
is loaded by load_dict(). The fields may be added to in
later releases, but they will always be upwards compatible;
none of this data is saved to disk in this format, so changing
the struct layout will not cause problems as long as the names
remain the same. */
/* Note that some procedures take a user-supplied 'apply' parameter;
This 'apply' procedure has a 'context' parameter which is merely
a handle to allow the user to pass data in to his code without
having to use global variables. If the user's apply procedure
returns anything other than 0, it will cause an early exit, with
the users return code being returned from the calling function. */
/* It is intended that these are used in an object-oriented way;
although we will supply an external dict_check(word, dict) procedure, for
instance, all it will do is call dict->check(word, dict->dawg) */
NODE *dawg;
/* Root of the dictionary tree */
char *dict_name, /* "smiths" - used in command lines etc. */
*dict_info, /* "Johns Smiths Rhyming Dictionary,\n
(c) Smith & Jones, 1892\n
Whitechapel, London.\n" */
*dict_fname, /* "/usr/dict/smiths" */
*lang_charset; /* "ISO 8859/1 Latin 1" */
/* These are filled in by us to make life easier for the applications
programmer. We'll supply a default table for simple 7-bit ascii
dawgs; but this mechanism allows us the option of including all
the salient points about the character set used in a dictionary
in the dictionary itself. Because they are char*'s, we'll probably
end up sharing the same table between multiple dictionaries --
so don't write to them. (Although you may replace the _pointer_
with one to a table of your own. */
/* Later note: I've just found out about LOCALE.H etc. This stuff
may therefore change... */
int *chartype;
/* 1 whitespace */
/* 2 punctuation */
/* 4 blank */
/* 8 lower case letter */
/* 16 upper case letter */
/* 32 (decimal) digit */
/* 64 underscore */
/* 128 A-F and a-f */
/* 256 Ignore words starting
with this char */
/* 512 Include in definition
of a word -- mainly
alpha but also hyphen
and apostrophe */
char *lower, /* personal implementation of tolower() */
*unaccented, /* map accented chars to non-accenmted */
*lexorder; /* an arbitrary ordering for sorting:
e-acute would come after e and before f
*/
int (* check) (NODE *base, INDEX state, struct DICT *d, char *word);
/* Check a word - exact match; return true/false */
int (* fuzzy) (NODE *base, INDEX state, struct DICT *d, char *word,
char *value);
/* Check a word - fuzzy case, accents, hyphens etc; return true/false */
/* value is filled in with the dictionary's idea of what the
word should look like. */
/* Fancier correction procedures may be synthesised out of user code and
'walk' */
int (* correct) (
char *word,
NODE *base, INDEX state,
/* Normally dict's root, but may be subtree */
struct DICT *d, /* lowercase/accent information from here */
int maxwords,
int order, /* 0: by alpha
1: by highest score first
2: by lexical ordering
(ignores case, accents, hyphens)
*/
int (* apply) (
char *word, /* Corrected word */
int score, /* Normalised to range 0..255 */
void *context),
void *context);
/* Offer spelling corrections for the
given word. If maxwords > 0, return
the best <maxwords> words in order
of most-likely first; otherwise
many words may be returned, in
alphabetical order, coupled with
a score: 255 means exact match; 0
means totally different */
/* returns 0 if user returned 0 for
all applications */
int (* walk) (
NODE *base, INDEX state,
int (* apply) (char *word, void *context),
void *context);
/* Apply the procedure given to every
word in the dictionary tree or subtree,
passing in the user-supplied context */
/* returns 0 if user returned 0 for
all applications */
int (* lookup) (
NODE *base, INDEX state,
char *key,
/* Out */
INDEX values);
/* Return the index of the subtree of the
dictionary which has 'key' as a prefix;
for instance, in a reverse telephone
book, you might find entries:
0874=Anytown
0875=Cockenzie
0875=Port Seton
0875=Prestonpans
0876=Toytown
In this case, if searching for "0875=",
a subtree would be returned containing:
Cockenzie
Port Seton
Prestonpans
This subtree would then be walked
using the 'walk' function and a user-
written 'apply' procedure. */
/* returns 0 if successful */
/* The following two are performance hints ONLY. They should not
affect the correct functioning of a program. */
int (* uncache) (struct DICT *d);/* The space a dictionary uses may be
reclaimed by calling this. If the
dictionary is subsequently accessed,
there *may* be a performance penalty --
for instance the dictionary may be
accessed off disk. However on a machine
with sufficient memory, the system
may choose to leave the data in ram
until, say, malloc runs out of space.
If there is no convenient mechanism
for organising the demand-recacheing,
this may be a no-op. */
/* returns 0 if successful */
int (* recache) (struct DICT *d);/* The data of the dictionary is reloaded
into active memory where it will stay */
/* returns 0 if successful */
/* May be extended in the future */
} DICT;
/*
dict_load
This is provided as a primitive so that dictionaries can be loaded
in the most efficient way the machine supports. The space for the dict
is supplied *by* this procedure - not *to* it. This allows a Mach (or Emas)
implementation to mmap (or Connect) the file in memory - the connection
address being chosen by the OS and outside our control. It also allows
systems like RISC OS on the Acorn Archimedes to use an optimised whole-
file load call to pull the file off disk into real ram in a single disk
operation.
Since the data parts of the type 1 & 2 dicts are just a set of 4-byte
integers, it is easy to correct a faulty byte-sex by running down this
array *once at start-up* to reverse the order. This is easy if the
data is loaded into physical ram. If it is connected as a file by a VM system
however, the VM system must support copy-on-write; if it only has read-only
mapping there must be two files available -- one of each sex.
A *possible* scheme on VM systems is to byte-sex-reverse a whole
page the first time that page is touched. This may be a lot of unneccesary
work - I'd recommend keeping a correct-sex file on-line.
This code will create the DICT struct, and fill in the various fields,
either from the dictionary file itself or from private knowledge if not available.
*/
int dict_load_file(char *fname, DICT **dict_struct);
/* fname is the literal file name */
/* dict_struct is allocated by us,
not by the user. */
/* returns 0 if successfully loaded */
int dict_load_dict(char *dname, DICT **dict_struct);
/* dname is the dictionary name,
e.g. "words", or "web2" etc.
The corresponding file will be searched
for via a system-dependent path, logical
variable, or fixed place. */
/* dict_struct is allocated by us,
not by the user. */
/* returns 0 if successfully loaded */
int dict_unload(DICT *d); /* Use when totally finished with data */
/* These are user-level veneers for the internal procedures used above.
Use them in preference to the above. Use the above only if
you are a speed-freak! (but disguise them in macros such as:
#define dict_check(dawg, dict, word) dict->check(dict->dawg, dict, word)
These macros will of course go 'Bang!' if dict ptr is unassigned or NULL.
*/
/* In the worst case, if a machine has a poor quality compiler which
doesn't support procedure variables well, these routines could be
the *acual* implementation rather than just a pointer to it. */
int check (NODE *base, INDEX state, DICT *d, char *word);
int fuzzy (NODE *base, INDEX state, DICT *d, char *word, char *value);
int correct (char *word,
NODE *base, INDEX state,
/* Normally dictionary root, but may be subtree */
DICT *d, /* lowercase/accent information from here */
int maxwords,
int order, /* 0: by alpha
1: by highest score first
2: by lexical ordering
*/
int apply(char *word, /* Corrected word */
int score, /* Normalised to range 0..255 */
void *context),
void *context);
int walk (NODE *base, INDEX state, /* Yoyo fanatics will be pleased
to learn that they can 'walk the dawg'... :-) */
int apply(char *word, void *context),
void *context);
int lookup (NODE *base, INDEX state,
char *key,
/* Out */
INDEX values);
/* Now here come the fast internal procedures which the above eventually call */
int dawg_check(NODE *base, INDEX state, char *word);
/* Standard dawg */
int pack_check(NODE *base, INDEX state, char *word);
/* Overlapped indexed dawg */
int bsd_check(NODE *base, INDEX state, char *word);
/* Space-optimised dawg (called bsd because Keith Bostic is working
on this format for the next bsd4.4 spelling checker) */
/* Must make the internl procs agree with these... save the macros for later...
#define dict_check(word,dict) \
(dict != NULL ? (dict->check(dict->dawg, ROOTNODE, word)) : -1)
*/
int dawg_walk(NODE *base, INDEX state,
int apply(char *word, void *context),
void *context);
int pack_walk(NODE *base, INDEX state,
int apply(char *word, void *context),
void *context);
int bsd_walk(NODE *base, INDEX state,
int apply(char *word, void *context),
void *context);
/*
#define dict_walk(dict, apply, context) \
(dict != NULL ? (dict->walk(dict->dawg, ROOTNODE, apply, context)) : -1)
*/
int dawg_lookup(NODE *base, INDEX state, char *key, /* Out */ INDEX values);
int pack_lookup(NODE *base, INDEX state, char *key, /* Out */ INDEX values);
int bsd_lookup(NODE *base, INDEX state, char *key, /* Out */ INDEX values);
/* etc... */
/* Get list of possible next characters from this point in the tree,
put it into user-supplied array branch[256]. Filled with
NULL if no branch, or an INDEX if it points to more of the tree.
Words which may terminate on a particular letter have terminate[let] != 0
The user-supplied terminate array is deliberately a long array to allow
for expansion; it may be used one day to return arbitrary codes such as
'Noun' etc...
*/
void dawg_branch(NODE *base, INDEX state, NODE *branch, long *terminate);
void pack_branch(NODE *base, INDEX state, NODE *branch, long *terminate);
void bsd_branch(NODE *base, INDEX state, NODE *branch, long *terminate);
/* No internal proc yet... add it in the morning. */
/* In fact there will be even simpler procedures which can be used by
people who prefer to include the whole source of this package rather
than just the library interface parts. In applications where utmost
speed is a neccessity (eg Scrabble(tm)) you could use these, or write
your own. But most of us can use this clean interface with unnoticable
overhead. */
SHAR_EOF
cat - << \SHAR_EOF > grope.h
#ifndef grope_h
#define grope_h 1
#ifdef TESTING_DEFS
/*###################################################################*/
/*
* This is an exercise to assist in writing portable programs.
*
* To use as a header file, eg "grope.h", just leave this file
* as it is. To use it to define new entries, rename it as
* "grope.c" and compile it as "cc -DTESTING_DEFS grope".
*
* To customise this test program for your system, I have set it up
* so that all features are enabled. If you find that one feature
* causes a compile-time error, test a suitable set of '#ifdef's for
* your machine and #undef the values below which are not appropriate.
*
* Please do your best to see that your tests are unique, and that
* you do not undo any tests already implemented.
*
* One last request; PLEASE PLEASE PLEASE send me any updates you
* may make. If you cannot get through to me on email, send me a
* paper copy (or even better, a floppy copy :-)) to Grahan Toal,
* c/o 6 Pypers wynd, Prestonpans, East Lothian, Scotland EH32 9AH.
*
* Graham Toal <gtoal at uk.ac.ed>
* [PS: I can read DOS and RISC OS floppies only]
*/
#define STDLIBS 1
#define PROTOTYPES 1
#define KNOWS_VOID 1
#define KNOWS_LINE 1
#define KNOWS_REGISTER 1
#endif /* TESTING_DEFS */
/*###################################################################*/
#define SYS_ANY 1
#define COMPILER_ANY 1
/* Don't yet know what machine it is. Add a test once identified. */
/*--------------------------*/
#ifdef MSDOS
#define SYS_MSDOS 1
#endif
/*--------------------------*/
#ifdef __STDC__
#define STDLIBS 1
#define PROTOTYPES 1
#define KNOWS_VOID 1
/* If the compiler defines STDC it should have these. We can undef
them later if the compiler was telling lies! */
#endif
/*--------------------------*/
#ifdef MPU8086
#define SYS_MSDOS 1
/* Aztec does not #define MSDOS.
We assume it is Aztec on MSDOS from the MPU8086.
*/
#ifdef __STDC__
#define COMPILER_AZTEC 1
/* Aztec is known to also define __STDC__ (illegally). Therefore if
__STDC__ is not defined, it is probably not Aztec */
#endif
#endif
#ifdef SYS_MSDOS
/*--------------------------*/
#ifdef __STDC__
/*----------------------*/
#define COMPILER_MICROSOFT 1
/* I assume that the combination of #define MSDOS and #define __STDC__
without any other collaboration means MICROSOFT. (Who, incidentally,
are being naughty by declaring __STDC__) */
#define KNOWS_LINE 1
#else
/*----------------------*/
#ifdef __ZTC__
/*------------------*/
#define COMPILER_ZORTECH 1
#undef STDLIBS
/* Another system without locale.h */
#define PROTOTYPES 1
#else
/*------------------*/
/* A non-std msdos compiler */
#undef STDLIBS
#undef PROTOTYPES
/*------------------*/
#endif
/*----------------------*/
#endif
/*--------------------------*/
#endif
#ifdef TURBOC
/* Although Turbo C correctly does not define __STDC__, it has SOME
standard libraries (but not all - locale.h?) and accepts prototypes. */
#undef STDLIBS
#define PROTOTYPES 1
#define SYS_MSDOS 1
#define COMPILER_TURBOC 1
/* TURBOC is defined, but has no value. This allows it to be tested
with #if as well as #ifdef. */
#endif
/*--------------------------*/
#ifdef COMPILER_MICROSOFT
#undef STDLIBS
/* Again, like Turbo C, does not know locale.h */
#define PROTOTYPES 1
#endif
/*--------------------------*/
#ifdef SYS_MSDOS
#define MEMMODELS 1
/* We assume ALL MSDOS machines use memory models */
#endif
/*--------------------------*/
#ifdef UX63
#undef STDLIBS
#undef PROTOTYPES
#define SYS_UX63 1
#define COMPILER_PCC 1
/*#define LIB_STRINGS 1 - apparently not... */
#endif
/*--------------------------*/
#ifdef sun
#undef STDLIBS
#undef PROTOTYPES
#define SYS_SUN 1
#define COMPILER_PCC 1
#endif
/*--------------------------*/
#ifdef THINK_C
#define SYS_MAC 1
#define COMPILER_THINKC 1
#define KNOWS_VOID 1
#endif
/*--------------------------*/
#ifdef sparc
#undef STDLIBS
#undef PROTOTYPES
#define SYS_SPARC 1
#define COMPILER_PCC 1
#endif
/*--------------------------*/
#ifdef ARM
#define SYS_RISCOS 1
/* I fear Acorn may define 'ARM' on both unix and risc os versions.
Should find out whether they define others as well, to differentiate. */
#endif
#ifdef __ARM__
#define SYS_RISCOS 1
/* I fear Acorn may define 'ARM' on both unix and risc os versions.
Should find out whether they define others as well, to differentiate. */
#endif
/*--------------------------*/
#ifdef SYS_RISCOS
#define COMPILER_NORCROFT 1
#define KNOWS_REGISTER 1
#define KNOWS_LINE 1
#endif
/*--------------------------*/
#ifdef vms
#define SYS_VMS 1
#endif
/*--------------------------*/
/*--------------------------*/
#ifdef SYS_UX63
#undef SYS_ANY
#endif
#ifdef SYS_ARM
#undef SYS_ANY
#endif
#ifdef SYS_MSDOS
#undef SYS_ANY
#endif
#ifdef SYS_SUN
#undef SYS_ANY
#endif
#ifdef SYS_SPARC
#undef SYS_ANY
#endif
#ifdef SYS_RISCOS
#undef SYS_ANY
#endif
#ifdef SYS_MAC
#undef SYS_ANY
#endif
#ifdef SYS_VMS
#undef SYS_ANY
#endif
/*--------------------------*/
#ifdef COMPILER_PCC
#undef COMPILER_ANY
#endif
#ifdef COMPILER_MICROSOFT
#undef COMPILER_ANY
#endif
#ifdef COMPILER_TURBOC
#undef COMPILER_ANY
#endif
#ifdef COMPILER_ZORTECH
#undef COMPILER_ANY
#endif
#ifdef COMPILER_AZTEC
#undef COMPILER_ANY
#endif
#ifdef COMPILER_NORCROFT
#undef COMPILER_ANY
#endif
#ifdef COMPILER_THINKC
#undef COMPILER_ANY
#endif
/*--------------------------*/
/* ##################################################################### */
#ifdef TESTING_DEFS
#include <stdio.h>
/* ======================================================================= */
#ifdef STDLIBS
/* If any of these fail, make sure STDLIBS is not
defined for your machine. */
#include <stdlib.h> /* STDLIBS should not be defined. */
#include <time.h> /* STDLIBS should not be defined. */
#include <locale.h> /* STDLIBS should not be defined. */
#endif
/* ======================================================================= */
#ifdef KNOWS_VOID
void test() { /* KNOWS_VOID should not be defined */
/* Make sure your ifdef's don't #define KNOWS_VOID if this fails */
}
#endif
/* ======================================================================= */
#ifdef KNOWS_REGISTER
int regtest() {
register int i = 0; /* KNOWS_REGISTER should not be defined */
/* Make sure your ifdef's don't #define KNOWS_REGISTER if this fails */
return(i);
}
#endif
/* ======================================================================= */
#ifdef PROTOTYPES
int main(int argc, char **argv) /* PROTOTYPES should not be defined */
/* If this fails, make sure PROTOTYPES is not defined for your machine. */
#else
int main(argc,argv)
int argc;
char **argv;
#endif
{
/*-------------------------------------------------------------------------*/
printf("We should know just what the machine is, or 'SYS_ANY':\n");
#ifdef SYS_UX63
printf("#define SYS_UX63 %d\n", SYS_UX63);
#endif
#ifdef SYS_ARM
printf("#define SYS_ARM %d\n", SYS_ARM);
#endif
#ifdef SYS_MSDOS
printf("#define SYS_MSDOS %d\n", SYS_MSDOS);
#endif
#ifdef SYS_SUN
printf("#define SYS_SUN %d\n", SYS_SUN);
#endif
#ifdef SYS_SPARC
printf("#define SYS_SPARC %d\n", SYS_SPARC);
#endif
#ifdef SYS_RISCOS
printf("#define SYS_RISCOS %d\n", SYS_RISCOS);
#endif
#ifdef SYS_MAC
printf("#define SYS_MAC %d\n", SYS_MAC);
#endif
#ifdef SYS_VMS
printf("#define SYS_VMS %d\n", SYS_VMS);
#endif
#ifdef SYS_ANY
printf("#define SYS_ANY %d\n", SYS_ANY);
#endif
/*-------------------------------------------------------------------------*/
printf("Either the machine has different memory models or not:\n");
#ifdef MEMMODELS
printf("#define MEMMODELS %d\n", MEMMODELS);
#else
printf("#undef MEMMODELS\n");
#endif
/*-------------------------------------------------------------------------*/
printf("One compiler name should be given, or 'COMPILER_ANY':\n");
#ifdef COMPILER_PCC
printf("#define COMPILER_PCC %d\n", COMPILER_PCC);
#endif
#ifdef COMPILER_MICROSOFT
printf("#define COMPILER_MICROSOFT %d\n", COMPILER_MICROSOFT);
#endif
#ifdef COMPILER_TURBOC
printf("#define COMPILER_TURBOC %d\n", COMPILER_TURBOC);
#endif
#ifdef COMPILER_ZORTECH
printf("#define COMPILER_ZORTECH %d\n", COMPILER_ZORTECH);
#endif
#ifdef COMPILER_AZTEC
printf("#define COMPILER_AZTEC %d\n", COMPILER_AZTEC);
/* Can exist on other machines as well as MSDOS */
#endif
#ifdef COMPILER_LATTICE
/* Currently coming through as 'COMPILER_ANY' - haven't found one to test */
/* Can exist on other machines as well as MSDOS. Meanwhile pass it in */
/* on the command_line? */
printf("#define COMPILER_LATTICE %d\n", COMPILER_LATTICE);
#endif
#ifdef COMPILER_GCC
/* Ditto. Test in appropriate places for each machine. */
printf("#define COMPILER_GCC %d\n", COMPILER_GCC);
#endif
#ifdef COMPILER_NORCROFT
printf("#define COMPILER_NORCROFT %d\n", COMPILER_NORCROFT);
#endif
#ifdef COMPILER_THINKC
printf("#define COMPILER_THINKC %d\n", COMPILER_THINKC);
#endif
#ifdef COMPILER_ANY
printf("#define COMPILER_ANY %d\n", COMPILER_ANY);
#endif
/*-------------------------------------------------------------------------*/
printf("Either the compiler accepts ANSI-like libraries or not:\n");
#ifdef STDLIBS
printf("#define STDLIBS %d\n",STDLIBS);
#else
printf("#undef STDLIBS\n");
#endif
/*-------------------------------------------------------------------------*/
printf("Either the machine accepts ANSI prototypes or not:\n");
#ifdef PROTOTYPES
printf("#define PROTOTYPES %d\n", PROTOTYPES);
#else
printf("#undef PROTOTYPES\n");
#endif
/*-------------------------------------------------------------------------*/
printf("Either the machine needs nonstandard #include <strings.h> or not:\n");
#ifdef LIB_STRINGS
printf("#define LIB_STRINGS %d\n", LIB_STRINGS);
#else
printf("#undef LIB_STRINGS\n");
#endif
/*-------------------------------------------------------------------------*/
printf("Either the machine accepts the 'void' keyword or not:\n");
#ifdef KNOWS_VOID
printf("#define KNOWS_VOID %d\n", KNOWS_VOID);
#else
printf("#undef KNOWS_VOID\n");
#endif
printf("Either the machine accepts the 'register' keyword or not:\n");
/*-------------------------------------------------------------------------*/
printf("Either the compiler accepts the '__LINE__' directive or not:\n");
#ifdef KNOWS_LINE
printf("#define KNOWS_LINE %d\n", KNOWS_LINE);
#else
printf("#undef KNOWS_LINE\n");
#endif
/*-------------------------------------------------------------------------*/
printf("Either the machine accepts the 'register' keyword or not:\n");
#ifdef KNOWS_REGISTER
printf("#define KNOWS_REGISTER %d\n", KNOWS_REGISTER);
#else
printf("#undef KNOWS_REGISTER\n");
#endif
/* Note - this is intended to be used only if your compiler accepts
'register' *AND* compiles code with it correctly! Some compilers
show up bugs when register optimisations are attempted! */
/*-------------------------------------------------------------------------*/
if (argv[argc] != 0) printf("Warning! argv[argc] != NULL !!! (Non ansii)\n");
}
#endif /* TESTING_DEFS */
#endif /* Already seen? */
SHAR_EOF
More information about the Alt.sources
mailing list