soundex algorithm wanted
Chris Torek
chris at umcp-cs.UUCP
Fri Sep 5 00:06:47 AEST 1986
In article <1239 at whuxl.UUCP> mike at whuxl.UUCP (BALDWIN) writes:
> register char c, lc, prev = '0';
`register int' generates better code on my compiler, and still works.
> if (isalpha(*name)) {
First you should test isascii(*name), since on many systems isalpha()
is only defined for ASCII values (a nit).
> lc = tolower(*name);
Watch out! Some tolower()s fail miserably if !isupper(c).
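The two cautions above (check for ASCII before calling isalpha(), and call tolower() only on upper-case letters) can be folded into one small helper. This is only a sketch in ANSI C: the name safe_lower is hypothetical and not from the posting. It tests the ASCII range by hand because isascii() is not part of standard C, and it assumes the default "C" locale.

```c
#include <ctype.h>

/*
 * Hypothetical helper (not from the original posting): return the
 * lower-case form of c if c is an ASCII letter, otherwise 0.
 */
int
safe_lower(int c)
{
	/* isascii() is an extension, so test the 7-bit range by hand */
	if (c < 0 || c > 127 || !isalpha(c))
		return 0;
	/* call tolower() only on upper case, as some old versions require */
	return isupper(c) ? tolower(c) : c;
}
```

A caller can then treat a zero return as "skip this character" without touching tolower() on input it may not handle.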
Anyway, assuming that the basic algorithm is ... sound, I would
change the driver routine, so:
#include <ctype.h>

#define	SDXLEN	4

char *
soundex(name)
	register char *name;
{
	static char buf[SDXLEN+1];
	static char codes[] = "01230120022455012623010202";
	register int c, i = 0, prev;
	char *strcpy();

#ifdef lint
	/* lint cannot tell that prev is set before used */
	prev = 0;
#endif
	(void) strcpy(buf, "a000");
	while ((c = *name++) != 0 && i < SDXLEN) {
		/*
		 * Throw out non-alphabetics, and convert upper case
		 * to lower.
		 */
		if (!isascii(c) || !isalpha(c))
			continue;
		if (isupper(c))
			c = tolower(c);
		/*
		 * Non-first characters must translate to non-zero codes
		 * that are different from the previous code; throw out
		 * those that translate to zero or to prev.
		 */
		if (i > 0 && ((c = codes[c - 'a']) == '0' || c == prev))
			continue;
		buf[i++] = prev = c;
	}
	return (buf);
}
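
For comparison, here is a sketch of a variant in ANSI C that follows the commonly described Soundex rules a little more closely. In the routine above, prev holds the first letter itself rather than its code, so a second letter with the same code is kept ("Pfister" comes out as p123). Vowels also do not reset prev, so "Tymczak" comes out as t520. The variant below remembers the code of the first letter, lets vowels reset prev, and treats h and w as transparent. The name soundex2 and the caller-supplied output buffer are assumptions made for illustration, not part of the posting, and the table lookup assumes an ASCII character set.

```c
#include <ctype.h>
#include <string.h>

#define SDXLEN 4

/*
 * Sketch of a Soundex variant (hypothetical name and interface, not
 * the original poster's code).  Writes SDXLEN characters plus a NUL
 * into out, which must hold at least SDXLEN+1 bytes, and returns out.
 */
char *
soundex2(const char *name, char *out)
{
	static const char codes[] = "01230120022455012623010202";
	int c, code, i = 0, prev = '0';

	strcpy(out, "a000");	/* same default as the routine above */
	while ((c = (unsigned char)*name++) != 0 && i < SDXLEN) {
		/* skip anything that is not an ASCII letter */
		if (c > 127 || !isalpha(c))
			continue;
		if (isupper(c))
			c = tolower(c);
		code = codes[c - 'a'];
		if (i == 0) {
			/* keep the first letter, but remember its code */
			out[i++] = c;
			prev = code;
			continue;
		}
		/* h and w are transparent: they do not reset prev */
		if (c == 'h' || c == 'w')
			continue;
		/* store non-zero codes that differ from the previous one */
		if (code != '0' && code != prev)
			out[i++] = code;
		/* vowels (code '0') reset prev, so a repeat is coded again */
		prev = code;
	}
	return out;
}
```

With a local buffer such as char b[SDXLEN+1], soundex2("Pfister", b) yields "p236" and soundex2("Tymczak", b) yields "t522". Passing the buffer in also avoids the static buf, so the result is not overwritten by the next call.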
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 1516)
UUCP: seismo!umcp-cs!chris
CSNet: chris at umcp-cs ARPA: chris at mimsy.umd.edu