Soundex algorithm
Chris Torek
chris at mimsy.UUCP
Tue Jul 12 07:51:27 AEST 1988
[I have deleted groups comp.theory and comp.ai since Soundex has little
to do with these]
In article <12520 at sunybcs.UUCP> stewart at sunybcs.uucp (Norman R. Stewart)
writes:
>2: Apply the following rules to produce a code of one letter and
> three numbers.
> A: The first letter of the word becomes the initial character
> in the code.
> B: When two or more letters from the same group occur together
> only the first is coded.
> C: If two letters from the same group are separated by an H or
> a W, code only the first.
> D: Group 7 letters are never coded (this does not include the
> first letter in the word, which is always coded).
[I thought Soundex codes were usually fixed at four symbols.]
What if more than two letters from the same group are separated by H
or W? For instance: FDHTWTHTWL. Is this encoded as F334 or as F34?
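To make the two readings concrete, here is a rough C sketch. It is not taken from the quoted article. The group table is an assumption (the usual Soundex grouping, with vowels plus H, W and Y as group 7), the names group_of and soundex_sketch are made up for illustration, and the "pairwise" flag is only one guess at how rule C might be applied two letters at a time. The code is capped at one letter plus three digits and is not padded.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* ASSUMPTION: the usual Soundex groups; 7 marks letters that are never coded.
 * Indexed by letter, A..Z. Relies on a contiguous alphabet (e.g. ASCII). */
static int group_of(int c)
{
    static const char table[] = "71237127722455712623717272";
    c = toupper((unsigned char)c);
    if (c < 'A' || c > 'Z')
        return 0;                       /* not a letter */
    return table[c - 'A'] - '0';
}

/* Hypothetical helper. Writes at most one letter plus three digits to out
 * and returns the number of symbols written (0 on bad input).
 * pairwise == 0: H and W are transparent, so any number of same-group
 *                letters separated only by H or W collapse into one code.
 * pairwise != 0: rule C is read as covering exactly two letters; once a
 *                letter is dropped across an H or W, the next same-group
 *                letter is coded again. */
size_t soundex_sketch(const char *word, int pairwise, char *out, size_t outsz)
{
    size_t n = 0;
    int prev;            /* group that currently suppresses duplicates */
    int across_hw = 0;   /* saw H or W since the last coded/dropped letter */
    const char *p;

    if (out == NULL || outsz == 0)
        return 0;
    out[0] = '\0';
    if (word == NULL || outsz < 5 || group_of(word[0]) == 0)
        return 0;

    out[n++] = (char)toupper((unsigned char)word[0]);   /* rule A */
    prev = group_of(word[0]);

    for (p = word + 1; *p != '\0' && n < 4; p++) {
        int c = toupper((unsigned char)*p);
        int g = group_of(c);

        if (g == 0)
            continue;                   /* skip non-letters */
        if (c == 'H' || c == 'W') {     /* rule C: does not break a run */
            across_hw = 1;
            continue;
        }
        if (g == 7) {                   /* rule D: vowels and Y end the run */
            prev = 0;
            across_hw = 0;
            continue;
        }
        if (g == prev) {                /* rules B and C: drop the duplicate */
            if (pairwise && across_hw)
                prev = 0;               /* this pair is used up */
            across_hw = 0;
            continue;
        }
        out[n++] = (char)('0' + g);
        prev = g;
        across_hw = 0;
    }
    out[n] = '\0';
    return n;
}
```

With a buffer of at least five chars, soundex_sketch("FDHTWTHTWL", 0, buf, sizeof buf) leaves "F34" in buf, and the same call with pairwise set to 1 leaves "F334". Which of the two the rules intend is exactly the open question.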
The table has L=4, R=6; I find this surprising, as both R and L are
semivowels and they are easily confused by those who did not grow up
with the distinction (e.g., some Orientals).
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain: chris at mimsy.umd.edu Path: uunet!mimsy!chris