fuzzy strcmp
Istvan Mohos
istvan at hhb.UUCP
Fri Dec 22 22:38:39 AEST 1989
tchrist at convexe.uucp (Tom Christiansen @ Convex Computer) writes:
>I'm looking for an algorithm that would allow me to determine
>whether two strings were similar. Thus
>
> "abcde" !~ "xyzzy"
> "this old man can read" =~ "that old man can't read"
>
>... perhaps just
> float strfzcmp(string1,string2)
I must confess, my first reaction was: thank God, Tom's finally found
a problem he can't solve in Perl. :-)
You may want to try running the *diff* algorithm along the individual
characters of the two strings (rather than applying it to successive
lines of two files); the ratio of the number of failed (unmatched) chars
to the combined byte count of the two strings is a dandy float in the
range 0. to 1.
Thus,
strfzcmp("abcde","xyzzy") --> 1.
strfzcmp("this old man can read","that old man can't read") --> .136363..
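For the curious, here is a minimal C sketch of that idea. It is my own
illustration, not code from diff. It assumes the "diff" step can be reduced
to a longest common subsequence (LCS) of the two strings, with every char
outside the LCS counted as failed. That reading reproduces both figures
above. The const char * signature, the -1. return on allocation failure and
the STRFZCMP_DEMO macro are hypothetical choices of mine.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * strfzcmp: fuzzy string comparison (illustrative sketch).
 * Returns (unmatched chars) / (combined length), where the matched
 * chars form a longest common subsequence of s1 and s2.
 * 0. means identical, 1. means nothing in common.
 * Returns -1. if memory allocation fails (hypothetical convention).
 */
float strfzcmp(const char *s1, const char *s2)
{
    size_t n1 = strlen(s1), n2 = strlen(s2);
    size_t total = n1 + n2;
    size_t *prev, *curr, *tmp, lcs;

    if (total == 0)
        return 0.0f;            /* two empty strings are identical */

    /* Two rolling rows of the LCS dynamic-programming table;
     * calloc zeroes column 0, which is never written afterwards. */
    prev = calloc(n2 + 1, sizeof *prev);
    curr = calloc(n2 + 1, sizeof *curr);
    if (prev == NULL || curr == NULL) {
        free(prev);
        free(curr);
        return -1.0f;
    }

    for (size_t i = 1; i <= n1; i++) {
        for (size_t j = 1; j <= n2; j++) {
            if (s1[i - 1] == s2[j - 1])
                curr[j] = prev[j - 1] + 1;      /* extend the match */
            else
                /* best of skipping s1[i-1] or skipping s2[j-1] */
                curr[j] = prev[j] > curr[j - 1] ? prev[j] : curr[j - 1];
        }
        tmp = prev; prev = curr; curr = tmp;    /* row i becomes "previous" */
    }
    lcs = prev[n2];             /* after the last swap, prev holds row n1 */
    free(prev);
    free(curr);

    /* Each char outside the LCS is one failed char: a deletion from
     * s1 or an insertion into s2, as diff would report it. */
    return (float)(total - 2 * lcs) / (float)total;
}

#ifdef STRFZCMP_DEMO
int main(void)
{
    printf("%f\n", strfzcmp("abcde", "xyzzy"));
    printf("%f\n", strfzcmp("this old man can read", "that old man can't read"));
    return 0;
}
#endif
```

Compiled with -DSTRFZCMP_DEMO, this prints 1.000000 and 0.136364. The
second figure is 6/44: "is" -> "at" costs 2 deletions plus 2 insertions,
and "'t" costs 2 more insertions, over 21 + 23 = 44 chars. The table costs
O(n1*n2) time but only two rows of memory, which is fine for short strings.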
--
Istvan Mohos
...uunet!pyrdc!pyrnj!hhb!istvan
HHB Systems 1000 Wyckoff Ave. Mahwah NJ 07430 201-848-8000
====================================================================
More information about the Comp.unix.wizards
mailing list