Fast strcmp() wanted.
Carl Edman
cedman at lynx.ps.uci.edu
Fri Oct 5 10:17:10 AEST 1990
In article <1145 at exodus.Eng.Sun.COM> falk at peregrine.Sun.COM (Ed Falk) writes:
In article <6003 at hplabsz.HPL.HP.COM> sartin at hplabs.hp.com (Rob Sartin) writes:
>In article <CEDMAN.90Sep29091115 at lynx.ps.uci.edu> cedman at lynx.ps.uci.edu (Carl Edman) writes:
>>string structure (or better class, long live C++ ! :-) which calculates
>>a 32-bit CRC for each string the first time and stores it somewhere.
>>Then only 1 (inlined) longword-compare will do the stringcomparisons
>>for you.
>
>Afraid not. It'll give you an estimate of whether the strings match
>(correctly identifying those that don't). You will need to then
>actually compare the strings if they are the same. This method will
>also be unable to reproduce strcmp's behavior (strcmp returns a signed
>result indicated the <, =, > by being negative, zero, positive), it will
>only return a boolean (match, no match).
Also, these two strings
"ab\0x"
"ab\0y"
(where x and y are any garbage that happens to be in memory after the
terminating '\0') will be evaluated as unequal.
There are workarounds for both problems, of course, but I think there
won't be much of a performance improvement after you've done all it
requires to get it right.
Please , take note ! I did NOT suggest that this method is always the
best, but I gave 3 or 4 different methods, and some hints, about when
I would use which one.
And I still hold that the CRC-method could be by far the most
efficient under some conditions:
- each word is checked often. Then a small overhead like f.e.
cleaning up the garbage behind the end of the string for the
one time calculation wouldn't matter
- strings are long, so the overhead of 32-bit per string is
justified
- you only need to know if 2 strings are identical
- most compares between unequal strings
You can even imagine systems where it could be possible to remove
the actual string from memory, and simply assume that if the 32-bit
CRC match, the strings match. Such systems would have to be tolerant
about an occasional mismatch.
If memory serves correctly the above approach is used in some
implementations for diff. (only to give one practical, real world
example)
Carl Edman
Theorectial Physicist,N.:A physicist whose | Send mail
existence is postulated, to make the numbers | to
balance but who is never actually observed | cedman at golem.ps.uci.edu
in the laboratory. | edmanc at uciph0.ps.uci.edu
More information about the Alt.sources
mailing list