Fast strcmp() wanted.

Carl Edman cedman at lynx.ps.uci.edu
Fri Oct 5 10:17:10 AEST 1990


In article <1145 at exodus.Eng.Sun.COM> falk at peregrine.Sun.COM (Ed Falk) writes:
   In article <6003 at hplabsz.HPL.HP.COM> sartin at hplabs.hp.com (Rob Sartin) writes:
   >In article <CEDMAN.90Sep29091115 at lynx.ps.uci.edu> cedman at lynx.ps.uci.edu (Carl Edman) writes:
   >>string structure (or better class, long live C++ ! :-) which calculates
   >>a 32-bit CRC for each string the first time and stores it somewhere.
   >>Then only 1 (inlined) longword-compare will do the stringcomparisons
   >>for you.
   >
   >Afraid not.  It'll give you an estimate of whether the strings match
   >(correctly identifying those that don't).  You will need to then
   >actually compare the strings if they are the same.  This method will
   >also be unable to reproduce strcmp's behavior (strcmp returns a signed
   >result indicated the <, =, > by being negative, zero, positive), it will
   >only return a boolean (match, no match).

   Also, these two strings

	   "ab\0x"
	   "ab\0y"

   (where x and y are any garbage that happens to be in memory after the
   terminating '\0') will be evaluated as unequal.

   There are workarounds for both problems, of course, but I think there
   won't be much of a performance improvement after you've done all it
   requires to get it right.

Please , take note ! I did NOT suggest that this method is always the
best, but I gave 3 or 4 different methods, and some hints, about when
I would use which one.
And I still hold that the CRC-method could be by far the most
efficient under some conditions:

	- each word is checked often. Then a small overhead like f.e.
	cleaning up the garbage behind the end of the string for the
	one time calculation wouldn't matter

	- strings are long, so the overhead of 32-bit per string is
	justified

	- you only need to know if 2 strings are identical

	- most compares between unequal strings

You can even imagine systems where it could be possible to remove
the actual string from memory, and simply assume that if the 32-bit
CRC match, the strings match. Such systems would have to be tolerant
about an occasional mismatch.

If memory serves correctly the above approach is used in some
implementations for diff. (only to give one practical, real world
example)

	Carl Edman



Theorectial Physicist,N.:A physicist whose   | Send mail
existence is postulated, to make the numbers |  to
balance but who is never actually observed   | cedman at golem.ps.uci.edu
in the laboratory.                           | edmanc at uciph0.ps.uci.edu



More information about the Alt.sources mailing list