strstr sources: a summary of responses
Manoj Srivastava
srivasta at umvlsi.ecs.umass.edu
Sun Aug 26 06:37:13 AEST 1990
I want to thank all the people who replied to my request for a
srtstr function, I really appreciated the response I got.
Most of the answers were some variation of the linear (brute
force) sub-string match algorithm O(NM) in the worst case [where N =
strlen (text) and M = strlen (substring)]. Here is a gem:
--------cut here--------
#include <string.h>
char *strstr(register char const *s, register char const *t) {
do {
register char const *ss = s;
register char const *tt = t;
do {
if (*tt == '\0') return ((char *)s);
} while (*ss++ == *tt++);
} while (*s++ != '\0');
return (NULL);
}
--------cut here--------
There were other examples, but since all of them are a
variation of the same theme, I shall not waste net bandwidth by
posting them all.
In the mean while I had hacked up the following based on the
Knuth-Morris-Pratt method [I guess I should have made it clear that I
was looking for "safeguards" which I (naively) assume are built into
library routines as opposed to code by "mere" programmers ;-) From
the lib excerpts which were posted I guess lib programmers are human
too.;-) ;-) ]. This is O(N+M) in the worst case.
--------cut here--------
#include <string.h>
char *
strstr (char * cs, char * ct)
{
register int i;
register int j;
int M = strlen (ct);
int N = strlen (cs);
int * next = (int *) malloc ((M + 1) * sizeof(int));
/* initialize the next array */
next[0] = -1;
for (i = 0, j = -1; i < M; i++, j++, next[i] = (ct[i] == ct[j]) ?
next[j] : j)
while ((j >= 0) && (ct[i] != ct[j]))
j = next[j];
/*now for the pattern match */
for (i = 0, j = 0; j < M && i < N - M + j + 1; i++, j++)
while ((j >= 0) && (cs[i] != ct[j]))
j = next[j];
free (next);
if (j == M)
return & cs [i - M];
else
return NULL;
}
--------cut here--------
And finally, an implementation of the Boyer-Moore algorithm. I
am not sure offhand, but while the worst case complexity remains
O(N+M), the avg case behaviour is O(N/M) ???
------8-<---Cut here---8-<-----
/* strstr, Copyright (c) 1990 by John Lacey. -*- C -*-
This program is the ANSI C strstr() string routine. It is
basically a string search routine.
This implementation uses the Boyer-Moore algorithm.
Some rough timings found this implementation about twice as fast as
the following straight string search.
char *
strstr( const char *text, const char * pattern )
{
int i = 0, j = 0;
while ( text[i] && pattern[j] )
{
i++;
j = ( text[i] == pattern[j] ) ? j + 1 : 0;
}
return (char *) ( ( pattern[j] ) ? NULL : text + i - j + 1 );
}
So, while the algorithm itself is more complicated, that complexity
is worth it.
*/
#include <limits.h>
#include <stdio.h>
#include <string.h>
char *
strstr( const char *text, const char * pattern )
{
int N = strlen( text );
int M = strlen( pattern );
int i = M - 1;
int i0, j, k;
int ch, d[SCHAR_MAX + 1];
for ( ch = 0; ch <= SCHAR_MAX; ch++ )
d[ch] = M;
for ( j = 0; j < M-1; j++ )
d[pattern[j]] = M - j - 1;
do
{
j = M - 1;
k = i;
while ( j >= 0 && pattern[j] == text[k] )
{
k--; j--;
}
i0 = i;
i += d[text[i]];
}
while ( j >= 0 && i < N );
return (char *)(( j < 0 ) ? text + i0 - M + 1 : NULL);
}
#ifdef TEST
main( int argc, char *argv[] )
{
char *status = NULL;
if ( argc != 3 )
{
puts( "Try again, with a text string and a pattern string." );
exit( 1 );
}
else
{
status = strstr( argv[1], argv[2] );
if ( status != NULL )
printf( "Found substring: %s\n", status );
else
printf( "Nothing doing, boss. I can't see a thing.\n" );
}
}
#endif /* TEST */
------8-<---Cut here---8-<-----
Man weeps to think that he will die so soon; woman, that she was born so long
ago. -- H. L. Mencken
More information about the Comp.lang.c
mailing list