String matching - parallel/concurrent/randomized info needed
William H Hsu
hsu_wh at jhunix.HCF.JHU.EDU
Thu Jul 19 00:26:44 AEST 1990
I am trying to implement some good partial string matching
functions for use with strictly binary files (specifically, the
resource forks of Macintosh files). I have encountered several
compiler and language specific problems while trying to get two
algorithms (one generic dynamic pattern matching scheme and one
randomized version of the Boyer-Moore/BMG algorithm) to work in
C.
First, I'm looking for a way to parallelize my string
matching. I have seen references to a concurrent version of
Boyer-Moore and dynamic algorithms that run in O(kn) time for k-
approximate matches in text length n (Landau and Vishkin, 18th
Annual ACM Symposium), but have been unable to get my hands on
real or pseudocode. Since I am looking for 10-100 short, random
strings, it would boost my program efficiency greatly to be able
to search for multiple patterns at once. Has anyone implemented
a concurrent version of BMG or a dynamic partial-matching
algorithm in C (without parallel hardware)?
Second, I'm sure that random number generation is a
universally known problem, but I have had problems using srand()
in the right place to get truly "random" pattern strings. rand()
is used to generate the starting location for the source string.
But I frequently get consecutive duplicate strings (four or five
"two in a row" strings per hundred). My randomization call is:
srand((unsigned)time(NULL));
How can the "randomness" be improved? I have observed that
moving the line deeper into my nested while/for loops causes a
decrease in randomness.
Please E-mail responses to: hsu_wh at jhunix.hcf.jhu.edu
(BITNET), hsu at cs.jhu.edu (ARPAnet)
More information about the Comp.lang.c
mailing list