Need help on a search and replace algorhythm

The Idealistic Cynic sgilley at cbnewsl.att.com
Sat Oct 13 06:27:22 AEST 1990


I'm having trouble getting a search and replace
algorhythm fast enough in an editor I'm writing.

The search is fast enough, and what is produced is a
list (in an integer array) of the character offsets
in the file of all matching strings.

What I've got is an array of structures that looks like:

	struct {
		int offset;
		int line;
	} itb[50000];

itb[n].offset contains the number of characters into
the file that line lays, and itb[n].length contains
the length of the line.

I also have an array:

	int stb[55000];

that contains the line numbers sorted by offset.

Finally, I have a last arry:

	int ar[5000];

that contains a list of all offsets where a match was
found.

At least one problem stems from the structure of the
temporary file.  Originally, the temp file looks exactly
like the original file, but when a line is changed,
the new line is written at the end of the file, and the
structure containing the offset and length of each line
is updated to show these new values.  So the search 
may return values that are no longer in valid lines
within the file.

Currently my algorhythm looks like:

	for all offsets of matches do {
		get line number offset is contained within by
			binary search
		seek to that line number
		read that line in
		change that line
	}

	/* change algorhytm */
	copy into temp buffer line up to change point
	copy change into temp buffer
	copy line after match into temp buffer
	write newline out to file

I have a feeling that all the lseeks are part of whats causing the
problem -- lseek to read, then lseek to write for each.  I've
done some stuff to optimize -- made integers register ints for
example -- but a global change still takes too long.

Anybody got any ideas, or could someone point me toward a really
efficient search and replace algorhythm that I can use or
adapt to my file structures?

I'd appreciate any help anyone can give.

Sean.

---
Sean L. Gilley
attmail!ignats!slg  attmail!sgilley
201 805 9088 (h)    201 457 5403 (w)



More information about the Comp.lang.c mailing list