GNU Emacs, memory usage, releasing
Craig Finseth
fin at uh.msc.umn.edu
Thu Jan 4 09:00:09 AEST 1990
As the author of "The Emacs Cookbook" (actual title: "Theory and
Practise of Text Editing --or-- A Cookbook for an Emacs", MIT
Laboratory for Computer Science (LCS) Technical Memo 165, May 1980 (it
was my B.S. thesis)), I feel that I can make a useful contribution to
this discussion.
Linked line vs. buffer gap: I mention in pages 33+ that of the two,
buffer gap is the preferred technology for paged virtual memory
systems.
As a theoretical point, you're always going to be in trouble
if your buffer size is larger than your (allowed by the o.s.)
working set size. I contend that you are both in *less*
trouble and the trouble point is moved up in a buffer gap
system.
Tim Bray makes an exellent point:
"Hold on. What makes you think you know what the problem is? Have you
instrumented it with the profiler and quantified the overhead of this
"problem"? My own intuition is that GNUmacs spends much more time
CONSing and otherwise chugging through the innards of the Lisp
interpreter. But I wouldn't bet $0.50 on my intuition unless I had a
good quantitative understanding of what's going on."
Aside from a few pathological cases (RMAIL being a notable one), I
would be surprized if gap motion was as significant factor on general
editing operations.
Editing is such a general activity, and GNU-Emacs is used for such a
wide variety of purposes, that it would be impractical to optimize its
performance in all cases. Instead, the trick (as in all programming)
is to identify the frequent cases and optimize them. In particular, I
consider editing small files and *viewing* large files to both be
frequent: editing a large file (especially the "insert at beginning /
insert at end / insert at beginning" case) is comparatively rare.
Piercarlo Grandi then presented some data. I took the liberty of
combining the user and system times (as we mainly care about wall
clock time, not who is paying $$$) and figuring the incremental times
for each operation:
OPERATION Emacs 18.54 MicroGNU 2a Jove 4.12
total inc total inc total inc
1) startup 1.55 1.55 0.12 0.12 0.79 0.79
2) C-X C-F 2.47 0.92 1.15 1.03 1.84 1.05
3) C-X C-Q 2.62 0.15 1.21 0.06 1.86 0.02
4) SP 2.88 0.16 1.22 0.01 1.86 0.00
5) M-> 3.10 0.22 1.31 0.09 1.93 0.07
6) SP 3.29 0.19 1.31 0.00 1.93 0.00
7) SP 3.41 0.12 1.31 0.00 1.93 0.00
8) M-< 3.99 0.58 1.38 0.07 2.05 0.12
9) SP 4.35 0.36 1.57 0.19 2.05 0.00
Comments on Piercarlo's comments:
1) Is just to invoke an empty editor. GNU Emacs pays dearly for its
size and complication here.
2) Reads 80KB in. GNU is relativley quick, it just copies the file
to memory. MicroGnu reads it into memory and threads it into a list
of lines, and Jove copies it to a temporary file and threads
into a list the offsets of lines in this file.
If "GNU is relativley quick" and GNU takes twice about as long as the
others, what is the definition of "quickness"?
3) This is only to remove the RO property from the buffer.
4) Insert a space at the beginning of buffer. GNU has to move
the gap, which is initially at the end of memory, and pays a
lot here. The other two take less than a centisecond, GNU
seven. This is 70 milliseconds for moving 80KB, or somewhat
more than a MB per second on a 4 MIPS machine. Note the
relatively high system time for Emacs. Thi is due mostly to
redisplay.
Actually, it takes GNU about the same time to do this as to clear the
RO property. Moving the gap is thus probably not the major problem
here.
5) Go to the end of buffer. This involves no gap movement,
just redisplay. System times show it. On my 386 the frame
buffer (an Hercules clone) is abjectly slow, and the device
driver for it correspondinglu clocks up system time.
Again, this takes about 50% longer than 4, with no gap motion
involved. I would start to suspect redisplay as the real culprit...
6) Insert as space as the last chracter. This moves the gap again, and
it shows. Also redisplay.
Now we have a drop in time with the gap motion. Less redisplay, too.
Again, I would really focus on the redisplay time and ignore the gap
motion time (unless it is trivial to speed up).
7) Add a second space at the end. Just redisplay really, and
minimal as to that.
8) Go back to the beginning of buffer.
9) insert another space there.
8 and 9 further confirm that gap motion is not the major problem WHEN
CONSIDERING THE SYSTEM AS A WHOLE.
>From the same message:
"The lesson I derive from these timings is that creating the
linked list of lines, and especially copying the file to a
temporary as well, slow down file reading time, but then
further operations become very much faster. Note also that
both MicrGnu and Jove are somewhat carelessly coded, with lots
of quadratic algorithms."
I claim that the data does not support this conclusion. This does not
mean that the conclusion is incorrect. Rather, the data supports the
conclusion that GNU-Emacs' redisplay is slow on the specified computer.
This ananlysis parallels that supplied by Ian Dall.
Jonathan Payne supplied an excellent summary of how a buffer gap
editor works and the problems with redisplay. As it turns out, even
basic redisplay is a hard problem (Knuth "The Art of Computer
Programming" level 40). Doing a full redisplay correctly (handling
variable-width characters, unlimited line lengths, multiple windows,
etc.) is HARD (Knuth level 50).
Some Historical Notes:
Early drafts of the thesis had a chapter "proving" that only a
mainframe-type computer could run a full Emacs-type editor. One
brilliant insight (:-) later, the chapter was cut and a software
company was formed (Mark of the Unicorn). The Mince text editor was
not interactively extensible, but was otherwise a full Emacs running
on a 48K CP/M system with small floppy disks.
The scheme that was used is called paged buffer gap and it is briefly
mentioned on page 36 of the thesis. The basic idea is that a file is
represented as an array of pages. Each ~1K page is a separate buffer
gap system. This representation was very efficient for the
environment, especially as it formed a natural basis for the paged
virtual memory environment required to edit files larger than will fit
in memory.
I contend that in a "modern workstation environment" (e.g., Sun 3/60),
a simple buffer gap method is preferred over both paged buffer gap and
linked line. I leave it as an excercise for the reader to figure out
why.
Craig A. Finseth fin at msc.umn.edu [CAF13]
Minnesota Supercomputer Center, Inc. +1 612 624 3375
More information about the Comp.unix.wizards
mailing list