shared libraries can be done right
Marc Sabatella
mjs at hpfcso.FC.HP.COM
Wed May 29 07:36:27 AEST 1991
For those of you who haven't put this discussion into your "kill" files by now:
I just finished reading through all the postings on shared libraries (a
coworker pointed me here; I don't normally read this group), and would like to
contribute some of my thoughts.
Bill, your proposal is in fact quite similar to the Apollo Domain system, and
what I last heard proposed for OSF/1, except that you propose a finer (page)
granularity. At the heart of all of these is the concept of
"pre-loading", where the first time a dynamically linked page is loaded, its
external references are fixed up. This assumes, as you explicitly stated, that
the resolutions will be the same for each program. Unfortunately this cannot
be guaranteed. The "malloc" example brought up by several people in response
to Alex's claim that shared libraries should be "simple and elegant"
demonstrates this well. A library may make calls to malloc(), but different
programs may provide their own definitions of malloc(), and the library's
references would have to be resolved differently for each. Some means must be
provided for this. Were it not for the desire to allow this sort of
interposition, shared libraries would be a great deal simpler than they are.
This is also why a segmented architecture is no panacea, and why position
independent code needs to have some indirection in it to be useful.
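To make the interposition problem concrete, here is a toy sketch (the file
names and the bump allocator are invented for illustration; this is not how
any particular loader works):

    /* libfoo.c -- imaginary library source */
    #include <stdlib.h>
    #include <string.h>

    char *dupstring(const char *s)
    {
        char *p = malloc(strlen(s) + 1);   /* which malloc is this? */
        if (p != NULL)
            strcpy(p, s);
        return p;
    }

    /* prog.c -- a program supplying its own malloc */
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    extern char *dupstring(const char *);

    void *malloc(size_t n)
    {
        /* trivial bump allocator, never frees; illustration only */
        void *p = sbrk((n + 7) & ~(size_t)7);
        return (p == (void *)-1) ? NULL : p;
    }

    int main(void)
    {
        char *s = dupstring("hello");
        if (s != NULL)
            puts(s);
        return 0;
    }

Linked against the archive library, dupstring()'s call binds to prog.c's
malloc(); if the shared library's pages had been fixed up once, system-wide,
to point at the C library's malloc(), that binding would be wrong for this
program.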
Some numbers from a paper I presented at last year's USENIX conference were
tossed around. Guy Harris wondered whether my claim that programs spend little
time in library routines was well-founded in the case of window programs. I
admit I don't know; I've heard it claimed from someone who implemented his own
shared library scheme that his applications spend 90% of their time in the X11
libraries! Note that library code tends to be "purer" than application code
(as several people pointed out, most of the C library is probably pure
already), so the
penalty for PIC (indirect reference to global data mainly) will be less than
the penalty I measured using Dhrystone and other standard benchmarks. The
indirect procedure calls (or calls through a jump table) will still hurt, but
I think Masataka-san greatly overestimates the effect this will have on most
programs. Do you really worry that much about 6 cycles per call? As for the
startup overhead, after the conference I took some of Donn Seeley's ideas and
tuned my dynamic loader. I ended up improving its performance by almost a
factor of two, and with tuning to the memory mapping kernel function, we ended
up getting the startup performance hit down to about 30 milliseconds per shared
library used by the program. This includes the amortized cost of dynamic
binding and relocation. The degradation in performance on SPEC is not
measurable.
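For scale, the indirection being argued about amounts to roughly the
following (a minimal sketch of the mechanism, not any particular
implementation's tables):

    #include <stdio.h>

    static int work(int x) { return x + 1; }

    /* In a real implementation the dynamic loader fills this in. */
    static int (*jump_table[])(int) = { work };

    int main(void)
    {
        int a = work(42);               /* direct call */
        int b = (*jump_table[0])(42);   /* indirect: one extra load */
        printf("%d %d\n", a, b);
        return 0;
    }

The extra load and indirect branch are where the handful of cycles per call
goes.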
As for the benefits, they are great indeed where disk space is concerned.
HP-UX shaved off 16 MB in core commands alone - i.e., not even including X11.
However,
there is a tradeoff here as well. Since the mapping operation generally
reserves swap for shared library data segments mapped copy on write, a program
that uses only a little of a library's static data segment may need more swap
space to execute than it would if it were linked with archive libraries. In
the shared case, swap is reserved for the whole library's data segment, but in
the archive case, only those few modules needed by the program are copied into
the a.out, so the data space for the rest of the library needs no swap at run
time. We measured up to 100K of "wasted" swap per process for Motif
applications.
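The mechanism behind that waste is roughly this (a sketch; the library name,
segment size, and offset are made up):

    #include <sys/mman.h>
    #include <fcntl.h>
    #include <unistd.h>

    #define DATA_SIZE (100 * 1024)   /* pretend data segment size */

    int main(void)
    {
        void *data;
        int fd = open("/usr/lib/libexample.sl", O_RDONLY);  /* invented */

        if (fd < 0)
            return 1;
        /* MAP_PRIVATE means copy-on-write: the kernel must reserve
           swap for the whole segment up front, even if the process
           ever touches only a page or two of it. */
        data = mmap(0, DATA_SIZE, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE, fd, 0);
        if (data == MAP_FAILED)
            return 1;
        /* ... run with the mapping ... */
        munmap(data, DATA_SIZE);
        close(fd);
        return 0;
    }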
As for memory savings, I tend to side with Masataka-san on this - you'll have
to prove it really does make a difference. So far, I've seen little other than
anecdotal evidence. There was a discussion earlier as to whether most of real
memory was being used for potentially shareable text, or for clearly unshareable
data, and I wish someone would produce some actual numbers. My gut feel is
that the savings from sharing even the X11 libraries' text won't amount to
much as far as really reducing memory consumption as long as huge amounts of
data are being hoarded. Sharing of the "xterm" and "[ck]sh" executables
themselves probably gets me most of the text savings I am going to get on most
systems with which I am familiar, but I probably don't live in the "real
world". In any case, the rumor about Sun implementing shared libraries
primarily to save disk space rings true of HP; any other benefits were gravy.
Note that providing a general purpose dynamic linking capability probably did
not weigh heavily for Sun, as they did not provide such a facility until SunOS
4.1.
Barry Margolin made a comment a while back that shared libraries should be
"linked on demand" as in Multics. That is in fact the way they are usually
done in Unix, at least for procedures. Data references are usually done up
front simply because most systems don't have Multics' whizzy architecture. But
note that in HP-UX, we are at least able to defer data resolution and dynamic
relocation until the first reference to any procedure defined in the module
within the library. We also have a way to defer evaluation of static
constructors using the same mechanism.
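The deferral mechanism looks roughly like this in miniature (names invented;
a real loader does the table patching in assembly stubs):

    #include <stdio.h>

    static void real_func(void)
    {
        puts("real function");
    }

    static void (*binding_table[1])(void);

    static void resolver_stub(void)
    {
        /* A real loader would look up the symbol here, apply the
           module's relocations, and run its static constructors. */
        binding_table[0] = real_func;
        binding_table[0]();            /* complete the original call */
    }

    int main(void)
    {
        binding_table[0] = resolver_stub;  /* state at program startup */
        binding_table[0]();   /* first call: resolve, then run */
        binding_table[0]();   /* later calls: direct through the table */
        return 0;
    }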
Multics actually "loads" a segment on first reference as well. Most Unix
implementations "map" libraries at startup time, and "load" pages on demand.
While we could certainly defer mapping as well, it is not clear to me that it
would be worthwhile. Typical programs use a handful of relatively large
libraries, each of which would tend to be referenced fairly soon, so the
deferral wouldn't buy much. If Unix switched to the Multics everything-is-a-
separate-little-segment approach, deferred mapping would appear to make more
sense, but we'd have to reduce the overhead of the mapping operation to make
this realistic.
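For what it's worth, a program can already approximate deferred mapping by
hand with the dlopen() interface (the library name and entry point here are
invented):

    #include <stdio.h>
    #include <dlfcn.h>

    int main(void)
    {
        void *handle;
        void (*fn)(void);

        /* Nothing is mapped until this call. */
        handle = dlopen("libexample.so", RTLD_LAZY);
        if (handle == NULL) {
            fprintf(stderr, "dlopen: %s\n", dlerror());
            return 1;
        }
        fn = (void (*)(void))dlsym(handle, "example_entry");
        if (fn != NULL)
            fn();
        dlclose(handle);
        return 0;
    }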
--------------
Marc Sabatella (marc at hpmonk.fc.hp.com)
Disclaimers:
2 + 2 = 3, for suitably small values of 2
Bill and Dave may not always agree with me