profil questions
aglew at ccvaxa.UUCP
Sat Feb 13 11:00:00 AEST 1988
I've been trying to go beyond the obvious uses of profil(2), and I have
some questions and musings:
(1) profil(buff,...)
char *buff;
On the systems I've looked at, buff is treated as an array of shorts.
Shouldn't UNIX be honest and say short *buff?
Most systems I know of have 16-bit shorts. Now, it occurs to me that I
might like to profile some really long-running programs - ones that run
for more than, say, 3 days - easily long enough to overflow 16-bit
profiling bins. As a start, shouldn't we say
typedef short *profilbinT?
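To make the complaint concrete, here is what a call looks like today.
This is just a sketch - the traditional profil(buff, bufsiz, offset,
scale) interface is assumed, and profilbinT is my suggestion, not
anything that exists in the headers:

    extern int profil();            /* char *buff; int bufsiz, offset, scale */

    typedef short *profilbinT;      /* what buff really is, today */

    int main()
    {
        static short bins[16384];   /* 16-bit counters */

        /* offset 0: text assumed to start at address 0 */
        profil((char *)bins, sizeof bins, 0, 0x10000);
        /* ... the code being profiled ... */
        profil((char *)0, 0, 0, 0); /* scale 0 turns profiling off */
        return 0;
    }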
(2) Exactly what is the correspondence between profil bins, code locations,
and actual instructions, particularly when scaling?
The man page says something about scale=0x10000 implying a one-to-one
correspondence between words of code and words (I assume counting bins)
in the buffer.
Now, on some systems an instruction can begin at an arbitrary byte
location. Does this mean that I should use a scale of 0x20000 to make
sure that I get a counting bin for the beginning of every possible
instruction on such a machine (e.g. a VAX)?
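My reading of the fixed-point format - an assumption on my part, not
something the man page actually states - is that scale/0x10000 is
"bins per 16-bit word of text", so:

    scale 0x10000 -> 1 bin per word:  bufsize == textsize
    scale 0x20000 -> 2 bins per word, i.e. 1 bin per byte:
                                      bufsize == 2 * textsize

If that is right, byte resolution on a VAX costs you a buffer twice the
size of the text.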
The man page says that scale = 0x8000 maps each pair of words of code
to a word in the buffer. Again, I assume that these are 16 bit words,
and words in the buffer refer to short counting bins.
I have observed the following correspondence of byte offsets from
the base code location to bin numbers, using scale 0x8000 ("2 to 1"):
W 0 - Byte 0 - Bin 0
      Byte 1 - Bin 0
W 1 - Byte 2 - Bin 1
      Byte 3 - Bin 1
W 2 - Byte 4 - Bin 1
      Byte 5 - Bin 1
W 3 - Byte 6 - Bin 2
What is the rationale here? It makes sense if instructions are 16 or 32
bits, the sampled PC points to the *next* instruction, and the machine
requires 32-bit instructions to begin on a 32-bit boundary: in a sequence
I16.I16.I32, the two adjacent 16-bit instructions then get counted in the
same bin. But it doesn't seem to make sense on a machine like the VAX.
Now, the same code is present on the 3B2 - does it go all the way back to
the PDP-11?
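For what it's worth, the following guess at the arithmetic - scale the
*word* offset of the PC as a 16.16 fixed-point multiply, and round to
the nearest bin rather than truncating - reproduces the table exactly.
This is reverse-engineered from the observations, not read out of any
kernel source:

    unsigned
    pc_to_bin(unsigned pc, unsigned base, unsigned scale)
    {
        unsigned words = (pc - base) >> 1;     /* PC as 16-bit words */
        return (words * scale + 0x8000) >> 16; /* scaled, rounded */
    }

With scale = 0x8000 this yields bins 0,0,1,1,1,1,2 for byte offsets 0
through 6; the rounding is what makes word 0 stand alone while words 1
and 2 share a bin.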
(3) How do the System V scale factors relate to the BSD scale factors?
i.e. SV 0177777 <-> 0x10000. Just add 1?
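My guess - and it is only a guess - is that both values mean "one word
of text per word of buffer", System V writing it as the largest
representable 16-bit fraction and BSD as an exact 1.0 in 16.16 fixed
point, so the difference really is just the 1:

    #define SV_FULL_SCALE   0177777     /* 0xffff, i.e. ~0.99998 */
    #define BSD_FULL_SCALE  0x10000     /* 65536,  i.e. exactly 1.0 */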
(4) What's all this scale garbage anyway? I'm sure it was a lot cheaper
on a small machine, but 16 bits just isn't capable of expressing some of
the fractions that might be appropriate on a large machine.
Say I have a program with 256M of text that I want to divide into 4
counting bins - can I do that with profil? Maybe I can't afford to give
it 64K of counters.
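Working out the arithmetic, assuming the fraction is scale/65536 ==
bufbytes/textbytes:

    4 bins * 2 bytes over 256M of text:
        scale = 2^16 * 2^3 / 2^28 = 2^-9
    which truncates to 0 - not expressible in 16 bits at all;

    64K bins (128K of counters) over the same text:
        scale = 2^16 * 2^17 / 2^28 = 32
    which fits, but only if I can afford the 128K of counters.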
Maybe the scale argument should be made into a floating point number.
But single precision floating point may only give you 6-8 decimal digits
of accuracy, not enough to scale properly on *really* large programs.
Maybe the scale argument should be a shift factor, specifying
the power of two to divide by?
Or maybe there should be no scale argument at all, just a
(CodeBottom,CodeTop) pair, letting the system decide what an appropriate
representation is. After all, you are guaranteed that the addresses are
representable, in as portable a form as a C pointer provides.
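Something like this sketch - profil2 is a made-up name, not a real
system call - where the caller hands over a buffer and a range and the
system picks a power-of-two granularity:

    /* hypothetical replacement for profil(2) */
    int profil2(short *bins, unsigned nbins,
                char *code_bottom, char *code_top);

    /*
     * Inside, the system might just pick the smallest shift s with
     *     ((code_top - code_bottom - 1) >> s) < nbins
     * and then count with
     *     bin = (pc - code_bottom) >> s;
     */

No 16-bit fraction anywhere, and it degrades gracefully: at worst some
of the bins you asked for go unused.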
Andy "Krazy" Glew. Gould CSD-Urbana. 1101 E. University, Urbana, IL 61801
aglew at gould.com - preferred, if you have nameserver
aglew at gswd-vms.gould.com - if you don't
aglew at gswd-vms.arpa - if you use DoD hosttable
aglew%mycroft at gswd-vms.arpa - domains are supposed to make things easier?
My opinions are my own, and are not the opinions of my employer, or any
other organisation. I indicate my company only so that the reader may
account for any possible bias I may have towards our products.