Some hacks I'll share!

Fri Mar 9 19:10:51 AEST 1984

   I was doing some playing this evening and guess what I found out:

		1) On the PDP 11/44 a floating point (double precision)
		   clear (8 bytes) is almost exactly twice as fast as
		   the four integer clr instructions (2 bytes each) it
		   replaces.  I replaced the code in clrbuf (in bio.c)
		   with floating point clears to speed it up.
		2) A floating point load (double precision again)
		   followed by a floating point store is just a wee
		   bit faster than the four 'mov' instructions needed
		   to move the same 8 bytes (assuming the cache is
		   disabled).  I'll bet that on the 11/70 you could use
		   floating point load/stores for twice the speed of
		   conventional movs.
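
  To make point 1 concrete, here is a rough C sketch (not the actual
  clrbuf code) of the two ways to clear a buffer: one 2-byte store at a
  time versus one 8-byte double store at a time.  The names buf_t,
  BUF_BYTES, clear_by_words and clear_by_doubles are made up for the
  example, and the 512-byte size is an assumption.  It also assumes an
  8-byte double whose 0.0 is all zero bits.  Whether the wide version is
  really faster depends on the machine, so time it before trusting it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BUF_BYTES 512   /* assumed buffer size, for illustration only */

/* One buffer viewed as 16-bit words, 8-byte doubles, or raw bytes. */
typedef union {
    uint16_t      words[BUF_BYTES / sizeof(uint16_t)];
    double        doubles[BUF_BYTES / sizeof(double)];
    unsigned char bytes[BUF_BYTES];
} buf_t;

/* Clear with one 2-byte store per iteration (the integer clr approach).
   Returns the number of stores issued. */
static size_t clear_by_words(buf_t *b)
{
    size_t i, n = sizeof b->words / sizeof b->words[0];

    for (i = 0; i < n; i++)
        b->words[i] = 0;
    return n;
}

/* Clear with one 8-byte store per iteration (the floating point clear
   approach).  Assumes sizeof(double) == 8 and that 0.0 is stored as
   all zero bits.  Returns the number of stores issued. */
static size_t clear_by_doubles(buf_t *b)
{
    size_t i, n = sizeof b->doubles / sizeof b->doubles[0];

    assert(sizeof(double) == 8);
    for (i = 0; i < n; i++)
        b->doubles[i] = 0.0;
    return n;
}
```

  The double version issues a quarter as many stores for the same
  buffer, which is where any speedup would have to come from.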

  What the h*ll does this mean?  That for some applications involving
  manipulation of blocks of data, it may be keen-o to use the floating
  point processor for the manipulations.  Super-cool 11 floating point
  processors (like the FP11-C in the 11/70 and FP11-E in the 11/60)
  that operate in parallel with the CPU may give you quite a performance
  boost if you play your cards right.
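
  The same idea applied to moving a block, again only as a rough C
  sketch.  The names block_t, BLOCK_BYTES, copy_by_words and
  copy_by_quads are hypothetical and the 512-byte size is an assumption.
  A 64-bit integer stands in here for the 8-byte floating point register;
  that is a substitution for portability, not what the FPP does.

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCK_BYTES 512   /* assumed block size, for illustration only */

/* One block viewed as 16-bit words, 8-byte units, or raw bytes. */
typedef union {
    uint16_t      words[BLOCK_BYTES / sizeof(uint16_t)];
    uint64_t      quads[BLOCK_BYTES / sizeof(uint64_t)];
    unsigned char bytes[BLOCK_BYTES];
} block_t;

/* Copy 2 bytes per iteration (the conventional 'mov' approach).
   Returns the number of load/store pairs issued. */
static size_t copy_by_words(block_t *dst, const block_t *src)
{
    size_t i, n = sizeof src->words / sizeof src->words[0];

    for (i = 0; i < n; i++)
        dst->words[i] = src->words[i];
    return n;
}

/* Copy 8 bytes per iteration (standing in for a double precision
   load followed by a store).  Returns the number of load/store
   pairs issued. */
static size_t copy_by_quads(block_t *dst, const block_t *src)
{
    size_t i, n = sizeof src->quads / sizeof src->quads[0];

    for (i = 0; i < n; i++)
        dst->quads[i] = src->quads[i];
    return n;
}
```

  Both routines move the same bytes; the wide one just does it in a
  quarter as many trips through the loop.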

  Can anyone see problems with this scheme?  Has anyone thought of it
  before?  

  Does anyone run a 44 or 24 with the commercial instruction set
  option?  If you do, do you use the block character move instructions?
  Here at isrnix I wrote some code that copies kernel buffers to/from the
  user's address space with 'mov' instructions (the scheme plays with the
  segmentation registers) instead of the slow m[t,f]p[d,i] instructions.
  It would be a thrill to pop a CIS board in our CPU, use the block
  move instruction, and see what kind of a performance increase I get.
  Even as things stand, buffer copies run better than twice as fast
  as they did with the previous copyin/copyout scheme.

  Any comments?

-- 
    Gregory R. Travis
    Institute for Social Research - Indiana University - Bloomington, In
    ihnp4!inuxc!isrnix!greg
    {pur-ee,allegra,qusavx}!isrnix!greg