Any decent Fortrans under Unix ? Which machine ?

Fri Mar 14 08:29:34 AEST 1986

[I have removed net.lang.f77 from the newsgroups list since this
is no longer directly related to Fortran.]

In article <208 at goanna.OZ> rcodi at goanna.UUCP writes:
>It would be nice if {ccom, f77pass1, pc0} all skipped the phases of
>generating assembly language text, and just produced a load-file
>straight from the source.

In terms of speed, at any rate, it would be nice.  But I am one of
those strange people who on occasion likes to inspect the output
of the compiler, both before and after peephole optimisation.

>Poor pc users are stuck with the following [sequence]:  cpp, pc0,
>pc1 (f1), pc2, c2, pc3, as, ld.

The Pascal compiler does not use the C preprocessor.  The rest of the
chain is accurate.

>Even nicer if cpp was built into the input parser of ... the
>compilers (but still have it as a separate package too).  

And therein lies a problem.  This kind of thing can get to be
a maintenance nightmare.  (It *can* be done, and it is quite
arguably worthwhile for something as heavily used as a compiler
on a development system.  But *I* do not want to do it.)

>I tend to disagree with the philosophy of just using registers R0 and
>R1 for calculations, and assigning the rest for register variables.

*That* sounds like VMS.  The Vax Unix compilers use r0-r5 as
scratch (for the perhaps peculiar reason that certain Vax instructions
clobber these registers).

>ALL intermediate results in calculations should be cached in
>registers.

If you want a good optimising compiler, any expressions that
are reused should be so cached.  But saving all intermediate
results is not always optimal.
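
To make the distinction concrete, here is a minimal sketch in C.  The
function names are made up for illustration and come from no compiler
discussed here.  The product `x * y' is used twice, so it is worth
holding in one temporary; the two sums are each used once, so giving
them registers of their own would buy nothing.

```c
#include <assert.h>

/* x * y is written out twice; a compiler with no common-subexpression
 * pass would do the multiply twice. */
int uncached(int x, int y, int p, int q)
{
	return (x * y + p) * (x * y + q);
}

/* The reused product is held in one temporary, preferably a register.
 * The sums (t + p) and (t + q) are each used once and are left alone. */
int cached(int x, int y, int p, int q)
{
	register int t = x * y;

	return (t + p) * (t + q);
}

/* Both forms must agree; only the amount of work differs. */
void check_cached(void)
{
	assert(uncached(2, 3, 1, 4) == cached(2, 3, 1, 4));
}
```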

>I have looked at a lot of assembly code produced by various ports
>of UNIX C compilers and noted that there is little or no attempt
>to "carry over" intermediate results from C statement to C statement.
>The same goes for f77 and pc.

In C, this is often not a problem, since the programmer can carry
values over by hand.  In the other languages, it is; this is why the
4.3 f77 has a huge front-end optimiser.

>What does YOUR compiler generate for the following C code?
>	a = z[i,j,k];
>	b = z[i,j,k];

Given the following declarations:

	int z[N];
	f() { register int i, j, k, a, b; ...

it gives

	movl	_z[r9],r8
	movl	_z[r9],r7
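
Only one index register shows up because, in C, `z[i,j,k]' is a comma
expression inside a single subscript: i and j are evaluated and thrown
away, and k alone selects the element.  A small sketch of this follows;
the value of N and the function names are assumptions for illustration,
since the declarations above leave N unspecified.

```c
#include <assert.h>

#define N 8	/* hypothetical size, chosen only for this sketch */

int z[N];

/* z[i, j, k] parses as z[(i, j, k)], which is simply z[k].
 * Most compilers will warn that i and j have no effect here. */
int comma_index(int i, int j, int k)
{
	return z[i, j, k];
}

/* Whatever i and j are, the element fetched is z[k]. */
void check_comma(void)
{
	z[5] = 42;
	assert(comma_index(0, 1, 5) == z[5]);
}
```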

Now, while the second statement might be a bit faster if it used
`movl r8,r7', it probably makes no noticeable difference.  But I
suspect you meant

	a = z[i][j][k];
	b = z[i][j][k];

which of course requires much more work.  But as an `experienced'
C programmer, I would write

	a = b = z[i][j][k];

which (given `int z[2][3][4]') generates

	mull3	$48,r11,r0
	addl2	$_z,r0
	ashl	$4,r10,r1
	addl2	r1,r0
	ashl	$2,r9,r1
	addl2	r1,r0
	movl	(r0),r7
	movl	r7,r8

which is not after all too awful.  (Better might be

			# not sure whether this 3-instruction sequence beats mull3
	ashl	$4,r11,r0	# r0 = 16*i
	ashl	$1,r0,r1	# r1 = 32*i
	addl2	r1,r0		# r0 = 48*i

			# this here is all the same as before
	addl2	$_z,r0
	ashl	$4,r10,r1
	addl2	r1,r0

			# this next one is the real `win', use fancy addr
			# modes to get *(r0 + 4*r9)
	movl	(r0)[r9],r7
	movl	r7,r8

A true Vax assembly hacker may well know of even better tricks.)
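
The arithmetic behind both sequences can be checked in C.  This is only
a sketch: it assumes 4-byte ints, as on the Vax, and the function names
are invented for the purpose.  With `int z[2][3][4]' the byte offset of
`z[i][j][k]' is 48*i + 16*j + 4*k, and the multiply by 48 can be
replaced by two shifts and an add because 48*i = 16*i + 32*i.

```c
#include <assert.h>
#include <stddef.h>

int z[2][3][4];

/* Byte offset of z[i][j][k] from the start of z, assuming 4-byte ints. */
size_t offset_mul(size_t i, size_t j, size_t k)
{
	return 48 * i + 16 * j + 4 * k;
}

/* The same offset with the multiply replaced by shifts and adds,
 * following the register usage of the assembly above. */
size_t offset_shift(size_t i, size_t j, size_t k)
{
	size_t r0 = i << 4;	/* 16*i */
	size_t r1 = r0 << 1;	/* 32*i */

	r0 += r1;		/* 48*i */
	r0 += j << 4;		/* + 16*j */
	return r0 + (k << 2);	/* + 4*k, the scaling the indexed mode supplies */
}

/* The two forms agree, and match the real layout where int is 4 bytes. */
void check_offsets(void)
{
	assert(offset_mul(1, 2, 3) == offset_shift(1, 2, 3));
	if (sizeof(int) == 4)
		assert((size_t)((char *)&z[1][2][3] - (char *)z) == offset_mul(1, 2, 3));
}
```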

Anyway, the point of all this is that C often does not require
an optimising compiler.  Not that I have anything against them;
indeed, I like optimising compilers, but I can make do without
them.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 1415)
UUCP:	seismo!umcp-cs!chris
CSNet:	chris at umcp-cs		ARPA:	chris at mimsy.umd.edu