Getting the most for a process.

Tue Oct 17 00:01:20 AEST 1989

In article <20140 at mimsy.UUCP> chris at mimsy.UUCP (Chris Torek) writes:

>  In article <1029 at crdos1.crd.ge.COM> davidsen at crdos1.crd.ge.COM
>  (Wm E Davidsen Jr) writes:
>  >  The Encore version of make looks at an environment variable and
>  >determines how many copies of the C compiler to start. On a machine with
>  >8 cpu's you get a blindingly fast make compared to doing the same thing
>  >(in serial) on a faster machine.
>  
>  (Not if the serial machine is more than 8 times faster, or if there is
>  only one source file.)
>  
>  Unfortunately, the Encore version of cc, which is apparently a Greenhills
>  C compiler, has all of its `phases' built in.  Thus, if you are compiling
>  a single file, you cannot preprocess on cpu 0, compile on cpu 1, and
>  assemble on cpu 2 all at the same time.
>  
>  Given the standard edit/compile/debug cycle, this---combining
>  everything---seems to me to be a major mistake.  Well, not so major as
>  all that, perhaps, since most of the time is spent in the compilation
>  part, not in preprocessing or assembly.  Still, the potential was
>  there, and would return if Encore used gcc as their standard compiler.

Especially when you optimize with gcc, most of the time (80% or more) is
spent in cc1, which makes the following passes over the RTL file:

    *	The initial pass creating the RTL file from the TREE file
	created by the parser.

    *	A pass to copy any shared RTL structure that should not be
	shared.

    *	The first jump optimization pass.

    *	A pass to scan for registers to prepare for common
	sub-expression elimination.

    *	The common sub-expression elimination pass.

    *	Another jump optimization pass.

    *	Another register scan pass for loop optimizations.

    *	A loop optimization pass.

    *	A flow analysis pass.

    *	A combiner pass to combine multiple RTL expressions into
	larger RTL expressions.

    *	A pass to allocate registers that are live within a single
	basic block.

    *	A pass to allocate registers whose lifetime spans multiple
	basic blocks.

    *	A final jump optimization pass.

    *	An optional delayed branch recognition pass.

    *	A final pass that expands peepholes, and emits assembler code.
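
To put a rough number on that 80% figure: if the phases of a single
compile ran as a perfect pipeline on separate cpus, the wall-clock time
still could not drop below the time of the slowest phase.  A minimal
sketch of that bound, assuming cc1 takes exactly 80% of the serial time
and ignoring all pipe and scheduling overhead (the function name is made
up for illustration):

```python
def max_pipeline_speedup(longest_phase_fraction):
    """Upper bound on speedup when the phases of one compile overlap perfectly.

    Run as a pipeline, the compile cannot finish faster than its slowest
    phase, so the bound is 1 / (that phase's share of the serial time).
    """
    if not 0.0 < longest_phase_fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return 1.0 / longest_phase_fraction


# Assumption: cc1 accounts for 80% of the serial compile time.
bound = max_pipeline_speedup(0.80)
print(f"at most {bound:.2f}x faster")  # at most 1.25x faster
```

So for a single file, spreading cpp, cc1, and the assembler over separate
cpus buys at most about 25%, and less as the share spent in cc1 grows.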

Also note, when using -pipe, that the preprocessor buffers the entire
text internally and writes it in one fell swoop at the end.  This means
that in general only the compiler proper (cc1) and the assembler run in
parallel.  This helps to some degree.  A few months ago, I measured
how much it helped on a dual processor 88100: the build time of the
entire compiler was about 14 minutes without -pipe, and about 10-12
minutes with -pipe.
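
As a rough sketch of the arithmetic behind those timings (the numbers are
the approximate ones quoted above; the helper name is only for
illustration):

```python
def speedup(baseline_minutes, improved_minutes):
    """Ratio of the baseline build time to the improved build time."""
    return baseline_minutes / improved_minutes


without_pipe = 14.0            # minutes, approximate
with_pipe = (10.0, 12.0)       # minutes, approximate range

best = speedup(without_pipe, with_pipe[0])
worst = speedup(without_pipe, with_pipe[1])
print(f"speedup with -pipe: {worst:.2f}x to {best:.2f}x")
# speedup with -pipe: 1.17x to 1.40x
```

That is, the build took roughly 14% to 29% less wall-clock time with
-pipe on two processors.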

--

Michael Meissner, Data General.				If compiles were much
Uucp:		...!mcnc!rti!xyzzy!meissner		faster, when would we
Internet:	meissner at dg-rtp.DG.COM			have time for netnews?