[vox-tech] another gcc question
Mike Simons
vox-tech@lists.lugod.org
Thu, 28 Feb 2002 02:29:24 -0500
--h31gzZEtNLTqOjlF
Content-Type: text/plain; charset=us-ascii
On Wed, Feb 27, 2002 at 09:25:41AM -0800, Peter Jay Salzman wrote:
> -funroll-loops
> Perform the optimization of loop unrolling. This
> is only done for loops whose number of iterations
> can be determined at compile time or run time.
/* compile time */
for (l = 100; l--;) do_something();
/* run time */
for (l = rand(); l--;) do_something();
> -funroll-all-loops
> Perform the optimization of loop unrolling. This
> is done for all loops. This usually makes programs
> run more slowly.
/* never known */
while (do_something());
There are thousands of variations of the three cases above...
basically you have to ask yourself if the compiler knows for sure
how many times it will run, else if the runtime system will know
before the loop begins how many times to run, else if each loop
iteration will decide if the next one runs.
Disclaimers:
- All of this is user space trial and error, I've not actually looked
at the gcc source.
- Also, I don't know intel or gnu assembly, so if someone who does could
verify these observations that would be nice...
- Platform used was Debian/woody gcc 2.95.4 running on a Pentium.
For case 1 (compile time) the compiler appears to do the following:
- find the largest number that divides evenly into the loop counter
between 1 and 21... if the LCD isn't 1... then do this:
for (l = loop_counter / LCD; l--;)
{
do_something();
do_something();
/* ...repeat LCD times... */
}
if however LCD *is* equal to 1, then behave the same as runtime
case below;
For case 2 (runtime) the compiler appears to:
- if the loop body _modifies_ the loop counter in a non-constant way
then use case 3 below...
- otherwise break the code into something like the following case
statement:
l = some_int;
while (l)
{
switch (l > 4 ? 4 : l)
{
4: do_something();
3: do_something();
2: do_something();
1: do_something();
0: ;
}
l -= l > 4 : 4 : l;
}
For case 3 (never known) the compiler appears to:
- Do something like the following...
do {
done = do_something() || do_something() || do_something() ||
do_something() || do_something() || do_something() ||
do_something() || do_something();
} while (!done);
> ok, -funroll-all-loops usually makes programs run more slowly. it also
> clearly makes the executable larger.
>
> so what exactly are we optimizing here?
Just theory... possible advantages:
- Making it more likely to be able to fill all of the branch delay slots
on a machine with many slots. (see other post on subject).
- On some architectures taking a branch is (much) more expensive than not
taking unrolling will help, if an "never known" loop runs a long time.
On Wed, Feb 27, 2002 at 10:03:22AM -0800, ME wrote:
> $ gcc gcc -funroll-all-loops -S sample.c
[...]
> This would lead me to believe the generated asm, code is not unrolled if I
> understand the expectation of the unrolling process.
You are absolutely correct.
First, you need the following:
$ gcc -O -funroll-all-loops -S sample.c
No optimizations are done unless optimizations are turned on. Since
-funroll-all-loops is an optimizations without -O nothing happens.
Second, the code sample doesn't actually do anything with m,
so with -O the compiler is smart enough to just remove the loops
all together main becomes:
========
main:
pushl %ebp
movl %esp,%ebp
xorl %eax,%eax
leave
ret
========
I've changed the sample code to demo the three cases listed above
(see attachment). If you look for in the assembly code for "call" ...
in main, "printf" marks the boundary between block types,
"do_something" shows you how the unrolling works out.
Don't _run_ the sample code, it is just provided for viewing of assembly
code, the while loop is endless ... something more complex would
be needed to halt that loop and would just make the assembly produced
harder to understand.
Later,
Mike
ps:
I noticed that people don't appear to be using text attachements much...
hope it's not against some list policy.
--h31gzZEtNLTqOjlF
Content-Type: text/plain
Content-Disposition: attachment; filename="s.c"
#include <stdio.h>
#include <stdlib.h>
int do_something(void)
{
return printf(".");
}
int main(void)
{
int l;
/* compile time decision */
for (l = 100; l--;) do_something();
printf("\n");
/* runtime decision */
for (l = rand(); l--;) do_something();
printf("\n");
/* always unknown */
while (do_something());
printf("\n");
#if 0
l = 100;
while (l > 0)
{
switch (l > 4 ? 4 : l)
{
case 4: do_something();
case 3: do_something();
case 2: do_something();
case 1: do_something();
case 0: ;
}
l -= l > 4 ? 4 : l;
}
#endif
return 0;
}
--h31gzZEtNLTqOjlF
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="s.s"
.file "s.c"
.version "01.01"
gcc2_compiled.:
.section .rodata
.LC0:
.string "."
.text
.align 4
.globl do_something
.type do_something,@function
do_something:
pushl %ebp
movl %esp,%ebp
subl $8,%esp
addl $-12,%esp
pushl $.LC0
call printf
leave
ret
.Lfe1:
.size do_something,.Lfe1-do_something
.section .rodata
.LC1:
.string "\n"
.text
.align 4
.globl main
.type main,@function
main:
pushl %ebp
movl %esp,%ebp
subl $20,%esp
pushl %ebx
movl $99,%ebx
.p2align 4,,7
.L37:
call do_something
call do_something
call do_something
call do_something
call do_something
call do_something
call do_something
call do_something
call do_something
call do_something
call do_something
call do_something
call do_something
call do_something
call do_something
call do_something
call do_something
call do_something
call do_something
call do_something
addl $-20,%ebx
cmpl $-1,%ebx
jne .L37
addl $-12,%esp
pushl $.LC1
call printf
call rand
movl %eax,%ebx
addl $16,%esp
subl $1,%ebx
jc .L40
movl %ebx,%eax
notl %eax
andl $3,%eax
je .L42
cmpl $3,%eax
jge .L50
cmpl $2,%eax
jge .L51
call do_something
decl %ebx
.L51:
call do_something
decl %ebx
.L50:
call do_something
subl $1,%ebx
jc .L40
.p2align 4,,7
.L42:
call do_something
call do_something
call do_something
call do_something
addl $-4,%ebx
cmpl $-1,%ebx
jne .L42
.L40:
addl $-12,%esp
pushl $.LC1
call printf
addl $16,%esp
.p2align 4,,7
.L46:
call do_something
testl %eax,%eax
jne .L46
addl $-12,%esp
pushl $.LC1
call printf
xorl %eax,%eax
movl -24(%ebp),%ebx
leave
ret
.Lfe2:
.size main,.Lfe2-main
.ident "GCC: (GNU) 2.95.4 (Debian prerelease)"
--h31gzZEtNLTqOjlF--