[vox-tech] another gcc question
ME
vox-tech@lists.lugod.org
Wed, 27 Feb 2002 10:03:22 -0800 (PST)
With:
--------------------------
#include<stdio.h>
#define N 5
const int NN = 5;
int main()
{
int i = 0;
int m = 0;
int n = 5;
for (i=0 ; i < n ; i=i+1)
m=0;
for (i=0 ; i<N ; i=i+1)
m=0;
for (i=0 ; i<NN ; i=i+1)
m=0;
return 0;
}
--------------------------
and
$ gcc -funroll-all-loops -S sample.c
I see (sample.s)
--------------------------
.file "sample.c"
.version "01.01"
gcc2_compiled.:
.globl NN
.section .rodata
.align 4
.type NN,@object
.size NN,4
NN:
.long 5
.text
.align 4
.globl main
.type main,@function
main:
pushl %ebp
movl %esp,%ebp
subl $24,%esp
movl $0,-4(%ebp)
movl $0,-8(%ebp)
movl $5,-12(%ebp)
movl $0,-4(%ebp)
.p2align 4,,7
.L3:
movl -4(%ebp),%eax
cmpl -12(%ebp),%eax
jl .L6
jmp .L4
.p2align 4,,7
.L6:
movl $0,-8(%ebp)
.L5:
incl -4(%ebp)
jmp .L3
.p2align 4,,7
.L4:
nop
movl $0,-4(%ebp)
.p2align 4,,7
.L7:
cmpl $4,-4(%ebp)
jle .L10
jmp .L8
.p2align 4,,7
.L10:
movl $0,-8(%ebp)
.L9:
incl -4(%ebp)
jmp .L7
.p2align 4,,7
.L8:
nop
movl $0,-4(%ebp)
.p2align 4,,7
.L11:
movl -4(%ebp),%eax
cmpl NN,%eax
jl .L14
jmp .L12
.p2align 4,,7
.L14:
movl $0,-8(%ebp)
.L13:
incl -4(%ebp)
jmp .L11
.p2align 4,,7
.L12:
xorl %eax,%eax
jmp .L2
.p2align 4,,7
.L2:
leave
ret
.Lfe1:
.size main,.Lfe1-main
.ident "GCC: (GNU) 2.95.2 20000220 (Debian GNU/Linux)"
----------------------------
When I inspect the above, I see loops included.
-12(%ebp) (three 32-bit words below %ebp) is set to 5, and -4(%ebp) is
incremented with incl until cmpl shows it is no longer less than -12(%ebp).
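To make that mapping concrete, here is a hand translation of the first loop's -O0 assembly back into C using goto. It is only a reading aid, not something gcc produces. The function name and the m_out parameter are made up for the example, and n stands in for the stack slot at -12(%ebp).

```c
/* Hand translation of the first loop in sample.s into goto-style C.
   Label names mirror the .L labels gcc emitted. Illustration only. */
int first_loop_goto_form(int n, int *m_out)  /* n plays -12(%ebp) */
{
    int i = 0;          /* -4(%ebp) */
    int m = 0;          /* -8(%ebp) */

    i = 0;              /* movl $0,-4(%ebp) */
L3:                     /* .L3: loop test */
    if (i < n)          /* movl -4(%ebp),%eax ; cmpl -12(%ebp),%eax ; jl .L6 */
        goto L6;
    goto L4;            /* jmp .L4: leave the loop */
L6:                     /* .L6: loop body */
    m = 0;              /* movl $0,-8(%ebp) */
    /* .L5: increment step; it is the "continue" target, and nothing
       in this loop actually jumps to it */
    i = i + 1;          /* incl -4(%ebp) */
    goto L3;            /* jmp .L3 */
L4:                     /* .L4: code after the loop */
    *m_out = m;
    return i;           /* final value of the loop counter */
}
```

Calling first_loop_goto_form(5, &m) returns 5 and leaves m at 0, the same as the original for loop.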
The labels show the loops too if you follow the jumps; a quick scan finds
three (.L3, .L7 and .L11), one for each for loop.
This leads me to believe the generated asm code is not unrolled, if I
understand what unrolling is supposed to do. I would guess that unrolling
a loop means replacing the looped asm structure with straight-line code:
speed goes up because the compares and jumps are dropped, but code size
grows by up to n times the original loop, where n is the number of
iterations. (I could be wrong on this; please say so if I am.)
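That guess matches the idea of full unrolling. Below is a minimal hand-written sketch for the N == 5 loop, assuming full unrolling; a compiler may instead unroll only partially (copy the body a few times and keep a shorter loop), so n times the loop is the upper bound on growth. The names rolled and unrolled are hypothetical.

```c
#define N 5

/* Rolled: five passes through the body, with an i < N compare and
   branch before each pass plus one final failing compare. */
int rolled(int *m)
{
    int i;
    for (i = 0; i < N; i = i + 1)
        *m = 0;
    return i;
}

/* Fully unrolled: the body is copied N times, so the code is larger,
   but the counter, the compares and the jumps are gone. */
int unrolled(int *m)
{
    *m = 0;     /* i == 0 */
    *m = 0;     /* i == 1 */
    *m = 0;     /* i == 2 */
    *m = 0;     /* i == 3 */
    *m = 0;     /* i == 4 */
    return N;   /* the value i would hold after the loop */
}
```

Both functions return 5 and leave *m at 0; only the shape of the generated code differs.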
Also, a diff of the two .s files (one built with -funroll-all-loops, the
other with -funroll-loops) shows no difference.
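A likely reason for the identical output: as far as I know, gcc only runs its loop optimizer, and with it the unroller, when an optimization level such as -O or -O2 is also given, and the listing above keeps every variable in a stack slot, which looks like unoptimized output. The next test would be gcc -O2 -funroll-loops -S sample.c, with the caveat that the loops in sample.c have no observable effect, so an optimizing gcc may simply delete them. For the int n case, where the trip count is only known when the loop is entered, an unroller can still copy the body if it handles the leftover iterations. This sketch does that by hand with an arbitrary unroll factor of 4; the function name and the array it clears are hypothetical.

```c
/* Hypothetical sketch: unrolling by 4 when n is known only at run time. */
void zero_unrolled_by_4(int *a, int n)
{
    int i = 0;

    /* Main loop: four copies of the body per compare-and-branch. */
    for (; i + 3 < n; i += 4) {
        a[i]     = 0;
        a[i + 1] = 0;
        a[i + 2] = 0;
        a[i + 3] = 0;
    }

    /* Remainder loop: clears the last 0 to 3 elements. */
    for (; i < n; i++)
        a[i] = 0;
}
```

For n == 5 the main loop runs once (elements 0 to 3) and the remainder loop clears element 4.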
I bet different values for n, N, and NN would show up at different
4-byte offsets from %ebp if I tried other runs.
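Judging from the listing above, each bound reaches its compare in a different way. The local n is stored into its slot with movl $5,-12(%ebp); the macro N is gone after preprocessing and shows up as the immediate in cmpl $4,-4(%ebp) (gcc rewrote i < 5 as i <= 4, hence the jle); and NN is a real object (NN: .long 5 in .rodata) read with cmpl NN,%eax. So new values would presumably change those constants, while the offsets -4, -8 and -12 belong to i, m and n. In C, unlike C++, a file-scope const int is an ordinary object with external linkage rather than a constant expression, which fits gcc emitting it as data. A small sketch of the three cases follows; the function names are made up for the example.

```c
#define N 5             /* replaced by the literal 5 before compiling */
const int NN = 5;       /* a real object: gcc emitted "NN: .long 5" */

/* Loop bounded by a local variable, like "int n = 5" in sample.c. */
int count_local(void)
{
    int n = 5;          /* lived at -12(%ebp) in the -O0 listing */
    int i, count = 0;
    for (i = 0; i < n; i = i + 1)
        count = count + 1;
    return count;
}

/* Loop bounded by the macro: the compiler only ever sees "i < 5". */
int count_macro(void)
{
    int i, count = 0;
    for (i = 0; i < N; i = i + 1)
        count = count + 1;
    return count;
}

/* Loop bounded by the const object. Since NN is an object, its
   address can be taken, which is impossible for the macro's 5. */
int count_const(const int **where)
{
    int i, count = 0;
    for (i = 0; i < NN; i = i + 1)
        count = count + 1;
    *where = &NN;
    return count;
}
```

All three loops run five times; what differs is only where the bound comes from in the generated code.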
No time to examine this in more detail right now.
I am still rather new to Intel assembly, but this will change.
-ME
On Wed, 27 Feb 2002, Rod Roark wrote:
> You can compile with -S and then look at the assembler output file.
>
> -- Rod
> http://www.sunsetsystems.com/
>
> On Wednesday 27 February 2002 09:31, Peter Jay Salzman wrote:
> > another optimization question:
> >
> > int n = 5;
> > for (i=0; i<n; ++i)
> >
> > can gcc unroll this loop the way it can (for instance)
> >
> > #define N 5
> > for (i=0; i<N; ++i)
> >
> > if it can't, what about
> >
> > const int n = 5;
> > for (i=0; i<n; ++i)
> >
> > pete
> _______________________________________________
> vox-tech mailing list
> vox-tech@lists.lugod.org
> http://lists.lugod.org/mailman/listinfo/vox-tech
>