[vox-tech] another gcc question

ME vox-tech@lists.lugod.org
Wed, 27 Feb 2002 10:03:22 -0800 (PST)


With:
--------------------------
#include<stdio.h>

#define N 5

const int NN = 5;

int main()
{
  int i = 0;
  int m = 0;
  int n = 5;
  for (i=0 ; i < n ; i=i+1)
    m=0;

  for (i=0 ; i<N ; i=i+1)
    m=0;

  for (i=0 ; i<NN ; i=i+1)
    m=0;

  return 0;
}
--------------------------
and
$ gcc -funroll-all-loops -S sample.c
I see (sample.s)
--------------------------
	.file	"sample.c"
	.version	"01.01"
gcc2_compiled.:
.globl NN
.section	.rodata
	.align 4
	.type	 NN,@object
	.size	 NN,4
NN:
	.long 5
.text
	.align 4
.globl main
	.type	 main,@function
main:
	pushl %ebp
	movl %esp,%ebp
	subl $24,%esp
	movl $0,-4(%ebp)
	movl $0,-8(%ebp)
	movl $5,-12(%ebp)
	movl $0,-4(%ebp)
	.p2align 4,,7
.L3:
	movl -4(%ebp),%eax
	cmpl -12(%ebp),%eax
	jl .L6
	jmp .L4
	.p2align 4,,7
.L6:
	movl $0,-8(%ebp)
.L5:
	incl -4(%ebp)
	jmp .L3
	.p2align 4,,7
.L4:
	nop
	movl $0,-4(%ebp)
	.p2align 4,,7
.L7:
	cmpl $4,-4(%ebp)
	jle .L10
	jmp .L8
	.p2align 4,,7
.L10:
	movl $0,-8(%ebp)
.L9:
	incl -4(%ebp)
	jmp .L7
	.p2align 4,,7
.L8:
	nop
	movl $0,-4(%ebp)
	.p2align 4,,7
.L11:
	movl -4(%ebp),%eax
	cmpl NN,%eax
	jl .L14
	jmp .L12
	.p2align 4,,7
.L14:
	movl $0,-8(%ebp)
.L13:
	incl -4(%ebp)
	jmp .L11
	.p2align 4,,7
.L12:
	xorl %eax,%eax
	jmp .L2
	.p2align 4,,7
.L2:
	leave
	ret
.Lfe1:
	.size	 main,.Lfe1-main
	.ident	"GCC: (GNU) 2.95.2 20000220 (Debian GNU/Linux)"
----------------------------

When I inspect the above, I see that the loops are still there.
-12(%ebp) (the third 32-bit slot below %ebp) is set to 5, and -4(%ebp) is
incremented (incl) until the cmpl finds it is no longer less than -12(%ebp).
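
To make the control flow of the first loop easier to follow, here is a
rough C rendering of the .L3/.L6/.L5/.L4 block using gotos. This is only a
sketch: the function name and the "count" variable are not in sample.c and
are added here so the number of trips through the loop can be observed.

```c
/* Hypothetical helper: mirrors the first loop of sample.s label by label.
 * Stack slots from the listing are noted next to each variable. */
int first_loop_iterations(int n)      /* n plays the role of -12(%ebp) */
{
    int i = 0;        /* -4(%ebp) */
    int m = 0;        /* -8(%ebp) */
    int count = 0;    /* assumption: added only to count loop trips */

    i = 0;                    /* movl $0,-4(%ebp) */
L3:
    if (i < n) goto L6;       /* cmpl -12(%ebp),%eax ; jl .L6 */
    goto L4;                  /* jmp .L4 */
L6:
    m = 0;                    /* movl $0,-8(%ebp) */
    count++;
    /* .L5 in the listing: the increment step */
    i++;                      /* incl -4(%ebp) */
    goto L3;                  /* jmp .L3 */
L4:
    (void)m;                  /* m is never read, as in sample.c */
    return count;
}
```

The compare and the backward jump are executed once per iteration, which
is what one would expect from a loop that has not been unrolled.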

The labels show the loops as well. I count three on a quick scan
(.L3, .L7, and .L11).

This leads me to believe the generated asm code is not unrolled, if I
understand what unrolling is supposed to do. (I would guess that
unrolling a loop means replacing the looped asm structure with
straight-line code: faster because the compares and jumps are dropped,
but larger, up to n times the size of the original loop body, where
n = number of iterations. I could be wrong on this; please say so if I am.)
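
As an illustration of that idea, here is a loop with a fixed trip count of
5 next to a hand-unrolled equivalent. This is a sketch of the concept, not
what gcc emits; the function names and the array-sum body are assumptions
chosen so that the result is observable (the m=0 body in sample.c has no
visible effect).

```c
#define N 5

/* Rolled form: one compare and one conditional jump per element. */
int sum_rolled(const int *a)
{
    int i;
    int total = 0;
    for (i = 0; i < N; i = i + 1)
        total += a[i];
    return total;
}

/* Hand-unrolled form: the body is repeated N times in a straight line,
 * so there is no loop counter, no compare, and no jump. The cost is
 * N copies of the body in the generated code. */
int sum_unrolled(const int *a)
{
    int total = 0;
    total += a[0];
    total += a[1];
    total += a[2];
    total += a[3];
    total += a[4];
    return total;
}
```

Both functions return the same value for any 5-element array; only the
shape of the code differs.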

Also, a diff on the output of the two .s files (one attempt with
-funroll-all-loops, and the other with -funroll-loops) shows no
difference.
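
One possible explanation, offered as an assumption rather than something
checked against 2.95.2: the GCC manual says that most optimizations are
only performed when an -O level is given, even if individual -f flags are
specified, and the command above has no -O. The loop bodies (m=0) are also
dead stores that an optimizer may simply delete. A variant of the test
that might be worth trying is sketched below; "sink", "stores", and
"run_loops" are hypothetical names, not part of sample.c.

```c
#define N 5

const int NN = 5;

/* volatile: every store to sink must be kept by the compiler, so the
 * loop bodies cannot be optimized away as dead code. */
volatile int sink;

/* Same three loop bounds as sample.c (variable, macro, const int).
 * Returns the total number of stores performed. */
int run_loops(void)
{
    int i;
    int n = 5;
    int stores = 0;

    for (i = 0; i < n; i = i + 1) {   /* bound is a plain variable */
        sink = i;
        stores++;
    }
    for (i = 0; i < N; i = i + 1) {   /* bound is a preprocessor constant */
        sink = i;
        stores++;
    }
    for (i = 0; i < NN; i = i + 1) {  /* bound is a const int */
        sink = i;
        stores++;
    }
    return stores;
}

/* Suggested comparison (untested assumption):
 *   gcc -O2 -S sample.c
 *   gcc -O2 -funroll-loops -S sample.c
 * then diff the two .s files. */
```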

I bet different values for n, N, and NN would lead to a different
4-byte offset from %ebp for each value if other runs/trials were
attempted.

No time to examine this in more detail right now.

I am still rather new to Intel assembly, but this will change.

-ME

On Wed, 27 Feb 2002, Rod Roark wrote:
> You can compile with -S and then look at the assembler output file.
> 
> -- Rod
>    http://www.sunsetsystems.com/
> 
> On Wednesday 27 February 2002 09:31, Peter Jay Salzman wrote:
> > another optimization question:
> >
> >    int n = 5;
> >    for (i=0; i<n; ++i)
> >
> > can gcc unroll this loop the way it can (for instance)
> >
> >    #define N 5
> >    for (i=0; i<N; ++i)
> >
> > if it can't, what about
> >
> >    const int n = 5;
> > 	for (i=0; i<n; ++i)
> >
> > pete
> _______________________________________________
> vox-tech mailing list
> vox-tech@lists.lugod.org
> http://lists.lugod.org/mailman/listinfo/vox-tech
>