[vox-tech] Understanding a C hello world program

Peter Jay Salzman p at dirac.org
Wed Nov 24 08:52:36 PST 2004


Consider this program:


   #include<stdio.h>

   int main(void)
   {
      printf("hello world\n");

      return 0;
   }


The more I think about it, the more fascinated I am by it.  I can get the
length of the sections using the size command:


   p@satan$ size hello_world-1.o
      text    data     bss     dec     hex filename
        48       0       0      48      30 hello_world-1.o

I assume that data is the initialized data segment, where variables that the
programmer initializes are stored.  That's zero because I have no initialized
global (or static) variables.

I also assume that bss is zero because I have no uninitialized global (or
static) variables.

When I disassemble the object file, I get 35 bytes:

   Dump of assembler code for function main:
   0x00000000 <main+0>:    push   %ebp
   0x00000001 <main+1>:    mov    %esp,%ebp
   0x00000003 <main+3>:    sub    $0x8,%esp
   0x00000006 <main+6>:    and    $0xfffffff0,%esp
   0x00000009 <main+9>:    mov    $0x0,%eax
   0x0000000e <main+14>:   sub    %eax,%esp
   0x00000010 <main+16>:   movl   $0x0,(%esp)
   0x00000017 <main+23>:   call   0x18 <main+24>
   0x0000001c <main+28>:   mov    $0x0,%eax
   0x00000021 <main+33>:   leave  
   0x00000022 <main+34>:   ret    
   End of assembler dump.

However, "size" reports a text size of 48.  Where do the extra 13 bytes that
size reports come from?  Probably from "hello world\n\0", which is 13 bytes.
But if that's true, the string lives in the text segment.  I always pictured
the text segment as being straight opcodes.  There must be a structure to the
text segment that I was unaware of.

But then, look at the picture that objdump has of the object file:

p@satan$ objdump -h hello_world-1.o

hello_world-1.o:     file format elf32-i386

Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .text         00000023  00000000  00000000  00000034  2**2
                  CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
  1 .data         00000000  00000000  00000000  00000058  2**2
                  CONTENTS, ALLOC, LOAD, DATA
  2 .bss          00000000  00000000  00000000  00000058  2**2
                  ALLOC
  3 .rodata       0000000d  00000000  00000000  00000058  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  4 .note.GNU-stack 00000000  00000000  00000000  00000065  2**0
                  CONTENTS, READONLY
  5 .comment      00000026  00000000  00000000  00000065  2**0
                  CONTENTS, READONLY


I wish objdump identified hex numbers as hex numbers.  In any event, the text
section has length 0x23, which is 35.  That's just the opcodes, with no room
for the string.  So by this accounting the string does not live in the text
section, whatever size seems to say.

However, I note that there's a section called .rodata, which I've never heard
of before, but I'm assuming that it stands for "read-only data section".
It's 13 bytes - just the size of "hello world\n\0".  That's probably why this
program segfaults:

   #include<stdio.h>

   int main(void)
   {
      char *string="hello world\n";

      string[3] = 'T';

      puts(string);

      return 0;
   }

because string is the address of something that lives in a section of memory
marked read-only.  Whammo -- SIGSEGV.  I remember Mark posting about 3 or 4
years ago that this actually worked on some other Unices (not Linux).


There was originally a question attached to this email, but as I typed more
and more, I realized I'm not even sure what my question is anymore.  Maybe I
have too many of them.

Maybe my immediate question is -- where do read-only strings live?  In the
text section or the .rodata section?  I've seen evidence that they live in
both sections.

If anyone cares to riff off this, I'd certainly be interested in anything
anyone says.

Pete



-- 
The mathematics of physics has become ever more abstract, rather than more
complicated.  The mind of God appears to be abstract but not complicated.
He also appears to like group theory.  --  Tony Zee's "Fearful Symmetry"

PGP Fingerprint: B9F1 6CF3 47C4 7CD8 D33E  70A9 A3B9 1945 67EA 951D

