[vox-tech] Understanding a C hello world program
Peter Jay Salzman
p at dirac.org
Wed Nov 24 08:52:36 PST 2004
Consider this program:
#include <stdio.h>

int main(void)
{
    printf("hello world\n");
    return 0;
}
The more I think about it, the more fascinated I am by it. I can get the
length of the sections using the size command:
p@satan$ size hello_world-1.o
   text    data     bss     dec     hex filename
     48       0       0      48      30 hello_world-1.o
I assume that data is the initialized data segment, where
programmer-initialized variables are stored. That's zero because I have no
global variables.
I also assume that bss is zero because I have no uninitialized global
variables.
When I disassemble the object file, I get 35 bytes:
Dump of assembler code for function main:
0x00000000 <main+0>: push %ebp
0x00000001 <main+1>: mov %esp,%ebp
0x00000003 <main+3>: sub $0x8,%esp
0x00000006 <main+6>: and $0xfffffff0,%esp
0x00000009 <main+9>: mov $0x0,%eax
0x0000000e <main+14>: sub %eax,%esp
0x00000010 <main+16>: movl $0x0,(%esp)
0x00000017 <main+23>: call 0x18 <main+24>
0x0000001c <main+28>: mov $0x0,%eax
0x00000021 <main+33>: leave
0x00000022 <main+34>: ret
End of assembler dump.
However, "size" reports a text size of 48. Where do the extra 13 bytes that
size reports come from? Probably from "hello world\n\0", which is 13 bytes.
But if that's true, the string lives in the text segment. I always pictured
the text segment as being straight opcodes. There must be a structure to the
text segment that I was unaware of.
But then, looking at the picture that objdump has of the object file...
p@satan$ objdump -h hello_world-1.o

hello_world-1.o:     file format elf32-i386

Sections:
Idx Name             Size      VMA       LMA       File off  Algn
  0 .text            00000023  00000000  00000000  00000034  2**2
                     CONTENTS, ALLOC, LOAD, RELOC, READONLY, CODE
  1 .data            00000000  00000000  00000000  00000058  2**2
                     CONTENTS, ALLOC, LOAD, DATA
  2 .bss             00000000  00000000  00000000  00000058  2**2
                     ALLOC
  3 .rodata          0000000d  00000000  00000000  00000058  2**0
                     CONTENTS, ALLOC, LOAD, READONLY, DATA
  4 .note.GNU-stack  00000000  00000000  00000000  00000065  2**0
                     CONTENTS, READONLY
  5 .comment         00000026  00000000  00000000  00000065  2**0
                     CONTENTS, READONLY
I wish objdump identified hex numbers as hex numbers. In any event, the .text
section has length 0x23, which is 35: exactly the opcodes from the
disassembly, with no room for the string. That contradicts what size told me.
However, I note that there's a section called .rodata, which I've never heard
of before, but I'm assuming that it stands for "read-only data section".
It's 0xd = 13 bytes, just the size of "hello world\n\0". That's probably why
this program segfaults:
#include <stdio.h>

int main(void)
{
    char *string = "hello world\n";
    string[3] = 'T';
    puts(string);
    return 0;
}
because string is the address of something that lives in a section of memory
marked read-only. Whammo -- SIGSEGV. I remember Mark posting about 3 or 4
years ago that this actually worked on some other Unices (not Linux).
There was originally a question attached to this email, but as I typed more
and more, I realized I'm not even sure what my question is anymore. Maybe I
have too many of them.
Maybe my immediate question is -- where do read-only strings live? In the
text section or the .rodata section? I've seen evidence that they live in
both sections.
If anyone cares to riff off this, I'd certainly be interested in anything
anyone says.
Pete
--
The mathematics of physics has become ever more abstract, rather than more
complicated. The mind of God appears to be abstract but not complicated.
He also appears to like group theory. -- Tony Zee's "Fearful Symmetry"
PGP Fingerprint: B9F1 6CF3 47C4 7CD8 D33E 70A9 A3B9 1945 67EA 951D