[vox-tech] Understanding a C hello world program

Peter Jay Salzman p at dirac.org
Wed Nov 24 14:06:14 PST 2004


On Wed 24 Nov 04, 11:10 AM, Mark K. Kim <lugodatcbreakdotorg> said:
> On Wed, 24 Nov 2004, Peter Jay Salzman wrote:
> [snip]
> >    p at satan$ size hello_world-1.o
> >       text    data     bss     dec     hex filename
> >         48       0       0      48      30 hello_world-1.o
> [snip]
> 
> What the... how did you make your program so small?  My output:
 
Heh.  I was working with the object file.  No linking was done.  :)


> What is bss?  Is that space dynamically generated in memory at runtime?
 
Yeah -- my understanding is that the data segment is divided into two parts:
a segment for variables the programmer explicitly initializes, and a segment
for variables with no initializer, which get zero-filled before main() runs.
The latter is called "bss", after a directive used by an old assembler.  It
stood for "block started by symbol".  Since bss is all zeros, the object file
only records how big it is, so it takes no space in the file itself.

> At least in theory.  In practice the data section is also executable, which
> allows a buffer overflow to be used to execute malicious code.  But we
> can't simply block execution of the data section because I think
> function pointers need to be executed outside the text area..???

Is that true?  I'd think that function pointers necessarily need to point
inside the text section.

> Anyway, so if you think about a constant string like, "hello, world",
> there's no reason not to store it in the text section, because the string
> isn't supposed to be changeable.  If you were to store it in the data
> section, however, you could accidentally overwrite it, which is fine from
> a single-program point of view, but you can save some memory if you put it
> in the text section, because the text section isn't replicated (only the
> other sections are) if you run multiple instances of the same program.  This
> makes it more important that the text section be non-writable and that
> policy be enforced, so that one program accidentally modifying the text
> section doesn't affect another instance of the same program.
> 
> Or that's how I understand it.

No -- that makes perfect sense.  I think that has to be true.  Good thinking!

> > However, I note that there's a section called .rodata, which I've never
> > heard of before, but I'm assuming that it stands for "read only data
> > section".  It's 13 bytes - just the size of "hello world\n\0".
> 
> Looking at the assembly output (thanks for that idea, Bryan!), .rodata is a
> custom name of a section:
> 
>    .section .rodata
> 
> whereas the .text section is reserved:
> 
>    .text
> 
> They might as well have named the .rodata section .whatever instead of
> .rodata and it would still work the same way.  What happens to this
> section is, I think, linker-dependent, but the linker probably takes the
> section and sticks it in one of the canonical sections (.text, .data,
> etc.) according to some predefined rule.  (It's probably the .text
> section by default, unless specified otherwise.)
 
More coolness.

> These custom sections are useful because they allow the linker to shuffle
> them around to fit any hardware constraints that may exist.  The linker
> won't break up a section, though.
 
YMC.  Where did you learn this from?  Have you taken a compiler class?  At
some point in my life, I'd like to learn more about this.

> > That's probably why this program segfaults:
> >
> >    #include<stdio.h>
> >
> >    int main(void)
> >    {
> >       char *string="hello world\n";
> >
> >       string[3] = 'T';
> >
> >       puts(string);
> >
> >       return 0;
> >    }
> >
> > because string is the address of something that lives in a section of memory
> > marked read-only.  Whammo -- SIGSEGV.  I remember Mark posting about 3 or 4
> > years ago that this actually worked on some other Unices (not Linux).
> 
> Wow... you still remember that?  Yeah, I seem to recall trying out
> something like that.  I think it was one of Sun's OSes.  Might have been
> HP-UX, too, since those two were the primary other unices I had access to.
 
Heh.  I have just about every post you ever made to vox-tech saved on my hard
drive.  ;-)   I have "subject" folders and "people" folders.  Some people
can't help but say interesting stuff.

> > Maybe my immediate question is -- where do read-only strings live?  In the
> > text section or the .rodata section?  I've seen evidence that they live in
> > both sections.
> 
> .rodata is a custom section name which lives in the .text section.  Think
> of it as an alias to an offset into the .text section.  At least that's
> how it worked in one assembler I used to use, and this seems to work the
> same way from what you've discovered.  That's pretty cool!
> 
> > If anyone cares to riff off this, I'd certainly be interested in anything
> > anyone says.
> 
> I think it's cool that you've done all this analysis.  I didn't know about
> the size program and I haven't really seen much usage of objdump so it's
> cool to see it used here.  Thanks Peter!

I just asked the questions... you and Bryan answered them!  :)

So just to reiterate.  It appears that read-only strings get placed into a
custom section called ".rodata" by the compiler.  Then, ".rodata" gets placed
into the .text segment during linking.

I've heard of relocation before, but never had the chance to read about it.
I suppose this is what relocation refers to?  Relocating a segment in an
object file to elsewhere either in memory or the executable?

I noticed this in the object file:

RELOCATION RECORDS FOR [.text]:
OFFSET   TYPE              VALUE 
00000013 R_386_32          .rodata

Thanks!
Pete

-- 
The mathematics of physics has become ever more abstract, rather than more
complicated.  The mind of God appears to be abstract but not complicated.
He also appears to like group theory.  --  Tony Zee's "Fearful Symmetry"

PGP Fingerprint: B9F1 6CF3 47C4 7CD8 D33E  70A9 A3B9 1945 67EA 951D

