[vox-tech] behind the scenes of static

Jeff Newmiller vox-tech@lists.lugod.org
Mon, 29 Jul 2002 19:34:31 -0700 (PDT)


On Fri, 26 Jul 2002, Micah Cowan wrote:

> Peter Jay Salzman writes:
>  > > Declaring global variables and functions as "static" keeps the scope of
>  > > the variable/function to the file.  I'm no compiler writer, but this could
>  > > be done simply by compiling the file, then removing the static symbol
>  > > names.  If that's how things are done, this would be a task done by the
>  > > compiler, not the linker.
>  > 
>  > i see what you're saying, but it just seems like a more suitable task for
>  > the linker.  the process of (resolving or) removing symbol names after
>  > compilation just seems like more in the realm of linking.
> 
> Yeah, but the linker has no way of knowing what was declared as
> "extern" or "static" unless the compiler signals it. AFA GNU
> implementations are concerned, both types of symbols actually make it
> into the resulting ELF; but they are marked differently. This is
> because the linker still needs to map references to the static objects
> to some memory space (this is only necessary for dynamically loaded
> objects, AFAIK); it will resolve internal links only when referred to
> internally (duh), and will resolve external links for whoever wants
> 'em.

Dynamically loaded objects? The information has to be there for the static
linker to fix up the internal references to internal symbol offsets as
well, right?
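
To make the "marked differently" point concrete, here is a minimal sketch.
The file name and the names hidden, visible, helper and api are made up for
illustration and are not from the thread.  Compiled with "gcc -c" and listed
with "objdump -t", I would expect hidden and helper to be flagged "l"
(local) and visible and api to be flagged "g" (global), though an
optimizing compiler is free to inline or discard the static ones.

```c
/* linkage.c: hypothetical example, names invented for illustration. */

static int hidden = 5;   /* internal linkage: a local symbol in the object file */
int visible = 7;         /* external linkage: a global symbol */

/* Internal linkage: other files cannot refer to this function by name. */
static int helper( int n )
{
 return n + hidden;
}

/* External linkage: other files can declare and call this. */
int api( void )
{
 return helper( visible );
}
```

Either way the reference from api() to helper() and hidden has to be
resolved to a real address by somebody, which is why the local symbols (or
equivalent relocation information) cannot simply be thrown away at compile
time.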

This thread is a bit abstract... some of you may find the following
example useful:

Typically, linkers work with "sections" (also somewhat confusingly
referred to as "segments" in some literature), and symbols that refer to
offsets in those sections. 

The sections are conventionally labelled "text", "data" and "bss", as
well as others specific to an architecture or compiler.  (I'll get back to
what these are used for later.) They are all treated very much the same
from the point of view of the linker. The compiler/assembler marks its
needs in these sections (often incrementally) as it generates the object
file.  You can think of the sections as "bags" that the linker fills with
"allocations" as it processes the object files.

Within an object file, symbols refer to memory using the offset from the
first byte allocated in that section during the compilation/assembly that
generated the object file.  Thus, each object file has its own "beginning"
for each section.  It is the linker's job to mush all the sections
together AND fix up all the places in the code that use the symbols to use
absolute addresses.  Some of this process may be deferred to load time if
shared libraries are in use, in which case the linker still has to fix the
symbol offsets to count from the now-shared beginning of the mushed
sections. For example, if two object files have symbols pointing to offset
zero in the data section, then the linker puts the first one at offset
zero, and adds the amount of "data" section used in the first file to all
of the "data" offsets in the second file, "shifting" the latter all upward
in the combined section.
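
The offset shifting can be sketched as a toy model.  This is only an
illustration under simplifying assumptions: the struct and function names
are invented, only one section is modelled, and alignment padding and the
patching of the code that uses the symbols are ignored entirely.

```c
/* merge.c: toy model of merging one section; all names are hypothetical. */
#include <stddef.h>

struct toy_symbol {
 const char *name;
 unsigned long offset;      /* offset within the owning file's section */
};

struct toy_object {
 unsigned long data_size;   /* bytes of "data" this file contributes */
 struct toy_symbol *syms;
 size_t nsyms;
};

/* Lay the files' "data" end to end, in order, and rewrite each symbol's
   offset to count from the start of the combined section.
   Returns the size of the combined section. */
unsigned long merge_data( struct toy_object *objs, size_t nobjs )
{
 unsigned long base = 0;
 size_t i, k;

 for ( i = 0; i < nobjs; i++ ) {
    for ( k = 0; k < objs[ i ].nsyms; k++ )
       objs[ i ].syms[ k ].offset += base;
    base += objs[ i ].data_size;
 }
 return base;
}
```

With a first file contributing 8 bytes and a second file whose only symbol
sits at offset zero, the second file's symbol ends up at offset 8 in the
merged section, while the first file's symbols stay where they were.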

Consider the following C program (written for explanation, not
necessarily good code design):

---test.c---
#include <stdio.h>
#include <string.h>

char *do_getstr( void );

int i = 1;
int j;

int main()
{
 char *name;
 static int x;
 static int y = 2;
 int z;   /* deliberately left uninitialized */

 printf( "Hello, what is your name? : " );
 if ( NULL != ( name = do_getstr() ) ) {
    printf( "Pleased to meet you, %s!\n", name );
 } else {
    printf( "Sorry, couldn't understand you.\n" );
 }
 printf( "Default value of initialized global \"i\" is %d\n", i );
 printf( "Default value of uninitialized global \"j\" is %d\n", j );
 printf( "Default value of uninitialized block-static \"x\" is %d\n", x );
 printf( "Default value of initialized block-static \"y\" is %d\n", y );
 printf( "Default value of uninitialized automatic \"z\" is %d\n", z );
 return 0;
}

static char s[ 20 ];

char *do_getstr( void )
{
 s[ 19 ] = '\0';
 if ( NULL != fgets( s, 19, stdin ) ) {
    /* drop the last character read (normally the newline) */
    s[ strlen( s ) - 1 ] = '\0';
    if ( '\0' == s[ 0 ] )
	return NULL;
    return s;
 }
 return NULL;
}
------------

Compile this to object code ("gcc -c test.c"), and take a look at the
object file ("objdump -st test.o") (see comments marked ###):

------------
test.o:     file format elf32-i386

SYMBOL TABLE:
00000000 l    df *ABS*	00000000 test.c
00000000 l    d  .text	00000000 
00000000 l    d  .data	00000000 
00000000 l    d  .bss	00000000 
00000000 l       .text	00000000 gcc2_compiled.
00000000 l     O .bss	00000004 x.3
  ### x.3 is a local ("l") object ("O"; as distinguished from executable
  ### code) occurring at offset 0 in .bss and having length 4 bytes.
  ### This represents the block-static variable "x" in main().
  ### The ".3" is to distinguish this local symbol from any others
  ### from other blocks in this file.
  ### Because this variable is in .bss, it is zeroed before main()
  ### starts: the program loader or the "startup code" that runs before
  ### main() fills the whole .bss section with zeroes, so no space is
  ### wasted in the object file on a bunch of zeroes.  Therefore, you
  ### will not find any contents for the "bss" section in the dump
  ### below.  Only the offsets into the bss section are recorded (as
  ### symbols) in the object file.
00000004 l     O .data	00000004 y.4
  ### This represents the block-static variable "y" in main().
  ### The initializing value for this variable is found below, in
  ### the "data" section of the object file.
00000000 l    d  .rodata	00000000 
00000004 l     O .bss	00000014 s
00000000 l    d  .note	00000000 
00000000 l    d  .comment	00000000 
00000000 g     O .data	00000004 i
  ### This represents the externally visible ("g" for global) initialized
  ### variable "i".  It is stored in the same section of memory as
  ### the "private" variable "y", but is directly addressable from
  ### functions in other files because it is global.
00000000 g     F .text	000000b2 main
  ### This represents the externally visible ("g") function
  ### ("F") "main()", being 178 bytes of code stored in the
  ### "text" section.
00000000         *UND*	00000000 printf
  ### The offset and section for this symbol were not known to the
  ### compiler at the time of compilation.  One of the tasks left to the
  ### linker is to look through the library files until it finds a
  ### symbol definition to fill out this undefined marker, so the
  ### main function can call the function.
000000b4 g     F .text	0000006e do_getstr
00000004       O *COM*	00000004 j
  ### The variable "j" is allocated as a "common" symbol... in effect,
  ### "j" is the name of its own section, and all sections named "j" are
  ### merged into a single section in the linked output.  Thus, gcc will
  ### let you define uninitialized global variables in multiple
  ### files (or shared header files) and end up with a single shared
  ### variable after linking.  Contrast this with initialized global
  ### variables, which must be defined in only a single file, with
  ### "extern" declarations used in all other files to be linked.
00000000         *UND*	00000000 stdin
00000000         *UND*	00000000 fgets
00000000         *UND*	00000000 strlen


Contents of section .text:
 0000 5589e583 ec1883c4 f4680000 0000e8fc  U........h......
 0010 ffffff83 c410e8fc ffffff89 c08945fc  ..............E.
 0020 837dfc00 741a83c4 f88b45fc 50681d00  .}..t.....E.Ph..
 0030 0000e8fc ffffff83 c410eb14 8d742600  .............t&.
 0040 83c4f468 40000000 e8fcffff ff83c410  ...h@...........
 0050 83c4f8a1 00000000 50688000 0000e8fc  ........Ph......
 0060 ffffff83 c41083c4 f8a10000 00005068  ..............Ph
 0070 c0000000 e8fcffff ff83c410 83c4f8a1  ................
 0080 00000000 50680001 0000e8fc ffffff83  ....Ph..........
 0090 c41083c4 f8a10400 00005068 40010000  ..........Ph@...
 00a0 e8fcffff ff83c410 31c0eb04 8d742600  ........1....t&.
 00b0 c9c389f6 5589e583 ec08c605 17000000  ....U...........
 00c0 0083c4fc a1000000 00506a13 68040000  .........Pj.h...
 00d0 00e8fcff ffff83c4 1089c085 c0743883  .............t8.
 00e0 c4f46804 000000e8 fcffffff 83c41089  ..h.............
 00f0 c08d50ff b8040000 00c60402 00803d04  ..P...........=.
 0100 00000000 750a31c0 eb168db6 00000000  ....u.1.........
 0110 b8040000 00eb0931 c0eb0590 8d742600  .......1.....t&.
 0120 c9c38db4 26000000 008dbc27 00000000  ....&......'....
Contents of section .data:
 0000 01000000 02000000                    ........        

  ### Initial values for i (offset 0, value 1) and y (offset 4, value 2),
  ### each stored as a 4-byte little-endian integer.

Contents of section .note:
 0000 08000000 00000000 01000000 30312e30  ............01.0
 0010 31000000                             1...            
Contents of section .rodata:
 0000 48656c6c 6f2c2077 68617420 69732079  Hello, what is y
 0010 6f757220 6e616d65 3f203a20 00506c65  our name? : .Ple
 0020 61736564 20746f20 6d656574 20796f75  ased to meet you
 0030 2c202573 210a0000 00000000 00000000  , %s!...........
 0040 536f7272 792c2063 6f756c64 6e277420  Sorry, couldn't 
 0050 756e6465 72737461 6e642079 6f752e0a  understand you..
 0060 00000000 00000000 00000000 00000000  ................
 0070 00000000 00000000 00000000 00000000  ................
 0080 44656661 756c7420 76616c75 65206f66  Default value of
 0090 20696e69 7469616c 697a6564 20676c6f   initialized glo
 00a0 62616c20 22692220 69732025 640a0000  bal "i" is %d...
 00b0 00000000 00000000 00000000 00000000  ................
 00c0 44656661 756c7420 76616c75 65206f66  Default value of
 00d0 20756e69 6e697469 616c697a 65642067   uninitialized g
 00e0 6c6f6261 6c20226a 22206973 2025640a  lobal "j" is %d.
 00f0 00000000 00000000 00000000 00000000  ................
 0100 44656661 756c7420 76616c75 65206f66  Default value of
 0110 20756e69 6e697469 616c697a 65642062   uninitialized b
 0120 6c6f636b 2d737461 74696320 22782220  lock-static "x" 
 0130 69732025 640a0000 00000000 00000000  is %d...........
 0140 44656661 756c7420 76616c75 65206f66  Default value of
 0150 20696e69 7469616c 697a6564 20626c6f   initialized blo
 0160 636b2d73 74617469 63202279 22206973  ck-static "y" is
 0170 2025640a 00000000 00000000 00000000   %d.............
Contents of section .comment:
 0000 00474343 3a202847 4e552920 322e3935  .GCC: (GNU) 2.95
 0010 2e342032 30303131 30303220 28446562  .4 20011002 (Deb
 0020 69616e20 70726572 656c6561 73652900  ian prerelease).
------------
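
The "common" treatment of "j" noted in the dump has a single-file cousin
that is easy to try: C treats a file-scope declaration with no initializer
as a "tentative definition", and several of them for the same name collapse
into one object.  A minimal sketch (the file name and get_j are made up):

```c
/* common.c: hypothetical example of tentative definitions. */

int j;   /* tentative definition */
int j;   /* a second one: legal C, and it names the same object */

int get_j( void )
{
 return j;   /* zero, since no initializer was ever supplied */
}
```

Across files the outcome depends on how the compiler emits such variables.
With gcc's "-fcommon" they become common symbols and are merged as described
above; with "-fno-common", which as far as I know is the default in recent
gcc releases, "j" goes straight into .bss and a second definition in another
file is reported by the linker as a multiple definition.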

Automatic variables do not appear in the object file at all, since
they are allocated on the stack as the function begins, and
appear only as stack offsets encoded in the machine code of the
functions (in the text section).
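
As a compact recap of where things end up, here is a sketch with one
variable of each kind.  The file and all of its names are invented for
illustration, and the section named in each comment is only what I would
expect from a typical gcc/ELF setup ("objdump -t" on the object file is the
way to check).

```c
/* where.c: hypothetical file, names invented for illustration. */
#include <stddef.h>

int initialized = 42;                /* typically .data */
static long scratch[ 256 ];          /* typically .bss: no bytes stored in the file */
static const char banner[] = "hi";   /* typically .rodata */

/* The machine code for both functions goes in .text. */
const char *get_banner( void )
{
 return banner;
}

long sum_scratch( void )
{
 long total = 0;   /* automatic: a stack slot or register, no symbol */
 size_t k;         /* automatic as well */

 for ( k = 0; k < sizeof scratch / sizeof scratch[ 0 ]; k++ )
    total += scratch[ k ];
 return total;     /* 0 unless something has written to scratch */
}
```

Neither "total" nor "k" should show up in the symbol table, while
"initialized", "scratch" and "banner" each get an entry.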

Returning to the original point, that the use of "static" seems to be
"overloaded"... I don't think it _is_ overloaded.  No matter where it is
used, static always indicates limited-scope storage located in the bss
(uninitialized) or data (initialized) sections, much like initialized
global variables.  The fact that the symbol for a file-static or
block-static variable like "s" or "x" or "y" is only usable in that file
or block is a scoping thing that allows you to re-use variable names in
different places without name collisions.
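
A small sketch of that name re-use, with made-up function names: two
block-static variables share the name "count" but are separate objects,
each keeping its own value between calls.

```c
/* counters.c: hypothetical example of two block-statics with one name. */

int count_apples( void )
{
 static int count;   /* zero at program start, persists across calls */
 return ++count;
}

int count_pears( void )
{
 static int count;   /* same name, different object */
 return ++count;
}
```

Calling count_apples() twice and then count_pears() once yields 1, 2 and
then 1.  In the symbol table I would expect two local entries for "count"
with distinguishing suffixes, in the manner of "x.3" and "y.4" above; the
exact suffixes are up to the compiler.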

Static allocation is a powerful tool.  It shares some of the drawbacks of
global variables, but if the scope limitations are used appropriately it
can minimize the dangers of global variables while retaining some of
their simplicity and performance benefits.
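
One common way to use it, sketched here with invented names, is a small
module whose storage is file-static and reachable only through its
functions, in the same spirit as "s" and do_getstr() in test.c:

```c
/* stack.c: hypothetical fixed-capacity stack with file-static storage. */

#define STACK_CAP 8

static int items[ STACK_CAP ];   /* statically allocated, no malloc needed */
static int depth;                /* starts at zero */

/* Returns 1 on success, 0 if the stack is full. */
int stack_push( int value )
{
 if ( depth >= STACK_CAP )
    return 0;
 items[ depth++ ] = value;
 return 1;
}

/* Returns 1 and stores the top item in *out, or 0 if the stack is empty. */
int stack_pop( int *out )
{
 if ( depth <= 0 )
    return 0;
 *out = items[ --depth ];
 return 1;
}
```

No other file can touch "items" or "depth" by name, which removes one of
the usual hazards of globals.  The drawbacks that remain are the shared
ones: there is exactly one stack, and the functions are not reentrant.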

---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil@dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...2k
---------------------------------------------------------------------------