[vox-tech] glibc and socket()

Jeff Newmiller vox-tech@lists.lugod.org
Wed, 12 Feb 2003 01:38:45 -0800 (PST)


On Tue, 11 Feb 2003, Nicole the Wonder Nerd wrote:

> Up spake Jeff Newmiller on Mon, Feb 10, 2003 at 09:11:24PM -0800:
> > > A call to sockaddr_check() goes through OK, but a subsequent call to socket() 
> > > triggers a kernel panic.  Control never even gets to the first line of socket().
> > You know this without recompiling glibc for debugging?  Or are you saying
> > it doesn't get to sys_socket()?  The latter sounds an awful lot like
> > header problems.
> 
> My experience is this: the kernel panics (complete with "aiee, killed interrupt handler" 
> message) before gdb has a chance to print out the first line of the glibc function socket().  
> I interpret this to mean that the kernel panics before control is transferred.  Is this view correct?

"before control is transferred" is an interesting way to describe a
function call.  The machine code loads arguments on the stack and performs
a "call" instruction that pushes the current program counter and loads the
destination address into the program counter register.  Things that can go
wrong with userland code happen at this level of granularity... a single
source code line can involve dozens of instructions, including stack check
subroutine calls or performance profiling calls.

Have you used 

  display/i $pc
  stepi
  stepi
  (etc...)

in gdb to see which of the many instructions involved in performing the
function call is the one where the problem occurs?

If it is the call itself, then the shared library linkage would seem to be
hosed, or there is a problem with the memory protection configuration or
its handling. If it is associated with a kernel call made in the
course of arriving at the "first line of socket()" then you have a more
precise direction to look at in the kernel.
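
One way to separate those two cases is to make the same request twice:
once through the library wrapper and once through syscall(), which skips
the wrapper entirely. What follows is only a sketch under assumptions. The
helper names are made up for illustration, and it assumes your headers
define SYS_socket. On some architectures, including 32-bit x86 kernels of
this vintage, socket calls are multiplexed through socketcall and
SYS_socket may be absent, in which case the direct path reports -2.

```c
#define _GNU_SOURCE
#include <unistd.h>
#include <sys/socket.h>
#include <sys/syscall.h>
#include <netinet/in.h>

/* Hypothetical helper: open a TCP socket through the glibc wrapper. */
int open_via_wrapper(void)
{
    return socket(AF_INET, SOCK_STREAM, 0);
}

/* Hypothetical helper: ask the kernel directly, bypassing socket().
 * Returns -2 when the headers in use do not define SYS_socket. */
int open_via_syscall(void)
{
#ifdef SYS_socket
    return (int)syscall(SYS_socket, AF_INET, SOCK_STREAM, 0);
#else
    return -2;
#endif
}
```

If the direct path behaves and the wrapper path does not, suspicion falls
on the library side (linkage or headers). If both misbehave the same way,
the kernel is the more likely place to look.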

> Based on your advice and the Linus post you linked to, I guess I have a new question 
> for you, Jeff: What, exactly, are the consequences of compiling glibc against the 
> wrong kernel headers?  Could it be causing a kernel panic?  

That consequence does sound inappropriate on second thought, but then
again Linux is not perfect... so it is possible that garbage data from
userland could hose the kernel.  Your description of a reproducible
problem linked to specific userland code being executed certainly supports
the theory.

There are a couple of incompatibilities I can dream up, but I don't know
this area of Linux well enough to offer a definitive answer to your
question regarding exact consequences.

One possible problem is the data structure growth associated with kernel
upgrades. The library embeds kernel data structures within its own data
structures, so an application compiled against the new kernel and library
headers may expect the library data structures to be larger than the old
library binaries think they should be. Overrun errors or kernel
interpretation of garbage data may result.
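
To make the size mismatch concrete, here is a small sketch with two
made-up layouts of the same structure, standing in for an old and a new
header. The struct and function names are hypothetical and are not taken
from glibc or the kernel.

```c
#include <stddef.h>
#include <string.h>

/* Layout as the old library binary was compiled to see it (hypothetical). */
struct conn_info_old {
    int family;
    int type;
};

/* Layout after a header upgrade added a field (hypothetical). */
struct conn_info_new {
    int family;
    int type;
    int flags;
};

/* Stand-in for old library code: it fills only the bytes it knows about. */
void old_library_fill(void *buf)
{
    struct conn_info_old info = { 2, 1 };
    memcpy(buf, &info, sizeof info);
}

/* Number of trailing bytes the old code never writes. */
size_t unwritten_bytes(void)
{
    return sizeof(struct conn_info_new) - sizeof(struct conn_info_old);
}
```

An application that hands old_library_fill() a struct conn_info_new and
then reads flags sees whatever was in that memory beforehand, which is the
garbage data case. The reverse mismatch, where the writer believes the
structure is larger than the buffer it was given, is the overrun case.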

Another possible problem could lie in the constants compiled into the
application from new library/kernel headers ... when combined with
constants embedded in an old library binary, the kernel could
be misled into thinking the library was providing data for a
newer API due to values embedded in data structures by the
application. Since the library is actually handling the data structures,
this could lead to inconsistent data (garbage) being passed to the kernel.

The consequences of interpretation of garbage data depend strongly on the
code doing the interpreting, so specific consequences would be kind of
hard to predict.

> And how can I tell which kernel headers glibc was compiled against?

This I don't know in the general case.  In most systems, people use the
libraries and headers that a distro sets up, and which may be tracked in
the packaging system.  If you are building your own "linux from scratch"
for development purposes then you have to manage this yourself.

After surfing around a bit, I find that the glibc
maintainers appear to have started keeping their own versions of the
headers in the glibc source, and updating them (manually?) to stay in sync
with the stable portions of the kernel binary interface.  As long as your
application uses the pseudo-kernel headers supplied with the library
instead of the actual headers supplied with some version of the kernel
(such as the version on your development machine), you should be safe.
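
A partial check is at least possible from the application side. The sketch
below reports the version of the kernel headers that the application
itself is being compiled against, next to the glibc version found at run
time and the running kernel. It does not reveal which headers the glibc
binary was built with, so treat it as a way to spot an obvious mismatch
and nothing more. The function names are made up for illustration.

```c
#include <stdio.h>
#include <linux/version.h>      /* LINUX_VERSION_CODE of the headers in use */
#include <gnu/libc-version.h>   /* gnu_get_libc_version() */
#include <sys/utsname.h>        /* uname() */

/* Format the kernel header version this file was compiled against. */
void header_version(char *out, size_t len)
{
    snprintf(out, len, "%d.%d.%d",
             (LINUX_VERSION_CODE >> 16) & 0xff,
             (LINUX_VERSION_CODE >> 8) & 0xff,
             LINUX_VERSION_CODE & 0xff);
}

/* Print header, library, and running kernel versions side by side.
 * Returns 0 on success, -1 if uname() fails. */
int print_versions(void)
{
    char hdr[32];
    struct utsname u;

    header_version(hdr, sizeof hdr);
    printf("kernel headers (compile time): %s\n", hdr);
    printf("glibc (run time):              %s\n", gnu_get_libc_version());
    if (uname(&u) != 0)
        return -1;
    printf("running kernel:                %s\n", u.release);
    return 0;
}
```

Calling print_versions() from main() on the development machine and again
on the target shows whether the two environments agree.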

> Thanks a bunch!

Actually, I don't know that any of this is specific enough to be of
use.  Sorry... as I said, I haven't tried to debug the kernel itself.

---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil@dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...2k
---------------------------------------------------------------------------