[vox-tech] late night musings: stripping

Mitch Patenaude vox-tech@lists.lugod.org
Thu, 26 Feb 2004 11:20:53 -0800


On Thursday, Feb 26, 2004, at 09:22 US/Pacific, Peter Jay Salzman wrote:
> we finished right into glibc.  shouldn't GDB have known when myfunction
> returned to main, even if there's no debugging information?

Hmm.. I'm not sure, but I have a suspicion.  It has to do with how 
breakpoints are implemented.  The debugger isn't simply using "stepi" 
to go instruction by instruction.  It's modifying the code to return 
control to the debugger when certain things happen.

Since it has the address of the function, what it does is replace the 
first instruction(s) with something that returns control to the 
debugger.. probably a jump or a trap or something else short (on x86 
the usual choice is the one-byte int3 trap).  It takes the code that 
was overwritten by that trap, and puts it in a separate place that's 
executed before control is returned to the program.
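
Here is a minimal sketch of that byte-swapping idea, run against a plain
byte array rather than a live process.  The names breakpoint_t,
bp_insert and bp_remove are made up for illustration; a real debugger
would patch the target process's memory instead (on Linux, via ptrace).
0xCC is the encoding of the x86 int3 trap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INT3 0xCC  /* one-byte x86 breakpoint trap opcode */

/* Hypothetical bookkeeping for one breakpoint. */
typedef struct {
    size_t  offset;  /* where in the code buffer the trap was planted */
    uint8_t saved;   /* original byte that the trap overwrote */
} breakpoint_t;

/* Plant a trap at code[offset], remembering the byte it replaces. */
breakpoint_t bp_insert(uint8_t *code, size_t len, size_t offset)
{
    assert(offset < len);
    breakpoint_t bp = { offset, code[offset] };
    code[offset] = INT3;
    return bp;
}

/* Put the original byte back so the instruction can run normally. */
void bp_remove(uint8_t *code, size_t len, const breakpoint_t *bp)
{
    assert(bp->offset < len);
    code[bp->offset] = bp->saved;
}
```

Calling bp_insert(code, len, 0) on a buffer that starts with a function
prologue overwrites only its first byte, and bp_remove() undoes it.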

But.... without detailed debugging information it's very hard to know 
when you're returning from a stack frame.  Since all the cleanup is 
done after return, you can "return" from just about anywhere.. and 
there may be more than one return in a function.  The return that gets 
taken is not necessarily the first one that appears linearly after the 
entry point, and replacing ALL the RETs after the current entry point 
might break other functions.

It COULD try to implement this by playing with the return address in 
the stack frame, having it "return" into the debugger, but I suspect 
that they just didn't bother.  If you want that functionality, compile 
with debugging info turned on.
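
To show that the return address really is just a value sitting in the
frame where a tool could read (or rewrite) it, here is a small sketch.
It relies on __builtin_return_address, which is a GCC/Clang extension
and not standard C; the function name where_will_i_return is made up.
It only reads the address and does not redirect anything.

```c
#include <stddef.h>

/* Returns the address this function will return to.
 * noinline keeps the function in its own frame, so level 0 means
 * "the return address of this call". */
__attribute__((noinline))
void *where_will_i_return(void)
{
    return __builtin_return_address(0);
}
```

The pointer it hands back lands somewhere inside the caller, just past
the call instruction.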

> what would be useful would be something like GDB which can follow a
> process and collect information about:
>
> 1. control flow (what functions call what).
> 2. get the parameters and return values of the function calls.
>
>
> the only way i know how to get #1 is to sit there, using stepi (and
> possibly nexti over uninteresting libc functions) with a pencil and
> paper in hand.
>
> as for #2, seems like the only way to do that would be to disassemble
> the code.  i don't know a lick of x86 assembly, but i did notice that
> %eax appears to be the register for returning integers.

gdb has a pretty extensive scripting language, and I imagine that you 
could do that by checking whether the next instruction is a "CALL", 
though you'd also need to trap "INT 0x80" because that's the system 
call interrupt.
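
As a rough sketch of what "checking whether the next instruction is a
CALL" means at the byte level, here is a C helper that looks at the
opcode bytes directly.  The names classify_insn and call_target are
hypothetical, and it only knows two encodings: the direct near call
(E8 plus a 32-bit displacement) and int $0x80 (CD 80).  Indirect calls
through a register or memory use other encodings and are not handled.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { INSN_OTHER, INSN_CALL_REL32, INSN_INT80 } insn_kind_t;

/* Classify the instruction starting at code[0].
 *   E8 xx xx xx xx   call rel32  (direct near call)
 *   CD 80            int $0x80   (32-bit Linux system call)
 * Anything else comes back as INSN_OTHER. */
insn_kind_t classify_insn(const uint8_t *code, size_t len)
{
    if (len >= 5 && code[0] == 0xE8)
        return INSN_CALL_REL32;
    if (len >= 2 && code[0] == 0xCD && code[1] == 0x80)
        return INSN_INT80;
    return INSN_OTHER;
}

/* Target of a 5-byte "call rel32" located at address ip.
 * The displacement is little-endian, signed, and relative to the
 * address of the next instruction (ip + 5). */
uint32_t call_target(const uint8_t *code, uint32_t ip)
{
    uint32_t rel = (uint32_t)code[1]
                 | (uint32_t)code[2] << 8
                 | (uint32_t)code[3] << 16
                 | (uint32_t)code[4] << 24;
    return ip + 5 + rel;  /* unsigned wraparound covers negative rel */
}
```

For a call at 0x804838d that should reach 0x8048288, the displacement
works out to 0x8048288 - 0x8048392 = -0x10a, so the bytes would be
E8 F6 FE FF FF.  Those bytes are derived from the addresses, not copied
from a dump.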

> i rewrote myfunction to return an int of 1.

For those unfamiliar with the C call stack, I'll add some comments:

>    0x8048380 <myfunction>: push   %ebp
Pushes the base pointer onto the stack, to save the stack frame of the 
calling function

>    0x8048381 <myfunction+1>:       mov    %esp,%ebp
Moves the current stack pointer into the base pointer to make a new 
"frame"

>    0x8048383 <myfunction+3>:       sub    $0x8,%esp
Make room on the stack for 8 bytes of automatic variables and arguments.

>    0x8048386 <myfunction+6>:       movl   $0x80484b4,(%esp,1)
"Push" the address of the format string "Hello World" onto the stack 
(notice that it doesn't use push.. and so leaves the stack pointer 
untouched.. I don't know why that is.)

>    0x804838d <myfunction+13>:      call   0x8048288 <printf>
call printf

>    0x8048392 <myfunction+18>:      mov    $0x1,%eax
Move the return value into eax

>    0x8048397 <myfunction+23>:      leave
leave cleans up the stack frame automatically: it's equivalent to 
"mov %ebp,%esp" followed by "pop %ebp".  It isn't a post-386 extension; 
it has been around since the 80186.

>    0x8048398 <myfunction+24>:      ret
And return.
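
For reference, here is C source that would plausibly compile to a
listing like this one.  It's a reconstruction: the original source
isn't quoted in this message, so the exact string and signature are
assumptions.

```c
#include <stdio.h>

/* Assumed source for the disassembly above (not taken from the post). */
int myfunction(void)
{
    printf("Hello World");  /* movl $0x80484b4,(%esp,1) ; call printf */
    return 1;               /* mov $0x1,%eax ; leave ; ret */
}
```

The push/mov/sub prologue and the leave/ret epilogue have no
counterpart in the source; the compiler adds them for every function
that sets up a frame.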

It's pretty much the same with a floating point return value, except 
for how the value is handed back.

> here, myfunction returns a float of 1.0:
> [...]
>    0x8048394 <myfunction+18>:      fld1
>    0x8048396 <myfunction+20>:      leave
>    0x8048397 <myfunction+21>:      ret
>
> i'll need to google for fld1, but its general idea is clear.

fld1 pushes the constant 1.0 onto the floating point stack.  In olden 
days, floating point operations were handled by a separate chip (the 
80{,1,2,3}87), and so you needed to load values onto a separate 
floating point "stack" in the coprocessor before you could perform 
operations.. I'm guessing that when a function returns double/float, 
it leaves the return value on the top of the floating point stack for 
efficiency, since functions that return a float are often used as part 
of a larger equation (i.e.  x = 4.0*ln(y+sqrt(z)); )
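
A small sketch of that pattern in C.  Both function names are made up.
The comments describe the 32-bit x86 calling convention seen in the
listing; on x86-64 System V the value comes back in %xmm0 instead.

```c
/* Assumed float variant of the function under discussion. */
float myfunction_f(void)
{
    return 1.0f;  /* 32-bit x86: fld1 leaves 1.0 in st(0) for the caller */
}

/* Hypothetical caller: the returned value feeds straight into a larger
 * expression without being stored in a variable first. */
float scaled(float y)
{
    return 4.0f * (y + myfunction_f());
}
```

Here scaled(2.0f) evaluates 4 * (2 + 1).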

> anyway, it would be neat to have a program that automated all this
> stuff.

I'd be surprised if it hadn't already been done, though it might be 
"underground", since it's got a lot of quasi-ethical aspects to it.

-- Mitch