[vox-tech] late night musings: stripping
Mitch Patenaude
vox-tech@lists.lugod.org
Thu, 26 Feb 2004 11:20:53 -0800
On Thursday, Feb 26, 2004, at 09:22 US/Pacific, Peter Jay Salzman wrote:
> we finished right into glibc. shouldn't GDB have known when myfunction
> returned to main, even if there's no debugging information?
Hmm.. I'm not sure, but I have a suspicion. It has to do with how
breakpoints are implemented. The debugger isn't simply using "stepi"
to go instruction by instruction. It's modifying the code to return
control to the debugger when certain things happen.
Since it has the address of the function, what it does is replace the
first instruction(s) with something that returns control to the
debugger.. probably a jump or a trap or something else short. It takes
the code that was overwritten by that trap, and puts it in a separate
place that's executed before control is returned to the program.
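To make the patching idea concrete, here is a minimal sketch in C. It
only simulates the trick on a plain byte array; a real debugger would
do the same thing in another process's memory. The struct and function
names are made up for illustration. The one hard fact in it is that
0xCC is the one-byte x86 breakpoint (int3) opcode.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INT3 0xCC  /* x86 one-byte breakpoint (trap) opcode */

/* Hypothetical bookkeeping for a single breakpoint. */
struct breakpoint {
    size_t  offset;  /* where in the code buffer the trap was written */
    uint8_t saved;   /* original byte that the trap overwrote */
    int     active;  /* nonzero while the trap is in place */
};

/* Overwrite the byte at `offset` with INT3, remembering the original. */
void bp_insert(uint8_t *code, size_t len, size_t offset,
               struct breakpoint *bp)
{
    assert(offset < len);
    bp->offset = offset;
    bp->saved  = code[offset];
    bp->active = 1;
    code[offset] = INT3;
}

/* Put the original byte back so the real instruction can run again. */
void bp_remove(uint8_t *code, struct breakpoint *bp)
{
    if (bp->active) {
        code[bp->offset] = bp->saved;
        bp->active = 0;
    }
}
```

Calling bp_insert() on a buffer that starts with 0x55 (push %ebp) leaves
0xCC in its place; bp_remove() restores the 0x55. Whether the debugger
restores the byte in place or runs a copy of the instruction somewhere
else is an implementation detail this sketch doesn't try to settle.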
But.... without detailed debugging information it's very hard to know
when you're returning from a stack frame. Since all the cleanup is
done after return, you can "return" from just about anywhere.. and
there may be more than one return in a function. The return that gets
taken is not necessarily the first one linearly after the entry point,
and replacing ALL the RETs after the current entry point might break
other functions.
It COULD try to implement this by playing with the return address in
the stack frame, having it "return" into the debugger, but I suspect
that they just didn't bother. If you want that functionality, compile
with debugging info turned on.
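For what it's worth, the return address is easy to get at from inside
a function, which is the piece of information such a scheme would need.
A minimal sketch, assuming GCC or Clang: __builtin_return_address is a
compiler extension, not standard C, and the function name here is
hypothetical.

```c
#include <stdint.h>

/*
 * Returns the address this function will return to, i.e. the
 * instruction right after the call in the caller.
 * noinline keeps the compiler from folding the call away.
 */
__attribute__((noinline))
uintptr_t where_will_i_return(void)
{
    return (uintptr_t)__builtin_return_address(0);
}
```

A debugger that could read this value out of the stack frame could put
a temporary trap at that address instead of hunting for every RET.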
> what would be useful would be something like GDB which can follow a
> process and collect information about:
>
> 1. control flow (what functions call what).
> 2. get the parameters and return values of the function calls.
>
>
> the only way i know how to get #1 is to sit there, using stepi (and
> possibly nexti over uninteresting libc functions) with a pencil and
> paper in hand.
>
> as for #2, seems like the only way to do that would be to disassemble
> the code. i don't know a lick of x86 assembly, but i did notice that
> %eax appears to be the register for returning integers.
gdb has a pretty extensive scripting language, and I imagine that you
could do that by looking at when the next instruction is a "CALL",
though you also need to trap "INT 0x80" because that's the system call
interrupt.
> i rewrote myfunction to return an int of 1.
For those unfamiliar with the C call stack, I'll add some comments:
> 0x8048380 <myfunction>: push %ebp
Pushes the base pointer onto the stack, to save the stack frame of the
calling function
> 0x8048381 <myfunction+1>: mov %esp,%ebp
Moves the current stack pointer into the base pointer to make a new
"frame"
> 0x8048383 <myfunction+3>: sub $0x8,%esp
Make room on the stack for 8 bytes of automatic variables and arguments.
> 0x8048386 <myfunction+6>: movl $0x80484b4,(%esp,1)
"Push" the address of the format string "Hello World" onto the stack
(notice that it doesn't use push.. and so leaves the stack pointer
untouched.. I don't know why that is.)
> 0x804838d <myfunction+13>: call 0x8048288 <printf>
call printf
> 0x8048392 <myfunction+18>: mov $0x1,%eax
Move the return value into eax
> 0x8048397 <myfunction+23>: leave
leave cleans up the stack frame: it's equivalent to "mov %ebp,%esp"
followed by "pop %ebp", undoing the first two instructions above.
> 0x8048398 <myfunction+24>: ret
And return.
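For reference, the C behind that listing was presumably something like
the following. This is a reconstruction from the disassembly, not the
original source: the exact format string (for instance whether it ends
in a newline) and the signature are guesses.

```c
#include <stdio.h>

/* Assumed reconstruction of the function being disassembled. */
int myfunction(void)
{
    printf("Hello World");  /* the movl to (%esp) plus "call printf" */
    return 1;               /* mov $0x1,%eax */
}
```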
It's pretty much the same with a floating point return value, except:
> here, myfunction returns a float of 1.0:
> [...]
> 0x8048394 <myfunction+18>: fld1
> 0x8048396 <myfunction+20>: leave
> 0x8048397 <myfunction+21>: ret
>
> i'll need to google for fld1, but its general idea is clear.
fld1 pushes the constant 1.0 onto the floating point stack. In olden
days, floating point operations were handled by a separate chip (the
80{,1,2,3}87), and so you needed to load values onto a separate
floating point "stack" in the coprocessor before you could perform
operations.. I'm guessing that when a function returns double/float,
it leaves the return value on the top of the floating point stack for
efficiency, since functions that return a float are often used as part
of a larger equation (i.e. x = 4.0*ln(y+sqrt(z)); )
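A small C sketch of that situation, with hypothetical function names.
On 32-bit x86 with the usual calling convention the float comes back on
top of the floating point stack, so the caller can keep computing with
it directly; on x86-64 it comes back in %xmm0 instead, so you'd have to
build with -m32 to see fld1.

```c
/* Assumed stand-in for the float version of myfunction. */
float myfunction_f(void)
{
    return 1.0f;  /* on 32-bit x86 this can compile to fld1 */
}

/* The returned float is used straight away in a larger expression. */
double scaled(void)
{
    return 4.0 * myfunction_f() + 2.0;
}
```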
> anyway, it would be neat to have a program that automated all this
> stuff.
I'd be surprised if it hadn't already been done, though it might be
"underground", since it's got a lot of quasi-ethical aspects to it.
-- Mitch