[vox-tech] emergency: please help. /lib keeps disappearing
Peter Jay Salzman
vox-tech@lists.lugod.org
Wed, 26 May 2004 06:52:22 -0700
Last night, on one of my machines I ran:
apt-get update && apt-get upgrade
During the update, most of my consoles began to read:
"init spawning too fast. Disabling getty for 5 minutes"
When I typed "ps aux", bash returned with:
/bin/ps: File not found
In fact, everything from "bash" to "mount" to "shutdown" yielded "file
not found" errors. I've got a baaad feeling about this, Chewie.
I used the magic SysRq keys to sync my hard drives, remount them
read-only, and then reboot the system.
Lilo came up healthy and I chose my 2.6.6 kernel. Booting proceeded,
but towards the end of the bootup, I got:
No init found. Kernel panic.
Shit. I reached for my Knoppix disk and booted it. Mounted my root
partition under /mnt/root to take a look. I found the problem
immediately. /lib was gone. Absolutely gone. As in, the directory was
completely missing.
I've recovered from situations worse than this, so I rolled up my
sleeves and got to work.
First thing I did was "mkdir /mnt/root/lib". After an "ifconfig" and a
"route", my network was working.
I used scp to copy the entire /lib directory from another machine to the
crippled machine. I then ran:
chroot /mnt/root /sbin/ldconfig
It gave me some warnings that a couple of files which ldconfig was
expecting to be symlinks were actual regular disk files. That's OK (I
believe). So at this point, I have a working libc and a lot of other
important libraries. Next step is to uncripple my kernels.
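A note on those ldconfig warnings: "scp -r" follows symlinks while it walks a tree, so library symlinks such as libfoo.so.1 arrive as regular files, which would explain them. Below is a minimal sketch of a copy that keeps symlinks intact by piping through tar. The function name copy_tree and the host name in the comment are made up for illustration; this is not what was actually run here.

```shell
#!/bin/sh
# copy_tree SRC DST: copy a directory tree through a tar pipe so that
# symlinks stay symlinks (unlike "scp -r", which follows them and writes
# regular files instead).
copy_tree() {
    src=$1
    dst=$2
    mkdir -p "$dst" || return 1
    # -p on the extracting side preserves permissions.
    tar -C "$src" -cf - . | tar -C "$dst" -xpf -
}

# Hypothetical use for this recovery ("goodbox" is a made-up host name):
#   ssh root@goodbox 'tar -C /lib -cf - .' | tar -C /mnt/root/lib -xpf -
```

With the symlinks preserved, ldconfig should have nothing to complain about on that score.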
Since I have a working libc, I do:
chroot /mnt/root
Then, for each kernel source (2.4.26, 2.6.0, 2.6.5, 2.6.6), I do:
make modules && make modules_install
OK. Kernels should be OK. libc is taken care of. In theory, this
machine should boot, and except for packages on the crippled machine
that weren't on my other workstation, everything should be more or less
OK.
At this point, I'm aiming at full recovery, not "bringing back to life".
The machine boots fine. First things first. Just to make absolutely
sure:
apt-get install --reinstall libc6 libc6-dev
Good. Now I begin to recover packages. I don't know how to get a list
of all packages that install files in /lib, so I do the only thing I
know how to do:
COLUMNS=255 dpkg -l "*" | grep -v 'description' |
awk '{ print $2 }' > recover.sh
which gives me a list of all installed packages. BTW, if anybody knows
how to generate a list of all packages which have files in /lib, I'd
like to know.
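One possible way to build that narrower list, as a rough sketch: dpkg keeps a file list per package in /var/lib/dpkg/info/NAME.list, so searching those lists for paths under /lib gives the package names. The function name pkgs_owning is made up for illustration.

```shell
#!/bin/sh
# pkgs_owning PREFIX [INFODIR]: print the name of every package whose dpkg
# file list contains at least one path under PREFIX.
# INFODIR defaults to dpkg's own database directory.
pkgs_owning() {
    prefix=$1
    infodir=${2:-/var/lib/dpkg/info}
    for list in "$infodir"/*.list; do
        [ -f "$list" ] || continue
        # Anchor at line start so /usr/lib/... does not count as /lib/...
        if grep -q "^$prefix/" "$list"; then
            basename "$list" .list
        fi
    done
}
```

Running "pkgs_owning /lib" should then print one package name per line, which is a much shorter list to reinstall than every package on the system.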
The command above gives me a list that looks like:
...
exim4-base
exim4-config
exim4-doc-html
exim4-doc-info
exuberant-ctag
fakeroot
fdflush
fdutils
...
Using vim, I prepend each line with "apt-get install --reinstall --yes
--ignore-missing" so now recover.sh looks like:
apt-get install --reinstall --yes --ignore-missing exim4-base
apt-get install --reinstall --yes --ignore-missing exim4-config
apt-get install --reinstall --yes --ignore-missing exim4-doc-html
apt-get install --reinstall --yes --ignore-missing exim4-doc-info
apt-get install --reinstall --yes --ignore-missing exuberant-ctag
apt-get install --reinstall --yes --ignore-missing fakeroot
apt-get install --reinstall --yes --ignore-missing fdflush
apt-get install --reinstall --yes --ignore-missing fdutils
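The same prepending can be done without an editor. A minimal sketch, assuming the package names sit one per line in a file (packages.txt below is a made-up name, as is the function name):

```shell
#!/bin/sh
# make_recover_script: read package names on stdin, one per line, and write
# one "apt-get install --reinstall" command per package to stdout.
make_recover_script() {
    while IFS= read -r pkg; do
        # Skip blank lines so no empty apt-get command is emitted.
        [ -n "$pkg" ] || continue
        printf 'apt-get install --reinstall --yes --ignore-missing %s\n' "$pkg"
    done
}

# Hypothetical use:
#   make_recover_script < packages.txt > recover.sh
```

The input and output must be different files, since the shell truncates the output file before the function reads anything.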
And I run it with:
sh recover.sh
It started to work fine. Everything seemed to be going OK.
About 5 or 10 minutes into it, I started to see on my console:
"init spawning too fast. Disabling getty for 5 minutes"
After doing a ps aux, I got:
/bin/ps: File not found
Uh oh. I've got a baaad feeling about this, Chewie. I pressed Ctrl-C to
get out of apt-get, and did:
sync
Ctrl-D (I'm now out of chroot)
umount /dev/hdb1
mount /dev/hdb1 /mnt/root
chroot /mnt/root
It told me that "/bin/bash" was not found. An "ls /mnt/root/" revealed:
/lib was missing again.
I was outside of chroot, so I did a "ps aux" again. There was a dd
process trying to access /dev/fd0. I killed it. Could that be a clue?
Does apt-get ever access the floppy?
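One way to chase the dd clue, sketched under the assumption that the dd was started by a package's maintainer script (which is only a guess at this point): dpkg stores those scripts in /var/lib/dpkg/info as NAME.preinst, NAME.postinst, NAME.prerm and NAME.postrm, so they can be searched for a string such as /dev/fd0. The function name is made up for illustration.

```shell
#!/bin/sh
# scripts_mentioning PATTERN [INFODIR]: list dpkg maintainer scripts
# (preinst, postinst, prerm, postrm) whose text contains PATTERN.
scripts_mentioning() {
    pattern=$1
    infodir=${2:-/var/lib/dpkg/info}
    for script in "$infodir"/*.preinst "$infodir"/*.postinst \
                  "$infodir"/*.prerm "$infodir"/*.postrm; do
        [ -f "$script" ] || continue
        # -q: only the exit status matters; print the file name on a match.
        grep -q -e "$pattern" "$script" && echo "$script"
    done
    return 0
}

# Hypothetical use, run against the mounted root from outside the chroot:
#   scripts_mentioning '/dev/fd0' /mnt/root/var/lib/dpkg/info
```

An empty result would not rule a package out, since a maintainer script can call another program that runs dd.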
My hypothesis is that there must be a Debian package that's doing this.
I checked debian-user but there's no mention of this, and this would be
BIG. I'm sure I wouldn't be the first to find this.
If nobody has better ideas, I was going to try all these steps over, but
do something like:
...
apt-get install --reinstall --yes --ignore-missing exim4-base
echo "I did exim4-base" >> /root/log
apt-get install --reinstall --yes --ignore-missing exim4-config
echo "I did exim4-config" >> /root/log
apt-get install --reinstall --yes --ignore-missing exim4-doc-info
echo "I did exim4-doc-info" >> /root/log
apt-get install --reinstall --yes --ignore-missing exuberant-ctag
echo "I did exuberant-ctag" >> /root/log
apt-get install --reinstall --yes --ignore-missing fakeroot
echo "I did fakeroot" >> /root/log
...
That's about the only thing I can think of to try (other than installing
Gentoo which I've been thinking about for a few weeks now).
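The logging idea can be sketched as a loop instead of a hand-edited script. This version also checks after every package whether /lib still exists and stops at the first one after which it is gone. The function name and the REINSTALL_CMD variable are made up for illustration; REINSTALL_CMD only exists so the loop can be tried with a harmless stand-in command first.

```shell
#!/bin/sh
# reinstall_until_broken LISTFILE LOGFILE [WATCHDIR]
# Reinstall packages one at a time, log each one, and stop as soon as
# WATCHDIR (default /lib) no longer exists.
# REINSTALL_CMD is a hypothetical override hook for dry runs.
reinstall_until_broken() {
    listfile=$1
    logfile=$2
    watchdir=${3:-/lib}
    cmd=${REINSTALL_CMD:-apt-get install --reinstall --yes --ignore-missing}
    while IFS= read -r pkg; do
        [ -n "$pkg" ] || continue
        # Log before starting, so a hang or crash still leaves a trace.
        echo "starting $pkg" >> "$logfile"
        # $cmd is left unquoted on purpose so it splits into command and
        # options. stdin is /dev/null so the command cannot eat the list.
        $cmd "$pkg" < /dev/null
        if [ ! -d "$watchdir" ]; then
            echo "$watchdir vanished after $pkg" >> "$logfile"
            return 1
        fi
        echo "finished $pkg" >> "$logfile"
    done < "$listfile"
    return 0
}

# Hypothetical use inside the chroot (packages.txt is a made-up name):
#   reinstall_until_broken packages.txt /root/log
```

The loop only uses shell builtins for the check and the logging, so it keeps working after /lib is gone. If the damage is done by something running in the background, the last package in the log is a lead rather than proof.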
I could sure use some help!
Thanks to all who actually made it down here.
Pete
--
GPG Instructions: http://www.dirac.org/linux/gpg
GPG Fingerprint: B9F1 6CF3 47C4 7CD8 D33E 70A9 A3B9 1945 67EA 951D
Good, Fast, Cheap. Pick any two (you can't have all three)
--- RFC 1925: The Twelve Networking Truths (The 7th truth)