[vox-tech] emergency: please help. /lib keeps disappearing
Ken Bloom
vox-tech@lists.lugod.org
Fri, 28 May 2004 01:14:38 -0700
On Wed, May 26, 2004 at 06:52:22AM -0700, Peter Jay Salzman wrote:
> Last night, on one of my machines I ran:
>
> apt-get update && apt-get upgrade
>
> During the update, most of my consoles began to read:
>
> "init spawning too fast. Disabling getty for 5 minutes"
>
> When I typed "ps aux", bash returned with:
>
> /bin/ps: File not found
>
> In fact, everything from "bash" to "mount" to "shutdown" yielded "file
> not found" errors. I've got a baaad feeling about this, Chewie.
>
> I used the magic sysrq keys to sync my hard drives, then unmount my hard
> drives, then to reboot the system.
>
> Lilo came up healthy and I chose my 2.6.6 kernel. Booting proceeded,
> but towards the end of the bootup, I got:
>
> No init found. Kernel panic.
>
> Shit. I reached for my Knoppix disk and booted it. Mounted my root
> partition under /mnt/root to take a look. I found the problem
> immediately. /lib was gone. Absolutely gone. As in, the directory was
> completely missing.
>
> I've recovered from situations worse than this, so I rolled up my
> sleeves and got to work.
>
> First thing I did was "mkdir /mnt/root/lib". After an "ifconfig" and a
> "route", my network was working.
>
> I used scp to copy the entire /lib directory from another machine to the
> crippled machine. I then ran:
>
> chroot /mnt/root /sbin/ldconfig
>
> It gave me some warnings that a couple of files which ldconfig was
> expecting to be symlinks were actual regular disk files. That's OK (I
> believe). So at this point, I have a working libc and a lot of other
> important libraries. Next step is to uncripple my kernels.
>
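Those ldconfig warnings are probably a side effect of the copy itself: scp -r follows symbolic links, so the links in /lib arrive as regular files. Here is a minimal sketch of a copy that keeps symlinks intact by using a tar pipe. The helper name copy_tree and the host name otherbox in the comment are made up for illustration, so adjust to taste.

```shell
# copy_tree SRC DST: copy a directory tree, keeping symlinks as symlinks.
# (copy_tree is a hypothetical helper name, not an existing tool.)
copy_tree() {
    src=$1
    dst=$2
    mkdir -p "$dst" || return 1
    # tar archives symlinks as links by default; -p keeps permissions on extract.
    tar -C "$src" -cf - . | tar -C "$dst" -xpf -
}

# Across the network the same pipe can run through ssh, for example
# (otherbox is a placeholder host name):
#   ssh otherbox 'tar -C /lib -cf - .' | tar -C /mnt/root/lib -xpf -
```

With the links preserved, ldconfig should have less to complain about, though the reinstall of libc6 is still worth doing.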
> Since I have a working libc, I do:
>
> chroot /mnt/root
>
> Then, for each kernel source (2.4.26, 2.6.0, 2.6.5, 2.6.6), I do:
>
> make modules && make modules_install
>
> OK. Kernels should be OK. libc is taken care of. In theory, this
> machine should boot, and except for packages on the crippled machine
> that weren't on my other workstation, everything should be more or less
> OK.
>
> At this point, I'm aiming at full recovery, not "bringing back to life".
>
> The machine boots fine. First things first. Just to make absolutely
> sure:
>
> apt-get install --reinstall libc6 libc6-dev
>
> Good. Now I begin to recover packages. I don't know how to get a list
> of all packages that install files in /lib, so I do the only thing I
> know how to do:
>
> COLUMNS=255 dpkg -l "*" | grep -v 'description' \
>   | awk '{ print $2 }' > recover.sh
>
> which gives me a list of all installed packages. BTW, if anybody knows
> how to generate a list of all packages which have files in /lib, I'd
> like to know.
>
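On the /lib question, one approach that might work: dpkg -S takes a filename pattern and prints "package: path" lines, so the package names can be pulled out of that. A rough sketch, assuming that output format; pkgs_from_search and lib-packages.txt are names invented for this example.

```shell
# pkgs_from_search: read `dpkg -S` output on stdin and print the unique
# package names. (Hypothetical helper name.)
pkgs_from_search() {
    # Drop diversion notices, which do not name an owning package.
    grep -v -E '^(local )?diversion ' |
        # Strip the ": /path" part, leaving "pkg" or "pkg1, pkg2".
        sed 's|: /.*$||' |
        # One package per line, trimmed of surrounding spaces.
        tr ',' '\n' |
        sed 's/^ *//; s/ *$//' |
        grep -v '^$' |
        sort -u
}

# On the real system (assumes dpkg is installed and working):
#   dpkg -S '/lib/*' 2>/dev/null | pkgs_from_search > lib-packages.txt
```

That list should be a lot shorter than every installed package, which narrows down the suspects too.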
> The command above gives me a list that looks like:
>
> ...
> exim4-base
> exim4-config
> exim4-doc-html
> exim4-doc-info
> exuberant-ctag
> fakeroot
> fdflush
> fdutils
> ...
>
> Using vim, I prepend each line with "apt-get install --reinstall --yes
> --ignore-missing" so now recover.sh looks like:
>
> apt-get install --reinstall --yes --ignore-missing exim4-base
> apt-get install --reinstall --yes --ignore-missing exim4-config
> apt-get install --reinstall --yes --ignore-missing exim4-doc-html
> apt-get install --reinstall --yes --ignore-missing exim4-doc-info
> apt-get install --reinstall --yes --ignore-missing exuberant-ctag
> apt-get install --reinstall --yes --ignore-missing fakeroot
> apt-get install --reinstall --yes --ignore-missing fdflush
> apt-get install --reinstall --yes --ignore-missing fdutils
>
> And I run it with:
>
> sh recover.sh
>
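For what it's worth, the prepend step doesn't need an editor. A small sketch that turns a list of package names into the same apt-get lines; make_recover_script and packages.txt are placeholder names, not anything from the original setup.

```shell
# make_recover_script: read package names on stdin (one per line) and
# print one apt-get command per package. (Hypothetical helper name.)
make_recover_script() {
    while IFS= read -r pkg; do
        # Skip blank lines so no empty apt-get command is emitted.
        [ -n "$pkg" ] || continue
        printf 'apt-get install --reinstall --yes --ignore-missing %s\n' "$pkg"
    done
}

# Example, assuming the package list was saved as packages.txt:
#   make_recover_script < packages.txt > recover.sh
#   sh recover.sh
```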
> It started to work fine. Everything seemed to be going OK.
>
> About 5 or 10 minutes into it, I started to see on my console:
>
> "init spawning too fast. Disabling getty for 5 minutes"
>
> After doing a ps aux, I got:
>
> /bin/ps: File not found
>
> Oh oh. I've got a baaad feeling about this, Chewie. I pressed ctl-c to
> get out of apt-get, and did:
>
> sync
> ctl-d (I'm now out of chroot)
> umount /dev/hdb1
> mount /dev/hdb1 /mnt/root
> chroot /mnt/root
>
> It told me that "/bin/bash" was not found. An "ls /mnt/root/" revealed:
>
> /lib was missing again.
>
> I was outside of chroot, so I did a "ps aux" again. There was a dd
> process trying to access /dev/fd0. I killed it. Could that be a clue?
> Does apt-get ever access the floppy?
>
> My hypothesis is that there must be a Debian package that's doing this.
> I checked debian-user but there's no mention of this, and this would be
> BIG. I'm sure I wouldn't be the first to find this.
>
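If a package really is doing this, its maintainer scripts are the likely place to look. dpkg keeps them in /var/lib/dpkg/info, so they can be searched for anything that touches the floppy or /lib. A rough sketch; scan_maintainer_scripts is a made-up helper name and the search pattern is only a guess at what to look for.

```shell
# scan_maintainer_scripts DIR PATTERN: list package maintainer scripts
# under DIR whose text matches PATTERN. (Hypothetical helper name.)
scan_maintainer_scripts() {
    dir=$1
    pattern=$2
    for f in "$dir"/*.preinst "$dir"/*.postinst "$dir"/*.prerm "$dir"/*.postrm; do
        # Unmatched globs stay literal, so skip anything that is not a file.
        [ -f "$f" ] || continue
        # -l prints only the file name when the pattern is found.
        grep -l -e "$pattern" "$f"
    done
    return 0
}

# From Knoppix with the root partition mounted on /mnt/root:
#   scan_maintainer_scripts /mnt/root/var/lib/dpkg/info '/dev/fd0'
```

The file name before the extension is the package name, so a hit points straight at a suspect.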
> If nobody has better ideas, I was going to try all these steps over, but
> do something like:
>
> ...
> apt-get install --reinstall --yes --ignore-missing exim4-base
> echo "I did exim4-base" >> /root/log
> apt-get install --reinstall --yes --ignore-missing exim4-config
> echo "I did exim4-config" >> /root/log
> apt-get install --reinstall --yes --ignore-missing exim4-doc-info
> echo "I did exim4-doc-info" >> /root/log
> apt-get install --reinstall --yes --ignore-missing exuberant-ctag
> echo "I did exuberant-ctag" >> /root/log
> apt-get install --reinstall --yes --ignore-missing fakeroot
> echo "I did fakeroot" >> /root/log
> ...
>
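The logging idea could go one step further and stop the moment /lib vanishes, so the log ends on the guilty package. A sketch under that assumption; reinstall_until_broken and packages.txt are hypothetical names. It only uses shell builtins between installs, so it should keep working even after the libraries are gone.

```shell
# reinstall_until_broken LIST LOG DIR CMD...: run CMD once per package name
# in LIST (name appended as the last argument), log before and after each
# run, and stop as soon as DIR no longer exists. (Hypothetical helper name.)
reinstall_until_broken() {
    list=$1
    log=$2
    watch_dir=$3
    shift 3
    while IFS= read -r pkg; do
        [ -n "$pkg" ] || continue
        echo "starting $pkg" >> "$log"
        # Redirect stdin so the command cannot swallow the rest of LIST.
        "$@" "$pkg" < /dev/null
        if [ ! -d "$watch_dir" ]; then
            echo "$watch_dir missing after $pkg" >> "$log"
            return 1
        fi
        echo "finished $pkg" >> "$log"
    done < "$list"
    return 0
}

# Example, assuming one package name per line in packages.txt:
#   reinstall_until_broken packages.txt /root/log /lib \
#       apt-get install --reinstall --yes --ignore-missing
```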
> That's about the only thing I can think of to try (other than installing
> Gentoo which I've been thinking about for a few weeks now).
>
> I could sure use some help!
>
> Thanks to all who actually made it down here.
>
> Pete
Are you running stable, testing, or unstable?
I'm not dist-upgrading for a few days because I want to know what ate
your /lib and I want to make sure it doesn't happen to me. Let me know
what caused it.
-- 
I usually have a GPG digital signature included as an attachment.
See http://www.gnupg.org/ for info about these digital signatures.
My key was last signed 10/14/2003. If you use GPG *please* see me about
signing the key. ***** My computer can't give you viruses by email. ***