[vox-tech] emergency: please help. /lib keeps disappearing

Ken Bloom vox-tech@lists.lugod.org
Fri, 28 May 2004 01:14:38 -0700


On Wed, May 26, 2004 at 06:52:22AM -0700, Peter Jay Salzman wrote:
> Last night, on one of my machines I ran:
>
>    apt-get update && apt-get upgrade
>
> During the update, most of my consoles began to read:
>
>    "init spawning too fast.  Disabling getty for 5 minutes"
>
> When I typed "ps aux", bash returned with:
>
>    /bin/ps: File not found
>
> In fact, everything from "bash" to "mount" to "shutdown" yielded "file
> not found" errors.  I've got a baaad feeling about this, Chewie.
>
> I used the magic sysrq keys to sync my hard drives, then unmount them,
> and then reboot the system.
>
> Lilo came up healthy and I chose my 2.6.6 kernel.  Booting proceeded,
> but towards the end of the bootup, I got:
>
>    No init found.  Kernel panic.
>
> Shit.  I reached for my Knoppix disk and booted it.  Mounted my root
> partition under /mnt/root to take a look.  I found the problem
> immediately.  /lib was gone.  Absolutely gone.  As in, the directory was
> completely missing.
>
> I've recovered from situations worse than this, so I rolled up my
> sleeves and got to work.
>
> First thing I did was "mkdir /mnt/root/lib".  After an "ifconfig" and a
> "route", my network was working.
>
> I used scp to copy the entire /lib directory from another machine to the
> crippled machine.  I then ran:
>
>    chroot /mnt/root /sbin/ldconfig
>
> It gave me some warnings that a couple of files which ldconfig was
> expecting to be symlinks were actual regular disk files.  That's OK (I
> believe).  So at this point, I have a working libc and a lot of other
> important libraries.  Next step is to uncripple my kernels.
>
> Since I have a working libc, I do:
>
>    chroot /mnt/root
>
> Then, for each kernel source (2.4.26, 2.6.0, 2.6.5, 2.6.6), I do:
>
>    make modules && make modules_install
>
> OK.  Kernels should be OK.  libc is taken care of.  In theory, this
> machine should boot, and except for packages on the crippled machine
> that weren't on my other workstation, everything should be more or less
> OK.
>
> At this point, I'm aiming at full recovery, not "bringing back to life".
>
> The machine boots fine.  First things first.  Just to make absolutely
> sure:
>
>    apt-get install --reinstall libc6 libc6-dev
>
> Good.  Now I begin to recover packages.  I don't know how to get a list
> of all packages that install files in /lib, so I do the only thing I
> know how to do:
>
>    COLUMNS=255 dpkg -l "*" | grep -v 'description' |
>       awk '{ print $2 }' > recover.sh
>
> which gives me a list of all installed packages.  BTW, if anybody knows
> how to generate a list of all packages which have files in /lib, I'd
> like to know.
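
One way to answer the question above (which packages have files in /lib) is to ask dpkg which packages own paths matching /lib/*. This is only a rough sketch: it assumes the dpkg database itself is still intact, and owners_from_dpkg_S is a helper name made up for this example, not something dpkg ships.

```shell
#!/bin/sh
# Sketch: list the packages that own at least one path under /lib.
# owners_from_dpkg_S is a hypothetical helper name, not part of dpkg.

# Reads "dpkg -S" style output on stdin, for example
#   libc6: /lib/libc.so.6
#   pkg-a, pkg-b: /lib/some-shared-dir
# and prints each owning package once, sorted.
owners_from_dpkg_S() {
    grep -v '^diversion by' |        # skip diversion notes, they name no owner
        sed 's/: .*$//' |            # keep only the package list before ": "
        tr ',' '\n' |                # one package per line
        sed 's/^ *//; s/ *$//' |     # trim surrounding spaces
        grep -v '^$' |
        sort -u
}

# Only query the real package database when dpkg is available.
if command -v dpkg >/dev/null 2>&1; then
    dpkg -S '/lib/*' 2>/dev/null | owners_from_dpkg_S
fi
```

The resulting names could be fed to "apt-get install --reinstall" instead of the full package list, which should shorten the recovery run considerably.
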
>
> The command above gives me a list that looks like:
>
>    ...
>    exim4-base
>    exim4-config
>    exim4-doc-html
>    exim4-doc-info
>    exuberant-ctag
>    fakeroot
>    fdflush
>    fdutils
>    ...
>
> Using vim, I prepend each line with "apt-get install --reinstall --yes
> --ignore-missing" so now recover.sh looks like:
>
>    apt-get install --reinstall --yes --ignore-missing exim4-base
>    apt-get install --reinstall --yes --ignore-missing exim4-config
>    apt-get install --reinstall --yes --ignore-missing exim4-doc-html
>    apt-get install --reinstall --yes --ignore-missing exim4-doc-info
>    apt-get install --reinstall --yes --ignore-missing exuberant-ctag
>    apt-get install --reinstall --yes --ignore-missing fakeroot
>    apt-get install --reinstall --yes --ignore-missing fdflush
>    apt-get install --reinstall --yes --ignore-missing fdutils
>
> And I run it with:
>
>    sh recover.sh
>
> It started to work fine.  Everything seemed to be going OK.
>
> About 5 or 10 minutes into it, I started to see on my console:
>
>    "init spawning too fast.  Disabling getty for 5 minutes"
>
> After doing a ps aux, I got:
>
>    /bin/ps: File not found
>
> Uh oh.  I've got a baaad feeling about this, Chewie.  I pressed Ctrl-C to
> get out of apt-get, and did:
>
>    sync
>    Ctrl-D  (I'm now out of chroot)
>    umount /dev/hdb1
>    mount /dev/hdb1 /mnt/root
>    chroot /mnt/root
>
> It told me that "/bin/bash" was not found.  An "ls /mnt/root/" revealed:
>
>    /lib was missing again.
>
> I was outside of chroot, so I did a "ps aux" again.  There was a dd
> process trying to access /dev/fd0.  I killed it.  Could that be a clue?
> Does apt-get ever access the floppy?
>
> My hypothesis is that there must be a Debian package that's doing this.
> I checked debian-user but there's no mention of this, and this would be
> BIG.  I'm sure I wouldn't be the first to find this.
>
> If nobody has better ideas, I was going to try all these steps over, but
> do something like:
>
>    ...
>    apt-get install --reinstall --yes --ignore-missing exim4-base
>    echo "I did exim4-base" >> /root/log
>    apt-get install --reinstall --yes --ignore-missing exim4-config
>    echo "I did exim4-config" >> /root/log
>    apt-get install --reinstall --yes --ignore-missing exim4-doc-info
>    echo "I did exim4-doc-info" >> /root/log
>    apt-get install --reinstall --yes --ignore-missing exuberant-ctag
>    echo "I did exuberant-ctag" >> /root/log
>    apt-get install --reinstall --yes --ignore-missing fakeroot
>    echo "I did fakeroot" >> /root/log
>    ...
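
The per-package logging idea quoted above could also be written as a loop instead of editing recover.sh by hand. A minimal sketch, assuming one package name per line on stdin; reinstall_and_watch, WATCH_DIR, LOG_FILE and REINSTALL_CMD are names invented for this example. It checks after every package whether /lib still exists, so the log records the last package handled before the directory vanished.

```shell
#!/bin/sh
# Sketch: reinstall packages one at a time, log each step, and stop as soon
# as the watched directory disappears.
# WATCH_DIR, LOG_FILE and REINSTALL_CMD are hypothetical names for this example.

WATCH_DIR=${WATCH_DIR:-/lib}
LOG_FILE=${LOG_FILE:-/root/log}
REINSTALL_CMD=${REINSTALL_CMD:-apt-get install --reinstall --yes --ignore-missing}

# Reads one package name per line on stdin.
# Prints the name of the package after which WATCH_DIR was gone and returns 1,
# or returns 0 if every package was processed and WATCH_DIR survived.
reinstall_and_watch() {
    while IFS= read -r pkg; do
        [ -n "$pkg" ] || continue
        echo "starting $pkg" >> "$LOG_FILE"
        # Left unquoted on purpose so the command string splits into words.
        # stdin comes from /dev/null so the command cannot eat the package list.
        $REINSTALL_CMD "$pkg" < /dev/null
        # echo and [ are builtins in bash and dash, so this check should still
        # run even when external binaries can no longer load their libraries.
        if [ ! -d "$WATCH_DIR" ]; then
            echo "$WATCH_DIR vanished after $pkg" >> "$LOG_FILE"
            echo "$pkg"
            return 1
        fi
        echo "finished $pkg" >> "$LOG_FILE"
    done
    return 0
}
```

The package list produced earlier could be piped into reinstall_and_watch. The name it prints only says which reinstall ran last before /lib went missing; whether that package is really the culprit would still need checking.
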
>
> That's about the only thing I can think of to try (other than installing
> Gentoo which I've been thinking about for a few weeks now).
>=20
> I could sure use some help!
>
> Thanks to all who actually made it down here.
>
> Pete
>
> --
> GPG Instructions: http://www.dirac.org/linux/gpg
> GPG Fingerprint: B9F1 6CF3 47C4 7CD8 D33E 70A9 A3B9 1945 67EA 951D
>
> Good, Fast, Cheap.  Pick any two (you can't have all three)
>    --- RFC 1925: The Twelve Networking Truths (The 7th truth)

Are you running stable, testing, or unstable?
I'm not dist-upgrading for a few days because I want to know what ate
your /lib and I want to make sure it doesn't happen to me. Let me know
what caused it.


--
I usually have a GPG digital signature included as an attachment.
See http://www.gnupg.org/ for info about these digital signatures.
My key was last signed 10/14/2003. If you use GPG *please* see me about
signing the key. ***** My computer can't give you viruses by email. ***
