[vox-tech] HOWTO [NOT] remove lilo...

vox-tech@lists.lugod.org vox-tech@lists.lugod.org
Thu, 23 Jan 2003 03:06:13 -0500


[...I'm leaving names out of this...]

At the last installfest we had one helper run the following command:
  dd if=/dev/hda2 of=/dev/hda bs=1 count=512

  The stated objective was to uninstall Lilo from the MBR (because both of
the Linux installs failed to get the video card working in X).  I get the 
impression that another helper glanced at the command and thought 
it looked correct.
  
  Unfortunately that is not a valid way to remove lilo from the MBR,
because two things share the first 512 bytes of a hard drive:
  - The Master Boot Record boot code (where lilo lives).
  - The Partition Table.

  So that command replaced the boot code with something that was not Lilo
and also wiped out the partition table.
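
  To make the layout concrete, here is a small sketch that runs on a
scratch file instead of a real disk.  It assumes the usual PC layout of
sector 0 (446 bytes of boot code, 64 bytes of partition table, 2 signature
bytes); the file names and filler characters are made up for illustration.
It only shows that a copy limited to 446 bytes leaves the table alone.  It
is not a recommended way to uninstall lilo.

```shell
# Scratch files stand in for the disk and for the data being copied in.
img=$(mktemp)
src=$(mktemp)

# Fake sector 0: 446 bytes of boot code ('B'), 64 bytes of partition
# table ('P'), then the two signature bytes 0x55 0xAA.
dd if=/dev/zero bs=446 count=1 2>/dev/null | tr '\0' 'B' >  "$img"
dd if=/dev/zero bs=64  count=1 2>/dev/null | tr '\0' 'P' >> "$img"
printf '\125\252' >> "$img"

# Fake replacement data: 512 bytes of 'X'.
dd if=/dev/zero bs=512 count=1 2>/dev/null | tr '\0' 'X' > "$src"

# Copy only the first 446 bytes; conv=notrunc keeps the rest of the file.
dd if="$src" of="$img" bs=446 count=1 conv=notrunc 2>/dev/null

# Read the two regions back to see what changed.
boot=$(dd if="$img" bs=1 count=446 2>/dev/null)
table=$(dd if="$img" bs=1 skip=446 count=64 2>/dev/null)
size=$(wc -c < "$img" | tr -d ' ')

rm -f "$img" "$src"
```

  Afterwards $boot is all 'X' while $table is still all 'P'.  The
installfest command copied 512 bytes, so on the real disk the table
region was overwritten as well.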


For those with short attention spans, one of the correct ways to 
uninstall lilo is:
  /sbin/lilo {-u|-U} - uninstall lilo

If that fails, another is to read the lilo Manual, which is often stored in:
  /usr/share/doc/lilo/Manual.txt.gz

  If the partition table is trashed, a manual copy of the partition table
or gpart can help to restore it.

  If you have boot problems on an XP filesystem which otherwise appears
to be perfectly fine to other programs, try fiddling with it using a
commercial partition resizer; that fixed the booting problem in
our case.


... now the longer version.

  If /dev/hda2 had contained a valid boot loader, it certainly didn't
contain the correct partition table.  So with a single command the
data required to use any of the 6 partitions on that drive vanished.  
  Over the last few years similar things have been done a few other
times at other installfests.  Unfortunately in this case things were worse 
than they should have been, which is why I wrote this (the howto uninstall
lilo above).

  The most recent Linux install (from a Knoppix CD) had either not
installed lilo or did not save a backup of the boot block in /boot
as is customary.  Also no one had a handwritten copy of the partition
table layout (from which we could have just rebuilt the partition table).
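
  For next time, the kind of backup that would have saved us can be
sketched like this.  It runs against scratch files; on a real machine
"$disk" would be something like /dev/hda and "$backup" a file kept
somewhere off that drive (both names are placeholders).  Note that sector 0
only holds the four primary entries.  Logical partitions are described
elsewhere on the disk, so a written copy of the full layout is still worth
having.

```shell
disk=$(mktemp)     # stand-in for the real drive
backup=$(mktemp)   # stand-in for a saved copy of sector 0

# Fake sector 0: boot code ('B'), partition table ('P'), signature 55 AA.
dd if=/dev/zero bs=446 count=1 2>/dev/null | tr '\0' 'B' >  "$disk"
dd if=/dev/zero bs=64  count=1 2>/dev/null | tr '\0' 'P' >> "$disk"
printf '\125\252' >> "$disk"

# 1. Save all of sector 0 before anyone touches the boot loader.
dd if="$disk" of="$backup" bs=512 count=1 2>/dev/null

# 2. Simulate the accident: all 512 bytes overwritten with 'X'.
dd if=/dev/zero bs=512 count=1 2>/dev/null | tr '\0' 'X' |
  dd of="$disk" bs=512 conv=notrunc 2>/dev/null

# 3. Put back only the partition table and signature (bytes 446..511),
#    leaving whatever boot code is now on the disk untouched.
dd if="$backup" of="$disk" bs=1 skip=446 seek=446 count=66 \
   conv=notrunc 2>/dev/null

# Check the result.
restored=$(dd if="$disk" bs=1 skip=446 count=64 2>/dev/null)
bootcode=$(dd if="$disk" bs=1 count=446 2>/dev/null)
sig=$(od -An -tx1 -j510 -N2 "$disk" | tr -d ' \n')

rm -f "$disk" "$backup"
```

  Restoring with skip= and seek= set to the same offset is what keeps the
copy confined to the table region.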

  Fortunately there is a program called 'gpart'...

apt-cache show gpart
===
Description: Guess PC disk partition table, find lost partitions
 Gpart is a tool which tries to guess the primary partition table of a
 PC-type disk in case the primary partition table in sector 0 is
 damaged, incorrect or deleted.
 .
 It is also good at finding and listing the types, locations, and
 sizes of inadvertently-deleted partitions, both primary and logical.
 It gives you the information you need to manually re-create them
 (using fdisk, cfdisk, sfdisk, etc.).
[...]
===

  Gpart did find the 6 partitions, and it runs very fast.  However,
the version I was using does not automatically handle restoring more than
the first 4.  One would have to manually reconstruct the partition table to
get beyond that.  In this case /dev/hda1 was a Dell Diagnostic Partition,
/dev/hda2 was Windows XP on NTFS, and /dev/hda3-6 were Linux related /,
/home, swap, and a vfat 'transfer partition'.
  Since the later partitions were useless I just let gpart rebuild the 
first 4.
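
  For anyone who does want to rebuild or check a table by hand, it helps
to know what one entry looks like.  The sketch below decodes the first
primary entry from a scratch file, assuming the standard 16-byte entry
format (type at offset 4, starting sector at offset 8, sector count at
offset 12, both little-endian).  The sample values (type 0x07, start 63,
1000 sectors) are invented for the example and are not the values from the
machine in this story.

```shell
img=$(mktemp)

# Fake sector 0: 446 zero bytes of boot code...
dd if=/dev/zero of="$img" bs=446 count=1 2>/dev/null
# ...one entry: active flag 0x80, CHS start (3 bytes), type 0x07,
# CHS end (3 bytes), start sector 63, sector count 1000 (0x03E8)...
printf '\200\000\000\000\007\000\000\000' >> "$img"
printf '\077\000\000\000\350\003\000\000' >> "$img"
# ...three empty entries (48 bytes) and the 55 AA signature.
dd if=/dev/zero bs=48 count=1 2>/dev/null >> "$img"
printf '\125\252' >> "$img"

# Pull the 16 bytes of entry 1 as decimal numbers into $1..$16.
set -- $(od -An -tu1 -j446 -N16 "$img")

ptype=$5
start=$(( $9 + ${10} * 256 + ${11} * 65536 + ${12} * 16777216 ))
count=$(( ${13} + ${14} * 256 + ${15} * 65536 + ${16} * 16777216 ))

echo "type=$ptype start=$start sectors=$count"

rm -f "$img"
```

  Entries 2 to 4 sit at offsets 462, 478 and 494.  Logical partitions use
the same entry format, but in their own boot records further into the
disk.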

  The DOS, NTFS, and Linux / partitions were all intact and could be 
read from any of the Linux rescue CDs.  Unfortunately we either didn't have 
a valid XP boot loader or _something_ was wrong with the XP filesystem
which prevented it from booting.

  There were a number of people who stayed to help or even generated 
rescue disks from home, some may have been killed by their wives when
they got home.  I'm not sure if other people want to admit to being there
for as long as they were... I should ask them and update this document, 
but I want to get this sent and DONE.

  Over the next almost 12 hours a whole bunch of things were tried... 
I'm not sure a complete list of things would be possible but here
are some:
  - fdisk /mbr from a Win 98 rescue floppy.
  - lilo installed on a floppy
  - lilo installed on the hard disk
  - the MBR from another XP machine copied to floppy and /dev/hda
... eventually a Debian system was installed in the back portion of the
drive and important data was transferred off the NTFS partition so that if
a reinstall were needed it wouldn't be lost.

  What ended up fixing the boot problem was telling a commercial partition 
resizing program to add a few MB worth of space to the NTFS partition.
As soon as that was done, XP booted, ran a filesystem check, and
found no errors.

  When the machine left the installfest location it was able to triple
boot: the Dell Diag utils, Windows XP, and Debian Linux (with a working
copy of X11 from the 4.2.99.3 prerelease binaries).

    There is no good ending,
      Mike

ps:
  Unfortunately I found out the next day that the owner had some unknown
hardware problem when he hooked the machine up at home.  He said the 
screen just stayed black, not even the blue Dell BIOS logo appeared.
I stopped by briefly after the meeting and found the following...

  If power is unplugged from the machine for > 10 seconds and reconnected,
the machine appears to "power up": the fans begin to spin, the hard drive
powers up, the case power LED stays off, the case hard drive LED 
lights up, but there are no beeps, and no activity on the monitor.  If the 
power button is pressed, the machine shuts down and more attempts to press
the power button produce absolutely no effect.

  I heard that this was the behavior when the machine first arrived
at home... and the owner opened up the case to check for loose things
(as it turns out his owner's manual says he should).  I asked to open the
case and look around... I observed that one of the memory DIMMs was not
fully seated and that the two DIMMs have a different physical appearance
(but appear to be the same size and were installed by Dell), the IDE
cable to the ZIP drive was not fully seated, the IDE cable to the hard
drive was not fully seated, there was a minor bend in the bottom of the
computer case which prevented the case cover from fitting on smoothly.
  After reseating each of the loose connections, and trying both of
the dimms individually there was still no change in machine behavior.
Since I had just spent almost half a day trying to clean up someone 
else's oops, I was certain the machine was working at the end of the 
installfest, and I felt there was a hardware problem, I decided not to 
fiddle with it any more.
  I asked the owner to call Dell and report that he could not power up
his machine (be sure to explain the power up behavior because it may
mean something useful to their Techs), he said he would do that in the
morning.
  I hope they are able to pin-point the problem and get his system 
working again without destroying the drive contents.