[vox-tech] In Denial About These Hard Drive Problems

Rick Moen vox-tech@lists.lugod.org
Sat, 22 Jun 2002 15:21:57 -0700


Quoting msimons@moria.simons-clan.com (msimons@moria.simons-clan.com):

> IBM Deskstar 60GXP, 40 Gig, 7200rpm. [...] I recommend that anyone
> with an IBM drive, model 60, 75, or 120 GXP, buy a replacement
> drive....

Yeah, that particular series has a bad reputation.  _Generally_ however,
the IBM Deskstar and Ultrastar series have been very good.

>   Except for the 40 or so bad blocks, all of the data on his drives has
> been extracted and transferred to one of the replacement drives he
> purchased.

Typically, the only things you care about are data files and maybe
dotfiles & configuration files.  Accordingly, you can ignore and blow
away everything else, including all program files.  (You _did_ keep
master tarballs of software installed locally in /usr/local/src, right?  
Those you'd want to preserve.)

>   There appear to have been some minor file losses on the Redhat system.
> In particular gpanel has lost its default config file that controls
> the lower panel.  rpmverify appears to find about 100 discrepancies
> on the filesystem....

But those would be classics in the "I don't care" department, right?  
I mean, you're going to reinstall all program files onto a replacement
drive, and those will come from installation media.

>   Unfortunately I don't think this recovery process is very economical.
> It took about 12 hours from start to finish.  Even if nothing had gone 
> wrong it would have taken at least 6 hours or so to recover 40 Gigs of 
> data.

Well, but I'd expect that you only need _care_ about a tiny fraction of
those files.

>   The distribution was Redhat 7.2, ext3 filesystems which were configured
> to *never* do a file system check.  Both mount-count and day-count based
> checks were disabled.

Boy, _that's_ an eye-opener.

However, closer attention to patterns of errors in /var/log/messages
would have caught the failure pattern sooner.  I don't know if logcheck
watches for those, but I'll bet it could be made to do so if it doesn't
by default.
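
As a rough illustration of that kind of log watching, here is a minimal
Python sketch.  The patterns are assumptions modelled on typical IDE
kernel messages of this era (dma_intr status/error lines, end_request
I/O errors); they are not taken from logcheck, and the function names
are made up for the example.  Adjust the patterns to whatever your own
kernel actually logs.

```python
import re

# Illustrative patterns only (an assumption, not a complete list):
# typical IDE driver complaints as they tend to appear in syslog.
DISK_ERROR_PATTERNS = [
    re.compile(r"\bhd[a-z]: \w+: (status|error)=0x[0-9a-fA-F]+"),
    re.compile(r"end_request: I/O error"),
    re.compile(r"\bhd[a-z]: .*(timeout|lost interrupt)", re.IGNORECASE),
]


def find_disk_errors(lines):
    """Return the log lines that match any of the disk-error patterns."""
    return [
        line.rstrip("\n")
        for line in lines
        if any(pattern.search(line) for pattern in DISK_ERROR_PATTERNS)
    ]


def scan_log(path="/var/log/messages"):
    """Scan a syslog-style file; reading it usually requires root."""
    with open(path, errors="replace") as log:
        return find_disk_errors(log)
```

Run from cron, something like "for hit in scan_log(): print(hit)" would
mail you the suspicious lines, provided cron is set up to mail job
output.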

> The hard drive was making slight noises when error messages appeared.

Lesson, there:  Your ears are a vital system-diagnostic tool.  Pay
attention to the sounds coming from your system, folks.

>   The initial plan was to use dd_rescue(2) to pull all the partitions
> off the failing IBM drive...

You know, on a quiescent system (single-user), just "cp -ax" on
modest-sized directory trees, one at a time, will more than suffice.  No need
to muck about with cpio, tar, gzip, bzip2, etc.
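
The one-tree-at-a-time approach can be sketched in Python as follows.
This is only an illustration, assuming GNU cp (where -a preserves
ownership, permissions, and timestamps, and -x stays on one filesystem);
the function names and any mount points you pass in are hypothetical.
Copying tree by tree means a read error on the failing drive costs you
one tree, not the whole run, and you get a list of what to retry.

```python
import os
import subprocess


def plan_copies(src_root, dest_root):
    """Build one 'cp -ax' command per top-level directory under src_root.

    Plain files and symlinks directly under src_root are skipped here
    and would need to be copied separately.
    """
    commands = []
    for name in sorted(os.listdir(src_root)):
        src = os.path.join(src_root, name)
        if os.path.isdir(src) and not os.path.islink(src):
            commands.append(["cp", "-ax", src, dest_root])
    return commands


def run_copies(commands):
    """Run each copy in turn; return the commands that exited non-zero."""
    failed = []
    for cmd in commands:
        if subprocess.run(cmd).returncode != 0:
            failed.append(cmd)
    return failed
```

For example, run_copies(plan_copies("/mnt/old", "/mnt/new")) would copy
each directory under the hypothetical mount point /mnt/old into
/mnt/new and hand back the commands that failed.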

>   The initial DMA problems with this new disk seemed to appear when the 
> CD drive was spinning up... there were two main power lines from the 
> power supply, as I write this I wonder if the CD was connected to
> the same power connector as that new dead drive.

You know, a _lot_ of "hard drive" problems in my experience turn out to
be caused by flaky, weak, and/or overstressed power supplies.  That's why
I always use PC Power & Cooling power supplies in my systems, if I can.
I literally yank out and put in a corner the cheapo Taiwanese unit
provided with the case, and put in the PC Power & Cooling one.  Worth
the $100.  Among other things, it can save your nice expensive hard
drive:  When power supplies fail, about half the time they take hard
drives with them.

Some OEMs like the "Sparkle" brand.  I have no comment on the merits of
those.

>   First I tried the LinuxCare Bootable Business CD, 1.2 (which is very
> old at this point). 

The old Linuxcare project has morphed into the Linuxcare Bootable
Toolkit, v. 2.0.  I have a bunch of them, but haven't had occasion to
use them because the LNX-BBC 1.618 disc is so very good that I don't 
need anything else.  But I hear good things about the LBT.

(I'm in the credits on both forks.  Not that I did a lot, but I was 
involved in the early stages.)

> I need to get one of those updated BBC based CDs to carry around.

Well worth the download:  http://www.lnx-bbc.org/download.html
(The NTFS support is still problematic because the Linux kernel
driver is likewise.  I have no experience with using it.)

> I think I don't like LABELs.

I _know_ I don't.  Looks to me like nothing but problems.  I think I 
know what they were trying to do, and it's just a bad call.

-- 
Cheers,
Rick Moen                     Emacs is a decent operating system,
rick@linuxmafia.com           but it still lacks a good text editor.