[vox-tech] failing hard drive...

Bill Broadley bill at cse.ucdavis.edu
Mon Dec 11 22:53:22 PST 2006


I know I'm a bit late, but I figured I'd add a few comments.

Yes, once a drive starts acting weird, migrating off it ASAP is
the best course of action, preferably copying your most important
data first.
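As a rough sketch of "most important data first", something like the
following shell function would copy directories in the order you list
them and keep going if one of them fails partway.  The function name
copy_by_priority and the paths in the usage line are made up for
illustration, and it assumes a cp that supports -a (GNU and BSD cp do);
rsync would work just as well.

```shell
# copy_by_priority DEST SRC...
# Hypothetical helper: copies each SRC directory into DEST, in the order
# given, so the most important data is attempted first.
copy_by_priority() {
    dest=$1
    shift
    failed=0
    for src in "$@"; do
        # Keep going after a failure: later directories may still be readable.
        if ! cp -a "$src" "$dest"/; then
            echo "copy failed or incomplete: $src" >&2
            failed=1
        fi
    done
    # Non-zero if any directory did not copy cleanly.
    return "$failed"
}

# Example (hypothetical paths):
#   copy_by_priority /mnt/newdisk /home/me/documents /home/me/photos /home/me/music
```

The return status only tells you whether everything copied cleanly; the
messages on stderr say which directories to retry.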

Smartd and related tools can help give you an early warning.  To run
a long self-test, use the following:

smartctl -d ata /dev/sda -t long

Then after the time it mentions (often 1 to 2 hours) run:
smartctl -d ata /dev/sda -l selftest

Hopefully you will see something like:
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     11098
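
If you want to check this from a script rather than by eye, a minimal
sketch is below.  It assumes the newest test is the row numbered "# 1",
as in the output above, and that a clean run contains the text
"Completed without error".  The function name latest_selftest_ok is my
own, not something smartmontools provides.

```shell
# Reads `smartctl -l selftest` output on stdin.
# Succeeds only if the newest entry (the "# 1" row) completed without error.
latest_selftest_ok() {
    grep -E '^# *1 ' | grep -q 'Completed without error'
}

# Example:
#   smartctl -d ata /dev/sda -l selftest | latest_selftest_ok \
#       && echo "last self-test was clean"
```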

The -i and -a options can also give interesting info, especially:
SMART overall-health self-assessment test result: PASSED
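
To pull just that verdict out of the -a output, a small sketch (the
function name smart_health_result is hypothetical, and it assumes the
line is worded exactly as shown above):

```shell
# Reads smartctl output on stdin and prints only the overall-health
# verdict (for example PASSED), or nothing if the line is absent.
smart_health_result() {
    sed -n 's/^SMART overall-health self-assessment test result: //p'
}

# Example:
#   [ "$(smartctl -d ata /dev/sda -a | smart_health_result)" = "PASSED" ] \
#       || echo "drive is not reporting PASSED"
```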

Seagate bought Maxtor a while back, so I suspect the differences these
days are mostly marketing and branding, not actual hardware differences.

In general I suggest at least considering the enterprise-level drives:
better testing, better vibration resistance, better heat handling, and
sometimes a better warranty.  When I was looking at 300GB drives, the
price difference between the Maxtor MaXLine (enterprise version) and
the Maxtor DiamondMax (consumer drive) was $110 vs $100 or so.

This is doubly so if you're building RAIDs out of them.  I've heard many
stories about RAIDs having multiple failures, often tracked down to
using consumer drives or too small a power supply.

Once you migrate off the disk, I'd use the manufacturer's diagnostics
to destructively test the entire drive.  Sector remapping usually only
happens on writes, so a test that writes every sector gives the drive
a chance to remap the bad ones.  If you hear strange noises while
accessing a particular drive, you likely have sectors failing.

Once you do that, I'd run the diagnostics a few more times.  If
you continue to get errors, pitch it (or RMA it); if not, I'd try
it out for a while without putting anything too important on it.  I've
had drives last for years after this with zero additional errors.

I'd also watch the reallocated sector count (Reallocated_Sector_Ct in
the smartctl -a attribute table); if it keeps increasing, that is a
very bad sign.
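
One way to keep an eye on it is to record the raw value now and compare
it later.  The sketch below assumes the usual smartctl attribute table
layout, where the attribute name (Reallocated_Sector_Ct) is the second
column and the raw value is the tenth.  Both function names are
hypothetical.

```shell
# Reads smartctl attribute output on stdin and prints the raw
# reallocated sector count (10th column of the Reallocated_Sector_Ct row).
reallocated_count() {
    awk '$2 == "Reallocated_Sector_Ct" { print $10 }'
}

# realloc_increased PREVIOUS CURRENT
# Succeeds if the count has gone up since the previous reading.
realloc_increased() {
    [ "$2" -gt "$1" ]
}

# Example (the previous value is whatever you saved from an earlier run):
#   now=$(smartctl -d ata /dev/sda -a | reallocated_count)
#   realloc_increased "$previous" "$now" && echo "reallocated sectors growing"
```

A single non-zero reading is less worrying than a number that climbs
between checks, which is why the comparison matters more than the value.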

Best of luck, buy good drives, keep them cool, and of course keep backups
of anything important.  Even the best preventative measures can't predict
the occasional total failure.



