[vox-tech] fsck, badblocks, defrag, and hd weirdness

Peter Jay Salzman vox-tech@lists.lugod.org
Sat, 8 May 2004 06:30:52 -0700


this is not my month...

my main workstation, satan, has always lived with 5.25" drive bay covers
removed, and back PCI/ISA covers missing.  most of the time, the
computer is left open.  it's been rock solid.  dependable and speedy
considering that my once bleeding edge dual celery 333 is now an aging
rocker.

about 1-2 weeks ago, the sound was getting to me.  the CPU fans are
really loud.  so i purchased an acousticase (got featured on /. a couple
of weeks ago):

http://209.47.233.50/acb/showdetl.cfm?&User_ID=319670&St=3156&St2=59773250&St3=30213400&DS_ID=4&Product_ID=140&DID=8

with the understanding that when i get rich and famous beyond my wildest
dreams, i will buy a zalman case:

http://www.quietpcusa.com/acb/showdetl.cfm?&DID=8&Product_ID=121&CATID=9

anyway, i'm getting off-topic.

so my workstation has been closed for 1-2 weeks.  it's been much quieter
without gaping holes, but it has also been hotter.

last night, i left bittorrent running.  this morning, i woke up and saw
"I/O errors" in all my xterms.  i killed X, and saw "I/O error" in all
my consoles.  i had no prompt.  it just kept going "I/O error" over and
over.  here's what i did:

1. sync'ed, remounted read-only, and rebooted (alt-sysrq-s, u, b).
2. at boot, hdc generated an "end of device" error.
3. edited fstab so hdc doesn't mount at boot, and rebooted.
4. mounted hdc2 by hand, and it just hung there.
5. sync'ed, remounted read-only, and rebooted (alt-sysrq-s, u, b).
6. this time, hdc wasn't even recognized as a valid block device.  it
   was as if the drive wasn't there.  the "oh shit" factor started to
   sink in.
7. i opened the case, and the drive felt quite warm.  not hot, but
   warmer than i've ever felt a drive.
8. turned off the system, opened the case and let it cool for 10min.
9. booted.  this time, hdc appeared ok in the startup.
10. fscked it.  fsck modified the filesystem, but that's to be expected
   if the drive failed while bittorrent was running.
11. it appears to be ok.  fsck likes it now.  i can mount/umount it fine.
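
for step 7, a number would be better than a hand on the drive.  below is
a rough sketch, not something tested on hdc: it assumes smartmontools is
installed and that the drive reports SMART attribute 194 (temperature),
which many older drives don't.  the helper names are made up for
illustration; "smartctl -A" is the real command that prints the
attribute table.

```python
import subprocess


def parse_temperature(smartctl_output):
    """Return the raw value of SMART attribute 194 (degrees C), or None.

    Expects the attribute table printed by `smartctl -A`, where each row
    has ten whitespace-separated columns and the raw value comes last.
    """
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with the numeric ID; 194 is temperature.
        if len(fields) >= 10 and fields[0] == "194":
            try:
                # Raw value may be followed by extras like "(Min/Max 20/55)".
                return int(fields[9])
            except ValueError:
                return None
    return None


def read_temperature(device):
    """Run `smartctl -A` on a device (needs root) and parse the result.

    Hypothetical helper: returns None if the drive does not report
    attribute 194 or if smartctl prints nothing useful.
    """
    result = subprocess.run(
        ["smartctl", "-A", device],
        capture_output=True,
        text=True,
    )
    return parse_temperature(result.stdout)
```

calling read_temperature("/dev/hdc") as root before and after closing
the case would show whether the acousticase is really cooking the drive.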

the drive in question, hdc, has two partitions.  hdc1 is a 128MB swap
partition and hdc2 is an 80GB ext3 partition.


a bunch of questions:

1. it *appears* that the drive overheated.  i've never heard of drives
   overheating.  but then again, i'm a hardware enthusiast, not a guru.
   has anyone heard of non-permanent drive failures due to overheating?

2. if this were your drive, would you replace it?   it's somewhere
   between 4 and 6 years old.  keep in mind this was not a bad block.
   the whole drive seemed to go into la-la land.

3. i've never used badblocks before, but i figure it might be a good
   thing to do.  i did a test run on a smaller partition on a different
   drive.  here's the output:

   lucifer# fsck -c /dev/hdb4
   fsck 1.35 (28-Feb-2004)
   e2fsck 1.35 (28-Feb-2004)
   Checking for bad blocks (read-only test): done
   Pass 1: Checking inodes, blocks, and sizes
   Pass 2: Checking directory structure
   Pass 3: Checking directory connectivity
   Pass 4: Checking reference counts
   Pass 5: Checking group summary information

   /dev/hdb4: ***** FILE SYSTEM WAS MODIFIED *****
   /dev/hdb4: 18157/3112960 files (4.9% non-contiguous), 3130565/6221171 blocks


   it's unsettling that the filesystem was modified, but fsck made no
   mention of what got modified.  does anybody have any ideas about
   what exactly fsck modifies when it doesn't give you a reason?  that's
   really a rotten thing to do.  "dumpe2fs -b" doesn't report any bad
   blocks.  so what got modified?

4. fsck reported that hdc2 is 30% non-contiguous.  that sounds like a
   lot to me.  i'm not going to do anything until i either regain
   confidence in this drive or replace it, but i was wondering if
   anybody here has any experience using e2defrag.

5. under windows, how is it that programs like scandisk and defrag can
   do their jobs without either umounting or remounting read-only the
   partition in question?  programs like fsck and e2defrag warn of
   "severe filesystem corruption" if you try to do this.

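to put the "non-contiguous" numbers from questions 3 and 4 in
perspective, here's a small sketch that pulls the figures out of the
e2fsck summary line.  as far as i understand it, the percentage counts
files that have at least one discontinuity, not the fraction of data
that is fragmented.  the function name and the returned keys are made up
for illustration.

```python
import re

# Matches the last line e2fsck prints, e.g.
# "/dev/hdb4: 18157/3112960 files (4.9% non-contiguous), 3130565/6221171 blocks"
SUMMARY_RE = re.compile(
    r"^(?P<dev>\S+): (?P<files_used>\d+)/(?P<files_total>\d+) files "
    r"\((?P<pct>[\d.]+)% non-contiguous\), "
    r"(?P<blocks_used>\d+)/(?P<blocks_total>\d+) blocks$"
)


def parse_fsck_summary(line):
    """Turn an e2fsck summary line into a dict of derived figures."""
    m = SUMMARY_RE.match(line.strip())
    if m is None:
        raise ValueError("not an e2fsck summary line: %r" % line)

    files_used = int(m.group("files_used"))
    blocks_used = int(m.group("blocks_used"))
    blocks_total = int(m.group("blocks_total"))
    pct = float(m.group("pct"))

    return {
        "device": m.group("dev"),
        "files_used": files_used,
        # Approximate count of files with at least one discontinuity.
        "noncontig_files": round(files_used * pct / 100),
        # How full the filesystem is, as a percentage of its blocks.
        "block_usage_pct": round(100.0 * blocks_used / blocks_total, 1),
    }


info = parse_fsck_summary(
    "/dev/hdb4: 18157/3112960 files (4.9% non-contiguous), "
    "3130565/6221171 blocks"
)
print(info)
```
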
pete

-- 
Make everything as simple as possible, but no simpler.  -- Albert Einstein
GPG Instructions: http://www.dirac.org/linux/gpg
GPG Fingerprint: B9F1 6CF3 47C4 7CD8 D33E 70A9 A3B9 1945 67EA 951D