[vox-tech] fsck, badblocks, defrag, and hd weirdness
Peter Jay Salzman
vox-tech@lists.lugod.org
Sat, 8 May 2004 06:30:52 -0700
this is not my month...
my main workstation, satan, has always lived with 5.25" drive bay covers
removed, and back PCI/ISA covers missing. most of the time, the
computer is left open. it's been rock solid: dependable, and speedy
considering that my once bleeding-edge dual celery 333 is now an aging
rocker.
about 1-2 weeks ago, the noise started getting to me. the CPU fans are
really loud. so i purchased an acousticase (it was featured on /. a couple
of weeks ago):
http://209.47.233.50/acb/showdetl.cfm?&User_ID=319670&St=3156&St2=59773250&St3=30213400&DS_ID=4&Product_ID=140&DID=8
with the understanding that when i get rich and famous beyond my wildest
dreams, i will buy a zalman case:
http://www.quietpcusa.com/acb/showdetl.cfm?&DID=8&Product_ID=121&CATID=9
anyway, i'm getting off-topic.
so my workstation has been closed for 1-2 weeks. it's been much quieter
without gaping holes, but it has also been hotter.
last night, i left bittorrent running. this morning, i woke up and saw
"I/O errors" in all my xterms. i killed X, and saw "I/O error" in all
my consoles. i had no prompt. it just kept going "I/O error" over and
over. here's what i did:
1. sync'ed, remounted read-only, and rebooted (alt-sysrq-s, u, b).
2. at boot, hdc generated an "end of device" error.
3. edited fstab so hdc doesn't mount at boot, and rebooted.
4. mounted hdc2 by hand, and it just hung there.
5. sync'ed, remounted read-only, and rebooted (alt-sysrq-s, u, b).
6. this time, hdc wasn't even recognized as a valid block device. it
was as if the drive wasn't there. the "oh shit" factor started to
sink in.
7. i opened the case, and the drive felt quite warm. not hot, but
warmer than i've ever felt a drive get.
8. turned off the system, left the case open, and let it cool for 10 min.
9. booted. this time, hdc appeared ok in the startup.
10. fscked it. fsck modified the filesystem, but that's to be expected
if the drive failed while bittorrent was running.
11. it appears to be ok. fsck likes it now. i can mount/umount it fine.
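if the overheating theory is right, one way to check it rather than going by touch is to read the drive's own temperature sensor. here's a minimal python sketch, assuming smartmontools is installed, the drive supports SMART, and it reports temperature as attribute 194 (Temperature_Celsius) in the usual `smartctl -A` table; the function names are hypothetical, and not every drive of this age exposes a temperature attribute at all.

```python
import re
import subprocess


def parse_smart_temperature(smartctl_output):
    """Return the drive temperature in degrees C from `smartctl -A` text, or None.

    Assumes the usual smartmontools attribute table, whose columns are:
    ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    """
    for line in smartctl_output.splitlines():
        fields = line.split()
        # attribute 194 is conventionally Temperature_Celsius; the raw value
        # (10th column) holds the temperature, sometimes followed by "(Min/Max ...)"
        if len(fields) >= 10 and fields[0] == "194":
            match = re.match(r"\d+", fields[9])
            if match:
                return int(match.group())
    return None


def read_drive_temperature(device="/dev/hdc"):
    """Run smartctl on `device` and parse its temperature (needs root and smartmontools)."""
    result = subprocess.run(["smartctl", "-A", device],
                            capture_output=True, text=True)
    return parse_smart_temperature(result.stdout)
```

calling `read_drive_temperature("/dev/hdc")` every few minutes (e.g. from cron) while bittorrent runs with the case closed, and again with it open, would show whether the closed case really pushes the drive noticeably hotter.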
the drive in question, hdc, has two partitions: hdc1 is a 128MB swap
partition and hdc2 is an 80GB ext3 partition.
a bunch of questions:
1. it *appears* that the drive overheated. i've never heard of drives
overheating, but then again, i'm a hardware enthusiast, not a guru.
has anyone heard of non-permanent drive failures due to overheating?
2. if this were your drive, would you replace it? it's somewhere
between 4 and 6 years old. keep in mind this was not a bad block.
the whole drive seemed to go into la-la land.
3. i've never used badblocks before, but i figure it might be a good
thing to do. i did a test run on a smaller partition on a different
drive. here's the output:
lucifer# fsck -c /dev/hdb4
fsck 1.35 (28-Feb-2004)
e2fsck 1.35 (28-Feb-2004)
Checking for bad blocks (read-only test): done
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/hdb4: ***** FILE SYSTEM WAS MODIFIED *****
/dev/hdb4: 18157/3112960 files (4.9% non-contiguous), 3130565/6221171 blocks
it's unsettling that the filesystem was modified, but fsck made no
mention of what got modified. does anybody have any ideas about
what exactly fsck modifies when it doesn't give you a reason? that's
a really rotten thing to do. "dumpe2fs -b" doesn't report any bad
blocks, so what got modified?
4. fsck reported that hdc2 is 30% non-contiguous. that sounds like a
lot to me. i'm not going to do anything until i either regain
confidence in this drive or replace it, but i was wondering if
anybody here has any experience using e2defrag.
5. under windows, how is it that programs like scandisk and defrag can
do their jobs without either unmounting the partition in question or
remounting it read-only? programs like fsck and e2defrag warn of
"severe filesystem corruption" if you try to do this.
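for questions 3 and 4, fsck's exit status says a bit more than its summary text: per the fsck(8)/e2fsck(8) man pages it's a bitmask where 1 means "errors corrected", 4 means "errors left uncorrected", and so on. it still won't say *which* structures changed, but it separates "fixed something" from "gave up". here's a small python sketch, with hypothetical helper names, that decodes that status and pulls the numbers out of the one-line summary. as far as i know, e2fsck's "non-contiguous" figure is the percentage of in-use files (inodes) whose blocks aren't stored contiguously, not a percentage of disk space.

```python
import re

# exit-status bits as documented in the fsck(8) and e2fsck(8) man pages;
# the status is a bitmask, so several meanings can be set at once
FSCK_EXIT_BITS = [
    (1, "filesystem errors corrected"),
    (2, "system should be rebooted"),
    (4, "filesystem errors left uncorrected"),
    (8, "operational error"),
    (16, "usage or syntax error"),
    (32, "check canceled by user request"),
    (128, "shared-library error"),
]

# matches e2fsck's summary line, e.g.
# /dev/hdb4: 18157/3112960 files (4.9% non-contiguous), 3130565/6221171 blocks
SUMMARY_RE = re.compile(
    r"(?P<device>\S+): (?P<files_used>\d+)/(?P<files_total>\d+) files "
    r"\((?P<noncontig>\d+(?:\.\d+)?)% non-contiguous\), "
    r"(?P<blocks_used>\d+)/(?P<blocks_total>\d+) blocks"
)


def decode_fsck_status(status):
    """Translate an fsck/e2fsck exit status into a list of readable meanings."""
    if status == 0:
        return ["no errors"]
    return [meaning for bit, meaning in FSCK_EXIT_BITS if status & bit]


def parse_e2fsck_summary(line):
    """Pull the inode, block, and fragmentation figures out of e2fsck's summary line."""
    m = SUMMARY_RE.search(line)
    if m is None:
        raise ValueError("not an e2fsck summary line: %r" % line)
    blocks_used = int(m.group("blocks_used"))
    blocks_total = int(m.group("blocks_total"))
    return {
        "device": m.group("device"),
        "files_used": int(m.group("files_used")),
        "files_total": int(m.group("files_total")),
        # share of in-use files (inodes) whose blocks are not contiguous on disk
        "noncontiguous_pct": float(m.group("noncontig")),
        "blocks_used_pct": 100.0 * blocks_used / blocks_total,
    }
```

for the hdb4 run above, `parse_e2fsck_summary` gives 4.9% non-contiguous and about 50.3% of blocks in use. to get a status to decode, you'd run something like `e2fsck -f -n` on the unmounted partition (`-n` opens it read-only and answers "no" to every repair) via `subprocess.run(...)` and pass its `returncode` to `decode_fsck_status`.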
pete
--
Make everything as simple as possible, but no simpler. -- Albert Einstein
GPG Instructions: http://www.dirac.org/linux/gpg
GPG Fingerprint: B9F1 6CF3 47C4 7CD8 D33E 70A9 A3B9 1945 67EA 951D