[vox-tech] fsck, badblocks, defrag, and hd weirdness

Mark K. Kim vox-tech@lists.lugod.org
Sat, 8 May 2004 23:57:46 -0700 (PDT)


On Sat, 8 May 2004, Peter Jay Salzman wrote:

> so my workstation has been closed for 1-2 weeks.  it's been much quieter
> without gaping holes, but it has also been hotter.

Perhaps your old system was noisy because the fans were getting old?  My
system recently started making some noise, so I just replaced the
problematic case fan, and it's quiet again.  BTW, I have an aluminum
case.

> last night, i left bittorrent running.  this morning, i woke up and saw
> "I/O errors" in all my xterms.  i killed X, and saw "I/O error" in all
> my consoles.  i had no prompt.  it just kept going "I/O error" over and
> over.  here's what i did:
>
> 1. sync'ed, remounted read-only, and rebooted (alt-sysrq-s, u, b).
> 2. at boot, hdc generated an "end of device" error.

What do you mean *hdc* generated an EOD error?  When did the error occur?
(before POST, during POST, during Linux kernel loading, or during
startup?)

> 3. edited fstab so hdc doesn't mount at boot, and rebooted.
> 4. mounted hdc2 by hand, and it just hung there.
> 5. sync'ed, remounted read-only, and rebooted (alt-sysrq-s, u, b).
> 6. this time, hdc wasn't even recognized as a valid block device.  it
>    was as if the drive wasn't there.  the "oh shit" factor started to
>    sink in.
> 7. i opened the case, and the drive felt quite warm.  not hot, but
>    warmer than i ever felt a drive.
[snip]

The drive in my HP Pavilion did that (not just warm but HOT, actually:
"you can get burned if you leave your finger on it for five seconds" hot,
not just "wow, this is hot!" hot).  I wanted to get hard drive fans to
cool the drive down but couldn't find any, so I got an aluminum case with
hard drive fans built in, for about $150 at CompUSA.  The noise level is
quite acceptable.  About a month ago one of the fans started making too
much noise, so I replaced it a couple of weeks ago.  All works great now.

[snip]
> 1. it *appears* that the drive overheated.  i've never heard of drives
>    overheating.  but then again, i'm a hardware enthusiast.  not guru.
>    has anyone heard of non-permanent drive failures due to over heating?

I've seen computers start working again after cooling down, with no
further problems as long as they were kept cool, though I don't think I've
seen that happen because of an overheated hard drive specifically.  But I
can certainly see how a hot drive could warp its casing and put stress on
the platters, preventing data from being read properly without actually
damaging the data on them.  But then... how important is the data?

> 2. if this were your drive, would you replace it?   it's somewhere
>    between 4 and 6 years old.  keep in mind this was not a bad block.
>    the whole drive seemed to go into la-la land.

Depends on how important the data is.  It's about time to get a new hard
drive anyway, so you might as well move the potentially damaged drive to a
less critical system used for non-critical, just-for-fun purposes.  If you
had a RAID it'd fit nicely into it, but you don't, so...
=P

> 3. i've never used badblocks before, but i figure it might be a good
>    thing to do.  i did a test run on a smaller partition on a different
>    drive.  here's the output:
>
>    lucifer# fsck -c /dev/hdb4
>    fsck 1.35 (28-Feb-2004)
>    e2fsck 1.35 (28-Feb-2004)
>    Checking for bad blocks (read-only test): done
>    Pass 1: Checking inodes, blocks, and sizes
>    Pass 2: Checking directory structure
>    Pass 3: Checking directory connectivity
>    Pass 4: Checking reference counts
>    Pass 5: Checking group summary information
>
>    /dev/hdb4: ***** FILE SYSTEM WAS MODIFIED *****
>    /dev/hdb4: 18157/3112960 files (4.9% non-contiguous), 3130565/6221171 blocks
>
>
>    it's unsettling that the filesystem was modified, but fsck made no
>    mention about what got modified.  does anybody have any ideas about
>    what exactly fsck modifies when it doesn't give you a reason?  that's
>    really a rotten thing to do.  "dumpe2fs -b" doesn't report any bad
>    blocks.  so what got modified?

Did you unmount the filesystem first?
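
That's worth checking before any fsck run, since e2fsck warns against
checking a mounted filesystem.  Here is a minimal Python sketch of the
check, assuming a Linux system where /proc/mounts lists the partition
under the same name you pass to fsck (the root filesystem may show up
under another name such as /dev/root, which this sketch does not
resolve).  The function names are made up for illustration.

```python
import sys


def mounted_devices(mounts_text):
    """Map each device in /proc/mounts-style text to its mount point."""
    mounts = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            # Field 0 is the device, field 1 is the mount point.
            mounts[fields[0]] = fields[1]
    return mounts


def is_mounted(device, mounts_text):
    """True if `device` appears verbatim as a mounted device."""
    return device in mounted_devices(mounts_text)


if __name__ == "__main__":
    device = sys.argv[1] if len(sys.argv) > 1 else "/dev/hdb4"
    try:
        with open("/proc/mounts") as f:
            text = f.read()
    except OSError:
        print("cannot read /proc/mounts (not a Linux system?)")
    else:
        mounts = mounted_devices(text)
        if device in mounts:
            print(f"{device} is mounted on {mounts[device]}: unmount it "
                  "(or remount read-only) before running fsck -c")
        else:
            print(f"{device} is not listed in /proc/mounts")
```

Matching is by exact device name only, so a partition mounted through a
symlink or label won't be caught.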

> 4. fsck reported that hdc2 is 30% non-contiguous.  that sounds like a
>    lot to me.  i'm not going to do anything until i either regain
>    confidence in this drive or replace it, but i was wondering if
>    anybody here has any experience using e2defrag.

Looks fine to me, but only because I come from a FAT background (100%
non-contiguous isn't unusual there... =P)
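
For what the number means: as far as I know, e2fsck computes the
non-contiguous figure as fragmented files over files in use, not as a
share of blocks.  So the 4.9% on hdb4 is roughly 890 of its 18157 files,
and 30% on hdc2 would be about three files in ten with at least one
break.  A small Python sketch that pulls this out of the summary line
e2fsck prints (the regex assumes the exact line format shown in Peter's
output; other e2fsprogs versions may word it differently):

```python
import re

# Matches e.g. "/dev/hdb4: 18157/3112960 files (4.9% non-contiguous), 3130565/6221171 blocks"
SUMMARY = re.compile(
    r"(?P<dev>\S+): (?P<files_used>\d+)/(?P<files_total>\d+) files "
    r"\((?P<noncontig>[\d.]+)% non-contiguous\), "
    r"(?P<blocks_used>\d+)/(?P<blocks_total>\d+) blocks"
)


def parse_summary(line):
    """Return a dict of summary figures, or None if the line doesn't match."""
    m = SUMMARY.search(line)
    if m is None:
        return None
    files_used = int(m["files_used"])
    pct = float(m["noncontig"])
    return {
        "device": m["dev"],
        "files_used": files_used,
        "noncontig_pct": pct,
        # Approximate number of fragmented files implied by the percentage.
        "fragmented_files": round(files_used * pct / 100),
        # Share of the filesystem's blocks currently in use.
        "blocks_used_pct": 100 * int(m["blocks_used"]) / int(m["blocks_total"]),
    }


if __name__ == "__main__":
    line = "/dev/hdb4: 18157/3112960 files (4.9% non-contiguous), 3130565/6221171 blocks"
    info = parse_summary(line)
    print(info["fragmented_files"])  # 890
```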

> 5. under windows, how is it that programs like scandisk and defrag can
>    do their jobs without either umounting or remounting read-only the
>    partition in question?  programs like fsck and e2defrag warn of
>    "severe filesystem corruption" if you try to do this.

Does scandisk run under Windows?  I think it runs only under DOS, in a
non-multitasking environment.  I've often seen it run under DOS before
Windows starts, with disk caching turned off and nothing else writing to
the FS.  That's pretty much the same as accessing the disk read-only or
unmounted.

Defrag defragments the FS as far as it can, and when it detects that the
FS has been modified (a program writes to it, the cache is about to be
flushed, etc.), it starts defragmenting from the beginning again.  It's
really, really annoying, and it takes an enormous, unnecessarily long time
to run.  But going through the already-defragmented portions of the HD is
faster the second time, since no clusters are moved (they're only verified
to be contiguous.)
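
The restart behavior can be shown with a toy simulation.  This is a
hypothetical sketch, not how Microsoft's defrag is actually implemented:
the disk is a list of clusters labelled with made-up file IDs (None means
free), and `interruptions` stands in for the moments a write to the FS is
detected.  It shows why a restart wastes time but not moves: on the
second pass, clusters already in place are only compared, not moved.

```python
def target_layout(disk):
    """Each file's clusters made contiguous, files in order of first appearance."""
    order, counts = [], {}
    for f in disk:
        if f is None:
            continue
        if f not in counts:
            order.append(f)
            counts[f] = 0
        counts[f] += 1
    layout = [f for f in order for _ in range(counts[f])]
    return layout + [None] * (len(disk) - len(layout))  # free space at the end


def defrag(disk, interruptions=frozenset()):
    """Defragment a copy of `disk`, restarting from cluster 0 on each
    (pass_number, position) found in `interruptions`."""
    disk = list(disk)
    passes = moves = verifies = 0
    while True:
        passes += 1
        target = target_layout(disk)
        for i in range(len(disk)):
            if (passes, i) in interruptions:
                break  # "FS modified": start over from the beginning
            if disk[i] == target[i]:
                verifies += 1  # already in place: just verified
            else:
                # The needed cluster must be somewhere after position i.
                j = disk.index(target[i], i + 1)
                disk[i], disk[j] = disk[j], disk[i]
                moves += 1
        else:
            return disk, passes, moves, verifies


if __name__ == "__main__":
    start = ["A", "B", "A", "C", "B", None]
    print(defrag(start, {(1, 3)}))
```

With one interruption the example needs two passes, but the clusters
moved during the first pass are simply re-verified on the second.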

FAT is a very... interesting FS... =)

Anyway, I highly recommend you get some hard drive fans installed.  After
realizing how hot hard drives get these days (not just from my experience
with the HP system; I also read an article about it not long afterward),
I've concluded that hard drive fans are necessities now, not optional
accessories.

-Mark


-- 
Mark K. Kim
AIM: markus kimius
Homepage: http://www.cbreak.org/
Xanga: http://www.xanga.com/vindaci
Friendster: http://www.friendster.com/user.jsp?id=13046
PGP key fingerprint: 7324 BACA 53AD E504 A76E  5167 6822 94F0 F298 5DCE
PGP key available on the homepage