[vox-tech] Hardware Fault on Mandrake System?

Marc Elliot Hall vox-tech@lists.lugod.org
Sun, 14 Sep 2003 08:53:09 -0700


The situation is this:

I have a Mandrake 9.1 system with the stock kernel running on an EPIA 
mini-ITX mainboard with an 800MHz VIA CPU (Ezra) and 512 MB of RAM. 
Other than the motherboard, PSU, and a pair of cooling fans, the only 
things in the box are the mass storage devices:

* /dev/hda	76 GB Western Digital (Mandrake identifies
		this as a Maxtor 6Y080P0 for some reason) with
		an EXT3 filesystem

* /dev/hdc	36 GB Maxtor 6L040J2 with an EXT3 filesystem

* /dev/hdd	36 GB Maxtor 6L040J2 with an EXT3 filesystem

* /dev/scd0	SAMSUNG CD-R/RW SW-240B drive

/dev/hda is a single partition, upon which I have stored my media files 
-- including my vast, *legally acquired* (for the benefit of any RIAA 
spiders), music collection. More on this in a moment.

Mandrake 9.1 correctly identified and set up all the hardware on this box
(with the exception noted above) when I updated it from Mandrake 8.2. 
(Mandrake 9.0 needed a patch for this CPU, but I didn't like the 
results, so I rolled it back.) I love this system!

This machine is used as my daily desktop system, home network DNS 
server, and backup web server for several domains.

The problem is this:

Going about my daily work (my now seven-month-long job search, 
thankyouverymuch) in a KDE environment, I keep an XMMS session running 
through a number of playlists. Periodically (by which I mean about once
a week), XMMS will freeze mid-song and the hard disk LED will light up and
stay on. Over the next thirty seconds or so, all other programs become
unresponsive one by one (ctrl+alt+Fx to drop to a console stops working too).
If I am fast enough to quit a top session in a running shell and do a 
killall xmms, I can recover and work normally; however, I've only 
managed that a couple of times. Usually, what happens is the entire 
system hangs: no mouse, no keyboard (not even numlock), no remote login. 
I don't know about SysRq... probably should try that, but I never think 
about it at the time.
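
The manual "killall xmms" recovery could in principle be scripted. What follows is only a rough sketch, not something I have run on this box. It assumes the hung XMMS process shows up in uninterruptible sleep (state "D" in /proc/<pid>/stat), which I have not confirmed, and the function names are made up for illustration. A SIGKILL sent to a process in that state only takes effect once the kernel lets it go, so this may not help with a full hang.

```python
import os
import signal


def parse_stat(text):
    """Split one /proc/<pid>/stat line into (pid, command, state)."""
    # The command name sits in parentheses and may itself contain spaces
    # or ')', so split on the last ')' rather than on whitespace.
    open_paren = text.index("(")
    close_paren = text.rindex(")")
    pid = int(text[:open_paren])
    comm = text[open_paren + 1:close_paren]
    state = text[close_paren + 1:].split()[0]
    return pid, comm, state


def blocked_pids(name, proc_root="/proc"):
    """Return PIDs of processes called `name` that are in state D."""
    pids = []
    if not os.path.isdir(proc_root):
        return pids
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat(fh.read())
        except (OSError, ValueError):
            continue  # process vanished or the line was unreadable
        if comm == name and state == "D":
            pids.append(pid)
    return pids


def kill_blocked(name, proc_root="/proc"):
    """Send SIGKILL to every blocked process called `name`; return the PIDs."""
    pids = blocked_pids(name, proc_root)
    for pid in pids:
        try:
            os.kill(pid, signal.SIGKILL)
        except OSError:
            pass  # already gone, or not ours to kill
    return pids
```

Calling kill_blocked("xmms") from a loop with a few seconds of sleep between passes would approximate what I do by hand, provided the script itself still gets scheduled while the disk is busy.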

Upon reboot, the system recognizes an unclean shutdown and asks if I
want to run e2fsck. If I say no, it correctly finds journal entries and
cleans up the disks. Total boot time: 2 minutes, 30 seconds. If I say 
yes, it proceeds through a file system check and nukes anywhere from 0 
to 2 GB on /dev/hda. *ALWAYS* /dev/hda. Most of the time (but not 
always), these files can be recovered from lost+found. Total boot time: 
anywhere from five to thirty minutes (once, about four months ago, it 
took three hours).
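
To get a clearer picture of how much ends up in lost+found after each check, a small script could list the entries by size. This is a minimal sketch: the directory path is whatever lost+found sits under the mount point for /dev/hda (not given here, so it is passed in as a parameter), the function names are hypothetical, and reading lost+found normally requires root.

```python
import os


def list_lost_found(directory):
    """Return (size_in_bytes, name) pairs for `directory`, largest first."""
    entries = []
    for entry in os.scandir(directory):
        # Use the entry's own size; do not follow symlinks.
        size = entry.stat(follow_symlinks=False).st_size
        entries.append((size, entry.name))
    entries.sort(reverse=True)
    return entries


def total_bytes(entries):
    """Sum the sizes from a list produced by list_lost_found()."""
    return sum(size for size, _name in entries)
```

e2fsck names recovered files after their inode numbers (for example "#12345"), so the sizes are often the quickest hint as to which media file each one used to be.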

My question is this:

Is it likely that /dev/hda is faulty? Is there a utility I can use to
check the disk's integrity at the hardware level? Should I have broken the disk into
smaller partitions? I don't think this is heat related, as my disks are 
each spaced a full bay apart (big case, small mobo) and I have two 
(running) case fans in addition to the PSU fan. This problem is annoying
rather than mission-critical at the moment, but I imagine that if it
continues much longer, permanent harm may result. Any thoughts?


-- 
Marc Elliot Hall           www.hallmarc.net/quick_resume.html
P.O. Box 435
Shingle Springs, CA 95682
(530)409-0372 cell
(530)672-8504 home
www.hallmarc.net		                      Hire me!