[vox-tech] ac97 sound problems

Thu, 25 Apr 2002 13:20:25 -0700

begin msimons@moria.simons-clan.com <msimons@moria.simons-clan.com> 
> On Thu, Apr 25, 2002 at 11:05:43AM -0700, Peter Jay Salzman wrote:
> > begin msimons@moria.simons-clan.com <msimons@moria.simons-clan.com> 
> > >   If you send a "kill -9" and the process does not die instantly, then 
> > > you have a kernel bug... there is no way to "block" or "hide" from 
> > > kill -9.
> > 
> > as you point out, processes in "uninterruptible sleep" can't be killed
> > with SIGKILL.  the process is put to sleep while the kernel waits for
> > some event to happen.  this corresponds to process status "D".
> 
>   It is true that processes in state D don't die instantly; I had not
> considered them.  Even processes in state D should die in a few seconds 
> or _maybe_ a minute for a very slow device... but I can't think of 
> anything at the moment that is a _valid_ reason to lock a process in 
> uninterruptible sleep forever.

state D processes *can* last forever.  do a google groups search on
"uninterruptible sleep".

but putting aside google groups, *i've* seen state D processes last
for weeks on my system.  they only died after a reboot.
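
if you want to check whether anything on your own box is stuck like
this, here's a rough sketch in python.  it assumes the linux /proc
layout, where the state letter is the first field after the
parenthesized command name in /proc/<pid>/stat.  the function names are
made up for illustration.

```python
import os


def parse_stat(stat_line):
    """Return (pid, comm, state) from the text of /proc/<pid>/stat."""
    # The command name sits in parentheses and may itself contain spaces
    # or ')', so split on the LAST ')' instead of on whitespace.
    left, _, right = stat_line.rpartition(")")
    pid_text, _, comm = left.partition("(")
    return int(pid_text), comm, right.split()[0]


def uninterruptible_processes(proc_root="/proc"):
    """List (pid, comm) for every process currently in state D."""
    found = []
    if not os.path.isdir(proc_root):
        return found  # no procfs here (not Linux, or not mounted)
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are processes
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                pid, comm, state = parse_stat(handle.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited while we were looking at it
        if state == "D":
            found.append((pid, comm))
    return found
```

a single snapshot will also catch processes that are in state D for a
few milliseconds of ordinary disk i/o, so the interesting ones are those
that show up again and again.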

you can certainly imagine a trivial scenario for such a thing --
consider a process which is waiting for a data page coming from swap.
no data page, no scheduling.  it's as simple as that.  and signals won't
help here -- you can't signal a process "hey, i just read your data,
here it is".  what you do is you put the process to sleep until the
event occurs.  if that happens during an open() or write() made by a user
space process, then that process is stuck.

>   Realize that when the kernel exposes a method to become "stuck" forever
> in that state a malicious program can do great evil things to the machine
> by for example sucking up as much memory as possible and any other 
> critical resources it can get, then calling the magic "lock me" method
> and the only way out would be a power cycle.

maybe i'm not understanding you, but it sounds like this is a non-issue.
a process which is in uninterruptible sleep is simply placed on a wait
queue and not scheduled at all.

if you want to talk about evil stuff, then ... well sure.  ALL kernel
code is trusted code, from a simple call to printk() to code that
modifies system calls.  they ALL present dr. evil with the
opportunity to wreak havoc.  you certainly don't need a state D process
for that!

>   Processes that get wedged in state D can also prevent the filesystems
> from unmounting... 

sure.  but *any* part of the kernel can put itself to sleep.  it's as
simple as passing a macro to a function call.

>   If you think of a few cases that locking the process is valid please
> let me know, I probably overlooked something...

well, i just gave one.  but here's another.  suppose you're some kernel
code and you're waiting for some event to happen, so you put yourself to
sleep.

but suppose the event occurs AFTER you decide to put yourself to sleep
but BEFORE you actually do go to sleep.  poorly written code won't
recover from this type of race condition.
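
here's a user space analogy of that lost wakeup, sketched in python with
threading.Condition.  it is not kernel code: the class and function
names are invented, the `window` callback stands in for "the event fires
right here", and the timeout stands in for sleeping forever.

```python
import threading


class KernelEvent:
    """Hypothetical stand-in for 'some event' plus its wait queue."""

    def __init__(self):
        self.cond = threading.Condition()
        self.happened = False

    def fire(self):
        # Record the event and wake whoever is sleeping right now.
        with self.cond:
            self.happened = True
            self.cond.notify_all()


def racy_wait(event, window, timeout):
    """Check, then sleep, with nothing tying the two steps together."""
    if event.happened:
        return True
    window()  # the event fires here: after the check, before the sleep
    with event.cond:
        # Nobody was asleep when the wakeup was sent, so it is gone.
        # Returns False when the timeout expires.
        return event.cond.wait(timeout)


def safe_wait(event, window, timeout):
    """Same race, but the condition is re-checked under the lock."""
    if event.happened:
        return True
    window()
    with event.cond:
        # wait_for tests the predicate before sleeping and after every
        # wakeup, so an event that already fired is noticed at once.
        return event.cond.wait_for(lambda: event.happened, timeout)
```

with `ev = KernelEvent()`, calling `racy_wait(ev, ev.fire, 0.1)` times
out even though the event did happen, while `safe_wait` on a fresh event
returns right away.  without the timeout, the racy version is your
process stuck asleep for good.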

and then we need to debate whether poorly written code is a bug.

> > as you point out, it can be kernel bug.  often a race condition.
> > but it can also be caused by hardware failure.
> 
>   Almost any hardware bug can be worked around in software...

heh.  ok, i'm willing to concede this point, but only because we're
talking about semantics now.  is failing to work around a hardware bug
considered a kernel bug?  maybe.  maybe not.

there's nearly an infinite number of things bad hardware can do.  should
the kernel have a workaround for all of them?  all possible
conceivable crazy things hardware can do?  this isn't quite as simple
as a switch that tests the value of an integer.

should we call code buggy if it doesn't address all possible
circumstances?  i dunno!

>   The kernel is still alive and functioning, the driver knows
> how much data is queued on the device, it knows what data rate is
> being played and it could easily determine that the device has locked
> up... and reset the device.  Most drivers will reset their devices
> when they detect a timeout or other "shouldn't happen" sort of 
> error condition... if the device doesn't respond to the reset,
> report that and return I/O errors to user space.
> 
>   A workaround as drastic as blacklisting the buggy chipset is
> acceptable if the authors can't figure out how to dance around
> the problem.

exactly.
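
the timeout logic described above can be sketched roughly like this in
python.  it's a simulation, not driver code: `FakeDevice`,
`drain_deadline`, `service_playback` and the `slack` factor are all
invented for illustration, and a real driver would do this in C against
real hardware registers.

```python
import errno


class FakeDevice:
    """Hypothetical stand-in for a sound chip."""

    def __init__(self, remaining, reset_works):
        self._remaining = remaining      # bytes still queued on the chip
        self._reset_works = reset_works  # does the chip answer a reset?

    def remaining(self):
        return self._remaining

    def reset(self):
        if self._reset_works:
            self._remaining = 0  # a successful reset flushes the queue
        return self._reset_works


def drain_deadline(queued_bytes, bytes_per_second, slack=2.0):
    """Seconds after which an undrained queue means trouble."""
    return slack * queued_bytes / bytes_per_second


def service_playback(device, queued_bytes, bytes_per_second, elapsed):
    """Return 'ok' or 'reset'; raise OSError(EIO) if the reset fails."""
    if device.remaining() == 0:
        return "ok"  # everything was played
    if elapsed <= drain_deadline(queued_bytes, bytes_per_second):
        return "ok"  # still within the time the data should take
    if device.reset():
        return "reset"  # the device was wedged but came back
    # Report the failure instead of sleeping forever.
    raise OSError(errno.EIO, "device did not respond to reset")
```

the point is only that the caller always gets an answer back: either the
data drains, the device gets reset, or user space sees EIO.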

> > the chipset in question is known to have issues in both linux and
> > windows.
> 
>   Hrmmm... I agree that I see reports of problems with the "via chipsets"
> even in the kernel documentation directory... 
>   /usr/src/linux/Documentation/sound/VIA-chipset
> saying that there was no word back from VIA but that file is ancient
> ... dated Jan 1999.
> 
>   I had heard VIA was much more active in supporting Linux, and
> now that I look further they appear to have step by step directions 
> for enabling their chipsets in Linux...
>   http://www.viaarena.com/?PageID=88
> There are also public support forums; those might be another good
> place to look.

i can't remember if it was VIA or AMD that was responsible for the
soundblaster live fiasco.  but i have a VIA sound chipset in one of my
machines.  it was such a nightmare to get working with OSS that i just
shoved an 8 buck soundcard into the machine, disabled the VIA chipset via
BIOS and saved myself hours of frustration.

pete