[vox-tech] Re: crashes -- kernel problems?

Peter Jay Salzman vox-tech@lists.lugod.org
Tue, 6 May 2003 13:28:11 -0700


begin Charles McLaughlin <cmclaughlin@ucdavis.edu> 
> Thanks for the advice!  Below is an update and a few more questions.  ;-)
> 
> > Date: Sun, 4 May 2003 15:41:29 -0700
> > To: vox-tech@lists.lugod.org
> > Subject: Re: [vox-tech] crashes -- kernel problems?
> > From: Peter Jay Salzman <p@dirac.org>
> > Reply-To: vox-tech@lists.lugod.org
> >
> > charles,
> >
> > first thing to do is to learn about the magic sysrq key so you can sync
> > and umount your drives gracefully, in case this really is a kernel
> > related issue.  make sure this option is compiled into your kernel.
> > there was a thread on vox-tech about it about 2 months ago, i think.
> 
> I recompiled my kernel with support for the magic sysrq key.  I'm having
> some problems using it though.  In order for me to press the sysreq key, I
> have to hold down the function (fn) key on my laptop.  The function key
> turns part of my laptop keyboard into a number pad, so when I attempt to
> press alt+sysreq+U to unmount, it translates to alt+sysreq+4.
> 
> Is there a way to map the sysreq key to another key?
 
there is for X (but i need to read the man page every time i use it).

i'm sure there is for the console too, but i don't know offhand.  a
bash alias is totally grasping at straws: aliases only rewrite commands
typed at a shell prompt, and the sysrq combination is handled by the
kernel, so i'm sure it won't work.  but you can try.  :)
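
separately from the key mapping question: compiling the option in may
not be enough by itself.  the kernel's sysrq documentation says it also
has to be switched on at runtime through /proc/sys/kernel/sysrq (0 is
off, non-zero is on).  here's a rough python sketch for checking that.
the function names are made up for illustration, and reading values
above 1 as a bitmask is an assumption that only applies to newer
kernels.

```python
def sysrq_state(raw):
    """Interpret the text read from /proc/sys/kernel/sysrq."""
    value = int(raw.strip())
    if value == 0:
        return "disabled"
    if value == 1:
        return "enabled"
    # Assumption: on newer kernels, values above 1 are a bitmask that
    # enables only some sysrq functions.
    return "partially enabled (bitmask %d)" % value


def read_sysrq(path="/proc/sys/kernel/sysrq"):
    """Report the sysrq state, or note that the proc file is missing."""
    try:
        with open(path) as f:
            return sysrq_state(f.read())
    except OSError:
        # The file is absent when the kernel lacks magic sysrq support.
        return "not available"
```

calling read_sysrq() on the laptop should say "enabled" before you
bother fighting with the fn key.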


> > second thing to do is to ping or try to ssh into the machine to see if
> > the kernel is really gone or the input system is just hosed (which
> > happens).  at least then you can perform an ordinary shut down.
> 
> I can't ping it or ssh into it when it crashes.

ok, that's a real bona-fide lock up, then.
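
for the record, the "can i still reach it" test can be scripted from
another machine.  a real icmp ping needs raw sockets (usually root), so
this rough python sketch swaps in a plain tcp connect to the ssh port
instead.  the function name, the default port 22 and the 3 second
timeout are assumptions for illustration.

```python
import socket


def host_answers(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Covers refused connections, timeouts and name lookup failures.
        return False
    conn.close()
    return True
```

the tcp handshake is generally completed by the kernel, so a True here
suggests the network stack is still alive even if sshd itself is wedged.
a False on a box that normally answers points at a real lock up.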

> > of course, there's always your logs in /var/log, but if it's a kernel
> > panic, the kernel most often goes la-la before being able to pass
> > logging info to klogd.
> 
> I checked kern.log, but didn't see anything about a kernel panic.

not surprised.  klogd is a user process, so the poor thing doesn't have
the chance.
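
if something does make it to disk some other time, the lines worth
hunting for usually mention an oops or a panic.  a rough python sketch
for scanning a log, assuming the usual message wording ("Oops",
"Kernel panic", "Unable to handle kernel ...").  the pattern list and
the function name are illustrative, not exhaustive.

```python
import re

# Phrases the kernel commonly logs when something goes badly wrong.
TROUBLE = re.compile(
    r"\boops\b|kernel panic|unable to handle kernel|kernel bug",
    re.IGNORECASE,
)


def find_trouble(lines):
    """Return (line_number, text) pairs for lines that look like a crash."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, 1)
        if TROUBLE.search(line)
    ]


# Usage (path as mentioned in the thread):
#   with open("/var/log/kern.log") as log:
#       for number, text in find_trouble(log):
#           print(number, text)
```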

> > there's google groups.  for instance, when i fed "2.4.20 kernel crash
> > network" to google groups, i learned there might be problems with the
> > 3c59x module.
> 
> My laptop has a 3c905 NIC, but it uses the same module as the 3c590.

fancy that!  ;-)

> Should I get a newer or older kernel?  For now, I'm thinking of disabling
> this module to see if that fixes the problem.  I mostly use a wireless
> connection anyway.  :-)

did you say you had two cards in it?  if so, try removing one of the
cards and see if that helps.  i recall reading, a long time ago, that
linux wasn't happy with automatically configuring two identical cards
(from the viewpoint of the driver) of the same type at boot.  i thought
this changed a long time ago, but i could be mistaken.

you might also look for email addresses in drivers/net/3c59x.c and email
the guy about your problem.  he'll almost certainly have a good response
(if he responds at all, that is).

also, i'd say a small post to lkml might be in order after you exhaust
all your fact finding and have already emailed the driver author with no
results.  just make sure you exhaust all your avenues because those
people are really busy.

as for upgrading, that's hard to say.  you have to use 2.5.x since
there's no 2.4.21 to my knowledge.  there might be a
2.4.21-pre-something out.  i'd probably try that first.

either with 2.4.21-pre-something or 2.5.x, do a diff on the driver
source and see if much work has been going on.

as for downgrading, you can also try that.  do the same thing with
diffing the source code.  you might want to use google groups and see
which kernels have been reported with problems with this driver.  i did
a brief scan of subjects, didn't actually read the content.
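
to put a number on "how much work has been going on" in the driver
between two kernel trees, something like this rough python sketch would
do.  it counts added and removed lines between two versions of a file.
the function name and the example paths are hypothetical.

```python
import difflib


def changed_lines(old_text, new_text):
    """Return (added, removed) line counts between two file versions."""
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines, autojunk=False)
    added = removed = 0
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            removed += i2 - i1  # lines only in the old version
            added += j2 - j1    # lines only in the new version
    return added, removed


# Usage (hypothetical paths to two unpacked kernel trees):
#   old = open("linux-old/drivers/net/3c59x.c").read()
#   new = open("linux-new/drivers/net/3c59x.c").read()
#   print(changed_lines(old, new))
```

a plain "diff -u" on the two files gives the same information with the
actual changes, so this is only handy if you want to compare several
kernel versions in one go.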

nicole carlson was asking about a kernel debugger a few weeks ago.
maybe now's the time for a lugod talk about kgdb.   ;-)

> btw  Does it matter if I compile the NIC support into the kernel or as a
> module?  Currently I have it built-in.

not really.  i always build stuff like NIC drivers and sound directly in
the kernel, and if i ever change hardware, i'll just recompile the
kernel.  sometimes the module dependencies aren't too good at all.

my rule of thumb is that if there's functionality that i'd rather die
than be without, i'll compile it into the kernel.  things like a.out
support or minix filesystem support go into modules.  deps on those
things don't change much anyhow.

> > kernel crashes are really hard to diagnose unless you're willing to
> > learn a lot of information to send a detailed bug report to lkml, but the
> > bright side is that if the crash is reproducible, someone almost always
> > helps you out.
> >
> > jooc, are you running 2.4.20-pre-something or the officially released
> > 2.4.20?
> 
> I'm using the official 2.4.20.

one more question -- what distro are you using?  as i understand it,
distros like RH, mandrake and suse use the alan cox kernels (the AC
kernels).  i believe debian uses the linus kernels.

AC kernels usually have features that are being tested that linus
doesn't want changed in the stable kernel.  except the entire virtual
memory system.  that's worthy of being changed in the middle of a stable
release.  ;-)


pete



> > oh, just one more thing, and this might be helpful.  try to make your
> > system crash while using a virtual console.  the kernel isn't fond of
> > printing junk into an xterm.  if you can see a message about an "oops",
> > and have a healthy System.map you might know a bit more about your
> > problem.
> >
> > hope something here was useful.
> >
> > good luck and keep us advised with what happens.
> >
> > pete
> >
> >
> > begin Charles McLaughlin <cmclaughlin@ucdavis.edu>
> > > Hi,
> > >
> > > I'm running debian unstable on my laptop.  I've recompiled my kernel
> > > (2.4.20) several times in order to get some of my hardware to work.
> > >
> > > The system has frozen completely several times.  When this happens, I
> > > can't move the mouse, type, etc.  It seems to happen when I'm doing
> > > network related stuff, like using Mozilla or gftp.
> > >
> > > How should I start to troubleshoot this?
> > >
> > > Thanks!
> > > Charles

-- 
GPG Instructions: http://www.dirac.org/linux/gpg
GPG Fingerprint: B9F1 6CF3 47C4 7CD8 D33E 70A9 A3B9 1945 67EA 951D