[vox-tech] gzip bug?

Thu, 14 Mar 2002 15:50:19 -0800

Hi Jan,

You did great research on the gzip question. I also want to blame the
hardware, if only because it's a two-month-old custom-built box. On the
other hand, nothing else causes problems, just gzip. If it were a random
problem, I would expect some 50MB files to succeed without error.
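
One way to tell a random fault from a deterministic one is to run the
same compression job several times and compare the results. The sketch
below is only an illustration, not something from this thread: it uses
Python's zlib module (the same DEFLATE algorithm as gzip, but not the
gzip binary itself), and the function name roundtrip_check and its
parameters are made up for the example. On healthy hardware every run
should round-trip cleanly and give the same compressed digest.

```python
import hashlib
import os
import zlib


def roundtrip_check(size_bytes=50 * 1024 * 1024, runs=5, data=None):
    """Compress and decompress one buffer several times.

    Returns a list of (run, ok, compressed_digest) tuples. 'ok' is True
    when the decompressed bytes match the original input.
    """
    # Hypothetical test input: random bytes unless the caller supplies data.
    if data is None:
        data = os.urandom(size_bytes)
    expected = hashlib.sha256(data).hexdigest()

    results = []
    for run in range(1, runs + 1):
        compressed = zlib.compress(data, 6)
        try:
            restored = zlib.decompress(compressed)
            ok = hashlib.sha256(restored).hexdigest() == expected
        except zlib.error:
            # A corrupted stream fails its checksum during decompression.
            ok = False
        results.append((run, ok, hashlib.sha256(compressed).hexdigest()))
    return results


if __name__ == "__main__":
    for run, ok, digest in roundtrip_check():
        status = "ok" if ok else "FAILED"
        print(f"run {run}: {status}  compressed sha256={digest[:16]}")
```

If some runs fail and others pass, or the digests differ between runs on
identical input, that points toward hardware. A failure that repeats
identically every time looks more like software or a bad input file.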

How do you "check memory"? Would you recommend memtest?

Thank you,
Kevin

-----Original Message-----
From: vox-tech-admin@lists.lugod.org
[mailto:vox-tech-admin@lists.lugod.org] On Behalf Of Jan Wynholds
Sent: Thursday, March 14, 2002 1:50 PM
To: vox-tech@lists.lugod.org
Subject: RE: [vox-tech] gzip bug?

Hi Kevin:

Have you checked memory?  Whenever I have problems with something that
_should_ be rock solid (like bzip and gzip), I check memory...  Not to
say that it couldn't be some other random hardware problem, but memory
is what I have seen most commonly.  Have any other pieces of hardware
changed since you have seen this behavior?

With a RedHat 7.1 system I think I have used bzip and gzip to handle
many gigabytes of tape data.  I am doubtful it is your software.

Is there anything else that is giving you problems with this box?  Does
gcc work correctly?  Will a kernel compiled on that box run correctly?
Do any programs halt with Segmentation Fault (sig11)?
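
A sig11 does not always announce itself as "Segmentation fault".  POSIX
shells report a command killed by signal N as exit status 128 + N, so
make prints a crashed compiler as "Error 139" (128 + 11), as in the FAQ
excerpt quoted further down.  Here is a small sketch for decoding such a
status; the helper name describe_exit_status is made up for this
example, and a program can in principle exit with a status above 128 on
its own, so treat the result as a hint.

```python
import signal


def describe_exit_status(status):
    """Interpret a shell-style exit status such as make's 'Error 139'."""
    if status > 128:
        # Shell convention: 128 + signal number for a signal death.
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown signal"
        return f"killed by signal {signum} ({name})"
    return f"exited on its own with status {status}"


if __name__ == "__main__":
    print(describe_exit_status(139))
    print(describe_exit_status(1))
```

On Linux the first call reports signal 11 (SIGSEGV), which is the case
the Sig11 FAQ is about.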

I have had trouble with RedHat boxen that had bad memory.  Have you
upgraded memory lately?  I ask only b/c it seems like bzip and gzip
should not croak on files of that size.  Since you are using RH 7.1,
2 GB file size limits shouldn't be your problem.  Your problem is quite
weird, b/c gzip is tested and retested (to the point of being bullet
proof), so it is very doubtful that it is your software.  My guess is
that it is something with your hardware.  I found a lot of useful
(hardware) testing information in the Sig11 FAQ at:

http://www.bitwizard.nl/sig11/

Here is some text from that page on (very nearly) your problem:

QUESTION
Is it always signal 11?
ANSWER
Nope. Other signals like four, six and seven also occur occasionally.
Signal 11 is most common though.
As long as memory is getting corrupted, anything can happen. I'd
expect bad binaries to occur much more often than they really
do. Anyway, it seems that the odds are heavily biased towards gcc
getting a signal 11. Also seen:

free_one_pmd: bad directory entry 00000008

EXT2-fs warning (device 08:14): ext_2_free_blocks bit already
     cleared for block 127916

Internal error: bad swap device

Trying to free nonexistent swap-page

kfree of non-kmalloced memory ...

scsi0: REQ before WAIT DISCONNECT IID

Unable to handle kernel NULL pointer dereference at virtual
     address c0000004 

put_page: page already exists 00000046

invalid operand: 0000

Whee.. inode changed from under us. Tell Linus

<<This might be akin to your problem:>>

crc error  --  System halted  (During the uncompress of the Linux kernel)

Segmentation fault

"unable to resolve symbol" 

make [1]: *** [sub_dirs] Error 139

make: *** [linuxsubdirs] Error 1

The X Window system can terminate with a "caught signal xx"
The first few ones are cases where the kernel "suspects" a
kernel-programming-error that is actually caused by the bad memory.
The last few point to application programs that end up with the
trouble.
-- S.G.de Marinis (trance@interseg.it)
-- Dirk Nachtmann (nachtman@kogs.informatik.uni-hamburg.de)

<<END>>
HTHO,

jan