[vox-tech] ECC memory --- is it worth it? (semi-OT)

Bill Broadley bill at cse.ucdavis.edu
Tue Apr 10 17:01:11 PDT 2007


hajhouse wrote:
> Linux wotan 2.6.17-10-generic #2 SMP Tue Dec 5 22:28:26 UTC 2006 i686 GNU/Linux
> 
> Try 'modprobe ecc'.

My research found:
* Bluesmoke is now EDAC
* ecc.ko is part of the EDAC project
* EDAC has been somewhat Intel-centric in the past
* Mainline kernels include EDAC and support Intel chipsets
* EDAC in 2.6.17-10-generic does not support Opteron
* The devel tree on SourceForge has Opteron support
* mcelog is the more AMD-centric way to do it
* mcelog seems reasonably popular (Red Hat and Ubuntu, anyway)
* mcelog seems to support numerous events, not just DIMM-related ECC errors

So getting the ecc module to build would require a new kernel (2.6.18
or newer) plus custom patches from SourceForge, while mcelog just
requires a small binary to read /dev/mcelog.  I ran it on 180 machines
or so and found one very unhappy node:

CPU 0 1 instruction cache TSC e6a7a079a8a84
ADDR 117b00
  Instruction cache ECC error
       bit46 = corrected ecc error
       bit62 = error overflow (multiple errors)
  bus error 'local node origin, request didn't time out
      instruction fetch mem transaction
      memory access, level generic'
STATUS d400400000000853 MCGSTATUS 0
MCE 5
CPU 0 2 bus unit TSC e6a7a079a8ccd
ADDR c500
  L2 cache ECC error
  Bus or cache array error
       bit46 = corrected ecc error
       bit62 = error overflow (multiple errors)
  bus error 'local node origin, request didn't time out
      generic read mem transaction
      memory access, level generic'
STATUS d400400000000813 MCGSTATUS 0
MCE 6
CPU 0 4 northbridge TSC e6a7a079a906a
ADDR 3ce5e0
  Northbridge ECC error
  ECC syndrome = 64
       bit32 = err cpu0
       bit46 = corrected ecc error
       bit62 = error overflow (multiple errors)
  bus error 'local node origin, request didn't time out
      generic read mem transaction
      memory access, level generic'
STATUS d432400100000813 MCGSTATUS 0
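
For anyone who wants to check the STATUS words by hand, here is a
minimal sketch in Python that tests only the bits mcelog itself labels
in the output above (bit46, bit62, and bit32 for the northbridge bank).
The function name decode_status and the northbridge flag are
hypothetical, made up for this illustration; this is not how mcelog
decodes the register, and it ignores every other field in the word.

```python
# Bit labels copied from the mcelog output above. Nothing else in the
# STATUS word is decoded here.
COMMON_BITS = {
    46: "corrected ecc error",
    62: "error overflow (multiple errors)",
}

# mcelog labels bit32 only in the northbridge record, so it is applied
# only when the caller says the STATUS came from that bank (assumption).
NORTHBRIDGE_BITS = {
    32: "err cpu0",
}


def decode_status(status_hex, northbridge=False):
    """Return the labelled bits that are set in a hex STATUS value."""
    value = int(status_hex, 16)
    bits = dict(COMMON_BITS)
    if northbridge:
        bits.update(NORTHBRIDGE_BITS)
    return [
        f"bit{bit} = {label}"
        for bit, label in sorted(bits.items())
        if (value >> bit) & 1
    ]


if __name__ == "__main__":
    # The three STATUS values reported for the unhappy node.
    samples = [
        ("d400400000000853", False),  # instruction cache
        ("d400400000000813", False),  # bus unit
        ("d432400100000813", True),   # northbridge
    ]
    for status, is_nb in samples:
        print(status)
        for line in decode_status(status, northbridge=is_nb):
            print("      ", line)
```

Running it against the three STATUS values should reproduce the bitNN
lines mcelog printed for each record, which is a quick sanity check
that the hex values and the decoded text agree.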

