[vox-tech] ECC memory --- is it worth it? (semi-OT)

hajhouse hajhouse at houseag.com
Wed Apr 11 07:55:27 PDT 2007


On 2007-04-11, Rick Moen wrote:
> Quoting Bill Broadley (bill at cse.ucdavis.edu):
> > Rick Moen wrote:
> > > A bad bit in memory, if indicative of a physical defect, will quickly
> > > manifest unmistakeably on Linux in the manner I described.  If not thus
> > > indicative, (from empirical observation over a long period of time:)
> > > it's extremely unlikely to have detectable long-term consequences.
> >
> > You speculate that it contributes to premature httpd deaths but is
> > undetectable long term?
>
> I didn't think what I was saying was that difficult to follow, but here
> is what I said, again:  "I'd speculate that some non-zero percentage of
> prematurely deceased httpd instances owed to that...."  I figure that
> possibly (i.e., speculate that) some quite small number of such events
> ultimately owe to uncorrected single-bit memory errors that are not
> associated with actually bad RAM -- but, effectively, it's way down in
> the noise of undiagnosable oddities.
>
> > $10 a dimm requires you to "pay through the node" and the "wealth of
> > midas"?
>
> If you assumed I was endorsing your figure, you assumed wrong.  ;->
>
> Ironically, the most recent RAM I purchased _was_ ECC, because it was
> a gig for the Intel L440GX+ "Lancewood" motherboard in my old VA Linux
> Systems model 2230 server.  However, let's talk about the HP ProLiant
> 380 I was working on recently:  128 MB ECC Registered is $42 at SA
> Technologies, Inc. (where I would buy such things by preference).
> Without ECC, $32.  ECC thus exacts a 31% premium in that case.
>
> Now, would I pay that premium?  I might, or I might put the money
> somewhere else, where it's more likely to yield significant benefit.
> (In 2007, I'd actually try not to drop cash on a 800MHz PIII that I
> didn't dearly love, but five years ago might have been different.)

Here's my perspective on that. Assuming that one of those uncorrected
single-bit errors turned out to be in the worst possible place (say, a
pointer in the kernel or in postgresql in a journaling memory structure)
that turned out to cause data corruption that caused a day of work to be
lost (i.e., the last good backup was 24 hours old), then:

- assuming a man-hour is worth $50 (that's probably low)
- assuming that the machine is used by four people (other people's
  servers have more users),

then the problem would cost $1600 to recover from, plus whatever
additional time was required to take the system down to restore the
backup, fsck the filesystem, etc.
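
To make the arithmetic explicit, here is a rough sketch of the estimate above set against the $10 price difference from Rick's ProLiant example ($42 with ECC, $32 without). The 8-hour working day is my inference from the $1600 total, and the break-even step is only an illustration: it assumes a single incident of this size and says nothing about how likely such an incident really is.

```python
# Figures come from the thread; the break-even step is an added illustration.
HOURLY_RATE = 50      # dollars per man-hour (stated as probably low)
USERS = 4             # people using the machine
HOURS_LOST = 8        # assumption: one lost day of work = 8 hours per person

ECC_PRICE = 42        # 128 MB ECC Registered, per Rick's example
NON_ECC_PRICE = 32    # same module without ECC


def recovery_cost(rate, users, hours):
    """Labor cost of redoing the lost work, ignoring restore and fsck time."""
    return rate * users * hours


def break_even_probability(premium, loss):
    """Chance of one such incident at which the premium equals the expected loss."""
    return premium / loss


loss = recovery_cost(HOURLY_RATE, USERS, HOURS_LOST)
premium = ECC_PRICE - NON_ECC_PRICE
p = break_even_probability(premium, loss)

print(f"recovery cost: ${loss}")
print(f"ECC premium per module: ${premium}")
print(f"break-even incident probability: {p:.3%}")
```

Under these assumptions, one module's premium pays for itself if the chance of such an incident over the module's service life exceeds roughly 0.6%. Whether the real chance is above or below that is exactly what is in dispute in this thread.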

That notwithstanding, I agree with Rick about disk failures being an
order of magnitude more likely. I've experienced the pain of a failing
disk more times than I care to remember.

--
Henry House
+1 530 753 3361 ext. 13
Please don't send me HTML mail! My mail system frequently rejects it.
The unintelligible text that may follow is a digital signature.
See <http://hajhouse.org/pgp> to find out how to use it.
My OpenPGP key: <http://hajhouse.org/hajhouse.asc>.


