
#1 2011-01-21 13:30:12

plexor
Member
Registered: 2011-01-20
Posts: 4

Strange ECC errors

I recently fell in love with Arch Linux and reinstalled my Ubuntu machines with it. When I stress my main workstation, the following errors show up:

MC0_STATUS: Corrected error, other errors lost: yes, CPU context corrupt: no, CECC Error
Data Cache Error during L1 linefill from L2.
Transaction: data read, Type: data, Cache Level: L2
MC1_STATUS: Corrected error, other errors lost: no, CPU context corrupt: no
Instruction Cache Error: Parity error during data load.
Transaction: inst fetch, Type: instruction, Cache Level: L1
MC2_STATUS: Corrected error, other errors lost: no, CPU context corrupt: no, CECC Error

I tried to enable ECC correction in my BIOS but cannot find that setting there.

My motherboard is a Gigabyte GA-MA790FXT-UDP5 and the processor is an AMD Phenom II X4 945.

Any suggestions on what's wrong?


#2 2011-01-21 20:11:01

Octoploid
Member
From: Berlin, Germany
Registered: 2009-10-13
Posts: 64

Re: Strange ECC errors

Your processor corrected several ECC/Parity Errors, because your
L1$ seems to be flaky when the CPU gets hot.
Have you checked your CPU cooling?
If the cooling is OK and these correctable errors are easily repeatable,
then you should contact AMD and RMA the processor.
(I RMA'ed a similar CPU (X4 955) a year ago without any problems,
because of L2$ errors that happened about once a day.)

Last edited by Octoploid (2011-01-21 20:12:02)


#3 2011-01-22 12:50:08

plexor
Member
Registered: 2011-01-20
Posts: 4

Re: Strange ECC errors

Thanks for the quick answer, Octoploid.

I checked the temperature of the processor with lm_sensors and it stays steady at around 40 degrees Celsius.

These errors started to appear after I installed cpufreq and enabled 'ondemand'. I'm going to disable cpufreq to see if it helps.
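
Before and after disabling cpufreq, it can help to confirm which governor each core is actually using. The following is a minimal Python sketch, assuming the kernel exposes the usual sysfs file `cpuN/cpufreq/scaling_governor` under `/sys/devices/system/cpu`; the function name `read_governors` is made up for illustration.

```python
from pathlib import Path


def read_governors(cpu_root="/sys/devices/system/cpu"):
    """Return {cpu_name: governor} for every CPU that exposes cpufreq.

    cpu_root is a parameter only so the function can be pointed at a
    test directory; on a real system the default should be correct.
    """
    governors = {}
    pattern = "cpu[0-9]*/cpufreq/scaling_governor"
    for gov_file in sorted(Path(cpu_root).glob(pattern)):
        cpu_name = gov_file.parent.parent.name  # e.g. "cpu0"
        governors[cpu_name] = gov_file.read_text().strip()
    return governors


if __name__ == "__main__":
    for cpu, governor in read_governors().items():
        print(f"{cpu}: {governor}")
```

If the result is empty, cpufreq is probably not loaded at all, which is the state to compare against when checking whether the errors go away.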


#4 2011-02-02 21:50:08

pataphysician
Member
Registered: 2010-09-04
Posts: 46

Re: Strange ECC errors

I am having the same errors, but only since going from the 2.6.34 kernel to the 2.6.37 kernel.

I noticed this thread on the sidux forums:
http://aptosid.com/index.php?name=PNphp … opic&t=381

There, the same thing happened to a user when upgrading to 2.6.36.

I have reverted to 2.6.34 for a couple of days, with no more errors, as opposed to fairly constant errors under 2.6.37.

This is on my AMD machine. My Intel notebook runs 2.6.37 fine, so I think it only happens on specific hardware.

So try out 2.6.34 or 2.6.35 and see if you still get these errors before deciding it's bad hardware.


#5 2011-02-02 22:12:45

plexor
Member
Registered: 2011-01-20
Posts: 4

Re: Strange ECC errors

Hello pataphysician, and thank you for your answer. In my case I think my processor is faulty in some way. My machine has two operating systems installed (Arch Linux and Windows 7), and the Windows 7 installation indicates a problem as well: the same error that dmesg shows in my Arch Linux installation. I bought an AMD Phenom X6 for another machine and I'm thinking about buying one for this one too.

/Plexor


#6 2011-02-02 22:25:34

pataphysician
Member
Registered: 2010-09-04
Posts: 46

Re: Strange ECC errors

Thanks for the response, plexor.

I saw your post when looking for info on this error, so I thought I would respond, since mine seems not to be related to bad hardware. In my case I get the message fairly constantly even at idle, whereas you seemed to get it only at high load, which probably means yours is more likely to be bad hardware.

Though I would like an excuse to replace mine with an X6 ;)


#7 2011-02-02 22:42:11

plexor
Member
Registered: 2011-01-20
Posts: 4

Re: Strange ECC errors

You're right about that one. The errors show up regardless of whether the load is heavy or not. So far I haven't noticed any performance degradation.


#8 2011-02-03 18:31:46

pataphysician
Member
Registered: 2010-09-04
Posts: 46

Re: Strange ECC errors

OK, I'm really laughing now, as mine is a hardware problem after all. I just got a hardware error on 2.6.34.

It seems I inadvertently unlocked a second core on my Sempron processor, which I didn't even know it had, though it makes sense, as it's probably cheaper for them to use dual cores. My motherboard has some kind of core unlocker function that is activated by hitting Alt-F2 when it is initially booting up. I have four machines hooked up directly to my monitor, which has loads of inputs, and I use a USB port sharing switch for the keyboard and mouse; it works better and is way cheaper than a DVI KVM. Unfortunately I often switch video, then get distracted and forget to switch the keyboard, and I usually switch while I'm rebooting. My gnome-do hotkey is set to Alt-F2, because I was so used to the GNOME Run Application hotkey. So it seems hilarity ensued, and this core must have been unlocked around the same time I switched to 2.6.37. I noticed the second core when I ran a burnK7 test and top only showed 50%.

So after putting it back to one core, all is fine.

As far as performance degradation goes, as long as the error says
"Corrected error, other errors lost: no, CPU context corrupt: no"
it shouldn't affect anything, though it might just be a matter of time before the CPU is unable to correct the error.
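
To check a whole log against that rule of thumb, here is a minimal Python sketch. It assumes the status lines look exactly like the ones quoted in the first post, and that an uncorrected error is reported with the word "Uncorrected" (an assumption, since no such line appears in this thread). The names `parse_status` and `is_benign` are made up for illustration.

```python
import re

# Matches the status lines quoted in this thread, for example:
# "MC1_STATUS: Corrected error, other errors lost: no, CPU context corrupt: no"
# The "Uncorrected" alternative is an assumption about the wording.
STATUS_RE = re.compile(
    r"MC(?P<bank>\d+)_STATUS: (?P<kind>Corrected|Uncorrected) error, "
    r"other errors lost: (?P<lost>yes|no), "
    r"CPU context corrupt: (?P<corrupt>yes|no)"
)


def parse_status(line):
    """Return the flags of one MCx_STATUS line, or None if it is not one."""
    match = STATUS_RE.search(line)
    if match is None:
        return None
    return {
        "bank": int(match.group("bank")),
        "corrected": match.group("kind") == "Corrected",
        "errors_lost": match.group("lost") == "yes",
        "context_corrupt": match.group("corrupt") == "yes",
    }


def is_benign(status):
    """True only for: corrected, no other errors lost, context not corrupt."""
    return (
        status["corrected"]
        and not status["errors_lost"]
        and not status["context_corrupt"]
    )


if __name__ == "__main__":
    import sys

    # Usage: dmesg | python this_script.py
    for raw_line in sys.stdin:
        status = parse_status(raw_line)
        if status is not None and not is_benign(status):
            print(raw_line.rstrip())
```

Piping dmesg output into it prints only the status lines that fall outside the harmless case, such as the MC0 line in the first post, where "other errors lost" is "yes".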


#9 2011-12-03 14:34:25

Behemot
Member
Registered: 2010-12-10
Posts: 96

Re: Strange ECC errors

How can I disable these notifications? I am not interested in them; the system is 100% stable.
