System flooded by error messages on kernel 3.15

AG Caesar · 2014-06-21 16:08:24

I have an AMD Phenom(tm) II X4 20, which is basically an AMD Phenom II X2 550 with two cores unlocked to become an AMD Phenom II X4 955. Everything worked perfectly on all 4 cores.
The only problem was these error messages every 5 minutes. But as I noticed no other problems, I ignored them:

Jun 21 15:31:23 localhost kernel: [Hardware Error]: MC2 Error: : EV error during data copyback.
Jun 21 15:31:23 localhost kernel: [Hardware Error]: Error Status: Corrected error, no action required.
Jun 21 15:31:23 localhost kernel: [Hardware Error]: CPU:0 (10:4:2) MC2_STATUS[Over|CE|-|-|AddrV]: 0xd40000000000017a
Jun 21 15:31:23 localhost kernel: [Hardware Error]: MC2_ADDR: 0x00000000011c2d80
Jun 21 15:31:23 localhost kernel: [Hardware Error]: cache level: L2, tx: GEN, mem-tx: EV
Jun 21 15:36:23 localhost kernel: [Hardware Error]: MC2 Error: : GEN parity/ECC error during data access from L2.
Jun 21 15:36:23 localhost kernel: [Hardware Error]: Error Status: Corrected error, no action required.
Jun 21 15:36:23 localhost kernel: [Hardware Error]: CPU:0 (10:4:2) MC2_STATUS[Over|CE|-|-|AddrV|CECC]: 0xd40041000000010a
Jun 21 15:36:23 localhost kernel: [Hardware Error]: MC2_ADDR: 0x00000002fe700a40
Jun 21 15:36:23 localhost kernel: [Hardware Error]: cache level: L2, tx: GEN, mem-tx: GEN
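
For readers trying to interpret the MC2_STATUS values above, here is a minimal Python sketch that decodes the flags the kernel prints inside the brackets. The bit positions are an assumption based on the usual x86 machine-check status layout (Over = bit 62, UC = bit 61, MiscV = bit 59, AddrV = bit 58, PCC = bit 57) plus the AMD CECC/UECC bits (46 and 45). `decode_mc_status` is a hypothetical helper name, not a kernel or library function.

```python
# Assumed bit positions in an MCi_STATUS value (x86 MCA layout plus AMD ECC bits).
FLAGS = {
    "Val": 63,    # register contains a valid error
    "Over": 62,   # an earlier error was overwritten
    "UC": 61,     # uncorrected error (0 means corrected, "CE" in the log)
    "En": 60,     # error reporting enabled
    "MiscV": 59,  # MCi_MISC is valid
    "AddrV": 58,  # MCi_ADDR is valid
    "PCC": 57,    # processor context corrupt
    "CECC": 46,   # corrected by ECC (AMD)
    "UECC": 45,   # uncorrectable ECC error (AMD)
}


def decode_mc_status(status: int) -> dict:
    """Return {flag name: bool} for a raw 64-bit status value."""
    return {name: bool((status >> bit) & 1) for name, bit in FLAGS.items()}


if __name__ == "__main__":
    # The two status values from the log lines above.
    for raw in (0xD40000000000017A, 0xD40041000000010A):
        flags = decode_mc_status(raw)
        kind = "uncorrected" if flags["UC"] else "corrected"
        names = ", ".join(n for n, v in flags.items() if v)
        print(f"{raw:#018x}: {kind}, flags set: {names}")
```

Under these assumptions both values decode as corrected errors with the overflow flag set, which agrees with the "Over|CE" text in the log.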

With the update to kernel 3.15 these messages began to occur much more often. I got 80000 lines of error logs in 5 minutes, and journalctl said "couldn't log message, too many messages in too short a time" (or something like that).
The computer kept working, but things like su or sudo did not work any more. I guess the kernel got flooded with error messages.
The question is: how can I fix that? Is there a way to stop the reporting of those errors? Any other good ideas?

x33a · 2014-06-21 16:30:54

It really seems to be a hardware error: https://bugzilla.kernel.org/show_bug.cgi?id=43205

Last edited by x33a (2014-06-21 16:31:45)

AG Caesar · 2014-06-21 16:53:24

Yes, it probably is. But I want to ignore it because my CPU works just fine. The problem began when the error started being printed every second or even more often in kernel 3.15, instead of every 300 seconds as in every version before that. I want to decrease the logging frequency or just disable it. I know it's not the best solution, but I don't want to buy a new CPU.

x33a · 2014-06-21 17:07:05

Playing around with loglevel might help you suppress these messages. But that might have the side-effect of suppressing other useful messages as well.

Also, here's another guy (unless it's you) with the same CPU as yours and the same problem. So it might not be a hardware error, maybe just some compatibility issue or regression in the kernel.

AG Caesar · 2014-06-21 19:34:41

No, it's not me. I found that when I first looked into those messages, but as no one had a solution I kept ignoring it, which worked well until 3 days ago.
He also mentioned that it appears every 5 minutes, which would be totally fine. The new 3.15 kernel must have some change in the logging. I used the Arch Rollback Machine, which worked great, but that is not a permanent solution. Blocking the kernel from being updated also seems to be a bad idea.

Are there any instructions on how to change loglevels? I found nothing helpful... I already have "quiet" in the kernel parameters, but that's not it...

x33a · 2014-06-22 08:46:23

Give loglevel=0 a try. Of course, remove quiet before using this parameter.

Also, you can find the different loglevel values on the following page: https://www.kernel.org/doc/Documentatio … meters.txt

PS: By the way, since when did you start noticing these errors?

Last edited by x33a (2014-06-22 08:48:45)

AG Caesar · 2014-06-22 11:10:40

Thanks, I will try that later. I have had those errors since I installed Arch 1.5 years ago. There never was any particular problem.

AG Caesar · 2014-06-23 13:03:58

It did not help. I set GRUB_CMDLINE_LINUX_DEFAULT="loglevel=0" in /etc/default/grub (is that correct?), but the errors still appeared and made the system unusable. I noticed that shutdown does not work either. Is this an error I should report to the kernel developers, or will they just be annoyed?
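
A note for anyone repeating this step: a change in /etc/default/grub only takes effect after grub.cfg has been regenerated (on Arch, grub-mkconfig -o /boot/grub/grub.cfg) and the machine rebooted. Also, loglevel only controls what is printed to the console; the messages still reach the kernel ring buffer and the journal. The following is a minimal Python sketch to check which parameters the running kernel actually received, assuming the standard /proc/cmdline file. `parse_cmdline` is a hypothetical helper, and it does not handle quoted values containing spaces.

```python
from pathlib import Path


def parse_cmdline(cmdline: str) -> dict:
    """Split a kernel command line into {name: value}; bare flags map to None.

    Simplification: tokens are split on whitespace, so quoted values
    that contain spaces are not handled.
    """
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


if __name__ == "__main__":
    params = parse_cmdline(Path("/proc/cmdline").read_text())
    print("loglevel:", params.get("loglevel", "not set"))
    print("quiet present:", "quiet" in params)
```

If loglevel shows up as "not set" after a reboot, the edited value never made it into the generated boot configuration.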

x33a · 2014-06-23 14:25:55

You should definitely report this error upstream.

AG Caesar · 2014-07-30 18:22:44

If someone finds this with the same error: I reported it at the kernel bugtracker, but they were... not helpful:
https://bugzilla.kernel.org/show_bug.cgi?id=78781
I feel they ignored the real problem by saying "buy a new CPU".

x33a · 2014-07-30 18:48:39

Maybe give the linux-lts package a try. If that doesn't work, you might have to try a more conservative distro which runs older kernels.

kokoko3k · 2014-07-30 18:53:45

Probably the timing of the messages has changed because some unrelated kernel code has changed and now triggers "your" cache bug more often (?), just guessing.
But in that case you cannot ask the kernel devs to work around it, and there is a possibility that the flood will go away in a later version.

Anyway, if no dev helps you with a fast reply, you will need to bisect the kernel to at least point the kernel developers to the problematic commit.

https://www.kernel.org/pub/software/scm … k2009.html
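
The idea behind bisecting can be sketched in a few lines of Python: given an ordered list of commits where the first is known good and the last is known bad, test the midpoint and halve the range each time. This only illustrates the search, not how git bisect is implemented. The `is_bad` callback stands in for "build this kernel, boot it, and see whether the flood appears", and the commit names are made up.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first bad commit.

    Assumes commits[0] is good, commits[-1] is bad, and that every
    commit after the first bad one is bad as well.
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # the culprit is at mid or earlier
        else:
            lo = mid  # the culprit is after mid
    return commits[hi]


if __name__ == "__main__":
    # Hypothetical history: 16 commits, the flood starts at commit-11.
    history = [f"commit-{i}" for i in range(16)]
    culprit = first_bad(history, lambda c: int(c.split("-")[1]) >= 11)
    print("first bad commit:", culprit)
```

With git the same search is driven by git bisect start, git bisect good and git bisect bad, so no programming is needed, only a kernel build and a reboot per step. The number of steps grows with the logarithm of the number of commits between the good and bad versions.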

mich41 · 2014-07-30 19:32:25

^This.

You may also have some luck improving cooling, increasing Vcore (risk of hardware damage, blah blah blah) or reducing clock speed.

Last edited by mich41 (2014-07-30 19:33:40)

AG Caesar · 2014-07-30 19:46:13

@x33a I will try the LTS packages when it becomes necessary, thanks. I hope more people will see the bug once 3.15 arrives in Ubuntu or something.

@ kokoko3k
You are probably right but my programming skills are close to zero. I will have a look at your link though

Nothgirc · 2015-05-06 20:12:09

Did you ever find a solution to this other than using a 3.14.x kernel? I’m having the same problem with an AMD FX-6300 CPU.
My error message is a little bit different, but the problem is the same:

May 06 21:49:22 archpc kernel: [Hardware Error]: MC4 Error (node 0): L3 data cache ECC error.
May 06 21:49:22 archpc kernel: [Hardware Error]: Error Status: Corrected error, no action required.
May 06 21:49:23 archpc kernel: [Hardware Error]: CPU:0 (15:2:0) MC4_STATUS[Over|CE|MiscV|-|AddrV|-|-|CECC]: 0xdd0144e5001c011b
May 06 21:49:23 archpc kernel: [Hardware Error]: MC4_ADDR: 0x00000000a6bf2b84
May 06 21:49:23 archpc kernel: [Hardware Error]: cache level: L3/GEN, tx: GEN, mem-tx: RD

This appears every five minutes on a 3.14.x kernel, but way more on a newer kernel so that the computer is not usable any more … my KDE session freezes and the virtual console gets flooded with the above error message.

Can these error messages be suppressed?

AG Caesar · 2015-05-07 05:40:54

Sorry, I never got a solution. I installed some Ubuntu LTS with kernel 3.14, gave it to my parents and bought an Intel. Not a good solution... Just out of interest, which CPU do you have?

Nothgirc · 2015-05-07 12:48:43

Nothgirc wrote:

I’m having the same problem with an AMD FX-6300 CPU.

I’m fine for now with linux-lts since I discovered that there are VirtualBox kernel modules for the LTS kernel as well.

AG Caesar · 2015-05-07 13:25:00

Oh sorry, it was very early. I still think this needs to be fixed in the kernel... Maybe try opening a bug report there?

asdf1234 · 2016-09-17 21:22:47

i had the same problem in debian (8.5) with a phenom II x2 555, black edition, 3.2Ghz.... which can be unlocked into a "phenom x4 b55".
the errors would start appearing in any open terminal after a couple of minutes, and after a few more minutes they'd turn into a flood of error messages.

i found out after a couple of tests that it was caused by core 3. when i used it as a triple core with cores 1, 2 and 4 it appeared stable. but it scared me out of using more than the default 2 cores anyway, i'm not interested in an unstable system. i overclocked the cores one at a time up to 4ghz (black edition, unlocked multiplier) and none of them would fail ~15mins of mprime stress test (without changing any voltages)... at 4.1ghz the entire system would crash. debian terminal was the only sign of any instability. i intended to test it further, increasing vcore and northbridge voltage, but lost interest in core unlocking after the error flood (normally i underclock).
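
For anyone trying to narrow such a flood down to one core as described above, here is a rough Python sketch that tallies the status lines per reporting CPU and bank from saved kernel log text. It assumes the line format shown earlier in this thread (CPU:<n> (...) MC<bank>_STATUS[...]). `tally_mce` is a hypothetical helper. The reporting CPU may not be the faulty core, for instance with a shared cache, so the result is a hint rather than proof.

```python
import re
import sys
from collections import Counter

# Matches e.g. "[Hardware Error]: CPU:0 (10:4:2) MC2_STATUS[...]"
STATUS_RE = re.compile(
    r"\[Hardware Error\]: CPU:(\d+) \([0-9a-f:]+\) MC(\d+)_STATUS"
)


def tally_mce(lines) -> Counter:
    """Count status lines per (cpu, bank) pair."""
    counts = Counter()
    for line in lines:
        match = STATUS_RE.search(line)
        if match:
            counts[(int(match.group(1)), int(match.group(2)))] += 1
    return counts


if __name__ == "__main__":
    # Reads log text from standard input.
    for (cpu, bank), n in tally_mce(sys.stdin).most_common():
        print(f"CPU {cpu}, bank MC{bank}: {n} status lines")
```

One way to feed it is to pipe the kernel messages from journalctl -k into the script; the script file name is up to you.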

R00KIE · 2016-09-17 22:19:28

No necro-bumping please. Closing.
