#1 2019-09-14 02:31:50

aamm19
Member
Registered: 2019-05-02
Posts: 5

[SOLVED] kernel: mce: [Hardware Error]

Hello everyone

Lately I have been having a couple of stability issues on my install. Once or twice my PC has rebooted by itself while I was using it, and the last time I noticed the message in the subject of this post during the boot sequence.

I read in the Wiki about installing the rasdaemon and ras-mc-ctl services, but even with them enabled I still get the same output (maybe they're not configured properly?):

[todovan@strata ~]$ journalctl -xb | grep mce
Sep 13 21:18:29 strata kernel: mce: [Hardware Error]: Machine check events logged
Sep 13 21:18:29 strata kernel: mce: [Hardware Error]: CPU 6: Machine Check: 0 Bank 5: bea0000000000108
Sep 13 21:18:29 strata kernel: mce: [Hardware Error]: TSC 0 ADDR 1ffff8f8c294c MISC d012000101000000 SYND 4d000000 IPID 500b000000000 
Sep 13 21:18:29 strata kernel: mce: [Hardware Error]: PROCESSOR 2:800f11 TIME 1568427500 SOCKET 0 APIC 8 microcode 800110e
Sep 13 21:18:29 strata kernel: mce: Using 23 MCE banks
Sep 13 21:18:32 strata rasdaemon[616]: rasdaemon: mce:mce_record event enabled
Sep 13 21:18:32 strata rasdaemon[615]: rasdaemon: Can't register mce handler
Sep 13 21:18:32 strata rasdaemon[616]: mce:mce_record event enabled
Sep 13 21:18:32 strata rasdaemon[615]: Can't register mce handler
Sep 13 21:18:32 strata rasdaemon[615]: rasdaemon: Recording mce_record events
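
For anyone reading the log above: the value after "Bank 5:" is the raw machine-check status register. Below is a minimal Python sketch that decodes its top flag bits, assuming the common x86 MCi_STATUS layout for bits 57 to 63. The function name and the FLAGS table are illustrative only and not part of rasdaemon or any other tool, and the AMD-specific fields (SYND, IPID, MISC) are not interpreted.

```python
# Decode the architectural flag bits of a machine-check status value.
# Assumption: bits 57-63 follow the common x86 MCi_STATUS layout.
FLAGS = {
    63: "VAL",    # register contains a valid error
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # uncorrected error
    60: "EN",     # error reporting was enabled
    59: "MISCV",  # MISC register holds valid data
    58: "ADDRV",  # ADDR register holds valid data
    57: "PCC",    # processor context may be corrupt
}


def decode_mce_status(status_hex):
    """Return a dict of flag name -> bool, plus the low 16-bit error code."""
    status = int(status_hex, 16)
    decoded = {name: bool((status >> bit) & 1) for bit, name in FLAGS.items()}
    decoded["error_code"] = status & 0xFFFF
    return decoded


if __name__ == "__main__":
    # Status value copied from the journal output above.
    for name, value in decode_mce_status("bea0000000000108").items():
        print(name, value)
```

For the value in the log, this reports UC and PCC as set, meaning an uncorrected error. That would be consistent with the spontaneous reboots, although it does not by itself identify the cause.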

In another thread, someone with a similar problem was asked for their processor model, so preemptively: Ryzen 5 1600.

Any thoughts?

Thanks

Last edited by aamm19 (2019-09-27 05:34:12)

#2 2019-09-14 09:09:58

V1del
Forum Moderator
Registered: 2012-10-16
Posts: 21,668

Re: [SOLVED] kernel: mce: [Hardware Error]

Make sure your UEFI is up to date and that you are applying the AMD microcode updates early.

#3 2019-09-15 00:10:40

aamm19
Member
Registered: 2019-05-02
Posts: 5

Re: [SOLVED] kernel: mce: [Hardware Error]

I updated my motherboard's BIOS, installed amd-ucode, and included it in the bootloader entries like so:

title Arch Linux
linux vmlinuz-linux
initrd /amd-ucode.img
initrd /initramfs-linux.img
options root=PARTUUID=aaf8c3bb-eb99-465a-95a4-5259512638d8 rw
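
For reference, the order of the initrd lines matters in an entry like this one: the microcode image has to be listed before the main initramfs. Here is a minimal Python sketch that checks this for an entry in the format shown above. The function names are hypothetical, and the default image name is taken from the entry itself.

```python
def initrd_images(entry_text):
    """Collect the paths given on 'initrd' lines, in file order."""
    images = []
    for line in entry_text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0] == "initrd":
            images.append(parts[1].strip())
    return images


def ucode_loaded_first(entry_text, ucode_name="amd-ucode.img"):
    """True if the first initrd in the entry is the microcode image."""
    images = initrd_images(entry_text)
    return bool(images) and images[0].endswith(ucode_name)


if __name__ == "__main__":
    entry = """title Arch Linux
linux vmlinuz-linux
initrd /amd-ucode.img
initrd /initramfs-linux.img
options root=PARTUUID=aaf8c3bb-eb99-465a-95a4-5259512638d8 rw
"""
    print(ucode_loaded_first(entry))
```

This only inspects the text of the entry. It does not confirm that the kernel actually applied the update at boot.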

Is that what you meant by "UEFI is up to date"?

I'll keep updating if I run into the issue again.

EDIT:
I found this thread with the same issue.
In that thread, hasdf uses the same processor but different components, so I guess the problem can be isolated to the CPU. Apparently setting processor.max_cstate to a value from 1 to 5 fixes the issue, as do other solutions in this Gentoo Wiki thread. I'll add them as described in the wiki and update with the result.

Dumb me for not finding that thread before opening a new one.

EDIT II:
It seemed to work well all day yesterday, but I had a reboot 9 hours ago (according to uptime; I was AFK). I added rcu_nocbs=0-11 to the kernel boot parameters to see if it helps, but according to this comment on kernel.org's Bugzilla, you need to build a custom kernel for it to work. I don't know if that's necessary.
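
To double-check that parameters like these actually reached the kernel, the command line can be parsed and the CPU range expanded. This is a minimal Python sketch with hypothetical helper names. The sample string is an assumption that combines the options line from my boot entry with the two parameters discussed here, and the value 5 for processor.max_cstate is just one of the suggested values.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def expand_cpu_list(spec):
    """Expand a CPU list such as '0-11' or '0,2-3' into a list of ints."""
    cpus = []
    for part in spec.split(","):
        start, sep, end = part.partition("-")
        cpus.extend(range(int(start), int(end if sep else start) + 1))
    return cpus


if __name__ == "__main__":
    # Assumed sample; on a running system the real string is in /proc/cmdline.
    sample = ("root=PARTUUID=aaf8c3bb-eb99-465a-95a4-5259512638d8 rw "
              "processor.max_cstate=5 rcu_nocbs=0-11")
    params = parse_cmdline(sample)
    print(params.get("processor.max_cstate"))
    print(expand_cpu_list(params["rcu_nocbs"]))
```

The range 0-11 expands to twelve CPUs, which matches the 12 threads of the Ryzen 5 1600. Seeing the parameter on the command line does not show that the running kernel honors it, which is the point raised in the Bugzilla comment.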

I found this reddit post with a script that helps with disabling C6. I'm going to try that one and see if it works.

EDIT III:
Well, it seems the script from reddit did the trick by disabling the C6 state. I also kept the parameters on the kernel command line.

Marking as [Solved]

Last edited by aamm19 (2019-09-27 05:33:07)
