#1 2025-03-31 11:03:08

Al.Piotrowicz
Member
Registered: 2017-08-07
Posts: 170

New 5950x CPU hardware error

Hello fellow archers.

After a nightmare of a time with a bad CPU, I successfully RMA'd it and got a new (possibly working) sample. It has worked flawlessly without a single reboot so far (BIOS defaults, C-states on). I have been using it since 01.2025. Recently, while the system was completely idle, I got:

mar 31 02:34:51 testowy kernel: [Hardware Error]: Deferred error, no action required.
mar 31 02:34:51 testowy kernel: [Hardware Error]: CPU:1 (19:21:2) MC22_STATUS[-|-|-|-|-|-|Deferred|-|-]: 0x9090909090909090
mar 31 02:34:51 testowy kernel: [Hardware Error]: IPID: 0x0000000000000000
mar 31 02:34:51 testowy kernel: [Hardware Error]: Bank 22 is reserved.
mar 31 02:34:51 testowy kernel: [Hardware Error]: cache level: RESV, tx: INSN

Should I blame a bad AGESA combined with the newest kernel? There is a BIOS update available, but this is the first time I have had any issue with the new sample, so I'm not sure whether I should update the BIOS. All help is much appreciated.

Offline

#2 2025-03-31 23:45:36

Versa
Member
Registered: 2022-10-14
Posts: 9

Re: New 5950x CPU hardware error

This is likely related to a known issue with some Ryzen 5000 series CPUs: under Linux, the clock targets are higher and the voltages at those clocks are slightly lower than under Windows, which leaves some samples unstable out of the box.
This seems to be more common with the 5900x and 5950x (which I own and have had issues like this with), though I have seen a 5600x exhibit these issues too (owned by someone else, but I solved it for them).

https://wiki.archlinux.org/title/Ryzen#Random_reboots

The link above describes what the issue can result in.

I typically go into the motherboard UEFI config and enable the Curve Optimizer (I believe it's sometimes listed under PBO with advanced configuration; where these settings live depends entirely on your motherboard manufacturer)
and change one or both of the following:
Voltage: Positive offset between 3 and 5
Clockspeed: Negative offset, -50 MHz (or, if used by itself, however much you need until it's stable)

Though you can choose to just cut down clockspeed if you would prefer to avoid the extra heat and power of the positive voltage offset.

It may take a little experimenting to find what your particular sample will remain stable under. It's an annoying issue that seems specific to the 5000 series, but it likely explains your case here.

Last edited by Versa (2025-03-31 23:55:10)

Offline

#3 2025-04-01 01:15:39

qu@rk
Member
Registered: 2021-07-28
Posts: 149

Re: New 5950x CPU hardware error

Versa wrote:

Voltage: Positive offset between 3 and 5

Apparently you need to go 3-5 points higher on your already negative offset. For example, if you have -20 applied, go up to -17 or even -15.


@seth
See, I'm not the only one who understood it like that :D

Last edited by qu@rk (2025-04-01 01:17:06)

Offline

#4 2025-04-01 01:34:44

Al.Piotrowicz
Member
Registered: 2017-08-07
Posts: 170

Re: New 5950x CPU hardware error

It has run smoothly for months; this is the first MCE hardware error on this CPU. I have PBO on auto. Sometimes it peaks lower, sometimes higher on C6, depending on AGESA "entropy".

Last edited by Al.Piotrowicz (2025-04-01 01:35:08)

Offline

#5 2025-04-01 01:38:14

qu@rk
Member
Registered: 2021-07-28
Posts: 149

Re: New 5950x CPU hardware error

Me too, but some recent kernel change undid that smooth running.
Cap your PBO limits at 142 W PPT / 95 A TDC / 140 A EDC, which are the stock values.
Use https://github.com/hattedsquirrel/ryzen_monitor with https://gitlab.com/leogx9r/ryzen_smu (they're in the AUR, I think) to monitor voltages. Try all-core loads and note whether any cores run at a different voltage from the rest; bring them in line with the per-core Curve Optimizer.

Pay attention to the failing cores; they show up in your error message:

[Hardware Error]: CPU:1

Numbering starts from 0, and they're the threads not CPU core I think. so CPU 0/1 is core 0, CPU 2/3 is core 1. I think
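
To follow the advice about watching which CPUs show up without reading the journal by eye, a small script can tally the logical CPU and MCA bank named in each [Hardware Error] line. This is only a sketch: the regular expression is written against the log lines quoted in this thread, and the name `tally_mce` is made up for illustration. It counts logical CPU numbers exactly as the kernel prints them; how those map to physical cores is a separate question.

```python
import re
from collections import Counter

# Matches the "CPU:<n> (...) MC<bank>_STATUS" part of a kernel MCE log line,
# as seen in the lines quoted in this thread (an assumption about the format).
MCE_LINE = re.compile(r"\[Hardware Error\]: CPU:(\d+) .*?MC(\d+)_STATUS")


def tally_mce(lines):
    """Count (logical CPU, MCA bank) pairs found in kernel log lines."""
    counts = Counter()
    for line in lines:
        match = MCE_LINE.search(line)
        if match:
            counts[(int(match.group(1)), int(match.group(2)))] += 1
    return counts


def report(lines):
    """Print one summary line per (CPU, bank) pair, lowest CPU first."""
    for (cpu, bank), n in sorted(tally_mce(lines).items()):
        print(f"CPU:{cpu} bank {bank}: {n} event(s)")
```

Passing it the kernel log split into lines, for example the saved output of `journalctl -k --no-pager`, gives a count per (CPU, bank) pair, which makes it easier to see whether the same logical CPU keeps coming back.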

Last edited by qu@rk (2025-04-01 01:41:24)

Offline

#6 2025-04-01 02:43:43

Al.Piotrowicz
Member
Registered: 2017-08-07
Posts: 170

Re: New 5950x CPU hardware error

@qu@rk Do you suggest updating my motherboard BIOS to the newest version beforehand? I held off so I could run a long-term stability test on the previous version, which I flashed right after building my new rig around 7 months ago. The first 5950x has been successfully RMA'ed.

Offline

#7 2025-04-01 05:38:48

qu@rk
Member
Registered: 2021-07-28
Posts: 149

Re: New 5950x CPU hardware error

Not if it's a beta version. If it's not a beta, you can try it. I believe you can revert to an older BIOS; I don't think that's restricted, since I managed to downgrade from a beta version.

Offline

#8 2025-04-01 09:20:45

agapito
Member
From: Who cares.
Registered: 2008-11-13
Posts: 700

Re: New 5950x CPU hardware error

qu@rk wrote:

Numbering starts from 0, and they're the threads not CPU core I think. so CPU 0/1 is core 0, CPU 2/3 is core 1. I think

Wrong.

https://bbs.archlinux.org/viewtopic.php … 5#p2228725


Excuse my poor English.

Offline

#9 2025-04-01 09:53:52

qu@rk
Member
Registered: 2021-07-28
Posts: 149

Re: New 5950x CPU hardware error

agapito wrote:
qu@rk wrote:

Numbering starts from 0, and they're the threads not CPU core I think. so CPU 0/1 is core 0, CPU 2/3 is core 1. I think

Wrong.

https://bbs.archlinux.org/viewtopic.php … 5#p2228725

I did say "I think". That explains why I failed to adjust the Curve Optimizer properly. I'll have to go through it again. Thanks for the list!

Offline

#10 2025-04-02 11:03:32

Lone_Wolf
Administrator
From: Netherlands, Europe
Registered: 2005-10-04
Posts: 14,364

Re: New 5950x CPU hardware error

Moderator Note :

The discussion between agapito & quark is not helping this thread.
I have split off posts #10 and later to TNG .

Last edited by Lone_Wolf (2025-04-02 11:07:33)


Disliking systemd intensely, but not satisfied with alternatives so focusing on taming systemd.

clean chroot building not flexible enough ?
Try clean chroot manager by graysky

Offline

#11 2025-04-02 11:10:42

qu@rk
Member
Registered: 2021-07-28
Posts: 149

Re: New 5950x CPU hardware error

The wiki contains a numbering list for Core/CPU. Is that valid? Is Core 0 = CPU:0/CPU:1?
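
One way to settle the numbering on a given machine, rather than rely on a list, is to read the topology the kernel exports under /sys/devices/system/cpu. The sketch below reads `core_id` and `thread_siblings_list` for every logical CPU; the function names are made up for illustration. Whether the kernel's `core_id` matches the core numbering shown in the BIOS Curve Optimizer menu is an assumption that still has to be checked.

```python
from pathlib import Path


def parse_cpu_list(text):
    """Expand a sysfs CPU list such as "0,16" or "0-1" into a list of ints."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            cpus.extend(range(int(low), int(high) + 1))
        else:
            cpus.append(int(part))
    return cpus


def core_map(sysfs_root="/sys/devices/system/cpu"):
    """Return {logical CPU: (core_id, SMT sibling CPUs)} as the kernel reports it."""
    result = {}
    # "cpu[0-9]*" skips non-CPU entries such as cpufreq and cpuidle.
    for topo in Path(sysfs_root).glob("cpu[0-9]*/topology"):
        cpu = int(topo.parent.name[3:])  # strip the "cpu" prefix
        core_id = int((topo / "core_id").read_text())
        siblings = parse_cpu_list((topo / "thread_siblings_list").read_text())
        result[cpu] = (core_id, siblings)
    return result


def print_core_map():
    """Print one line per logical CPU, in numeric order."""
    for cpu, (core_id, siblings) in sorted(core_map().items()):
        print(f"CPU:{cpu} -> core_id {core_id}, SMT siblings {siblings}")
```

Calling `print_core_map()` on the affected system shows which logical CPUs share a core, so the `CPU:n` number from an error message can be traced to a core without guessing.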

Offline

#12 2025-04-02 13:09:26

agapito
Member
From: Who cares.
Registered: 2008-11-13
Posts: 700

Re: New 5950x CPU hardware error

Lone_Wolf wrote:

Moderator Note :

The discussion between agapito & quark is not helping this thread.
I have split off posts #10 and later to TNG .

Removing posts from someone who has many hours of experience stabilizing different Ryzen processors is very helpful for this post. Let a newbie keep spreading false information, it's more useful.

I have nothing more to say. Good luck finding the kernel bug that causes the reboots or trying to guess which core each CPU corresponds to.


Excuse my poor English.

Offline

#13 2025-04-02 13:21:24

Al.Piotrowicz
Member
Registered: 2017-08-07
Posts: 170

Re: New 5950x CPU hardware error

agapito wrote:
Lone_Wolf wrote:

Moderator Note :

The discussion between agapito & quark is not helping this thread.
I have split off posts #10 and later to TNG .

Removing posts from someone who has many hours of experience stabilizing different Ryzen processors is very helpful for this post. Let a newbie keep spreading false information, it's more useful.

I have nothing more to say. Good luck finding the kernel bug that causes the reboots or trying to guess which core each CPU corresponds to.

Just to be specific - I didn't have a single reboot since the mount of new CPU after RMA. The quoted hardware MCE error was the first and only single occurrence. It didn't cause reboot.

Last edited by Al.Piotrowicz (2025-04-02 13:22:33)

Offline

#14 2025-04-02 14:39:39

agapito
Member
From: Who cares.
Registered: 2008-11-13
Posts: 700

Re: New 5950x CPU hardware error

Al.Piotrowicz wrote:

Just to be specific - I didn't have a single reboot since the mount of new CPU after RMA. The quoted hardware MCE error was the first and only single occurrence. It didn't cause reboot.

I did not want to, nor do I want to continue participating in this post, I just want to clarify that I was not addressing you. I was talking about the guy who wrote this:

blablabla wrote:

At some point the hardware crashes started, with some kernel update

Ryzen processor reboots are not due to any kernel bug. If so, why don't I have them too? ANSWER: because I have bothered to test my CPU for weeks in CoreCycler.


In one of the messages I wrote in this post and that has been deleted you can read what is probably happening to you:

agapito wrote:

I repeat it again: a core/thread/CPU that stops restarting the computer because you added a couple of points on the curve does not mean that the core/thread/CPU is working properly.

So clearly the message in your initial post indicates that something is not right. Probably that core has enough voltage to not reboot the PC, but generates errors because it needs more voltage. But be careful, because these messages do not mean 100% that the core is lacking voltage, they also appear when the Infinity Fabric voltage is insufficient under stress, although they usually occur in Core 0. MC_STATUS errors may even appear due to unstable memory.


Excuse my poor English.

Offline

#15 2025-04-02 14:42:22

qu@rk
Member
Registered: 2021-07-28
Posts: 149

Re: New 5950x CPU hardware error

Al.Piotrowicz wrote:
agapito wrote:
Lone_Wolf wrote:

Moderator Note :

The discussion between agapito & quark is not helping this thread.
I have split off posts #10 and later to TNG .

Removing posts from someone who has many hours of experience stabilizing different Ryzen processors is very helpful for this post. Let a newbie keep spreading false information, it's more useful.

I have nothing more to say. Good luck finding the kernel bug that causes the reboots or trying to guess which core each CPU corresponds to.

Just to be specific - I didn't have a single reboot since the mount of new CPU after RMA. The quoted hardware MCE error was the first and only single occurrence. It didn't cause reboot.

See https://bbs.archlinux.org/viewtopic.php … 8#p2234648 Might be related to C6 state, even with no hard crash.



agapito wrote:
Lone_Wolf wrote:

Moderator Note :

The discussion between agapito & quark is not helping this thread.
I have split off posts #10 and later to TNG .

Removing posts from someone who has many hours of experience stabilizing different Ryzen processors is very helpful for this post. Let a newbie keep spreading false information, it's more useful.

I have nothing more to say. Good luck finding the kernel bug that causes the reboots or trying to guess which core each CPU corresponds to.

I have not actually spread any information; I just tested yours, and it did not seem to solve my issue.
Your information actually goes against the Wiki:

https://wiki.archlinux.org/title/Stress … ing_errors

Last edited by qu@rk (2025-04-02 14:43:17)

Offline

#16 2025-04-03 08:29:38

Al.Piotrowicz
Member
Registered: 2017-08-07
Posts: 170

Re: New 5950x CPU hardware error

qu@rk wrote:

See https://bbs.archlinux.org/viewtopic.php … 8#p2234648 Might be related to C6 state, even with no hard crash.

Based on the above, it's certain that the recent kernel changes are responsible for the problem. I had no issues on kernels <6.13.8.

Offline

#17 2025-06-23 07:16:36

Al.Piotrowicz
Member
Registered: 2017-08-07
Posts: 170

Re: New 5950x CPU hardware error

Recently got this under medium stress (YouTube 4K 60 fps playback) [6.12.34-1-lts]:

 [Hardware Error]: Uncorrected, software containable error.
 [Hardware Error]: CPU:22 (19:21:2) MC1_STATUS[-|UE|MiscV|AddrV|-|TCC|-|-|Poison|-]: 0xbc800800060c0859
 [Hardware Error]: Error Addr: 0x00000001a2053600
 [Hardware Error]: IPID: 0x000100b000000000
 [Hardware Error]: Instruction Fetch Unit Ext. Error Code: 12
 [Hardware Error]: cache level: L1, mem/io: IO, mem-tx: IRD, part-proc: SRC (no timeout)

The system rebooted immediately. Sounds like another RMA in the near future. I'm pretty sure it has nothing to do with C6 states. I'll try experimenting with the PBO curve and power peaks.

Offline

#18 2025-06-26 08:28:41

Oddwierdo
Member
Registered: 2023-07-29
Posts: 40

Re: New 5950x CPU hardware error

*Edit*
Never mind, I misinterpreted the error.

Last edited by Oddwierdo (2025-06-26 08:33:52)

Offline

#19 2025-06-27 14:49:46

LuxFerre
Member
Registered: 2010-03-01
Posts: 109

Re: New 5950x CPU hardware error

Al.Piotrowicz wrote:

@qu@rk Do you suggest updating my motherboard BIOS to the newest version beforehand? I held off so I could run a long-term stability test on the previous version, which I flashed right after building my new rig around 7 months ago. The first 5950x has been successfully RMA'ed.

If you are still having issues, using the latest BIOS is good practice. Also make sure you have the latest amd-ucode installed.
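
Having the package installed does not by itself show which revision the running kernel actually loaded. As a rough check, the `microcode` field in /proc/cpuinfo can be read after a reboot and compared before and after an update. The helper names below are made up for illustration.

```python
def microcode_revisions(cpuinfo_text):
    """Return the set of microcode revision strings found in /proc/cpuinfo text."""
    revisions = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "microcode":
            revisions.add(value.strip())
    return revisions


def current_microcode(path="/proc/cpuinfo"):
    """Read the revisions reported for all logical CPUs on this machine."""
    with open(path) as f:
        return microcode_revisions(f.read())
```

A single value in the returned set means every logical CPU reports the same revision; more than one value would be worth a closer look.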

Offline

#20 2025-09-17 10:48:37

Al.Piotrowicz
Member
Registered: 2017-08-07
Posts: 170

Re: New 5950x CPU hardware error

Still having issues with deferred hardware errors followed by an instant shutdown. I wonder what the culprit is; I have never observed such a thing before. Looking forward to the beta BIOS.

Offline

#21 2025-10-23 12:33:23

Al.Piotrowicz
Member
Registered: 2017-08-07
Posts: 170

Re: New 5950x CPU hardware error

Basically bumping the thread for the last time. As far as I can tell, the issue comes from the CPU being undervolted in the idle state. My last error:

oct 23 10:56:11 testowy kernel: [Hardware Error]: Corrected error, no action required.
oct 23 10:56:11 testowy kernel: [Hardware Error]: CPU:1 (19:21:2) MC22_STATUS[Over|CE|-|-|-|-|CECC|-|Poison|-]: 0xc0c748feac1ffee9
oct 23 10:56:11 testowy kernel: [Hardware Error]: IPID: 0x0000000000000000
oct 23 10:56:11 testowy kernel: [Hardware Error]: Bank 22 is reserved.
oct 23 10:56:11 testowy kernel: [Hardware Error]: cache level: L1, tx: GEN

It manifests itself mostly in the idle state. I have the latest BIOS for my motherboard: ROG STRIX B550-F GAMING v3631.
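
For anyone comparing the raw values in these logs, the bracketed flags can be re-derived from the hexadecimal status word. The sketch below assumes the bit positions defined in the Linux kernel's arch/x86/include/asm/mce.h (Val 63, Over 62, UC 61, En 60, MiscV 59, AddrV 58, PCC 57, TCC 55, SyndV 53, Deferred 44, Poison 43, Scrub 40, plus the two ECC bits at 46:45); check them against your kernel source before relying on this. `decode_status` is a made-up name.

```python
# Flag name -> bit position in MCA_STATUS. Assumed from the Linux kernel's
# arch/x86/include/asm/mce.h; verify against the kernel you are running.
FLAGS = {
    "Val": 63, "Over": 62, "UC": 61, "En": 60, "MiscV": 59, "AddrV": 58,
    "PCC": 57, "TCC": 55, "SyndV": 53, "Deferred": 44, "Poison": 43, "Scrub": 40,
}


def decode_status(status):
    """Return the names of the flag bits set in an MCA_STATUS value."""
    set_flags = [name for name, bit in FLAGS.items() if (status >> bit) & 1]
    ecc = (status >> 45) & 0x3  # bits 46:45, shown by the kernel as CECC or UECC
    if ecc:
        set_flags.append("CECC" if ecc == 2 else "UECC")
    return set_flags


def show(status):
    """Print a status word next to its decoded flags."""
    print(f"{status:#018x}: {' '.join(decode_status(status))}")
```

Applied to the three values in this thread, 0x9090909090909090 decodes to Val, En, TCC and Deferred, 0xbc800800060c0859 to Val, UC, En, MiscV, AddrV, TCC and Poison, and 0xc0c748feac1ffee9 to Val, Over, TCC, Poison and CECC. The kernel does not print every flag for every bank, so the decoded list can contain names that are missing from the bracketed summary. Whether a repeating pattern like 0x9090909090909090 on a reserved bank is a genuine error record is not something this decoding can answer.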

Offline
