Hey, owner of a Ryzen 5 5600 & Sapphire Nitro+ RX 6650 XT here.
https://www.techpowerup.com/gpu-specs/s … 0-xt.b9638
For a while now my system has been crashing whenever I start gaming. The system is stable outside of gaming (or, I guess, any activity which doesn't involve the GPU heavily), but as soon as I start to load a game, the entire system reboots.
I couldn't really find what the culprit is.
I've seen another person posting a similar issue: the screen completely freezes and the last played audio loops for a few seconds before the entire system restarts with the following error:
[Hardware Error]: System Fatal error.
Jan 11 07:11:41 pcArch kernel: [Hardware Error]: CPU:4 (19:21:2) MC5_STATUS[-|UE|MiscV|AddrV|PCC|TCC|SyndV|-|-|-]: 0xbea0000000000108
Jan 11 07:11:41 pcArch kernel: [Hardware Error]: Error Addr: 0x00ffffffc057ef84
Jan 11 07:11:41 pcArch kernel: [Hardware Error]: IPID: 0x000500b000000000, Syndrome: 0x000000004d000000
Jan 11 07:11:41 pcArch kernel: [Hardware Error]: Execution Unit Ext. Error Code: 0
Jan 11 07:11:41 pcArch kernel: [Hardware Error]: cache level: RESV, tx: GEN, mem-tx: GEN
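For anyone trying to read the MC5_STATUS and IPID values above, here is a rough Python sketch that splits them into fields. The bit positions are my assumption based on the MCA_STATUS/MCA_IPID layout the kernel's AMD MCE decoder uses on SMCA-capable CPUs; the function names are made up for illustration. It only restates what the kernel already printed, it does not diagnose anything.

```python
# Assumed bit positions (AMD SMCA MCA_STATUS layout); names are illustrative.
STATUS_FLAGS = {
    "Val": 63, "Over": 62, "UC": 61, "En": 60,
    "MiscV": 59, "AddrV": 58, "PCC": 57, "TCC": 55, "SyndV": 53,
}


def decode_status(status: int) -> dict:
    """Split an MCA status value into flag names and error-code fields."""
    flags = [name for name, bit in STATUS_FLAGS.items() if (status >> bit) & 1]
    return {
        "flags": flags,
        "ext_error_code": (status >> 16) & 0x3F,  # bits 21:16
        "error_code": status & 0xFFFF,            # bits 15:0
    }


def decode_ipid(ipid: int) -> dict:
    """Extract the hardware ID and MCA bank type from an MCA_IPID value."""
    return {
        "hwid": (ipid >> 32) & 0xFFF,       # bits 43:32
        "mca_type": (ipid >> 48) & 0xFFFF,  # bits 63:48
    }


if __name__ == "__main__":
    # Values copied from the log above.
    print(decode_status(0xBEA0000000000108))
    print(decode_ipid(0x000500B000000000))
```

With the values from the log, the set flags line up with what the kernel printed (UE, MiscV, AddrV, PCC, TCC, SyndV), and the extended error code comes out as 0, matching the "Ext. Error Code: 0" line.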
I had not OC'ed any of my components. All of them were running at stock settings.
In order to kinda resolve the issue, I resorted to creating a udev rule to force the "profile_standard" performance profile:
KERNEL=="card1", SUBSYSTEM=="drm", DRIVERS=="amdgpu", ATTR{device/power_dpm_force_performance_level}="profile_standard"
Long story short, while this resolves the issue, doing this has a few disadvantages:
1. My clocks are now always static, even at idle.
2. The sclk and mclk clocks are somewhat low: 1950 MHz and 675 MHz respectively, and they don't budge from that.
Because of this, I ended up changing the profile to "manual" instead and set the upper SCLK limit to 2300 MHz.
However, when I change the memory clock even a little bit from 675 MHz (say, to 680 MHz), the system starts crashing again whenever I start gaming, with the same MCE error as above.
I would like to know if anyone else has faced or is facing this issue, where adjusting the memory clock just a little bit causes crashes, and also if anyone has a solution for it.
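In case it helps anyone comparing clocks between profiles, below is a small Python sketch for reading the active level out of amdgpu's pp_dpm_sclk / pp_dpm_mclk sysfs files. It assumes the usual "index: clockMhz" format with a trailing "*" on the active level; the card1 path is taken from the udev rule above and may differ on other systems, and the sample text in the comment is hypothetical, not output from this machine.

```python
import re
from pathlib import Path

# Matches lines such as "1: 675Mhz *" (assumed pp_dpm_* format).
LEVEL_RE = re.compile(r"\s*(\d+):\s*(\d+)\s*Mhz\s*(\*)?", re.IGNORECASE)


def parse_dpm_levels(text: str) -> list:
    """Return (index, mhz, is_active) tuples for each DPM level line."""
    levels = []
    for line in text.splitlines():
        m = LEVEL_RE.match(line)
        if m:
            levels.append((int(m.group(1)), int(m.group(2)), m.group(3) is not None))
    return levels


def active_clock(text: str):
    """Return the clock in MHz of the level marked with '*', or None."""
    for _, mhz, active in parse_dpm_levels(text):
        if active:
            return mhz
    return None


if __name__ == "__main__":
    # Path assumes the GPU is card1, as in the udev rule above.
    path = Path("/sys/class/drm/card1/device/pp_dpm_mclk")
    if path.exists():
        print("active mclk (MHz):", active_clock(path.read_text()))
    else:
        # Hypothetical sample, only to show the expected shape of the file.
        print(active_clock("0: 96Mhz\n1: 675Mhz *\n"))
```

Running this before and after switching profiles makes it easier to confirm which memory clock level is actually active when the crash happens.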
Linux pcArch 6.12.9-hardened1-1-hardened #1 SMP PREEMPT_DYNAMIC Fri, 10 Jan 2025 19:27:38 +0000 x86_64 GNU/Linux
Much appreciated
Last edited by arthurBellic (2025-01-14 14:49:34)
I don't have that card, but I think you could try the Arch mainline or LTS kernel to see whether it behaves the same. Also, do you have a good PSU?
I encountered a very similar issue not too long ago on kernel 6.12.9 with the combination of a Ryzen 5 3600 and a (Sapphire Pulse) Radeon RX 7600 (8GB), without any overclocking enabled. That particular GPU ended up dying on me, so perhaps it's a sign of potential hardware failure. I don't know if it helps, but here is the similar journalctl output that I got before the failure:
Jan 05 07:26:39 dt-polonium kernel: mce: [Hardware Error]: Machine check events logged
Jan 05 07:26:39 dt-polonium kernel: [Hardware Error]: System Fatal error.
Jan 05 07:26:39 dt-polonium kernel: [Hardware Error]: CPU:3 (17:71:0) MC5_STATUS[-|UE|MiscV|AddrV|PCC|TCC|SyndV|-|-|-]: 0xbea0000000000108
Jan 05 07:26:39 dt-polonium kernel: [Hardware Error]: Error Addr: 0x000072f3e9fc3d64
Jan 05 07:26:39 dt-polonium kernel: [Hardware Error]: IPID: 0x000500b000000000, Syndrome: 0x000000004d000000
Jan 05 07:26:39 dt-polonium kernel: [Hardware Error]: Execution Unit Ext. Error Code: 0
Jan 05 07:26:39 dt-polonium kernel: [Hardware Error]: cache level: RESV, tx: GEN, mem-tx: GEN
Followed by a log spam of SMU errors 5 days later...
Jan 10 09:52:41 dt-polonium kernel: amdgpu 0000:2d:00.0: amdgpu: Failed to export SMU metrics table!
Jan 10 09:52:41 dt-polonium kernel: amdgpu 0000:2d:00.0: amdgpu: Failed to get fan speed(PWM)!
Jan 10 09:52:41 dt-polonium kernel: amdgpu 0000:2d:00.0: amdgpu: Failed to export SMU metrics table!
Jan 10 09:52:41 dt-polonium kernel: amdgpu 0000:2d:00.0: amdgpu: Failed to export SMU metrics table!
Jan 10 09:52:41 dt-polonium kernel: amdgpu 0000:2d:00.0: amdgpu: Failed to export SMU metrics table!
Jan 10 09:52:41 dt-polonium kernel: amdgpu 0000:2d:00.0: amdgpu: Failed to export SMU metrics table!
Jan 10 09:52:41 dt-polonium kernel: amdgpu 0000:2d:00.0: amdgpu: Failed to export SMU metrics table!
Jan 10 09:52:41 dt-polonium kernel: amdgpu 0000:2d:00.0: amdgpu: Failed to export SMU metrics table!
Jan 10 09:52:42 dt-polonium kernel: amdgpu 0000:2d:00.0: amdgpu: Failed to export SMU metrics table!
Jan 10 09:52:42 dt-polonium kernel: amdgpu 0000:2d:00.0: amdgpu: Failed to export SMU metrics table!
Jan 10 09:52:42 dt-polonium kernel: amdgpu 0000:2d:00.0: amdgpu: Failed to get fan speed(PWM)!
Jan 10 09:52:42 dt-polonium kernel: amdgpu 0000:2d:00.0: amdgpu: Failed to export SMU metrics table!
Jan 10 09:52:42 dt-polonium kernel: amdgpu 0000:2d:00.0: amdgpu: Failed to export SMU metrics table!
Jan 10 09:52:42 dt-polonium kernel: amdgpu 0000:2d:00.0: amdgpu: Failed to export SMU metrics table!
Jan 10 09:52:42 dt-polonium kernel: amdgpu 0000:2d:00.0: amdgpu: Failed to export SMU metrics table!
Jan 10 09:52:42 dt-polonium kernel: amdgpu 0000:2d:00.0: amdgpu: Failed to export SMU metrics table!
Jan 10 09:52:42 dt-polonium kernel: amdgpu 0000:2d:00.0: amdgpu: Failed to export SMU metrics table!
"The only reason for time is so that everything doesn't happen at once." - Albert Einstein