I recently switched GPUs to an RX 7800 XT and have noticed that my system just dies, but only when gaming. It triggers an MCE, and journalctl shows these logs:
[Hardware Error]: System Fatal error.
fbcon: Taking over console
[Hardware Error]: CPU:8 (19:21:2) MC5_STATUS[-|UE|MiscV|AddrV|PCC|TCC|SyndV|-|-|-]: 0xbea0000001000108
[Hardware Error]: Error Addr: 0x00007f98f8c86695
[Hardware Error]: IPID: 0x000500b000000000, Syndrome: 0x000000004d000000
Aug 31 00:14:37 Abysal kernel:
[Hardware Error]: Execution Unit Ext. Error Code: 0
[Hardware Error]: cache level: RESV, tx: GEN, mem-tx: GEN
All the googling I've done shows that this is a CPU or RAM issue, however this error has only been happening after I got this new GPU. Every site suggests running tests and stress tests, which I have done. My CPU holds at 85 °C under an all-core load and doesn't crash even after 40 minutes of it, my RAM has passed 2 passes of memtester, and my GPU is stable under full load. But whenever I launch Sober (Roblox on Linux) or Haste (through Proton) I get a green screen and a system reboot. The Haste issues were happening from day one; the Sober crashes started after I ran sudo pacman -Syu and yay.
My whole system config is a Ryzen 7 5700X3D, 96 GB of DDR4 (strange, I'm aware) at 3200 MHz, and an RX 7800 XT, running kernel 6.16.4-arch1-1. The CPU has only ever had PBO enabled, the RAM has only ever run at this speed, and the GPU is new, so I'm really struggling to see what could be causing this issue, because my case seems different from others. A link to my whole journalctl log is here: https://pastebin.com/QVmdwdUG
Something to note is this whole message has been written on the same system that this happens on. This CPU is from about 6 months ago so not super new but not old.
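For anyone who wants to read the MC5_STATUS value above by hand, here is a minimal Python sketch that splits it into its flags. The bit positions and the lookup tables are taken from my reading of the Linux kernel's MCI_STATUS_* definitions and its AMD MCE decoder, so treat them as assumptions rather than an authoritative reference. The function name decode_status and the dictionary keys are made up for illustration.

```python
# Bit positions as I understand the kernel's MCI_STATUS_* definitions
# (assumption: TCC = bit 55 and SyndV = bit 53 on AMD SMCA systems).
STATUS_FLAGS = {
    "Val": 63,
    "Over": 62,
    "UC": 61,
    "En": 60,
    "MiscV": 59,
    "AddrV": 58,
    "PCC": 57,
    "TCC": 55,
    "SyndV": 53,
    "Deferred": 44,
    "Poison": 43,
}

# Lookup tables for the low 16-bit error code, memory-error format
# 0000 0001 RRRR TTLL (names as printed in the kernel log).
LL = ("RESV", "L1", "L2", "L3/GEN")
TT = ("INSN", "DATA", "GEN", "RESV")
RRRR = ("GEN", "RD", "WR", "DRD", "DWR", "IRD", "PRF", "EV", "SNP")


def decode_status(status: int) -> dict:
    """Split a 64-bit MCA status value into flags and error-code fields."""
    flags = [name for name, bit in STATUS_FLAGS.items() if (status >> bit) & 1]
    error_code = status & 0xFFFF
    result = {
        "flags": flags,
        "ext_error_code": (status >> 16) & 0x3F,
        "error_code": error_code,
    }
    # Only decode the RRRR/TT/LL fields when the code has the memory-error shape.
    if (error_code & 0xFF00) == 0x0100:
        r4 = (error_code >> 4) & 0xF
        result["memory_error"] = {
            "cache_level": LL[error_code & 0x3],
            "tx": TT[(error_code >> 2) & 0x3],
            "mem_tx": RRRR[r4] if r4 < len(RRRR) else "unknown",
        }
    return result
```

Calling decode_status(0xbea0000001000108) gives the same flags the kernel printed (UC shown as UE, plus MiscV, AddrV, PCC, TCC, SyndV), extended error code 0, and RESV/GEN/GEN for the cache level and transaction types. PCC set means the processor context is considered corrupt, which fits the hard reboot.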
Offline
only been happening after I got this new GPU
The CPU has only ever had PBO enabled
https://wiki.archlinux.org/title/Ryzen#Troubleshooting
Do you still have the old GPU?
Leaving aside that the PBO situation is an aggravating condition, the GPU might end up starving the CPU by drawing too much power over the bus.
Offline
Do you still have the old GPU?
Sadly no.
Leaving aside that the PBO situation is an aggravating condition, the GPU might end up starving the CPU by drawing too much power over the bus.
A friend did mention something like this to me, but it seems unlikely since I was able to stress my GPU at 100% load, my CPU at 100% load, and hit 89 GB of RAM with memtest at the same time for hours, and it was all stable.
Some extra information a friend pointed out to me: this only started getting "bad" after I tried a single-GPU passthrough VM without the reset script for AMD GPUs. Before that, only Haste would trigger it, and it was able to recover a few times. We also discovered that running Sober with OpenGL stopped it from happening; switching back to Vulkan, however, didn't re-trigger it.
Offline
was able to stress my GPU at 100% load, my CPU at 100% load
This typically happens because of core cycling: the problem isn't the load, but the load changes.
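To illustrate what "load changes" means in practice, here is a rough Python sketch that loads one core at a time in short bursts with idle gaps, instead of a steady all-core load. It is only an illustration of the pattern, not a replacement for a proper core cycling test: a pure-Python spin loop is a light load, the function names (rotation, burst, run) and the default timings are made up, and os.sched_setaffinity is Linux-only.

```python
import os
import time


def rotation(cpus, rounds):
    """Order in which cores get loaded: every listed core once per round."""
    return [cpu for _ in range(rounds) for cpu in cpus]


def burst(cpu, busy_s, idle_s):
    """Pin this process to one core, spin for busy_s, then idle for idle_s."""
    os.sched_setaffinity(0, {cpu})  # Linux-only call
    end = time.monotonic() + busy_s
    counter = 0
    while time.monotonic() < end:
        counter += 1  # busy work so the core leaves idle and boosts
    time.sleep(idle_s)  # let the core drop back to idle before the next burst


def run(rounds=10, busy_s=2.0, idle_s=0.5):
    """Cycle a single-threaded load across all cores this process may use."""
    original = os.sched_getaffinity(0)
    try:
        for cpu in rotation(sorted(original), rounds):
            burst(cpu, busy_s, idle_s)
    finally:
        os.sched_setaffinity(0, original)  # restore the original CPU mask
```

Calling run() walks through the cores in order, so each one repeatedly goes from idle to a single-core boost and back, which is the kind of transition a constant 100% load never exercises.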
Disable PBO and/or adjust the curve optimizer (if your firmware allows that) - there's also https://archlinux.org/packages/extra/x86_64/corectrl/
Edit: https://bbs.archlinux.org/viewtopic.php … 3#p2154573
Last edited by seth (2025-08-31 12:12:42)
Offline
Changing my kernel parameters to disallow overclocking and changing the PCIe bandwidth seems to have made it stable (multiple hours of GPU under full load with a core cycling test running). I'll update soon on whether the issue has gone away or this is just a fluke. If it comes back I'll slightly up the voltage, which I'd rather avoid.
Offline