My system occasionally crashes under unclear conditions when running games under Wine/Proton. It freezes completely (peripherals stay powered) and has to be restarted with the motherboard reset button. On one occasion, I left it like that for around 5 minutes to see if it would recover by itself.
The crashes leave no trace in the journal; the log simply stops abruptly, up to ~30 seconds before the crash. Does journald buffer logs before writing them to disk? Is there some other logging method that would save them immediately?
I have ruled out most possible hardware issues (ran memtest, replaced the motherboard and CPU). If it were a power supply issue, I'd expect the system to shut off completely. It could be a GPU fault, but testing that by replacement would be difficult at the moment (and there doesn't seem to be any correlation with GPU load).
Kernel 6.10.9
KDE on Wayland 1.23.1
wine-ge-custom 1:GE.Proton8.26
AMD Navi 31 GPU
mesa 1:24.2.2
dxvk 2.4
Last edited by 43615 (2024-09-27 23:39:55)
Offline
How did you "rule out most hardware issues"?
https://wiki.archlinux.org/title/Ryzen#Random_reboots
There are also the still-pending issues in https://bbs.archlinux.org/viewtopic.php … 8#p2195348 and https://bbs.archlinux.org/viewtopic.php … 7#p2191607, which had multiple confirmations but should only be relevant for notebooks.
Online
I had an Intel system and switched to AMD (initially suspecting the publicized oxidation defect in Raptor Lake chips). The freezing behavior is exactly the same and seems to happen in similar situations.
That wiki page section talks about reboots that leave log entries, which isn't what I'm seeing. This section sounds more like it; I'll try the suggestions there.
Offline
Undervoltage can have random causes, but if you replaced the entire hardware, it isn't very likely.
Let's see how "frozen" the system actually is: can you reboot it with the SysRq key sequence? See https://wiki.archlinux.org/title/Keyboa … el_(SysRq)
(Note that you'll have to explicitly enable the feature *before* the crash.)
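To check ahead of time whether the feature is enabled, something like the following rough sketch could help. It reads /proc/sys/kernel/sysrq and lists which keys of the usual R-E-I-S-U-B sequence the current value permits. The bit values are taken from the kernel's SysRq documentation as I understand it; the helper names (allowed_keys, current_sysrq) are made up for illustration.

```python
from pathlib import Path
from typing import List, Optional

# SysRq key -> (what it does, bit that must be set in kernel.sysrq).
# Bit meanings assumed from the kernel's sysrq documentation.
REISUB = [
    ("r", "take keyboard out of raw mode", 4),
    ("e", "send SIGTERM to all processes", 64),
    ("i", "send SIGKILL to all processes", 64),
    ("s", "sync filesystems", 16),
    ("u", "remount filesystems read-only", 32),
    ("b", "reboot immediately", 128),
]


def allowed_keys(value: int) -> List[str]:
    """Return the REISUB keys permitted by a kernel.sysrq value."""
    if value == 0:  # SysRq fully disabled
        return []
    if value == 1:  # exactly 1 means "all functions enabled"
        return [key for key, _, _ in REISUB]
    # Any other value is treated as a bitmask of allowed function groups.
    return [key for key, _, bit in REISUB if value & bit]


def current_sysrq(path: str = "/proc/sys/kernel/sysrq") -> Optional[int]:
    """Read the running kernel's sysrq setting, or None if unavailable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    value = current_sysrq()
    if value is None:
        print("could not read kernel.sysrq")
    else:
        ok = allowed_keys(value)
        missing = [key for key, _, _ in REISUB if key not in ok]
        print(f"kernel.sysrq = {value}; allowed: {ok}; blocked: {missing}")
```

If any keys show up as blocked, the full sequence won't work until the setting is changed, which is why it has to be done before the next freeze.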
Online
I have been using it as normal with the kernel parameter rcu_nocbs=0-23 (CPU is an R9 7900X), with no crashes so far. I'll need more time to get a solid result.
I've also added sysrq_always_enabled=1 and will try REISUB if/when it happens again.
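For anyone wanting to confirm that both parameters actually reached the running kernel, here is a minimal sketch that parses /proc/cmdline. The function names are hypothetical, and it assumes rcu_nocbs should cover every logical CPU (0-23 on the 7900X, which has 24 threads). The split on whitespace is naive and does not handle quoted values containing spaces.

```python
import os
from pathlib import Path
from typing import Dict, List


def parse_cmdline(text: str) -> Dict[str, str]:
    """Split a kernel command line into {name: value}; bare flags map to ''."""
    params = {}
    for token in text.split():
        name, _, value = token.partition("=")
        params[name] = value
    return params


def full_cpu_range(n_cpus: int) -> str:
    """CPU list covering every logical CPU, e.g. 24 -> '0-23'."""
    if n_cpus < 1:
        raise ValueError("need at least one CPU")
    return "0" if n_cpus == 1 else f"0-{n_cpus - 1}"


def check(cmdline: str, n_cpus: int) -> List[str]:
    """Return a list of problems; an empty list means both parameters look set."""
    params = parse_cmdline(cmdline)
    problems = []
    expected = full_cpu_range(n_cpus)
    if params.get("rcu_nocbs") != expected:
        problems.append(f"rcu_nocbs is {params.get('rcu_nocbs')!r}, expected {expected!r}")
    # Only presence is checked here; the post above passes it as "=1".
    if "sysrq_always_enabled" not in params:
        problems.append("sysrq_always_enabled is not set")
    return problems


if __name__ == "__main__":
    cmdline = Path("/proc/cmdline").read_text()
    n_cpus = os.cpu_count() or 1
    for problem in check(cmdline, n_cpus) or ["both parameters are present"]:
        print(problem)
```

Running it after a reboot should print "both parameters are present" if the bootloader configuration was regenerated correctly.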
Offline
After nearly 2 weeks of use, I'm confident enough to declare the rcu_nocbs kernel parameter the solution.
Offline