Since a recent update, my Arch Linux system suddenly reboots about once or twice a day. My system specs are as follows: CPU - AMD Ryzen 7 6800H with Radeon Graphics @ 16x 4.785GHz, GPU - AMD/ATI Rembrandt [Radeon 680M].
Interestingly, the journalctl logs show nothing unusual after the reboot, as if some logs are missing or the crash isn't being logged at all. Initially, I was unsure whether the issue was hardware or software-related. However, after rolling back all packages to the last successful update, the problem disappeared.
Could anyone advise on where to look in this case? How can I determine if the issue is with the kernel or a driver? I haven't found any relevant information so far. Any guidance would be greatly appreciated.
Possibly a duplicate of https://bbs.archlinux.org/viewtopic.php?id=298360, although that post does not indicate that everything is fine after a rollback.
Last edited by rastop123 (2024-09-01 09:56:48)
Did seth's tip with https://wiki.archlinux.org/title/Ryzen#Random_reboots do anything?
I can't regulate the voltage. Rolling back has helped for now, and from digging deeper into the topic, as far as I understand, this is still a kernel bug that should be fixed soon.
I think this is still a duplicate https://bbs.archlinux.org/viewtopic.php?id=298360
This is on 6.10.7?
https://bbs.archlinux.org/viewtopic.php … 1#p2192871
Did you try whether the system re-stabilizes w/ 6.10.2?
https://bbs.archlinux.org/viewtopic.php … 7#p2188907
I rolled back to 6.8.x to get rid of the problem and haven't tried the new patches yet. My question is no longer how to fix it, but how to tell whether it's hardware or software in a situation like this.
A spontaneous reboot is a hardware problem. Always.
In this case, what's most likely happening (this is an unfounded theory; it just makes sense but isn't based on anything else) is that the amdgpu kernel module causes a sudden power draw, starving the CPU, which results in a reboot.
There's no generic way to tell what exactly causes such behavior.
What we do know is that Ryzen is prone to voltage-related cold reboots, that this one was triggered as kernel regression and that the offending commit could be traced into the amdgpu module.
The particular commit is uncritical w/ different AMD hardware, so is it hardware, software, interface or all of the above?
Same issue here: after updating to recent kernels (e.g. 6.6, 6.11), I get random reboots with nothing in the logs.
AMD 6800H CPU. I've also run into more frequent GPU freezes, and I'm seeing the issue of key presses not being registered again, which I haven't seen since 5.3(?).
Without changing any settings, I updated to 6.12 and my issues seemed to be gone. I felt it might be related to https://www.phoronix.com/news/Linux-Cle … MSAVE-Zen4, but I'm on Zen 3, so maybe not. However, I just got another random reboot.
Last edited by Laoceau (2024-11-30 23:34:20)
Also, remember that turning up the kernel log level might show something interesting. Not sure what `journalctl` defaults to.
"turning up kernel log levels might show something interesting"
The loglevel parameter only controls what is printed to the console. The journal sees all.
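A rough way to put that to use after a crash, assuming persistent journal storage (/var/log/journal): `journalctl -b -1 -n 50` prints the last 50 entries of the previous boot. To look at the run-up to every recorded boot at once, the sketch below reads the whole journal on stdin and prints the N lines that precede each kernel "Linux version" banner; `last_lines_before_boot` is a hypothetical helper, not an existing tool. If those lines look unremarkable, keep in mind that entries written in the last moments before a hard reset may never reach the disk, because journald only syncs its files periodically (see SyncIntervalSec= in journald.conf(5)).

```shell
#!/bin/sh
# Print the N journal lines preceding each kernel "Linux version" banner,
# i.e. roughly the last messages logged before every (re)boot.
# Hypothetical helper; N must be a positive integer (default 10).
last_lines_before_boot() {
    n=${1:-10}
    awk -v n="$n" '
        BEGIN { n = n + 0 }   # force numeric comparison
        /Linux version/ {
            print "== " n " lines before: " $0
            # buf is a ring buffer holding the previous n lines
            for (i = (NR > n ? NR - n : 1); i < NR; i++) print buf[i % n]
        }
        { buf[NR % n] = $0 }
    '
}

# Usage on a real system:
#   journalctl | last_lines_before_boot 20
```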
Jin, Jîyan, Azadî
You could possibly use the journal to get the frequency of reboots over the recent past by grepping for 'Linux version':
journalctl -a | grep "Linux version"
Kind of silly, though.
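Building on that idea, a small sketch that turns the grep into a per-day count. It assumes the journal is printed with `-o short-iso`, so every line starts with an ISO date, and uses the `_TRANSPORT=kernel` match to keep only kernel messages; `count_boots_per_day` is a hypothetical helper name. Every boot is counted, so intentional reboots and normal power-ons show up too. `journalctl --list-boots` is the built-in way to list the recorded boots themselves.

```shell
#!/bin/sh
# Count boots per calendar day from the kernel's "Linux version" banner,
# which is logged once at the start of every boot.
# Input: journal lines in short-iso format on stdin; each line starts with a
# timestamp such as 2024-08-31T09:12:03+0200, so characters 1-10 are the date.
count_boots_per_day() {
    grep 'Linux version' | cut -c1-10 | sort | uniq -c
}

# Usage on a real system (assumes persistent journal storage):
#   journalctl -o short-iso _TRANSPORT=kernel | count_boots_per_day
```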
Hi,
Just to share some info, I've had the same symptoms here for some weeks too.
I started seeing unexpected reboots and amdgpu traces in the journal after upgrading linux 6.11.2.zen1-1 -> 6.11.3.zen1-1 on 2024-10-17.
On 2024-11-05, I switched to linux 6.6.59-1-lts to fix an unrelated issue (the system could no longer wake up from S3 sleep due to the nvidia drivers).
While the switch from zen to lts fixed the resume issue, I still had unexpected reboots and sometimes amdgpu traces when rebooting.
In late 2022, I had enabled eco mode to lower the TDP to 65 W because I did not need the extra peak performance. I remembered that after seeing the advice on the forum to tune the CPU voltage. I disabled eco mode and the problem seems to be gone (two weeks running fine so far).
CPU is AMD Ryzen 7 7700X.
If it can help some readers...
I found 6.7.9 to be stable so far. No reboots. The moment I switch to a recent kernel I get the random reboots back. There's definitely something after 6.8.x.
https://wiki.archlinux.org/title/Ryzen#Random_reboots
"Sudden reboots" are a hardware issue. The kernels might have become more aggressive at core cycling or so, but the problem is your CPU. Fix that instead of hoping that nothing ever touches it.
Reporting back. 3 weeks with 6.7.9-zen1-1-zen, 12+ hours each day, no sudden reboots so far.
That's great but really doesn't change much.
Your CPU isn't supposed to bail because it doesn't like some core access patterns, and posting that some dated kernel version didn't trigger this won't get you ahead either.
You could try to bisect the "breaking" commit, but even then the chances of it being reverted because it's what randomly drives your CPU into suicide are minuscule.
See whether increasing the voltage supply to the CPU stabilizes it (on newer kernels as well).
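To put a number on why bisecting this kind of bug is slow: git bisect halves the range of candidate commits at every step, so finding the first bad commit among N candidates takes roughly ceil(log2 N) test kernels. A minimal sketch of that estimate; `bisect_steps` is a hypothetical helper, and the v6.8..v6.10 range in the usage comment is only an example taken from kernel versions mentioned in this thread, not a confirmed good/bad pair.

```shell
#!/bin/sh
# Estimate how many kernels a bisection needs: each git bisect step halves
# the remaining candidates, so N commits take about ceil(log2(N)) steps.
bisect_steps() {
    n=$1        # number of candidate commits
    steps=0
    covered=1   # candidates that $steps halvings can narrow down to one
    while [ "$covered" -lt "$n" ]; do
        covered=$((covered * 2))
        steps=$((steps + 1))
    done
    echo "$steps"
}

# Hypothetical usage inside a kernel git checkout:
#   bisect_steps "$(git rev-list --count v6.8..v6.10)"
```

For example, 1000 candidate commits would need about 10 test kernels (2^10 = 1024), and with a reboot that appears only once or twice a day, each of those needs at least a day of uptime before it can be marked good.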
I upgraded to a recent kernel and had the same issue, and I've found that undervolting with ryzenadj has prevented the sudden reboots (so far).
*under*volt??
How exactly?
https://wiki.archlinux.org/title/Ryzen#Random_reboots
The +4 all-core fix seems strange. There's quite a bit of performance loss compared to -28 (as a general example).
Does the +4 bring the voltages (and performance) of say -28 all core on Windows? I replaced the RAM on my 5900X, went back to -28, and have had no crash in two days. Hope it stays this way.
Edit: for me, going from -28 to +4 is some 7-10% performance hit in Cinebench score.
Last edited by qu@rk (2025-03-20 16:11:02)
"Does the +4 bring the voltages (and performance) of say -28 all core on Windows?"
We're gonna need an English version of that.
Increasing the voltage may/will stabilize the CPU, but the higher voltage causes higher temperatures, which allow for less frequency boosting and therefore lower the performance. Yes.
The wiki actually points that out:
"It will limit overclocking potential due to higher heat dissipation requirements, but it will run stable"
However I think the idea would be to move from -28 to -24 (by +4 points, not /to/ +4 points) - hopefully.
And it's pretty much just an example; the plan is to incrementally increase the core voltage until the system becomes stable.
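To make the "by +4 points, not to +4 points" distinction concrete, here is a tiny sketch that lists the all-core Curve Optimizer values one would step through, starting at -28 and raising the offset by 4 points per attempt. `co_steps` and its start, step, and limit arguments are hypothetical examples, not recommended values; each value still has to be set in the BIOS and tested for stability by hand.

```shell
#!/bin/sh
# List all-core Curve Optimizer offsets to try, moving *by* a fixed number of
# points towards higher voltage (less negative), not jumping *to* that number.
# Arguments are hypothetical examples; check the range your board allows.
co_steps() {
    offset=$1   # current offset, e.g. -28
    step=$2     # points to add per attempt, e.g. 4 (must be positive)
    limit=$3    # highest offset you are willing to try, e.g. 0
    [ "$step" -gt 0 ] || return 1
    while [ "$offset" -le "$limit" ]; do
        echo "$offset"
        offset=$((offset + step))
    done
}

# Example: co_steps -28 4 0 prints -28 -24 -20 -16 -12 -8 -4 0, one value per line
```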
But if you can pin the behavior on bad RAM, you of course don't have to adjust the CPU voltage at all anyway.
Just keep an eye on it.
I may have misunderstood the wiki, but to me it looked like I had to set the all-core Curve Optimizer in the BIOS to +4. It goes down to -30 all-core; on the positive side (increasing voltage) I didn't try anything beyond +4. Since I had the random crashes, I moved to +4 in the BIOS, and that stopped the crashes.
I did have two types of mismatched RAM, though: 3000/CL15 for two sticks and 3200/CL16 for the other two, which I ran at 2966/CL15, I think. I have now replaced all four with two 3600/CL18 sticks and went back to my previous setting of -28 on the all-core Curve Optimizer in the BIOS, which had worked when tested in Windows with the "bad" RAM setup.
"However I think the idea would be to move from -28 to -24 (by +4 points, not /to/ +4 points) - hopefully."
Oh, OK, now I get it: I should have gone to -24 from -28. Well, I'm going to try that if it crashes again.