Since Sunday, my system has been freezing every few minutes to every few hours. It is much worse today, freezing after only a few seconds to a few minutes
The immediate symptoms are the same as in https://bbs.archlinux.org/viewtopic.php?id=236686
Error Description:
System suddenly freezes, does not react to any input, not even REISUB keys, only hard reset will do anything
Display does not update anymore. no black screen or broken image, but just the last frame
If music was playing it repeats the last 1 or 2 seconds
journalctl does not show anything useful, often having no entries at all for the last 10 to 15 minutes before a freeze
a journal from before setting max_cstate=1 https://pastebin.com/5EDCZN3t
a journal from after setting max_cstate=1 https://pastebin.com/8kHZ23yM
Monitoring tools show no RAM or CPU exhaustion
Possible causes:
originally i thought it was related to some application i was using, but that can't be right because now i'm getting freezes during boot
it started after updating my kernel, so i thought it could be something to do with that, but i tried lts and had the same problem, so now i'm not sure
the fastest freeze i've seen is right after loading initramfs, so i'm thinking it's related to that
System specs:
CPU: AMD Ryzen 9 5900X
[EDIT]: this is not an APU
GPU: AMD Radeon RX 6800XT
RAM: 32GB Corsair Vengeance (2x16)
[EDIT]: memtest showed no errors, ram is not overclocked
Motherboard: MSI B550 Unify-X
Boot drive: 500GB M.2 NVME SSD
i do not have any disk encryption
Kernel: 6.8.7-arch1-1
so far i've tried:
changing to the regular kernel
changing to the lts kernel
updating amd-ucode
enabling tlp
making a fresh install of arch
regenerating initramfs
setting max_cstate=1
each ram stick individually, as well as in different DIMM slots
my old graphics card (Radeon RX570)
is there anything else i could try?
[EDIT]: since posting this i have had 3 boots where it has gone more than a few minutes (a few hours) without freezing, i have no idea why that happened or how to reproduce
[EDIT 2]: after setting max_cstate=1 it has regressed to freezing after a few minutes to a few hours
[EDIT 3]: a stacktrace said "nmi watchdog detected hard lockup on cpu 0"
Last edited by decawas (2024-04-22 16:16:54)
Offline
Sounds like a kernel halt, last resort: https://wiki.archlinux.org/title/Kdump
often having no entries at all for the last 10 to 15 minutes before a freeze
That's to be expected w/ a hard reboot because the journal cannot be synced to disk anymore.
To be sure about "not even REISUB keys": did you actively enable that beforehand? By default this won't work, see https://wiki.archlinux.org/title/Keyboa … el_(SysRq)
Did you test the behavior w/ an entirely different sw stack, eg. a live distro like grml or knoppix (which will also take the nvme out of the equation)?
memtest86+ (is the RAM overclocked?)
This is a hybrid system, right? There's an APU in
https://wiki.archlinux.org/title/Ryzen# … nd_suspend ?
Online
i did not know that REISUB keys were not enabled by default thats something i had somehow missed, thank you for pointing that out. i am trying to enable that but having to hard reset every few minutes is making that difficult
i have tried the arch iso, both for reinstalling and for editing system files of my existing system to see if i could fix it while avoiding the constant freezes, the iso started to freeze too tho.
the arch iso is on a usb so i have already taken the nvme out of the equation to no effect, its worth noting that the new arch installation i made was onto a different drive (also nvme) to the one i was using before.
i have not tried something with a different sw stack, i will try to but i dont think ill get it to stay on long enough to even download an iso for that
i have done a memtest with no errors, i forgot to include that in my post, i will update the description to include that.
my ram is not overclocked
this is not a hybrid system,
Offline
You can activate the sysrq w/ a kernel parameter.
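A minimal sketch of what that can look like, assuming the documented `sysrq_always_enabled=1` kernel parameter (or the `kernel.sysrq` sysctl) is what is meant here. The helper `reisub_allowed` is a made-up name for illustration; it only checks whether a given `kernel.sysrq` value permits every key of the REISUB sequence, using the bitmask values from the kernel's SysRq documentation.

```shell
#!/bin/sh
# Enabling SysRq (pick one; both need root or a bootloader edit):
#   kernel command line:  sysrq_always_enabled=1
#   at runtime:           sysctl kernel.sysrq=1

# reisub_allowed VALUE  (hypothetical helper)
# Succeeds if a kernel.sysrq VALUE permits every key of R-E-I-S-U-B.
# Needed bits: 4 (keyboard control, R) + 64 (signals, E and I)
#            + 16 (sync, S) + 32 (remount read-only, U) + 128 (reboot, B) = 244
reisub_allowed() {
    [ "$1" -eq 1 ] && return 0          # 1 means "all functions enabled"
    [ $(( $1 & 244 )) -eq 244 ]
}

# Report the current setting where the file is readable.
# Note: sysrq_always_enabled on the command line overrides this value.
if [ -r /proc/sys/kernel/sysrq ]; then
    current=$(cat /proc/sys/kernel/sysrq)
    if reisub_allowed "$current"; then
        echo "kernel.sysrq=$current: REISUB is permitted"
    else
        echo "kernel.sysrq=$current: REISUB is (partly) blocked"
    fi
fi
```

A value of 16, for example, only permits the sync key, so the rest of the sequence would be ignored.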
Meaningful memtest86+ runs are measured in days, you should run it for at least 16h
Did you see the wiki on known ryzen issues?
First and foremost try "processor.max_cstate=1"
this is not a hybrid system
I actually stopped mid-sentence, googled the CPU as I figured I could answer that myself, figured it doesn't have an APU … and forgot to clear the half-sentence
Online
First and foremost try "processor.max_cstate=1"
i have added this and rebooted, so far its been up for 21m with no freeze which is a good sign
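One way to double-check that the option really reached the kernel after the reboot, as a rough sketch: read `/proc/cmdline` and look for the exact spelling. `has_param` and `param_value` are made-up helper names, not existing tools. A misspelled option name can go unnoticed, so comparing against the exact string `processor.max_cstate` is the point of the check.

```shell
#!/bin/sh
# has_param CMDLINE NAME  (hypothetical helper)
# Succeeds if NAME or NAME=value appears as a whole word in CMDLINE.
has_param() {
    for word in $1; do
        case "$word" in
            "$2"|"$2"=*) return 0 ;;
        esac
    done
    return 1
}

# param_value CMDLINE NAME  (hypothetical helper)
# Prints the value of the last NAME=value occurrence, or an empty line.
param_value() {
    value=""
    for word in $1; do
        case "$word" in
            "$2"=*) value=${word#"$2"=} ;;
        esac
    done
    printf '%s\n' "$value"
}

# Check the running kernel's command line where it is readable.
if [ -r /proc/cmdline ]; then
    cmdline=$(cat /proc/cmdline)
    if has_param "$cmdline" processor.max_cstate; then
        echo "processor.max_cstate=$(param_value "$cmdline" processor.max_cstate)"
    else
        echo "processor.max_cstate is not on the command line"
    fi
fi
```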
Did you see the wiki on known ryzen issues?
i have not, i will look into that if the cstate thing doesnt work out
Meaningful memtest86+ runs are measured in days
the memtest i ran was only a few hours, if i get another freeze with the cstate thing then i will run a longer one
Offline
so after adding processor.max_cstate=1 it is still freezing but now it seems to only freeze whenever i begin to use applications like if i play a video or music or something rather than just freezing no matter what. it's a step in the right direction and now i can provide journal logs, which i will add a link to in the post description in the morning
i also enabled REISUB keys and tried that, it had no effect
Offline
Limiting the cstate prevents the CPU from powering down.
You could test the behavior w/ processor.max_cstate=5, but the C-state explanation is contradicted by "i begin to use applications like if i play a video or music or something"
However, the latter isn't a thing, you're probably running some DE which means you're using A LOT of "applications" at any given time.
Also
since posting this i have had 3 boots where it has gone more than a few minutes (a few hours) without freezing, i have no idea why that happened
so this might be entirely unrelated to the cstate?
"i play a video or music"
How exactly?
Does "aplay /path/to/some/music.wav" (wav being an uncompressed PCM stream and the least effort for any CPU) cause this, or is "music" actually music.youtube.com or spotify etc.?
Online
you're probably running some DE which means you're using A LOT of "applications" at any given time
i am using a DE (KDE Plasma) but i don't think that was affecting much, at least not before. when i said application i meant anything i explicitly open in Plasma (web browser, discord, etc)
before i was getting freezes as soon as loading initial ramdisk, long before any DE can be loaded, to as late as shortly after loading plasma, often before i could open anything else, usually freezing sometime between those two points
now, since adding processor.max_cstate=1, it freezes only when i do anything i would normally do (play music, watch video, use discord, etc), just getting to Plasma and doing nothing else or just using a terminal seems to not yield a freeze
its worth noting that it does not freeze straight away if i play music but it will freeze eventually
"i play a video or music"
How exactly?
by play a video or music i mean through youtube or spotify
so this might be entirely unrelated to the cstate?
i think this is at least somewhat related to the cstate, the 3 boots i had where it didn't freeze within a few minutes were non-consecutive and seem to have had no cause, but since setting max_cstate i am consistently getting boots that last far longer than just a few minutes
Offline
If you're watching videos and listening to music in a chromium based browser, that's exactly the same as discord…
Do you run kde-unstable? On X11 or wayland?
Try to watch
mpv 'https://www.youtube.com/watch?v=v2AC41dglnM'
Online
If you're watching videos and listening to music in a chromium based browser
i use a firefox based web browser,
Do you run kde-unstable? On X11 or wayland?
i am using kde stable on X11, i have tried wayland but it was still freezing so i went back to X11
You could test the behavior w/ processor.max_cstate=5
i changed max_cstate to 5 and it went back to freezing in seconds to minutes so i changed it back to 1
i also tried mpv 'https://www.youtube.com/watch?v=v2AC41dglnM' as you said, it hasnt frozen yet but as i said before it doesnt freeze straight away
also i've done other things this session so a freeze now may or may not be related to that
Offline
Try also adding "idle=nomwait pci=nomsi"
Do you have a "Power idle control"-like setting in your UEFI?
Online
Try also adding "idle=nomwait pci=nomsi"
i added these and my system was freezing at loading initial ramdisk every single time, had to boot from arch iso and chroot to revert it
Do you have a "Power idle control"-like setting in your UEFI?
i do not have a Power idle control like setting in my UEFI
Offline
You can edit the kernel commandline in your bootloader, which is highly advisable when fooling around (and will allow you to get out of such a situation no matter what)
Keep "idle=nomwait", drop "pci=nomsi".
Any freezes yet while not using a browser-like application? (You can play thunderstruck in a loop w/ "mpv --loop=inf 'https://www.youtube.com/watch?v=v2AC41dglnM'" )
Online
I had 4 identical builds on Ryzen, two of them randomly froze. A year later, the third build began to freeze. If at the start there were freezes every few months, after a couple of years the frequency of freezes became several times a day. The 4th build never froze.
hardware was identical, pattern of use too.
I used c-state management, it helped at the start. then no.
Now I have 4 builds on Intel = no freezes
And because I had 4 identical builds, I swapped components between them to understand why the freezes were happening. It was the CPU, not memory or MB.
On one build I bought a new Ryzen - the freezes disappeared.
Last edited by sobersu (2024-03-03 22:36:19)
Offline
ive readded idle=nomwait, time between freezes is still on the longer side, but it seems on average shorter than without
i have had a few freezes when not opening anything browserlike
Now I have 4 builds on Intel = no freezes
in the thread i linked in the post description, most of the people who had a similar issue were using intel, even if my issue is something different and switching to intel would fix it, i'd rather not switch because of cost and because of personal preference, i'd rather keep looking for a fix on Ryzen
Offline
in my case the problem was hardware, replacing the processor solved it. The behavior is 100% identical to yours - try replacing the processor, at least for tests. It's impossible to fix
Offline
Hardware is fine but you need CoreCycler + Curve Optimizer to fix it: https://bbs.archlinux.org/viewtopic.php … 7#p2151647
Excuse my poor English.
Offline
so its been nearly 2 months, still havent solved it, there was a week in there where i had no freeze at all but its back now and i have no clue why that happened
i've ruled out the possibility of it being my ram: i have 2 sticks, which i've tried one at a time and in different DIMM slots to no avail, so it's neither the ram nor the slots. i also did a 10 hour memtest and it had no errors, so that seems to confirm it
ive also tried using my old graphics card which didnt fix it either so i know its not the gpu
the only thing i have left to try hardware side, assuming the issue is hardware related, is using my old cpu
i have not tried CoreCycler + CurveOptimizer yet but ill give it a go
i should also mention i had a call trace where it said "nmi watchdog detected hard lockup on cpu 0" if that helps at all, the call trace also confirms it's a kernel panic
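To look for that kind of message again after the next freeze, a small sketch along these lines might help. It assumes the journal is persistent and that the lines reached the disk before the reset, which is not guaranteed with a hard lockup. `lockup_lines` is a made-up helper name.

```shell
#!/bin/sh
# lockup_lines (hypothetical helper): read kernel log text on stdin and
# print only the lines that mention a hard or soft lockup.
lockup_lines() {
    grep -iE '(hard|soft) lockup'
}

# Typical use, kernel messages of the previous boot:
#   journalctl -k -b -1 --no-pager | lockup_lines
```

If nothing is printed, either no lockup was logged in that boot or the message never made it to disk.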
Last edited by decawas (2024-04-23 14:51:49)
Offline