I have a desktop with the following specs:
Operating System: Arch Linux
KDE Plasma Version: 6.4.5
KDE Frameworks Version: 6.19.0
Qt Version: 6.10.0
Kernel Version: 6.12.51-1-lts (64-bit)
Graphics Platform: Wayland
Processors: 32 × AMD Ryzen 9 7950X 16-Core Processor
Memory: 32 GiB of RAM (30,0 GiB usable)
Graphics Processor: AMD Radeon Graphics
Manufacturer: ASUS
I used to have a discrete GPU (Intel Arc) in it, but I'm not really a gamer, so it was just sitting there wasting energy. I had other plans for it, so I removed it.
Since then I've been using just the integrated graphics, and performance-wise it's totally fine.
Except about once a week it just hard resets the whole computer.
Right now I was halfway into a YouTube video and, plop, hard reset.
I have disabled every kind of CPU and memory overclock, I have experimented with LTS kernels, and I have copy-pasted some random kernel cmdline arguments from god knows where.
There is nothing in dmesg or syslog or whatever log file I can find.
It's just running fine for days or weeks and then ploof, hard reset.
That makes it incredibly difficult to debug. I could try some random setting and believe it works and then a week later, ploof.
I would not be above having some UART or JTAG thing dangling off my motherboard for a few weeks to figure out what is going on, if that is what it takes.
Or should I simply put the discrete GPU back in because integrated AMD graphics are just bad, or is my CPU just broken?
I am open to any suggestions.
Quote: Except about once a week it just hard resets the whole computer.
"Hard reset" meaning a spontaneous reboot, where the journal of that boot (sudo journalctl -b -1) ends abruptly, without a proper shutdown?
https://wiki.archlinux.org/title/Ryzen#Troubleshooting
Ryzen aside, is there a parallel Windows installation? (Though that doesn't really fit the "only happens on the APU" scenario.)
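One way to answer the "does the previous boot end abruptly?" question is a pair of small shell helpers around journalctl. A minimal sketch, assuming a POSIX shell: the function names list_boots and prev_boot_tail are hypothetical, while --list-boots, -b -1, -n and --no-pager are standard journalctl options. Reading the full system journal usually needs root or membership in the systemd-journal group.

```shell
#!/bin/sh
# Hypothetical helpers (not part of systemd) for inspecting how the
# previous boot ended. Run them as root or as a member of systemd-journal.

# List recorded boots with the timestamps of their first and last entries.
# After a hard reset, the previous boot's last entry should line up with
# the moment of the crash rather than with a shutdown sequence.
list_boots() {
    journalctl --list-boots --no-pager
}

# Print the last N journal lines (default 50) of the boot before this one.
# A clean shutdown ends with services being stopped; an abrupt end in the
# middle of ordinary activity suggests the machine reset without warning.
prev_boot_tail() {
    journalctl -b -1 -n "${1:-50}" --no-pager
}
```

After sourcing the file in a root shell, prev_boot_tail 100 shows the last hundred lines before the reset.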
Yeah, it is a spontaneous reboot without warning under near-idle conditions.
It does seem to happen most often while watching a video, but not exclusively.
sudo journalctl -b -1 just shows a bunch of innocuous logs that end abruptly, with no sign of a kernel panic or even any significant warnings.
Crucially, there are also no mce errors:
$ sudo journalctl -k | grep -i mce
okt 14 14:23:01 pepijn-arch kernel: MCE: In-kernel MCE decoding enabled.
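Note that journalctl -k implies -b, so the command above only searches the current boot, and the one matching line is just the MCE decoder announcing itself at startup. A minimal sketch, assuming a POSIX shell (with the common local extension): the hypothetical hw_errors filter keeps only lines mentioning a hardware error or machine check, and the hypothetical scan_boots runs it over the kernel log of the last few boots.

```shell
#!/bin/sh
# Hypothetical helpers: search several boots' kernel logs for real
# machine-check reports. "journalctl -k" alone only covers the current boot.

# Filter stdin down to hardware-error lines. The benign startup message
# "MCE: In-kernel MCE decoding enabled." does not match these patterns.
# "|| true" keeps the exit status 0 when nothing matches.
hw_errors() {
    grep -iE 'hardware error|machine check' || true
}

# Scan the kernel log of the last N boots (default 5): 0 is the current
# boot, -1 the previous one, and so on. Missing boots are skipped quietly.
scan_boots() {
    local n="${1:-5}" b=0
    while [ "$b" -gt "-$n" ]; do
        echo "=== boot $b ==="
        journalctl -k -b "$b" --no-pager 2>/dev/null | hw_errors
        b=$((b - 1))
    done
}
```

In a root shell, scan_boots 10 checks the last ten boots; an empty section means no hardware errors were logged for that boot.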
The issues in that troubleshooting section seem to be more CPU-related, while I have a very strong hunch this is iGPU-related.
There has been no parallel Windows installation ever since Windows started messing with my GRUB settings.
Do you think it's worth trying the overvolt?
Do you maybe have a second device you could use to SSH into this one?
Quote: The issues in that troubleshooting section seem to be more CPU-related, while I have a very strong hunch this is iGPU-related.
A GPU crash would only affect the GPU and would not trigger a reboot (that points to CPU, RAM, power or thermal causes), but with an APU the CPU can be affected by the GPU part, and there have actually been reports about exactly that.
Quote: Do you think it's worth trying the overvolt?
No, because that's not strictly what the curve optimizer does, but yes: feed more power to the cores.
Take a look at this thread:
https://bbs.archlinux.org/viewtopic.php?id=265239
Are you running your RAM at EXPO? Maybe try disabling it.
Quote: Are you running your RAM at EXPO? Maybe try disabling it.
As I said, I disabled all CPU and memory overclocks.
Quote: Do you maybe have a second device you could use to SSH into this one?
Yes. Do you think keeping dmesg open on my laptop would show messages from just before the crash that don't make it into the syslog on the next boot?
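Streaming the log to the laptop can help because journald does not sync lower-priority messages to disk immediately (SyncIntervalSec= in journald.conf, 5 minutes by default; only CRIT, ALERT and EMERG messages are synced right away), so the last lines before a reset can be lost on the desktop. A minimal sketch, run on the laptop: DESKTOP is a placeholder for your own user@host and follow_kernel_log is a hypothetical helper. dmesg --follow would work as the remote command too. Messages emitted in the very last instant before the reset may still never make it over the network.

```shell
#!/bin/sh
# Hypothetical helper, run on the second machine (the laptop).
# DESKTOP is a placeholder: set it to your own user@host before use.
DESKTOP=${DESKTOP:-user@desktop}

# Follow the desktop's kernel log live over SSH and keep a timestamped copy
# on the laptop's own disk, so it survives the desktop resetting.
follow_kernel_log() {
    local logfile
    logfile="crash-watch-$(date +%Y%m%d-%H%M%S).log"
    # -t gives sudo a terminal for its password prompt;
    # journalctl -k shows kernel messages only, -f keeps following new ones.
    ssh -t "$DESKTOP" 'sudo journalctl -k -f' | tee "$logfile"
}
```

Leave follow_kernel_log running in a terminal on the laptop; after the next reset, the newest crash-watch-*.log holds whatever arrived before the connection dropped.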
Quote: Take a look at this thread:
https://bbs.archlinux.org/viewtopic.php?id=265239
That seems to be about a freeze rather than a reset, with quite different symptoms, so I'm not sure they are related.
I think I will try the BIOS curve optimizer thing, if I can figure out where to set it...
Disabling C-states altogether ("processor.max_cstate=1") *might* help (at a high cost), because it will mostly prevent the cores from idling and thus avoid the troublesome access patterns.
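To try this, the parameter goes on the kernel command line; the OP boots via GRUB, so on Arch that means editing /etc/default/grub and regenerating the config. The sketch below also includes a hypothetical list_cstates helper that reads the standard cpuidle sysfs files, to confirm which idle driver is active and which states are disabled. It takes the sysfs root as an optional argument only so it can be tried against a fake directory tree. On AMD systems the idle driver is typically acpi_idle, which is the driver processor.max_cstate applies to.

```shell
#!/bin/sh
# To make processor.max_cstate=1 persistent with GRUB on Arch:
#   1. Append it to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub
#   2. sudo grub-mkconfig -o /boot/grub/grub.cfg
#   3. Reboot and check that it appears in /proc/cmdline
# A runtime alternative that needs no reboot (and is undone by one): as root,
# write 1 to /sys/devices/system/cpu/cpu*/cpuidle/stateN/disable for the
# deeper states.

# Hypothetical helper: print the active cpuidle driver and, for CPU 0, each
# idle state's name, whether it is disabled, and how often it was entered.
list_cstates() {
    local root="${1:-/sys/devices/system/cpu}" d
    echo "driver: $(cat "$root/cpuidle/current_driver" 2>/dev/null || echo unknown)"
    for d in "$root"/cpu0/cpuidle/state*; do
        [ -d "$d" ] || continue
        printf '%s name=%s disabled=%s usage=%s\n' "${d##*/}" \
            "$(cat "$d/name")" "$(cat "$d/disable")" "$(cat "$d/usage")"
    done
}
```

Running list_cstates with no argument before and after the change shows whether the deeper states are gone or marked disabled.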
The curve optimizer would be found in your UEFI settings.
Alright, I've set a positive curve optimizer offset of 4; maybe see you in a few weeks.
I mean, yeah, if we're disabling C-states, I might be better off just putting the GPU back in.
Maybe update the BIOS as well, if you haven't already.
I would also run a memtest even at JEDEC speeds. The memory controller is very finicky about brands and timings, and using the integrated graphics might be just what pushes it over the edge (it's all inside the processor). Anyway, I hope the little bit of extra juice from the curve optimizer fixes it.
If the resets become less frequent but still occasionally return, increase the offset (the values in the wiki are, by necessity, all anecdotal estimates).
theDOC wrote: Take a look at this thread:
https://bbs.archlinux.org/viewtopic.php?id=265239
That seems to be about a freeze rather than a reset, with quite different symptoms, so I'm not sure they are related.
I disabled C-states via a BIOS setting long ago and recently did a BIOS update. That reset the settings, and my machine started rebooting randomly when idle. I had almost forgotten about the setting, but as soon as I changed it back, my machine was stable again, with no reboots since.
Do you have control over the curve optimizer?
Limiting/disabling the C-states will have a similar effect on the underlying problem (the cores won't cycle at all, but they will also run hotter, constraining PBO).