#1 2025-10-14 12:31:25

pepijndevos
Member
Registered: 2014-02-02
Posts: 29

Hard reset, how to debug?

I have a desktop with the following specs:

Operating System: Arch Linux
KDE Plasma Version: 6.4.5
KDE Frameworks Version: 6.19.0
Qt Version: 6.10.0
Kernel Version: 6.12.51-1-lts (64-bit)
Graphics Platform: Wayland
Processors: 32 × AMD Ryzen 9 7950X 16-Core Processor
Memory: 32 GiB of RAM (30,0 GiB usable)
Graphics Processor: AMD Radeon Graphics
Manufacturer: ASUS

I used to have a discrete GPU (Intel Arc) in it, but I'm not really a gamer, so it was just sitting there wasting energy, and I had other plans for it, so I removed it.
Since then I've just been using the integrated graphics, and performance-wise it's totally fine.

Except about once a week it just hard resets the whole computer.

Just now I was halfway through a YouTube video and, plop, hard reset.

I have disabled any type of CPU or memory overclock, I have experimented with LTS kernels, and I have copy-pasted some random cmdline arguments from god knows where.
There is nothing in dmesg or syslog or whatever log file I can find.
It's just running fine for days or weeks and then ploof, hard reset.

That makes it incredibly difficult to debug. I could try some random setting and believe it works and then a week later, ploof.

I would not be above having some UART or JTAG thing dangling off my BIOS for a few weeks to figure out what is going on, if that is what it takes.

Or should I simply put the discrete GPU back in, because AMD integrated graphics are just bad, or my CPU is just broken?

I am open to any suggestions.

#2 2025-10-14 13:22:16

seth
Member
From: Don't DM me only for attention
Registered: 2012-09-03
Posts: 69,058

pepijndevos wrote:

Except about once a week it just hard resets the whole computer.

"Hard reset" means a spontaneous reboot and the journal of that boot (sudo jurnalctl -b -1) ends abruptly, w/o proper shutdown?

https://wiki.archlinux.org/title/Ryzen#Troubleshooting
Ryzen aside, is there a parallel Windows installation? (Though that doesn't really fit the "only happens on the APU" scenario.)

#3 2025-10-14 14:04:28

pepijndevos
Member
Registered: 2014-02-02
Posts: 29

Yeah, it is a spontaneous reboot without warning, under near-idle conditions.
It does seem to happen most frequently when watching a video, but not exclusively.

sudo journalctl -b -1 just shows a bunch of innocuous logs that end abruptly, with no sign of any kernel panics or even significant warnings.

Crucially, there are also no mce errors:
$ sudo journalctl -k | grep -i mce
okt 14 14:23:01 pepijn-arch kernel: MCE: In-kernel MCE decoding enabled.
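`journalctl -k` implies `-b`, so the command above only searched the current boot. The MCE line it found is the normal boot-time initialisation message. A fatal machine check that resets the machine cannot be written out by the boot it kills. However, the kernel may report it from the MCA registers early in the following boot. It can therefore be worth checking both boots with a somewhat broader pattern.

This is a sketch under those assumptions. The keyword list is a heuristic, and `hw_errors` and `check_recent_boots` are hypothetical helpers.

```bash
#!/usr/bin/env bash
# Sketch: look for hardware error reports in kernel logs.
# ASSUMPTION: the keyword list is a heuristic; it catches the usual
# "mce: [Hardware Error]: ..." lines but is not exhaustive.

# Reads kernel log text on stdin and prints matching lines.
# Exits non-zero (like grep) when nothing matches.
hw_errors() {
    grep -iE 'hardware error|machine check'
}

# Hypothetical helper: checks the previous boot (the one that reset) and
# the current boot (where errors left in the MCA banks may be reported).
# Needs read access to the system journal.
check_recent_boots() {
    local b
    for b in -1 0; do
        echo "== boot $b =="
        journalctl -k -b "$b" --no-pager -q | hw_errors || echo "(no matches)"
    done
}

# Example without the helper:
#   sudo journalctl -k -b -1 | hw_errors
```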

The issues in that troubleshooting section seem to be more CPU-related, while I have a very strong hunch this is iGPU-related.

There has been no parallel Windows installation ever since it started messing with my GRUB settings.

Do you think it's worth trying the overvolt?

#4 2025-10-14 14:12:25

gromit
Administrator
From: Germany
Registered: 2024-02-10
Posts: 1,312

Do you maybe have a second device you could use to SSH into this one?

#5 2025-10-14 14:36:56

seth
Member
From: Don't DM me only for attention
Registered: 2012-09-03
Posts: 69,058

pepijndevos wrote:

The issues in that troubleshooting section seem to be more CPU-related, while I have a very strong hunch this is iGPU-related.

A GPU crash would only affect the GPU and not trigger a reboot (reboots point to CPU, RAM, power or thermal causes), but with an APU the CPU will be affected by the GPU part, and there have actually been reports about exactly that.

pepijndevos wrote:

Do you think it's worth trying the overvolt?

No, because that's not what the curve optimizer does, but yes: feed more power to the cores.

#6 2025-10-14 15:07:51

theDOC
Member
From: Aachen, Germany
Registered: 2009-06-18
Posts: 52

#7 2025-10-14 15:58:03

LuxFerre
Member
Registered: 2010-03-01
Posts: 91

Are you running your RAM at EXPO? Maybe try disabling it.

#8 2025-10-14 18:11:59

pepijndevos
Member
Registered: 2014-02-02
Posts: 29

LuxFerre wrote:

Are you running your RAM at EXPO? Maybe try disabling it.

As I said, I disabled all CPU and memory overclocks.

gromit wrote:

Do you maybe have a second device you could use to SSH into this one?

Yes. Do you think that keeping dmesg open on my laptop would show messages from just before the crash that don't show up in the syslog on the next boot?
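This is a sketch of the SSH approach, meant to run on the laptop. It streams the desktop's kernel log with `journalctl -k -f`, appends it to a local file, and then stamps the time when the connection dies.

Whether anything useful shows up depends on the kernel getting a message out before the reset. A hardware-triggered reset may leave nothing, but the disconnect time at least pins down when it happened. The host alias `desktop` and the log path are placeholders. The remote user is assumed to be able to read the journal without sudo, for example via the `systemd-journal` group.

```bash
#!/usr/bin/env bash
# Sketch: run on the laptop to mirror the desktop's kernel log.
# ASSUMPTIONS: "desktop" is a placeholder SSH host, and the remote user can
# read the journal without sudo (e.g. is in the systemd-journal group).
watch_remote_kernel_log() {
    local host=${1:-desktop}
    local out=${2:-"$HOME/${host}-kernel.log"}
    # Keepalives make ssh give up within seconds once the desktop stops
    # answering, instead of hanging on a dead TCP connection.
    ssh -o ServerAliveInterval=5 -o ServerAliveCountMax=2 "$host" \
        'journalctl -k -f -o short-monotonic --no-pager' | tee -a "$out"
    # Record when the stream ended; after a reset this approximates the
    # moment of the crash.
    echo "=== connection to $host lost at $(date '+%F %T') ===" | tee -a "$out"
}

# Usage: watch_remote_kernel_log desktop ~/desktop-kernel.log
```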

theDOC wrote:

That seems to be about a freeze rather than a reset, with quite different symptoms, I'm not sure they are related.

I think I will try the BIOS curve optimizer thing if I can figure out where to set it...

#9 2025-10-14 18:20:25

seth
Member
From: Don't DM me only for attention
Registered: 2012-09-03
Posts: 69,058

Disabling C-states altogether ("processor.max_cstate=1") *might* help (at a high cost) because it mostly prevents the cores from idling, avoiding the troublesome access patterns.
The curve optimizer would be found in your UEFI settings.
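This machine uses GRUB, per the earlier post. On such a system the parameter would typically go into `GRUB_CMDLINE_LINUX_DEFAULT` in `/etc/default/grub`, followed by `grub-mkconfig -o /boot/grub/grub.cfg` and a reboot.

The sketch below is a hypothetical `cstate_report` helper. After rebooting, it confirms that the limit really is on the running kernel's command line and lists the idle states exposed for CPU 0 through sysfs. With the limit active, the deeper states would be expected to be absent from that list. The optional root argument exists only so the function can be pointed at a fake tree for testing.

```bash
#!/usr/bin/env bash
# Sketch: report the C-state limit of the running kernel and the idle
# states it exposes. The optional ROOT argument (hypothetical, for testing)
# prefixes /proc and /sys; leave it empty on a real system.
cstate_report() {
    local root=${1:-} param d
    # Command line the running kernel was actually booted with.
    param=$(grep -o 'processor\.max_cstate=[0-9]*' "$root/proc/cmdline" || true)
    echo "limit: ${param:-none (processor.max_cstate not set)}"
    # Idle states registered for CPU 0; other CPUs normally look the same.
    # "disable" is 1 if a state was switched off at runtime.
    for d in "$root"/sys/devices/system/cpu/cpu0/cpuidle/state*; do
        [[ -r $d/name && -r $d/disable ]] || continue
        echo "$(basename "$d"): $(<"$d/name") (disable=$(<"$d/disable"))"
    done
}

# Usage on the desktop: cstate_report
```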

#10 Yesterday 08:20:37

pepijndevos
Member
Registered: 2014-02-02
Posts: 29

Alright, I've set a positive curve optimizer offset of 4, maybe see you in a few weeks.

I mean, yeah, if we're disabling C-states I might be better off just putting the GPU back in.

#11 Yesterday 11:04:15

LuxFerre
Member
Registered: 2010-03-01
Posts: 91

Maybe update the BIOS as well, if you haven't already.
I would also try a memtest even if running at JEDEC speeds. The memory controller is very finicky with brands and timings, and using the integrated graphics might be just what pushes it over the edge (it's all inside the processor). Anyway I hope the little bit of extra juice from curve optimizer fixes it.

#12 Yesterday 12:21:13

seth
Member
From: Don't DM me only for attention
Registered: 2012-09-03
Posts: 69,058

If the reset frequency drops but the problem occasionally returns, increase the offset (the values in the wiki are all anecdotal estimates by necessity).

#13 Today 06:24:06

theDOC
Member
From: Aachen, Germany
Registered: 2009-06-18
Posts: 52

pepijndevos wrote:
theDOC wrote:

That seems to be about a freeze rather than a reset, with quite different symptoms, I'm not sure they are related.

I disabled C-states via a BIOS setting long ago and recently did a BIOS update. After that, the settings got reset and my machine started rebooting randomly when idle. I had almost forgotten about the setting; as soon as I changed it back, my machine was stable again, with no reboot since.

#14 Today 07:57:38

seth
Member
From: Don't DM me only for attention
Registered: 2012-09-03
Posts: 69,058

Do you have control over the curve optimizer?
Limiting/disabling the C-states will have a similar effect on the underlying problem (the cores won't cycle at all, but they also run hotter, constraining PBO).
