This issue has occurred twice within a week but does not occur consistently.
After resuming from suspend (which I do daily), the CPU temperature reported by btop climbs steadily until it reaches 95°C. At that point X crashes and I force the power off to avoid overheating damage.
I have an AMD 7950X with a 280mm AIO watercooler and they both seem to work fine usually.
I have the CPU configured with a thermal limit of 88°C which it doesn't cross under any load I give it.
When this issue occurs, the CPU seems to be trying to clock itself down to manage the heat, as I see ~500MHz reported in btop while it is getting hot.
It takes around one minute after resuming, with everything seeming fine at first, for the CPU to reach >90°C.
There are no programs that btop reports as having high cpu usage when this happens.
I suspect that the issue might be related to my experiments in trying to get my GPU to behave properly with suspends and hibernations.
I recently enabled the nvidia-persistenced.service and added these settings:
$ cat /etc/modprobe.d/nvidia-power-management.conf
options nvidia NVreg_PreserveVideoMemoryAllocations=1 NVreg_TemporaryFilePath=/opt/tmp
The issue may have occurred with the preserve allocations setting off (NVreg_PreserveVideoMemoryAllocations=0), and if that was not the case, the temporary file path was set to `/tmp`, not `/opt/tmp`.
I have a 69GB swapfile which should be sufficient for both the 32GB of RAM and 24GB of VRAM on the 4090. I don't know if other accommodations are needed.
I also recently switched to pipewire and it has been misbehaving a little, giving corrupted output in some situations. That issue has always been fixable by changing outputs in pavucontrol back and forth.
It has also been missing buffers which produces noticeable pops and cracks every few seconds. I doubt that it's related but who knows?
I don't see anything unusual in the journal when the issue occurs and haven't been quick enough to see what is in dmesg while it's happening.
Any help is of course greatly appreciated and if any logs would be helpful, I will of course provide them.
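Since the journal shows nothing and dmesg is lost after the forced power-off, one way to capture the run-up is to log a few values to disk once per second, starting before the suspend. What follows is only a minimal sketch. It assumes the CPU temperature is exposed by the k10temp hwmon driver (the usual one for AMD CPUs) and that cpufreq is available under /sys. The function name, the field labels, and the log path in the usage comment are made up for illustration.

```shell
#!/bin/sh
# Print one sample: epoch seconds, CPU temperature (millidegrees C),
# cpu0 frequency (kHz) and the three load averages.
# The two arguments default to the real /sys and /proc; they are only
# parameters so the function can be tried against a test directory.
sample_once() {
    sys_root=${1:-/sys}
    proc_root=${2:-/proc}
    temp=NA
    for hw in "$sys_root"/class/hwmon/hwmon*; do
        # Assumption: the sensor of interest is the one named k10temp.
        if [ "$(cat "$hw/name" 2>/dev/null)" = "k10temp" ]; then
            temp=$(cat "$hw/temp1_input" 2>/dev/null || echo NA)
            break
        fi
    done
    freq=$(cat "$sys_root/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq" 2>/dev/null || echo NA)
    load=$(cut -d' ' -f1-3 "$proc_root/loadavg" 2>/dev/null || echo NA)
    echo "$(date +%s) temp_mC=$temp cpu0_kHz=$freq load=$load"
}

# Usage (hypothetical log path): start this before suspending and read the
# file after the next boot. sync pushes each line to disk so that it
# survives a forced power-off.
#   while :; do sample_once >> "$HOME/thermal.log"; sync; sleep 1; done
```

A log like this would show whether the load average rises together with the temperature, which btop's per-process list alone does not make obvious.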
When this issue occurs, the CPU seems to be trying to clock itself down to manage the heat, as I see ~500MHz reported in btop while it is getting hot.
So this works.
If the "280mm AIO watercooler" isn't capable of cooling the downclocked CPU down, maybe it's the source of the heat.
Does the GPU heat up as well?
Is it also connected to the watercooler?
I don't think the GPU heated up. Its fans didn't ramp up noticeably like the CPU fans did.
The AIO is only cooling the CPU.
If I set the CPU scaling governor to powersave, it doesn't go above 3.0GHz. Running stockfish on 32 threads, which loads the CPU to 100%, has the temperature reaching 45°C.
If I do the same with the governor set to performance, it goes up to ~4.7GHz and reaches 88°C.
These numbers are in line with my expectations based on testing by reviewers and indicate that everything is working correctly.
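For reference, the governor can be checked and switched through sysfs. This is a minimal sketch, assuming the standard cpufreq layout under /sys/devices/system/cpu. The function name is made up, and the root-directory argument exists only so the function can be tried on a test tree.

```shell
#!/bin/sh
# Count how many CPUs currently use each cpufreq governor.
governor_summary() {
    sys_root=${1:-/sys}
    cat "$sys_root"/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor 2>/dev/null \
        | sort | uniq -c
}

# Usage:
#   governor_summary
# Switching all CPUs needs root, and the name has to be one of those listed in
# /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_governors:
#   echo powersave | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
```

Running the summary again right after a resume would confirm whether a governor change actually took effect while the temperature was climbing.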
I mounted the AIO with thermal paste when I put the computer together and have not touched it since.
As I said, btop didn't report any program over 50% CPU load (that's per-core load, so 3200% means full utilization of all 32 threads) when the issue occurred.
I think I managed to change the cpu governor the last time this happened, thinking that it might be the culprit, but that had no effect.
The issue seems to be rather low level.
One theory is that something like the register that contains the CPU temperature is being incremented by something external. The temperature increase is very constant and doesn't look like it is caused by anything that typically affects CPU temperatures.
To manage the heat, the CPU reduces its clock rate as much as it can, but to no avail, as the clock rate isn't causing the high "temperature". I have never seen the CPU go below 1GHz in any other situation. I don't know why it would crash, but underclocking this much seems like it could cause instability.
Another theory is that the CPU is actually heating up. Unfortunately I didn't think of putting my hand by the computer exhaust to feel if it was hot when the issue occurred. If this is the case this heating is very strange.
As I said, the temperature rises very linearly, much slower than it would if the CPU were under a high load. I can't explain why it would do that. The cause of the heating is also mysterious, as the CPU is not under load according to btop and, in any case, the CPU and cooler can handle high loads without this or any similar issue occurring.
I am not familiar with btop, but it sure sounds like you have a runaway thread that is driving a CPU to 100%, causing the thermal output to exceed the capacity of your cooling system.
I just installed btop, nice display. It does show CPU utilization by percent, and also reports the load average. What do those numbers do during the pre-meltdown runup?
If I set the CPU scaling governor to powersave, it doesn't go above 3.0GHz. Running stockfish on 32 threads, which loads the CPU to 100%, has the temperature reaching 45°C.
If I do the same with the governor set to performance, it goes up to ~4.7GHz and reaches 88°C.
btop didn't report any program over 50% CPU load [of 3200% max] when the issue occurred.
If we roll w/ the theory that the CPU just "thinks" it's hot, but actually is not, https://wiki.archlinux.org/title/Ryzen# … nd_suspend mentions problems w/ C6 a lot.
It might be worthwhile to keep it away from that.
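To see which idle states the kernel actually uses, the cpuidle entries in sysfs can be listed. This is a rough sketch, assuming the standard cpuidle layout under /sys/devices/system/cpu. The function name is made up, and the names the kernel reports are ACPI-style names that may not literally say "C6".

```shell
#!/bin/sh
# List the idle states exposed for one CPU, with how often each was entered
# and whether it is currently disabled (disable=1).
list_idle_states() {
    sys_root=${1:-/sys}
    cpu=${2:-0}
    for st in "$sys_root/devices/system/cpu/cpu$cpu/cpuidle"/state[0-9]*; do
        [ -r "$st/name" ] || continue
        printf '%s name=%s usage=%s disable=%s\n' \
            "${st##*/}" \
            "$(cat "$st/name")" \
            "$(cat "$st/usage" 2>/dev/null)" \
            "$(cat "$st/disable" 2>/dev/null)"
    done
}

# Usage:
#   list_idle_states
# A state can be switched off until the next reboot by writing 1 to its
# disable file (needs root). The index 2 below is only an example, pick the
# deepest state from the listing:
#   echo 1 | sudo tee /sys/devices/system/cpu/cpu*/cpuidle/state2/disable
```

Disabling the deepest state this way is reversible, so it can serve as a quick test before changing kernel parameters or firmware settings.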
I am not familiar with btop, but it sure sounds like you have a runaway thread that is driving a CPU to 100%, causing the thermal output to exceed the capacity of your cooling system.
I just installed btop, nice display. It does show CPU utilization by percent, and also reports the load average. What do those numbers do during the pre-meltdown runup?
I'm sorry that I am so verbose when I'm explaining myself.
As I mentioned, I have tried multiple high CPU load programs and they cause no issues. The temperature never exceeds 88°C and the CPU clock is stable over 4.5GHz, as opposed to downclocking to 500MHz as happens when the issue occurs. The cooling system works fine.
I didn't look at the load average, but as I said, there are no programs that btop reports as having high CPU usage when this happens.
I recall seeing Firefox at the top of the list at around 40% core utilization or 1.25% of total CPU usage. That is normal behavior.
If I set the CPU scaling governor to powersave, it doesn't go above 3.0GHz. Running stockfish on 32 threads, which loads the CPU to 100%, has the temperature reaching 45°C.
If I do the same with the governor set to performance, it goes up to ~4.7GHz and reaches 88°C.
btop didn't report any program over 50% CPU load [of 3200% max] when the issue occurred.
If we roll w/ the theory that the CPU just "thinks" it's hot, but actually is not, https://wiki.archlinux.org/title/Ryzen# … nd_suspend mentions problems w/ C6 a lot.
It might be worthwhile to keep it away from that.
Interesting, thank you for the pointer. I'll look into this and reply here with any updates.