Greetings!
I have a new system, recently built, where I have installed Arch Linux. Unfortunately, this system hard locks after several days of uptime. This has happened many times now, where after 2-4 days of normal operation I find the computer unresponsive. I have started leaving journalctl -f running to see what was happening leading up to the hang.
Initially, I saw the following:
kernel: BUG: kernel NULL pointer dereference, address: number-with-lots-of-leading-zeros
I suspected this could be a genuine bug in the latest kernel, so I switched to linux-lts. The system ran fine for several days, then hard-locked.
The screen showed the following:
rcu_preempt self-detected stall on CPU {somenumber}
How can I go about determining the root cause? I had another system running Ubuntu LTS, and it had over two years of uptime until the UPS failed.
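One way to start narrowing this down is to search the kernel log from the boot that ended in the hang for the two messages quoted above. The following is a minimal sketch, not a definitive tool: the function name `find_kernel_faults` and the labels are made up for illustration, and the patterns only cover the two signatures seen in this thread.

```python
import re

# Signatures taken from the two messages quoted in this thread.
# The labels on the left are hypothetical names chosen for this sketch.
PATTERNS = {
    "null_deref": re.compile(r"BUG: kernel NULL pointer dereference"),
    "rcu_stall": re.compile(r"rcu_\w+ self-detected stall on CPU"),
}


def find_kernel_faults(lines):
    """Return (line_number, label, text) for each line matching a known signature."""
    hits = []
    for number, line in enumerate(lines, start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((number, label, line.rstrip("\n")))
    return hits


# Example with made-up log lines; real input would be the saved kernel log.
sample_log = [
    "kernel: BUG: kernel NULL pointer dereference, address: 0000000000000008",
    "kernel: usb 1-1: new high-speed USB device",
    "kernel: rcu: INFO: rcu_preempt self-detected stall on CPU",
]
for number, label, text in find_kernel_faults(sample_log):
    print(f"{number}: [{label}] {text}")
```

The input could come from `journalctl -k -b -1` (kernel messages from the previous boot) saved to a file, assuming the journal is stored persistently. After a hard lock the last messages may never reach the disk, so an empty result does not prove that nothing was logged.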
Random info about the computer:
Ryzen 7 1700X
B450 chipset
No GUI.
Runs ZoneMinder and SMB.
CPU temperature: 40 °C
Last edited by Zcool31 (2023-04-24 16:24:42)
Thanks! I wasn't aware extra steps were needed for Ryzen processors. Would it make sense to have a link to this page in the setup guide, not too far from where it talks about amd-ucode.img?
In the meantime, I'll try the various things suggested there and report back with updates.
Update! I changed "Power Idle Control" to "Typical Current Idle" in UEFI as suggested by the linked Ryzen Troubleshooting page. I also disabled C6 and C7 in UEFI. After this, the system ran for 8 days without locking up, which is the longest continuous period yet. But then it did soft-lock.
I'm currently running MemTest86+ to rule out the memory. It has passed 4 times so far with 0 errors.
One concerning thing is that the CPU temperature reaches 83 degrees C regularly. Could this be a cause of the hangs? Wouldn't the CPU throttle before failing?
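To see whether the hangs line up with temperature peaks, the readings can be logged over time. The sketch below reads the kernel's hwmon interface under `/sys/class/hwmon`, where each `temp*_input` file holds millidegrees Celsius. The function names and the 80 °C limit are assumptions chosen for illustration, and which chip name corresponds to the CPU sensor depends on the system.

```python
from pathlib import Path


def read_hwmon_temps(base="/sys/class/hwmon"):
    """Return {"<hwmonN>:<chip>/<sensor file>": degrees Celsius} for all readable sensors."""
    temps = {}
    for chip_dir in sorted(Path(base).glob("hwmon*")):
        name_file = chip_dir / "name"
        # Fall back to the directory name if the driver exposes no "name" file.
        chip = name_file.read_text().strip() if name_file.exists() else chip_dir.name
        for sensor in sorted(chip_dir.glob("temp*_input")):
            try:
                # hwmon reports temperatures in millidegrees Celsius.
                millidegrees = int(sensor.read_text().strip())
            except (OSError, ValueError):
                continue  # unreadable or malformed sensor; skip it
            temps[f"{chip_dir.name}:{chip}/{sensor.name}"] = millidegrees / 1000.0
    return temps


def hot_sensors(temps, limit_c=80.0):
    """Return only the readings at or above limit_c (an arbitrary example limit)."""
    return {key: value for key, value in temps.items() if value >= limit_c}


# Prints an empty dict if nothing is hot or no sensors are found.
print(hot_sensors(read_hwmon_temps()))
```

Running this from a timer or a loop and appending the output to a file would give a temperature history to compare against the time of the next lock-up.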
Update! After 48 days of uptime, I'm now confident the cause was overheating. I installed a cheap-o 120mm tower cooler that has no trouble keeping the CPU below around 55 degrees C under sustained loads. Zero problems after that.
Disappointed by Noctua because the AM4 mounting hardware kit for my old cooler likely did not provide enough pressure.
Disappointed by AMD because modern CPUs should throttle before corrupting my memory when run at stock settings.
Yes, "should" …
Please always remember to mark resolved threads by editing the subject of your initial post, so others will know that there's no task left, but maybe a solution to find.
Thanks.