#1 2024-03-11 11:16:23

Jphillips
Member
Registered: 2019-08-23
Posts: 68

Threadripper becomes unstable under minimal loads

I have a Threadripper PRO 5995WX on an Asus WS WRX80e-Sage board, with 8x 128 GB Samsung ECC RAM and a 4TB Kingston KC3000 NVMe. It runs fine under minimal load, but once I do anything that requires even a moderate number of cores (e.g., 20 out of 128) the system completely grinds to a halt. The processes are still running, but the user interface has enormous lag and is effectively unusable. It's not totally frozen -- I can still kill processes, but it can take a minute for the terminal to register a simple command like pkill.

It seems to be largely independent of RAM, since even when I just do some random computational process that has no memory overhead, it still freezes up the system. And likewise independent of the hard disk, since it happens without reading/writing. But if there's a better way to troubleshoot this, let me know.

Right now I'm running Arch and Gnome, with fully updated firmware and BIOS. I'm not overclocking in any way and am just using the BIOS defaults, though I've already tried various BIOS settings, like disabling SMT, interleaving, and CPPC preferred cores, but nothing changes. I have amd-ucode installed and set to preload. I've tried both the acpi_cpufreq and amd-pstate scaling drivers, with both the powersave and performance governors. Temperatures are all good, mostly below 55 °C, though with some of the memory sticks registering up into the 60s.
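For anyone wanting to double-check which scaling driver and governor are actually in effect on every CPU, here is a minimal Python sketch. It assumes the usual Linux sysfs layout under /sys/devices/system/cpu; the function name read_cpufreq is made up for illustration.

```python
from collections import Counter
from pathlib import Path

CPU_ROOT = Path("/sys/devices/system/cpu")  # usual sysfs location on Linux


def read_cpufreq(root=CPU_ROOT):
    """Count (scaling_driver, scaling_governor) pairs across all CPUs."""
    pairs = Counter()
    for cpufreq in sorted(root.glob("cpu[0-9]*/cpufreq")):
        try:
            driver = (cpufreq / "scaling_driver").read_text().strip()
            governor = (cpufreq / "scaling_governor").read_text().strip()
        except OSError:
            continue  # CPU offline or cpufreq not exposed for it
        pairs[(driver, governor)] += 1
    return pairs


if __name__ == "__main__":
    for (driver, governor), count in read_cpufreq().items():
        print(f"{count:4d} CPUs: driver={driver} governor={governor}")
```

If more than one line is printed, some CPUs are running a different driver or governor than the rest, which would be worth ruling out.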

Any thoughts on what might be causing this, whether there are any settings to try tweaking, or how to go about troubleshooting? First of all, I'm just trying to figure out whether this is a software, hardware, or BIOS issue.

Thanks!

#2 2024-03-11 15:28:48

seth
Member
From: Won't reply 2 private help req
Registered: 2012-09-03
Posts: 75,315

Re: Threadripper becomes unstable under minimal loads

https://wiki.archlinux.org/title/Ryzen#Troubleshooting (curve optimizer or softlocks)
Anything in the journal/dmesg? (You can access the journal of a previous boot with "journalctl -b -1")
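A rough Python sketch of how the previous boot's kernel log could be filtered for stall and hardware-error messages. The pattern list is an assumption for illustration and is not exhaustive; the function names are hypothetical. It relies only on the journalctl options -k, -b and --no-pager.

```python
import re
import subprocess

# Message fragments that often accompany stalls and hardware errors.
# This list is an illustrative assumption, not a complete catalogue.
PATTERNS = re.compile(
    r"soft lockup|hard LOCKUP|rcu.*stall|blocked for more than|mce:|Hardware Error",
    re.IGNORECASE,
)


def suspicious_lines(lines):
    """Return the log lines that match any of the patterns above."""
    return [line.rstrip("\n") for line in lines if PATTERNS.search(line)]


def previous_boot_kernel_log():
    """Fetch kernel messages of the previous boot; empty list if unavailable."""
    try:
        result = subprocess.run(
            ["journalctl", "-k", "-b", "-1", "--no-pager"],
            capture_output=True, text=True, check=False,
        )
    except FileNotFoundError:
        return []  # journalctl is not installed
    return result.stdout.splitlines()


if __name__ == "__main__":
    for line in suspicious_lines(previous_boot_kernel_log()):
        print(line)
```

No output does not prove the log is clean; it only means none of these particular fragments appeared.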

"I'm running Arch and Gnome"

To rule out the obvious: same issue w/ eg. an openbox session?
How's the CPU load when this happens? Maybe it's just the gnome compositor and the lag purely visual…

#3 2024-03-13 13:55:31

Jphillips
Member
Registered: 2019-08-23
Posts: 68

Re: Threadripper becomes unstable under minimal loads

Thanks for the suggestions -- I wasn't viewing this as a softlock so I hadn't tried those fixes. Unfortunately none of the suggested tweaks changed anything. Perhaps disabling the C6 state improved things *slightly* but not in a meaningful way. I also tried it with openbox and the same problem happens.

The CPU load is at about 25% when this happens, and the voltages and thermals all look good. There also seems to be a bit of lag when ssh'ing in while it's happening, so it seems to be system-wide.
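A single load percentage does not say where the time goes, so here is a small Python sketch that splits it into user, system, iowait, irq and softirq shares by sampling the aggregate "cpu" line of /proc/stat twice. The one-second interval and the function names are arbitrary choices for illustration.

```python
import time

# Order of the first eight counters on the "cpu" lines of /proc/stat.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Turn a 'cpu ...' line from /proc/stat into a dict of tick counters."""
    parts = line.split()  # parts[0] is the "cpu" label
    return dict(zip(FIELDS, map(int, parts[1:1 + len(FIELDS)])))


def breakdown(before, after):
    """Percentage of elapsed ticks spent in each state between two samples."""
    deltas = {name: after[name] - before[name] for name in FIELDS}
    total = sum(deltas.values()) or 1  # avoid dividing by zero
    return {name: 100.0 * ticks / total for name, ticks in deltas.items()}


def read_total(path="/proc/stat"):
    """Read the aggregate CPU counters (first line of /proc/stat)."""
    with open(path) as fh:
        return parse_cpu_line(fh.readline())


if __name__ == "__main__":
    try:
        first = read_total()
        time.sleep(1.0)
        second = read_total()
    except OSError:
        print("/proc/stat is not readable on this system")
    else:
        for name, pct in breakdown(first, second).items():
            print(f"{name:8s} {pct:5.1f}%")
```

A large system or softirq share while the lag is happening would point towards kernel-side overhead rather than the workload itself.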

#4 2024-03-14 18:22:38

seth
Member
From: Won't reply 2 private help req
Registered: 2012-09-03
Posts: 75,315

Re: Threadripper becomes unstable under minimal loads

What if you drastically reduce the number of cores and boot w/ eg "maxcpus=8"?
(Theory being that the kernel gets busy playing pingpong with the 128 cpus…)
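After rebooting with the parameter, a quick way to confirm it took effect is to count the CPUs the kernel reports as online. A minimal Python sketch, assuming the usual sysfs file /sys/devices/system/cpu/online, which holds a range list such as "0-7"; the helper name parse_cpu_list is made up here.

```python
def parse_cpu_list(text):
    """Expand a kernel CPU list such as '0-3,8-11' into a list of CPU numbers."""
    cpus = []
    for chunk in text.strip().split(","):
        if not chunk:
            continue  # empty input
        first, _, last = chunk.partition("-")
        cpus.extend(range(int(first), int(last or first) + 1))
    return cpus


if __name__ == "__main__":
    try:
        with open("/sys/devices/system/cpu/online") as fh:
            online = parse_cpu_list(fh.read())
    except OSError:
        print("could not read the online CPU list")
    else:
        print(f"{len(online)} CPUs online: {online}")
```

With "maxcpus=8" this should report 8 CPUs; if it still reports 128, the parameter did not reach the kernel command line.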
