Update: Also tried fabric clock (FCLK) at 1800MHz and memory clock at 5600MT/s, and still had freezes.
I've had an issue since last month similar to https://bbs.archlinux.org/viewtopic.php?id=311723, but since that thread has been inactive for two months, I felt it might be better to start a new one. The freezes only started occurring after both hardware and software changes, so I haven't 100% ruled out a hardware failure (in which case my issue would be unrelated to the other thread), but given the behavior I don't think it's likely.
I'm running a system with an ASUS TUF Gaming B650M-PLUS WiFi motherboard and a Ryzen 9 7950X CPU. In early April I installed a PCIe packet switch, updated the UEFI firmware to 3842 (AGESA 1.3.0.0a), and after the update made minor changes to DRAM timings (they still pass stress tests). Since then I've had random freezes where the kernel seems to soft or hard lock up (SysRq doesn't work, NICs go down, switching TTYs doesn't work, and I can only force a shutdown using the power button), and after rebooting there are no logs on disk related to the freezes.
I've tried disabling PCIe ASPM, setting the kernel parameters processor.max_cstate=1 and usbcore.autosuspend=-1 (as discussed in the other thread), upgrading and downgrading the UEFI firmware, upgrading and downgrading the kernel, tuning voltages and memory timings, raising the minimum CPU frequency using cpupower, increasing Load-Line Calibration levels, and disabling boost. None of these mitigations fixed the freezes; they either changed how often the freezes happen or did nothing (I haven't had enough freezes to tell whether freezing after 5 or 15 minutes in the test explained below is just random or shows some sort of progress).
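On the question of whether 5 versus 15 minutes to freeze means anything: here is a rough sketch, assuming the freezes behave like a memoryless (Poisson) process. That is an assumption made for illustration, not something the logs show. Under that model the time to freeze is exponentially distributed, so single observations scatter widely. The mean of 12.5 minutes is simply the midpoint of the 10-15 minute range mentioned above, and `alpha` is an illustrative threshold.

```python
import math


def survival_probability(t_minutes: float, mean_minutes: float) -> float:
    """P(no freeze within t_minutes) if time-to-freeze is exponential."""
    return math.exp(-t_minutes / mean_minutes)


def freeze_free_minutes_needed(mean_minutes: float, alpha: float = 0.05) -> float:
    """Length of a freeze-free run that would happen with probability
    alpha by pure chance if the mitigation changed nothing."""
    return mean_minutes * math.log(1.0 / alpha)


if __name__ == "__main__":
    mean = 12.5  # assumed mean time to freeze in the reproduction scenario
    for t in (5, 15, 30, 45):
        p = survival_probability(t, mean)
        print(f"P(no freeze for {t:>2} min) = {p:.2f}")
    needed = freeze_free_minutes_needed(mean)
    print(f"freeze-free run needed at alpha=0.05: {needed:.0f} min")
```

If the model is roughly right, surviving 15 minutes happens by chance in about 30% of runs, so one 15-minute run says little. A mitigation would need to survive roughly three times the mean time to freeze before the result is unlikely to be luck.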
One specific scenario I discovered that increases the frequency of the freezes from around once every few days to around once every 10-15 minutes is to have the rhythm game osu! running its tournament client (multiple instances are started in this mode), capture all these game windows using OBS, and do some other light work or just let the system stay in that state (this creates a fairly large number of tasks that don't consume many resources). Before I used usbcore.autosuspend=-1 as per the fix in the other thread, there was usually nothing in the logs before the soft lockups happened. After setting that parameter, I saw only once, just before a freeze, that two different GPUs (one AMD, one Intel) behind a PCIe packet switch timed out at around the same time. The usb_poll test program (mentioned in the other thread) seems to never freeze my system with that command line. Other common actions before a freeze include starting VSCode and potentially other Electron applications. The freezes only happen under transient, spiky workloads, not when I play heavier games, run stress tests, or otherwise keep the system under heavier load. I tried different stress tests (y-cruncher, stress-ng, mprime, ffmpeg encoding, LLM finetuning on the GPU, hashcat, the OCCT CPU stress test with both constant and variable load, etc.) and they all seem to run well without errors or freezes.
I tried raising voltages (reducing negative core voltage offsets and increasing SoC voltage), which fixed the issue for me during April, but since yesterday it's happening again, and this time further increasing voltage or loosening timings didn't help. I tried the latest UEFI firmware for my motherboard (3854, AGESA 1.3.0.0b) and also downgrading to 3602 (AGESA 1.2.7.0, released in 2025/11); neither fixed the freeze. Downgrading the kernel to 6.19.12 also didn't help.
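For testing on a stripped-down system without osu! and OBS, a synthetic load with a similar shape (many tasks that wake briefly and go back to sleep) might be worth a try. The following is only a sketch: it is not known to trigger the freeze, and the function names, worker count, burst length, and idle length are all made-up illustrative values.

```python
import random
import threading
import time


def bursty_worker(stop: threading.Event, burst_s: float, idle_s: float,
                  counter: list) -> None:
    """Alternate a short busy spin with a randomized sleep until stopped."""
    while not stop.is_set():
        deadline = time.monotonic() + burst_s
        while time.monotonic() < deadline:
            pass  # brief CPU burst
        counter[0] += 1
        # Sleep for a jittered interval; returns early once stop is set.
        stop.wait(random.uniform(0.5 * idle_s, 1.5 * idle_s))


def run_bursty_load(n_workers: int, duration_s: float,
                    burst_s: float = 0.002, idle_s: float = 0.05) -> int:
    """Run n_workers bursty threads for duration_s; return total bursts."""
    stop = threading.Event()
    counters = [[0] for _ in range(n_workers)]  # one counter per thread
    threads = [
        threading.Thread(target=bursty_worker,
                         args=(stop, burst_s, idle_s, c), daemon=True)
        for c in counters
    ]
    for t in threads:
        t.start()
    time.sleep(duration_s)
    stop.set()
    for t in threads:
        t.join()
    return sum(c[0] for c in counters)


if __name__ == "__main__":
    # Illustrative values: 64 mostly idle tasks for one minute.
    print(run_bursty_load(n_workers=64, duration_s=60.0))
```

Because of Python's global interpreter lock, the busy phases of the threads do not overlap, so this mostly produces frequent wake-ups rather than parallel load. Starting several copies of the script would be one way to spread the bursts over more cores.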
This has led me to suspect that the I/O die or the fabric itself is unstable with certain power-saving features, not just a USB controller or a PCIe link. This isn't backed by any conclusive proof, since I only see something from netconsole around 10% of the time (I now load the netconsole module and log to my other system most of the time), and when logs do come through there's usually nothing before the kernel detects soft lockups on many different CPU cores. I also tried using pstore/ramoops, but it seems the kernel didn't panic before it completely hung, and pstore didn't preserve any logs either (similar to the other thread). But since there are multiple cases in the other thread where ASUS motherboards and Zen 4/5 Ryzen 7/9 CPUs had similar issues, I don't think it's necessarily a hardware failure.
One specific case that might provide some insight into what's happening: during one freeze my mouse stopped registering input first, and in the following second, before my display and the whole system froze, an application still started. So between the mouse input disappearing and the system locking up there was a brief period when userspace was still running.
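For anyone wanting to reproduce the netconsole setup, the receiving side can be as simple as a UDP listener that appends every datagram to a file. This is a minimal sketch, not the setup actually used here: port 6666 is netconsole's documented default target port, while the function names and the log file name are illustrative.

```python
import os
import socket


def open_listener(host: str = "0.0.0.0", port: int = 6666) -> socket.socket:
    """Bind a UDP socket for netconsole datagrams (6666 is the default
    netconsole target port; adjust to match the sender's configuration)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def log_datagrams(sock: socket.socket, path: str, max_packets=None) -> int:
    """Append each received datagram to path, syncing after every write.

    Runs until max_packets datagrams have arrived, or forever if
    max_packets is None. Returns the number of datagrams written.
    """
    received = 0
    with open(path, "ab") as out:
        while max_packets is None or received < max_packets:
            data, _addr = sock.recvfrom(65535)
            out.write(data)
            out.flush()
            os.fsync(out.fileno())
            received += 1
    return received


# Typical use on the logging machine (blocks until interrupted):
#     log_datagrams(open_listener(), "netconsole.log")
```

Since netconsole uses UDP, a message that never leaves the frozen machine is simply lost, which would be consistent with logs only arriving some of the time.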
At this point I'm out of ideas on where to explore next, and I'll be grateful if someone could point me in some directions on how to diagnose the issue.
lspci (this system has a PEX88080 PCIe packet switch hosting the GPUs and another PLX8747 switch, which in turn hosts NVMe SSDs, so there are a lot of PCIe bridges, and while some I/O memory space failed to assign, every device has been working properly):
00:00.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge Root Complex
Subsystem: ASUSTeK Computer Inc. Device 8877
Kernel driver in use: ryzen_smu
Kernel modules: ryzen_smu
00:00.2 IOMMU: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge IOMMU
Subsystem: ASUSTeK Computer Inc. Device 8877
00:01.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge Dummy Host Bridge
00:01.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge GPP Bridge
Subsystem: ASUSTeK Computer Inc. Device 8877
Kernel driver in use: pcieport
Kernel modules: shpchp
00:01.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge GPP Bridge
Subsystem: ASUSTeK Computer Inc. Device 8877
Kernel driver in use: pcieport
Kernel modules: shpchp
00:02.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge Dummy Host Bridge
00:02.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge GPP Bridge
Subsystem: ASUSTeK Computer Inc. Device 8877
Kernel driver in use: pcieport
Kernel modules: shpchp
00:02.2 PCI bridge: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge GPP Bridge
Subsystem: ASUSTeK Computer Inc. Device 8877
Kernel driver in use: pcieport
Kernel modules: shpchp
00:03.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge Dummy Host Bridge
00:04.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge Dummy Host Bridge
00:08.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge Dummy Host Bridge
00:08.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge Internal GPP Bridge to Bus [C:A]
Subsystem: ASUSTeK Computer Inc. Device 8877
Kernel driver in use: pcieport
Kernel modules: shpchp
00:08.3 PCI bridge: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge Internal GPP Bridge to Bus [C:A]
Subsystem: ASUSTeK Computer Inc. Device 8877
Kernel driver in use: pcieport
Kernel modules: shpchp
00:14.0 SMBus: Advanced Micro Devices, Inc. [AMD] FCH SMBus Controller (rev 71)
Subsystem: ASUSTeK Computer Inc. Device 8877
Kernel driver in use: piix4_smbus
Kernel modules: i2c_piix4, sp5100_tco
00:14.3 ISA bridge: Advanced Micro Devices, Inc. [AMD] FCH LPC Bridge (rev 51)
Subsystem: ASUSTeK Computer Inc. Device 8877
00:18.0 Host bridge: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge Data Fabric; Function 0
00:18.1 Host bridge: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge Data Fabric; Function 1
00:18.2 Host bridge: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge Data Fabric; Function 2
00:18.3 Host bridge: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge Data Fabric; Function 3
Kernel modules: k10temp
00:18.4 Host bridge: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge Data Fabric; Function 4
00:18.5 Host bridge: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge Data Fabric; Function 5
00:18.6 Host bridge: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge Data Fabric; Function 6
00:18.7 Host bridge: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge Data Fabric; Function 7
01:00.0 PCI bridge: Broadcom / LSI PEX880xx PCIe Gen 4 Switch (rev b0)
Subsystem: Broadcom / LSI PEX88064 64 lane/port PCIe Gen 4.0 Switch
Kernel driver in use: pcieport
Kernel modules: shpchp
02:00.0 PCI bridge: Broadcom / LSI PEX880xx PCIe Gen 4 Switch (rev b0)
Subsystem: Broadcom / LSI PEX88064 64 lane/port PCIe Gen 4.0 Switch
Kernel driver in use: pcieport
Kernel modules: shpchp
02:04.0 PCI bridge: Broadcom / LSI PEX880xx PCIe Gen 4 Switch (rev b0)
Subsystem: Broadcom / LSI PEX88064 64 lane/port PCIe Gen 4.0 Switch
Kernel driver in use: pcieport
Kernel modules: shpchp
02:08.0 PCI bridge: Broadcom / LSI PEX880xx PCIe Gen 4 Switch (rev b0)
Subsystem: Broadcom / LSI PEX88064 64 lane/port PCIe Gen 4.0 Switch
Kernel driver in use: pcieport
Kernel modules: shpchp
02:0c.0 PCI bridge: Broadcom / LSI PEX880xx PCIe Gen 4 Switch (rev b0)
Subsystem: Broadcom / LSI PEX88064 64 lane/port PCIe Gen 4.0 Switch
Kernel driver in use: pcieport
Kernel modules: shpchp
02:1c.0 PCI bridge: Broadcom / LSI PEX880xx PCIe Gen 4 Switch (rev b0)
Subsystem: Broadcom / LSI PEX88064 64 lane/port PCIe Gen 4.0 Switch
Kernel driver in use: pcieport
Kernel modules: shpchp
03:00.0 PCI bridge: Broadcom / LSI PEX880xx PCIe Gen 4 Switch (rev b0)
Subsystem: Broadcom / LSI PEX88064 64 lane/port PCIe Gen 4.0 Switch
Kernel driver in use: pcieport
Kernel modules: shpchp
04:00.0 PCI bridge: Broadcom / LSI PEX880xx PCIe Gen 4 Switch (rev b0)
Subsystem: Broadcom / LSI PEX88064 64 lane/port PCIe Gen 4.0 Switch
Kernel driver in use: pcieport
Kernel modules: shpchp
04:08.0 PCI bridge: Broadcom / LSI PEX880xx PCIe Gen 4 Switch (rev b0)
Subsystem: Broadcom / LSI PEX88064 64 lane/port PCIe Gen 4.0 Switch
Kernel driver in use: pcieport
Kernel modules: shpchp
05:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Upstream Port of PCI Express Switch (rev 24)
Subsystem: ASUSTeK Computer Inc. Device 1478
Kernel driver in use: pcieport
Kernel modules: shpchp
06:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Downstream Port of PCI Express Switch (rev 24)
Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Downstream Port of PCI Express Switch
Kernel driver in use: pcieport
Kernel modules: shpchp
07:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 48 [Radeon RX 9070/9070 XT/9070 GRE] (rev c0)
Subsystem: ASUSTeK Computer Inc. Device 061a
Kernel driver in use: amdgpu
Kernel modules: amdgpu
07:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Navi 48 HDMI/DP Audio Controller
Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Navi 48 HDMI/DP Audio Controller
Kernel driver in use: snd_hda_intel
Kernel modules: snd_hda_intel
09:00.0 PCI bridge: Broadcom / LSI PEX880xx PCIe Gen 4 Switch (rev b0)
Subsystem: Broadcom / LSI PEX88064 64 lane/port PCIe Gen 4.0 Switch
Kernel driver in use: pcieport
Kernel modules: shpchp
0a:10.0 PCI bridge: Broadcom / LSI PEX880xx PCIe Gen 4 Switch (rev b0)
Subsystem: Broadcom / LSI PEX88064 64 lane/port PCIe Gen 4.0 Switch
Kernel driver in use: pcieport
Kernel modules: shpchp
0a:18.0 PCI bridge: Broadcom / LSI PEX880xx PCIe Gen 4 Switch (rev b0)
Subsystem: Broadcom / LSI PEX88064 64 lane/port PCIe Gen 4.0 Switch
Kernel driver in use: pcieport
Kernel modules: shpchp
0b:00.0 PCI bridge: Intel Corporation Device 4fa1 (rev 01)
Kernel driver in use: pcieport
Kernel modules: shpchp
0c:01.0 PCI bridge: Intel Corporation Device 4fa4
Subsystem: Intel Corporation Device 4fa4
Kernel driver in use: pcieport
Kernel modules: shpchp
0c:04.0 PCI bridge: Intel Corporation Device 4fa4
Subsystem: Intel Corporation Device 0000
Kernel driver in use: pcieport
Kernel modules: shpchp
0d:00.0 VGA compatible controller: Intel Corporation DG2 [Arc A380] (rev 05)
Subsystem: Shenzhen Gunnir Technology Development Co., Ltd Device 1814
Kernel driver in use: i915
Kernel modules: i915, xe
0e:00.0 Audio device: Intel Corporation DG2 Audio Controller
Subsystem: Shenzhen Gunnir Technology Development Co., Ltd Device 1283
Kernel driver in use: snd_hda_intel
Kernel modules: snd_hda_intel
0f:00.0 PCI bridge: PLX Technology, Inc. PEX 8747 48-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ca)
Subsystem: PLX Technology, Inc. PEX 8747 48-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch
Kernel driver in use: pcieport
Kernel modules: shpchp
10:08.0 PCI bridge: PLX Technology, Inc. PEX 8747 48-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ca)
Subsystem: PLX Technology, Inc. PEX 8747 48-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch
Kernel driver in use: pcieport
Kernel modules: shpchp
10:09.0 PCI bridge: PLX Technology, Inc. PEX 8747 48-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ca)
Subsystem: PLX Technology, Inc. PEX 8747 48-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch
Kernel driver in use: pcieport
Kernel modules: shpchp
10:10.0 PCI bridge: PLX Technology, Inc. PEX 8747 48-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ca)
Subsystem: PLX Technology, Inc. PEX 8747 48-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch
Kernel driver in use: pcieport
Kernel modules: shpchp
10:11.0 PCI bridge: PLX Technology, Inc. PEX 8747 48-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch (rev ca)
Subsystem: PLX Technology, Inc. PEX 8747 48-Lane, 5-Port PCI Express Gen 3 (8.0 GT/s) Switch
Kernel driver in use: pcieport
Kernel modules: shpchp
11:00.0 Non-Volatile memory controller: Yangtze Memory Technologies Co.,Ltd PC411 M.2 2280 NVMe SSD (DRAM-less) (rev 01)
Subsystem: Yangtze Memory Technologies Co.,Ltd PC411 M.2 2280 NVMe SSD (DRAM-less)
Kernel driver in use: nvme
Kernel modules: nvme
13:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM981/PM981/PM983
Subsystem: Samsung Electronics Co Ltd SSD 970 EVO/PRO
Kernel driver in use: nvme
Kernel modules: nvme
14:00.0 Non-Volatile memory controller: Intel Corporation NVMe Optane Memory Series
Subsystem: Intel Corporation Optane Memory M10 16GB
Kernel driver in use: nvme
Kernel modules: nvme
15:00.0 PCI bridge: Broadcom / LSI PEX880xx PCIe Gen 4 Switch (rev b0)
Subsystem: Broadcom / LSI PEX88064 64 lane/port PCIe Gen 4.0 Switch
Kernel driver in use: pcieport
Kernel modules: shpchp
16:00.0 PCI bridge: Broadcom / LSI PEX880xx PCIe Gen 4 Switch (rev b0)
Subsystem: Broadcom / LSI PEX88064 64 lane/port PCIe Gen 4.0 Switch
Kernel driver in use: pcieport
Kernel modules: shpchp
16:08.0 PCI bridge: Broadcom / LSI PEX880xx PCIe Gen 4 Switch (rev b0)
Subsystem: Broadcom / LSI PEX88064 64 lane/port PCIe Gen 4.0 Switch
Kernel driver in use: pcieport
Kernel modules: shpchp
19:00.0 PCI bridge: Broadcom / LSI PEX880xx PCIe Gen 4 Switch (rev b0)
Subsystem: Broadcom / LSI PEX88064 64 lane/port PCIe Gen 4.0 Switch
Kernel driver in use: pcieport
Kernel modules: shpchp
1a:14.0 PCI bridge: Broadcom / LSI PEX880xx PCIe Gen 4 Switch (rev b0)
Subsystem: Broadcom / LSI PEX88064 64 lane/port PCIe Gen 4.0 Switch
Kernel driver in use: pcieport
Kernel modules: shpchp
1a:15.0 PCI bridge: Broadcom / LSI PEX880xx PCIe Gen 4 Switch (rev b0)
Subsystem: Broadcom / LSI PEX88064 64 lane/port PCIe Gen 4.0 Switch
Kernel driver in use: pcieport
Kernel modules: shpchp
1d:00.0 Mass storage controller: Broadcom / LSI PEX880xx PCIe Gen 4 Switch (rev b0)
Subsystem: Broadcom / LSI Device 00b2
1e:00.0 Non-Volatile memory controller: Yangtze Memory Technologies Co.,Ltd ZHITAI TiPlus7100 (rev 01)
Subsystem: Yangtze Memory Technologies Co.,Ltd ZHITAI TiPlus7100
Kernel driver in use: nvme
Kernel modules: nvme
1f:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] 600 Series Chipset PCIe Switch Upstream Port (rev 01)
Subsystem: ASMedia Technology Inc. Device 3328
Kernel driver in use: pcieport
Kernel modules: shpchp
20:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] 600 Series Chipset PCIe Switch Downstream Port (rev 01)
Subsystem: ASMedia Technology Inc. Device 3328
Kernel driver in use: pcieport
Kernel modules: shpchp
20:08.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] 600 Series Chipset PCIe Switch Downstream Port (rev 01)
Subsystem: ASMedia Technology Inc. Device 3328
Kernel driver in use: pcieport
Kernel modules: shpchp
20:09.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] 600 Series Chipset PCIe Switch Downstream Port (rev 01)
Subsystem: ASMedia Technology Inc. Device 3328
Kernel driver in use: pcieport
Kernel modules: shpchp
20:0a.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] 600 Series Chipset PCIe Switch Downstream Port (rev 01)
Subsystem: ASMedia Technology Inc. Device 3328
Kernel driver in use: pcieport
Kernel modules: shpchp
20:0b.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] 600 Series Chipset PCIe Switch Downstream Port (rev 01)
Subsystem: ASMedia Technology Inc. Device 3328
Kernel driver in use: pcieport
Kernel modules: shpchp
20:0c.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] 600 Series Chipset PCIe Switch Downstream Port (rev 01)
Subsystem: ASMedia Technology Inc. Device 3328
Kernel driver in use: pcieport
Kernel modules: shpchp
20:0d.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] 600 Series Chipset PCIe Switch Downstream Port (rev 01)
Subsystem: ASMedia Technology Inc. Device 3328
Kernel driver in use: pcieport
Kernel modules: shpchp
21:00.0 Network controller: Mellanox Technologies MT27500 Family [ConnectX-3]
Subsystem: Mellanox Technologies Device 0051
Kernel driver in use: mlx4_core
Kernel modules: mlx4_core
22:00.0 PCI bridge: Texas Instruments XIO2001 PCI Express-to-PCI Bridge
Kernel modules: shpchp
23:00.0 Multimedia audio controller: ESI Audiotechnik GmbH MAYA44 family PCI Audio Controller (rev 03)
Subsystem: Device 0e51:0003
Kernel driver in use: snd_ice1724
25:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8125 2.5GbE Controller (rev 05)
DeviceName: Realtek RTL8125BG LAN
Subsystem: ASUSTeK Computer Inc. Device 87d7
Kernel driver in use: r8169
Kernel modules: r8169
26:00.0 Network controller: MEDIATEK Corp. MT7921 802.11ax PCIe Wireless Network Adapter [Filogic 330]
Subsystem: AzureWave Device 4680
Kernel driver in use: mt7921e
Kernel modules: mt7921e
27:00.0 USB controller: Advanced Micro Devices, Inc. [AMD] 600 Series Chipset USB 3.2 Controller (rev 01)
Subsystem: ASMedia Technology Inc. Device 1142
Kernel driver in use: xhci_hcd
Kernel modules: xhci_pci
28:00.0 SATA controller: Advanced Micro Devices, Inc. [AMD] 600 Series Chipset SATA Controller (rev 01)
Subsystem: ASMedia Technology Inc. Device 1062
Kernel driver in use: ahci
Kernel modules: ahci
29:00.0 Non-Volatile memory controller: Yangtze Memory Technologies Co.,Ltd ZHITAI TiPlus7100 (rev 01)
Subsystem: Yangtze Memory Technologies Co.,Ltd ZHITAI TiPlus7100
Kernel driver in use: nvme
Kernel modules: nvme
2a:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Raphael (rev c1)
Subsystem: ASUSTeK Computer Inc. Device 8877
Kernel driver in use: amdgpu
Kernel modules: amdgpu
2a:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Radeon High Definition Audio Controller
Subsystem: ASUSTeK Computer Inc. Device 8877
Kernel driver in use: snd_hda_intel
Kernel modules: snd_hda_intel
2a:00.2 Encryption controller: Advanced Micro Devices, Inc. [AMD] Family 19h PSP/CCP
Subsystem: ASUSTeK Computer Inc. Device 8877
Kernel driver in use: ccp
Kernel modules: ccp
2a:00.3 USB controller: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge USB 3.1 xHCI
Subsystem: ASUSTeK Computer Inc. Device 8877
Kernel driver in use: xhci_hcd
Kernel modules: xhci_pci
2a:00.4 USB controller: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge USB 3.1 xHCI
Subsystem: ASUSTeK Computer Inc. Device 8877
Kernel driver in use: xhci_hcd
Kernel modules: xhci_pci
2a:00.6 Audio device: Advanced Micro Devices, Inc. [AMD] Ryzen HD Audio Controller
DeviceName: Realtek ALC897 Audio
Subsystem: ASUSTeK Computer Inc. Device 8841
Kernel driver in use: snd_hda_intel
Kernel modules: snd_hda_intel
2b:00.0 USB controller: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge USB 2.0 xHCI
Subsystem: ASUSTeK Computer Inc. Device 8877
Kernel driver in use: xhci_hcd
Kernel modules: xhci_pci
Current kernel version: 7.0.5-arch1-1
Last edited by foraphe (2026-05-10 16:10:46)
Online
This reminds me of an issue I had, does it improve if you completely disable XMP?
Why I run Arch? To "BTW I run Arch" the guy one grade younger.
And to let my siblings and cousins laugh at Arsch Linux...
Offline
wow thats hard to read so sorry if i missed vital info, process of elimination is always the answer, remove everything but the basics (board/cpu/1 stick ram) (apparently your cpu has graphics built in so take the gpu out of the equation too) and set uefi to defaults with xmp off as suggested and see what happens, if you still get a crash then try different ram stick/cpu/ssd till you find the offending device
all that messing around with voltages and other crap is pointless, devices should run at defaults with no issues, if they dont theyre faulty
Offline
Does the script in https://bbs.archlinux.org/viewtopic.php … 2#p2297852 trigger the problem (not immediately but after a short while)?
To be clear, you did test "processor.max_cstate=1 iommu=soft rcu_nocbs=0-15 pcie_aspm=off" ? Together?
You are getting *soft* lockups, you can reboot the system (in doubt using https://wiki.archlinux.org/title/Keyboa … el_(SysRq) + REISUB) and or otherwise get a journal covering those incidents?
Online
This reminds me of an issue I had, does it improve if you completely disable XMP?
I tried running at reduced frequencies (DDR5-5600 with FCLK reduced to 1800MHz), which doesn't seem to help, but didn't try JEDEC (DDR5-4800). I'll try JEDEC specs.
remove everything but the basics (board/cpu/1 stick ram) (apparently your cpu has graphics built in so take the gpu out of the equation too) and set uefi to defaults with xmp off as suggested and see what happens, if you still get a crash then try different ram stick/cpu/ssd till you find the offending device
Does the script in https://bbs.archlinux.org/viewtopic.php … 2#p2297852 trigger the problem (not immediately but after a short while)?
I needed a stable system to work with, so I have moved my peripherals to a different system. I'll try with only the motherboard and CPU, a single RAM stick, and maybe a fresh install on a USB drive, and report back once I have some free time tomorrow.
To be clear, you did test "processor.max_cstate=1 iommu=soft rcu_nocbs=0-15 pcie_aspm=off" ? Together?
I forgot to do iommu=soft, but all others were set together. I recall the related kernel command line was "pcie_aspm=off rcu_nocbs=0-31 processor.max_cstate=1 idle=nomwait pci=nomsi usbcore.autosuspend=-1 iommu=pt amd_iommu=on amd_pstate=passive".
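To avoid relying on memory for which parameters were active, the booted command line can be compared against the suggested set. A small sketch follows; the function names are illustrative, and the simple whitespace split assumes no quoted values containing spaces.

```python
def parse_cmdline(cmdline: str) -> dict:
    """Map each parameter to its value (None for bare flags).

    Assumes no quoted values with spaces; if a key repeats, the last
    occurrence wins in this sketch.
    """
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def missing_params(cmdline: str, wanted: dict) -> dict:
    """Return the wanted parameters that are absent or set differently."""
    active = parse_cmdline(cmdline)
    return {k: v for k, v in wanted.items()
            if k not in active or active[k] != v}


if __name__ == "__main__":
    # The set asked about in the thread.
    wanted = {
        "processor.max_cstate": "1",
        "iommu": "soft",
        "rcu_nocbs": "0-15",
        "pcie_aspm": "off",
    }
    # The command line as recalled above.
    recalled = ("pcie_aspm=off rcu_nocbs=0-31 processor.max_cstate=1 "
                "idle=nomwait pci=nomsi usbcore.autosuspend=-1 iommu=pt "
                "amd_iommu=on amd_pstate=passive")
    print(missing_params(recalled, wanted))
```

On the running system, the string to check can be read from /proc/cmdline instead of being typed in. For the recalled command line, the differences are iommu (pt instead of soft) and rcu_nocbs (0-31 instead of 0-15, which matches the 32 threads of this CPU).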
You are getting *soft* lockups, you can reboot the system (in doubt using https://wiki.archlinux.org/title/Keyboa … el_(SysRq) + REISUB) and or otherwise get a journal covering those incidents?
SysRq also didn't seem to work when I did enable it. The only logs I got (the netconsole logs, 10% of the time) reported soft lockups, but the lockup seems to escalate until NICs and other things also go down. In some occurrences, pinging the computer kept working for around a minute after the freeze before it stopped responding.
The keyboard/mouse also didn't work as soon as a freeze started (caps lock won't toggle, and the mouse stopped moving first in that one case when userspace was still alive).
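Since the netconsole captures consist almost entirely of watchdog soft-lockup reports, a short script can condense them into one row per CPU, which makes it easier to see which cores were reported first and how long each has been stuck. This is a sketch under the assumption that the lines follow the `watchdog: BUG: soft lockup - CPU#N stuck for Ns! [task:pid]` format; the function name is illustrative.

```python
import re

LOCKUP_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+watchdog: BUG: soft lockup - "
    r"CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! "
    r"\[(?P<task>.+):(?P<pid>\d+)\]"
)


def summarize_lockups(lines):
    """Return {cpu: (first_report_timestamp, longest_stuck_seconds, task)}."""
    summary = {}
    for line in lines:
        m = LOCKUP_RE.search(line)
        if m is None:
            continue  # utilization rows and unrelated messages
        cpu = int(m["cpu"])
        ts = float(m["ts"])
        secs = int(m["secs"])
        first_ts, longest, task = summary.get(cpu, (ts, 0, m["task"]))
        summary[cpu] = (min(first_ts, ts), max(longest, secs), task)
    return summary


if __name__ == "__main__":
    import sys
    # Usage: python summarize_lockups.py < netconsole.log
    rows = summarize_lockups(sys.stdin)
    for cpu, (ts, secs, task) in sorted(rows.items(), key=lambda kv: kv[1][0]):
        print(f"CPU#{cpu:<3} first report at {ts:>12.6f}s, "
              f"stuck up to {secs}s, task {task}")
```

The timestamps are when the watchdog printed the report, not when the CPU actually stalled, so the ordering is only a rough hint.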
Netconsole logs usually look very similar to this one when they do come through:
[ 8.231405] snd_hda_intel 0000:0e:00.0: Unknown capability 0
[ 9.126739] amdgpu: Overdrive is enabled, please disable it before reporting any bugs unrelated to overdrive.
[ 100.529643] watchdog: BUG: soft lockup - CPU#29 stuck for 26s! [kworker/u130:6:918]
[ 100.529645] CPU#29 Utilization every 4000ms during lockup:
[ 100.529646] #1: 100% system, 0% softirq, 0% hardirq, 0% idle
[ 100.529648] #2: 100% system, 0% softirq, 1% hardirq, 0% idle
[ 100.529649] #3: 100% system, 0% softirq, 0% hardirq, 0% idle
[ 100.529650] #4: 100% system, 0% softirq, 1% hardirq, 0% idle
[ 100.529651] #5: 100% system, 0% softirq, 0% hardirq, 0% idle
[ 100.528398] watchdog: BUG: soft lockup - CPU#10 stuck for 22s! [ThreadPoolForeg:4407]
[ 100.528400] CPU#10 Utilization every 4000ms during lockup:
[ 100.528401] #1: 100% system, 0% softirq, 0% hardirq, 0% idle
[ 100.528402] #2: 100% system, 0% softirq, 1% hardirq, 0% idle
[ 100.528403] #3: 100% system, 0% softirq, 0% hardirq, 0% idle
[ 100.528404] #4: 100% system, 0% softirq, 1% hardirq, 0% idle
[ 100.529856] watchdog: BUG: soft lockup - CPU#15 stuck for 26s! [chrome:4241]
[ 100.529858] CPU#15 Utilization every 4000ms during lockup:
[ 100.529859] #1: 100% system, 0% softirq, 1% hardirq, 0% idle
[ 100.529860] #2: 100% system, 0% softirq, 0% hardirq, 0% idle
[ 100.528405] #5: 100% system, 0% softirq, 1% hardirq, 0% idle
[ 100.529861] #3: 100% system, 0% softirq, 1% hardirq, 0% idle
[ 100.529861] #4: 100% system, 0% softirq, 0% hardirq, 0% idle
[ 100.529862] #5: 100% system, 0% softirq, 1% hardirq, 0% idle
[ 100.528681] watchdog: BUG: soft lockup - CPU#27 stuck for 26s! [chrome:4716]
[ 100.528683] CPU#27 Utilization every 4000ms during lockup:
[ 100.528684] #1: 100% system, 0% softirq, 1% hardirq, 0% idle
[ 100.529995] watchdog: BUG: soft lockup - CPU#13 stuck for 22s! [kworker/u130:0:212]
[ 100.529996] CPU#13 Utilization every 4000ms during lockup:
[ 100.529997] #1: 100% system, 0% softirq, 0% hardirq, 0% idle
[ 100.529998] #2: 100% system, 0% softirq, 1% hardirq, 0% idle
[ 100.528686] #2: 100% system, 0% softirq, 0% hardirq, 0% idle
[ 100.529999] #3: 100% system, 0% softirq, 0% hardirq, 0% idle
[ 100.530000] #4: 100% system, 0% softirq, 0% hardirq, 0% idle
[ 100.528686] #3: 100% system, 0% softirq, 1% hardirq, 0% idle
[ 100.528687] #4: 100% system, 0% softirq, 1% hardirq, 0% idle
[ 100.530001] #5: 100% system, 0% softirq, 1% hardirq, 0% idle
[ 100.528688] #5: 100% system, 0% softirq, 0% hardirq, 0% idle
[ 100.528895] watchdog: BUG: soft lockup - CPU#16 stuck for 26s! [osu!.exe:7848]
[ 100.528898] CPU#16 Utilization every 4000ms during lockup:
[ 100.530194] watchdog: BUG: soft lockup - CPU#12 stuck for 26s! [osu!.exe:8616]
[ 100.530196] CPU#12 Utilization every 4000ms during lockup:
[ 100.530197] #1: 100% system, 0% softirq, 0% hardirq, 0% idle
[ 100.530198] #2: 100% system, 0% softirq, 1% hardirq, 0% idle
[ 100.530199] #3: 100% system, 0% softirq, 0% hardirq, 0% idle
[ 100.530200] #4: 100% system, 0% softirq, 1% hardirq, 0% idle
[ 100.530200] #5: 100% system, 0% softirq, 0% hardirq, 0% idle
[ 100.528899] #1: 100% system, 0% softirq, 0% hardirq, 0% idle
[ 100.528900] #2: 100% system, 0% softirq, 1% hardirq, 0% idle
[ 100.528901] #3: 100% system, 0% softirq, 0% hardirq, 0% idle
[ 100.528901] #4: 100% system, 0% softirq, 1% hardirq, 0% idle
[ 100.528902] #5: 100% system, 0% softirq, 0% hardirq, 0% idle
[ 100.529067] watchdog: BUG: soft lockup - CPU#17 stuck for 26s! [tosu.exe:4819]
[ 100.529069] CPU#17 Utilization every 4000ms during lockup:
[ 100.529070] #1: 100% system, 0% softirq, 0% hardirq, 0% idle
[ 100.529071] #2: 100% system, 0% softirq, 1% hardirq, 0% idle
[ 100.529072] #3: 100% system, 0% softirq, 1% hardirq, 0% idle
[ 100.529073] #4: 100% system, 0% softirq, 0% hardirq, 0% idle
[ 100.529074] #5: 100% system, 0% softirq, 1% hardirq, 0% idle
[ 100.529255] watchdog: BUG: soft lockup - CPU#25 stuck for 26s! [ServiceWorker t:4437]
[ 100.529258] CPU#25 Utilization every 4000ms during lockup:
[ 100.529259] #1: 100% system, 0% softirq, 0% hardirq, 0% idle
[ 100.529260] #2: 100% system, 0% softirq, 1% hardirq, 0% idle
[ 100.529261] #3: 100% system, 0% softirq, 0% hardirq, 0% idle
[ 100.529262] #4: 100% system, 0% softirq, 1% hardirq, 0% idle
[ 100.529263] #5: 100% system, 0% softirq, 0% hardirq, 0% idle
[ 100.529434] watchdog: BUG: soft lockup - CPU#11 stuck for 22s! [kworker/u130:5:917]
[ 100.529435] CPU#11 Utilization every 4000ms during lockup:
[ 100.529436] #1: 100% system, 0% softirq, 1% hardirq, 0% idle
[ 100.529438] #2: 100% system, 0% softirq, 0% hardirq, 0% idle
[ 100.529439] #3: 100% system, 0% softirq, 0% hardirq, 0% idle
[ 100.529439] #4: 100% system, 0% softirq, 1% hardirq, 0% idle
[ 100.529440] #5: 100% system, 0% softirq, 0% hardirq, 0% idle
[ 104.528389] watchdog: BUG: soft lockup - CPU#8 stuck for 23s! [worker:8408]
[ 104.528391] CPU#8 Utilization every 4000ms during lockup:
[ 104.528392] #1: 100% system, 0% softirq, 0% hardirq, 0% idle
[ 104.528393] #2: 100% system, 0% softirq, 1% hardirq, 0% idle
[ 104.528394] #3: 100% system, 0% softirq, 0% hardirq, 0% idle
[ 104.528395] #4: 100% system, 0% softirq, 1% hardirq, 0% idle
[ 104.528396] #5: 100% system, 0% softirq, 0% hardirq, 0% idle
[ 104.528570] watchdog: BUG: soft lockup - CPU#14 stuck for 23s! [worker:8476]
[ 104.528573] CPU#14 Utilization every 4000ms during lockup:
[ 104.528573] #1: 100% system, 0% softirq, 1% hardirq, 0% idle
[ 104.528575] #2: 100% system, 0% softirq, 0% hardirq, 0% idle
[ 104.528576] #3: 100% system, 1% softirq, 1% hardirq, 0% idle
[ 104.528577] #4: 100% system, 0% softirq, 0% hardirq, 0% idle
[ 104.528577] #5: 100% system, 0% softirq, 1% hardirq, 0% idle
[ 104.528759] watchdog: BUG: soft lockup - CPU#28 stuck for 23s! [ThreadPoolForeg:3966]
[ 104.528761] CPU#28 Utilization every 4000ms during lockup:
[ 104.528761] #1: 100% system, 0% softirq, 0% hardirq, 0% idle
[ 104.528762] #2: 100% system, 0% softirq, 1% hardirq, 0% idle
[ 104.528763] #3: 100% system, 0% softirq, 1% hardirq, 0% idle
[ 104.528764] #4: 100% system, 0% softirq, 0% hardirq, 0% idle
[ 104.528764] #5: 100% system, 0% softirq, 1% hardirq, 0% idle
[ 104.528879] watchdog: BUG: soft lockup - CPU#31 stuck for 23s! [electron:2999]
[ 104.528880] CPU#31 Utilization every 4000ms during lockup:
[ 104.528880] #1: 100% system, 0% softirq, 0% hardirq, 0% idle
[ 104.528881] #2: 100% system, 0% softirq, 1% hardirq, 0% idle
[ 104.528882] #3: 100% system, 0% softirq, 0% hardirq, 0% idle
[ 104.528883] #4: 100% system, 0% softirq, 1% hardirq, 0% idle
[ 104.528883] #5: 100% system, 0% softirq, 0% hardirq, 0% idle
[ 108.528543] watchdog: BUG: soft lockup - CPU#26 stuck for 23s! [chrome:4247]
[ 108.528545] CPU#26 Utilization every 4000ms during lockup:
[ 108.528546] #1: 100% system, 0% softirq, 0% hardirq, 0% idle
[ 108.528547] #2: 100% system, 0% softirq, 1% hardirq, 0% idle
[ 108.528548] #3: 100% system, 0% softirq, 0% hardirq, 0% idle
[ 108.528549] #4: 100% system, 0% softirq, 1% hardirq, 0% idle
[ 108.528550] #5: 100% system, 0% softirq, 1% hardirq, 0% idle
[ 128.528394] watchdog: BUG: soft lockup - CPU#10 stuck for 48s! [ThreadPoolForeg:4407]
[ 128.528395] CPU#10 Utilization every 4000ms during lockup:
[ 128.528396] #1: 100% system, 0% softirq, 0% hardirq, 0% idle
[ 128.528397] #2: 100% system, 0% softirq, 1% hardirq, 0% idle
[ 128.528398] #3: 100% system, 0% softirq, 1% hardirq, 0% idle
[ 128.528399] #4: 100% system, 0% softirq, 0% hardirq, 0% idle
[ 128.528400] #5: 100% system, 0% softirq, 1% hardirq, 0% idle
[ 128.528606] watchdog: BUG: soft lockup - CPU#27 stuck for 52s! [chrome:4716]
[ 128.528608] CPU#27 Utilization every 4000ms during lockup:
[ 128.528609] #1: 100% system, 0% softirq, 1% hardirq, 0% idle
[ 128.528610] #2: 100% system, 0% softirq, 0% hardirq, 0% idle
[ 128.528611] #3: 100% system, 0% softirq, 1% hardirq, 0% idle
[ 128.528612] #4: 100% system, 0% softirq, 0% hardirq, 0% idle
[ 128.528612] #5: 100% system, 0% softirq, 1% hardirq, 0% idle
[ 128.528799] watchdog: BUG: soft lockup - CPU#17 stuck for 52s! [tosu.exe:4819]
[ 128.528801] CPU#17 Utilization every 4000ms during lockup:
[ 128.528802] #1: 100% system, 0% softirq, 0% hardirq, 0% idle
[ 128.528803] #2: 100% system, 0% softirq, 1% hardirq, 0% idle
[ 128.528804] #3: 100% system, 0% softirq, 0% hardirq, 0% idle
[ 128.528805] #4: 100% system, 0% softirq, 1% hardirq, 0% idle
[ 128.528806] #5: 100% system, 0% softirq, 0% hardirq, 0% idle
[ 128.528971] watchdog: BUG: soft lockup - CPU#16 stuck for 52s! [osu!.exe:7848]
[ 128.528972] CPU#16 Utilization every 4000ms during lockup:
[ 128.528973] #1: 100% system, 0% softirq, 1% hardirq, 0% idle
[ 128.528973] #2: 100% system, 0% softirq, 0% hardirq, 0% idle
[ 128.528974] #3: 100% system, 0% softirq, 1% hardirq, 0% idle
[ 128.528975] #4: 100% system, 0% softirq, 0% hardirq, 0% idle
[ 128.528976] #5: 100% system, 0% softirq, 1% hardirq, 0% idle
[ 128.529080] watchdog: BUG: soft lockup - CPU#25 stuck for 52s! [ServiceWorker t:4437]
[ 128.529082] CPU#25 Utilization every 4000ms during lockup:
[ 128.529083] #1: 100% system, 0% softirq, 1% hardirq, 0% idle
[ 128.529084] #2: 100% system, 0% softirq, 1% hardirq, 0% idle
[ 128.529085] #3: 100% system, 0% softirq, 0% hardirq, 0% idle
[ 128.529086] #4: 100% system, 0% softirq, 1% hardirq, 0% idle
[ 128.529087] #5: 100% system, 0% softirq, 0% hardirq, 0% idle
[ 128.529237] watchdog: BUG: soft lockup - CPU#11 stuck for 48s! [kworker/u130:5:917]
[ 128.529238] CPU#11 Utilization every 4000ms during lockup:
[ 128.529238] #1: 100% system, 0% softirq, 0% hardirq, 0% idle
[ 128.529239] #2: 100% system, 0% softirq, 1% hardirq, 0% idle
[ 128.529240] #3: 100% system, 0% softirq, 0% hardirq, 0% idle
[ 128.529241] #4: 100% system, 0% softirq, 0% hardirq, 0% idle
[ 128.529242] #5: 100% system, 0% softirq, 1% hardirq, 0% idle
[ 128.529439] watchdog: BUG: soft lockup - CPU#13 stuck for 48s! [kworker/u130:0:212]
[ 128.529440] CPU#13 Utilization every 4000ms during lockup:
[ 128.529441] #1: 100% system, 0% softirq, 0% hardirq, 0% idle
[ 128.529442] #2: 100% system, 0% softirq, 0% hardirq, 0% idle
[ 128.529443] #3: 100% system, 0% softirq, 1% hardirq, 0% idle
[ 128.529444] #4: 100% system, 0% softirq, 0% hardirq, 0% idle
[ 128.529445] #5: 100% system, 0% softirq, 0% hardirq, 0% idle
[ 128.529640] watchdog: BUG: soft lockup - CPU#12 stuck for 52s! [osu!.exe:8616]
[ 128.529641] CPU#12 Utilization every 4000ms during lockup:
[ 128.529642] #1: 100% system, 0% softirq, 0% hardirq, 0% idle
[ 128.529643] #2: 100% system, 0% softirq, 1% hardirq, 0% idle
[ 128.529643] #3: 100% system, 0% softirq, 0% hardirq, 0% idle
[ 128.529644] #4: 100% system, 0% softirq, 1% hardirq, 0% idle
[ 128.529645] #5: 100% system, 0% softirq, 0% hardirq, 0% idle
[ 128.529810] watchdog: BUG: soft lockup - CPU#15 stuck for 52s! [chrome:4241]
[ 128.529811] CPU#15 Utilization every 4000ms during lockup:
[ 128.529812] #1: 100% system, 0% softirq, 1% hardirq, 0% idle
[ 128.529813] #2: 100% system, 0% softirq, 0% hardirq, 0% idle
[ 128.529813] #3: 100% system, 0% softirq, 1% hardirq, 0% idle
[ 128.529814] #4: 100% system, 0% softirq, 0% hardirq, 0% idle
[ 128.529815] #5: 100% system, 0% softirq, 1% hardirq, 0% idle
[ 128.529931] watchdog: BUG: soft lockup - CPU#29 stuck for 52s! [kworker/u130:6:918]
[ 128.529932] CPU#29 Utilization every 4000ms during lockup:
[ 128.529933] #1: 100% system, 0% softirq, 0% hardirq, 0% idle
[ 128.529934] #2: 100% system, 0% softirq, 0% hardirq, 0% idle
[ 128.529935] #3: 100% system, 0% softirq, 1% hardirq, 0% idle
[ 128.529936] #4: 100% system, 0% softirq, 0% hardirq, 0% idle
[ 128.529937] #5: 100% system, 0% softirq, 1% hardirq, 0% idle
[ 132.528386] watchdog: BUG: soft lockup - CPU#8 stuck for 49s! [worker:8408]
[ 132.528387] CPU#8 Utilization every 4000ms during lockup:
[ 132.528388] #1: 100% system, 0% softirq, 0% hardirq, 0% idle
[ 132.528389] #2: 100% system, 0% softirq, 1% hardirq, 0% idle
[ 132.528390] #3: 100% system, 0% softirq, 0% hardirq, 0% idle
[ 132.528391] #4: 100% system, 0% softirq, 1% hardirq, 0% idle
[ 132.528392] #5: 100% system, 0% softirq, 0% hardirq, 0% idle
[ 132.528552] watchdog: BUG: soft lockup - CPU#14 stuck for 49s! [worker:8476]
[ 132.528554] CPU#14 Utilization every 4000ms during lockup:
[ 132.528555] #1: 100% system, 0% softirq, 0% hardirq, 0% idle
[ 132.528556] #2: 100% system, 0% softirq, 1% hardirq, 0% idle
[ 132.528557] #3: 100% system, 0% softirq, 0% hardirq, 0% idle
[ 132.528558] #4: 100% system, 0% softirq, 1% hardirq, 0% idle
[ 132.528559] #5: 100% system, 0% softirq, 0% hardirq, 0% idle
Last edited by foraphe (Today 03:06:14)
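For anyone triaging a log like the one above, here is a minimal Python sketch that condenses the watchdog lines into one entry per CPU. It assumes the message format shown in this thread; `summarize_lockups` and `SAMPLE` are names made up for illustration, and the regular expression may need adjusting for other kernel versions.

```python
import re

# Matches lines such as:
# [ 128.529237] watchdog: BUG: soft lockup - CPU#11 stuck for 48s! [kworker/u130:5:917]
LOCKUP_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+watchdog: BUG: soft lockup - "
    r"CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s! \[(?P<task>.+):(?P<pid>\d+)\]"
)

# Three lines copied from the log in this thread, used as sample input.
SAMPLE = """\
[ 108.528543] watchdog: BUG: soft lockup - CPU#26 stuck for 23s! [chrome:4247]
[ 108.528545] CPU#26 Utilization every 4000ms during lockup:
[ 128.529237] watchdog: BUG: soft lockup - CPU#11 stuck for 48s! [kworker/u130:5:917]
"""


def summarize_lockups(log_text):
    """Return {cpu: (longest stuck seconds, task name, pid)} per reported CPU."""
    worst = {}
    for line in log_text.splitlines():
        m = LOCKUP_RE.search(line)
        if not m:
            continue  # utilization lines and unrelated messages are skipped
        cpu = int(m["cpu"])
        secs = int(m["secs"])
        # Keep only the longest stall seen for each CPU.
        if cpu not in worst or secs > worst[cpu][0]:
            worst[cpu] = (secs, m["task"], int(m["pid"]))
    return worst


if __name__ == "__main__":
    for cpu, (secs, task, pid) in sorted(summarize_lockups(SAMPLE).items()):
        print(f"CPU#{cpu}: stuck {secs}s in {task} (pid {pid})")
```

Feeding it the full text of `journalctl -k` or `dmesg` output instead of `SAMPLE` makes it easier to see whether the stalls hit every core at once or cluster on a few.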
Obviously
[ 9.126739] amdgpu: Overdrive is enabled, please disable it before reporting any bugs unrelated to overdrive.
I don't think it's the ryzen bug, but "iommu=soft" might actually do something, and because of the "somewhat large number of tasks" also add "maxcpus=15" (no HT and skip one core altogether).
Also please post your complete system journal:
sudo journalctl -b | curl -s -H "Accept: application/json, */*" --upload-file - 'https://paste.c-net.org/'
and the output of "lsmod".
Edit: just for a general overview and to round up the usual suspects.
Last edited by seth (Today 09:04:38)
My hypothesis might have been wrong from the beginning, and this might instead be a signal integrity issue that results in platform hangs.
Before I had time to test the now "old" system, I already ran into problems on the new one, which might help figure out what happened. With a different CPU/motherboard combo and otherwise the same setup (now without any of the kernel options meant to mitigate the freeze), I got AER corrected error log spam on my SSDs and two AER fatal errors on the main GPU within a day. This time the system kept running more or less normally, except that the GPU was gone.
The AER spam was fixed by properly re-routing the SlimSAS cables coming from the PCIe switch. I still encountered one fatal error after doing this, though.
The old Asus motherboard masked AER errors. I tried digging into it in the past with setpci but didn't successfully unmask them.
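As a rough way to check whether a given board reports AER errors at all, here is a minimal Python sketch that reads the per-device AER statistics from sysfs. It assumes the kernel exposes the `aer_dev_correctable`, `aer_dev_nonfatal` and `aer_dev_fatal` files under `/sys/bus/pci/devices/<device>/`, each holding one "name count" pair per line. The function names are made up for illustration.

```python
from pathlib import Path

# Per-device AER statistics files (assumed to be present on this kernel).
AER_FILES = ("aer_dev_correctable", "aer_dev_nonfatal", "aer_dev_fatal")


def parse_aer_stats(text):
    """Parse 'name count' lines; the counter name itself may contain spaces."""
    stats = {}
    for line in text.splitlines():
        parts = line.rsplit(None, 1)  # split off the trailing count only
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def devices_with_errors(root="/sys/bus/pci/devices"):
    """Yield (device, file name, nonzero counters) for devices with AER errors."""
    root_path = Path(root)
    if not root_path.is_dir():
        return
    for dev in sorted(root_path.iterdir()):
        for name in AER_FILES:
            try:
                stats = parse_aer_stats((dev / name).read_text())
            except OSError:
                continue  # no AER statistics for this device, or not readable
            nonzero = {k: v for k, v in stats.items() if v}
            if nonzero:
                yield dev.name, name, nonzero


if __name__ == "__main__":
    for dev, name, counters in devices_with_errors():
        print(dev, name, counters)
```

If the firmware masks the errors or does not let the kernel handle AER, these counters will probably stay at zero even while errors occur, so an empty result on the old board would not rule out link problems.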
Maybe the GPU is actually falling off the bus due to bad SI, and maybe the firmware on Asus motherboards doesn't handle this type of event very cleanly (a device not waking up from deeper power-saving levels in time and a device not responding due to losing a PCIe link "might" be similar enough to trigger the same firmware quirks, i.e. a freeze).
I'll still test the old system on the iGPU and with iommu=soft once I have some time (if this hypothesis is correct, it should fix itself).
seth wrote: "Also please post your complete system journal"
Journal of a boot that ended in a freeze (the kernel options differ slightly between boots because I was trying different combinations): https://paste.c-net.org/ProofingWhatley. As for the lsmod output, I also have to wait until I have more time to set up the system on a test bench.
Last edited by foraphe (Today 11:18:22)