Hey there
My Dell XPS 7590, which is about 5 years old, has been randomly freezing for about 2 weeks. Until then, the system worked flawlessly.
A freeze usually starts when the network connection via the USB-C docking station suddenly drops out. journalctl then shows the following logs:
Nov 01 13:16:12 archie kernel: xhci_hcd 0000:3a:00.0: Controller not ready at resume -19
Nov 01 13:16:12 archie kernel: xhci_hcd 0000:3a:00.0: PCI post-resume error -19!
Nov 01 13:16:12 archie kernel: xhci_hcd 0000:3a:00.0: HC died; cleaning up
Nov 01 13:16:12 archie kernel: usb 4-1: USB disconnect, device number 2
Nov 01 13:16:12 archie kernel: usb 4-1.3: USB disconnect, device number 3
Nov 01 13:16:12 archie pipewire[2027]: pw.node: (alsa_output.usb-Macronix_Razer_Barracuda_X_2.4_1234-00.iec958-stereo-154) graph xrun not-triggered (0 suppressed)
Nov 01 13:16:12 archie pipewire[2027]: pw.node: (alsa_output.usb-Macronix_Razer_Barracuda_X_2.4_1234-00.iec958-stereo-154) xrun state:0x7275f8723008 pending:4/8 s:23989941171689 a:23989944709898 f:23989944721467 waiting:3538209 process:11569 status:triggered
Nov 01 13:16:12 archie pipewire[2027]: pw.node: (ee_soe_maximizer-121) xrun state:0x7275f8a32008 pending:0/2 s:23990011093721 a:23990011099307 f:23989943951037 waiting:5586 process:18446744073642403346 status:awake
Nov 01 13:16:17 archie kernel: xhci_hcd 0000:3a:00.0: xHCI host controller not responding, assume dead
Nov 01 13:16:17 archie kernel: xhci_hcd 0000:3a:00.0: HC died; cleaning up
Nov 01 13:16:17 archie kernel: xhci_hcd 0000:3a:00.0: Timeout while waiting for configure endpoint command
Nov 01 13:16:17 archie kernel: r8152-cfgselector 4-1.4: USB disconnect, device number 4
Nov 01 13:16:17 archie NetworkManager[1219]: <info> [1730463377.4220] device (enp58s0u1u4): state change: activated -> unmanaged (reason 'unmanaged-link-not-init', managed-type: 'removed')
Nov 01 13:16:17 archie NetworkManager[1219]: <info> [1730463377.4222] dhcp4 (enp58s0u1u4): canceled DHCP transaction
Nov 01 13:16:17 archie NetworkManager[1219]: <info> [1730463377.4222] dhcp4 (enp58s0u1u4): activation: beginning transaction (timeout in 45 seconds)
Nov 01 13:16:17 archie NetworkManager[1219]: <info> [1730463377.4222] dhcp4 (enp58s0u1u4): state changed no lease
Nov 01 13:16:17 archie NetworkManager[1219]: <info> [1730463377.4224] dhcp6 (enp58s0u1u4): canceled DHCP transaction
Nov 01 13:16:17 archie NetworkManager[1219]: <info> [1730463377.4224] dhcp6 (enp58s0u1u4): activation: beginning transaction (timeout in 45 seconds)
Nov 01 13:16:17 archie NetworkManager[1219]: <info> [1730463377.4224] dhcp6 (enp58s0u1u4): state changed no lease
Nov 01 13:16:17 archie NetworkManager[1219]: <warn> [1730463377.4226] device (enp58s0u1u4): set-link: failure to reset link negotiation
Nov 01 13:16:17 archie kdeconnectd[2414]: 2024-11-01T13:16:17 default: Error sending UDP packet: QAbstractSocket::NetworkError
Nov 01 13:16:17 archie NetworkManager[1219]: <info> [1730463377.4279] manager: NetworkManager state is now CONNECTED_LOCAL
This happens with kernel 6.11 as well as with 6.6-lts. The freezes under 6.11 at least feel more frequent though.
The following logs are also suspicious, but they only occur when I have my laptop connected to the dock. These occur approximately every 5 to 30 seconds:
pcieport 0000:00:1b.0: PME: Spurious native interrupt!
Does anyone have any idea why this might be happening? All firmware is up to date. Is this the gradual death of my computer? I would be grateful for any suggestions.
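To see how often the controller actually dies across boots, the journal excerpt above can be scanned mechanically. The following is only a minimal sketch, assuming the default short journalctl output format shown in this post; the function name find_hc_deaths is made up for illustration.

```python
import re

# Matches kernel lines in the short journal format, for example:
# "Nov 01 13:16:12 archie kernel: xhci_hcd 0000:3a:00.0: HC died; cleaning up"
HC_DIED = re.compile(
    r"^(?P<stamp>\w{3} \d{2} \d{2}:\d{2}:\d{2}) \S+ kernel: "
    r"xhci_hcd (?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): HC died"
)


def find_hc_deaths(journal_text):
    """Return (timestamp, pci_address) for every 'HC died' kernel line."""
    hits = []
    for line in journal_text.splitlines():
        match = HC_DIED.match(line.strip())
        if match:
            hits.append((match.group("stamp"), match.group("pci")))
    return hits
```

The text could come from something like journalctl -k -b -1 --no-pager, read into a string and passed to find_hc_deaths().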
lspci -tvv
-[0000:00]-+-00.0 Intel Corporation 8th Gen Core Processor Host Bridge/DRAM Registers
+-01.0-[01]----00.0 NVIDIA Corporation TU117M [GeForce GTX 1650 Mobile / Max-Q]
+-02.0 Intel Corporation CoffeeLake-H GT2 [UHD Graphics 630]
+-04.0 Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Thermal Subsystem
+-08.0 Intel Corporation Xeon E3-1200 v5/v6 / E3-1500 v5 / 6th/7th/8th Gen Core Processor Gaussian Mixture Model
+-12.0 Intel Corporation Cannon Lake PCH Thermal Controller
+-14.0 Intel Corporation Cannon Lake PCH USB 3.1 xHCI Host Controller
+-14.2 Intel Corporation Cannon Lake PCH Shared SRAM
+-15.0 Intel Corporation Cannon Lake PCH Serial IO I2C Controller #0
+-15.1 Intel Corporation Cannon Lake PCH Serial IO I2C Controller #1
+-16.0 Intel Corporation Cannon Lake PCH HECI Controller
+-17.0 Intel Corporation Cannon Lake Mobile PCH SATA AHCI Controller
+-1b.0-[02-3a]----00.0-[03-3a]--+-00.0-[04]----00.0 Intel Corporation JHL6340 Thunderbolt 3 NHI (C step) [Alpine Ridge 2C 2016]
| +-01.0-[05-39]--
| \-02.0-[3a]----00.0 Intel Corporation JHL6340 Thunderbolt 3 USB 3.1 Controller (C step) [Alpine Ridge 2C 2016]
+-1c.0-[3b]----00.0 Intel Corporation Wi-Fi 6 AX200
+-1c.4-[3c]----00.0 Realtek Semiconductor Co., Ltd. RTS525A PCI Express Card Reader
+-1d.0-[3d]----00.0 Micron Technology Inc 2200S NVMe SSD [Cassandra]
+-1f.0 Intel Corporation Cannon Lake LPC Controller
+-1f.3 Intel Corporation Cannon Lake PCH cAVS
+-1f.4 Intel Corporation Cannon Lake PCH SMBus Controller
\-1f.5 Intel Corporation Cannon Lake PCH SPI Controller
lsusb -t
/: Bus 001.Port 001: Dev 001, Class=root_hub, Driver=xhci_hcd/16p, 480M
|__ Port 001: Dev 002, If 0, Class=Human Interface Device, Driver=usbhid, 12M
|__ Port 001: Dev 002, If 1, Class=Human Interface Device, Driver=usbhid, 12M
|__ Port 001: Dev 002, If 2, Class=Human Interface Device, Driver=usbhid, 12M
|__ Port 001: Dev 002, If 3, Class=Human Interface Device, Driver=usbhid, 12M
|__ Port 004: Dev 003, If 0, Class=Wireless, Driver=btusb, 12M
|__ Port 004: Dev 003, If 1, Class=Wireless, Driver=btusb, 12M
|__ Port 005: Dev 004, If 0, Class=Hub, Driver=hub/5p, 480M
|__ Port 002: Dev 013, If 0, Class=Audio, Driver=snd-usb-audio, 12M
|__ Port 002: Dev 013, If 1, Class=Audio, Driver=snd-usb-audio, 12M
|__ Port 002: Dev 013, If 2, Class=Audio, Driver=snd-usb-audio, 12M
|__ Port 002: Dev 013, If 3, Class=Human Interface Device, Driver=usbhid, 12M
|__ Port 003: Dev 008, If 0, Class=Hub, Driver=hub/6p, 480M
|__ Port 002: Dev 010, If 0, Class=Human Interface Device, Driver=usbhid, 12M
|__ Port 002: Dev 010, If 1, Class=Human Interface Device, Driver=usbhid, 12M
|__ Port 004: Dev 011, If 0, Class=Audio, Driver=snd-usb-audio, 480M
|__ Port 004: Dev 011, If 1, Class=Audio, Driver=snd-usb-audio, 480M
|__ Port 004: Dev 011, If 2, Class=Audio, Driver=snd-usb-audio, 480M
|__ Port 004: Dev 011, If 3, Class=Audio, Driver=snd-usb-audio, 480M
|__ Port 005: Dev 012, If 0, Class=Human Interface Device, Driver=usbhid, 480M
|__ Port 005: Dev 009, If 0, Class=Human Interface Device, Driver=usbhid, 480M
|__ Port 007: Dev 005, If 0, Class=Communications, Driver=[none], 12M
|__ Port 007: Dev 005, If 1, Class=CDC Data, Driver=[none], 12M
|__ Port 012: Dev 007, If 0, Class=Video, Driver=uvcvideo, 480M
|__ Port 012: Dev 007, If 1, Class=Video, Driver=uvcvideo, 480M
/: Bus 002.Port 001: Dev 001, Class=root_hub, Driver=xhci_hcd/10p, 10000M
/: Bus 003.Port 001: Dev 001, Class=root_hub, Driver=xhci_hcd/2p, 480M
/: Bus 004.Port 001: Dev 001, Class=root_hub, Driver=xhci_hcd/2p, 10000M
|__ Port 001: Dev 002, If 0, Class=Hub, Driver=hub/4p, 10000M
|__ Port 003: Dev 003, If 0, Class=Hub, Driver=hub/4p, 5000M
|__ Port 004: Dev 004, If 0, Class=Vendor Specific Class, Driver=r8152, 5000M
Offline
x-ref, https://bbs.archlinux.org/viewtopic.php?id=300697
What if you install the latest 6.10 kernel from your cache or from the Arch Linux Archive (https://wiki.archlinux.org/title/Arch_Linux_Archive)?
(This is "safe", the kernel is more or less self-contained - just be aware of OOT modules, you don't have an nvidia GPU, but maybe vbox or so)
Offline
Thanks for the quick response! Yesterday I encountered a crash on kernel 6.10; this time, however, Spotify crashed and for some reason the system became unresponsive. After restarting the computer I ran into a PCI post-resume error -19! again.
I'm now on kernel 6.9 just to make sure it is not some sort of bug that was introduced in one of the newer 6.10 versions. pcieport 0000:00:1b.0: PME: Spurious native interrupt! still occurs from time to time, but far less often... Let's see whether the system still freezes.
Edit: https://bbs.archlinux.org/viewtopic.php?id=300697 indeed seems to have a very similar issue. Will follow that thread as well.
Last edited by dito (2024-11-05 06:23:40)
Offline
My system still randomly crashes, even on kernel 6.9.10. This time, however, there was not even an error in the logs... The last few minutes from journalctl -b -1 are:
Nov 07 07:15:04 archie kernel: pcieport 0000:00:1b.0: PME: Spurious native interrupt!
Nov 07 07:15:38 archie kernel: pcieport 0000:00:1b.0: PME: Spurious native interrupt!
Nov 07 07:15:56 archie kernel: pcieport 0000:00:1b.0: PME: Spurious native interrupt!
Nov 07 07:16:32 archie kernel: pcieport 0000:00:1b.0: PME: Spurious native interrupt!
Nov 07 07:17:58 archie kernel: pcieport 0000:00:1b.0: PME: Spurious native interrupt!
Nov 07 07:18:26 archie kernel: pcieport 0000:00:1b.0: PME: Spurious native interrupt!
Nov 07 07:18:56 archie kernel: pcieport 0000:00:1b.0: PME: Spurious native interrupt!
Nov 07 07:19:56 archie kernel: pcieport 0000:00:1b.0: PME: Spurious native interrupt!
Nov 07 07:20:05 archie kernel: pcieport 0000:00:1b.0: PME: Spurious native interrupt!
Nov 07 07:20:11 archie kernel: pcieport 0000:00:1b.0: PME: Spurious native interrupt!
Nov 07 07:20:32 archie kernel: pcieport 0000:00:1b.0: PME: Spurious native interrupt!
Nov 07 07:20:56 archie kernel: pcieport 0000:00:1b.0: PME: Spurious native interrupt!
Nov 07 07:21:05 archie kernel: pcieport 0000:00:1b.0: PME: Spurious native interrupt!
I'm a bit lost here...
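To put a number on how frequent these messages are, the gaps between consecutive timestamps can be computed. This is a rough sketch, assuming the short journal format above; the year is not part of that format, so the year parameter (2024 here) is an assumption, and pme_intervals is a made-up name.

```python
import re
from datetime import datetime

# Leading "Mon DD HH:MM:SS" timestamp of a short-format journal line
STAMP = re.compile(r"^(\w{3} \d{2} \d{2}:\d{2}:\d{2}) ")


def pme_intervals(journal_text, year=2024):
    """Seconds between consecutive 'Spurious native interrupt' lines.

    The short journal format carries no year, so one is assumed; it only
    matters if the excerpt crosses a year boundary.
    """
    times = []
    for line in journal_text.splitlines():
        if "PME: Spurious native interrupt" not in line:
            continue
        match = STAMP.match(line.strip())
        if match:
            times.append(
                datetime.strptime(f"{year} {match.group(1)}", "%Y %b %d %H:%M:%S")
            )
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]
```

For the first three lines of the excerpt above (07:15:04, 07:15:38, 07:15:56) this yields gaps of 34 and 18 seconds.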
Offline
That's your Thunderbolt controller. The warnings appear every 10-90 s, and this one is probably just the last thing archived because it's so frequent. (Did the system actually freeze at ~07:21? If you rebooted immediately, the subsequent journal should show that.)
Did you ever get the "PCI post-resume error -19!" on the 6.9 kernel?
When the system crashes, can you still reboot using the https://wiki.archlinux.org/title/Keyboa … el_(SysRq) (the entire REISUB sequence, notably "s" will sync the filesystems and hopefully preserve more journal)?
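Before relying on REISUB it may be worth checking that the kernel actually permits those keys, since /proc/sys/kernel/sysrq can hold a bitmask that only enables some of them. Below is a small sketch using the bit values from the kernel's SysRq documentation; the helper names are made up for illustration.

```python
# Bit values documented for /proc/sys/kernel/sysrq (kernel SysRq admin guide).
SYSRQ_BITS = {
    "r": 4,    # unRaw: keyboard control
    "e": 64,   # tErm: signalling of processes
    "i": 64,   # kIll: signalling of processes
    "s": 16,   # Sync filesystems
    "u": 32,   # remount read-only
    "b": 128,  # reBoot
}


def allowed_reisub_keys(mask):
    """Return the REISUB keys permitted by a kernel.sysrq value."""
    if mask == 1:  # the special value 1 enables every SysRq function
        return "reisub"
    return "".join(key for key in "reisub" if mask & SYSRQ_BITS[key])


def read_sysrq_mask(path="/proc/sys/kernel/sysrq"):
    """Read the current kernel.sysrq value from procfs."""
    with open(path) as handle:
        return int(handle.read().strip())
```

Calling allowed_reisub_keys(read_sysrq_mask()) lists which letters of the sequence will do anything on the running system; if "s" is missing, the sync step that is supposed to preserve the journal is silently skipped.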
Offline
That's your Thunderbolt controller. The warnings appear every 10-90 s, and this one is probably just the last thing archived because it's so frequent. (Did the system actually freeze at ~07:21? If you rebooted immediately, the subsequent journal should show that.)
The system was completely unresponsive and unfortunately there are no more logs. The next entry is from the subsequent boot sequence.
Did you ever get the "PCI post-resume error -19!" on the 6.9 kernel?
So far, no. The last error was from 6.10. But then again, I'm not really sure what happened during the last crash, as the log seems to be incomplete.
When the system crashes, can you still reboot using the https://wiki.archlinux.org/title/Keyboa … el_(SysRq) (the entire REISUB sequence, notably "s" will sync the filesystems and hopefully preserve more journal)?
Good idea, will try next time!
Offline
Alright, the post-resume error just happened again:
Nov 14 11:53:06 archie kernel: pcieport 0000:00:1b.0: PME: Spurious native interrupt!
Nov 14 11:53:26 archie kernel: pcieport 0000:00:1b.0: PME: Spurious native interrupt!
Nov 14 11:54:47 archie kernel: pcieport 0000:00:1b.0: PME: Spurious native interrupt!
Nov 14 11:55:00 archie kernel: pcieport 0000:00:1b.0: PME: Spurious native interrupt!
Nov 14 11:55:00 archie kernel: NOHZ tick-stop error: local softirq work is pending, handler #08!!!
Nov 14 11:55:19 archie kernel: pcieport 0000:00:1b.0: PME: Spurious native interrupt!
Nov 14 11:55:47 archie kernel: pcieport 0000:00:1b.0: PME: Spurious native interrupt!
Nov 14 11:56:05 archie kernel: pcieport 0000:00:1b.0: PME: Spurious native interrupt!
Nov 14 11:56:05 archie kernel: xhci_hcd 0000:3a:00.0: Controller not ready at resume -19
Nov 14 11:56:05 archie kernel: xhci_hcd 0000:3a:00.0: PCI post-resume error -19!
Nov 14 11:56:05 archie kernel: xhci_hcd 0000:3a:00.0: HC died; cleaning up
Nov 14 11:56:05 archie kernel: usb 4-1: USB disconnect, device number 2
Nov 14 11:56:11 archie kernel: usb 4-1.3: USB disconnect, device number 3
Nov 14 11:56:11 archie kernel: xhci_hcd 0000:3a:00.0: xHCI host controller not responding, assume dead
Nov 14 11:56:11 archie kernel: xhci_hcd 0000:3a:00.0: HC died; cleaning up
Nov 14 11:56:11 archie kernel: xhci_hcd 0000:3a:00.0: Timeout while waiting for configure endpoint command
Nov 14 11:56:11 archie kernel: r8152-cfgselector 4-1.4: USB disconnect, device number 4
Luckily the system still runs... And this is the first trouble I've had in 4 days on kernel 6.9. However, ethernet-over-thunderbolt is of course broken now until the next restart.
Offline
Assuming it's the TB controller, https://bugzilla.kernel.org/show_bug.cgi?id=216728 or https://bbs.archlinux.org/viewtopic.php?id=252122
Notably https://bugzilla.kernel.org/show_bug.cgi?id=216728#c17 - the later patch seems to have caused a regression but also only delays the resume, so if the device never comes back, that won't help.
You could try your luck w/ the BIOS setting (wake on dock, provided that's available on your system)
Edit: grumpf
cat /proc/acpi/wakeup
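The output of that command is a small whitespace-separated table (device, S-state, status, optional sysfs node), so it is easy to filter for the entries that are allowed to wake the machine. The following is a sketch under the assumption that the file has the usual four-column layout; parse_acpi_wakeup is a made-up helper name.

```python
def parse_acpi_wakeup(text):
    """Parse the table printed by /proc/acpi/wakeup into a list of dicts.

    Assumed layout per line: Device, S-state, Status (*enabled or
    *disabled), and an optional sysfs node such as pci:0000:00:1b.0.
    """
    entries = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 3 or parts[0] == "Device":  # skip header and blanks
            continue
        entries.append(
            {
                "device": parts[0],
                "s_state": parts[1],
                "enabled": parts[2].lstrip("*") == "enabled",
                "sysfs": parts[3] if len(parts) > 3 else None,
            }
        )
    return entries


def enabled_wakeup_nodes(text):
    """Return the sysfs nodes of all entries that may wake the system."""
    return [e["sysfs"] for e in parse_acpi_wakeup(text) if e["enabled"] and e["sysfs"]]
```

Feeding it the contents of /proc/acpi/wakeup shows at a glance whether the root port the dock hangs off (0000:00:1b.0 in the lspci tree above) is among the enabled wakeup sources.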
Last edited by seth (2024-11-15 16:13:47)
Offline
Thanks for the tips seth. Disabling Wake on Dock in the BIOS didn't help either. However, I discovered something interesting while tinkering around over the past few days:
The two error messages
pcieport 0000:00:1b.0: PME: Spurious native interrupt!
and
NOHZ tick-stop error: local softirq work is pending, handler #08!!!
only appear after I have connected the dock to the laptop for the first time. If I remove the dock during operation and plug it in again, there are no more error messages. I have also not noticed any more crashes. It makes no difference whether I boot the laptop with or without the dock. IMO this would now again rather indicate a software problem... However, I don't really know what to do with it now.
I'm currently running kernel 6.12 btw.
Offline
I have also not noticed any more crashes.
Does that mean the overall problem has magically disappeared or only after re-plugging the dock?
About re-plugging the dock and the spurious interrupts, does it help to
- boot
- plug the dock
- wait a few seconds
- echo 1 | sudo tee /sys/bus/pci/rescan
If yes, you could leverage that via some udev rule…
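If the rescan turns out to help, the "wait a few seconds, then rescan" step could be scripted so that a udev rule only has to call one program. This is just a sketch of that step: the delay value is a guess, rescan_pci is a made-up name, and writing to /sys/bus/pci/rescan requires root.

```python
import time
from pathlib import Path


def rescan_pci(delay_s=5.0, rescan_path="/sys/bus/pci/rescan"):
    """Wait, then ask the kernel to rescan all PCI buses.

    Equivalent to: sleep <delay>; echo 1 > /sys/bus/pci/rescan
    The default delay is an assumption, not a tested value.
    """
    time.sleep(delay_s)
    Path(rescan_path).write_text("1\n")
```

The path is a parameter mainly so the function can be tried against a scratch file before pointing it at sysfs.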
Offline
I have to connect the dock to the laptop twice after every restart. Either I
- start the laptop without the dock plugged in
- plug in the dock after startup -> dmesg shows the error
- unplug the dock
- plug the dock back in -> no more errors
or
- start the laptop with the dock plugged in -> dmesg shows the error after startup
- unplug the dock
- plug the dock back in -> no more errors
Unfortunately, a PCI rescan has no effect at all... Should I see something in dmesg?
Offline
Unfortunately, a PCI rescan has no effect at all... Should I see something in dmesg?
No, the hope was that the pci rescan could do the job of the second plug (because you could catch that w/ a udev event…) speaking of which:
https://wiki.archlinux.org/title/Udev#Triggering_events
Does it help to trigger the event? (You can look up its details w/ "udevadm monitor" while actually plugging the dock.)
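Replaying the event could look roughly like the sketch below. The device path is an assumption (the sysfs node of the xHCI controller from the logs, 0000:3a:00.0); the real path and action should be taken from what "udevadm monitor" prints when the dock is plugged in. The function names are made up.

```python
import subprocess


def build_trigger_cmd(devpath, action="add"):
    """Build a 'udevadm trigger' command line for one device path."""
    return ["udevadm", "trigger", f"--action={action}", devpath]


def trigger(devpath, action="add"):
    """Replay a udev event for devpath (needs root)."""
    subprocess.run(build_trigger_cmd(devpath, action), check=True)


# Hypothetical target: the Thunderbolt USB controller seen in the logs.
EXAMPLE_DEVPATH = "/sys/bus/pci/devices/0000:3a:00.0"
```

Running trigger(EXAMPLE_DEVPATH) as root would replay an "add" event for that one device; whether that is the event that matters here is exactly what the monitor output should tell.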
Offline