
#1 2023-01-05 11:04:49

GRUNT16
Member
Registered: 2023-01-04
Posts: 4

PCIe and USB suddenly die

Greetings, gentlemen. Today I decided to move from Windows to Linux, and my choice fell on Arch. I downloaded the ISO from the official website; the checksum and PGP signature were okay. During installation the system began to crash. It would suddenly (unrelated to anything I was doing) print about three lines of logs to the terminal, become extremely slow and laggy for 5 to 30 seconds, and eventually crash.

Here are some lines from dmesg/journalctl:

янв 05 12:48:06 GRUNT kernel: igb 0000:08:00.0 enp8s0: PCIe link lost
янв 05 12:48:06 GRUNT kernel: xhci_hcd 0000:0b:00.3: xHCI host controller not responding, assume dead
янв 05 12:48:06 GRUNT kernel: xhci_hcd 0000:0b:00.3: HC died; cleaning up
янв 05 12:48:06 GRUNT kernel: usb 3-4: USB disconnect, device number 2
янв 05 12:48:06 GRUNT kernel: clocksource: timekeeping watchdog on CPU1: hpet wd-wd read-back delay of 201328031ns
янв 05 12:48:06 GRUNT kernel: clocksource: wd-tsc-wd read-back delay of 201328381ns, clock-skew test skipped!
янв 05 12:48:06 GRUNT kernel: usb 4-3: USB disconnect, device number 2
янв 05 12:48:06 GRUNT kernel: sdb: detected capacity change from 121610240 to 0
янв 05 12:48:06 GRUNT kernel: pcieport 0000:05:07.0: Unable to change power state from D3hot to D0, device inaccessible

It seems that the PCIe and USB interfaces suddenly die.
I looked up this problem and found someone saying that it could be a power management issue, so I tried adding pcie_aspm=off to the kernel parameters in GRUB. /etc/default/grub now has this line:

GRUB_CMDLINE_LINUX_DEFAULT="loglevel=3 quiet pcie_aspm=off"

I ran grub-mkconfig and rebooted, but no luck: it crashed again. Now I have absolutely no idea what the cause of this issue could be.
This problem occurs on live versions too: both archiso and archcraft-live. But only on Arch.
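
One way to confirm that a parameter such as pcie_aspm=off actually reached the running kernel is to inspect /proc/cmdline after the reboot. The following is a minimal sketch in Python, not part of the original post; the function names `parse_cmdline` and `read_running_cmdline` are made up for illustration.

```python
import shlex
from pathlib import Path
from typing import Optional


def parse_cmdline(cmdline: str) -> dict[str, Optional[str]]:
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params: dict[str, Optional[str]] = {}
    for token in shlex.split(cmdline):
        # Split on the first "=" only, so root=UUID=... keeps its full value.
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def read_running_cmdline(path: str = "/proc/cmdline") -> dict[str, Optional[str]]:
    """Parse the command line the currently running kernel was booted with."""
    return parse_cmdline(Path(path).read_text())
```

On the affected machine, `read_running_cmdline().get("pcie_aspm")` should return "off" if the regenerated GRUB configuration was the one actually booted. If the key is missing, the edit to /etc/default/grub never made it into the boot entry that was used.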

P.S.
There were some other issues with the system that may be related to the main problem:
1) Energy Efficient Ethernet (EEE)
The Ethernet interface would go dead if its EEE parameter was set to on.
2) There is some crappy USB initialization going on during startup.
Example:

янв 04 20:23:53 archlinux kernel: usb 1-4: config 1 interface 0 altsetting 0 has 1 endpoint descriptor, different from the interface descriptor's value: 0
янв 04 20:23:53 archlinux kernel: usb 1-4: config 1 interface 1 altsetting 0 has 1 endpoint descriptor, different from the interface descriptor's value: 0
янв 04 20:23:53 archlinux kernel: usb 1-4: New USB device found, idVendor=2e97, idProduct=1001, bcdDevice= 0.19
янв 04 20:23:53 archlinux kernel: usb 1-4: New USB device strings: Mfr=1, Product=2, SerialNumber=0
янв 04 20:23:53 archlinux kernel: usb 1-4: Product: USB DEVICE
янв 04 20:23:53 archlinux kernel: usb 1-4: Manufacturer: SONiX
янв 04 20:23:53 archlinux kernel: usbhid 1-4:1.0: couldn't find an input interrupt endpoint
янв 04 20:23:53 archlinux kernel: usbhid 1-4:1.1: couldn't find an input interrupt endpoint
янв 04 20:23:53 archlinux kernel: usb 1-5: device descriptor read/64, error -71
янв 04 20:23:53 archlinux kernel: usb 1-5: device descriptor read/64, error -71

These messages don't seem to have any visible effect, though.
---------------------------------------------------------
journalctl -b -1
journalctl -b -3 (-2 was not the crash)
journalctl -b -4
lspci
lsusb


#2 2023-01-05 11:54:12

V1del
Forum Moderator
Registered: 2012-10-16
Posts: 25,259

Re: PCIe and USB suddenly die

All of your logs show that, around the point where you presumably opt to shut down, the clocksource becomes unstable and is switched from tsc to hpet.

I've seen some mentions floating about that this might be caused by a peculiarity with certain setups and the 6.1 kernel. Maybe test the LTS kernel to check whether you can reproduce it there. Another option would be checking for a UEFI firmware update, assuming it's some lower-level issue. And try the kernel's suggestion of adding tsc=unstable to the kernel parameters.
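
To see which clocksource the kernel is using at a given moment, and which ones it could switch to, the standard sysfs files under /sys/devices/system/clocksource/clocksource0 can be read. A minimal Python sketch, assuming that directory is present (the helper name `read_clocksources` is hypothetical):

```python
from pathlib import Path

# Standard sysfs location of the clocksource controls on Linux.
CLOCKSOURCE_DIR = "/sys/devices/system/clocksource/clocksource0"


def read_clocksources(base: str = CLOCKSOURCE_DIR) -> tuple[str, list[str]]:
    """Return (current clocksource, list of available clocksources)."""
    root = Path(base)
    current = (root / "current_clocksource").read_text().strip()
    available = (root / "available_clocksource").read_text().split()
    return current, available
```

Calling `read_clocksources()` shortly after boot and again after the system has been running for a while shows whether the kernel has fallen back from tsc to another source such as hpet.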


#3 2023-01-07 00:18:46

GRUNT16
Member
Registered: 2023-01-04
Posts: 4

Re: PCIe and USB suddenly die

I tried adding tsc=unstable, so now it uses hpet straight away. Everything seems to have worked nicely for about 3 hours now (usually it would crash less than an hour after a restart). I'll leave it for a couple more hours, and if everything's fine I'll mark this as solved.


#4 2023-01-07 11:52:47

GRUNT16
Member
Registered: 2023-01-04
Posts: 4

Re: PCIe and USB suddenly die

Nope, it crashed again. Same symptoms and logs, but without any mention of the clocksource now. Once, however, USB and Ethernet disconnected just like that, without a crash, and I managed to reload them, but almost immediately after that they disconnected again and it crashed.

I tried installing the LTS kernel (5.15.86-1-lts), keeping tsc=unstable. That didn't help either, though the logs are a bit different:

янв 07 17:10:47 GRUNT kernel: hrtimer: interrupt took 604011264 ns
янв 07 17:10:47 GRUNT kernel: xhci_hcd 0000:10:00.3: Frame ID 1969 (reg 929, index 4) beyond range (117, 1011)
янв 07 17:10:48 GRUNT kernel: xhci_hcd 0000:10:00.3: Ignore frame ID field, use SIA bit instead
янв 07 17:10:48 GRUNT kernel: xhci_hcd 0000:10:00.3: Frame ID 1970 (reg 929, index 5) beyond range (117, 1011)
янв 07 17:10:48 GRUNT kernel: xhci_hcd 0000:10:00.3: Ignore frame ID field, use SIA bit instead
янв 07 17:10:50 GRUNT kernel: igb 0000:08:00.0 enp8s0: PCIe link lost
янв 07 17:10:51 GRUNT kernel: xhci_hcd 0000:0b:00.3: xHCI host controller not responding, assume dead
янв 07 17:10:51 GRUNT kernel: xhci_hcd 0000:0b:00.3: HC died; cleaning up
янв 07 17:10:51 GRUNT kernel: pcieport 0000:05:07.0: can't change power state from D3cold to D0 (config space inaccessible)
янв 07 17:10:52 GRUNT kernel: xhci_hcd 0000:10:00.3: Frame ID 490 (reg 5487, index 4) beyond range (687, 1580)
янв 07 17:10:52 GRUNT kernel: xhci_hcd 0000:10:00.3: Ignore frame ID field, use SIA bit instead
янв 07 17:10:52 GRUNT kernel: xhci_hcd 0000:10:00.3: Frame ID 491 (reg 5487, index 5) beyond range (687, 1580)
янв 07 17:10:53 GRUNT kernel: xhci_hcd 0000:10:00.3: Ignore frame ID field, use SIA bit instead
янв 07 17:10:53 GRUNT kernel: pcieport 0000:05:03.0: can't change power state from D3cold to D0 (config space inaccessible)
янв 07 17:10:53 GRUNT kernel: xhci_hcd 0000:10:00.3: Frame ID 1899 (reg 378, index 3) beyond range (48, 942)
янв 07 17:10:53 GRUNT kernel: xhci_hcd 0000:10:00.3: Ignore frame ID field, use SIA bit instead
янв 07 17:10:54 GRUNT kernel: xhci_hcd 0000:10:00.3: Frame ID 1900 (reg 378, index 4) beyond range (48, 942)
янв 07 17:10:54 GRUNT kernel: xhci_hcd 0000:10:00.3: Ignore frame ID field, use SIA bit instead
янв 07 17:10:54 GRUNT kernel: xhci_hcd 0000:10:00.3: Frame ID 1901 (reg 378, index 5) beyond range (48, 942)
янв 07 17:10:55 GRUNT kernel: xhci_hcd 0000:10:00.3: Ignore frame ID field, use SIA bit instead
янв 07 17:10:55 GRUNT kernel: pcieport 0000:05:01.0: can't change power state from D3cold to D0 (config space inaccessible)
янв 07 17:10:55 GRUNT kernel: usb 3-4: USB disconnect, device number 2
янв 07 17:10:55 GRUNT kernel: usb 4-3: USB disconnect, device number 2
янв 07 17:10:55 GRUNT kernel: INFO: NMI handler (perf_event_nmi_handler) took too long to run: 184.961 msecs
янв 07 17:10:55 GRUNT kernel: perf: interrupt took too long (1445021 > 2500), lowering kernel.perf_event_max_sample_rate to 100
янв 07 17:10:55 GRUNT kernel: sdb: detected capacity change from 121610240 to 0
янв 07 17:10:51 GRUNT rtkit-daemon[522]: The canary thread is apparently starving. Taking action.
янв 07 17:10:51 GRUNT rtkit-daemon[522]: Demoting known real-time threads.
янв 07 17:10:51 GRUNT rtkit-daemon[522]: Successfully demoted thread 1135 of process 515.
янв 07 17:10:51 GRUNT rtkit-daemon[522]: Successfully demoted thread 540 of process 515.
янв 07 17:10:51 GRUNT rtkit-daemon[522]: Successfully demoted thread 537 of process 515.
янв 07 17:10:51 GRUNT rtkit-daemon[522]: Successfully demoted thread 535 of process 515.
янв 07 17:10:51 GRUNT rtkit-daemon[522]: Successfully demoted thread 533 of process 515.
янв 07 17:10:51 GRUNT rtkit-daemon[522]: Successfully demoted thread 515 of process 515.
янв 07 17:10:51 GRUNT rtkit-daemon[522]: Demoted 6 threads.
янв 07 17:11:03 GRUNT kernel: perf: interrupt took too long (2116394 > 1806276), lowering kernel.perf_event_max_sample_rate to 100
янв 07 17:11:09 GRUNT kernel: watchdog: BUG: soft lockup - CPU#10 stuck for 26s! [kworker/u64:3:667]
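
Since several different PCI addresses show up in logs like the above, it can help to tally the messages per driver and address and then compare the addresses against the lspci output. A rough Python sketch; the regular expression and the name `count_pci_messages` are illustrative assumptions, and the sample lines are copied from this thread:

```python
import re
from collections import Counter

# "kernel: <driver> <domain:bus:device.function>", e.g. "kernel: xhci_hcd 0000:0b:00.3"
KERNEL_PCI_MSG = re.compile(
    r"kernel: (\S+) ([0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7])"
)

SAMPLE_LINES = [
    "янв 07 17:10:50 GRUNT kernel: igb 0000:08:00.0 enp8s0: PCIe link lost",
    "янв 07 17:10:51 GRUNT kernel: xhci_hcd 0000:0b:00.3: xHCI host controller not responding, assume dead",
    "янв 07 17:10:51 GRUNT kernel: xhci_hcd 0000:0b:00.3: HC died; cleaning up",
    "янв 07 17:10:51 GRUNT kernel: pcieport 0000:05:07.0: can't change power state from D3cold to D0 (config space inaccessible)",
    "янв 07 17:10:55 GRUNT kernel: usb 3-4: USB disconnect, device number 2",
]


def count_pci_messages(lines):
    """Count kernel messages per (driver, PCI address) pair; other lines are skipped."""
    counts = Counter()
    for line in lines:
        match = KERNEL_PCI_MSG.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


if __name__ == "__main__":
    for (driver, address), n in count_pci_messages(SAMPLE_LINES).most_common():
        print(f"{address}  {driver}  {n} message(s)")
```

Lines that do not name a PCI address, such as the "usb 3-4" disconnects, are ignored on purpose, so the tally only covers devices that can be looked up by address.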

I will look for BIOS updates now and check whether that was the cause.
----------------------------------------------------------------------------
journalctl -b -2 (tsc=unstable)
journalctl -b -1 (tsc=unstable and lts kernel)

Last edited by GRUNT16 (2023-01-07 13:32:09)


#5 2023-01-07 16:12:15

GRUNT16
Member
Registered: 2023-01-04
Posts: 4

Re: PCIe and USB suddenly die

The BIOS wasn't the problem. It crashes just the same, maybe even more often than before.
I looked up this line:

watchdog: BUG: soft lockup - CPU#10 stuck for 26s! [kworker/u64:3:667]

It was said that a soft lockup may occur if the system runs out of free memory or there are problems with the swap file. There are about 30 GB of free RAM at the moment of the crash and an 8 GB swap file.
Is there something else I could do to make it work, or is it purely a hardware issue that I should just accept?
UPD:
memtest86 showed no errors.
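
For anyone who wants to rule out memory pressure with numbers rather than by eye, the free RAM and swap figures can be read from /proc/meminfo just before a crash. A small Python sketch, assuming the usual "Name: value kB" layout of that file; the helper names `parse_meminfo` and `kib_to_gib` are made up for illustration:

```python
def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo text into {field: integer value}.

    Most fields are reported in kB; a few (e.g. HugePages_Total) are plain counts.
    """
    info: dict[str, int] = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        fields = rest.split()
        if fields:
            info[name.strip()] = int(fields[0])
    return info


def kib_to_gib(kib: int) -> float:
    """Convert a kB (KiB) value from /proc/meminfo to GiB."""
    return kib / (1024 * 1024)
```

Feeding it the contents of /proc/meminfo and looking at `MemAvailable` and `SwapFree` gives the figures that matter here. Plenty of available memory would not by itself explain the soft lockup, but it does make memory exhaustion an unlikely cause.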

Last edited by GRUNT16 (2023-01-07 18:52:27)

