
#1 2022-12-02 22:10:33

hunter10
Member
Registered: 2022-08-29
Posts: 26

[SOLVED] xhci_hcd: xHCI host not responding to stop endpoint command

Hello!

We're running into an issue on Intel NUC11ATKC4000 computers where all USB devices and Bluetooth stop working. This has happened twice over several months (these 8 computers have been running almost non-stop since October).
The issue has occurred more recently (within the last 30 days), but since it is extremely rare I can't verify whether the kernel version has an effect.

We are using Arch Linux with kernel v5.15.77-1-lts.

These computers are communicating constantly with a USB device via UART communication (not sure if that may have an effect).

From some research, this looks like an old issue. I just wanted to see if anyone has a solution other than the one I found in this thread, https://bbs.archlinux.org/viewtopic.php?id=231078, which rebinds the controller:

echo -n "0000:00:14.0" | tee /sys/bus/pci/drivers/xhci_hcd/unbind
sleep 5
echo -n "0000:00:14.0" | tee /sys/bus/pci/drivers/xhci_hcd/bind
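The same unbind/bind sequence can be wrapped in a small function. This is only a sketch, assuming a POSIX shell and root privileges; the name `xhci_rebind` and its optional driver-directory and delay arguments are hypothetical (the extra arguments exist so it can be tried against a scratch directory). The defaults are the PCI address and driver path used in the commands above.

```sh
#!/bin/sh
# Hypothetical helper: unbind and re-bind a PCI USB controller from the xhci_hcd driver.
# Must run as root when pointed at the real sysfs driver directory.
xhci_rebind() {
    local addr="${1:-0000:00:14.0}"                   # PCI address from the journal lines
    local drv="${2:-/sys/bus/pci/drivers/xhci_hcd}"   # sysfs directory of the xhci_hcd driver
    local delay="${3:-5}"                             # seconds to wait between unbind and bind

    # Writing the address to 'unbind' detaches the controller; every device on it drops off.
    printf '%s' "$addr" > "$drv/unbind" || return 1
    sleep "$delay"
    # Writing it to 'bind' re-attaches the driver, which re-enumerates the bus.
    printf '%s' "$addr" > "$drv/bind" || return 1
}
```

Run as root with no arguments to act on 0000:00:14.0 with a 5 second pause, like the original commands.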

I have also found this issue in the following topics:
https://bbs.archlinux.org/viewtopic.php?id=236536
https://bugs.launchpad.net/ubuntu/+sour … ug/1313279

Relevant journalctl logs:

Nov 29 22:18:09 myuser rtkit-daemon[585]: The canary thread is apparently starving. Taking action.
Nov 29 22:18:09 myuser kernel: xhci_hcd 0000:00:14.0: xHCI host not responding to stop endpoint command.
Nov 29 22:18:09 myuser kernel: xhci_hcd 0000:00:14.0: USBSTS: 0x00000008 EINT
Nov 29 22:18:09 myuser kernel: xhci_hcd 0000:00:14.0: xHCI host controller not responding, assume dead
Nov 29 22:18:09 myuser kernel: xhci_hcd 0000:00:14.0: HC died; cleaning up
Nov 29 22:18:09 myuser kernel: usb 1-5: USB disconnect, device number 2

There is also one log entry from when the Bluetooth service was stopped, about 22 hours earlier:

Nov 29 00:35:38 myuser systemd[1]: Stopping Bluetooth service...
Nov 29 00:35:38 myuser kernel: xhci_hcd 0000:00:14.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.
Nov 29 00:35:38 myuser bluetoothd[15246]: Stopping SDP server
Nov 29 00:35:38 myuser bluetoothd[15246]: Exit
Nov 29 00:35:38 myuser systemd[1]: bluetooth.service: Deactivated successfully.
Nov 29 00:35:38 myuser systemd[1]: Stopped Bluetooth service.

I should also mention that the BIOS was updated from ATJSLCPX.0035.2022.0318.1130 to ATJSLCPX.0037.2022.0715.1547 (see https://downloadmirror.intel.com/740799 … eNotes.pdf). The issue occurred with both BIOS versions.

Last edited by hunter10 (2023-03-22 23:56:34)


#2 2022-12-03 14:37:47

seth
Member
Registered: 2012-09-03
Posts: 51,046


Does rebinding the port help?
I guess "myuser" is meant to be "yourhost" and "EINT" is actually "EINTR"?
Please post verbatim logs, and far more context. The snippet shows the symptom, but nothing that could possibly have triggered it.

Wild guess: disable USB autosuspend, https://wiki.archlinux.org/title/Power_ … utosuspend
"usbcore.autosuspend=-1" disables it, but power management tools will change this value at runtime.

These computers are communicating constantly with a USB device via UART communication (not sure if that may have an effect).

Can you turn one of them into a testcase and remove that device (to see whether that stabilizes the bus)?


#3 2022-12-05 22:41:00

hunter10
Member
Registered: 2022-08-29
Posts: 26


Hey Seth, thanks for the reply.

Yes, "myuser" is just the host name. "EINT", however, is exactly what showed up in the logs.
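For reference: in the xHCI specification, EINT (Event Interrupt) is bit 3 of the USBSTS status register, so `USBSTS: 0x00000008 EINT` means only that bit was set. A minimal decoder sketch, assuming the spec's USBSTS bit layout (HCH=0, HSE=2, EINT=3, PCD=4, SSS=8, RSS=9, SRE=10, CNR=11, HCE=12); the function name is hypothetical.

```sh
# Hypothetical helper: print the flags set in an xHCI USBSTS value.
# Bit positions follow the USBSTS register layout in the xHCI specification.
decode_usbsts() {
    local sts=$(( $1 ))   # accepts hex such as 0x00000008
    local out="" pair bit name
    for pair in 0:HCH 2:HSE 3:EINT 4:PCD 8:SSS 9:RSS 10:SRE 11:CNR 12:HCE; do
        bit=${pair%%:*}
        name=${pair#*:}
        if [ $(( sts & (1 << bit) )) -ne 0 ]; then
            out="$out $name"
        fi
    done
    out=${out# }                  # drop the leading space
    printf '%s\n' "${out:-none}"  # "none" when no known flag is set
}
```

`decode_usbsts 0x00000008` prints `EINT`; a halted controller reporting a host controller error would show HCH and HCE as well.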

Expanded logs after the first xhci_hcd error: https://pastebin.com/er4Xgiwe
Expanded logs after the xHCI controller died: https://pastebin.com/myHJVqa4

I also have several computers that are not connected to the USB device and so never perform UART communication. I'm still waiting to see whether the error ever occurs on those; since I've only seen it twice in a couple of months, it may be quite some time before I know.

For further context: the computers constantly send the USB device a single UART request, which acts as a sort of "watchdog". If a computer ever stops talking over UART, the watchdog device cuts power and re-applies it to restart the computer. For now I'm running tests to make sure these computers can stay up without a hitch for as long as possible, so the device is currently not restarting a computer that stops talking to it.
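A heartbeat of this kind can be sketched as a loop that writes a short request to the serial port at a fixed interval. Everything here is a placeholder assumption: the actual request format, port, and interval are not described in this thread, and `heartbeat` is a hypothetical name.

```sh
# Hypothetical heartbeat sender for a UART-based watchdog.
# port: serial device (or any file for testing); count: number of beats, 0 = forever;
# interval: seconds between beats. The 'PING' payload is a placeholder.
heartbeat() {
    local port=$1 count=$2 interval=${3:-1} i=0
    # The port is opened once for the whole loop: reopening a tty for every write
    # can toggle DTR on some USB-serial adapters.
    {
        while [ "$count" -eq 0 ] || [ "$i" -lt "$count" ]; do
            printf 'PING\n'
            i=$((i + 1))
            sleep "$interval"
        done
    } >> "$port"
}
```

On a real port the line settings would be configured first, e.g. `stty -F /dev/ttyUSB0 115200 raw -echo` followed by `heartbeat /dev/ttyUSB0 0 1` (device path and baud rate are placeholders).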


I'll also try the wild guess and disable USB autosuspend next week (12/12/2022), and hopefully in a couple more months I'll know whether that keeps the error from occurring (hoping so!).
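Once the boot parameter is in place, it can be read back to confirm it took effect. A minimal sketch, assuming usbcore exposes it at /sys/module/usbcore/parameters/autosuspend (idle delay in seconds, -1 meaning autosuspend is disabled); the function name is hypothetical. It only reports the global default: per-device values under each device's power/ directory can still be changed at runtime by power-management tools.

```sh
# Hypothetical check: is USB autosuspend disabled by default?
# usbcore.autosuspend= on the kernel command line shows up as this module parameter.
usb_autosuspend_default() {
    local param=${1:-/sys/module/usbcore/parameters/autosuspend}
    local val
    val=$(cat "$param") || return 1
    if [ "$val" -lt 0 ]; then
        echo "autosuspend disabled by default ($val)"
    else
        echo "autosuspend enabled by default: ${val}s idle delay"
    fi
}
```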


#4 2022-12-06 08:34:05

seth
Member
Registered: 2012-09-03
Posts: 51,046


Nov 29 00:35:38 myuser kernel: xhci_hcd 0000:00:14.0: WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state.

Nothing ahead of this (e.g. DMAR errors)?
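One way to look for what preceded the failure is to save the kernel log of the affected boot and print only DMAR and xhci_hcd lines up to the first "HC died" message. A minimal sketch, assuming the log has been saved to a file; the function name is hypothetical, and the normal boot-time DMAR table lines will match as well.

```sh
# Hypothetical filter: print DMAR and xhci_hcd lines that appear before the first
# "HC died" message in a saved log, to see what led up to the controller failure.
lines_before_hc_death() {
    awk '/HC died/ { exit } /DMAR|xhci_hcd/ { print }' "$1"
}
```

Usage, for the previous boot: `journalctl -k -b -1 --no-pager > kernel.log`, then `lines_before_hc_death kernel.log`.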


#5 2022-12-06 16:42:34

hunter10
Member
Registered: 2022-08-29
Posts: 26


No DMAR errors seen - https://pastebin.com/mXX3WiXH

Here's what I have found in the log related to DMAR:

-- Boot b484b2dffe7644f0aaf4d99a3e157e2b --
Nov 28 15:41:28 simplifire kernel: ACPI: DMAR 0x000000007265D000 000088 (v02 INTEL  NUC11ATB 00000025      01000013)
Nov 28 15:41:28 simplifire kernel: ACPI: Reserving DMAR table memory at [mem 0x7265d000-0x7265d087]
Nov 28 15:41:28 simplifire kernel: DMAR: Host address width 39
Nov 28 15:41:28 simplifire kernel: DMAR: DRHD base: 0x000000fed90000 flags: 0x0
Nov 28 15:41:28 simplifire kernel: DMAR: dmar0: reg_base_addr fed90000 ver 4:0 cap 1c0000c40660462 ecap 49e2ff0505e
Nov 28 15:41:28 simplifire kernel: DMAR: DRHD base: 0x000000fed91000 flags: 0x1
Nov 28 15:41:28 simplifire kernel: DMAR: dmar1: reg_base_addr fed91000 ver 1:0 cap d2008c40660462 ecap f050da
Nov 28 15:41:28 simplifire kernel: DMAR: RMRR base: 0x0000007b800000 end: 0x0000007fffffff
Nov 28 15:41:28 simplifire kernel: DMAR-IR: IOAPIC id 2 under DRHD base  0xfed91000 IOMMU 1
Nov 28 15:41:28 simplifire kernel: DMAR-IR: HPET id 0 under DRHD base 0xfed91000
Nov 28 15:41:28 simplifire kernel: DMAR-IR: Queued invalidation will be enabled to support x2apic and Intr-remapping.
Nov 28 15:41:28 simplifire kernel: DMAR-IR: Enabled IRQ remapping in x2apic mode
Nov 28 15:41:28 simplifire kernel: pci 0000:00:02.0: DMAR: Skip IOMMU disabling for graphics


#6 2023-01-21 13:11:46

zebulon
Member
Registered: 2008-10-20
Posts: 357


seth wrote:

Widl guess: disable USB autosuspend, https://wiki.archlinux.org/title/Power_ … utosuspend
"usbcore.autosuspend=-1" disables it, but power management tools will change this value at runtime.

Hi, I just wanted to let you know this is not a "wild guess". I had the disconnection problem with a Galaxy S10 and found this thread while searching for a solution. Indeed, disabling USB autosuspend fixed the issue. The problem was that, firstly, the S10 was hard to plug in and get recognized in Dolphin. In addition, it disconnected as soon as I was trying to transfer files. So thank you seth!
EDIT: I still get occasional disconnections, but this greatly improved the behaviour of the phone.

Last edited by zebulon (2023-01-21 13:15:42)


#7 2023-01-21 15:19:04

seth
Member
Registered: 2012-09-03
Posts: 51,046


It's a wild guess because it wasn't based on any data, just "this often screws things up, so let's look there".
And unfortunately it doesn't seem to be the cause for the OP either (who, however, runs a very specific setup).

"Galaxy S10", "recognized in Dolphin", "disconnected as soon as I was trying to transfer files" sounds more like an MTP problem. MTP is a notoriously nasty protocol, and issues with full MTP implementations like libmtp seem to be extremely common.
Try using https://aur.archlinux.org/packages/simple-mtpfs instead (I settled on this and never again had any transfer issues).


#8 2023-03-22 23:55:55

hunter10
Member
Registered: 2022-08-29
Posts: 26


Just wanted to update this: the issue has been solved by using usbcore.autosuspend=-1.


#9 2023-04-07 17:55:24

adomino-engineer
Member
Registered: 2023-04-01
Posts: 8


I don't believe this is solved; I've run into the same issue after setting autosuspend to -1 (the per-device counterpart of usbcore.autosuspend=-1) via /etc/udev/rules.d/50-usb_power_save.rules:

ACTION=="add", SUBSYSTEM=="usb", TEST=="power/autosuspend", ATTR{power/autosuspend}="-1"

I double-checked that power/autosuspend was set to -1 via "cat /sys/devices/pci0000:00/0000:00:14.0/usb2/power/autosuspend", "cat /sys/devices/pci0000:00/0000:00:14.0/usb4/power/autosuspend", etc., and the setting appeared to be correct everywhere.
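Instead of reading each path by hand, the same check can be looped over every USB device. A minimal sketch, assuming the standard /sys/bus/usb/devices layout (which includes the usbN root hubs); the function name is hypothetical. It prints power/control ("auto" allows runtime suspend, "on" keeps the device awake) next to power/autosuspend (idle delay in seconds, -1 meaning never).

```sh
# Hypothetical audit: show runtime-PM settings for every USB device under a sysfs root.
usb_pm_report() {
    local root=${1:-/sys/bus/usb/devices} dev
    for dev in "$root"/*; do
        # Skip entries (e.g. some interfaces) without a runtime-PM control file.
        [ -f "$dev/power/control" ] || continue
        printf '%s control=%s autosuspend=%s\n' \
            "${dev##*/}" \
            "$(cat "$dev/power/control")" \
            "$(cat "$dev/power/autosuspend" 2>/dev/null || echo n/a)"
    done
}
```

Any line showing `control=auto` with a non-negative autosuspend value is a device that can still be suspended.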

Also, I have tried disabling PCIe ASPM and PCIe port power management via /etc/default/grub:

GRUB_CMDLINE_LINUX="pcie_aspm=off pcie_port_pm=off"

And I triple-checked in the /sys/devices/pci0000:00/0000:00:14.0 directory that the settings took effect after regenerating /boot/grub/grub.cfg, but these changes didn't seem to resolve the problem.
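A quick way to confirm that the parameters actually reached the running kernel is to look for them in /proc/cmdline. A minimal sketch; the function name is hypothetical, and it assumes grub.cfg was regenerated (on Arch, `grub-mkconfig -o /boot/grub/grub.cfg`) and the machine rebooted.

```sh
# Hypothetical check: report which of the given parameters appear on the kernel command line.
# Returns non-zero if any parameter is missing.
check_cmdline() {
    local cmdline_file=$1; shift
    local cmdline missing=0 p
    # Pad with spaces so each parameter is matched as a whole word.
    cmdline=" $(cat "$cmdline_file") "
    for p in "$@"; do
        case $cmdline in
            *" $p "*) echo "present: $p" ;;
            *) echo "MISSING: $p"; missing=1 ;;
        esac
    done
    return $missing
}
```

Usage: `check_cmdline /proc/cmdline pcie_aspm=off pcie_port_pm=off`.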

That said, these changes greatly reduced how often the issue occurs. For me, it has only ever occurred together with a CPU lockup related to xhci/DMA, which in my opinion could be part of the real underlying problem, especially considering that the person who started this thread also started that thread, with a similar setup if not the same one.

Last edited by adomino-engineer (2023-04-07 18:03:45)

