I see reports suggesting that some recent kernel releases may have introduced a regression. Affected systems worked fine for a long time, but now their USB 3.x controllers randomly disappear along with all attached devices, like this (dmesg/journalctl):
xhci_hcd 0000:00:14.0: xHCI host not responding to stop endpoint command
xhci_hcd 0000:00:14.0: xHCI host controller not responding, assume dead
xhci_hcd 0000:00:14.0: Timeout while waiting for stop endpoint command
xhci_hcd 0000:00:14.0: HC died; cleaning up
usb 3-2: USB disconnect, device number 2
usb 3-3: USB disconnect, device number 15
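For anyone collecting reports, the signature above can be spotted mechanically in saved `dmesg` or `journalctl -k` output. This is only a minimal sketch, not an existing tool: the function name `find_dead_controllers` is made up for illustration, and it assumes the message wording matches the excerpt above.

```python
import re

# Matches xhci_hcd lines, with or without a leading "[timestamp]" prefix.
_XHCI_LINE = re.compile(
    r"xhci_hcd (?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): (?P<msg>.*)"
)

# Assumption: these two phrases, taken from the excerpt above, mark a dead controller.
_DEATH_MARKERS = ("HC died", "assume dead")


def find_dead_controllers(log_text):
    """Return the sorted PCI addresses of xHCI controllers reported dead."""
    dead = set()
    for line in log_text.splitlines():
        m = _XHCI_LINE.search(line)
        if m and any(marker in m.group("msg") for marker in _DEATH_MARKERS):
            dead.add(m.group("pci"))
    return sorted(dead)
```

Feeding it the log excerpt above should report the controller at 0000:00:14.0; plain "USB disconnect" lines on their own are ignored.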
Upstream bug:
https://bugzilla.kernel.org/show_bug.cgi?id=219824
Recent burst of activity in a dormant forum thread:
https://bbs.archlinux.org/viewtopic.php?id=236536&p=2
The thread includes a manual workaround which restores operation on affected systems.
I am not affected, but I'm trying to gather information about this problem.
If anyone is seeing it on hardware which was free of such issues until this year,
1. When did it start, which kernel versions are affected?
2. How often does it happen, and which kernel versions ran for much longer than that interval without problems?
3. Is it still a thing on the latest 6.13 release, on hardware which was free of this issue until this year?
4. Is anyone seeing it on linux-lts 6.12, on hardware which was free of this issue until this year?
5. What sort of XHCI controllers are affected (lspci -nn)?
6. Anything in particular triggering it? Seems to be random, maybe suspend/resume?
7. Can anyone produce dynamic debug log from this event which upstream asked for?
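Regarding question 7: dynamic debug for a module is normally switched on by writing a query to the kernel's dynamic_debug control file. Below is a minimal sketch, assuming a kernel built with dynamic debug support, debugfs mounted at /sys/kernel/debug, root privileges, and that `xhci_hcd` is the module of interest (the upstream bug report is the authority on what exactly was asked for). The helper names are hypothetical.

```python
from pathlib import Path

# Assumption: debugfs is mounted in the usual place.
DEFAULT_CONTROL = Path("/sys/kernel/debug/dynamic_debug/control")


def dynamic_debug_query(module, enable=True):
    """Build a dynamic debug query that turns a module's debug prints on or off."""
    flag = "+p" if enable else "-p"
    return f"module {module} {flag}"


def set_dynamic_debug(module, enable=True, control=DEFAULT_CONTROL):
    """Write the query to the control file (needs root on a real system)."""
    query = dynamic_debug_query(module, enable)
    with open(control, "w") as f:
        f.write(query + "\n")
    return query
```

After `set_dynamic_debug("xhci_hcd")`, the extra messages should show up in dmesg/journalctl the next time the controller dies; `set_dynamic_debug("xhci_hcd", enable=False)` turns them off again.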
Offline
I have been affected by this issue for some days, maybe two weeks.
1. I think it started in 6.13.3 or 6.13.2, but I cannot be very precise.
2. It happens about once per day.
3. Yes it is still happening on 6.13.5
4. I don't run an LTS version.
5. The laptop is Intel-based, a Dell Latitude 7430.
[root@soad ~]# lspci -nn
00:00.0 Host bridge [0600]: Intel Corporation Alder Lake-U15 Host and DRAM Controller [8086:4601] (rev 04)
00:02.0 VGA compatible controller [0300]: Intel Corporation Alder Lake-UP3 GT2 [Iris Xe Graphics] [8086:46a8] (rev 0c)
00:04.0 Signal processing controller [1180]: Intel Corporation Alder Lake Innovation Platform Framework Processor Participant [8086:461d] (rev 04)
00:06.0 PCI bridge [0604]: Intel Corporation 12th Gen Core Processor PCI Express x4 Controller #0 [8086:464d] (rev 04)
00:07.0 PCI bridge [0604]: Intel Corporation Alder Lake-P Thunderbolt 4 PCI Express Root Port #0 [8086:466e] (rev 04)
00:07.1 PCI bridge [0604]: Intel Corporation Alder Lake-P Thunderbolt 4 PCI Express Root Port #1 [8086:463f] (rev 04)
00:08.0 System peripheral [0880]: Intel Corporation 12th Gen Core Processor Gaussian & Neural Accelerator [8086:464f] (rev 04)
00:0d.0 USB controller [0c03]: Intel Corporation Alder Lake-P Thunderbolt 4 USB Controller [8086:461e] (rev 04)
00:0d.2 USB controller [0c03]: Intel Corporation Alder Lake-P Thunderbolt 4 NHI #0 [8086:463e] (rev 04)
00:12.0 Serial controller [0700]: Intel Corporation Alder Lake-P Integrated Sensor Hub [8086:51fc] (rev 01)
00:14.0 USB controller [0c03]: Intel Corporation Alder Lake PCH USB 3.2 xHCI Host Controller [8086:51ed] (rev 01)
00:14.2 RAM memory [0500]: Intel Corporation Alder Lake PCH Shared SRAM [8086:51ef] (rev 01)
00:14.3 Network controller [0280]: Intel Corporation Alder Lake-P PCH CNVi WiFi [8086:51f0] (rev 01)
00:15.0 Serial bus controller [0c80]: Intel Corporation Alder Lake PCH Serial IO I2C Controller #0 [8086:51e8] (rev 01)
00:15.1 Serial bus controller [0c80]: Intel Corporation Alder Lake PCH Serial IO I2C Controller #1 [8086:51e9] (rev 01)
00:16.0 Communication controller [0780]: Intel Corporation Alder Lake PCH HECI Controller [8086:51e0] (rev 01)
00:16.3 Serial controller [0700]: Intel Corporation Alder Lake AMT SOL Redirection [8086:51e3] (rev 01)
00:1f.0 ISA bridge [0601]: Intel Corporation Alder Lake PCH eSPI Controller [8086:5182] (rev 01)
00:1f.3 Audio device [0403]: Intel Corporation Alder Lake PCH-P High Definition Audio Controller [8086:51c8] (rev 01)
00:1f.4 SMBus [0c05]: Intel Corporation Alder Lake PCH-P SMBus Host Controller [8086:51a3] (rev 01)
00:1f.5 Serial bus controller [0c80]: Intel Corporation Alder Lake-P PCH SPI Controller [8086:51a4] (rev 01)
01:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller PM9A1/PM9A3/980PRO [144d:a80a]
6. Seems to be random but I was not very attentive TBH.
7. Not now because I'm at my office but I will try to do it tomorrow.
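To make `lspci -nn` listings like the one above easier to compare across reports, the USB controllers can be picked out by PCI class code 0c03. A rough sketch follows; the function name is hypothetical and the regex assumes the plain `lspci -nn` line layout shown above. Note that class 0c03 covers every USB host controller, not only xHCI, so Thunderbolt/USB4 entries are returned as well.

```python
import re

# One line of `lspci -nn`: slot, class name [class code]: device name [vendor:device] (rev xx)
_LSPCI_NN = re.compile(
    r"^(?P<slot>\S+) (?P<cls>.+?) \[(?P<code>[0-9a-f]{4})\]: "
    r"(?P<name>.+) \[(?P<ids>[0-9a-f]{4}:[0-9a-f]{4})\](?: \(rev [0-9a-f]+\))?$"
)

USB_CLASS = "0c03"  # PCI class code for USB controllers


def usb_controllers(lspci_nn_output):
    """Return (slot, vendor:device, name) for every USB controller line."""
    found = []
    for line in lspci_nn_output.splitlines():
        m = _LSPCI_NN.match(line.strip())
        if m and m.group("code") == USB_CLASS:
            found.append((m.group("slot"), m.group("ids"), m.group("name")))
    return found
```

On the listing above this should return the entries at 00:0d.0, 00:0d.2 and 00:14.0; the last one, [8086:51ed], is the controller named in the error messages.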
Offline
Could you try the previous versions from the Arch Linux Archive?
sudo pacman -U https://archive.archlinux.org/packages/l/linux/linux-6.13.2.arch1-1-x86_64.pkg.tar.zst
sudo pacman -U https://archive.archlinux.org/packages/l/linux/linux-6.13.3.arch1-1-x86_64.pkg.tar.zst
You can try the other versions by just adapting the URL as needed.
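A small sketch of what "adapting the URL" means. It assumes every version follows the same `arch1-1` package naming as the two URLs above, which is not guaranteed (some releases may carry a different suffix), so check the archive listing if a URL does not resolve. The function name is hypothetical.

```python
ARCHIVE_BASE = "https://archive.archlinux.org/packages/l/linux"


def archive_url(kernel_version, pkg_suffix="arch1-1"):
    """Build an Arch Linux Archive URL for the linux package, e.g. for "6.13.2"."""
    # Assumption: the suffix matches the pattern of the URLs quoted above.
    return f"{ARCHIVE_BASE}/linux-{kernel_version}.{pkg_suffix}-x86_64.pkg.tar.zst"
```

The result can be passed to `sudo pacman -U` exactly like the two URLs above.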
Last edited by gromit (2025-03-03 15:11:28)
Offline
Yes, I can do that. But since I can't reproduce the issue "on demand", I'm going to test each version for a day.
Offline
Uh, bugs that are hard to reproduce are notoriously hard to debug.
Offline
Yep, sounds like one of them...
Offline
Hi, thanks for the response.
Turns out, there is indeed a brand new bug in the 6.13 series which may cause such problems at some random time after resuming from suspend.
https://web.git.kernel.org/pub/scm/linu … b3f9e57e3b
If that sounds like your case, switching to linux-lts until the fix reaches Arch or preemptively doing the unbind/bind trick after resume should prevent random failures.
If you never suspend or if 6.12 starts having these problems, that will be something to worry about.
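For reference, a generic sketch of an unbind/bind cycle through sysfs. The exact steps from the linked forum thread are not reproduced here, so treat this as an approximation: it assumes the controller is bound to the `xhci_hcd` PCI driver, that the address is the 0000:00:14.0 seen in the logs above, and that it runs as root. The function name and the settle delay are made up for illustration.

```python
import time
from pathlib import Path

# Assumption: the controller is bound to the xhci_hcd PCI driver.
XHCI_DRIVER_DIR = Path("/sys/bus/pci/drivers/xhci_hcd")


def rebind_xhci(pci_addr, driver_dir=XHCI_DRIVER_DIR, settle_seconds=1.0):
    """Detach the controller from its driver, wait briefly, then attach it again."""
    driver_dir = Path(driver_dir)
    (driver_dir / "unbind").write_text(pci_addr)
    time.sleep(settle_seconds)  # arbitrary pause, not a documented requirement
    (driver_dir / "bind").write_text(pci_addr)
```

Calling `rebind_xhci("0000:00:14.0")` after resume would be the preemptive variant mentioned above. All devices on that controller disconnect briefly, so avoid doing it while a USB disk is mounted.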
Last edited by mmy8x (2025-03-04 19:12:25)
Offline
I did not trigger the bug yesterday on 6.13.2, so I'm staying on this version today.
mmy8x wrote:Turns out, there is indeed a brand new bug in the 6.13 series which may cause such problems at some random time after resuming from suspend.
https://web.git.kernel.org/pub/scm/linu … b3f9e57e3b
That's interesting. I don't suspend my laptop manually, I use Sway and I don't have any swayidle call or similar tools in my config. I'll try to suspend my system to see if I reproduce the issue.
Another thing I did not mention is that the affected USB port is also my charging port. It is connected to my monitor which acts as a USB hub and power adapter.
Edit: I just realized that on the day the bug occurred, I unplugged my laptop to attend a meeting in another room, and I may have closed the lid before plugging it back in. That may have triggered suspend.
Last edited by Cyb3rD4d (2025-03-05 09:35:58)
Offline
This is being tracked here, help is needed:
https://bugzilla.kernel.org/show_bug.cgi?id=219824
One of the USB subsystem maintainers said reverting https://web.git.kernel.org/pub/scm/linu … 20c051f335 should be tried.
Please try reverting this commit with `patch -R` and check whether you can still reproduce the issue.
The patch to fix the issue (a revert) has been queued for 6.13.6:
https://lore.kernel.org/lkml/2025030408 … f@foxbook/
If you want to fix the issue ASAP, apply it.
Last edited by birdie-github (2025-03-06 12:06:43)
Offline
birdie-github wrote:The patch to fix the issue (a revert) has been queued for 6.13.6:
Please link to the revert in the 6.13 queue. As of writing, the patch has not been accepted into Linus's tree (https://web.git.kernel.org/pub/scm/linu … ost/xhci.c), which is a prerequisite for it being queued for stable, and I do not see it in https://web.git.kernel.org/pub/scm/linu … queue-6.13.
Edit:
Also not listed in 6.13.6-rc2 (https://lore.kernel.org/stable/20250306 … ation.org/), so it will probably land in 6.13.7 or 6.13.8. Alternatively, ask the Arch Linux package maintainers to pull the commit sooner by opening an issue on Arch's GitLab instance.
Last edited by loqs (2025-03-06 19:02:42)
Offline
The revert will be applied to the arch specific patches.
Offline
birdie-github wrote:The patch to fix the issue (a revert) has been queued for 6.13.6:
Please link to the revert in the 6.13 queue. As of writing the patch has not been accepted in Linus's tree https://web.git.kernel.org/pub/scm/linu … ost/xhci.c which is a prerequisite for it being queued for stable and I do not see it in https://web.git.kernel.org/pub/scm/linu … queue-6.13.
Edit:
Also not listed in 6.13.6-rc2 https://lore.kernel.org/stable/20250306 … ation.org/ so probably 6.13.7 or 6.13.8 or ask the Arch linux package maintainers to pull the commit sooner by opening an issue on Arch's gitlab instance.
I confirm that the problem is still present with 6.13.6.
Arch on an HP notebook.
This morning, dmesg:
[192782.829393] xhci_hcd 0000:00:14.0: xHCI host not responding to stop endpoint command
[192782.829420] xhci_hcd 0000:00:14.0: xHCI host controller not responding, assume dead
[192782.829448] xhci_hcd 0000:00:14.0: HC died; cleaning up
[192782.829447] xhci_hcd 0000:00:14.0: Timeout while waiting for stop endpoint command
[192782.829482] usb 3-2: USB disconnect, device number 2
[192782.835615] usb 3-5: USB disconnect, device number 13
Offline
The known bug is not fixed in upstream 6.13.6; at best, the fix may land in 6.13.7 sometime in the near future.
However, linux-6.13.6.arch1-1 appears to have the fix applied:
https://github.com/archlinux/linux/rele … .patch.zst
and in my testing the known bug no longer reproduces on this Arch kernel:
Name : linux
Version : 6.13.6.arch1-1
Description : The Linux kernel and modules
Architecture : x86_64
URL : https://github.com/archlinux/linux
Licenses : GPL-2.0-only
Groups : None
Provides : KSMBD-MODULE VIRTUALBOX-GUEST-MODULES WIREGUARD-MODULE
Depends On : coreutils initramfs kmod
Optional Deps : linux-firmware: firmware images needed for some devices [installed]
                scx-scheds: to use sched-ext schedulers
                wireless-regdb: to set the correct wireless channels of your country [installed]
Required By : None
Optional For : base
Conflicts With : None
Replaces : virtualbox-guest-modules-arch wireguard-arch
Installed Size : 138.39 MiB
Packager : Jan Alexander Steffens (heftig) <heftig@archlinux.org>
Build Date : Fri Mar 7 21:19:00 2025
Install Date : Tue Mar 11 08:55:44 2025
Install Reason : Explicitly installed
Install Script : No
Validated By : Signature
So a few questions:
Is this the exact package you are using? Are you running Arch, or something "Arch-like" with its own kernel package?
Are you using system suspend? The known bug only triggers after resuming; if you never suspend, you cannot possibly be affected by it and this patch won't help.
Is this a problem unique to 6.13? Did it exist before? Does it exist on linux-lts?
What's the hardware involved? I have seen plenty of those 0000:00:14.0 reports recently; it seems to be some Intel platform IIRC, is it not? There may still be some other problem, possibly triggered by particular HW.
Offline
Just to confirm, the patch shipped with linux 6.13.6.arch1-1 fixes the problem for me.
Offline
Problem fixed in 6.13.7
https://cdn.kernel.org/pub/linux/kernel … Log-6.13.7
...
commit 80cb8e694110dee4ac6fbf0956ba7439aeb0603d
Author: Michal Pecio <michal.pecio@gmail.com>
Date: Tue Mar 4 13:31:47 2025 +0200
usb: xhci: Fix host controllers "dying" after suspend and resume
commit c7c1f3b05c67173f462d73d301d572b3f9e57e3b upstream.
A recent cleanup went a bit too far and dropped clearing the cycle bit
of link TRBs, so it stays different from the rest of the ring half of
the time. Then a race occurs: if the xHC reaches such link TRB before
more commands are queued, the link's cycle bit unintentionally matches
the xHC's cycle so it follows the link and waits for further commands.
If more commands are queued before the xHC gets there, inc_enq() flips
the bit so the xHC later sees a mismatch and stops executing commands.
...
Offline
Honstyxi wrote:I confirm that with 6.13.6 there is still problem.
Honstyxi wrote:Problem fixed in 6.13.7
So which is it? On Arch, both releases include this patch.
Either you aren't using Arch, or you haven't rebooted after upgrading to 6.13.6, or you found another bug that's still unfixed.
Offline
Honstyxi wrote:I confirm that with 6.13.6 there is still problem.
Honstyxi wrote:Problem fixed in 6.13.7
So which is it? On Arch, both releases include this patch.
Either you aren't using Arch, or you haven't rebooted after upgrading to 6.13.6, or you found another bug that's still unfixed.
I download and compile the kernel directly from kernel.org.
The bug was present up to and including version 6.13.6.
This morning kernel version 6.13.7 came out; I downloaded and compiled it, and now everything is OK.
That's all.
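To sum up what the thread established for 6.13.x: upstream kernels carry the fix from 6.13.7 onwards, while the Arch package already has it in 6.13.6.arch1-1. A minimal sketch of that rule follows, with a hypothetical function name. It deliberately answers "unknown" for anything outside the 6.13 series, since the thread does not cover other series, and it assumes any 6.13.6 build whose version string contains "arch" is the patched Arch package.

```python
import re


def has_resume_fix(release):
    """Return True/False for 6.13.x version strings, or None when the thread gives no answer."""
    m = re.match(r"(\d+)\.(\d+)\.(\d+)(.*)", release)
    if not m:
        return None
    major, minor, patch = (int(g) for g in m.group(1, 2, 3))
    suffix = m.group(4)
    if (major, minor) != (6, 13):
        return None  # other series are not covered by the reports in this thread
    if patch >= 7:
        return True  # fixed upstream in 6.13.7
    if patch == 6 and "arch" in suffix:
        return True  # assumption: Arch's 6.13.6 package, which ships the fix as an extra patch
    return False
```

It accepts both `uname -r` style strings (6.13.6-arch1-1) and pacman style versions (6.13.6.arch1-1).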
Offline