You are not logged in.

#1 2017-05-25 18:56:02

cmtonkinson
Member
Registered: 2013-11-15
Posts: 18

Failed boot with 4.11.2-1 when USB device connected - where to post?

Looking for some guidance on how to proceed vis-a-vis what I think may be a bug in kernel USB code. I'd love if someone could help me confirm or disprove that assessment, as I know enough only to be dangerous.

Tried to boot the system last night with 4.11.2-1. Systemd task for load kernel modules timed out. System booted into Gnome login screen, but was totally unresponsive (although the clock did keep up to date). I started unplugging hardware. After a successful boot I saw saw this in journalctl:

May 24 21:36:59 sub.domain.tld systemd-udevd[390]: seq 3714 '/devices/pci0000:00/0000:00:14.0/usb3' is taking a long time
May 24 21:37:29 sub.domain.tld systemd[1]: systemd-modules-load.service: Start operation timed out. Terminating.
May 24 21:37:51 sub.domain.tld kernel: INFO: task kworker/0:1:82 blocked for more than 120 seconds.
May 24 21:37:51 sub.domain.tld kernel:       Tainted: P           O    4.11.2-1-ARCH #1
May 24 21:37:51 sub.domain.tld kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 24 21:37:51 sub.domain.tld kernel: kworker/0:1     D    0    82      2 0x00000000
May 24 21:37:51 sub.domain.tld kernel: Workqueue: usb_hub_wq hub_event [usbcore]
May 24 21:37:51 sub.domain.tld kernel: Call Trace:
May 24 21:37:51 sub.domain.tld kernel:  __schedule+0x22e/0x8e0
May 24 21:37:51 sub.domain.tld kernel:  schedule+0x3d/0x90
May 24 21:37:51 sub.domain.tld kernel:  schedule_timeout+0x224/0x3c0
May 24 21:37:51 sub.domain.tld kernel:  ? lock_timer_base+0x81/0xa0
May 24 21:37:51 sub.domain.tld kernel:  ? __internal_add_timer+0x1f/0x60
May 24 21:37:51 sub.domain.tld kernel:  ? mod_timer+0x178/0x310
May 24 21:37:51 sub.domain.tld kernel:  wait_for_common+0xb9/0x180
May 24 21:37:51 sub.domain.tld kernel:  ? wait_for_common+0xb9/0x180
May 24 21:37:51 sub.domain.tld kernel:  ? wake_up_q+0x80/0x80
May 24 21:37:51 sub.domain.tld kernel:  wait_for_completion+0x1d/0x20
May 24 21:37:51 sub.domain.tld kernel:  xhci_alloc_dev+0xb8/0x2d0 [xhci_hcd]
May 24 21:37:51 sub.domain.tld kernel:  usb_alloc_dev+0x72/0x310 [usbcore]
May 24 21:37:51 sub.domain.tld kernel:  hub_event+0x59e/0x1550 [usbcore]
May 24 21:37:51 sub.domain.tld kernel:  process_one_work+0x1e0/0x490
May 24 21:37:51 sub.domain.tld kernel:  worker_thread+0x48/0x4e0
May 24 21:37:51 sub.domain.tld kernel:  kthread+0x125/0x140
May 24 21:37:51 sub.domain.tld kernel:  ? process_one_work+0x490/0x490
May 24 21:37:51 sub.domain.tld kernel:  ? kthread_create_on_node+0x70/0x70
May 24 21:37:51 sub.domain.tld kernel:  ret_from_fork+0x2c/0x40
May 24 21:38:59 sub.domain.tld systemd[1]: systemd-modules-load.service: State 'stop-final-sigterm' timed out. Killing.
May 24 21:38:59 sub.domain.tld systemd[1]: systemd-modules-load.service: Killing process 356 (systemd-modules) with signal SIGKILL.
May 24 21:38:59 sub.domain.tld systemd-udevd[390]: seq 3714 '/devices/pci0000:00/0000:00:14.0/usb3' killed
May 24 21:38:59 sub.domain.tld systemd-udevd[390]: worker [428] terminated by signal 9 (Killed)
May 24 21:38:59 sub.domain.tld systemd-udevd[390]: worker [428] failed while handling '/devices/pci0000:00/0000:00:14.0/usb3'
May 24 21:39:54 sub.domain.tld kernel: INFO: task kworker/0:1:82 blocked for more than 120 seconds.
May 24 21:39:54 sub.domain.tld kernel:       Tainted: P           O    4.11.2-1-ARCH #1
May 24 21:39:54 sub.domain.tld kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 24 21:39:54 sub.domain.tld kernel: kworker/0:1     D    0    82      2 0x00000000
May 24 21:39:54 sub.domain.tld kernel: Workqueue: usb_hub_wq hub_event [usbcore]
May 24 21:39:54 sub.domain.tld kernel: Call Trace:
May 24 21:39:54 sub.domain.tld kernel:  __schedule+0x22e/0x8e0
May 24 21:39:54 sub.domain.tld kernel:  schedule+0x3d/0x90
May 24 21:39:54 sub.domain.tld kernel:  schedule_timeout+0x224/0x3c0
May 24 21:39:54 sub.domain.tld kernel:  ? lock_timer_base+0x81/0xa0
May 24 21:39:54 sub.domain.tld kernel:  ? __internal_add_timer+0x1f/0x60
-- The start-up result is done.
May 24 21:36:00 sub.domain.tld kernel: input: HDA NVidia HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:1c.7/0000:06:00.1/sound/card1/input17
May 24 21:36:00 sub.domain.tld kernel: input: HDA NVidia HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:1c.7/0000:06:00.1/sound/card1/input18
May 24 21:36:00 sub.domain.tld kernel: input: HDA NVidia HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:1c.7/0000:06:00.1/sound/card1/input19
May 24 21:36:00 sub.domain.tld kernel: input: HDA NVidia HDMI/DP,pcm=9 as /devices/pci0000:00/0000:00:1c.7/0000:06:00.1/sound/card1/input20
May 24 21:36:59 sub.domain.tld systemd-udevd[390]: seq 3714 '/devices/pci0000:00/0000:00:14.0/usb3' is taking a long time
May 24 21:37:29 sub.domain.tld systemd[1]: systemd-modules-load.service: Start operation timed out. Terminating.
May 24 21:37:51 sub.domain.tld kernel: INFO: task kworker/0:1:82 blocked for more than 120 seconds.
May 24 21:37:51 sub.domain.tld kernel:       Tainted: P           O    4.11.2-1-ARCH #1
May 24 21:37:51 sub.domain.tld kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 24 21:37:51 sub.domain.tld kernel: kworker/0:1     D    0    82      2 0x00000000
May 24 21:37:51 sub.domain.tld kernel: Workqueue: usb_hub_wq hub_event [usbcore]
May 24 21:37:51 sub.domain.tld kernel: Call Trace:
May 24 21:37:51 sub.domain.tld kernel:  __schedule+0x22e/0x8e0
May 24 21:37:51 sub.domain.tld kernel:  schedule+0x3d/0x90
May 24 21:37:51 sub.domain.tld kernel:  schedule_timeout+0x224/0x3c0
May 24 21:37:51 sub.domain.tld kernel:  ? lock_timer_base+0x81/0xa0
May 24 21:37:51 sub.domain.tld kernel:  ? __internal_add_timer+0x1f/0x60
May 24 21:37:51 sub.domain.tld kernel:  ? mod_timer+0x178/0x310
May 24 21:37:51 sub.domain.tld kernel:  wait_for_common+0xb9/0x180
May 24 21:37:51 sub.domain.tld kernel:  ? wait_for_common+0xb9/0x180
May 24 21:37:51 sub.domain.tld kernel:  ? wake_up_q+0x80/0x80
May 24 21:37:51 sub.domain.tld kernel:  wait_for_completion+0x1d/0x20
May 24 21:37:51 sub.domain.tld kernel:  xhci_alloc_dev+0xb8/0x2d0 [xhci_hcd]
May 24 21:37:51 sub.domain.tld kernel:  usb_alloc_dev+0x72/0x310 [usbcore]
May 24 21:37:51 sub.domain.tld kernel:  hub_event+0x59e/0x1550 [usbcore]
May 24 21:37:51 sub.domain.tld kernel:  process_one_work+0x1e0/0x490
May 24 21:37:51 sub.domain.tld kernel:  worker_thread+0x48/0x4e0
May 24 21:37:51 sub.domain.tld kernel:  kthread+0x125/0x140
May 24 21:37:51 sub.domain.tld kernel:  ? process_one_work+0x490/0x490
May 24 21:37:51 sub.domain.tld kernel:  ? kthread_create_on_node+0x70/0x70
May 24 21:37:51 sub.domain.tld kernel:  ret_from_fork+0x2c/0x40
May 24 21:38:59 sub.domain.tld systemd[1]: systemd-modules-load.service: State 'stop-final-sigterm' timed out. Killing.
May 24 21:38:59 sub.domain.tld systemd[1]: systemd-modules-load.service: Killing process 356 (systemd-modules) with signal SIGKILL.
May 24 21:38:59 sub.domain.tld systemd-udevd[390]: seq 3714 '/devices/pci0000:00/0000:00:14.0/usb3' killed
May 24 21:38:59 sub.domain.tld systemd-udevd[390]: worker [428] terminated by signal 9 (Killed)
May 24 21:38:59 sub.domain.tld systemd-udevd[390]: worker [428] failed while handling '/devices/pci0000:00/0000:00:14.0/usb3'
May 24 21:39:54 sub.domain.tld kernel: INFO: task kworker/0:1:82 blocked for more than 120 seconds.
May 24 21:39:54 sub.domain.tld kernel:       Tainted: P           O    4.11.2-1-ARCH #1
May 24 21:39:54 sub.domain.tld kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 24 21:39:54 sub.domain.tld kernel: kworker/0:1     D    0    82      2 0x00000000
May 24 21:39:54 sub.domain.tld kernel: Workqueue: usb_hub_wq hub_event [usbcore]
May 24 21:39:54 sub.domain.tld kernel: Call Trace:
May 24 21:39:54 sub.domain.tld kernel:  __schedule+0x22e/0x8e0
May 24 21:39:54 sub.domain.tld kernel:  schedule+0x3d/0x90
May 24 21:39:54 sub.domain.tld kernel:  schedule_timeout+0x224/0x3c0
May 24 21:39:54 sub.domain.tld kernel:  ? lock_timer_base+0x81/0xa0
May 24 21:39:54 sub.domain.tld kernel:  ? __internal_add_timer+0x1f/0x60
May 24 21:39:54 sub.domain.tld kernel:  ? mod_timer+0x178/0x310
May 24 21:39:54 sub.domain.tld kernel:  wait_for_common+0xb9/0x180
May 24 21:39:54 sub.domain.tld kernel:  ? wait_for_common+0xb9/0x180
May 24 21:39:54 sub.domain.tld kernel:  ? wake_up_q+0x80/0x80
May 24 21:39:54 sub.domain.tld kernel:  wait_for_completion+0x1d/0x20
May 24 21:39:54 sub.domain.tld kernel:  xhci_alloc_dev+0xb8/0x2d0 [xhci_hcd]
May 24 21:39:54 sub.domain.tld kernel:  usb_alloc_dev+0x72/0x310 [usbcore]
May 24 21:39:54 sub.domain.tld kernel:  hub_event+0x59e/0x1550 [usbcore]
May 24 21:39:54 sub.domain.tld kernel:  process_one_work+0x1e0/0x490
May 24 21:39:54 sub.domain.tld kernel:  worker_thread+0x48/0x4e0
May 24 21:39:54 sub.domain.tld kernel:  kthread+0x125/0x140
May 24 21:39:54 sub.domain.tld kernel:  ? process_one_work+0x490/0x490
May 24 21:39:54 sub.domain.tld kernel:  ? kthread_create_on_node+0x70/0x70
May 24 21:39:54 sub.domain.tld kernel:  ret_from_fork+0x2c/0x40
May 24 21:39:54 sub.domain.tld kernel: INFO: task systemd-modules:356 blocked for more than 120 seconds.
May 24 21:39:54 sub.domain.tld kernel:       Tainted: P           O    4.11.2-1-ARCH #1
May 24 21:39:54 sub.domain.tld kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
May 24 21:39:54 sub.domain.tld kernel: systemd-modules D    0   356      1 0x00000004
May 24 21:39:54 sub.domain.tld kernel: Call Trace:
May 24 21:39:54 sub.domain.tld kernel:  __schedule+0x22e/0x8e0
May 24 21:39:54 sub.domain.tld kernel:  schedule+0x3d/0x90
May 24 21:39:54 sub.domain.tld kernel:  schedule_preempt_disabled+0x15/0x20
May 24 21:39:54 sub.domain.tld kernel:  __mutex_lock.isra.0+0x2ac/0x530
May 24 21:39:54 sub.domain.tld kernel:  ? sysfs_create_dir_ns+0x40/0x90
May 24 21:39:54 sub.domain.tld kernel:  __mutex_lock_slowpath+0x13/0x20
May 24 21:39:54 sub.domain.tld kernel:  ? __mutex_lock_slowpath+0x13/0x20
May 24 21:39:54 sub.domain.tld kernel:  mutex_lock+0x25/0x30
May 24 21:39:54 sub.domain.tld kernel:  __driver_attach+0x53/0xe0
May 24 21:39:54 sub.domain.tld kernel:  ? driver_probe_device+0x450/0x450
May 24 21:39:54 sub.domain.tld kernel:  bus_for_each_dev+0x6e/0xb0
May 24 21:39:54 sub.domain.tld kernel:  driver_attach+0x1e/0x20
May 24 21:39:54 sub.domain.tld kernel:  bus_add_driver+0x1d0/0x270
May 24 21:39:54 sub.domain.tld kernel:  driver_register+0x60/0xe0
May 24 21:39:54 sub.domain.tld kernel:  usb_register_device_driver+0x51/0xa0 [usbcore]
May 24 21:39:54 sub.domain.tld kernel:  ? 0xffffffffa00cf000
May 24 21:39:54 sub.domain.tld kernel:  usbip_host_init+0x75/0x1000 [usbip_host]
May 24 21:39:54 sub.domain.tld kernel:  do_one_initcall+0x50/0x1a0
May 24 21:39:54 sub.domain.tld kernel:  ? kmem_cache_alloc_trace+0x15e/0x1b0
May 24 21:39:54 sub.domain.tld kernel:  do_init_module+0x5f/0x1eb
May 24 21:39:54 sub.domain.tld kernel:  load_module+0x260a/0x2ae0
May 24 21:39:54 sub.domain.tld kernel:  SyS_init_module+0x173/0x1b0
May 24 21:39:54 sub.domain.tld kernel:  ? SyS_init_module+0x173/0x1b0
May 24 21:39:54 sub.domain.tld kernel:  entry_SYSCALL_64_fastpath+0x1a/0xa9
May 24 21:39:54 sub.domain.tld kernel: RIP: 0033:0x7f09679cbb0a
May 24 21:39:54 sub.domain.tld kernel: RSP: 002b:00007fff0dfa4f28 EFLAGS: 00000246 ORIG_RAX: 00000000000000af
May 24 21:39:54 sub.domain.tld kernel: RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007f09679cbb0a
May 24 21:39:54 sub.domain.tld kernel: RDX: 00007f0967eb1cb5 RSI: 000000000000c238 RDI: 00005598f9dc9d50
May 24 21:39:54 sub.domain.tld kernel: RBP: 0000000000000005 R08: 0000000000000016 R09: 0000000000000089
May 24 21:39:54 sub.domain.tld kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 00007fff0dfa3f20
May 24 21:39:54 sub.domain.tld kernel: R13: 00007fff0dfa3f00 R14: 0000000000000005 R15: 00005598f9dc7e30
May 24 21:40:00 sub.domain.tld systemd-udevd[390]: seq 3721 '/devices/pci0000:00/0000:00:14.0/usb3/3-10' is taking a long time
May 24 21:40:00 sub.domain.tld systemd-udevd[390]: seq 3723 '/devices/pci0000:00/0000:00:14.0/usb3/3-13' is taking a long time
May 24 21:40:00 sub.domain.tld systemd-udevd[390]: seq 3716 '/devices/pci0000:00/0000:00:14.0/usb3/3-1' is taking a long time
May 24 21:40:00 sub.domain.tld systemd-udevd[390]: seq 3744 '/devices/pci0000:00/0000:00:14.0/usb3/3-9' is taking a long time
May 24 21:40:00 sub.domain.tld systemd-udevd[390]: seq 3734 '/devices/pci0000:00/0000:00:14.0/usb3/3-5' is taking a long time
May 24 21:40:29 sub.domain.tld systemd[1]: systemd-modules-load.service: Processes still around after final SIGKILL. Entering failed mode.
May 24 21:40:29 sub.domain.tld systemd[1]: Failed to start Load Kernel Modules.
-- Subject: Unit systemd-modules-load.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit systemd-modules-load.service has failed.
-- 
-- The result is failed.

lspci indicated that 00:14.0 was in fact an Apple AirPods case I had plugged in to charge off one of the front USB3 ports. Rebooting a couple of times (alternating between unplugged and pluggin-in) proved this to be the culprit.

Apple failing to implement a spec correctly, leading to a mal- or non-functioning product is one thing, but IMO that shouldn't cause stack traces and boot failures, which leads to my hypothesis that this could be a bug in the PCI or USB subsystems or the kernel itself.

This isn't a blocker for me (since I can just unplug at boot) but I'd love some help dialing in exactly what's going on here, at least far enough that I know where it should be reported. To the uninitiated, it looks like the errors are coming directly out of the kernel (USB and scheduling code).

Offline

#2 2017-05-25 21:27:05

R00KIE
Forum Fellow
From: Between a computer and a chair
Registered: 2008-09-14
Posts: 4,734

Re: Failed boot with 4.11.2-1 when USB device connected - where to post?

The usual way to deal with these things is check if an older kernel works fine, you can try linux-lts from the repos.

If the older kernel works fine you bisect the kernel to find which commit introduced the regression. If an older kernel does not work there isn't any more homework to do.

The last step is reporting the problem upstream, describe the bug, your hardware how to reproduce the bug and the commit that introduces the regression (if applicable). Then you'll have to work with upstream to test any patches that might fix things.


R00KIE
Tm90aGluZyB0byBzZWUgaGVyZSwgbW92ZSBhbG9uZy4K

Offline

Board footer

Powered by FluxBB