I haven't been able to figure this out so far. I found that USB devices were disconnecting on resume with kernel 5.5, so I downgraded to the -lts kernel. However, I had to move the HDD to a new replacement enclosure, and it appears to take longer to start up, so I'm now getting the disconnect even with the kernels I'd thought were okay.
What I want to know is how to prevent the USB drive from disconnecting on resume and breaking my active mounts. I am able to manually unmount the filesystems on the drive, but that is a significant inconvenience: I have to close all terminals and programs that are accessing the filesystems on that drive, and then make sure the filesystems are mounted again on resume.
Is there some mechanism to tune the USB behavior?
Offline
I made some partial progress on this. I incidentally discovered a way to selectively debug modules, using the dynamic debug interface. I did so this way:
echo 'module xhci_hcd +pflm' > /sys/kernel/debug/dynamic_debug/control
echo 'module usb_storage +pflm' > /sys/kernel/debug/dynamic_debug/control
echo 'module usbcore +pflm' > /sys/kernel/debug/dynamic_debug/control
This led me to discover how the drive was timing out after I tested a suspend (S3). Choice excerpts from the system log:
[1930927.084971] kernel: usbcore:wait_for_connected:3494: usb 2-1: Waited 2000ms for CONNECT
...
[1930931.280990] kernel: xhci_hcd:handle_port_status:1638: xhci_hcd 0000:00:14.0: Port change event, 2-1, id 13, portsc: 0xa021203
The "Waited 2000ms for CONNECT" is the key line here. In the file ./drivers/usb/core/hub.c in the Linux source, there is a while loop in the wait_for_connected function that has a hard-coded delay value of 2000ms. Since 2000 is the max value, it stood to reason that the port didn't power on in time and the timeout was reached. The function fails out (with -19, -ENODEV) and a disconnect is initiated:
...
[1930927.086254] kernel: xhci_hcd:xhci_hub_control:1158: xhci_hcd 0000:00:14.0: Get port status 2-1 read: 0x2a0, return 0x2a0
[1930927.086285] kernel: usbcore:check_port_resume_type:3084: usb usb2-port1: status 0000.02a0 after resume, -19
[1930927.086291] kernel: usbcore:usb_port_resume:3606: usb 2-1: can't resume, status -19
[1930927.086295] kernel: usbcore:hub_port_logical_disconnect:960: usb usb2-port1: logical disconnect
...
[1930927.088664] kernel: usb 2-1: USB disconnect, device number 5
...
[1930927.123147] kernel: blk_update_request: I/O error, [...]
Eventually, the system notices the port status has changed (the device connects) about 3 more seconds later (see above), and the device connects as if it had been newly plugged in. It does not matter that the "USB-persist" feature is set in the kernel for the device, as wait_for_connected is called in response to this kernel feature, and we see above that it couldn't do its job.
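Whether USB-persist is actually enabled for a given device can be checked through sysfs before suspending. A minimal sketch, assuming the drive sits at bus address 2-1 as in the log above; the function name and the optional second argument (an alternative sysfs root, only there so the helper can be tried without the real device) are my own.

```shell
#!/bin/sh
# Print the USB-persist setting of a device: 1 (enabled), 0 (disabled),
# or "unknown" if the attribute is missing (hubs do not expose it).
# $1: device name as shown in the kernel log, e.g. 2-1
# $2: sysfs root (assumption: defaults to the standard location)
usb_persist_state() {
    attr="${2:-/sys/bus/usb/devices}/$1/power/persist"
    if [ -r "$attr" ]; then
        cat "$attr"
    else
        echo unknown
    fi
}

usb_persist_state 2-1
```

A value of 1 only means the kernel will try to keep the device across suspend. As the log shows, that attempt still fails if the port does not report a connection in time.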
I decided to change the hard-coded delay value in the usbcore code to something longer. The timestamp difference suggests at least 3.2 seconds more were needed. I rounded that up to 4, added the initial 2 seconds, and added another 2 for good measure (so, 8000 ms). Testing with a maximum delay of 8000 yielded the following successful results:
[ 54.657780] kernel: usbcore:wait_for_connected:3494: usb 2-3: Waited 5200ms for CONNECT
No partitions on the device are unmounted, as the device "persists" as it should. With my change, there is a noticeable increase in the time it takes for the tasks to unfreeze upon resume, but this is a small price to pay for ensuring that my filesystems stay mounted. There's likely nothing I can do about it, as the delay is directly related to the time it takes for the device to connect.
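The effect of the larger limit can be illustrated with a small model of the polling loop. This is only a userspace sketch, not the kernel code: the 20 ms polling step is how I read the loop in hub.c and should be treated as an assumption, the function name is made up, and the "device" is simply declared connected once a given number of milliseconds has elapsed (nothing actually sleeps).

```shell
#!/bin/sh
# Model of a bounded wait for the port to report a connection.
# $1: maximum wait in ms (2000 in the stock kernel, 8000 after the change)
# $2: time in ms the simulated device needs before it reports a connection
# Prints a message in the style of the kernel debug log and returns
# 0 if the device connected within the limit, non-zero otherwise.
model_wait_for_connected() {
    max_ms="$1"
    ready_ms="$2"
    delay_ms=0
    # Keep polling while the limit is not reached and the device is not ready.
    while [ "$delay_ms" -lt "$max_ms" ] && [ "$delay_ms" -lt "$ready_ms" ]; do
        delay_ms=$((delay_ms + 20))   # assumed polling step
    done
    echo "Waited ${delay_ms}ms for CONNECT"
    [ "$delay_ms" -ge "$ready_ms" ]
}

# A device that needs 5200 ms, first with the stock limit, then with 8000.
model_wait_for_connected 2000 5200 || echo "-> timed out, device would be disconnected"
model_wait_for_connected 8000 5200 && echo "-> connected, device persists"
```

With the 2000 ms limit the model reports "Waited 2000ms" and fails, which matches the first log excerpt; with 8000 it reports "Waited 5200ms", which matches the result after the change.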
So, I have found a "fix" to my issue, but I cannot call it a "solution", as it requires rebuilding the kernel every time I want to upgrade. The usbcore module is also built-in, so there's no way around rebuilding the kernel. It would also break if I upgraded the kernel and forgot to rebuild it. There might be another solution (perhaps the single USB port isn't providing enough power, or some other kernel tunable exists), but I do not know.
Offline
Well done!
According to the comment in "wait_for_connected", the 2000 ms timer is based on known devices at the time the function was introduced: "It has been reported till now that few devices take as long as 2000 ms to train the link after host enabling its VBUS and termination"
Since the usbcore module already has parameters (see /sys/module/usbcore/parameters/), the cleanest solution in my opinion would be to replace the hardcoded 2000 ms with a parameter (if I understood correctly, such a parameter is declared with "module_param" and can be set when the module is loaded). I can only think that one would need to contribute a patch on kernel.org, or get a friendly developer to do it. Regarding persistence across kernel upgrades, I am not sure if the parameter has to be set again, but at least this would be much easier than patching and recompiling the kernel.
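For reference, the parameters usbcore already exposes can be listed from sysfs. A minimal sketch; the function name and the optional sysfs-root argument are my own. Because usbcore is built in, its parameters are normally given on the kernel command line in the form usbcore.<name>=<value> (the existing autosuspend parameter is one example), so a setting placed in the bootloader configuration should carry over across kernel upgrades. This does not imply that a timeout parameter for wait_for_connected exists.

```shell
#!/bin/sh
# List the current parameters of a kernel module as name=value pairs.
# $1: module name, e.g. usbcore
# $2: sysfs module root (assumption: defaults to /sys/module)
list_module_params() {
    dir="${2:-/sys/module}/$1/parameters"
    for f in "$dir"/*; do
        # Skip the unexpanded glob and entries that cannot be read.
        [ -r "$f" ] || continue
        printf '%s=%s\n' "${f##*/}" "$(cat "$f")"
    done
}

list_module_params usbcore
```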
A practical alternative could be to a) try a different USB port (ideally outputting more power?) or b) try a different enclosure - perhaps you have been unlucky with the one selected for the replacement (or it may be partially defective). Now that you have access to the debugging messages, it would be easy to test another one.
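To repeat the measurement with another port or enclosure, the debug setup and the log check could be wrapped in two small helpers. A sketch under assumptions: the function names are mine, the flags and module names are the ones used earlier in the thread, and the log format is taken from the excerpts quoted above.

```shell
#!/bin/sh
# Print one dynamic-debug query per module.
# $1: flag change, e.g. +pflm to enable or -pflm to disable
# remaining arguments: module names
dyndbg_queries() {
    flags="$1"
    shift
    for mod in "$@"; do
        printf 'module %s %s\n' "$mod" "$flags"
    done
}

# Read kernel log lines on stdin and print "<device> <ms>" for every
# "Waited ...ms for CONNECT" message.
connect_waits() {
    sed -n 's/.*usb \([0-9.-]*\): Waited \([0-9]*\)ms for CONNECT.*/\1 \2/p'
}

# Dry run: show the queries, then parse a line quoted in this thread.
dyndbg_queries +pflm xhci_hcd usb_storage usbcore
echo 'kernel: usbcore:wait_for_connected:3494: usb 2-3: Waited 5200ms for CONNECT' | connect_waits
```

As root, each printed query is written to /sys/kernel/debug/dynamic_debug/control one at a time, as in the echo commands earlier. After a suspend/resume cycle, piping the kernel log through connect_waits (for example journalctl -k | connect_waits) shows how long the port took. A value equal to the limit (2000 on a stock kernel) means the wait timed out.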
Offline