$ git bisect bad
Bisecting: 231 revisions left to test after this (roughly 8 steps)
[af92c02fb2090692f4920ea4b74870940260cf49] Merge patch series "scsi: fixes for targets with many LUNs, and scsi_target_block rework"
https://drive.google.com/file/d/1UnqkZE … sp=sharing linux-6.4rc1.r201.gaf92c02fb209-1-x86_64.pkg.tar.zst
https://drive.google.com/file/d/16b-tw5 … sp=sharing linux-headers-6.4rc1.r201.gaf92c02fb209-1-x86_64.pkg.tar.zst
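For anyone wondering where "roughly 8 steps" comes from: each test halves the remaining range, so the number of steps grows with the base-2 logarithm of the revisions left. The sketch below is only an approximation, not git's own estimator, and the function name `rough_bisect_steps` is made up for illustration. It does agree with the step counts git printed in this thread.

```python
def rough_bisect_steps(revisions_left: int) -> int:
    """Approximate the number of remaining tests as ceil(log2(n + 1)).

    Assumption: a plain binary search over n candidates. git uses its own
    estimate, so treat this as a rule of thumb only.
    """
    if revisions_left < 0:
        raise ValueError("revisions_left must be non-negative")
    # For n >= 0, n.bit_length() equals ceil(log2(n + 1)).
    return revisions_left.bit_length()


if __name__ == "__main__":
    # Revision counts reported by git bisect in this thread.
    for left in (231, 100, 48, 21, 10, 5, 2):
        print(f"{left} revisions left -> about {rough_bisect_steps(left)} steps")
```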
bad: 6.4.0-rc1-1-00201-gaf92c02fb209
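The release string reported after each test shows which bisect build was actually booted: the trailing `-00201-gaf92c02fb209` looks like `git describe` output (commits since the tag, then `g` plus an abbreviated hash). Here is a minimal sketch, assuming that layout, for checking a reported release against the commit that git bisect printed. `KernelRelease`, `parse_release` and `matches_commit` are hypothetical helper names, not part of any tool.

```python
import re
from typing import NamedTuple, Optional


class KernelRelease(NamedTuple):
    base: str               # e.g. "6.4.0-rc1-1"
    commits_since_tag: int  # e.g. 201
    short_sha: str          # e.g. "af92c02fb209"


# Assumed layout: <base>-<zero-padded count>-g<abbreviated hash>
_RELEASE_RE = re.compile(r"^(?P<base>.+)-(?P<count>\d+)-g(?P<sha>[0-9a-f]{7,40})$")


def parse_release(release: str) -> Optional[KernelRelease]:
    """Split a bisect-build release string; return None if it does not match."""
    match = _RELEASE_RE.match(release.strip())
    if match is None:
        return None
    return KernelRelease(match["base"], int(match["count"]), match["sha"])


def matches_commit(release: str, full_sha: str) -> bool:
    """True if the abbreviated hash in the release is a prefix of full_sha."""
    parsed = parse_release(release)
    return parsed is not None and full_sha.lower().startswith(parsed.short_sha)


if __name__ == "__main__":
    reported = "6.4.0-rc1-1-00201-gaf92c02fb209"
    commit = "af92c02fb2090692f4920ea4b74870940260cf49"
    print(parse_release(reported))
    print(matches_commit(reported, commit))
```

A stock release string such as `6.5.2-arch1-1` has no `-g<hash>` suffix, so `parse_release` returns `None` for it.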
$ git bisect bad
Bisecting: 100 revisions left to test after this (roughly 7 steps)
[2e2fe5ac695a00ab03cab4db1f4d6be07168ed9d] scsi: 3w-xxxx: Add error handling for initialization failure in tw_probe()
https://drive.google.com/file/d/1eWiEP1 … sp=sharing linux-6.4rc1.r100.g2e2fe5ac695a-1-x86_64.pkg.tar.zst
https://drive.google.com/file/d/19_r7E_ … sp=sharing linux-headers-6.4rc1.r100.g2e2fe5ac695a-1-x86_64.pkg.tar.zst
bad: 6.4.0-rc1-1-00100-g2e2fe5ac695a
Thanks again for helping with this. I'm sure you've automated the process but I didn't realize this was that much work.
$ git bisect bad
Bisecting: 48 revisions left to test after this (roughly 6 steps)
[8759924ddb93498bd5777f0b05b6bc9cacf4ffe3] Merge patch series "scsi: hisi_sas: Some misc changes"
https://drive.google.com/file/d/1vwvtR0 … sp=sharing linux-6.4rc1.r51.g8759924ddb93-1-x86_64.pkg.tar.zst
https://drive.google.com/file/d/1pI50Vq … sp=sharing linux-headers-6.4rc1.r51.g8759924ddb93-1-x86_64.pkg.tar.zst
I encountered the same problem in my VM that was previously running Linux 6.4.12.
After upgrading the kernel to Linux 6.5.*, the scsi_eh_1 process uses almost 50% CPU.
Is there any update on this, please?
I've upgraded several VMs
- from Linux arch 6.4.12-arch1-1 #1 SMP PREEMPT_DYNAMIC Thu, 24 Aug 2023 00:38:14 +0000 x86_64 GNU/Linux
- to Linux arch 6.5.2-arch1-1 #1 SMP PREEMPT_DYNAMIC Wed, 06 Sep 2023 21:01:01 +0000 x86_64 GNU/Linux
and in all of them the scsi_eh_1 process eats 10-15% CPU.
I can't find any errors in the logs. When I go back to a snapshot with 6.4.12 the problem disappears. The FS is btrfs.
scsi_eh should be the scsi error handler so I've tried to enable logging (https://github.com/ibm-s390-linux/s390- … ging_level) but I see no output in dmesg/journalctl.
How can I find out what is causing this?
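On the logging question: as far as I know, the value written to /proc/sys/dev/scsi/logging_level packs one 3-bit level (0-7) per facility, and messages only appear if the kernel was built with CONFIG_SCSI_LOGGING. The sketch below computes such a value. The shift table is my recollection of drivers/scsi/scsi_logging.h and should be checked against your kernel source; the helper name `scsi_logging_level` is invented here.

```python
# Assumed bit offsets of each facility (3 bits each), as recalled from
# drivers/scsi/scsi_logging.h. Verify against your kernel before relying on it.
SCSI_LOG_SHIFTS = {
    "error": 0,
    "timeout": 3,
    "scan": 6,
    "mlqueue": 9,
    "mlcomplete": 12,
    "llqueue": 15,
    "llcomplete": 18,
    "hlqueue": 21,
    "hlcomplete": 24,
    "ioctl": 27,
}


def scsi_logging_level(**levels: int) -> int:
    """Combine per-facility levels (0-7) into one logging_level integer."""
    value = 0
    for facility, level in levels.items():
        if facility not in SCSI_LOG_SHIFTS:
            raise ValueError(f"unknown facility: {facility}")
        if not 0 <= level <= 7:
            raise ValueError(f"level for {facility} must be 0-7, got {level}")
        value |= level << SCSI_LOG_SHIFTS[facility]
    return value


if __name__ == "__main__":
    # Error-handler and timeout messages at the highest verbosity.
    print(scsi_logging_level(error=7, timeout=7))
```

The printed number is what you would write (as root) to /proc/sys/dev/scsi/logging_level; writing 0 turns the extra logging off again.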
Do you have an optical drive?
Also check your journal, https://bbs.archlinux.org/viewtopic.php … 0#p2120490
$ git bisect bad
Bisecting: 48 revisions left to test after this (roughly 6 steps)
[8759924ddb93498bd5777f0b05b6bc9cacf4ffe3] Merge patch series "scsi: hisi_sas: Some misc changes"
https://drive.google.com/file/d/1vwvtR0 … sp=sharing linux-6.4rc1.r51.g8759924ddb93-1-x86_64.pkg.tar.zst
https://drive.google.com/file/d/1pI50Vq … sp=sharing linux-headers-6.4rc1.r51.g8759924ddb93-1-x86_64.pkg.tar.zst
I'm unable to test this - I downloaded a few times but I always get:
...
(1/2) downgrading linux [############################################################] 100%
error: could not extract /usr/lib/modules/6.4.0-rc1-1-00051-g8759924ddb93/kernel/drivers/gpu/drm/amd/amdgpu/amdgpu.ko.zst (Zstd decompression failed: Data corruption detected)
error: problem occurred while upgrading linux
error: could not commit transaction
error: failed to commit transaction (transaction aborted)
Errors occurred, no packages were upgraded.
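One way to tell a corrupted download from a corrupted upload is to compare checksums on both ends. This is a minimal sketch using only the Python standard library; the function names are hypothetical, and it assumes the uploader posts the SHA-256 of the package they built so the two values can be compared.

```python
import hashlib
import io
from typing import BinaryIO


def sha256_of_stream(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Hash a binary stream in chunks so large packages need not fit in RAM."""
    digest = hashlib.sha256()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        digest.update(chunk)
    return digest.hexdigest()


def sha256_of_file(path: str) -> str:
    """Convenience wrapper for a file on disk, e.g. a downloaded .pkg.tar.zst."""
    with open(path, "rb") as handle:
        return sha256_of_stream(handle)


if __name__ == "__main__":
    # Stand-in data; in practice call sha256_of_file() on the downloaded package.
    demo = io.BytesIO(b"example package bytes")
    print(sha256_of_stream(demo))
```

If the checksum matches on both sides and pacman still reports a Zstd error, the package itself was damaged at build or upload time and needs rebuilding.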
I encountered the same problem in my VM that was previously running Linux 6.4.12.
After upgrading the kernel to Linux 6.5.*, the scsi_eh_1 process uses almost 50% CPU.
Is there any update on this, please?
We are trying to find the cause.
As a workaround you can switch your drives to SATA or try to remove the optical drive.
I cannot reproduce the error on this system. Hopefully rebuilding the package will fix whatever caused the issue.
Rebuilt package:
https://drive.google.com/file/d/1J7UMFX … sp=sharing linux-6.4rc1.r51.g8759924ddb93-1-x86_64.pkg.tar.zst
https://drive.google.com/file/d/1UKNsWm … sp=sharing linux-headers-6.4rc1.r51.g8759924ddb93-1-x86_64.pkg.tar.zst
@laktak, could you confirm that it's directly tied to an optical drive?
Is your setup similar to MoZhonghua's?
good: 6.4.0-rc1-1-00051-g8759924ddb93
I can confirm that the error disappears when I remove the cd-rom even though the device is configured as ide1 (there is no ide0) and not scsi:
ide1:0.present = "TRUE"
ide1:0.autodetect = "TRUE"
ide1:0.deviceType = "cdrom-image"
ide1:0.startConnected = "FALSE"
Just to add a "me too"
I found FS#79644 - [linux] Kernel 6.5.2 Causes Marvell Technology Group 88SE9128 PCIe SATA to Constantly Reset which directed me here
Same messages in journal:
Sep 14 17:01:11 nas kernel: ata16: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
Sep 14 17:01:11 nas kernel: ata16.00: configured for UDMA/66
I downgraded the kernel from 6.5.3 to 6.4.12 and the journal messages disappeared.
I stepped back up to 6.5.2 and the messages reappeared, so the regression is definitely between 6.4.12 and 6.5.2. 6.5.3 does not resolve it (for me).
This is on a NAS (ASRock Rack C2550D4I) with no optical drives.
In my case, it's a Marvell 88SE9230:
$ ls -l /sys/class/ata_port/ | grep ata16
lrwxrwxrwx 1 root root 0 Sep 14 21:45 ata16 -> ../../devices/pci0000:00/0000:00:04.0/0000:09:00.0/ata16/ata_port/ata16
$ lspci | grep 09:00
09:00.0 SATA controller: Marvell Technology Group Ltd. 88SE9230 PCIe 2.0 x2 4-port SATA 6 Gb/s RAID Controller (rev 11)
$ cat /sys/bus/pci/devices/0000\:09\:00.0/uevent
DRIVER=ahci
PCI_CLASS=10601
PCI_ID=1B4B:9230
PCI_SUBSYS_ID=1849:9230
PCI_SLOT_NAME=0000:09:00.0
MODALIAS=pci:v00001B4Bd00009230sv00001849sd00009230bc01sc06i01
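The uevent file above is plain KEY=VALUE text, so the controller's vendor and device IDs can be pulled out programmatically when comparing affected systems. Here is a small sketch using the values from this post; `parse_uevent` and `pci_ids` are hypothetical helper names.

```python
from typing import Dict, Tuple


def parse_uevent(text: str) -> Dict[str, str]:
    """Turn the KEY=VALUE lines of a sysfs uevent file into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep:  # skip blank or malformed lines
            fields[key] = value
    return fields


def pci_ids(fields: Dict[str, str]) -> Tuple[int, int]:
    """Return (vendor, device) as integers from the PCI_ID entry."""
    vendor, device = fields["PCI_ID"].split(":")
    return int(vendor, 16), int(device, 16)


if __name__ == "__main__":
    sample = """\
DRIVER=ahci
PCI_CLASS=10601
PCI_ID=1B4B:9230
PCI_SUBSYS_ID=1849:9230
PCI_SLOT_NAME=0000:09:00.0
"""
    fields = parse_uevent(sample)
    vendor, device = pci_ids(fields)
    print(fields["DRIVER"], f"{vendor:04x}:{device:04x}")
```

On a live system the same text could be read from /sys/bus/pci/devices/0000:09:00.0/uevent, the path used in the `cat` command above.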
Last edited by RuneArch (2023-09-14 20:56:28)
$ git bisect good
Bisecting: 21 revisions left to test after this (roughly 5 steps)
[7907ad748bdba8ac9ca47f0a650cc2e5d2ad6e24] Merge patch series "Use block pr_ops in LIO"
[stephen@arch ~/builds/linux-stable/src/linux]$ git bisect visualize
https://drive.google.com/file/d/1jYdoHb … sp=sharing linux-6.4rc1.r78.g7907ad748bdb-1-x86_64.pkg.tar.zst
https://drive.google.com/file/d/10s4GjB … sp=sharing linux-headers-6.4rc1.r78.g7907ad748bdb-1-x86_64.pkg.tar.zst
good: 6.4.0-rc1-1-00078-g7907ad748bdb
@RuneArch did linux-6.5.3.arch1-1.2-x86_64.pkg.tar.zst from https://bbs.archlinux.org/viewtopic.php … 4#p2120494 still have the issue? Would be useful to report your finding to https://lore.kernel.org/regressions/ZQH … x1-carbon/ or https://bugzilla.kernel.org/show_bug.cgi?id=217902
$ git bisect good
Bisecting: 10 revisions left to test after this (roughly 4 steps)
[390e2d1a587405a522dc6b433d45648f895a352c] scsi: sd: Handle read/write CDL timeout failures
https://drive.google.com/file/d/1PzxdNP … sp=sharing linux-6.4rc1.r11.g390e2d1a5874-1-x86_64.pkg.tar.zst
https://drive.google.com/file/d/1QxZNd2 … sp=sharing linux-headers-6.4rc1.r11.g390e2d1a5874-1-x86_64.pkg.tar.zst
bad: 6.4.0-rc1-1-00011-g390e2d1a5874
$ git bisect bad
Bisecting: 5 revisions left to test after this (roughly 3 steps)
[734326937b65cec7ffd00bfbbce0f791ac4aac84] scsi: core: Rename and move get_scsi_ml_byte()
https://drive.google.com/file/d/1J2Gqwu … sp=sharing linux-6.4rc1.r5.g734326937b65-1-x86_64.pkg.tar.zst
https://drive.google.com/file/d/1Y37yPf … sp=sharing linux-headers-6.4rc1.r5.g734326937b65-1-x86_64.pkg.tar.zst
good: 6.4.0-rc1-1-00005-g734326937b65
$ git bisect good
Bisecting: 2 revisions left to test after this (roughly 2 steps)
[624885209f31eb9985bf51abe204ecbffe2fdeea] scsi: core: Detect support for command duration limits
https://drive.google.com/file/d/1hvAmA3 … sp=sharing linux-6.4rc1.r8.g624885209f31-1-x86_64.pkg.tar.zst
https://drive.google.com/file/d/1ein1Ww … sp=sharing linux-headers-6.4rc1.r8.g624885209f31-1-x86_64.pkg.tar.zst
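The good/bad answers in this thread drive what is essentially a binary search for the first bad commit. The sketch below shows the idea on a simplified model: it assumes a linear list of commits ordered oldest to newest, with exactly one switch from good to bad. Real git bisect works on a commit graph with merges, so this is an illustration only, and `first_bad` is a made-up name.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def first_bad(commits: Sequence[T], is_bad: Callable[[T], bool]) -> T:
    """Return the first bad commit in an oldest-to-newest sequence.

    Assumptions: commits is non-empty, the last commit is bad, and once a
    commit is bad every later commit is bad too.
    """
    if not commits:
        raise ValueError("need at least one candidate commit")
    lo, hi = 0, len(commits) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid      # first bad commit is mid or earlier
        else:
            lo = mid + 1  # first bad commit is after mid
    return commits[lo]


if __name__ == "__main__":
    # Hypothetical stand-in: commits numbered 1..11, regression introduced at 8.
    candidates = list(range(1, 12))
    tested = []

    def is_bad(commit: int) -> bool:
        tested.append(commit)  # each call corresponds to one build-and-boot test
        return commit >= 8

    print(first_bad(candidates, is_bad), "found after", len(tested), "tests")
```

Each call to `is_bad` stands for one build, install and boot cycle, which is why halving the range matters so much here.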
@laktak @loqs @seth
On the contrary, my VM had no optical drive connected.
I tried connecting the optical drive as a SATA device, and the scsi_eh_1 process now behaves normally.
Thanks for your suggestions!
bad: 6.4.0-rc1-1-00008-g624885209f31
I hit the same problem. Based on the bisect results from @laktak and @loqs, I found that removing the scsi_cdl_check() calls added by commit 624885209f31eb9985bf51abe204ecbffe2fdeea fixes the problem:
diff --git a/drivers/scsi/scsi_scan.c b/drivers/scsi/scsi_scan.c
index aa13feb17c62..d217be323cc6 100644
--- a/drivers/scsi/scsi_scan.c
+++ b/drivers/scsi/scsi_scan.c
@@ -1087,8 +1087,6 @@ static int scsi_add_lun(struct scsi_device *sdev, unsigned char *inq_result,
 	if (sdev->scsi_level >= SCSI_3)
 		scsi_attach_vpd(sdev);
 
-	scsi_cdl_check(sdev);
-
 	sdev->max_queue_depth = sdev->queue_depth;
 	WARN_ON_ONCE(sdev->max_queue_depth > sdev->budget_map.depth);
 	sdev->sdev_bflags = *bflags;
@@ -1626,7 +1624,6 @@ void scsi_rescan_device(struct device *dev)
 	device_lock(dev);
 
 	scsi_attach_vpd(sdev);
-	scsi_cdl_check(sdev);
 
 	if (sdev->handler && sdev->handler->rescan)
 		sdev->handler->rescan(sdev);
Same issue here on VMware after upgrading the kernel to:
6.5.3-zen1-1-zen