I am reviving an old workstation that has 5 HDD bays. Initially I had only two disks installed. When I recently added more, I found that the kernel successfully detects only two of them, while the BIOS sees all of them. The same problem was already reported on kernel.org (kernel.org Bug 214967) and fixed back in 2021, but the issue seems to have resurfaced. I posted the issue on kernel.org and then realized that I should also report it here on Arch, since I don't build the kernel myself.
My kernel: 6.6.72-1-lts
Journal:
Jan 14 08:33:36 d20-x8664 kernel: drivers/scsi/mvsas/mv_sas.c 1058:phy 0 attach dev info is 0
Jan 14 08:33:36 d20-x8664 kernel: drivers/scsi/mvsas/mv_sas.c 1060:phy 0 attach sas addr is 0
Jan 14 08:33:36 d20-x8664 kernel: drivers/scsi/mvsas/mv_sas.c 1058:phy 1 attach dev info is 0
Jan 14 08:33:36 d20-x8664 kernel: drivers/scsi/mvsas/mv_sas.c 1060:phy 1 attach sas addr is 0
Jan 14 08:33:36 d20-x8664 kernel: drivers/scsi/mvsas/mv_sas.c 1058:phy 2 attach dev info is 0
Jan 14 08:33:36 d20-x8664 kernel: drivers/scsi/mvsas/mv_sas.c 1060:phy 2 attach sas addr is 0
Jan 14 08:33:36 d20-x8664 kernel: drivers/scsi/mvsas/mv_sas.c 1058:phy 3 attach dev info is 0
Jan 14 08:33:36 d20-x8664 kernel: drivers/scsi/mvsas/mv_sas.c 1060:phy 3 attach sas addr is 3
Jan 14 08:33:36 d20-x8664 kernel: drivers/scsi/mvsas/mv_sas.c 1058:phy 4 attach dev info is 0
Jan 14 08:33:36 d20-x8664 kernel: drivers/scsi/mvsas/mv_sas.c 1060:phy 4 attach sas addr is 0
Jan 14 08:33:36 d20-x8664 kernel: drivers/scsi/mvsas/mv_sas.c 1058:phy 5 attach dev info is 0
Jan 14 08:33:36 d20-x8664 kernel: drivers/scsi/mvsas/mv_sas.c 1060:phy 5 attach sas addr is 0
Jan 14 08:33:36 d20-x8664 kernel: drivers/scsi/mvsas/mv_sas.c 1058:phy 6 attach dev info is 0
Jan 14 08:33:36 d20-x8664 kernel: drivers/scsi/mvsas/mv_sas.c 1060:phy 6 attach sas addr is 0
Jan 14 08:33:36 d20-x8664 kernel: drivers/scsi/mvsas/mv_sas.c 1058:phy 7 attach dev info is 0
Jan 14 08:33:36 d20-x8664 kernel: drivers/scsi/mvsas/mv_sas.c 1060:phy 7 attach sas addr is 7
Jan 14 08:33:36 d20-x8664 kernel: scsi host6: mvsas
Jan 14 08:33:36 d20-x8664 kernel: drivers/scsi/mvsas/mv_sas.c 261:phy 0 byte dmaded.
Jan 14 08:33:36 d20-x8664 kernel: drivers/scsi/mvsas/mv_sas.c 261:phy 3 byte dmaded.
Jan 14 08:33:36 d20-x8664 kernel: drivers/scsi/mvsas/mv_sas.c 261:phy 7 byte dmaded.
Jan 14 08:33:36 d20-x8664 kernel: sas: phy-6:0 added to port-6:0, phy_mask:0x1 (0000000000000000)
Jan 14 08:33:36 d20-x8664 kernel: sas: DOING DISCOVERY on port 0, pid:11
Jan 14 08:33:36 d20-x8664 kernel: sas: Enter sas_scsi_recover_host busy: 0 failed: 0
Jan 14 08:33:36 d20-x8664 kernel: sas: ata7: end_device-6:0: dev error handler
Jan 14 08:33:36 d20-x8664 kernel: ata7.00: ATA-8: WDC WD2500AAJS-08L7A0, 03.03E03, max UDMA/100
Jan 14 08:33:36 d20-x8664 kernel: ata7.00: 488397168 sectors, multi 0: LBA48 NCQ (depth 32)
Jan 14 08:33:36 d20-x8664 kernel: ata7.00: configured for UDMA/100
Jan 14 08:33:36 d20-x8664 kernel: sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 0 tries: 1
Jan 14 08:33:36 d20-x8664 kernel: scsi 6:0:0:0: Direct-Access ATA WDC WD2500AAJS-0 3E03 PQ: 0 ANSI: 5
Jan 14 08:33:36 d20-x8664 kernel: sas: DONE DISCOVERY on port 0, pid:11, result:0
Jan 14 08:33:36 d20-x8664 kernel: sas: phy-6:3 added to port-6:1, phy_mask:0x8 (0300000000000000)
Jan 14 08:33:36 d20-x8664 kernel: sas: DOING DISCOVERY on port 1, pid:11
Jan 14 08:33:36 d20-x8664 kernel: sas: Enter sas_scsi_recover_host busy: 0 failed: 0
Jan 14 08:33:36 d20-x8664 kernel: sas: ata7: end_device-6:0: dev error handler
Jan 14 08:33:36 d20-x8664 kernel: sas: ata8: end_device-6:1: dev error handler
Jan 14 08:33:36 d20-x8664 kernel: ata8.00: ATA-8: ST31000524NS, 130C, max UDMA/133
Jan 14 08:33:36 d20-x8664 kernel: ata8.00: 1953525168 sectors, multi 0: LBA48 NCQ (depth 32)
Jan 14 08:33:36 d20-x8664 kernel: drivers/scsi/mvsas/mv_sas.c 1585:port 3 slot 0 rx_desc 20000 has error info0000000081000000.
Jan 14 08:33:36 d20-x8664 kernel: drivers/scsi/mvsas/mv_sas.c 1585:port 3 slot 0 rx_desc 20000 has error info0000000081000000.
Jan 14 08:33:36 d20-x8664 kernel: ata8.00: configured for UDMA/133
Jan 14 08:33:36 d20-x8664 kernel: sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 0 tries: 1
Jan 14 08:33:36 d20-x8664 kernel: scsi 6:0:1:0: Direct-Access ATA ST31000524NS 130C PQ: 0 ANSI: 5
Jan 14 08:33:36 d20-x8664 kernel: sas: DONE DISCOVERY on port 1, pid:11, result:0
Jan 14 08:33:36 d20-x8664 kernel: sas: phy-6:7 added to port-6:2, phy_mask:0x80 (0700000000000000)
Jan 14 08:33:36 d20-x8664 kernel: sas: DOING DISCOVERY on port 2, pid:11
Jan 14 08:33:36 d20-x8664 kernel: sas: Enter sas_scsi_recover_host busy: 0 failed: 0
Jan 14 08:33:36 d20-x8664 kernel: sas: ata7: end_device-6:0: dev error handler
Jan 14 08:33:36 d20-x8664 kernel: sas: ata8: end_device-6:1: dev error handler
Jan 14 08:33:36 d20-x8664 kernel: sas: ata9: end_device-6:2: dev error handler
Jan 14 08:33:36 d20-x8664 kernel: drivers/scsi/mvsas/mv_sas.c 1585:port 7 slot 0 rx_desc 20000 has error info0000000081000000.
Jan 14 08:33:36 d20-x8664 kernel: sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 0 tries: 1
Jan 14 08:33:36 d20-x8664 kernel: sas: sas_probe_sata: for direct-attached device 0700000000000000 returned -19
Jan 14 08:33:36 d20-x8664 kernel: drivers/scsi/mvsas/mv_sas.c 1229:found dev[2:5] is gone.
Jan 14 08:33:36 d20-x8664 kernel: sas: DONE DISCOVERY on port 2, pid:11, result:0
Jan 14 08:33:36 d20-x8664 kernel: sd 6:0:0:0: [sda] 488397168 512-byte logical blocks: (250 GB/233 GiB)
Jan 14 08:33:36 d20-x8664 kernel: sd 6:0:0:0: [sda] Write Protect is off
Jan 14 08:33:36 d20-x8664 kernel: sd 6:0:0:0: [sda] Mode Sense: 00 3a 00 00
Jan 14 08:33:36 d20-x8664 kernel: sd 6:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Jan 14 08:33:36 d20-x8664 kernel: sd 6:0:0:0: [sda] Preferred minimum I/O size 512 bytes
Jan 14 08:33:36 d20-x8664 kernel: sd 6:0:1:0: [sdb] 1953525168 512-byte logical blocks: (1.00 TB/932 GiB)
Jan 14 08:33:36 d20-x8664 kernel: sd 6:0:1:0: [sdb] Write Protect is off
Jan 14 08:33:36 d20-x8664 kernel: sd 6:0:1:0: [sdb] Mode Sense: 00 3a 00 00
Jan 14 08:33:36 d20-x8664 kernel: sd 6:0:1:0: [sdb] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
Jan 14 08:33:36 d20-x8664 kernel: sd 6:0:1:0: [sdb] Preferred minimum I/O size 512 bytes
Jan 14 08:33:36 d20-x8664 kernel: sda: sda1 sda2
Jan 14 08:33:36 d20-x8664 kernel: sd 6:0:0:0: [sda] Attached SCSI disk
Jan 14 08:33:36 d20-x8664 kernel: sdb: sdb1 sdb2 sdb3
Jan 14 08:33:36 d20-x8664 kernel: sd 6:0:1:0: [sdb] Attached SCSI disk
Jan 14 08:33:36 d20-x8664 kernel: sas: Enter sas_scsi_recover_host busy: 2 failed: 2
Jan 14 08:33:36 d20-x8664 kernel: sas: ata7: end_device-6:0: cmd error handler
Jan 14 08:33:36 d20-x8664 kernel: sas: ata8: end_device-6:1: cmd error handler
Jan 14 08:33:36 d20-x8664 kernel: sas: ata7: end_device-6:0: dev error handler
Jan 14 08:33:36 d20-x8664 kernel: sas: ata8: end_device-6:1: dev error handler
Jan 14 08:33:36 d20-x8664 kernel: sas: --- Exit sas_scsi_recover_host: busy: 0 failed: 2 tries: 1
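A note for reading the journal above: judging by the log itself, each `phy_mask` in the `sas:` lines has one bit set per PHY, so 0x1, 0x8 and 0x80 correspond to phys 0, 3 and 7 (matching `phy-6:0`, `phy-6:3` and `phy-6:7`). The `returned -19` from `sas_probe_sata` is a negated errno, and 19 is ENODEV ("No such device") on Linux, i.e. the third attached device (phy 7) was dropped. A minimal C sketch of that decoding; the helper names `lowest_phy` and `describe_ret` are hypothetical and not part of any kernel API:

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <string.h>

/* Hypothetical helper: return the index of the lowest set bit in a
 * phy_mask (one bit per PHY), or -1 if no bit is set. */
int lowest_phy(unsigned long mask)
{
    for (int i = 0; i < (int)(sizeof(mask) * CHAR_BIT); i++) {
        if (mask & (1UL << i))
            return i;
    }
    return -1;
}

/* Hypothetical helper: kernel functions report errors as -errno, so
 * negate the value before handing it to strerror(). */
const char *describe_ret(int ret)
{
    assert(ret < 0); /* only meaningful for error returns */
    return strerror(-ret);
}
```

For example, `lowest_phy(0x80UL)` gives 7, and `describe_ret(-19)` gives the C library's message for ENODEV.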
With which version of the kernel did the issue first occur? Also which version are you currently using?
Does switching to the linux-lts package help?
Thanks, these are good questions; I hadn't thought about a difference between the regular kernel and the LTS one.
1. The issue posted on kernel.org in 2021 was reported against the regular kernel, version 5.15.x.
2. I have used the LTS kernel for a long time, and only found the problem recently when I tried to add more disks, so I don't know when the issue first appeared.
3. I assumed the LTS kernel would behave no differently from the regular one, since the issue was fixed back in 5.15 and later upstream versions should be clean. Apparently I need to test whether the current regular linux kernel shows the same problem as the LTS one. Will report back.
Update:
Under the regular linux kernel (6.12.10-arch1-1), I see the same problem as with the LTS kernel above.
Any comments?
Update, and a note on the solution for future reference:
1. The mvsas issue of not detecting all HDDs at boot was first raised in 2012, and a complete explanation and patch were posted here (2012).
2. The issue came back in 2021 and was reported here (2021).
3. Unfortunately, the 2021 patch seems incomplete: it is missing the following statement, which was in the original 2012 patch:
memset( SATA_RECEIVED_D2H_FIS(mvi_dev->taskfileset), 0,
sizeof(struct dev_to_host_fis) );
The author of the 2021 fix mentioned breaking a long statement into two lines. I think this was that statement, and it ended up missing from the patch and from all versions after that.
4. Using the bisection method suggested on kernel.org, I found that all the kernel versions I tried have the same issue, all with the same mvsas driver version, 0.8.16. (If a patch is applied, is the driver version supposed to stay the same?)
5. I ended up compiling a custom kernel based on stable 6.12.10: I inserted the above statement into mv_sas.c at around line 450, then followed the steps in the wiki here to rebuild the kernel. Note: after patching, build with the following command so that makepkg does NOT overwrite the modified source code:
makepkg -e
6. Reboot into the new custom kernel and run
lsblk
to find your additional disks.
My question is: is there any way to get this fix into the mainline kernel? (I don't want to patch it myself every time the kernel is updated; it takes me several hours to rebuild.)
Cheers.
Could you post the full diff/patch you have applied in the end?
Here is the patch, generated with diff:
diff -urpN linux/src/linux-6.12.10/drivers/scsi/mvsas/mv_sas.c linux-custom/src/linux-6.12.10/drivers/scsi/mvsas/mv_sas.c
--- linux/src/linux-6.12.10/drivers/scsi/mvsas/mv_sas.c 2025-01-22 18:50:56.693221869 -0500
+++ linux-custom/src/linux-6.12.10/drivers/scsi/mvsas/mv_sas.c 2025-01-22 18:30:27.249943408 -0500
@@ -447,6 +447,8 @@ static int mvs_task_prep_ata(struct mvs_
mvi_dev->device_id);
return -EBUSY;
}
+ memset( SATA_RECEIVED_D2H_FIS(mvi_dev->taskfileset), 0,
+ sizeof(struct dev_to_host_fis) );
slot = &mvi->slot_info[tag];
slot->tx = mvi->tx_prod;
del_q = TXQ_MODE_I | tag |
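The patch zeroes the received D2H (device-to-host) FIS area for the device before each ATA task is prepared, presumably so that bytes left over from an earlier command are not mistaken for a fresh response. A minimal, self-contained sketch of that clear-before-reuse pattern follows; `struct fake_d2h_fis`, `struct fake_dev`, `prep_ata_task()` and `rx_fis_valid()` are hypothetical, simplified stand-ins, not the real mvsas structures or the actual `mvs_task_prep_ata()` code:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a D2H register FIS (20 bytes).
 * The kernel's struct dev_to_host_fis names the individual fields. */
struct fake_d2h_fis {
    uint8_t fis_type; /* 0x34 identifies a D2H register FIS */
    uint8_t flags;
    uint8_t status;
    uint8_t error;
    uint8_t rest[16];
};

/* Hypothetical per-device receive area, playing the role of the memory
 * that SATA_RECEIVED_D2H_FIS() points at in mvsas. */
struct fake_dev {
    struct fake_d2h_fis rx_fis;
};

/* Sketch of task preparation: clear the receive area before the next
 * command is built, mirroring the memset() added by the patch. */
void prep_ata_task(struct fake_dev *dev)
{
    assert(dev != NULL);
    memset(&dev->rx_fis, 0, sizeof(dev->rx_fis));
    /* ... build and queue the command here (omitted) ... */
}

/* Returns 1 if the receive area currently looks like a D2H FIS. */
int rx_fis_valid(const struct fake_dev *dev)
{
    return dev->rx_fis.fis_type == 0x34;
}
```

Without the clear, `rx_fis_valid()` would keep reporting the stale FIS from the previous command; after `prep_ata_task()` it reports 0 until new data arrives.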
Do you want to send this patch to the Linux kernel developers, or at least report the bug? If you want, I can help you with this.
Hi, Grommit:
I definitely want to see this patch integrated into the mainline kernel. I think it could help other users out there; even though users, myself included, can compile a custom kernel, doing so is cumbersome and time-consuming (for me it took hours to get the kernel compiled)...
I am not familiar with the official kernel developers or with how bugs are officially reported and fixed. Although I initially reported this to the kernel bugzilla, I don't think it is the right place for official kernel development. So if you could help me out, that would be great.
Thank you.