git bisect bad
Bisecting: 0 revisions left to test after this (roughly 0 steps)
[e815d36548f01797ce381be8f0b74f4ba9befd15] scsi: sd: add concurrent positioning ranges support
https://drive.google.com/file/d/10e02LT … sp=sharing linux-5.15rc6.r161.ge815d36548f0-1-x86_64.pkg.tar.zst
https://drive.google.com/file/d/1uCiGm8 … sp=sharing linux-headers-5.15rc6.r161.ge815d36548f0-1-x86_64.pkg.tar.zst
It works.
$ git bisect good
fe22e1c2f705676a705d821301fc52eecc2fe055 is the first bad commit
commit fe22e1c2f705676a705d821301fc52eecc2fe055
Author: Damien Le Moal <damien.lemoal@wdc.com>
Date: Wed Oct 27 11:22:21 2021 +0900
libata: support concurrent positioning ranges log
Add support to discover if an ATA device supports the Concurrent
Positioning Ranges data log (address 0x47), indicating that the device
is capable of seeking to multiple different locations in parallel using
multiple actuators serving different LBA ranges.
Also add support to translate the concurrent positioning ranges log
into its equivalent Concurrent Positioning Ranges VPD page B9h in
libata-scsi.c.
The format of the Concurrent Positioning Ranges Log is defined in ACS-5
r9.
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Link: https://lore.kernel.org/r/20211027022223.183838-4-damien.lemoal@wdc.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
drivers/ata/libata-core.c | 57 +++++++++++++++++++++++++++++++++++++++++++++--
drivers/ata/libata-scsi.c | 48 +++++++++++++++++++++++++++++++--------
include/linux/ata.h | 1 +
include/linux/libata.h | 15 +++++++++++++
4 files changed, 110 insertions(+), 11 deletions(-)
$ git bisect log
git bisect start
# bad: [df0cc57e057f18e44dac8e6c18aba47ab53202f9] Linux 5.16
git bisect bad df0cc57e057f18e44dac8e6c18aba47ab53202f9
# good: [8bb7eca972ad531c9b149c0a51ab43a417385813] Linux 5.15
git bisect good 8bb7eca972ad531c9b149c0a51ab43a417385813
# bad: [2219b0ceefe835b92a8a74a73fe964aa052742a2] Merge tag 'soc-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
git bisect bad 2219b0ceefe835b92a8a74a73fe964aa052742a2
# bad: [fc02cb2b37fe2cbf1d3334b9f0f0eab9431766c4] Merge tag 'net-next-for-5.16' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
git bisect bad fc02cb2b37fe2cbf1d3334b9f0f0eab9431766c4
# good: [b7b98f868987cd3e86c9bd9a6db048614933d7a0] Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
git bisect good b7b98f868987cd3e86c9bd9a6db048614933d7a0
# bad: [6e5772c8d9cf0a77ba4d6fd34fd4126fb66c9983] Merge tag 'x86_cc_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect bad 6e5772c8d9cf0a77ba4d6fd34fd4126fb66c9983
# bad: [19901165d90fdca1e57c9baa0d5b4c63d15c476a] Merge tag 'for-5.16/inode-sync-2021-10-29' of git://git.kernel.dk/linux-block
git bisect bad 19901165d90fdca1e57c9baa0d5b4c63d15c476a
# good: [33c8846c814c1c27c6e33af005042d15061f948b] Merge tag 'for-5.16/block-2021-10-29' of git://git.kernel.dk/linux-block
git bisect good 33c8846c814c1c27c6e33af005042d15061f948b
# good: [643a7234e0960cf63f1a51a15cfc969fafcbabad] Merge tag 'for-5.16/drivers-2021-10-29' of git://git.kernel.dk/linux-block
git bisect good 643a7234e0960cf63f1a51a15cfc969fafcbabad
# good: [88459b50b42a4bd58e528006663afabd0b8652f2] io_uring: simplify io_file_supports_nowait()
git bisect good 88459b50b42a4bd58e528006663afabd0b8652f2
# good: [97eeb5fc14cc4b2091df8b841a07a1ac69f2d762] partitions/ibm: use bdev_nr_sectors instead of open coding it
git bisect good 97eeb5fc14cc4b2091df8b841a07a1ac69f2d762
# bad: [fcaec17b3657a4f8b0b131d5c1ab87e255c3dee6] Merge tag 'for-5.16/scsi-ma-2021-10-29' of git://git.kernel.dk/linux-block
git bisect bad fcaec17b3657a4f8b0b131d5c1ab87e255c3dee6
# good: [d6a644a795451d5fd063a5c08d6bb3a91d021887] io_uring: clean up timeout async_data allocation
git bisect good d6a644a795451d5fd063a5c08d6bb3a91d021887
# good: [8d1f01775f8ead7ee313403158be95bffdbb3638] Merge tag 'for-5.16/io_uring-2021-10-29' of git://git.kernel.dk/linux-block
git bisect good 8d1f01775f8ead7ee313403158be95bffdbb3638
# bad: [6b3bae2324d2ecaa404ceab869018011b7ef6a90] doc: document sysfs queue/independent_access_ranges attributes
git bisect bad 6b3bae2324d2ecaa404ceab869018011b7ef6a90
# bad: [fe22e1c2f705676a705d821301fc52eecc2fe055] libata: support concurrent positioning ranges log
git bisect bad fe22e1c2f705676a705d821301fc52eecc2fe055
# good: [e815d36548f01797ce381be8f0b74f4ba9befd15] scsi: sd: add concurrent positioning ranges support
git bisect good e815d36548f01797ce381be8f0b74f4ba9befd15
# first bad commit: [fe22e1c2f705676a705d821301fc52eecc2fe055] libata: support concurrent positioning ranges log
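As a rough aid for reading a log like the one above, here is a minimal Python sketch (not part of git) that tallies the verdicts in `git bisect log` output and picks out the first bad commit. The function name `summarize_bisect_log` and the shortened `SAMPLE` text are made up for illustration; the sketch assumes only the `# good:`, `# bad:` and `# first bad commit:` comment lines seen in the log above.

```python
import re

# Matches the comment lines seen in `git bisect log` output, e.g.
# "# bad: [<sha>] <subject>" or "# first bad commit: [<sha>] <subject>".
LINE_RE = re.compile(r"^# (good|bad|first bad commit): \[([0-9a-f]+)\] (.*)$")


def summarize_bisect_log(text):
    """Return (verdict counts, first bad commit or None) for a bisect log.

    Hypothetical helper for illustration only.
    """
    counts = {"good": 0, "bad": 0}
    first_bad = None
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip the replayable "git bisect ..." command lines
        kind, sha, subject = match.groups()
        if kind == "first bad commit":
            first_bad = (sha, subject)
        else:
            counts[kind] += 1
    return counts, first_bad


# Shortened excerpt of the log above (an assumption for the demo, not the full log).
SAMPLE = """\
git bisect start
# bad: [df0cc57e057f18e44dac8e6c18aba47ab53202f9] Linux 5.16
git bisect bad df0cc57e057f18e44dac8e6c18aba47ab53202f9
# good: [8bb7eca972ad531c9b149c0a51ab43a417385813] Linux 5.15
git bisect good 8bb7eca972ad531c9b149c0a51ab43a417385813
# bad: [fe22e1c2f705676a705d821301fc52eecc2fe055] libata: support concurrent positioning ranges log
git bisect bad fe22e1c2f705676a705d821301fc52eecc2fe055
# good: [e815d36548f01797ce381be8f0b74f4ba9befd15] scsi: sd: add concurrent positioning ranges support
git bisect good e815d36548f01797ce381be8f0b74f4ba9befd15
# first bad commit: [fe22e1c2f705676a705d821301fc52eecc2fe055] libata: support concurrent positioning ranges log
"""

counts, first_bad = summarize_bisect_log(SAMPLE)
print(counts)                             # {'good': 2, 'bad': 2}
print(first_bad[0][:12], first_bad[1])
```

Pasting the full log from above in place of `SAMPLE` should work the same way, since the command lines without a leading `#` are simply skipped.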
This result is a lot more plausible than the last one at least. Will try to come up with a quick cross check.
Edit:
diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index aba0c67d1bd6..0b3dc4eb24f2 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -2702,7 +2702,7 @@ int ata_dev_configure(struct ata_device *dev)
 	ata_dev_config_sense_reporting(dev);
 	ata_dev_config_zac(dev);
 	ata_dev_config_trusted(dev);
-	ata_dev_config_cpr(dev);
+/*	ata_dev_config_cpr(dev);*/
 	dev->cdb_len = 32;
 	if (ata_msg_drv(ap) && print_info)
This disables setting up CPR (concurrent positioning ranges). Hopefully it is enough to turn the feature off without causing something else to fail.
https://drive.google.com/file/d/1ZNAof7 … sp=sharing linux-5.16.2.arch1-1.1-x86_64.pkg.tar.zst
https://drive.google.com/file/d/17FN8Gr … sp=sharing linux-headers-5.16.2.arch1-1.1-x86_64.pkg.tar.zst
Last edited by loqs (2022-01-25 17:26:28)
Great, it works.
The issue now needs to be reported upstream so a proper fix can be developed.
I would suggest reporting it to https://bugzilla.kernel.org Product='IO Storage' Component='Serial ATA'
CC=Damien Le Moal <damien.lemoal@wdc.com>,Jens Axboe <axboe@kernel.dk> Kernel Version=5.16 Regression=yes
Include the contents of post #53 and a link to this thread. Also include details on all the storage devices in the system.
Edit:
https://bugzilla.kernel.org/show_bug.cgi?id=215519 possibly the same issue.
Last edited by loqs (2022-01-25 23:29:23)
I'm the author of this report. 5.15 and lower work quite well, but 5.16 doesn't boot at all. I'm not even able to debug or read any messages: it gets stuck in early boot and then throws errors like this:
ata3.00: exception Emask 0x0 SAct 0x40000000 SErr 0x0 action 0x6 frozen
ata3.00: failed command: READ FPDMA QUEUED
ata3.00: cmd 60/08:f0:00:00:00/00:00:00:00:00/40 tag 30 ncq dma 4096 in
res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
ata3.00: status: { DRDY }
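For anyone trying to read such a line, here is a small Python sketch of how the numbers in it relate to each other. It assumes that `SAct` is a bitmask with one bit per outstanding NCQ tag, that the first group of the `cmd` dump is laid out as command/feature:nsect:lbal:lbam:lbah, that an NCQ command carries its sector count in the feature byte and its tag in the upper five bits of nsect, and that logical sectors are 512 bytes. The helper names are made up for illustration.

```python
def active_ncq_tags(sact):
    """List the NCQ tag numbers whose bit is set in the SAct bitmask."""
    return [tag for tag in range(32) if sact & (1 << tag)]


def decode_ncq_cmd(cmd_field, sector_size=512):
    """Decode the start of a libata 'cmd' dump for an NCQ read/write.

    Assumed layout (see the lead-in): command/feature:nsect:lbal:lbam:lbah/...
    Only the low byte of the sector count (the feature byte) is used here.
    """
    groups = cmd_field.split("/")
    command = int(groups[0], 16)
    feature, nsect = (int(byte, 16) for byte in groups[1].split(":")[:2])
    return {
        "command": command,              # 0x60 is READ FPDMA QUEUED
        "sectors": feature,              # sector count (low byte)
        "bytes": feature * sector_size,  # transfer size in bytes
        "tag": nsect >> 3,               # NCQ tag, bits 7:3 of nsect
    }


print(active_ncq_tags(0x40000000))  # [30]
decoded = decode_ncq_cmd("60/08:f0:00:00:00/00:00:00:00:00/40")
print(decoded["tag"], decoded["bytes"])  # 30 4096
```

Under those assumptions the values agree with the rest of the message: tag 30 is the only outstanding command, and it is a 4096-byte READ FPDMA QUEUED that timed out.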
Post #53 contains links to a built 5.16.2 package with cpr disabled. If that works for you then the two issues are the same.
Thanks, I missed the links when I read the thread.
Also confirmed: with the kernel from post #53, I was able to boot 5.16.2-arch1-1.1 without the READ FPDMA QUEUED issues.
Edit: Should I update my issue on the kernel bug tracker? If so, any suggestions?
Last edited by menelkir (2022-01-26 18:22:35)
Can you add the bisection log to the bug report, or note that the bug is caused by commit fe22e1c2f705676a705d821301fc52eecc2fe055, and subscribe that commit's author, Damien Le Moal <damien.lemoal@wdc.com>, and its committer, Jens Axboe <axboe@kernel.dk>, to the bug report?
Done.
Update: Jens Axboe removed himself as soon as I added him to the PR.
Update 2: Damien is looking into it; he offered a new patch there.
Last edited by menelkir (2022-01-27 01:21:20)
I'm giving some input to Damien in my PR. If anyone has a problem similar to mine, it would be nice if you could help there by sending logs and such, because he isn't able to reproduce my problem.
Hey, I got the same problem. Had to install another distro in the meantime.
Can I help somehow?
If you're willing to debug and your bug is similar, your help would be much appreciated here:
https://bugzilla.kernel.org/show_bug.cgi?id=215519
What are the chances that this issue could cause physical damage to my SSDs (I have two SSDs with this issue)? I have a feeling that it is more severe than it looks, especially because the problem is getting logged in S.M.A.R.T.
Do you have the problem with the kernel loqs posted in #53?
Also, what gets logged by S.M.A.R.T.?
The last error in the SMART log on the kernel bug was at "power-on lifetime: 671 hours" while the drive is now at 786h, so 115 power-on hours ago.
No, the kernel posted here is fine; I was asking about helping with the bug report at kernel.org. Every time I boot a regular, unpatched kernel, my SMART logs get flooded.
What "smart logs"?
"failed command: READ FPDMA QUEUED" in dmesg is because you're trying to use a feature the drive doesn't have - and https://bugzilla.kernel.org/attachment. … ction=edit doesn't show recent errors.
Like this:
60 08 60 08 00 00 40 08 00:01:00.630 READ FPDMA QUEUED
60 08 58 00 00 00 40 08 00:01:00.630 READ FPDMA QUEUED
Updated?
(Again: the errors in the smart log attached to the kernel bug are old, look at the error ID and "power-on lifetime")
Hopefully, with the 2022.02.01 ISO release tomorrow, which should be using a 5.16 kernel, you may be able to get the required dmesg output.
You mean like this?
Error 103 occurred at disk power-on lifetime: 671 hours (27 days + 23 hours)
This gets generated every time I try to boot a non-patched kernel. My current power-on lifetime is 942 hours.
I've tried a weekly Artix ISO, which uses 5.16 (I can't think of any other distribution that ships 5.16 right now), but I'm also unable to get the required dmesg output there because the failure occurs too early.
Yes, are there newer errors that match the 942h power on time?
Because the old ones are old - they're not "generated" when you run smartctl, but logged in nvram on the drive.
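The comparison being asked for is simple arithmetic: subtract the power-on hours stamped on each logged error from the drive's current power-on hours. A minimal Python sketch follows, assuming the error lines look like the one quoted in this thread; the helper name `hours_since_errors` is made up for illustration, and the current hour counts are the ones mentioned in the posts.

```python
import re

# Pattern for lines such as:
# "Error 103 occurred at disk power-on lifetime: 671 hours (27 days + 23 hours)"
ERROR_RE = re.compile(r"Error (\d+) occurred at disk power-on lifetime: (\d+) hours")


def hours_since_errors(error_log, current_power_on_hours):
    """Return [(error id, power-on hours elapsed since that error), ...]."""
    return [
        (int(error_id), current_power_on_hours - int(hours))
        for error_id, hours in ERROR_RE.findall(error_log)
    ]


log = "Error 103 occurred at disk power-on lifetime: 671 hours (27 days + 23 hours)"
print(hours_since_errors(log, 786))  # [(103, 115)]
print(hours_since_errors(log, 942))  # [(103, 271)]
```

An error that was logged during a recent boot would show a difference close to zero; a difference in the hundreds of hours means the entry is an old one that is merely being read back from the drive.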
I can make new ones appear; it's just a matter of booting a non-patched kernel. That's why I'm not sure how safe it is to do that anymore.
When you use the ISO with 5.16, is the system still locking up, or is the dmesg buffer too small so that the output is lost?