#1 2017-01-13 21:58:40

dotamin
Member
Registered: 2014-09-02
Posts: 5
Website

libata error after upgrading kernel (SOLVED)

Hi. A few weeks ago I upgraded the system; at that time I had Linux kernel 4.7.1. After the upgrade and reboot, Arch couldn't boot with the newer kernel (linux 4.8.7). I downgraded the kernel to 4.7.1, which solved my problem temporarily.

I believe this is a bug related to ata1. Here is the log after the upgrade:

ata1.00: exception Emask 0x20 SAct 0x20 SErr 0x0 action 0x6 frozen
ata1.00: irq_stat 0x20000000, host bus error
ata1.00: failed command: READ FPDMA QUEUED
ata1.00: cmd 60/28:28:78:a5:11/00:00:06:00:00/40 tag 5 ncq dma 2048 ...
               res 40/00:20:18:b9:06/00:00:06:00:00/40 Emask 0x20 (host bus error)
ata1.00: status: { DRDY }
ata1.00: revalidation failed {errno=-5}
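
For readers who want to pull the fields out of an exception line like the first one above, here is a minimal Python sketch. It assumes the field layout exactly as printed in this log and the usual reading of SAct as a bitmask of outstanding NCQ tags; the function name `parse_exception` is made up for illustration and is not part of any tool.

```python
import re

# Sample line copied from the log above.
LINE = "ata1.00: exception Emask 0x20 SAct 0x20 SErr 0x0 action 0x6 frozen"

# Assumption: the fields always appear in this order, as in the log above.
PATTERN = re.compile(
    r"(?P<port>ata\d+(?:\.\d+)?): exception "
    r"Emask (?P<emask>0x[0-9a-fA-F]+) "
    r"SAct (?P<sact>0x[0-9a-fA-F]+) "
    r"SErr (?P<serr>0x[0-9a-fA-F]+) "
    r"action (?P<action>0x[0-9a-fA-F]+)"
    r"(?P<frozen> frozen)?"
)


def parse_exception(line):
    """Return the numeric fields of a libata exception line, or None."""
    m = PATTERN.search(line)
    if m is None:
        return None
    sact = int(m.group("sact"), 16)
    return {
        "port": m.group("port"),
        "emask": int(m.group("emask"), 16),
        # Each set bit in SAct is read here as one outstanding NCQ tag.
        "active_tags": [bit for bit in range(32) if sact & (1 << bit)],
        "serr": int(m.group("serr"), 16),
        "action": int(m.group("action"), 16),
        "frozen": m.group("frozen") is not None,
    }


info = parse_exception(LINE)
print(info)
```

With SAct 0x20 only bit 5 is set, which lines up with the "tag 5" shown in the failed command line of the same log.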

At first sight I thought my Samsung SSD was broken, but smartctl reports no errors, and the ata1 status is DRDY, meaning "Device Ready".
Here is the "smartctl -a /dev/sda" output:

smartctl 6.5 2016-05-07 r4318 [x86_64-linux-4.7.1-1-ARCH] (local build)
Copyright (C) 2002-16, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     SAMSUNG MZHPU128HCGM-00000
Serial Number:    S1ACNYAD400813
LU WWN Device Id: 5 002538 600000cd4
Firmware Version: UXM6401Q
User Capacity:    128,035,676,160 bytes [128 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-2, ATA8-ACS T13/1699-D revision 4c
SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sat Jan 14 00:59:54 2017 IRST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)	Offline data collection activity
					was never started.
					Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)	The previous self-test routine completed
					without error or no self-test has ever 
					been run.
Total time to complete Offline 
data collection: 		(    0) seconds.
Offline data collection
capabilities: 			 (0x5f) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Abort Offline collection upon new
					command.
					Offline surface scan supported.
					Self-test supported.
					No Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 (  10) minutes.
SCT capabilities: 	       (0x003d)	SCT Status supported.
					SCT Error Recovery Control supported.
					SCT Feature Control supported.
					SCT Data Table supported.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0032   098   098   000    Old_age   Always       -       9103
 12 Power_Cycle_Count       0x0032   093   093   000    Old_age   Always       -       6269
177 Wear_Leveling_Count     0x0013   095   095   017    Pre-fail  Always       -       406
178 Used_Rsvd_Blk_Cnt_Chip  0x0013   088   088   010    Pre-fail  Always       -       147
179 Used_Rsvd_Blk_Cnt_Tot   0x0013   089   089   010    Pre-fail  Always       -       285
180 Unused_Rsvd_Blk_Cnt_Tot 0x0013   089   089   010    Pre-fail  Always       -       2339
183 Runtime_Bad_Block       0x0013   100   100   010    Pre-fail  Always       -       0

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%         0         -
# 2  Short offline       Completed without error       00%         0         -

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
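
The attribute table in the output above can also be checked mechanically. The following is a rough Python sketch, assuming the ten-column layout shown above and the usual smartctl convention that an attribute counts as failed when its normalized VALUE is less than or equal to THRESH. The helper names `parse_attributes` and `failing` are hypothetical, and only a few rows from the table are used as sample input.

```python
# A few rows copied from the attribute table above.
SAMPLE = """\
  9 Power_On_Hours          0x0032   098   098   000    Old_age   Always       -       9103
177 Wear_Leveling_Count     0x0013   095   095   017    Pre-fail  Always       -       406
179 Used_Rsvd_Blk_Cnt_Tot   0x0013   089   089   010    Pre-fail  Always       -       285
183 Runtime_Bad_Block       0x0013   100   100   010    Pre-fail  Always       -       0
"""


def parse_attributes(text):
    """Parse smartctl attribute rows into dictionaries (ten-column layout assumed)."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Skip headers and anything that does not look like an attribute row.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        rows.append({
            "id": int(fields[0]),
            "name": fields[1],
            "value": int(fields[3]),
            "worst": int(fields[4]),
            "thresh": int(fields[5]),
            "type": fields[6],
            "when_failed": fields[8],
            "raw": fields[9],
        })
    return rows


def failing(rows):
    """Return attributes whose normalized value is at or below the threshold."""
    return [r for r in rows if r["value"] <= r["thresh"]]


rows = parse_attributes(SAMPLE)
print([r["name"] for r in failing(rows)])
```

For the sample rows every VALUE is above its THRESH, which is consistent with the PASSED health result and the empty error log reported above.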

I can't upgrade Linux, because I have the same problem with newer kernel versions.
Even disabling NCQ (booting with the kernel parameter "libata.force=noncq") and PCI=nomsi didn't solve the problem. (See this similar bug that has been reported: LINK)

Is this really a bug? What should I do?
Thanks

Last edited by dotamin (2017-02-08 23:22:41)

#2 2017-01-13 23:06:12

teateawhy
Member
From: GER
Registered: 2012-03-05
Posts: 1,138
Website

Re: libata error after upgrading kernel (SOLVED)

Write the bug report.

#3 2017-01-15 18:21:32

loqs
Member
Registered: 2014-03-06
Posts: 17,372

Re: libata error after upgrading kernel (SOLVED)

teateawhy wrote:

Write the bug report.

You would not test 4.9.4 or 4.10rc3 first?
The fix from 89171 appears to be applied in both:
linux-stable

linux
You believe the issue is different to 89261?
Edit:
The 89171 fix is in both 4.8.13 and 4.8.7, the version dotamin encountered the issue with: 2b21ef0aae65f22f5ba86b13c4588f6f0c2dbefb
Edit2:
Also please read https://www.kernel.org/doc/html/latest/ … -bugs.html before reporting a bug upstream (non-packaging / non-integration bugs should be reported upstream; see Reporting_bug_guidelines)

Last edited by loqs (2017-01-15 19:00:35)

#4 2017-01-15 21:42:35

teateawhy
Member
From: GER
Registered: 2012-03-05
Posts: 1,138
Website

Re: libata error after upgrading kernel (SOLVED)

loqs wrote:
teateawhy wrote:

Write the bug report.

You would not test 4.9.4 or 4.10rc3 first?
The 89171 fix is in both 4.8.13 and 4.8.7, the version dotamin encountered the issue with: 2b21ef0aae65f22f5ba86b13c4588f6f0c2dbefb

That patch sets "board_ahci_nomsi", and OP already wrote that setting "PCI=nomsi" didn't solve the problem.
I would assume that both "board_ahci_nomsi" and "PCI=nomsi" do the same thing.

#5 2017-01-15 21:46:18

teateawhy
Member
From: GER
Registered: 2012-03-05
Posts: 1,138
Website

Re: libata error after upgrading kernel (SOLVED)

teateawhy wrote:
loqs wrote:
teateawhy wrote:

Write the bug report.

You would not test 4.9.4 or 4.10rc3 first?
The 89171 fix is in both 4.8.13 and 4.8.7, the version dotamin encountered the issue with: 2b21ef0aae65f22f5ba86b13c4588f6f0c2dbefb

That patch sets "board_ahci_nomsi", and OP already wrote that setting "PCI=nomsi" didn't solve the problem.
I would assume that both "board_ahci_nomsi" and "PCI=nomsi" do the same thing.

Kernel command-line parameters are case-sensitive! Most of them are lower case, so if you specify them in upper case (capital letters), they won't work.
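
A quick way to catch this kind of mistake is to compare the boot command line against the exact lower-case spelling. Below is a small Python sketch; the example command line (including `root=/dev/sda2 rw`) is invented for illustration, and `check_cmdline` is a hypothetical helper, not an existing tool.

```python
# The two parameters discussed in this thread, in their exact spelling.
WANTED = ("libata.force=noncq", "pci=nomsi")


def check_cmdline(cmdline, wanted=WANTED):
    """Report whether each wanted parameter is present, mis-cased or missing."""
    tokens = cmdline.split()
    report = {}
    for param in wanted:
        if param in tokens:
            report[param] = "present"
        elif any(tok.lower() == param.lower() for tok in tokens):
            # Same letters but different case: the kernel would not match it.
            report[param] = "wrong case"
        else:
            report[param] = "missing"
    return report


# Hypothetical command line with the upper-case spelling from the first post.
example = "root=/dev/sda2 rw libata.force=noncq PCI=nomsi"
print(check_cmdline(example))
```

On a running system the same function could be fed the contents of /proc/cmdline instead of the example string.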

#6 2017-01-15 21:58:13

dotamin
Member
Registered: 2014-09-02
Posts: 5
Website

Re: libata error after upgrading kernel (SOLVED)

loqs wrote:
teateawhy wrote:

Write the bug report.

You would not test 4.9.4 or 4.10rc3 first?
The fix from 89171 appears to be applied in both:
linux-stable

linux
You believe the issue is different to 89261?
Edit:
The 89171 fix is in both 4.8.13 and 4.8.7, the version dotamin encountered the issue with: 2b21ef0aae65f22f5ba86b13c4588f6f0c2dbefb
Edit2:
Also please read https://www.kernel.org/doc/html/latest/ … -bugs.html before reporting a bug upstream (non-packaging / non-integration bugs should be reported upstream; see Reporting_bug_guidelines)

Thanks for the reply.
No, I didn't test 4.9.4 or 4.10rc3. I don't know how to do that (yet).
In my case everything is almost the same as in 89261, except that there is no:

[ 3657.844543] ata1: SError: { PHYRdyChg CommWake }

After your reply I downloaded the latest Arch Linux (ARCH_201701), which uses linux 4.8.13, and created a live bootable flash drive. Booting into the live USB produced the same errors: check_this. Disabling NCQ (libata.force=noncq) resolves this problem and I don't see any ata1 errors! However, Arch freezes for other reasons: check_this
So I conclude that the NCQ bug with Samsung SSDs still exists. (Am I wrong?)

Thanks again.

#7 2017-01-15 22:46:26

loqs
Member
Registered: 2014-03-06
Posts: 17,372

Re: libata error after upgrading kernel (SOLVED)

dotamin wrote:

So i conclude NCQ bug with samsung SSD still exists. (am i wrong ?)

It could be. So far the kernel only blocks:

 	{ PCI_VDEVICE(SAMSUNG, 0x1600), board_ahci_nomsi },
	{ PCI_VDEVICE(SAMSUNG, 0xa800), board_ahci_nomsi },

If your Samsung SSD has a different device ID, it would not be covered by the blocking.
Upstream support for the 4.8.y series has ended and Arch does not support upstream issues, so you would need to reproduce the issue on the linux-lts kernel, which is currently based on the 4.4 series that upstream does support, or on 4.9.4 / 4.10rc3, which upstream also supports.
As you are still experiencing boot issues that look related, I would second teateawhy's suggestion of trying the "pci=nomsi" kernel parameter.
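
To illustrate the device ID point above, here is a minimal Python sketch that tests a vendor:device pair against the two table entries quoted earlier. It assumes 0x144d as the Samsung PCI vendor ID behind PCI_VDEVICE(SAMSUNG, ...); the function names and the second example ID are hypothetical and do not describe the OP's actual hardware.

```python
SAMSUNG_VENDOR_ID = 0x144D  # assumption: the vendor ID the SAMSUNG macro resolves to

# Device IDs from the two board_ahci_nomsi entries quoted above.
NOMSI_DEVICE_IDS = {0x1600, 0xA800}


def parse_pci_id(text):
    """Parse a 'vvvv:dddd' hex pair into (vendor, device) integers."""
    vendor, device = text.strip().split(":")
    return int(vendor, 16), int(device, 16)


def covered_by_nomsi(pci_id):
    """True if the vendor:device pair matches one of the quoted entries."""
    vendor, device = parse_pci_id(pci_id)
    return vendor == SAMSUNG_VENDOR_ID and device in NOMSI_DEVICE_IDS


print(covered_by_nomsi("144d:1600"))
print(covered_by_nomsi("144d:a801"))  # made-up device ID for contrast
```

The numeric pair for a given controller can be read from `lspci -nn`, which prints it in square brackets next to the device name.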
Edit:
formatting

Last edited by loqs (2017-01-15 22:47:52)

#8 2017-02-08 23:22:20

dotamin
Member
Registered: 2014-09-02
Posts: 5
Website

Re: libata error after upgrading kernel (SOLVED)

After a long break, I upgraded Linux to the latest version (4.9.8) and everything works fine. There is no problem with NCQ and my Samsung SSD.
Thanks again for replying :)

#9 2017-04-02 20:44:51

gururise
Member
Registered: 2011-11-03
Posts: 33

Re: libata error after upgrading kernel (SOLVED)

I just upgraded from ARCH 4.9.9-1 to 4.10.8-1-ARCH and my system won't boot anymore. I get the same error posted by the OP. I had to add libata.force=noncq to get my system to boot. I tried with only pci=nomsi and that did not work. I wonder what's different about the new kernel.
