Sometimes I boot and the BIOS doesn't see the drive. Rebooting often allows it to show up and I can boot as normal. When running applications the system often freezes with the following log in journalctl:
Feb 13 08:20:56 acer-c720 kernel: ata1.00: exception Emask 0x0 SAct 0xf800000 SErr 0x400000 action 0x6 frozen
Feb 13 08:20:57 acer-c720 kernel: ata1: SError: { Handshk }
Feb 13 08:20:57 acer-c720 kernel: ata1.00: failed command: READ FPDMA QUEUED
Feb 13 08:20:57 acer-c720 kernel: ata1.00: cmd 60/08:b8:68:84:0a/00:00:01:00:00/40 tag 23 ncq 4096 in
res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Feb 13 08:20:57 acer-c720 kernel: ata1.00: status: { DRDY }
Feb 13 08:20:57 acer-c720 kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Feb 13 08:20:57 acer-c720 kernel: ata1.00: cmd 61/08:c0:20:8b:4d/00:00:07:00:00/40 tag 24 ncq 4096 out
res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Feb 13 08:20:57 acer-c720 kernel: ata1.00: status: { DRDY }
Feb 13 08:20:57 acer-c720 kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Feb 13 08:20:57 acer-c720 kernel: ata1.00: cmd 61/08:c8:28:8b:4d/00:00:07:00:00/40 tag 25 ncq 4096 out
res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Feb 13 08:20:57 acer-c720 kernel: ata1.00: status: { DRDY }
Feb 13 08:20:57 acer-c720 kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Feb 13 08:20:57 acer-c720 kernel: ata1.00: cmd 61/08:d0:30:8b:4d/00:00:07:00:00/40 tag 26 ncq 4096 out
res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Feb 13 08:20:57 acer-c720 kernel: ata1.00: status: { DRDY }
Feb 13 08:20:57 acer-c720 kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Feb 13 08:20:57 acer-c720 kernel: ata1.00: cmd 61/08:d8:38:8b:4d/00:00:07:00:00/40 tag 27 ncq 4096 out
res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Feb 13 08:20:57 acer-c720 kernel: ata1.00: status: { DRDY }
Feb 13 08:20:57 acer-c720 kernel: ata1: hard resetting link
Feb 13 08:20:57 acer-c720 kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
Feb 13 08:20:57 acer-c720 kernel: ata1.00: configured for UDMA/133
Feb 13 08:20:57 acer-c720 kernel: ata1.00: device reported invalid CHS sector 0
Feb 13 08:20:57 acer-c720 kernel: ata1.00: device reported invalid CHS sector 0
Feb 13 08:20:57 acer-c720 kernel: ata1.00: device reported invalid CHS sector 0
Feb 13 08:20:57 acer-c720 kernel: ata1.00: device reported invalid CHS sector 0
Feb 13 08:20:57 acer-c720 kernel: ata1: EH complete
Not sure if relevant: My boot and root partitions are both encrypted with LUKS. I give GRUB a passphrase for the boot partition. The boot partition then loads a key for the root from initramfs. Then boot is mounted via crypttab with another key.
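For anyone comparing against their own logs, here is a small sketch that counts the failed ATA commands in a journal excerpt by type. `count_failed_commands` is a hypothetical helper name, and it only assumes the kernel prints lines containing `failed command: <NAME>`, as in the excerpt above.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log text on stdin and print
# "<count> <command>" for each distinct failed ATA command.
count_failed_commands() {
    # Keep only the command name that follows "failed command: ".
    sed -n 's/.*failed command: //p' |
        sort |
        uniq -c |
        # uniq -c pads the count with spaces; normalise to "<count> <command>".
        awk '{ c = $1; $1 = ""; sub(/^ /, ""); print c, $0 }'
}
```

It could be fed from the kernel journal with `journalctl -k | count_failed_commands`. On the excerpt above it would report one `READ FPDMA QUEUED` and four `WRITE FPDMA QUEUED` timeouts.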
$ lspci | grep ATA
00:1f.2 SATA controller: Intel Corporation 8 Series SATA Controller 1 [AHCI mode] (rev 04)
$ sudo smartctl -a /dev/sda
smartctl 6.4 2015-06-04 r4109 [x86_64-linux-4.4.1-2-ARCH] (local build)
Copyright (C) 2002-15, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Device Model: SB M2 SSD
Serial Number: D45B0752062600000051
Firmware Version: S9FM02.0
User Capacity: 128,035,676,160 bytes [128 GB]
Sector Size: 512 bytes logical/physical
Rotation Rate: Solid State Device
Form Factor: < 1.8 inches
Device is: Not in smartctl database [for details use: -P showall]
ATA Version is: ACS-3 (minor revision not indicated)
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is: Sat Feb 13 08:34:35 2016 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: ( 30) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 2) minutes.
Conveyance self-test routine
recommended polling time: ( 2) minutes.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000a 100 100 000 Old_age Always - 0
9 Power_On_Hours 0x0012 100 100 000 Old_age Always - 2831
12 Power_Cycle_Count 0x0012 100 100 000 Old_age Always - 2458
168 Unknown_Attribute 0x0012 100 100 000 Old_age Always - 7
170 Unknown_Attribute 0x0013 100 100 010 Pre-fail Always - 35
173 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 26411045
192 Power-Off_Retract_Count 0x0012 100 100 000 Old_age Always - 84
194 Temperature_Celsius 0x0023 070 070 000 Pre-fail Always - 30
196 Reallocated_Event_Count 0x0000 100 100 000 Old_age Offline - 0
218 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 108
241 Total_LBAs_Written 0x0012 100 100 000 Old_age Always - 1849830
SMART Error Log Version: 1
ATA Error Count: 4
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 4 occurred at disk power-on lifetime: 0 hours (0 days + 0 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 7f 23 00 00 40 Error: ICRC, ABRT 127 sectors at LBA = 0x00000023 = 35
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
c8 01 7f 23 00 00 40 00 00:00:02.679 READ DMA
c8 01 01 22 00 00 40 00 00:00:02.656 READ DMA
c8 01 01 00 00 00 40 00 00:00:02.656 READ DMA
ec 00 00 00 00 00 00 00 00:00:01.655 IDENTIFY DEVICE
a1 00 00 00 00 00 00 00 00:00:01.655 IDENTIFY PACKET DEVICE
Error 3 occurred at disk power-on lifetime: 0 hours (0 days + 0 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 08 d0 1a 4b e3 Error: ICRC, ABRT at LBA = 0x034b1ad0 = 55253712
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
ca 00 08 d0 1a 4b e3 08 00:04:38.412 WRITE DMA
ef 10 02 00 00 00 a0 08 00:04:38.412 SET FEATURES [Enable SATA feature]
27 00 00 00 00 00 e0 08 00:04:38.411 READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
ec 00 00 00 00 00 a0 08 00:04:38.411 IDENTIFY DEVICE
ef 03 46 00 00 00 a0 08 00:04:38.411 SET FEATURES [Set transfer mode]
Error 2 occurred at disk power-on lifetime: 0 hours (0 days + 0 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 08 d0 1a 4b e3 Error: ICRC, ABRT at LBA = 0x034b1ad0 = 55253712
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
ca 00 08 d0 1a 4b e3 08 00:04:38.082 WRITE DMA
e7 00 00 00 00 00 a0 08 00:04:37.510 FLUSH CACHE
ca 00 08 b0 d0 4d e7 08 00:04:37.510 WRITE DMA
e7 00 00 00 00 00 a0 08 00:04:37.508 FLUSH CACHE
ca 00 50 60 d0 4d e7 08 00:04:37.507 WRITE DMA
Error 1 occurred at disk power-on lifetime: 0 hours (0 days + 0 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 58 d8 ff de e5 Error: ICRC, ABRT at LBA = 0x05deffd8 = 98500568
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
ca 00 58 d8 ff de e5 08 00:04:37.140 WRITE DMA
e7 00 00 00 00 00 a0 08 00:04:34.177 FLUSH CACHE
ca 00 08 d8 ce 4d e7 08 00:04:34.177 WRITE DMA
e7 00 00 00 00 00 a0 08 00:04:34.175 FLUSH CACHE
ca 00 08 d0 ce 4d e7 08 00:04:34.174 WRITE DMA
SMART Self-test log structure revision number 1
No self-tests have been logged. [To run self-tests, use: smartctl -t]
SMART Selective self-test log data structure revision number 0
Note: revision number not 1 implies that no selective self-test has ever been run
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
$ sudo smartctl -H /dev/sda
smartctl 6.4 2015-06-04 r4109 [x86_64-linux-4.4.1-2-ARCH] (local build)
Copyright (C) 2002-15, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
$ sudo smartctl --attributes --log=selftest /dev/sda
smartctl 6.4 2015-06-04 r4109 [x86_64-linux-4.4.1-2-ARCH] (local build)
Copyright (C) 2002-15, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000a 100 100 000 Old_age Always - 0
9 Power_On_Hours 0x0012 100 100 000 Old_age Always - 2831
12 Power_Cycle_Count 0x0012 100 100 000 Old_age Always - 2459
168 Unknown_Attribute 0x0012 100 100 000 Old_age Always - 0
170 Unknown_Attribute 0x0013 100 100 010 Pre-fail Always - 35
173 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 26411045
192 Power-Off_Retract_Count 0x0012 100 100 000 Old_age Always - 84
194 Temperature_Celsius 0x0023 070 070 000 Pre-fail Always - 30
196 Reallocated_Event_Count 0x0000 100 100 000 Old_age Offline - 0
218 Unknown_Attribute 0x0000 100 100 000 Old_age Offline - 108
241 Total_LBAs_Written 0x0012 100 100 000 Old_age Always - 1849852
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed without error 00% 2831 -
# 2 Short offline Completed without error 00% 2831 -
Last edited by wba072 (2016-02-13 18:17:23)
I'm still a little worried that the BIOS has sometimes failed to detect the drive, as to me this indicates a possible failure. But the SMART tests don't indicate this. I followed this advice: https://wiki.archlinux.org/index.php/SS … NCQ_errors The system at least seems snappier, and there have been no errors so far. I also hadn't realized that I had not enabled TRIM on the system, so that may help as well. I'll mark this as solved and post again if the problem recurs, but I think it's good now.
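The linked advice isn't quoted in this thread. Assuming it amounts to disabling NCQ (for example by booting with the `libata.force=noncq` kernel parameter), here is a rough way to check whether the change took effect. `ncq_state` is a made-up helper name, and the default path assumes the disk is `sda`, as in the smartctl output above.

```shell
#!/bin/sh
# Hypothetical helper: report whether NCQ looks disabled for a disk.
# $1 is the path to a queue_depth file; the default assumes the disk is sda.
ncq_state() {
    depth_file="${1:-/sys/block/sda/device/queue_depth}"
    if [ ! -r "$depth_file" ]; then
        echo "unknown"
        return 1
    fi
    depth=$(cat "$depth_file")
    # A queue depth of 1 means only one command is outstanding at a time,
    # i.e. native command queuing is effectively off.
    if [ "$depth" -le 1 ]; then
        echo "disabled"
    else
        echo "enabled (queue depth $depth)"
    fi
}
```

Calling `ncq_state` with no argument reads the sysfs file for `sda`. If it reports `disabled` and the `READ/WRITE FPDMA QUEUED` timeouts stop appearing in the journal, that would be consistent with NCQ having been the trigger, though it doesn't rule out a marginal drive or connector.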