
#1 2016-02-13 14:30:56

wba072
Member
Registered: 2010-11-11
Posts: 33

[Solved] System freezes. BIOS sometimes doesn't see hard drive.

Sometimes when I boot, the BIOS doesn't see the drive. Rebooting often makes it show up, and then I can boot as normal. While applications are running, the system often freezes, and journalctl shows the following:

Feb 13 08:20:56 acer-c720 kernel: ata1.00: exception Emask 0x0 SAct 0xf800000 SErr 0x400000 action 0x6 frozen
Feb 13 08:20:57 acer-c720 kernel: ata1: SError: { Handshk }
Feb 13 08:20:57 acer-c720 kernel: ata1.00: failed command: READ FPDMA QUEUED
Feb 13 08:20:57 acer-c720 kernel: ata1.00: cmd 60/08:b8:68:84:0a/00:00:01:00:00/40 tag 23 ncq 4096 in
                                           res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Feb 13 08:20:57 acer-c720 kernel: ata1.00: status: { DRDY }
Feb 13 08:20:57 acer-c720 kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Feb 13 08:20:57 acer-c720 kernel: ata1.00: cmd 61/08:c0:20:8b:4d/00:00:07:00:00/40 tag 24 ncq 4096 out
                                           res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Feb 13 08:20:57 acer-c720 kernel: ata1.00: status: { DRDY }
Feb 13 08:20:57 acer-c720 kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Feb 13 08:20:57 acer-c720 kernel: ata1.00: cmd 61/08:c8:28:8b:4d/00:00:07:00:00/40 tag 25 ncq 4096 out
                                           res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Feb 13 08:20:57 acer-c720 kernel: ata1.00: status: { DRDY }
Feb 13 08:20:57 acer-c720 kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Feb 13 08:20:57 acer-c720 kernel: ata1.00: cmd 61/08:d0:30:8b:4d/00:00:07:00:00/40 tag 26 ncq 4096 out
                                           res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Feb 13 08:20:57 acer-c720 kernel: ata1.00: status: { DRDY }
Feb 13 08:20:57 acer-c720 kernel: ata1.00: failed command: WRITE FPDMA QUEUED
Feb 13 08:20:57 acer-c720 kernel: ata1.00: cmd 61/08:d8:38:8b:4d/00:00:07:00:00/40 tag 27 ncq 4096 out
                                           res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Feb 13 08:20:57 acer-c720 kernel: ata1.00: status: { DRDY }
Feb 13 08:20:57 acer-c720 kernel: ata1: hard resetting link
Feb 13 08:20:57 acer-c720 kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
Feb 13 08:20:57 acer-c720 kernel: ata1.00: configured for UDMA/133
Feb 13 08:20:57 acer-c720 kernel: ata1.00: device reported invalid CHS sector 0
Feb 13 08:20:57 acer-c720 kernel: ata1.00: device reported invalid CHS sector 0
Feb 13 08:20:57 acer-c720 kernel: ata1.00: device reported invalid CHS sector 0
Feb 13 08:20:57 acer-c720 kernel: ata1.00: device reported invalid CHS sector 0
Feb 13 08:20:57 acer-c720 kernel: ata1: EH complete
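
For anyone comparing against their own journal, here is a rough sketch of how the failed commands in a log like the one above could be tallied. The function name summarize_ata_failures is made up for illustration, and it assumes the kernel messages use the same "failed command: ..." wording shown here.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): tally failed ATA
# commands by name from kernel log lines read on stdin.
summarize_ata_failures() {
    # Print only the text after "failed command: ", then count identical
    # command names and strip the padding that uniq -c adds.
    sed -n 's/.*failed command: //p' | sort | uniq -c | sed 's/^ *//'
}

# Possible usage against the kernel messages of the current boot:
#   journalctl -k -b | summarize_ata_failures
```

Run against the excerpt above, this should report one READ FPDMA QUEUED and four WRITE FPDMA QUEUED failures. All of them are NCQ commands.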

Not sure if this is relevant: my boot and root partitions are both encrypted with LUKS. I give GRUB a passphrase for the boot partition. The boot partition then loads a key for the root partition from the initramfs. After that, boot is mounted via crypttab with another key.

$ lspci | grep ATA
00:1f.2 SATA controller: Intel Corporation 8 Series SATA Controller 1 [AHCI mode] (rev 04)
$ sudo smartctl -a /dev/sda
smartctl 6.4 2015-06-04 r4109 [x86_64-linux-4.4.1-2-ARCH] (local build)
Copyright (C) 2002-15, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     SB M2 SSD
Serial Number:    D45B0752062600000051
Firmware Version: S9FM02.0
User Capacity:    128,035,676,160 bytes [128 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      < 1.8 inches
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-3 (minor revision not indicated)
SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Sat Feb 13 08:34:35 2016 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00)    Offline data collection activity
                    was never started.
                    Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0)    The previous self-test routine completed
                    without error or no self-test has ever 
                    been run.
Total time to complete Offline 
data collection:         (   30) seconds.
Offline data collection
capabilities:              (0x7b) SMART execute Offline immediate.
                    Auto Offline data collection on/off support.
                    Suspend Offline collection upon new
                    command.
                    Offline surface scan supported.
                    Self-test supported.
                    Conveyance Self-test supported.
                    Selective Self-test supported.
SMART capabilities:            (0x0003)    Saves SMART data before entering
                    power-saving mode.
                    Supports SMART auto save timer.
Error logging capability:        (0x01)    Error logging supported.
                    General Purpose Logging supported.
Short self-test routine 
recommended polling time:      (   1) minutes.
Extended self-test routine
recommended polling time:      (   2) minutes.
Conveyance self-test routine
recommended polling time:      (   2) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000a   100   100   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0012   100   100   000    Old_age   Always       -       2831
 12 Power_Cycle_Count       0x0012   100   100   000    Old_age   Always       -       2458
168 Unknown_Attribute       0x0012   100   100   000    Old_age   Always       -       7
170 Unknown_Attribute       0x0013   100   100   010    Pre-fail  Always       -       35
173 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       26411045
192 Power-Off_Retract_Count 0x0012   100   100   000    Old_age   Always       -       84
194 Temperature_Celsius     0x0023   070   070   000    Pre-fail  Always       -       30
196 Reallocated_Event_Count 0x0000   100   100   000    Old_age   Offline      -       0
218 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       108
241 Total_LBAs_Written      0x0012   100   100   000    Old_age   Always       -       1849830

SMART Error Log Version: 1
ATA Error Count: 4
    CR = Command Register [HEX]
    FR = Features Register [HEX]
    SC = Sector Count Register [HEX]
    SN = Sector Number Register [HEX]
    CL = Cylinder Low Register [HEX]
    CH = Cylinder High Register [HEX]
    DH = Device/Head Register [HEX]
    DC = Device Command Register [HEX]
    ER = Error register [HEX]
    ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 4 occurred at disk power-on lifetime: 0 hours (0 days + 0 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 7f 23 00 00 40  Error: ICRC, ABRT 127 sectors at LBA = 0x00000023 = 35

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 01 7f 23 00 00 40 00      00:00:02.679  READ DMA
  c8 01 01 22 00 00 40 00      00:00:02.656  READ DMA
  c8 01 01 00 00 00 40 00      00:00:02.656  READ DMA
  ec 00 00 00 00 00 00 00      00:00:01.655  IDENTIFY DEVICE
  a1 00 00 00 00 00 00 00      00:00:01.655  IDENTIFY PACKET DEVICE

Error 3 occurred at disk power-on lifetime: 0 hours (0 days + 0 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 08 d0 1a 4b e3  Error: ICRC, ABRT at LBA = 0x034b1ad0 = 55253712

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ca 00 08 d0 1a 4b e3 08      00:04:38.412  WRITE DMA
  ef 10 02 00 00 00 a0 08      00:04:38.412  SET FEATURES [Enable SATA feature]
  27 00 00 00 00 00 e0 08      00:04:38.411  READ NATIVE MAX ADDRESS EXT [OBS-ACS-3]
  ec 00 00 00 00 00 a0 08      00:04:38.411  IDENTIFY DEVICE
  ef 03 46 00 00 00 a0 08      00:04:38.411  SET FEATURES [Set transfer mode]

Error 2 occurred at disk power-on lifetime: 0 hours (0 days + 0 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 08 d0 1a 4b e3  Error: ICRC, ABRT at LBA = 0x034b1ad0 = 55253712

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ca 00 08 d0 1a 4b e3 08      00:04:38.082  WRITE DMA
  e7 00 00 00 00 00 a0 08      00:04:37.510  FLUSH CACHE
  ca 00 08 b0 d0 4d e7 08      00:04:37.510  WRITE DMA
  e7 00 00 00 00 00 a0 08      00:04:37.508  FLUSH CACHE
  ca 00 50 60 d0 4d e7 08      00:04:37.507  WRITE DMA

Error 1 occurred at disk power-on lifetime: 0 hours (0 days + 0 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 58 d8 ff de e5  Error: ICRC, ABRT at LBA = 0x05deffd8 = 98500568

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  ca 00 58 d8 ff de e5 08      00:04:37.140  WRITE DMA
  e7 00 00 00 00 00 a0 08      00:04:34.177  FLUSH CACHE
  ca 00 08 d8 ce 4d e7 08      00:04:34.177  WRITE DMA
  e7 00 00 00 00 00 a0 08      00:04:34.175  FLUSH CACHE
  ca 00 08 d0 ce 4d e7 08      00:04:34.174  WRITE DMA

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

SMART Selective self-test log data structure revision number 0
Note: revision number not 1 implies that no selective self-test has ever been run
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
$ sudo smartctl -H /dev/sda
smartctl 6.4 2015-06-04 r4109 [x86_64-linux-4.4.1-2-ARCH] (local build)
Copyright (C) 2002-15, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
$ sudo smartctl --attributes --log=selftest /dev/sda
smartctl 6.4 2015-06-04 r4109 [x86_64-linux-4.4.1-2-ARCH] (local build)
Copyright (C) 2002-15, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000a   100   100   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0012   100   100   000    Old_age   Always       -       2831
 12 Power_Cycle_Count       0x0012   100   100   000    Old_age   Always       -       2459
168 Unknown_Attribute       0x0012   100   100   000    Old_age   Always       -       0
170 Unknown_Attribute       0x0013   100   100   010    Pre-fail  Always       -       35
173 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       26411045
192 Power-Off_Retract_Count 0x0012   100   100   000    Old_age   Always       -       84
194 Temperature_Celsius     0x0023   070   070   000    Pre-fail  Always       -       30
196 Reallocated_Event_Count 0x0000   100   100   000    Old_age   Offline      -       0
218 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      -       108
241 Total_LBAs_Written      0x0012   100   100   000    Old_age   Always       -       1849852

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      2831         -
# 2  Short offline       Completed without error       00%      2831         -

Last edited by wba072 (2016-02-13 18:17:23)


#2 2016-02-13 18:17:04

wba072

Re: [Solved] System freezes. BIOS sometimes doesn't see hard drive.

I'm still a little worried that the BIOS sometimes failed to detect the drive, since to me that points to a possible hardware failure. However, the SMART tests don't indicate one. I followed this advice: https://wiki.archlinux.org/index.php/SS … NCQ_errors The system at least seems snappier, and there have been no errors so far. Also, I hadn't realized that I had never enabled TRIM on this system, so that may help as well. I'll mark this as solved and post again if the problem comes back, but I think it's good now.
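
Going by the NCQ_errors anchor in the link, the advice is about turning off Native Command Queueing. The exact steps taken aren't stated above, so what follows is only a sketch of how one might check and change the setting, assuming the disk is sda. The function name ncq_status and its optional second argument are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): report whether NCQ looks
# active for a disk by reading its queue depth from sysfs.
# A queue depth of 1 means commands are no longer queued.
ncq_status() {
    dev="$1"
    sysroot="${2:-/sys}"   # assumption: overridable only so a fake tree can be used
    depth_file="$sysroot/block/$dev/device/queue_depth"
    if [ ! -r "$depth_file" ]; then
        echo "unknown"
        return 1
    fi
    depth=$(cat "$depth_file")
    if [ "$depth" -le 1 ]; then
        echo "NCQ off (queue_depth=$depth)"
    else
        echo "NCQ on (queue_depth=$depth)"
    fi
}

# Ways NCQ is commonly disabled (run as root; sda is an assumption):
#   at boot:    add libata.force=noncq to the kernel command line
#   at runtime: echo 1 > /sys/block/sda/device/queue_depth
#
# Periodic TRIM is commonly enabled with the util-linux timer:
#   systemctl enable fstrim.timer
# On a LUKS setup, discards also have to be allowed through the encrypted
# mapping before fstrim can reach the SSD.
```

Calling ncq_status sda before and after the change should show the queue depth dropping to 1 if the setting took effect.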
