Hello
<EDIT>
The freezes have stopped as suddenly as they had started. Logs show that all SMART self-tests passed successfully over the last three days:
# journalctl |grep self-test
juil. 26 05:05:25 llewellyn smartd[442]: Device: /dev/sdb [SAT], previous self-test completed without error
juil. 28 02:41:32 llewellyn smartd[459]: Device: /dev/sda [SAT], previous self-test completed without error [x3]
juil. 29 02:41:32 llewellyn smartd[459]: Device: /dev/sda [SAT], previous self-test completed without error [x3]
Would you mind sharing your interpretation of this behavior? Call me dumb, but was this SSD getting tired of fasting (Ramadan ended yesterday), or does it just have a strong personality... :-o ?
In the meantime I have increased the backup frequency.
</EDIT>
My system started to freeze from time to time with the error log below, but smartctl says the drive is fine. This already happened six months ago, then the problem vanished. I wonder how to effectively check an SSD's status?!
This workstation has two SATA hard drives and a 15-month-old SanDisk SDSSDX120GG25 SSD.
Kernel~3.15.5-2-ck x86_64 Up~6 days Mem~1533.7/3770.6MB HDD~940.2GB(51.1% used).
1) smartctl (and hdsentinel) say the drive is fine:
# smartctl -a /dev/sda
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.15.5-2-ck] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: SandForce Driven SSDs
Device Model: SanDisk SDSSDX120GG25
Serial Number: 120645400154
LU WWN Device Id: 5 001b44 7229bb25a
Firmware Version: R211
User Capacity: 120 034 123 776 bytes [120 GB]
Sector Size: 512 bytes logical/physical
Rotation Rate: Solid State Device
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS, ACS-2 T13/2015-D revision 3
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Wed Jul 23 11:23:05 2014 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x02) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
# smartctl -A /dev/sda
smartctl 6.2 2013-07-26 r3841 [x86_64-linux-3.15.5-2-ck] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 110 110 050 Pre-fail Always - 0/29297014
5 Retired_Block_Count 0x0033 100 100 003 Pre-fail Always - 0
9 Power_On_Hours_and_Msec 0x0032 088 088 000 Old_age Always - 11290h+19m+12.600s
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 180
171 Program_Fail_Count 0x0032 000 000 000 Old_age Always - 0
172 Erase_Fail_Count 0x0032 000 000 000 Old_age Always - 0
174 Unexpect_Power_Loss_Ct 0x0030 000 000 000 Old_age Offline - 66
177 Wear_Range_Delta 0x0000 000 000 000 Old_age Offline - 5
181 Program_Fail_Count 0x0032 000 000 000 Old_age Always - 0
182 Erase_Fail_Count 0x0032 000 000 000 Old_age Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
194 Temperature_Celsius 0x0022 032 035 000 Old_age Always - 32 (Min/Max 12/35)
195 ECC_Uncorr_Error_Count 0x001c 120 120 000 Old_age Offline - 0/29297014
196 Reallocated_Event_Count 0x0033 100 100 003 Pre-fail Always - 0
201 Unc_Soft_Read_Err_Rate 0x001c 120 120 000 Old_age Offline - 0/29297014
204 Soft_ECC_Correct_Rate 0x001c 120 120 000 Old_age Offline - 0/29297014
230 Life_Curve_Status 0x0013 100 100 000 Pre-fail Always - 100
231 SSD_Life_Left 0x0013 100 100 010 Pre-fail Always - 0
233 SandForce_Internal 0x0000 000 000 000 Old_age Offline - 6048
234 SandForce_Internal 0x0032 000 000 000 Old_age Always - 11062
241 Lifetime_Writes_GiB 0x0032 000 000 000 Old_age Always - 11062
242 Lifetime_Reads_GiB 0x0032 000 000 000 Old_age Always - 3152
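One quick way to read a table like the one above is to compare the normalized VALUE column against THRESH. The following is a minimal sketch, assuming the column layout printed by `smartctl -A` as shown here; the function name `attrs_at_threshold` is made up for illustration. It skips attributes whose threshold is 0, since those cannot trip.

```shell
# Hypothetical helper: read `smartctl -A` output on stdin and print every
# attribute whose normalized VALUE is at or below a non-zero THRESH.
# Columns assumed: ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW
attrs_at_threshold() {
    awk '$1 ~ /^[0-9]+$/ && ($6 + 0) > 0 && ($4 + 0) <= ($6 + 0) {
        printf "%s %s value=%d thresh=%d\n", $1, $2, $4 + 0, $6 + 0
    }'
}

# Usage on a live system (needs root):
# smartctl -A /dev/sda | attrs_at_threshold
```

Empty output only means that no attribute has crossed its vendor threshold. It says nothing about link resets or command timeouts, which SMART attributes may not capture.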
2) For the last 4 days the system has been freezing (for about 40 seconds each time) at seemingly random moments:
$ dmesg
[511666.153499] ata1: EH complete
[511666.153512] EXT4-fs (dm-3): discard request in group:1055 block:7424 count:65 failed with -5
[511827.786342] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x50000 action 0x6 frozen
[511827.786347] ata1: SError: { PHYRdyChg CommWake }
[511827.786350] ata1.00: failed command: DATA SET MANAGEMENT
[511827.786354] ata1.00: cmd 06/01:01:00:00:00/00:00:00:00:00/a0 tag 4 dma 512 out
res 40/00:ff:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[511827.786357] ata1.00: status: { DRDY }
[511827.786360] ata1: hard resetting link
[511828.091280] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[511828.102562] ata1.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded
[511828.102568] ata1.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
[511828.102571] ata1.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
[511828.122516] ata1.00: ACPI cmd ef/10:06:00:00:00:00 (SET FEATURES) succeeded
[511828.122521] ata1.00: ACPI cmd f5/00:00:00:00:00:00 (SECURITY FREEZE LOCK) filtered out
[511828.122525] ata1.00: ACPI cmd b1/c1:00:00:00:00:00 (DEVICE CONFIGURATION OVERLAY) filtered out
[511828.132926] ata1.00: configured for UDMA/133
[511828.133329] ata1.00: device reported invalid CHS sector 0
[511828.133341] sd 0:0:0:0: [sda]
[511828.133342] Result: hostbyte=0x00 driverbyte=0x08
[511828.133344] sd 0:0:0:0: [sda]
[511828.133345] Sense Key : 0xb [current] [descriptor]
[511828.133348] Descriptor sense data with sense descriptors (in hex):
[511828.133349] 72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00
[511828.133356] 00 00 00 00
[511828.133359] sd 0:0:0:0: [sda]
[511828.133360] ASC=0x0 ASCQ=0x0
[511828.133362] sd 0:0:0:0: [sda] CDB:
[511828.133363] cdb[0]=0x93: 93 08 00 00 00 00 04 2e 24 c2 00 00 00 40 00 00
[511828.133372] end_request: I/O error, dev sda, sector 70132930
[511828.133389] EXT4-fs (dm-3): discard request in group:1055 block:3680 count:32 failed with -5
[511828.133390] ata1: EH complete
# journalctl -xb | grep sda
juil. 17 11:25:29 llewellyn smartd[451]: Device: /dev/sda [SAT], SanDisk SDSSDX120GG25, S/N:120645400154, WWN:5-001b44-7229bb25a, FW:R211, 120 GB
(...)
juil. 18 00:25:31 llewellyn smartd[451]: Device: /dev/sda [SAT], SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 108 to 109
(...)
juil. 19 02:25:31 llewellyn smartd[451]: Device: /dev/sda [SAT], starting scheduled Short Self-Test.
juil. 19 02:55:31 llewellyn smartd[451]: Device: /dev/sda [SAT], previous self-test completed without error
juil. 19 03:26:02 llewellyn smartd[451]: Device: /dev/sda [SAT], starting scheduled Long Self-Test.
juil. 19 03:55:31 llewellyn smartd[451]: Device: /dev/sda [SAT], SMART Usage Attribute: 194 Temperature_Celsius changed from 33 to 34
juil. 19 03:55:31 llewellyn smartd[451]: Device: /dev/sda [SAT], previous self-test could not complete due to a fatal or unknown error
(...)
juil. 19 11:23:09 llewellyn kernel: sd 0:0:0:0: [sda] CDB:
juil. 19 11:23:09 llewellyn kernel: end_request: I/O error, dev sda, sector 72829890
(...)
juil. 19 18:55:15 llewellyn kernel: sd 0:0:0:0: [sda] CDB:
juil. 19 18:55:15 llewellyn kernel: end_request: I/O error, dev sda, sector 71426050
(...)
juil. 19 22:14:50 llewellyn kernel: sd 0:0:0:0: [sda] CDB:
juil. 19 22:14:50 llewellyn kernel: end_request: I/O error, dev sda, sector 72564994
(...)
juil. 19 23:19:04 llewellyn kernel: sd 0:0:0:0: [sda] CDB:
juil. 19 23:19:04 llewellyn kernel: end_request: I/O error, dev sda, sector 56920066
(...)
juil. 20 03:51:16 llewellyn kernel: sd 0:0:0:0: [sda] CDB:
juil. 20 03:51:16 llewellyn kernel: end_request: I/O error, dev sda, sector 61002754
(...)
juil. 20 11:36:13 llewellyn kernel: sd 0:0:0:0: [sda] CDB:
juil. 20 11:36:13 llewellyn kernel: end_request: I/O error, dev sda, sector 56936898
(...)
juil. 21 22:40:37 llewellyn kernel: sd 0:0:0:0: [sda] CDB:
juil. 21 22:40:37 llewellyn kernel: end_request: I/O error, dev sda, sector 64332610
juil. 21 22:40:37 llewellyn kernel: sd 0:0:0:0: [sda] CDB:
juil. 21 22:40:37 llewellyn kernel: end_request: I/O error, dev sda, sector 64332610
(...)
juil. 21 22:56:40 llewellyn kernel: sd 0:0:0:0: [sda] CDB:
juil. 21 22:56:40 llewellyn kernel: end_request: I/O error, dev sda, sector 64332226
(...)
juil. 23 06:51:20 llewellyn kernel: sd 0:0:0:0: [sda] CDB:
juil. 23 06:51:20 llewellyn kernel: end_request: I/O error, dev sda, sector 63981570
(...)
juil. 23 09:33:24 llewellyn kernel: sd 0:0:0:0: [sda] CDB:
juil. 23 09:33:24 llewellyn kernel: end_request: I/O error, dev sda, sector 70140418
(...)
juil. 23 09:36:06 llewellyn kernel: sd 0:0:0:0: [sda] CDB:
juil. 23 09:36:06 llewellyn kernel: end_request: I/O error, dev sda, sector 70132930
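To see at a glance whether the same sectors keep coming back, the kernel lines above can be reduced to a list of sectors with a count for each. This is only a sketch, assuming the `end_request: I/O error, dev sda, sector N` wording shown in this log; the helper name `failed_sectors` is hypothetical, and newer kernels may word the message differently.

```shell
# Hypothetical helper: read kernel log lines on stdin and print
# "<sector> <count>" for each sector reported in an I/O error on the
# given device (default: sda), sorted by sector number.
failed_sectors() {
    dev=${1:-sda}
    grep "I/O error, dev $dev," \
        | sed -n 's/.*sector \([0-9][0-9]*\).*/\1/p' \
        | sort -n | uniq -c | awk '{ print $2, $1 }'
}

# Usage on a live system (assumes journalctl, as used above):
# journalctl -k -b | failed_sectors sda
```

A handful of sectors repeating would hint at a localized problem, while errors spread over many different sectors, as here, look more like a command or link issue. That reading is an interpretation, not something the log proves.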
Six months ago I had the same issue. Strangely enough, it disappeared after I upgraded the firmware of one of the SATA hard drives (not the SSD's).
Please help me learn how to check the real state of an SSD. It is hard to move on when you don't know whether a drive's issue is hardware-related.
Useful references I checked, without finding out what is causing the system to freeze:
* checking SMART's statuses or conditions by GrapefruiTgirl
* Disabling TRIM on the kernel level (Unresolved). The "failed command: DATA SET MANAGEMENT" line shows that the kernel hangs while trying to issue a TRIM command
* http://www.linuxjournal.com/magazine/mo … t?page=0,0
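Since the failing command is a TRIM and the dmesg output shows ext4 discard requests failing, it may be worth checking which filesystems are mounted with the `discard` option (online TRIM). A minimal sketch, assuming the `/proc/mounts` line format; the helper name `discard_mounts` is made up for illustration.

```shell
# Hypothetical helper: read /proc/mounts-style lines on stdin
# (device mountpoint fstype options dump pass) and print the mount
# point of every filesystem whose options include "discard".
discard_mounts() {
    awk '{
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] == "discard") { print $2; break }
    }'
}

# Usage on a live system:
# discard_mounts < /proc/mounts
```

A commonly suggested alternative to online discard is to drop the option from /etc/fstab and run `fstrim` on the mount point from time to time. Whether that would avoid the freezes on this particular drive is an assumption, not something tested here.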
Last edited by kozaki (2014-07-29 16:47:19)
Seeded last month: Arch 50 gig, derivatives 1 gig
Desktop @3.3GHz 8 gig RAM, linux-ck
laptop #1 Atom 2 gig RAM, Arch linux stock i686 (6H w/ 6yrs old battery ) #2: ARM Tegra K1, 4 gig RAM, ChrOS
Atom Z520 2 gig RAM, OMV (Debian 7) kernel 3.16 bpo on SDHC | PGP Key: 0xFF0157D9
What I find strange is that only one of the ten latest (long?) self-tests couldn't complete; all the others went fine. It is unclear to me which result to trust:
juil. 25 02:35:26 llewellyn smartd[442]: Device: /dev/sdc [SAT], previous self-test completed without error ( x6 )
juil. 26 02:35:25 llewellyn smartd[442]: Device: /dev/sda [SAT], previous self-test could not complete due to a fatal or unknown error
juil. 26 02:35:25 llewellyn smartd[442]: Device: /dev/sdb [SAT], previous self-test completed without error
juil. 26 02:35:25 llewellyn smartd[442]: Device: /dev/sdc [SAT], previous self-test completed without error
juil. 26 03:35:25 llewellyn smartd[442]: Device: /dev/sda [SAT], self-test in progress, 10% remaining
juil. 26 03:35:25 llewellyn smartd[442]: Device: /dev/sdb [SAT], self-test in progress, 70% remaining
juil. 26 03:35:25 llewellyn smartd[442]: Device: /dev/sdc [SAT], self-test in progress, 60% remaining
juil. 26 04:05:25 llewellyn smartd[442]: Device: /dev/sda [SAT], previous self-test completed without error
juil. 26 04:05:25 llewellyn smartd[442]: Device: /dev/sdb [SAT], self-test in progress, 40% remaining
juil. 26 04:05:25 llewellyn smartd[442]: Device: /dev/sdc [SAT], self-test in progress, 20% remaining
juil. 26 04:35:25 llewellyn smartd[442]: Device: /dev/sdb [SAT], self-test in progress, 10% remaining
juil. 26 04:35:26 llewellyn smartd[442]: Device: /dev/sdc [SAT], previous self-test completed without error ( x2 )
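To compare the drives side by side, the smartd lines above can be tallied per device. This is a rough sketch that assumes the smartd message wording shown in this log; the helper name `selftest_summary` is hypothetical. It counts log lines only, so hand-added annotations such as "( x6 )" are not expanded, and "in progress" lines are ignored.

```shell
# Hypothetical helper: read smartd journal lines on stdin and print,
# per device, how many self-test results passed and how many could
# not complete.
selftest_summary() {
    awk '
        match($0, /\/dev\/[a-z]+/) {
            d = substr($0, RSTART, RLENGTH)
            if ($0 ~ /previous self-test completed without error/) {
                ok[d]++; seen[d] = 1
            } else if ($0 ~ /previous self-test could not complete/) {
                bad[d]++; seen[d] = 1
            }
        }
        END {
            for (d in seen)
                printf "%s passed=%d failed=%d\n", d, ok[d], bad[d]
        }
    ' | sort
}

# Usage on a live system:
# journalctl | grep self-test | selftest_summary
```

The drive keeps its own record as well: `smartctl -l selftest /dev/sda` prints the self-test log stored on the device, and `smartctl -l error /dev/sda` prints its ATA error log.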