At first I used a 120 GB OCZ Vertex2 disk and ran into the following problem: at some point all programs inside X stop responding (for example, opening a tab in the browser or a file in emacs hangs). When I terminated the X session I saw the following errors on the console (they do not seem to be written to the logs):
EXT4_fs (sda1): delayed block allocation failed for inode ... at logical offset ... with max blocks 1 with error -5.
This should not happen!!! Data will be lost.
EXT4_fs: previous I/O error to superblock detected
end_request: I/O error on device sda1, logical block ...
Buffer I/O error on device sda1, logical block ...
Maybe these errors were not written to the log because the disk really had problems. I checked the disk's S.M.A.R.T. attributes using gsmartcontrol; here is the output:
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 120 120 050 Pre-fail Always - 0/0
5 Retired_Block_Count 0x0033 100 100 003 Pre-fail Always - 0
9 Power_On_Hours_and_Msec 0x0032 100 100 000 Old_age Always - 1656h+09m+39.140s
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 668
171 Program_Fail_Count 0x0032 000 000 000 Old_age Always - 0
172 Erase_Fail_Count 0x0032 000 000 000 Old_age Always - 0
174 Unexpect_Power_Loss_Ct 0x0030 000 000 000 Old_age Offline - 37
177 Wear_Range_Delta 0x0000 000 000 000 Old_age Offline - 1
181 Program_Fail_Count 0x0032 000 000 000 Old_age Always - 0
182 Erase_Fail_Count 0x0032 000 000 000 Old_age Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
194 Temperature_Celsius 0x0022 030 129 000 Old_age Always - 30 (Min/Max 30/30)
195 ECC_Uncorr_Error_Count 0x001c 120 120 000 Old_age Offline - 0/0
196 Reallocated_Event_Count 0x0033 100 100 000 Pre-fail Always - 0
231 SSD_Life_Left 0x0013 100 100 010 Pre-fail Always - 0
233 SandForce_Internal 0x0000 000 000 000 Old_age Offline - 1088
234 SandForce_Internal 0x0032 000 000 000 Old_age Always - 768
241 Lifetime_Writes_GiB 0x0032 000 000 000 Old_age Always - 768
242 Lifetime_Reads_GiB 0x0032 000 000 000 Old_age Always - 1344
It says that four of the attributes are of the Pre-fail type. So I bought a new OCZ Vertex3 disk, successfully made a complete system backup using rsync, swapped the disk in the laptop and ran gsmartcontrol on the new disk. It shows that 5 attributes are of the Pre-fail type:
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 084 084 050 Pre-fail Always - 0/6384180
5 Retired_Block_Count 0x0033 100 100 003 Pre-fail Always - 0
9 Power_On_Hours_and_Msec 0x0032 100 100 000 Old_age Always - 14h+46m+27.780s
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 11
171 Program_Fail_Count 0x0032 000 000 000 Old_age Always - 0
172 Erase_Fail_Count 0x0032 000 000 000 Old_age Always - 0
174 Unexpect_Power_Loss_Ct 0x0030 000 000 000 Old_age Offline - 9
177 Wear_Range_Delta 0x0000 000 000 000 Old_age Offline - 0
181 Program_Fail_Count 0x0032 000 000 000 Old_age Always - 0
182 Erase_Fail_Count 0x0032 000 000 000 Old_age Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
194 Temperature_Celsius 0x0022 030 030 000 Old_age Always - 30 (Min/Max 30/30)
195 ECC_Uncorr_Error_Count 0x001c 120 120 000 Old_age Offline - 0/6384180
196 Reallocated_Event_Count 0x0033 100 100 003 Pre-fail Always - 0
201 Unc_Soft_Read_Err_Rate 0x001c 120 120 000 Old_age Offline - 0/6384180
204 Soft_ECC_Correct_Rate 0x001c 120 120 000 Old_age Offline - 0/6384180
230 Life_Curve_Status 0x0013 100 100 000 Pre-fail Always - 100
231 SSD_Life_Left 0x0013 100 100 010 Pre-fail Always - 0
233 SandForce_Internal 0x0000 000 000 000 Old_age Offline - 59
234 SandForce_Internal 0x0032 000 000 000 Old_age Always - 83
241 Lifetime_Writes_GiB 0x0032 000 000 000 Old_age Always - 83
242 Lifetime_Reads_GiB 0x0032 000 000 000 Old_age Always - 141
So I'm not sure whether I can trust the gsmartcontrol output. Can it be that these Pre-fail values mean something different for OCZ drives?
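To make the question concrete, here is a minimal Python sketch of how I would expect the table to be read. It assumes smartctl's usual convention (not something I have confirmed for OCZ firmware): TYPE is only a category, an attribute counts as failed when its normalized VALUE is at or below THRESH, and a THRESH of 000 means no threshold is defined. The names `Attr`, `parse_attributes` and `attribute_state` are made up for this example, and `SAMPLE` is a few lines copied from the new disk's output above.

```python
from typing import List, NamedTuple


class Attr(NamedTuple):
    id: int
    name: str
    value: int
    worst: int
    thresh: int
    type: str
    when_failed: str
    raw: str


def parse_attributes(text: str) -> List[Attr]:
    """Parse attribute rows from smartctl-style output; skip headers."""
    attrs = []
    for line in text.splitlines():
        # 9 whitespace-separated columns, then the raw value (may contain spaces).
        parts = line.split(None, 9)
        if len(parts) < 10 or not parts[0].isdigit():
            continue
        attrs.append(Attr(int(parts[0]), parts[1], int(parts[3]), int(parts[4]),
                          int(parts[5]), parts[6], parts[8], parts[9]))
    return attrs


def attribute_state(a: Attr) -> str:
    """Assumed rule: compare normalized values against the threshold."""
    if a.thresh == 0:
        return "no threshold"
    if a.value <= a.thresh:
        return "failing now"
    if a.worst <= a.thresh:
        return "failed in the past"
    return "ok"


SAMPLE = """\
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 084 084 050 Pre-fail Always - 0/6384180
5 Retired_Block_Count 0x0033 100 100 003 Pre-fail Always - 0
194 Temperature_Celsius 0x0022 030 030 000 Old_age Always - 30 (Min/Max 30/30)
230 Life_Curve_Status 0x0013 100 100 000 Pre-fail Always - 100
231 SSD_Life_Left 0x0013 100 100 010 Pre-fail Always - 0
"""

for attr in parse_attributes(SAMPLE):
    print(attr.id, attr.name, attr.type, attribute_state(attr))
```

If that reading is right, a Pre-fail row with "-" in WHEN_FAILED and VALUE well above THRESH would not by itself indicate a failing drive, which is exactly what I would like to have confirmed.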
Also, how can I get an EXT4_fs error if I use an ext2 filesystem? Here is my /etc/fstab:
#
# /etc/fstab: static file system information
#
# <file system> <dir> <type> <options> <dump> <pass>
tmpfs /tmp tmpfs nodev,nosuid 0 0
/dev/sda1 / ext2 defaults 0 1
I saw the following in everything.log:
Mar 28 16:11:52 localhost kernel: [ 3.508463] EXT4-fs (sda1): mounting ext2 file system using the ext4 subsystem
Mar 28 16:11:52 localhost kernel: [ 3.510985] EXT4-fs (sda1): mounted filesystem without journal. Opts: (null)
Does it mean that EXT4-fs is used even for ext2 filesystems?
Also, with the new disk the X session hangs again. This time I could not even terminate it; there was just a black screen, so I'm not sure whether it is the same problem as with the old drive. Is it possible to enable some debug logging to get more evidence of what is happening?
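In the meantime, the only evidence I can collect is whatever reaches the log before the hang. A minimal Python sketch of the filter I have in mind follows. It assumes the log is plain text and that the relevant messages carry the same prefixes as the ones quoted above; the pattern list and the function name `storage_messages` are my own choices, not part of any tool.

```python
import re
from typing import Iterable, List

# Prefixes taken from the console and log messages quoted in this post.
PATTERN = re.compile(r"EXT4[-_]fs|I/O error|end_request")


def storage_messages(lines: Iterable[str]) -> List[str]:
    """Return the lines that look like filesystem or block-layer messages."""
    return [line.rstrip("\n") for line in lines if PATTERN.search(line)]


sample_log = [
    "Mar 28 16:11:52 localhost kernel: [ 3.508463] EXT4-fs (sda1): mounting ext2 file system using the ext4 subsystem\n",
    "Mar 28 16:11:53 localhost kernel: [ 4.000000] some unrelated message\n",
    "Buffer I/O error on device sda1, logical block ...\n",
]

for message in storage_messages(sample_log):
    print(message)
```

To run it on a real log, the list can be replaced by an open file object, since any iterable of lines works. It obviously cannot help if the messages never reach the disk, which is what I suspect happened with the old drive.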