I got
May 10 10:08:13 qslap kernel: sd 4:0:0:0: [sdb] Unhandled sense code
May 10 10:08:13 qslap kernel: sd 4:0:0:0: [sdb] Result: hostbyte=0x00 driverbyte=0x08
May 10 10:08:13 qslap kernel: sd 4:0:0:0: [sdb] Sense Key : 0x3 [current]
May 10 10:08:13 qslap kernel: sd 4:0:0:0: [sdb] ASC=0x14 ASCQ=0x0
May 10 10:08:13 qslap kernel: sd 4:0:0:0: [sdb] CDB: cdb[0]=0x28: 28 00 25 42 ea af 00 00 01 00
May 10 10:08:13 qslap kernel: end_request: I/O error, dev sdb, sector 625142447
May 10 10:08:13 qslap kernel: Buffer I/O error on device sdb, logical block 78142805
in the system log whenever I try to access /dev/sdb in any way (for example, plugging it in or running fdisk or gparted, but not palimpsest).
These log entries repeat several times and block all access to the device for tens of seconds (it seems the kernel keeps retrying instead of giving up after the first failure), which is annoying.
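For reference, the numbers in the log above can be cross-checked against each other. The following is a minimal sketch, not something I ran on the drive itself: it decodes the quoted READ(10) CDB and looks up the sense data in a two-entry excerpt of the standard SCSI tables. The function name `decode_read10` is made up for illustration, and the 4096-byte buffer block size in the last line is an assumption that happens to make the "logical block" figure line up.

```python
# Decode the READ(10) CDB and sense data quoted in the kernel log above.
SENSE_KEYS = {0x3: "MEDIUM ERROR"}                      # excerpt of the SCSI sense key table
ASC_ASCQ = {(0x14, 0x00): "RECORDED ENTITY NOT FOUND"}  # excerpt of the SCSI ASC/ASCQ table


def decode_read10(cdb_hex):
    """Return (lba, block_count) for a 10-byte READ(10) CDB given as hex text."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")    # bytes 2..5: logical block address
    count = int.from_bytes(cdb[7:9], "big")  # bytes 7..8: transfer length in blocks
    return lba, count


lba, count = decode_read10("28 00 25 42 ea af 00 00 01 00")
print(lba, count)                            # sector number and number of sectors read
print(SENSE_KEYS[0x3], "/", ASC_ASCQ[(0x14, 0x00)])
# Assumption: the "Buffer I/O error" line counts 4096-byte blocks (8 sectors each).
print(lba * 512 // 4096)
```

So the failing command is a one-sector read of sector 625142447, the same sector named in the end_request line, and logical block 78142805 is simply the 4096-byte block that contains it.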
From palimpsest, I can see:
Current Pending Sector Count: Value: 1 sector
Uncorrectable Sector Count: Value: 1 sector
Palimpsest says that a "Current Pending Sector" will be remapped automatically by the hardware when a write to it fails.
The sector size is 512 bytes:
# fdisk -lu /dev/sdb
Disk /dev/sdb: 320.1 GB, 320072933376 bytes
255 heads, 63 sectors/track, 38913 cylinders, total 625142448 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0xaaaaaaaa
Disk /dev/sdb doesn't contain a valid partition table
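A quick arithmetic check on the fdisk figures, as a small sketch using only the numbers printed above. It suggests that the failing sector 625142447 is the very last addressable sector of the disk. Whether that is why merely plugging the disk in triggers the error is only a guess on my part (it would require that something reads the end of the disk while probing it, which I have not verified).

```python
SECTOR_SIZE = 512             # from "Units = sectors of 1 * 512 = 512 bytes"
total_bytes = 320072933376    # from "Disk /dev/sdb: 320.1 GB, 320072933376 bytes"
bad_lba = 625142447           # sector reported in the kernel log

total_sectors = total_bytes // SECTOR_SIZE
last_lba = total_sectors - 1  # sectors are numbered from 0

print(total_sectors)          # should match "total 625142448 sectors"
print(last_lba)
print(bad_lba == last_lba)    # is the bad sector the last one on the disk?
```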
badblocks detects the bad sector correctly:
# badblocks -svw -b 512 /dev/sdb 625142447 625142447
Checking for bad blocks in read-write mode
From block 625142447 to 625142447
Testing with pattern 0xaa: 625142447one, 0:20 elapsed
done
Reading and comparing: done
Testing with pattern 0x55: done
Reading and comparing: done
Testing with pattern 0xff: done
Reading and comparing: done
Testing with pattern 0x00: done
Reading and comparing: done
Pass completed, 1 bad blocks found.
As shown above, writing a single block takes 20 seconds because the kernel blocks.
badblocks writes four patterns, so the whole run takes about 80 seconds.
Note: badblocks does not find any bad blocks when performing a full-disk read-only test.
However, the sector was not automatically remapped even though badblocks has already written to it, and the kernel is still generating these log entries and blocking, which is very annoying.
I also tried writing to that sector directly, with no luck:
# dd if=/dev/zero of=/dev/sdb bs=512 count=1 seek=625142447
dd: writing `/dev/sdb': Input/output error
1+0 records in
0+0 records out
0 bytes (0 B) copied, 7.26951 s, 0.0 kB/s
What should I do to make the hardware remap that sector?
If that is impossible because of a hardware limitation, how can I silence the annoying log messages and stop the kernel from blocking?
Additional: I am looking for a way to make the kernel give up as soon as possible instead of blocking, or to make the drive's SMART stop marking that sector as 'Pending'. I am not looking for a way to create a filesystem with the bad blocks marked.
I know that if I provide a list of bad blocks to mkfs.*** when creating a filesystem, those blocks will not be used.
However, when I plug in the removable hard disk, BEFORE issuing ANY read/write commands, the kernel starts generating these log entries and /dev/sdb does not become visible for tens of seconds. The same thing happens when I run fdisk / gparted (these programs stay unresponsive for tens of seconds because the kernel is blocking) ...
I guess that SMART performs these checks automatically and causes the kernel to block, while SMART itself cannot handle the situation properly.
This is the output of smartctl -a /dev/sdb -d sat, which may be helpful:
smartctl 5.39.1 2010-01-28 r3054 [i686-pc-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF INFORMATION SECTION ===
Model Family: Seagate Momentus 5400.5 series
Device Model: ST9320320AS
Serial Number: 5SX3YFQ8
Firmware Version: SD03
User Capacity: 320,072,933,376 bytes
Device is: In smartctl database [for details use: -P show]
ATA Version is: 8
ATA Standard is: ATA-8-ACS revision 4
Local Time is: Mon May 10 11:25:42 2010 CST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 121) The previous self-test completed having
the read element of the test failed.
Total time to complete Offline
data collection: ( 700) seconds.
Offline data collection
capabilities: (0x73) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
No Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 114) minutes.
Conveyance self-test routine
recommended polling time: ( 2) minutes.
SCT capabilities: (0x103f) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 094 088 006 Pre-fail Always - 182650280
3 Spin_Up_Time 0x0003 099 099 000 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 595
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0
7 Seek_Error_Rate 0x000f 075 060 030 Pre-fail Always - 30942693
9 Power_On_Hours 0x0032 095 095 000 Old_age Always - 4482
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 1
12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 579
184 End-to-End_Error 0x0032 100 100 099 Old_age Always - 0
187 Reported_Uncorrect 0x0032 001 001 000 Old_age Always - 1812
188 Command_Timeout 0x0032 100 099 000 Old_age Always - 2
189 High_Fly_Writes 0x003a 100 100 000 Old_age Always - 0
190 Airflow_Temperature_Cel 0x0022 067 039 045 Old_age Always In_the_past 33 (0 166 39 23)
191 G-Sense_Error_Rate 0x0032 100 100 000 Old_age Always - 98
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 48
193 Load_Cycle_Count 0x0032 011 011 000 Old_age Always - 178621
194 Temperature_Celsius 0x0022 033 061 000 Old_age Always - 33 (0 12 0 0)
195 Hardware_ECC_Recovered 0x001a 060 039 000 Old_age Always - 182650280
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 1
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 1
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
SMART Error Log Version: 1
ATA Error Count: 1979 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 1979 occurred at disk power-on lifetime: 4480 hours (186 days + 16 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 da 01 ff ff ff 4f 00 13:43:15.498 READ DMA EXT
25 da 01 ff ff ff 4f 00 13:43:13.155 READ DMA EXT
25 da 01 ff ff ff 4f 00 13:43:10.887 READ DMA EXT
25 da 01 ff ff ff 4f 00 13:43:10.887 READ DMA EXT
25 da 01 ff ff ff 4f 00 13:43:10.886 READ DMA EXT
Error 1978 occurred at disk power-on lifetime: 4480 hours (186 days + 16 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 da 01 ff ff ff 4f 00 13:43:13.155 READ DMA EXT
25 da 01 ff ff ff 4f 00 13:43:10.887 READ DMA EXT
25 da 01 ff ff ff 4f 00 13:43:10.887 READ DMA EXT
25 da 01 ff ff ff 4f 00 13:43:10.886 READ DMA EXT
25 da 01 ff ff ff 4f 00 13:43:10.886 READ DMA EXT
Error 1977 occurred at disk power-on lifetime: 4480 hours (186 days + 16 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 da 01 ff ff ff 4f 00 13:43:10.887 READ DMA EXT
25 da 01 ff ff ff 4f 00 13:43:10.887 READ DMA EXT
25 da 01 ff ff ff 4f 00 13:43:10.886 READ DMA EXT
25 da 01 ff ff ff 4f 00 13:43:10.886 READ DMA EXT
25 da 01 ff ff ff 4f 00 13:43:10.885 READ DMA EXT
Error 1976 occurred at disk power-on lifetime: 4480 hours (186 days + 16 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 da 01 ff ff ff 4f 00 13:43:08.457 READ DMA EXT
25 da 01 ff ff ff 4f 00 13:43:06.082 READ DMA EXT
25 da 01 ff ff ff 4f 00 13:43:03.814 READ DMA EXT
25 da 01 ff ff ff 4f 00 13:43:03.813 READ DMA EXT
25 da 01 ff ff ff 4f 00 13:43:03.813 READ DMA EXT
Error 1975 occurred at disk power-on lifetime: 4480 hours (186 days + 16 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 da 01 ff ff ff 4f 00 13:43:06.082 READ DMA EXT
25 da 01 ff ff ff 4f 00 13:43:03.814 READ DMA EXT
25 da 01 ff ff ff 4f 00 13:43:03.813 READ DMA EXT
25 da 01 ff ff ff 4f 00 13:43:03.813 READ DMA EXT
25 da 01 ff ff ff 4f 00 13:43:03.813 READ DMA EXT
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed: read failure 90% 4480 625142447
# 2 Short offline Completed: read failure 90% 4474 625142447
# 3 Extended offline Completed: read failure 90% 4474 625142447
# 4 Short offline Completed: read failure 90% 4474 625142447
# 5 Conveyance offline Completed: read failure 90% 4473 625142447
# 6 Short offline Completed: read failure 90% 4473 625142447
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
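Two parts of the dump above are easy to misread, so here is a rough sketch for checking them. It pulls the raw counts for attributes 5, 197 and 198 out of the table rows (copied from above), and rebuilds the LBA shown in the error log from the SN/CL/CH/DH registers. The helper names `raw_values` and `lba28` are made up for illustration. My reading, which I am not certain of, is that the error log entries hold only a 28-bit address, so 0x0fffffff is a saturated placeholder rather than a second bad sector.

```python
ATTRIBUTE_ROWS = """\
  5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 1
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 1
"""


def raw_values(rows):
    """Map attribute ID -> first number of RAW_VALUE for each table row."""
    result = {}
    for line in rows.splitlines():
        fields = line.split()
        # ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE...
        if len(fields) >= 10 and fields[0].isdigit():
            result[int(fields[0])] = int(fields[9])
    return result


def lba28(sn, cl, ch, dh):
    """Rebuild a 28-bit LBA from the SN/CL/CH/DH registers of an error log entry."""
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


counts = raw_values(ATTRIBUTE_ROWS)
print(counts)                      # pending and uncorrectable are 1, reallocated is 0

logged = lba28(0xFF, 0xFF, 0xFF, 0x0F)
print(logged)                      # the LBA printed in the error log entries
print(625142447 > logged)          # the real bad sector does not fit in 28 bits
```

If the sector were ever remapped, I would expect attribute 197 to drop to 0 and attribute 5 to rise, which is what this check is meant to watch for.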
Last edited by b6fan (2010-05-10 07:18:46)
The bad block was repaired by SeaTools for DOS.
It seems that only SeaTools for DOS can repair this issue.