I have just installed Arch Linux on an ASUS laptop.
The partitions were created as follows:
/dev/sda3 46G /
/dev/sda2 72M /boot
/dev/sda4 230G /home
All three partitions are ext4. /dev/sda1 is a separate Windoze NTFS partition.
Before this, there were two NTFS partitions (sda1 and sda2). sda2 was removed, and a new sda2, sda3 and sda4 were created.
When I booted the Arch Linux install medium, dmesg showed kernel errors. I ignored them and installed the new system on the partitions above.
Now, on every boot, the kernel shows the following errors:
ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
ata1.00: irq_stat 0x40000008
ata1.00: failed command: READ FPDMA QUEUED
[145B blob data]
ata1.00: status: { DRDY ERR }
ata1.00: error: { UNC }
end_request: I/O error, dev sda, sector 976773167
Buffer I/O error on device sda, logical block 122096645
ata1: EH complete
ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
ata1.00: irq_stat 0x40000008
ata1.00: failed command: READ FPDMA QUEUED
[145B blob data]
ata1.00: status: { DRDY ERR }
ata1.00: error: { UNC }
and some logs below:
ata1.00: configured for UDMA/133
sd 0:0:0:0: [sda] Unhandled sense code
sd 0:0:0:0: [sda]
Result: hostbyte=0x00 driverbyte=0x08
sd 0:0:0:0: [sda]
Sense Key : 0x3 [current] [descriptor]
Descriptor sense data with sense descriptors (in hex):
72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
3a 38 60 2f
sd 0:0:0:0: [sda]
ASC=0x11 ASCQ=0x4
sd 0:0:0:0: [sda] CDB:
cdb[0]=0x28: 28 00 3a 38 60 28 00 00 08 00
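The hex dump above already points at one specific sector. As a rough sketch, assuming the CDB is a standard SCSI READ(10) (opcode 0x28, big-endian LBA in bytes 2-5, transfer length in bytes 7-8) and that the last four bytes of the descriptor sense data hold the failing LBA, the numbers can be decoded like this. The helper name `decode_read10` is made up for illustration.

```python
def decode_read10(cdb_hex):
    """Return (start_lba, sector_count) from a READ(10) CDB given as hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    start_lba = int.from_bytes(cdb[2:6], "big")     # bytes 2-5: logical block address
    sector_count = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length
    return start_lba, sector_count


# CDB and sense information bytes copied from the log above.
start, count = decode_read10("28 00 3a 38 60 28 00 00 08 00")
failing_lba = int.from_bytes(bytes.fromhex("3a 38 60 2f"), "big")

print(start, count, failing_lba)
```

Under those assumptions the request read 8 sectors starting at LBA 976773160, and the failing LBA is the last of them, 976773167, which is the sector the kernel reports. Sense key 0x3 is a medium error, and ASC/ASCQ 0x11/0x04 is normally listed as "unrecovered read error, auto reallocate failed", which would point at the disk surface rather than at a driver problem.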
Do you guys think this is caused by bad sectors???
I searched the web and found some ideas.
It could be a kernel bug in ATA ACPI handling. On Ubuntu 9.10 this was solved by creating a file containing "options libata noacpi=1" in /etc/modprobe.d/.
http://ubuntuforums.org/archive/index.p … 34762.html
But Bugzilla says it can be an NCQ issue (if I understand correctly):
https://bugzilla.redhat.com/show_bug.cgi?id=404851
I managed to switch from AHCI to IDE and nothing changed... So I'm back on AHCI.
This part of the HDD was previously used by Windoze. I think there are bad sectors, but I don't know if that's even related to this issue.
Whole (current) boot log pasted here:
http://pastebin.com/Z32ZDc99
Offline
Have you tried to smartctl that shit? Or rather, have you thought about checking the SMART status on the drive itself. It can be a nice glimpse into what the actual drive thinks of itself, and it is pretty honest when it gets depressed and thinks it sucks at life.
Offline
Yup. It's in smartctl database.
=== START OF INFORMATION SECTION ===
Model Family: Seagate Momentus 5400.6
Device Model: ST9500325AS
Serial Number: 6VE8BETY
LU WWN Device Id: 5 000c50 027d1f358
Firmware Version: 0003SDM1
User Capacity: 500,107,862,016 bytes [500 GB]
Sector Size: 512 bytes logical/physical
Rotation Rate: 5400 rpm
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS T13/1699-D revision 4
SATA Version is: SATA 2.6, 3.0 Gb/s
Local Time is: Sat May 11 17:36:04 2013 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
Drive found in smartmontools Database. Drive identity strings:
MODEL: ST9500325AS
FIRMWARE: 0003SDM1
match smartmontools Drive Database entry:
MODEL REGEXP: ST9(80313|160(301|314)|(12|25)0315|250317|(320|500)325|500327|640320)ASG?
FIRMWARE REGEXP: .*
MODEL FAMILY: Seagate Momentus 5400.6
ATTRIBUTE OPTIONS: None preset; no -v options are required.
Last edited by marzecki (2013-05-11 15:38:04)
Offline
# smartctl --test=short /dev/sda
Then inspect the output for errors. If none, run a long test. If no errors, check each partition for bad sectors (lengthy process) like this (boot into a live CD if you need to check the root partition):
# e2fsck -vcck /dev/sdaX # unmount the partition first
Here is what bad blocks look like as I recently found out:
% sudo e2fsck -c -c -k -v LABEL=arch32
e2fsck 1.42.7 (21-Jan-2013)
Checking for bad blocks (non-destructive read-write test)
Testing with random pattern: done
arch32: Updating bad block inode.
Pass 1: Checking inodes, blocks, and sizes
Running additional passes to resolve blocks claimed by more than one inode...
Pass 1B: Rescanning for multiply-claimed blocks
Multiply-claimed block(s) in inode 404766: 3658656
Pass 1C: Scanning directories for inodes with multiply-claimed blocks
Pass 1D: Reconciling multiply-claimed blocks
(There are 1 inodes containing multiply-claimed blocks.)
File /usr/share/icons/gnome/icon-theme.cache (inode #404766, mod time Mon Apr 22 16:01:20 2013)
has 1 multiply-claimed block(s), shared with 1 file(s):
<The bad blocks inode> (inode #1, mod time Mon Apr 22 16:19:28 2013)
Clone multiply-claimed blocks<y>? yes
Error reading block 3658656 (Attempt to read block from filesystem resulted in short read). Ignore error<y>? yes
Force rewrite<y>? yes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Free blocks count wrong for group #0 (22605, counted=22604).
Fix<y>? yes
Free blocks count wrong for group #111 (6069, counted=6070).
Fix<y>? yes
arch32: ***** FILE SYSTEM WAS MODIFIED *****
105142 inodes used (11.46%, out of 917504)
516 non-contiguous files (0.5%)
41 non-contiguous directories (0.0%)
# of inodes with ind/dind/tind blocks: 0/0/0
Extent depth histogram: 92177/120
1051659 blocks used (28.66%, out of 3670016)
3 bad blocks
1 large file
84048 regular files
7976 directories
0 character device files
0 block device files
0 fifos
1494 links
13107 symbolic links (12835 fast symbolic links)
2 sockets
------------
106627 files
CPU-optimized Linux-ck packages @ Repo-ck • AUR packages • Zsh and other configs
Offline
I already ran a conveyance self-test:
# smartctl -t conveyance /dev/sda
smartctl 6.1 2013-03-16 r3800 [x86_64-linux-3.8.11-1-ARCH] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Conveyance self-test routine immediately in off-line mode".
Drive command "Execute SMART Conveyance self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 2 minutes for test to complete.
Test will complete after Sat May 11 17:44:45 2013
Results:
# smartctl -H /dev/sda
smartctl 6.1 2013-03-16 r3800 [x86_64-linux-3.8.11-1-ARCH] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
Please note the following marginal Attributes:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
190 Airflow_Temperature_Cel 0x0022 057 043 045 Old_age Always In_the_past 43 (0 7 44 43 0)
# smartctl -l selftest /dev/sda
smartctl 6.1 2013-03-16 r3800 [x86_64-linux-3.8.11-1-ARCH] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Conveyance offline Completed: read failure 90% 5759 976773167
"Completed: read failure" looks pretty strange.
Full results: (don't know if needed)
# smartctl -a /dev/sda
smartctl 6.1 2013-03-16 r3800 [x86_64-linux-3.8.11-1-ARCH] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Seagate Momentus 5400.6
Device Model: ST9500325AS
Serial Number: 6VE8BETY
LU WWN Device Id: 5 000c50 027d1f358
Firmware Version: 0003SDM1
User Capacity: 500,107,862,016 bytes [500 GB]
Sector Size: 512 bytes logical/physical
Rotation Rate: 5400 rpm
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS T13/1699-D revision 4
SATA Version is: SATA 2.6, 3.0 Gb/s
Local Time is: Sat May 11 17:46:39 2013 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.
General SMART Values:
Offline data collection status: (0x00) Offline data collection activity
was never started.
Auto Offline Data Collection: Disabled.
Self-test execution status: ( 121) The previous self-test completed having
the read element of the test failed.
Total time to complete Offline
data collection: ( 0) seconds.
Offline data collection
capabilities: (0x73) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
No Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 1) minutes.
Extended self-test routine
recommended polling time: ( 137) minutes.
Conveyance self-test routine
recommended polling time: ( 2) minutes.
SCT capabilities: (0x103b) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 086 076 006 Pre-fail Always - 121757063
3 Spin_Up_Time 0x0003 098 098 000 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 097 097 020 Old_age Always - 4000
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 2
7 Seek_Error_Rate 0x000f 081 060 030 Pre-fail Always - 127258422
9 Power_On_Hours 0x0032 094 094 000 Old_age Always - 5759
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 097 037 020 Old_age Always - 3992
184 End-to-End_Error 0x0032 100 100 099 Old_age Always - 0
187 Reported_Uncorrect 0x0032 001 001 000 Old_age Always - 24198
188 Command_Timeout 0x0032 100 097 000 Old_age Always - 3078
189 High_Fly_Writes 0x003a 100 100 000 Old_age Always - 0
190 Airflow_Temperature_Cel 0x0022 057 043 045 Old_age Always In_the_past 43 (0 7 44 43 0)
191 G-Sense_Error_Rate 0x0032 100 100 000 Old_age Always - 1333
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 6
193 Load_Cycle_Count 0x0032 055 055 000 Old_age Always - 90785
194 Temperature_Celsius 0x0022 043 057 000 Old_age Always - 43 (0 7 0 0 0)
195 Hardware_ECC_Recovered 0x001a 048 038 000 Old_age Always - 121757063
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 4
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 4
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
254 Free_Fall_Sensor 0x0032 100 100 000 Old_age Always - 0
SMART Error Log Version: 1
ATA Error Count: 26285 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.
Error 26285 occurred at disk power-on lifetime: 5758 hours (239 days + 22 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 00 07 ff ff ff 4f 00 02:01:10.572 READ FPDMA QUEUED
60 00 08 ff ff ff 4f 00 02:01:10.551 READ FPDMA QUEUED
60 00 08 ff ff ff 4f 00 02:01:10.550 READ FPDMA QUEUED
60 00 08 ff ff ff 4f 00 02:01:10.549 READ FPDMA QUEUED
60 00 08 ff ff ff 4f 00 02:01:10.549 READ FPDMA QUEUED
Error 26284 occurred at disk power-on lifetime: 5758 hours (239 days + 22 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 00 08 ff ff ff 4f 00 02:01:08.104 READ FPDMA QUEUED
60 00 20 ff ff ff 4f 00 02:01:08.099 READ FPDMA QUEUED
60 00 08 ff ff ff 4f 00 02:01:08.094 READ FPDMA QUEUED
60 00 08 ff ff ff 4f 00 02:01:08.093 READ FPDMA QUEUED
60 00 08 ff ff ff 4f 00 02:01:08.084 READ FPDMA QUEUED
Error 26283 occurred at disk power-on lifetime: 5758 hours (239 days + 22 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 00 08 ff ff ff 4f 00 02:01:05.546 READ FPDMA QUEUED
60 00 07 ff ff ff 4f 00 02:01:05.543 READ FPDMA QUEUED
60 00 08 ff ff ff 4f 00 02:01:05.543 READ FPDMA QUEUED
61 00 18 ff ff ff 4f 00 02:01:05.543 WRITE FPDMA QUEUED
60 00 08 ff ff ff 4f 00 02:01:05.542 READ FPDMA QUEUED
Error 26282 occurred at disk power-on lifetime: 5758 hours (239 days + 22 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 00 42 ff ff ff 4f 00 02:01:03.100 READ FPDMA QUEUED
60 00 07 ff ff ff 4f 00 02:01:03.099 READ FPDMA QUEUED
60 00 08 ff ff ff 4f 00 02:01:03.098 READ FPDMA QUEUED
60 00 02 ff ff ff 4f 00 02:01:03.098 READ FPDMA QUEUED
60 00 08 ff ff ff 4f 00 02:01:03.096 READ FPDMA QUEUED
Error 26281 occurred at disk power-on lifetime: 5758 hours (239 days + 22 hours)
When the command that caused the error occurred, the device was active or idle.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 00 ff ff ff 0f Error: UNC at LBA = 0x0fffffff = 268435455
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
60 00 08 ff ff ff 4f 00 02:00:59.911 READ FPDMA QUEUED
60 00 08 ff ff ff 4f 00 02:00:59.910 READ FPDMA QUEUED
60 00 08 ff ff ff 4f 00 02:00:59.910 READ FPDMA QUEUED
60 00 08 ff ff ff 4f 00 02:00:59.906 READ FPDMA QUEUED
60 00 08 ff ff ff 4f 00 02:00:59.905 READ FPDMA QUEUED
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Conveyance offline Completed: read failure 90% 5759 976773167
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
# e2fsck -vcck /dev/sdaX # unmount the partition first
Isn't that taking a risk of damaging existing data on this partition?
Offline
OK... you failed the self-test, so you have something bad on that disk. I would go through the non-destructive test I outlined above, and no, -vcck does not damage data. It is a non-destructive test. If you select other switches, you WILL overwrite data. I will not confuse the thread by posting them.
Offline
A non-destructive test might not do much I suppose, since there are pending sectors
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 4
I'm not sure what badblocks will do in this case, since any read from those sectors will fail, so its non-destructive read-modify-test-write mode might not work. These sectors need to be written to. The write will either complete successfully or fail, and if it fails the drive should automatically reallocate the sector, which solves the problem.
What I usually do, if the affected sectors are not part of any file, is fill the partition that contains them with zeros so that they get written, which should make the problem go away.
R00KIE
Tm90aGluZyB0byBzZWUgaGVyZSwgbW92ZSBhbG9uZy4K
Offline
Dear Guys,
The following scans have been carried out:
# e2fsck -vcck /dev/sda2
# e2fsck -vcck /dev/sda3
# e2fsck -vcck /dev/sda4
No bad blocks have been found
0 bad blocks
R00KIE wrote:A non-destructive test might not do much I suppose, since there are pending sectors
You were right. What are pending sectors and what to do next?
Offline
OK... we have:
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 4
and we know from the kernel:
end_request: I/O error, dev sda, sector 976773167
... that's correct, because:
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Conveyance offline Completed: read failure 90% 5759 976773167
This is our windozed sector.
The full kernel errors look like this:
ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
ata1.00: irq_stat 0x40000008
ata1.00: failed command: READ FPDMA QUEUED
[145B blob data]
ata1.00: status: { DRDY ERR }
ata1.00: error: { UNC }
ata1.00: configured for UDMA/133
sd 0:0:0:0: [sda] Unhandled sense code
(... some text ...)
end_request: I/O error, dev sda, sector 976773167
Buffer I/O error on device sda, logical block 122096645
ata1: EH complete
... with different block numbers (but the same sector). Let's list them:
# journalctl | grep "Buffer I/O error" | cut -c23- | sort | uniq -c
20 kernel: Buffer I/O error on device sda, logical block 122096645
4 kernel: Buffer I/O error on device sda4, logical block 244126291
6 kernel: Buffer I/O error on device sda4, logical block 488252582
3 kernel: Buffer I/O error on device sda4, logical block 61031572
680 kernel: Buffer I/O error on device sda, logical block 122096645
31 kernel: Buffer I/O error on device sda4, logical block 244126291
208 kernel: Buffer I/O error on device sda4, logical block 488252582
74 kernel: Buffer I/O error on device sda4, logical block 61031572
The entries are repeated because the hostname was changed.
OK, so we have the following blocks:
# journalctl | grep "z kernel: Buffer I/O error" | cut -c78- | sort | uniq
244126291
488252582
61031572
122096645
... and surprise:
# journalctl | grep "z kernel: Buffer I/O error" | cut -c78- | sort | uniq | wc -l
4
There are four blocks! Three of them are on /dev/sda4 and one is on /dev/sda (which seems strange to me).
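The four numbers may not be four different bad blocks. A quick check, assuming the kernel logged the same failing sector at buffer block sizes of 512, 1024 and 4096 bytes (an assumption, since the log lines do not state the block size), and taking the start sector of /dev/sda4 from the fdisk listing later in this thread:

```python
SECTOR = 512
SDA4_START = 488520585   # start sector of /dev/sda4 (fdisk output in this thread)
BAD_LBA = 976773167      # sector reported by the kernel and by smartctl


def block_index(sector_offset, block_size):
    """Index of the block of `block_size` bytes that contains the given sector."""
    return sector_offset * SECTOR // block_size


offset_in_sda4 = BAD_LBA - SDA4_START

# Whole-disk view with 4096-byte blocks.
whole_disk_4k = block_index(BAD_LBA, 4096)

# Partition view with the three assumed block sizes.
sda4_blocks = {bs: block_index(offset_in_sda4, bs) for bs in (512, 1024, 4096)}

print(whole_disk_4k)
print(sda4_blocks)
```

This prints 122096645 for the whole disk and 488252582, 244126291 and 61031572 for /dev/sda4, exactly the four numbers in the journal. If the assumption holds, all four entries describe the single sector 976773167, so only one location needs to be rewritten, and the block numbers cannot all be used with the same `bs=512`.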
So... isn't that easy?
Shouldn't I just overwrite those blocks like this (with the partition unmounted)?
# dd if=/dev/sda4 of=/dev/sda4 bs=512 count=1 iseek=244126291 oseek=244126291 conv=noerror,sync
# dd if=/dev/sda4 of=/dev/sda4 bs=512 count=1 iseek=488252582 oseek=488252582 conv=noerror,sync
# dd if=/dev/sda4 of=/dev/sda4 bs=512 count=1 iseek=61031572 oseek=61031572 conv=noerror,sync
# dd if=/dev/sda4 of=/dev/sda4 bs=512 count=1 iseek=122096645 oseek=122096645 conv=noerror,sync
Is this the right approach (especially for /dev/sda)?
Please help.
Last edited by marzecki (2013-05-13 18:25:45)
Offline
R00KIE wrote:A non-destructive test might not do much I suppose, since there are pending sectors
You were right. What are pending sectors and what to do next?
To find out what a pending sector is, check [1] or use Google.
To try to solve the problem, I would personally back up all files, run the destructive badblocks test on the whole disk, and then copy everything back.
The alternative is filling all your partitions with a file (and delete it afterwards). I do that with 'dd if=/dev/zero of=zerofile bs=10M; rm zerofile'. With luck you will overwrite the pending sectors and things should start to work, you can check the pending sector count after you use dd and see if it worked. However those sectors may be allocated to some files, in which case using the dd method will do nothing unless you delete those files first (you will need to restore those files from a known good backup anyway).
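Checking the pending sector count before and after the overwrite can be scripted. The following is only a sketch: it assumes the attribute table has the ten-column layout shown earlier in this thread, the function name `raw_values` is hypothetical, and the sample rows are copied from the smartctl output above. In practice the text would come from `smartctl -A /dev/sda`.

```python
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033 100 100 036 Pre-fail Always - 2
197 Current_Pending_Sector  0x0012 100 100 000 Old_age Always - 4
198 Offline_Uncorrectable   0x0010 100 100 000 Old_age Offline - 4
"""


def raw_values(attr_table, wanted_ids=(5, 197, 198)):
    """Map attribute name to raw value for the requested SMART attribute IDs."""
    values = {}
    for line in attr_table.splitlines():
        fields = line.split()
        # Attribute rows have at least 10 columns and start with a numeric ID.
        if len(fields) >= 10 and fields[0].isdigit() and int(fields[0]) in wanted_ids:
            values[fields[1]] = int(fields[9])  # column 10 is RAW_VALUE
    return values


before = raw_values(SAMPLE)
print(before)
```

If the overwrite worked, Current_Pending_Sector should drop on the next run, and Reallocated_Sector_Ct may rise if the drive had to remap the sector.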
Offline
marzecki wrote:
# dd if=/dev/sda4 of=/dev/sda4 bs=512 count=1 iseek=244126291 oseek=244126291 conv=noerror,sync
# dd if=/dev/sda4 of=/dev/sda4 bs=512 count=1 iseek=488252582 oseek=488252582 conv=noerror,sync
# dd if=/dev/sda4 of=/dev/sda4 bs=512 count=1 iseek=61031572 oseek=61031572 conv=noerror,sync
# dd if=/dev/sda4 of=/dev/sda4 bs=512 count=1 iseek=122096645 oseek=122096645 conv=noerror,sync
This likely won't work because of read errors.
Since you use ext, try this to find out which files or filesystem structures have been hit by bad sectors. If it's just some files and you don't mind losing them, use if=/dev/zero instead and it'll simply overwrite the bad part with zeros and make the drive reallocate this sector to spare area. Then you can restore these files from backup and all will be good.
If it's some files you care about then, well, you have a problem. You may try reading this sector repeatedly until it gives in, but it probably won't.
If it's filesystem data, overwriting it with zeros may cause weird problems. Salvage as much as possible before nuking this sector, and run fsck afterwards.
Last edited by mich41 (2013-05-14 15:39:51)
Offline
R00KIE wrote:The alternative is filling all your partitions with a file (and delete it afterwards). I do that with 'dd if=/dev/zero of=zerofile bs=10M; rm zerofile'.
I tried that with no luck (on all partitions).
mich41 wrote:Since you use ext, try this to find out which files or filesystem structures have been hit by bad sectors. If it's just some files and you don't mind losing them, use if=/dev/zero instead and it'll simply overwrite the bad part with zeros and make the drive reallocate this sector to spare area. Then you can restore these files from backup and all will be good.
I tried this method, and in the part that uses debugfs it said <block not found>.
I'm fairly sure I calculated the file system block number correctly. The explanation is below in case I'm wrong, but I hope not.
Explanation (not sure if needed):
b = (int)((L-S)*512/B)
where:
b = File System block number
B = File system block size in bytes
L = LBA of bad sector
S = Starting sector of partition as shown by fdisk -lu
and (int) denotes the integer part.
In my case:
B = 4096
L = 976773167
S = 488520585
Re B:
# tune2fs -l /dev/sda4 | grep Block
Block count: 61031572
Block size: 4096
Re L: see the first post.
Re S:
# fdisk -l /dev/sda
Disk /dev/sda: 500.1 GB, 500107862016 bytes, 976773168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x73636731
Device Boot Start End Blocks Id System
/dev/sda1 63 390700799 195350368+ 7 HPFS/NTFS/exFAT
/dev/sda2 * 390700800 390861449 80325 83 Linux
/dev/sda3 390861450 488520584 48829567+ 83 Linux
/dev/sda4 488520585 976773167 244126291+ 83 Linux
b = (int)61031572.75 = 61031572
# debugfs
debugfs 1.42.7 (21-Jan-2013)
debugfs: open /dev/sda4
debugfs: icheck 61031572
Block Inode number
61031572 <block not found>
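The formula above can be re-run with a range check added. This is a sketch with a hypothetical helper name (`fs_block`), using only the values quoted in this post. It offers one possible reading of <block not found>, not a confirmed diagnosis.

```python
def fs_block(lba, part_start, block_size, sector_size=512):
    """b = (int)((L - S) * 512 / B), using integer division."""
    return (lba - part_start) * sector_size // block_size


B = 4096                # file system block size (tune2fs)
L = 976773167           # LBA of the bad sector
S = 488520585           # start sector of /dev/sda4 (fdisk)
BLOCK_COUNT = 61031572  # "Block count" reported by tune2fs

b = fs_block(L, S, B)

# Valid file system block numbers run from 0 to BLOCK_COUNT - 1.
inside_fs = 0 <= b < BLOCK_COUNT

print(b, inside_fs)
```

The result is block 61031572, which equals the block count, so it is one past the last valid block (61031571). If that reading is right, the bad sector lies in the few leftover sectors at the end of the partition, beyond the file system, which would explain why debugfs cannot map it to any inode and why no file was affected.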
But... running dd solved the problem:
dd if=/dev/zero of=/dev/sda4 bs=4096 count=1 seek=61031572
The kernel errors from the first post are gone.
The current error is on /dev/sda1, which is NTFS:
# smartctl -l selftest /dev/sda
smartctl 6.1 2013-03-16 r3800 [x86_64-linux-3.9.2-1-ARCH] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Conveyance offline Completed: read failure 90% 5814 2159
I tried:
dd if=/dev/sda bs=512 count=1 seek=2159
... with no change in the smartctl conveyance test result.
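To double-check which partition an LBA belongs to, the fdisk table from this post can be searched directly. A minimal sketch, with the partition boundaries copied from that listing and a hypothetical helper name (`find_partition`):

```python
# (device, start sector, end sector) from the fdisk listing in this post.
PARTITIONS = [
    ("/dev/sda1", 63, 390700799),
    ("/dev/sda2", 390700800, 390861449),
    ("/dev/sda3", 390861450, 488520584),
    ("/dev/sda4", 488520585, 976773167),
]


def find_partition(lba):
    """Return (device, sector offset inside it), or None if the LBA is unpartitioned."""
    for name, start, end in PARTITIONS:
        if start <= lba <= end:
            return name, lba - start
    return None


print(find_partition(2159))
print(find_partition(976773167))
```

LBA 2159 falls in /dev/sda1 at sector offset 2096, and the earlier LBA 976773167 is the very last sector of /dev/sda4. Note also that dd only writes to a device named with `of=`. With `if=` alone it reads from the disk and sends the data to standard output, so a command in that form would not rewrite the sector.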
Last edited by marzecki (2013-05-17 18:50:03)
Offline