HDD errors after ArchLinux installation on partition used by Windows

marzecki · 2013-05-11 15:08:31

I have just installed ArchLinux on an ASUS laptop.

The partitions were created as follows:

/dev/sda3        46G /
/dev/sda2        72M /boot
/dev/sda4       230G /home

All of these are ext4. The remaining partition, /dev/sda1, is a Windoze NTFS partition.

Before this, there were two NTFS partitions (sda1 and sda2). I removed sda2 and created the new sda2, sda3 and sda4 in its place.

When I booted the ArchLinux install medium, dmesg showed kernel errors. I ignored them and installed the new system on the partitions above.

Now the kernel shows the following errors on every boot:

ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
ata1.00: irq_stat 0x40000008
ata1.00: failed command: READ FPDMA QUEUED
[145B blob data]
ata1.00: status: { DRDY ERR }
ata1.00: error: { UNC }

end_request: I/O error, dev sda, sector 976773167
Buffer I/O error on device sda, logical block 122096645
ata1: EH complete
ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
ata1.00: irq_stat 0x40000008
ata1.00: failed command: READ FPDMA QUEUED
[145B blob data]
ata1.00: status: { DRDY ERR }
ata1.00: error: { UNC }

and some more log lines further down:

ata1.00: configured for UDMA/133
sd 0:0:0:0: [sda] Unhandled sense code
sd 0:0:0:0: [sda]  
Result: hostbyte=0x00 driverbyte=0x08
sd 0:0:0:0: [sda]  
Sense Key : 0x3 [current] [descriptor]
Descriptor sense data with sense descriptors (in hex):
        72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
        3a 38 60 2f 
sd 0:0:0:0: [sda]  
ASC=0x11 ASCQ=0x4
sd 0:0:0:0: [sda] CDB: 
cdb[0]=0x28: 28 00 3a 38 60 28 00 00 08 00
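
The CDB and sense bytes in the log above can be decoded by hand. Here is a minimal Python sketch, assuming the CDB is a standard READ(10) (opcode 0x28, big-endian 32-bit LBA in bytes 2-5, 16-bit transfer length in bytes 7-8) and that the sense buffer is descriptor-format data whose first descriptor is the information descriptor. The function names are made up for illustration.

```python
# Decode the READ(10) CDB and the sense data quoted in the kernel log.
# The field layout is assumed from the SCSI READ(10) and descriptor-sense formats.

cdb = bytes.fromhex("28 00 3a 38 60 28 00 00 08 00")
sense = bytes.fromhex(
    "72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00 3a 38 60 2f"
)


def decode_read10(cdb):
    """Return (start_lba, sector_count) from a 10-byte READ(10) CDB."""
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    start_lba = int.from_bytes(cdb[2:6], "big")
    sector_count = int.from_bytes(cdb[7:9], "big")
    return start_lba, sector_count


def decode_sense(sense):
    """Extract sense key, ASC/ASCQ and the information field (failing LBA)."""
    if sense[0] & 0x7F != 0x72:
        raise ValueError("not descriptor-format sense data")
    # Descriptors start at byte 8. Type 0x00 is the information descriptor;
    # its 8-byte value starts 4 bytes into the descriptor.
    desc = sense[8:]
    if desc[0] != 0x00:
        raise ValueError("first descriptor is not an information descriptor")
    return {
        "sense_key": sense[1] & 0x0F,
        "asc": sense[2],
        "ascq": sense[3],
        "lba": int.from_bytes(desc[4:12], "big"),
    }


start, count = decode_read10(cdb)
fields = decode_sense(sense)
print(f"read covers LBA {start}..{start + count - 1}")
print(f"sense key 0x{fields['sense_key']:x}, "
      f"ASC/ASCQ 0x{fields['asc']:02x}/0x{fields['ascq']:02x}, "
      f"failing LBA {fields['lba']}")
```

Under those assumptions the failing LBA in the sense data is the same sector 976773167 that end_request reports, and it is the last sector of the 8-sector read. Sense key 0x3 is a medium error, which points at the disk surface rather than the cable or controller.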

Do you guys think this is caused by bad sectors???

I searched the web and found some ideas.

It could be a kernel bug in ATA ACPI. On Ubuntu 9.10 it was solved by creating a file containing "options libata noacpi=1" in /etc/modprobe.d/.

http://ubuntuforums.org/archive/index.p … 34762.html

But Bugzilla says it can be an NCQ issue (if I understand correctly):

https://bugzilla.redhat.com/show_bug.cgi?id=404851

I switched from AHCI to IDE and nothing changed, so I'm back on AHCI.

This part of the HDD was previously used by Windoze. I think there are bad sectors, but I don't know if that's even related to this issue.

The whole (current) boot log is pasted here:
http://pastebin.com/Z32ZDc99

WonderWoofy · 2013-05-11 15:22:26

Have you tried to smartctl that shit? Or rather, have you thought about checking the SMART status on the drive itself. It can be a nice glimpse into what the actual drive thinks of itself, and it is pretty honest when it gets depressed and thinks it sucks at life.

marzecki · 2013-05-11 15:37:42

Yup. It's in the smartctl database.

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Momentus 5400.6
Device Model:     ST9500325AS
Serial Number:    6VE8BETY
LU WWN Device Id: 5 000c50 027d1f358
Firmware Version: 0003SDM1
User Capacity:    500,107,862,016 bytes [500 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    5400 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 2.6, 3.0 Gb/s
Local Time is:    Sat May 11 17:36:04 2013 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

Drive found in smartmontools Database.  Drive identity strings:
MODEL:              ST9500325AS
FIRMWARE:           0003SDM1
match smartmontools Drive Database entry:
MODEL REGEXP:       ST9(80313|160(301|314)|(12|25)0315|250317|(320|500)325|500327|640320)ASG?
FIRMWARE REGEXP:    .*
MODEL FAMILY:       Seagate Momentus 5400.6
ATTRIBUTE OPTIONS:  None preset; no -v options are required.

Last edited by marzecki (2013-05-11 15:38:04)

graysky · 2013-05-11 15:40:56

# smartctl --test=short /dev/sda

Then inspect the output for errors. If none, run a long test. If no errors, check each partition for bad sectors (lengthy process) like this (boot into a live CD if you need to check the root partition):

# e2fsck -vcck /dev/sdaX  # unmount the partition first

Here is what bad blocks look like as I recently found out:

% sudo e2fsck -c -c -k -v LABEL=arch32
e2fsck 1.42.7 (21-Jan-2013)
Checking for bad blocks (non-destructive read-write test)
Testing with random pattern: done                                                 
arch32: Updating bad block inode.
Pass 1: Checking inodes, blocks, and sizes

Running additional passes to resolve blocks claimed by more than one inode...
Pass 1B: Rescanning for multiply-claimed blocks
Multiply-claimed block(s) in inode 404766: 3658656
Pass 1C: Scanning directories for inodes with multiply-claimed blocks
Pass 1D: Reconciling multiply-claimed blocks
(There are 1 inodes containing multiply-claimed blocks.)

File /usr/share/icons/gnome/icon-theme.cache (inode #404766, mod time Mon Apr 22 16:01:20 2013) 
  has 1 multiply-claimed block(s), shared with 1 file(s):
	<The bad blocks inode> (inode #1, mod time Mon Apr 22 16:19:28 2013)
Clone multiply-claimed blocks<y>? yes
Error reading block 3658656 (Attempt to read block from filesystem resulted in short read).  Ignore error<y>? yes
Force rewrite<y>? yes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Free blocks count wrong for group #0 (22605, counted=22604).
Fix<y>? yes
Free blocks count wrong for group #111 (6069, counted=6070).
Fix<y>? yes

arch32: ***** FILE SYSTEM WAS MODIFIED *****

      105142 inodes used (11.46%, out of 917504)
         516 non-contiguous files (0.5%)
          41 non-contiguous directories (0.0%)
             # of inodes with ind/dind/tind blocks: 0/0/0
             Extent depth histogram: 92177/120
     1051659 blocks used (28.66%, out of 3670016)
           3 bad blocks
           1 large file

       84048 regular files
        7976 directories
           0 character device files
           0 block device files
           0 fifos
        1494 links
       13107 symbolic links (12835 fast symbolic links)
           2 sockets
------------
      106627 files

marzecki · 2013-05-11 15:54:21

I already ran a self-test (a conveyance test rather than the long one):

# smartctl -t conveyance /dev/sda
smartctl 6.1 2013-03-16 r3800 [x86_64-linux-3.8.11-1-ARCH] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF OFFLINE IMMEDIATE AND SELF-TEST SECTION ===
Sending command: "Execute SMART Conveyance self-test routine immediately in off-line mode".
Drive command "Execute SMART Conveyance self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 2 minutes for test to complete.
Test will complete after Sat May 11 17:44:45 2013

Results:

# smartctl -H /dev/sda
smartctl 6.1 2013-03-16 r3800 [x86_64-linux-3.8.11-1-ARCH] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
Please note the following marginal Attributes:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
190 Airflow_Temperature_Cel 0x0022   057   043   045    Old_age   Always   In_the_past 43 (0 7 44 43 0)

# smartctl -l selftest /dev/sda
smartctl 6.1 2013-03-16 r3800 [x86_64-linux-3.8.11-1-ARCH] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Conveyance offline  Completed: read failure       90%      5759         976773167

"Completed: read failure" looks pretty strange.

Full results: (don't know if needed)

# smartctl -a /dev/sda
smartctl 6.1 2013-03-16 r3800 [x86_64-linux-3.8.11-1-ARCH] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Momentus 5400.6
Device Model:     ST9500325AS
Serial Number:    6VE8BETY
LU WWN Device Id: 5 000c50 027d1f358
Firmware Version: 0003SDM1
User Capacity:    500,107,862,016 bytes [500 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    5400 rpm
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   ATA8-ACS T13/1699-D revision 4
SATA Version is:  SATA 2.6, 3.0 Gb/s
Local Time is:    Sat May 11 17:46:39 2013 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
See vendor-specific Attribute list for marginal Attributes.

General SMART Values:
Offline data collection status:  (0x00)	Offline data collection activity
					was never started.
					Auto Offline Data Collection: Disabled.
Self-test execution status:      ( 121)	The previous self-test completed having
					the read element of the test failed.
Total time to complete Offline 
data collection: 		(    0) seconds.
Offline data collection
capabilities: 			 (0x73) SMART execute Offline immediate.
					Auto Offline data collection on/off support.
					Suspend Offline collection upon new
					command.
					No Offline surface scan supported.
					Self-test supported.
					Conveyance Self-test supported.
					Selective Self-test supported.
SMART capabilities:            (0x0003)	Saves SMART data before entering
					power-saving mode.
					Supports SMART auto save timer.
Error logging capability:        (0x01)	Error logging supported.
					General Purpose Logging supported.
Short self-test routine 
recommended polling time: 	 (   1) minutes.
Extended self-test routine
recommended polling time: 	 ( 137) minutes.
Conveyance self-test routine
recommended polling time: 	 (   2) minutes.
SCT capabilities: 	       (0x103b)	SCT Status supported.
					SCT Error Recovery Control supported.
					SCT Feature Control supported.
					SCT Data Table supported.

SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   086   076   006    Pre-fail  Always       -       121757063
  3 Spin_Up_Time            0x0003   098   098   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   097   097   020    Old_age   Always       -       4000
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       2
  7 Seek_Error_Rate         0x000f   081   060   030    Pre-fail  Always       -       127258422
  9 Power_On_Hours          0x0032   094   094   000    Old_age   Always       -       5759
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   097   037   020    Old_age   Always       -       3992
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always       -       24198
188 Command_Timeout         0x0032   100   097   000    Old_age   Always       -       3078
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   057   043   045    Old_age   Always   In_the_past 43 (0 7 44 43 0)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       1333
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       6
193 Load_Cycle_Count        0x0032   055   055   000    Old_age   Always       -       90785
194 Temperature_Celsius     0x0022   043   057   000    Old_age   Always       -       43 (0 7 0 0 0)
195 Hardware_ECC_Recovered  0x001a   048   038   000    Old_age   Always       -       121757063
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       4
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       4
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
254 Free_Fall_Sensor        0x0032   100   100   000    Old_age   Always       -       0

SMART Error Log Version: 1
ATA Error Count: 26285 (device log contains only the most recent five errors)
	CR = Command Register [HEX]
	FR = Features Register [HEX]
	SC = Sector Count Register [HEX]
	SN = Sector Number Register [HEX]
	CL = Cylinder Low Register [HEX]
	CH = Cylinder High Register [HEX]
	DH = Device/Head Register [HEX]
	DC = Device Command Register [HEX]
	ER = Error register [HEX]
	ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 26285 occurred at disk power-on lifetime: 5758 hours (239 days + 22 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 07 ff ff ff 4f 00      02:01:10.572  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00      02:01:10.551  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00      02:01:10.550  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00      02:01:10.549  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00      02:01:10.549  READ FPDMA QUEUED

Error 26284 occurred at disk power-on lifetime: 5758 hours (239 days + 22 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 08 ff ff ff 4f 00      02:01:08.104  READ FPDMA QUEUED
  60 00 20 ff ff ff 4f 00      02:01:08.099  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00      02:01:08.094  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00      02:01:08.093  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00      02:01:08.084  READ FPDMA QUEUED

Error 26283 occurred at disk power-on lifetime: 5758 hours (239 days + 22 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 08 ff ff ff 4f 00      02:01:05.546  READ FPDMA QUEUED
  60 00 07 ff ff ff 4f 00      02:01:05.543  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00      02:01:05.543  READ FPDMA QUEUED
  61 00 18 ff ff ff 4f 00      02:01:05.543  WRITE FPDMA QUEUED
  60 00 08 ff ff ff 4f 00      02:01:05.542  READ FPDMA QUEUED

Error 26282 occurred at disk power-on lifetime: 5758 hours (239 days + 22 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 42 ff ff ff 4f 00      02:01:03.100  READ FPDMA QUEUED
  60 00 07 ff ff ff 4f 00      02:01:03.099  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00      02:01:03.098  READ FPDMA QUEUED
  60 00 02 ff ff ff 4f 00      02:01:03.098  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00      02:01:03.096  READ FPDMA QUEUED

Error 26281 occurred at disk power-on lifetime: 5758 hours (239 days + 22 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 00 ff ff ff 0f  Error: UNC at LBA = 0x0fffffff = 268435455

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 00 08 ff ff ff 4f 00      02:00:59.911  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00      02:00:59.910  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00      02:00:59.910  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00      02:00:59.906  READ FPDMA QUEUED
  60 00 08 ff ff ff 4f 00      02:00:59.905  READ FPDMA QUEUED

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Conveyance offline  Completed: read failure       90%      5759         976773167

SMART Selective self-test log data structure revision number 1
 SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
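
For anyone skimming the report above, the rows that usually matter for bad sectors are 5, 187, 197 and 198. Here is a small Python sketch that pulls them out of the attribute table. The rows are copied from the report; the watch list is an assumption about which attributes are relevant here, not an official smartmontools rule.

```python
# Pick the bad-sector-related attributes out of `smartctl -a` attribute rows.
# WATCH is an illustrative choice, not something smartctl itself defines.

ROWS = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       2
187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always       -       24198
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       4
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       4
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
"""

WATCH = {5, 187, 197, 198}


def parse_attributes(text):
    """Map attribute ID -> (name, raw value) for each well-formed row."""
    attrs = {}
    for line in text.splitlines():
        parts = line.split()
        # Columns: ID, name, flag, value, worst, thresh, type, updated,
        # when_failed, raw value (possibly followed by extra detail).
        if len(parts) < 10 or not parts[0].isdigit():
            continue
        attrs[int(parts[0])] = (parts[1], int(parts[9]))
    return attrs


attrs = parse_attributes(ROWS)
nonzero = {i: attrs[i] for i in sorted(WATCH) if attrs[i][1] > 0}
for attr_id, (name, raw) in nonzero.items():
    print(f"{attr_id:>3} {name}: raw={raw}")
```

All four watched attributes are nonzero in this report, while UDMA_CRC_Error_Count (which tends to indicate cabling problems) is zero.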

graysky wrote:

# e2fsck -vcck /dev/sdaX  # unmount the partition first

Doesn't that risk damaging the existing data on this partition?

graysky · 2013-05-11 16:00:20

OK... you failed the self-test, so there is something bad on that disk. I would go through the non-destructive test I outlined above, and no, -vcck does not damage data. It is a non-destructive test. If you select other switches, you WILL overwrite data. I will not confuse the thread by posting them.

R00KIE · 2013-05-11 21:21:25

A non-destructive test might not do much I suppose, since there are pending sectors

197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       4

I'm not sure what badblocks will do in this case, since any read from those sectors will fail, so its non-destructive read-modify-test-write pass might not work. These sectors need to be written to: the write will either complete successfully or fail, and if it fails the sector should be reallocated automatically, which solves the problem.

What I usually do, if the sectors are not part of any file, is fill the partition that contains them with zeros so that they get written, which should make the problem go away.

marzecki · 2013-05-12 16:25:55

Dear Guys,

The following scans have been carried out:

# e2fsck -vcck /dev/sda2
# e2fsck -vcck /dev/sda3
# e2fsck -vcck /dev/sda4

No bad blocks were found:

           0 bad blocks

R00KIE wrote:

A non-destructive test might not do much I suppose, since there are pending sectors

You were right. What are pending sectors, and what should I do next?

marzecki · 2013-05-13 18:10:55

OK... we have:

197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       4

and we know from the kernel:

end_request: I/O error, dev sda, sector 976773167

... that's correct, because:

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Conveyance offline  Completed: read failure       90%      5759         976773167

This is our windozed sector.

The full kernel errors look like this:

ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
ata1.00: irq_stat 0x40000008
ata1.00: failed command: READ FPDMA QUEUED
[145B blob data]
ata1.00: status: { DRDY ERR }
ata1.00: error: { UNC }
ata1.00: configured for UDMA/133
sd 0:0:0:0: [sda] Unhandled sense code
 (... some text ...)
end_request: I/O error, dev sda, sector 976773167
Buffer I/O error on device sda, logical block 122096645
ata1: EH complete

... with different block numbers (for the same sector). Let's list them:

# journalctl | grep "Buffer I/O error" | cut -c23- | sort | uniq -c
     20  kernel: Buffer I/O error on device sda, logical block 122096645
      4  kernel: Buffer I/O error on device sda4, logical block 244126291
      6  kernel: Buffer I/O error on device sda4, logical block 488252582
      3  kernel: Buffer I/O error on device sda4, logical block 61031572
    680 kernel: Buffer I/O error on device sda, logical block 122096645
     31 kernel: Buffer I/O error on device sda4, logical block 244126291
    208 kernel: Buffer I/O error on device sda4, logical block 488252582
     74 kernel: Buffer I/O error on device sda4, logical block 61031572

The entries repeat because the hostname was changed.
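
The counts can also be merged without relying on column offsets, by matching on the device and block number. Here is a minimal Python sketch that takes the `uniq -c` output above as its input; the regular expression is an assumption about the line format as it appears in this thread.

```python
import re
from collections import Counter

# The `uniq -c` output from above, pasted verbatim.
UNIQ_OUTPUT = """\
     20  kernel: Buffer I/O error on device sda, logical block 122096645
      4  kernel: Buffer I/O error on device sda4, logical block 244126291
      6  kernel: Buffer I/O error on device sda4, logical block 488252582
      3  kernel: Buffer I/O error on device sda4, logical block 61031572
    680 kernel: Buffer I/O error on device sda, logical block 122096645
     31 kernel: Buffer I/O error on device sda4, logical block 244126291
    208 kernel: Buffer I/O error on device sda4, logical block 488252582
     74 kernel: Buffer I/O error on device sda4, logical block 61031572
"""

PATTERN = re.compile(
    r"^\s*(\d+)\s+kernel: Buffer I/O error on device (\w+), logical block (\d+)$"
)


def merge_counts(text):
    """Sum the per-line counts for each (device, block) pair."""
    totals = Counter()
    for line in text.splitlines():
        match = PATTERN.match(line)
        if match:
            count, device, block = match.groups()
            totals[(device, int(block))] += int(count)
    return totals


totals = merge_counts(UNIQ_OUTPUT)
for (device, block), count in sorted(totals.items()):
    print(f"{count:5d}  {device:4}  block {block}")
```

Keying on (device, block) folds the two hostname variants together, leaving four distinct entries.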

OK, so we have the following blocks:

# journalctl | grep "z kernel: Buffer I/O error" | cut -c78- | sort | uniq
 244126291
 488252582
 61031572
122096645

... and surprise:

# journalctl | grep "z kernel: Buffer I/O error" | cut -c78- | sort | uniq | wc -l
4

There are four blocks! Three of them are on /dev/sda4 and one is on /dev/sda (which seems strange to me).

So... isn't that easy?
Shouldn't I just kill those blocks with the following (with the partition unmounted)?

# dd if=/dev/sda4 of=/dev/sda4 bs=512 count=1 iseek=244126291 oseek=244126291 conv=noerror,sync
# dd if=/dev/sda4 of=/dev/sda4 bs=512 count=1 iseek=488252582 oseek=488252582 conv=noerror,sync
# dd if=/dev/sda4 of=/dev/sda4 bs=512 count=1 iseek=61031572 oseek=61031572 conv=noerror,sync
# dd if=/dev/sda4 of=/dev/sda4 bs=512 count=1 iseek=122096645 oseek=122096645 conv=noerror,sync

Is this the right operation (especially for the block that was reported on /dev/sda)?
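
One way to sanity-check the four block numbers is to convert each back to absolute 512-byte sectors. The Python sketch below does that, using the /dev/sda4 start sector from the fdisk output posted later in this thread. The block size for each report is an assumption (the kernel counts buffer blocks in whatever block size the device had when the read was issued), chosen here as 4096, 1024, 512 and 4096 bytes.

```python
SECTOR = 512
SDA4_START = 488520585   # first sector of /dev/sda4, from fdisk in this thread
BAD_LBA = 976773167      # from the kernel log and the SMART self-test log

# (device, logical block, ASSUMED block size in bytes, partition start sector)
REPORTS = [
    ("sda", 122096645, 4096, 0),
    ("sda4", 244126291, 1024, SDA4_START),
    ("sda4", 488252582, 512, SDA4_START),
    ("sda4", 61031572, 4096, SDA4_START),
]


def sector_range(block, block_size, part_start):
    """Absolute first and last 512-byte sector covered by one logical block."""
    per_block = block_size // SECTOR
    first = part_start + block * per_block
    return first, first + per_block - 1


hits = []
for device, block, size, start in REPORTS:
    first, last = sector_range(block, size, start)
    hits.append(first <= BAD_LBA <= last)
    print(f"{device:5} block {block:>9} ({size:4} B): "
          f"sectors {first}..{last}, contains bad LBA: {hits[-1]}")
```

With those block sizes, all four reports cover sector 976773167, so they may well be one bad sector seen through different block sizes rather than four separate bad blocks. It would also mean that a single bs=512 offset cannot be reused across all four numbers.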

Please help.

Last edited by marzecki (2013-05-13 18:25:45)

R00KIE · 2013-05-14 10:13:32

marzecki wrote:

R00KIE wrote:
A non-destructive test might not do much I suppose, since there are pending sectors
You were right. What are pending sectors, and what should I do next?

To find out what a pending sector is, check [1] or use Google.

To try to solve the problem, I would personally back up all files, run the destructive badblocks test on the whole disk, and then copy everything back.

The alternative is filling all your partitions with a file (and delete it afterwards). I do that with 'dd if=/dev/zero of=zerofile bs=10M; rm zerofile'. With luck you will overwrite the pending sectors and things should start to work; you can check the pending sector count after running dd to see whether it worked. However, those sectors may be allocated to some files, in which case the dd method will do nothing unless you delete those files first (you will need to restore them from a known good backup anyway).

[1] https://en.wikipedia.org/wiki/S.M.A.R.T.

mich41 · 2013-05-14 15:38:56

marzecki wrote:

# dd if=/dev/sda4 of=/dev/sda4 bs=512 count=1 iseek=244126291 oseek=244126291 conv=noerror,sync
# dd if=/dev/sda4 of=/dev/sda4 bs=512 count=1 iseek=488252582 oseek=488252582 conv=noerror,sync
# dd if=/dev/sda4 of=/dev/sda4 bs=512 count=1 iseek=61031572 oseek=61031572 conv=noerror,sync
# dd if=/dev/sda4 of=/dev/sda4 bs=512 count=1 iseek=122096645 oseek=122096645 conv=noerror,sync

This likely won't work because of read errors.

Since you use ext, try this to find out which files or filesystem structures have been hit by bad sectors. If it's just some files and you don't mind losing them, use if=/dev/zero instead and it'll simply overwrite the bad part with zeros and make the drive reallocate this sector to spare area. Then you can restore these files from backup and all will be good.

If it's some files you care about then, well, you have a problem. You may try reading this sector repeatedly until it gives in, but it probably won't.

If it's filesystem data, overwriting it with zeros may cause weird problems; salvage as much as possible before nuking this sector, and run fsck afterwards.

Last edited by mich41 (2013-05-14 15:39:51)

marzecki · 2013-05-17 18:39:49

R00KIE wrote:

The alternative is filling all your partitions with a file (and delete it afterwards). I do that with 'dd if=/dev/zero of=zerofile bs=10M; rm zerofile'.

I tried that with no luck (on all partitions).

mich41 wrote:

Since you use ext, try this to find out which files or filesystem structures have been hit by bad sectors. If it's just some files and you don't mind losing them, use if=/dev/zero instead and it'll simply overwrite the bad part with zeros and make the drive reallocate this sector to spare area. Then you can restore these files from backup and all will be good.

I tried this method, and at the debugfs step it said <block not found>.
I'm sure I calculated the file system block number correctly. The explanation is below in case I'm wrong, but I hope not.

Explanation (not sure if needed):

b = (int)((L-S)*512/B)
where:
b = File System block number
B = File system block size in bytes
L = LBA of bad sector
S = Starting sector of partition as shown by fdisk -lu
and (int) denotes the integer part.

In my case:
B = 4096
L = 976773167
S = 488520585

Re B:

# tune2fs -l /dev/sda4 | grep Block
Block count:              61031572
Block size:               4096

Re L: see the first post.

Re S:

# fdisk -l /dev/sda

Disk /dev/sda: 500.1 GB, 500107862016 bytes, 976773168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0x73636731

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1              63   390700799   195350368+   7  HPFS/NTFS/exFAT
/dev/sda2   *   390700800   390861449       80325   83  Linux
/dev/sda3       390861450   488520584    48829567+  83  Linux
/dev/sda4       488520585   976773167   244126291+  83  Linux

b = (int)61031572.75 = 61031572

# debugfs
debugfs 1.42.7 (21-Jan-2013)
debugfs:  open /dev/sda4
debugfs:  icheck 61031572
Block   Inode number
61031572        <block not found>
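
One possible reading of the <block not found> result is that the computed block lies just past the end of the filesystem. Here is a small Python sketch of the same formula, assuming filesystem blocks are numbered from 0, so that the last valid block is "Block count" minus one.

```python
SECTOR = 512
B = 4096                  # filesystem block size in bytes (tune2fs)
L = 976773167             # LBA of the bad sector
S = 488520585             # first sector of /dev/sda4 (fdisk)
BLOCK_COUNT = 61031572    # "Block count" reported by tune2fs

# Same formula as above: b = (int)((L - S) * 512 / B)
b = (L - S) * SECTOR // B

# Assumption: blocks are numbered 0 .. BLOCK_COUNT - 1.
last_fs_block = BLOCK_COUNT - 1

print(f"b = {b}, last filesystem block = {last_fs_block}")
if b > last_fs_block:
    print("bad sector is past the end of the filesystem")
else:
    print("bad sector is inside the filesystem")
```

If that assumption holds, the bad sector sits in the partial block at the very end of the partition, which the filesystem never uses. That would also fit e2fsck reporting 0 bad blocks.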

But... running dd solved the problem:

dd if=/dev/zero of=/dev/sda4 bs=4096 count=1 seek=61031572

The kernel errors from the first post are gone.
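
To see which sectors that dd write could actually have touched, here is a rough Python sketch. It uses the partition bounds from fdisk and assumes dd simply stops at the end of the device when the last block does not fit.

```python
SECTOR = 512
S = 488520585               # first sector of /dev/sda4 (fdisk)
END = 976773167             # last sector of /dev/sda4 (fdisk)
BAD_LBA = 976773167         # failing sector from the kernel log
BS, SEEK = 4096, 61031572   # bs= and seek= from the dd command

part_bytes = (END - S + 1) * SECTOR
offset = SEEK * BS

# Assumption: a write that runs into the end of the device is cut short there.
writable = min(BS, part_bytes - offset)

first = S + offset // SECTOR
last = first + writable // SECTOR - 1

print(f"dd starts at absolute sector {first} and can write {writable} bytes")
print(f"sectors written: {first}..{last}")
print(f"covers bad LBA {BAD_LBA}: {first <= BAD_LBA <= last}")
```

Only 3584 of the 4096 bytes fit before the end of the partition, and the last of those seven sectors is the failing one.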

The current error is on /dev/sda1, which is NTFS:

# smartctl -l selftest /dev/sda
smartctl 6.1 2013-03-16 r3800 [x86_64-linux-3.9.2-1-ARCH] (local build)
Copyright (C) 2002-13, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Conveyance offline  Completed: read failure       90%      5814         2159

Tried:

dd if=/dev/sda bs=512 count=1 seek=2159

... with no change in the smartctl conveyance test result.
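
To double-check which partition an LBA belongs to, here is a short Python sketch using the start and end sectors from the fdisk output above (the `locate` helper is made up for illustration). Separately, the dd line above has no of= operand, and with GNU dd seek= positions the output while skip= positions the input, so as written it would not have written anything to LBA 2159. That may be why nothing changed, though it is only a guess.

```python
# Partition bounds (start and end sector, inclusive) from the fdisk output.
PARTITIONS = [
    ("/dev/sda1", 63, 390700799),
    ("/dev/sda2", 390700800, 390861449),
    ("/dev/sda3", 390861450, 488520584),
    ("/dev/sda4", 488520585, 976773167),
]


def locate(lba):
    """Return (partition, sector offset inside it), or (None, None) if unpartitioned."""
    for name, start, end in PARTITIONS:
        if start <= lba <= end:
            return name, lba - start
    return None, None


for lba in (2159, 976773167):
    name, offset = locate(lba)
    print(f"LBA {lba}: {name}, sector offset {offset}")
```

LBA 2159 falls inside /dev/sda1, so anything written there overwrites NTFS data. It is worth finding out what lives at that offset before zeroing it.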

Last edited by marzecki (2013-05-17 18:50:03)
