Harddisk problem: ata1.00: end_request: I/O error, dev sda, sector ...

gay · 2013-05-18 23:14:08

I got errors like the following:

ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x0
ata1.00: irq_stat 0x40000008
ata1.00: failed command: READ FPDMA QUEUED
ata1.00: cmd 60/08:00:68:33:32/00:00:31:00:00/40 tag 0 ncq 4096 in
         res 41/40:00:68:33:32/00:00:31:00:00/40 Emask 0x409 (media error) <F>
ata1.00: status: { DRDY ERR }
ata1.00: error: { UNC }
ata1.00: end_request: I/O error, dev sda, sector 825373544

Also: my filesystem was inconsistent; i.e. reading specific ordinary files (for instance ~/.config/dconf/user) resulted in an I/O error.

So I ran xfs_repair from a livesystem ... it seemed to work; xfs_check checked out without errors.

After rebooting into the installed system, however, I keep getting error messages like the one shown above. The cmd and res lines as well as the sectors differ each time. Running xfs_repair from the livesystem (with and without -n; without -n it doesn't actually do anything except reset the superblock, or so it says) a large number of times always produces the above error, giving the sector as 472904080.

I cannot run xfs_check and/or xfs_repair from the installed system because it won't unmount /.

Also I cannot find any problems with how the harddisk is connected etc. (it is screwed into the laptop; I checked that it is not loose or something).

The problem seems similar to this one https://bbs.archlinux.org/viewtopic.php?id=135872 ... which was a hardware problem.

As I understand the error message it says it couldn't read sector 825373544 (and several others) on the harddisk (sda). Is that right? If so, what does this mean? ... or what could it possibly mean? (That that disk is broken and I have to get the vendor to give me a new one? That some other hardware could be broken? Or could it still be a software problem ... if so: how do I solve it?)
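A rough sketch of how the sector number relates to the cmd line in the log above. It assumes the bytes are printed in libata's taskfile order (command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device); the function name is made up for illustration.

```python
def lba_from_libata_cmd(cmd: str) -> int:
    """Reassemble the 48-bit LBA from a libata 'cmd' taskfile dump.

    Assumed layout (not taken from the thread):
    command/feature:nsect:lbal:lbam:lbah/
    hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device
    """
    low, high = cmd.split("/")[1:3]
    _, _, lbal, lbam, lbah = (int(b, 16) for b in low.split(":"))
    _, _, hob_lbal, hob_lbam, hob_lbah = (int(b, 16) for b in high.split(":"))
    # The "hob" (high order byte) fields hold LBA bits 24..47.
    return (hob_lbah << 40 | hob_lbam << 32 | hob_lbal << 24
            | lbah << 16 | lbam << 8 | lbal)


# cmd line copied from the kernel log in this post
print(lba_from_libata_cmd("60/08:00:68:33:32/00:00:31:00:00/40"))  # 825373544
```

Under that assumption the decoded value is the same sector that the end_request line reports, so the cmd line and the sector number describe the same failed read.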

nagaseiori · 2013-05-19 00:08:25

Your hard drive is probably dying. How old is it?

gay · 2013-05-19 01:23:42

Thanks, nagaseiori, for your response...

Just 4 months; I bought it in January. So it is certainly not its age, but nonetheless ... here we are.

You say it is probably dying. Is there a way to be certain?

In the thread I linked above, someone recommended doing a "Complete drive test" with a certain Ultimate Boot CD. Will this give me some answers? Or is there some tool in the arch install iso, or (since you can use pacman from the livesystem without problems) another arch package, that I could use to determine what's happening with my hard drive?

Btw: what exactly does it mean that the hard drive 'dies'? As I understand it, the drive is a spinning metal disk on which data is (magnetically) 'written'; it is organized in a couple of million sectors. Since the error message says I/O error when reading sector XXXXXX ... this should mean that the hard drive couldn't read this sector (which is unlikely to be repairable)? Or is it more like: the hard disk wasn't willing to communicate with the system / sata driver about the contents of the respective sector?

WonderWoofy · 2013-05-19 02:11:21

Look into checking the SMART status with smartmontools. There is a nice page in our wiki about how to get basic usage out of it.

gay · 2013-05-19 16:33:22

Thanks WonderWoofy. I ran all three smartmontools self-tests (extended, conveyance, short). The results are ...

smartctl -l selftest /dev/sda:
(...)

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%       1121         316478416
# 1  Conveyance offline  Completed: read failure       90%       1121         316478420
# 1  Short offline       Completed: read failure       90%       1121         316478416

and (http://smartmontools.sourceforge.net/badblockhowto.html says nonzero Current_Pending_Sector count (last column, "RAW_VALUE") indicates bad sectors)

smartctl -A /dev/sda
(...)

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
(...)
197 Current_Pending_Sector  0x0032   195   194   000    Old_age   Always       -       931

Now what does this tell me?
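To make explicit which number the howto refers to, here is a small sketch that pulls RAW_VALUE out of an attribute line as printed above. The helper name is hypothetical, and it assumes the ten-column layout shown in the header line.

```python
def raw_value(attr_line: str) -> int:
    """Return the RAW_VALUE column of one 'smartctl -A' attribute line.

    Assumes the column order shown in the header above:
    ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    """
    fields = attr_line.split()
    # RAW_VALUE is the tenth column; some attributes append extra text after it.
    return int(fields[9])


# attribute line copied from the smartctl output in this post
line = "197 Current_Pending_Sector  0x0032   195   194   000    Old_age   Always       -       931"
pending = raw_value(line)
print(pending)  # 931
```

So the figure the howto cares about here is 931, not the normalized VALUE/WORST columns (195/194).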

http://smartmontools.sourceforge.net/badblockhowto.html is a howto discussing how to correct bad sectors detected using smartmontools. Bad sectors are not hardware problems if I understand the howto right; are they?
The howto suggests identifying the respective bad block and overwriting it with zeros (or with anything else) using dd. This requires computing the location of the bad sectors, and the chance of mistakes is substantial: in their example the LBA numbers seem to be in hex (0x prefix) while mine seem to be decimal, and my root partition is encrypted and I have no idea how LUKS reacts to sectors being overwritten. So:
Does it make sense to try what they tried? (This would imply that it's actually not a hardware problem. How do I find out, with or without smartmontools, whether it is or is not?)
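For reference, the location arithmetic from the howto can be sketched as below: block = (LBA - partition start) * 512 / filesystem block size. The partition start of 2048 and the 4096-byte block size are assumptions for illustration only (the real start sector would come from something like fdisk -lu), and the sketch does not account for the extra offset a LUKS header introduces.

```python
def fs_block_for_lba(lba: int, part_start: int,
                     fs_block_size: int = 4096, sector_size: int = 512) -> int:
    """Map an absolute disk LBA to a block number inside a partition."""
    if lba < part_start:
        raise ValueError("LBA lies before the start of the partition")
    return (lba - part_start) * sector_size // fs_block_size


PART_START = 2048  # hypothetical start sector, NOT taken from this thread

# int(text, 0) accepts both decimal ("316478416") and hex ("0x...") notation,
# which sidesteps the hex-versus-decimal confusion.
lba = int("316478416", 0)  # LBA_of_first_error from the self-test log above
print(fs_block_for_lba(lba, PART_START))  # 39559546
```

The result is only as good as the assumed start sector and block size, so both need to be checked against the real partition table and filesystem before anything is written with dd.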

gay · 2013-05-20 03:07:57

Okay, it appears that bad sectors are physically unusable. On the other hand, some people (like the redhat guys here http://www.redhat.com/archives/rhl-list … 0573.html) keep saying that they can be repaired, marked bad, or logically replaced by non-bad sectors ... Apparently it should be enough to dd zeros to the bad block (I'm not sure to what end exactly, probably to have the sector marked bad and not used again).

The trickier question is: how do I do that? https://wiki.archlinux.org/index.php/Ba … stem_Check says to simply run "fsck -cvvk <device>"; this uses the badblocks utility to obtain a list of bad sectors and lets fsck deal with it. But my filesystem is xfs, so fsck simply refuses to run, professing that I should be using xfs_repair, which does not have an option to work together with badblocks. Running badblocks directly gives a list longer than one screen, probably several hundred damaged sectors, so I'm not eager to try the manual option from http://smartmontools.sourceforge.net/badblockhowto.html either.

Is there another xfs utility I might try? Anything that possibly works with badblocks?
Failing that, does the following have a chance? 1. backing up the disk with ddrescue; 2. zeroing out the entire partition (should alert the disk to the bad blocks ... at least according to what the redhat guy said in the thread linked above); 3. dd'ing the backup back to the partition; 4. running xfs_repair to get rid of the remaining inconsistencies.
