
#1 2013-01-10 07:59:13

hobarrera
Member
From: The Netherlands
Registered: 2011-04-12
Posts: 355

Short freeze, SATA error, home remounted readonly.

I'm occasionally getting short freezes (SOME applications freeze for about 10 seconds), and then /home is remounted read-only.

According to dmesg, there seems to be an error with the SATA link. I'm not even sure whether this is a software or a hardware issue. I've changed the SATA cable just in case.

I can't remount rw either; I need to reboot every time. This happens within an hour of booting.

Any ideas? At the very least I'd like to know whether this is a hardware or a software issue. Could this be the HDD?
Thanks,

[ 2370.811150] ata1: exception Emask 0x10 SAct 0x0 SErr 0x10202 action 0xe frozen
[ 2370.811157] ata1: irq_stat 0x00400000, PHY RDY changed
[ 2370.811162] ata1: SError: { RecovComm Persist PHYRdyChg }
[ 2370.811170] ata1: hard resetting link
[ 2370.811228] ata3: exception Emask 0x10 SAct 0x0 SErr 0x90202 action 0xe frozen
[ 2370.811235] ata3: irq_stat 0x00400000, PHY RDY changed
[ 2370.811240] ata3: SError: { RecovComm Persist PHYRdyChg 10B8B }
[ 2370.811249] ata3: hard resetting link
[ 2378.033127] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 2378.063730] ata1.00: failed to get Identify Device Data, Emask 0x1
[ 2378.067397] ata1.00: failed to get Identify Device Data, Emask 0x1
[ 2378.067402] ata1.00: configured for UDMA/133
[ 2378.079727] ata1: EH complete
[ 2378.355973] ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[ 2378.393928] ata3.00: failed to get Identify Device Data, Emask 0x1
[ 2378.401339] ata3.00: failed to get Identify Device Data, Emask 0x1
[ 2378.401344] ata3.00: configured for UDMA/133
[ 2378.412464] ata3: EH complete
[ 2405.847926] ata5: lost interrupt (Status 0x50)
[ 2405.847955] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 2405.847972] ata5.00: failed command: WRITE DMA EXT
[ 2405.847981] ata5.00: cmd 35/00:08:1e:47:e5/00:00:29:00:00/e0 tag 0 dma 4096 out
         res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 2405.847986] ata5.00: status: { DRDY }
[ 2405.848010] ata5: soft resetting link
[ 2406.017114] ata5.00: failed to get Identify Device Data, Emask 0x1
[ 2406.024958] ata5.00: failed to get Identify Device Data, Emask 0x1
[ 2406.024962] ata5.00: configured for UDMA/33
[ 2406.024966] ata5.00: device reported invalid CHS sector 0
[ 2406.024974] sd 4:0:0:0: [sdc]  
[ 2406.024975] Result: hostbyte=0x00 driverbyte=0x08
[ 2406.024977] sd 4:0:0:0: [sdc]  
[ 2406.024977] Sense Key : 0xb [current] [descriptor]
[ 2406.024979] Descriptor sense data with sense descriptors (in hex):
[ 2406.024980]         72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00 
[ 2406.024985]         00 00 00 00 
[ 2406.024988] sd 4:0:0:0: [sdc]  
[ 2406.024988] ASC=0x0 ASCQ=0x0
[ 2406.024990] sd 4:0:0:0: [sdc] CDB: 
[ 2406.024990] cdb[0]=0x2a: 2a 00 29 e5 47 1e 00 00 08 00
[ 2406.024995] end_request: I/O error, dev sdc, sector 702891806
[ 2406.025010] ata5: EH complete
[ 2406.025662] Aborting journal on device dm-4-8.
[ 2406.025666] EXT4-fs error (device dm-4) in ext4_reserve_inode_write:4538: Journal has aborted
[ 2406.026484] EXT4-fs error (device dm-4) in ext4_dirty_inode:4657: Journal has aborted
[ 2406.064652] EXT4-fs error (device dm-4) in ext4_da_writepages:2397: Journal has aborted
[ 2406.093483] EXT4-fs error (device dm-4): ext4_journal_start_sb:349: Detected aborted journal
[ 2406.093491] EXT4-fs (dm-4): Remounting filesystem read-only
[ 2406.093765] EXT4-fs (dm-4): ext4_da_writepages: jbd2_start: 1024 pages, ino 31203146; err -30
[ 2406.548656] EXT4-fs error (device dm-4) in ext4_new_inode:942: Journal has aborted
[ 2406.620219] EXT4-fs error (device dm-4) in ext4_reserve_inode_write:4538: Journal has aborted
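
As a cross-check on the log above, the CDB bytes of the failed write can be decoded by hand and compared with the sector in the "end_request: I/O error" line. The following is a minimal Python sketch, assuming the standard 10-byte READ(10)/WRITE(10) layout (opcode in byte 0, big-endian LBA in bytes 2-5, block count in bytes 7-8). The function name decode_rw10 is made up for illustration and is not part of any tool mentioned in this thread.

```python
def decode_rw10(cdb_hex):
    """Decode a 10-byte SCSI READ(10)/WRITE(10) CDB given as a hex string.

    Returns (opcode_name, lba, block_count).
    """
    cdb = bytes.fromhex(cdb_hex)  # fromhex ignores the spaces between bytes
    if len(cdb) != 10 or cdb[0] not in (0x28, 0x2A):
        raise ValueError("not a READ(10)/WRITE(10) CDB")
    opcode = "READ(10)" if cdb[0] == 0x28 else "WRITE(10)"
    lba = int.from_bytes(cdb[2:6], "big")     # logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # transfer length in blocks
    return opcode, lba, blocks
```

Calling decode_rw10("2a 00 29 e5 47 1e 00 00 08 00") with the bytes from the log gives a WRITE(10) of 8 blocks at LBA 702891806, which is the sector the kernel reports. Assuming 512-byte logical sectors, 8 blocks is also the 4096 bytes shown in "dma 4096 out".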

EDIT
Here's the output of smartctl -a -d ata /dev/sdc: http://sprunge.us/XGed

Looks like the HDD is failing; am I right?

Last edited by hobarrera (2013-01-10 08:05:51)


#2 2013-01-10 16:17:26

WonderWoofy
Member
From: Los Gatos, CA
Registered: 2012-05-19
Posts: 8,414

Re: Short freeze, SATA error, home remounted readonly.

See all those "pre-fail" and "old_age" things... yes, back that shit up immediately!

Sometimes, you can get a false positive or two with some drives.  But in your case every single f*cking test is telling you that shit is awry!

I commend your foresight of using proper tools to try and diagnose your problem yourself though.  There has been waaaaayyy too much "Function X is broken.  Whats wrong with it, will you google this for me? I don't know how..."  So thank you for not being one of those.

Last edited by WonderWoofy (2013-01-10 16:17:47)


#3 2013-01-10 16:42:29

alphaniner
Member
From: Ancapistan
Registered: 2010-07-12
Posts: 2,810

Re: Short freeze, SATA error, home remounted readonly.

I've never seen a drive that reported anything other than "pre-fail" or "old_age" for every smart attribute. And there was nothing wrong with any of those drives. I'm pretty sure a few of them were even brand new.

IOW, don't put much stock in that characteristic. What matters is the VALUE/WORST/THRESH fields, and everything looks good there.

Edit: The status of the self-test does not look good however. Since you've already swapped out the sata cable, try another sata port.
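
To make the VALUE/WORST/THRESH point concrete, here is a rough Python sketch that scans a smartctl attribute table and lists only the attributes whose normalized VALUE has dropped to or below THRESH. It assumes the usual ten-column layout of the attribute table. The SAMPLE text is invented for illustration and is not the output linked in the first post, and failing_attributes is a hypothetical helper name.

```python
# Invented sample in the usual smartctl attribute-table layout
# (NOT the real output from this thread).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  5 Reallocated_Sector_Ct   0x0033   120   120   140    Pre-fail  Always   FAILING_NOW 900
  9 Power_On_Hours          0x0032   085   085   000    Old_age   Always       -       11000
"""


def failing_attributes(smart_text):
    """Return (name, value, worst, thresh) for attributes at or below threshold."""
    failing = []
    for line in smart_text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        name = fields[1]
        value, worst, thresh = (int(f) for f in fields[3:6])
        # A threshold of 0 means the attribute cannot fail, so skip it.
        if thresh > 0 and value <= thresh:
            failing.append((name, value, worst, thresh))
    return failing
```

Note that the TYPE column ("Pre-fail" or "Old_age") is ignored on purpose: it describes the kind of attribute, not its current state.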

Last edited by alphaniner (2013-01-10 16:47:03)


But whether the Constitution really be one thing, or another, this much is certain - that it has either authorized such a government as we have had, or has been powerless to prevent it. In either case, it is unfit to exist.
-Lysander Spooner


#4 2013-01-10 16:57:35

hobarrera
Member
From: The Netherlands
Registered: 2011-04-12
Posts: 355

Re: Short freeze, SATA error, home remounted readonly.

alphaniner wrote:

I've never seen a drive that reported anything other than "pre-fail" or "old_age" for every smart attribute. And there was nothing wrong with any of those drives. I'm pretty sure a few of them were even brand new.

IOW, don't put much stock in that characteristic. What matters is the VALUE/WORST/THRESH fields, and everything looks good there.

Edit: The status of the self-test does not look good however. Since you've already swapped out the sata cable, try another sata port.

When I changed the cable I connected it to a different port as well.
I also changed the power cable (It's a modular PSU), and connected it to a different output.

I've only just noticed that ata1, ata3, and ata5 are mentioned (though only one of them has issues).  Might this be related?
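
One way to see at a glance which ports are involved is to tally the libata messages per port. Below is a small Python sketch along those lines. The message fragments it looks for are taken from the log in the first post, while the labels and the function name events_per_port are made up for illustration.

```python
import re
from collections import Counter, defaultdict

# Matches "ata1: ..." as well as "ata5.00: ..." and captures the port name.
PORT_RE = re.compile(r"\b(ata\d+)(?:\.\d+)?: (.*)")

# Message fragments seen in the log above, mapped to short labels.
EVENTS = {
    "hard resetting link": "hard_reset",
    "soft resetting link": "soft_reset",
    "failed command": "failed_command",
    "lost interrupt": "lost_interrupt",
}


def events_per_port(dmesg_text):
    """Count selected libata events for each ataN port in dmesg output."""
    summary = defaultdict(Counter)
    for line in dmesg_text.splitlines():
        match = PORT_RE.search(line)
        if match is None:
            continue
        port, message = match.groups()
        for needle, label in EVENTS.items():
            if needle in message:
                summary[port][label] += 1
    return {port: dict(counts) for port, counts in summary.items()}
```

Fed the log from the first post, this should show hard resets on ata1 and ata3 and a lost interrupt, a failed command and a soft reset on ata5. It only counts events; it does not say whether the three ports share a cause.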


#5 2013-01-10 17:07:43

kaszak696
Member
Registered: 2009-05-26
Posts: 543

Re: Short freeze, SATA error, home remounted readonly.

Try running your partitions through badblocks, if it finds any bad sectors, there is your problem.
To prevent remounting the partition read-only you can mount it with 'errors=continue' option or type

tune2fs -e continue /dev/<partition>

to make it permanent, but this is rather dangerous and not recommended.

Last edited by kaszak696 (2013-01-10 17:12:41)


'What can be asserted without evidence can also be dismissed without evidence.' - Christopher Hitchens
'There's no such thing as addiction, there's only things that you enjoy doing more than life.' - Doug Stanhope
GitHub Junkyard


#6 2013-01-11 03:56:54

hobarrera
Member
From: The Netherlands
Registered: 2011-04-12
Posts: 355

Re: Short freeze, SATA error, home remounted readonly.

kaszak696 wrote:

Try running your partitions through badblocks, if it finds any bad sectors, there is your problem.
[...]

I ran badblocks and followed this guide to fix the damaged sector (which was, fortunately, unused).

Part of the issue continues: there are still short freezes, and I see a SATA error in dmesg, but no remounting.

[ 2516.721151] ata1: exception Emask 0x10 SAct 0x0 SErr 0x10202 action 0xe frozen
[ 2516.721158] ata1: irq_stat 0x00400000, PHY RDY changed
[ 2516.721163] ata1: SError: { RecovComm Persist PHYRdyChg }
[ 2516.721171] ata1: hard resetting link
[ 2524.047687] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
[ 2524.073511] ata1.00: failed to get Identify Device Data, Emask 0x1
[ 2524.077359] ata1.00: failed to get Identify Device Data, Emask 0x1
[ 2524.077363] ata1.00: configured for UDMA/133
[ 2524.087629] ata1: EH complete
[ 2577.540268] ata5: lost interrupt (Status 0x50)
[ 2577.540290] ata5.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 2577.540296] ata5.00: failed command: FLUSH CACHE EXT
[ 2577.540304] ata5.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 0
         res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 2577.540309] ata5.00: status: { DRDY }
[ 2577.540336] ata5: soft resetting link
[ 2577.730349] ata5.00: failed to get Identify Device Data, Emask 0x1
[ 2577.737159] ata5.00: failed to get Identify Device Data, Emask 0x1
[ 2577.737164] ata5.00: configured for UDMA/33
[ 2577.737166] ata5.00: retrying FLUSH 0xea Emask 0x4
[ 2577.737345] ata5.00: device reported invalid CHS sector 0
[ 2577.737380] ata5: EH complete

Is it possible that the freezes and the bad block were separate issues?


#7 2013-01-11 14:13:58

alphaniner
Member
From: Ancapistan
Registered: 2010-07-12
Posts: 2,810

Re: Short freeze, SATA error, home remounted readonly.

If the drive is under warranty, I'd download the manufacturer's diagnostic and test with that. If the drive fails the diagnostic you'll get a failure code to expedite warranty service (for WD and Seagate at least, not sure about others). I can also give you the URLs for WD or Seagate's warranty validation pages if you want.




#8 2013-01-11 15:22:52

ataraxia
Member
From: Pittsburgh
Registered: 2007-05-06
Posts: 1,553

Re: Short freeze, SATA error, home remounted readonly.

I recently added a disk to a system and got a link dropout like this immediately afterwards - but on the *old* disk. I only saw it once (and that was a week ago). A bit of research has me thinking that since there are no checksum errors, it's not a data cable (or data port) problem. I think instead it's a power problem - PHY RDY becoming unset on its own often means the disk had a power failure. In my case, I used a power connector that had been unused for almost 4 years, and I didn't think to clean the dust out first, so I think I got a power-dropout when the dust clump went zap. In your case, you may have a dying power supply, or too many devices connected to one power cable. (Or, indeed, it could just be a failing disk. Your errors aren't quite the same as mine, after all.)
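
For anyone who wants to check the "no checksum errors" part against their own log, the SErr value can be decoded bit by bit. The Python sketch below uses the bit-to-name mapping as I recall it from the kernel's libata error handling, so treat the table as an assumption rather than a reference. It does reproduce the names the kernel printed in this thread for 0x10202 and 0x90202.

```python
# SError bit positions and the short names libata prints for them.
# Assumption: mapping written from memory of the kernel source.
SERR_BITS = {
    0: "RecovData",
    1: "RecovComm",
    8: "UnrecovData",
    9: "Persist",
    10: "Proto",
    11: "HostInt",
    16: "PHYRdyChg",
    17: "PHYInt",
    18: "CommWake",
    19: "10B8B",
    20: "Dispar",
    21: "BadCRC",
    22: "Handshk",
    23: "LinkSeq",
    24: "TrStaTrns",
    25: "UnrecFIS",
    26: "DevExch",
}


def decode_serror(value):
    """Return the names of the SError bits set in value, lowest bit first."""
    return [name for bit, name in sorted(SERR_BITS.items()) if value & (1 << bit)]
```

decode_serror(0x10202) gives RecovComm, Persist and PHYRdyChg, and decode_serror(0x90202) adds 10B8B, matching the "SError: { ... }" lines in the first post. BadCRC is not set in either value.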


#9 2013-01-11 16:55:02

hobarrera
Member
From: The Netherlands
Registered: 2011-04-12
Posts: 355

Re: Short freeze, SATA error, home remounted readonly.

I tried dd if=/dev/another-disk of=/dev/null.
Similar errors occurred as with the primary disk, so I'm inclined to think the issue is not the disk itself, but that the disk failures are just another symptom of some other issue.
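
The same kind of read-only test can be sketched in Python, which makes it easy to record how far the read got before the first I/O error. This is only an illustration of the idea behind the dd command above: read_through is a hypothetical helper, and reading a real block device with it would need root privileges.

```python
def read_through(path, block_size=1024 * 1024):
    """Read path sequentially and discard the data, like dd to /dev/null.

    Returns (bytes_read, error), where error is None if the end of the
    file or device was reached, or the OSError raised by a failed read.
    """
    total = 0
    # buffering=0 gives raw, unbuffered reads straight from the OS.
    with open(path, "rb", buffering=0) as source:
        while True:
            try:
                chunk = source.read(block_size)
            except OSError as exc:
                return total, exc
            if not chunk:  # empty read means end of file/device
                return total, None
            total += len(chunk)
```

Running it while watching dmesg shows whether link resets appear during a plain sequential read, independent of any filesystem.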

alphaniner wrote:

If the drive is under warranty, I'd download the manufacturer's diagnostic and test with that. If the drive fails the diagnostic you'll get a failure code to expedite warranty service (for WD and Seagate at least, not sure about others). I can also give you the URLs for WD or Seagate's warranty validation pages if you want.

The warranty has expired, regrettably. :(
It's a WD Caviar Black, btw, while the other disks are Caviar Greens.
It did seem that a sector was damaged, and after fixing it with badblocks, SMART now reports everything ok. However, the link resets still occur, and with all the disks.

ataraxia wrote:

I recently added a disk to a system and got a link dropout like this immediately afterwards - but on the *old* disk. I only saw it once (and that was a week ago). A bit of research has me thinking that since there are no checksum errors, it's not a data cable (or data port) problem. I think instead it's a power problem - PHY RDY becoming unset on its own often means the disk had a power failure. In my case, I used a power connector that had been unused for almost 4 years, and I didn't think to clean the dust out first, so I think I got a power-dropout when the dust clump went zap. In your case, you may have a dying power supply, or too many devices connected to one power cable. (Or, indeed, it could just be a failing disk. Your errors aren't quite the same as mine, after all.)

I don't expect the PSU to be insufficient; it's an 850W unit, powering a Caviar Black, two Caviar Greens and an 8800GTX. It is possible that it's nearing its end-of-life though (it's over three years old IIRC).


Full specs for hyperion.

