For a few days I have had a problem with my ATA bus. At random times the system registers a high load, although neither iotop nor htop shows anything unusual. dmesg then reports an ATA CommWake error, which leads to various I/O errors and EXT4 failures. In response, EXT4 remounts / read-only to prevent further damage, and I have to reboot because practically every operation returns I/O errors.
Here is a dmesg output I was able to capture: https://fb.hash.works/2xMQ/plain
The main failure happens here:
[ 4193.730609] ata1: SError: { CommWake }
[ 4193.730613] ata1.00: failed command: READ FPDMA QUEUED
[ 4193.730619] ata1.00: cmd 60/b8:08:48:83:39/00:00:19:00:00/40 tag 1 ncq dma 94208 in
res 40/00:00:30:08:00/00:00:00:00:00/e0 Emask 0x4 (timeout)
[ 4193.730622] ata1.00: status: { DRDY }
[ 4193.730625] ata1.00: failed command: READ FPDMA QUEUED
[ 4193.730630] ata1.00: cmd 60/00:e8:c0:6a:7f/01:00:00:00:00/40 tag 29 ncq dma 131072 in
res 40/00:00:30:08:00/00:00:00:00:00/e0 Emask 0x4 (timeout)
[ 4193.730632] ata1.00: status: { DRDY }
[ 4193.730634] ata1.00: failed command: READ FPDMA QUEUED
[ 4193.730639] ata1.00: cmd 60/00:f0:c0:6b:7f/01:00:00:00:00/40 tag 30 ncq dma 131072 in
res 40/00:00:06:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
[ 4193.730641] ata1.00: status: { DRDY }
[ 4193.730645] ata1: hard resetting link
[ 4199.090168] ata1: link is slow to respond, please be patient (ready=0)
[ 4203.739920] ata1: COMRESET failed (errno=-16)
[ 4203.739930] ata1: hard resetting link
[ 4209.099731] ata1: link is slow to respond, please be patient (ready=0)
[ 4213.752736] ata1: COMRESET failed (errno=-16)
[ 4213.752747] ata1: hard resetting link
[ 4219.112431] ata1: link is slow to respond, please be patient (ready=0)
[ 4248.784081] ata1: COMRESET failed (errno=-16)
[ 4248.784093] ata1: limiting SATA link speed to 3.0 Gbps
[ 4248.784096] ata1: hard resetting link
[ 4253.810417] ata1: COMRESET failed (errno=-16)
[ 4253.810431] ata1: reset failed, giving up
The "failed command" varies; I've also seen "FLUSH CACHE EXT". I think it's just the last command that it tried.
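To compare captures from different crashes, the log can be tallied with a small script. The following is only a minimal sketch: the names `summarize`, `SAMPLE`, `FAILED_RE` and `ERRNO_RE` are made up for illustration, the sample lines are copied from the dmesg output above, and the regular expressions assume the exact message wording seen there. It counts the failed commands per device and translates the COMRESET errno into its symbolic name.

```python
import errno
import re
from collections import Counter

# A few lines copied from the dmesg capture above (not the full log).
SAMPLE = """\
[ 4193.730609] ata1: SError: { CommWake }
[ 4193.730613] ata1.00: failed command: READ FPDMA QUEUED
[ 4193.730625] ata1.00: failed command: READ FPDMA QUEUED
[ 4193.730634] ata1.00: failed command: READ FPDMA QUEUED
[ 4203.739920] ata1: COMRESET failed (errno=-16)
[ 4253.810431] ata1: reset failed, giving up
"""

# Assumption: messages are worded exactly as in the capture above.
FAILED_RE = re.compile(r"\] (ata\d+(?:\.\d+)?): failed command: (.+)$")
ERRNO_RE = re.compile(r"\] (ata\d+): COMRESET failed \(errno=-(\d+)\)")


def summarize(log_text):
    """Return (failed-command counts, COMRESET errno counts) for a dmesg dump."""
    failed = Counter()
    errnos = Counter()
    for line in log_text.splitlines():
        match = FAILED_RE.search(line)
        if match:
            failed[(match.group(1), match.group(2).strip())] += 1
            continue
        match = ERRNO_RE.search(line)
        if match:
            code = int(match.group(2))
            # Map the numeric errno to its symbolic name where known.
            name = errno.errorcode.get(code, str(code))
            errnos[(match.group(1), name)] += 1
    return failed, errnos


if __name__ == "__main__":
    failed, errnos = summarize(SAMPLE)
    for (device, command), count in failed.items():
        print(f"{device}: {command} failed {count}x")
    for (port, name), count in errnos.items():
        print(f"{port}: COMRESET failed with {name} {count}x")
```

On Linux, errno 16 is EBUSY, so the COMRESET failures here mean the device never reported ready after the reset. Feeding the script the saved output of several crashes should show whether the failed command really is arbitrary.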
Here is what I tried to fix this problem:
- Installed the plain repo kernel v4.8
- Downgraded the kernel to v4.7
- Downgraded the kernel to v4.6
- Replaced the SSD (but kept the system by cloning it with dd)
- Disabled and removed tlp
Note that before I changed the SSD, I also checked it with smartctl, which reported no errors.
Any ideas? I might try a fresh plain Arch installation next when I find the time.
Last edited by hashworks (2017-01-09 21:49:06)
Maybe a hardware problem
- Cable/Connector
- Disk
- System board
ThinkPads have a diagnostics routine in the UEFI/BIOS – press F10 at the ThinkPad boot logo.
The diagnostic tools report that all tests pass.
But yes, a bad cable or even the controller might be the cause. I'm going to try replacing the cable first. If that doesn't help... I really don't know whether I'm able to swap out the harder-to-reach parts.