#1 2024-11-16 19:18:53

jemorgan
Member
Registered: 2021-12-13
Posts: 4

Occasional system freezes with disk errors in dmesg

Hey there, I'm running into an issue where my system randomly locks up: I can't interact with terminals, open new windows, etc., although I can still move the mouse around in my DE.

I'm seeing an error in dmesg that seems to correspond to when the lock-up happens. Looks like this:

[35997.592789] ata2.00: exception Emask 0x0 SAct 0x3f000000 SErr 0xd0000 action 0x6 frozen
[35997.592828] ata2: SError: { PHYRdyChg CommWake 10B8B }
[35997.592850] ata2.00: failed command: WRITE FPDMA QUEUED
[35997.592870] ata2.00: cmd 61/08:c0:b0:73:85/00:00:08:00:00/40 tag 24 ncq dma 4096 out
                        res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[35997.592927] ata2.00: status: { DRDY }
[35997.592943] ata2.00: failed command: WRITE FPDMA QUEUED
[35997.592963] ata2.00: cmd 61/08:c8:e8:75:85/00:00:08:00:00/40 tag 25 ncq dma 4096 out
                        res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[35997.593028] ata2.00: status: { DRDY }
[35997.593047] ata2.00: failed command: WRITE FPDMA QUEUED
[35997.593067] ata2.00: cmd 61/08:d0:68:76:85/00:00:08:00:00/40 tag 26 ncq dma 4096 out
                        res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[35997.593126] ata2.00: status: { DRDY }
[35997.593142] ata2.00: failed command: WRITE FPDMA QUEUED
[35997.593162] ata2.00: cmd 61/08:d8:28:7b:85/00:00:08:00:00/40 tag 27 ncq dma 4096 out
                        res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[35997.593218] ata2.00: status: { DRDY }
[35997.593231] ata2.00: failed command: WRITE FPDMA QUEUED
[35997.593247] ata2.00: cmd 61/08:e0:78:7b:85/00:00:08:00:00/40 tag 28 ncq dma 4096 out
                        res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[35997.593290] ata2.00: status: { DRDY }
[35997.593303] ata2.00: failed command: WRITE FPDMA QUEUED
[35997.593320] ata2.00: cmd 61/08:e8:d8:7b:85/00:00:08:00:00/40 tag 29 ncq dma 4096 out
                        res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
[35997.593363] ata2.00: status: { DRDY }
[35997.593376] ata2: hard resetting link
[35998.062790] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[35998.068032] ata2.00: configured for UDMA/133
[35998.068040] ahci 0000:0e:00.0: port does not support device sleep
[35998.068087] ata2: EH complete
[39587.879366] ata2.00: exception Emask 0x10 SAct 0xf0000 SErr 0x4050000 action 0xe frozen
[39587.879409] ata2.00: irq_stat 0x00000040, connection status changed
[39587.879433] ata2: SError: { PHYRdyChg CommWake DevExch }
[39587.879454] ata2.00: failed command: WRITE FPDMA QUEUED
[39587.879474] ata2.00: cmd 61/08:80:78:7b:85/00:00:08:00:00/40 tag 16 ncq dma 4096 out
                        res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x10 (ATA bus error)
[39587.879532] ata2.00: status: { DRDY }
[39587.879548] ata2.00: failed command: WRITE FPDMA QUEUED
[39587.879568] ata2.00: cmd 61/08:88:d8:7b:85/00:00:08:00:00/40 tag 17 ncq dma 4096 out
                        res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x10 (ATA bus error)
[39587.879625] ata2.00: status: { DRDY }
[39587.879641] ata2.00: failed command: WRITE FPDMA QUEUED
[39587.879660] ata2.00: cmd 61/08:90:00:7c:85/00:00:08:00:00/40 tag 18 ncq dma 4096 out
                        res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x10 (ATA bus error)
[39587.879718] ata2.00: status: { DRDY }
[39587.879734] ata2.00: failed command: WRITE FPDMA QUEUED
[39587.879753] ata2.00: cmd 61/08:98:80:7d:85/00:00:08:00:00/40 tag 19 ncq dma 4096 out
                        res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x10 (ATA bus error)
[39587.879811] ata2.00: status: { DRDY }
[39587.879833] ata2: hard resetting link
[39588.776254] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[39588.781462] ata2.00: configured for UDMA/133
[39588.781468] ahci 0000:0e:00.0: port does not support device sleep
[39588.781514] ata2: EH complete

As far as I can tell, the disk in question is `/dev/sdb`, a btrfs-formatted drive with subvolumes for `/` and `/home`.
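
One way to double-check which block device sits behind `ata2` is to resolve each disk's sysfs path and look for the port name in it. A minimal sketch, assuming the usual libata sysfs layout where the resolved path contains an `ataN` component; the function name and the `SYSFS_ROOT` override are made up for illustration:

```shell
# Print the sdX names whose resolved sysfs path contains /ata<N>/.
# SYSFS_ROOT is a hypothetical override for testing; it defaults to /sys.
disks_on_ata_port() {
    port="$1"
    root="${SYSFS_ROOT:-/sys}"
    for dev in "$root"/block/sd*; do
        [ -e "$dev" ] || continue
        # Entries in /sys/block are symlinks into /sys/devices/...
        case "$(readlink -f "$dev")" in
            */ata"$port"/*) basename "$dev" ;;
        esac
    done
}

# Typical use:
#   disks_on_ata_port 2
```

If the guess above is right, the call for port 2 should list `sdb`.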

Fstab looks like this:

# /dev/sdb1 LABEL=arch
UUID=d9dfa3d3-a12a-44d2-b48c-c603473700e8	/         	btrfs     	rw,noatime,autodefrag,compress=zstd,commit=120,discard=async,subvol=/_active/rootvol	0 0

# /dev/sdc1
UUID=7EBB-117E      	                    /boot     	vfat      	rw,noatime,fmask=0022,dmask=0022,codepage=437,iocharset=ascii,shortname=mixed,utf8,errors=remount-ro	0 2

# /dev/sdb1 LABEL=arch
UUID=d9dfa3d3-a12a-44d2-b48c-c603473700e8	/home     	btrfs     	rw,noatime,autodefrag,compress=zstd,commit=120,discard=async,subvol=/_active/homevol	0 0

# /dev/sdb1 LABEL=arch
UUID=d9dfa3d3-a12a-44d2-b48c-c603473700e8	/mnt/defvol	btrfs     	rw,noatime,autodefrag,compress=zstd,commit=120,discard=async,subvol=/	0 0

Some smartctl output:

smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.11.7-arch1-1] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (    0) seconds.
Offline data collection
capabilities:                    (0x11) SMART execute Offline immediate.
                                        No Auto Offline data collection support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        No Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  10) minutes.

After waiting 10 minutes for `sudo smartctl -t long /dev/sdb` to finish, `sudo smartctl -H /dev/sdb` reports:

smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.11.7-arch1-1] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

Does anyone have troubleshooting steps I can try? My understanding of the smartctl output is that the drive itself doesn't have any issues, but I'm close to just ordering a new one to see if that fixes it.

#2 2024-11-18 00:06:36

ElegantBeef
Member
Registered: 2024-11-17
Posts: 1

Re: Occasional system freezes with disk errors in dmesg

I have the same issue here, and I have no idea how to track down the culprit. I have found, though, that it does not happen on the linux-lts kernel.

Not that there is any noticeable difference, but I will include my log and fstab as well.

ata2.00: exception Emask 0x10 SAct 0x200000 SErr 0x4050000 action 0xe frozen
ata2.00: irq_stat 0x00000040, connection status changed
ata2: SError: { PHYRdyChg CommWake DevExch }
ata2.00: failed command: WRITE FPDMA QUEUED
ata2.00: cmd 61/08:a8:28:71:a9/00:00:55:00:00/40 tag 21 ncq dma 4096 out
               res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x10 (ATA bus error)
ata2.00: status: { DRDY }

UUID=f3bae795-c6a0-4c64-9d9b-3e20df34cf0e /home          btrfs   subvol=/@home,defaults,noatime,compress=zstd 0 0
UUID=f3bae795-c6a0-4c64-9d9b-3e20df34cf0e /var/cache     btrfs   subvol=/@cache,defaults,noatime,compress=zstd 0 0
UUID=f3bae795-c6a0-4c64-9d9b-3e20df34cf0e /var/log       btrfs   subvol=/@log,defaults,noatime,compress=zstd 0 0
UUID=E67B-896F                           /boot      vfat    defaults,noatime 0 2
UUID=0a0f530d-110c-46ae-bb4f-8b68db01cb34 swap           swap    defaults   0 0
tmpfs                                     /tmp           tmpfs   defaults,noatime,mode=1777 0 0
UUID=f3bae795-c6a0-4c64-9d9b-3e20df34cf0e / btrfs subvol=/@,defaults,noatime,compress=zstd 0 0
UUID=e05873fe-92ce-41a0-9429-d1a1f3c02ecb /mnt/jason/ssd2 btrfs nosuid,nodev,nofail,x-gvfs-show 0 0

#3 2024-11-18 18:05:16

V1del
Forum Moderator
Registered: 2012-10-16
Posts: 23,289

Re: Occasional system freezes with disk errors in dmesg

smartctl -H is borderline useless; if you want telling output, smartctl -a is necessary at a minimum (though newer smartctl versions also suggest -x for more info).
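
For link trouble like the errors in the log, the interface CRC counter in the attribute table can say more than the overall health verdict, since a rising value usually points at cabling or the link rather than the storage medium. A small sketch that filters it out of `smartctl -A` output; it assumes the drive reports attribute ID 199 (often named `UDMA_CRC_Error_Count`, though vendors vary), and the function name is made up:

```shell
# Read `smartctl -A` output on stdin and print only attribute 199.
# Assumes the standard ATA attribute table, where the ID is the first column.
crc_attribute() {
    awk '$1 == 199 { print }'
}

# Typical use (needs root):
#   smartctl -A /dev/sdb | crc_attribute
```

An empty result only means the drive does not expose that ID, not that the link is healthy.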

As for the ATA issues, the default power saving mode was changed somewhat recently; check whether explicitly going with max_performance helps (see https://wiki.archlinux.org/title/Power_ … Management). If you're using TLP or similar, it will configure and change this, so check the configuration there.
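
The policy is exposed per SATA host in sysfs as `link_power_management_policy`. A rough sketch for inspecting and changing it at runtime; the function names and the `SYSFS_ROOT` override are hypothetical, writing requires root, and the change does not survive a reboot:

```shell
# Print the current link power management policy of every SCSI/SATA host.
show_lpm_policy() {
    root="${SYSFS_ROOT:-/sys}"   # hypothetical override, for testing only
    for f in "$root"/class/scsi_host/host*/link_power_management_policy; do
        [ -r "$f" ] || continue
        printf '%s: %s\n' "$f" "$(cat "$f")"
    done
}

# Write the given policy (e.g. max_performance) to every host that has the file.
set_lpm_policy() {
    policy="$1"
    root="${SYSFS_ROOT:-/sys}"
    for f in "$root"/class/scsi_host/host*/link_power_management_policy; do
        [ -w "$f" ] || continue
        printf '%s\n' "$policy" > "$f"
    done
}

# Typical use, as root:
#   show_lpm_policy
#   set_lpm_policy max_performance
```

If the freezes stop with max_performance set, that would point at the changed default rather than at the drive itself.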

#4 2024-11-25 17:55:26

gromit
Package Maintainer (PM)
From: Germany
Registered: 2024-02-10
Posts: 693

Re: Occasional system freezes with disk errors in dmesg

Fixing this for everyone would mean adding a quirk to the Linux kernel for your drive model that exempts your device from the new default.
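
Kernel quirks of this kind are matched on the drive's model string and, optionally, its firmware revision, so both are worth including in any report. A minimal sketch for reading them from sysfs, assuming the disk is `sdb` as in the first post; the function name and the `SYSFS_ROOT` override are made up:

```shell
# Print model and firmware revision of a disk, e.g. `disk_identity sdb`.
disk_identity() {
    dev="$1"
    root="${SYSFS_ROOT:-/sys}"   # hypothetical override, for testing only
    printf 'model: %s\n' "$(cat "$root/block/$dev/device/model")"
    printf 'firmware: %s\n' "$(cat "$root/block/$dev/device/rev")"
}

# Typical use:
#   disk_identity sdb
```

sysfs may truncate long model names; `smartctl -i /dev/sdb` prints the full identity.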
