#1 2023-12-04 13:54:16

Potsale
Member
Registered: 2022-11-23
Posts: 2

BTRFS going readonly with errno=-5 IO failure

A few weeks ago I started having video decoding issues on my laptop (Framework with ): videos would sometimes turn into blocky blurs of colour for a few seconds, and I assumed a video driver update was buggy. Then Firefox tabs and Discord started crashing randomly, which I put down to buggy software, since I've had issues with Discord before. Then my disk went read-only, dmesg showed an IO error, and it was clear there was a bigger issue:

[ 1950.963089] BTRFS error (device dm-0): tree first key mismatch detected, bytenr=526598144 parent_transid=672858 key expected=(2071,96,1620314) has=(2071,96,1620315)
[ 1950.963196] BTRFS error (device dm-0): tree first key mismatch detected, bytenr=526598144 parent_transid=672858 key expected=(2071,96,1620314) has=(2071,96,1620315)
[ 1950.963622] BTRFS error (device dm-0): tree first key mismatch detected, bytenr=526598144 parent_transid=672858 key expected=(2071,96,1620314) has=(2071,96,1620315)
...
[ 2063.619798] BTRFS error (device dm-0): tree first key mismatch detected, bytenr=526598144 parent_transid=672858 key expected=(2071,96,1620314) has=(2071,96,1620315)
[ 2063.619996] BTRFS error (device dm-0): tree first key mismatch detected, bytenr=526598144 parent_transid=672858 key expected=(2071,96,1620314) has=(2071,96,1620315)
[ 2063.620007] BTRFS error (device dm-0: state A): Transaction aborted (error -5)
[ 2063.620013] BTRFS: error (device dm-0: state A) in __btrfs_run_delayed_items:1160: errno=-5 IO failure
[ 2063.620016] BTRFS info (device dm-0: state EA): forced readonly
[ 2063.620018] BTRFS warning (device dm-0: state EA): Skipping commit of aborted transaction.
[ 2063.620019] BTRFS: error (device dm-0: state EA) in cleanup_transaction:2005: errno=-5 IO failure
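
As a side note on the log above: the expected and found keys differ only in their third field (1620314 versus 1620315). A minimal Python sketch, assuming nothing beyond those two numbers, shows which bits differ between them. The helper name `flipped_bits` is made up for this illustration. A difference of a single bit is consistent with a memory bit flip, although on its own it does not prove one.

```python
def flipped_bits(expected, found):
    """Return the positions (0 = least significant) of the bits that differ."""
    diff = expected ^ found
    return [i for i in range(diff.bit_length()) if (diff >> i) & 1]


# Third field of the keys in the "tree first key mismatch" lines above.
expected_offset = 1620314
found_offset = 1620315

print(flipped_bits(expected_offset, found_offset))  # prints [0]
```

Only bit 0 differs here, so the two values are one bit apart.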

I ran memtest86+ and got hundreds of errors in one pass, localized to a fairly short range of addresses. I took out one of the sticks and ran the test again, and it passed. There is filesystem damage, which as I understand it is to be expected after running BTRFS with bad RAM. A few corrupted files show up in btrfs scrub:

[ 1050.782283] BTRFS info (device dm-0): scrub: started on devid 1
[ 1055.865863] BTRFS error (device dm-0): unable to fixup (regular) error at logical 6274482176 on dev /dev/mapper/root physical 7356612608
[ 1055.865910] BTRFS warning (device dm-0): checksum error at logical 6274482176 on dev /dev/mapper/root, physical 7356612608, root 258, inode 2003029, offset 442368, length 4096, links 1 (path: username/.cache/thunderbird/x1d2aeik.default-release/cache2/entries/1315B79A321EF6E4F14F0BD5BE608524F07778E0)
[ 1067.504668] BTRFS error (device dm-0): unable to fixup (regular) error at logical 34574893056 on dev /dev/mapper/root physical 36730765312
[ 1067.505092] BTRFS warning (device dm-0): checksum error at logical 34574893056 on dev /dev/mapper/root, physical 36730765312, root 256, inode 4577526, offset 69398528, length 4096, links 1 (path: opt/zoom/cef/libcef.so)
[ 1067.674620] BTRFS error (device dm-0): unable to fixup (regular) error at logical 35006316544 on dev /dev/mapper/root physical 37162188800
[ 1067.675172] BTRFS warning (device dm-0): checksum error at logical 35006316544 on dev /dev/mapper/root, physical 37162188800, root 256, inode 4577526, offset 128602112, length 4096, links 1 (path: opt/zoom/cef/libcef.so)
[ 1068.704370] BTRFS error (device dm-0): unable to fixup (regular) error at logical 37621137408 on dev /dev/mapper/root physical 39777009664
[ 1068.705409] BTRFS warning (device dm-0): checksum error at logical 37621137408 on dev /dev/mapper/root, physical 39777009664, root 256, inode 4577855, offset 163905536, length 4096, links 1 (path: opt/zoom/zoom)
[ 1077.424947] BTRFS error (device dm-0): unable to fixup (regular) error at logical 70937477120 on dev /dev/mapper/root physical 73093349376
[ 1077.425005] BTRFS warning (device dm-0): checksum error at logical 70937477120 on dev /dev/mapper/root, physical 73093349376, root 256, inode 4826904, offset 40108032, length 4096, links 1 (path: var/lib/systemd/coredump/core.Isolated\x20Web\x20Co.1000.1bbad64063d24643b6b99e6483d8e36c.28573.1701557089000000.zst)
[ 1084.760224] BTRFS info (device dm-0): scrub: finished on devid 1 with status: 0

and there are a huge number of errors from btrfs check:

...
root 258 inode 8477613 errors 2001, no inode item, link count wrong
	unresolved ref dir 2071 index 0 namelen 40 name A3A60964B55CD43F72F64847A544633EFCE6E078 filetype 1 errors 6, no dir index, no inode ref
root 258 inode 8477614 errors 2001, no inode item, link count wrong
	unresolved ref dir 2071 index 0 namelen 40 name 28F17929B3CFF42D989E71ED5C4426412793A9E1 filetype 1 errors 6, no dir index, no inode ref
root 258 inode 8477615 errors 2001, no inode item, link count wrong
	unresolved ref dir 2071 index 0 namelen 40 name 4A43A1393979E1E442B647A38B2182A9DE85AE04 filetype 1 errors 6, no dir index, no inode ref
ERROR: errors found in fs roots
found 74129891328 bytes used, error(s) found
total csum bytes: 68063040
total tree bytes: 1019101184
total fs tree bytes: 865337344
total extent tree bytes: 62570496
btree space waste bytes: 177444345
file data blocks allocated: 128094851072
 referenced 93931008000
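
For deciding which files to restore or reinstall, the scrub warnings earlier in this post can be reduced to a list of affected paths. The following is a rough sketch that assumes the "checksum error" line format shown in that scrub log. The regular expression and the function name `damaged_files` are illustrative only and may need adjusting for other kernel versions.

```python
import re

# Matches the "checksum error" warnings that scrub writes to the kernel log.
# The pattern is an assumption based on the lines quoted in this post.
CSUM_RE = re.compile(
    r"checksum error at logical (?P<logical>\d+) .*?"
    r"root (?P<root>\d+), inode (?P<inode>\d+), offset (?P<offset>\d+), "
    r"length (?P<length>\d+), links \d+ \(path: (?P<path>.*)\)$"
)


def damaged_files(log_text):
    """Map (root id, path) to the list of damaged file offsets."""
    files = {}
    for line in log_text.splitlines():
        m = CSUM_RE.search(line.strip())
        if m:
            key = (int(m["root"]), m["path"])
            files.setdefault(key, []).append(int(m["offset"]))
    return files


# Three of the warning lines from the scrub log in this post.
SAMPLE = """\
[ 1067.505092] BTRFS warning (device dm-0): checksum error at logical 34574893056 on dev /dev/mapper/root, physical 36730765312, root 256, inode 4577526, offset 69398528, length 4096, links 1 (path: opt/zoom/cef/libcef.so)
[ 1067.675172] BTRFS warning (device dm-0): checksum error at logical 35006316544 on dev /dev/mapper/root, physical 37162188800, root 256, inode 4577526, offset 128602112, length 4096, links 1 (path: opt/zoom/cef/libcef.so)
[ 1068.705409] BTRFS warning (device dm-0): checksum error at logical 37621137408 on dev /dev/mapper/root, physical 39777009664, root 256, inode 4577855, offset 163905536, length 4096, links 1 (path: opt/zoom/zoom)
"""

for (root, path), offsets in damaged_files(SAMPLE).items():
    print(f"root {root}: {path} ({len(offsets)} bad block(s))")
```

In practice the input could be the saved output of dmesg rather than the inline sample.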

I am likely just going to reinstall the system since I assume there is no fixing this. However, the system is still sometimes going read-only with the IO error. Can the IO error be a result of the damaged filesystem on an otherwise fine system, or is there something more sinister going on?

Kernel: 6.6.3-arch1-1
CPU: Intel i5-1135G7
Storage: SK hynix Gold P31 (passed long SMART test)

#2 2023-12-07 03:27:34

ectospasm
Member
Registered: 2015-08-28
Posts: 301

Re: BTRFS going readonly with errno=-5 IO failure

If you're lucky, the bad RAM region corrupted the Btrfs filesystem and nothing more.  I hope you have backups stored somewhere else.

If you're unlucky, not only do you have bad RAM but your disk may have gone bad as well.  I'd boot off of a rescue medium (or Arch ISO) and run S.M.A.R.T. checks to see if the disk controller reports any issues with the disk.
