Corrupted btrfs live, how to recover?

o1ivier · 2023-01-20 09:37:45

Hello,

My btrfs partition suddenly went read-only with this error:

BTRFS critical (device dm-2): corrupt node: root=7 block=328882405376 slot=12, bad key order, current (18446744073709551606 128 32971198464) next (18446744073709543414 128 32987693056)

Full log: https://pastebin.com/raw/PRLprMgs

It happened while I was working: some applications crashed, and now I'm unsure how to recover from this. I'm writing this post before attempting anything, since I suspect the next step requires a reboot and I will lose the opportunity to gather logs.

Thanks for your help.

o1ivier · 2023-01-20 09:48:30

Looks like even in read-only mode I have some corruption: a simple bat on some files gives "Input/output error".

frostschutz · 2023-01-20 09:50:13

it mentions I/O errors, is there actually a problem with the block device?

maybe ask the btrfs mailing list for that one - https://btrfs.readthedocs.io/en/latest/ … ecker.html

Last edited by frostschutz (2023-01-20 09:58:39)

just4arch · 2023-01-20 10:38:20

Apart from a smartctl output, you might also want to run a memtest, because of

write time tree block corruption detected

There's a bitflip:

0b1111111111111111111111111111111111111111111111111111111111110110
0b1111111111111111111111111111111111111111111111111101111111110110
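
To locate the flipped bit without eyeballing 64 digits, a small helper can compare the two strings position by position. This is only a sketch: bit_diff is a made-up name, and it assumes a POSIX shell with awk. The two decimal keys in the log differ by exactly 8192 (2^13), which is consistent with a single flipped bit at position 13.

```shell
# bit_diff A B: print every bit position (0 = least significant) at which
# two equal-length binary strings differ. A leading "0b" is ignored.
bit_diff() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        sub(/^0b/, "", a); sub(/^0b/, "", b)
        if (length(a) != length(b)) { print "length mismatch"; exit 1 }
        n = length(a)
        for (i = 1; i <= n; i++)
            if (substr(a, i, 1) != substr(b, i, 1))
                print n - i
    }'
}

# Usage: pass the two binary strings quoted above as the two arguments.
#   bit_diff "$first" "$second"
```

A single line of output means a single flipped bit, which points at memory rather than at the disk.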

Beyond that, the BTRFS mailing list sure is the best place for fixing a broken fs.

Last edited by just4arch (2023-01-20 10:51:24)

o1ivier · 2023-01-20 11:43:11

I was able to read/write from the boot partition (ext2) on the same drive.

I rebooted now and it looks like it recovered, but I don't see logs for whatever repair was performed. I will run a memtest over the weekend.

Also smartctl doesn't show any warning or errors.

just4arch · 2023-01-20 12:40:14

o1ivier wrote:

I was able to read/write from the boot partition (ext2) on the same drive.

ext2 doesn't have checksums and won't tell you if it's eating your data.

o1ivier wrote:

I rebooted now and it looks like it recovered, but I don't see logs for whatever repair was performed. I will run a memtest over the weekend.

When you're sure your memory is ok, run "btrfs check" (without repair! until the devs say so) on the unmounted fs, to see if anything else crept in.
You can run a "btrfs scrub" on the (mounted) volume for good measure.
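
A hedged sketch of those two steps as shell functions. check_readonly and scrub_foreground are hypothetical helper names, and the device path and mount point in the usage lines are placeholders to replace with your own. Setting BTRFS=echo gives a dry run that only prints the commands instead of running them.

```shell
# Command to invoke; run with BTRFS=echo for a dry run.
BTRFS=${BTRFS:-btrfs}

# Read-only check of an UNMOUNTED filesystem. No --repair here.
check_readonly() {
    $BTRFS check --readonly "$1"
}

# Scrub a MOUNTED filesystem in the foreground (-B), then print the summary.
scrub_foreground() {
    $BTRFS scrub start -B "$1" && $BTRFS scrub status "$1"
}

# Usage (as root; both paths are placeholders):
#   check_readonly /dev/mapper/yourdevice
#   scrub_foreground /
```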

In case you haven't already, this would be a good time to make a fresh backup - and maybe diff it with an older one to find possible corruption.

pacman -Qkk | grep -v ', 0 altered files'

might reveal damage to system files, in case the memory turns out bad.
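
If the list is long, it may help to reduce it to package names only. A minimal sketch, assuming the summary lines look like "name: N total files, M altered files"; altered_packages is a made-up helper name.

```shell
# altered_packages: read `pacman -Qkk` output on stdin and print the names
# of packages whose summary line reports at least one altered file.
# Lines that are not summary lines are ignored.
altered_packages() {
    awk -F': ' '/ altered files?$/ && !/, 0 altered files$/ { print $1 }'
}

# Usage:
#   pacman -Qkk | altered_packages
```

Keep in mind that the result still includes packages whose only altered files are config files you edited yourself.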

o1ivier · 2023-01-20 21:18:09

Thanks! this is very helpful. Do you know how I can run `btrfs check` from initramfs (so avoiding it being mounted)?

I usually only back up /home, I might regret that.

The search for altered files with pacman gives a lot of results and "btrfs scrub" found 174 uncorrectable errors.

I'm starting the memtest.

just4arch · 2023-01-21 09:16:41

o1ivier wrote:

Thanks! this is very helpful. Do you know how I can run `btrfs check` from initramfs (so avoiding it being mounted)?

So that assumes you have at least / on btrfs...
I've got the btrfs executable in my initramfs

# grep btrfs /etc/mkinitcpio.conf
BINARIES=(btrfs)

so booting with "systemd.unit=emergency.target" "should" get you there (haven't had to test this in a very long time though).
Next easiest way, just boot the Arch live iso and go from there.
If you have a working (and decently fast) network, Arch Linux Netboot is another way to get into a rescue environment.
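
To double-check the initramfs part before rebooting, something like the following could work. It is a sketch only: has_btrfs_binary is a hypothetical helper, and it assumes the BINARIES array sits on one line with unquoted entries, as in the grep output above.

```shell
# has_btrfs_binary FILE: succeed if the BINARIES=(...) line in a
# mkinitcpio.conf-style file lists "btrfs" as one of its entries.
has_btrfs_binary() {
    grep -Eq '^BINARIES=\((.*[[:space:]])?btrfs([[:space:]].*)?\)' "$1"
}

# Usage:
#   has_btrfs_binary /etc/mkinitcpio.conf && echo "btrfs is listed"
# After editing the file, regenerate the images with: mkinitcpio -P
```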

o1ivier wrote:

I usually only back up /home, I might regret that.

Arch is quick and easy to install/setup/fix, you might be able to get away with the package list and your important config files.

o1ivier wrote:

The search for altered files with pacman gives a lot of results and "btrfs scrub" found 174 uncorrectable errors.

You haven't shared the outputs again, so some things to keep in mind:
Not every entry is "bad": e.g. files in /etc/ are config files that are supposed to be adapted. Altered libs and other non-config files can be critical, and you'll need to reinstall those packages.
Based on the info you provided, I'll have to assume a "single" data profile. Scrub can only fix errors if btrfs has a redundant copy to repair from, i.e. a mirror or parity setup (DUP or anything other than single/RAID0). Still, at least you know which files definitively need to be restored from backup / reinstalled.
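
Rather than assuming, the data profile can be read from "btrfs filesystem df". A small sketch; data_profile is a made-up name, and it assumes the usual "Data, <profile>: total=..., used=..." line format.

```shell
# data_profile: read `btrfs filesystem df` output on stdin and print the
# profile of the Data block group (e.g. single, DUP, RAID1).
data_profile() {
    awk -F'[ ,:]+' '$1 == "Data" { print $2 }'
}

# Usage:
#   btrfs filesystem df / | data_profile
```

If this prints "single", scrub can detect bad data blocks but has no second copy to repair them from.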

o1ivier wrote:

I'm starting the memtest.

This should be the first step!
Everything else above relies on memory being ok.
Wonky memory could lead to false positives or more damage along the way!
Make sure to leave it running for long enough - over night seems to be a common recommendation.

You don't have your memory overclocked by any chance?
If yes, do the errors go away when you turn it back a notch or two?

o1ivier · 2023-01-21 15:13:11

Hey, I came back for an update and you have the same suspicion as I do. So after 2 hours of memtest I already had a few errors, I then disabled XMP and let it run the whole night without errors this time. I'll run "btrfs check" now.

just4arch wrote:

so booting with "systemd.unit=emergency.target" "should" get you there (haven't had to test this in a very long time though).

Thanks, that's the part that I missed. I already have the btrfs binary in my initramfs.

just4arch wrote:

at least you know, which files definitively need to be restored from backup / reinstalled

I'm not sure if I understand correctly but I don't see any list of files:

UUID:             2dad72a9-7433-4f68-9793-21f6f71aeca2
Scrub started:    Sat Jan 21 16:06:32 2023
Status:           finished
Duration:         0:04:05
Total to scrub:   362.54GiB
Rate:             1.48GiB/s
Error summary:    read=210
  Corrected:      0
  Uncorrectable:  210
  Unverified:     0

just4arch · 2023-01-21 16:58:19

o1ivier wrote:

just4arch wrote:

at least you know, which files definitively need to be restored from backup / reinstalled

I'm not sure if I understand correctly but I don't see any list of files:

IIRC dmesg should have at least the inodes mentioned (journal if you rebooted meanwhile); my google-fu took me to https://superuser.com/questions/858237/ … ble-errors
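
A rough sketch for pulling file names out of the kernel log. It assumes the scrub messages end in "(path: some/file)", which is how I remember them looking, so treat the format as an assumption; scrub_error_paths is a made-up helper name.

```shell
# scrub_error_paths: read kernel log lines on stdin and print the unique
# file paths from BTRFS messages that end in "(path: ...)".
scrub_error_paths() {
    sed -n 's/.*BTRFS.*(path: \(.*\))$/\1/p' | sort -u
}

# Usage:
#   dmesg | scrub_error_paths                  # current boot
#   journalctl -k -b -1 | scrub_error_paths    # previous boot
# Messages that only name an inode can be mapped to a path with:
#   btrfs inspect-internal inode-resolve <inode> <mountpoint>
```

The printed paths are relative to the subvolume root, so they may need the mount point prepended.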
