Lockup while scrubbing btrfs RAID1; FS corrupted (no special kernel)

nstgc · 2015-02-20 16:34:07

I run a weekly script that scrubs, balances, and defrags my btrfs volumes. Today when I ran it, my system locked up. I am not using any sort of special kernel, just the standard Arch Linux kernel (3.18.6). I booted into my fallback installation with the dual purpose of scrubbing (yes, even though it may cause a panic) and checking journalctl. The last journalctl entry was 45 minutes before the lockup. So far I've successfully scrubbed my root volume (RAID10) and am now working on the RAID1 volume, which is the one the scrub was running on when the system locked up. I'm not having any issues yet; however, the fallback is running Linux 3.18.5 (I'm not very good about keeping it up to date). Note, though, that I ran this same script last week, with this same kernel, without any issues from btrfs.
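For reference, a weekly job like the one described above might look roughly like the sketch below. This is not the actual script: the function names, the DRY_RUN switch, and the -dusage=50 balance filter are assumptions chosen for illustration. The one deliberate design choice is that each step only runs if the previous one exited successfully, so a failed scrub does not go on to balance and defragment a volume that may already be unhealthy.

```shell
#!/bin/sh
# Hypothetical sketch of a weekly btrfs maintenance job: scrub, then balance,
# then defragment. Run as root. Set DRY_RUN=1 to print the commands instead
# of executing them.

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

maintain() {
    mnt=$1
    # -B keeps the scrub in the foreground so the next step waits for it;
    # -d prints per-device statistics. Stop here if it exits non-zero.
    run btrfs scrub start -Bd "$mnt" || return 1
    # Only rewrite data chunks that are at most 50% full (assumed threshold).
    run btrfs balance start -dusage=50 "$mnt" || return 1
    # Recursively defragment files under the mount point.
    run btrfs filesystem defragment -r "$mnt" || return 1
}
```

Usage: `DRY_RUN=1 maintain /btrfs/aroot` prints the three commands that would run; drop `DRY_RUN=1` to actually run them.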

[edit] All scrubs came back okay; however, when I ran the filesystem checker, it reported an error:

$ sudo btrfs check /dev/sdd4
Checking filesystem on /dev/sdd4
UUID: b396dd50-e3f1-4b74-8cac-7d8d79a75386
checking extents
checking free space cache
checking fs roots
root 258 inode 73868 errors 400, nbytes wrong
found 18772846807 bytes used err is 1
total csum bytes: 33812424
total tree bytes: 1312423936
total fs tree bytes: 1185464320
total extent tree bytes: 75694080
btree space waste bytes: 241507236
file data blocks allocated: 254553923584
 referenced 78074642432
Btrfs v3.18.2

[edit2] Looking through https://bugzilla.kernel.org/show_bug.cgi?id=68411 led me to try the following:

 $ sudo btrfs inspect-internal inode-resolve -v 73868 /btrfs/aroot
ioctl ret=-1, error: No such file or directory

and

$ sudo find arch/ -xdev -inum 73868

The latter returned nothing.
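The two lookups above can be wrapped into one helper. One thing worth keeping in mind: btrfs inode numbers are only unique within a subvolume, and "root 258" in the check output is a subvolume ID. Each subvolume also reports its own device number, so `find -xdev` stops at subvolume boundaries. One possible reason both lookups came up empty is that the search started in a different subvolume than 258; `btrfs inspect-internal subvolid-resolve 258 /btrfs/aroot` should print that subvolume's path, if your btrfs-progs includes it. The helper name `find_inode` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the path(s) of inode INUM under directory DIR.
# DIR should be the mounted subvolume that btrfs check called "root N",
# because btrfs inode numbers are only unique within one subvolume.
find_inode() {
    inum=$1
    dir=$2
    # Ask btrfs first (needs root and a btrfs mount); hide its error output.
    if btrfs inspect-internal inode-resolve "$inum" "$dir" 2>/dev/null; then
        return 0
    fi
    # Fall back to walking the tree without leaving this device/subvolume.
    hits=$(find "$dir" -xdev -inum "$inum" 2>/dev/null)
    [ -n "$hits" ] || return 1
    printf '%s\n' "$hits"
}
```

Usage (as root): `find_inode 73868 /btrfs/aroot`. It returns non-zero when neither lookup finds anything.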

[edit3] I then tried to repair, but that didn't seem to work:

$ sudo btrfs check --repair /dev/sdd4
enabling repair mode
Checking filesystem on /dev/sdd4
UUID: b396dd50-e3f1-4b74-8cac-7d8d79a75386
checking extents
Fixed 0 roots.
checking free space cache
cache and super generation don't match, space cache will be invalidated
checking fs roots
root 258 inode 73868 errors 400, nbytes wrong
checking csums
checking root refs
found 22522150802 bytes used err is 0
total csum bytes: 33812424
total tree bytes: 1413529600
total fs tree bytes: 1286193152
total extent tree bytes: 76070912
btree space waste bytes: 257807313
file data blocks allocated: 255889588224
 referenced 79780196352
Btrfs v3.18.2

[edit4] I then removed all subvolumes that were active at the time of the crash. This helped the last time I had FS corruption. It did fix that one error, but it produced another:

$ sudo btrfs check /dev/sdd4                          
Checking filesystem on /dev/sdd4
UUID: b396dd50-e3f1-4b74-8cac-7d8d79a75386
checking extents
Errors found in extent allocation tree or chunk allocation
checking free space cache
checking fs roots
checking csums
checking root refs
found 31155927271 bytes used err is 0
total csum bytes: 32777608
total tree bytes: 1265008640
total fs tree bytes: 1140654080
total extent tree bytes: 74055680
btree space waste bytes: 237550448
file data blocks allocated: 233499582464
 referenced 72320802816
Btrfs v3.18.2
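The output above ends with "err is 0" even though it also prints "Errors found in extent allocation tree or chunk allocation", so the final err value alone is not a reliable signal. A small filter over saved `btrfs check` output can flag both cases. This is a sketch: the function name `check_summary` and the text patterns it matches are assumptions based only on the output pasted in this thread.

```shell
#!/bin/sh
# Hypothetical filter: read `btrfs check` output on stdin and decide whether
# it looks clean. It flags a non-zero "err is N" value and also any line that
# mentions errors, since a run can report "Errors found ..." with "err is 0".
check_summary() {
    awk '
        /err is [0-9]+/ { if ($NF != 0) bad = 1 }
        /[Ee]rrors? / && !/err is/ { print "problem: " $0; bad = 1 }
        END {
            if (bad) { print "NOT CLEAN"; exit 1 }
            print "clean"
        }
    '
}
```

Usage: `btrfs check /dev/sdd4 2>&1 | check_summary` (the `2>&1` is there in case some of the messages go to stderr).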

I then tried the repair again.

$ sudo btrfs check --repair /dev/sdd4
enabling repair mode
Checking filesystem on /dev/sdd4
UUID: b396dd50-e3f1-4b74-8cac-7d8d79a75386
checking extents
Errors found in extent allocation tree or chunk allocation
Fixed 0 roots.
checking free space cache
cache and super generation don't match, space cache will be invalidated
checking fs roots
checking csums
checking root refs
found 31155927271 bytes used err is 0
total csum bytes: 32777608
total tree bytes: 1265008640
total fs tree bytes: 1140654080
total extent tree bytes: 74055680
btree space waste bytes: 237550448
file data blocks allocated: 233499582464
 referenced 72320802816
Btrfs v3.18.2

[edit5]

Okay, so I then scrubbed that volume (the whole thing, not just sdd4), and the FS looks clean:

$ sudo btrfs check /dev/sdd4
Checking filesystem on /dev/sdd4
UUID: b396dd50-e3f1-4b74-8cac-7d8d79a75386
checking extents
checking free space cache
checking fs roots
checking csums
checking root refs
found 30400285779 bytes used err is 0
total csum bytes: 31253824
total tree bytes: 1255374848
total fs tree bytes: 1134903296
total extent tree bytes: 71335936
btree space waste bytes: 237802283
file data blocks allocated: 232246550528
 referenced 70958514176
Btrfs v3.18.2

[edit7] All of my volumes seem to be okay now; however, I haven't tried to actually boot the system. If anyone can advise any further tests I could run or logs to read, please let me know. I'd rather not attempt to boot this until I'm sure I won't be doing additional harm. [/edit7]

[edit8] I rsynced from backups and then things took a turn for the worse:

$ sudo btrfs scrub start -Bd aroot
scrub device /dev/sdd4 (id 1) done
        scrub started at Fri Feb 20 20:43:49 2015 and finished after 141 seconds
        total bytes scrubbed: 16.69GiB with 0 errors
scrub device /dev/sdf4 (id 2) done
        scrub started at Fri Feb 20 20:43:49 2015 and finished after 149 seconds
        total bytes scrubbed: 17.36GiB with 0 errors
scrub device /dev/sdb2 (id 3) done
        scrub started at Fri Feb 20 20:43:49 2015 and finished after 145 seconds
        total bytes scrubbed: 16.69GiB with 0 errors
scrub device /dev/sde4 (id 5) done
        scrub started at Fri Feb 20 20:43:49 2015 and finished after 138 seconds
        total bytes scrubbed: 17.36GiB with 1 errors
        error details: read=1
        corrected errors: 1, uncorrectable errors: 0, unverified errors: 31
WARNING: errors detected during scrubbing, corrected.
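With `-d`, scrub prints one summary per device, and in the run above only /dev/sde4 (id 5) reported an error. When there are many devices, a filter like the one below makes the device to keep an eye on stand out; the function name `scrub_errors` is hypothetical and it assumes the exact line layout shown in this output.

```shell
#!/bin/sh
# Hypothetical filter: read `btrfs scrub start -Bd` output on stdin and print
# "<device> <errors>" for every device whose summary reports errors.
scrub_errors() {
    awk '
        /^scrub device / { dev = $3 }          # e.g. "scrub device /dev/sde4 (id 5) done"
        /total bytes scrubbed:/ {
            n = $(NF - 1)                      # the N in "... with N errors"
            if (n != 0) print dev, n
        }
    '
}
```

Usage: `btrfs scrub start -Bd /btrfs/aroot 2>&1 | scrub_errors`. For the run above it would print `/dev/sde4 1`.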

[edit9] It's been several weeks and nothing seems to be broken. This issue isn't solved since I don't know why this happened, nor am I certain that everything is okay, but for the time being it's under control.

Last edited by nstgc (2015-03-08 16:32:26)

Arch Linux