#1 2017-04-26 17:10:57

nstgc
Member
Registered: 2014-03-17
Posts: 393

No space left on device after sending a btrfs subvol

tl;dr The first paragraph pretty much boils down to "I used btrfs send, things went well last night, and I did it again this morning without any errors". Paragraph two: "something didn't work right THIS TIME, even though it worked before and nothing should have changed since then, and I figured out the cause was that the volume was full". Paragraph three: "all indicators show that the volume is indeed full, but I can't figure out where the extra data is". There is quite a bit of exposition, which is probably useless.

A few days ago I decided I wouldn't wait out the odd behavior I've observed with btrfs send/receive. I already have a thread for that "issue", which as far as I can tell is completely unrelated (https://bbs.archlinux.org/viewtopic.php?id=222798). So I made hashes for some subvols and sent one off to an external HDD. Went great, no issues. I turned my computer off for the night, and the next morning I did the same with another subvol. Went great, no errors. I power cycled my computer to make sure everything was synced and the RAM was clear before checking the hashes, something I didn't do the night before (I don't sleep well if my computer is running, even though it is in another room). I'm sure there is a better way to do this, but I know power cycling works (unmounting and powering down the HDD should be enough).
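
For readers who want to reproduce the hash-then-verify step, here is a minimal sketch. It is not the exact rhash invocation used above (that isn't shown in this post); sha256sum from coreutils is swapped in, and the function names are made up for illustration. The manifest file should live outside the tree being hashed.

```shell
#!/bin/bash
# Sketch only: sha256sum stands in for rhash; function names are hypothetical.

make_manifest() {
    # $1 = directory to hash, $2 = manifest file to write (outside $1)
    local dir="$1" manifest="$2"
    # Hash every regular file, with paths relative to the directory root
    ( cd "$dir" && find . -type f -print0 | sort -z | xargs -0 -r sha256sum ) > "$manifest"
}

verify_manifest() {
    # $1 = directory to check (e.g. the received copy), $2 = manifest file
    local dir="$1" manifest="$2"
    # --quiet prints only mismatches; exit status is non-zero on any failure
    ( cd "$dir" && sha256sum --quiet -c - ) < "$manifest"
}

# Hypothetical usage:
#   make_manifest   /path/to/source/snapshot /tmp/snapshot.sha256
#   verify_manifest /path/to/received/copy   /tmp/snapshot.sha256
```

Because the paths in the manifest are relative, the same file can be checked against both the source snapshot and the received copy.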

In any case, I start up tmux, run rhash, switch to TTY2 and startx. I'm given some error message about being unable to lock something or other and Xauthority this or that. "Huh, that's odd," I think, and I assume it has to do with tmux, or maybe it's just some odd error. The hash checks out, so I exit tmux and try startx in TTY1, with the same result. I log out of every TTY and then log back in at TTY1 before trying again. When this fails I power cycle my computer again. The result is the same. I run "vim .xinitrc" to check if there is a problem with that, and find that it's the same as before. Now, I'm not going to post the contents of the X logs or .xinitrc because this is not an X11 issue. It was at this point that I was tipped off that there was some issue with writing files. If I were smarter I'd probably have figured that out from the X errors, but I'm not. vim informed me that it couldn't create some file when I exited with :q. I then checked the permissions to make sure that I had them and then tried "touch can-i-touch-this", which gave me an error explicitly telling me that there is no space left on device. I checked "dmesg -l err" and all I got were the normal reminders that I'm using non-ECC RAM. I then ran "dmesg |grep BTRFS", which turned up the normal stuff:

$ dmesg|grep BTRFS
[    2.350183] BTRFS: device label jabod devid 4 transid 40513 /dev/sdb4
[    2.391250] BTRFS: device label aroot devid 4 transid 491850 /dev/sdb1
[    2.437215] BTRFS: device label raid1 devid 5 transid 109239 /dev/sdb3
[    2.530354] BTRFS: device label jabod devid 5 transid 40513 /dev/sde4
[    2.538486] BTRFS: device label aroot devid 1 transid 491850 /dev/sde1
[    2.599525] BTRFS: device label raid1 devid 1 transid 109239 /dev/sde3
[    2.656890] BTRFS: device label jabod devid 1 transid 40513 /dev/sdd3
[    2.679776] BTRFS: device label raid1 devid 3 transid 109239 /dev/sdd1
[    2.969960] BTRFS: device fsid c12367ba-8f18-479d-aed1-ad3ec90a1f7b devid 1 transid 7508 /dev/sda2
[    3.835786] BTRFS: device label raid0 devid 9 transid 156268 /dev/bcache16
[    3.861430] BTRFS: device label raid0 devid 10 transid 156268 /dev/bcache0
[    4.137247] BTRFS: device label aroot devid 6 transid 491850 /dev/sdf1
[    4.171357] BTRFS: device label jabod devid 6 transid 40513 /dev/sdf4
[    4.214275] BTRFS info (device sdf1): disk space caching is enabled
[    4.214277] BTRFS info (device sdf1): has skinny extents
[    4.294558] BTRFS: device label raid1 devid 2 transid 109239 /dev/sdf3
[    4.296716] BTRFS: device label raid0 devid 8 transid 156268 /dev/bcache32
[    6.604313] BTRFS info (device sdf1): use lzo compression
[    6.604316] BTRFS info (device sdf1): disk space caching is enabled
[    8.801559] BTRFS info (device sdf3): disk space caching is enabled
[    8.801562] BTRFS info (device sdf3): has skinny extents
[    8.879064] BTRFS info (device sdf4): use zlib compression
[    8.879066] BTRFS info (device sdf4): disk space caching is enabled
[    8.889455] BTRFS info (device bcache32): disk space caching is enabled
[    8.962225] BTRFS info (device bcache32): detected SSD devices, enabling SSD mode

I then ran "dmesg -e|more" and read through the whole thing. Nothing there.

Next, check space usage.

$ sudo btrfs fi sh /btrfs/raid1
Label: 'raid1'  uuid: 99fd7889-de7a-4b30-9745-8ccb2b1ee75d
        Total devices 4 FS bytes used 2.25TiB
        devid    1 size 1.32TiB used 1.32TiB path /dev/sde3
        devid    2 size 1.32TiB used 1.32TiB path /dev/sdf3
        devid    3 size 650.00GiB used 650.00GiB path /dev/sdd1
        devid    5 size 1.32TiB used 1.32TiB path /dev/sdb3
$ sudo btrfs fi df /btrfs/raid1
Data, RAID1: total=2.29TiB, used=2.25TiB
System, RAID1: total=32.00MiB, used=384.00KiB
Metadata, RAID1: total=4.00GiB, used=3.50GiB
GlobalReserve, single: total=512.00MiB, used=16.00KiB
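
As far as I understand the "btrfs fi sh" output, the per-device "used" figure is space already allocated to chunks, not file data, so a device whose "used" equals its "size" has no raw space left for new chunks. The sketch below totals that up from the devid lines; the function name is made up, and since the tool prints rounded values the result is only approximate.

```shell
#!/bin/bash
# Sketch only: sums size and used from the "devid" lines of
# `btrfs filesystem show` output read on stdin, and prints the
# unallocated remainder in GiB. unallocated_gib is a hypothetical name.

unallocated_gib() {
    awk '
        function to_gib(s,    n, unit) {
            n = s + 0                    # leading number, e.g. 1.32
            unit = s
            sub(/^[0-9.]+/, "", unit)    # what remains is the unit suffix
            if (unit == "TiB") return n * 1024
            if (unit == "GiB") return n
            if (unit == "MiB") return n / 1024
            if (unit == "KiB") return n / (1024 * 1024)
            return n / (1024 * 1024 * 1024)   # anything else is treated as bytes
        }
        $1 == "devid" && $3 == "size" && $5 == "used" {
            size += to_gib($4)
            used += to_gib($6)
        }
        END { printf "%.2f\n", size - used }
    '
}

# Hypothetical usage:
#   sudo btrfs fi sh /btrfs/raid1 | unallocated_gib
```

Fed the four devid lines above, every device contributes the same amount to both sums, so the remainder comes out as zero: all raw space is already allocated.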

Yup. It's full. My first thought was that I hadn't actually mounted the external drive. However, had that been the case, aroot, which contains /mnt, would be full instead, not raid1. That's part of why I have the two segregated. The script I used that contains the one-liner is:

#!/bin/bash

# btrfs send -p /btrfs/$1/snapshots/$3 /btrfs/$1/snapshots/$2 |btrfs receive /btrfs/extBu/$1
btrfs send -v "/btrfs/$1/snapshots/ro/$2/$3.snap" | btrfs receive "/mnt/ExtBU/$1/$2/"

which is what I used yesterday. The only difference is that I turned the verbosity up a tad with the "-v" switch. The point being that I didn't send the subvol back onto raid1. Even if I had, I would have expected an error during the send or missing files on the receiving end. I then used "btrfs fi du -s /btrfs/raid1/*" to check that each subvol has the correct amount of data in it, and that there wasn't a spike in exclusive data. I haven't been able to confirm whether the snapshots from April 16th and earlier are okay, which is weird, but I can't imagine that they would have exploded while the other, more recent ones were untouched.

I ran an rsync dry-run from raid1 to another backup drive that is updated daily via rsync. "-i" was specified and the output was sent to a file which I will comb through whenever it's done. It's already taken several times longer than it normally does, and it's not even writing anything. I checked the file the output is being sent to and so far it's empty, which is normal since I have delays set (runs slower, but easier to read). I don't think this will show anything, and if it does, it most likely just means I delete stuff. So, I'm going to go ahead and post this now and do something constructive with my life. If in a few hours rsync --dry-run hasn't finished, I'm going to interrupt it and check each subvol individually.
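
On the worry about the external drive not being mounted: a guard like the following at the top of the send script would rule that out before anything is received. This is a sketch, not part of the script above; require_mount is a made-up name, and it assumes /mnt/ExtBU itself is the mount point, which the post doesn't state.

```shell
#!/bin/bash
# Sketch only: refuse to continue unless the target directory is a mount point.
# require_mount is a hypothetical helper; mountpoint(1) comes from util-linux.

require_mount() {
    # $1 = directory expected to be a mount point
    if ! mountpoint -q "$1"; then
        echo "refusing to continue: $1 is not a mount point" >&2
        return 1
    fi
}

# Hypothetical usage before the btrfs send | btrfs receive line:
#   require_mount /mnt/ExtBU || exit 1
```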

Please, if you have any idea as to what is going on, or know of this happening to someone else (even if there wasn't a posted solution), speak up. I've searched Google, and there haven't been any recent updates (I didn't run an upgrade yesterday; I'm holding my system static while I make this backup set).

$ uname -rv
4.10.11-1-ARCH #1 SMP PREEMPT Tue Apr 18 08:39:42 CEST 2017
$ pacman -Q|grep btrfs
btrfs-progs 4.10.2-1

edit: Forgot to mention that I did try to balance the data with incrementing -musage and -dusage filters, starting at 1%. That either balanced nothing, or returned an error about disk space. Also, rsync still hasn't finished, but the lights on my desktop are showing HDD activity. I'm connected via ssh from my laptop.
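
For anyone wanting to repeat the stepped balance, it can be sketched roughly like this. The function name is made up, and the percentages after 1 in the usage line are placeholders, not the values I used. With DRY_RUN left at its default of 1 the commands are only printed, not run.

```shell
#!/bin/bash
# Sketch only: step the balance usage filters upward, stopping at the first failure.
# balance_stepwise is a hypothetical helper name.

balance_stepwise() {
    # $1 = mounted btrfs path; remaining args = usage percentages, tried in order
    local path="$1" pct
    shift
    for pct in "$@"; do
        if [ "${DRY_RUN:-1}" = 1 ]; then
            # Dry run: show what would be executed
            echo "btrfs balance start -dusage=$pct -musage=$pct $path"
        else
            btrfs balance start -dusage="$pct" -musage="$pct" "$path" || return 1
        fi
    done
}

# Hypothetical usage (prints the commands without touching the filesystem):
#   balance_stepwise /btrfs/raid1 1 5 10 25
# To actually run them as root:
#   DRY_RUN=0 balance_stepwise /btrfs/raid1 1 5 10 25
```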

edit2: I also ran "btrfs sub find-new $SUBVOL 0 |tail" to see what might have been added, but nothing's showing up as abnormal. Also, rsync is still running, but it's come up with a few results, none of which are out of the ordinary. This particular volume is set up to contain data that doesn't get written to too frequently. That is to say, no cache files or system stuff, just stuff I'm working on (documents, saved games, etc.), which is easy to sift through.

edit3: Rsync finally finished. Unfortunately, it shows exactly what I had expected -- a whole lot of nothing. No mass file creations or alterations.

edit4: I did some housekeeping and freed up some space. I was then able to run a balance. Before that I ran a scrub, which came back error free. Still no clue where that space went or why. To be clear, there doesn't seem to be an explanation as to where the free space went, nor do there seem to be any issues with the volume aside from the missing space. I haven't, however, run an fsck, but I doubt that would reveal anything.

Last edited by nstgc (2017-04-27 17:09:37)
