tl;dr – btrfs defrag has stolen my space. Defrag with -clzo makes things even worse. Is there any way to get it back?
I think $(btrfs filesystem defragment -r -v /home/) is the culprit here. The other day I noticed conky showing 149GiB free space (1.64TiB used) on my /home partition, and luckily found a screenshot from just a few days earlier with conky showing 610GiB free (1.19TiB used). There's no chance I downloaded or generated 461GiB in one week, but I figured that the defrag I recently ran – the first since formatting the partition – could have fallen in that time frame. A $(btrfs fi df /home) last night showed:
Data, single: total=1.65TiB, used=1.65TiB
System, DUP: total=8.00MiB, used=184.00KiB
System, single: total=4.00MiB, used=0.00
Metadata, DUP: total=4.00GiB, used=2.90GiB
Metadata, single: total=8.00MiB, used=0.00
Those Data, single values made me think that compression had somehow not been enabled for the partition, despite it being mounted from the start with "compress=lzo" in /etc/fstab:
UUID=5ae7b53a-7438-46ef-baef-274aad3c5cd6 /home btrfs defaults,compress=lzo 0 0
Straight after saving the output of $(btrfs fi df /home) to a file, I set $(sudo btrfs fi defragment -r -v -clzo /home/) going and let it run until a few hours ago, when I killed it with my free space having dwindled to 59.2GiB. $(btrfs fi df /home) now says:
Data, single: total=1.78TiB, used=1.72TiB
System, DUP: total=8.00MiB, used=200.00KiB
System, single: total=4.00MiB, used=0.00
Metadata, DUP: total=4.00GiB, used=2.95GiB
Metadata, single: total=8.00MiB, used=0.00
So the Data, single values are now different, presumably due to compression, but obviously both are higher than they were before starting the defrag. According to the guys in this thread, forcing compression can yield greater file sizes, but can that explain the huge leap in used space across both defrag runs, and is there a way to get my space back?
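(For anyone puzzling over the same output later: in $(btrfs fi df) the "total" figure is space allocated to chunks, while "used" is data actually written into them; the gap between the two is slack that a balance can reclaim. A minimal sketch of reading that gap, using the Data line quoted above as a hard-coded sample and assuming GNU awk is available:)

```shell
# Slack = allocated ("total") minus occupied ("used") for the Data
# profile.  The sample line is the one from this post; units are TiB.
line='Data, single: total=1.78TiB, used=1.72TiB'
slack=$(echo "$line" | awk -F'[=,]' '{ gsub(/TiB/, ""); printf "%.2f", $3 - $5 }')
echo "Data chunk slack: ${slack}TiB"
```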
Am I missing something here, guys?
@archun: Intel® Core™ i5-4210M • [GPU] Intel® HD Graphics 4600 • [Kernel] linux-ck-haswell
Handmade.Network • GitLab
The Life and Times of Miblo del Carpio
You can try balancing the filesystem to recover the 'lost' space.
Are you using snapshots? The CoW copies are un-shared when you defrag, so filesystem usage increases.
Defragmenting a file or a subvolume that has a copy-on-write copy breaks the link between the file and its copy. For example, if you defragment a subvolume that has a snapshot, the disk usage by the subvolume and its snapshot will increase because the snapshot is no longer a copy-on-write image of the subvolume.
No, no snapshots, Spider.007. It's not a subvolume either, just one of the two Btrfs partitions on the disk. I'm pretty sure I haven't fallen foul of that re-duping, though. The btrfs-filesystem manpage warned me:
Warning
defragmenting with kernels up to 2.6.37 will unlink COW-ed copies of data, don’t use it if you use snapshots, have de-duplicated your data or made copies with cp --reflink.
…but since I'm way past 2.6.37 I thought this wouldn't be an issue and $(stat) seems to confirm that:
┭─┤19:40:53│matt@archon64:~
┵───╼ stat .zshrc git/miblo/dotfiles/zshrc
File: ‘.zshrc’
Size: 6072 Blocks: 16 IO Block: 4096 regular file
Device: 25h/37d Inode: 67399 Links: 2
Access: (0644/-rw-r--r--) Uid: ( 500/ matt) Gid: ( 100/ users)
Access: 2014-06-13 03:19:17.197839453 +0100
Modify: 2014-05-24 18:59:42.770537918 +0100
Change: 2014-05-24 18:59:42.843540501 +0100
Birth: -
File: ‘git/miblo/dotfiles/zshrc’
Size: 6072 Blocks: 16 IO Block: 4096 regular file
Device: 25h/37d Inode: 67399 Links: 2
Access: (0644/-rw-r--r--) Uid: ( 500/ matt) Gid: ( 100/ users)
Access: 2014-06-13 03:19:17.197839453 +0100
Modify: 2014-05-24 18:59:42.770537918 +0100
Change: 2014-05-24 18:59:42.843540501 +0100
Birth: -
The Device / Inode is the important thing here, isn't it?
I'll try balancing it, nagaseiori. That was my next guess and I had started a balance process last night but decided to cancel it when I realised it wasn't a quick thing. I'll start another one at a reasonable time this evening so that it can hopefully complete in good time for me to do stuff tomorrow. (Single-core CPU, guys.) If it frees some of the space, do you think it would be worth risking a third defrag without -clzo or possibly even after removing "compress=lzo" from /etc/fstab?
P.S. Pretty off-topic (sorry, OP!), but I wonder if there's a tool which compares, say, the sha1sum of directories / files in an fs and automatically points all duplicates to the same inode.
edit: Sorry, of course there's the awesomely named FSlint. Learning that may be the next task.
edit2: …or fdupes.
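(The hash-and-compare core of what those tools do can be sketched in a few lines of shell. This demo runs on throwaway files and assumes GNU coreutils; real dedupers like fdupes also compare sizes first and verify byte-for-byte before linking anything.)

```shell
# Demo in a throwaway directory (assumes GNU sha1sum and uniq).
demo=$(mktemp -d)
printf 'same bytes'  > "$demo/a"
printf 'same bytes'  > "$demo/b"
printf 'other bytes' > "$demo/c"

# Hash every regular file, sort by checksum, and keep only groups that
# share one (the first 40 characters of each line are the SHA-1).
# Each printed group is a set of hardlink candidates.
dupes=$(find "$demo" -type f -print0 \
    | xargs -0 sha1sum \
    | sort \
    | uniq -w40 --all-repeated=separate)
echo "$dupes"
rm -r "$demo"
```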
Last edited by Miblo (2014-06-13 20:01:03)
Okay, the $(sudo btrfs balance start -v /home) process took 1½ days to turn
Data, single: total=1.76TiB, used=1.72TiB
System, DUP: total=8.00MiB, used=200.00KiB
System, single: total=4.00MiB, used=0.00
Metadata, DUP: total=4.00GiB, used=2.93GiB
Metadata, single: total=8.00MiB, used=0.00
into
Data, single: total=1.72TiB, used=1.72TiB
System, DUP: total=32.00MiB, used=196.00KiB
Metadata, DUP: total=3.50GiB, used=2.59GiB
supposedly reducing my total usage by ~40.96GiB although conky only shows a GiB or two more free space than before.
The next thing I'll try is mounting it with "clear_cache" as suggested by ooo here. eduardo.eae says that it didn't help him and he also unfortunately didn't say if / how he solved his problem (or what could have triggered it), but we seem to be in the same situation now.
Hopefully this is at least slightly useful to somebody in that I've confirmed that balancing the fs didn't solve this problem.
Just wondering, though: since $(stat) says that those two hardlinked files of mine point to the same inode, am I right to assume that they (and all other hardlinked files) survived the defrag with their de-duplication intact? I mean, that's all that de-duplication is, isn't it – pointing two files to the same inode?
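(For hardlinks, yes – the inode number tells the whole story. A quick demonstration in a throwaway directory, GNU stat assumed: a hardlink shares the inode, while a plain copy gets a new one. Note a reflink would *also* get a new inode, so $(stat) alone can't distinguish a reflink from an ordinary copy.)

```shell
# Hardlink vs. copy, as seen through the inode number (GNU stat).
d=$(mktemp -d)
printf 'data' > "$d/orig"
ln "$d/orig" "$d/hardlink"   # same inode, Links: 2
cp "$d/orig" "$d/copy"       # new inode (a reflinked copy would too)
out=$(stat -c '%n inode=%i links=%h' "$d/orig" "$d/hardlink" "$d/copy")
echo "$out"
rm -r "$d"
```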
Did you use the -dusage flag with balance, as suggested by the btrfs wiki? https://btrfs.wiki.kernel.org/index.php … 3E16GiB.29
Ah! No, I didn't see that tip, ooo. Nor have I had a chance to try mounting with "clear_cache" due to a treasure hunt on watmm, for which I need my trusty Arch box. I reckon now's as good a time as any to start balancing again, though.
$(sudo btrfs balance start -v /home -dusage=5) here we go…
edit:
┭─┤1:53:40│matt@archon64:~
┵───╼ sudo btrfs balance start -v /home -dusage=5
Dumping filters: flags 0x1, state 0x0, force is off
DATA (flags 0x2): balancing, usage=5
Done, had to relocate 0 out of 1782 chunks
┭─┤1:53:47│matt@archon64:~
┵───╼
Looks like I had my chance. "clear_cache" it is, I suppose.
Last edited by Miblo (2014-06-18 00:57:28)
Miblo wrote:
[…] $(sudo btrfs balance start -v /home -dusage=5) here we go…
There is another wiki entry for btrfs balance:
https://btrfs.wiki.kernel.org/index.php/Balance_Filters contains:
If your data chunks are misbalanced, look at how much space is really used in percentage and feed that to -dusage, asking btrfs to rebalance all chunks that are not at that threshold (bigger number means more work):
btrfs balance start -dusage=55 /mnt/btrfs
Edit: Your btrfs claims that you have 1.72 TiB of used data. Can you check if you can get that same number with du or ncdu?
Last edited by progandy (2014-06-18 16:08:48)
| alias CUTF='LANG=en_XX.UTF-8@POSIX ' |
Miblo wrote:
I'm pretty sure I haven't fallen foul of that re-duping, though. […] since I'm way past 2.6.37 I thought this wouldn't be an issue and $(stat) seems to confirm that: […]
The Device / Inode is the important thing here, isn't it?
I think what we see here is just a hardlink (two paths for the same inode). That's different from reflinks, which utilize COW and allow changing the two files independently.
At the moment defrag causes duplication of COWed data. COW aware defrag was disabled because it caused issues: http://comments.gmane.org/gmane.comp.fi … trfs/32144. So if you used "cp --reflink" or snapshots this would explain the loss of free space.
progandy wrote:
[…]
There is another wiki entry for btrfs balance:
https://btrfs.wiki.kernel.org/index.php/Balance_Filters contains:
If your data chunks are misbalanced, look at how much space is really used in percentage and feed that to -dusage, asking btrfs to rebalance all chunks that are not at that threshold (bigger number means more work):
btrfs balance start -dusage=55 /mnt/btrfs
Edit: Your btrfs claims that you have 1.72 TiB of used data. Can you check if you can get that same number with du or ncdu?
Thanks for this info, progandy. That treasure hunt and the Steam Summer Sale have (legitimately) increased my disk usage to:
# btrfs fi df /home
Data, single: total=1.75TiB, used=1.75TiB
System, DUP: total=32.00MiB, used=200.00KiB
Metadata, DUP: total=3.50GiB, used=2.71GiB
# du -sh /home
1.6T /home
…with $(du -sh /home) reckoning ~89% of the 1.79TiB total partition size.
# btrfs balance start -v -dusage=89 /home
Dumping filters: flags 0x1, state 0x0, force is off
DATA (flags 0x2): balancing, usage=89
Done, had to relocate 1 out of 1802 chunks
# btrfs fi df /home
Data, single: total=1.75TiB, used=1.75TiB
System, DUP: total=32.00MiB, used=200.00KiB
Metadata, DUP: total=3.50GiB, used=2.71GiB
If I pass 91 to the balance command – du's total ÷ btrfs fi df's total used – then one more chunk is relocated but no significant amount of space is freed.
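(For anyone finding this later: the usual pattern is to step -dusage upward, so the cheap passes over nearly-empty chunks run first and you can stop as soon as space comes back. The commands are echoed here rather than executed; on a real system drop the echo and run as root. The path and percentages are illustrative.)

```shell
# Build the command list; each pass only rewrites chunks whose usage
# is at or below the given percentage, so low values finish quickly.
cmds=$(for pct in 5 25 50 75; do
    echo "btrfs balance start -v -dusage=$pct /home"
done)
echo "$cmds"
```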
Miblo wrote:[…]
The Device / Inode is the important thing here, isn't it?
I think what we see here is just a hardlink (two paths for the same inode). That's different from reflinks, which utilize COW and allow changing the two files independently.
At the moment defrag causes duplication of COWed data. COW aware defrag was disabled because it caused issues: http://comments.gmane.org/gmane.comp.fi … trfs/32144. So if you used "cp --reflink" or snapshots this would explain the loss of free space.
Aaah, okay! Cheers for clarifying this, Bevan. So now I understand that I probably did fall foul of file duplication, although I didn't think I'd used $(cp --reflink=auto) on all that much stuff. Perhaps I did. What I don't understand, though, is why the total usage increased after both defrags. Surely everything would have been duplicated after the first defrag, so the second defrag (with -clzo, mind, if that forced compression) just needed to shuffle the stuff around.
I tried mounting it with "clear_cache" in /etc/fstab but that made no difference that I could see.
For full clarity, this is what I'm working on:
# lsinitcpio -a /boot/initramfs-linux-ck.img
==> Image: /boot/initramfs-linux-ck.img
==> Created with mkinitcpio 17
==> Kernel: 3.14.8-1-ck
==> Size: 6.85 MiB
==> Compressed with: gzip
-> Uncompressed size: 18.43 MiB (.371 ratio)
-> Estimated extraction time: 0.395s
==> Included modules:
ata_generic firewire-sbp2 ohci-pci snd-pcm
atkbd floppy pata_acpi snd-pcm-oss [explicit]
cdrom gameport pata_amd snd-rawmidi
crc16 hid radeon [explicit] snd-seq-device
crc-itu-t hid-generic sata_nv snd-timer
crc-t10dif hwmon scsi_mod soundcore
crct10dif_common i2c-algo-bit sd_mod sr_mod
drm i2c-core serio ttm
drm_kms_helper i8042 snd usb-common
ehci-hcd jbd2 snd-cmipci [explicit] usbcore
ehci-pci libata snd-hwdep usbhid
ext4 libps2 snd-mixer-oss usb-storage
firewire-core mbcache snd-mpu401-uart
firewire-ohci ohci-hcd snd-opl3-lib
==> Included binaries:
fsck kmod mount systemd-tmpfiles
fsck.ext4 modprobe systemctl udevadm
==> Hook run order:
consolefont
keymap
/boot/grub/grub.cfg
[…]
menuentry "Arch Linux ck-kx with BFS" {
set root=(hd0,1)
linux /vmlinuz-linux-ck root=UUID=3bd79b6e-ac03-45e0-a07a-185437d7f5ec rw quiet elevator=bfq
initrd /initramfs-linux-ck.img
}
[…]
/etc/fstab | grep home
UUID=5ae7b53a-7438-46ef-baef-274aad3c5cd6 /home btrfs defaults,compress=lzo 0 0
/proc/mounts | grep home
/dev/sdc2 /home btrfs rw,relatime,compress=lzo,space_cache 0 0
# lsblk /dev/sdc2
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sdc2 8:34 0 1.8T 0 part /home
# hwinfo
[…]
21: PCI 08.0: 0101 IDE interface
[Created at pci.328]
Unique ID: RE4e.c3_dLwF0BQA
SysFS ID: /devices/pci0000:00/0000:00:08.0
SysFS BusID: 0000:00:08.0
Hardware Class: storage
Model: "nVidia CK804 Serial ATA Controller"
Vendor: pci 0x10de "nVidia Corporation"
Device: pci 0x0055 "CK804 Serial ATA Controller"
SubVendor: pci 0x147b "ABIT Computer Corp."
SubDevice: pci 0x1c1a "KN8-Ultra Mainboard"
Revision: 0xf3
Driver: "sata_nv"
Driver Modules: "sata_nv"
I/O Ports: 0x9e0-0x9e7 (rw)
I/O Ports: 0xbe0-0xbe3 (rw)
I/O Ports: 0x960-0x967 (rw)
I/O Ports: 0xb60-0xb63 (rw)
I/O Ports: 0xe800-0xe80f (rw)
Memory Range: 0xfe02e000-0xfe02efff (rw,non-prefetchable)
IRQ: 20 (1238607 events)
Module Alias: "pci:v000010DEd00000055sv0000147Bsd00001C1Abc01sc01i85"
Driver Info #0:
Driver Status: sata_nv is active
Driver Activation Cmd: "modprobe sata_nv"
Driver Info #1:
Driver Status: pata_acpi is active
Driver Activation Cmd: "modprobe pata_acpi"
Driver Info #2:
Driver Status: ata_generic is active
Driver Activation Cmd: "modprobe ata_generic"
Config Status: cfg=new, avail=yes, need=no, active=unknown
[…]
45: IDE 400.0: 10600 Disk
[Created at block.245]
Unique ID: _kuT.pfIVXMgRrnE
Parent ID: RE4e.c3_dLwF0BQA
SysFS ID: /class/block/sdc
SysFS BusID: 4:0:0:0
SysFS Device Link: /devices/pci0000:00/0000:00:08.0/ata5/host4/target4:0:0/4:0:0:0
Hardware Class: disk
Model: "ST2000DM001-1CH1"
Device: "ST2000DM001-1CH1"
Revision: "CC27"
Serial ID: "Z3409SNL"
Driver: "sata_nv", "sd"
Driver Modules: "sata_nv"
Device File: /dev/sdc
Device Files: /dev/sdc, /dev/disk/by-id/ata-ST2000DM001-1CH164_Z3409SNL, /dev/disk/by-id/wwn-0x5000c5006468ec15
Device Number: block 8:32-8:47
Geometry (Logical): CHS 243201/255/63
Size: 3907029168 sectors a 512 bytes
Capacity: 1863 GB (2000398934016 bytes)
Config Status: cfg=new, avail=yes, need=no, active=unknown
Attached to: #21 (IDE interface)
[…]
47: None 00.0: 11300 Partition
[Created at block.414]
Unique ID: DjND.SE1wIdpsiiC
Parent ID: _kuT.pfIVXMgRrnE
SysFS ID: /class/block/sdc/sdc2
Hardware Class: partition
Model: "Partition"
Device File: /dev/sdc2
Device Files: /dev/sdc2, /dev/disk/by-id/ata-ST2000DM001-1CH164_Z3409SNL-part2, /dev/disk/by-id/wwn-0x5000c5006468ec15-part2, /dev/disk/by-label/home, /dev/disk/by-partuuid/7ee2078d-68a8-4916-ab2a-b8f9d4fe0991, /dev/disk/by-uuid/5ae7b53a-7438-46ef-baef-274aad3c5cd6
Config Status: cfg=new, avail=yes, need=no, active=unknown
Attached to: #45 (Disk)
[…]
Any tips on mount options or kernel modules or something? Or any other weirdness sticking out?
Maybe my next move should be to buy another drive, copy the lot across and see if that gets some of the space back, then run something like $(fdupes) on the new drive to deduplicate everything once and for all. Also, how can I see whether a file is deduplicated with CoW? Something like what $(stat) does for inodes. I'm struggling to find out how to do this.
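(On that last question: $(stat) only exposes inodes, so it can't see shared extents. One approach is $(filefrag -v) from e2fsprogs, which prints each file's physical extent map – a reflinked copy reports the same physical_offset values as the original even though its inode differs. Sketch only, with the commands echoed rather than run, since this needs a CoW-capable filesystem and the file names are made up.)

```shell
# Echo the commands rather than run them; on a real btrfs mount, drop
# the echoes and compare the physical_offset columns of the two files.
cmds=$(
    echo 'cp --reflink=always somefile somefile.reflink'
    echo 'filefrag -v somefile somefile.reflink'
)
echo "$cmds"
```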
Cheers for all of your help, guys, I really appreciate it.
Success!
graysky wrote:
[…]if copy off/reformat/copy back does the problem in accounting as well come back?[…]
Thank you for this, graysky.
I bought an identical 2TiB drive, partitioned and formatted it identically, mounted it just with "compress=lzo" (i.e. without "defaults") in /etc/fstab and sync'd my data across with $(rsync -uav /home/ /newhome).
/home:
Data, single: total=1.76TiB, used=1.76TiB
System, DUP: total=32.00MiB, used=200.00KiB
Metadata, DUP: total=3.50GiB, used=2.77GiB
/newhome:
Data, single: total=1.48TiB, used=1.47TiB
System, DUP: total=8.00MiB, used=176.00KiB
System, single: total=4.00MiB, used=0.00
Metadata, DUP: total=3.50GiB, used=2.54GiB
Metadata, single: total=8.00MiB, used=0.00
Strangely, my system feels snappier after rotating the homes around: mpd launched quicker, ranger read directories quicker, and I think Firefox loaded my plethora of tabs quicker too. The new partition also mounts quicker than the old one:
$ sudo systemd-analyze time; sudo systemd-analyze blame
Startup finished in 670ms (kernel) + 3.717s (initrd) + 24.221s (userspace) = 28.608s
14.998s oldhome.mount
5.755s home.mount
3.062s oldvar.mount
2.085s systemd-vconsole-setup.service
1.496s ntpd.service
1.357s alsa-restore.service
1.349s connman.service
1.048s systemd-fsck@dev-disk-by\x2duuid-c7606741\x2d46c7\x2d4320\x2d887f\x2dc24fa9e70f5f.service
1.011s systemd-tmpfiles-setup-dev.service
801ms systemd-remount-fs.service
779ms systemd-random-seed.service
718ms udisks.service
484ms polkit.service
459ms systemd-udev-trigger.service
456ms connman-vpn.service
446ms dev-mqueue.mount
437ms systemd-journal-flush.service
429ms dev-hugepages.mount
423ms var.mount
415ms systemd-logind.service
398ms sys-kernel-config.mount
390ms systemd-user-sessions.service
383ms tmp.mount
360ms boot.mount
333ms kmod-static-nodes.service
239ms systemd-tmpfiles-clean.service
232ms user@500.service
216ms sys-kernel-debug.mount
171ms systemd-udevd.service
147ms upower.service
137ms systemd-sysctl.service
37ms systemd-tmpfiles-setup.service
35ms systemd-update-utmp.service
I'll comment out the old partitions in /etc/fstab to see if the shorter mount times of the new ones persist. If so, perhaps that old filesystem managed to get totally botched somehow. It would have been created with an older kernel and btrfs-progs, if that has anything to do with it, besides all of the commands I threw at it. Otherwise, could fragmentation be the cause of the longer mount (and read/write) times?
Next step is either running fdupes on my new home to try and maximise free space, or formatting my old home (and var) and RAID1'ing or maybe even RAID0'ing them with the new.
Relieved to get my space back.
Ran into a similar situation, trying to defrag/compress my backup volume with plenty of (uncompressed) data already in it.
Running dedup after defrag (with -clzo) resolved it nicely for me.