Btrfs send not behaving in any expected way after deduplicating data.

nstgc · 2017-02-05 16:23:18

Okay, that's a fairly bad title. If I think of a better one, or if one is suggested, I'll change it. This post is broken up into "use case and expectation", "test method and result", and then "questions".

I want to send the bare minimum to an external HDD, both to save space and to save transfer time. My thought is to use rmlint, or similar, to clone duplicate files in a recent subvolume against an older read-only snapshot, take a read-only snapshot of the result, and then send it with the older snapshot as parent. I expect that the deduplication won't be broken, and that the data sent will be minuscule in comparison to the size of the files.

An alternative expectation would be that it doesn't work at all and the data is sent as if no cloning took place.

To test this I made two copies, cp1 and cp2, of an Arch image, and ensured that btrfs fi du showed that both files had exclusive data. I then made a subvolume, sv, and copied cp1 into it. I then made a read-only snapshot of sv, sv.r. I then copied cp2 into sv, removed cp1 from it, made another read-only snapshot, sv.r2, and sent it to a file with "btrfs send -p sv.r sv.r2 -f pre-rmlint". As expected, pre-rmlint is a tad larger than any one of the images. The purpose of removing cp1 was to ensure that when cloning data, sv/cp2 wouldn't be cloned against sv/cp1, but rather against sv.r/cp1.

Using rmlint, I ran "rmlint -T df --config=sh:handler-clone --keep-all-tagged sv // sv.r", which generates a bash script. I ran the bash script and confirmed that cp2 had no exclusive data. After making another read-only snapshot, sv.r3, I sent it with "btrfs send -p sv.r sv.r3 -f post-rmlint". The result defies both expectations, as it is only somewhat smaller than the data contained: post-rmlint is 584450740 bytes versus the original image size of 667942912 bytes as reported by "ls -l".
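For anyone who wants to reproduce this, the steps above can be sketched as a script. This is only a rough reconstruction, not the exact commands that were run: the image path arch.iso, the run helper, the DRY_RUN switch, and the rmlint.sh file name (rmlint's usual default output) are assumptions, and the send options are written in the conventional "options first" order. By default it only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch of the test described above. DRY_RUN=1 (the default) only prints
# each command; set DRY_RUN=0 and run as root inside a btrfs mount to execute.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@" || { echo "failed: $*" >&2; exit 1; }
    fi
}

dedup_send_test() {
    local image=$1   # hypothetical path to the source image

    # Two independent copies. Assumption: on newer coreutils, where cp may
    # reflink by default, --reflink=never may be needed to keep them unshared.
    run cp "$image" cp1
    run cp "$image" cp2
    run btrfs filesystem du cp1 cp2        # both should show exclusive data

    # Subvolume with cp1, then the parent snapshot.
    run btrfs subvolume create sv
    run cp cp1 sv/cp1
    run btrfs subvolume snapshot -r sv sv.r

    # Swap cp1 for cp2 and send the baseline (not yet deduplicated) stream.
    run cp cp2 sv/cp2
    run rm sv/cp1
    run btrfs subvolume snapshot -r sv sv.r2
    run btrfs send -p sv.r -f pre-rmlint sv.r2

    # Clone sv/cp2 against sv.r/cp1, then send again.
    run rmlint -T df --config=sh:handler-clone --keep-all-tagged sv // sv.r
    run ./rmlint.sh                        # assumed name of the generated script
    run btrfs filesystem du sv/cp2         # should now show no exclusive data
    run btrfs subvolume snapshot -r sv sv.r3
    run btrfs send -p sv.r -f post-rmlint sv.r3

    # Compare the two stream sizes.
    run ls -l pre-rmlint post-rmlint
}

dedup_send_test "${1:-arch.iso}"
```

If deduplication survived the send, post-rmlint should be tiny compared with pre-rmlint.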

My questions, then, are "is there something wrong with my testing methodology?", "what is send doing?", and "what can I do to produce the desired behavior?".

nstgc · 2017-02-08 21:16:56

So, previously, I didn't receive the file. I only sent it and looked at the output files of send. So today I started over again, in case I had made some mistake without realizing it, received the file, then du'ed the result. It shows that only 80MiB of the received file that was cloned from the read-only snapshot was shared (cp1 is what is being cloned and cp2 is the clone). On the other hand, all of the received cp1 is shown as shared.
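As a quick sanity check on the sizes quoted in the first post, the gap between the image and the post-rmlint stream can be worked out with plain shell arithmetic. The two byte counts come from that post; the variable names are made up for illustration.

```shell
#!/usr/bin/env bash
image_bytes=667942912    # original image size per "ls -l"
stream_bytes=584450740   # size of the post-rmlint send stream

saved=$(( image_bytes - stream_bytes ))

# MiB with two decimals, rounded, using integer arithmetic only.
mib_x100=$(( (saved * 100 + 524288) / 1048576 ))

printf 'stream is %d bytes smaller than the image (about %d.%02d MiB)\n' \
    "$saved" "$(( mib_x100 / 100 ))" "$(( mib_x100 % 100 ))"
# prints: stream is 83492172 bytes smaller than the image (about 79.62 MiB)
```

That is close to the roughly 80MiB reported as shared on the receiving side. It may be coincidence, but it would be consistent with send emitting clone operations for only about that much of cp2 and writing the rest in full.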

I deleted all snapshots, defragmented the files, and then tried again, thinking that may have something to do with it. Same result. I deleted the received subvolume containing cp2 and this caused the received cp1 to now show that it is 100% exclusive, so it looks like there WAS some kind of data sharing, but perhaps it wasn't being reported correctly.
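For reference, a receive-and-measure pass like the one described here might look roughly as follows. This is a sketch under assumptions: the snapshot names sv.r and sv.r3 are reused from the first post, /mnt/backup is a hypothetical mount point for the receiving filesystem, and the stream file names are invented. It assumes a full send followed by an incremental one, and by default it only prints the commands.

```shell
#!/usr/bin/env bash
# DRY_RUN=1 (the default) only prints each command; set DRY_RUN=0 and run
# as root to execute.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@" || { echo "failed: $*" >&2; exit 1; }
    fi
}

DEST=${DEST:-/mnt/backup}   # hypothetical btrfs mount on the external disk

receive_and_measure() {
    # Full send of the parent, then incremental send of the deduplicated snapshot.
    run btrfs send -f full.stream sv.r
    run btrfs receive -f full.stream "$DEST"
    run btrfs send -p sv.r -f incr.stream sv.r3
    run btrfs receive -f incr.stream "$DEST"

    # How much of each received file is shared vs exclusive?
    run btrfs filesystem du "$DEST/sv.r/cp1" "$DEST/sv.r3/cp2"
    run btrfs filesystem du -s "$DEST/sv.r" "$DEST/sv.r3"

    # Assumption: a btrfs-progs new enough to support --dump, which lists the
    # operations in a stream (write, clone, ...) without applying them.
    run btrfs receive --dump -f incr.stream
}

receive_and_measure
```

Looking at the dumped stream is one way to see whether send is emitting clone operations for cp2 or writing its data out in full.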

Any ideas?

Soukyuu · 2017-02-09 17:38:46

I myself never really understood how send/receive works, and I think neither did most people out there.
All I can suggest is trying to get help with this on the btrfs IRC channel on freenode. There are quite a few developers on there and they are usually fast to answer questions.

ratcheer · 2017-02-09 19:15:49

Are you doing full send/receive, or incremental? Also, I'm not an expert on this, but I don't think you should try to de-dup the snapshot. It is your backup, why would you want to remove stuff from it? I regularly use btrfs incremental send/receive for my read-only snapshots, and they have been completely reliable for me.

Tim

nstgc · 2017-02-09 20:09:16

Soukyuu wrote:

I myself never really understood how send/receive works, and I think neither did most people out there.
All I can suggest is trying to get help with this on the btrfs IRC channel on freenode. There are quite a few developers on there and they are usually fast to answer questions.

I think I am just going to send something to the mailing list. I've seen dumber questions than mine asked there. Of course, I've never sent anything to a mailing list so I'm a bit apprehensive.

ratcheer wrote:

Are you doing full send/receive, or incremental? Also, I'm not an expert on this, but I don't think you should try to de-dup the snapshot. It is your backup, why would you want to remove stuff from it? I regularly use btrfs incremental send/receive for my read-only snapshots, and they have been completely reliable for me.
Tim

In my first post what I was doing was an incremental send to a file. In the second, I did a full send followed by an incremental.

I only need to store one copy in the backup. Why should I have multiple copies of the same thing in my backup when I already have three other copies (one on an unmounted ext4 partition on a drive that isn't part of any volume, then two copies in a raid1 volume)? This isn't me clearing out my HDDs to make space. To me, that sort of redundancy is more wasteful than anything. As for the chances of a collision, I'm using a 256-bit hash for detection, then btrfs, apparently, checks during the actual deduping process, and then, once it's all done, I run a premade sha512. Because I'm not OCD. Nope.

I'm not concerning myself with reliability. Not that I don't CARE about reliability. An unreliable backup is a liability. I want to cram in as much as possible. If I can fit just one extra snapshot in, that is a huge boon for me, since this backup is meant to be all about versioning.

Last edited by nstgc (2017-02-09 20:12:35)

ratcheer · 2017-02-10 00:09:37

Incremental send/receive produces fully minimized snapshots on the receiving end. Only the blocks that have changed since the previous incremental are stored on the receiving filesystem. If you want to keep a minimal set of snapshots, you can delete the older ones in the order of oldest first, then next oldest, etc. The blocks to maintain the full snapshot are maintained. It looks like there is a full set of files and directories in the most recent snapshots, but like with Apple's Time Machine, that is only in appearance.

For example, the original (oldest) snapshot is all the blocks of the source subvolume, at that point in time. Then, if an incremental is sent and say 237 blocks had changed since the original snapshot, the new snapshot would contain 237 physical blocks, but if you look at it, it would look like the entire subvolume, with all directories and files. If you look in the original snapshot at a file that has since been changed, you get the original file, but if you look at it in the newer snapshot, you see the changed version of the file as of the point in time the newer snapshot was made.

But, if you delete an intermediate snapshot, there can be no expectation of consistency from first to last. The incrementals will be consistent as long as you keep a contiguous set from the oldest remaining one through the most recent one.

Tim

Last edited by ratcheer (2017-02-10 00:12:14)

nstgc · 2017-02-10 03:06:33

ratcheer wrote:

Incremental send/receive produces fully minimized snapshots on the receiving end.

Indeed that is the idea; in practice, however, as my test shows, it is not the reality. Btrfs is sending more data than exists in the subvolume being sent, and I'm not just talking about the instructions needed to recreate the subvolume.

Only the blocks that have changed since the previous incremental are stored on the receiving filesystem. If you want to keep a minimal set of snapshots, you can delete the older ones in the order of oldest first, then next oldest, etc. The blocks to maintain the full snapshot are maintained. It looks like there is a full set of files and directories in the most recent snapshots, but like with Apple's Time Machine, that is only in appearance.

I was using btrfs fi du to verify whether or not full copies were transferred. Indeed, it doesn't just appear as if there are two different copies of a file; in actuality there are also two copies of the data, save for 80MiB.

For example, the original (oldest) snapshot is all the blocks of the source subvolume, at that point in time. Then, if an incremental is sent and say 237 blocks had changed since the original snapshot, the new snapshot would contain 237 physical blocks, but if you look at it, it would look like the entire subvolume, with all directories and files. If you look in the original snapshot at a file that has since been changed, you get the original file, but if you look at it in the newer snapshot, you see the changed version of the file as of the point in time the newer snapshot was made.

Actually, to my surprise, this is not true. That is indeed what all the documentation I've seen indicates, but that is not what is happening. Gen numbers don't seem to actually come into play. To test this yourself, make two copies of the same data. One will have a newer gen number than the other, but regardless of which you choose to use as the parent, you will end up sending whatever is missing, even if it's older. In fact, you don't even need the subvolumes to be related at all! I'm sure that is contrary to every piece of documentation you've seen, but it is what my tests have indicated. Of course, for my use case this is irrelevant. I will always be using an older snapshot as parent.

But, if you delete an intermediate snapshot, there can be no expectation of consistency from first to last. The incrementals will be consistent as long as you keep a contiguous set from the oldest remaining one through the most recent one.
Tim

Didn't test this.

So, yeah, I'm only asking about one discrepancy, but as you can tell from reading the previous part of this post, there is more that runs counter to what the btrfs documentation is telling us besides send not actually sending the subvolume as it exists on the source side.

If you don't believe me, please try it yourself. With any luck you can't reproduce the behavior I'm seeing! I might just have some weird bug or I may be doing something wrong.

Edit: in case I'm just not explaining myself properly (very probable) I'll post a script that demonstrates the behavior I'm seeing, which differs from what is expected. This all might be a misunderstanding.

Last edited by nstgc (2017-02-10 03:47:18)

rdeckard · 2017-02-10 13:56:56

Here's a script that uses btrfs incremental snapshots with btrfs send/receive:

https://github.com/wesbarnett/snap-sync

It uses snapper for some parts of the process just to keep track of things. Ultimately it's just a process of:

1. Create "snap1" snapshot on local disk.
2. Send to external disk via: "btrfs send snap1 | btrfs receive backuplocation"
3. Modify some data on local disk.
4. Delete "snap1" on local disk.
5. Create "snap2" snapshot on local disk.
6. Send only changes to external disk via: "btrfs send snap2 -c snap1 | btrfs receive backuplocation"

So now you have two snapshots on your local disk as well as on your external disk named "snap1" and "snap2".

Furthermore you can:

1. Delete "snap2" on local disk.
2. Modify some data on local disk.
3. Create new snapshot "snap3" on local disk.
4. Send changes between "snap2" and "snap3" to external disk via: "btrfs send snap3 -c snap2 | btrfs receive backuplocation"
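The cycle above could be sketched roughly like this. It is not taken from snap-sync; the SRC and DEST paths, the helper names, and the DRY_RUN switch are all hypothetical. One deliberate difference, based on the assumption that a snapshot named with -c has to still exist locally when send runs: the sketch deletes the old local snapshot only after the new one has been sent. The send options are also written before the snapshot name. By default it only prints the commands.

```shell
#!/usr/bin/env bash
# DRY_RUN=1 (the default) only prints each command; set DRY_RUN=0 and run
# as root to execute.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@" || { echo "failed: $*" >&2; exit 1; }
    fi
}

# send_snapshot NEW DEST [PREV]
# Full send when PREV is omitted, otherwise send NEW with PREV as clone source.
send_snapshot() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "btrfs send ${3:+-c $3 }$1 | btrfs receive $2"
    else
        btrfs send ${3:+-c "$3"} "$1" | btrfs receive "$2" || exit 1
    fi
}

SRC=${SRC:-/home}            # hypothetical subvolume to back up
DEST=${DEST:-/mnt/backup}    # hypothetical btrfs mount on the external disk

backup_demo() {
    # Initial full backup. Snapshot paths are relative here for brevity; in
    # real use they must live on the same filesystem as SRC.
    run btrfs subvolume snapshot -r "$SRC" snap1
    send_snapshot snap1 "$DEST"

    # ... data on the local disk changes here ...

    # Incremental backup relative to snap1.
    run btrfs subvolume snapshot -r "$SRC" snap2
    send_snapshot snap2 "$DEST" snap1

    # Only now drop the old local snapshot; snap2 is the next reference.
    run btrfs subvolume delete snap1
}

backup_demo
```

The next round would repeat the second half with snap3 and snap2. The -p form used earlier in the thread names the parent explicitly instead of letting send pick one from the clone sources.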

I have not looked at the exact numbers of disk usage when doing this process, but I know that the first time I send a snapshot to an external disk, it takes a *long* time, and that when I send incremental snapshots after that it takes very little time.

Last edited by rdeckard (2017-02-10 14:00:04)

ratcheer · 2017-02-10 15:36:49

Ok, this is still somewhat confusing, but here goes.

I created a new subvolume and copied several files into it, 69,100 blocks. I created a first snapshot, which also contained 69,100 blocks. I used send/receive to send it to another disk drive. The received snapshot contained 69,100 blocks.

I added one file to the subvolume. Now, the subvolume contained 72,928 blocks. I made a second snapshot of the subvolume. The second snapshot had 72,928 blocks.

Before sending the incremental, I measured the blocks used by the receiving subvolume; it was 311,894,456. Then I did the incremental send/receive of snapshot 1 and snapshot 2. After doing this, the receiving subvolume shows as using 311,967,384 blocks, or 72,928 more than before. This seems to support what you are complaining about. However, as the prior poster noted, the second send/receive took much less time than the first. Hmmm...

Digging deeper, I investigated the man page of "btrfs filesystem du". Here is what I found:

           Calculate disk usage of the target files using FIEMAP. For individual files, it will report a count of total bytes, and
           exclusive (not shared) bytes. We also calculate a set shared value which is described below.

           Each argument to btrfs fi du will have a set shared value calculated for it. We define each set as those files found by a
           recursive search of an argument. The set shared value then is a sum of all shared space referenced by the set.

           set shared takes into account overlapping shared extents, hence it isn’t as simple as adding up shared extents.

So, I ran this and what I found was:

     Total   Exclusive  Set shared  Filename
  67.47MiB       0.00B           -  /media/seagpart1/btrfsbackups/snap1
  71.21MiB     3.74MiB           -  /media/seagpart1/btrfsbackups/snap2

That 3.74 MiB exclusive (not shared) value is exactly the size of the new file I added to the source subvolume before making the second snapshot.
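The numbers in this post can be cross-checked with a little shell arithmetic. All figures are the ones quoted above; the one assumption is that the "blocks" are 1 KiB units (du's usual default), which the post does not state.

```shell
#!/usr/bin/env bash
before=311894456      # blocks used on the receiving side before the incremental
after=311967384       # blocks used afterwards
snap1_blocks=69100    # size of the first snapshot
snap2_blocks=72928    # size of the second snapshot

apparent_growth=$(( after - before ))           # what the block count suggests
new_blocks=$(( snap2_blocks - snap1_blocks ))   # what actually changed

# Convert the changed blocks to MiB (two decimals, rounded),
# assuming 1 KiB per block.
mib_x100=$(( (new_blocks * 100 + 512) / 1024 ))

printf 'apparent growth: %d blocks\n' "$apparent_growth"
printf 'changed data:    %d blocks (about %d.%02d MiB)\n' \
    "$new_blocks" "$(( mib_x100 / 100 ))" "$(( mib_x100 % 100 ))"
# prints: apparent growth: 72928 blocks
# prints: changed data:    3828 blocks (about 3.74 MiB)
```

So the apparent growth equals the full size of the second snapshot, while the difference between the two snapshots works out to about 3.74 MiB under that assumption, the same figure btrfs fi du reports as exclusive for snap2.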

So, I believe that seeing the second snapshot on the receiving filesystem as using the full amount of space is still an illusion of the btrfs filesystem. It is showing the second snapshot as the full original size of the source snapshot to avoid confusion about whether it is all there, but in reality, only the difference is not shared between the two snapshots on the receiving filesystem. I believe the mystery is resolved.

Tim
