
#1 2015-11-27 14:29:03

FolkloreAddict
Member
Registered: 2015-11-27
Posts: 12

Best filesystem for specific backup routine

I'm looking for the most suitable filesystem for my backup. This is what I do:

I have an external harddrive that is always powered off. Once a night it is activated, mounted and rsync copies all the changes from the internal drive to the external one. So there's pretty much a complete Arch Linux installation on there.
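
A minimal sketch of what such a nightly step could look like (the device, mount point and exclude list are assumptions, not the poster's actual setup):

```shell
#!/bin/sh
# Sketch of a nightly mirror: mount the external drive, sync, unmount.
# /dev/sdb1, /mnt/backup and the excludes are illustrative assumptions.
mount /dev/sdb1 /mnt/backup
rsync -aHAX --delete \
    --exclude='/dev/*' --exclude='/proc/*' --exclude='/sys/*' \
    --exclude='/tmp/*' --exclude='/run/*' --exclude='/mnt/*' \
    / /mnt/backup/
umount /mnt/backup
```

`-aHAX` preserves permissions, ownership, hard links, ACLs and extended attributes; `--delete` keeps the mirror from accumulating files removed from the source.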

Once a week, I upload changes on the external harddrive to cloud storage:

  • I split the entire harddrive into 100 MiB chunks (by reading /dev/sdb directly).

  • I make an MD5 checksum per chunk.

  • If the MD5 changed from last week I compress, encrypt and upload the chunk.
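
A rough sketch of that weekly loop, assuming a checksum file from last week (`sums.old`); the device path, chunk size, passphrase file and upload URL are all placeholders, not the poster's actual commands:

```shell
#!/bin/sh
# Sketch of the chunk/checksum/upload loop. DEV, CHUNK_MB, pass.txt and
# the curl destination are illustrative assumptions.
DEV=/dev/sdb
CHUNK_MB=100
i=0
: > sums.new
while :; do
    sum=$(dd if="$DEV" bs=1M count="$CHUNK_MB" skip=$((i * CHUNK_MB)) 2>/dev/null \
          | md5sum | cut -d' ' -f1)
    # d41d8... is the MD5 of empty input, i.e. we have read past the end
    [ "$sum" = "d41d8cd98f00b204e9800998ecf8427e" ] && break
    echo "$i $sum" >> sums.new
    if ! grep -qx "$i $sum" sums.old 2>/dev/null; then
        # chunk changed since last week: compress, encrypt, upload
        dd if="$DEV" bs=1M count="$CHUNK_MB" skip=$((i * CHUNK_MB)) 2>/dev/null \
            | gzip | gpg --symmetric --batch --passphrase-file pass.txt \
            | curl -T - "https://example.invalid/chunk.$i"
    fi
    i=$((i + 1))
done
mv sums.new sums.old
```

Note that `dd`'s `skip=` is counted in units of `bs`, so chunk *i* covers the byte range [*i*·100 MiB, (*i*+1)·100 MiB).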

This is working fine as is but at the moment I'm using ext4 as the filesystem. There are two big problems with that:

  • The journal causes writes to areas of the drive that contain no actual data changes. This is easily fixed by disabling the journal, I guess.

  • The superblock is changed all the time (including afaik by simply mounting and unmounting the partition). And even with sparse superblocks turned on, there are still quite a few copies of it all over the partition and each one causes at least 100 MiB to be uploaded.

I will probably reduce the chunk size to 25 MiB or something similar to reduce the amount of data needing to be uploaded but I'm still wondering if there isn't a filesystem better suited to this specific scenario. What I'm looking for would be:

  • As few areas of the disk touched as possible when something changes.

  • Has to be able to save POSIX permissions, symlinks, etc.

  • I don't really need many data integrity features like journals and redundant copies. It is itself only a backup, it's only mounted for a few minutes per day, and I obviously have backups in case of hardware failure etc. That being said, it would still be bad if everything went down the drain without me noticing.

  • The drive is only 500 GiB. That shouldn't be a problem with most modern filesystems.

  • Maximum file size I need is probably 5 GiB. I'm guessing more like 1 GiB at most. If that's a factor I can just search for big files and see what I need.

  • Filenames should be allowed to have a decent length.

  • Native Linux (Arch obviously) support would be nice.

I'm familiar with FAT32, NTFS and ext4 (I actually implemented read-only access to those three). I would just use FAT32 if it could be made to store POSIX permissions and symlinks.

Another thought was a Flash-optimized filesystem, since those probably keep writes to a minimum.

I don't really care if the filesystem is 40 years old as long as I get the data out like I put it in.

Last edited by FolkloreAddict (2015-11-27 14:33:47)


#2 2015-11-27 15:21:51

respiranto
Member
Registered: 2015-05-15
Posts: 479
Website

Re: Best filesystem for specific backup routine

Reading the wiki article on filesystems [0], NILFS2 seems an appropriate option for your setup.

It fits because the data "is only appended to and never overwritten", which is useful in your scenario if you insist on copying raw data from the disk.

Which is something I don't understand.
Is there any reason not to simply copy regular files?

[0] https://wiki.archlinux.org/index.php/File_systems


#3 2015-11-27 16:33:26

FolkloreAddict
Member
Registered: 2015-11-27
Posts: 12

Re: Best filesystem for specific backup routine

The reasons for not simply copying the files:

  • I want to be independent of the cloud storage I use (in fact I mirror it to multiple services). I don't think many of them can backup POSIX permissions and symlinks (at least free ones).

  • Big files and many small files are problematic for me. Having only 100 MiB files to worry about makes some stuff easier.

NILFS2 might be interesting. I'll definitely look into it. Changing a byte in a huge file probably causes a lot of overhead, but that probably never happens anyway. I'm not sure, though, what happens when the disk is full. But I'll read up on it.

In the meantime I was looking at JFS. It only has two copies of the superblock at the beginning of the partition and the journal can be on a different drive.


#4 2015-11-27 16:47:20

respiranto
Member
Registered: 2015-05-15
Posts: 479
Website

Re: Best filesystem for specific backup routine

FolkloreAddict wrote:
  • I want to be independent of the cloud storage I use (in fact I mirror it to multiple services). I don't think many of them can backup POSIX permissions and symlinks (at least free ones).

You could in theory create a special file (or several) containing all the permissions; a local backup of that file might be useful.
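
As a hypothetical sketch of that idea (the output format and paths are made up), GNU find can dump mode, owner and symlink target for every file into one text file, to be replayed with chmod/chown on restore:

```shell
#!/bin/sh
# Hypothetical sketch: record POSIX metadata alongside the plain files.
# %m = octal mode, %U/%G = numeric uid/gid, %y = file type, %p = path,
# %l = symlink target (empty for non-links). Path is an assumption.
find /mnt/backup -printf '%m %U %G %y %p -> %l\n' > permissions.txt
```

Restoring would mean reading `permissions.txt` back and applying `chmod`/`chown` per line, and recreating symlinks with `ln -s`.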

  • Big files and many small files are problematic for me. Having only 100 MiB files to worry about makes some stuff easier.

May I ask why?
If mostly big files are a problem you could still split them up, which would require some additional effort when restoring the backup though.
You would probably have to maintain a list of such files as well, to distinguish them from regular files (or you simply choose an unambiguous extension).
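
With coreutils, splitting and rejoining is a one-liner each way, and an unambiguous suffix handles the distinction (file names here are illustrative):

```shell
# Split a big file into 100 MiB pieces with an unambiguous suffix,
# then reassemble them on restore. Names are illustrative.
split -b 100M --additional-suffix=.chunk bigfile bigfile.
cat bigfile.*.chunk > bigfile.restored
```

The alphabetical piece names (`bigfile.aa.chunk`, `bigfile.ab.chunk`, ...) sort in the original order, so a shell glob reassembles them correctly.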


#5 2015-11-27 16:56:03

firecat53
Member
From: Lake Stevens, WA, USA
Registered: 2007-05-14
Posts: 1,542
Website

Re: Best filesystem for specific backup routine

Not to dodge the question, but have you investigated Duplicity? It seems to accomplish the same thing you are doing manually while avoiding some of the side effects you are experiencing (for example, you can choose your tempfile location where the processing is done). It also doesn't really care what kind of filesystem is on the receiving end of the encrypted chunks. I keep mine on Amazon S3.

Scott

Last edited by firecat53 (2015-11-27 16:57:19)


#6 2015-11-27 17:04:23

FolkloreAddict
Member
Registered: 2015-11-27
Posts: 12

Re: Best filesystem for specific backup routine

respiranto wrote:

May I ask why?
If mostly big files are a problem you could still split them up[...]

And that's what I did. Then I back up one big file (i.e. the hard drive) and I don't have to worry about saving permissions in a file or distinguishing some files from others.

I'm not saying that's the best or only way to do it but it works pretty well.

The biggest drawback in my opinion is that you can't simply restore single files from your backup. That's why I wrote a script that downloads only the parts needed, understands the filesystem and extracts the data for a certain file. That's obviously not work normal people would do but I enjoyed the opportunity to learn about different filesystems. smile

I'm perfectly happy with leaving it like it is but if just choosing a different filesystem cuts the required upload in half or less, that would be a nice addition.


#7 2015-11-27 17:14:23

FolkloreAddict
Member
Registered: 2015-11-27
Posts: 12

Re: Best filesystem for specific backup routine

firecat53 wrote:

Not to dodge the question, but have you investigated Duplicity?

Yes, I found Duplicity and some similar tools. According to the wiki, Duplicity only supports a few destination protocols. While the ones offered are amazing and probably available for paid services, I use curl to upload and download the parts via web interfaces.


#8 2015-11-27 17:19:10

dice
Member
From: Germany
Registered: 2014-02-10
Posts: 413

Re: Best filesystem for specific backup routine

FolkloreAddict wrote:

And then I backup one big file (aka the harddrive) and I don't have to worry about saving permissions in a file or distinguishing some files from others.

You could use a tar archive for that. You wouldn't need to upload all the 'empty unused space' from your hdd to the cloud storage.

FolkloreAddict wrote:

The biggest drawback in my opinion is that you can't simply restore single files from your backup. That's why I wrote a script that downloads only the parts needed, understands the filesystem and extracts the data for a certain file. That's obviously not work normal people would do but I enjoyed the opportunity to learn about different filesystems. smile

tar has the ability to extract specific files from the archive built in.
You would have the additional benefit of using a stable, trustworthy tool which is unlikely to fail.
Or are there reasons against tar?
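
For reference, single-member extraction looks like this (paths are illustrative); note that tar still scans the archive sequentially to find the member:

```shell
# Create an archive preserving permissions (-p), list its contents,
# and extract just one member. Paths are illustrative.
tar -cpf backup.tar etc home
tar -tf backup.tar                 # list members
tar -xpf backup.tar etc/fstab      # extract a single file
```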


I put a button on it. Yes. I wish to press it, but I'm not sure what will happen if I do.  (Gune | Titan A.E.)


#9 2015-11-27 17:33:29

FolkloreAddict
Member
Registered: 2015-11-27
Posts: 12

Re: Best filesystem for specific backup routine

dice wrote:

You could use a tar archive for that. You don't need to upload all the 'empty unused space' from your hdd to the cloud storage.

Sure, for one backup tar would be the same. For the next backup, though, the tar file would be completely different and all parts of it would have to be re-uploaded. I could pack only changed files of course, but then I'd have to deal with going through all the incremental backups on restore. I didn't like that very much.

The "empty unused space" was zeroed out at the beginning and compresses to a few kilobytes. Space "empty"/unused after deletion of files doesn't change so doesn't need re-uploading.

dice wrote:

tar has the ability extract specific files from the archive built in.

Not really. AFAIK tar has to read (so in my case download) the entire archive to extract one file. And even if it didn't, how would tar tell me which parts of the tar file I have to download to extract the one file I need?


#10 2015-11-27 18:44:41

firecat53
Member
From: Lake Stevens, WA, USA
Registered: 2007-05-14
Posts: 1,542
Website

Re: Best filesystem for specific backup routine

FolkloreAddict wrote:
firecat53 wrote:

Not to dodge the question, but have you investigated Duplicity?

Yes, I found Duplicity and some similar tools. Duplicity - according to the wiki - only supports a few destination protocols. While the ones offered are amazing and probably available for paid services, I use curl to upload and download the parts to web interfaces.

If you have enough local storage space, you can just have Duplicity back up to a local storage location (your external hard drive?) and then move the pieces to the remote with whatever tool you want (rsync, curl, etc.). I used to use rsync for exactly this when I was backing up to a shared hosting location.

Scott


#11 2015-11-27 20:25:30

null
Member
Registered: 2009-05-06
Posts: 398

Re: Best filesystem for specific backup routine

If you used ZFS (I think Btrfs would work too), you could just create snapshots and save those to your backup space. At least with ZFS you'd have complete access to your filesystem at every snapshot point in time, and you'd only have to back up the new snapshots, which contain just the changes that happened since the last one.

Except for native Arch Linux support it fits all your points. And there is a user repo which gets refreshed very quickly with the new kernel modules, or you could just build them yourself wink

ZFS can also compress and deduplicate your data (and it also offers options you claim you don't need, like redundant copies etc.)
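
The snapshot workflow described above would look roughly like this (pool/dataset names and the encryption step are made up, and this obviously isn't runnable without a ZFS pool):

```shell
# Rough sketch of incremental snapshot backups; names are illustrative.
zfs snapshot backup/data@2015-11-27
zfs send -i backup/data@2015-11-20 backup/data@2015-11-27 \
    | gzip | gpg --symmetric --batch --passphrase-file pass.txt > incr.zfs.gpg
# restore later with: gpg -d incr.zfs.gpg | gunzip | zfs receive backup/data
```

`zfs send -i old new` emits only the blocks that changed between the two snapshots, which maps well onto the "upload only what changed" requirement.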


#12 2015-11-27 21:07:12

graysky
Wiki Maintainer
From: :wq
Registered: 2008-12-01
Posts: 10,595
Website

Re: Best filesystem for specific backup routine


CPU-optimized Linux-ck packages @ Repo-ck • AUR packages • Zsh and other configs


#13 2015-11-30 19:13:52

Leonid.I
Member
From: Aethyr
Registered: 2009-03-22
Posts: 999

Re: Best filesystem for specific backup routine

Right... Well, not really. Rule #1 of backups -- they _must_ be simple. ZFS/Btrfs/... require a modern kernel and utilities. What if your backup FS goes kaboom? Would you be able to recover it?

So, I'd stay with ext4. I also think that your backup strategy is wrong, but that's off-topic.


Arch Linux is more than just GNU/Linux -- it's an adventure
pkill -9 systemd

