
#1 2011-03-15 16:04:32

TomB17
Member
Registered: 2009-09-02
Posts: 102

checksumming filesystems

Are BTRFS, ZFS, or ZFS Fuse ready for production use?

I care a whole lot about the integrity of my data.  I've had problems with XFS and EXT4 over the md device driver that turned out to be a bug in the Silicon Image 3132 chipset.  It quietly corrupts data.

These checksumming filesystems sound like the answer but are they ready?

BTRFS is pretty new but might have the most mature code of the three on the Linux platform.

The paint is still wet on native Linux ZFS but I have to assume the original source code was rock solid before they ported it so it shouldn't take long to stabilize.

ZFS Fuse is said to be slow but I don't care much about speed.  Can I trust my data to it?

My main storage is on two very large RAID 5 arrays.  Currently, one is on EXT4 and the other is on XFS.  They are rsync'd regularly for redundancy.  Is it reasonable to convert the primary array to ZFS?


#2 2011-03-15 16:19:59

Inxsible
Forum Fellow
From: Chicago
Registered: 2008-06-09
Posts: 9,183

Re: checksumming filesystems

Since this is not really a support request, I am moving to GNU/Linux discussion.


Also search the web and research a bit. The btrfs website itself mentions that it is under heavy development --- and as with all developing software -- Expect Breakage  !!


Forum Rules

There's no such thing as a stupid question, but there sure are a lot of inquisitive idiots !


#3 2011-03-15 16:39:50

TomB17
Member
Registered: 2009-09-02
Posts: 102

Re: checksumming filesystems

Thanks for moving the thread.  I don't fully understand how threads are categorized here and I apologize for that.

As for reading, I've spent countless hours reading about ZFS over the last two weeks.  I've played around with the "aur/zfs-linux-git" package a bit too.  That was a really compelling experience until I realized you can't mount filesystems created with it.  lol!

I've read every comment in every thread on this site and others I have found with Google.

I've been following www.zfsonlinux.org, which the AUR package seems to be based on.  I understand there is another native ZFS project by KQ Infotech that is further along but based on an older version of ZFS.  Everyone seems excited about the ZFSonLinux project, including Phoronix ( http://www.phoronix.com/scan.php?page=a … arks&num=1 ).

To me, ZFS looks like one of the best breakthroughs in the Linux filesystem world ever.  Of course, I care only about data integrity.

Some of the more keen GNU guys have been downplaying the power of ZFS and suggesting BTRFS.  Some people seem to be using BTRFS without trouble.  I haven't read any complaints.

... so here I am with a ton of storage I'm about to format and I need to figure out what to do.  I guess I'd like to read someone say, "I've been using ZFS Fuse for ages with total reliability and an easy upgrade to native ZFS when it matures."  ... or... "Filesystem checksumming can be added to existing Linux filesystems with..."

I'm not as worried about corruption in memory.  I know it happens but I'm really focused on the disk store.
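The "add checksumming to an existing filesystem" idea asked about above can be approximated in user space. What follows is only a minimal sketch, not a feature of EXT4 or XFS: the function names (`file_digest`, `build_manifest`, `verify_manifest`) are made up for illustration, and it assumes the manifest itself is kept somewhere safe. It can detect that a file changed but cannot repair it, and it cannot tell silent corruption from a legitimate edit.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = file_digest(full)
    return manifest


def verify_manifest(root, manifest):
    """Return sorted relative paths that are missing or no longer match."""
    bad = []
    for rel, expected in manifest.items():
        full = os.path.join(root, rel)
        if not os.path.isfile(full) or file_digest(full) != expected:
            bad.append(rel)
    return sorted(bad)
```

Run `build_manifest` once on known-good data, store the result away from the array, and run `verify_manifest` periodically. Any path it returns has to be restored from the rsync'd copy by hand.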

Last edited by TomB17 (2011-03-15 16:40:44)


#4 2011-03-15 16:46:28

Inxsible
Forum Fellow
From: Chicago
Registered: 2008-06-09
Posts: 9,183

Re: checksumming filesystems

TomB17 wrote:

I don't fully understand how threads are categorized here and I apologize for that.

1) All threads that require support for existing issues on any system that they currently have, go into the Technical Issues And Assistance section.
2) If you want opinions or just a general discussion about something, then they go into Contributions and Discussions section.
3) If the discussion is related to ArchLinux specifically, they go into the Arch Centric section.
4) and finally, if you have issues about a particular package or a PKGBUILD for a package in AUR or anything related to the [testing] repo, they go under Pacman Upgrades, Packaging & AUR section.


Obviously, depending on the problem at hand, you will have to put them in the correct forum under the Section.


Hope that helps.

Also, as a side note: it is always better to state what you have tried/researched in the first post itself. This solves two problems. Pricks like me don't tell you to *research* (frankly, because I can't read minds and know what you have already tried/researched :) ) and other users don't suggest ideas that you have already tried and that haven't worked for you.



#5 2011-03-15 16:56:48

TomB17
Member
Registered: 2009-09-02
Posts: 102

Re: checksumming filesystems

Thank you.  :)


#6 2011-03-16 17:44:10

TomB17
Member
Registered: 2009-09-02
Posts: 102

Re: checksumming filesystems

I see nobody is interested in commenting.  These discussions frequently seem to turn into jihads based on platform politics so maybe this is why there are a small number of huge threads with vitriol and uninformed commentary.  I'm going to try to boil it down to some pretty lame basics that should be obvious, but aren't entirely.

This is based on weeks of reading, a few facts, lots of assumptions, and some experience.  I invite comments or criticism.

Preface: This research started when I realized my 14TB disk array was quietly corrupting my files.  ...  almost all of my files.  It was shocking that neither RAID nor filesystem were able to effect any positive influence on the integrity of my data.

Here's a CERN paper on Data Integrity.  It seems to be a key paper that has people buzzing, either in disbelief, or panic.  Consider me in the latter reactive category.  lol!

http://indico.cern.ch/getFile.py/access … nfId=13797


The options:

md_mod/XFS -> Fast.  Venerable.  This setup is popular and will protect data against some types of drive failure but leaves data subject to bit rot and various forms of quiet corruption.  Your files can easily become corrupt with no warning or errors of any kind.

hardware RAID/XFS -> Fast (typically a touch slower than md_mod but far less taxing on the CPU than md_mod).  This setup will protect data against some types of drive failure but leaves data subject to bit rot and various forms of quiet corruption.  Your files can easily become corrupt with no warning or errors of any kind.

BTRFS - New.  Interesting.  Fast for a checksumming filesystem.  Still in the birthing process, it should be sufficiently stable for critical application use in 4~5 years.

Native ZFS - Very new.  Fast for a checksumming filesystem.  Highly venerable pedigree.  Too new to fully trust but with its impeccable heritage and relatively small amount of Linux glue code versus actual filesystem code, it should be well stable in 12~18 months.

ZFS Fuse - Mature.  Adequate speed.  This filesystem seems to have been stable since roughly 2008.  Given the bullet proof Sun roots of the filesystem and relatively long term use, it seems reasonable that the Linux integration code is stable at this time.  While ZFS Fuse will not pick up bit rot and quiet corruption in data that sits unread, the whole pool can be verified occasionally with "zpool scrub <pool>".



Risk assessment criteria:

- data integrity


Risk assessment in order of least risk to most risk

- ZFS Fuse
- md_mod/XFS
- hardware RAID/XFS
- native ZFS
- BTRFS


Conclusion:

If the criterion is purely data integrity in a large data store, it looks to me like ZFS Fuse is the system to go with.


#7 2011-03-16 23:57:11

scorpyn
Member
From: Sweden
Registered: 2008-01-29
Posts: 66

Re: checksumming filesystems

NILFS2 supports checksumming, but it only appears to be used to make it possible to detect errors, not fix them. Also, NILFS2 is on the "not finished" list.

Since you list md_mod/XFS I assume you've considered LVM (dm_mod)? Doesn't seem to support checksumming, but it supports mirroring iirc...

Last edited by scorpyn (2011-03-16 23:57:48)


#8 2011-03-20 11:48:31

TomB17
Member
Registered: 2009-09-02
Posts: 102

Re: checksumming filesystems

Thanks, scorpyn.  I've never paid much attention to NILFS2.  In fact, I wasn't really sure what niche it filled, until I looked it up.

I'm reasonably happy with ZFS-FUSE.  It's not fast but the speed is acceptable.  Time will tell if the reliability features prove themselves.


#9 2011-03-29 02:36:46

sand_man
Member
From: Australia
Registered: 2008-06-10
Posts: 2,164

Re: checksumming filesystems

I miss using ZFS on FreeBSD so tonight I am going to migrate my data from ext4 to zfs-fuse at least until native ZFS becomes more stable.
At this stage the native ZFS doesn't even compile on kernels newer than 2.6.36 which is not a good thing for Arch.
I'm also not really concerned about the speed as long as it's not like reading/writing to a floppy disk :P


:|


#10 2011-12-27 00:27:29

Zflan
Member
Registered: 2011-12-26
Posts: 1

Re: checksumming filesystems

Linux raid 1 and 5 have a real self-check capability also. I just put in place crons like "echo check > /sys/block/md0/md/sync_action" for my raids and suitable /sys/block/md*/md/mismatch_cnt monitors. This should at least notify me about any bitrot - no need for ZFS after all... (using lvm+raid1/5+crypt+ext4/xfs)
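A cron job along those lines could look roughly like the sketch below. It uses the same two sysfs files named in the post (`sync_action` and `mismatch_cnt`); everything else is an assumption for illustration: the function names are invented, the `sysfs_root` parameter exists only so the logic can be tried against a scratch directory, and on a real system writing to `sync_action` needs root.

```python
import glob
import os


def start_check(md_name, sysfs_root="/sys/block"):
    """Ask md to start a check, like `echo check > .../md/sync_action`."""
    path = os.path.join(sysfs_root, md_name, "md", "sync_action")
    with open(path, "w") as f:
        f.write("check\n")


def read_mismatch_counts(sysfs_root="/sys/block"):
    """Return {array name: mismatch_cnt} for every md* array found."""
    counts = {}
    pattern = os.path.join(sysfs_root, "md*", "md", "mismatch_cnt")
    for path in sorted(glob.glob(pattern)):
        # .../md0/md/mismatch_cnt -> "md0"
        name = os.path.basename(os.path.dirname(os.path.dirname(path)))
        with open(path) as f:
            counts[name] = int(f.read().strip())
    return counts


def arrays_with_mismatches(sysfs_root="/sys/block"):
    """Names of arrays whose mismatch_cnt is currently nonzero."""
    counts = read_mismatch_counts(sysfs_root)
    return [name for name, count in counts.items() if count > 0]
```

The monitor half would call `arrays_with_mismatches()` some time after the check has finished and send mail if the list is not empty. A nonzero count only says that something in the array is inconsistent, not which file or which disk is affected.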


#11 2012-02-10 02:14:08

thetrivialstuff
Member
Registered: 2006-05-10
Posts: 191

Re: checksumming filesystems

Zflan wrote:

Linux raid 1 and 5 have a real self-check capability also. I just put in place crons like "echo check > /sys/block/md0/md/sync_action" for my raids and suitable /sys/block/md*/md/mismatch_cnt monitors. This should at least notify me about any bitrot - no need for ZFS after all... (using lvm+raid1/5+crypt+ext4/xfs)

Actually, echo check > ... sync_action will NOT detect silent bit-rot on RAID level 1 -- all it does for RAID 1 is read-scan the disk and see if the disk returns read errors. If a software problem were to corrupt one of the mirrors, mdadm would not detect it because the drive would say, "yeah I can read that sector" -- it doesn't actually compare the mirror *data*.

See: https://raid.wiki.kernel.org/articles/d … corruption

You can of course check RAID 1 consistency manually (by doing some massive binary diff'ing or md5sum or whatever), but then if you do discover that some byte is not the same across both disks, what do you do? You'll be looking at it at the level of the raw disk, below the filesystem, so you won't even know what file that data is part of without doing some complicated digging and some math, and even then, how are you going to decide whether disk 1 or disk 2 is right? Believe me, I've wasted a lot of time thinking about this :P
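For what it's worth, the "massive binary diff" can be sketched in a few lines. This assumes you already have two same-sized images (or device paths) that start at the same point in the data area, which on real md members means accounting for the RAID metadata yourself; `differing_offsets` is a hypothetical helper name, not an mdadm feature.

```python
def differing_offsets(path_a, path_b, chunk_size=4096):
    """Yield the byte offset of every chunk that differs between two images."""
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        offset = 0
        while True:
            chunk_a = a.read(chunk_size)
            chunk_b = b.read(chunk_size)
            if not chunk_a and not chunk_b:
                break  # both images exhausted
            if chunk_a != chunk_b:
                yield offset
            offset += chunk_size
```

This shows exactly the limitation described above: the output is a list of raw offsets. It says nothing about which file lives there, and nothing about which of the two copies is the correct one.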

I would recommend going with RAID 5 if you want data integrity to be the RAID layer's responsibility, but there are some caveats there too:

- In RAID 5, no single disk has data that you can just dd to an image and mount -- it's all striped and interleaved with the parity blocks, I think -- which gives me the willies
- You gotta be really really sure you're pulling the right one when you replace a disk -- pulling the wrong one might not be recoverable
- Be careful of rebuild times on big arrays -- it's possible for a second disk to fail during the time it takes to rebuild onto a spare. This is especially true if all your disks in the initial batch were from the same manufacturing run, because the added continuous stress of a rebuild might be enough to push another one over the edge :)
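To make the parity point concrete, here is a toy sketch of the XOR arithmetic behind RAID 5. It is not how md lays data out on disk; the block sizes and function names are invented for illustration. Parity is the XOR of the data blocks in a stripe, so any one missing block can be rebuilt from the others.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks):
    """Rebuild the one missing block of a stripe from all the others.

    surviving_blocks must contain every remaining data block plus parity.
    """
    return xor_blocks(surviving_blocks)


def stripe_is_consistent(data_blocks, parity):
    """True if the stored parity matches the XOR of the data blocks."""
    return xor_blocks(data_blocks) == parity


# Three data blocks and their parity.
d0, d1, d2 = b"\x01\x02", b"\x10\x20", b"\xff\x00"
parity = xor_blocks([d0, d1, d2])

# The disk holding d1 dies: rebuild it from the rest.
recovered = rebuild_missing([d0, d2, parity])
```

Note what happens with silent corruption instead of a dead disk: if `d1` is quietly altered, `stripe_is_consistent` returns False, but the arithmetic alone cannot say whether `d0`, `d1`, `d2` or the parity block is the wrong one. Rebuilding only works when you already know which block to distrust.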

~Felix.


#12 2012-05-01 09:07:23

paddlaren
Member
Registered: 2010-02-20
Posts: 7

Re: checksumming filesystems

This thread started some years ago. Can anyone help me with the present status of ZFS on Linux, both user space and kernel space?

Same goes for btrfs: how are the user space tools coming along, can I detect silent corruption in btrfs, and if using RAID below it, can I find the uncorrupted content?
I have just stumbled on NILFS2, which seems to have checksums, but I have found no article addressing how to detect corruption.

Thinking of it, are there any relevant working methods to recover from silent corruption unless using ZFS where the raid and file system is integrated?

BR
Erik

Last edited by paddlaren (2012-05-01 09:07:49)


#13 2012-05-01 18:40:19

thetrivialstuff
Member
Registered: 2006-05-10
Posts: 191

Re: checksumming filesystems

NILFS2 does have checksumming, but it checksums checkpoints, not files -- so in effect, it'll be able to tell if an entire batch of written data is invalid, and won't let you read from any affected files, but it's not so great for file recovery. Here's my story of recovering a badly corrupted NILFS2 filesystem after a cheap SSD started silently corrupting writes:

http://www.mail-archive.com/linux-nilfs … 01061.html

That'll give you a sense of what's possible and not possible with NILFS2. Note that NILFS2 is not that great for SSDs, because its superblocks get written to any time there's a write anywhere on the filesystem -- so even if your SSD does wear levelling, I'd still be hesitant to use NILFS on it.

btrfs: I am now using btrfs on my netbook. Just for fun, I threw it on the same cheap SSD that ate most of my data before. btrfs is fairly stable, but it still lacks many filesystem repair tools. While it does do file-level checksumming, my real life experience says that you're very unlikely to see bit rot happen to individual files and not also hit the superblocks -- and with btrfs, if the superblocks / log pointers / whatever the important bits are called get corrupted this way, all your files become inaccessible because you can't even mount. This is what happened to the test btrfs system on the cheap SSD. Obviously I had backups, so no harm done, but you should pretty much assume that if there's bit rot, it's going to hit the superblocks.

Now, all of my tests were done without any kind of RAID -- btrfs RAID might well be good enough to keep your data accessible if only one device starts doing silent corruption. I wouldn't be confident without testing that first, on a real malfunctioning device. Messing with loopback devices and dd is all well and good, but when a real device malfunctions, it tends to cause much more severe problems (e.g. btrfs will cause a kernel panic if a device disappears or times out) :)
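The general reason a checksum plus a redundant copy should help when one device silently corrupts data can be sketched like this. It is a toy model, not the on-disk format or code of btrfs or ZFS: the names `pick_good_copy` and `self_heal` are hypothetical, and SHA-256 is used only because it is in the standard library.

```python
import hashlib


def pick_good_copy(copies, expected_sha256):
    """Return (index, data) of the first copy matching the checksum, else None."""
    for index, data in enumerate(copies):
        if hashlib.sha256(data).hexdigest() == expected_sha256:
            return index, data
    return None


def self_heal(copies, expected_sha256):
    """Return a repaired list of copies, or raise if no copy is intact."""
    found = pick_good_copy(copies, expected_sha256)
    if found is None:
        raise ValueError("no intact copy: corruption detected but unrecoverable")
    _index, good = found
    # Overwrite every copy with the one that passed verification.
    return [good for _ in copies]
```

With plain mirroring there are two candidates and no way to choose between them. The separately stored checksum is what breaks the tie. Whether a real filesystem copes as gracefully with a genuinely misbehaving device is exactly the part that would still need testing.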

~Felix.
