I have a btrfs filesystem on 3 drives with RAID5, but when I run "btrfs fi usage /mountpoint" it prints the following:
WARNING: RAID56 detected, not implemented
I am using the stock Arch kernel 4.1.6-1-ARCH and btrfs-progs v4.2, so both are the latest, and according to the btrfs kernel wiki, RAID5/6 has been fully implemented since kernel 3.19. So why does btrfs-progs print this message? Is it a leftover message that the devs forgot to remove, or does it really mean that my RAID5 is not actually redundant and I'm SOL?
Last edited by Nektarios (2015-09-13 12:06:18)
From what I remember, the RAID5/6 itself is implemented, but parsing out the correct fs usage is not.
[ Arch x86_64 | linux | Framework 13 | AMD Ryzen™ 5 7640U | 32GB RAM | KDE Plasma Wayland ]
I also have a RAID5 volume. I get the same warning, but it seems to have the correct amount of space (n-1). Still, yeah, it isn't shown correctly for me either.
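The n-1 figure is the usual RAID5 rule: one disk's worth of each stripe holds parity. A minimal Python sketch of that arithmetic, assuming equal-sized disks or the classic behaviour of truncating every member to the smallest one (btrfs allocates chunks more flexibly, so with mixed sizes it may expose more than this). `raid5_usable` is a hypothetical helper, not part of btrfs-progs:

```python
def raid5_usable(disk_sizes):
    """Usable capacity of a classic RAID5 array, in the same unit as the input.

    Assumes the textbook layout: every member is truncated to the smallest
    disk and one disk's worth of space holds parity. btrfs allocates chunks
    more flexibly, so with mixed disk sizes it may expose more than this.
    """
    if len(disk_sizes) < 3:
        # Textbook RAID5 minimum; some implementations accept fewer members.
        raise ValueError("classic RAID5 needs at least three disks")
    return (len(disk_sizes) - 1) * min(disk_sizes)


print(raid5_usable([4000, 4000, 4000]))  # 8000 (GB): two thirds of raw space
```

For the three-disk setups in this thread, usable space is therefore two thirds of the raw total, which matches what the posters see even though `btrfs fi usage` warns about RAID56.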
I just destroyed my data. Btrfs RAID5 is completely unstable.
I created the RAID5 with 3 disks and copied all my crucial data onto it, and it worked. After a while my 3rd disk started throwing hardware errors (my luck), so I had to remove it and degrade the array, but when I did that it got stuck and I had to reboot.
I disconnected the bad drive and tried to mount the degraded array with the remaining drives, but it doesn't mount without the 3rd one (hurray!!). Now I'm trying to recover whatever I can with btrfs restore -i.
Warning for others: Don't use btrfs raid56.
Don't use btrfs
Aye.
Although I want to try it for an additional backup disk sometime.
Does anyone know how to recover from a btrfs meltdown like mine?
I get this error (trying to mount/recover files):
parent transid verify failed on 504707121152 wanted 2797 found 1515
Is it completely fried or is there hope?
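For context on this error: as far as I understand the btrfs on-disk format, each pointer in a tree node records the generation (transaction ID) of the child block it points to, and the kernel compares that with the generation stored in the child's own header when it reads the block. "wanted 2797 found 1515" therefore means the block at that address is an older version than its parent expects. The Python sketch below only parses the message; the regex, `TransidMismatch` and `parse_transid_error` are illustrative names, not anything shipped with btrfs-progs:

```python
import re
from dataclasses import dataclass
from typing import Optional

# Matches the kernel message quoted above; the group names are our own.
TRANSID_RE = re.compile(
    r"parent transid verify failed on (?P<bytenr>\d+) "
    r"wanted (?P<wanted>\d+) found (?P<found>\d+)"
)


@dataclass
class TransidMismatch:
    bytenr: int  # logical address of the tree block that failed the check
    wanted: int  # generation the parent node's pointer expects
    found: int   # generation actually stored in the block's header

    @property
    def generations_behind(self) -> int:
        # Positive when the block on disk is older than the parent expects.
        return self.wanted - self.found


def parse_transid_error(line: str) -> Optional[TransidMismatch]:
    m = TRANSID_RE.search(line)
    if m is None:
        return None
    return TransidMismatch(int(m["bytenr"]), int(m["wanted"]), int(m["found"]))


err = parse_transid_error(
    "parent transid verify failed on 504707121152 wanted 2797 found 1515"
)
print(err.generations_behind)  # 1282
```

The gap only shows how stale the block is; it says nothing by itself about whether the data is recoverable.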
If there is hope you should ask the btrfs IRC channel or mailing list.
I don't have btrfs experience but with this kind of filesystem, you're basically left with what the btrfs tools give you. Traditional recovery software won't know how to handle copy-on-write, RAID, checksum structures and the like.
Too much panic in here. First, there is an official warning about RAID5/6 being relatively new and untested. If you decided to use it anyway, well... you indeed might have destroyed your data, but that's not directly btrfs's fault.
Second, I've seen that error on the IRC channel a few times, and it was possible to recover from it, though I don't remember exactly how. The error means that your data is probably safe, but the journal wasn't updated to the newest version because of the crash. As frostschutz said, try asking for help on the official IRC channel.
Third, if all else fails, you can use PhotoRec, which doesn't care about filesystems and just searches the whole disk(s) for the files themselves - provided you didn't use compression as one of the mount options. Or just recover from one of your backups.
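PhotoRec's approach is file carving: scan the raw device for known file signatures instead of reading filesystem structures. A deliberately naive Python sketch of the idea for JPEG, assuming each file is contiguous and ends at the first end-of-image marker. Embedded thumbnails, fragmented files and compressed extents all defeat it, and PhotoRec itself is far more sophisticated. `carve_jpegs` is a hypothetical helper:

```python
SOI = b"\xff\xd8\xff"  # JPEG start-of-image marker plus the next marker's 0xFF
EOI = b"\xff\xd9"      # JPEG end-of-image marker


def carve_jpegs(image: bytes, max_size: int = 20 * 1024 * 1024):
    """Return (offset, data) pairs for byte runs that look like JPEG files."""
    found = []
    pos = image.find(SOI)
    while pos != -1:
        # Look for the end marker within max_size bytes of the start.
        end = image.find(EOI, pos + len(SOI), pos + max_size)
        if end == -1:
            # No end marker in range: skip this start and try the next one.
            pos = image.find(SOI, pos + 1)
            continue
        found.append((pos, image[pos:end + len(EOI)]))
        pos = image.find(SOI, end + len(EOI))
    return found


disk = b"\x00" * 10 + b"\xff\xd8\xff\xe0abc\xff\xd9" + b"\x00" * 5
print(carve_jpegs(disk))  # [(10, b'\xff\xd8\xff\xe0abc\xff\xd9')]
```

This is also why compression matters: compressed extents no longer contain the signature bytes in the clear, so a carver has nothing to match.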
I am using the stable btrfs features (simple and RAID1) and so far I haven't had any "meltdowns". In fact, it has already saved my behind a few times, something a non-copy-on-write filesystem couldn't have done, or not as efficiently.
I don't have any advice for your situation, but I've been using nothing but btrfs for my online data (that which is always accessible), and while I have had some rough spots, particularly earlier this month, btrfs hasn't eaten my data yet. YET.
You should keep backups, regardless of what FS you use, but especially if you use btrfs.
I managed to restore roughly 98% of my data (though mtimes and user/group attributes were lost on everything) using the btrfs restore tool. Regardless, I won't touch btrfs RAID for at least 5 years. It's not only unstable, it can potentially corrupt your data beyond repair.
ZFS, with its true on-disk consistency, is clearly the better FS here.
I'll just use mdraid, which has never destroyed my data (yet).
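The `btrfs restore -i` invocation mentioned earlier in the thread can also bring back ownership, permissions and timestamps if your btrfs-progs supports the `-m` (`--metadata`) option; check `btrfs restore --help` for your version, as I am not certain which release introduced it. A small Python sketch that only builds the command line; `restore_argv` and the device and target paths are hypothetical, and the target should be an empty directory on a different filesystem:

```python
import shlex


def restore_argv(device, target, *, ignore_errors=True, metadata=True,
                 dry_run=False):
    """Build a `btrfs restore` command line (nothing is executed here).

    Flags used (verify with `btrfs restore --help` for your version):
      -v  verbose
      -i  keep going after errors
      -m  also restore owner, mode and timestamps
      -D  dry run: only list what would be restored
    """
    argv = ["btrfs", "restore", "-v"]
    if ignore_errors:
        argv.append("-i")
    if metadata:
        argv.append("-m")
    if dry_run:
        argv.append("-D")
    argv += [device, target]
    return argv


if __name__ == "__main__":
    # Hypothetical paths: an unmountable array member and an empty target dir.
    print(shlex.join(restore_argv("/dev/sdb", "/mnt/rescue", dry_run=True)))
    # To actually run it (as root): subprocess.run(argv, check=True)
```

Doing a dry run first lets you see what btrfs restore can still reach before writing anything.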
Nektarios wrote: I managed to restore roughly 98% of my data (though mtimes and user/group attributes were lost on everything) using the btrfs restore tool. Regardless, I won't touch btrfs RAID for at least 5 years. It's not only unstable, it can potentially corrupt your data beyond repair.
ZFS, with its true on-disk consistency, is clearly the better FS here.
I'll just use mdraid, which has never destroyed my data (yet).
I don't get it... why do you care about uid/gid on a backup? Anyway, you can only blame yourself: do your homework and don't use experimental features in production.
Arch Linux is more than just GNU/Linux -- it's an adventure
pkill -9 systemd
Nektarios wrote: Regardless, I won't touch btrfs RAID for at least 5 years. It's not only unstable, it can potentially corrupt your data beyond repair.
Any filesystem can, under certain conditions. Of course, using experimental code is asking to meet those conditions.
Nektarios wrote: ZFS, with its true on-disk consistency, is clearly the better FS here.
In the case of RAID5/6, yes, as long as btrfs still suffers from the write hole. Otherwise, they are on par. Though I am quite certain you would have run into the same problem with ZFS in your case. Hard-rebooting a system while it's doing something with the RAID array is one of the most dangerous things you can do, period. That's why REISUB exists.
Nektarios wrote: I'll just use mdraid, which has never destroyed my data (yet).
mdraid only really protects you from clean disk failures, see here. Also, unless you use ZFS's RAID-Z, you will have the same problem as with btrfs RAID5/6: the write hole.
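To make the write hole concrete: RAID5 parity is the byte-wise XOR of the data blocks in a stripe, so any one missing block can be rebuilt from the others. If a crash lands between updating a data block and updating its parity, the stripe is silently inconsistent, and the next reconstruction returns garbage. A toy Python illustration with 4-byte blocks (real stripes are much larger; as I understand it, RAID-Z sidesteps this by writing whole stripes copy-on-write):

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equal-length blocks (RAID5 parity arithmetic)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe on a 3-disk RAID5: two data blocks and their parity.
d0, d1 = b"AAAA", b"BBBB"
parity = xor_blocks(d0, d1)

# Healthy case: losing disk 1 is survivable, d1 = d0 XOR parity.
assert xor_blocks(d0, parity) == d1

# Write hole: d0 is rewritten, but power is lost before the matching
# parity update reaches disk, so data and parity silently disagree.
d0 = b"CCCC"  # new data made it to disk 0; parity still describes "AAAA"

# If disk 1 now dies, reconstruction yields garbage instead of "BBBB".
reconstructed_d1 = xor_blocks(d0, parity)
print(reconstructed_d1)  # b'@@@@'
```

Nothing in the stripe itself records that the parity is stale, which is why plain parity RAID cannot detect this on its own.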
Last edited by Soukyuu (2015-09-14 22:27:47)
I'm pretty sure ZFS wouldn't have any problem in this case, as it is always consistent on disk, especially when you do stuff with the RAID. You can shut down, power off, hit it with a hammer or burn the house down while it creates/syncs/reassembles/scrubs/whatever, and the data will always be consistent. Maybe it will lose the last couple of transactions, but that's about it. This is because they developed it with one goal above all else: real on-disk consistency with real atomic operations. As long as write barriers work correctly, it's impossible to screw the data up.
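The "always consistent on disk" property discussed here comes from copy-on-write: new blocks are written to free space, and only after they are safely on disk is a single root pointer (the uberblock in ZFS, the superblock in btrfs) switched to them. A toy Python model of that commit ordering, assuming the drive honours flushes as described above; `ToyCowStore` is hypothetical and bears no resemblance to either on-disk format:

```python
class ToyCowStore:
    """Toy copy-on-write store: a commit becomes visible only when the
    single root pointer (standing in for the superblock) is switched."""

    def __init__(self):
        self.blocks = {0: {}}  # block id -> tree contents, never mutated
        self.root = 0          # "superblock": id of the current root block
        self._next_id = 1

    def read(self):
        return self.blocks[self.root]

    def commit(self, updates, crash_before_root_switch=False):
        # 1. Never modify the live tree: write a modified copy elsewhere.
        new_tree = dict(self.blocks[self.root])
        new_tree.update(updates)
        new_id = self._next_id
        self._next_id += 1
        self.blocks[new_id] = new_tree
        # 2. A real filesystem flushes here (write barrier) before step 3.
        if crash_before_root_switch:
            return  # new block is orphaned; the old root is still intact
        # 3. One atomic step: point the superblock at the new root.
        self.root = new_id


store = ToyCowStore()
store.commit({"a": 1})
store.commit({"a": 2, "b": 3}, crash_before_root_switch=True)
print(store.read())  # {'a': 1}: the interrupted commit is simply lost
```

The worst case in this model is losing the last transaction, which is the behaviour both sides of the discussion expect; anything worse implies a bug or hardware that ignored a flush.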
In my case it was a bug in btrfs that just destroyed everything. It didn't lose one or two commits; it lost 500+ commits and totally destroyed its root tree.
Btrfs (in its current state, not in our imagination) is either unstable, buggy, badly designed and destroys data, or it is perfectly safe and always keeps your data safe. Choose one or the other. I don't understand how you can claim both at the same time.
That theoretical consistency comes from the same source: the use of a B-tree. Btrfs has that as well. In theory the filesystem itself is always consistent.
Also, in case you either forgot or didn't see it: https://www.archlinux.org/news/data-cor … d-is-used/
[edit] Semi-related: https://www.phoronix.com/scan.php?page= … -Linux-4.3
Last edited by nstgc (2015-09-15 22:35:12)
nstgc wrote: That theoretical consistency comes from the same source: the use of a B-tree. Btrfs has that as well. In theory the filesystem itself is always consistent.
Also, in case you either forgot or didn't see it: https://www.archlinux.org/news/data-cor … d-is-used/
[edit] Semi-related: https://www.phoronix.com/scan.php?page= … -Linux-4.3
Then it must be the way btrfs implements it, since it did exactly the opposite of keeping my data consistent and safe.
Also, I didn't use discard at all. I used RAID5 on 3 disks with the mount options compress and space_cache.
BTW, I finished testing my "failed" hard disk with over 9 hours of badblocks (reading and writing the whole time) and not a single error came up; SMART health and SMART self-tests also came out perfectly healthy. This might mean it happened entirely due to a btrfs software bug and had nothing to do with the hardware.
Last edited by Nektarios (2015-09-15 22:41:56)
Nektarios wrote: Then it must be the way btrfs implements it, since it did exactly the opposite of keeping my data consistent and safe.
It didn't break your data. It simply notified you that the journal was not up to date, and the volume needs checking. Which is no different from ext4 or any other filesystem telling you to run fsck, except for btrfs you do not run fsck but do this. What you've run into is a case of user not reading documentation on 1) RAID5/6 being experimental 2) how to recover from a corrupted journal after a hard shutdown (which YOU caused!)
You can't blame btrfs for any data lost in this case.
Last edited by Soukyuu (2015-09-15 22:59:07)
Soukyuu wrote:
Nektarios wrote: Then it must be the way btrfs implements it, since it did exactly the opposite of keeping my data consistent and safe.
It didn't break your data. It simply notified you that the journal was not up to date, and the volume needs checking. Which is no different from ext4 or any other filesystem telling you to run fsck, except for btrfs you do not run fsck but do this. What you've run into is a case of user not reading documentation on 1) RAID5/6 being experimental 2) how to recover from a corrupted journal after a hard shutdown (which YOU caused!)
You can't blame btrfs for any data lost in this case.
It destroyed data because there were inconsistencies between the trees and the commits, and it refused to mount. Therefore I couldn't get my data back exactly as it was. In my book, that is called destroyed data.
And btrfs was designed to be consistent on disk, especially in hard-shutdown cases. In theory you should only lose the last few transactions; it should never destroy the tree or fail to mount. That could only happen with a multi-disk hardware problem.
Despite all of that, yes, I was at fault for using an experimental feature. I don't claim otherwise.
Last edited by Nektarios (2015-09-16 14:54:10)
Why do you think it destroyed the tree? The "can't mount" is a safeguard against mounting with an inconsistent journal. It's a bit more convoluted than on other filesystems, but essentially you just have to run the equivalent of fsck yourself to get it back on track. I imagine that in the future it will be made automatic as on other filesystems; for now it's clearly a disadvantage of btrfs. I've had my share of hard shutdowns myself, by the way, and so far nothing broke, not even the journal. Maybe I was lucky, or maybe you were unlucky, I don't know. The only reason I reacted to your post the way I did is that it seemed to imply btrfs is unstable and unusable, which is not the case.
edit: out of curiosity, what exactly did you lose? Files or metadata? How old were they? You were pretty vague on that part.
Last edited by Soukyuu (2015-09-16 16:11:22)
Nektarios, the news page I linked to was a warning for users of ext4 and mdadm; btrfs users were not affected. I, for instance, have TRIM/discard enabled on a RAID0 btrfs array and was fine. My point is that stuff happens regardless. Btrfs is in flux, so it is to be expected that occasionally bad things will happen. This winter I had some serious issues with btrfs that resulted in me losing 2 hours' worth of data. The issue, as far as I can tell, was that the timestamps on or in (still not sure) the metadata didn't match the redundant data. This happened several times, and I was able to fix it every time but one, when I ended up losing 2 hours' worth of data. If I had been using mdadm, would I have had the same problem? Probably, since the cause was that data was written to one disk but not to another due to a hardware failure. mdadm would have just copied over one block or the other. Since no drive actually failed (I'm still not sure of the cause, but I think a SATA port is defective), mdadm wouldn't have been able to know which copy to use and could have used the wrong one. In such a case you could have undetected file corruption. With btrfs it said "one of these things is not like the other, one of these things doesn't belong". It may not have known which one was correct, but it knew something was wrong, allowing me, the wise and powerful administrator (well, maybe not wise, but I have power), to intervene.
I am very happy with how btrfs handled that situation. It's true that with a different set up, I may not have had any issue at all, but I doubt it.
The point nstgc makes is also the point raised in the article I linked to before. Though if nstgc had RAID1 instead of RAID0, btrfs would have said "one of these things is not like the other, and it's this one! Let me fix that for you..." because of the checksumming and redundant data. Of course, ZFS is capable of this as well. That's what both of those filesystems have over mdadm + x.
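The difference described here can be sketched as follows: when two mirror copies disagree, plain mdadm RAID1 has nothing to tell it which one is right, whereas a stored checksum identifies the bad copy, which can then be rewritten from the good one (roughly what a btrfs scrub does on RAID1/10). A minimal Python sketch, assuming the checksum is stored separately from the data; btrfs uses crc32c by default, and `zlib.crc32` here is plain CRC-32 as a stand-in. `read_mirrored` is hypothetical:

```python
import zlib


def checksum(data: bytes) -> int:
    # Stand-in: btrfs defaults to crc32c; zlib.crc32 is plain CRC-32.
    return zlib.crc32(data)


def read_mirrored(copies, expected_csum):
    """Return good data from a list of mirror copies, repairing bad ones in place.

    Without expected_csum (plain mirroring) there is no way to tell which of
    two differing copies is correct; with it, the bad copy is identifiable.
    """
    good = next((c for c in copies if checksum(c) == expected_csum), None)
    if good is None:
        raise OSError("all copies fail their checksum: unrecoverable")
    for i, c in enumerate(copies):
        if checksum(c) != expected_csum:
            copies[i] = good  # "self-heal": overwrite the bad mirror
    return good


original = b"important data"
mirrors = [b"importent data", original]  # one copy silently corrupted
print(read_mirrored(mirrors, checksum(original)))  # b'important data'
```

With RAID0 there is no second copy to fall back on, so the same check can only report the error, which matches nstgc's experience.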
No, the RAID0 I mentioned was unaffected. It was my OS volume with RAID10 that was hit. It did not automatically fix it, but I was able to do damage control. The errors I was getting told me that something was off. Running a scrub made btrfs fix it, presumably using the most recently dated copy. I was still able to identify the affected subvolume, though I couldn't narrow down the exact files hit.