Ted released a patch in this post which he believes may fix the issue. He goes on to write, "...we know that my patch definitely restores the behaviour previous to commit eeecef0af5, so it can't hurt, but we do want to make 100% sure that it really fixes the problem."
I have patched this into 3.6.3 just fine.
Last edited by graysky (2012-10-24 19:06:38)
CPU-optimized Linux-ck packages @ Repo-ck • AUR packages • Zsh and other configs
Do you mean you patched it into your Linux-CK kernel repo you maintain?
@headkase - No. I haven't updated the AUR or the repo with Ted's patch, which restores the behaviour from before commit eeecef0af5.
It should be fine since, in Ted's own words, "...we know that my patch definitely restores the behaviour previous to commit eeecef0af5, so it can't hurt, but we do want to make 100% sure that it really fixes the problem." But I am still reluctant to push it since this seems to be an evolving situation... perhaps I'll change my mind. I dunno.
Last edited by graysky (2012-10-24 19:26:34)
Ok, thank you for the clarification, graysky.
Consulting higher powers
https://lkml.org/lkml/2012/10/23/690
Ted Ts'o is pushing a new patch, but beware of the latest kernel and the changes that were backported.
Please search before posting: https://bbs.archlinux.org/viewtopic.php?id=151341
--Deleted due to thread merge--
Last edited by graysky (2012-10-24 23:18:46)
Is 3.5.6 also affected by this bug or did the kernel developers backport it to 3.5.7 only? If 3.5.6 is also bad, I guess I'd need to downgrade even further.
Janarto's thread merged.
There's no such thing as a stupid question, but there sure are a lot of inquisitive idiots !
Ah, now it looks like I double posted ;p
Sorry for a maybe obvious question but - what would the symptoms be? Would you just not be able to boot? Or would the checking fail? Or would you not even realise what has happened until later?
bitcoin: 1G62YGRFkMDwhGr5T5YGovfsxLx44eZo7U
Your filesystem would be corrupted with unpredictable results. It might be recovered and you might have lost your high-resolution scan of the Mona Lisa for good.
Janarto's thread merged.
Thanks, sorry for the inconvenience
Is 3.5.6 also affected by this bug or did the kernel developers backport it to 3.5.7 only? If 3.5.6 is also bad, I guess I'd need to downgrade even further.
3.5.6 does not have said commit "jbd2: don't write superblock when if its empty", therefore I would assume it is fine. If you look at the git log [1], the tag for 3.5.6 is before that commit.
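For anyone who wants to repeat that check locally, here is a minimal sketch. It assumes you have a clone of the stable kernel tree with a tag such as `v3.5.6`; the function names and the `repo` parameter are made up for illustration. It matches on the commit subject rather than the hash, on the assumption that a backported commit may carry a different hash than the mainline one.

```python
import subprocess

# Subject line as quoted above; the odd wording is part of the commit title.
SUBJECT = "jbd2: don't write superblock when if its empty"


def has_commit(subjects, subject=SUBJECT):
    """Return True if any line in `subjects` equals `subject` (ignoring outer whitespace)."""
    return any(line.strip() == subject for line in subjects)


def subjects_up_to(tag, repo="."):
    """Return the subject line of every commit reachable from `tag`.

    `repo` is a hypothetical path to a local kernel git clone.
    """
    result = subprocess.run(
        ["git", "-C", repo, "log", "--format=%s", tag],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.splitlines()
```

With such a clone in place, `has_commit(subjects_up_to("v3.5.6", repo="/path/to/linux"))` should come back `False` if the tag really predates the commit.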
Thanks, mcover! I suppose I'll stick with 3.5.6 in that case until 3.7 comes out. I had originally downgraded from 3.6 because of network issues, which may have been entirely unrelated to the 3.6 kernel of course. But I'm still glad I reverted early. Better safe than sorry where the filesystem is concerned. I've seen quite a few hard drives that were totaled by ReiserFS 3, so I was really hoping EXT4 would prove rock-solid by comparison.
It looks like my original analysis may not have been correct. At least, Eric and I haven't been able to figure out a way to trigger the problem based on my hypothesis of what had been going wrong. Still, the commit in question *does* change things, and so it's still the most likely culprit. (There were no ext4-related changes between v3.6..v3.6.1 and v3.6.2..v3.6.3, and I've looked at all of the changes between v3.6.2 and v3.6.3; all of the other changes look innocuous.) I have a patch (sent around 1:23 am Eastern on Wed., Oct. 24th to the ext4 list on the relevant mail thread) which should revert the problematic change in behavior, as well as put in a check which looks for the original conditions which might have triggered the patch, and prints a warning plus a stack trace so we can really understand what is going on. I don't want to consider this fixed until we have a reproduction case, so we can state with 100% certainty that we understand how it was triggered, and so we know that the proposed patch really does fix things.
That being said, please note that Fedora 17 is apparently on 3.6.2, and so far we only have two users who have reported the problem (or more specifically, both have reproduced file system corruptions with very similar symptoms, one running v3.6.2 and one running v3.6.3). The fact that they have reported the problem on very different hardware (one using a USB stick, the other using a Software RAID-5 setup), means it's not likely a hardware induced problem. However, this could potentially just be bad luck, since the fs corruption that was reported could have been explained by a random hardware glitch. With two users reporting it, though we have to treat it as potentially a real bug, and so I've gone back and re-audited all of the ext4 related commits that went into the v3.6.x stable kernel series.
If you think you have a related, similar bug, please check which kernel version you are using, and get the EXT4 error messages from the syslogs, and report it to me and the ext4 list. And if you can reproduce it reliably, I definitely want to hear from you. :-)
Thanks!!
-- Ted
FYI.
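If you want to gather what Ted asks for in one go, the following is a rough sketch, not an official tool. It assumes your kernel messages end up in a plain-text log file (the path depends on your syslog setup) and that ext4 complaints contain the strings "EXT4-fs error" or "EXT4-fs warning". The function names are made up for illustration.

```python
import platform


def ext4_messages(lines):
    """Keep only the log lines that look like ext4 errors or warnings."""
    wanted = ("EXT4-fs error", "EXT4-fs warning")
    return [line.rstrip("\n") for line in lines if any(w in line for w in wanted)]


def build_report(log_lines):
    """Put the running kernel version on top of the matching log lines."""
    header = "Kernel: " + platform.release()
    return "\n".join([header] + ext4_messages(log_lines))


def report_from_file(log_path):
    """Read a log file (path supplied by the caller) and build the report."""
    with open(log_path, errors="replace") as fh:
        return build_report(fh)
```

Calling `print(report_from_file(...))` with the path to your own kernel log gives you something you can paste into a mail to the ext4 list.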
Will this (possible) bug affect all mounted ext4 filesystems or just root? I ask because my / is btrfs but my /home is ext4. Others might be in a similar position. Time to back up.
When I woke up, my system was mostly unresponsive, and when I tried to log in on a VT I got an I/O error for /dev/sda.
A hard reboot and the journal recovery at startup made that go away, but I think it must be because of this.
As there is nothing in logs, how do you check for corrupted data?
There shouldn't be any reason to learn more editor types than emacs or vi -- mg (1)
[You learn that sarcasm does not often work well in international forums. That is why we avoid it. -- ewaller (arch linux forum moderator)
Luckily, I'm sticking with 3.5.6 because of the compcache (zram) issue.
Just waiting for both bugs to be fixed.
PGP key: 30D7CB92
Key fingerprint: B597 1F2C 5C10 A9A0 8C60 030F 786C 63F3 30D7 CB92
I don't power down my laptop anyway. Tried btrfs a month ago but the rollback feature did not work quite right following the wiki.
Switching to linux-lts was a simple workaround for me for the time being.
If I have the gift of prophecy and can fathom all mysteries and all knowledge, and if I have a faith that can move mountains, but have not love, I am nothing. 1 Corinthians 13:2
My shutdowns with systemd and linux-lts are much, much slower than with 3.6.2. Any chance you are on systemd and have slower shutdowns with the LTS kernel?
It seems it's not as scary as we first thought.
Yes, filesystem corruption is a really nasty prospect and it's a good thing this problem has been identified. Thankfully, though, it seems very hard to get bitten by this.