#1 2009-03-20 16:16:10

Aidenn
Member
Registered: 2006-03-20
Posts: 57

[SOLVED - B0RKED DISK] Yet Another EXT4 Corruption Thread

So, without further ado, my rig: Intel e7400, 4GB RAM, WD Caviar Blue 500GB SATA2, running fully updated Arch64/testing.

Note that I also run a P4 2.4 GHz, 2GB RAM with a Seagate 120GB disk, on fully updated Arch32/testing with EXT4, and it has no problems whatsoever.

My partitions are as follows:

sda1-sda3: Windows 7 beta
sda5: / and bootable, EXT4
sda6: swap
sda7: /home, EXT4
sda8: /srv, also EXT4

The disk is brand new. I did a low-level check and everything was fine: no bad blocks, nothing. W7 works fine too.

I was running the machine happily for a while, but yesterday during booting I lost my sda7. The journal went somewhere and never came back. So, I rebuilt it with tune2fs and everything was fine.

Then a reboot came (with shutdown -r) and the journal was lost yet again (fsck wouldn't auto-rebuild it because it complained of a short read of journal superblock).

I rebuilt it yet again using tune2fs.

I wanted to check if this is reboot-related, so I shutdown -r again.

sda7 came up fine, as did sda5 and sda8. However, after logging in I got this:

EXT4-fs error (device sda5): _ext4_get_inode_loc: <5>sd 0:0:0:0: [sda] 976773168 512-byte hardware sectors (500 GB/465 GiB)

So I did another reboot and then all hell broke loose.

Bootup said:

EXT4-fs error (device sda5): _ext4_get_inode_loc: <6>ata1: EH complete
EXT4-fs error (device sda5): _ext4_get_inode_loc: unable to read inode block - inode=125441, block=8470

And fsck.ext4 -p /dev/sda5 said:

error reading block 8398 blah blah short read blah (didn't write it down completely)

There was data loss this time, on sda5 no less. I lost a bit of pacman's database and a couple of /usr/bins (they went to lost+found after a thorough fsck.ext4 -vfy /dev/sda5 with ignoring and rewriting. Note that fsck.ext4 -p failed, I had to do it by hand).

After a long pacman -Sdf $(cat ./pkglist) (I really wouldn't want to do it manually ;) the system was working again. I also lost my murrine-svn from AUR and had to rebuild it.

So, I rebooted again (I know, I'm inviting it ;) and this time I noticed that sda5 DIDN'T remount read-only; it said / was busy. After the reboot sda5's journal went to /dev/null (though this time fsck rebuilt it automatically, there was no need for tune2fs).

So, in short... what the heck? I don't get it at all. Maybe someone here will shed some light on this?

Last edited by Aidenn (2009-03-26 14:51:26)

#2 2009-03-22 11:16:24

Edmond
Member
Registered: 2008-10-01
Posts: 17

Re: [SOLVED - B0RKED DISK] Yet Another EXT4 Corruption Thread

I cannot help you with your problem, but it's pretty terrifying to hear of yet another data loss with ext4. I lost data too, without a system crash or anything. And half an hour ago, I fsck'd /home, just out of curiosity, and swoosh, half a directory was dumped in lost+found. I think I will have to downgrade to ext3. :(

Last edited by Edmond (2009-03-22 11:17:05)

#3 2009-03-22 12:57:04

Aidenn
Member
Registered: 2006-03-20
Posts: 57

Re: [SOLVED - B0RKED DISK] Yet Another EXT4 Corruption Thread

I did a couple of reboots since then, but I'm cautious. I wrote a script that unmounts sda7 and sda8 BEFORE everything else happens and inserted a pause between remounting root and the other shutdown scripts (and another one afterwards, so I can verify it remounted correctly). Maybe my rig is simply too fast, since it never happened on my P4 with EXT4. Now it's all fine, no corruption. Of course, I'm still wary of the issue, but for now it works nicely.

#4 2009-03-22 13:07:17

Mektub
Member
From: Lisbon /Portugal
Registered: 2008-01-02
Posts: 647

Re: [SOLVED - B0RKED DISK] Yet Another EXT4 Corruption Thread

Aidenn wrote:

I did a couple of reboots since then, but I'm cautious. I wrote a script that unmounts sda7 and sda8 BEFORE everything else happens and inserted a pause between remounting root and the other shutdown scripts (and another one afterwards, so I can verify it remounted correctly). Maybe my rig is simply too fast, since it never happened on my P4 with EXT4. Now it's all fine, no corruption. Of course, I'm still wary of the issue, but for now it works nicely.

And how about mounting the file systems with the nodelalloc option ?

This is mentioned in another thread. That's what I am using, hoping that it helps, after having a strange corruption on my laptop.
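
For anyone trying this: nodelalloc goes in the options field of the relevant /etc/fstab entry (or on the mount command line with -o). A rough way to check afterwards that it actually took effect is to look at /proc/mounts. The little helper below is only a sketch; the function name is made up for illustration, and it assumes the kernel lists nodelalloc among the active options when delayed allocation is disabled.

```python
def ext4_without_nodelalloc(mounts_text):
    """Return mount points of ext4 filesystems whose options lack nodelalloc.

    `mounts_text` is expected to look like the contents of /proc/mounts:
    device, mount point, filesystem type, comma-separated options, ...
    """
    missing = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point, fs_type, options = fields[1], fields[2], fields[3]
        if fs_type == "ext4" and "nodelalloc" not in options.split(","):
            missing.append(mount_point)
    return missing
```

Feed it the text read from /proc/mounts; any mount point it returns is an ext4 filesystem still running with delayed allocation.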

Mektub


#5 2009-03-22 13:59:12

Aidenn
Member
Registered: 2006-03-20
Posts: 57

Re: [SOLVED - B0RKED DISK] Yet Another EXT4 Corruption Thread

I didn't, but isn't that fix for people experiencing hangups with proprietary graphics drivers or something like that? I remember reading it had a big performance hit too. I'll try it when it happens again though.

#6 2009-03-22 14:10:18

Mektub
Member
From: Lisbon /Portugal
Registered: 2008-01-02
Posts: 647

Re: [SOLVED - B0RKED DISK] Yet Another EXT4 Corruption Thread

Aidenn wrote:

I didn't, but isn't that fix for people experiencing hangups with proprietary graphics drivers or something like that? I remember reading it had a big performance hit too. I'll try it when it happens again though.

It's not graphics-related, and yes, there is a performance hit, but not that big.

Take a look at:


http://bbs.archlinux.org/viewtopic.php?id=67884
http://bbs.archlinux.org/viewtopic.php?id=67704
http://bbs.archlinux.org/viewtopic.php?id=67306
http://bbs.archlinux.org/viewtopic.php?id=59654


Mektub


#7 2009-03-22 15:01:42

Skripka
Member
From: 2X1280X1024
Registered: 2009-02-19
Posts: 555

Re: [SOLVED - B0RKED DISK] Yet Another EXT4 Corruption Thread

Aidenn wrote:

I didn't, but isn't that fix for people experiencing hangups with proprietary graphics drivers or something like that? I remember reading it had a big performance hit too. I'll try it when it happens again though.

No, it *is* for just this kind of situation.

The above happens because most software is not currently written with Ext4's delayed allocation scheme in mind, and as a result, when an Ext4 partition is shut down improperly, bad things happen, because said software does not expect delayed allocation.
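
To make that concrete, here is a minimal sketch of the pattern usually recommended for software that wants to survive an unclean shutdown with delayed allocation: write to a temporary file, fsync it, and only then rename it over the original. The function name is hypothetical and this is not taken from any particular program, just an illustration of the idea.

```python
import os
import tempfile


def atomic_write(path, data):
    """Replace `path` with `data` (bytes) so a crash leaves the old or the new file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory so the rename
    # stays within one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            # Force the data blocks to disk before the rename happens.
            os.fsync(tmp.fileno())
        os.replace(tmp_path, path)
    except BaseException:
        # Remove the temporary file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # Sync the directory as well so the rename itself reaches the disk.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Programs that skip the fsync and just rename are the ones that can end up with empty files after a crash. Note that the temporary file gets mkstemp's restrictive default permissions, so a real program might want to copy the original file's mode over.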

#8 2009-03-22 15:19:09

Aidenn
Member
Registered: 2006-03-20
Posts: 57

Re: [SOLVED - B0RKED DISK] Yet Another EXT4 Corruption Thread

How about sync && shutdown -h then?

#9 2009-03-22 16:26:15

Ranguvar
Member
Registered: 2008-08-12
Posts: 2,544

Re: [SOLVED - B0RKED DISK] Yet Another EXT4 Corruption Thread

shutdown should work fine, because it should unmount the filesystem. When the filesystem is told to be unmounted, it writes all data. The delayed allocation problem occurs when the system is not cleanly shut down.

#10 2009-03-22 16:41:42

Aidenn
Member
Registered: 2006-03-20
Posts: 57

Re: [SOLVED - B0RKED DISK] Yet Another EXT4 Corruption Thread

See, the problem is my system was ALWAYS cleanly shut down. That's what baffles me.

#11 2009-03-22 20:43:02

Gonzakpo
Member
Registered: 2008-05-17
Posts: 45

Re: [SOLVED - B0RKED DISK] Yet Another EXT4 Corruption Thread

Here's an explanation of the problem written by the ext4 developer Theodore Ts'o: https://bugs.edge.launchpad.net/ubuntu/ … omments/45

#12 2009-03-22 20:49:54

skottish
Forum Fellow
From: Here
Registered: 2006-06-16
Posts: 7,942

Re: [SOLVED - B0RKED DISK] Yet Another EXT4 Corruption Thread

Aidenn wrote:

See, the problem is my system was ALWAYS cleanly shut down. That's what baffles me.

Let's stay focused on these words. Aidenn has stated more than once that the system is being shut down properly. The other 135 threads relating to ext4 are different than this one.

#13 2009-03-22 22:24:52

Aidenn
Member
Registered: 2006-03-20
Posts: 57

Re: [SOLVED - B0RKED DISK] Yet Another EXT4 Corruption Thread

Yeah, I'm mostly worried about that / remounting which stated that / was busy. I know EXT4 isn't ready for prime time yet, but aside from this one accident, it has performed exceedingly well for me.

-- edit --

False alarm, guys. It's my disk failing. Did another low-level check and it's full of bad sectors now. And every time I check it new ones appear. Sorry to mislead you.

Last edited by Aidenn (2009-03-23 15:47:18)

#14 2009-03-23 21:21:43

Edmond
Member
Registered: 2008-10-01
Posts: 17

Re: [SOLVED - B0RKED DISK] Yet Another EXT4 Corruption Thread

Aidenn wrote:

False alarm, guys. It's my disk failing. Did another low-level check and it's full of bad sectors now. And every time I check it new ones appear. Sorry to mislead you.

What do you mean by "low-level check"? SMART test?

#15 2009-03-23 22:31:14

Aidenn
Member
Registered: 2006-03-20
Posts: 57

Re: [SOLVED - B0RKED DISK] Yet Another EXT4 Corruption Thread

Yeah, and the extended test from the WD Data Lifeguard Diagnostics suite.

It seems that the drive was failing from the beginning, but was using spare sectors to mask the increasing number of bad sectors and ran out of spares recently. Anyway, now the drive won't even boot, making lots of weird noises during POST and kernel probing (ending with failure).
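
For anyone who wants to keep an eye on this before a drive gets that far: the spare-sector usage shows up in the SMART attribute table (the kind of table that smartctl -A from smartmontools prints). Below is a rough sketch that pulls the raw values out of such a table and flags the attributes related to remapped or unreadable sectors. The function names and the "anything above zero is suspicious" rule are my own assumptions, not an official threshold, and it assumes the usual ten-column layout with the raw value in the last column.

```python
def parse_smart_attributes(report):
    """Map attribute name -> integer raw value from a SMART attribute table."""
    attributes = {}
    for line in report.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least ten columns;
        # the tenth column is the raw value.
        if len(fields) >= 10 and fields[0].isdigit() and fields[9].isdigit():
            attributes[fields[1]] = int(fields[9])
    return attributes


def looks_like_failing(attributes):
    """Heuristic: any remapped, pending, or uncorrectable sectors are a warning sign."""
    watched = (
        "Reallocated_Sector_Ct",
        "Current_Pending_Sector",
        "Offline_Uncorrectable",
    )
    return any(attributes.get(name, 0) > 0 for name in watched)
```

A count that keeps growing between checks, like the new bad sectors appearing on every scan here, is a stronger signal than any single reading.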

Damn, I never thought I'd see a brand new borked drive. I'm having it replaced tomorrow.

-- edit --

As suspected, there have been no more problems since I replaced the drive, and I did extensive reboot testing. It's not important, but I thought a proper closure would be nice. Thanks for the support.

Last edited by Aidenn (2009-03-26 14:49:57)
