
#1 2013-03-25 04:21:31

straykat
Member
From: Queensland, Australia
Registered: 2009-12-06
Posts: 60

[Unsolved] Kernel linux 3.8.4-1 fails with fsck boot error on raid 0

The stock linux-3.8.4-1 (x86_64) fails to boot with the following unrecoverable error:

Loading ../initramfs-linux.img......ready.
Probing EDD (edd=off to disable)... ok
early console in decompress_kernel

Decompressing Linux... Parsing ELF... done.
Booting the kernel.
:: running early hook [udev]
:: running hook [udev]
:: Triggering uevents...
Waiting 10 seconds for device /dev/md127 ...
:: performing fsck on '/dev/md127' ...
fsck.ext2: Invalid argument while trying to open /dev/md127
/dev/md127:
The superblock could not be read or does not describe a correct ext2
filesystem. If the device is valid and it really contains an ext2
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193 <device>
    
ERROR: fsck failed on '/dev/md127'
:: mounting '/dev/md127' on real root
mount: you must specify the filesystem type
You are now being dropped into an emergency shell.
sh: can't access tty; job control turned off
[rootfs /]#

My hooks are:

HOOKS="base udev autodetect modconf block mdadm_udev filesystems keyboard fsck"

I am, however, able to boot linux-lts & my own custom kernel, which is built from the ABS linux package.

The / partition is on a raid 0 with an ext4 filesystem. fsck -f /dev/md127 comes back clean & the raid & two hard drives are healthy.
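
When the boot drops to the emergency shell like this, one thing worth checking is whether the array was actually assembled before fsck ran. A minimal sketch, assuming the kernel's usual /proc/mdstat layout (a line such as "md127 : active raid0 ..."); the function name md_is_active is made up for illustration:

```shell
# md_is_active FILE NAME
# Succeeds if NAME (e.g. md127) is listed as active in an mdstat-style FILE.
md_is_active() {
    grep -q "^$2 : active" "$1" 2>/dev/null
}

# Usage against the live system:
if md_is_active /proc/mdstat md127; then
    echo "md127 is assembled"
else
    echo "md127 is missing or inactive"
fi
```

If the array is absent or inactive at that point, the problem probably lies with assembly rather than with the filesystem itself.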

Last edited by straykat (2013-07-11 09:27:45)


#2 2013-07-11 09:21:15

xanb
Member
Registered: 2012-07-24
Posts: 418

Re: [Unsolved] Kernel linux 3.8.4-1 fails with fsck boot error on raid 0

Any progress on that?
I have the same issue with RAID 1.
I don't know about your case, but in mine the error occurs 'randomly': sometimes it happens, sometimes it doesn't.
I suspect the kernel does not detect the RAID partitions in time, but I'm not sure.
Perhaps this [https://bugs.archlinux.org/task/32558?project=1&cat[0]=31&string=mdadm] bug could help us.

* I run syslinux:

MENU LABEL Arch Linux
LINUX ../vmlinuz-linux
APPEND root=/dev/md127 ro md=126,/dev/sdb2,/dev/sdc2 md=127,/dev/sdb1,/dev/sdc1 md=3,/dev/sdb4,/dev/sdc4 vga=773
INITRD ../initramfs-linux.img

* In mkinitcpio.conf:

MODULES="ext4 raid1"
# The RAID 1 arrays are ext4 formatted
HOOKS="base udev autodetect modconf block mdadm mdadm_udev filesystems keyboard fsck"

* In /etc/mdadm.conf:

DEVICE /dev/sdb*
DEVICE /dev/sdc*
ARRAY /dev/md127 metadata=0.90 UUID=... devices=/dev/sdb1,/dev/sdc1
ARRAY /dev/md126 metadata=0.90 UUID=... devices=/dev/sdb2,/dev/sdc2
ARRAY /dev/md3 metadata=0.90 UUID=... devices=/dev/sdb4,/dev/sdc4

Hints?
* The order of the hooks?
* Use an absolute path for vmlinuz?

Thanks,
Xan.

Last edited by xanb (2013-07-11 09:26:26)


Owning one OpenRC (artoo way) and other three systemd machines


#3 2013-07-11 09:53:36

straykat
Member
From: Queensland, Australia
Registered: 2009-12-06
Posts: 60

Re: [Unsolved] Kernel linux 3.8.4-1 fails with fsck boot error on raid 0

xanb,
        My apologies, but after months of no response to my post or the bug report, I had forgotten about this thread.
Since opening it I have built a new multi-core rig without RAID, so I am no longer able to reproduce this bug.

The closed bug report link: https://bugs.archlinux.org/task/34568

This may help you?

I have marked this as unsolved as I am aware of others that have this bug.


#4 2013-07-11 10:24:22

xanb
Member
Registered: 2012-07-24
Posts: 418

Re: [Unsolved] Kernel linux 3.8.4-1 fails with fsck boot error on raid 0

Well, that's a pain.
The curious thing is that I previously had Ubuntu installed on this system with the same configuration and all was fine. I want to switch to Arch because I want to get away from upgrading from release to release, and I have had a positive experience on my laptops... but on Arch I have problems.

I hope someone could help (me/us) ;-)



#5 2013-07-11 12:00:57

straykat
Member
From: Queensland, Australia
Registered: 2009-12-06
Posts: 60

Re: [Unsolved] Kernel linux 3.8.4-1 fails with fsck boot error on raid 0

xanb,
        Looking at your hooks, you have two hooks for RAID in your mkinitcpio.conf: mdadm & mdadm_udev.

mdadm is now redundant & not advisable.
mdadm_udev is the current hook for RAID.

Remove the mdadm hook & then either reinstall the kernel (pacman -S linux) or regenerate the initramfs from your terminal of choice: mkinitcpio -p linux.
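
A quick way to see which RAID hooks a config currently names might look like the sketch below. It assumes the HOOKS="..." string form shown in this thread; raid_hooks is a hypothetical helper, not part of mkinitcpio:

```shell
# raid_hooks FILE
# Print the mdadm-related hooks found on the last HOOKS= line of FILE, one per line.
raid_hooks() {
    grep -E '^[[:space:]]*HOOKS=' "$1" | tail -n 1 \
        | sed -e 's/^[^=]*=//' -e 's/[()"]//g' \
        | tr -s ' ' '\n' | grep -E '^mdadm(_udev)?$'
}

# Usage: raid_hooks /etc/mkinitcpio.conf
```

If both mdadm and mdadm_udev are printed, the config still carries the duplicate hook described above.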

Some useful wiki reading:
https://wiki.archlinux.org/index.php/RAID
https://wiki.archlinux.org/index.php/Mkinitcpio


#6 2013-07-11 12:13:18

xanb
Member
Registered: 2012-07-24
Posts: 418

Re: [Unsolved] Kernel linux 3.8.4-1 fails with fsck boot error on raid 0

Mmm... thanks, I will do that.

On the other hand, I noticed that blkid and mdadm.conf show different UUIDs. I changed the UUIDs in mdadm.conf to match those from blkid (I read somewhere that the blkid one is the one that matters).

I will count how many fsck errors I receive 8-|



#7 2013-07-11 12:14:31

xanb
Member
Registered: 2012-07-24
Posts: 418

Re: [Unsolved] Kernel linux 3.8.4-1 fails with fsck boot error on raid 0

straykat wrote:

xanb,
        Looking at your hooks, you have two hooks for RAID in your mkinitcpio.conf: mdadm & mdadm_udev.

mdadm is now redundant & not advisable.
mdadm_udev is the current hook for RAID.

Can you update the wiki page of mkinitcpio? It could be useful for other people.

Regards,
Xan.



#8 2013-07-11 12:47:47

straykat
Member
From: Queensland, Australia
Registered: 2009-12-06
Posts: 60

Re: [Unsolved] Kernel linux 3.8.4-1 fails with fsck boot error on raid 0

xanb wrote:

Can you update the wiki page of mkinitcpio?

The mkinitcpio wiki is correct & does in fact say of mdadm_udev: "Upstream prefers this method of assembly."
Also, the Common hooks table "Runtime" box for mdadm_udev states: "This is the preferred method of mdadm assembly (rather than using the above mdadm hook)."

This means that if you use RAID the preferred hook is mdadm_udev. You can still use the older mdadm hook if you want to, but if you break it you get to keep the pieces.

The RAID wiki only refers to the mdadm_udev hook.

Last edited by straykat (2013-07-11 13:15:58)


#9 2013-07-11 13:06:46

straykat
Member
From: Queensland, Australia
Registered: 2009-12-06
Posts: 60

Re: [Unsolved] Kernel linux 3.8.4-1 fails with fsck boot error on raid 0

xanb wrote:

On the other hand, I noticed that blkid and mdadm.conf show different UUIDs. I changed the UUIDs in mdadm.conf to match those from blkid (I read somewhere that the blkid one is the one that matters).

I am clueless when it comes to UUIDs & much prefer the readable "device name", e.g. /dev/sda1 or, for RAID, /dev/md0 etc. That way I know the partition "address" by looking at the "device name", as they are the same.

As far as I am aware, when using the mdadm_udev hook udev does all the heavy lifting, so there is no need to get your hands dirty with mdadm.conf. I guess you could if you wanted to, but I have never needed to.

Last edited by straykat (2013-07-11 13:14:10)


#10 2013-07-11 13:35:38

xanb
Member
Registered: 2012-07-24
Posts: 418

Re: [Unsolved] Kernel linux 3.8.4-1 fails with fsck boot error on raid 0

straykat wrote:

xanb,
        Looking at your hooks, you have two hooks for RAID in your mkinitcpio.conf: mdadm & mdadm_udev.

mdadm is now redundant & not advisable.
mdadm_udev is the current hook for RAID.

Remove the mdadm hook & then either reinstall the kernel (pacman -S linux) or regenerate the initramfs from your terminal of choice: mkinitcpio -p linux.

Some useful wiki reading:
https://wiki.archlinux.org/index.php/RAID
https://wiki.archlinux.org/index.php/Mkinitcpio

No. After checking and removing mdadm, I can't boot at all (I always land in the recovery shell). With mdadm present alongside mdadm_udev, it sometimes boots and sometimes doesn't. I also tried with only mdadm, and it doesn't boot either. It seems mdadm and mdadm_udev help each other.

Xan.

Last edited by xanb (2013-07-11 13:52:43)



#11 2013-07-11 13:39:10

xanb
Member
Registered: 2012-07-24
Posts: 418

Re: [Unsolved] Kernel linux 3.8.4-1 fails with fsck boot error on raid 0

How can I triage this bug?



#12 2013-07-12 00:02:51

straykat
Member
From: Queensland, Australia
Registered: 2009-12-06
Posts: 60

Re: [Unsolved] Kernel linux 3.8.4-1 fails with fsck boot error on raid 0

xanb wrote:

How can I triage this bug?

The place for a bug triage is: https://bugs.archlinux.org/.

For a sanity check, why do you use RAID1 on a laptop? Does the laptop have two (2) hard drives?

Looking at your config files for syslinux & mdadm.conf, you have configurations for both md127, which is RAID0/striped & which your syslinux is configured to boot, & md126, which is RAID1/mirrored. You also have md3 in your configuration? There is no such RAID as md3.

Your RAID also appears to be spread across two of your hard drives, /dev/sdb & /dev/sdc. Where does your first hard drive, /dev/sda, fit into your setup?

Last edited by straykat (2013-07-12 02:35:25)


#13 2013-07-12 09:09:18

xanb
Member
Registered: 2012-07-24
Posts: 418

Re: [Unsolved] Kernel linux 3.8.4-1 fails with fsck boot error on raid 0

straykat wrote:
xanb wrote:

How can I triage this bug?

The place for a bug triage is: https://bugs.archlinux.org/.

For a sanity check, why do you use RAID1 on a laptop? Does the laptop have two (2) hard drives?

Looking at your config files for syslinux & mdadm.conf, you have configurations for both md127, which is RAID0/striped & which your syslinux is configured to boot, & md126, which is RAID1/mirrored. You also have md3 in your configuration? There is no such RAID as md3.

Your RAID also appears to be spread across two of your hard drives, /dev/sdb & /dev/sdc. Where does your first hard drive, /dev/sda, fit into your setup?

Sorry: I meant PC, not laptop.

* md127 is /, md126 is /boot. md3 is a data partition, not yet mounted via fstab.
* sda is another disk with Windows.



#14 2013-07-12 09:47:13

xanb
Member
Registered: 2012-07-24
Posts: 418

Re: [Unsolved] Kernel linux 3.8.4-1 fails with fsck boot error on raid 0



#15 2013-07-12 10:55:41

straykat
Member
From: Queensland, Australia
Registered: 2009-12-06
Posts: 60

Re: [Unsolved] Kernel linux 3.8.4-1 fails with fsck boot error on raid 0

xanb,
        None of your configs make any sense. /boot needs to be on a plain ext4 fs (non RAID). You have /dev/md3 set up as RAID across 2 hard drives in your mdadm.conf, but again, there is no such RAID as md3.

Assuming sdb & sdc hard drives are of the same size, I would recommend your partition system look like this:

/dev/sda1: Windows

/dev/sdb1: /boot. Size: 500 MB give or take depending on your kernels,
/dev/sdc1: empty. Size: equals /dev/sdb1,

/dev/sdb2: RAID0 (md127). Size: rest of disc minus swap partition at the end of the disc,
/dev/sdc2: RAID0 (md127). Size: rest of disc minus swap partition at the end of the disc,

/dev/sdb3: swap. Size: what you would normally allocate divided by 2 (halved),
/dev/sdc3: swap. Size: same as sdb3.

When you have swap across two partitions like this you can, with your /etc/fstab, configure them to either fill one swap first or have them work in parallel.
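
The two behaviours mentioned above map onto the pri= swap option in /etc/fstab. A small sketch, assuming the sdb3/sdc3 layout from this post; the helper name swap_entries and the priority values are illustrative only:

```shell
# swap_entries PRI_B PRI_C
# Print /etc/fstab lines for the two swap partitions with the given priorities.
swap_entries() {
    printf '/dev/sdb3 none swap defaults,pri=%s 0 0\n' "$1"
    printf '/dev/sdc3 none swap defaults,pri=%s 0 0\n' "$2"
}

# Equal priorities: the kernel alternates between both partitions (parallel use).
swap_entries 10 10

# Different priorities: the higher one (sdb3 here) is used until full, then sdc3.
swap_entries 10 5
```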

Now, having said all that, my advice to you is, with only the two hard drives, use sdb as / (root) & swap & sdc as /home. This will spread your I/O across both drives to give you higher throughput without the issues associated with RAID.


#16 2013-07-12 12:16:28

falconindy
Developer
From: New York, USA
Registered: 2009-10-22
Posts: 4,111
Website

Re: [Unsolved] Kernel linux 3.8.4-1 fails with fsck boot error on raid 0

straykat wrote:

xanb,
        None of your configs make any sense. /boot needs to be on a plain ext4 fs (non RAID). You have /dev/md3 set up as RAID across 2 hard drives in your mdadm.conf, but again, there is no such RAID as md3.

And as I mentioned in the bug report (please don't use Flyspray as a support forum), you duplicate your config on the kernel cmdline, and you have both mdadm and mdadm_udev in your hooks. Doing this should (in theory) work, but it also means that your drives are assembled once by udev, and then disassembled and reassembled by mdassemble. It's not a good situation.

You really could just get rid of the /etc/mdadm.conf as well. You aren't doing anything exotic, and the incremental assembly via udev rules should just be able to figure this out without any external guidance.


#17 2013-07-12 14:08:02

xanb
Member
Registered: 2012-07-24
Posts: 418

Re: [Unsolved] Kernel linux 3.8.4-1 fails with fsck boot error on raid 0

straykat wrote:

xanb,
        None of your configs make any sense. /boot needs to be on a plain ext4 fs (non RAID). You have /dev/md3 set up as RAID across 2 hard drives in your mdadm.conf, but again, there is no such RAID as md3.

No. Until recently I used the same configuration (RAID on /boot) with Ubuntu. I have not read any reason why I can't use RAID 1 for /boot. RAID 4, 5, ... are a different matter. Can you give me up-to-date sources for this assertion?


Thanks for all,
Xan.



#18 2013-07-12 14:12:30

xanb
Member
Registered: 2012-07-24
Posts: 418

Re: [Unsolved] Kernel linux 3.8.4-1 fails with fsck boot error on raid 0

falconindy wrote:
straykat wrote:

xanb,
        None of your configs make any sense. /boot needs to be on a plain ext4 fs (non RAID). You have /dev/md3 set up as RAID across 2 hard drives in your mdadm.conf, but again, there is no such RAID as md3.

And as I mentioned in the bug report (please don't use Flyspray as a support forum), you duplicate your config on the kernel cmdline, and you have both mdadm and mdadm_udev in your hooks. Doing this should (in theory) work, but it also means that your drives are assembled once by udev, and then disassembled and reassembled by mdassemble. It's not a good situation.

You really could just get rid of the /etc/mdadm.conf as well. You aren't doing anything exotic, and the incremental assembly via udev rules should just be able to figure this out without any external guidance.

Sorry, falconindy, if it seems I am using Flyspray for support. I simply filed a bug because I think it is one.

* On the other hand, I added md3 to my mdadm.conf.  In the bug it's
* As I answered in the bug, I tried mdadm or mdadm_udev alone, but the error rate is higher

Regards,
Xan.



#19 2013-07-12 15:05:22

falconindy
Developer
From: New York, USA
Registered: 2009-10-22
Posts: 4,111
Website

Re: [Unsolved] Kernel linux 3.8.4-1 fails with fsck boot error on raid 0

Once again, here's what you need to do:

- remove the md= configuration from your kernel commandline
- remove /etc/mdadm.conf
- remove the mdadm hook from your /etc/mkinitcpio.conf
- use the mdadm_udev hook for assembly in early userspace

If, at this point, you're able to replicate some sort of assembly failure, then you need to take this upstream with the MD folks. You are not running anything exotic. I have multiple VMs which I test on a regular basis which have similar setups and are not problematic.
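
The first step in the list above can be sketched as follows. This is a hedged illustration only: strip_md_args is a hypothetical helper that filters a command-line string, and the result would still have to be copied into the APPEND line of the Syslinux config by hand.

```shell
# strip_md_args "KERNEL CMDLINE"
# Print the command line with every md=... token removed.
# Note: tokens are split on whitespace; glob characters in the input are not protected.
strip_md_args() {
    out=""
    for tok in $1; do
        case "$tok" in
            md=*) ;;                          # skip manual array definitions
            *) out="${out:+$out }$tok" ;;     # keep everything else, space separated
        esac
    done
    printf '%s\n' "$out"
}

strip_md_args "root=/dev/md127 ro md=126,/dev/sdb2,/dev/sdc2 md=127,/dev/sdb1,/dev/sdc1 md=3,/dev/sdb4,/dev/sdc4 vga=773"
# prints: root=/dev/md127 ro vga=773
```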


#20 2013-07-12 19:17:52

xanb
Member
Registered: 2012-07-24
Posts: 418

Re: [Unsolved] Kernel linux 3.8.4-1 fails with fsck boot error on raid 0

The configuration that you suggest causes the same error (with the emergency shell) *all* the time. What can I do? Are you sure I should remove mdadm.conf?

Last edited by xanb (2013-07-12 19:18:31)



#21 2013-07-12 20:15:27

WonderWoofy
Member
From: Los Gatos, CA
Registered: 2012-05-19
Posts: 8,414

Re: [Unsolved] Kernel linux 3.8.4-1 fails with fsck boot error on raid 0

Did you regenerate your initramfs?


#22 2013-07-13 01:21:31

straykat
Member
From: Queensland, Australia
Registered: 2009-12-06
Posts: 60

Re: [Unsolved] Kernel linux 3.8.4-1 fails with fsck boot error on raid 0

xanb,
        For sanity to prevail, I am going to suggest the following:

Back up your data

Remove the RAIDs from your 2 linux hard drives. The Arch RAID wiki has instructions on this, or (my preferred method) wipe your two linux drives with dd. If you are not sure about dd, "Parted Magic" has a GUI for it.

Create a new partition table on both of your linux hard drives.

Partition your 2 linux hard drives as follows:
/dev/sdb1 as / (root) with ext4 fs
/dev/sdb2 as swap
/dev/sdc1 as /home with ext4 fs.

Reinstall Arch Linux on / (/dev/sdb1)

I would also recommend reading up in the Arch wiki, Kernel wiki etc on RAID & decide why you would use it on your system & which one would give you any benefit with your current hardware set up. Also learn more on how to create & maintain a RAID on an Arch system.

Keep in mind that KISS is always the way to go with any computer system, as complexity will always increase the number of mistakes & gotchas.


#23 2013-07-13 09:11:30

xanb
Member
Registered: 2012-07-24
Posts: 418

Re: [Unsolved] Kernel linux 3.8.4-1 fails with fsck boot error on raid 0

WonderWoofy wrote:

Did you regenerate your initramfs?

Yes,

mkinitcpio -p linux


#24 2013-07-13 09:15:28

xanb
Member
Registered: 2012-07-24
Posts: 418

Re: [Unsolved] Kernel linux 3.8.4-1 fails with fsck boot error on raid 0

straykat wrote:

xanb,
        For sanity to prevail, I am going to suggest the following:

Back up your data

Remove the RAIDs from your 2 linux hard drives. The Arch RAID wiki has instructions on this, or (my preferred method) wipe your two linux drives with dd. If you are not sure about dd, "Parted Magic" has a GUI for it.

Create a new partition table on both of your linux hard drives.

Partition your 2 linux hard drives as follows:
/dev/sdb1 as / (root) with ext4 fs
/dev/sdb2 as swap
/dev/sdc1 as /home with ext4 fs.

Reinstall Arch Linux on / (/dev/sdb1)

I would also recommend reading up in the Arch wiki, Kernel wiki etc on RAID & decide why you would use it on your system & which one would give you any benefit with your current hardware set up. Also learn more on how to create & maintain a RAID on an Arch system.

Keep in mind that KISS is always the way to go with any computer system, as complexity will always increase the number of mistakes & gotchas.

No. I don't want to behave like an ostrich and hide my head ;-) This is my configuration, and this is what I want. If Arch Linux fails with it, we have a bug. Whether the bug is difficult to triage is another question. Not all bugs are easy to triage, but that's real life. If we always change the configuration, we will never detect new bugs.

Keep in mind that I don't mean to blame anyone; these comments are always meant in good faith.

Regards,
Xan.



#25 2013-07-13 10:10:05

xanb
Member
Registered: 2012-07-24
Posts: 418

Re: [Unsolved] Kernel linux 3.8.4-1 fails with fsck boot error on raid 0

With the help of Daan van Rossum and the reference to this bug, I think I have the solution. With it, I got the emergency shell *only* 1 time in 10 boots (only the first) [see note].

* Remove /etc/mdadm.conf
* tune2fs -L <mylabel> /dev/md127 (this is the RAID 1 array holding /)
* In the Syslinux config, remove the UUID and put

APPEND root=LABEL=<mylabel> ro

(remember, <mylabel> is the label of the root partition)
* In /etc/mkinitcpio.conf, use only mdadm_udev as the RAID hook (before filesystems)
* Re-run mkinitcpio -p linux if necessary
* Reboot

[note] Before removing mdadm.conf, I still received this error. With mdadm.conf removed, I have no errors. I removed mdadm.conf because I saw that the /dev/md* devices I actually have mounted did not match those listed in mdadm.conf.
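
The Syslinux step above could be previewed with something like the sketch below before editing the real file. It is only an illustration: set_root_label is a hypothetical helper, "archroot" stands in for <mylabel>, and the label must not contain a "#" because that character is used as the sed delimiter.

```shell
# set_root_label LABEL FILE
# Print FILE with root=... on every APPEND line rewritten to root=LABEL=<LABEL>.
# The file itself is not modified; other arguments on the line are left untouched.
set_root_label() {
    sed -E "/^[[:space:]]*APPEND[[:space:]]/ s#root=[^[:space:]]+#root=LABEL=$1#" "$2"
}

# Usage (path is an assumption; adjust to where your syslinux.cfg lives):
#   set_root_label archroot /boot/syslinux/syslinux.cfg
```

Printing to stdout rather than editing in place makes it easy to compare the result with the current config first.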

Regards and thanks,
Xan.

PS: If someone finds a better solution, please post it.
@falconindy: perhaps it's an upstream (*vanilla*) bug in Syslinux, because Syslinux should behave the same with UUID and with LABEL. Could you audit the code or inform the main developer? (I don't know how to summarize the technical details.)

