#1 2015-10-19 23:04:03

Teaspoon
Member
Registered: 2012-08-29
Posts: 17

[solved] Grub won't boot off degraded ZFS raid

I built a system last night with a ZFS root on a four-device raidz1 pool, except that I'm missing a cable for powering the fourth disk. To get things up and running while I wait for a chance to get to the shop for the cable, I built the pool using a sparse file as the fourth device and then offlined it. In the archiso environment I can import the pool with no errors, and it shows up in zpool status as degraded-but-usable. I did forget to disable features like hole_birth when creating the pool, which caused some late-night headaches trying to get grub to install, but grub-git from the AUR fixed those, and it now detects the pool and installs OK.
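A minimal sketch of that kind of setup, for anyone wanting to reproduce it. The pool name, image path, size and disk names are placeholders rather than the values from the actual machine, and the script only prints the commands unless DRY_RUN=0 is set. The -f flag is an assumption: zpool normally objects to mixing files and real disks in one vdev.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands rather than running them.
# Pool name, image path, size and disk names are all hypothetical.

run() {
    # Print each command; only execute when DRY_RUN=0 is set explicitly.
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

make_degraded_raidz1() {
    pool=$1; img=$2; size=$3; shift 3   # remaining args: the real disks

    # Sparse placeholder file standing in for the missing disk.
    run truncate -s "$size" "$img"

    # -f: assumed necessary because files and disks are mixed in one vdev.
    # hole_birth is disabled up front because grub had trouble with it.
    run zpool create -f -o feature@hole_birth=disabled \
        "$pool" raidz1 "$@" "$img"

    # Take the placeholder offline so the pool runs degraded.
    run zpool offline "$pool" "$img"
}

make_degraded_raidz1 tank /tmp/fake-disk.img 4T /dev/sda /dev/sdb /dev/sdc
```

The placeholder file should be at least as large as the real disks, since raidz sizes the vdev by its smallest member. Other pool features may need disabling too, depending on the grub version.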

When I try to boot the machine, however, grub pops up "error: couldn't find a necessary member device of multi-device filesystem." and dumps me to rescue mode. I assume this is because only three disks of the four-disk zpool are available... obviously the quick fix is to get my adaptor so that I can set up the fourth disk and let grub find a complete array, but this makes me worry about how painful it'll be to deal with disk failures later on. If grub can't boot off a degraded-but-usable zpool, I won't be able to get into an environment where I can put the replacement disk in the array.

My googlings have revealed others having the same problem with mdraid and btrfs raid1 volumes that are missing a disk, too. What I haven't seen yet is success stories about how they got it running anyway.

Is there a way to tell grub that it's OK to use a degraded raid? The other option I can think of is to have a rescue stick on hand and make sure I keep it up to date with support for my zpool version, but that sounds like exactly the sort of thing that I'd lose or forget to update.

Last edited by Teaspoon (2015-10-19 23:55:01)

#2 2015-10-19 23:33:49

Teaspoon

Re: [solved] Grub won't boot off degraded ZFS raid

I tried to boot using manual commands in grub-rescue, but the "linux" command is not recognised. When I tried "insmod linux" (or "insmod (hd0,1)/@/boot/grub/i386-pc/linux.mod") I got the same "couldn't find a necessary member device of multi-device filesystem" error.

So I guess that attempting to insmod is what's actually throwing these errors. Maybe I could convince grub-install to embed the necessary modules into its image?
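For reference, grub-install does have a --modules option for preloading extra modules into the core image. A sketch of what that invocation might look like is below. The target disk and the module list are guesses (part_gpt and zfs to reach the pool, linux for the kernel loader), and whether this actually gets around the member-device error is untested. The function only prints the command line rather than running it.

```shell
#!/bin/sh
# Dry-run sketch: builds and prints a grub-install command line.
# The module list and target disk are guesses, not tested on this machine.

grub_install_cmd() {
    disk=$1; shift            # remaining args: modules to preload
    # --modules takes one space-separated list of modules to embed in core.img.
    echo "grub-install --target=i386-pc --modules=\"$*\" $disk"
}

grub_install_cmd /dev/sda part_gpt zfs linux
```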

#3 2015-10-19 23:54:50

Teaspoon

Re: [solved] Grub won't boot off degraded ZFS raid

For posterity (because it drives me up the wall when I see other people with problems and can't find the fix!)...

set root=(hd0,gpt1)
set prefix=/@/boot/grub
insmod linux
linux /@/boot/vmlinuz-linux zfs=zpoolNameGoesHere
initrd /@/boot/intel-ucode.img /@/boot/initramfs-linux.img
boot

And we have lift-off!
Today I Learned: grub-rescue might need the prefix set manually before insmod can work.

#4 2016-02-29 23:31:01

Teaspoon

Re: [solved] Grub won't boot off degraded ZFS raid

A lovely bit of thread-necromancy...

Today I figured out that the reason grub was never finding the replacement disk was that the BIOS was configured to skip that port during disk detection. I noticed that "ls" at the grub command line only showed hd0-2, when there should have been an hd3 as well. Until now I've had to manually type commands into grub on this machine after every power outage or kernel upgrade.

I'm assuming the ports were disabled for faster boots, so that POST wouldn't sit waiting for the polling on them to time out.

Let this be a lesson... when a fellow enthusiast donates a spare machine that he'd built for a really specific purpose, reset the BIOS!
