
#1 2012-11-17 20:16:29

mihanson
Member
Registered: 2009-04-27
Posts: 10

[SOLVED] Kernel Panic with NBD RootFS on reboot and shutdown

I'm having a problem with kernel panics on reboot or shutdown using an ext4-formatted NBD rootfs.  The client has shut down or rebooted cleanly only twice in about 200 tries, so I believe it's a service shutdown ordering issue. This is a pure systemd install that was created following the Diskless Network Boot NFS root wiki guide. The clients are dedicated MythTV frontends. I'm using PXE/TFTP/DHCP to serve the kernel to my diskless client, and the NBD rootfs is on the same machine as the PXE/TFTP/DHCP server. My /etc/nbd-server/config looks like this:

[generic]
[frontend]
    exportname = /home/frontend.img
    copyonwrite = true

My pxelinux.cfg/default looks like:

default linux

label linux
kernel vmlinuz-linux
append initrd=initramfs-linux.img root=/dev/nbd0 rootfstype=ext4 nbd_host=192.168.1.2 nbd_name=frontend ip=:::::eth0:dhcp elevator=deadline panic=60

I'm using dnsmasq to provide the DHCP and TFTP services to the client. Its configuration is:
/etc/dnsmasq.conf

# port=0 disables DNS service
port=0
user=dnsmasq
dhcp-range=192.168.1.0,static
dhcp-hostsfile=/etc/dnsmasq.d/hosts
dhcp-optsfile=/etc/dnsmasq.d/options
dhcp-boot=pxelinux.0
enable-tftp
tftp-root=/var/tftp/mythtv-frontend
tftp-lowercase

/etc/dnsmasq.d/hosts

00:01:2E:26:4E:17,br,192.168.1.5,3h
00:01:2E:26:3C:9C,lr,192.168.1.9,3h

/etc/dnsmasq.d/options

option:router,192.168.1.1
option:dns-server,192.168.1.1

The clients work very well when they're up and running, but shutdown and reboot are the problem.  I am using dhcpcd@eth0.service for network configuration on the clients.  I customized the dhcpcd@.service file, most notably to add the -p switch, which keeps the interface up when the dhcpcd service is shut down.
/etc/systemd/system/dhcpcd\@.service

[Unit]
Description=dhcpcd on %I
Wants=network.target
Before=network.target
BindTo=sys-subsystem-net-devices-%i.device
After=sys-subsystem-net-devices-%i.device

[Service]
Type=forking
PIDFile=/run/dhcpcd-%I.pid
ExecStart=/sbin/dhcpcd -A -q -w -p %I

[Install]
Alias=multi-user.target.wants/dhcpcd@eth0.service

I have tried adding DefaultDependencies=false and KillMode=none to dhcpcd@.service, but no change.

My /etc/fstab on the client:

# 
# /etc/fstab: static file system information
#
# <file system>	<dir>	<type>	<options>	<dump>	<pass>
tmpfs		/tmp	tmpfs	nodev,nosuid	0	0
#/dev/nbd0	/	ext4	defaults	0	0
none		/	none
none		/dev/pts devpts	gid=5,mode=620	0	0

# Network Filesystems
192.168.1.2:/var/mythtv	/var/mythtv	nfs	noauto,_netdev,x-systemd.automount,x-systemd.device-timeout=15,users,noatime,timeo=14,intr,proto=tcp,hard,actimeo=0,rsize=1048576,wsize=1048576	0	0

I managed to capture the kernel panic using netcat and a second machine on the same network.

$  nc -u -l -p 6969
[  483.999803] block nbd0: Receive control failed (result -4)
[  484.083319] block nbd0: Attempted send on closed socket
[  484.083404] block nbd0: Attempted send on closed socket
[  484.083410] end_request: I/O error, dev nbd0, sector 4457736
[  484.083431] Aborting journal on device nbd0-8.
[  484.083444] block nbd0: Attempted send on closed socket
[  484.083447] end_request: I/O error, dev nbd0, sector 4456448
[  484.083451] Buffer I/O error on device nbd0, logical block 557056
[  484.083461] JBD2: Error -5 detected when updating journal superblock for nbd0-8.
[  484.169080] end_request: I/O error, dev nbd0, sector 909320
[  484.172374] Buffer I/O error on device nbd0, logical block 113665
[  484.191955] block nbd0: Attempted send on closed socket
[  484.203569] end_request: I/O error, dev nbd0, sector 0
[  484.206883] Buffer I/O error on device nbd0, logical block 0
[  484.227320] EXT4-fs error (device nbd0): ext4_journal_start_sb:348: Detected aborted journal
[  484.239625] EXT4-fs (nbd0): Remounting filesystem read-only
[  484.252005] EXT4-fs (nbd0): previous I/O error to superblock detected
[  484.264496] block nbd0: Attempted send on closed socket
[  484.276987] end_request: I/O error, dev nbd0, sector 0
[  484.280307] Buffer I/O error on device nbd0, logical block 0
[  484.301868] EXT4-fs (nbd0): ext4_da_writepages: jbd2_start: 9223372036854775807 pages, ino 11589; err -30
[  484.316202] EXT4-fs (nbd0): previous I/O error to superblock detected
[  484.329364] ------------[ cut here ]------------
[  484.332607] kernel BUG at fs/buffer.c:2873!
[  484.332607] invalid opcode: 0000 [#1] PREEMPT SMP
[  484.332607] Modules linked in: netconsole configfs sd_mod ata_generic pata_acpi rfcomm bnep btusb bluetooth rfkill snd_hda_codec_hdmi nfsd auth_rpcgss nvidia(PO) snd_hda_codec_realtek hid_generic usbhid hid coretemp evdev uinput ahci libahci snd_hda_intel libata microcode snd_hda_codec ohci_hcd snd_hwdep psmouse snd_pcm pcspkr serio_raw ehci_hcd snd_page_alloc scsi_mod shpchp snd_timer pci_hotplug i2c_nforce2 usbcore snd soundcore usb_common i2c_core wmi button processor lirc_serial(C) lirc_dev ext4 crc16 jbd2 mbcache nbd forcedeth nfsv3 nfs_acl nfs lockd sunrpc fscache
[  484.332607] CPU 0
[  484.332607] Pid: 1, comm: systemd-shutdow Tainted: P         C O 3.6.6-1-ARCH #1 To Be Filled By O.E.M. To Be Filled By O.E.M./To be filled by O.E.M.
[  484.332607] RIP: 0010:[<ffffffff811afdbd>]  [<ffffffff811afdbd>] submit_bh+0x11d/0x130
[  484.332607] RSP: 0018:ffff88005bdc9c58  EFLAGS: 00010246
[  484.332607] RAX: 0000000000040005 RBX: ffff880058d95208 RCX: 0000000000000019
[  484.332607] RDX: 0000000000000000 RSI: ffff880058d95208 RDI: 0000000000000211
[  484.332607] RBP: ffff88005bdc9c78 R08: 000000000a000020 R09: 0000000000000000
[  484.332607] R10: ffff88005c43f2e0 R11: 0000000000000001 R12: 0000000000000211
[  484.332607] R13: 0000000000000211 R14: 0000000000001820 R15: ffff8800576ee800
[  484.332607] FS:  00007fab70ed9700(0000) GS:ffff88005e600000(0000) knlGS:0000000000000000
[  484.332607] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  484.332607] CR2: 00007fb7dd43f824 CR3: 000000005ae75000 CR4: 00000000000007f0
[  484.332607] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  484.332607] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  484.332607] Process systemd-shutdow (pid: 1, threadinfo ffff88005bdc8000, task ffff88005bdc0000)
[  484.332607] Stack:
[  484.332607]  0000000000000000 ffff880058d95208 0000000000000211 ffff880058d95208
[  484.332607]  ffff88005bdc9ca8 ffffffff811b1842 ffff880058d95208 0000000000001820
[  484.332607]  0000000000041525 ffff88005e6ae400 ffff88005bdc9cb8 ffffffff811b18d3
[  484.332607] Call Trace:
[  484.332607]  [<ffffffff811b1842>] __sync_dirty_buffer+0x52/0xd0
[  484.332607]  [<ffffffff811b18d3>] sync_dirty_buffer+0x13/0x20
[  484.332607]  [<ffffffffa011d550>] ext4_commit_super+0x1e0/0x250 [ext4]
[  484.332607]  [<ffffffffa011d773>] save_error_info+0x23/0x30 [ext4]
[  484.332607]  [<ffffffffa011ea46>] __ext4_abort+0x36/0x120 [ext4]
[  484.332607]  [<ffffffffa011fe1d>] ext4_remount+0x3dd/0x620 [ext4]
[  484.332607]  [<ffffffff81182d41>] do_remount_sb+0x81/0x1b0
[  484.332607]  [<ffffffff8119fef1>] do_mount+0x591/0x8e0
[  484.332607]  [<ffffffff8118ad73>] ? getname_flags+0x53/0xf0
[  484.332607]  [<ffffffff811a02cd>] sys_mount+0x8d/0xe0
[  484.332607]  [<ffffffff81499f2d>] system_call_fastpath+0x1a/0x1f
[  484.332607] Code: 08 00 41 8b 5c 24 18 4c 89 e7 e8 af 5d 00 00 48 83 c4 08 c1 e3 18 c1 fb 1f 83 e3 a1 89 d8 5b 41 5c 41 5d 5d c3 0f 0b 0f 0b 0f 0b <0f> 0b 0f 0b 66 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 55 48
[  484.332607] RIP  [<ffffffff811afdbd>] submit_bh+0x11d/0x130
[  484.332607]  RSP <ffff88005bdc9c58>
[  485.055679] ---[ end trace ec1c710e1d665e8b ]---
[  485.076530] Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
[  485.076530]
[  485.078355] Rebooting in 60 seconds...

As you can see, a socket is being closed prematurely. I'm having trouble figuring out 1) which service/socket/mount unit file I need to change to keep that socket open, and 2) what I need to change in that file.  Can anyone offer some advice?
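
For anyone who wants to reproduce the capture: the log above was sent by the netconsole module (it appears in the module list) to a netcat listener on UDP port 6969. Below is a minimal sketch of how the module parameter might be assembled. The helper name netconsole_param, the source port 6665 and the listener address 192.168.1.3 are placeholders that do not come from this thread; only the client address 192.168.1.5 and port 6969 appear above.

```shell
#!/bin/bash
# netconsole_param SRC_PORT SRC_IP DEV TGT_PORT TGT_IP [TGT_MAC]
# Hypothetical helper: print a netconsole= module parameter in the layout
# documented by the kernel:
#   [src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-macaddr]
# Leaving TGT_MAC empty makes the kernel fall back to the broadcast address.
netconsole_param() {
    printf 'netconsole=%s@%s/%s,%s@%s/%s\n' "$1" "$2" "$3" "$4" "$5" "${6:-}"
}

# Example use on the diskless client (as root), with placeholder addresses:
#   modprobe netconsole "$(netconsole_param 6665 192.168.1.5 eth0 6969 192.168.1.3)"
# and on the second machine:
#   nc -u -l -p 6969
```

Loading the module early matters here, because the interesting messages only appear in the last moments of shutdown.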

Last edited by mihanson (2012-11-21 17:39:26)

Offline

#2 2012-11-20 22:15:24

alkajo
Member
Registered: 2011-12-08
Posts: 6

Re: [SOLVED] Kernel Panic with NBD RootFS on reboot and shutdown

I am having the same problem. It also reports systemd-shutdown as tainted. If you do figure it out, please let me know.

Offline

#3 2012-11-20 22:35:45

65kid
Member
From: Germany
Registered: 2011-01-26
Posts: 663

Re: [SOLVED] Kernel Panic with NBD RootFS on reboot and shutdown

I wrote the mkinitcpio-nbd hook. I have to admit that I haven't used it myself in ages and I never tested it with systemd, but I'm pretty sure I know what the problem is:

The mkinitcpio hook calls nbd-client to set up the nbd device, and the process has to keep running for the device to stay connected. The problem is that systemd kills all processes, including nbd-client, before jumping back into the initramfs. This is not a systemd problem; it is what an init system is supposed to do, and it is also what initscripts did. It's just that I worked around this in initscripts with this.

I currently have no idea how to tell systemd not to kill nbd-client (I'm sure it is possible somehow). I'm going to have a look into it when I find the time.

Offline

#4 2012-11-20 23:24:32

mihanson
Member
Registered: 2009-04-27
Posts: 10

Re: [SOLVED] Kernel Panic with NBD RootFS on reboot and shutdown

I have not tried this workaround yet, but if you prefix nbd-client with a '@' (no quotes), systemd will not murder nbd-client.  It's explained here and alluded to here.  So in my/our case we would need to make changes to the mkinitcpio-nbd hook (which is installed in /usr/lib/initcpio/hooks/), rebuild the image, install it in the tftp server root and reboot. If that fails, we can try to rebuild the nbd package by adding --program-prefix=@ to the configure line.  Again, I have not tried either of these methods yet, nor are they ideal workarounds.  Ideally, nbd-{client,server} would honor argv[0], but currently they do not.
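
As a rough illustration of the convention described above: when systemd's shutdown code kills the remaining processes, it is understood to skip any process whose argv[0] begins with '@'. The sketch below mimics that check from the shell. The function names are made up for illustration, and the real check lives in systemd's C code, which may apply further conditions.

```shell
#!/bin/bash
# is_shutdown_exempt ARGV0
# Hypothetical helper: succeed if the given argv[0] starts with '@',
# which is the marker systemd looks for before sparing a process.
is_shutdown_exempt() {
    case "$1" in
        @*) return 0 ;;
        *)  return 1 ;;
    esac
}

# argv0_of PID
# Print argv[0] of a running process. /proc/<pid>/cmdline holds the
# NUL-separated argument vector, so the first field is argv[0].
argv0_of() {
    tr '\0' '\n' < "/proc/$1/cmdline" | head -n 1
}

# Example: check the current shell (normally not exempt)
#   is_shutdown_exempt "$(argv0_of $$)" && echo "spared" || echo "killed"
```

This is why installing the binary under the name @nbd-client is enough: the kernel records the path used to start it as argv[0].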

Offline

#5 2012-11-21 13:11:44

65kid
Member
From: Germany
Registered: 2011-01-26
Posts: 663

Re: [SOLVED] Kernel Panic with NBD RootFS on reboot and shutdown

mihanson wrote:

I have not tried this workaround yet, but if you prefix nbd-client with a '@' (no quotes), systemd will not murder nbd-client.  It's explained here and alluded to here.  So in my/our case we would need to make changes to the mkinitcpio-nbd hook (which is installed in /usr/lib/initcpio/hooks/), rebuild the image, install it in the tftp server root and reboot. If that fails, we can try to rebuild the nbd package by adding --program-prefix=@ to the configure line.  Again, I have not tried either of these methods yet, nor are they ideal workarounds.  Ideally, nbd-{client,server} would honor argv[0], but currently they do not.

yep, that's it, thanks.
I just uploaded mkinitcpio-nbd 0.4, which hopefully fixes this. Note that I haven't tested this since I'm too lazy to set everything up, but the workaround is pretty simple, so I guess this should work.
Nevertheless, make sure you make a backup of your initramfs before rebuilding it with the new version in case I screwed something up.

Feedback welcome.

Offline

#6 2012-11-21 16:25:26

mihanson
Member
Registered: 2009-04-27
Posts: 10

Re: [SOLVED] Kernel Panic with NBD RootFS on reboot and shutdown

Ok, I have tested and think I have a working solution.  The package nbd needs to be rebuilt with this new configure line:

./configure --prefix=/usr --sysconfdir=/etc --enable-syslog --program-prefix=@

otherwise the initial ram disk can't find the binary @nbd-client.  mkinitcpio-nbd also needs rebuilding.  The v0.3.2 nbd_hook source file needs this change:

--- mkinitcpio-nbd-0.3.2/nbd_hook.orig	2012-07-07 18:41:59.000000000 +0000
+++ mkinitcpio-nbd-0.3.2/nbd_hook	2012-11-20 21:05:04.976076074 +0000
@@ -11,6 +11,6 @@
 		msg "loading module..."
 		modprobe nbd
 		msg "connecting..."
-		nbd-client ${nbd_host} ${nbd_port} /dev/nbd0 -persist -name ${nbd_name}
+		@nbd-client ${nbd_host} ${nbd_port} /dev/nbd0 -persist -name ${nbd_name}
 	fi
 }

and the nbd_install from v0.3.2 needs this change:

--- mkinitcpio-nbd-0.3.2/nbd_install.orig	2012-07-07 18:41:59.000000000 +0000
+++ mkinitcpio-nbd-0.3.2/nbd_install	2012-11-21 05:14:28.984380241 +0000
@@ -2,7 +2,7 @@
 
 build() {
     add_module "nbd"
-    add_binary "nbd-client"
+    add_binary "@nbd-client"
     add_runscript
 }

and it all works as expected.

@65kid: You already have the nbd_hook correct, but your nbd_install needs adjustment because nbd-client is actually installed in /usr/sbin and the newly built nbd with the '@' prefix removes nbd-client, replacing it with @nbd-client.  Were you trying to add both nbd-client and @nbd-client binaries to the image or were you going for argv[0] of nbd-client?  I could probably make a package for the AUR with just @nbd-client so nbd-client and @nbd-client could co-exist.

Offline

#7 2012-11-21 16:31:14

65kid
Member
From: Germany
Registered: 2011-01-26
Posts: 663

Re: [SOLVED] Kernel Panic with NBD RootFS on reboot and shutdown

mihanson wrote:

@65kid: You already have the nbd_hook correct, but your nbd_install needs adjustment because nbd-client is actually installed in /usr/sbin and the newly built nbd with the '@' prefix removes nbd-client, replacing it with @nbd-client.  Were you trying to add both nbd-client and @nbd-client binaries to the image or were you going for argv[0] of nbd-client?  I could probably make a package for the AUR with just @nbd-client so nbd-client and @nbd-client could co-exist.

no need to rebuild nbd, I think you are misunderstanding the add_binary function.

from my install function:

add_binary nbd-client /usr/bin/@nbd-client

this will not install nbd-client and @nbd-client, it will install nbd-client into the initramfs as /usr/bin/@nbd-client.
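
One way to confirm this after rebuilding is to look at the file listing of the generated image. The following is only a sketch: the helper name has_protected_client is invented here, and it assumes lsinitcpio (shipped with mkinitcpio) prints one path per line, with or without a leading "./".

```shell
#!/bin/bash
# has_protected_client
# Hypothetical helper: read an initramfs file listing on stdin and succeed
# only if it contains the client under its '@'-prefixed name.
has_protected_client() {
    grep -Eq '(^|/)usr/bin/@nbd-client$'
}

# Example use (the image path is the one used in this thread's boot config):
#   lsinitcpio /boot/initramfs-linux.img | has_protected_client \
#       && echo "@nbd-client is in the image"
```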

Offline

#8 2012-11-21 17:38:53

mihanson
Member
Registered: 2009-04-27
Posts: 10

Re: [SOLVED] Kernel Panic with NBD RootFS on reboot and shutdown

@65kid: my mistake. You are correct.  Your fix works as-is.

To summarize, for anyone who may be reading this:

1. Install mkinitcpio-nbd>=0.4 in your nbd image.
2. In your nbd image, add the nbd hook after the net hook in etc/mkinitcpio.conf
3. In your nbd image, regenerate your initramfs

# mkinitcpio -p linux

4. Copy the contents of your <nbd-image>/boot/ to your tftp server.

# cp -a <nbd-image>/boot/* /path/to/tftp/server/

5. Paydirt.

I'll mark this as solved.  Thanks 65kid!
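
For step 2, a small sketch of adding the hook non-interactively is shown here. The function name add_nbd_hook is made up, the \b word-boundary syntax assumes GNU sed, and the function only rewrites the HOOKS line passed on stdin, so check the result before replacing your real mkinitcpio.conf.

```shell
#!/bin/bash
# add_nbd_hook
# Hypothetical helper: filter a mkinitcpio.conf on stdin and insert "nbd"
# directly after the "net" hook on the HOOKS line. Lines that already
# mention nbd are left alone, so running it twice changes nothing.
add_nbd_hook() {
    sed -E '/^HOOKS=/{/\bnbd\b/!s/\bnet\b/net nbd/;}'
}

# Example use inside the nbd image root:
#   add_nbd_hook < etc/mkinitcpio.conf > etc/mkinitcpio.conf.new
#   diff etc/mkinitcpio.conf etc/mkinitcpio.conf.new
```

If the HOOKS line has no net hook at all, the output is unchanged, which is a hint that the net hook still needs to be set up first.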

Offline
