#1 2018-11-17 15:55:34

Torxed
Member
Registered: 2013-01-10
Posts: 202

Trying to get kdump to work, doesn't load crash kernel on crash

When I do

# echo c > /proc/sysrq-trigger

I expect to enter a kernel where I can dump

/proc/vmcore

But for whatever reason, that crash kernel never "boots" meaning I can't get access to vmcore.
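For context, here is a minimal sketch of what the save step could look like once a capture kernel does come up. `save_vmcore` is a hypothetical helper name and `/var/crash` is an assumed destination, neither comes from this thread; a plain `cp` is used for simplicity where a real setup might prefer a dump-filtering tool.

```shell
#!/bin/sh
# Sketch only: copy the crash dump somewhere persistent.
# save_vmcore is a hypothetical helper; /var/crash is an assumed default.
save_vmcore() {
    src="${1:-/proc/vmcore}"
    dest_dir="${2:-/var/crash}"

    # /proc/vmcore only exists when running inside the capture kernel.
    if [ ! -r "$src" ]; then
        echo "no vmcore at $src (not running in the capture kernel?)"
        return 1
    fi

    mkdir -p "$dest_dir" || return 1
    dest="$dest_dir/vmcore-$(date +%Y%m%d-%H%M%S)"

    # Print the destination path on success so callers can log it.
    cp "$src" "$dest" && echo "$dest"
}
```

Called with no arguments it reads `/proc/vmcore`; both paths can be overridden, which also makes it easy to try out on an ordinary file.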

I've followed: https://wiki.archlinux.org/index.php/Kdump meticulously.
Followed a bunch of other resources such as: https://unix.stackexchange.com/question … arch-linux

What I've ended up with is a service script (tried manually as well) that looks like this:

[Unit]
Description=Load dump capture kernel
After=local-fs.target

[Service]
ExecStart=kexec -p /boot/vmlinuz-wifi --initrd=/boot/initramfs-linux419.img --append="root=/dev/mapper/luksdev systemd.unit=kdump-save.service single irqpoll maxcpus=1 reset_devices"
Type=oneshot

[Install]
WantedBy=multi-user.target

# cat /proc/cmdline
initrd=\intel-ucode.img initrd=\initramfs-linux419.img cryptdevice=UUID=<uuid>:luksdev root=/dev/mapper/luksdev rw crashkernel=256M intel_pstate=no_hwp i915.enable_guc=3
# ls -l /boot/vmlinuz-wifi
-rwxr-xr-x 1 root root 5686144 Nov 17 13:14 /boot/vmlinuz-wifi

# ls -l /boot/initramfs-linux419.img
-rwxr-xr-x 1 root root 9351628 Nov 17 14:04 /boot/initramfs-linux419.img

# cat /sys/kernel/kexec_crash_loaded
1

The kernel was compiled with the following options (which appear to be the default in the Arch kernel?):

CONFIG_DEBUG_INFO=y
CONFIG_CRASH_DUMP=y
CONFIG_PROC_VMCORE=y

# journalctl -u kdump
-- Reboot --
Nov 17 16:43:45 Archinstall systemd[1]: Starting Load dump capture kernel...
Nov 17 16:43:46 Archinstall systemd[1]: Started Load dump capture kernel.

I've rebooted a couple of times and tried manual approaches to loading the dump kernel, etc.
Any ideas where to begin debugging this debugger mess? I've run out of ideas (and experience in kernel debugging), but I suspect the culprit is the luksdev device. Will the crash kernel retain the unlocked disk?
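As a starting point for that kind of debugging, the two checks shown above (a `crashkernel=` reservation on the command line and `kexec_crash_loaded` reading 1) can be wrapped into one quick test. This is only a sketch: `check_kdump_ready` is a hypothetical name, and the file arguments exist so it can be tried against copies of the real files.

```shell
#!/bin/sh
# Sketch only: sanity-check kdump prerequisites before triggering a crash.
# check_kdump_ready is a hypothetical helper; the paths default to the real
# procfs/sysfs files but can be overridden.
check_kdump_ready() {
    cmdline_file="${1:-/proc/cmdline}"
    loaded_file="${2:-/sys/kernel/kexec_crash_loaded}"

    # Memory for the capture kernel must be reserved at boot via crashkernel=.
    cmdline=$(cat "$cmdline_file")
    case " $cmdline " in
        *" crashkernel="*) ;;
        *)
            echo "no crashkernel= reservation on the kernel command line"
            return 1
            ;;
    esac

    # A value of 1 means a capture kernel is currently loaded (kexec -p).
    if [ "$(cat "$loaded_file")" != "1" ]; then
        echo "no crash kernel loaded"
        return 1
    fi

    echo "ready"
}
```

Both checks passing only shows that a capture kernel is staged; it says nothing about whether that kernel can actually boot or unlock the disk.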

Last edited by Torxed (2018-11-17 16:10:16)

#2 2018-11-17 17:37:42

Torxed
Member
Registered: 2013-01-10
Posts: 202

Re: Trying to get kdump to work, doesn't load crash kernel on crash

So as soon as you ask for help, it starts working (sorta).
I finally managed to get it to boot somehow; most likely one of the config flags got screwed up, even though I double-checked them.

Had to change the `root=...` as well to match that of the output of `cat /proc/cmdline` (which should have been obvious).
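A rough sketch of one way to avoid that mismatch: derive the capture kernel's command line from the running one instead of typing it by hand. `build_kdump_append` is a hypothetical helper name. Dropping `crashkernel=` and `initrd=` is an assumption on my part (the capture kernel needs no reservation of its own, and the initramfs is passed through `--initrd`), and the appended options are simply the ones from the unit file in the first post.

```shell
#!/bin/sh
# Sketch only: build the --append string for the capture kernel from an
# existing kernel command line. build_kdump_append is a hypothetical helper.
# Assumes tokens contain no glob characters and no quoted spaces.
build_kdump_append() {
    out=""
    for tok in $1; do
        case "$tok" in
            # Assumption: these two are not wanted in the capture kernel.
            crashkernel=*|initrd=*) ;;
            *) out="${out:+$out }$tok" ;;
        esac
    done
    # Options taken from the kdump unit in the first post.
    printf '%s\n' "${out:+$out }systemd.unit=kdump-save.service single irqpoll maxcpus=1 reset_devices"
}

# Possible use (needs root, not run here):
#   kexec -p /boot/vmlinuz-wifi --initrd=/boot/initramfs-linux419.img \
#       --append="$(build_kdump_append "$(cat /proc/cmdline)")"
```

This keeps `root=` and the LUKS parameters in sync with whatever the bootloader entry actually uses.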

The problem now is that the luksdev passphrase prompt doesn't register any input.
Not sure if the HID got broken in the boot process or something. Not that I know what I'm talking about or anything.

#3 2018-11-17 17:40:19

loqs
Member
Registered: 2014-03-06
Posts: 18,363

Re: Trying to get kdump to work, doesn't load crash kernel on crash

How did you produce the .config for the custom kernel?

#4 2018-11-17 17:42:37

Torxed
Member
Registered: 2013-01-10
Posts: 202

Re: Trying to get kdump to work, doesn't load crash kernel on crash

loqs wrote:

How did you produce the .config for the custom kernel?

I went with option B:

make localmodconfig

I can boot the kernel that was built, but it won't boot in a recovery situation.
It gets stuck on keyboard input.

If I use an external keyboard (this is a laptop, btw), I get the following during the crash-kernel boot:

xhci_hcd 0000:00:14.0: Error while assigning device slot ID
xhci_hcd 0000:00:14.0: Max number of devices this xHCI host supports is 64
usb usb1-port1: couldn't allocate usb_device

Last edited by Torxed (2018-11-17 17:46:53)

#5 2018-11-17 17:49:55

loqs
Member
Registered: 2014-03-06
Posts: 18,363

Re: Trying to get kdump to work, doesn't load crash kernel on crash

Never used kdump so this could be completely wrong

xhci_hcd 0000:00:14.0: Error while assigning device slot ID
xhci_hcd 0000:00:14.0: Max number of devices this xHCI host supports is 64

That output seems to be because xhci_hcd has read spurious data (from the firmware?) and is trying to add numerous USB controllers.

#6 2018-11-17 17:52:59

Torxed
Member
Registered: 2013-01-10
Posts: 202

Re: Trying to get kdump to work, doesn't load crash kernel on crash

loqs wrote:

Never used kdump so this could be completely wrong

xhci_hcd 0000:00:14.0: Error while assigning device slot ID
xhci_hcd 0000:00:14.0: Max number of devices this xHCI host supports is 64

That output seems to be because xhci_hcd has read spurious data (from the firmware?) and is trying to add numerous USB controllers.

It's probably because of the no-name brand keyboard, which probably does more than it should hehe.
Nevertheless, the laptop's built-in keyboard should work out of the box.

#7 2018-11-17 18:04:09

loqs
Member
Registered: 2014-03-06
Posts: 18,363

Re: Trying to get kdump to work, doesn't load crash kernel on crash

The system uses the same kernel for a normal boot and for kexec, but does the built-in keyboard fail only on kexec, or on a normal boot as well?

#8 2018-11-17 18:39:07

Torxed
Member
Registered: 2013-01-10
Posts: 202

Re: Trying to get kdump to work, doesn't load crash kernel on crash

loqs wrote:

The system uses the same kernel for a normal boot and for kexec, but does the built-in keyboard fail only on kexec, or on a normal boot as well?

Only fails on kexec :)

#9 2018-11-17 18:56:58

loqs
Member
Registered: 2014-03-06
Posts: 18,363

Re: Trying to get kdump to work, doesn't load crash kernel on crash

Torxed wrote:
loqs wrote:

The system uses the same kernel for a normal boot and for kexec, but does the built-in keyboard fail only on kexec, or on a normal boot as well?

Only fails on kexec :)

That would indicate to me that keyboard support is correct, but the state the system is in after kexec is causing the issue. (No idea how to resolve it.)
Edit:
The initrd's are the same as well?

Last edited by loqs (2018-11-17 18:57:29)

#10 2018-11-17 19:36:44

Torxed
Member
Registered: 2013-01-10
Posts: 202

Re: Trying to get kdump to work, doesn't load crash kernel on crash

loqs wrote:

The initrd's are the same as well?

That's correct. I made sure they were via:

# echo $(cat /proc/cmdline) >> /etc/systemd/system/kdump.service

And added the initrd + kernel options to the `ExecStart` line.
So whatever systemd-boot uses to boot my system entry should be 100% identical to whatever kexec executes.

After a few boots, I got this error message:

ACPI BIOS Error (bug): A valid RSDP was not found (20180810/tbxfroot-210)
[Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR 38d is b0)
efi: EFI_MEMMAP is not enabled.

Which led me to: https://askubuntu.com/questions/613241/ … ot-working

Unfortunately, the information given there isn't helpful, but the title describes exactly what I'm struggling with.
And most resources say that those messages can be ignored, since it's more of a BIOS glitch that shouldn't have any OS impact on the laptop I'm using.

I also tried telling kexec to boot my "normal kernel" to see if it was something specific to my own build.
Turns out that it gets stuck at the same step, with the HID being dead (when booted via kexec; it works otherwise).
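For anyone wanting to repeat that comparison, a hedged sketch of swapping the loaded capture kernel for a different one. `reload_crash_kernel` and `run` are hypothetical helper names, and `/boot/vmlinuz-linux` in the usage comment is an assumption about where the stock kernel lives. With `DRY_RUN=1` the commands are only printed, since actually running them needs root.

```shell
#!/bin/sh
# Sketch only: replace the currently loaded capture kernel with another one.
# run and reload_crash_kernel are hypothetical helpers.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Dry run: print the command instead of executing it.
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

reload_crash_kernel() {
    kernel="$1"
    initrd="$2"
    append="$3"

    # -p together with -u unloads the currently loaded capture kernel.
    run kexec -p -u
    # Load the replacement into the crashkernel= reservation.
    run kexec -p "$kernel" --initrd="$initrd" --append="$append"
}

# Possible use (paths are assumptions, not taken from this thread):
#   DRY_RUN=1
#   reload_crash_kernel /boot/vmlinuz-linux /boot/initramfs-linux.img \
#       "root=/dev/mapper/luksdev irqpoll maxcpus=1 reset_devices"
```

If the keyboard is dead with both kernels, as described above, that points away from the custom build and toward the state the hardware is left in at kexec time.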

Last edited by Torxed (2018-11-17 23:14:22)
