Hi,
I've been plagued by loads and loads of kernel panics for months now. In fact, I had another one while writing this very sentence..
I'm fairly certain that it's the built-in GPU (Ryzen 4800U), as the panics occur a LOT when I attach external monitors and play video.
And yes, kernel panic: a blinking caps lock key if I'm lucky. Sometimes it's just a hard freeze, and sometimes an instant reboot. Fun..
So now i'm trying to figure out what exactly is wrong. For that i'm following the kdump article here: https://wiki.archlinux.org/title/Kdump
Sadly I'm having real trouble understanding that article. I assume it was written by someone who had it working and explained "their way", but to me it assumes some things and mixes in others, which makes it darn impossible to follow.
As an example: it states that "it is easiest to modify your default initramfs", which I do. I change my default initramfs file. Now where is that "-kdump" kernel?
I get its "intent" (I think): it probably means to "copy" the "default" preset to create a new one named something like "default-kdump".
Even if i go by that assumption, i'm still not there at all yet.
The way i read it is:
- change mkinitcpio.conf to mkinitcpio-kdump.conf and add the changes from that article
- default preset (using mkinitcpio-kdump.conf)
- default-kdump preset (also using mkinitcpio-kdump.conf)
Now creating the kernel (say mkinitcpio -p linux-kdump) gives me an error:
specified kernel image does not exist: `/boot/vmlinuz-linux-kdump'
Creating the normal preset does work.
I googled this and honestly can't find how to tell mkinitcpio to create a kernel with a different name suffix. Is that really so difficult, or am I just really bad at googling?
Again, the article just "assumes" you get that working, and that appears to be not as straightforward as you'd think.
But, for the sake of trying persistently, I can build it with the default name and rename it. Then I'm stuck on the next issue.
You have a couple systemd files here.
- kdump.service
- kdump-save.service
- kdump.service (tweaked one that internally also uses kdump-save.service)
Which combination of those files do i need to have?
Clarity! Please!
Suffice to say, i don't get a dump output..
I hope someone could write up a proper way of debugging with kernel crash dumps. All I need to know is which damned component on my PC is so crash-happy that it nukes the whole system..
Best regards,
Mark
Offline
I didn't know about kdump; it sounds interesting, but obviously I can't help with it.
However, if you have another physical machine running Linux on your LAN, you may consider netconsole.
https://wiki.archlinux.org/title/Genera … netconsole
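For the receiving side of netconsole, a minimal sketch of a listener for the second machine could look like the following. It assumes the crashing machine sends its kernel messages as UDP datagrams to port 6666 (netconsole's default target port, as far as I know; adjust it if yours is configured differently). The function names and the log file name are made up for illustration.

```python
import socket

# Assumption: 6666 is netconsole's default target port. Change it to match
# whatever port the crashing machine is configured to send to.
NETCONSOLE_PORT = 6666


def open_listener(port=NETCONSOLE_PORT, host="0.0.0.0"):
    """Return a UDP socket bound to host:port, ready to receive messages."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def receive(sock, count=None, logfile=None):
    """Print each received datagram; stop after `count` messages if given.

    With count=None this loops until interrupted (Ctrl+C).
    Returns the list of received message texts.
    """
    lines = []
    while count is None or len(lines) < count:
        data, (sender, _port) = sock.recvfrom(65535)
        text = data.decode("utf-8", errors="replace").rstrip("\n")
        lines.append(text)
        print(f"{sender}: {text}", flush=True)
        if logfile is not None:
            # Append right away so the log is complete up to the last message.
            with open(logfile, "a") as fh:
                fh.write(f"{sender}: {text}\n")
    return lines
```

Running `receive(open_listener(), logfile="netconsole.log")` on the healthy machine would then keep whatever the kernel managed to send before the freeze.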
Offline
I don't really know anything about kdump either, but it seems you skipped the first section "compiling kernel".
Offline
how to tell mkinitcpio to create a kernel
mkinitcpio does not create kernels but the initramfs…
Now where is that "-kdump" kernel?
https://wiki.archlinux.org/title/Kdump#Compiling_kernel
You can also just copy the existing kernel, but will lack the debug symbols for https://wiki.archlinux.org/title/Kdump# … _core_dump
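To make the point about the missing kernel concrete: a preset only names a kernel image that has to exist already. A rough sketch that reads the kernel path out of a preset and checks for it, assuming the preset uses the usual `ALL_kver=` line (the preset file name in the usage note is hypothetical):

```python
import os


def kernel_path_from_preset(preset_text):
    """Return the kernel image path named by ALL_kver, or None if absent."""
    for line in preset_text.splitlines():
        line = line.strip()
        if line.startswith("ALL_kver="):
            # Presets are shell snippets, so the value is usually quoted.
            return line.split("=", 1)[1].strip("\"'")
    return None


def check_preset(preset_file):
    """Describe whether the kernel image a preset refers to is present."""
    with open(preset_file) as fh:
        kernel = kernel_path_from_preset(fh.read())
    if kernel is None:
        return "no ALL_kver line found"
    if os.path.exists(kernel):
        return f"{kernel} exists"
    return f"{kernel} is missing; mkinitcpio will not create it"
```

For example, `print(check_preset("/etc/mkinitcpio.d/linux-kdump.preset"))` would report the missing `/boot/vmlinuz-linux-kdump` until a kernel has been compiled or copied to that path.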
Offline
It sounds like you haven't compiled a dump kernel yet. You can't skip that step.
EDIT: too slow x2.
Last edited by Trilby (2022-07-31 21:09:44)
Offline
I'm sorry for not mentioning that (my fault), but the main arch kernel has those flags enabled by default. It therefore seems logical to me that recompiling isn't needed at all, or am i missing something here?
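For anyone who wants to verify that claim on their own kernel, here is a small sketch that checks the running kernel's configuration. It assumes the kernel exposes its config at /proc/config.gz, and the option list is my own reading of what kdump needs, not an authoritative one:

```python
import gzip

# Assumption: these are the options relevant to kdump; adjust as needed.
ASSUMED_OPTIONS = (
    "CONFIG_KEXEC",
    "CONFIG_CRASH_DUMP",
    "CONFIG_PROC_VMCORE",
    "CONFIG_DEBUG_INFO",
)


def parse_config(text):
    """Map option names to values, skipping comments and '... is not set'."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.partition("=")
        options[name] = value
    return options


def missing_options(text, wanted=ASSUMED_OPTIONS):
    """Return the wanted options that are not set to 'y' or 'm'."""
    options = parse_config(text)
    return [name for name in wanted if options.get(name) not in ("y", "m")]


def check_running_kernel(path="/proc/config.gz"):
    """Check the running kernel; needs the config to be exposed at `path`."""
    with gzip.open(path, "rt") as fh:
        return missing_options(fh.read())
```

An empty list from `check_running_kernel()` would mean every option in the assumed list is enabled.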
Offline
You can also just copy the existing kernel, but will lack the debug symbols for https://wiki.archlinux.org/title/Kdump# … _core_dump
Offline
After soooooo much time trying to debug this, none of the above helped. I just could not get it working.
The symptoms this laptop shows cover about all the ways Linux can crash:
- Total freeze, not even a blinking capslock
- Same but with a blinking capslock
- instant reboot
I'm about 99% certain that it has to do with the built-in GPU in this CPU (4800U), and within that I suspect the hardware video decoder the most. Crashes seemed to happen mostly with video, but also with other things that were hardware accelerated (like your entire browser is these days).
I tried out a live USB with NEON linux (debian variation) to easily figure out whether this is an Arch thing, potentially caused by my custom configs built up over many months, or a CPU/GPU/laptop-specific issue.
Turns out that the "Total freeze, not even a blinking capslock" also happened there. I'm sure the other forms of crashing would occur there too if i had tried it long enough.
Since then i did find a very weird but workable workaround.
This crashing only seemed to happen when my laptop was charging (power cable plugged in). I can't recall ever having the issue while working from the battery. So now my "workable workaround" is to charge it, disconnect the power plug and use it.
Offline
instant reboot
Undervoltage or overtemperature.
Ryzen has a record on the former, https://wiki.archlinux.org/title/Ryzen#Troubleshooting
Offline
instant reboot
Undervoltage or overtemperature.
Ryzen has a record on the former, https://wiki.archlinux.org/title/Ryzen#Troubleshooting
Ohhh thank you!!
Just added "idle=nomwait pci=nomsi" to my boot parameters to see if that helps anything. I will report back with findings, but that might take a while, as sometimes these issues don't happen for days and sometimes they happen like a dozen times in an hour. And that's no exaggeration. Last time that happened I was about ready to grab a drill and drill straight through the CPU as a means of "fuck this laptop, you're going down".
Offline
And... it happened again.
Even twice in a row! (The record is 18x in a row until it finally acted normal again; that's when I started this topic to begin with.)
Right now i have these boot parameters: "idle=nomwait pci=nomsi processor.max_cstate=5" which are all the ones "supposedly" or "potentially" fixing it. Last crash proved that to be a dud...
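One thing worth ruling out is a parameter that never reached the kernel (a typo, or a bootloader config that was not regenerated). A minimal sketch that compares the parameters above against the running kernel's command line, assuming it is readable at /proc/cmdline:

```python
def missing_params(cmdline, wanted):
    """Return the wanted parameters that do not appear on the command line."""
    active = cmdline.split()
    return [param for param in wanted if param not in active]


def read_cmdline(path="/proc/cmdline"):
    """Return the command line the running kernel was booted with."""
    with open(path) as fh:
        return fh.read()


# The parameters mentioned in this post.
WANTED = ["idle=nomwait", "pci=nomsi", "processor.max_cstate=5"]
```

`missing_params(read_cmdline(), WANTED)` returning an empty list would confirm that all three were really active during the last crash.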
I have no clue anymore. I'm out of options to try with this.
For now i'll just stick to my use of this laptop with a charged battery. Recharging + using it gives about a 100% certainty of it crashing. Not having power plugged again makes it work just fine.
I do really hate these kind of impossible to debug issues!
Offline
I do really hate these kind of impossible to debug issues!
Except for the instant reboots, you could still pursue the crash kernel dump…
Recharging + using it gives about a 100% certainty of it crashing. Not having power plugged again makes it work just fine.
Does the charge level matter (i.e. does it crash at 15% battery while charging and at 95% battery while charging, or always when the battery is nearly/entirely full)?
If the charging level seems relevant, it could be an issue w/ the battery:
acpi -V
If not, it's likely some power management daemon that controls the CPU governor, the GPU or, on the HW level, even memory timings (check whether the BIOS has any settings for AC vs. battery performance, and otherwise review your usage/config of e.g. https://wiki.archlinux.org/title/Laptop_Mode_Tools and the like).
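To answer the charge-level question with data rather than memory, something like the following could log the AC and battery state periodically, so that the last line written before a crash shows the state at that moment. This is only a sketch: it assumes the usual sysfs attributes (`type`, `online`, `status`, `capacity`) under /sys/class/power_supply, and the function names and log file name are made up.

```python
import os
import time

POWER_SUPPLY_ROOT = "/sys/class/power_supply"


def read_supply(path):
    """Read the attributes of one power supply that exist and are readable."""
    attrs = {}
    for name in ("type", "online", "status", "capacity"):
        try:
            with open(os.path.join(path, name)) as fh:
                attrs[name] = fh.read().strip()
        except OSError:
            pass  # not every supply has every attribute
    return attrs


def format_entry(name, attrs):
    """Turn one supply's attributes into a short human-readable entry."""
    if attrs.get("type") == "Battery":
        return f"{name}: {attrs.get('status', '?')} at {attrs.get('capacity', '?')}%"
    state = {"1": "plugged in", "0": "unplugged"}.get(attrs.get("online"), "unknown")
    return f"{name}: {state}"


def log_once(logfile, root=POWER_SUPPLY_ROOT):
    """Append one timestamped line describing all power supplies."""
    entries = [
        format_entry(name, read_supply(os.path.join(root, name)))
        for name in sorted(os.listdir(root))
    ]
    line = time.strftime("%Y-%m-%d %H:%M:%S") + " " + "; ".join(entries) + "\n"
    with open(logfile, "a") as fh:
        fh.write(line)
        fh.flush()
        os.fsync(fh.fileno())  # push the line to disk before a possible crash
    return line
```

Calling `log_once("power.log")` in a loop with `time.sleep(30)` between calls should be enough to see whether the crashes cluster at a particular charge level.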
Offline