#1 2020-10-05 05:30:17

Abbott
Member
Registered: 2017-10-07
Posts: 18

Kernel Panic 1 minute after boot

I recently updated my machine from Linux 5.6 to 5.8, and I am getting panics shortly after booting. I checked for .pacnew files and didn't see any warnings from pacman during the update, so I don't think I'm missing config options or anything like that. This was the most recent panic:
https://i.imgur.com/efbmyLW.jpg
I have an archiso ready to go as well if other logs would be helpful. Can anyone make sense of this panic?

Mod edit: Removed oversized image -- V1del

Last edited by V1del (2020-10-05 06:34:16)

#2 2020-10-05 06:38:02

V1del
Forum Moderator
Registered: 2012-10-16
Posts: 21,410

Re: Kernel Panic 1 minute after boot

Please only link or post thumbnails of images: https://wiki.archlinux.org/index.php/Co … s_and_code

Looks like a timer/ACPI issue. If you don't have microcode updates set up yet, set them up. What's the mainboard/CPU model?

#3 2020-10-05 20:11:05

Abbott
Member
Registered: 2017-10-07
Posts: 18

Re: Kernel Panic 1 minute after boot

My mainboard is an MSI B450I and my CPU is a Ryzen 3 3200G APU. I checked and saw that I did not have the microcode (ucode) updates installed, so I installed them and regenerated my GRUB config.
The box stayed on for a bit longer, but it froze again after about an hour. This time no kernel panic was printed. I rebooted into the archiso, and this is the output of journalctl -b -1: http://0x0.st/i0pv.txt

Last edited by Abbott (2020-10-05 21:18:06)

#4 2020-10-06 02:29:42

solskog
Member
Registered: 2020-09-05
Posts: 407

Re: Kernel Panic 1 minute after boot

Oct 05 15:02:50 leo radarr[673]:         Native Crash Reporting
...
Oct 05 15:02:54 leo systemd-coredump[980]: Process 673 (mono) of user 971 dumped core.
Oct 05 15:02:54 leo systemd[1]: radarr.service: Main process exited, code=dumped status=6/ABRT
Oct 05 15:27:53 leo systemd-coredump[27872]: Process 464 (rtorrent main) of user 969 dumped core.

Multiple applications crash and dump core right from the start; you may need to reconfigure these applications. Also, many services start at boot time. Are these services dependent on each other (e.g. requiring a network connection), so that a missing prerequisite caused the application failures? I think this is not a kernel-related issue anymore.

Last edited by solskog (2020-10-06 03:11:49)

#5 2020-10-06 07:02:44

V1del
Forum Moderator
Registered: 2012-10-16
Posts: 21,410

Re: Kernel Panic 1 minute after boot

I don't see any messages saying that the microcode got updated, and it isn't listed on the kernel command line (... though it isn't listed on mine either with GRUB, yet I do get an update message).

Something else you can do is simply run a general UEFI/BIOS update; these usually include updated microcode as well. Just from what we can see of the traces, it does look like a microcode issue.
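On Arch, `grub-mkconfig` should pick up `/boot/amd-ucode.img` automatically once the `amd-ucode` package is installed, and list it on the `initrd` line. A minimal sketch for checking a generated config for that, assuming the usual `/boot/grub/grub.cfg` location; `grub_loads_ucode` is a hypothetical helper name, not an existing tool.

```sh
# Return success if a GRUB config contains an initrd line that references
# the AMD microcode image (amd-ucode.img).
# $1: path to grub.cfg (defaults to /boot/grub/grub.cfg, the usual Arch location)
grub_loads_ucode() {
  grep -Eq '^[[:space:]]*initrd[[:space:]].*amd-ucode\.img' "${1:-/boot/grub/grub.cfg}"
}
```

If it fails, regenerating with `grub-mkconfig -o /boot/grub/grub.cfg` after installing `amd-ucode` is the usual fix; `dmesg | grep -i microcode` then confirms what the running kernel saw.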

#6 2021-02-12 23:58:32

Abbott
Member
Registered: 2017-10-07
Posts: 18

Re: Kernel Panic 1 minute after boot

I am still facing panics and freezes on this machine a couple of hours after booting, and figured I would post an update in case anyone has ideas about what I should try next:

  • I have updated my BIOS to the latest version (7A40vAF3) here: https://www.msi.com/Motherboard/support … NG-PLUS-AC

  • The AMD ucode seems to be getting loaded:

    root@leo% dmesg | grep -i microcode
    [    0.708075] microcode: CPU0: patch_level=0x08108109
    [    0.708097] microcode: CPU1: patch_level=0x08108109
    [    0.708100] microcode: CPU2: patch_level=0x08108109
    [    0.708107] microcode: CPU3: patch_level=0x08108109
    [    0.708152] microcode: Microcode Update Driver: v2.2.
    root@leo% 
  • I have switched to the LTS kernel and am currently on version 5.4.97

  • I ran memtest86 (from memtest86.org) on the machine to test the RAM, and it completed three passes without any errors

  • I ran `smartctl -t short /dev/sdX` on all of the disks in the computer, and they all passed
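The dmesg microcode check above can be turned into a quick consistency test: every CPU should report the same `patch_level`. A minimal POSIX sh sketch, assuming the `microcode: CPUn: patch_level=0x...` line format shown in that dmesg output (other kernel versions may word it differently); `microcode_levels` and `microcode_consistent` are hypothetical helper names.

```sh
# Print each distinct microcode patch level found in dmesg-style text on stdin.
# Assumes lines shaped like "microcode: CPU0: patch_level=0x08108109".
microcode_levels() {
  sed -n 's/.*microcode: CPU[0-9]*: patch_level=\(0x[0-9a-fA-F]*\).*/\1/p' | sort -u
}

# Succeed only if exactly one patch level is reported across all CPUs.
microcode_consistent() {
  [ "$(microcode_levels | wc -l)" -eq 1 ]
}
```

Usage: `dmesg | microcode_levels` lists the levels, and `dmesg | microcode_consistent && echo "all CPUs on one patch level"` gives a yes/no answer.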

Here is the output of journalctl -b -1 from the last boot: http://0x0.st/-XKk.log
This was the kernel panic I got after rebooting and letting it panic again: https://i.imgur.com/1u9q29O.jpg

solskog wrote:

Multiple applications crash and dump core right from the start; you may need to reconfigure these applications. Also, many services start at boot time. Are these services dependent on each other (e.g. requiring a network connection), so that a missing prerequisite caused the application failures? I think this is not a kernel-related issue anymore.

All of these applications are using systemd units that came with their packages (with the exception of rtorrent-ps, which didn't come with a unit), so I was assuming that their dependencies were listed correctly. Would these packages dumping cause a kernel panic? The only unit that ends in a failed state right now is postgresql, as I haven't updated the database yet.

Last edited by Abbott (2021-02-13 03:53:50)

#7 2021-02-14 21:10:46

euromatlox
Member
Registered: 2017-02-10
Posts: 110

Re: Kernel Panic 1 minute after boot

Read the Arch Wiki Ryzen page. Perhaps try the iommu or amd_iommu kernel parameters, or some other kernel parameter (at your own risk).
Here or here is a big list of kernel parameters (the two seem to differ in places; which is right?). Do a web search for: kernel panic iommu

Last edited by euromatlox (2021-02-14 21:28:57)

#8 2021-02-18 17:40:16

Abbott
Member
Registered: 2017-10-07
Posts: 18

Re: Kernel Panic 1 minute after boot

euromatlox wrote:

Read the Arch Wiki Ryzen page. Perhaps try the iommu or amd_iommu kernel parameters, or some other kernel parameter (at your own risk).
Here or here is a big list of kernel parameters (the two seem to differ in places; which is right?). Do a web search for: kernel panic iommu

I looked back through the logs and screenshots that I posted and couldn't find anything pointing to a problem with the IOMMU. Could you explain more?

From the Ryzen wiki page, I see that people experience problems with the C6 state, and all of the panics I've been able to photograph seem to occur in functions related to idle CPU states. My mainboard uses MSI Click BIOS, and I can't find any option for managing C-states or anything related to idling: https://i.imgur.com/1tVkiHu.jpg
I am going to try the processor.max_cstate=5 kernel option to see if that resolves my issue. It seems the only downside of disabling C6 is lost power savings. I will update with results soon.
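With GRUB, one way to apply a parameter like processor.max_cstate=5 is to append it to `GRUB_CMDLINE_LINUX_DEFAULT` in `/etc/default/grub` and regenerate with `grub-mkconfig -o /boot/grub/grub.cfg`. After rebooting, it's worth confirming the option actually reached the running kernel. A small sketch; `has_kernel_param` is a hypothetical helper name, and it only matches whole space-separated tokens.

```sh
# Return success if the exact token $1 (e.g. "processor.max_cstate=5")
# appears in the kernel command line $2 (defaults to /proc/cmdline).
has_kernel_param() {
  cmdline=${2:-$(cat /proc/cmdline)}
  # Pad with spaces so the first and last tokens match like any other.
  case " $cmdline " in
    *" $1 "*) return 0 ;;
    *) return 1 ;;
  esac
}
```

Usage: `has_kernel_param processor.max_cstate=5 && echo active`. Matching whole tokens avoids false positives such as `processor.max_cstate=50`.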

Last edited by Abbott (2021-02-18 17:49:47)

#9 2021-02-18 19:09:28

euromatlox
Member
Registered: 2017-02-10
Posts: 110

Re: Kernel Panic 1 minute after boot

I have no Ryzen in my computer; I have only read about problems with Ryzen, like here. I don't know if it's risky to try, but in my opinion log files are not always bulletproof evidence.

Last edited by euromatlox (2021-02-18 19:12:11)

#10 2021-02-18 19:16:20

Abbott
Member
Registered: 2017-10-07
Posts: 18

Re: Kernel Panic 1 minute after boot

I booted with processor.max_cstate=5, but still got a panic related to idling: https://i.imgur.com/4aOv53g.jpg
Problems with the iommu seem to cause freezing/panicking during boot, but this machine is able to get past boot, at least for a little while.
Is there anything else I can do to fix these idling problems?

#11 2021-02-18 19:27:25

euromatlox
Member
Registered: 2017-02-10
Posts: 110

Re: Kernel Panic 1 minute after boot

For occasional freezing problems, the solution for me has usually been to look inside the PSU, where I found bulged capacitors.
One time I had to replace the 32.768 kHz crystal on the motherboard (the clock was lagging behind too much).
Of course, in your case it is most likely a software problem, possibly related to that Linux version update.

You can check this: Ryzen crashing while idle. Be careful not to damage your Ryzen.
A quick search says your Ryzen is 1.2 V; there are also 1.4 V Ryzen CPUs. Don't fry the CPU by using too high a voltage.

Last edited by euromatlox (2021-02-18 19:52:16)

#12 2021-02-21 20:02:19

Abbott
Member
Registered: 2017-10-07
Posts: 18

Re: Kernel Panic 1 minute after boot

An update on the things I've tried:
I tried downgrading the kernel to mainline 5.6.10 (what I was on before seeing these problems), but the panics persisted.
I tried installing zenstates and disabling C6. This seemed to work: the machine stayed up for 32 hours before I shut it off to move it. When I brought it back up, it failed again after a couple of hours: https://i.imgur.com/HT4F7p7.jpg
This time I am getting a page error, which seems totally different from what I've seen so far. Could the CPU be bad? Is there a way to test?

Edit: another page-related panic https://i.imgur.com/pTb4Miu.jpg
Edit: ...and another https://i.imgur.com/IsNlvQV.jpg
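One possibility worth ruling out (an assumption, not something the thread confirms): zenstates disables C6 by writing CPU model-specific registers, and those writes are not persistent across a power cycle, so C6 may simply have been enabled again after the machine was moved. A sketch of a helper that writes a oneshot systemd unit to reapply the setting at every boot. The unit name `disable-c6.service`, the script path `/usr/local/bin/zenstates.py` and the helper name `write_c6_unit` are hypothetical; `--c6-disable` is the ZenStates-Linux option for this, but check `--help` on your copy.

```sh
# Write a oneshot systemd unit that disables C6 via ZenStates-Linux at boot.
# $1: output directory (defaults to /etc/systemd/system; needs root there).
# Hypothetical path: zenstates.py installed at /usr/local/bin.
write_c6_unit() {
  dir=${1:-/etc/systemd/system}
  cat > "$dir/disable-c6.service" <<'EOF'
[Unit]
Description=Disable Ryzen C6 state via zenstates (MSR writes do not persist)

[Service]
Type=oneshot
ExecStartPre=/usr/bin/modprobe msr
ExecStart=/usr/bin/python3 /usr/local/bin/zenstates.py --c6-disable

[Install]
WantedBy=multi-user.target
EOF
}
```

Usage (as root): `write_c6_unit && systemctl daemon-reload && systemctl enable --now disable-c6.service`.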

Last edited by Abbott (2021-02-22 03:46:09)

#13 2021-02-22 09:14:32

euromatlox
Member
Registered: 2017-02-10
Posts: 110

Re: Kernel Panic 1 minute after boot

Perhaps it's a curse; possibly someone using Ryzen without problems can help. I guess the motherboard BIOS might affect things too.

Last edited by euromatlox (2021-02-22 09:14:52)

#14 2021-03-29 17:38:15

Abbott
Member
Registered: 2017-10-07
Posts: 18

Re: Kernel Panic 1 minute after boot

Another update:
Someone in #archlinux on freenode suggested updating my kernel again, so I updated to 5.11, but eventually it panicked again: https://i.imgur.com/JVueE3m.jpg

I might be noticing a pattern:
Usually someone recommends that I update the kernel; after the update, the machine can stay up longer than before (maybe 3 weeks) before it panics. The next uptime is shorter (a couple of days), and the time until a panic keeps decreasing until it can't stay up longer than a couple of hours. I'm not sure if this can help pinpoint the problem, but I figured it was worth mentioning.

Last edited by Abbott (2021-03-29 17:38:49)

#15 2021-04-05 16:42:02

Abbott
Member
Registered: 2017-10-07
Posts: 18

Re: Kernel Panic 1 minute after boot

Potential lead? Most of the errors have to do with paging or the cache, so I was under the impression that this was a CPU problem, but I had not considered that mergerfs could be the source of the issue, as a filesystem in userspace shouldn't be touching the kernel (right?). After I opened an issue with mergerfs, the maintainer suggested that this is an issue with FUSE itself and opened an issue upstream. The box froze again overnight, and I uploaded the dmesg logs here (gzipped), both for the upstream issue and for this thread, in case anyone can find a smoking gun.
While this has been unfolding, I installed stress to double-check that the CPU wasn't at fault, and I was able to run stress for ~30 seconds on all 4 cores without seeing any problems. I eventually stopped because temps hit 80 degrees; after stopping, the CPU returned to idling around 47 degrees. So at this point I have disabled the C6 state on the CPU, stress-tested the CPU with stress, tested the RAM with memtest86, and checked the SMART info of all the disks.
Please let me know if there is anything else worth trying/testing.
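A stress run is easier to judge if temperatures are logged alongside it. A sketch that reads the CPU temperature straight from sysfs, assuming the AMD `k10temp` hwmon driver (which normally reports Ryzen temperatures) and the standard hwmon convention that `temp1_input` holds millidegrees Celsius. `cpu_temp` is a hypothetical helper name.

```sh
# Print the integer temperature in degrees C of the first hwmon device
# named $2 (default k10temp) under $1 (default /sys/class/hwmon).
cpu_temp() {
  root=${1:-/sys/class/hwmon}
  want=${2:-k10temp}
  for dev in "$root"/hwmon*; do
    [ "$(cat "$dev/name" 2>/dev/null)" = "$want" ] || continue
    milli=$(cat "$dev/temp1_input") || return 1
    echo $((milli / 1000))   # hwmon reports millidegrees Celsius
    return 0
  done
  return 1   # no matching sensor found
}
```

Usage, e.g. for a longer run: `stress --cpu 4 --timeout 600 & while kill -0 "$!" 2>/dev/null; do cpu_temp; sleep 10; done`, stopping early if the readings climb too high.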

Last edited by Abbott (2021-04-05 16:43:59)
