#1 2023-01-03 16:16:46

kubax
Member
Registered: 2014-10-22
Posts: 19

How to analyze kernel crashdump?

I have recently been having problems with the kernel crashing.

With 6.1.1 at least (I have just updated to 6.1.2), the kernel crashes multiple times a day.

I have already configured kdump and kdump-save, mostly following the wiki entries. (I had to adapt them a little because of full-disk encryption, and the setup is not automatic, but at least I get some data.)

Now I have multiple crash dumps and tried to use "crash" with the extracted kernel, but crash does not seem to like the debug symbols in the Arch kernel. Maniaxx [https://bbs.archlinux.org/viewtopic.php?id=264253] had the same problem, but there is no real solution in that thread.

But at least I now have the kernel dmesg from the crashed kernel.

The problem seems to be related to netfilter NAT (it might be caused by Docker).

[21797.029554] audit: type=1327 audit(1672759021.058:1945): proctitle=43524F4E002D66002D4C003135
[21797.029685] audit: type=1105 audit(1672759021.058:1946): pid=360096 uid=0 auid=0 ses=35 msg='op=PAM:session_open grantors=pam_loginuid,pam_env,pam_env,pam_permit,pam_unix,pam_limits acct="root" exe="/usr/sbin/cron" hostname=? addr=? terminal=cron res=success'
[21797.030965] audit: type=1104 audit(1672759021.058:1947): pid=360096 uid=0 auid=0 ses=35 msg='op=PAM:setcred grantors=pam_permit acct="root" exe="/usr/sbin/cron" hostname=? addr=? terminal=cron res=success'
[21797.030992] audit: type=1106 audit(1672759021.058:1948): pid=360096 uid=0 auid=0 ses=35 msg='op=PAM:session_close grantors=pam_loginuid,pam_env,pam_env,pam_permit,pam_unix,pam_limits acct="root" exe="/usr/sbin/cron" hostname=? addr=? terminal=cron res=success'
[21797.812729] audit: type=1101 audit(1672759021.841:1949): pid=360101 uid=0 auid=4294967295 ses=4294967295 msg='op=PAM:accounting grantors=pam_permit acct="root" exe="/usr/sbin/cron" hostname=? addr=? terminal=cron res=success'
[21797.812766] audit: type=1103 audit(1672759021.841:1950): pid=360101 uid=0 auid=4294967295 ses=4294967295 msg='op=PAM:setcred grantors=pam_permit acct="root" exe="/usr/sbin/cron" hostname=? addr=? terminal=cron res=success'
[21868.884615] BUG: unable to handle page fault for address: 0000000000003fb6
[21868.884621] #PF: supervisor read access in kernel mode
[21868.884623] #PF: error_code(0x0000) - not-present page
[21868.884624] PGD 0 P4D 0 
[21868.884626] Oops: 0000 [#1] PREEMPT SMP NOPTI
[21868.884628] CPU: 6 PID: 361227 Comm: mysql Kdump: loaded Tainted: G        W          6.1.2-arch1-1 #1 9a7c25cc2ea6a78b68b9b1a0fa5137b038c09177
[21868.884630] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X570M Pro4, BIOS P3.40 01/27/2021
[21868.884631] RIP: 0010:nf_nat_setup_info+0xa51/0xd50 [nf_nat]
[21868.884639] Code: 89 ce 41 38 47 46 74 28 4d 8b bf 90 00 00 00 4d 85 ff 0f 84 41 fe ff ff 49 81 ef 90 00 00 00 0f 84 34 fe ff ff 0f b6 44 24 56 <41> 38 47 46 75 d8 49 8b 47 20 49 8b 57 28 48 33 44 24 30 48 33 54
[21868.884640] RSP: 0018:ffffae730031c9c0 EFLAGS: 00010202
[21868.884642] RAX: 0000000000000006 RBX: ffff9321667cae00 RCX: ffff931cfe3ea400
[21868.884643] RDX: ffff931cfe400000 RSI: 32f55f8bf1cddc7a RDI: 21ae91afab954f8f
[21868.884644] RBP: ffffae730031ca98 R08: ffffae730031c998 R09: 0000000000000000
[21868.884645] R10: 033ab3de720943cd R11: 7b984a1de378de39 R12: 0000000000000000
[21868.884646] R13: ffffae730031caa8 R14: ffff931cfe3ea400 R15: 0000000000003f70
[21868.884647] FS:  00007f62605b2740(0000) GS:ffff9323deb80000(0000) knlGS:0000000000000000
[21868.884649] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[21868.884650] CR2: 0000000000003fb6 CR3: 00000005b0034000 CR4: 0000000000350ee0
[21868.884651] Call Trace:
[21868.884653]  <IRQ>
[21868.884657]  ? br_dev_queue_push_xmit+0x1a0/0x1a0 [bridge ed62224147a25bcec548ea6cfb34add24d2ecd34]
[21868.884669]  ? br_nf_forward_finish+0xe3/0x1d0 [br_netfilter 06eade1b4431e74f7fac81b43baa80fc947878b6]
[21868.884673]  ? br_dev_queue_push_xmit+0x1a0/0x1a0 [bridge ed62224147a25bcec548ea6cfb34add24d2ecd34]
[21868.884684]  xt_snat_target_v0+0xaa/0xd0 [xt_nat 70361c70933a05fcd09efa27cf563192990782dd]
[21868.884689]  ipt_do_table+0x332/0x740 [ip_tables d8da8a912abee2e7198f0551d96ddeb0d1633ea9]
[21868.884693]  ? ipt_do_table+0x37d/0x740 [ip_tables d8da8a912abee2e7198f0551d96ddeb0d1633ea9]
[21868.884698]  nf_nat_inet_fn+0x165/0x320 [nf_nat ebc1f37e5777aa23a33dba69e59c8b85fce38ae3]
[21868.884702]  nf_nat_ipv4_out+0x4f/0x100 [nf_nat ebc1f37e5777aa23a33dba69e59c8b85fce38ae3]
[21868.884707]  nf_hook_slow+0x45/0xc0
[21868.884711]  ip_output+0xe9/0x130
[21868.884713]  ? __ip_finish_output+0x190/0x190
[21868.884716]  ip_vs_nat_send_or_cont+0x279/0x2c0 [ip_vs d7f2d568308c0b90200151aa90524351194e913c]
[21868.884723]  ? unregister_ip_vs_scheduler+0xb0/0xb0 [ip_vs d7f2d568308c0b90200151aa90524351194e913c]
[21868.884729]  ip_vs_in_hook+0x33b/0x9a0 [ip_vs d7f2d568308c0b90200151aa90524351194e913c]
[21868.884737]  nf_hook_slow+0x45/0xc0
[21868.884739]  ip_local_deliver+0xd2/0x120
[21868.884741]  ? ip_protocol_deliver_rcu+0x210/0x210
[21868.884743]  __netif_receive_skb_one_core+0x89/0xa0
[21868.884746]  process_backlog+0x85/0x120
[21868.884748]  __napi_poll+0x2b/0x160
[21868.884750]  net_rx_action+0x2a2/0x360
[21868.884752]  ? enqueue_task_fair+0x8b/0x440
[21868.884755]  __do_softirq+0xd4/0x2c9
[21868.884759]  do_softirq.part.0+0x5f/0x80
[21868.884762]  </IRQ>
[21868.884763]  <TASK>
[21868.884763]  __local_bh_enable_ip+0x68/0x70
[21868.884765]  ip_finish_output2+0x17a/0x590
[21868.884767]  __ip_queue_xmit+0x175/0x420
[21868.884769]  __tcp_transmit_skb+0x9f6/0xbd0
[21868.884772]  ? tcp_stream_alloc_skb+0x2c/0x130
[21868.884775]  tcp_connect+0xb1e/0xe30
[21868.884777]  tcp_v4_connect+0x413/0x520
[21868.884780]  __inet_stream_connect+0xd3/0x390
[21868.884783]  ? __alloc_file+0x82/0xd0
[21868.884786]  ? alloc_empty_file+0x63/0xc0
[21868.884788]  inet_stream_connect+0x3a/0x60
[21868.884789]  __sys_connect+0xa8/0xd0
[21868.884794]  __x64_sys_connect+0x18/0x20
[21868.884796]  do_syscall_64+0x5f/0x90
[21868.884798]  ? do_syscall_64+0x6b/0x90
[21868.884799]  ? do_user_addr_fault+0x1e0/0x6a0
[21868.884802]  ? exc_page_fault+0x74/0x170
[21868.884804]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
[21868.884807] RIP: 0033:0x7f62606c2223
[21868.884827] Code: 8b 15 a9 9d 00 00 f7 d8 64 89 02 b8 ff ff ff ff eb bc 0f 1f 44 00 00 64 8b 04 25 18 00 00 00 85 c0 75 14 b8 2a 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 55 c3 0f 1f 40 00 48 83 ec 18 89 54 24 0c 48
[21868.884829] RSP: 002b:00007fffbbe6a178 EFLAGS: 00000246 ORIG_RAX: 000000000000002a
[21868.884830] RAX: ffffffffffffffda RBX: 000055c8969656f0 RCX: 00007f62606c2223
[21868.884831] RDX: 0000000000000010 RSI: 000055c896963a20 RDI: 0000000000000004
[21868.884832] RBP: 00007fffbbe6a1d0 R08: 000055c8969639f0 R09: 0000000000000000
[21868.884833] R10: 0000000000000000 R11: 0000000000000246 R12: 000055c896963a20
[21868.884834] R13: 0000000000000010 R14: 000055c896965ec0 R15: 0000000000000001
[21868.884836]  </TASK>
[21868.884837] Modules linked in: tcp_diag udp_diag inet_diag nft_compat ip_vs_rr xt_ipvs ip_vs vxlan ip6_udp_tunnel udp_tunnel xt_policy iptable_mangle xt_mark xt_u32 veth rfcomm bluetooth ecdh_generic xt_nat xt_tcpudp xt_conntrack xt_MASQUERADE nf_conntrack_netlink xt_addrtype iptable_filter iptable_nat br_netfilter rpcrdma rdma_cm iw_cm ib_cm tun ib_core macvlan bridge stp llc cfg80211 rfkill nft_masq nft_chain_nat nf_nat nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nf_log_syslog nft_log nft_objref nf_conntrack_tftp nft_ct nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink nct6775 nct6775_core hwmon_vid xfs nls_iso8859_1 vfat fat snd_hda_codec_realtek intel_rapl_msr ext4 snd_hda_codec_generic intel_rapl_common crc16 mbcache ledtrig_audio snd_hda_codec_hdmi jbd2 mxl5xx snd_hda_intel snd_intel_dspcfg edac_mce_amd snd_intel_sdw_acpi snd_hda_codec snd_hda_core ddbridge kvm_amd snd_hwdep snd_pcm dvb_core kvm snd_timer videobuf2_vmalloc irqbypass snd videobuf2_memops
[21868.884876]  videobuf2_common sp5100_tco soundcore rapl wmi_bmof videodev pcspkr i2c_piix4 cdc_acm k10temp mousedev corsair_cpro mc acpi_cpufreq mac_hid dm_multipath sg crypto_user nfsd auth_rpcgss nfs_acl fuse lockd grace sunrpc ip_tables x_tables dm_crypt cbc encrypted_keys trusted asn1_encoder tee btrfs blake2b_generic libcrc32c crc32c_generic xor raid6_pq crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic gf128mul ghash_clmulni_intel sha512_ssse3 usbhid dm_mod nvme aesni_intel crypto_simd igb sr_mod nvme_core ccp amdgpu cryptd xhci_pci cdrom dca xhci_pci_renesas nvme_common drm_ttm_helper ttm video wmi gpu_sched drm_buddy drm_display_helper cec
[21868.884908] CR2: 0000000000003fb6
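
For anyone reading the trace without a working crash setup, here is a minimal sketch (not part of the original report) that pulls the faulting symbol and registers out of the oops text and checks one relationship visible in the dump. The marked instruction bytes `<41> 38 47 46` appear to decode to a byte compare against `0x46(%r15)`, and R15 + 0x46 equals the fault address in CR2, which would be consistent with R15 holding a bogus small pointer. The name `parse_oops` and the shortened sample text are illustrative assumptions.

```python
import re

# Sample lines copied from the oops above (timestamps stripped for brevity).
OOPS = """\
BUG: unable to handle page fault for address: 0000000000003fb6
RIP: 0010:nf_nat_setup_info+0xa51/0xd50 [nf_nat]
RAX: 0000000000000006 RBX: ffff9321667cae00 RCX: ffff931cfe3ea400
R13: ffffae730031caa8 R14: ffff931cfe3ea400 R15: 0000000000003f70
CR2: 0000000000003fb6 CR3: 00000005b0034000 CR4: 0000000000350ee0
"""

# "RIP: <segment>:<symbol>+0x<offset>/0x<size> [<module>]"
RIP_RE = re.compile(
    r"RIP: [0-9a-f]{4}:(\w+)\+0x([0-9a-f]+)/0x([0-9a-f]+)(?: \[(\w+)\])?"
)
# "<REG>: <16 hex digits>", e.g. "R15: 0000000000003f70" or "CR2: ..."
REG_RE = re.compile(r"\b(R[A-Z0-9]{1,2}|CR[0-9]): ([0-9a-f]{16})\b")


def parse_oops(text):
    """Return the faulting symbol info and a dict of register values."""
    info = {"regs": {}}
    m = RIP_RE.search(text)
    if m:
        info["symbol"] = m.group(1)
        info["offset"] = int(m.group(2), 16)
        info["size"] = int(m.group(3), 16)
        info["module"] = m.group(4)
    for name, value in REG_RE.findall(text):
        # Keep the first occurrence: the kernel-mode registers come first.
        info["regs"].setdefault(name, int(value, 16))
    return info


info = parse_oops(OOPS)
print(info["symbol"], hex(info["offset"]), info["module"])
# The displacement 0x46 is read off the marked bytes "<41> 38 47 46".
print(info["regs"]["R15"] + 0x46 == info["regs"]["CR2"])
```

This only restates what is already in the dmesg; it does not replace looking at the dump with proper debug symbols.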

Last edited by kubax (2023-01-03 16:44:40)

#2 2023-01-04 13:19:41

loqs
Member
Registered: 2014-03-06
Posts: 17,372

Re: How to analyze kernel crashdump?

I cannot be of help with analyzing the crash dump, but have you considered bisecting between 6.0 and 6.1 to find the causal commit?
I built some kernels to help bisect a different issue, which you could reuse; they are linked in https://bugs.archlinux.org/task/76922. If you need more kernels built, let me know once the results diverge.
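
To give a feel for the cost of bisecting: over N commits it takes roughly log2(N) build-and-test rounds. The sketch below simulates the kind of search `git bisect` performs, simplified to a linear history. The commit count and the `is_bad` predicate are made-up placeholders, not real figures for the 6.0 to 6.1 range.

```python
def bisect_first_bad(n_commits, is_bad):
    """Find the first bad commit index in [0, n_commits).

    Assumes the last commit is bad and that every commit after the
    first bad one is also bad. Returns (index, number of tests run).
    """
    lo, hi = 0, n_commits - 1  # invariant: first bad index is in [lo, hi]
    tests = 0
    while lo < hi:
        mid = (lo + hi) // 2
        tests += 1
        if is_bad(mid):
            hi = mid           # first bad commit is at mid or earlier
        else:
            lo = mid + 1       # first bad commit is after mid
    return lo, tests


# Hypothetical example: 10,000 commits, regression introduced at index 6789.
FIRST_BAD = 6789
index, tests = bisect_first_bad(10_000, lambda i: i >= FIRST_BAD)
print(index, tests)
```

Even a large range comes down to a dozen or so test boots, although here each "good" verdict means running long enough to be reasonably sure the crash would have shown up.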

#3 2023-01-04 15:07:00

kubax
Member
Registered: 2014-10-22
Posts: 19

Re: How to analyze kernel crashdump?

Thanks for your reply.

For now I have just switched to linux-lts, because the system in question is not only my local NAS and Docker server but also the media server running Kodi and TvHeadend in my living room.

It's just really unpleasant that I can't dig deeper into the issue with the data I collected.

Last edited by kubax (2023-01-04 15:07:17)

#4 2023-01-04 15:17:49

loqs
Member
Registered: 2014-03-06
Posts: 17,372

Re: How to analyze kernel crashdump?

Assuming the oops you posted is the first one, the RIP points at nf_nat_setup_info, which is in net/netfilter/nf_nat_core.c.

perl scripts/get_maintainer.pl net/netfilter/nf_nat_core.c
Pablo Neira Ayuso <pablo@netfilter.org> (maintainer:NETFILTER)
Jozsef Kadlecsik <kadlec@netfilter.org> (maintainer:NETFILTER)
Florian Westphal <fw@strlen.de> (maintainer:NETFILTER)
"David S. Miller" <davem@davemloft.net> (maintainer:NETWORKING [GENERAL])
Eric Dumazet <edumazet@google.com> (maintainer:NETWORKING [GENERAL])
Jakub Kicinski <kuba@kernel.org> (maintainer:NETWORKING [GENERAL])
Paolo Abeni <pabeni@redhat.com> (maintainer:NETWORKING [GENERAL])
netfilter-devel@vger.kernel.org (open list:NETFILTER)
coreteam@netfilter.org (open list:NETFILTER)
netdev@vger.kernel.org (open list:NETWORKING [GENERAL])
linux-kernel@vger.kernel.org (open list)
bpf@vger.kernel.org (open list:BPF [MISC])
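
If you go the mailing-list route, the output above can be split into people and lists for the To/Cc lines. This is a small sketch that assumes the output format shown above; `split_recipients` is a name made up here, and which lists to actually include is a judgment call.

```python
import re

# A few lines of get_maintainer.pl output, copied from above.
OUTPUT = """\
Pablo Neira Ayuso <pablo@netfilter.org> (maintainer:NETFILTER)
"David S. Miller" <davem@davemloft.net> (maintainer:NETWORKING [GENERAL])
netfilter-devel@vger.kernel.org (open list:NETFILTER)
linux-kernel@vger.kernel.org (open list)
"""

# "<who> (<role>)" or "<who> (<role>:<area>)"
LINE_RE = re.compile(
    r"^(?P<who>.*?)\s+\((?P<role>[^:)]+)(?::(?P<area>.*))?\)$"
)


def split_recipients(text):
    """Split get_maintainer.pl output into (people, lists).

    Each entry is a (who, area) tuple; area is None when the role
    carries no subsystem, as in "(open list)".
    """
    people, lists = [], []
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip anything that does not look like an entry
        target = lists if m.group("role").endswith("list") else people
        target.append((m.group("who"), m.group("area")))
    return people, lists


people, lists = split_recipients(OUTPUT)
print("To:", ", ".join(who for who, _ in people))
print("Cc:", ", ".join(who for who, _ in lists))
```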

You could also report it on the kernel bugzilla under Product: Networking, Component: Netfilter.
Edit:
If you can perform the bisection, that usually speeds up the fix significantly.

Last edited by loqs (2023-01-04 15:23:11)

#5 2023-01-04 17:52:24

kubax
Member
Registered: 2014-10-22
Posts: 19

Re: How to analyze kernel crashdump?

I had already figured out that it is down to nf_nat_setup_info; I was hoping to get a little more insight before reporting it upstream.

That's why I tried to use the gdb-based crash program, to see if I could get anything more helpful. Bisecting the problem would be more than disruptive for my family, because I would have to wait for a crash every time, and sadly I am not always around to enter the decryption password to bring the system back up.

But I will try to file a bug report upstream, in the hope that the dmesg gives them enough data to find the problem.

#6 2023-01-04 18:04:21

loqs
Member
Registered: 2014-03-06
Posts: 17,372

Re: How to analyze kernel crashdump?

Could you switch to the affected kernel overnight, then switch it back in the morning?
