
#1 2024-08-11 21:02:13

AZMCode
Member
Registered: 2024-08-11
Posts: 11

Help diagnosing a kernel panic

Hello!
I've been trying to figure out why my laptop running Arch crashes with a kernel panic after it has been up for a while.
I've already set up kdumpst (using kdump), though I'm not capturing a full memory dump.

The following messages in dmesg are recovered after the panic:

[ 1964.139779] [  T13501] BUG: unable to handle page fault for address: ffff94c0de4c0800
[ 1964.139790] [  T13501] #PF: supervisor read access in kernel mode
[ 1964.139794] [  T13501] #PF: error_code(0x0000) - not-present page
[ 1964.139797] [  T13501] PGD 261e01067 P4D 261e01067 PUD 0 
[ 1964.139804] [  T13501] Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 1964.139810] [  T13501] CPU: 5 PID: 13501 Comm: kworker/u33:2 Kdump: loaded Tainted: G           OE      6.10.3-arch1-2 #1 20bffa7dc84b9a89fd543afbd712f49dca71b693
[ 1964.139818] [  T13501] Hardware name: HP HP Laptop 17-cn0xxx/883C, BIOS F.20 03/03/2022
[ 1964.139821] [  T13501] Workqueue: i915_flip intel_atomic_commit_work [i915]
[ 1964.140165] [  T13501] RIP: 0010:intel_atomic_get_new_global_obj_state+0x39/0x50 [i915]
[ 1964.140514] [  T13501] Code: 00 85 d2 7e 33 48 8b 87 80 00 00 00 48 c1 e2 05 48 01 c2 eb 15 66 66 2e 0f 1f 84 00 00 00 00 00 90 48 83 c0 20 48 39 d0 74 0e <48> 39 30 75 f2 48 8b 40 18 c3 cc cc cc cc 31 c0 c3 cc cc cc cc 66
[ 1964.140519] [  T13501] RSP: 0018:ffffb7c0cdf33d68 EFLAGS: 00010282
[ 1964.140524] [  T13501] RAX: ffff94c0de4c0800 RBX: ffff9486150d6000 RCX: ffff948617cb5a00
[ 1964.140528] [  T13501] RDX: ffff94c0de4c0840 RSI: ffff94858633c8f0 RDI: ffff9486150d6000
[ 1964.140531] [  T13501] RBP: ffffb7c0cdf33e58 R08: 0000000000000001 R09: 0000000000000000
[ 1964.140535] [  T13501] R10: 0000000000000005 R11: 0000000000000000 R12: ffff94858633c000
[ 1964.140538] [  T13501] R13: 0000000000000000 R14: 0000000000000000 R15: ffff9486150d6000
[ 1964.140541] [  T13501] FS:  0000000000000000(0000) GS:ffff94882f880000(0000) knlGS:0000000000000000
[ 1964.140545] [  T13501] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1964.140549] [  T13501] CR2: ffff94c0de4c0800 CR3: 0000000261020001 CR4: 0000000000f70ef0
[ 1964.140553] [  T13501] PKRU: 55555554
[ 1964.140555] [  T13501] Call Trace:
[ 1964.140561] [  T13501]  <TASK>
[ 1964.140566] [  T13501]  ? __die_body.cold+0x19/0x27
[ 1964.140575] [  T13501]  ? page_fault_oops+0x15a/0x2d0
[ 1964.140583] [  T13501]  ? search_bpf_extables+0x5f/0x80
[ 1964.140591] [  T13501]  ? exc_page_fault+0x18a/0x190
[ 1964.140599] [  T13501]  ? asm_exc_page_fault+0x26/0x30
[ 1964.140610] [  T13501]  ? intel_atomic_get_new_global_obj_state+0x39/0x50 [i915 d3c8702db504b9a74903ed081edbd7810c6f0f07]
[ 1964.140890] [  T13501]  intel_dbuf_post_plane_update+0x21/0x70 [i915 d3c8702db504b9a74903ed081edbd7810c6f0f07]
[ 1964.141148] [  T13501]  intel_atomic_commit_tail+0x8f5/0x11e0 [i915 d3c8702db504b9a74903ed081edbd7810c6f0f07]
[ 1964.141458] [  T13501]  process_one_work+0x17b/0x330
[ 1964.141464] [  T13501]  worker_thread+0x2e2/0x410
[ 1964.141470] [  T13501]  ? __pfx_worker_thread+0x10/0x10
[ 1964.141474] [  T13501]  kthread+0xcf/0x100
[ 1964.141481] [  T13501]  ? __pfx_kthread+0x10/0x10
[ 1964.141487] [  T13501]  ret_from_fork+0x31/0x50
[ 1964.141495] [  T13501]  ? __pfx_kthread+0x10/0x10
[ 1964.141500] [  T13501]  ret_from_fork_asm+0x1a/0x30
[ 1964.141507] [  T13501]  </TASK>
[ 1964.141510] [  T13501] Modules linked in: rfcomm snd_seq_dummy snd_hrtimer snd_seq xt_MASQUERADE nf_conntrack_netlink xfrm_user xfrm_algo ip6table_nat iptable_nat br_netfilter nft_masq nft_ct nft_reject_ipv4 nft_reject nft_chain_nat nf_nat nf_tables bridge stp llc overlay cmac algif_hash algif_skcipher af_alg bnep snd_ctl_led snd_soc_skl_hda_dsp snd_soc_hdac_hdmi vfat snd_soc_intel_hda_dsp_common snd_sof_probes fat snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_hda_scodec_component snd_soc_dmic snd_sof_pci_intel_tgl snd_sof_pci_intel_cnl snd_sof_intel_hda_generic soundwire_intel soundwire_cadence snd_sof_intel_hda_common snd_sof_intel_hda_mlink snd_sof_intel_hda snd_sof_pci cdc_mbim cdc_wdm snd_sof_xtensa_dsp snd_sof snd_sof_utils intel_uncore_frequency snd_soc_hdac_hda joydev intel_uncore_frequency_common snd_soc_acpi_intel_match soundwire_generic_allocation snd_soc_acpi soundwire_bus snd_soc_avs snd_soc_hda_codec rtw88_8822ce snd_hda_ext_core rtw88_8822c snd_soc_core rtw88_pci snd_compress ac97_bus
[ 1964.141600] [  T13501]  rtw88_core snd_pcm_dmaengine x86_pkg_temp_thermal intel_powerclamp coretemp snd_hda_intel snd_usb_audio mac80211 uvcvideo snd_intel_dspcfg kvm_intel btusb snd_intel_sdw_acpi videobuf2_vmalloc processor_thermal_device_pci_legacy snd_usbmidi_lib processor_thermal_device uvc snd_hda_codec btrtl snd_ump processor_thermal_wt_hint videobuf2_memops libarc4 snd_rawmidi videobuf2_v4l2 btintel snd_hda_core processor_thermal_rfim ip6t_REJECT iTCO_wdt kvm hid_multitouch cdc_ncm hp_wmi intel_rapl_msr intel_pmc_bxt snd_seq_device nf_reject_ipv6 snd_hwdep processor_thermal_rapl btbcm videodev cfg80211 mei_hdcp mei_pxp snd_pcm cdc_ether platform_profile ee1004 iTCO_vendor_support btmtk videobuf2_common snd_timer xt_hl intel_rapl_common usbnet sparse_keymap rapl bluetooth snd mc mousedev mii intel_cstate soundcore intel_uncore spi_nor mei_me ip6t_rt rfkill i2c_i801 pcspkr processor_thermal_wt_req mei processor_thermal_power_floor mtd intel_lpss_pci i2c_smbus i2c_hid_acpi intel_lpss igen6_edac intel_pmc_core wmi_bmof
[ 1964.141707] [  T13501]  i2c_mux processor_thermal_mbox i2c_hid idma64 ipt_REJECT intel_soc_dts_iosf nf_reject_ipv4 intel_vsec int3403_thermal pmt_telemetry int3400_thermal int340x_thermal_zone pinctrl_tigerlake pmt_class acpi_thermal_rel acpi_pad xt_LOG wireless_hotkey nf_log_syslog mac_hid xt_comment xt_multiport xt_limit xt_addrtype xt_tcpudp xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ip6table_filter ip6_tables iptable_filter vboxnetflt(OE) vboxnetadp(OE) vboxdrv(OE) nbd crypto_user loop nfnetlink ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 dm_crypt cbc encrypted_keys trusted asn1_encoder tee dm_mod xe drm_ttm_helper nvme gpu_sched drm_suballoc_helper nvme_core drm_gpuvm hid_generic nvme_auth drm_exec usbhid i915 crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic gf128mul ghash_clmulni_intel sha512_ssse3 sha256_ssse3 i2c_algo_bit serio_raw sha1_ssse3 drm_buddy atkbd ttm aesni_intel libps2 vivaldi_fmap intel_gtt crypto_simd drm_display_helper cryptd spi_intel_pci
[ 1964.141819] [  T13501]  xhci_pci spi_intel vmd cec xhci_pci_renesas video i8042 serio wmi
[ 1964.141836] [  T13501] CR2: ffff94c0de4c0800

This is my first time dealing with a recurring kernel panic, so I came here for some advice.
Any idea how I could narrow down whether it's the kernel itself, or a specific kernel module, and how to find it?


#2 2024-08-12 07:28:28

gromit
Administrator
From: Germany
Registered: 2024-02-10
Posts: 1,524

Re: Help diagnosing a kernel panic

Does this kernel panic also happen with an untainted kernel (i.e. with the virtualbox modules removed)?

Does it also occur on the latest mainline release candidate?

sudo pacman -U https://pkgbuild.com/~gromit/linux-bisection-kernels/linux-mainline-6.11rc3-1-x86_64.pkg.tar.zst

Can you solve the problem by downgrading? Which is the first version that does not show the problem?

Last edited by gromit (2024-08-12 07:29:05)


#3 2024-08-12 17:29:49

AZMCode
Member
Registered: 2024-08-11
Posts: 11

Re: Help diagnosing a kernel panic

I'll get back to you on that. If these steps resolve the issue, I'll post that too.


#4 2024-08-13 19:57:43

AZMCode
Member
Registered: 2024-08-11
Posts: 11

Re: Help diagnosing a kernel panic

I have done both of those, and it took a while, but I did get another kernel crash:

[ 1863.792050] [   T1603] BUG: unable to handle page fault for address: 000000ae00000034
[ 1863.792061] [   T1603] #PF: supervisor read access in kernel mode
[ 1863.792065] [   T1603] #PF: error_code(0x0000) - not-present page
[ 1863.792069] [   T1603] PGD 0 P4D 0 
[ 1863.792074] [   T1603] Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 1863.792080] [   T1603] CPU: 4 PID: 1603 Comm: sway Kdump: loaded Tainted: G        W          6.10.3-arch1-2 #1 20bffa7dc84b9a89fd543afbd712f49dca71b693
[ 1863.792087] [   T1603] Hardware name: HP HP Laptop 17-cn0xxx/883C, BIOS F.20 03/03/2022
[ 1863.792090] [   T1603] RIP: 0010:__ww_mutex_lock.constprop.0+0x5d0/0x8d0
[ 1863.792102] [   T1603] Code: 0e 48 8b 78 08 31 d2 4c 89 f6 e8 ab 2f 2e ff 65 ff 0d dc 5e 9d 5c 0f 85 7d fb ff ff e8 b9 72 1a ff e9 73 fb ff ff 48 83 e0 f8 <8b> 50 34 85 d2 0f 84 a4 fb ff ff 8b 78 14 48 31 c0 0f 1f 00 84 c0
[ 1863.792106] [   T1603] RSP: 0018:ffffbfd10299b770 EFLAGS: 00010206
[ 1863.792111] [   T1603] RAX: 000000ae00000000 RBX: ffffbfd10299b9d0 RCX: 0000000000004000
[ 1863.792115] [   T1603] RDX: 000000ae00000000 RSI: 0000000000000001 RDI: ffff9ef19bcd1c00
[ 1863.792118] [   T1603] RBP: ffffbfd10299b7d8 R08: ffff9ef108650000 R09: 0000000000000005
[ 1863.792121] [   T1603] R10: 0000000000000014 R11: 0000000000000001 R12: ffff9ef1084c0000
[ 1863.792124] [   T1603] R13: 0000000000000012 R14: ffff9ef19bcd1c00 R15: ffff9ef19bcd1b00
[ 1863.792127] [   T1603] FS:  00007d7a39036e40(0000) GS:ffff9ef3af800000(0000) knlGS:0000000000000000
[ 1863.792131] [   T1603] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1863.792134] [   T1603] CR2: 000000ae00000034 CR3: 00000001052c2004 CR4: 0000000000f70ef0
[ 1863.792137] [   T1603] PKRU: 55555554
[ 1863.792140] [   T1603] Call Trace:
[ 1863.792143] [   T1603]  <TASK>
[ 1863.792149] [   T1603]  ? __die_body.cold+0x19/0x27
[ 1863.792156] [   T1603]  ? page_fault_oops+0x15a/0x2d0
[ 1863.792163] [   T1603]  ? lock_timer_base+0x76/0xa0
[ 1863.792170] [   T1603]  ? exc_page_fault+0x81/0x190
[ 1863.792180] [   T1603]  ? asm_exc_page_fault+0x26/0x30
[ 1863.792191] [   T1603]  ? __ww_mutex_lock.constprop.0+0x5d0/0x8d0
[ 1863.792201] [   T1603]  eb_validate_vmas+0xd0/0xa00 [i915 d3c8702db504b9a74903ed081edbd7810c6f0f07]
[ 1863.792476] [   T1603]  i915_gem_do_execbuffer+0x102e/0x2b30 [i915 d3c8702db504b9a74903ed081edbd7810c6f0f07]
[ 1863.792725] [   T1603]  ? unix_stream_recvmsg+0x8c/0xa0
[ 1863.792730] [   T1603]  ? __pfx_unix_stream_read_actor+0x10/0x10
[ 1863.792746] [   T1603]  i915_gem_execbuffer2_ioctl+0x139/0x250 [i915 d3c8702db504b9a74903ed081edbd7810c6f0f07]
[ 1863.792986] [   T1603]  ? __pfx_i915_gem_execbuffer2_ioctl+0x10/0x10 [i915 d3c8702db504b9a74903ed081edbd7810c6f0f07]
[ 1863.793223] [   T1603]  drm_ioctl_kernel+0xb0/0x100
[ 1863.793232] [   T1603]  drm_ioctl+0x27a/0x4f0
[ 1863.793238] [   T1603]  ? __pfx_i915_gem_execbuffer2_ioctl+0x10/0x10 [i915 d3c8702db504b9a74903ed081edbd7810c6f0f07]
[ 1863.793481] [   T1603]  __x64_sys_ioctl+0x94/0xd0
[ 1863.793489] [   T1603]  do_syscall_64+0x82/0x190
[ 1863.793497] [   T1603]  ? syscall_exit_to_user_mode+0x72/0x200
[ 1863.793505] [   T1603]  ? do_syscall_64+0x8e/0x190
[ 1863.793510] [   T1603]  ? syscall_exit_to_user_mode+0x72/0x200
[ 1863.793517] [   T1603]  ? do_syscall_64+0x8e/0x190
[ 1863.793523] [   T1603]  ? syscall_exit_to_user_mode+0x72/0x200
[ 1863.793530] [   T1603]  ? do_syscall_64+0x8e/0x190
[ 1863.793536] [   T1603]  ? do_syscall_64+0x8e/0x190
[ 1863.793541] [   T1603]  ? do_syscall_64+0x8e/0x190
[ 1863.793546] [   T1603]  ? do_syscall_64+0x8e/0x190
[ 1863.793552] [   T1603]  ? clear_bhb_loop+0x25/0x80
[ 1863.793555] [   T1603]  ? clear_bhb_loop+0x25/0x80
[ 1863.793559] [   T1603]  ? clear_bhb_loop+0x25/0x80
[ 1863.793562] [   T1603]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 1863.793570] [   T1603] RIP: 0033:0x7d7a39c50ced
[ 1863.793627] [   T1603] Code: 04 25 28 00 00 00 48 89 45 c8 31 c0 48 8d 45 10 c7 45 b0 10 00 00 00 48 89 45 b8 48 8d 45 d0 48 89 45 c0 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1a 48 8b 45 c8 64 48 2b 04 25 28 00 00 00
[ 1863.793631] [   T1603] RSP: 002b:00007ffe3523d540 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 1863.793636] [   T1603] RAX: ffffffffffffffda RBX: 00005c3246b1f550 RCX: 00007d7a39c50ced
[ 1863.793640] [   T1603] RDX: 00007ffe3523d5b0 RSI: 0000000040406469 RDI: 000000000000000f
[ 1863.793643] [   T1603] RBP: 00007ffe3523d590 R08: 00005c3246a3d010 R09: 0000000000000007
[ 1863.793645] [   T1603] R10: 00005c3246ae4c90 R11: 0000000000000246 R12: 00005c3247f18180
[ 1863.793648] [   T1603] R13: 000000000000000f R14: 00007ffe3523d5b0 R15: 00005c3246ae552c
[ 1863.793654] [   T1603]  </TASK>
[ 1863.793656] [   T1603] Modules linked in: rfcomm snd_seq_dummy snd_hrtimer snd_seq snd_seq_device xt_MASQUERADE nf_conntrack_netlink xfrm_user xfrm_algo ip6table_nat iptable_nat br_netfilter nft_masq nft_ct nft_reject_ipv4 nft_reject nft_chain_nat nf_nat nf_tables bridge stp llc overlay usbhid cmac algif_hash algif_skcipher af_alg bnep vfat fat snd_ctl_led snd_soc_skl_hda_dsp snd_soc_hdac_hdmi snd_soc_intel_hda_dsp_common snd_sof_probes snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_hda_scodec_component snd_soc_dmic cdc_mbim cdc_wdm snd_sof_pci_intel_tgl snd_sof_pci_intel_cnl snd_sof_intel_hda_generic soundwire_intel soundwire_cadence snd_sof_intel_hda_common snd_sof_intel_hda_mlink snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp intel_uncore_frequency intel_uncore_frequency_common snd_sof snd_sof_utils snd_soc_hdac_hda snd_soc_acpi_intel_match soundwire_generic_allocation snd_soc_acpi soundwire_bus snd_soc_avs joydev mousedev snd_soc_hda_codec snd_hda_ext_core snd_soc_core rtw88_8822ce rtw88_8822c
[ 1863.793750] [   T1603]  snd_compress rtw88_pci ac97_bus uvcvideo snd_pcm_dmaengine videobuf2_vmalloc rtw88_core snd_hda_intel uvc x86_pkg_temp_thermal intel_powerclamp snd_intel_dspcfg videobuf2_memops snd_intel_sdw_acpi btusb videobuf2_v4l2 coretemp mac80211 btrtl snd_hda_codec hid_multitouch btintel cdc_ncm videodev hid_generic snd_hda_core kvm_intel cdc_ether processor_thermal_device_pci_legacy btbcm videobuf2_common usbnet libarc4 processor_thermal_device snd_hwdep btmtk kvm mii mc cfg80211 bluetooth hp_wmi iTCO_wdt snd_pcm processor_thermal_wt_hint mei_pxp platform_profile intel_pmc_bxt processor_thermal_rfim iTCO_vendor_support mei_hdcp ee1004 sparse_keymap intel_rapl_msr snd_timer processor_thermal_rapl intel_rapl_common rapl snd mei_me intel_cstate processor_thermal_wt_req spi_nor i2c_i801 intel_lpss_pci intel_lpss soundcore intel_uncore rfkill processor_thermal_power_floor pcspkr mei mtd i2c_smbus idma64 wmi_bmof processor_thermal_mbox i2c_mux igen6_edac intel_soc_dts_iosf i2c_hid_acpi i2c_hid intel_pmc_core
[ 1863.793857] [   T1603]  int3403_thermal ip6t_REJECT int340x_thermal_zone intel_vsec nf_reject_ipv6 int3400_thermal pmt_telemetry acpi_thermal_rel pinctrl_tigerlake acpi_pad pmt_class xt_hl wireless_hotkey mac_hid ip6t_rt ipt_REJECT nf_reject_ipv4 xt_LOG nf_log_syslog xt_comment xt_multiport xt_limit xt_addrtype xt_tcpudp xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ip6table_filter ip6_tables iptable_filter nbd crypto_user loop nfnetlink ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 dm_crypt cbc encrypted_keys trusted asn1_encoder tee dm_mod xe drm_ttm_helper nvme gpu_sched drm_suballoc_helper nvme_core drm_gpuvm drm_exec nvme_auth i915 crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic gf128mul serio_raw i2c_algo_bit ghash_clmulni_intel drm_buddy atkbd sha512_ssse3 libps2 ttm sha256_ssse3 vivaldi_fmap sha1_ssse3 intel_gtt aesni_intel spi_intel_pci drm_display_helper vmd crypto_simd spi_intel xhci_pci cryptd cec xhci_pci_renesas video i8042 serio wmi
[ 1863.793977] [   T1603] CR2: 000000ae00000034

I'm guessing more information is needed to figure this out, but I have the time and patience to provide it and learn a thing or two. How should I proceed from here?

Last edited by AZMCode (2024-08-13 19:58:44)


#5 2024-08-13 20:29:05

gromit
Administrator
From: Germany
Registered: 2024-02-10
Posts: 1,524

Re: Help diagnosing a kernel panic

Could you please remove the virtualbox stuff for now so we can test this on an untainted kernel?


#6 2024-08-14 02:05:04

AZMCode
Member
Registered: 2024-08-11
Posts: 11

Re: Help diagnosing a kernel panic

I thought I did. I uninstalled the virtualbox kernel modules package before the second crash log was generated. I guess I haven't done so correctly then. How would I go about doing this?


#7 2024-08-14 02:59:44

ap_qld
Member
Registered: 2024-08-14
Posts: 3

Re: Help diagnosing a kernel panic

If you have Nvidia hardware, this thread may be relevant: https://bbs.archlinux.org/viewtopic.php?id=293400


#8 2024-08-14 07:15:14

gromit
Administrator
From: Germany
Registered: 2024-02-10
Posts: 1,524

Re: Help diagnosing a kernel panic

What did you remove? And what's the output of "dkms status"?


#9 2024-08-14 18:58:22

AZMCode
Member
Registered: 2024-08-11
Posts: 11

Re: Help diagnosing a kernel panic

I uninstalled `virtualbox-host-modules-arch`. `dkms status` returns "command not found", since I don't currently have `dkms` installed either.

With respect to the GPU hardware, here's my fastfetch output with irrelevant details stripped:

OS: Arch Linux x86_64
Host: HP Laptop 17-cn0xxx
Kernel: Linux 6.10.3-arch1-2
Shell: bash 5.2.32
Display (BOE0953): 1920x1080 @ 60 Hz in 17″ [Built-in]
WM: Sway (Wayland)
CPU: 11th Gen Intel(R) Core(TM) i5-1135G7 (8) @ 4.20 GHz
GPU: Intel Iris Xe Graphics @ 1.30 GHz [Integrated]
Memory: 1.94 GiB / 9.11 GiB (21%)
Swap: 0 B / 16.00 GiB (0%)
Disk (/): 291.61 GiB / 424.88 GiB (69%) - ext4
Locale: en_GB.utf8

I'm supposed to have 12 GB of RAM (one 4 GB stick and one 8 GB stick), but I went a bit overboard with kdumpst: because it couldn't open my encrypted root partition, I just reserved 2 GB for it so it would stop complaining. I doubt that's part of the issue, though.

I run Intel integrated graphics, so I doubt it's an NVIDIA issue.

Last edited by AZMCode (2024-08-14 19:37:03)


#10 2024-08-14 20:15:25

seth
Member
From: Won't reply 2 private help req
Registered: 2012-09-03
Posts: 75,176

Re: Help diagnosing a kernel panic

[ 1863.792080] [   T1603] CPU: 4 PID: 1603 Comm: sway Kdump: loaded Tainted: G        W          6.10.3-arch1-2 #1 20bffa7dc84b9a89fd543afbd712f49dca71b693

Do you also have a crash that's not tainted "G" ("W" is ok)?
Also

grep '(' /proc/modules
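For context: the grep above lists loaded modules that carry their own taint markers, such as the `(OE)` shown next to the vbox modules in the first trace. The overall taint state of the running kernel is also exposed as a bitmask, and a minimal sketch for turning that number into the flag letters seen in the oops header could look like this. The function name `decode_taint` is made up for illustration; the bit order is taken from the kernel's "Tainted kernels" documentation as far as I know it, so treat it as an assumption and double-check against your kernel version. The `G` printed in the traces is only the placeholder for "no proprietary module loaded", so it has no bit of its own here.

```shell
#!/bin/sh
# Decode a kernel taint bitmask into its flag letters.
# Assumption: letters are listed in bit order (bit 0 = P, bit 9 = W,
# bit 12 = O, bit 13 = E, ...) as in the kernel's tainted-kernels document.
decode_taint() {
    value=$1
    letters="P F S R M B U D A W C I O E L K X T"
    bit=0
    out=""
    for letter in $letters; do
        # Append the letter when its bit is set in the mask.
        if [ $(( (value >> bit) & 1 )) -eq 1 ]; then
            out="$out$letter"
        fi
        bit=$((bit + 1))
    done
    # An empty result means no taint bit is set.
    printf '%s\n' "${out:-untainted}"
}
```

On a running system this would be called as `decode_taint "$(cat /proc/sys/kernel/tainted)"`; a result of `W` alone matches the second trace, while `OE` matches the first one with the VirtualBox modules loaded.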


#11 2024-08-14 23:54:14

AZMCode
Member
Registered: 2024-08-11
Posts: 11

Re: Help diagnosing a kernel panic

This is the latest crash (they've kept happening regularly):

[  245.201147] [   T3863] BUG: unable to handle page fault for address: ffff8bf2a8cbe720
[  245.201158] [   T3863] #PF: supervisor read access in kernel mode
[  245.201163] [   T3863] #PF: error_code(0x0000) - not-present page
[  245.201166] [   T3863] PGD 340201067 P4D 340201067 PUD 0 
[  245.201174] [   T3863] Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
[  245.201180] [   T3863] CPU: 7 PID: 3863 Comm: code Kdump: loaded Not tainted 6.10.3-arch1-2 #1 20bffa7dc84b9a89fd543afbd712f49dca71b693
[  245.201187] [   T3863] Hardware name: HP HP Laptop 17-cn0xxx/883C, BIOS F.20 03/03/2022
[  245.201190] [   T3863] RIP: 0010:kmem_cache_alloc_noprof+0xa7/0x2f0
[  245.201199] [   T3863] Code: 08 27 7d 48 8b 50 08 48 83 78 10 00 48 8b 38 0f 84 b6 01 00 00 48 85 ff 0f 84 ad 01 00 00 41 8b 44 24 28 49 8b 34 24 48 01 f8 <48> 8b 18 48 89 c1 49 33 9c 24 b8 00 00 00 48 89 f8 48 0f c9 48 31
[  245.201203] [   T3863] RSP: 0018:ffffa12a42eabc20 EFLAGS: 00010282
[  245.201208] [   T3863] RAX: ffff8bf2a8cbe720 RBX: ffff8b83c9981730 RCX: 0000000000000070
[  245.201211] [   T3863] RDX: 00000000018fdc07 RSI: 000000000003c750 RDI: ffff8bf2a8cbe700
[  245.201215] [   T3863] RBP: ffffa12a42eabc60 R08: ffffa12a42eabc20 R09: ffff8b83a8cbed80
[  245.201218] [   T3863] R10: 0000000000000000 R11: 00007b7d63a59fff R12: ffff8b8340225a00
[  245.201221] [   T3863] R13: 0000000000000cc0 R14: ffffffff82d88518 R15: 0000000000000040
[  245.201225] [   T3863] FS:  00007b7d63035e80(0000) GS:ffff8b85e3b80000(0000) knlGS:0000000000000000
[  245.201228] [   T3863] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  245.201232] [   T3863] CR2: ffff8bf2a8cbe720 CR3: 0000000162f46004 CR4: 0000000000f70ef0
[  245.201236] [   T3863] PKRU: 55555554
[  245.201238] [   T3863] Call Trace:
[  245.201241] [   T3863]  <TASK>
[  245.201243] [   T3863]  ? __die_body.cold+0x19/0x27
[  245.201251] [   T3863]  ? page_fault_oops+0x15a/0x2d0
[  245.201258] [   T3863]  ? search_bpf_extables+0x5f/0x80
[  245.201265] [   T3863]  ? exc_page_fault+0x18a/0x190
[  245.201273] [   T3863]  ? asm_exc_page_fault+0x26/0x30
[  245.201280] [   T3863]  ? anon_vma_fork+0x98/0x120
[  245.201288] [   T3863]  ? kmem_cache_alloc_noprof+0xa7/0x2f0
[  245.201294] [   T3863]  anon_vma_fork+0x98/0x120
[  245.201299] [   T3863]  copy_process+0x1857/0x25a0
[  245.201310] [   T3863]  kernel_clone+0xbd/0x420
[  245.201316] [   T3863]  ? __handle_mm_fault+0xac6/0x1050
[  245.201322] [   T3863]  __do_sys_clone+0x66/0x90
[  245.201327] [   T3863]  do_syscall_64+0x82/0x190
[  245.201335] [   T3863]  ? do_user_addr_fault+0x36c/0x620
[  245.201341] [   T3863]  ? clear_bhb_loop+0x25/0x80
[  245.201345] [   T3863]  ? clear_bhb_loop+0x25/0x80
[  245.201348] [   T3863]  ? clear_bhb_loop+0x25/0x80
[  245.201351] [   T3863]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  245.201358] [   T3863] RIP: 0033:0x7b7d64246b57
[  245.201413] [   T3863] Code: 00 00 00 f3 0f 1e fa 64 48 8b 04 25 10 00 00 00 45 31 c0 31 d2 31 f6 bf 11 00 20 01 4c 8d 90 d0 02 00 00 b8 38 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 39 89 c2 85 c0 75 2c 64 48 8b 04 25 10 00 00
[  245.201417] [   T3863] RSP: 002b:00007ffdf30cc338 EFLAGS: 00000246 ORIG_RAX: 0000000000000038
[  245.201422] [   T3863] RAX: ffffffffffffffda RBX: 0000000000000006 RCX: 00007b7d64246b57
[  245.201425] [   T3863] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000001200011
[  245.201428] [   T3863] RBP: 00007ffdf30cc440 R08: 0000000000000000 R09: 0000000000000000
[  245.201430] [   T3863] R10: 00007b7d63036150 R11: 0000000000000246 R12: 0000000000000001
[  245.201433] [   T3863] R13: 0000000000000000 R14: 00007ffdf30cc450 R15: 00007ffdf30cc340
[  245.201438] [   T3863]  </TASK>
[  245.201440] [   T3863] Modules linked in: ccm rfcomm snd_seq_dummy snd_hrtimer snd_seq snd_seq_device xt_MASQUERADE nf_conntrack_netlink xfrm_user xfrm_algo ip6table_nat iptable_nat br_netfilter nft_masq nft_ct nft_reject_ipv4 nft_reject nft_chain_nat nf_nat nf_tables bridge stp llc overlay cmac algif_hash algif_skcipher af_alg bnep vfat fat snd_ctl_led snd_soc_skl_hda_dsp snd_soc_hdac_hdmi snd_soc_intel_hda_dsp_common snd_sof_probes snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_hda_scodec_component snd_soc_dmic snd_sof_pci_intel_tgl snd_sof_pci_intel_cnl snd_sof_intel_hda_generic soundwire_intel soundwire_cadence snd_sof_intel_hda_common snd_sof_intel_hda_mlink snd_sof_intel_hda snd_sof_pci intel_uncore_frequency snd_sof_xtensa_dsp intel_uncore_frequency_common snd_sof snd_sof_utils snd_soc_hdac_hda snd_soc_acpi_intel_match soundwire_generic_allocation snd_soc_acpi soundwire_bus snd_soc_avs snd_soc_hda_codec snd_hda_ext_core joydev mousedev rtw88_8822ce snd_soc_core rtw88_8822c snd_compress ac97_bus
[  245.201521] [   T3863]  rtw88_pci x86_pkg_temp_thermal snd_pcm_dmaengine intel_powerclamp coretemp rtw88_core snd_hda_intel kvm_intel uvcvideo snd_intel_dspcfg mac80211 btusb videobuf2_vmalloc snd_intel_sdw_acpi btrtl hid_multitouch kvm uvc processor_thermal_device_pci_legacy btintel snd_hda_codec videobuf2_memops iTCO_wdt btbcm processor_thermal_device videobuf2_v4l2 hid_generic btmtk snd_hda_core intel_pmc_bxt processor_thermal_wt_hint snd_hwdep libarc4 videodev ee1004 mei_hdcp mei_pxp iTCO_vendor_support intel_rapl_msr snd_pcm hp_wmi rapl processor_thermal_rfim cfg80211 bluetooth videobuf2_common spi_nor platform_profile processor_thermal_rapl i2c_i801 intel_rapl_common intel_cstate snd_timer sparse_keymap mc snd intel_lpss_pci processor_thermal_wt_req intel_uncore mtd wmi_bmof pcspkr rfkill i2c_smbus mei_me processor_thermal_power_floor soundcore intel_lpss i2c_mux mei processor_thermal_mbox ip6t_REJECT idma64 intel_soc_dts_iosf igen6_edac i2c_hid_acpi nf_reject_ipv6 i2c_hid intel_pmc_core int3403_thermal intel_vsec
[  245.201596] [   T3863]  int340x_thermal_zone xt_hl int3400_thermal pmt_telemetry ip6t_rt acpi_thermal_rel pinctrl_tigerlake pmt_class wireless_hotkey acpi_pad ipt_REJECT mac_hid nf_reject_ipv4 xt_LOG nf_log_syslog xt_comment xt_multiport xt_limit xt_addrtype xt_tcpudp xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ip6table_filter ip6_tables iptable_filter nbd crypto_user loop nfnetlink ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 dm_crypt cbc encrypted_keys trusted asn1_encoder tee dm_mod xe drm_ttm_helper nvme gpu_sched drm_suballoc_helper nvme_core drm_gpuvm nvme_auth drm_exec i915 crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic gf128mul ghash_clmulni_intel sha512_ssse3 serio_raw sha256_ssse3 atkbd sha1_ssse3 i2c_algo_bit libps2 aesni_intel drm_buddy vivaldi_fmap ttm crypto_simd intel_gtt cryptd drm_display_helper spi_intel_pci xhci_pci spi_intel cec vmd xhci_pci_renesas video i8042 serio wmi

It says "Not tainted", if I read it correctly.


#12 2024-08-14 23:58:04

AZMCode
Member
Registered: 2024-08-11
Posts: 11

Re: Help diagnosing a kernel panic

I've noticed they're all page faults. Could this mean my memory is the problem?
I did just get it replaced after a POST code told me my memory was bust, though I kept one of the two original sticks. Crashes have happened with both original sticks, each tested separately alongside a new 8GB stick, which just confuses me.

I doubt both memory sticks died at the same time, so maybe it's the motherboard?

There's also the chance it really is a software bug, which is why I'm here.
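To compare the crashes side by side, here is a rough sketch that pulls the task name, the faulting kernel function and the faulting address out of a saved dmesg log. The function name `oops_summary` and the file name in the usage line are made up for illustration, and the parsing only assumes the line layout visible in the traces in this thread. For the three crashes posted so far it would show faults in `intel_atomic_get_new_global_obj_state` (i915), `__ww_mutex_lock` and `kmem_cache_alloc_noprof`. Faults spread over unrelated code paths like that can be a hint towards hardware rather than a single buggy module, though that is not proof on its own.

```shell
#!/bin/sh
# Print "<task> <kernel RIP symbol> <faulting address>" for each oops on stdin.
oops_summary() {
    awk '
        # The faulting address is the last field of the BUG line.
        /BUG: unable to handle page fault/ { addr = $NF }
        # The task name follows the "Comm:" token on the CPU line.
        / Comm: / { for (i = 1; i <= NF; i++) if ($i == "Comm:") comm = $(i + 1) }
        # Kernel-mode RIP lines look like "RIP: 0010:symbol+0xoff/0xlen [module]";
        # user-mode RIP lines (0033:) are skipped on purpose.
        /RIP: 0010:/ {
            for (i = 1; i <= NF; i++) if ($i ~ /^0010:/) sym = substr($i, 6)
            print comm, sym, addr
        }
    '
}
```

Usage would be something like `oops_summary < crash-dmesg.txt`, where `crash-dmesg.txt` stands for whatever file the recovered dmesg ends up in.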


#13 2024-08-15 07:24:29

seth
Member
From: Won't reply 2 private help req
Registered: 2012-09-03
Posts: 75,176

Re: Help diagnosing a kernel panic

Does this (only) happen under RAM pressure?

I did just get it replaced after a POST code told me my memory was bust

Were the crashes before or after (or both) the replacement?
Did you replace them yourself, or did you have them replaced by some pro or by some "bro who's really good with computers and such"?
(They might not be properly seated)

Can you downclock the RAM in the UEFI/BIOS settings (choose the most conservative timings, clocks, …)?
=> memtest86+, run it at least overnight


#14 2024-08-16 01:57:11

AZMCode
Member
Registered: 2024-08-11
Posts: 11

Re: Help diagnosing a kernel panic

I got the RAM replaced by a professional, they're properly seated. The crashes happened before and after.
I have also run a memtest before and after the replacement and it all CHECKed out. I'll try running it overnight.

I have never done any over/underclocking. Is it safe?


#15 2024-08-16 08:06:57

seth
Member
From: Won't reply 2 private help req
Registered: 2012-09-03
Posts: 75,176

Re: Help diagnosing a kernel panic

It's not a real "underclock", you just select the most conservative settings and that's considered to be beyond safe - it's typically *required* for heterogeneous DIMMs (different brand or even just batch).


#16 2024-08-16 12:19:20

AZMCode
Member
Registered: 2024-08-11
Posts: 11

Re: Help diagnosing a kernel panic

I have run the memtest overnight, and it's the memory.
I got multiple failures.

Should I try to underclock it, or remove the old memory stick first?


#17 2024-08-16 12:48:23

seth
Member
From: Won't reply 2 private help req
Registered: 2012-09-03
Posts: 75,176

Re: Help diagnosing a kernel panic

That's really up to you. Do NOT underVOLT the RAM; you just want to use the most conservative settings (and e.g. certainly not anything like XMP), and if you currently have an old and a new DIMM (i.e. a heterogeneous setup), that's kinda mandatory anyway. Full speed can typically only be achieved with equal DIMMs: same vendor, same brand, usually same batch (i.e. they came in the same box and the serial numbers are nnnn and nnnn+1).
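To see how heterogeneous the installed DIMMs actually are and what speed the firmware configured for them, something along these lines might help. It is only a sketch: it assumes `dmidecode` is installed and run as root, that its output uses the usual "Memory Device" blocks with `Size:`, `Locator:`, `Manufacturer:`, `Part Number:` and `Configured Memory Speed:` fields (older versions call the last one `Configured Clock Speed:`), and the function name `dimm_summary` is made up here.

```shell
#!/bin/sh
# Summarise populated DIMM slots from `dmidecode --type memory` output on stdin.
dimm_summary() {
    awk '
        # Return the text after "Name:", with surrounding whitespace trimmed.
        function field(line) {
            sub(/^[^:]*:[ \t]*/, "", line)
            sub(/[ \t]+$/, "", line)
            return line
        }
        # Print the current device unless the slot is empty.
        function emit() {
            if (in_dev && size != "" && size !~ /No Module Installed/)
                printf "%s | %s | %s | %s | configured %s\n", loc, size, mfr, part, conf
            in_dev = 0
        }
        /^Memory Device$/ { emit(); in_dev = 1; loc = size = mfr = part = conf = ""; next }
        /^Handle / || /^$/ { emit(); next }
        in_dev && /^[ \t]+Size:/ { size = field($0) }
        in_dev && /^[ \t]+Locator:/ { loc = field($0) }
        in_dev && /^[ \t]+Manufacturer:/ { mfr = field($0) }
        in_dev && /^[ \t]+Part Number:/ { part = field($0) }
        in_dev && /^[ \t]+Configured (Memory|Clock) Speed:/ { conf = field($0) }
        END { emit() }
    '
}
```

It would be run as `sudo dmidecode --type memory | dimm_summary`. If the two lines differ in manufacturer or part number, that is the mixed setup described above, and the configured speed shows what the firmware settled on.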


#18 2024-08-16 18:26:28

AZMCode
Member
Registered: 2024-08-11
Posts: 11

Re: Help diagnosing a kernel panic

Will do. In any case, the tip about leaving memtest running overnight gave me a way to conclusively test whether what I'm doing is OK or not. I'll make sure to underclock the RAM if it keeps happening, and test again. I appreciate the advice. I'll check back in eventually if I manage to fix my issues, or have any further problems. Thanks for your patience.


#19 2024-08-16 20:33:55

AZMCode
Member
Registered: 2024-08-11
Posts: 11

Re: Help diagnosing a kernel panic

Oh God it gets even worse. I have tried both old sticks, as well as the new stick by itself, and I'm still getting memtest FAILures.

I'm going to try updating the BIOS, and if that doesn't work, maybe get a professional to try to suss out the issue. Maybe it's thermals or such.

Is this still the correct forum to ask for advice about this?


#20 2024-08-16 20:35:36

seth
Member
From: Won't reply 2 private help req
Registered: 2012-09-03
Posts: 75,176

Re: Help diagnosing a kernel panic

It's not the most specific platform and we're not going to be able to help you systematically, but I'd test the DIMMs in some other board.

