Good morning.
I'm looking for pointers to track down why the kernel has been crashing ever since 6.15.1.
- There is no specific trigger (that I can determine). The machine stays up for hours, then crashes "out of the blue".
- Downgrading to 6.14.10 alleviates the problem.
- Hardware: Ryzen 7 5700X3D CPU, RX 7900 XT GPU, running Wayland/Sway.
The dumps all look similar, with amdgpu featuring heavily in the stack trace (see below).
Insights appreciated.
Thanks.
Panic Report
Arch: x86_64
Version: 6.15.3-arch1-1
[ 540.145464] wlan0: RX AssocResp from 44:4e:6d:df:34:2d (capab=0x1511 status=0 aid=1)
[ 540.151122] wlan0: associated
[ 540.164344] wlan0: Limiting TX power to 21 (24 - 3) dBm as advertised by 44:4e:6d:df:34:2d
[ 553.950174] iwlwifi 0000:06:00.0 wlan0: entered promiscuous mode
[ 561.806607] iwlwifi 0000:06:00.0 wlan0: left promiscuous mode
[ 567.333334] warning: `ThreadPoolForeg' uses wireless extensions which will stop working for Wi-Fi 7 hardware; use nl80211
[ 821.020869] nvme nvme0: using unchecked data buffer
[ 4079.229725] ------------[ cut here ]------------
[ 4079.229729] kernel BUG at mm/vmalloc.c:3118!
[ 4079.229737] Oops: invalid opcode: 0000 [#1] SMP NOPTI
[ 4079.229743] CPU: 14 UID: 0 PID: 1870 Comm: kworker/u64:12 Tainted: G OE 6.15.3-arch1-1 #1 PREEMPT(full) d8e4be090634982aecb41eb415d6a2689ce50bdb
[ 4079.229749] Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
[ 4079.229751] Hardware name: Gigabyte Technology Co., Ltd. B550 AORUS ELITE AX V2/B550 AORUS ELITE AX V2, BIOS F19d 09/02/2024
[ 4079.229753] Workqueue: events_unbound commit_work
[ 4079.229761] RIP: 0010:__get_vm_area_node+0x12d/0x130
[ 4079.229767] Code: 83 c1 01 39 d1 0f 4c ca ba 1e 00 00 00 39 d1 0f 4f ca 48 d3 e6 49 89 f7 e9 35 ff ff ff 4c 89 f7 e8 68 f8 01 00 45 31 f6 eb ae <0f> 0b 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 0f 1f
[ 4079.229770] RSP: 0018:ffffd4ec847d7650 EFLAGS: 00010202
[ 4079.229773] RAX: 00000000ffffffff RBX: 0000000000001000 RCX: 0000000000000422
[ 4079.229776] RDX: 000000000000000c RSI: 0000000000001000 RDI: 0000000000038b98
[ 4079.229778] RBP: 000000000000000c R08: ffffd4ec80000000 R09: fffff4ec7fffffff
[ 4079.229780] R10: ffff8efdbf355280 R11: 0000000000000000 R12: 0000000000038b98
[ 4079.229782] R13: 0000000000038b98 R14: 000000000000000c R15: 0000000000000dc0
[ 4079.229784] FS: 0000000000000000(0000) GS:ffff8efe078af000(0000) knlGS:0000000000000000
[ 4079.229787] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4079.229789] CR2: 00000000a808b000 CR3: 00000001ab053000 CR4: 0000000000f50ef0
[ 4079.229792] PKRU: 55555554
[ 4079.229794] Call Trace:
[ 4079.229796] <TASK>
[ 4079.229798] __vmalloc_node_range_noprof+0x13a/0x890
[ 4079.229806] ? dc_create_plane_state+0x23/0x80 [amdgpu 22b7670854b1240a200e82d1470a7e7db1b276ef]
[ 4079.230120] ? __alloc_frozen_pages_noprof+0x334/0x350
[ 4079.230124] ? dc_create_plane_state+0x23/0x80 [amdgpu 22b7670854b1240a200e82d1470a7e7db1b276ef]
[ 4079.230387] ? ___kmalloc_large_node+0x66/0x100
[ 4079.230393] __kvmalloc_node_noprof+0x2f2/0x640
[ 4079.230397] ? dc_create_plane_state+0x23/0x80 [amdgpu 22b7670854b1240a200e82d1470a7e7db1b276ef]
[ 4079.230659] ? dc_create_plane_state+0x23/0x80 [amdgpu 22b7670854b1240a200e82d1470a7e7db1b276ef]
[ 4079.230921] ? srso_alias_return_thunk+0x5/0xfbef5
[ 4079.230927] ? dcn20_build_pipe_pix_clk_params+0x1d/0x40 [amdgpu 22b7670854b1240a200e82d1470a7e7db1b276ef]
[ 4079.231228] ? dc_create_plane_state+0x23/0x80 [amdgpu 22b7670854b1240a200e82d1470a7e7db1b276ef]
[ 4079.231481] dc_create_plane_state+0x23/0x80 [amdgpu 22b7670854b1240a200e82d1470a7e7db1b276ef]
[ 4079.231689] dc_state_create_phantom_plane+0x1a/0x60 [amdgpu 22b7670854b1240a200e82d1470a7e7db1b276ef]
[ 4079.231882] dcn32_add_phantom_pipes+0x163/0x440 [amdgpu 22b7670854b1240a200e82d1470a7e7db1b276ef]
[ 4079.232129] dcn32_internal_validate_bw+0xb8f/0x15e0 [amdgpu 22b7670854b1240a200e82d1470a7e7db1b276ef]
[ 4079.232379] ? dcn32_validate_bandwidth+0xb3/0x320 [amdgpu 22b7670854b1240a200e82d1470a7e7db1b276ef]
[ 4079.232611] dcn32_validate_bandwidth+0x10b/0x320 [amdgpu 22b7670854b1240a200e82d1470a7e7db1b276ef]
[ 4079.232842] update_planes_and_stream_state+0x267/0x510 [amdgpu 22b7670854b1240a200e82d1470a7e7db1b276ef]
[ 4079.233060] update_planes_and_stream_v2+0x22f/0x580 [amdgpu 22b7670854b1240a200e82d1470a7e7db1b276ef]
[ 4079.233266] dc_update_planes_and_stream+0x56/0xd0 [amdgpu 22b7670854b1240a200e82d1470a7e7db1b276ef]
[ 4079.233465] ? sort+0x34/0x60
[ 4079.233470] amdgpu_dm_atomic_commit_tail+0x1571/0x3860 [amdgpu 22b7670854b1240a200e82d1470a7e7db1b276ef]
[ 4079.233710] commit_tail+0xa1/0x130
[ 4079.233715] process_one_work+0x193/0x350
[ 4079.233721] worker_thread+0x2d7/0x410
[ 4079.233724] ? __pfx_worker_thread+0x10/0x10
[ 4079.233727] kthread+0xfc/0x240
[ 4079.233731] ? __pfx_kthread+0x10/0x10
[ 4079.233733] ret_from_fork+0x34/0x50
[ 4079.233738] ? __pfx_kthread+0x10/0x10
[ 4079.233740] ret_from_fork_asm+0x1a/0x30
[ 4079.233747] </TASK>
[ 4079.233749] Modules linked in: snd_seq_dummy snd_hrtimer snd_seq ccm iwlmvm mousedev mac80211 libarc4 ptp pps_core btusb btrtl iwlwifi btintel btbcm amdgpu btmtk cfg80211 bluetooth amd_atl intel_rapl_msr snd_hda_codec_realtek intel_rapl_common snd_hda_codec_generic snd_hda_scodec_component snd_hda_codec_hdmi amdxcp gpu_sched snd_hda_intel drm_panel_backlight_quirks snd_usb_audio drm_buddy snd_intel_dspcfg drm_exec snd_intel_sdw_acpi snd_usbmidi_lib drm_suballoc_helper snd_hda_codec drm_ttm_helper snd_ump kvm_amd snd_hda_core ttm snd_rawmidi gigabyte_wmi wmi_bmof i2c_algo_bit snd_hwdep snd_seq_device uvcvideo kvm drm_display_helper r8169 snd_pcm videobuf2_vmalloc uvc realtek cec snd_timer irqbypass videobuf2_memops sp5100_tco mdio_devres video rapl videobuf2_v4l2 snd i2c_piix4 pcspkr wacom soundcore libphy k10temp i2c_smbus videobuf2_common rfkill wmi gpio_amdpt joydev razermouse(OE) razerkbd(OE) gpio_generic mac_hid v4l2loopback(OE) videodev mc pkcs8_key_parser crypto_user loop nfnetlink ip_tables x_tables dm_crypt
[ 4079.233833] encrypted_keys trusted asn1_encoder tee dm_mod polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sha1_ssse3 nvme aesni_intel crypto_simd nvme_core cryptd ccp nvme_keyring nvme_auth
[ 4079.233865] ---[ end trace 0000000000000000 ]---
[ 4079.233867] RIP: 0010:__get_vm_area_node+0x12d/0x130
[ 4079.233871] Code: 83 c1 01 39 d1 0f 4c ca ba 1e 00 00 00 39 d1 0f 4f ca 48 d3 e6 49 89 f7 e9 35 ff ff ff 4c 89 f7 e8 68 f8 01 00 45 31 f6 eb ae <0f> 0b 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 0f 1f
[ 4079.233873] RSP: 0018:ffffd4ec847d7650 EFLAGS: 00010202
[ 4079.233876] RAX: 00000000ffffffff RBX: 0000000000001000 RCX: 0000000000000422
[ 4079.233878] RDX: 000000000000000c RSI: 0000000000001000 RDI: 0000000000038b98
[ 4079.233880] RBP: 000000000000000c R08: ffffd4ec80000000 R09: fffff4ec7fffffff
[ 4079.233881] R10: ffff8efdbf355280 R11: 0000000000000000 R12: 0000000000038b98
[ 4079.233883] R13: 0000000000038b98 R14: 000000000000000c R15: 0000000000000dc0
[ 4079.233885] FS: 0000000000000000(0000) GS:ffff8efe078af000(0000) knlGS:0000000000000000
[ 4079.233887] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4079.233888] CR2: 00000000a808b000 CR3: 00000001ab053000 CR4: 0000000000f50ef0
[ 4079.233890] PKRU: 55555554
[ 4079.233892] Kernel panic - not syncing: Fatal exception in interrupt
[ 4079.235729] Kernel Offset: 0x33600000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
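To compare dumps like the one above more easily, the call trace can be reduced to its reliable frames, grouped by module. The following is only a minimal sketch: the regular expression is an assumption based on the line layout seen in this report, the function names are made up for illustration, and entries prefixed with "?" are skipped by default because the unwinder marks them as unreliable.

```python
import re
from collections import Counter

# Matches call-trace entries such as
#   "[ 4079.231481] dc_create_plane_state+0x23/0x80 [amdgpu 22b7...]"
# The layout is assumed from the dump in this thread.
FRAME_RE = re.compile(
    r"^\[\s*\d+\.\d+\]\s+"                           # dmesg timestamp
    r"(?P<unreliable>\?\s+)?"                        # optional "? " marker
    r"(?P<symbol>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"   # symbol+offset/size
    r"(?:\s+\[(?P<module>\w+)[^\]]*\])?\s*$"         # optional [module build-id]
)


def parse_frames(log_text, include_unreliable=False):
    """Return (symbol, module) pairs; module is 'kernel' for built-in code."""
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if not match:
            continue  # registers, RIP lines, module lists, etc.
        if match.group("unreliable") and not include_unreliable:
            continue
        frames.append((match.group("symbol"), match.group("module") or "kernel"))
    return frames


def frames_per_module(frames):
    """Count how many frames belong to each module."""
    return Counter(module for _, module in frames)
```

Feeding two saved panic reports through parse_frames() and comparing the resulting lists is a quick way to confirm that the call stacks really are identical rather than just similar-looking.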
Could you check whether the same problem is also present on the latest mainline release?
sudo pacman -U https://pkgbuild.com/\~gromit/linux-bisection-kernels/linux-mainline-6.16rc3-1-x86_64.pkg.tar.zst
In any case this issue will be hard to debug without a reproducer ...
Also, why is your kernel tainted? Which out-of-tree (OOT) modules do you have loaded? Does the crash also occur without them?
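For anyone wanting to check the taint state on a running system, here is a rough sketch. It assumes the usual /proc/sys/kernel/tainted bitmask and the per-module /sys/module/<name>/taint files; only a handful of taint bits are listed, and the function names are invented for this example.

```python
from pathlib import Path

# Subset of taint bits (bit number -> letter, meaning); not the full table.
TAINT_FLAGS = {
    0: ("P", "proprietary module was loaded"),
    7: ("D", "kernel died recently (OOPS or BUG)"),
    9: ("W", "kernel issued a warning"),
    12: ("O", "out-of-tree module was loaded"),
    13: ("E", "unsigned module was loaded"),
}


def decode_taint(value):
    """Return (letter, meaning) for every known bit set in the taint value."""
    return [
        flag
        for bit, flag in sorted(TAINT_FLAGS.items())
        if value & (1 << bit)
    ]


def read_taint(path="/proc/sys/kernel/tainted"):
    """Read the kernel's current taint bitmask."""
    return int(Path(path).read_text().strip())


def tainting_modules(sys_module="/sys/module"):
    """Map module name -> taint letters for loaded modules that taint the kernel."""
    result = {}
    for taint_file in Path(sys_module).glob("*/taint"):
        flags = taint_file.read_text().strip()
        if flags:  # untainted modules have an empty taint file
            result[taint_file.parent.name] = flags
    return result
```

A taint value of 12288 (4096 + 8192) decodes to O and E, the two flags shown in the panic report; calling decode_taint(read_taint()) and tainting_modules() on the affected machine would show which modules are responsible.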
razermouse(OE) razerkbd(OE) v4l2loopback(OE)
There is no specific trigger (that I can determine). The machine stays up for hours, then crashes "out of the blue".
Keep an eye on "cat /proc/meminfo" - are you running out of memory or leaking RAM?
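Since the machine panics without warning, watching /proc/meminfo by hand is easy to miss. One possible approach, sketched below, is to append a few fields to a log file at a fixed interval and force each line to disk so the last samples survive the crash. The field selection, the interval and the function names are assumptions made for this example, not a recommendation from the thread.

```python
import os
import time

# Fields chosen for illustration; all of them are reported in kB.
FIELDS = ("MemAvailable", "Shmem", "Slab", "SUnreclaim", "VmallocUsed")


def parse_meminfo(text):
    """Parse /proc/meminfo content into {field: integer value}."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[key.strip()] = int(parts[0])
    return values


def format_sample(values, fields=FIELDS, now=None):
    """Build one timestamped log line; missing fields are logged as -1."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    body = " ".join(f"{name}={values.get(name, -1)}kB" for name in fields)
    return f"{stamp} {body}"


def log_forever(path, interval=10.0):
    """Append a sample every `interval` seconds until interrupted."""
    with open(path, "a") as out:
        while True:
            with open("/proc/meminfo") as meminfo:
                values = parse_meminfo(meminfo.read())
            out.write(format_sample(values) + "\n")
            out.flush()
            os.fsync(out.fileno())  # make sure the line reaches the disk
            time.sleep(interval)
```

Running log_forever("meminfo.log") in a terminal before starting the workload and reading the tail of the file after the reboot would show whether available memory was shrinking in the minutes before the panic.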
Great suggestions - thank you.
Seeing as kernel 6.15.4 was just released along with a new linux-firmware package, I'm going to try that first (before the rc-kernel) - as well as removing the modules tainting the kernel (packages openrazer-driver-dkms and v4l2loopback-dkms from extra).
No OOMs, as far as I can tell (I would expect to see relevant log/journal entries).
Thanks again. Will report back.
A quick update:
On kernel 6.15.4 (not tainted) with linux-firmware 20250627-1, the machine remained stable all week until yesterday, when I had another kernel panic with an identical-looking dump (same call stack). This time, however, I think I have the trigger: Steam updating a game (ARK: Survival Ascended) in the background. At about the 32% completion mark, the kernel invariably panics (reproduced three times). I have yet to figure out whether it's just this particular game or others too.
Do you use an NVMe drive as storage for Steam?
If so, keep an eye on its temperatures: game updates are quite resource-intensive tasks.
It could be that the NVMe drive overheating leads to the crash (just an idea).
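One way to watch the drive temperature while the update runs is to read it from the hwmon sysfs interface. The sketch below assumes the NVMe driver registers a hwmon device named "nvme" and reports temperatures in millidegrees Celsius, which is the usual hwmon convention; the function name is made up for this example.

```python
from pathlib import Path


def nvme_temperatures(hwmon_root="/sys/class/hwmon"):
    """Return {"hwmonN/label": degrees Celsius} for all NVMe hwmon sensors."""
    readings = {}
    for hwmon in sorted(Path(hwmon_root).glob("hwmon*")):
        name_file = hwmon / "name"
        if not name_file.is_file() or name_file.read_text().strip() != "nvme":
            continue  # some other sensor chip (CPU, GPU, ...)
        for temp_file in sorted(hwmon.glob("temp*_input")):
            label_file = hwmon / temp_file.name.replace("_input", "_label")
            if label_file.is_file():
                label = label_file.read_text().strip()
            else:
                label = temp_file.name
            # hwmon temperature values are in millidegrees Celsius
            readings[f"{hwmon.name}/{label}"] = int(temp_file.read_text()) / 1000.0
    return readings
```

Printing nvme_temperatures() every few seconds during the download would show whether the drive gets anywhere near its thermal limits before the panic.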
The amdgpu module crashes in __get_vm_area_node, which is part of memory allocation (mm/vmalloc.c).
If you're somehow using a tmpfs as the download destination (overlayfs?), or if amdgpu leaks GTT/GART memory (which shows up frequently), you won't see the OOM killer act before this happens.
Keep an eye on /proc/meminfo
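For the GTT side specifically, amdgpu exposes usage counters in sysfs. The sketch below assumes the mem_info_vram_* and mem_info_gtt_* files under /sys/class/drm/cardN/device, with values in bytes; the function names are hypothetical.

```python
import re
from pathlib import Path

AMDGPU_FIELDS = (
    "mem_info_vram_used",
    "mem_info_vram_total",
    "mem_info_gtt_used",
    "mem_info_gtt_total",
)


def amdgpu_memory(drm_root="/sys/class/drm"):
    """Return {card: {field: bytes}} for cards exposing the amdgpu counters."""
    result = {}
    for card in sorted(Path(drm_root).iterdir()):
        # Only cardN entries; skip connectors such as card0-DP-1.
        if not re.fullmatch(r"card\d+", card.name):
            continue
        device = card / "device"
        if not all((device / field).is_file() for field in AMDGPU_FIELDS):
            continue  # not an amdgpu device
        result[card.name] = {
            field: int((device / field).read_text()) for field in AMDGPU_FIELDS
        }
    return result


def gtt_usage_percent(info):
    """GTT usage as a percentage of the GTT total for one card."""
    return 100.0 * info["mem_info_gtt_used"] / info["mem_info_gtt_total"]
```

Logging gtt_usage_percent() for the card over a few hours would show whether GTT usage creeps upward, which is what a leak would look like.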
I am seeing the same kernel panic with an AMD Ryzen 7 7800X3D and AMD Radeon RX 7700 XT. It has only occurred after several hours of uptime (6+) while playing Guild Wars 2 in a specific area (SMC in WvW). I have watched GPU VRAM, GTT and system memory with amdgpu_top and htop and have not seen any of them hit their limits during this time.
Specifically I am using Bottles with ge-proton10-9, dxvk-2.7 and vkd3d-proton-2.14.1 when this occurs.
Last edited by fly (2025-07-14 00:30:07)