#1 2025-06-28 00:56:07

suspiciouslyspirited
Member
Registered: 2025-06-28
Posts: 4

Linux 6.15.* kernel crash

Good morning.

Looking for pointers to track down why the kernel has been crashing ever since 6.15.1.
- There is no specific trigger (that I can determine). The machine stays up for hours, then crashes "out of the blue".
- Downgrading to 6.14.10 alleviates the problem.
- Hardware: Ryzen 7 5700X3D CPU, Radeon RX 7900 XT GPU; running Wayland/Sway.

The dumps all look similar, with amdgpu featuring heavily in the stack trace (see below).
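One way to check that the dumps really share the same call stack is to compare only the reliable frames. Frames prefixed with "?" are addresses the unwinder found while scanning the stack but cannot confirm are part of the actual call chain, so they vary between dumps. A minimal Python sketch under that assumption; the regex and the function name call_stack are illustrative, not part of any existing tool:

```python
import re
import sys
from pathlib import Path

# Matches a call-trace frame such as
# "[ 4079.231481]  dc_create_plane_state+0x23/0x80 [amdgpu ...]".
# Group 1 captures the "? " marker the unwinder adds to unreliable frames.
FRAME_RE = re.compile(r"^\[\s*[\d.]+\]\s+(\? )?([A-Za-z0-9_.]+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def call_stack(log: str) -> list[str]:
    """Return the function names of reliable frames between <TASK> and </TASK>."""
    frames: list[str] = []
    inside = False
    for line in log.splitlines():
        if "<TASK>" in line:
            inside = True
        elif "</TASK>" in line:
            break
        elif inside:
            m = FRAME_RE.match(line)
            if m and not m.group(1):  # skip frames marked with "?"
                frames.append(m.group(2))
    return frames


if __name__ == "__main__":
    # Usage (hypothetical file names): python compare_stacks.py dump1.txt dump2.txt
    paths = sys.argv[1:]
    stacks = [call_stack(Path(p).read_text()) for p in paths]
    for path, stack in zip(paths, stacks):
        print(path, "matches the first dump" if stack == stacks[0] else "differs")
```

Function offsets are dropped as well, so the comparison still matches when the same code path is hit from a slightly different kernel build.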

Insights appreciated.
Thanks.

Panic Report
Arch: x86_64
Version: 6.15.3-arch1-1

[  540.145464] wlan0: RX AssocResp from 44:4e:6d:df:34:2d (capab=0x1511 status=0 aid=1)
[  540.151122] wlan0: associated
[  540.164344] wlan0: Limiting TX power to 21 (24 - 3) dBm as advertised by 44:4e:6d:df:34:2d
[  553.950174] iwlwifi 0000:06:00.0 wlan0: entered promiscuous mode
[  561.806607] iwlwifi 0000:06:00.0 wlan0: left promiscuous mode
[  567.333334] warning: `ThreadPoolForeg' uses wireless extensions which will stop working for Wi-Fi 7 hardware; use nl80211
[  821.020869] nvme nvme0: using unchecked data buffer
[ 4079.229725] ------------[ cut here ]------------
[ 4079.229729] kernel BUG at mm/vmalloc.c:3118!
[ 4079.229737] Oops: invalid opcode: 0000 [#1] SMP NOPTI
[ 4079.229743] CPU: 14 UID: 0 PID: 1870 Comm: kworker/u64:12 Tainted: G           OE       6.15.3-arch1-1 #1 PREEMPT(full)  d8e4be090634982aecb41eb415d6a2689ce50bdb
[ 4079.229749] Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
[ 4079.229751] Hardware name: Gigabyte Technology Co., Ltd. B550 AORUS ELITE AX V2/B550 AORUS ELITE AX V2, BIOS F19d 09/02/2024
[ 4079.229753] Workqueue: events_unbound commit_work
[ 4079.229761] RIP: 0010:__get_vm_area_node+0x12d/0x130
[ 4079.229767] Code: 83 c1 01 39 d1 0f 4c ca ba 1e 00 00 00 39 d1 0f 4f ca 48 d3 e6 49 89 f7 e9 35 ff ff ff 4c 89 f7 e8 68 f8 01 00 45 31 f6 eb ae <0f> 0b 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 0f 1f
[ 4079.229770] RSP: 0018:ffffd4ec847d7650 EFLAGS: 00010202
[ 4079.229773] RAX: 00000000ffffffff RBX: 0000000000001000 RCX: 0000000000000422
[ 4079.229776] RDX: 000000000000000c RSI: 0000000000001000 RDI: 0000000000038b98
[ 4079.229778] RBP: 000000000000000c R08: ffffd4ec80000000 R09: fffff4ec7fffffff
[ 4079.229780] R10: ffff8efdbf355280 R11: 0000000000000000 R12: 0000000000038b98
[ 4079.229782] R13: 0000000000038b98 R14: 000000000000000c R15: 0000000000000dc0
[ 4079.229784] FS:  0000000000000000(0000) GS:ffff8efe078af000(0000) knlGS:0000000000000000
[ 4079.229787] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4079.229789] CR2: 00000000a808b000 CR3: 00000001ab053000 CR4: 0000000000f50ef0
[ 4079.229792] PKRU: 55555554
[ 4079.229794] Call Trace:
[ 4079.229796]  <TASK>
[ 4079.229798]  __vmalloc_node_range_noprof+0x13a/0x890
[ 4079.229806]  ? dc_create_plane_state+0x23/0x80 [amdgpu 22b7670854b1240a200e82d1470a7e7db1b276ef]
[ 4079.230120]  ? __alloc_frozen_pages_noprof+0x334/0x350
[ 4079.230124]  ? dc_create_plane_state+0x23/0x80 [amdgpu 22b7670854b1240a200e82d1470a7e7db1b276ef]
[ 4079.230387]  ? ___kmalloc_large_node+0x66/0x100
[ 4079.230393]  __kvmalloc_node_noprof+0x2f2/0x640
[ 4079.230397]  ? dc_create_plane_state+0x23/0x80 [amdgpu 22b7670854b1240a200e82d1470a7e7db1b276ef]
[ 4079.230659]  ? dc_create_plane_state+0x23/0x80 [amdgpu 22b7670854b1240a200e82d1470a7e7db1b276ef]
[ 4079.230921]  ? srso_alias_return_thunk+0x5/0xfbef5
[ 4079.230927]  ? dcn20_build_pipe_pix_clk_params+0x1d/0x40 [amdgpu 22b7670854b1240a200e82d1470a7e7db1b276ef]
[ 4079.231228]  ? dc_create_plane_state+0x23/0x80 [amdgpu 22b7670854b1240a200e82d1470a7e7db1b276ef]
[ 4079.231481]  dc_create_plane_state+0x23/0x80 [amdgpu 22b7670854b1240a200e82d1470a7e7db1b276ef]
[ 4079.231689]  dc_state_create_phantom_plane+0x1a/0x60 [amdgpu 22b7670854b1240a200e82d1470a7e7db1b276ef]
[ 4079.231882]  dcn32_add_phantom_pipes+0x163/0x440 [amdgpu 22b7670854b1240a200e82d1470a7e7db1b276ef]
[ 4079.232129]  dcn32_internal_validate_bw+0xb8f/0x15e0 [amdgpu 22b7670854b1240a200e82d1470a7e7db1b276ef]
[ 4079.232379]  ? dcn32_validate_bandwidth+0xb3/0x320 [amdgpu 22b7670854b1240a200e82d1470a7e7db1b276ef]
[ 4079.232611]  dcn32_validate_bandwidth+0x10b/0x320 [amdgpu 22b7670854b1240a200e82d1470a7e7db1b276ef]
[ 4079.232842]  update_planes_and_stream_state+0x267/0x510 [amdgpu 22b7670854b1240a200e82d1470a7e7db1b276ef]
[ 4079.233060]  update_planes_and_stream_v2+0x22f/0x580 [amdgpu 22b7670854b1240a200e82d1470a7e7db1b276ef]
[ 4079.233266]  dc_update_planes_and_stream+0x56/0xd0 [amdgpu 22b7670854b1240a200e82d1470a7e7db1b276ef]
[ 4079.233465]  ? sort+0x34/0x60
[ 4079.233470]  amdgpu_dm_atomic_commit_tail+0x1571/0x3860 [amdgpu 22b7670854b1240a200e82d1470a7e7db1b276ef]
[ 4079.233710]  commit_tail+0xa1/0x130
[ 4079.233715]  process_one_work+0x193/0x350
[ 4079.233721]  worker_thread+0x2d7/0x410
[ 4079.233724]  ? __pfx_worker_thread+0x10/0x10
[ 4079.233727]  kthread+0xfc/0x240
[ 4079.233731]  ? __pfx_kthread+0x10/0x10
[ 4079.233733]  ret_from_fork+0x34/0x50
[ 4079.233738]  ? __pfx_kthread+0x10/0x10
[ 4079.233740]  ret_from_fork_asm+0x1a/0x30
[ 4079.233747]  </TASK>
[ 4079.233749] Modules linked in: snd_seq_dummy snd_hrtimer snd_seq ccm iwlmvm mousedev mac80211 libarc4 ptp pps_core btusb btrtl iwlwifi btintel btbcm amdgpu btmtk cfg80211 bluetooth amd_atl intel_rapl_msr snd_hda_codec_realtek intel_rapl_common snd_hda_codec_generic snd_hda_scodec_component snd_hda_codec_hdmi amdxcp gpu_sched snd_hda_intel drm_panel_backlight_quirks snd_usb_audio drm_buddy snd_intel_dspcfg drm_exec snd_intel_sdw_acpi snd_usbmidi_lib drm_suballoc_helper snd_hda_codec drm_ttm_helper snd_ump kvm_amd snd_hda_core ttm snd_rawmidi gigabyte_wmi wmi_bmof i2c_algo_bit snd_hwdep snd_seq_device uvcvideo kvm drm_display_helper r8169 snd_pcm videobuf2_vmalloc uvc realtek cec snd_timer irqbypass videobuf2_memops sp5100_tco mdio_devres video rapl videobuf2_v4l2 snd i2c_piix4 pcspkr wacom soundcore libphy k10temp i2c_smbus videobuf2_common rfkill wmi gpio_amdpt joydev razermouse(OE) razerkbd(OE) gpio_generic mac_hid v4l2loopback(OE) videodev mc pkcs8_key_parser crypto_user loop nfnetlink ip_tables x_tables dm_crypt
[ 4079.233833]  encrypted_keys trusted asn1_encoder tee dm_mod polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sha1_ssse3 nvme aesni_intel crypto_simd nvme_core cryptd ccp nvme_keyring nvme_auth
[ 4079.233865] ---[ end trace 0000000000000000 ]---
[ 4079.233867] RIP: 0010:__get_vm_area_node+0x12d/0x130
[ 4079.233871] Code: 83 c1 01 39 d1 0f 4c ca ba 1e 00 00 00 39 d1 0f 4f ca 48 d3 e6 49 89 f7 e9 35 ff ff ff 4c 89 f7 e8 68 f8 01 00 45 31 f6 eb ae <0f> 0b 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 0f 1f
[ 4079.233873] RSP: 0018:ffffd4ec847d7650 EFLAGS: 00010202
[ 4079.233876] RAX: 00000000ffffffff RBX: 0000000000001000 RCX: 0000000000000422
[ 4079.233878] RDX: 000000000000000c RSI: 0000000000001000 RDI: 0000000000038b98
[ 4079.233880] RBP: 000000000000000c R08: ffffd4ec80000000 R09: fffff4ec7fffffff
[ 4079.233881] R10: ffff8efdbf355280 R11: 0000000000000000 R12: 0000000000038b98
[ 4079.233883] R13: 0000000000038b98 R14: 000000000000000c R15: 0000000000000dc0
[ 4079.233885] FS:  0000000000000000(0000) GS:ffff8efe078af000(0000) knlGS:0000000000000000
[ 4079.233887] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4079.233888] CR2: 00000000a808b000 CR3: 00000001ab053000 CR4: 0000000000f50ef0
[ 4079.233890] PKRU: 55555554
[ 4079.233892] Kernel panic - not syncing: Fatal exception in interrupt
[ 4079.235729] Kernel Offset: 0x33600000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)

#2 2025-06-28 06:46:32

gromit
Package Maintainer (PM)
From: Germany
Registered: 2024-02-10
Posts: 1,173

Re: Linux 6.15.* kernel crash

Could you check whether the same problem also occurs on the latest mainline release?

sudo pacman -U https://pkgbuild.com/~gromit/linux-bisection-kernels/linux-mainline-6.16rc3-1-x86_64.pkg.tar.zst

In any case, this issue will be hard to debug without a reproducer...

Also, why is your kernel tainted? Which out-of-tree (OOT) modules do you have loaded? Does the crash also occur without them?

#3 2025-06-28 06:59:56

seth
Member
Registered: 2012-09-03
Posts: 65,807

Re: Linux 6.15.* kernel crash

razermouse(OE) razerkbd(OE) v4l2loopback(OE)

There is no specific trigger (that I can determine). The machine stays up for hours, then crashes "out of the blue".

Keep an eye on "cat /proc/meminfo": are you running out of memory (OOM) or leaking RAM?
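A minimal sketch of that check in Python (3.9+), assuming a quick snapshot of a few fields is more useful than scrolling the whole file. It parses /proc/meminfo into a dictionary and prints the chosen fields in MiB. The WATCHED selection is an illustrative assumption; for reference, MemAvailable is the kernel's estimate of memory available to new applications without swapping, and tmpfs contents are counted under Shmem.

```python
from pathlib import Path

# Fields to print; this selection is an assumption, adjust as needed (values in kB).
WATCHED = ("MemTotal", "MemAvailable", "Shmem", "VmallocUsed", "Committed_AS")


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo text into {field: value}; the 'kB' unit is dropped."""
    info: dict[str, int] = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Key: value" line
        parts = rest.split()
        if parts:
            info[key.strip()] = int(parts[0])
    return info


if __name__ == "__main__":
    info = parse_meminfo(Path("/proc/meminfo").read_text())
    for field in WATCHED:
        if field in info:
            print(f"{field:>14}: {info[field] / 1024:10.1f} MiB")
```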

#4 2025-06-28 13:37:37

suspiciouslyspirited
Member
Registered: 2025-06-28
Posts: 4

Re: Linux 6.15.* kernel crash

Great suggestions - thank you.

Seeing as kernel 6.15.4 was just released along with a new linux-firmware package, I'm going to try that first (before the rc kernel), and I'll also remove the modules tainting the kernel (the openrazer-driver-dkms and v4l2loopback-dkms packages from extra).

No OOMs, as far as I can tell (I would expect to see relevant log/journal entries).

Thanks again. Will report back.

#5 2025-07-04 11:52:41

suspiciouslyspirited
Member
Registered: 2025-06-28
Posts: 4

Re: Linux 6.15.* kernel crash

A quick update:
On kernel 6.15.4 (not tainted) with linux-firmware 20250627-1, the machine remained stable all week until yesterday, when I had another kernel panic with an identical-looking dump (same call stack). This time, however, I think I have found the trigger: Steam updating a game (ARK: Survival Ascended) in the background. At about the 32% completion mark, the kernel invariably panics (reproduced three times). I have yet to figure out whether it's just this particular game or others too.

#6 2025-07-04 12:17:25

cryptearth
Member
Registered: 2024-02-03
Posts: 1,548

Re: Linux 6.15.* kernel crash

Do you use an NVMe drive as storage for Steam?
If so, keep an eye on its temperatures: game updates are quite resource-intensive tasks.
An overheating NVMe drive could lead to the crash (just an idea).
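A dependency-free sketch for reading the drive temperature, assuming the nvme driver exposes it through hwmon: a directory under /sys/class/hwmon whose name file reads "nvme" and whose temp1_input holds the value in millidegrees Celsius. Tools such as smartctl report the same information; this is just one option that can be run in a loop during a game update.

```python
from pathlib import Path


def nvme_temperatures(hwmon_root: str = "/sys/class/hwmon") -> dict[str, float]:
    """Return {hwmon dir name: temp1 in degrees C} for hwmon devices named 'nvme'."""
    temps: dict[str, float] = {}
    for dev in sorted(Path(hwmon_root).glob("hwmon*")):
        name_file = dev / "name"
        temp_file = dev / "temp1_input"
        if not (name_file.is_file() and temp_file.is_file()):
            continue
        if name_file.read_text().strip() != "nvme":
            continue  # some other sensor (CPU, GPU, ...)
        # hwmon reports temperatures in millidegrees Celsius.
        temps[dev.name] = int(temp_file.read_text().strip()) / 1000
    return temps


if __name__ == "__main__":
    for dev, celsius in nvme_temperatures().items():
        print(f"{dev}: {celsius:.1f} °C")
```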

#7 2025-07-04 18:53:26

seth
Member
Registered: 2012-09-03
Posts: 65,807

Re: Linux 6.15.* kernel crash

The amdgpu module crashes in __get_vm_area_node, which is part of the vmalloc memory-allocation path.

If you're somehow using a tmpfs as the download destination (overlayfs?), or amdgpu leaks GTT/GART memory (this shows up frequently), you're not going to see the OOM killer step in when or before this happens.
Keep an eye on /proc/meminfo.
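Following up on that suggestion, a sketch of one way to watch /proc/meminfo over time instead of eyeballing single snapshots: sample it periodically and flag fields that never decreased and grew by more than a threshold. tmpfs contents are counted under Shmem; which field a GTT/GART leak would show up in is not established in this thread, so the sketch checks every field. The sample count, interval, and 256 MiB threshold are arbitrary assumptions.

```python
import time
from pathlib import Path


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo text into {field: value in kB}."""
    info: dict[str, int] = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if sep and rest.split():
            info[key.strip()] = int(rest.split()[0])
    return info


def growing_fields(samples: list[dict[str, int]], min_growth_kb: int) -> dict[str, int]:
    """Return {field: total growth in kB} for fields that never decreased
    across the samples and grew by at least min_growth_kb overall."""
    if len(samples) < 2:
        return {}
    result: dict[str, int] = {}
    for field in samples[0]:
        values = [s[field] for s in samples if field in s]
        if len(values) != len(samples):
            continue  # field missing from some sample
        never_shrank = all(b >= a for a, b in zip(values, values[1:]))
        growth = values[-1] - values[0]
        if never_shrank and growth >= min_growth_kb:
            result[field] = growth
    return result


if __name__ == "__main__":
    samples = []
    for i in range(10):  # 10 samples, 60 s apart (arbitrary choices)
        if i:
            time.sleep(60)
        samples.append(parse_meminfo(Path("/proc/meminfo").read_text()))
    for field, kb in growing_fields(samples, min_growth_kb=256 * 1024).items():
        print(f"{field} grew steadily by {kb / 1024:.0f} MiB")
```

Requiring growth that never reverses filters out fields such as Cached that normally fluctuate, at the cost of missing a leak that is occasionally masked by a drop elsewhere in the same field.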

#8 2025-07-14 00:27:55

fly
Member
Registered: 2025-07-14
Posts: 1

Re: Linux 6.15.* kernel crash

I am seeing the same kernel panic with an AMD Ryzen 7 7800X3D and an AMD Radeon RX 7700 XT. It has only occurred after several hours of uptime (6+) while playing Guild Wars 2 in a specific area (SMC in WvW). I have watched GPU VRAM, GTT, and system memory with amdgpu_top and htop, and have not noticed any of them hitting their limits during this time.

Specifically, I am using Bottles with ge-proton10-9, dxvk-2.7 and vkd3d-proton-2.14.1 when this occurs.
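For logging those numbers over a long session instead of watching amdgpu_top interactively, a sketch that reads amdgpu's sysfs memory counters. It assumes the GPU appears as a /sys/class/drm/cardN entry whose device directory exposes mem_info_vram_used, mem_info_vram_total, mem_info_gtt_used and mem_info_gtt_total (values in bytes); the helper names are hypothetical.

```python
import re
from pathlib import Path

# amdgpu sysfs memory counters this sketch reads (values in bytes).
FIELDS = (
    "mem_info_vram_used",
    "mem_info_vram_total",
    "mem_info_gtt_used",
    "mem_info_gtt_total",
)


def amdgpu_mem(device_dir: Path) -> dict[str, int]:
    """Read whichever mem_info_* files exist under one GPU's sysfs device dir."""
    out: dict[str, int] = {}
    for field in FIELDS:
        f = device_dir / field
        if f.is_file():
            out[field] = int(f.read_text().strip())
    return out


def usage_percent(mem: dict[str, int], pool: str) -> float:
    """Share of the 'vram' or 'gtt' pool currently in use, in percent."""
    return 100.0 * mem[f"mem_info_{pool}_used"] / mem[f"mem_info_{pool}_total"]


if __name__ == "__main__":
    for card in sorted(Path("/sys/class/drm").iterdir()):
        if not re.fullmatch(r"card\d+", card.name):
            continue  # skip connector entries such as card1-DP-1
        mem = amdgpu_mem(card / "device")
        if len(mem) == len(FIELDS):  # skip devices lacking any of the counters
            print(f"{card.name}: VRAM {usage_percent(mem, 'vram'):.1f}% used, "
                  f"GTT {usage_percent(mem, 'gtt'):.1f}% used")
```

Running it periodically and logging the output would show whether GTT usage creeps up in the hours before a crash, which is easy to miss when checking amdgpu_top by eye.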

Last edited by fly (2025-07-14 00:30:07)
