Good morning.
Looking for possible pointers to track down the cause of the kernel crashing ever since 6.15.1.
- There is no specific trigger (that I can determine). The machine stays up for hours, then crashes "out of the blue".
- Downgrading to 6.14.10 alleviates the problem.
- Ryzen 7 5700X3D CPU, Radeon RX 7900 XT GPU, running Sway on Wayland
The dumps all look similar, with amdgpu featuring heavily in the stack trace (see below).
Insights appreciated.
Thanks.
Panic Report
Arch: x86_64
Version: 6.15.3-arch1-1
[ 540.145464] wlan0: RX AssocResp from 44:4e:6d:df:34:2d (capab=0x1511 status=0 aid=1)
[ 540.151122] wlan0: associated
[ 540.164344] wlan0: Limiting TX power to 21 (24 - 3) dBm as advertised by 44:4e:6d:df:34:2d
[ 553.950174] iwlwifi 0000:06:00.0 wlan0: entered promiscuous mode
[ 561.806607] iwlwifi 0000:06:00.0 wlan0: left promiscuous mode
[ 567.333334] warning: `ThreadPoolForeg' uses wireless extensions which will stop working for Wi-Fi 7 hardware; use nl80211
[ 821.020869] nvme nvme0: using unchecked data buffer
[ 4079.229725] ------------[ cut here ]------------
[ 4079.229729] kernel BUG at mm/vmalloc.c:3118!
[ 4079.229737] Oops: invalid opcode: 0000 [#1] SMP NOPTI
[ 4079.229743] CPU: 14 UID: 0 PID: 1870 Comm: kworker/u64:12 Tainted: G OE 6.15.3-arch1-1 #1 PREEMPT(full) d8e4be090634982aecb41eb415d6a2689ce50bdb
[ 4079.229749] Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
[ 4079.229751] Hardware name: Gigabyte Technology Co., Ltd. B550 AORUS ELITE AX V2/B550 AORUS ELITE AX V2, BIOS F19d 09/02/2024
[ 4079.229753] Workqueue: events_unbound commit_work
[ 4079.229761] RIP: 0010:__get_vm_area_node+0x12d/0x130
[ 4079.229767] Code: 83 c1 01 39 d1 0f 4c ca ba 1e 00 00 00 39 d1 0f 4f ca 48 d3 e6 49 89 f7 e9 35 ff ff ff 4c 89 f7 e8 68 f8 01 00 45 31 f6 eb ae <0f> 0b 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 0f 1f
[ 4079.229770] RSP: 0018:ffffd4ec847d7650 EFLAGS: 00010202
[ 4079.229773] RAX: 00000000ffffffff RBX: 0000000000001000 RCX: 0000000000000422
[ 4079.229776] RDX: 000000000000000c RSI: 0000000000001000 RDI: 0000000000038b98
[ 4079.229778] RBP: 000000000000000c R08: ffffd4ec80000000 R09: fffff4ec7fffffff
[ 4079.229780] R10: ffff8efdbf355280 R11: 0000000000000000 R12: 0000000000038b98
[ 4079.229782] R13: 0000000000038b98 R14: 000000000000000c R15: 0000000000000dc0
[ 4079.229784] FS: 0000000000000000(0000) GS:ffff8efe078af000(0000) knlGS:0000000000000000
[ 4079.229787] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4079.229789] CR2: 00000000a808b000 CR3: 00000001ab053000 CR4: 0000000000f50ef0
[ 4079.229792] PKRU: 55555554
[ 4079.229794] Call Trace:
[ 4079.229796] <TASK>
[ 4079.229798] __vmalloc_node_range_noprof+0x13a/0x890
[ 4079.229806] ? dc_create_plane_state+0x23/0x80 [amdgpu 22b7670854b1240a200e82d1470a7e7db1b276ef]
[ 4079.230120] ? __alloc_frozen_pages_noprof+0x334/0x350
[ 4079.230124] ? dc_create_plane_state+0x23/0x80 [amdgpu 22b7670854b1240a200e82d1470a7e7db1b276ef]
[ 4079.230387] ? ___kmalloc_large_node+0x66/0x100
[ 4079.230393] __kvmalloc_node_noprof+0x2f2/0x640
[ 4079.230397] ? dc_create_plane_state+0x23/0x80 [amdgpu 22b7670854b1240a200e82d1470a7e7db1b276ef]
[ 4079.230659] ? dc_create_plane_state+0x23/0x80 [amdgpu 22b7670854b1240a200e82d1470a7e7db1b276ef]
[ 4079.230921] ? srso_alias_return_thunk+0x5/0xfbef5
[ 4079.230927] ? dcn20_build_pipe_pix_clk_params+0x1d/0x40 [amdgpu 22b7670854b1240a200e82d1470a7e7db1b276ef]
[ 4079.231228] ? dc_create_plane_state+0x23/0x80 [amdgpu 22b7670854b1240a200e82d1470a7e7db1b276ef]
[ 4079.231481] dc_create_plane_state+0x23/0x80 [amdgpu 22b7670854b1240a200e82d1470a7e7db1b276ef]
[ 4079.231689] dc_state_create_phantom_plane+0x1a/0x60 [amdgpu 22b7670854b1240a200e82d1470a7e7db1b276ef]
[ 4079.231882] dcn32_add_phantom_pipes+0x163/0x440 [amdgpu 22b7670854b1240a200e82d1470a7e7db1b276ef]
[ 4079.232129] dcn32_internal_validate_bw+0xb8f/0x15e0 [amdgpu 22b7670854b1240a200e82d1470a7e7db1b276ef]
[ 4079.232379] ? dcn32_validate_bandwidth+0xb3/0x320 [amdgpu 22b7670854b1240a200e82d1470a7e7db1b276ef]
[ 4079.232611] dcn32_validate_bandwidth+0x10b/0x320 [amdgpu 22b7670854b1240a200e82d1470a7e7db1b276ef]
[ 4079.232842] update_planes_and_stream_state+0x267/0x510 [amdgpu 22b7670854b1240a200e82d1470a7e7db1b276ef]
[ 4079.233060] update_planes_and_stream_v2+0x22f/0x580 [amdgpu 22b7670854b1240a200e82d1470a7e7db1b276ef]
[ 4079.233266] dc_update_planes_and_stream+0x56/0xd0 [amdgpu 22b7670854b1240a200e82d1470a7e7db1b276ef]
[ 4079.233465] ? sort+0x34/0x60
[ 4079.233470] amdgpu_dm_atomic_commit_tail+0x1571/0x3860 [amdgpu 22b7670854b1240a200e82d1470a7e7db1b276ef]
[ 4079.233710] commit_tail+0xa1/0x130
[ 4079.233715] process_one_work+0x193/0x350
[ 4079.233721] worker_thread+0x2d7/0x410
[ 4079.233724] ? __pfx_worker_thread+0x10/0x10
[ 4079.233727] kthread+0xfc/0x240
[ 4079.233731] ? __pfx_kthread+0x10/0x10
[ 4079.233733] ret_from_fork+0x34/0x50
[ 4079.233738] ? __pfx_kthread+0x10/0x10
[ 4079.233740] ret_from_fork_asm+0x1a/0x30
[ 4079.233747] </TASK>
[ 4079.233749] Modules linked in: snd_seq_dummy snd_hrtimer snd_seq ccm iwlmvm mousedev mac80211 libarc4 ptp pps_core btusb btrtl iwlwifi btintel btbcm amdgpu btmtk cfg80211 bluetooth amd_atl intel_rapl_msr snd_hda_codec_realtek intel_rapl_common snd_hda_codec_generic snd_hda_scodec_component snd_hda_codec_hdmi amdxcp gpu_sched snd_hda_intel drm_panel_backlight_quirks snd_usb_audio drm_buddy snd_intel_dspcfg drm_exec snd_intel_sdw_acpi snd_usbmidi_lib drm_suballoc_helper snd_hda_codec drm_ttm_helper snd_ump kvm_amd snd_hda_core ttm snd_rawmidi gigabyte_wmi wmi_bmof i2c_algo_bit snd_hwdep snd_seq_device uvcvideo kvm drm_display_helper r8169 snd_pcm videobuf2_vmalloc uvc realtek cec snd_timer irqbypass videobuf2_memops sp5100_tco mdio_devres video rapl videobuf2_v4l2 snd i2c_piix4 pcspkr wacom soundcore libphy k10temp i2c_smbus videobuf2_common rfkill wmi gpio_amdpt joydev razermouse(OE) razerkbd(OE) gpio_generic mac_hid v4l2loopback(OE) videodev mc pkcs8_key_parser crypto_user loop nfnetlink ip_tables x_tables dm_crypt
[ 4079.233833] encrypted_keys trusted asn1_encoder tee dm_mod polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sha1_ssse3 nvme aesni_intel crypto_simd nvme_core cryptd ccp nvme_keyring nvme_auth
[ 4079.233865] ---[ end trace 0000000000000000 ]---
[ 4079.233867] RIP: 0010:__get_vm_area_node+0x12d/0x130
[ 4079.233871] Code: 83 c1 01 39 d1 0f 4c ca ba 1e 00 00 00 39 d1 0f 4f ca 48 d3 e6 49 89 f7 e9 35 ff ff ff 4c 89 f7 e8 68 f8 01 00 45 31 f6 eb ae <0f> 0b 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 0f 1f
[ 4079.233873] RSP: 0018:ffffd4ec847d7650 EFLAGS: 00010202
[ 4079.233876] RAX: 00000000ffffffff RBX: 0000000000001000 RCX: 0000000000000422
[ 4079.233878] RDX: 000000000000000c RSI: 0000000000001000 RDI: 0000000000038b98
[ 4079.233880] RBP: 000000000000000c R08: ffffd4ec80000000 R09: fffff4ec7fffffff
[ 4079.233881] R10: ffff8efdbf355280 R11: 0000000000000000 R12: 0000000000038b98
[ 4079.233883] R13: 0000000000038b98 R14: 000000000000000c R15: 0000000000000dc0
[ 4079.233885] FS: 0000000000000000(0000) GS:ffff8efe078af000(0000) knlGS:0000000000000000
[ 4079.233887] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4079.233888] CR2: 00000000a808b000 CR3: 00000001ab053000 CR4: 0000000000f50ef0
[ 4079.233890] PKRU: 55555554
[ 4079.233892] Kernel panic - not syncing: Fatal exception in interrupt
[ 4079.235729] Kernel Offset: 0x33600000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
Could you check whether the same problem is also present on the latest mainline release?
sudo pacman -U https://pkgbuild.com/\~gromit/linux-bisection-kernels/linux-mainline-6.16rc3-1-x86_64.pkg.tar.zst
In any case this issue will be hard to debug without a reproducer ...
Also, why is your kernel tainted? Which out-of-tree (OOT) modules do you have loaded? Does the crash also occur without them?
razermouse(OE) razerkbd(OE) v4l2loopback(OE)
There is no specific trigger (that I can determine). The machine stays up for hours, then crashes "out of the blue".
Keep an eye on "cat /proc/meminfo" - do you run out of memory (OOM) or leak RAM?
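A minimal sketch for keeping that eye on memory, assuming you want a small log you can read after the next crash. The helper name `meminfo_line` and the choice of fields are my own assumptions, not something from this thread; the optional file argument only exists so the function can be tried on a saved copy of /proc/meminfo.

```shell
# Print one timestamped line with a few fields from a meminfo-style file.
# meminfo_line is a hypothetical helper name; values are in kB as reported by the kernel.
meminfo_line() {
    local src="${1:-/proc/meminfo}"
    awk -v ts="$(date +%FT%T)" '
        # Collect only the selected fields, e.g. "MemAvailable:123456".
        /^(MemAvailable|SwapFree|Shmem|Slab):/ { out = out " " $1 $2 }
        END { print ts out }
    ' "$src"
}
```

Run in a loop such as `while sleep 60; do meminfo_line >> ~/meminfo.log; sync; done`, a steadily shrinking MemAvailable or a growing Shmem/Slab in the last lines before a crash would hint at a leak.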
Great suggestions - thank you.
Seeing as kernel 6.15.4 was just released along with a new linux-firmware package, I'm going to try that first (before the rc-kernel) - as well as removing the modules tainting the kernel (packages openrazer-driver-dkms and v4l2loopback-dkms from extra).
No OOMs, as far as I can tell (I would expect to see relevant log/journal entries).
Thanks again. Will report back.
A quick update:
On kernel 6.15.4 (not tainted) with linux-firmware 20250627-1 the machine remained stable all week until yesterday, when I had another kernel panic with an identical-looking dump (same call stack). This time, however, I think I have the trigger: Steam was updating a game (ARK: Survival Ascended) in the background. At about the 32% completion mark, the kernel invariably panics (reproduced three times). I have yet to figure out whether it's just this particular game or others too.
Do you use an NVMe drive as storage for Steam?
If so, keep an eye on its temperature: game updates are quite resource-intensive tasks.
It could be that the NVMe overheating leads to the crash (just an idea).
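One way to watch the drive temperature without extra tools is the kernel's hwmon sysfs interface, where the NVMe driver registers a device named "nvme" and reports `temp1_input` in millidegrees Celsius. The sketch below assumes that layout; `nvme_temps` is a hypothetical helper name, and the optional directory argument is only there so it can be pointed at a test tree.

```shell
# List the composite temperature of every hwmon device named "nvme".
# nvme_temps is a hypothetical helper; temp1_input is in millidegrees Celsius.
nvme_temps() {
    local root="${1:-/sys/class/hwmon}" d
    for d in "$root"/hwmon*; do
        # Skip hwmon devices that are not NVMe drives or expose no temperature.
        [ -r "$d/name" ] && [ "$(cat "$d/name")" = "nvme" ] || continue
        [ -r "$d/temp1_input" ] || continue
        echo "$(basename "$d"): $(( $(cat "$d/temp1_input") / 1000 )) C"
    done
}
```

Calling `nvme_temps` every few seconds while the update runs should show whether the temperature climbs shortly before the panic.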
The amdgpu module crashes in __get_vm_area_node, which is part of the vmalloc memory allocation path.
If you're somehow using a tmpfs as the download destination (overlayfs?) or amdgpu leaks GTT/GART (shows up frequently) you're not gonna see any OOM killer when/before this happens.
Keep an eye on /proc/meminfo
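Since GTT usage does not show up as an obvious figure in /proc/meminfo, it may help to read amdgpu's own counters as well. The sketch below assumes the `mem_info_gtt_*` and `mem_info_vram_*` sysfs attributes that amdgpu exposes under the card's device directory (values in bytes); `gpu_mem_report` is a hypothetical helper name and the card index in the usage line is an assumption.

```shell
# Report GTT and VRAM usage in MiB for one amdgpu device directory.
# gpu_mem_report is a hypothetical helper; the sysfs values are in bytes.
gpu_mem_report() {
    local dev="$1" kind used total
    for kind in gtt vram; do
        used=$(cat "$dev/mem_info_${kind}_used") || return 1
        total=$(cat "$dev/mem_info_${kind}_total") || return 1
        echo "$kind: $((used / 1048576)) / $((total / 1048576)) MiB"
    done
}
```

Usage would look like `gpu_mem_report /sys/class/drm/card1/device`; check `ls /sys/class/drm` for the right card number on your system.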
I am seeing the same kernel panic with an AMD Ryzen 7 7800X3D and AMD Radeon RX 7700 XT. It has only occurred after several hours of uptime (6+) while playing Guild Wars 2 in a specific area (SMC in WvW). I have watched GPU VRAM, GTT, and system memory with amdgpu_top and htop, and have not noticed any of them hit their limits during this time.
Specifically I am using Bottles with ge-proton10-9, dxvk-2.7 and vkd3d-proton-2.14.1 when this occurs.
Last edited by fly (2025-07-14 00:30:07)
I believe it is a kernel bug, as the log clearly states:
kernel BUG at mm/vmalloc.c:3118!
Looking at the code: https://github.com/archlinux/linux/blob … oc.c#L3118 shows that it's
BUG_ON(in_interrupt());
This probably should get reported to the kernel, however I am new to reporting issues and am not sure where to begin. I had originally posted here because a search returned this post.
I also see that there's a commit for vmalloc.c: https://github.com/archlinux/linux/comm … fdf8841321
This is applied to 6.15.7, however I am not sure if that would be related to this at all.
Last edited by fly (2025-07-22 12:46:33)
The oops in the OP occurs because amdgpu tries to allocate memory from an interrupt context.
Do you have a system journal for context?
For the previous boot:
sudo journalctl -b -1 | curl -F 'file=@-' 0x0.st
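If the full journal is too noisy, `journalctl -k -b -1` limits the output to kernel messages from the previous boot. As a rough sketch, those messages can then be filtered for the lines that usually mark a kernel failure; `oops_markers` is a hypothetical helper name and the pattern list is my own guess at useful markers, not an exhaustive one.

```shell
# Keep only lines that usually mark a kernel failure; reads a log on stdin.
# oops_markers is a hypothetical helper; the pattern list is illustrative.
oops_markers() {
    # "|| true" keeps the exit status clean when nothing matches.
    grep -E 'kernel BUG at|Oops:|WARNING: CPU|Call Trace:|Kernel panic' || true
}
```

For example: `sudo journalctl -k -b -1 --no-pager | oops_markers`. Empty output would suggest the panic never reached the journal on disk.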
I found the journal from a few days ago. http://0x0.st/8nes.txt
I had skimmed it at the time and nothing stood out. As far as I could tell the messages near the end are not related and are just Wine warnings.
I did notice the call trace showing amdgpu, and I believe there have been quite a few changes there. I saw that the power profile mode handling had changed. Perhaps it has already been fixed in mainline/6.16.
That journal is mostly just flatpak and Steam. There are no obvious errors, let alone kernel oopses or anything akin to the oops in the OP… is it supposed to cover
I am seeing the same kernel panic with an AMD Ryzen 7 7800X3D and AMD Radeon RX 7700 XT.
Randomly:
https://bbs.archlinux.org/viewtopic.php?id=306429&p=2
https://bbs.archlinux.org/viewtopic.php … 0#p2252990
Do you also get this w/ the LTS kernel?
Yes, that journal was from just before the panic. I am not sure why it isn't showing any kernel issues either. When the kernel panic occurs it displays a QR code for the kernel log, and pauses the system until forced reboot.
For reference here's the kernel panic that I got.
[30882.405879] ------------[ cut here ]------------
[30882.405882] kernel BUG at mm/vmalloc.c:3118!
[30882.405889] Oops: invalid opcode: 0000 [#1] SMP NOPTI
[30882.405892] CPU: 15 UID: 0 PID: 93559 Comm: kworker/u64:9 Not tainted 6.15.6-arch1-1 #1 PREEMPT(full) a49b9575025ef78fca63b5f170baaeaabd0c299d
[30882.405895] Hardware name: Gigabyte Technology Co., Ltd. B650 GAMING X AX V2/B650 GAMING X AX V2, BIOS F34 05/23/2025
[30882.405897] Workqueue: events_unbound commit_work
[30882.405902] RIP: 0010:__get_vm_area_node+0x12d/0x130
[30882.405906] Code: 83 c1 01 39 d1 0f 4c ca ba 1e 00 00 00 39 d1 0f 4f ca 48 d3 e6 49 89 f7 e9 35 ff ff ff 4c 89 f7 e8 e8 f8 01 00 45 31 f6 eb ae <0f> 0b 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 0f 1f
[30882.405908] RSP: 0018:ffffd40ac7e37650 EFLAGS: 00010202
[30882.405910] RAX: 00000000ffffffff RBX: 0000000000001000 RCX: 0000000000000422
[30882.405911] RDX: 000000000000000c RSI: 0000000000001000 RDI: 0000000000038b98
[30882.405913] RBP: 000000000000000c R08: ffffd40ac0000000 R09: fffff40abfffffff
[30882.405914] R10: 00001c165ed50326 R11: 0000000000000001 R12: 0000000000038b98
[30882.405915] R13: 0000000000038b98 R14: 000000000000000c R15: 0000000000000dc0
[30882.405917] FS: 0000000000000000(0000) GS:ffff8b97e60ed000(0000) knlGS:0000000000000000
[30882.405918] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[30882.405919] CR2: 00000001452a0000 CR3: 000000013c2c8000 CR4: 0000000000f50ef0
[30882.405921] PKRU: 55555554
[30882.405922] Call Trace:
[30882.405924] <TASK>
[30882.405925] __vmalloc_node_range_noprof+0x13a/0x890
[30882.405930] ? dc_create_plane_state+0x23/0x80 [amdgpu 8500ea93f3e29b74cbfc8e0273f6b300b2277449]
[30882.406116] ? __alloc_frozen_pages_noprof+0x334/0x350
[30882.406119] ? dc_create_plane_state+0x23/0x80 [amdgpu 8500ea93f3e29b74cbfc8e0273f6b300b2277449]
[30882.406256] ? ___kmalloc_large_node+0x66/0x100
[30882.406260] __kvmalloc_node_noprof+0x2f2/0x640
[30882.406262] ? dc_create_plane_state+0x23/0x80 [amdgpu 8500ea93f3e29b74cbfc8e0273f6b300b2277449]
[30882.406374] ? dc_create_plane_state+0x23/0x80 [amdgpu 8500ea93f3e29b74cbfc8e0273f6b300b2277449]
[30882.406483] ? srso_alias_return_thunk+0x5/0xfbef5
[30882.406486] ? dcn20_build_pipe_pix_clk_params+0x1d/0x40 [amdgpu 8500ea93f3e29b74cbfc8e0273f6b300b2277449]
[30882.406671] ? dc_create_plane_state+0x23/0x80 [amdgpu 8500ea93f3e29b74cbfc8e0273f6b300b2277449]
[30882.406842] dc_create_plane_state+0x23/0x80 [amdgpu 8500ea93f3e29b74cbfc8e0273f6b300b2277449]
[30882.406990] dc_state_create_phantom_plane+0x1a/0x60 [amdgpu 8500ea93f3e29b74cbfc8e0273f6b300b2277449]
[30882.407123] dcn32_add_phantom_pipes+0x163/0x440 [amdgpu 8500ea93f3e29b74cbfc8e0273f6b300b2277449]
[30882.407289] dcn32_internal_validate_bw+0xb8c/0x15e0 [amdgpu 8500ea93f3e29b74cbfc8e0273f6b300b2277449]
[30882.407468] ? dcn32_validate_bandwidth+0xb3/0x320 [amdgpu 8500ea93f3e29b74cbfc8e0273f6b300b2277449]
[30882.407641] dcn32_validate_bandwidth+0x10b/0x320 [amdgpu 8500ea93f3e29b74cbfc8e0273f6b300b2277449]
[30882.407791] update_planes_and_stream_state+0x264/0x510 [amdgpu 8500ea93f3e29b74cbfc8e0273f6b300b2277449]
[30882.407931] update_planes_and_stream_v2+0x22f/0x580 [amdgpu 8500ea93f3e29b74cbfc8e0273f6b300b2277449]
[30882.408082] dc_update_planes_and_stream+0x56/0xd0 [amdgpu 8500ea93f3e29b74cbfc8e0273f6b300b2277449]
[30882.408208] ? sort+0x34/0x60
[30882.408211] amdgpu_dm_atomic_commit_tail+0x1571/0x3860 [amdgpu 8500ea93f3e29b74cbfc8e0273f6b300b2277449]
[30882.408374] ? amdgpu_crtc_get_scanout_position+0x28/0x40 [amdgpu 8500ea93f3e29b74cbfc8e0273f6b300b2277449]
[30882.408498] ? srso_alias_return_thunk+0x5/0xfbef5
[30882.408500] ? drm_crtc_vblank_helper_get_vblank_timestamp_internal+0x145/0x380
[30882.408504] ? srso_alias_return_thunk+0x5/0xfbef5
[30882.408506] ? dma_fence_default_wait+0x8a/0x280
[30882.408509] ? srso_alias_return_thunk+0x5/0xfbef5
[30882.408511] ? wait_for_completion_timeout+0x14e/0x1a0
[30882.408515] ? srso_alias_return_thunk+0x5/0xfbef5
[30882.408518] commit_tail+0x9e/0x130
[30882.408521] process_one_work+0x190/0x350
[30882.408525] worker_thread+0x2d7/0x410
[30882.408528] ? __pfx_worker_thread+0x10/0x10
[30882.408530] kthread+0xf9/0x240
[30882.408533] ? __pfx_kthread+0x10/0x10
[30882.408535] ret_from_fork+0x31/0x50
[30882.408538] ? __pfx_kthread+0x10/0x10
[30882.408540] ret_from_fork_asm+0x1a/0x30
[30882.408545] </TASK>
[30882.408547] Modules linked in: udp_diag tcp_diag inet_diag sctp ip6_udp_tunnel udp_tunnel snd_seq_dummy snd_hrtimer snd_seq snd_seq_device rfkill vfat fat amd_atl intel_rapl_msr intel_rapl_common snd_hda_codec_realtek kvm_amd snd_hda_codec_generic snd_hda_scodec_component snd_hda_codec_hdmi kvm snd_hda_intel irqbypass snd_intel_dspcfg snd_intel_sdw_acpi polyval_clmulni polyval_generic snd_hda_codec ghash_clmulni_intel sha512_ssse3 snd_hda_core sha256_ssse3 sha1_ssse3 snd_hwdep aesni_intel r8169 snd_pcm sp5100_tco crypto_simd snd_timer realtek cryptd i2c_piix4 mdio_devres rapl snd i2c_smbus wmi_bmof gigabyte_wmi pcspkr k10temp ccp soundcore libphy joydev mousedev amd_3d_vcache mac_hid pkcs8_key_parser i2c_dev crypto_user dm_mod loop nfnetlink zram 842_decompress 842_compress lz4hc_compress lz4_compress ip_tables x_tables amdgpu amdxcp i2c_algo_bit drm_ttm_helper ttm drm_exec gpu_sched drm_suballoc_helper nvme drm_panel_backlight_quirks drm_buddy nvme_core drm_display_helper nvme_keyring cec nvme_auth video wmi
[30882.408613] ---[ end trace 0000000000000000 ]---
[30882.408615] RIP: 0010:__get_vm_area_node+0x12d/0x130
[30882.408619] Code: 83 c1 01 39 d1 0f 4c ca ba 1e 00 00 00 39 d1 0f 4f ca 48 d3 e6 49 89 f7 e9 35 ff ff ff 4c 89 f7 e8 e8 f8 01 00 45 31 f6 eb ae <0f> 0b 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 0f 1f
[30882.408622] RSP: 0018:ffffd40ac7e37650 EFLAGS: 00010202
[30882.408624] RAX: 00000000ffffffff RBX: 0000000000001000 RCX: 0000000000000422
[30882.408626] RDX: 000000000000000c RSI: 0000000000001000 RDI: 0000000000038b98
[30882.408628] RBP: 000000000000000c R08: ffffd40ac0000000 R09: fffff40abfffffff
[30882.408630] R10: 00001c165ed50326 R11: 0000000000000001 R12: 0000000000038b98
[30882.408631] R13: 0000000000038b98 R14: 000000000000000c R15: 0000000000000dc0
[30882.408633] FS: 0000000000000000(0000) GS:ffff8b97e60ed000(0000) knlGS:0000000000000000
[30882.408635] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[30882.408639] CR2: 00000001452a0000 CR3: 000000013c2c8000 CR4: 0000000000f50ef0
[30882.408641] PKRU: 55555554
[30882.408643] Kernel panic - not syncing: Fatal exception in interrupt
[30882.409814] Kernel Offset: 0x34400000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
Thanks for those posts, gives me a few tweaks to attempt.
I have since updated to 6.15.7, and haven't had the panic, however I also haven't had the system on for very long.
I have not tried the LTS kernel. I suppose that would be an important way to diagnose this, so I'll try it if the kernel panics continue.
When the kernel panic occurs it displays a QR code for the kernel log
FWIW, the kernel panicked - nothing will be stored to disk after that.
https://wiki.archlinux.org/title/Kdump - but you got the backtrace and
[30882.408500] ? drm_crtc_vblank_helper_get_vblank_timestamp_internal+0x145/0x380
it seems to fall into the vicinity of the flip timeouts (same cause, different impact?)
I've been running the LTS kernel for a few days now and have not had the panic. I haven't changed any other values or tweaks.
Kdump looks very useful, if I continue to get the panic I'll look into setting it up.
I had the same issue with kernel 6.15.6-arch1-1 on an AMD Ryzen 7 5600G with an RX 7600. While starting some games (e.g. The Finals), the system panicked with the same message:
[ 874.115141] umip: RenderThread 1[7578] ip:14b127f66 sp:56c415e8: SGDT instruction cannot be used by applications.
[ 874.115146] umip: RenderThread 1[7578] ip:14b127f66 sp:56c415e8: For now, expensive software emulation returns the result.
[ 874.115384] umip: AtomicHeart-Win[7370] ip:152d33b97 sp:b97fe8: SGDT instruction cannot be used by applications.
[ 874.115387] umip: AtomicHeart-Win[7370] ip:152d33b97 sp:b97fe8: For now, expensive software emulation returns the result.
[ 874.115489] umip: RenderThread 1[7578] ip:154bca7a9 sp:56c4e9f8: SGDT instruction cannot be used by applications.
[ 2171.210115] xpadneo 0005:045E:0B13.000A: reverting to original version (changed version from 0x00001130 to 0x00000517)
[ 2171.210120] xpadneo 0005:045E:0B13.000A: reverting to original product (changed PID from 0x028E to 0x0B13)
[ 3938.731606] ------------[ cut here ]------------
[ 3938.731610] kernel BUG at mm/vmalloc.c:3118!
[ 3938.731619] Oops: invalid opcode: 0000 [#1] SMP NOPTI
[ 3938.731624] CPU: 6 UID: 0 PID: 11296 Comm: kworker/u64:3 Tainted: G OE 6.15.6-arch1-1 #1 PREEMPT(full) a49b9575025ef78fca63b5f170baaeaabd0c299d
[ 3938.731630] Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
[ 3938.731632] Hardware name: To Be Filled By O.E.M. B450 Steel Legend/B450 Steel Legend, BIOS P4.31 05/20/2022
[ 3938.731635] Workqueue: events_unbound commit_work
[ 3938.731641] RIP: 0010:__get_vm_area_node+0x12d/0x130
[ 3938.731646] Code: 83 c1 01 39 d1 0f 4c ca ba 1e 00 00 00 39 d1 0f 4f ca 48 d3 e6 49 89 f7 e9 35 ff ff ff 4c 89 f7 e8 e8 f8 01 00 45 31 f6 eb ae <0f> 0b 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 0f 1f
[ 3938.731649] RSP: 0018:ffffd33f81803650 EFLAGS: 00010202
[ 3938.731652] RAX: 00000000ffffffff RBX: 0000000000001000 RCX: 0000000000000422
[ 3938.731654] RDX: 000000000000000c RSI: 0000000000001000 RDI: 0000000000038b98
[ 3938.731656] RBP: 000000000000000c R08: ffffd33f80000000 R09: fffff33f7fffffff
[ 3938.731657] R10: 000003950eb2aff6 R11: 0000000000000001 R12: 0000000000038b98
[ 3938.731659] R13: 0000000000038b98 R14: 000000000000000c R15: 0000000000000dc0
[ 3938.731660] FS: 0000000000000000(0000) GS:ffff8ca744ead000(0000) knlGS:0000000000000000
[ 3938.731662] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3938.731664] CR2: 00007f4adf896950 CR3: 00000001573f4000 CR4: 0000000000f50ef0
[ 3938.731666] PKRU: 55555554
[ 3938.731667] Call Trace:
[ 3938.731669] <TASK>
[ 3938.731671] __vmalloc_node_range_noprof+0x13a/0x890
[ 3938.731677] ? dc_create_plane_state+0x23/0x80 [amdgpu 8500ea93f3e29b74cbfc8e0273f6b300b2277449]
[ 3938.731990] ? __alloc_frozen_pages_noprof+0x334/0x350
[ 3938.731995] ? dc_create_plane_state+0x23/0x80 [amdgpu 8500ea93f3e29b74cbfc8e0273f6b300b2277449]
[ 3938.732281] ? ___kmalloc_large_node+0x66/0x100
[ 3938.732286] __kvmalloc_node_noprof+0x2f2/0x640
[ 3938.732289] ? dc_create_plane_state+0x23/0x80 [amdgpu 8500ea93f3e29b74cbfc8e0273f6b300b2277449]
[ 3938.732563] ? dc_create_plane_state+0x23/0x80 [amdgpu 8500ea93f3e29b74cbfc8e0273f6b300b2277449]
[ 3938.732824] ? srso_alias_return_thunk+0x5/0xfbef5
[ 3938.732829] ? dcn20_build_pipe_pix_clk_params+0x1d/0x40 [amdgpu 8500ea93f3e29b74cbfc8e0273f6b300b2277449]
[ 3938.733108] ? dc_create_plane_state+0x23/0x80 [amdgpu 8500ea93f3e29b74cbfc8e0273f6b300b2277449]
[ 3938.733333] dc_create_plane_state+0x23/0x80 [amdgpu 8500ea93f3e29b74cbfc8e0273f6b300b2277449]
[ 3938.733579] dc_state_create_phantom_plane+0x1a/0x60 [amdgpu 8500ea93f3e29b74cbfc8e0273f6b300b2277449]
[ 3938.733795] dcn32_add_phantom_pipes+0x163/0x440 [amdgpu 8500ea93f3e29b74cbfc8e0273f6b300b2277449]
[ 3938.734029] dcn32_internal_validate_bw+0xb8f/0x15e0 [amdgpu 8500ea93f3e29b74cbfc8e0273f6b300b2277449]
[ 3938.734270] ? dcn32_validate_bandwidth+0xb3/0x320 [amdgpu 8500ea93f3e29b74cbfc8e0273f6b300b2277449]
[ 3938.734493] dcn32_validate_bandwidth+0x10b/0x320 [amdgpu 8500ea93f3e29b74cbfc8e0273f6b300b2277449]
[ 3938.734705] update_planes_and_stream_state+0x267/0x510 [amdgpu 8500ea93f3e29b74cbfc8e0273f6b300b2277449]
[ 3938.734902] update_planes_and_stream_v2+0x22f/0x580 [amdgpu 8500ea93f3e29b74cbfc8e0273f6b300b2277449]
[ 3938.735097] dc_update_planes_and_stream+0x56/0xd0 [amdgpu 8500ea93f3e29b74cbfc8e0273f6b300b2277449]
[ 3938.735286] ? sort+0x34/0x60
[ 3938.735290] amdgpu_dm_atomic_commit_tail+0x1571/0x3860 [amdgpu 8500ea93f3e29b74cbfc8e0273f6b300b2277449]
[ 3938.735511] commit_tail+0xa1/0x130
[ 3938.735516] process_one_work+0x193/0x350
[ 3938.735522] worker_thread+0x2d7/0x410
[ 3938.735525] ? __pfx_worker_thread+0x10/0x10
[ 3938.735528] kthread+0xfc/0x240
[ 3938.735531] ? __pfx_kthread+0x10/0x10
[ 3938.735534] ret_from_fork+0x34/0x50
[ 3938.735538] ? __pfx_kthread+0x10/0x10
[ 3938.735540] ret_from_fork_asm+0x1a/0x30
[ 3938.735547] </TASK>
[ 3938.735549] Modules linked in: hid_xpadneo(OE) ff_memless rfcomm snd_seq_dummy snd_hrtimer snd_seq uhid cmac algif_hash algif_skcipher af_alg nct6775 bnep nct6775_core hwmon_vid vfat fat amdgpu btusb btrtl intel_rapl_msr amd_atl btintel snd_hda_codec_realtek btbcm snd_hda_codec_generic intel_rapl_common btmtk snd_hda_scodec_component snd_hda_codec_hdmi amdxcp bluetooth snd_usb_audio snd_hda_intel gpu_sched drm_panel_backlight_quirks snd_intel_dspcfg drm_buddy ee1004 rfkill snd_usbmidi_lib snd_intel_sdw_acpi drm_exec snd_ump drm_suballoc_helper kvm_amd snd_hda_codec snd_rawmidi drm_ttm_helper snd_hda_core snd_seq_device ttm snd_hwdep r8169 i2c_algo_bit kvm snd_pcm mousedev realtek joydev snd_timer mdio_devres irqbypass mc apple_mfi_fastcharge drm_display_helper wmi_bmof rapl i2c_piix4 pcspkr snd cec libphy i2c_smbus soundcore gpio_amdpt gpio_generic mac_hid loop nfnetlink ip_tables x_tables polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 sha256_ssse3 nvme sha1_ssse3 aesni_intel nvme_core crypto_simd
[ 3938.735630] cryptd nvme_keyring ccp sp5100_tco nvme_auth zenpower(OE) hid_apple video wmi dm_mirror dm_region_hash dm_log dm_mod vboxnetflt(OE) vboxnetadp(OE) vboxdrv(OE) pkcs8_key_parser i2c_dev crypto_user
[ 3938.735660] ---[ end trace 0000000000000000 ]---
[ 3938.735662] RIP: 0010:__get_vm_area_node+0x12d/0x130
[ 3938.735668] Code: 83 c1 01 39 d1 0f 4c ca ba 1e 00 00 00 39 d1 0f 4f ca 48 d3 e6 49 89 f7 e9 35 ff ff ff 4c 89 f7 e8 e8 f8 01 00 45 31 f6 eb ae <0f> 0b 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 0f 1f
[ 3938.735670] RSP: 0018:ffffd33f81803650 EFLAGS: 00010202
[ 3938.735672] RAX: 00000000ffffffff RBX: 0000000000001000 RCX: 0000000000000422
[ 3938.735674] RDX: 000000000000000c RSI: 0000000000001000 RDI: 0000000000038b98
[ 3938.735676] RBP: 000000000000000c R08: ffffd33f80000000 R09: fffff33f7fffffff
[ 3938.735677] R10: 000003950eb2aff6 R11: 0000000000000001 R12: 0000000000038b98
[ 3938.735679] R13: 0000000000038b98 R14: 000000000000000c R15: 0000000000000dc0
[ 3938.735681] FS: 0000000000000000(0000) GS:ffff8ca744ead000(0000) knlGS:0000000000000000
[ 3938.735682] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3938.735684] CR2: 00007f4adf896950 CR3: 00000001573f4000 CR4: 0000000000f50ef0
[ 3938.735686] PKRU: 55555554
[ 3938.735688] Kernel panic - not syncing: Fatal exception in interrupt
[ 3938.735970] Kernel Offset: 0x24e00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
Installing the linux-lts kernel seems to have solved the issue.
Is the issue still present in linux 6.16 and if so is there an upstream bug report?
Also you could test the latest release candidate:
sudo pacman -U https://pkgbuild.com/\~gromit/linux-bisection-kernels/linux-mainline-6.17rc1-1-x86_64.pkg.tar.zst
Also you could test the latest release candidate:
sudo pacman -U https://pkgbuild.com/\~gromit/linux-bisection-kernels/linux-mainline-6.17rc1-1-x86_64.pkg.tar.zst
I tried this kernel and the system freezes before the game starts:
Aug 14 13:43:16 kernel: [drm:amdgpu_job_submit [amdgpu]] *ERROR* Trying to push to a killed entity
Aug 14 13:43:16 kernel: [drm:amdgpu_job_submit [amdgpu]] *ERROR* Trying to push to a killed entity
Aug 14 13:43:16 kernel: [drm:amdgpu_job_submit [amdgpu]] *ERROR* Trying to push to a killed entity
Aug 14 13:43:16 kernel: [drm:amdgpu_job_submit [amdgpu]] *ERROR* Trying to push to a killed entity
Aug 14 13:43:16 kernel: [drm:amdgpu_job_submit [amdgpu]] *ERROR* Trying to push to a killed entity
Aug 14 13:43:16 kernel: [drm:amdgpu_job_submit [amdgpu]] *ERROR* Trying to push to a killed entity
Aug 14 13:43:16 kernel: [drm:amdgpu_job_submit [amdgpu]] *ERROR* Trying to push to a killed entity
Aug 14 13:43:16 kernel: [drm:amdgpu_job_submit [amdgpu]] *ERROR* Trying to push to a killed entity
Aug 14 13:43:16 kernel: [drm:amdgpu_job_submit [amdgpu]] *ERROR* Trying to push to a killed entity
Aug 14 13:43:16 kernel: [drm:amdgpu_job_submit [amdgpu]] *ERROR* Trying to push to a killed entity
...
Aug 14 13:43:18 steam[1689]: Adding process 3319 for gameID 2073850
Aug 14 13:43:19 systemd[1]: systemd-localed.service: Deactivated successfully.
Aug 14 13:43:26 kernel: ------------[ cut here ]------------
Aug 14 13:43:26 kernel: WARNING: CPU: 7 PID: 1059 at ./include/linux/sched.h:2185 __ww_mutex_lock.constprop.0+0x5cb/0xca0
Aug 14 13:43:26 kernel: Modules linked in: rfcomm snd_seq_dummy snd_hrtimer snd_seq uhid cmac algif_hash algif_skcipher af_alg bnep nct6775 nct6775_core hwmon_vid vfat fat intel_rapl_msr amd_atl intel_rapl_common amdgpu snd_hda_codec_alc662 snd_hda_codec_realtek_lib snd_hda_codec_generic snd_hda_codec_atihdmi snd_hda_codec_hdmi snd_hda_intel snd_hda_codec snd_usb_audio btusb snd_hda_core snd_usbmidi_lib btrtl amdxcp kvm_amd snd_ump gpu_sched btintel snd_intel_dspcfg r8169 snd_rawmidi snd_intel_sdw_acpi btbcm drm_panel_backlight_quirks ee1004 snd_seq_device realtek btmtk drm_buddy snd_hwdep kvm drm_exec snd_pcm drm_suballoc_helper mdio_devres drm_ttm_helper bluetooth irqbypass libphy ttm snd_timer i2c_piix4 i2c_algo_bit rfkill wmi_bmof rapl mc pcspkr i2c_smbus mdio_bus snd drm_display_helper soundcore cec mousedev gpio_amdpt joydev gpio_generic apple_mfi_fastcharge mac_hid loop nfnetlink nvme polyval_clmulni nvme_core ghash_clmulni_intel nvme_keyring aesni_intel nvme_auth sp5100_tco ccp hid_apple video wmi dm_mirror dm_region_hash
Aug 14 13:43:26 kernel: dm_log dm_mod pkcs8_key_parser i2c_dev crypto_user
Aug 14 13:43:26 kernel: CPU: 7 UID: 1000 PID: 1059 Comm: kwin_wayland Tainted: G S 6.17.0-rc1-1-mainline #1 PREEMPT(full) 63d3864c904e19e518ddef3a869cee1f9a905c50
Aug 14 13:43:26 kernel: Tainted: [S]=CPU_OUT_OF_SPEC
Aug 14 13:43:26 kernel: Hardware name: To Be Filled By O.E.M. B450 Steel Legend/B450 Steel Legend, BIOS P4.31 05/20/2022
Aug 14 13:43:26 kernel: RIP: 0010:__ww_mutex_lock.constprop.0+0x5cb/0xca0
Aug 14 13:43:26 kernel: Code: 01 00 00 00 66 89 48 14 48 39 de 0f 84 dd fd ff ff 48 8b 86 30 0e 00 00 48 85 c0 0f 84 a9 06 00 00 49 39 c7 0f 84 ad 06 00 00 <0f> 0b 48 c7 86 30 0e 00 00 00 00 00 00 4c 89 f7 e8 50 55 01 ff e9
Aug 14 13:43:26 kernel: RSP: 0018:ffffcf8281e8faa0 EFLAGS: 00010093
Aug 14 13:43:26 kernel: RAX: ffff8d3fe70ad028 RBX: ffff8d3fe7253680 RCX: 0000000000000001
Aug 14 13:43:26 kernel: RDX: 0000000000000001 RSI: ffff8d3fe724d1c0 RDI: 0000000000000001
Aug 14 13:43:26 kernel: RBP: ffffcf8281e8fb48 R08: ffff8d3fcd2a1820 R09: 0000000000000000
Aug 14 13:43:26 kernel: R10: 0000000000000001 R11: 0000000000000000 R12: ffffcf8281e8fc78
Aug 14 13:43:26 kernel: R13: ffff8d3fcd2a1828 R14: ffffcf8281e8fad0 R15: ffff8d3fcd2a1820
Aug 14 13:43:26 kernel: FS: 00007fdb81bcba40(0000) GS:ffff8d43304ca000(0000) knlGS:0000000000000000
Aug 14 13:43:26 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 14 13:43:26 kernel: CR2: 00007f9ff00bf000 CR3: 000000014a435000 CR4: 0000000000f50ef0
Aug 14 13:43:26 kernel: PKRU: 55555554
Aug 14 13:43:26 kernel: Call Trace:
Aug 14 13:43:26 kernel: <TASK>
Aug 14 13:43:26 kernel: drm_modeset_lock+0xde/0x100
Aug 14 13:43:26 kernel: drm_atomic_get_plane_state+0x85/0x1a0
Aug 14 13:43:26 kernel: drm_atomic_set_property+0x2bd/0xd40
Aug 14 13:43:26 kernel: drm_mode_atomic_ioctl+0x235/0xcf0
Aug 14 13:43:26 kernel: ? __pfx_drm_mode_atomic_ioctl+0x10/0x10
Aug 14 13:43:26 kernel: drm_ioctl_kernel+0xae/0x100
Aug 14 13:43:26 kernel: drm_ioctl+0x29b/0x550
Aug 14 13:43:26 kernel: ? __pfx_drm_mode_atomic_ioctl+0x10/0x10
Aug 14 13:43:26 kernel: amdgpu_drm_ioctl+0x4a/0x90 [amdgpu 0861f9787b7519d7bcb8131767381f8eef34b68c]
Aug 14 13:43:26 kernel: __x64_sys_ioctl+0x97/0xe0
Aug 14 13:43:26 kernel: ? srso_alias_return_thunk+0x5/0xfbef5
Aug 14 13:43:26 kernel: do_syscall_64+0x81/0x970
Aug 14 13:43:26 kernel: ? srso_alias_return_thunk+0x5/0xfbef5
Aug 14 13:43:26 kernel: ? srso_alias_return_thunk+0x5/0xfbef5
Aug 14 13:43:26 kernel: ? sched_clock_cpu+0xf/0x200
Aug 14 13:43:26 kernel: ? __flush_smp_call_function_queue+0xae/0x410
Aug 14 13:43:26 kernel: ? sched_clock_cpu+0xf/0x200
Aug 14 13:43:26 kernel: ? srso_alias_return_thunk+0x5/0xfbef5
Aug 14 13:43:26 kernel: ? irqtime_account_irq+0x3c/0xc0
Aug 14 13:43:26 kernel: ? srso_alias_return_thunk+0x5/0xfbef5
Aug 14 13:43:26 kernel: ? __irq_exit_rcu+0x4c/0xf0
Aug 14 13:43:26 kernel: entry_SYSCALL_64_after_hwframe+0x76/0x7e
Aug 14 13:43:26 kernel: RIP: 0033:0x7fdb87724ecd
Aug 14 13:43:26 kernel: Code: 04 25 28 00 00 00 48 89 45 c8 31 c0 48 8d 45 10 c7 45 b0 10 00 00 00 48 89 45 b8 48 8d 45 d0 48 89 45 c0 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1a 48 8b 45 c8 64 48 2b 04 25 28 00 00 00
Aug 14 13:43:26 kernel: RSP: 002b:00007ffe3b8fd910 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Aug 14 13:43:26 kernel: RAX: ffffffffffffffda RBX: 00007fdb7c01dd10 RCX: 00007fdb87724ecd
Aug 14 13:43:26 kernel: RDX: 00007ffe3b8fda00 RSI: 00000000c03864bc RDI: 0000000000000013
Aug 14 13:43:26 kernel: RBP: 00007ffe3b8fd960 R08: 000056294fd05c64 R09: 00007fdb48004db8
Aug 14 13:43:26 kernel: R10: 0000000000000003 R11: 0000000000000246 R12: 00007ffe3b8fda00
Aug 14 13:43:26 kernel: R13: 00000000c03864bc R14: 0000000000000013 R15: 000056294fd05c00
Aug 14 13:43:26 kernel: </TASK>
Aug 14 13:43:26 kernel: ---[ end trace 0000000000000000 ]---
Different backtrace - does the game use a different resolution?
(Can you run it in a windowed mode?)
I've just upgraded to 6.16; I last had the issue on 6.15.9. I'll see if it still happens.
I have been slowly making single changes to see if anything would fix it. The last change I made was to switch from fsync to ntsync for Guild Wars 2, where I was experiencing the problem. I haven't experienced the issue since; however, as previously stated, I was only getting the panic about once a week. I'll switch back to fsync (in case that was a temporary workaround) to test 6.16.
Also, I previously thought it might be uptime related, but I don't anymore: I had it happen at around 5 hours of system uptime, and during the week the system is often on for more than 8 hours. The inconsistency is probably because it requires a large group-vs-group combat event during gameplay, which isn't always happening or consistently large. The combat involves 1 to 3 groups of potentially about 70 players each, so 210 players at most; often the groups are much smaller, around 15 to 20 players, so with two or three teams that would be 30 to 80 players.
The Finals appears to be a free-to-play game, so I could give that a try (especially if I don't have to go into a round, as it's competitive) to see if I can reproduce it more consistently.
EDIT: I gave The Finals a test on 6.16 and didn't have any problem getting through the tutorial; I even played part of a match (I got kicked, I think due to some recent change with the anti-cheat).
Last edited by fly (2025-08-14 20:29:09)
It appears that 6.16 has fixed this issue for me; I have not experienced it since updating. Normally I would have hit it by now, since I've been in a few large in-game events.
\o/
In any case, please always remember to mark resolved threads by editing the subject of your initial post, so others will know that there's no task left, but maybe a solution to find.
Thanks.