#26 2025-08-24 14:54:25

seth
Member
From: Don't DM me only for attention
Registered: 2012-09-03
Posts: 70,064

Re: Linux 6.15.* kernel crash

Did your most recent crash look like #19 or like #24?

Offline

#27 2025-08-24 15:16:29

fly
Member
Registered: 2025-07-14
Posts: 17

Re: Linux 6.15.* kernel crash

seth wrote:

Did your most recent crash look like #19 or like #24?

#24

[42738.478919] ------------[ cut here ]------------
[42738.478921] kernel BUG at mm/vmalloc.c:3167!
[42738.478929] Oops: invalid opcode: 0000 [#1] SMP NOPTI
[42738.478932] CPU: 8 UID: 0 PID: 110064 Comm: kworker/u64:6 Not tainted 6.16.2-arch1-1 #1 PREEMPT(full)  b49bb083563e9de92216080cb6a360543eca66c0
[42738.478935] Hardware name: Gigabyte Technology Co., Ltd. B650 GAMING X AX V2/B650 GAMING X AX V2, BIOS F36 07/31/2025
[42738.478937] Workqueue: events_unbound commit_work
[42738.478942] RIP: 0010:__get_vm_area_node+0x12d/0x130
[42738.478946] Code: 83 c1 01 39 d1 0f 4c ca ba 1e 00 00 00 39 d1 0f 4f ca 48 d3 e6 49 89 f7 e9 35 ff ff ff 4c 89 f7 e8 a8 05 02 00 45 31 f6 eb ae <0f> 0b 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 0f 1f
[42738.478947] RSP: 0018:ffffd464494bb600 EFLAGS: 00010202
[42738.478949] RAX: 0000000000000dc0 RBX: 0000000000001000 RCX: 0000000000000422
[42738.478951] RDX: 000000000000000c RSI: 0000000000001000 RDI: 0000000000038ba0
[42738.478952] RBP: 000000000000000c R08: ffffd46440000000 R09: fffff4643fffffff
[42738.478953] R10: 8000000000000163 R11: 0000000000000000 R12: 8000000000000163
[42738.478955] R13: 000000000000000c R14: 00000000ffffffff R15: 0000000000000dc0
[42738.478956] FS:  0000000000000000(0000) GS:ffff8e5b8791b000(0000) knlGS:0000000000000000
[42738.478958] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[42738.478959] CR2: 00007fd71f818fc0 CR3: 0000000173d5e000 CR4: 0000000000f50ef0
[42738.478960] PKRU: 55555554
[42738.478962] Call Trace:
[42738.478964]  <TASK>
[42738.478965]  __vmalloc_node_range_noprof+0x139/0x8e0
[42738.478970]  ? dc_create_plane_state+0x23/0x80 [amdgpu 3c1d6947e8853ea2913f8fc280b67d05d7f90e81]
[42738.479149]  ? srso_alias_return_thunk+0x5/0xfbef5
[42738.479152]  ? __alloc_frozen_pages_noprof+0x334/0x350
[42738.479156]  ? dc_create_plane_state+0x23/0x80 [amdgpu 3c1d6947e8853ea2913f8fc280b67d05d7f90e81]
[42738.479289]  ? srso_alias_return_thunk+0x5/0xfbef5
[42738.479292]  __kvmalloc_node_noprof+0x2f3/0x640
[42738.479296]  ? dc_create_plane_state+0x23/0x80 [amdgpu 3c1d6947e8853ea2913f8fc280b67d05d7f90e81]
[42738.479413]  ? dc_create_plane_state+0x23/0x80 [amdgpu 3c1d6947e8853ea2913f8fc280b67d05d7f90e81]
[42738.479523]  ? srso_alias_return_thunk+0x5/0xfbef5
[42738.479525]  ? dcn20_build_pipe_pix_clk_params+0x1d/0x40 [amdgpu 3c1d6947e8853ea2913f8fc280b67d05d7f90e81]
[42738.479690]  ? dc_create_plane_state+0x23/0x80 [amdgpu 3c1d6947e8853ea2913f8fc280b67d05d7f90e81]
[42738.479811]  dc_create_plane_state+0x23/0x80 [amdgpu 3c1d6947e8853ea2913f8fc280b67d05d7f90e81]
[42738.479942]  dc_state_create_phantom_plane+0x1a/0x60 [amdgpu 3c1d6947e8853ea2913f8fc280b67d05d7f90e81]
[42738.480060]  dcn32_add_phantom_pipes+0x163/0x440 [amdgpu 3c1d6947e8853ea2913f8fc280b67d05d7f90e81]
[42738.480204]  dcn32_internal_validate_bw+0xbc7/0x1610 [amdgpu 3c1d6947e8853ea2913f8fc280b67d05d7f90e81]
[42738.480366]  ? srso_alias_return_thunk+0x5/0xfbef5
[42738.480371]  ? dml1_validate+0x67/0x2c0 [amdgpu 3c1d6947e8853ea2913f8fc280b67d05d7f90e81]
[42738.480520]  dml1_validate+0xc0/0x2c0 [amdgpu 3c1d6947e8853ea2913f8fc280b67d05d7f90e81]
[42738.480651]  ? dc_state_remove_plane+0xaf/0x150 [amdgpu 3c1d6947e8853ea2913f8fc280b67d05d7f90e81]
[42738.480781]  dcn32_validate_bandwidth+0xb7/0x1c0 [amdgpu 3c1d6947e8853ea2913f8fc280b67d05d7f90e81]
[42738.480932]  update_planes_and_stream_state+0x399/0x500 [amdgpu 3c1d6947e8853ea2913f8fc280b67d05d7f90e81]
[42738.481066]  update_planes_and_stream_v2+0x22b/0x560 [amdgpu 3c1d6947e8853ea2913f8fc280b67d05d7f90e81]
[42738.481192]  dc_update_planes_and_stream+0x71/0xf0 [amdgpu 3c1d6947e8853ea2913f8fc280b67d05d7f90e81]
[42738.481307]  ? sort+0x34/0x60
[42738.481311]  amdgpu_dm_atomic_commit_tail+0x1591/0x3840 [amdgpu 3c1d6947e8853ea2913f8fc280b67d05d7f90e81]
[42738.481472]  ? amdgpu_crtc_get_scanout_position+0x28/0x40 [amdgpu 3c1d6947e8853ea2913f8fc280b67d05d7f90e81]
[42738.481603]  ? srso_alias_return_thunk+0x5/0xfbef5
[42738.481605]  ? drm_crtc_vblank_helper_get_vblank_timestamp_internal+0x145/0x380
[42738.481609]  ? srso_alias_return_thunk+0x5/0xfbef5
[42738.481611]  ? dma_fence_default_wait+0x8a/0x280
[42738.481614]  ? srso_alias_return_thunk+0x5/0xfbef5
[42738.481616]  ? wait_for_completion_timeout+0x14e/0x1a0
[42738.481619]  ? srso_alias_return_thunk+0x5/0xfbef5
[42738.481623]  commit_tail+0x9e/0x130
[42738.481625]  process_one_work+0x190/0x350
[42738.481630]  worker_thread+0x2d7/0x410
[42738.481633]  ? __pfx_worker_thread+0x10/0x10
[42738.481635]  kthread+0xf9/0x240
[42738.481637]  ? __pfx_kthread+0x10/0x10
[42738.481639]  ? __pfx_kthread+0x10/0x10
[42738.481641]  ret_from_fork+0x197/0x1d0
[42738.481644]  ? __pfx_kthread+0x10/0x10
[42738.481646]  ret_from_fork_asm+0x1a/0x30
[42738.481651]  </TASK>
[42738.481652] Modules linked in: vhost_net vhost vhost_iotlb tap tun nft_masq nft_ct nft_reject_ipv4 nf_reject_ipv4 nft_reject act_csum cls_u32 sch_htb nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables bridge stp llc snd_seq_dummy snd_hrtimer snd_seq snd_seq_device rfkill vfat fat amd_atl intel_rapl_msr intel_rapl_common snd_hda_codec_realtek kvm_amd snd_hda_codec_generic snd_hda_scodec_component snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg kvm snd_intel_sdw_acpi snd_hda_codec irqbypass snd_hda_core polyval_clmulni ghash_clmulni_intel snd_hwdep r8169 sha512_ssse3 snd_pcm sha1_ssse3 realtek aesni_intel sp5100_tco wmi_bmof gigabyte_wmi snd_timer mdio_devres rapl i2c_piix4 libphy snd pcspkr k10temp ccp i2c_smbus soundcore mdio_bus amd_3d_vcache joydev mousedev mac_hid pkcs8_key_parser ntsync i2c_dev crypto_user loop dm_mod nfnetlink zram 842_decompress 842_compress lz4hc_compress lz4_compress ip_tables x_tables amdgpu amdxcp i2c_algo_bit drm_ttm_helper ttm drm_exec gpu_sched drm_suballoc_helper
[42738.481716]  nvme drm_panel_backlight_quirks drm_buddy nvme_core drm_display_helper nvme_keyring cec nvme_auth video wmi
[42738.481732] ---[ end trace 0000000000000000 ]---
[42738.481734] RIP: 0010:__get_vm_area_node+0x12d/0x130
[42738.481737] Code: 83 c1 01 39 d1 0f 4c ca ba 1e 00 00 00 39 d1 0f 4f ca 48 d3 e6 49 89 f7 e9 35 ff ff ff 4c 89 f7 e8 a8 05 02 00 45 31 f6 eb ae <0f> 0b 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 0f 1f
[42738.481739] RSP: 0018:ffffd464494bb600 EFLAGS: 00010202
[42738.481740] RAX: 0000000000000dc0 RBX: 0000000000001000 RCX: 0000000000000422
[42738.481742] RDX: 000000000000000c RSI: 0000000000001000 RDI: 0000000000038ba0
[42738.481743] RBP: 000000000000000c R08: ffffd46440000000 R09: fffff4643fffffff
[42738.481744] R10: 8000000000000163 R11: 0000000000000000 R12: 8000000000000163
[42738.481746] R13: 000000000000000c R14: 00000000ffffffff R15: 0000000000000dc0
[42738.481747] FS:  0000000000000000(0000) GS:ffff8e5b8791b000(0000) knlGS:0000000000000000
[42738.481748] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[42738.481750] CR2: 00007fd71f818fc0 CR3: 000000043a624000 CR4: 0000000000f50ef0
[42738.481751] PKRU: 55555554
[42738.481752] Kernel panic - not syncing: Fatal exception in interrupt
[42738.482916] Kernel Offset: 0x12a00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)

Offline

#28 2025-08-24 18:55:14

seth
Member
From: Don't DM me only for attention
Registered: 2012-09-03
Posts: 70,064

Re: Linux 6.15.* kernel crash

https://gitlab.freedesktop.org/drm/amd/-/issues/4268 looks *somewhat* related (but ends in a different stack position and is on different hardware)
You're not using mesa-tkg, are you?

Offline

#29 2025-08-24 19:55:46

fly
Member
Registered: 2025-07-14
Posts: 17

Re: Linux 6.15.* kernel crash

No, since I am using Bottles through Flatpak, it's mesa 25.1.7.

Offline

#30 2025-08-24 20:04:17

seth
Member
From: Don't DM me only for attention
Registered: 2012-09-03
Posts: 70,064

Re: Linux 6.15.* kernel crash

Can you trigger the crash w/ the repo mesa version?

Offline

#31 2025-08-24 21:15:37

fly
Member
Registered: 2025-07-14
Posts: 17

Re: Linux 6.15.* kernel crash

When I was testing it 3 weeks ago, I had switched to using Steam, and I believe the repo had 25.1.7 at the time; I was able to trigger it. I can switch to using Steam again, in case 25.2.1 fixes it.

EDIT: I have switched from Bottles to Steam, I am now using the latest mesa (25.2.1).

Last edited by fly (2025-08-24 21:30:07)

Offline

#32 2025-08-28 02:48:12

ArchEr9
Member
Registered: 2025-03-18
Posts: 36

Re: Linux 6.15.* kernel crash

fly wrote:

I've been running the LTS kernel for a few days now and have not had the panic. I haven't changed any other values or tweaks.

Kdump looks very useful, if I continue to get the panic I'll look into setting it up.

Which version of Linux LTS Kernel did you use?

Offline

#33 2025-08-28 10:49:01

fly
Member
Registered: 2025-07-14
Posts: 17

Re: Linux 6.15.* kernel crash

6.12.40. I also used 6.12.41 briefly.

Offline

#34 2025-08-28 13:16:18

AceFour
Member
Registered: 2025-08-28
Posts: 2

Re: Linux 6.15.* kernel crash

Having the same issue with both 6.12.43-1-lts and 6.16.3-arch1-1 using Minecraft

OS: EndeavourOS x86_64
Kernel: Linux 6.12.43-1-lts
Uptime: 14 hours, 25 mins
Packages: 2203 (pacman)
Shell: zsh 5.9
Authorization required, but no authorization protocol specified

Authorization required, but no authorization protocol specified

Display (LG Ultra HD): 3840x2160 @ 60 Hz in 27" [External]
Display (DELL S3221QS): 3840x2160 @ 60 Hz in 32" [External]
WM: Hyprland 0.50.1 (X11)
Theme: Colloid-Dark-Catppuccin [GTK3]
CPU: 12th Gen Intel(R) Core(TM) i5-12400F (12) @ 4.40 GHz
GPU: AMD Radeon RX 6700 [Discrete]
Memory: 19.02 GiB / 62.61 GiB (30%)

On the 6.13.3 kernel

Aug 26 18:20:09 acefour-1 kernel: amdgpu 0000:03:00.0: amdgpu: ring gfx_0.0.0 timeout, signaled seq=7925632, emitted seq=7925634
Aug 26 18:20:09 acefour-1 kernel: amdgpu 0000:03:00.0: amdgpu: Process information: process java pid 456665 thread java:cs0 pid 456716
Aug 26 18:20:09 acefour-1 kernel: amdgpu 0000:03:00.0: amdgpu: Starting gfx_0.0.0 ring reset
Aug 26 18:20:09 acefour-1 kernel: amdgpu 0000:03:00.0: amdgpu: Ring gfx_0.0.0 reset failed
...
Aug 26 18:20:25 acefour-1 kernel: [drm:psp_v11_0_memory_training [amdgpu]] *ERROR* send training msg failed.
Aug 26 18:20:25 acefour-1 kernel: amdgpu 0000:03:00.0: amdgpu: Failed to process memory training!
Aug 26 18:20:25 acefour-1 kernel: amdgpu 0000:03:00.0: amdgpu: resume of IP block <psp> failed -62
Aug 26 18:20:25 acefour-1 kernel: amdgpu 0000:03:00.0: amdgpu: GPU reset end with ret = -62
Aug 26 18:20:25 acefour-1 kernel: amdgpu 0000:03:00.0: amdgpu: GPU Recovery Failed: -62
Aug 26 18:20:36 acefour-1 kernel: amdgpu 0000:03:00.0: amdgpu: Dumping IP State
Aug 26 18:20:36 acefour-1 kernel: amdgpu 0000:03:00.0: amdgpu: Dumping IP State Completed
Aug 26 18:20:36 acefour-1 kernel: amdgpu 0000:03:00.0: amdgpu: [drm] AMDGPU device coredump file has been created
Aug 26 18:20:36 acefour-1 kernel: amdgpu 0000:03:00.0: amdgpu: [drm] Check your /sys/class/drm/card1/device/devcoredump/data
Aug 26 18:20:36 acefour-1 kernel: amdgpu 0000:03:00.0: amdgpu: ring gfx_0.0.0 timeout, signaled seq=7925634, emitted seq=7925634
Aug 26 18:20:36 acefour-1 kernel: amdgpu 0000:03:00.0: amdgpu: Process information: process java pid 456665 thread java:cs0 pid 456716
Aug 26 18:20:36 acefour-1 kernel: amdgpu 0000:03:00.0: amdgpu: Starting gfx_0.0.0 ring reset
Aug 26 18:20:36 acefour-1 kernel: amdgpu 0000:03:00.0: amdgpu: Ring gfx_0.0.0 reset failed


On the 6.12.43 kernel

kernel: amdgpu 0000:03:00.0: amdgpu: Dumping IP State
Aug 28 19:18:27 acefour-1 kernel: amdgpu 0000:03:00.0: amdgpu: Dumping IP State Completed
Aug 28 19:18:27 acefour-1 kernel: amdgpu 0000:03:00.0: amdgpu: ring gfx_0.0.0 timeout, signaled seq=3614266, emitted seq=3614268
Aug 28 19:18:27 acefour-1 kernel: amdgpu 0000:03:00.0: amdgpu: Process information: process java pid 386904 thread java:cs0 pid 386955
Aug 28 19:18:27 acefour-1 kernel: amdgpu 0000:03:00.0: amdgpu: GPU reset begin!
Aug 28 19:18:27 acefour-1 kernel: amdgpu 0000:03:00.0: amdgpu: MODE1 reset
Aug 28 19:18:27 acefour-1 kernel: amdgpu 0000:03:00.0: amdgpu: GPU mode1 reset
Aug 28 19:18:27 acefour-1 kernel: amdgpu 0000:03:00.0: amdgpu: GPU smu mode1 reset
Aug 28 19:18:33 acefour-1 podman[387968]: 2025-08-28 19:18:33.454895664 -0400 EDT m=+0.061131043 container health_status 25680d7f8a8caae4e87ef9a1206fd822fdd343f3b2bfb159c8>
Aug 28 19:18:38 acefour-1 kernel: amdgpu 0000:03:00.0: amdgpu: GPU reset succeeded, trying to resume
Aug 28 19:18:38 acefour-1 kernel: [drm] PCIE GART of 512M enabled (table at 0x0000008000300000).
Aug 28 19:18:38 acefour-1 kernel: [drm] VRAM is lost due to GPU reset!
Aug 28 19:18:38 acefour-1 kernel: amdgpu 0000:03:00.0: amdgpu: PSP is resuming...
Aug 28 19:18:44 acefour-1 kernel: [drm:psp_v11_0_memory_training [amdgpu]] *ERROR* send training msg failed.
Aug 28 19:18:44 acefour-1 kernel: amdgpu 0000:03:00.0: amdgpu: Failed to process memory training!
Aug 28 19:18:44 acefour-1 kernel: [drm:amdgpu_device_fw_loading [amdgpu]] *ERROR* resume of IP block <psp> failed -62
Aug 28 19:18:44 acefour-1 kernel: amdgpu 0000:03:00.0: amdgpu: GPU reset(1) failed
Aug 28 19:18:44 acefour-1 kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Aug 28 19:18:44 acefour-1 systemd-coredump[388016]: Process 386904 (java) of user 1001 terminated abnormally with signal 6/ABRT, processing...
Aug 28 19:18:44 acefour-1 systemd[1]: Started Process Core Dump (PID 388016/UID 0).
Aug 28 19:18:44 acefour-1 kernel: pcieport 0000:00:01.0: AER: Multiple Uncorrectable (Non-Fatal) error message received from 0000:03:00.1
Aug 28 19:18:44 acefour-1 kernel: amdgpu 0000:03:00.0: PCIe Bus Error: severity=Uncorrectable (Non-Fatal), type=Transaction Layer, (Requester ID)
Aug 28 19:18:44 acefour-1 kernel: amdgpu 0000:03:00.0:   device [1002:73df] error status/mask=00100000/00000000
Aug 28 19:18:44 acefour-1 kernel: amdgpu 0000:03:00.0:    [20] UnsupReq               (First)
Aug 28 19:18:44 acefour-1 kernel: amdgpu 0000:03:00.0: AER:   TLP Header: 40000001 0000000c 8352000c 00000000
Aug 28 19:18:44 acefour-1 kernel: snd_hda_intel 0000:03:00.1: PCIe Bus Error: severity=Uncorrectable (Non-Fatal), type=Transaction Layer, (Requester ID)
Aug 28 19:18:44 acefour-1 kernel: snd_hda_intel 0000:03:00.1:   device [1002:ab28] error status/mask=00100000/00000000
Aug 28 19:18:44 acefour-1 kernel: snd_hda_intel 0000:03:00.1:    [20] UnsupReq               (First)
Aug 28 19:18:44 acefour-1 kernel: snd_hda_intel 0000:03:00.1: AER:   TLP Header: 40000001 0000000c 8352000c 00000000
Aug 28 19:18:44 acefour-1 kernel: snd_hda_intel 0000:03:00.1: AER:   Error of this Agent is reported first
Aug 28 19:18:44 acefour-1 kernel: snd_hda_intel 0000:03:00.1: CORB reset timeout#2, CORBRP = 65535
Aug 28 19:18:44 acefour-1 kernel: amdgpu 0000:03:00.0: amdgpu: GPU reset end with ret = -62
Aug 28 19:18:44 acefour-1 kernel: amdgpu 0000:03:00.0: amdgpu: GPU Recovery Failed: -62

Last edited by AceFour (2025-08-28 23:27:26)

Offline

#35 2025-08-28 21:25:03

seth
Member
From: Don't DM me only for attention
Registered: 2012-09-03
Posts: 70,064

Re: Linux 6.15.* kernel crash

@AceFour, looks more like https://bbs.archlinux.org/viewtopic.php?id=299954
@fly have you encountered the crash again w/ mesa 25.2.1 ?

Offline

#36 2025-08-28 23:28:58

AceFour
Member
Registered: 2025-08-28
Posts: 2

Re: Linux 6.15.* kernel crash

@seth I updated the 6.12.43 kernel crash log above and that was running mesa 1:25.2.1-4

Offline

#37 2025-08-29 07:01:55

ArchEr9
Member
Registered: 2025-03-18
Posts: 36

Re: Linux 6.15.* kernel crash

fly wrote:

6.12.40. I also used 6.12.41 briefly.

Thanks @fly for the confirmation. The current version of Linux LTS Kernel is 6.12.42. Did you face the same issue using it also?

Offline

#38 2025-08-29 07:03:48

seth
Member
From: Don't DM me only for attention
Registered: 2012-09-03
Posts: 70,064

Re: Linux 6.15.* kernel crash

@AceFour
Please use [code][/code] tags. Edit your post in this regard.
Backtrace still looks nothing like the malloc crashes ITT, -62 is ETIME
The GPU resets for reasons previous to the segments you posted and then stops responding. See the other thread.
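
For anyone decoding the other return codes in those logs: they are negated errno values. A minimal sketch using Python's standard errno module follows. The helper name decode_kernel_ret is made up for illustration, and the numbers only line up with the kernel's when this is run on Linux.

```python
import errno
import os


def decode_kernel_ret(ret: int) -> tuple[str, str]:
    """Map a negative kernel-style return code to (errno name, description)."""
    code = abs(ret)
    # errno.errorcode maps numeric values to symbolic names for this platform.
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


if __name__ == "__main__":
    # -62 and -125 both appear in the amdgpu log lines quoted in this thread.
    for ret in (-62, -125):
        name, text = decode_kernel_ret(ret)
        print(f"{ret}: {name} ({text})")
```

On a Linux box this should name -62 as ETIME and -125 as ECANCELED.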

Last edited by seth (2025-08-29 07:04:14)

Offline

#39 2025-08-29 13:54:20

fly
Member
Registered: 2025-07-14
Posts: 17

Re: Linux 6.15.* kernel crash

seth wrote:

@fly have you encountered the crash again w/ mesa 25.2.1 ?

So far I have not, I've been trying to get into situations in GW2 that would usually cause it.

ArchEr9 wrote:

Thanks @fly for the confirmation. The current version of Linux LTS Kernel is 6.12.42. Did you face the same issue using it also?

I have not given 6.12.42 a try yet, as I can't reliably reproduce the oops; I try to run for a week after making changes.

Offline

#40 2025-08-30 12:02:08

fly
Member
Registered: 2025-07-14
Posts: 17

Re: Linux 6.15.* kernel crash

AceFour wrote:

@seth I updated the 6.12.43 kernel crash log above and that was running mesa 1:25.2.1-4

Since it seems you can reproduce it more regularly than me, you could try downgrading LTS or the regular kernel, there's info on the wiki. https://wiki.archlinux.org/title/Downgrading_packages
Either downgrade from pacman cache if you already had it installed in the past and haven't cleared the cache, or from the archive.
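
To see which kernel versions are still in the cache before downgrading, something like the sketch below might help. It assumes the default cache directory /var/cache/pacman/pkg and the usual name-pkgver-pkgrel-arch.pkg.tar.* file naming; the function names are mine, not part of any pacman tooling.

```python
from pathlib import Path

# Assumed default cache location; CacheDir in pacman.conf may point elsewhere.
CACHE_DIR = Path("/var/cache/pacman/pkg")


def parse_pkg_filename(filename: str):
    """Split 'name-pkgver-pkgrel-arch.pkg.tar.*' into (name, 'pkgver-pkgrel').

    Returns None for signature files or names that do not look like packages.
    """
    if filename.endswith(".sig") or ".pkg.tar" not in filename:
        return None
    stem = filename.split(".pkg.tar", 1)[0]
    parts = stem.rsplit("-", 3)  # the package name itself may contain hyphens
    if len(parts) != 4:
        return None
    name, pkgver, pkgrel, _arch = parts
    return name, f"{pkgver}-{pkgrel}"


def cached_versions(package: str, filenames):
    """Return the cached versions of one package, in the order given."""
    found = []
    for fn in filenames:
        parsed = parse_pkg_filename(fn)
        if parsed and parsed[0] == package:
            found.append(parsed[1])
    return found


if __name__ == "__main__":
    names = sorted(p.name for p in CACHE_DIR.glob("*.pkg.tar.*")) if CACHE_DIR.is_dir() else []
    for pkg in ("linux", "linux-lts"):
        print(pkg, cached_versions(pkg, names))
```

The listing is sorted by file name, which is not a real version ordering. The downgrade itself is still done with pacman -U on the chosen file, as the wiki page describes.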

Offline

#41 2025-09-12 00:29:13

fly
Member
Registered: 2025-07-14
Posts: 17

Re: Linux 6.15.* kernel crash

I've been updating daily and haven't had any problems until today, had another "kernel BUG at mm/vmalloc.c" oops. I honestly thought the problem might have been solved.

Linux: 6.16.6
mesa: 25.2.2

Last edited by fly (2025-09-12 00:29:44)

Offline

#42 2025-09-12 07:48:33

seth
Member
From: Don't DM me only for attention
Registered: 2012-09-03
Posts: 70,064

Re: Linux 6.15.* kernel crash

Does this only happen after considerable uptime? Are you running OOM/leaking RAM to GART/GTT?
Either way you want to report this upstream.

Offline

#43 2025-09-12 12:51:19

fly
Member
Registered: 2025-07-14
Posts: 17

Re: Linux 6.15.* kernel crash

Looking at all my past oops, the lowest uptime was just over 5 hours, so some uptime. I don't experience the oops when I reset and go back in game.
I've watched /proc/meminfo for MemFree, SwapFree, and VmallocUsed / VmallocTotal, I've also looked at VRAM and GTT with amdgpu_top though I was more focused on VRAM, I don't believe I saw GTT getting close to being full. GART I don't see in any monitor.
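
In case it helps anyone else log the same counters over a long session, here is a rough sketch of parsing the /proc/meminfo fields mentioned above. The WATCHED tuple and function name are just illustrative choices; it takes one sample per run, so it would need to be called from a loop or a timer to build a history.

```python
import os
import time

# Fields of interest; MemAvailable is an extra that was not mentioned above.
WATCHED = ("MemFree", "MemAvailable", "SwapFree", "VmallocUsed", "VmallocTotal")


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field: number}.

    Most values are in kB; a few fields (e.g. HugePages_Total) are plain counts.
    """
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


if __name__ == "__main__":
    if os.path.exists("/proc/meminfo"):
        with open("/proc/meminfo") as fh:
            info = parse_meminfo(fh.read())
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        print(stamp, " ".join(f"{k}={info.get(k, 0)}" for k in WATCHED))
```

Appending each run's output to a file gives something to look back at after a crash, which watching a live monitor does not.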

I've tried to report it upstream, though since I can't reproduce it consistently I don't think the report is very useful. I feel very stuck with this problem because I can't reproduce it.

Offline

#44 2025-09-12 13:38:56

seth
Member
From: Don't DM me only for attention
Registered: 2012-09-03
Posts: 70,064

Re: Linux 6.15.* kernel crash

You've some relatively solid backtraces that should™ allow the developers to gauge where the problem is located.
(They all look the same?)
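
One way to check that mechanically is to reduce each trace to its list of symbols and compare the lists. The sketch below assumes dmesg-style lines like the ones posted in this thread; the regex and the trace_signature name are my own. It drops timestamps, offsets and module build IDs, and by default skips the "?" frames.

```python
import re

# One call-trace line: optional "[timestamp]", optional "? " marker,
# then symbol+0xoff/0xlen and an optional "[module buildid]" suffix.
FRAME_RE = re.compile(
    r"^(?:\[\s*\d+\.\d+\]\s*)?\s*(?P<guess>\?\s+)?"
    r"(?P<sym>[A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<mod>\w+)(?:\s+[0-9a-f]+)?\])?\s*$"
)


def trace_signature(log_text: str, keep_guesses: bool = False) -> list[str]:
    """Return the ordered symbols from the call-trace lines in log_text."""
    frames = []
    for line in log_text.splitlines():
        m = FRAME_RE.match(line)
        if not m:
            continue  # registers, RIP lines, module lists, etc.
        if m.group("guess") and not keep_guesses:
            continue  # "?" frames are entries the unwinder could not confirm
        mod = m.group("mod")
        frames.append(f"{m.group('sym')} [{mod}]" if mod else m.group("sym"))
    return frames
```

Two oopses with equal signatures went through the same call path even if the offsets or module build IDs differ between kernel builds. Journal lines with a hostname prefix would need the regex adjusted.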

Offline

#45 2025-09-14 15:05:54

taldarus
Member
Registered: 2019-09-30
Posts: 2

Re: Linux 6.15.* kernel crash

Hi,

Random dump, but I am 90% sure I am running into the same problem. The computer completely crashes when I run higher-end games. Ryzen 5600 running on Manjaro. I was on 6.12.44 (kernel) when the crashes started, I believe. Downgraded back to 6.06.103-3 and they kept going. Double-checked to make sure it isn't hardware. Then turned to the Ryzen page on the Arch wiki. Tinkering with that stuff brought a lot of changes, but the crashes still persist. I eventually went even further back in kernels and it seemed to help, for a while.

Now I am moving to the experimental kernel to see if that fixes it. I assume it will be a significant improvement. This thread sounds almost word for word like what I am dealing with. But I would add that, perhaps coincidentally, I was working on getting wine to behave with DX11 and steam. It involved getting older games to behave. I'll pop back in after a day or two, if you guys want a specific log or something, let me know.

Offline

#46 2025-09-14 18:13:42

seth
Member
From: Don't DM me only for attention
Registered: 2012-09-03
Posts: 70,064

Re: Linux 6.15.* kernel crash

If you want a comment on whether you're kind of hitting the same problem, look for the

------------[ cut here ]------------

part in your journal/dmesg and post it.
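
If the log is long, the relevant part can be pulled out with a few lines of script. This is only a sketch: it assumes the kernel log was first saved to a text file (for example from dmesg, or from journalctl -k -b -1 for the previous boot), and the function name is made up.

```python
import sys

START = "------------[ cut here ]------------"
END = "---[ end trace"


def extract_oops_blocks(log_text: str) -> list[str]:
    """Return each block from a 'cut here' line through the next 'end trace' line.

    A block that never reaches an 'end trace' line (log cut short by the
    crash) is returned up to the end of the input.
    """
    blocks, current = [], None
    for line in log_text.splitlines():
        if START in line:
            if current is not None:
                blocks.append("\n".join(current))  # previous block never closed
            current = [line]
        elif current is not None:
            current.append(line)
            if END in line:
                blocks.append("\n".join(current))
                current = None
    if current is not None:
        blocks.append("\n".join(current))
    return blocks


if __name__ == "__main__":
    # Usage: python extract_oops.py saved-kernel-log.txt
    if len(sys.argv) > 1:
        with open(sys.argv[1], errors="replace") as fh:
            for block in extract_oops_blocks(fh.read()):
                print(block)
                print()
```

Note that the register dump and panic line printed after the "end trace" marker are not included, so for a full report it is still worth posting a few lines past it.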

Offline

#47 2025-09-15 07:00:54

lulzette
Member
Registered: 2021-10-30
Posts: 7

Re: Linux 6.15.* kernel crash

fly wrote:

I've been updating daily and haven't had any problems until today, had another "kernel BUG at mm/vmalloc.c" oops. I honestly thought the problem might have been solved.

Linux: 6.16.6
mesa: 25.2.2

I updated my system and got kernel version 6.16.7. I had no issues for a week. Seems like the problem is gone.

Offline

#48 2025-09-24 13:49:56

SkyeStarfall
Member
Registered: 2025-09-24
Posts: 1

Re: Linux 6.15.* kernel crash

Hi, I seem to be having a similar issue to the one described here.

Linux 6.16.8
Mesa 25.2.3

[13242.350948] ------------[ cut here ]------------
[13242.350951] kernel BUG at mm/vmalloc.c:3167!
[13242.350958] Oops: invalid opcode: 0000 [#1] SMP NOPTI
[13242.350962] CPU: 3 UID: 0 PID: 27967 Comm: kworker/u64:1 Tainted: G           OE       6.16.8-arch2-1 #1 PREEMPT(full)  de52b3ffa7625e72d9c953dfb673005e33f20984
[13242.350966] Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
[13242.350967] Hardware name: ASRock X670E PG Lightning/X670E PG Lightning, BIOS 3.16 12/18/2024
[13242.350968] Workqueue: events_unbound commit_work
[13242.350974] RIP: 0010:__get_vm_area_node+0x12d/0x130
[13242.350979] Code: 83 c1 01 39 d1 0f 4c ca ba 1e 00 00 00 39 d1 0f 4f ca 48 d3 e6 49 89 f7 e9 35 ff ff ff 4c 89 f7 e8 a8 05 02 00 45 31 f6 eb ae <0f> 0b 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 0f 1f
[13242.350981] RSP: 0018:ffffcffac9b6b600 EFLAGS: 00010202
[13242.350983] RAX: 0000000000000dc0 RBX: 0000000000001000 RCX: 0000000000000422
[13242.350984] RDX: 000000000000000c RSI: 0000000000001000 RDI: 0000000000038ba0
[13242.350985] RBP: 000000000000000c R08: ffffcffac0000000 R09: ffffeffabfffffff
[13242.350987] R10: 8000000000000163 R11: 0000000000000000 R12: 8000000000000163
[13242.350988] R13: 000000000000000c R14: 00000000ffffffff R15: 0000000000000dc0
[13242.350989] FS:  0000000000000000(0000) GS:ffff8d407e3da000(0000) knlGS:0000000000000000
[13242.350991] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[13242.350992] CR2: 00007fbfa386f008 CR3: 0000000192d0e000 CR4: 0000000000f50ef0
[13242.350994] PKRU: 55555554
[13242.350995] Call Trace:
[13242.350997]  <TASK>
[13242.350999]  __vmalloc_node_range_noprof+0x139/0x8e0
[13242.351004]  ? dc_create_plane_state+0x23/0x80 [amdgpu e804af2cdf29bc4cd9958e19ff3fee1fda277c69]
[13242.351195]  ? srso_alias_return_thunk+0x5/0xfbef5
[13242.351199]  ? __alloc_frozen_pages_noprof+0x334/0x350
[13242.351203]  ? dc_create_plane_state+0x23/0x80 [amdgpu e804af2cdf29bc4cd9958e19ff3fee1fda277c69]
[13242.351365]  ? srso_alias_return_thunk+0x5/0xfbef5
[13242.351368]  __kvmalloc_node_noprof+0x2f3/0x640
[13242.351371]  ? dc_create_plane_state+0x23/0x80 [amdgpu e804af2cdf29bc4cd9958e19ff3fee1fda277c69]
[13242.351501]  ? dc_create_plane_state+0x23/0x80 [amdgpu e804af2cdf29bc4cd9958e19ff3fee1fda277c69]
[13242.351612]  ? srso_alias_return_thunk+0x5/0xfbef5
[13242.351615]  ? dcn20_build_pipe_pix_clk_params+0x1d/0x40 [amdgpu e804af2cdf29bc4cd9958e19ff3fee1fda277c69]
[13242.351770]  ? dc_create_plane_state+0x23/0x80 [amdgpu e804af2cdf29bc4cd9958e19ff3fee1fda277c69]
[13242.351891]  dc_create_plane_state+0x23/0x80 [amdgpu e804af2cdf29bc4cd9958e19ff3fee1fda277c69]
[13242.352020]  dc_state_create_phantom_plane+0x1a/0x60 [amdgpu e804af2cdf29bc4cd9958e19ff3fee1fda277c69]
[13242.352142]  dcn32_add_phantom_pipes+0x163/0x440 [amdgpu e804af2cdf29bc4cd9958e19ff3fee1fda277c69]
[13242.352304]  dcn32_internal_validate_bw+0xbc7/0x1610 [amdgpu e804af2cdf29bc4cd9958e19ff3fee1fda277c69]
[13242.352521]  ? srso_alias_return_thunk+0x5/0xfbef5
[13242.352526]  ? dml1_validate+0x67/0x2c0 [amdgpu e804af2cdf29bc4cd9958e19ff3fee1fda277c69]
[13242.352754]  dml1_validate+0xc0/0x2c0 [amdgpu e804af2cdf29bc4cd9958e19ff3fee1fda277c69]
[13242.352975]  ? dc_state_remove_plane+0xaf/0x150 [amdgpu e804af2cdf29bc4cd9958e19ff3fee1fda277c69]
[13242.353144]  dcn32_validate_bandwidth+0xb7/0x1c0 [amdgpu e804af2cdf29bc4cd9958e19ff3fee1fda277c69]
[13242.353294]  update_planes_and_stream_state+0x399/0x500 [amdgpu e804af2cdf29bc4cd9958e19ff3fee1fda277c69]
[13242.353430]  update_planes_and_stream_v2+0x22b/0x560 [amdgpu e804af2cdf29bc4cd9958e19ff3fee1fda277c69]
[13242.353560]  dc_update_planes_and_stream+0x71/0xf0 [amdgpu e804af2cdf29bc4cd9958e19ff3fee1fda277c69]
[13242.353680]  ? sort+0x34/0x60
[13242.353683]  amdgpu_dm_atomic_commit_tail+0x1591/0x3840 [amdgpu e804af2cdf29bc4cd9958e19ff3fee1fda277c69]
[13242.353848]  ? amdgpu_crtc_get_scanout_position+0x28/0x40 [amdgpu e804af2cdf29bc4cd9958e19ff3fee1fda277c69]
[13242.353966]  ? srso_alias_return_thunk+0x5/0xfbef5
[13242.353968]  ? drm_crtc_vblank_helper_get_vblank_timestamp_internal+0x145/0x380
[13242.353972]  ? srso_alias_return_thunk+0x5/0xfbef5
[13242.353974]  ? dma_fence_default_wait+0x8a/0x280
[13242.353977]  ? srso_alias_return_thunk+0x5/0xfbef5
[13242.353979]  ? wait_for_completion_timeout+0x14e/0x1a0
[13242.353982]  ? srso_alias_return_thunk+0x5/0xfbef5
[13242.353987]  commit_tail+0x9e/0x130
[13242.353990]  process_one_work+0x190/0x350
[13242.353995]  worker_thread+0x2d7/0x410
[13242.353998]  ? __pfx_worker_thread+0x10/0x10
[13242.354001]  kthread+0xf9/0x240
[13242.354003]  ? __pfx_kthread+0x10/0x10
[13242.354005]  ? __pfx_kthread+0x10/0x10
[13242.354007]  ret_from_fork+0x1c1/0x1f0
[13242.354009]  ? __pfx_kthread+0x10/0x10
[13242.354011]  ret_from_fork_asm+0x1a/0x30
[13242.354016]  </TASK>
[13242.354018] Modules linked in: vsock_loopback vmw_vsock_virtio_transport_common vmw_vsock_vmci_transport vsock vmw_vmci rfkill vfat fat amd_atl intel_rapl_msr intel_rapl_common snd_hda_codec_realtek kvm_amd snd_hda_codec_generic snd_hda_scodec_component snd_hda_codec_hdmi kvm snd_hda_intel uvcvideo spd5118 snd_intel_dspcfg videobuf2_vmalloc snd_intel_sdw_acpi snd_usb_audio irqbypass uvc videobuf2_memops polyval_clmulni snd_usbmidi_lib snd_hda_codec ghash_clmulni_intel videobuf2_v4l2 snd_ump sha512_ssse3 videobuf2_common snd_hda_core snd_rawmidi sp5100_tco sha1_ssse3 snd_hwdep snd_seq_device r8169 aesni_intel videodev snd_pcm i2c_piix4 realtek rapl wmi_bmof i2c_smbus pcspkr k10temp snd_timer mc mdio_devres ccp snd libphy soundcore mdio_bus gpio_amdpt mousedev gpio_generic amd_3d_vcache joydev razermouse(OE) mac_hid ntsync i2c_dev crypto_user dm_mod loop nfnetlink ip_tables x_tables amdgpu amdxcp i2c_algo_bit drm_ttm_helper ttm drm_exec uas gpu_sched usb_storage drm_suballoc_helper drm_panel_backlight_quirks nvme
[13242.354077]  drm_buddy nvme_core drm_display_helper video nvme_keyring cec nvme_auth wmi
[13242.354115] ---[ end trace 0000000000000000 ]---
[13242.354119] RIP: 0010:__get_vm_area_node+0x12d/0x130
[13242.354123] Code: 83 c1 01 39 d1 0f 4c ca ba 1e 00 00 00 39 d1 0f 4f ca 48 d3 e6 49 89 f7 e9 35 ff ff ff 4c 89 f7 e8 a8 05 02 00 45 31 f6 eb ae <0f> 0b 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 0f 1f
[13242.354125] RSP: 0018:ffffcffac9b6b600 EFLAGS: 00010202
[13242.354128] RAX: 0000000000000dc0 RBX: 0000000000001000 RCX: 0000000000000422
[13242.354130] RDX: 000000000000000c RSI: 0000000000001000 RDI: 0000000000038ba0
[13242.354131] RBP: 000000000000000c R08: ffffcffac0000000 R09: ffffeffabfffffff
[13242.354133] R10: 8000000000000163 R11: 0000000000000000 R12: 8000000000000163
[13242.354135] R13: 000000000000000c R14: 00000000ffffffff R15: 0000000000000dc0
[13242.354136] FS:  0000000000000000(0000) GS:ffff8d407e3da000(0000) knlGS:0000000000000000
[13242.354138] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[13242.354140] CR2: 00007fbfa386f008 CR3: 0000000192d0e000 CR4: 0000000000f50ef0
[13242.354142] PKRU: 55555554
[13242.354144] Kernel panic - not syncing: Fatal exception in interrupt
[13242.355291] Kernel Offset: 0x1bc00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)

Last edited by SkyeStarfall (2025-09-24 13:51:52)

Offline

#49 2025-09-24 14:57:47

seth
Member
From: Don't DM me only for attention
Registered: 2012-09-03
Posts: 70,064

Re: Linux 6.15.* kernel crash

@fly, did you already report this upstream?

Offline

#50 2025-09-25 12:16:44

fly
Member
Registered: 2025-07-14
Posts: 17

Re: Linux 6.15.* kernel crash

seth wrote:

@fly, did you already report this upstream?

https://gitlab.freedesktop.org/drm/amd/-/issues/4470

Offline
