#1 2024-04-06 01:44:19

Ranguvar
Member
Registered: 2008-08-12
Posts: 2,563

[SOLVED] amdgpu kernel panics beginning with 6.7.1 - 6.7.4

I've been losing video output completely on my AMD Polaris video card, starting with Linux 6.8.1 and mesa 24.0.3-2.
It happens when I leave my desktop alone for many hours with swaylock on and both monitors off.
Maybe 50/50 chance after 6 hours? Hard to say.
The panic happens when I give input and power the monitors back on.

This continued with 6.8.2 -- I've not tried 6.8.3/4 yet.
I downgraded to 6.7.9 and it still crashed.
With mesa 24.0.4-1, I still had the issue using both 6.7.9 and 6.7.4.

I could not reproduce it on 6.7.0 after 48 hours. 6.7.2 crashed, and I am now testing 6.7.1.

I'm not posting full dmesg or looking for assistance, especially because I'm using a highly-tainted linux-zen.
Just hoping to help anyone in a similar spot, as I'm seeing similar threads lately.

I haven't submitted a bug to the upstream kernel before, but I'll try if I can isolate the commit. I just cannot easily test this setup with a vanilla/clean kernel.

Here are a few example panics.

6.8.2:

Mar 30 13:21:37 arch kernel: [drm] PCIE GART of 256M enabled (table at 0x000000F400E80000).
Mar 30 13:21:37 arch kernel: amdgpu 0000:0f:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring comp_1.1.0 test failed (-110)
Mar 30 13:21:37 arch kernel: amdgpu 0000:0f:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring comp_1.2.0 test failed (-110)
Mar 30 13:21:38 arch kernel: amdgpu 0000:0f:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring comp_1.1.1 test failed (-110)
Mar 30 13:21:38 arch kernel: [drm] DM_MST: starting TM on aconnector: 00000000cfbdd76a [id: 77]
Mar 30 13:21:38 arch kernel: [drm] DM_MST: DP12, 4-lane link detected
Mar 30 13:21:38 arch kernel: ------------[ cut here ]------------
Mar 30 13:21:38 arch kernel: WARNING: CPU: 29 PID: 2197 at drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:8160 amdgpu_dm_atomic_commit_tail+0x2270/0x2580 [amdgpu]
Mar 30 13:21:38 arch kernel: Modules linked in: xt_nat vhost_net vhost vhost_iotlb tap snd_seq_dummy snd_seq_midi snd_hrtimer snd_seq_midi_event snd_seq xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT xt_tcpudp ip6table_mangle ip6table_nat ip6table_fil>
Mar 30 13:21:38 arch kernel:  rapl gigabyte_wmi nft_masq wmi_bmof mc i2c_piix4 snd_hwdep cfg80211 bluetooth snd_pcm r8169 igb realtek snd_timer mdio_devres ptp ecdh_generic snd pps_core mousedev ccp soundcore rfkill dca libphy crc16 joydev mac_hid nft_ct>
Mar 30 13:21:38 arch kernel: CPU: 29 PID: 2197 Comm: systemd-logind Tainted: P        W  OE      6.8.2-zen2-1-zen #1 2f27ac2810bbd221aea68cde2f42843e48e62d59
Mar 30 13:21:38 arch kernel: Hardware name: Gigabyte Technology Co., Ltd. X570 AORUS MASTER/X570 AORUS MASTER, BIOS F37h 12/25/2023
Mar 30 13:21:38 arch kernel: RIP: 0010:amdgpu_dm_atomic_commit_tail+0x2270/0x2580 [amdgpu]
Mar 30 13:21:38 arch kernel: Code: 89 95 a8 fe ff ff e9 c6 e6 ff ff 48 c7 c2 a0 d3 1f c1 48 c7 c6 b8 ef 32 c1 48 c7 c7 90 5d ea c0 e8 b5 a2 54 e8 e9 4a fa ff ff <0f> 0b e9 eb eb ff ff 0f 0b e9 d4 eb ff ff 48 8b 85 80 fe ff ff c6
Mar 30 13:21:38 arch kernel: RSP: 0018:ffffbeb54246b998 EFLAGS: 00010282
Mar 30 13:21:38 arch kernel: RAX: 00000000ffffffea RBX: 0000000000000005 RCX: 0000000000000000
Mar 30 13:21:38 arch kernel: RDX: 0000000000000002 RSI: 0000000000000297 RDI: ffffa05117c8015c
Mar 30 13:21:38 arch kernel: RBP: ffffbeb54246bb50 R08: 0000000000000006 R09: 0000000000000000
Mar 30 13:21:38 arch kernel: R10: ffffa0553f0b0180 R11: 0000000000000000 R12: ffffa0511294d000
Mar 30 13:21:38 arch kernel: R13: ffffa0553f0b0180 R14: ffffa0512d18ac00 R15: ffffa055241ae600
Mar 30 13:21:38 arch kernel: FS:  0000776bfa809100(0000) GS:ffffa05fff140000(0000) knlGS:0000000000000000
Mar 30 13:21:38 arch kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 30 13:21:38 arch kernel: CR2: 00007a4a96f50000 CR3: 0000000108d3c000 CR4: 0000000000f50ef0
Mar 30 13:21:38 arch kernel: PKRU: 55555554
Mar 30 13:21:38 arch kernel: Call Trace:
Mar 30 13:21:38 arch kernel:  <TASK>
Mar 30 13:21:38 arch kernel:  ? __warn+0x81/0x1b0
Mar 30 13:21:38 arch kernel:  ? amdgpu_dm_atomic_commit_tail+0x2270/0x2580 [amdgpu a9fac568e4f57677a7aa86ac5b06871c26eda53a]
Mar 30 13:21:38 arch kernel:  ? report_bug+0x202/0x270
Mar 30 13:21:38 arch kernel:  ? handle_bug+0x3c/0x80
Mar 30 13:21:38 arch kernel:  ? exc_invalid_op+0x19/0xc0
Mar 30 13:21:38 arch kernel:  ? asm_exc_invalid_op+0x1a/0x20
Mar 30 13:21:38 arch kernel:  ? amdgpu_dm_atomic_commit_tail+0x2270/0x2580 [amdgpu a9fac568e4f57677a7aa86ac5b06871c26eda53a]
Mar 30 13:21:38 arch kernel:  ? amdgpu_dm_atomic_commit_tail+0xe5a/0x2580 [amdgpu a9fac568e4f57677a7aa86ac5b06871c26eda53a]
Mar 30 13:21:38 arch kernel:  commit_tail+0x94/0x130
Mar 30 13:21:38 arch kernel:  drm_atomic_helper_commit+0x11a/0x140
Mar 30 13:21:38 arch kernel:  drm_atomic_commit+0x9a/0xd0
Mar 30 13:21:38 arch kernel:  ? __pfx___drm_printfn_info+0x10/0x10
Mar 30 13:21:38 arch kernel:  drm_atomic_helper_commit_duplicated_state+0xf2/0x100
Mar 30 13:21:38 arch kernel:  drm_atomic_helper_resume+0xa5/0x160
Mar 30 13:21:38 arch kernel:  dm_resume+0x35e/0x950 [amdgpu a9fac568e4f57677a7aa86ac5b06871c26eda53a]
Mar 30 13:21:38 arch kernel:  ? smum_send_msg_to_smc_with_parameter+0x93/0x110 [amdgpu a9fac568e4f57677a7aa86ac5b06871c26eda53a]
Mar 30 13:21:38 arch kernel:  ? smum_send_msg_to_smc+0x8b/0x100 [amdgpu a9fac568e4f57677a7aa86ac5b06871c26eda53a]
Mar 30 13:21:38 arch kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Mar 30 13:21:38 arch kernel:  ? smu7_force_dpm_level+0x229/0x590 [amdgpu a9fac568e4f57677a7aa86ac5b06871c26eda53a]
Mar 30 13:21:38 arch kernel:  amdgpu_device_ip_resume_phase2+0x52/0xc0 [amdgpu a9fac568e4f57677a7aa86ac5b06871c26eda53a]
Mar 30 13:21:38 arch kernel:  amdgpu_device_resume+0xa0/0x330 [amdgpu a9fac568e4f57677a7aa86ac5b06871c26eda53a]
Mar 30 13:21:38 arch kernel:  ? __pfx_pci_pm_runtime_resume+0x10/0x10
Mar 30 13:21:38 arch kernel:  amdgpu_pmops_runtime_resume+0x82/0xf0 [amdgpu a9fac568e4f57677a7aa86ac5b06871c26eda53a]
Mar 30 13:21:38 arch kernel:  ? __pfx_pci_pm_runtime_resume+0x10/0x10
Mar 30 13:21:38 arch kernel:  __rpm_callback+0x44/0x170
Mar 30 13:21:38 arch kernel:  ? __pfx_pci_pm_runtime_resume+0x10/0x10
Mar 30 13:21:38 arch kernel:  rpm_resume+0x66b/0x8e0
Mar 30 13:21:38 arch kernel:  __pm_runtime_resume+0x4b/0x80
Mar 30 13:21:38 arch kernel:  amdgpu_drm_ioctl+0x38/0x90 [amdgpu a9fac568e4f57677a7aa86ac5b06871c26eda53a]
Mar 30 13:21:38 arch kernel:  __x64_sys_ioctl+0x97/0xd0
Mar 30 13:21:38 arch kernel:  do_syscall_64+0x89/0x170
Mar 30 13:21:38 arch kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Mar 30 13:21:38 arch kernel:  ? syscall_exit_to_user_mode+0x80/0x230
Mar 30 13:21:38 arch kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Mar 30 13:21:38 arch kernel:  ? do_syscall_64+0x96/0x170
Mar 30 13:21:38 arch kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Mar 30 13:21:38 arch kernel:  ? syscall_exit_to_user_mode+0x80/0x230
Mar 30 13:21:38 arch kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Mar 30 13:21:38 arch kernel:  ? do_syscall_64+0x96/0x170
Mar 30 13:21:38 arch kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Mar 30 13:21:38 arch kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Mar 30 13:21:38 arch kernel:  entry_SYSCALL_64_after_hwframe+0x6e/0x76
Mar 30 13:21:38 arch kernel: RIP: 0033:0x776bfa3224ff
Mar 30 13:21:38 arch kernel: Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 18 48 8b 44 24 18 64 48 2b 04 25 28 00 00
Mar 30 13:21:38 arch kernel: RSP: 002b:00007ffe38dcba80 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Mar 30 13:21:38 arch kernel: RAX: ffffffffffffffda RBX: 00005a5fa6731980 RCX: 0000776bfa3224ff
Mar 30 13:21:38 arch kernel: RDX: 0000000000000000 RSI: 000000000000641f RDI: 000000000000001e
Mar 30 13:21:38 arch kernel: RBP: 00007ffe38dcbb10 R08: 00000000000001eb R09: 00005a5fa6735c90
Mar 30 13:21:38 arch kernel: R10: 00007ffe38dcbb10 R11: 0000000000000246 R12: 00007ffe38dcbb08
Mar 30 13:21:38 arch kernel: R13: 00007ffe38dcbba8 R14: 00007ffe38dcbbb0 R15: 00005a5fa671b1d0
Mar 30 13:21:38 arch kernel:  </TASK>
Mar 30 13:21:38 arch kernel: ---[ end trace 0000000000000000 ]---

6.7.4 (appeared to panic and possibly recover every few hours; I did not test between panics):

Apr 05 02:11:38 arch kernel: [drm] DM_MST: stopping TM on aconnector: 00000000c8ff63a0 [id: 77]
Apr 05 02:11:51 arch kernel: [drm] PCIE GART of 256M enabled (table at 0x000000F400E80000).
Apr 05 02:11:51 arch kernel: amdgpu 0000:0f:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring comp_1.1.0 test failed (-110)
Apr 05 02:11:51 arch kernel: ------------[ cut here ]------------
Apr 05 02:11:51 arch kernel: WARNING: CPU: 26 PID: 17621 at drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:8057 amdgpu_dm_atomic_commit_tail+0x3682/0x3e10 [amdgpu]
Apr 05 02:11:51 arch kernel: Modules linked in: snd_seq_dummy snd_seq_midi snd_hrtimer snd_seq_midi_event snd_seq xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT xt_tcpudp ip6table_mangle ip6table_nat ip6table_filter ip6_tables iptable_mangle iptable_n>
Apr 05 02:11:51 arch kernel:  snd_timer nf_reject_ipv6 nft_reject snd soundcore usb_storage hfsplus hfs cdrom nft_masq btrfs nft_ct blake2b_generic xor raid6_pq ext4 nft_chain_nat nf_nat nf_conntrack crc16 nf_defrag_ipv6 mbcache nf_defrag_ipv4 jbd2 exfat>
Apr 05 02:11:51 arch kernel: CPU: 26 PID: 17621 Comm: brave Tainted: P           OE      6.7.4-zen1-1-zen #1 3d4d8321df48782f53772bf77b31195eea22b03b
Apr 05 02:11:51 arch kernel: Hardware name: Gigabyte Technology Co., Ltd. X570 AORUS MASTER/X570 AORUS MASTER, BIOS F37h 12/25/2023
Apr 05 02:11:51 arch kernel: RIP: 0010:amdgpu_dm_atomic_commit_tail+0x3682/0x3e10 [amdgpu]
Apr 05 02:11:51 arch kernel: Code: 4c 89 9d 60 fe ff ff e8 cc 92 27 dd 4c 8b 9d 60 fe ff ff e9 86 dd ff ff 0f 0b 49 8b 3e e8 b6 5b dd dc 85 c0 0f 84 1c d7 ff ff <0f> 0b e9 15 d7 ff ff 48 8b 85 88 fe ff ff 31 d2 31 f6 48 8d b8 60
Apr 05 02:11:51 arch kernel: RSP: 0018:ffffb2781ed7b970 EFLAGS: 00010282
Apr 05 02:11:51 arch kernel: RAX: 00000000ffffffea RBX: ffff9ff1d80cd800 RCX: 0000000000000000
Apr 05 02:11:51 arch kernel: RDX: 0000000000000002 RSI: 0000000000000297 RDI: ffff9ff1d818015c
Apr 05 02:11:51 arch kernel: RBP: ffffb2781ed7bbc8 R08: 0000000000000006 R09: 0000000000000000
Apr 05 02:11:51 arch kernel: R10: 0000000000000001 R11: 0000000000000000 R12: ffff9ff1d4855e00
Apr 05 02:11:51 arch kernel: R13: 0000000000000005 R14: ffff9ff7963da400 R15: ffff9ff1d8180178
Apr 05 02:11:51 arch kernel: FS:  00007f1b04183140(0000) GS:ffffa000bf080000(0000) knlGS:0000000000000000
Apr 05 02:11:51 arch kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 05 02:11:51 arch kernel: CR2: 0000747f5c806000 CR3: 00000004d92ce000 CR4: 0000000000f50ef0
Apr 05 02:11:51 arch kernel: PKRU: 55555558
Apr 05 02:11:51 arch kernel: Call Trace:
Apr 05 02:11:51 arch kernel:  <TASK>
Apr 05 02:11:51 arch kernel:  ? __warn+0x81/0x1b0
Apr 05 02:11:51 arch kernel:  ? amdgpu_dm_atomic_commit_tail+0x3682/0x3e10 [amdgpu a0a6ec898d79d235507c715346f6ed73445a8e9e]
Apr 05 02:11:51 arch kernel:  ? report_bug+0x202/0x270
Apr 05 02:11:51 arch kernel:  ? handle_bug+0x3c/0x80
Apr 05 02:11:51 arch kernel:  ? exc_invalid_op+0x19/0xc0
Apr 05 02:11:51 arch kernel:  ? asm_exc_invalid_op+0x1a/0x20
Apr 05 02:11:51 arch kernel:  ? amdgpu_dm_atomic_commit_tail+0x3682/0x3e10 [amdgpu a0a6ec898d79d235507c715346f6ed73445a8e9e]
Apr 05 02:11:51 arch kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Apr 05 02:11:51 arch kernel:  commit_tail+0x94/0x130
Apr 05 02:11:51 arch kernel:  drm_atomic_helper_commit+0x11a/0x140
Apr 05 02:11:51 arch kernel:  drm_atomic_commit+0x9a/0xd0
Apr 05 02:11:51 arch kernel:  ? __pfx___drm_printfn_info+0x10/0x10
Apr 05 02:11:51 arch kernel:  drm_atomic_helper_commit_duplicated_state+0xf2/0x100
Apr 05 02:11:51 arch kernel:  drm_atomic_helper_resume+0xa5/0x160
Apr 05 02:11:51 arch kernel:  dm_resume+0x364/0x950 [amdgpu a0a6ec898d79d235507c715346f6ed73445a8e9e]
Apr 05 02:11:51 arch kernel:  ? smum_send_msg_to_smc_with_parameter+0x93/0x110 [amdgpu a0a6ec898d79d235507c715346f6ed73445a8e9e]
Apr 05 02:11:51 arch kernel:  ? smum_send_msg_to_smc+0x8b/0x100 [amdgpu a0a6ec898d79d235507c715346f6ed73445a8e9e]
Apr 05 02:11:51 arch kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Apr 05 02:11:51 arch kernel:  ? smu7_force_dpm_level+0x229/0x590 [amdgpu a0a6ec898d79d235507c715346f6ed73445a8e9e]
Apr 05 02:11:51 arch kernel:  amdgpu_device_ip_resume_phase2+0x52/0xc0 [amdgpu a0a6ec898d79d235507c715346f6ed73445a8e9e]
Apr 05 02:11:51 arch kernel:  amdgpu_device_resume+0xa0/0x330 [amdgpu a0a6ec898d79d235507c715346f6ed73445a8e9e]
Apr 05 02:11:51 arch kernel:  ? __pfx_pci_pm_runtime_resume+0x10/0x10
Apr 05 02:11:51 arch kernel:  amdgpu_pmops_runtime_resume+0x82/0xf0 [amdgpu a0a6ec898d79d235507c715346f6ed73445a8e9e]
Apr 05 02:11:51 arch kernel:  ? __pfx_pci_pm_runtime_resume+0x10/0x10
Apr 05 02:11:51 arch kernel:  __rpm_callback+0x44/0x170
Apr 05 02:11:51 arch kernel:  ? __pfx_pci_pm_runtime_resume+0x10/0x10
Apr 05 02:11:51 arch kernel:  rpm_resume+0x66b/0x8e0
Apr 05 02:11:51 arch kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Apr 05 02:11:51 arch kernel:  ? __x64_sys_futex+0x2a1/0x3e0
Apr 05 02:11:51 arch kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Apr 05 02:11:51 arch kernel:  __pm_runtime_resume+0x4b/0x80
Apr 05 02:11:51 arch kernel:  amdgpu_drm_ioctl+0x38/0x90 [amdgpu a0a6ec898d79d235507c715346f6ed73445a8e9e]
Apr 05 02:11:51 arch kernel:  __x64_sys_ioctl+0x97/0xd0
Apr 05 02:11:51 arch kernel:  do_syscall_64+0x64/0xe0
Apr 05 02:11:51 arch kernel:  entry_SYSCALL_64_after_hwframe+0x6e/0x76
Apr 05 02:11:51 arch kernel: RIP: 0033:0x7f1b052874ff
Apr 05 02:11:51 arch kernel: Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 18 48 8b 44 24 18 64 48 2b 04 25 28 00 00
Apr 05 02:11:51 arch kernel: RSP: 002b:00007ffc40af5f20 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Apr 05 02:11:51 arch kernel: RAX: ffffffffffffffda RBX: 00007ffc40af5ff0 RCX: 00007f1b052874ff
Apr 05 02:11:51 arch kernel: RDX: 00007ffc40af5fc0 RSI: 00000000c0106442 RDI: 0000000000000017
Apr 05 02:11:51 arch kernel: RBP: 00007ffc40af5fc0 R08: 000064ec87868250 R09: 0000000000000000
Apr 05 02:11:51 arch kernel: R10: 00007ffc40af61c0 R11: 0000000000000246 R12: 00000000c0106442
Apr 05 02:11:51 arch kernel: R13: 0000000000000017 R14: 00001d4000fe6b50 R15: 0000000000000039
Apr 05 02:11:51 arch kernel:  </TASK>
Apr 05 02:11:51 arch kernel: ---[ end trace 0000000000000000 ]---
Apr 05 02:11:51 arch kernel: [drm] UVD and UVD ENC initialized successfully.
Apr 05 02:11:51 arch kernel: [drm] VCE initialized successfully.
Apr 05 02:12:02 arch kernel: [drm] PCIE GART of 256M enabled (table at 0x000000F400E80000).
Apr 05 02:12:02 arch kernel: amdgpu 0000:0f:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring comp_1.1.0 test failed (-110)
Apr 05 02:12:02 arch kernel: ------------[ cut here ]------------
Apr 05 02:12:02 arch kernel: WARNING: CPU: 0 PID: 17621 at drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:8057 amdgpu_dm_atomic_commit_tail+0x3682/0x3e10 [amdgpu]
Apr 05 02:12:02 arch kernel: Modules linked in: snd_seq_dummy snd_seq_midi snd_hrtimer snd_seq_midi_event snd_seq xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT xt_tcpudp ip6table_mangle ip6table_nat ip6table_filter ip6_tables iptable_mangle iptable_n>
Apr 05 02:12:02 arch kernel:  snd_timer nf_reject_ipv6 nft_reject snd soundcore usb_storage hfsplus hfs cdrom nft_masq btrfs nft_ct blake2b_generic xor raid6_pq ext4 nft_chain_nat nf_nat nf_conntrack crc16 nf_defrag_ipv6 mbcache nf_defrag_ipv4 jbd2 exfat>
Apr 05 02:12:02 arch kernel: CPU: 0 PID: 17621 Comm: brave Tainted: P        W  OE      6.7.4-zen1-1-zen #1 3d4d8321df48782f53772bf77b31195eea22b03b
Apr 05 02:12:02 arch kernel: Hardware name: Gigabyte Technology Co., Ltd. X570 AORUS MASTER/X570 AORUS MASTER, BIOS F37h 12/25/2023
Apr 05 02:12:02 arch kernel: RIP: 0010:amdgpu_dm_atomic_commit_tail+0x3682/0x3e10 [amdgpu]
Apr 05 02:12:02 arch kernel: Code: 4c 89 9d 60 fe ff ff e8 cc 92 27 dd 4c 8b 9d 60 fe ff ff e9 86 dd ff ff 0f 0b 49 8b 3e e8 b6 5b dd dc 85 c0 0f 84 1c d7 ff ff <0f> 0b e9 15 d7 ff ff 48 8b 85 88 fe ff ff 31 d2 31 f6 48 8d b8 60
Apr 05 02:12:02 arch kernel: RSP: 0018:ffffb2781ed7b8a0 EFLAGS: 00010282
Apr 05 02:12:02 arch kernel: RAX: 00000000ffffffea RBX: ffff9ff1d80cd800 RCX: 0000000000000000
Apr 05 02:12:02 arch kernel: RDX: 0000000000000002 RSI: 0000000000000297 RDI: ffff9ff1d818015c
Apr 05 02:12:02 arch kernel: RBP: ffffb2781ed7baf8 R08: 0000000000000006 R09: 0000000000000000
Apr 05 02:12:02 arch kernel: R10: 0000000000000001 R11: 0000000000000000 R12: ffff9ff7d3512100
Apr 05 02:12:02 arch kernel: R13: 0000000000000005 R14: ffff9ff7d3fe6e00 R15: ffff9ff1d8180178
Apr 05 02:12:02 arch kernel: FS:  00007f1b04183140(0000) GS:ffffa000bea00000(0000) knlGS:0000000000000000
Apr 05 02:12:02 arch kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 05 02:12:02 arch kernel: CR2: 00000278002b5ffc CR3: 00000004d92ce000 CR4: 0000000000f50ef0
Apr 05 02:12:02 arch kernel: PKRU: 55555558
Apr 05 02:12:02 arch kernel: Call Trace:
Apr 05 02:12:02 arch kernel:  <TASK>
Apr 05 02:12:02 arch kernel:  ? __warn+0x81/0x1b0
Apr 05 02:12:02 arch kernel:  ? amdgpu_dm_atomic_commit_tail+0x3682/0x3e10 [amdgpu a0a6ec898d79d235507c715346f6ed73445a8e9e]
Apr 05 02:12:02 arch kernel:  ? report_bug+0x202/0x270
Apr 05 02:12:02 arch kernel:  ? handle_bug+0x3c/0x80
Apr 05 02:12:02 arch kernel:  ? exc_invalid_op+0x19/0xc0
Apr 05 02:12:02 arch kernel:  ? asm_exc_invalid_op+0x1a/0x20
Apr 05 02:12:02 arch kernel:  ? amdgpu_dm_atomic_commit_tail+0x3682/0x3e10 [amdgpu a0a6ec898d79d235507c715346f6ed73445a8e9e]
Apr 05 02:12:02 arch kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Apr 05 02:12:02 arch kernel:  commit_tail+0x94/0x130
Apr 05 02:12:02 arch kernel:  drm_atomic_helper_commit+0x11a/0x140
Apr 05 02:12:02 arch kernel:  drm_atomic_commit+0x9a/0xd0
Apr 05 02:12:02 arch kernel:  ? __pfx___drm_printfn_info+0x10/0x10
Apr 05 02:12:02 arch kernel:  drm_atomic_helper_commit_duplicated_state+0xf2/0x100
Apr 05 02:12:02 arch kernel:  drm_atomic_helper_resume+0xa5/0x160
Apr 05 02:12:02 arch kernel:  dm_resume+0x364/0x950 [amdgpu a0a6ec898d79d235507c715346f6ed73445a8e9e]
Apr 05 02:12:02 arch kernel:  ? smum_send_msg_to_smc_with_parameter+0x93/0x110 [amdgpu a0a6ec898d79d235507c715346f6ed73445a8e9e]
Apr 05 02:12:02 arch kernel:  ? smum_send_msg_to_smc+0x8b/0x100 [amdgpu a0a6ec898d79d235507c715346f6ed73445a8e9e]
Apr 05 02:12:02 arch kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Apr 05 02:12:02 arch kernel:  ? smu7_force_dpm_level+0x229/0x590 [amdgpu a0a6ec898d79d235507c715346f6ed73445a8e9e]
Apr 05 02:12:02 arch kernel:  amdgpu_device_ip_resume_phase2+0x52/0xc0 [amdgpu a0a6ec898d79d235507c715346f6ed73445a8e9e]
Apr 05 02:12:02 arch kernel:  amdgpu_device_resume+0xa0/0x330 [amdgpu a0a6ec898d79d235507c715346f6ed73445a8e9e]
Apr 05 02:12:02 arch kernel:  ? __pfx_pci_pm_runtime_resume+0x10/0x10
Apr 05 02:12:02 arch kernel:  amdgpu_pmops_runtime_resume+0x82/0xf0 [amdgpu a0a6ec898d79d235507c715346f6ed73445a8e9e]
Apr 05 02:12:02 arch kernel:  ? __pfx_pci_pm_runtime_resume+0x10/0x10
Apr 05 02:12:02 arch kernel:  __rpm_callback+0x44/0x170
Apr 05 02:12:02 arch kernel:  ? __pfx_pci_pm_runtime_resume+0x10/0x10
Apr 05 02:12:02 arch kernel:  rpm_resume+0x66b/0x8e0
Apr 05 02:12:02 arch kernel:  __pm_runtime_resume+0x4b/0x80
Apr 05 02:12:02 arch kernel:  amdgpu_drm_ioctl+0x38/0x90 [amdgpu a0a6ec898d79d235507c715346f6ed73445a8e9e]
Apr 05 02:12:02 arch kernel:  __x64_sys_ioctl+0x97/0xd0
Apr 05 02:12:02 arch kernel:  do_syscall_64+0x64/0xe0
Apr 05 02:12:02 arch kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Apr 05 02:12:02 arch kernel:  ? swake_up_one+0x39/0x80
Apr 05 02:12:02 arch kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Apr 05 02:12:02 arch kernel:  ? __rseq_handle_notify_resume+0xa9/0x590
Apr 05 02:12:02 arch kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Apr 05 02:12:02 arch kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Apr 05 02:12:02 arch kernel:  ? task_mm_cid_work+0x1a1/0x220
Apr 05 02:12:02 arch kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Apr 05 02:12:02 arch kernel:  ? exit_to_user_mode_prepare+0x17d/0x1f0
Apr 05 02:12:02 arch kernel:  entry_SYSCALL_64_after_hwframe+0x6e/0x76
Apr 05 02:12:02 arch kernel: RIP: 0033:0x7f1b052874ff
Apr 05 02:12:02 arch kernel: Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 18 48 8b 44 24 18 64 48 2b 04 25 28 00 00
Apr 05 02:12:02 arch kernel: RSP: 002b:00007ffc40af6300 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Apr 05 02:12:02 arch kernel: RAX: ffffffffffffffda RBX: 00007ffc40af63d0 RCX: 00007f1b052874ff
Apr 05 02:12:02 arch kernel: RDX: 00007ffc40af63a0 RSI: 00000000c0106442 RDI: 0000000000000017
Apr 05 02:12:02 arch kernel: RBP: 00007ffc40af63a0 R08: 000000000000fffe R09: 0000000000000000
Apr 05 02:12:02 arch kernel: R10: 00007ffc40af65a0 R11: 0000000000000246 R12: 00000000c0106442
Apr 05 02:12:02 arch kernel: R13: 0000000000000017 R14: 00001d4000fe6b50 R15: 0000000000000039
Apr 05 02:12:02 arch kernel:  </TASK>
Apr 05 02:12:02 arch kernel: ---[ end trace 0000000000000000 ]---
Apr 05 02:12:02 arch kernel: [drm] UVD and UVD ENC initialized successfully.
Apr 05 02:12:02 arch kernel: [drm] VCE initialized successfully.
Apr 05 02:12:32 arch kernel: [drm] PCIE GART of 256M enabled (table at 0x000000F400E80000).
Apr 05 02:12:33 arch kernel: amdgpu 0000:0f:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring comp_1.1.0 test failed (-110)
Apr 05 02:12:33 arch kernel: ------------[ cut here ]------------
Apr 05 02:12:33 arch kernel: WARNING: CPU: 16 PID: 17621 at drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:8057 amdgpu_dm_atomic_commit_tail+0x3682/0x3e10 [amdgpu]
Apr 05 02:12:33 arch kernel: Modules linked in: snd_seq_dummy snd_seq_midi snd_hrtimer snd_seq_midi_event snd_seq xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT xt_tcpudp ip6table_mangle ip6table_nat ip6table_filter ip6_tables iptable_mangle iptable_n>
Apr 05 02:12:33 arch kernel:  snd_timer nf_reject_ipv6 nft_reject snd soundcore usb_storage hfsplus hfs cdrom nft_masq btrfs nft_ct blake2b_generic xor raid6_pq ext4 nft_chain_nat nf_nat nf_conntrack crc16 nf_defrag_ipv6 mbcache nf_defrag_ipv4 jbd2 exfat>
Apr 05 02:12:33 arch kernel: CPU: 16 PID: 17621 Comm: brave Tainted: P        W  OE      6.7.4-zen1-1-zen #1 3d4d8321df48782f53772bf77b31195eea22b03b
Apr 05 02:12:33 arch kernel: Hardware name: Gigabyte Technology Co., Ltd. X570 AORUS MASTER/X570 AORUS MASTER, BIOS F37h 12/25/2023
Apr 05 02:12:33 arch kernel: RIP: 0010:amdgpu_dm_atomic_commit_tail+0x3682/0x3e10 [amdgpu]
Apr 05 02:12:33 arch kernel: Code: 4c 89 9d 60 fe ff ff e8 cc 92 27 dd 4c 8b 9d 60 fe ff ff e9 86 dd ff ff 0f 0b 49 8b 3e e8 b6 5b dd dc 85 c0 0f 84 1c d7 ff ff <0f> 0b e9 15 d7 ff ff 48 8b 85 88 fe ff ff 31 d2 31 f6 48 8d b8 60
Apr 05 02:12:33 arch kernel: RSP: 0018:ffffb2781ed7b8e8 EFLAGS: 00010282
Apr 05 02:12:33 arch kernel: RAX: 00000000ffffffea RBX: ffff9ff1d80cd800 RCX: 0000000000000000
Apr 05 02:12:33 arch kernel: RDX: 0000000000000002 RSI: 0000000000000297 RDI: ffff9ff1d818015c
Apr 05 02:12:33 arch kernel: RBP: ffffb2781ed7bb40 R08: 0000000000000006 R09: 0000000000000000
Apr 05 02:12:33 arch kernel: R10: 0000000000000001 R11: 0000000000000000 R12: ffff9ff59a60cd00
Apr 05 02:12:33 arch kernel: R13: 0000000000000005 R14: ffff9ff792209a00 R15: ffff9ff1d8180178
Apr 05 02:12:33 arch kernel: FS:  00007f1b04183140(0000) GS:ffffa000bee00000(0000) knlGS:0000000000000000
Apr 05 02:12:33 arch kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 05 02:12:33 arch kernel: CR2: 000072ccd844a000 CR3: 00000004d92ce000 CR4: 0000000000f50ef0
Apr 05 02:12:33 arch kernel: PKRU: 55555558
Apr 05 02:12:33 arch kernel: Call Trace:
Apr 05 02:12:33 arch kernel:  <TASK>
Apr 05 02:12:33 arch kernel:  ? __warn+0x81/0x1b0
Apr 05 02:12:33 arch kernel:  ? amdgpu_dm_atomic_commit_tail+0x3682/0x3e10 [amdgpu a0a6ec898d79d235507c715346f6ed73445a8e9e]
Apr 05 02:12:33 arch kernel:  ? report_bug+0x202/0x270
Apr 05 02:12:33 arch kernel:  ? handle_bug+0x3c/0x80
Apr 05 02:12:33 arch kernel:  ? exc_invalid_op+0x19/0xc0
Apr 05 02:12:33 arch kernel:  ? asm_exc_invalid_op+0x1a/0x20
Apr 05 02:12:33 arch kernel:  ? amdgpu_dm_atomic_commit_tail+0x3682/0x3e10 [amdgpu a0a6ec898d79d235507c715346f6ed73445a8e9e]
Apr 05 02:12:33 arch kernel:  ? amdgpu_dm_atomic_commit_tail+0xd96/0x3e10 [amdgpu a0a6ec898d79d235507c715346f6ed73445a8e9e]
Apr 05 02:12:33 arch kernel:  commit_tail+0x94/0x130
Apr 05 02:12:33 arch kernel:  drm_atomic_helper_commit+0x11a/0x140
Apr 05 02:12:33 arch kernel:  drm_atomic_commit+0x9a/0xd0
Apr 05 02:12:33 arch kernel:  ? __pfx___drm_printfn_info+0x10/0x10
Apr 05 02:12:33 arch kernel:  drm_atomic_helper_commit_duplicated_state+0xf2/0x100
Apr 05 02:12:33 arch kernel:  drm_atomic_helper_resume+0xa5/0x160
Apr 05 02:12:33 arch kernel:  dm_resume+0x364/0x950 [amdgpu a0a6ec898d79d235507c715346f6ed73445a8e9e]
Apr 05 02:12:33 arch kernel:  ? smum_send_msg_to_smc_with_parameter+0x93/0x110 [amdgpu a0a6ec898d79d235507c715346f6ed73445a8e9e]
Apr 05 02:12:33 arch kernel:  ? smum_send_msg_to_smc+0x8b/0x100 [amdgpu a0a6ec898d79d235507c715346f6ed73445a8e9e]
Apr 05 02:12:33 arch kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Apr 05 02:12:33 arch kernel:  ? smu7_force_dpm_level+0x229/0x590 [amdgpu a0a6ec898d79d235507c715346f6ed73445a8e9e]
Apr 05 02:12:33 arch kernel:  amdgpu_device_ip_resume_phase2+0x52/0xc0 [amdgpu a0a6ec898d79d235507c715346f6ed73445a8e9e]
Apr 05 02:12:33 arch kernel:  amdgpu_device_resume+0xa0/0x330 [amdgpu a0a6ec898d79d235507c715346f6ed73445a8e9e]
Apr 05 02:12:33 arch kernel:  ? __pfx_pci_pm_runtime_resume+0x10/0x10
Apr 05 02:12:33 arch kernel:  amdgpu_pmops_runtime_resume+0x82/0xf0 [amdgpu a0a6ec898d79d235507c715346f6ed73445a8e9e]
Apr 05 02:12:33 arch kernel:  ? __pfx_pci_pm_runtime_resume+0x10/0x10
Apr 05 02:12:33 arch kernel:  __rpm_callback+0x44/0x170
Apr 05 02:12:33 arch kernel:  ? __pfx_pci_pm_runtime_resume+0x10/0x10
Apr 05 02:12:33 arch kernel:  rpm_resume+0x66b/0x8e0
Apr 05 02:12:33 arch kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Apr 05 02:12:33 arch kernel:  __pm_runtime_resume+0x4b/0x80
Apr 05 02:12:33 arch kernel:  amdgpu_drm_ioctl+0x38/0x90 [amdgpu a0a6ec898d79d235507c715346f6ed73445a8e9e]
Apr 05 02:12:33 arch kernel:  __x64_sys_ioctl+0x97/0xd0
Apr 05 02:12:33 arch kernel:  do_syscall_64+0x64/0xe0
Apr 05 02:12:33 arch kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Apr 05 02:12:33 arch kernel:  ? exit_to_user_mode_prepare+0x132/0x1f0
Apr 05 02:12:33 arch kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Apr 05 02:12:33 arch kernel:  ? syscall_exit_to_user_mode+0x2b/0x40
Apr 05 02:12:33 arch kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Apr 05 02:12:33 arch kernel:  ? do_syscall_64+0x70/0xe0
Apr 05 02:12:33 arch kernel:  ? do_syscall_64+0x70/0xe0
Apr 05 02:12:33 arch kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Apr 05 02:12:33 arch kernel:  ? exc_page_fault+0x7f/0x180
Apr 05 02:12:33 arch kernel:  entry_SYSCALL_64_after_hwframe+0x6e/0x76
Apr 05 02:12:33 arch kernel: RIP: 0033:0x7f1b052874ff
Apr 05 02:12:33 arch kernel: Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 18 48 8b 44 24 18 64 48 2b 04 25 28 00 00
Apr 05 02:12:33 arch kernel: RSP: 002b:00007ffc40af6300 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Apr 05 02:12:33 arch kernel: RAX: ffffffffffffffda RBX: 00007ffc40af63d0 RCX: 00007f1b052874ff
Apr 05 02:12:33 arch kernel: RDX: 00007ffc40af63a0 RSI: 00000000c0106442 RDI: 0000000000000017
Apr 05 02:12:33 arch kernel: RBP: 00007ffc40af63a0 R08: 000000000000fffe R09: 0000000000000000
Apr 05 02:12:33 arch kernel: R10: 00007ffc40af65a0 R11: 0000000000000246 R12: 00000000c0106442
Apr 05 02:12:33 arch kernel: R13: 0000000000000017 R14: 00001d4000fe6b50 R15: 0000000000000039
Apr 05 02:12:33 arch kernel:  </TASK>
Apr 05 02:12:33 arch kernel: ---[ end trace 0000000000000000 ]---
Apr 05 02:12:33 arch kernel: [drm] UVD and UVD ENC initialized successfully.
Apr 05 02:12:33 arch kernel: [drm] VCE initialized successfully.
</snip>
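
For anyone comparing their own journal against the traces above, here is a minimal sketch in Python 3 that pulls the WARNING site and the running kernel release out of saved log text. The function name `summarize_warnings` and both regexes are illustrative assumptions written against the line shapes shown in this post. They are not part of any existing tool and may need adjusting for other kernels.

```python
import re

# Header of a kernel WARNING, e.g.
# "WARNING: CPU: 29 PID: 2197 at <file>:<line> <symbol>+0x.../0x... [module]"
WARNING_RE = re.compile(
    r"WARNING: CPU: \d+ PID: \d+ at (?P<file>\S+):(?P<line>\d+) "
    r"(?P<symbol>[^\s+]+)\+"
)
# Kernel release on the "CPU: ... Comm: ..." line, e.g. "6.8.2-zen2-1-zen #1"
RELEASE_RE = re.compile(r"\s(?P<release>\d+\.\d+\.\d+\S*) #\d+")


def summarize_warnings(journal_text):
    """Return one dict per WARNING block found in the given log text."""
    events = []
    current = None
    for line in journal_text.splitlines():
        header = WARNING_RE.search(line)
        if header:
            # Start a new event at each WARNING header line.
            current = {
                "file": header["file"],
                "line": int(header["line"]),
                "symbol": header["symbol"],
                "release": None,
            }
            events.append(current)
            continue
        # The first "Comm:" line after the header carries the kernel release.
        if current is not None and current["release"] is None and " Comm: " in line:
            release = RELEASE_RE.search(line)
            if release:
                current["release"] = release["release"]
    return events


if __name__ == "__main__":
    import sys

    # Usage (assumed): journalctl -k -b -1 | python this_script.py
    for event in summarize_warnings(sys.stdin.read()):
        print(event["release"], event["symbol"], f'{event["file"]}:{event["line"]}')
```

Grouping the output by release and symbol makes it easier to see whether two reports hit the same warning site. The line number in `amdgpu_dm.c` differs between kernel versions (8160 on 6.8.2 versus 8057 on 6.7.4 in the traces above), so the symbol name is the more stable thing to compare.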

Last edited by Ranguvar (2024-04-13 15:14:55)

#2 2024-04-06 07:13:40

seth
Member
Registered: 2012-09-03
Posts: 59,882

Re: [SOLVED] amdgpu kernel panics beginning with 6.7.1 - 6.7.4

#3 2024-04-13 15:14:29

Ranguvar
Member
Registered: 2008-08-12
Posts: 2,563

Re: [SOLVED] amdgpu kernel panics beginning with 6.7.1 - 6.7.4

Thanks for the idea.

In case anyone else finds this, 6.8.4/6.8.5 resolved it for me.

6.7.0 seemed fine, and I believe 6.7.1/6.7.2 failed to wake from idle as well but did not produce a kernel panic.
