
#1 2024-05-23 15:07:04

clinta
Member
Registered: 2018-04-02
Posts: 10

Kernel 6.9 amdgpu crash with multiple monitors via MST

Kernel 6.8.9 and 6.8.9-zen both work fine. But upon upgrading to 6.9 I get a kernel crash on launching Hyprland.

I'm running on a ThinkPad L15 Gen 1 AMD system in a dock using MST to drive 4 monitors. I have 2 1080p monitors, 1 1920x1200 and 1 4k monitor.

This error does not occur if I turn off all but one 1080p monitor. I haven't thoroughly tried every permutation of monitors being on or off, though.

This is the crash kernel log when I try to run with all the monitors on Linux 6.9.

[drm] Send DSC enable to synaptics
BUG: kernel NULL pointer dereference, address: 0000000000000008
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0 
Oops: 0000 [#1] PREEMPT SMP NOPTI
CPU: 5 PID: 1446 Comm: Hyprland Not tainted 6.9.1-arch1-2 #1 06928436e5a6b4805e171d14d8efa397d7db9ad0
Hardware name: LENOVO 20U7000VUS/20U7000VUS, BIOS R19ET48W (1.32 ) 10/30/2023
RIP: 0010:drm_dp_atomic_find_time_slots+0x5e/0x270 [drm_display_helper]
Code: 01 00 00 48 8b 85 68 05 00 00 48 63 80 88 00 00 00 3b 43 28 0f 8d 34 01 00 00 48 8b 53 30 48 8d 04 80 48 8d 04 c2 48 8b 40 18 <48> 8b 40 08 4d 8d 65 38 8b 88 90 00 00 00 b8 01 00 00 00 d3 e0 41
RSP: 0018:ffffa55143f47418 EFLAGS: 00010293
RAX: 0000000000000000 RBX: ffff91ca2a47fd80 RCX: 0000000000000224
RDX: ffff91ca15bad600 RSI: ffff91c9c55b2800 RDI: ffff91ca2a47fd80
RBP: ffff91c9c1e98000 R08: 0000000000000001 R09: 0000000000000407
R10: 000000000000001b R11: 0000000000000001 R12: 0000000000000000
R13: ffff91ca09fefb40 R14: ffff91c9c55b2800 R15: 0000000000000224
FS:  000078e7ce64ab80(0000) GS:ffff91d830c80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000008 CR3: 0000000156f9a000 CR4: 0000000000350ef0
Call Trace:
 <TASK>
 ? __die_body.cold+0x19/0x27
 ? page_fault_oops+0x15a/0x2b0
 ? exc_page_fault+0x81/0x190
 ? asm_exc_page_fault+0x26/0x30
 ? drm_dp_atomic_find_time_slots+0x5e/0x270 [drm_display_helper e20ce2ee8a03a2dcf51de8dc0b3d681e00835812]
 ? drm_dp_atomic_find_time_slots+0x28/0x270 [drm_display_helper e20ce2ee8a03a2dcf51de8dc0b3d681e00835812]
 compute_mst_dsc_configs_for_link+0x31f/0xb10 [amdgpu f9765449229a4c4ad337d3e542448922d280f459]
 ? dcn21_fast_validate_bw+0x406/0x4b0 [amdgpu f9765449229a4c4ad337d3e542448922d280f459]
 pre_validate_dsc+0x3f2/0x470 [amdgpu f9765449229a4c4ad337d3e542448922d280f459]
 amdgpu_dm_atomic_check+0x8aa/0x14d0 [amdgpu f9765449229a4c4ad337d3e542448922d280f459]
 ? srso_return_thunk+0x5/0x5f
 drm_atomic_check_only+0x5b2/0xa30
 drm_atomic_commit+0x60/0xd0
 ? __pfx___drm_printfn_info+0x10/0x10
 drm_mode_atomic_ioctl+0xa72/0xcb0
 ? srso_return_thunk+0x5/0x5f
 ? __pfx_drm_mode_atomic_ioctl+0x10/0x10
 drm_ioctl_kernel+0xb3/0x100
 drm_ioctl+0x27a/0x4e0
 ? __pfx_drm_mode_atomic_ioctl+0x10/0x10
 amdgpu_drm_ioctl+0x4e/0x90 [amdgpu f9765449229a4c4ad337d3e542448922d280f459]
 __x64_sys_ioctl+0x97/0xd0
 do_syscall_64+0x82/0x160
 ? srso_return_thunk+0x5/0x5f
 ? xas_load+0x41/0x50
 ? srso_return_thunk+0x5/0x5f
 ? filemap_get_entry+0xde/0x140
 ? srso_return_thunk+0x5/0x5f
 ? shmem_get_folio_gfp+0x1bf/0x580
 ? srso_return_thunk+0x5/0x5f
 ? copy_page_from_iter_atomic+0xe6/0x6e0
 ? srso_return_thunk+0x5/0x5f
 ? srso_return_thunk+0x5/0x5f
 ? balance_dirty_pages_ratelimited_flags+0x21/0x380
 ? srso_return_thunk+0x5/0x5f
 ? generic_perform_write+0x14e/0x230
 ? srso_return_thunk+0x5/0x5f
 ? shmem_file_write_iter+0x5e/0x90
 ? srso_return_thunk+0x5/0x5f
 ? vfs_write+0x296/0x460
 ? srso_return_thunk+0x5/0x5f
 ? srso_return_thunk+0x5/0x5f
 ? syscall_exit_to_user_mode+0x75/0x210
 ? srso_return_thunk+0x5/0x5f
 ? do_syscall_64+0x8e/0x160
 ? srso_return_thunk+0x5/0x5f
 ? srso_return_thunk+0x5/0x5f
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x78e7d043a9ed
Code: 04 25 28 00 00 00 48 89 45 c8 31 c0 48 8d 45 10 c7 45 b0 10 00 00 00 48 89 45 b8 48 8d 45 d0 48 89 45 c0 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1a 48 8b 45 c8 64 48 2b 04 25 28 00 00 00
RSP: 002b:00007fff52e266e0 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000607c4cb11990 RCX: 000078e7d043a9ed
RDX: 00007fff52e26780 RSI: 00000000c03864bc RDI: 000000000000000d
RBP: 00007fff52e26730 R08: 0000000000000007 R09: 0000000000000007
R10: 0000000000000003 R11: 0000000000000246 R12: 00000000c03864bc
R13: 000000000000000d R14: 0000607c4c9d13a0 R15: 0000607c4ca15220
 </TASK>
Modules linked in: ccm cmac algif_hash algif_skcipher af_alg cdc_mbim cdc_wdm cdc_ncm cdc_ether usbnet bnep vfat fat snd_sof_amd_acp63 snd_sof_amd_vangogh snd_sof_amd_rembrandt snd_sof_amd_renoir snd_sof_amd_acp snd_sof_pci snd_sof_xtensa_dsp snd_sof snd_sof_utils snd_pci_ps snd_usb_audio snd_amd_sdw_acpi snd_usbmidi_lib iwlmvm snd_ctl_led ledtrig_audio r8152 snd_ump soundwire_amd mii btusb intel_rapl_msr snd_rawmidi soundwire_generic_allocation amd_atl snd_hda_codec_realtek btrtl intel_rapl_common uvcvideo snd_seq_device soundwire_bus snd_hda_codec_generic btintel mac80211 videobuf2_vmalloc snd_hda_scodec_component btbcm snd_hda_codec_hdmi uvc snd_soc_core libarc4 btmtk videobuf2_memops ptp snd_compress ac97_bus pps_core videobuf2_v4l2 snd_pcm_dmaengine snd_hda_intel bluetooth snd_intel_dspcfg snd_rpl_pci_acp6x videodev snd_acp_pci snd_intel_sdw_acpi kvm_amd videobuf2_common ecdh_generic snd_acp_legacy_common mc crc16 ledtrig_netdev joydev snd_hda_codec mousedev snd_pci_acp6x iwlwifi think_lmi(+) kvm
 snd_hda_core r8169 snd_pci_acp5x rapl psmouse wmi_bmof firmware_attributes_class pcspkr acpi_cpufreq ucsi_acpi snd_rn_pci_acp3x snd_hwdep realtek cfg80211 typec_ucsi snd_acp_config sp5100_tco mdio_devres snd_pcm snd_soc_acpi typec ipmi_devintf snd_timer k10temp snd_pci_acp3x libphy i2c_piix4 ipmi_msghandler roles i2c_scmi mac_hid udl i2c_dev crypto_user loop nfnetlink ip_tables x_tables btrfs blake2b_generic libcrc32c crc32c_generic xor raid6_pq dm_crypt cbc encrypted_keys trusted asn1_encoder tee hid_logitech_hidpp hid_logitech_dj crct10dif_pclmul crc32_pclmul crc32c_intel hid_generic polyval_clmulni polyval_generic gf128mul ghash_clmulni_intel sha512_ssse3 usbhid serio_raw sha256_ssse3 atkbd dm_mod sdhci_pci sha1_ssse3 libps2 amdgpu cqhci thinkpad_acpi aesni_intel vivaldi_fmap nvme sdhci platform_profile crypto_simd snd nvme_core cryptd i8042 mmc_core xhci_pci ccp soundcore xhci_pci_renesas nvme_auth serio rfkill video wmi amdxcp i2c_algo_bit drm_ttm_helper ttm drm_exec gpu_sched drm_suballoc_helper
 drm_buddy drm_display_helper cec
CR2: 0000000000000008
---[ end trace 0000000000000000 ]---
RIP: 0010:drm_dp_atomic_find_time_slots+0x5e/0x270 [drm_display_helper]
Code: 01 00 00 48 8b 85 68 05 00 00 48 63 80 88 00 00 00 3b 43 28 0f 8d 34 01 00 00 48 8b 53 30 48 8d 04 80 48 8d 04 c2 48 8b 40 18 <48> 8b 40 08 4d 8d 65 38 8b 88 90 00 00 00 b8 01 00 00 00 d3 e0 41
RSP: 0018:ffffa55143f47418 EFLAGS: 00010293
RAX: 0000000000000000 RBX: ffff91ca2a47fd80 RCX: 0000000000000224
RDX: ffff91ca15bad600 RSI: ffff91c9c55b2800 RDI: ffff91ca2a47fd80
RBP: ffff91c9c1e98000 R08: 0000000000000001 R09: 0000000000000407
R10: 000000000000001b R11: 0000000000000001 R12: 0000000000000000
R13: ffff91ca09fefb40 R14: ffff91c9c55b2800 R15: 0000000000000224
FS:  000078e7ce64ab80(0000) GS:ffff91d830c80000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000008 CR3: 0000000156f9a000 CR4: 0000000000350ef0
note: Hyprland[1446] exited with irqs disabled
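
For anyone reading the trace, here is a minimal sketch (Python, not part of any kernel tooling) of how the key fields line up. The byte in angle brackets on the `Code:` line is where RIP pointed, and `48 8b 40 08` decodes to `mov 0x8(%rax),%rax`. With RAX = 0 that is a read from address 0x8, which matches CR2. The helper name `summarize_oops` and the regular expressions are my own illustration; the sample lines are copied from the log above, with the `Code:` line shortened to the bytes around the marker.

```python
import re

# Lines copied from the oops above (Code: line shortened to the bytes around the marker).
OOPS = """\
BUG: kernel NULL pointer dereference, address: 0000000000000008
RIP: 0010:drm_dp_atomic_find_time_slots+0x5e/0x270 [drm_display_helper]
Code: 48 8d 04 c2 48 8b 40 18 <48> 8b 40 08 4d 8d 65 38
RAX: 0000000000000000 RBX: ffff91ca2a47fd80 RCX: 0000000000000224
"""

def summarize_oops(text):
    """Pull out the faulting symbol, fault address, RAX and faulting bytes."""
    rip = re.search(r"RIP: 0010:(\w+)\+(0x[0-9a-f]+)/(0x[0-9a-f]+) \[(\w+)\]", text)
    addr = re.search(r"address: ([0-9a-f]+)", text)
    rax = re.search(r"RAX: ([0-9a-f]+)", text)
    code = re.search(r"Code: (.*)", text).group(1).split()
    # The byte wrapped in <> is the first byte of the instruction at RIP.
    start = next(i for i, b in enumerate(code) if b.startswith("<"))
    # Assumption: this particular instruction is 4 bytes long; a real decoder
    # works out the length itself.
    faulting = [b.strip("<>") for b in code[start:start + 4]]
    return {
        "function": rip.group(1),
        "offset": int(rip.group(2), 16),
        "size": int(rip.group(3), 16),
        "module": rip.group(4),
        "fault_address": int(addr.group(1), 16),
        "rax": int(rax.group(1), 16),
        "faulting_bytes": " ".join(faulting),
    }

info = summarize_oops(OOPS)
# 48 8b 40 08 is `mov 0x8(%rax),%rax`: a load from RAX + 8.
print(info["function"], hex(info["fault_address"]), info["faulting_bytes"])
```

For a full disassembly, the kernel tree ships `scripts/decodecode`, which takes the oops text on standard input.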


#2 2024-05-23 15:14:07

gromit
Administrator
From: Germany
Registered: 2024-02-10
Posts: 1,334

Re: Kernel 6.9 amdgpu crash with multiple monitors via MST

Have you already looked into bisecting the issue? This looks like a regression:

- https://docs.kernel.org/admin-guide/rep … sions.html
- https://wiki.archlinux.org/title/Kernel … egressions

If you're not comfortable doing that on your own, I can also provide you with prebuilt kernel images :)


#3 2024-05-23 15:22:00

clinta
Member
Registered: 2018-04-02
Posts: 10

Re: Kernel 6.9 amdgpu crash with multiple monitors via MST

I'll give it a shot and see if I can find the commit that introduced this. Thanks for the helpful links.


#4 2024-05-23 15:50:10

gromit
Administrator
From: Germany
Registered: 2024-02-10
Posts: 1,334

Re: Kernel 6.9 amdgpu crash with multiple monitors via MST

Great. If you get stuck anywhere in the process, just ask; I try to keep an eye on this thread :)


#5 2024-05-23 19:40:10

ddimi
Member
Registered: 2024-05-23
Posts: 1

Re: Kernel 6.9 amdgpu crash with multiple monitors via MST

Same problem here with kernels 6.9 and 6.9.1, on a B550 board with a 7900 XTX.
I have to unplug one DP cable and start with one monitor; after login, I can add the second monitor without a problem.

I have two 4K monitors connected through a KVM switch.

A second system on the same KVM switch (X399 with a 6950 XT) has no problems.


#6 2024-05-24 06:41:38

clinta
Member
Registered: 2018-04-02
Posts: 10

Re: Kernel 6.9 amdgpu crash with multiple monitors via MST

Bisect found the issue is in commit 480e035fc4c714fb5536e64ab9db04fedc89e910

A pretty massive commit, so I'm not sure what can be done to narrow it down further.


#7 2024-05-24 10:25:59

fardog
Member
Registered: 2015-03-14
Posts: 1

Re: Kernel 6.9 amdgpu crash with multiple monitors via MST

I've got the same issue here on a ThinkPad P14s Gen 2 AMD and have rolled back to 6.8.9 to avoid it. The trace looks similar to the others, and my setup is similar: 2x 1440p monitors connected over DisplayPort via a USB-C dock, using sway. Thanks for tracking it down to the commit that caused it. With a commit that big I have no idea how to help further, but here's the stack trace in case it helps.

May 22 09:05:55 belka kernel: BUG: kernel NULL pointer dereference, address: 0000000000000008
May 22 09:05:55 belka kernel: #PF: supervisor read access in kernel mode
May 22 09:05:55 belka kernel: #PF: error_code(0x0000) - not-present page
May 22 09:05:55 belka kernel: PGD 0 P4D 0 
May 22 09:05:55 belka kernel: Oops: 0000 [#1] PREEMPT SMP NOPTI
May 22 09:05:55 belka kernel: CPU: 13 PID: 1782 Comm: sway Not tainted 6.9.1-arch1-1 #1 8721656fa781c58301f7268d475f3e6380e2b47c
May 22 09:05:55 belka kernel: Hardware name: LENOVO 21A0003GUS/21A0003GUS, BIOS R1MET55W (1.25 ) 10/30/2023
May 22 09:05:55 belka kernel: RIP: 0010:drm_dp_atomic_find_time_slots+0x5e/0x270 [drm_display_helper]
May 22 09:05:55 belka kernel: Code: 01 00 00 48 8b 85 68 05 00 00 48 63 80 88 00 00 00 3b 43 28 0f 8d 34 01 00 00 48 8b 53 30 48 8d 04 80 48 8d 04 c2 48 8b 40 18 <48> 8b 40 08 4d 8d 65 38 8b 88 90 00 00 00 b8 01 00 00 00 d3 e0 41
May 22 09:05:55 belka kernel: RSP: 0018:ffffa00b03393750 EFLAGS: 00010293
May 22 09:05:55 belka kernel: RAX: 0000000000000000 RBX: ffff88ac0efe6380 RCX: 000000000000037b
May 22 09:05:55 belka kernel: RDX: ffff88ab82ea4000 RSI: ffff88abbdad4800 RDI: ffff88ac0efe6380
May 22 09:05:55 belka kernel: RBP: ffff88ab83fc6000 R08: 000000000000007c R09: ffff88ab84385078
May 22 09:05:55 belka kernel: R10: 7fe2851fffffffff R11: 000000000000037b R12: 0000000000000001
May 22 09:05:55 belka kernel: R13: ffff88abb5c08600 R14: ffff88abbdad4800 R15: 000000000000037b
May 22 09:05:55 belka kernel: FS:  00007087f6edb9c0(0000) GS:ffff88b192080000(0000) knlGS:0000000000000000
May 22 09:05:55 belka kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 22 09:05:55 belka kernel: CR2: 0000000000000008 CR3: 0000000150d20000 CR4: 0000000000f50ef0
May 22 09:05:55 belka kernel: PKRU: 55555554
May 22 09:05:55 belka kernel: Call Trace:
May 22 09:05:55 belka kernel:  <TASK>
May 22 09:05:55 belka kernel:  ? __die_body.cold+0x19/0x27
May 22 09:05:55 belka kernel:  ? page_fault_oops+0x15a/0x2b0
May 22 09:05:55 belka kernel:  ? exc_page_fault+0x81/0x190
May 22 09:05:55 belka kernel:  ? asm_exc_page_fault+0x26/0x30
May 22 09:05:55 belka kernel:  ? drm_dp_atomic_find_time_slots+0x5e/0x270 [drm_display_helper 65f67947d414ae68f393d33bb34483835212ecce]
May 22 09:05:55 belka kernel:  compute_mst_dsc_configs_for_link+0x31f/0xb10 [amdgpu 11785c3085e75bb1d1465c3bd7f7962d53ef457f]
May 22 09:05:55 belka kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
May 22 09:05:55 belka kernel:  ? dcn21_fast_validate_bw+0x406/0x4b0 [amdgpu 11785c3085e75bb1d1465c3bd7f7962d53ef457f]
May 22 09:05:55 belka kernel:  pre_validate_dsc+0x3f2/0x470 [amdgpu 11785c3085e75bb1d1465c3bd7f7962d53ef457f]
May 22 09:05:55 belka kernel:  amdgpu_dm_atomic_check+0x8aa/0x14d0 [amdgpu 11785c3085e75bb1d1465c3bd7f7962d53ef457f]
May 22 09:05:55 belka kernel:  ? internal_get_user_pages_fast+0x735/0x10d0
May 22 09:05:55 belka kernel:  ? __kmalloc_node_track_caller+0x1fa/0x410
May 22 09:05:55 belka kernel:  drm_atomic_check_only+0x5b2/0xa30
May 22 09:05:55 belka kernel:  drm_mode_atomic_ioctl+0x831/0xcb0
May 22 09:05:55 belka kernel:  ? __pfx_drm_mode_atomic_ioctl+0x10/0x10
May 22 09:05:55 belka kernel:  drm_ioctl_kernel+0xb3/0x100
May 22 09:05:55 belka kernel:  drm_ioctl+0x27a/0x4e0
May 22 09:05:55 belka kernel:  ? __pfx_drm_mode_atomic_ioctl+0x10/0x10
May 22 09:05:55 belka kernel:  amdgpu_drm_ioctl+0x4e/0x90 [amdgpu 11785c3085e75bb1d1465c3bd7f7962d53ef457f]
May 22 09:05:55 belka kernel:  __x64_sys_ioctl+0x97/0xd0
May 22 09:05:55 belka kernel:  do_syscall_64+0x82/0x160
May 22 09:05:55 belka kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
May 22 09:05:55 belka kernel:  ? syscall_exit_to_user_mode+0x75/0x210
May 22 09:05:55 belka kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
May 22 09:05:55 belka kernel:  ? do_syscall_64+0x8e/0x160
May 22 09:05:55 belka kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
May 22 09:05:55 belka kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
May 22 09:05:55 belka kernel:  entry_SYSCALL_64_after_hwframe+0x76/0x7e
May 22 09:05:55 belka kernel: RIP: 0033:0x7087f7d8f9ed
May 22 09:05:55 belka kernel: Code: 04 25 28 00 00 00 48 89 45 c8 31 c0 48 8d 45 10 c7 45 b0 10 00 00 00 48 89 45 b8 48 8d 45 d0 48 89 45 c0 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1a 48 8b 45 c8 64 48 2b 04 25 28 00 00 00
May 22 09:05:55 belka kernel: RSP: 002b:00007ffc15e30b80 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
May 22 09:05:55 belka kernel: RAX: ffffffffffffffda RBX: 00005dca1bddc050 RCX: 00007087f7d8f9ed
May 22 09:05:55 belka kernel: RDX: 00007ffc15e30c20 RSI: 00000000c03864bc RDI: 000000000000000b
May 22 09:05:55 belka kernel: RBP: 00007ffc15e30bd0 R08: 0000000000000007 R09: 0000000000000007
May 22 09:05:55 belka kernel: R10: 0000000000000003 R11: 0000000000000246 R12: 00000000c03864bc
May 22 09:05:55 belka kernel: R13: 000000000000000b R14: 00005dca1b4877c0 R15: 00005dca1be27e70
May 22 09:05:55 belka kernel:  </TASK>
May 22 09:05:55 belka kernel: Modules linked in: snd_seq_dummy snd_hrtimer rfcomm snd_seq snd_seq_device xt_MASQUERADE nf_conntrack_netlink xfrm_user xfrm_algo iptable_nat nf_nat br_netfilter bridge stp llc uinput overlay uhid cmac algif_hash algif_skcipher af_alg bnep btusb uvcvideo btrtl btintel videobuf2_vmalloc btbcm uvc btmtk videobuf2_memops videobuf2_v4l2 bluetooth videodev videobuf2_common snd_acp_legacy_mach joydev mousedev mc ecdh_generic snd_acp_mach snd_soc_nau8821 snd_soc_dmic snd_acp3x_rn snd_acp3x_pdm_dma amdgpu snd_sof_amd_acp63 snd_sof_amd_vangogh snd_sof_amd_rembrandt snd_sof_amd_renoir snd_sof_amd_acp snd_sof_pci snd_sof_xtensa_dsp snd_sof snd_sof_utils snd_ctl_led snd_pci_ps ledtrig_audio snd_amd_sdw_acpi soundwire_amd amd_atl snd_hda_codec_realtek intel_rapl_msr rtw89_8852ae soundwire_generic_allocation intel_rapl_common amdxcp rtw89_8852a soundwire_bus snd_hda_codec_generic drm_exec snd_hda_scodec_component rtw89_pci gpu_sched snd_hda_codec_hdmi snd_soc_core drm_buddy snd_compress rtw89_core i2c_algo_bit ac97_bus
May 22 09:05:55 belka kernel:  snd_hda_intel drm_suballoc_helper snd_pcm_dmaengine snd_intel_dspcfg drm_ttm_helper ttm snd_rpl_pci_acp6x kvm_amd snd_intel_sdw_acpi mac80211 snd_acp_pci ledtrig_netdev snd_hda_codec drm_display_helper snd_acp_legacy_common thinkpad_acpi snd_pci_acp6x snd_hda_core libarc4 cec platform_profile kvm snd_pci_acp5x r8169 think_lmi snd_hwdep vfat firmware_attributes_class wmi_bmof fat snd_pcm ucsi_acpi snd_rn_pci_acp3x video realtek cfg80211 rapl psmouse snd_acp_config ip6t_REJECT snd_timer typec_ucsi sp5100_tco mdio_devres snd_soc_acpi nf_reject_ipv6 snd typec rfkill libphy i2c_piix4 k10temp snd_pci_acp3x soundcore roles xt_hl wmi i2c_scmi ip6t_rt mac_hid ipt_REJECT nf_reject_ipv4 xt_LOG nf_log_syslog xt_limit xt_addrtype xt_tcpudp xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ip6table_filter ip6_tables iptable_filter pkcs8_key_parser crypto_user loop nfnetlink ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 hid_generic usbhid dm_crypt cbc encrypted_keys trusted asn1_encoder tee
May 22 09:05:55 belka kernel:  dm_mod crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic gf128mul ghash_clmulni_intel sha512_ssse3 sha256_ssse3 sdhci_pci serio_raw sha1_ssse3 atkbd cqhci aesni_intel nvme libps2 sdhci crypto_simd vivaldi_fmap nvme_core cryptd xhci_pci mmc_core ccp i8042 xhci_pci_renesas nvme_auth serio
May 22 09:05:55 belka kernel: CR2: 0000000000000008
May 22 09:05:55 belka kernel: ---[ end trace 0000000000000000 ]---
May 22 09:05:55 belka kernel: RIP: 0010:drm_dp_atomic_find_time_slots+0x5e/0x270 [drm_display_helper]
May 22 09:05:55 belka kernel: Code: 01 00 00 48 8b 85 68 05 00 00 48 63 80 88 00 00 00 3b 43 28 0f 8d 34 01 00 00 48 8b 53 30 48 8d 04 80 48 8d 04 c2 48 8b 40 18 <48> 8b 40 08 4d 8d 65 38 8b 88 90 00 00 00 b8 01 00 00 00 d3 e0 41
May 22 09:05:55 belka kernel: RSP: 0018:ffffa00b03393750 EFLAGS: 00010293
May 22 09:05:55 belka kernel: RAX: 0000000000000000 RBX: ffff88ac0efe6380 RCX: 000000000000037b
May 22 09:05:55 belka kernel: RDX: ffff88ab82ea4000 RSI: ffff88abbdad4800 RDI: ffff88ac0efe6380
May 22 09:05:55 belka kernel: RBP: ffff88ab83fc6000 R08: 000000000000007c R09: ffff88ab84385078
May 22 09:05:55 belka kernel: R10: 7fe2851fffffffff R11: 000000000000037b R12: 0000000000000001
May 22 09:05:55 belka kernel: R13: ffff88abb5c08600 R14: ffff88abbdad4800 R15: 000000000000037b
May 22 09:05:55 belka kernel: FS:  00007087f6edb9c0(0000) GS:ffff88b192080000(0000) knlGS:0000000000000000
May 22 09:05:55 belka kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
May 22 09:05:55 belka kernel: CR2: 0000000000000008 CR3: 0000000150d20000 CR4: 0000000000f50ef0
May 22 09:05:55 belka kernel: PKRU: 55555554
May 22 09:05:55 belka kernel: note: sway[1782] exited with irqs disabled
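
A quick way to confirm this is the same crash as in the first post is to compare the faulting location and the fault address while ignoring the journald prefix. A small sketch in Python; `oops_signature` is a hypothetical helper of mine, and the input lines are copied from the two logs in this thread.

```python
import re

FIRST_POST = [
    "BUG: kernel NULL pointer dereference, address: 0000000000000008",
    "RIP: 0010:drm_dp_atomic_find_time_slots+0x5e/0x270 [drm_display_helper]",
]
THIS_POST = [
    "May 22 09:05:55 belka kernel: BUG: kernel NULL pointer dereference, address: 0000000000000008",
    "May 22 09:05:55 belka kernel: RIP: 0010:drm_dp_atomic_find_time_slots+0x5e/0x270 [drm_display_helper]",
]

def oops_signature(lines):
    """Return (symbol+offset/size, fault address), ignoring any log prefix."""
    location = address = None
    for line in lines:
        m = re.search(r"RIP: 0010:(\S+)", line)
        if m and location is None:
            # Keep only the first kernel-mode RIP line.
            location = m.group(1)
        m = re.search(r"NULL pointer dereference, address: ([0-9a-f]+)", line)
        if m:
            address = int(m.group(1), 16)
    return location, address

same = oops_signature(FIRST_POST) == oops_signature(THIS_POST)
print("same crash signature:", same)
```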


#8 2024-05-24 11:54:34

gromit
Administrator
From: Germany
Registered: 2024-02-10
Posts: 1,334

Re: Kernel 6.9 amdgpu crash with multiple monitors via MST

clinta wrote:

Bisect found the issue is in commit 480e035fc4c714fb5536e64ab9db04fedc89e910

A pretty massive commit, so I'm not sure what can be done to narrow it down further.


Hmm, it seems like you have landed on a merge commit, which is quite unfortunate as that does not really narrow down the source of the issue.
Are you sure that you took the right turn on every step of the bisection?


#9 2024-05-24 12:41:42

loqs
Member
Registered: 2014-03-06
Posts: 18,633

Re: Kernel 6.9 amdgpu crash with multiple monitors via MST

$ git bisect start
status: waiting for both good and bad commits
$ git bisect good v6.8
status: waiting for bad commit, 1 good commit known
$ git bisect bad v6.9
Bisecting: 7604 revisions left to test after this (roughly 13 steps)
[480e035fc4c714fb5536e64ab9db04fedc89e910] Merge tag 'drm-next-2024-03-13' of https://gitlab.freedesktop.org/drm/kernel
$ git bisect bad
Bisecting: 2870 revisions left to test after this (roughly 12 steps)
[9187210eee7d87eea37b45ea93454a88681894a4] Merge tag 'net-next-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
$ git bisect good
Bisecting: 1417 revisions left to test after this (roughly 11 steps)
[119b225f01e4d3ce974cd3b4d982c76a380c796d] Merge tag 'amd-drm-next-6.9-2024-03-08-1' of https://gitlab.freedesktop.org/agd5f/linux into drm-next

I would suggest rechecking 119b225f01e4d3ce974cd3b4d982c76a380c796d v6.8-rc6-1453-g119b225f01e4.
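
The string `v6.8-rc6-1453-g119b225f01e4` is in `git describe` format: the nearest reachable tag, the number of commits on top of that tag, and `g` followed by the abbreviated commit hash. A small Python sketch that splits it apart (`parse_describe` is an illustrative helper, not a git API):

```python
import re

def parse_describe(name):
    """Split `git describe` output of the form <tag>-<count>-g<abbrev-hash>."""
    m = re.fullmatch(r"(.+)-(\d+)-g([0-9a-f]+)", name)
    if m is None:
        raise ValueError(f"not in <tag>-<count>-g<hash> form: {name!r}")
    return {
        "tag": m.group(1),
        "commits_since_tag": int(m.group(2)),
        "abbrev": m.group(3),
    }

info = parse_describe("v6.8-rc6-1453-g119b225f01e4")
print(info)
```

So the commit to recheck sits 1453 commits on top of the v6.8-rc6 tag.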

Last edited by loqs (2024-05-24 12:42:58)


#10 2024-05-24 13:35:21

clinta
Member
Registered: 2018-04-02
Posts: 10

Re: Kernel 6.9 amdgpu crash with multiple monitors via MST

loqs wrote:
$ git bisect start
status: waiting for both good and bad commits
$ git bisect good v6.8
status: waiting for bad commit, 1 good commit known
$ git bisect bad v6.9
Bisecting: 7604 revisions left to test after this (roughly 13 steps)
[480e035fc4c714fb5536e64ab9db04fedc89e910] Merge tag 'drm-next-2024-03-13' of https://gitlab.freedesktop.org/drm/kernel
$ git bisect bad
Bisecting: 2870 revisions left to test after this (roughly 12 steps)
[9187210eee7d87eea37b45ea93454a88681894a4] Merge tag 'net-next-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
$ git bisect good
Bisecting: 1417 revisions left to test after this (roughly 11 steps)
[119b225f01e4d3ce974cd3b4d982c76a380c796d] Merge tag 'amd-drm-next-6.9-2024-03-08-1' of https://gitlab.freedesktop.org/agd5f/linux into drm-next

I would suggest rechecking 119b225f01e4d3ce974cd3b4d982c76a380c796d v6.8-rc6-1453-g119b225f01e4.

I'm currently up and running on the last good commit bisect found, which is e5e038b7ae9d and well after 119b225.


#11 2024-05-24 14:59:42

loqs
Member
Registered: 2014-03-06
Posts: 18,633

Re: Kernel 6.9 amdgpu crash with multiple monitors via MST

The problem with the bisection result is that it found the merge commit to be bad while all the commits that went into the merge were good (the commits that make up a merge always come before the merge commit). In this case the parents of 480e035fc4c714fb5536e64ab9db04fedc89e910 are e5e038b7ae9da96b93974bf072ca1876899a01a3 `Merge tag 'fs_for_v6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs` and 119b225f01e4d3ce974cd3b4d982c76a380c796d `Merge tag 'amd-drm-next-6.9-2024-03-08-1' of https://gitlab.freedesktop.org/agd5f/linux into drm-next`:

$ git bisect log
git bisect start
# status: waiting for both good and bad commits
# good: [e8f897f4afef0031fe618a8e94127a0934896aba] Linux 6.8
git bisect good e8f897f4afef0031fe618a8e94127a0934896aba
# status: waiting for bad commit, 1 good commit known
# bad: [a38297e3fb012ddfa7ce0321a7e5a8daeb1872b6] Linux 6.9
git bisect bad a38297e3fb012ddfa7ce0321a7e5a8daeb1872b6
# bad: [480e035fc4c714fb5536e64ab9db04fedc89e910] Merge tag 'drm-next-2024-03-13' of https://gitlab.freedesktop.org/drm/kernel
git bisect bad 480e035fc4c714fb5536e64ab9db04fedc89e910
# good: [9187210eee7d87eea37b45ea93454a88681894a4] Merge tag 'net-next-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
git bisect good 9187210eee7d87eea37b45ea93454a88681894a4
# good: [119b225f01e4d3ce974cd3b4d982c76a380c796d] Merge tag 'amd-drm-next-6.9-2024-03-08-1' of https://gitlab.freedesktop.org/agd5f/linux into drm-next
git bisect good 119b225f01e4d3ce974cd3b4d982c76a380c796d
# good: [6cdebf62a159f31351946685b02941c968b96e49] Merge tag 'spi-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
git bisect good 6cdebf62a159f31351946685b02941c968b96e49
# good: [943446795909929f261565cebafb3b56d66cc513] Merge tag 'acpi-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
git bisect good 943446795909929f261565cebafb3b56d66cc513
# good: [ef2d4a00df38dfa79ce08fbd8c03278e2d87126a] xfs: split tracepoint classes for deferred items
git bisect good ef2d4a00df38dfa79ce08fbd8c03278e2d87126a
# good: [279d44ceb8a495d287ec563964f2ed04b0d53b0e] Merge tag '6.9-rc-smb3-client-fixes-part1' of git://git.samba.org/sfrench/cifs-2.6
git bisect good 279d44ceb8a495d287ec563964f2ed04b0d53b0e
# good: [69fc23efc7e5030194ecaf4c108d4c23cfcd1a21] kernel-doc: Add unary operator * to $type_param_ref
git bisect good 69fc23efc7e5030194ecaf4c108d4c23cfcd1a21
# good: [d27f41eed5d64f0f4ca2fcb44f417e7dd9d23e11] MAINTAINERS: add missing git address for ext2 entry
git bisect good d27f41eed5d64f0f4ca2fcb44f417e7dd9d23e11
# good: [1715f710e787493f3631d5890c86c9bdb30a36d8] Merge tag 'fsnotify_for_v6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
git bisect good 1715f710e787493f3631d5890c86c9bdb30a36d8
# good: [0d5fb7720b636578957a1a2409a397eea581be4d] ext2: remove SLAB_MEM_SPREAD flag usage
git bisect good 0d5fb7720b636578957a1a2409a397eea581be4d
# good: [e29dd522c1d1f1d5dc59ab300a77889d80e80995] quota: remove SLAB_MEM_SPREAD flag usage
git bisect good e29dd522c1d1f1d5dc59ab300a77889d80e80995
# good: [e5e038b7ae9da96b93974bf072ca1876899a01a3] Merge tag 'fs_for_v6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
git bisect good e5e038b7ae9da96b93974bf072ca1876899a01a3
# first bad commit: [480e035fc4c714fb5536e64ab9db04fedc89e910] Merge tag 'drm-next-2024-03-13' of https://gitlab.freedesktop.org/drm/kernel
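
To make the reasoning concrete: bisect only considers commits that are reachable from the bad commit but not from any commit marked good. Below is a toy model in Python. The two-parent structure of 480e035 is from the log above; the `fs-work` and `drm-work` nodes are hypothetical stand-ins for the many real commits on each side, and `suspects` is my own helper rather than git's implementation.

```python
# child -> list of parents
PARENTS = {
    "480e035": ["e5e038b", "119b225"],   # merge commit and its two parents
    "e5e038b": ["fs-work"],              # hypothetical stand-in for the fs side
    "119b225": ["drm-work"],             # hypothetical stand-in for the drm side
    "fs-work": ["base"],
    "drm-work": ["base"],
    "base": [],
}

def ancestors(commit):
    """Return commit plus everything reachable from it via parent links."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(PARENTS[c])
    return seen

def suspects(bad, goods):
    """Commits reachable from `bad` but not from any commit marked good."""
    cleared = set()
    for g in goods:
        cleared |= ancestors(g)
    return ancestors(bad) - cleared

print(suspects("480e035", ["e5e038b"]))             # drm side is still suspect
print(suspects("480e035", ["e5e038b", "119b225"]))  # only the merge is left
```

Once both parents are marked good, the merge commit is the only candidate left, which is why the result points at 480e035 and why the verdict on 119b225 is worth rechecking.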

Last edited by loqs (2024-05-24 15:06:06)


#12 2024-05-24 15:05:42

clinta
Member
Registered: 2018-04-02
Posts: 10

Re: Kernel 6.9 amdgpu crash with multiple monitors via MST

I can try a bisect again, just checking between 480e035fc4c714fb5536e64ab9db04fedc89e910 as bad and e5e038b7ae9da96b93974bf072ca1876899a01a3 as good.


#13 2024-05-24 15:13:22

loqs
Member
Registered: 2014-03-06
Posts: 18,633

Re: Kernel 6.9 amdgpu crash with multiple monitors via MST

Do you have the bisection builds saved for easy reuse?  If not you could reuse gromit's https://pkgbuild.com/~gromit/linux-56/l … kg.tar.zst
Edit:
e5e038b7ae9da96b93974bf072ca1876899a01a3 `Merge tag 'fs_for_v6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs` being good is expected. The issue is why 119b225f01e4d3ce974cd3b4d982c76a380c796d `Merge tag 'amd-drm-next-6.9-2024-03-08-1' of https://gitlab.freedesktop.org/agd5f/linux into drm-next` is also good.

Last edited by loqs (2024-05-24 15:17:36)


#14 2024-05-24 15:21:11

clinta
Member
Registered: 2018-04-02
Posts: 10

Re: Kernel 6.9 amdgpu crash with multiple monitors via MST

The builds here don't really have what I need, and my previous builds won't be from the same commits if I bisect between two different endpoints. For example, my first build in this newer bisect is 9ac4beb7578a, which was not tested in the previous bisect.


#15 2024-05-24 15:33:32

loqs
Member
Registered: 2014-03-06
Posts: 18,633

Re: Kernel 6.9 amdgpu crash with multiple monitors via MST

I was worried that bisecting with 480e035fc4c714fb5536e64ab9db04fedc89e910 as bad and e5e038b7ae9da96b93974bf072ca1876899a01a3 as good would not test 119b225f01e4d3ce974cd3b4d982c76a380c796d, but it can reach that merge commit:

$ git bisect start
$ git bisect bad 480e035fc4c714fb5536e64ab9db04fedc89e910
status: waiting for good commit(s), bad commit known
$ git bisect good e5e038b7ae9da96b93974bf072ca1876899a01a3
Bisecting: 742 revisions left to test after this (roughly 10 steps)
[9ac4beb7578a88baa4f7e6a59eeb5be79d7b011a] Merge tag 'drm-misc-next-2024-02-15' of git://anongit.freedesktop.org/drm/drm-misc into drm-next
$ git bisect good
Bisecting: 366 revisions left to test after this (roughly 9 steps)
[0475184905387dc481927f87e4abd63c3d8fa51d] Merge drm/drm-next into drm-misc-next
$ git bisect good
Bisecting: 171 revisions left to test after this (roughly 8 steps)
[ca66211a55b9e582a560b0f341dd9058cab78f39] Merge tag 'drm-msm-next-2024-02-29' of https://gitlab.freedesktop.org/drm/msm into drm-next
$ git bisect good
Bisecting: 85 revisions left to test after this (roughly 7 steps)
[af165fb00a1eb390976f6016fc69df0da0d27fad] Merge tag 'amd-drm-next-6.9-2024-03-01' of https://gitlab.freedesktop.org/agd5f/linux into drm-next
$ git bisect good
Bisecting: 36 revisions left to test after this (roughly 6 steps)
[b9511c6d277c31b13d4f3128eba46f4e0733d734] Merge tag 'drm-msm-next-2024-03-07' of https://gitlab.freedesktop.org/drm/msm into drm-next
$ git bisect good
Bisecting: 18 revisions left to test after this (roughly 4 steps)
[45bbf800c5f933de0002b26a44ff04f569247964] drm/amdkfd: Use SQC when TCP would fail in gfx10.1 context save
$ git bisect good
Bisecting: 9 revisions left to test after this (roughly 3 steps)
[72f4ae0a64b93dee25a5d2fed9d5c0d90eaa0fdb] drm/amdgpu/vpe: add PRED_EXE and COLLAB_SYNC OPCODE
$ git bisect good
Bisecting: 4 revisions left to test after this (roughly 2 steps)
[1e84112e53d220c8b8d62fe1ff35b0d43fdb7bc4] drm/amdgpu: add smu 14.0.1 support
$ git bisect good
Bisecting: 2 revisions left to test after this (roughly 1 step)
[2c79b0bca2bac73b1c31b3a92df8f101c1261b93] drm/amd/pm: wait for completion of the EnableGfxImu message
$ git bisect good
Bisecting: 0 revisions left to test after this (roughly 1 step)
[119b225f01e4d3ce974cd3b4d982c76a380c796d] Merge tag 'amd-drm-next-6.9-2024-03-08-1' of https://gitlab.freedesktop.org/agd5f/linux into drm-next

If you tell git that 119b225f01e4d3ce974cd3b4d982c76a380c796d is good, in addition to 480e035fc4c714fb5536e64ab9db04fedc89e910 being bad and e5e038b7ae9da96b93974bf072ca1876899a01a3 being good, then:

$ git bisect start
status: waiting for both good and bad commits
$ git bisect bad 480e035fc4c714fb5536e64ab9db04fedc89e910
status: waiting for good commit(s), bad commit known
$ git bisect good e5e038b7ae9da96b93974bf072ca1876899a01a3
Bisecting: 742 revisions left to test after this (roughly 10 steps)
[9ac4beb7578a88baa4f7e6a59eeb5be79d7b011a] Merge tag 'drm-misc-next-2024-02-15' of git://anongit.freedesktop.org/drm/drm-misc into drm-next
$ git bisect good 119b225f01e4d3ce974cd3b4d982c76a380c796d
480e035fc4c714fb5536e64ab9db04fedc89e910 is the first bad commit


#16 2024-05-24 15:44:25

clinta
Member
Registered: 2018-04-02
Posts: 10

Re: Kernel 6.9 amdgpu crash with multiple monitors via MST

I must be misunderstanding how git merges or bisect work. Just looking at the commit order, it looks like 119b225f0 is before e5e038b7, so if e5e038b7 is good, I wouldn't expect 119b225f0 to be tested.


#17 2024-05-24 16:24:35

loqs
Member
Registered: 2014-03-06
Posts: 18,633

Re: Kernel 6.9 amdgpu crash with multiple monitors via MST

Perhaps using gitk to visualize the git tree for 480e035fc4c714fb5536e64ab9db04fedc89e910 will help:

$ git checkout 480e035fc4c714fb5536e64ab9db04fedc89e910
HEAD is now at 480e035fc4c7 Merge tag 'drm-next-2024-03-13' of https://gitlab.freedesktop.org/drm/kernel
$ gitk
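
The same point can be sketched without a repository. What matters to bisect is ancestry, not the order in which commits appear by date: 119b225 may be older, but it sits on the other parent line of the merge, so it is not an ancestor of e5e038b and marking e5e038b good says nothing about it. A toy illustration in Python; the parent links follow what was posted earlier in this thread, while the `base` node, the `date` values and the `is_ancestor` helper are hypothetical and only there for the example.

```python
# child -> parents; dates are hypothetical and only illustrate the ordering.
COMMITS = {
    "base":    {"parents": [],                     "date": 1},
    "119b225": {"parents": ["base"],               "date": 2},  # older by date
    "e5e038b": {"parents": ["base"],               "date": 3},  # newer by date
    "480e035": {"parents": ["e5e038b", "119b225"], "date": 4},  # the merge
}

def is_ancestor(maybe_ancestor, commit):
    """True if `maybe_ancestor` is reachable from `commit` via parent links."""
    stack, seen = [commit], set()
    while stack:
        c = stack.pop()
        if c == maybe_ancestor:
            return True
        if c not in seen:
            seen.add(c)
            stack.extend(COMMITS[c]["parents"])
    return False

older_by_date = COMMITS["119b225"]["date"] < COMMITS["e5e038b"]["date"]
# prints: True False True
print(older_by_date, is_ancestor("119b225", "e5e038b"), is_ancestor("119b225", "480e035"))
```

In a real kernel checkout, `git merge-base --is-ancestor <A> <B>` answers the same question through its exit status, and `git rev-list --parents -n 1 <commit>` lists a commit's parents.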


#18 2024-05-25 00:04:14

clinta
Member
Registered: 2018-04-02
Posts: 10

Re: Kernel 6.9 amdgpu crash with multiple monitors via MST

I must have done something wrong during the first bisect; I probably marked something as bad that was actually good. This second bisect landed on a much more logical commit, since it deals with MST. This is the first bad commit: https://github.com/torvalds/linux/commi … 0014d1fa6a

git bisect start
# status: waiting for both good and bad commits
# bad: [a38297e3fb012ddfa7ce0321a7e5a8daeb1872b6] Linux 6.9
git bisect bad a38297e3fb012ddfa7ce0321a7e5a8daeb1872b6
# status: waiting for good commit(s), bad commit known
# good: [119b225f01e4d3ce974cd3b4d982c76a380c796d] Merge tag 'amd-drm-next-6.9-2024-03-08-1' of https:/
/gitlab.freedesktop.org/agd5f/linux into drm-next
git bisect good 119b225f01e4d3ce974cd3b4d982c76a380c796d
# skip: [a3df5d5422b4edfcfe658d5057e7e059571e32ce] Merge tag 'pinctrl-v6.9-1' of git://git.kernel.org/p
ub/scm/linux/kernel/git/linusw/linux-pinctrl
git bisect skip a3df5d5422b4edfcfe658d5057e7e059571e32ce
# skip: [4adee4e1a354bd318205afd3f8defc99299fb47a] MAINTAINERS: Drop redundant hwmon entries
git bisect skip 4adee4e1a354bd318205afd3f8defc99299fb47a
# skip: [a4ec240f6b7c21cf846d10017c3ce423a0eae92c] drm/prime: Unbreak virtgpu dma-buf export
git bisect skip a4ec240f6b7c21cf846d10017c3ce423a0eae92c
# skip: [5f813b0447feef3a4883b66e600c7317a4d7d76b] wifi: ath10k: correctly document enum wmi_tlv_tx_pau
se_id
git bisect skip 5f813b0447feef3a4883b66e600c7317a4d7d76b
# skip: [4b0bf9a0127029054c2fa18ba5b3f3ddc45f54ed] riscv: compat_vdso: install compat_vdso.so.dbg to /lib/modules/*/vdso/
git bisect skip 4b0bf9a0127029054c2fa18ba5b3f3ddc45f54ed
# skip: [1c4025d4ea0cabe05b2425889eed9298c713c771] ALSA: seq: oss: Use automatic cleanup of kfree()
git bisect skip 1c4025d4ea0cabe05b2425889eed9298c713c771
# skip: [caabd859c41b50a571cfdf7747de9f245c5d531b] tcp: Add skb addr and sock addr to arguments of tracepoint tcp_probe.
git bisect skip caabd859c41b50a571cfdf7747de9f245c5d531b
# skip: [66b53cb790e794b180cbd4d6bffa34dadbc7ab3d] iio: pressure: hsc030pa: use signed type to hold div_64() result
git bisect skip 66b53cb790e794b180cbd4d6bffa34dadbc7ab3d
# skip: [5f20e6ab1f65aaaaae248e6946d5cb6d039e7de8] Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
git bisect skip 5f20e6ab1f65aaaaae248e6946d5cb6d039e7de8
# bad: [775a0eca3357d79311c0225458f8fe90791a8857] Merge tag 'x86_urgent_for_v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect bad 775a0eca3357d79311c0225458f8fe90791a8857
# bad: [cf87f46fd34d6c19283d9625a7822f20d90b64a4] Merge tag 'drm-fixes-2024-05-11' of https://gitlab.freedesktop.org/drm/kernel
git bisect bad cf87f46fd34d6c19283d9625a7822f20d90b64a4
# good: [de120e1d692d73c7eefa3278837b1eb68f90728a] KVM: x86/pmu: Set enable bits for GP counters in PERF_GLOBAL_CTRL at "RESET"
git bisect good de120e1d692d73c7eefa3278837b1eb68f90728a
# good: [e33c4963bf536900f917fb65a687724d5539bc21] Merge tag 'nfsd-6.9-5' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
git bisect good e33c4963bf536900f917fb65a687724d5539bc21
# good: [545c494465d24b10a4370545ba213c0916f70b95] Merge tag 'net-6.9-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
git bisect good 545c494465d24b10a4370545ba213c0916f70b95
# good: [d099637d074b9d8170b06365f575f6cf03d614f5] Merge tag 'x86-urgent-2024-05-05' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect good d099637d074b9d8170b06365f575f6cf03d614f5
# good: [8c3b7565f81e030ef448378acd1b35dabb493e3b] Merge tag 'net-6.9-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
git bisect good 8c3b7565f81e030ef448378acd1b35dabb493e3b
# good: [cfb4be1a61200fbbd29f2699b11899789855bbe4] Merge tag 'gpio-fixes-for-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux
git bisect good cfb4be1a61200fbbd29f2699b11899789855bbe4
# bad: [b61821bb32c5577272408e1b05e6a0879a64257f] Merge tag 'drm-misc-fixes-2024-05-10' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes
git bisect bad b61821bb32c5577272408e1b05e6a0879a64257f
# bad: [8d2c930735f850e5be6860aeb39b27ac73ca192f] drm/amdgpu: Fix comparison in amdgpu_res_cpu_visible
git bisect bad 8d2c930735f850e5be6860aeb39b27ac73ca192f
# bad: [cf37a5318dd68aa0eb909e210aebd219bc0ff64a] drm/amd/display: MST DSC check for older devices
git bisect bad cf37a5318dd68aa0eb909e210aebd219bc0ff64a
# bad: [3f0b5af17575c95457538335750c630014d1fa6a] drm/amd/display: Fix DSC-re-computing
git bisect bad 3f0b5af17575c95457538335750c630014d1fa6a
# good: [284f141f5ce5f416c336e1539eb3a6d74c51fe6e] drm/amd/display: Enable urgent latency adjustments for DCN35
git bisect good 284f141f5ce5f416c336e1539eb3a6d74c51fe6e
# first bad commit: [3f0b5af17575c95457538335750c630014d1fa6a] drm/amd/display: Fix DSC-re-computing
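
For anyone who wants to see how a log like the one above is produced, here is a minimal sketch on a throwaway repository rather than the kernel tree. The file name "bug", the ten toy commits and the test command are all made up for illustration; on a real kernel bisect each step is a build and a reboot, not a one-line test.

```shell
#!/bin/sh
# Toy bisect: ten linear commits, the fifth introduces a "regression".
set -eu

repo=$(mktemp -d)
log=$(mktemp)
cd "$repo"
git init -q
git config user.email demo@example.invalid
git config user.name demo

i=1
while [ "$i" -le 10 ]; do
    echo "$i" > counter.txt
    if [ "$i" -eq 5 ]; then
        # Hypothetical regression marker: the file "bug" appears in commit 5.
        echo regression > bug
    fi
    git add -A
    git commit -q -m "commit $i"
    i=$((i + 1))
done

# bad = newest commit (HEAD), good = oldest commit (HEAD~9).
git bisect start HEAD HEAD~9 >/dev/null

# The test command exits 0 for a good commit and 1 for a bad one.
git bisect run sh -c 'test ! -e bug' >/dev/null

# When bisect finishes, refs/bisect/bad points at the first bad commit.
subject=$(git log -1 --format=%s refs/bisect/bad)

# Save the log before resetting; reset removes the bisect state.
git bisect log > "$log"
git bisect reset >/dev/null 2>&1

echo "first bad commit: $subject"
```

A saved log can later be fed back with git bisect replay, which is handy after a mistaken good or bad mark. With git bisect run, an exit status of 125 from the test command skips the commit, which is the scripted counterpart of the git bisect skip lines in the log above.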

#19 2024-05-25 00:14:23

gromit
Administrator
From: Germany
Registered: 2024-02-10
Posts: 1,334

Re: Kernel 6.9 amdgpu crash with multiple monitors via MST

Yeah, that looks much more likely to be the culprit of your issue. Good job on the bisection! :)

#20 2024-05-25 16:15:08

loqs
Member
Registered: 2014-03-06
Posts: 18,633

Re: Kernel 6.9 amdgpu crash with multiple monitors via MST

#21 2024-05-25 17:35:18

clinta
Member
Registered: 2018-04-02
Posts: 10

Re: Kernel 6.9 amdgpu crash with multiple monitors via MST

#22 2024-05-26 04:14:04

clinta
Member
Registered: 2018-04-02
Posts: 10

Re: Kernel 6.9 amdgpu crash with multiple monitors via MST

ddimi wrote:

Same problem here with kernel 6.9 and 6.9.1.
B550 with 7900xtx.
I have to unplug one DP and start with 1 monitor;
after login, I can add the second monitor without a problem.

I have 2 4K monitors connected with a KVM switch.

A second system on the same KVM switch (X399 with 6950 xt) has no problems.

If you are having the same issue, your comments on this bug report would be appreciated, along with details about your exact hardware.

https://gitlab.freedesktop.org/drm/amd/-/issues/3405
