
#1 2024-11-25 03:59:15

Hacksign
Member
Registered: 2012-07-30
Posts: 136

AMD GPU freezes

Hi there,

I've seen two threads about amdgpu problems, but neither seems to match my scenario.

I have 3 monitors: the laptop's built-in display and 2 external monitors, a Lenovo and a Xiaomi.

The layout is roughly as follows (from /etc/X11/xorg.conf.d/monitor.conf). BTW, I use NVIDIA offload:

  Section "Device"  
      Identifier "AMD-GPU-DEVICE"
      Driver "amdgpu"
      BusId "PCI:64:0:0"
      Option "Monitor-eDP-1-0" "Laptop"
      Option "Monitor-DisplayPort-1-0" "Lenovo"
  EndSection        
  Section "Device"  
      Identifier "NVIDIA-GPU-DEVICE"
      Driver "nvidia"
      BusId "PCI:1:0:0"
      Option "PrimaryGPU" "1"
      Option "AllowEmptyInitialConfiguration"
      Option "Monitor-HDMI-0" "Xiaomi"
  EndSection        

  Section "Screen"
      Identifier "AMD-GPU-SCREEN"
      Device "AMD-GPU-DEVICE"
      Monitor "Laptop"
      Monitor "Lenovo"
          SubSection "Display"
              Depth       24
              Modes       "2560x1440" "1920x1080"
          EndSubSection
  EndSection
  Section "Screen"
      Identifier "NVIDIA-GPU-SCREEN"
      Device "NVIDIA-GPU-DEVICE"
      DefaultDepth    24
      Monitor "Xiaomi"
          SubSection "Display"
              Depth       24
              Modes       "3440x1440" "2560x1080" "1920x1080"
          EndSubSection
  EndSection

And my system info:

>> pacman -Q|grep -P 'linux|nvidia'
archlinux-keyring 20241015-1
archlinuxcn-keyring 20240531-2
lib32-util-linux 2.40.2-1
linux 6.12.1.arch1-1
linux-api-headers 6.10-1
linux-firmware 20241111.b5885ec5-1
linux-firmware-whence 20241111.b5885ec5-1
linux-headers 6.12.1.arch1-1
nvidia 565.57.01-8
nvidia-utils 565.57.01-2
opencl-nvidia 565.57.01-2
util-linux 2.40.2-1
util-linux-libs 2.40.2-1

The problem is that the monitors driven by amdgpu freeze about 40 seconds after I log in to my desktop environment, and this happens every time.

The dmesg log is:

[    9.002013] amdgpu 0000:64:00.0: [drm] REG_WAIT timeout 1us * 100 tries - dcn31_program_compbuf_size line:142
[    9.002077] ------------[ cut here ]------------
[    9.002078] WARNING: CPU: 0 PID: 832 at drivers/gpu/drm/amd/amdgpu/../display/dc/hubbub/dcn31/dcn31_hubbub.c:151 dcn31_program_compbuf_size+0xd1/0x230 [amdgpu]
[    9.002308] Modules linked in: qrtr_mhi cmac algif_hash algif_skcipher af_alg bnep qrtr ath11k_pci ath11k qmi_helpers vfat fat snd_ctl_led mac80211 btusb btrtl btintel libarc4 btbcm btmtk cfg80211 amd_atl intel_rapl_msr bluetooth intel_rapl_common snd_soc_dmic snd_soc_acp6x_mach snd_acp6x_pdm_dma snd_sof_amd_acp70 snd_sof_amd_acp63 snd_soc_acpi_amd_match snd_sof_amd_vangogh cdc_mbim snd_sof_amd_rembrandt cdc_wdm snd_sof_amd_renoir snd_sof_amd_acp snd_sof_pci snd_sof_xtensa_dsp vboxnetflt(OE) vboxnetadp(OE) snd_sof snd_sof_utils snd_pci_ps snd_amd_sdw_acpi soundwire_amd kvm_amd snd_hda_codec_realtek vboxdrv(OE) soundwire_generic_allocation nvidia_drm(POE) soundwire_bus snd_hda_codec_generic nvidia_uvm(POE) snd_hda_scodec_component kvm nvidia_modeset(POE) snd_hda_codec_hdmi snd_soc_core crct10dif_pclmul snd_hda_intel snd_compress crc32_pclmul ac97_bus snd_intel_dspcfg polyval_clmulni snd_intel_sdw_acpi snd_pcm_dmaengine uvcvideo polyval_generic ghash_clmulni_intel snd_rpl_pci_acp6x videobuf2_vmalloc snd_hda_codec
[    9.002364]  sha512_ssse3 snd_acp_pci uvc sha256_ssse3 videobuf2_memops snd_acp_legacy_common snd_hda_core sha1_ssse3 videobuf2_v4l2 snd_pci_acp6x aesni_intel snd_hwdep videobuf2_common snd_pci_acp5x cdc_ncm gf128mul snd_pcm snd_rn_pci_acp3x cdc_ether crypto_simd videodev snd_timer usbnet sp5100_tco cryptd snd_acp_config hid_multitouch snd i2c_piix4 snd_soc_acpi mii razermouse(OE) joydev mousedev mc razerkbd(OE) rfkill rapl wmi_bmof pcspkr thunderbolt mhi k10temp soundcore snd_pci_acp3x i2c_smbus ccp i2c_hid_acpi i2c_hid amd_pmc serio acpi_tad mac_hid nvidia(POE) crypto_user loop dm_mod nfnetlink ip_tables x_tables ext4 crc32c_generic mbcache jbd2 hid_generic usbhid amdgpu amdxcp i2c_algo_bit drm_ttm_helper ttm drm_exec gpu_sched drm_suballoc_helper drm_buddy nvme drm_display_helper nvme_core crc32c_intel cec video crc16 nvme_auth wmi
[    9.002427] CPU: 0 UID: 0 PID: 832 Comm: Xorg Tainted: P           OE      6.12.1-arch1-1 #1 33f4a68ee85c59cb5d6edb747af0349869779b24
[    9.002431] Tainted: [P]=PROPRIETARY_MODULE, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
[    9.002433] Hardware name: Razer Blade 14 (2022) - RZ09-0427/PI480, BIOS 3.06 01/15/2023
[    9.002434] RIP: 0010:dcn31_program_compbuf_size+0xd1/0x230 [amdgpu]
[    9.002637] Code: 00 48 8b 43 28 8b 88 b0 01 00 00 48 8b 43 20 0f b6 50 6c 48 8b 43 18 8b b0 14 01 00 00 e8 07 45 0e 00 85 c0 0f 85 33 01 00 00 <0f> 0b 48 8b 44 24 08 65 48 2b 04 25 28 00 00 00 0f 85 35 01 00 00
[    9.002638] RSP: 0018:ffffb559c370f430 EFLAGS: 00010202
[    9.002641] RAX: 0000000000000001 RBX: ffff8e1b934ad400 RCX: 000000000000001f
[    9.002642] RDX: 0000000000000000 RSI: 000000000000398b RDI: ffff8e1b92f80000
[    9.002643] RBP: 0000000000000004 R08: ffffb559c370f434 R09: 0000000000000019
[    9.002644] R10: ffffffffb10b54a8 R11: 0000000000000003 R12: ffff8e1bc07c0000
[    9.002646] R13: ffff8e1b94000000 R14: ffff8e1b934ad400 R15: 0000000000000001
[    9.002647] FS:  0000719f861b39c0(0000) GS:ffff8e1eae200000(0000) knlGS:0000000000000000
[    9.002649] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    9.002650] CR2: 00007a441b116168 CR3: 0000000132c44000 CR4: 0000000000f50ef0
[    9.002651] PKRU: 55555554
[    9.002653] Call Trace:
[    9.002655]  <TASK>
[    9.002657]  ? dcn31_program_compbuf_size+0xd1/0x230 [amdgpu 84e88e0534dc2928d32f8b075d0992f565877334]
[    9.002854]  ? __warn.cold+0x93/0xf6
[    9.002857]  ? dcn31_program_compbuf_size+0xd1/0x230 [amdgpu 84e88e0534dc2928d32f8b075d0992f565877334]
[    9.003055]  ? report_bug+0xff/0x140
[    9.003058]  ? handle_bug+0x58/0x90
[    9.003061]  ? exc_invalid_op+0x17/0x70
[    9.003063]  ? asm_exc_invalid_op+0x1a/0x20
[    9.003067]  ? dcn31_program_compbuf_size+0xd1/0x230 [amdgpu 84e88e0534dc2928d32f8b075d0992f565877334]
[    9.003263]  dcn20_optimize_bandwidth+0xe7/0x220 [amdgpu 84e88e0534dc2928d32f8b075d0992f565877334]
[    9.003476]  dc_commit_state_no_check+0xc5f/0xeb0 [amdgpu 84e88e0534dc2928d32f8b075d0992f565877334]
[    9.003659]  dc_commit_streams+0x31f/0x420 [amdgpu 84e88e0534dc2928d32f8b075d0992f565877334]
[    9.003853]  amdgpu_dm_atomic_commit_tail+0x75f/0x3c30 [amdgpu 84e88e0534dc2928d32f8b075d0992f565877334]
[    9.004057]  ? srso_alias_return_thunk+0x5/0xfbef5
[    9.004061]  ? srso_alias_return_thunk+0x5/0xfbef5
[    9.004065]  ? srso_alias_return_thunk+0x5/0xfbef5
[    9.004067]  ? amdgpu_dm_atomic_check+0x1493/0x1700 [amdgpu 84e88e0534dc2928d32f8b075d0992f565877334]
[    9.004268]  ? xas_load+0xd/0xd0
[    9.004272]  ? srso_alias_return_thunk+0x5/0xfbef5
[    9.004274]  ? wait_for_completion_timeout+0x130/0x180
[    9.004279]  commit_tail+0x94/0x130
[    9.004282]  drm_atomic_helper_commit+0x11a/0x140
[    9.004285]  drm_atomic_commit+0xa9/0xe0
[    9.004288]  ? __pfx___drm_printfn_info+0x10/0x10
[    9.004292]  drm_atomic_helper_set_config+0x74/0xb0
[    9.004294]  ? __pfx_drm_mode_setcrtc+0x10/0x10
[    9.004297]  drm_mode_setcrtc+0x46f/0x8a0
[    9.004299]  ? srso_alias_return_thunk+0x5/0xfbef5
[    9.004302]  ? __pte_offset_map+0x1b/0x180
[    9.004304]  ? srso_alias_return_thunk+0x5/0xfbef5
[    9.004308]  ? __pfx_drm_mode_setcrtc+0x10/0x10
[    9.004311]  drm_ioctl_kernel+0xb0/0x100
[    9.004314]  drm_ioctl+0x277/0x4d0
[    9.004316]  ? __pfx_drm_mode_setcrtc+0x10/0x10
[    9.004321]  amdgpu_drm_ioctl+0x4b/0x80 [amdgpu 84e88e0534dc2928d32f8b075d0992f565877334]
[    9.004454]  __x64_sys_ioctl+0x94/0xd0
[    9.004458]  do_syscall_64+0x82/0x190
[    9.004462]  ? srso_alias_return_thunk+0x5/0xfbef5
[    9.004465]  ? __x64_sys_epoll_ctl+0x73/0xb0
[    9.004467]  ? srso_alias_return_thunk+0x5/0xfbef5
[    9.004469]  ? syscall_exit_to_user_mode+0x37/0x1c0
[    9.004472]  ? srso_alias_return_thunk+0x5/0xfbef5
[    9.004474]  ? do_syscall_64+0x8e/0x190
[    9.004476]  ? srso_alias_return_thunk+0x5/0xfbef5
[    9.004478]  ? srso_alias_return_thunk+0x5/0xfbef5
[    9.004480]  ? __sys_setsockopt+0xd2/0xe0
[    9.004483]  ? srso_alias_return_thunk+0x5/0xfbef5
[    9.004486]  ? syscall_exit_to_user_mode+0x37/0x1c0
[    9.004488]  ? srso_alias_return_thunk+0x5/0xfbef5
[    9.004490]  ? do_syscall_64+0x8e/0x190
[    9.004492]  ? srso_alias_return_thunk+0x5/0xfbef5
[    9.004494]  ? syscall_exit_to_user_mode+0x37/0x1c0
[    9.004497]  ? srso_alias_return_thunk+0x5/0xfbef5
[    9.004499]  ? do_syscall_64+0x8e/0x190
[    9.004501]  ? srso_alias_return_thunk+0x5/0xfbef5
[    9.004503]  ? syscall_exit_to_user_mode+0x37/0x1c0
[    9.004505]  ? srso_alias_return_thunk+0x5/0xfbef5
[    9.004507]  ? do_syscall_64+0x8e/0x190
[    9.004510]  ? srso_alias_return_thunk+0x5/0xfbef5
[    9.004512]  ? srso_alias_return_thunk+0x5/0xfbef5
[    9.004514]  ? syscall_exit_to_user_mode+0x37/0x1c0
[    9.004516]  ? srso_alias_return_thunk+0x5/0xfbef5
[    9.004518]  ? do_syscall_64+0x8e/0x190
[    9.004520]  ? srso_alias_return_thunk+0x5/0xfbef5
[    9.004522]  ? syscall_exit_to_user_mode+0x37/0x1c0
[    9.004525]  ? srso_alias_return_thunk+0x5/0xfbef5
[    9.004527]  ? do_syscall_64+0x8e/0x190
[    9.004529]  ? srso_alias_return_thunk+0x5/0xfbef5
[    9.004531]  ? syscall_exit_to_user_mode+0x37/0x1c0
[    9.004533]  ? srso_alias_return_thunk+0x5/0xfbef5
[    9.004535]  ? do_syscall_64+0x8e/0x190
[    9.004537]  ? srso_alias_return_thunk+0x5/0xfbef5
[    9.004539]  ? do_syscall_64+0x8e/0x190
[    9.004541]  ? srso_alias_return_thunk+0x5/0xfbef5
[    9.004543]  ? arch_exit_to_user_mode_prepare.isra.0+0x79/0x90
[    9.004546]  ? srso_alias_return_thunk+0x5/0xfbef5
[    9.004549]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[    9.004551] RIP: 0033:0x719f86964ced
[    9.004571] Code: 04 25 28 00 00 00 48 89 45 c8 31 c0 48 8d 45 10 c7 45 b0 10 00 00 00 48 89 45 b8 48 8d 45 d0 48 89 45 c0 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1a 48 8b 45 c8 64 48 2b 04 25 28 00 00 00
[    9.004572] RSP: 002b:00007ffcd8075e90 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[    9.004574] RAX: ffffffffffffffda RBX: 00006130c5bc4d90 RCX: 0000719f86964ced
[    9.004576] RDX: 00007ffcd8075f20 RSI: 00000000c06864a2 RDI: 0000000000000011
[    9.004577] RBP: 00007ffcd8075ee0 R08: 0000000000000000 R09: 0000000000000000
[    9.004578] R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffcd8075f20
[    9.004579] R13: 00000000c06864a2 R14: 0000000000000011 R15: 0000000000000000
[    9.004583]  </TASK>
[    9.004584] ---[ end trace 0000000000000000 ]---
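In case anyone wants to check their own log for the same warning, here is a minimal Python sketch that picks the REG_WAIT timeout lines out of dmesg text and reports the PCI address, function name and line number. The regex is only fitted to the one line shown above, so it is an assumption that other kernels print the same layout; `find_reg_wait_timeouts` and `SAMPLE` are names made up for this example.

```python
import re

# Pattern fitted to the single REG_WAIT line in the log above (an assumption:
# other kernel versions may format this message differently).
REG_WAIT_RE = re.compile(
    r"\[\s*(?P<time>\d+\.\d+)\]\s+amdgpu\s+(?P<pci>[0-9a-f:.]+):\s+\[drm\]\s+"
    r"REG_WAIT timeout\s+\d+us \* \d+ tries - "
    r"(?P<func>\w+) line:(?P<line>\d+)"
)


def find_reg_wait_timeouts(log_text):
    """Return one dict per REG_WAIT timeout line found in dmesg-style text."""
    hits = []
    for raw in log_text.splitlines():
        match = REG_WAIT_RE.search(raw)
        if match:
            hits.append({
                "time": float(match.group("time")),  # seconds since boot
                "pci": match.group("pci"),           # PCI address of the GPU
                "func": match.group("func"),         # driver function that timed out
                "line": int(match.group("line")),    # source line reported by the driver
            })
    return hits


# The first two lines of the log above, used as sample input.
SAMPLE = (
    "[    9.002013] amdgpu 0000:64:00.0: [drm] REG_WAIT timeout 1us * 100 tries"
    " - dcn31_program_compbuf_size line:142\n"
    "[    9.002077] ------------[ cut here ]------------\n"
)

if __name__ == "__main__":
    for hit in find_reg_wait_timeouts(SAMPLE):
        print(hit)
```

To run it against a real log, replace `SAMPLE` with the text saved from `dmesg`.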

The monitor driven by nvidia (the Xiaomi in my case) keeps working.

Is this a kernel bug? How can I fix it?
