#1 2022-08-15 20:36:31

GeorgeJP
Member
From: Czech Republic
Registered: 2020-01-28
Posts: 143

[SOLVED] Amdgpu problems after kernel update 5.18.16->5.19.1

Hi,

I have been getting the following error in the journal since updating the kernel from 5.18.16 to 5.19.1:

srp 15 22:12:52 ryzen kernel: ------------[ cut here ]------------
srp 15 22:12:52 ryzen kernel: WARNING: CPU: 2 PID: 216 at drivers/gpu/drm/amd/amdgpu/../display/dc/clk_mgr/dcn21/rn_clk_mgr_vbios_smu.c:98 rn_vbios_smu_send_msg_with_param+0xf1/0x100 [amdgpu]
srp 15 22:12:52 ryzen kernel: Modules linked in: amdgpu(+) drm_ttm_helper ttm gpu_sched drm_display_helper cec
srp 15 22:12:52 ryzen kernel: CPU: 2 PID: 216 Comm: modprobe Not tainted 5.19.1-arch2-1 #1 e053941816231cdb69988a866d07465c3100e80c
srp 15 22:12:52 ryzen kernel: Hardware name: ASUS System Product Name/TUF GAMING B550M-PLUS, BIOS 2803 04/27/2022
srp 15 22:12:52 ryzen kernel: RIP: 0010:rn_vbios_smu_send_msg_with_param+0xf1/0x100 [amdgpu]
srp 15 22:12:52 ryzen kernel: Code: f8 01 75 1b 48 8b 7d 00 5b be 93 62 01 00 48 c7 c2 a0 f6 a5 c0 5d 41 5c 41 5d e9 aa e1 f4 ff 3d fe 00 00 00 74 de 0f 0b eb da <0f> 0b e9 58 ff ff ff 0f 1f 84 00 00 00 00 00 66 0f 1f 00 0f 1f 44
srp 15 22:12:52 ryzen kernel: RSP: 0018:ffffbba7c1a136a8 EFLAGS: 00010202
srp 15 22:12:52 ryzen kernel: RAX: 00000000000000fe RBX: 0000000000030d41 RCX: ffffffffc0cbe118
srp 15 22:12:52 ryzen kernel: RDX: 0000000000000000 RSI: 000000000001629b RDI: ffff9eeb4a320000
srp 15 22:12:52 ryzen kernel: RBP: ffff9eeb4a340600 R08: ffff9eeb4a4d9000 R09: 0000000000000f9f
srp 15 22:12:52 ryzen kernel: R10: 000000000000001a R11: 0036ee8000000000 R12: 000000000000000d
srp 15 22:12:52 ryzen kernel: R13: 0000000000000001 R14: 0000000000000190 R15: ffff9eeb4a340600
srp 15 22:12:52 ryzen kernel: FS:  00007fea84184740(0000) GS:ffff9ef151c80000(0000) knlGS:0000000000000000
srp 15 22:12:52 ryzen kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
srp 15 22:12:52 ryzen kernel: CR2: 00007fea84264160 CR3: 0000000103ddc000 CR4: 0000000000750ee0
srp 15 22:12:52 ryzen kernel: PKRU: 55555554
srp 15 22:12:52 ryzen kernel: Call Trace:
srp 15 22:12:52 ryzen kernel:  <TASK>
srp 15 22:12:52 ryzen kernel:  rn_clk_mgr_construct+0x154/0x620 [amdgpu b5e4d5032e0a8061c2414795c43a504984686aac]
srp 15 22:12:52 ryzen kernel:  dc_clk_mgr_create+0x42f/0x5d0 [amdgpu b5e4d5032e0a8061c2414795c43a504984686aac]
srp 15 22:12:52 ryzen kernel:  dc_create+0x23c/0x5b0 [amdgpu b5e4d5032e0a8061c2414795c43a504984686aac]
srp 15 22:12:52 ryzen kernel:  amdgpu_dm_init.isra.0+0x240/0x360 [amdgpu b5e4d5032e0a8061c2414795c43a504984686aac]
srp 15 22:12:52 ryzen kernel:  ? dev_vprintk_emit+0x177/0x19f
srp 15 22:12:52 ryzen kernel:  dm_hw_init+0x12/0x30 [amdgpu b5e4d5032e0a8061c2414795c43a504984686aac]
srp 15 22:12:52 ryzen kernel:  amdgpu_device_init.cold+0x17ba/0x1d5a [amdgpu b5e4d5032e0a8061c2414795c43a504984686aac]
srp 15 22:12:52 ryzen kernel:  amdgpu_driver_load_kms+0x19/0x130 [amdgpu b5e4d5032e0a8061c2414795c43a504984686aac]
srp 15 22:12:52 ryzen kernel:  amdgpu_pci_probe+0x14b/0x360 [amdgpu b5e4d5032e0a8061c2414795c43a504984686aac]
srp 15 22:12:52 ryzen kernel:  local_pci_probe+0x45/0x80
srp 15 22:12:52 ryzen kernel:  pci_device_probe+0xc1/0x220
srp 15 22:12:52 ryzen kernel:  ? sysfs_do_create_link_sd+0x6e/0xe0
srp 15 22:12:52 ryzen kernel:  really_probe+0x1c2/0x390
srp 15 22:12:52 ryzen kernel:  __driver_probe_device+0xff/0x170
srp 15 22:12:52 ryzen kernel:  driver_probe_device+0x1f/0x90
srp 15 22:12:52 ryzen kernel:  __driver_attach+0xc2/0x1b0
srp 15 22:12:52 ryzen kernel:  ? __device_attach_driver+0xe0/0xe0
srp 15 22:12:52 ryzen kernel:  bus_for_each_dev+0x8b/0xd0
srp 15 22:12:52 ryzen kernel:  bus_add_driver+0x164/0x220
srp 15 22:12:52 ryzen kernel:  driver_register+0x8d/0xe0
srp 15 22:12:52 ryzen kernel:  ? 0xffffffffc03f1000
srp 15 22:12:52 ryzen kernel:  do_one_initcall+0x5d/0x220
srp 15 22:12:52 ryzen kernel:  do_init_module+0x4a/0x1e0
srp 15 22:12:52 ryzen kernel:  __do_sys_finit_module+0xac/0x120
srp 15 22:12:52 ryzen kernel:  do_syscall_64+0x5f/0x90
srp 15 22:12:52 ryzen kernel:  ? ksys_lseek+0x82/0xc0
srp 15 22:12:52 ryzen kernel:  ? syscall_exit_to_user_mode+0x1b/0x40
srp 15 22:12:52 ryzen kernel:  ? do_syscall_64+0x6b/0x90
srp 15 22:12:52 ryzen kernel:  ? do_syscall_64+0x6b/0x90
srp 15 22:12:52 ryzen kernel:  ? do_syscall_64+0x6b/0x90
srp 15 22:12:52 ryzen kernel:  entry_SYSCALL_64_after_hwframe+0x63/0xcd
srp 15 22:12:52 ryzen kernel: RIP: 0033:0x7fea8428956d
srp 15 22:12:52 ryzen kernel: Code: 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 03 78 0d 00 f7 d8 64 89 01 48
srp 15 22:12:52 ryzen kernel: RSP: 002b:00007fff1a464bb8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
srp 15 22:12:52 ryzen kernel: RAX: ffffffffffffffda RBX: 000056398fd05b80 RCX: 00007fea8428956d
srp 15 22:12:52 ryzen kernel: RDX: 0000000000000000 RSI: 000056398eda1cb2 RDI: 0000000000000008
srp 15 22:12:52 ryzen kernel: RBP: 000056398eda1cb2 R08: 0000000000000000 R09: 00007fff1a464cf0
srp 15 22:12:52 ryzen kernel: R10: 0000000000000008 R11: 0000000000000246 R12: 0000000000060000
srp 15 22:12:52 ryzen kernel: R13: 000056398fd05ac0 R14: 0000000000000000 R15: 000056398fd067a0
srp 15 22:12:52 ryzen kernel:  </TASK>
srp 15 22:12:52 ryzen kernel: ---[ end trace 0000000000000000 ]---
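
For anyone triaging a similar splat: the WARNING header line already names the source file, line number and function. Below is a minimal Python sketch that pulls those fields out, assuming the header format shown in the journal above. The regex and the `parse_warning` helper are illustrative names of my own, not part of any kernel tooling.

```python
import posixpath
import re

# Header line of a kernel WARNING splat, e.g.
# "WARNING: CPU: 2 PID: 216 at path/file.c:98 func+0xf1/0x100 [module]"
WARNING_RE = re.compile(
    r"WARNING: CPU: (?P<cpu>\d+) PID: (?P<pid>\d+) at "
    r"(?P<file>\S+):(?P<line>\d+) "
    r"(?P<func>[^\s+]+)\+(?P<offset>0x[0-9a-f]+)/(?P<size>0x[0-9a-f]+)"
    r"(?: \[(?P<module>[^\]]+)\])?"
)

# Header line copied from the journal in this post.
SAMPLE = (
    "srp 15 22:12:52 ryzen kernel: WARNING: CPU: 2 PID: 216 at "
    "drivers/gpu/drm/amd/amdgpu/../display/dc/clk_mgr/dcn21/"
    "rn_clk_mgr_vbios_smu.c:98 "
    "rn_vbios_smu_send_msg_with_param+0xf1/0x100 [amdgpu]"
)


def parse_warning(log_line):
    """Return a dict of fields from a WARNING header line, or None."""
    match = WARNING_RE.search(log_line)
    if match is None:
        return None
    info = match.groupdict()
    for key in ("cpu", "pid", "line"):
        info[key] = int(info[key])
    # Collapse "amdgpu/../display" so the path is relative to the source tree.
    info["file"] = posixpath.normpath(info["file"])
    return info


if __name__ == "__main__":
    print(parse_warning(SAMPLE))
```

The normalized path is the file to look at in the kernel git history when searching for the change that introduced the warning.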

Full journals from the last boot before the update and the first boot after it:

http://ix.io/47Ig
http://ix.io/47Ih

My system:
Arch Linux only
MB: ASUS TUF GAMING B550M-PLUS
CPU: AMD Ryzen 5 5600G with Radeon Graphics (12) @ 3.900GHz
GPU: AMD ATI 07:00.0 Cezanne
(Integrated GPU only)
RAM 32GB (4GB reserved for GPU in UEFI)
1TB nvme WD Blue

My system works normally (or at least it appears to).

Does anybody know the cause of this, or a workaround?

Last edited by GeorgeJP (2022-09-16 17:35:16)

Offline

#2 2022-08-16 05:46:32

loqs
Member
Registered: 2014-03-06
Posts: 15,331

Re: [SOLVED] Amdgpu problems after kernel update 5.18.16->5.19.1

Offline

#3 2022-08-16 18:51:03

GeorgeJP
Member
From: Czech Republic
Registered: 2020-01-28
Posts: 143

Re: [SOLVED] Amdgpu problems after kernel update 5.18.16->5.19.1

Thanks. As far as I can see, there is a change in

drivers/gpu/drm/amd/display/dc/clk_mgr/dcn21/rn_clk_mgr_vbios_smu.c

and some corrections in Linux 6.0-rc.
I will keep an eye on it.

Offline

#4 2022-09-15 11:53:08

loqs
Member
Registered: 2014-03-06
Posts: 15,331

Re: [SOLVED] Amdgpu problems after kernel update 5.18.16->5.19.1

Offline

#5 2022-09-15 14:56:45

tucuxi
Member
From: Switzerland
Registered: 2020-03-08
Posts: 266

Re: [SOLVED] Amdgpu problems after kernel update 5.18.16->5.19.1

At the exact moment when I clicked on the thread, my computer froze.  I found a sequence of these messages in the log:

Sep 15 16:52:11 arch kernel: amdgpu 0000:0c:00.0: amdgpu: [gfxhub0] no-retry page fault (src_id:0 ring:40 vmid:2 pasid:32777, for process fir>
Sep 15 16:52:11 arch kernel: amdgpu 0000:0c:00.0: amdgpu:   in page starting at address 0x0000800103100000 from IH client 0x1b (UTCL2)
Sep 15 16:52:11 arch kernel: amdgpu 0000:0c:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00241051
Sep 15 16:52:11 arch kernel: amdgpu 0000:0c:00.0: amdgpu:          Faulty UTCL2 client ID: TCP (0x8)
Sep 15 16:52:11 arch kernel: amdgpu 0000:0c:00.0: amdgpu:          MORE_FAULTS: 0x1
Sep 15 16:52:11 arch kernel: amdgpu 0000:0c:00.0: amdgpu:          WALKER_ERROR: 0x0
Sep 15 16:52:11 arch kernel: amdgpu 0000:0c:00.0: amdgpu:          PERMISSION_FAULTS: 0x5
Sep 15 16:52:11 arch kernel: amdgpu 0000:0c:00.0: amdgpu:          MAPPING_ERROR: 0x0
Sep 15 16:52:11 arch kernel: amdgpu 0000:0c:00.0: amdgpu:          RW: 0x1

AMD Radeon VII
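
The individual fields the driver prints are just bit slices of the VM_L2_PROTECTION_FAULT_STATUS value. Here is a rough Python sketch of that decoding. The bit layout in `FIELDS` is an assumption on my part: I chose it because it reproduces the values in the log above (including vmid:2 from the first line), so check it against the driver's register headers before relying on it for other hardware.

```python
# ASSUMED layout: field name -> (shift, width in bits).
# It reproduces the values printed in the log above; it is not
# taken from an authoritative register specification.
FIELDS = {
    "MORE_FAULTS": (0, 1),
    "WALKER_ERROR": (1, 3),
    "PERMISSION_FAULTS": (4, 4),
    "MAPPING_ERROR": (8, 1),
    "CID": (9, 9),   # printed as "Faulty UTCL2 client ID"
    "RW": (18, 1),
    "VMID": (20, 4),
}


def decode_fault_status(status):
    """Split a raw fault status word into its named bit fields."""
    return {
        name: (status >> shift) & ((1 << width) - 1)
        for name, (shift, width) in FIELDS.items()
    }


if __name__ == "__main__":
    # Raw value copied from the log above.
    for name, value in decode_fault_status(0x00241051).items():
        print(f"{name}: {value:#x}")
```

With the value from the log this gives MORE_FAULTS 0x1, WALKER_ERROR 0x0, PERMISSION_FAULTS 0x5, MAPPING_ERROR 0x0, CID 0x8, RW 0x1 and VMID 0x2, matching what the driver printed.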

Offline

#6 2022-09-15 18:14:22

loqs
Member
Registered: 2014-03-06
Posts: 15,331

Re: [SOLVED] Amdgpu problems after kernel update 5.18.16->5.19.1

@tucuxi please post all the kernel messages from that boot.  It looks to be a different issue to GeorgeJP's but that could just be due to the lack of context.

Offline

#7 2022-09-15 18:54:21

tucuxi
Member
From: Switzerland
Registered: 2020-03-08
Posts: 266

Re: [SOLVED] Amdgpu problems after kernel update 5.18.16->5.19.1

journalctl -k -b-1: http://0x0.st/oO40.txt
journalctl -k -b-2: http://ix.io/4aAl
journalctl -k -b-12: http://ix.io/4aAm

I observed the same effect on three distinct boots since September 12th.

Offline

#8 2022-09-15 19:09:25

loqs
Member
Registered: 2014-03-06
Posts: 15,331

Re: [SOLVED] Amdgpu problems after kernel update 5.18.16->5.19.1

All the logs contain

[drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!

Which looks to be https://gitlab.freedesktop.org/drm/amd/-/issues/934 although that message could simply indicate something previously went wrong and now the GPU is stuck and the driver needs to reset / recover it.
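
Since the timeout may only be a symptom, the messages logged just before its first occurrence are the interesting part. Below is a small Python sketch for pulling that context out of saved `journalctl -k` output. The helper name `context_before_first_timeout` and the placeholder lines in the sample are made up for illustration; only the error string is taken from the logs.

```python
FENCE_TIMEOUT = "Waiting for fences timed out!"


def context_before_first_timeout(lines, n=5):
    """Return up to n lines preceding the first fence timeout, plus the
    timeout line itself. Returns an empty list if it never occurs."""
    for i, line in enumerate(lines):
        if FENCE_TIMEOUT in line:
            return lines[max(0, i - n):i + 1]
    return []


if __name__ == "__main__":
    # Placeholder lines, NOT real log output; only the last-but-one
    # line uses the actual message text quoted above.
    sample = [
        "placeholder: earlier message 1",
        "placeholder: earlier message 2",
        "placeholder: earlier message 3",
        "[drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* "
        "Waiting for fences timed out!",
        "placeholder: later message",
    ]
    for line in context_before_first_timeout(sample, n=2):
        print(line)
```

To use it on a real boot, read the file saved from `journalctl -k -b-1` into a list with `splitlines()` and pass that list in.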

Offline

#9 2022-09-16 17:36:05

GeorgeJP
Member
From: Czech Republic
Registered: 2020-01-28
Posts: 143

Re: [SOLVED] Amdgpu problems after kernel update 5.18.16->5.19.1

I can confirm: the problem is resolved in Linux 5.19.9.

Offline
