#1 2024-03-03 13:45:21

TheAirBlow
Member
Registered: 2022-07-05
Posts: 46

[GPU] amdgpu error log spam

I have a laptop with a discrete AMD Radeon RX 560X GPU and an AMD Ryzen 5 3550H CPU (which has integrated graphics).

Recently I've noticed dmesg log spam:

[61347.928622] [drm] PCIE GART of 256M enabled (table at 0x000000F400000000).
[61348.245714] amdgpu 0000:01:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring comp_1.0.1 test failed (-110)
[61348.487506] amdgpu 0000:01:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring comp_1.0.3 test failed (-110)
[61348.610527] [drm] UVD and UVD ENC initialized successfully.
[61348.711527] [drm] VCE initialized successfully.
[61375.619883] [drm] PCIE GART of 256M enabled (table at 0x000000F400000000).
[61375.930758] amdgpu 0000:01:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring comp_1.0.1 test failed (-110)
[61376.184808] amdgpu 0000:01:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring comp_1.0.3 test failed (-110)
[61376.307355] [drm] UVD and UVD ENC initialized successfully.
[61376.408350] [drm] VCE initialized successfully.
[61429.027900] [drm] PCIE GART of 256M enabled (table at 0x000000F400000000).
[61429.336231] amdgpu 0000:01:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring comp_1.0.1 test failed (-110)
[61429.578217] amdgpu 0000:01:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring comp_1.0.3 test failed (-110)
[61429.701263] [drm] UVD and UVD ENC initialized successfully.
[61429.802265] [drm] VCE initialized successfully.
[61445.771738] [drm] PCIE GART of 256M enabled (table at 0x000000F400000000).
[61446.096395] amdgpu 0000:01:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring comp_1.0.1 test failed (-110)
[61446.349930] amdgpu 0000:01:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring comp_1.0.3 test failed (-110)
[61446.473521] [drm] UVD and UVD ENC initialized successfully.
[61446.574511] [drm] VCE initialized successfully.
[61458.878171] [drm] PCIE GART of 256M enabled (table at 0x000000F400000000).
[61459.193242] amdgpu 0000:01:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring comp_1.0.1 test failed (-110)
[61459.433906] amdgpu 0000:01:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring comp_1.0.3 test failed (-110)
[61459.556175] [drm] UVD and UVD ENC initialized successfully.
[61459.657173] [drm] VCE initialized successfully.
[61470.477984] [drm] PCIE GART of 256M enabled (table at 0x000000F400000000).
[61470.785612] amdgpu 0000:01:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring comp_1.0.1 test failed (-110)
[61471.026433] amdgpu 0000:01:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring comp_1.0.3 test failed (-110)
[61471.148739] [drm] UVD and UVD ENC initialized successfully.
[61471.249731] [drm] VCE initialized successfully.

Scrolling a bit further up in the log, I also found this:

[61048.376464] ==================================================================
[61048.376474] BUG: KFENCE: use-after-free read in amdgpu_bo_move+0x1db/0x810 [amdgpu]
[61048.377130] Use-after-free read at 0x000000001e216946 (in kfence-#85):
[61048.377138]  amdgpu_bo_move+0x1db/0x810 [amdgpu]
[61048.377410]  ttm_bo_handle_move_mem+0xcc/0x1b0 [ttm]
[61048.377410]  ttm_mem_evict_first+0x2b8/0x6c0 [ttm]
[61048.377410]  ttm_resource_manager_evict_all+0xa7/0x1d0 [ttm]
[61048.377410]  amdgpu_device_prepare+0x4e/0xd0 [amdgpu]
[61048.377410]  amdgpu_pmops_runtime_suspend+0xbc/0x1c0 [amdgpu]
[61048.377410]  pci_pm_runtime_suspend+0x6a/0x1e0
[61048.377410]  __rpm_callback+0x44/0x170
[61048.377410]  rpm_suspend+0x444/0x8f0
[61048.377410]  pm_runtime_work+0x98/0xb0
[61048.377410]  process_one_work+0x17b/0x340
[61048.377410]  worker_thread+0x301/0x490
[61048.377410]  kthread+0xe8/0x120
[61048.377410]  ret_from_fork+0x34/0x50
[61048.377410]  ret_from_fork_asm+0x1b/0x30
[61048.377410] kfence-#85: 0x0000000000e900ff-0x0000000065e688c4, size=96, cache=kmalloc-96
[61048.377410] allocated by task 247661 on cpu 7 at 60763.746467s:
[61048.377410]  __kmem_cache_alloc_node+0x304/0x330
[61048.377410]  kmalloc_trace+0x2a/0xa0
[61048.377410]  amdgpu_vram_mgr_new+0x91/0x3a0 [amdgpu]
[61048.377410]  ttm_resource_alloc+0x45/0x190 [ttm]
[61048.377410]  ttm_bo_mem_space+0x89/0x230 [ttm]
[61048.377410]  ttm_bo_validate+0x9a/0x370 [ttm]
[61048.377410]  amdgpu_cs_bo_validate+0x9c/0x2e0 [amdgpu]
[61048.377410]  amdgpu_vm_validate_pt_bos+0xc2/0x4a0 [amdgpu]
[61048.377410]  amdgpu_cs_parser_bos.isra.0+0x496/0x820 [amdgpu]
[61048.377410]  amdgpu_cs_ioctl+0xa7c/0x1cc0 [amdgpu]
[61048.377410]  drm_ioctl_kernel+0xd6/0x180
[61048.377410]  drm_ioctl+0x26d/0x4b0
[61048.377410]  amdgpu_drm_ioctl+0x4e/0x90 [amdgpu]
[61048.377410]  __x64_sys_ioctl+0x97/0xd0
[61048.377410]  do_syscall_64+0x64/0xe0
[61048.377410]  entry_SYSCALL_64_after_hwframe+0x6e/0x76
[61048.377410] freed by task 268836 on cpu 1 at 61048.376446s:
[61048.385380]  ttm_resource_free+0x83/0x190 [ttm]
[61048.385380]  ttm_bo_move_accel_cleanup+0xc8/0x2a0 [ttm]
[61048.385380]  amdgpu_bo_move+0x1a3/0x810 [amdgpu]
[61048.385380]  ttm_bo_handle_move_mem+0xcc/0x1b0 [ttm]
[61048.385380]  ttm_mem_evict_first+0x2b8/0x6c0 [ttm]
[61048.385380]  ttm_resource_manager_evict_all+0xa7/0x1d0 [ttm]
[61048.385380]  amdgpu_device_prepare+0x4e/0xd0 [amdgpu]
[61048.385380]  amdgpu_pmops_runtime_suspend+0xbc/0x1c0 [amdgpu]
[61048.385380]  pci_pm_runtime_suspend+0x6a/0x1e0
[61048.385380]  __rpm_callback+0x44/0x170
[61048.385380]  rpm_suspend+0x444/0x8f0
[61048.385380]  pm_runtime_work+0x98/0xb0
[61048.385380]  process_one_work+0x17b/0x340
[61048.385380]  worker_thread+0x301/0x490
[61048.385380]  kthread+0xe8/0x120
[61048.385380]  ret_from_fork+0x34/0x50
[61048.385380]  ret_from_fork_asm+0x1b/0x30
[61048.385380] CPU: 1 PID: 268836 Comm: kworker/1:1 Tainted: G        W  OE      6.7.6-zen1-2-zen #1 e48c58ba9fe36ec29ab17d4c69cf886bb594b7eb
[61048.385380] Hardware name: ASUSTeK COMPUTER INC. TUF Gaming FX705DY_FX705DY/FX705DY, BIOS FX705DY.315 03/09/2020
[61048.385380] Workqueue: pm pm_runtime_work
[61048.385380] ==================================================================

Is there anything I can do to fix this, and should I be worried?
DRI_PRIME=1 seems to work, but I've noticed graphical glitches, and the dGPU performs worse than my iGPU.

#2 2024-03-03 15:26:27

seth
Member
Registered: 2012-09-03
Posts: 51,456

Re: [GPU] amdgpu error log spam

Add "amdgpu.runpm=0" to the kernel parameters (https://wiki.archlinux.org/title/Kernel_parameters) to disable runtime power management for the GPU.


Is 01:00 the APU or the RX 560X?
ttm_mem_evict_first smells a bit like https://bbs.archlinux.org/viewtopic.php … 0#p2152270 (here apparently triggered by the runpm)
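
To check after a reboot that the parameter actually took effect, and to see which device sits at 01:00.0, something like the following Python sketch might help. It is only an illustration: the helper names (has_kernel_param, read_text_or_none) are made up for this example, and it assumes the usual /proc/cmdline, /sys/module/amdgpu/parameters/runpm and /sys/bus/pci/devices/... paths are present on the system.

```python
from pathlib import Path


def has_kernel_param(cmdline: str, param: str) -> bool:
    """Return True if `param` appears as a whole token on the kernel command line."""
    return param in cmdline.split()


def read_text_or_none(path: str):
    """Read a small procfs/sysfs file, returning None if it is missing or unreadable."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    # Kernel command line as seen by the running kernel.
    cmdline = read_text_or_none("/proc/cmdline") or ""
    print("amdgpu.runpm=0 on cmdline:", has_kernel_param(cmdline, "amdgpu.runpm=0"))

    # Value the amdgpu module was loaded with (assumed sysfs path; None if absent).
    print("amdgpu runpm value:", read_text_or_none("/sys/module/amdgpu/parameters/runpm"))

    # PCI vendor/device IDs of 0000:01:00.0, to compare against the lspci -nn output.
    dev = "/sys/bus/pci/devices/0000:01:00.0"
    print("01:00.0 vendor:", read_text_or_none(dev + "/vendor"))
    print("01:00.0 device:", read_text_or_none(dev + "/device"))
```

The IDs printed for 01:00.0 can be matched against the two GPUs listed by lspci -nn to tell the APU and the RX 560X apart.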