
#1 2024-06-22 22:36:11

DeKay
Member
Registered: 2008-11-15
Posts: 50

Yet another amdgpu error - [drm:amdgpu_job_timedout [amdgpu]]

Hi.

I did a full system upgrade a few days ago. Today I tried to run xemu, the open-source original Xbox emulator, and it causes a GPU reset after maybe five to ten seconds. This is 100% repeatable and happens under both X and Wayland. Under Wayland, it boots me back to SDDM, but the screen is so corrupted that I can't log back in and have to power the machine off and on. Under X, at least, the screen just goes black for a while and then returns to SDDM, so I can log back in again.

This xemu setup used to run just fine, but it has been some time since I last fired it up, and I've done multiple system upgrades since then. I also pulled the latest version from https://github.com/xemu-project/xemu and compiled it from source as I always do, with no change. You can see in the second line of the dmesg output that it is indeed xemu that takes the card down.

[ 2516.273854] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=31440, emitted seq=31442
[ 2516.274694] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process xemu pid 11594 thread xemu:cs0 pid 11599
[ 2516.275509] amdgpu 0000:0c:00.0: amdgpu: GPU reset begin!
[ 2520.275500] amdgpu 0000:0c:00.0: amdgpu: failed to suspend display audio
[ 2520.642969] amdgpu: cp is busy, skip halt cp
[ 2520.906576] amdgpu: rlc is busy, skip halt rlc
[ 2520.907599] amdgpu 0000:0c:00.0: amdgpu: BACO reset
[ 2521.167938] amdgpu 0000:0c:00.0: amdgpu: GPU reset succeeded, trying to resume
[ 2521.169929] [drm] PCIE GART of 256M enabled (table at 0x000000F401780000).
[ 2521.170006] [drm] VRAM is lost due to GPU reset!
[ 2521.501138] amdgpu 0000:0c:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring comp_1.0.1 test failed (-110)
[ 2521.767721] amdgpu 0000:0c:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring comp_1.0.2 test failed (-110)
[ 2521.948606] [drm] UVD and UVD ENC initialized successfully.
[ 2522.049618] [drm] VCE initialized successfully.
[ 2522.056207] amdgpu 0000:0c:00.0: amdgpu: recover vram bo from shadow start
[ 2522.061924] amdgpu 0000:0c:00.0: amdgpu: recover vram bo from shadow done
[ 2522.061948] amdgpu 0000:0c:00.0: amdgpu: GPU reset(6) succeeded!
[ 2522.063392] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[ 2523.169805] [drm] scheduler comp_1.0.1 is not ready, skipping
[ 2523.169805] [drm] scheduler comp_1.0.1 is not ready, skipping
[ 2523.169810] [drm] scheduler comp_1.0.2 is not ready, skipping
[ 2523.189783] [drm] scheduler comp_1.0.1 is not ready, skipping
[ 2523.189790] [drm] scheduler comp_1.0.2 is not ready, skipping
...
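For anyone comparing their own logs, here is a rough sketch of how the offending process could be pulled out of output like the above. The function name amdgpu_offender is made up for illustration, and the pattern only assumes the "Process information" line format shown in this post; other kernel versions may word it differently.

```shell
# Hypothetical helper: read kernel log lines on stdin and print
# "<process name> <pid>" for every amdgpu job-timeout report.
# Assumes the "Process information: process NAME pid N" wording
# seen in the dmesg excerpt above.
amdgpu_offender() {
    sed -n 's/.*amdgpu_job_timedout.*Process information: process \([^ ]*\) pid \([0-9]*\).*/\1 \2/p'
}

# Possible usage (reading the kernel log may require root):
#   dmesg | amdgpu_offender
```

Fed the second line of the excerpt above, this should print the process name and pid that the driver blamed for the timeout.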

While xemu is running, my mouse slows way down, as if the system were under heavy load, and I see dmesg messages like these just before the excerpt posted above.

[ 2503.403587] [drm] scheduler comp_1.0.1 is not ready, skipping
[ 2503.420245] [drm] scheduler comp_1.0.1 is not ready, skipping
[ 2503.504511] snd_hdac_bus_update_rirb: 41 callbacks suppressed
[ 2503.504521] snd_hda_intel 0000:0c:00.1: spurious response 0x0:0x0, last cmd=0x770100
[ 2503.510203] snd_hda_intel 0000:0c:00.1: spurious response 0x0:0x0, last cmd=0x6f2d00
[ 2503.510215] snd_hda_intel 0000:0c:00.1: spurious response 0x0:0x0, last cmd=0x777800
<snip>
[ 2503.510272] snd_hda_intel 0000:0c:00.1: spurious response 0x0:0x0, last cmd=0x7f7c00
[ 2503.657958] input input17: unable to receive magic message: -32
[ 2503.778982] input input17: unable to receive magic message: -32
[ 2503.780641] [drm] scheduler comp_1.0.1 is not ready, skipping
[ 2503.780808] [drm] scheduler comp_1.0.1 is not ready, skipping
[ 2503.781377] [drm] scheduler comp_1.0.1 is not ready, skipping
<snip>
[ 2506.125001] [drm] scheduler comp_1.0.1 is not ready, skipping

CPU: Ryzen 1700 Desktop PC
MB: Asrock X370 Taichi
Memory: 16 GB
GPU: AMD RX560
Kernel: 6.9.5-arch1-1
Mesa: 24.1.1-1
KDE: 6.3.01
Plasma: 6.1.01

Everything stock: no undervolting, overclocking etc. My desktop is otherwise stable.

Searching for some of these error messages turns up hits going back a long way. I get the sense it is a bit of a whack-a-mole issue: they fix it somewhere, only to have it pop up somewhere else later. I think this is the first time I've been hit with it, though.

Any ideas on how I can get this working again?


#2 2024-06-22 23:44:03

DeKay
Member
Registered: 2008-11-15
Posts: 50

Re: Yet another amdgpu error - [drm:amdgpu_job_timedout [amdgpu]]

I found a workaround in the meantime with help from someone on the xemu Discord. If I run KDE Plasma in a Wayland session, I can get xemu to run like this:

$ ZINK_DESCRIPTORS=lazy __GLX_VENDOR_LIBRARY_NAME=mesa MESA_LOADER_DRIVER_OVERRIDE=zink GALLIUM_DRIVER=zink ./dist/xemu
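To avoid retyping that, the same variables could be wrapped in a small shell function. This is only a sketch: run_with_zink is a made-up name, and the variable values are copied unchanged from the command above.

```shell
# Hypothetical wrapper: run any command with the Zink-related
# environment variables from the workaround above.
# Uses env(1) so the variables apply only to the launched program.
run_with_zink() {
    env ZINK_DESCRIPTORS=lazy \
        __GLX_VENDOR_LIBRARY_NAME=mesa \
        MESA_LOADER_DRIVER_OVERRIDE=zink \
        GALLIUM_DRIVER=zink \
        "$@"
}

# Possible usage, from the xemu source directory:
#   run_with_zink ./dist/xemu
```

Because the variables are passed through env rather than exported, the rest of the session keeps using the default driver.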


#3 2024-06-23 10:43:49

Lone_Wolf
Administrator
From: Netherlands, Europe
Registered: 2005-10-04
Posts: 14,971

Re: Yet another amdgpu error - [drm:amdgpu_job_timedout [amdgpu]]

That suggests the error may be in the radeonsi driver.

Check /var/cache/pacman/pkg for mesa versions in the 1:24.0.x range and downgrade to that.
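A minimal sketch of that check, assuming the default pacman cache location and zstd-compressed package files. list_cached_mesa is a hypothetical helper name, not an existing tool.

```shell
# Hypothetical helper: list cached mesa packages in a pacman cache
# directory (defaults to /var/cache/pacman/pkg).
# The pattern mesa-[0-9]* skips other packages whose names merely
# start with "mesa-", and the suffix skips detached .sig files.
list_cached_mesa() {
    cache_dir="${1:-/var/cache/pacman/pkg}"
    ls -1 "$cache_dir"/mesa-[0-9]*.pkg.tar.zst 2>/dev/null
}

# Possible usage:
#   list_cached_mesa
```

A file found this way can then be installed as root with pacman -U followed by its path.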




#4 2024-06-24 15:10:15

DeKay
Member
Registered: 2008-11-15
Posts: 50

Re: Yet another amdgpu error - [drm:amdgpu_job_timedout [amdgpu]]

Thanks! I'll give this a try and report back.


#5 2024-06-25 02:30:52

DeKay
Member
Registered: 2008-11-15
Posts: 50

Re: Yet another amdgpu error - [drm:amdgpu_job_timedout [amdgpu]]

Lone_Wolf wrote:

Check /var/cache/pacman/pkg for mesa versions in the 1:24.0.x range and downgrade to that.

I tried 24.0.5-1 and had the same problem. Then I tried 23.3.5-1, and that really messed things up: I couldn't even log into KDE from SDDM. So I'm back on 24.1.1 now, and at least I can get into KDE again. I'm guessing I should go to the Mesa bug tracker and file a bug?

Update: Issue filed as https://gitlab.freedesktop.org/mesa/mesa/-/issues/11390

Last edited by DeKay (2024-06-25 03:06:53)

