#1 2025-01-20 22:41:52

maxrd2
Member
Registered: 2014-09-16
Posts: 12

Issues with RX 580 & mesa 24.3.x

I have also been experiencing these crashes and lockups for a few weeks, since upgrading to mesa 24.3.x.

I have an AMD Radeon RX 580 graphics card, not an APU. The CPU is an AMD Ryzen 5 5600X.

Downgrading mesa and vulkan-radeon to 24.2.7 seems to fix the lockups (also had to install llvm18-libs).

pacman -U https://archive.archlinux.org/packages/m/mesa/mesa-1%3A24.2.7-1-x86_64.pkg.tar.zst
pacman -U https://archive.archlinux.org/packages/v/vulkan-radeon/vulkan-radeon-1%3A24.2.7-1-x86_64.pkg.tar.zst
pacman -S llvm18-libs
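
For reference, the two archive URLs follow a regular pattern, so they can be generated instead of typed. This is only a minimal sketch: it assumes the layout visible in the URLs above also holds for other packages, and `ala_url` is a made-up helper name, not part of pacman.

```sh
# Build an Arch Linux Archive URL for one package file.
# Assumed layout (taken from the two URLs above):
#   packages/<first letter>/<name>/<name>-<version>-<arch>.pkg.tar.zst
# ala_url is a hypothetical helper, not a pacman feature.
ala_url() {
    pkg=$1
    ver=$2                  # full version including epoch, e.g. 1:24.2.7-1
    arch=${3:-x86_64}
    first=$(printf '%s' "$pkg" | cut -c1)
    # The epoch separator ':' is percent-encoded in the archive URLs.
    enc=$(printf '%s' "$ver" | sed 's/:/%3A/g')
    printf 'https://archive.archlinux.org/packages/%s/%s/%s-%s-%s.pkg.tar.zst\n' \
        "$first" "$pkg" "$pkg" "$enc" "$arch"
}

ala_url mesa 1:24.2.7-1
ala_url vulkan-radeon 1:24.2.7-1
```

Both packages can then go into one transaction with `pacman -U "$(ala_url mesa 1:24.2.7-1)" "$(ala_url vulkan-radeon 1:24.2.7-1)"`. To keep the next full upgrade from replacing them again, listing both under `IgnorePkg` in /etc/pacman.conf should do it.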

Here's the log of one crash:

Jan 19 11:45:23 beeblebrox kernel: amdgpu 0000:07:00.0: amdgpu: Dumping IP State
Jan 19 11:45:23 beeblebrox kernel: amdgpu 0000:07:00.0: amdgpu: Dumping IP State Completed
Jan 19 11:45:23 beeblebrox kernel: amdgpu 0000:07:00.0: amdgpu: ring gfx timeout, signaled seq=185596, emitted seq=185598
Jan 19 11:45:23 beeblebrox kernel: amdgpu 0000:07:00.0: amdgpu: Process information: process kwin_wayland pid 1318 thread kwin_wayla:cs0 pid 1369
Jan 19 11:45:23 beeblebrox kernel: amdgpu 0000:07:00.0: amdgpu: GPU reset begin!
Jan 19 11:45:23 beeblebrox kernel: amdgpu 0000:07:00.0: amdgpu: BACO reset
Jan 19 11:45:23 beeblebrox kernel: amdgpu 0000:07:00.0: amdgpu: GPU reset succeeded, trying to resume
Jan 19 11:45:23 beeblebrox kernel: [drm] PCIE GART of 256M enabled (table at 0x000000F400800000).
Jan 19 11:45:23 beeblebrox kernel: [drm] VRAM is lost due to GPU reset!
Jan 19 11:45:24 beeblebrox kernel: [drm] UVD and UVD ENC initialized successfully.
Jan 19 11:45:24 beeblebrox kernel: [drm] VCE initialized successfully.
Jan 19 11:45:24 beeblebrox kernel: amdgpu 0000:07:00.0: amdgpu: GPU reset(2) succeeded!
Jan 19 11:45:24 beeblebrox kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Jan 19 11:45:24 beeblebrox kwin_wayland[1318]: kwin_scene_opengl: A graphics reset not attributable to the current GL context occurred.
Jan 19 11:45:24 beeblebrox kwin_wayland[1318]: kwin_scene_opengl: 0x2: GL_CONTEXT_LOST in context lost
....
Jan 19 11:45:24 beeblebrox kwin_wayland[1318]: kwin_scene_opengl: 0x2: GL_CONTEXT_LOST in context lost
Jan 19 11:45:24 beeblebrox kwin_wayland[1318]: kwin_wayland_drm: Checking test buffer failed!
....
Jan 19 11:45:24 beeblebrox plasmashell[1599]: amdgpu: The CS has cancelled because the context is lost. This context is innocent.
Jan 19 11:45:24 beeblebrox plasmashell[1599]: KCrash: Application 'plasmashell' crashing... crashRecursionCounter = 2
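
In case someone wants to compare their own logs, the interesting lines can be filtered out of the kernel log. A rough sketch: the patterns are copied from the messages above and may not match other kernel versions, and both function names are made up for illustration.

```sh
# Keep only the amdgpu hang/reset markers from a kernel log read on stdin.
# Patterns are taken from the log lines above; other kernels may word them differently.
gpu_hang_events() {
    grep -E 'ring [[:alnum:]_.]+ timeout|GPU reset begin|GPU reset\([0-9]+\) succeeded|VRAM is lost'
}

# Count how many ring timeouts a log read on stdin contains.
count_ring_timeouts() {
    grep -c -E 'ring [[:alnum:]_.]+ timeout'
}
```

With a persistent journal, something like `journalctl -k -b -1 | gpu_hang_events` should show the events from the previous boot.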

Executing this (it forces a GPU reset) also causes a freeze with 24.2.7, although the card recovers after about 10 seconds:

sudo cat /sys/kernel/debug/dri/1/amdgpu_gpu_recover
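
The `1` in that path is specific to my machine. A small sketch for finding the right number elsewhere, assuming each `/sys/kernel/debug/dri/<N>/name` file starts with the driver name (I have not checked this on every kernel); `amdgpu_dri_nodes` is a hypothetical helper name.

```sh
# Print the debugfs DRI directory numbers that belong to the amdgpu driver.
# Assumption: <base>/<N>/name starts with the driver name, e.g. "amdgpu dev=...".
# The base directory can be overridden, which also makes the function easy to try out.
amdgpu_dri_nodes() {
    base=${1:-/sys/kernel/debug/dri}
    for f in "$base"/*/name; do
        [ -r "$f" ] || continue          # glob did not match, or no permission
        case "$(cat "$f")" in
            amdgpu*) basename "$(dirname "$f")" ;;
        esac
    done
}
```

debugfs is normally readable by root only, so the function has to run in a root shell to see anything.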

#2 2025-01-21 10:46:33

Lone_Wolf
Administrator
From: Netherlands, Europe
Registered: 2005-10-04
Posts: 13,275

Re: Issues with RX 580 & mesa 24.3.x


Disliking systemd intensely, but not satisfied with alternatives so focusing on taming systemd.

clean chroot building not flexible enough ?
Try clean chroot manager by graysky

#3 2025-01-22 15:57:53

algebro
Member
Registered: 2019-04-18
Posts: 3

Re: Issues with RX 580 & mesa 24.3.x

I am pretty sure I am affected by this too with the Radeon 6600XT. I have only been lurking because the consensus seemed to be that it only affected the APUs and I don't have any errors in my logs when it happens, but there are too many other similarities with the symptoms and timing for it to be a coincidence.

My computer was so unusable that I ended up hopping distros to Fedora Kinoite and the issue disappeared, confirming that it's a software issue and not a hardware failure. I'm still anxiously following these threads and the various Mesa gitlab issues, hoping the problem is found and fixed before it hits the Fedora packages.

#4 2025-01-22 17:24:01

orbit-oc
Member
Registered: 2024-12-15
Posts: 62

Re: Issues with RX 580 & mesa 24.3.x

@maxrd2 @algebro

If I'm not mistaken about the names, gfx8, gfx9 and gfx10 cards would be affected. The errors also look comparable to me. Who knows, this may even be the same error, at least for one of you.

It would be nice if you could get back here if you gain new insights or if the problem is solved.

Good luck: mesa seems to have a lot of problems at the moment.
I'm pretty annoyed by now...

#5 2025-01-23 14:03:55

maxrd2
Member
Registered: 2014-09-16
Posts: 12

Re: Issues with RX 580 & mesa 24.3.x

Once I was sure that rolling back to 24.2.7 stopped the random crashes/freezes, I installed 24.2.8 from this post:
https://bbs.archlinux.org/viewtopic.php … 3#p2220233

OpenGL renderer string: AMD Radeon RX 580 Series (radeonsi, polaris10, LLVM 19.1.6, DRM 3.59, 6.12.10-zen1-1-zen)
OpenGL core profile version string: 4.6 (Core Profile) Mesa 24.2.8 (git-6552e32957)

Executing this causes a freeze with 24.2.8, and the graphics card never recovered; I had to reboot afterwards (with 24.2.7 it would recover after 10 seconds or so):

sudo cat /sys/kernel/debug/dri/1/amdgpu_gpu_recover

Other than that, I have used it for a day without noticing any crashes/freezes. I will try mesa-git now.

Logs during crash:

Jan 22 23:09:14 beeblebrox kernel: amdgpu 0000:07:00.0: amdgpu: GPU reset begin!
Jan 22 23:09:14 beeblebrox kernel: amdgpu 0000:07:00.0: amdgpu: BACO reset
Jan 22 23:09:14 beeblebrox kernel: amdgpu 0000:07:00.0: amdgpu: GPU reset succeeded, trying to resume
Jan 22 23:09:14 beeblebrox kernel: [drm] PCIE GART of 256M enabled (table at 0x000000F400800000).
Jan 22 23:09:14 beeblebrox kernel: [drm] VRAM is lost due to GPU reset!
Jan 22 23:09:14 beeblebrox kernel: [drm] UVD and UVD ENC initialized successfully.
Jan 22 23:09:14 beeblebrox kernel: [drm] VCE initialized successfully.
Jan 22 23:09:14 beeblebrox kernel: amdgpu 0000:07:00.0: amdgpu: GPU reset(1) succeeded!
Jan 22 23:09:14 beeblebrox kwin_wayland[1329]: kwin_scene_opengl: 0x2: GL_CONTEXT_LOST in context lost
Jan 22 23:09:14 beeblebrox kwin_wayland[1329]: kwin_scene_opengl: 0x2: GL_CONTEXT_LOST in context lost
...
Jan 22 23:09:21 beeblebrox systemd-coredump[5407]: [?] Process 1609 (plasmashell) of user 1000 dumped core.
...
Jan 22 23:09:24 beeblebrox kwin_wayland[1329]: kwin_wayland_drm: Pageflip timed out! This is a kernel bug
...
Jan 22 23:09:21 beeblebrox systemd[1270]: plasma-ksystemstats.service: Consumed 1.210s CPU time, 9.3M memory peak.
Jan 22 23:09:21 beeblebrox systemd[1270]: plasma-plasmashell.service: Main process exited, code=dumped, status=6/ABRT
Jan 22 23:09:21 beeblebrox systemd[1270]: plasma-plasmashell.service: Failed with result 'core-dump'.
Jan 22 23:09:21 beeblebrox systemd[1270]: plasma-plasmashell.service: Consumed 10.824s CPU time, 864.4M memory peak.
Jan 22 23:09:21 beeblebrox drkonqi-coredump-processor[5408]: "/usr/bin/plasmashell" 1609 "/var/lib/systemd/coredump/core.plasmashell.1000.6adbfb2037be469696ee33487c73f43e.1609.1737583754000000.zst"
Jan 22 23:09:21 beeblebrox systemd[1270]: Started Launch DrKonqi for a systemd-coredump crash (PID 5408/UID 0).
Jan 22 23:09:21 beeblebrox systemd[1]: drkonqi-coredump-processor@1-5406-0.service: Deactivated successfully.
Jan 22 23:09:21 beeblebrox systemd[1270]: plasma-plasmashell.service: Scheduled restart job, restart counter is at 1.
Jan 22 23:09:21 beeblebrox systemd[1270]: Starting KDE Plasma Workspace...
Jan 22 23:09:24 beeblebrox kwin_wayland[1329]: kwin_wayland_drm: Pageflip timed out! This is a kernel bug
Jan 22 23:09:24 beeblebrox kernel: amdgpu 0000:07:00.0: amdgpu: Dumping IP State
Jan 22 23:09:24 beeblebrox kernel: amdgpu 0000:07:00.0: amdgpu: Dumping IP State Completed
Jan 22 23:09:24 beeblebrox kernel: amdgpu 0000:07:00.0: amdgpu: ring gfx timeout, signaled seq=65723, emitted seq=65726
Jan 22 23:09:24 beeblebrox kernel: amdgpu 0000:07:00.0: amdgpu: Process information: process kwin_wayland pid 1329 thread kwin_wayla:cs0 pid 1380
Jan 22 23:09:24 beeblebrox kernel: amdgpu 0000:07:00.0: amdgpu: GPU reset begin!
Jan 22 23:09:25 beeblebrox kernel: amdgpu 0000:07:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_0.2.1.0 test failed (-110)
Jan 22 23:09:25 beeblebrox kernel: [drm:gfx_v8_0_hw_fini [amdgpu]] *ERROR* KCQ disable failed
Jan 22 23:09:25 beeblebrox kernel: amdgpu: cp is busy, skip halt cp
Jan 22 23:09:25 beeblebrox kernel: amdgpu: rlc is busy, skip halt rlc
Jan 22 23:09:25 beeblebrox kernel: amdgpu 0000:07:00.0: amdgpu: BACO reset
Jan 22 23:09:26 beeblebrox kernel: amdgpu 0000:07:00.0: amdgpu: GPU reset succeeded, trying to resume

#6 2025-01-23 21:46:41

maxrd2
Member
Registered: 2014-09-16
Posts: 12

Re: Issues with RX 580 & mesa 24.3.x

Just had a crash with mesa-git 25.0.0-devel (git-94da1edbe4) installed from post:
https://bbs.archlinux.org/viewtopic.php … 3#p2220233

Jan 23 22:40:55 beeblebrox kernel: amdgpu 0000:07:00.0: amdgpu: Dumping IP State
Jan 23 22:40:55 beeblebrox kernel: amdgpu 0000:07:00.0: amdgpu: Dumping IP State Completed
Jan 23 22:40:55 beeblebrox kernel: amdgpu 0000:07:00.0: amdgpu: ring gfx timeout, signaled seq=3386911, emitted seq=3386913
Jan 23 22:40:55 beeblebrox kernel: amdgpu 0000:07:00.0: amdgpu: Process information: process kwin_wayland pid 1296 thread kwin_wayla:cs0 pid 1347
Jan 23 22:40:55 beeblebrox kernel: amdgpu 0000:07:00.0: amdgpu: GPU reset begin!
Jan 23 22:40:55 beeblebrox kernel: amdgpu 0000:07:00.0: amdgpu: BACO reset
Jan 23 22:40:55 beeblebrox kernel: amdgpu 0000:07:00.0: amdgpu: GPU reset succeeded, trying to resume
Jan 23 22:40:55 beeblebrox kernel: [drm] PCIE GART of 256M enabled (table at 0x000000F400800000).
Jan 23 22:40:55 beeblebrox kernel: [drm] VRAM is lost due to GPU reset!
Jan 23 22:40:55 beeblebrox kernel: amdgpu 0000:07:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring comp_1.3.0 test failed (-110)
Jan 23 22:40:56 beeblebrox kernel: amdgpu 0000:07:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring comp_1.2.1 test failed (-110)
Jan 23 22:40:56 beeblebrox kernel: [drm] UVD and UVD ENC initialized successfully.
Jan 23 22:40:56 beeblebrox kernel: [drm] VCE initialized successfully.
Jan 23 22:40:56 beeblebrox kernel: amdgpu 0000:07:00.0: amdgpu: GPU reset(2) succeeded!
Jan 23 22:40:56 beeblebrox kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
Jan 23 22:40:56 beeblebrox kwin_wayland[1296]: kwin_scene_opengl: A graphics reset not attributable to the current GL context occurred.
Jan 23 22:40:56 beeblebrox kwin_wayland[1296]: kwin_scene_opengl: 0x2: GL_CONTEXT_LOST in context lost

Last edited by maxrd2 (2025-01-24 08:31:23)

#7 2025-01-23 22:58:19

kclisp
Member
Registered: 2025-01-04
Posts: 33

Re: Issues with RX 580 & mesa 24.3.x

@algebro

Please try a recent trunk build, e.g. Lone_Wolf's unpatched build #152 (https://bbs.archlinux.org/viewtopic.php?id=301798&p=7) and see if the error still occurs with that.

@algebro @maxrd2

If a freeze/crash still occurs on trunk, you'll probably need to make a new mesa issue, assuming it's different from the gfx9 one.

#8 2025-01-23 23:11:49

kclisp
Member
Registered: 2025-01-04
Posts: 33

Re: Issues with RX 580 & mesa 24.3.x

@algebro

Oh right, you can also wait for linux 6.13 in case it's this issue https://gitlab.freedesktop.org/drm/amd/-/issues/3693.

#9 2025-01-24 10:02:22

Lone_Wolf
Administrator
From: Netherlands, Europe
Registered: 2005-10-04
Posts: 13,275

Re: Issues with RX 580 & mesa 24.3.x

So far I haven't seen reports from gfx9 users of breakage with the 24.2.8 I built.

That strongly suggests maxrd2 is facing a different bug.


#10 2025-01-26 01:34:07

algebro
Member
Registered: 2019-04-18
Posts: 3

Re: Issues with RX 580 & mesa 24.3.x

Lone_Wolf wrote:

So far I haven't seen reports from gfx9 users of breakage with the 24.2.8 I built.

That strongly suggests maxrd2 is facing a different bug.

Does it? Couldn't they just be the first one reporting it? I lurked for a long time before posting because everyone seemed convinced that only APUs were affected.

#11 2025-01-26 11:41:08

Lone_Wolf
Administrator
From: Netherlands, Europe
Registered: 2005-10-04
Posts: 13,275

Re: Issues with RX 580 & mesa 24.3.x

Mesa 24.3.x has multiple bugs that 24.2.x doesn't have, most of them related to radeonsi / AMD cards.

At least 3 separate bugs have been identified, 2 of which didn't affect Vega/APUs at all (those 2 have been solved already).
The 3rd is the one found by kclisp.

My personal estimate is that there are at least 2 other bugs in 24.3 for which the cause hasn't been found yet, possibly more.

The best tool to pinpoint causes is bisecting, but that only works if testers can reproduce the crashes.


#12 Yesterday 22:26:16

maxrd2
Member
Registered: 2014-09-16
Posts: 12

Re: Issues with RX 580 & mesa 24.3.x

Lone_Wolf wrote:

Mesa 24.3.x has multiple bugs that 24.2.x doesn't have, most of them related to radeonsi / AMD cards.
The best tool to pinpoint causes is bisecting, but that only works if testers can reproduce the crashes.

I have finished bisecting. The crash usually shows up within about 12 hours, so it took me a while to test everything, but I managed.
I'm not 100% sure that the "good" commits are really good, but I used the PC intensively for at least 1 day on "good" commits without a crash, so they should be good.
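
To give an idea of why this takes so long: a bisect needs roughly log2(n) rounds for n candidate commits, and here every "good" verdict costs at least a day. A small sketch of that arithmetic; the commit counts used with it are placeholders, I have not counted the real range, and `bisect_rounds` is a made-up helper.

```sh
# Rough upper bound on bisect rounds: ceil(log2(n)) for n candidate commits.
# bisect_rounds is a hypothetical helper; n is whatever range is being bisected.
bisect_rounds() {
    n=$1
    rounds=0
    span=1
    # Double the covered span until it reaches n; each doubling is one round.
    while [ "$span" -lt "$n" ]; do
        span=$((span * 2))
        rounds=$((rounds + 1))
    done
    echo "$rounds"
}
```

For example, `bisect_rounds 4096` gives 12, so a range of that size would mean about 12 days at one day per round. It also means a single wrong "good" verdict steers all remaining rounds into the wrong half of the history.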

It seems that:

505fd350bc9634a75d73fa92461fc4819309c2f5 is the first bad commit
commit 505fd350bc9634a75d73fa92461fc4819309c2f5
Author: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Date:   Sat Aug 17 15:09:39 2024 -0400

    hk: handle compressed eMRT

    Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
    Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30981>

 src/asahi/vulkan/hk_cmd_draw.c | 60 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 60 insertions(+)

EDIT: I used kernel 6.12.10-zen1-1-zen and didn't update the system between the start and the end of the bisect.

Last edited by maxrd2 (Yesterday 22:29:56)

#13 Today 10:38:03

Lone_Wolf
Administrator
From: Netherlands, Europe
Registered: 2005-10-04
Posts: 13,275

Re: Issues with RX 580 & mesa 24.3.x

That commit is part of MR 30981, which is for Apple Macs with Apple M* ARM processors.
It's very unlikely to have any effect on AMD GPUs on x86.

maxrd2 wrote:

EDIT: I used kernel 6.12.10-zen1-1-zen and didn't update the system between the start and the end of the bisect.

Good call, not introducing new factors helps with bisecting.

For now I suggest you update your system, then build Mesa 25.0-rc2 and see if the crash still occurs.

The 25.0 branch will be released as stable in a few weeks and has many AMD-related commits/fixes that were not in 24.3.x.


#14 Today 19:40:09

maxrd2
Member
Registered: 2014-09-16
Posts: 12

Re: Issues with RX 580 & mesa 24.3.x

Lone_Wolf wrote:

For now I suggest you update your system, then build Mesa 25.0-rc2 and see if the crash still occurs.

I have built and installed it (1d051e5cb1), and had a crash just now.

Lone_Wolf wrote:

That commit is part of MR 30981, which is for Apple Macs with Apple M* ARM processors.
It's very unlikely to have any effect on AMD GPUs on x86.

I noticed that too, but 48d4c5b4898 still worked for a couple of days.
Are the sources under `src/asahi` used exclusively for the asahi platform?
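
At least the "which files does it touch" part can be checked mechanically. A minimal sketch: `only_under` is a made-up helper that only looks at paths, so it says nothing about whether `src/asahi` actually gets compiled in a given build configuration.

```sh
# Succeed only if every path read from stdin lies under the given prefix.
# only_under is a hypothetical helper; it checks paths, not the build system.
only_under() {
    prefix=$1
    while IFS= read -r path; do
        case "$path" in
            "$prefix"*) ;;           # inside the prefix, keep checking
            *) return 1 ;;           # found a path outside it
        esac
    done
    return 0
}
```

Inside a mesa checkout, `git diff-tree --no-commit-id --name-only -r 505fd350bc9634a75d73fa92461fc4819309c2f5 | only_under src/asahi/ && echo "asahi only"` should print "asahi only" if the stat output above is complete.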

I am reverting to 48d4c5b4898ec1e60241f101e4a3b67c0210ab1c to see if it still works.

Should I report this upstream?

Last edited by maxrd2 (Today 19:41:51)
