#26 2024-12-22 19:30:47

pacmancrashedagain
Member
Registered: 2024-12-14
Posts: 12

Re: Issues with Mesa 24.3.x and amdgpu Vega graphics

Thanks for trying @orbit-oc and for the mesa changelog.

Yeah, I made a post on this thread (I'm randomcodepanda there), but it might be better to open a new issue, I guess. There's also a link to the issue in the openSUSE tracker, and I've seen more people on Reddit complaining, so there's definitely something wrong with Mesa on Raven Ridge and Picasso.

https://gitlab.freedesktop.org/mesa/mesa/-/issues/12310

Last edited by pacmancrashedagain (2024-12-22 20:59:07)

Offline

#27 2024-12-23 21:26:52

orbit-oc
Member
Registered: 2024-12-15
Posts: 42

Re: Issues with Mesa 24.3.x and amdgpu Vega graphics

I am currently testing this hint from @seth
https://bbs.archlinux.org/viewtopic.php … 3#p2216003

So far the system has been stable for about 3 hours.

If this test turns out to be successful, from my point of view this can only be a workaround.
At least then you wouldn't have to downgrade the mesa-package and prevent its upgrade.

I will run this test for a few days now to see whether the system actually stops freezing...

Thanks to @seth for this really non-obvious hint.

---
APU: AMD Ryzen 3000 series - Picasso (Zen+) - Vega graphics (gfx9)
Kernel parameter: 'amdgpu.mcbp=0'
Package: mesa 24.3.2-1

Offline

#28 2024-12-23 23:04:24

pacmancrashedagain
Member
Registered: 2024-12-14
Posts: 12

Re: Issues with Mesa 24.3.x and amdgpu Vega graphics

Thanks @orbit-oc, if that works for you I'll try it as well. I'm not sure whether Raven Ridge is gfx9, since I can't find that information, but I assume it is.

Offline

#29 2024-12-24 09:45:55

zyn
Member
Registered: 2024-12-24
Posts: 3

Re: Issues with Mesa 24.3.x and amdgpu Vega graphics

Reporting back on the "amdgpu.mcbp=0" parameter with the latest mesa: the freeze still occurred.
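
For anyone who wants to double-check that the parameter really reached the running kernel, here is a minimal Python sketch. The helper name `kernel_param` and the sample command line are made up for illustration. On a real system the string would come from /proc/cmdline. The sketch also shows why the parameter has to be written without spaces around the '='.

```python
def kernel_param(cmdline: str, name: str):
    """Return the value of `name` from a kernel command line, or None if absent.

    Parameters are whitespace-separated `key=value` tokens, so a spelling
    like "amdgpu.mcbp = 0" (with spaces) is not seen as one parameter.
    """
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if key == name and sep:
            # Assumption: if the parameter is given twice, report the last one.
            value = val
    return value


# Hypothetical command line, similar to what /proc/cmdline contains.
sample = "BOOT_IMAGE=/vmlinuz-linux root=UUID=abc rw amdgpu.mcbp=0 quiet"
print("amdgpu.mcbp =", kernel_param(sample, "amdgpu.mcbp"))
```

To test the live system, replace the sample string with the contents of /proc/cmdline. A result of None would mean the parameter was not passed at boot.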

Offline

#30 2024-12-24 11:02:13

orbit-oc
Member
Registered: 2024-12-15
Posts: 42

Re: Issues with Mesa 24.3.x and amdgpu Vega graphics

@pacmancrashedagain

The terms 'vega' and 'gfx9' should mean the same thing.

We are dealing here with AMD APUs of the Raven Ridge and Picasso series, which all contain a 'vega' graphics chip. All of these APUs seem to be affected.
The 'vega' GPU series was released between Jun 2017 and Feb 2019.

Since most of the users in this thread are affected by the mesa bug on exactly these two APU types, the kernel parameter could be helpful for them.

But there seem to be more bugs...

@zyn

Do you have a dedicated or an integrated GPU? Which model?
See also here:
https://bbs.archlinux.org/viewtopic.php … 3#p2216253

@all

I'm feeling my way through the thicket here. I'm no expert ... ;-)
This will still take us some time, I think ...

Offline

#31 2024-12-24 14:20:13

orbit-oc
Member
Registered: 2024-12-15
Posts: 42

Re: Issues with Mesa 24.3.x and amdgpu Vega graphics

...my system has been running stably for about 5 hours now... (see above)

...if you are interested in the problem @four_bits described in this thread, you can follow it here:
https://bbs.archlinux.org/viewtopic.php?id=302000

Offline

#32 2024-12-24 20:23:00

orbit-oc
Member
Registered: 2024-12-15
Posts: 42

Re: Issues with Mesa 24.3.x and amdgpu Vega graphics

@pacmancrashedagain @OJaksch @zyn @all
I celebrated too soon:

After setting the kernel parameter (mcbp = mid-command buffer preemption), the system was stable for about 10 hours in total before freezing again. This time I even had to use 'fsck' again to repair the filesystem.

$ sudo journalctl -b -1 -e
Dez 24 20:08:44 ... kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring comp_1.0.1 timeout, signaled seq=36222, emitted seq=36225
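
For comparing such timeout lines across reports, here is a rough Python sketch that pulls out the ring name and the two sequence numbers. The helper name `parse_ring_timeout` is made up, and the reading of "emitted minus signaled" as the number of fences still outstanding is my assumption about what the driver prints, not something confirmed in this thread.

```python
import re

# Matches the amdgpu job-timeout message as it appears in the journal above.
TIMEOUT_RE = re.compile(
    r"ring (?P<ring>\S+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)


def parse_ring_timeout(line: str):
    """Return ring name and sequence numbers from a timeout line, or None."""
    match = TIMEOUT_RE.search(line)
    if match is None:
        return None
    signaled = int(match["signaled"])
    emitted = int(match["emitted"])
    return {
        "ring": match["ring"],
        "signaled": signaled,
        "emitted": emitted,
        # Assumption: fences submitted to the ring but not yet completed.
        "pending": emitted - signaled,
    }


line = (
    "Dez 24 20:08:44 ... kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* "
    "ring comp_1.0.1 timeout, signaled seq=36222, emitted seq=36225"
)
print(parse_ring_timeout(line))
```

For the line above this reports ring comp_1.0.1 with a difference of 3 between emitted and signaled.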

So it's another downgrade to mesa 24.2.7-1.
For now I'm just waiting for another bugfix...


merry christmas

Offline

#33 2024-12-24 21:05:29

seth
Member
Registered: 2012-09-03
Posts: 60,466

Re: Issues with Mesa 24.3.x and amdgpu Vega graphics

Do you also have "amdgpu_device_ip_suspend_phase1" in that kernel backtrace? (We might easily be dealing w/ two different issues - or adjacent ones…)

Offline

#34 2024-12-25 03:06:53

zyn
Member
Registered: 2024-12-24
Posts: 3

Re: Issues with Mesa 24.3.x and amdgpu Vega graphics

@orbit-oc

Oh, so you experienced a freeze again too. Yes, I have a Ryzen 3 2200G with Vega 8.

Happy holidays to everyone too. @all

Last edited by zyn (2024-12-25 04:34:07)

Offline

#35 2024-12-25 09:50:31

orbit-oc
Member
Registered: 2024-12-15
Posts: 42

Re: Issues with Mesa 24.3.x and amdgpu Vega graphics

seth wrote:

Do you also have "amdgpu_device_ip_suspend_phase1" in that kernel backtrace? (We might easily be dealing w/ two different issues - or adjacent ones…)

That could be. The error message also looks slightly different this time and the overall time until the system freezes is significantly longer.
But “amdgpu_device_ip_suspend_phase1” does not appear. I don't notice anything else.

@seth asks in another thread:

seth wrote:

Is it only electron clients that cause this?

No.

$ sudo journalctl -b -3 | grep amdgpu

--- remark: amdgpu.mcbp=0

Dez 24 18:50:48 ... kernel: [drm] amdgpu kernel modesetting enabled.
Dez 24 18:50:48 ... kernel: amdgpu: Virtual CRAT table created for CPU
Dez 24 18:50:48 ... kernel: amdgpu: Topology: Add CPU node
Dez 24 18:50:48 ... kernel: amdgpu 0000:26:00.0: amdgpu: Fetched VBIOS from ROM BAR
Dez 24 18:50:48 ... kernel: amdgpu: ATOM BIOS: 113-PICASSO-118
Dez 24 18:50:48 ... kernel: amdgpu 0000:26:00.0: vgaarb: deactivate vga console
Dez 24 18:50:48 ... kernel: amdgpu 0000:26:00.0: amdgpu: Trusted Memory Zone (TMZ) feature enabled
Dez 24 18:50:48 ... kernel: amdgpu 0000:26:00.0: amdgpu: VRAM: 2048M 0x000000F400000000 - 0x000000F47FFFFFFF (2048M used)
Dez 24 18:50:48 ... kernel: amdgpu 0000:26:00.0: amdgpu: GART: 1024M 0x0000000000000000 - 0x000000003FFFFFFF
Dez 24 18:50:48 ... kernel: amdgpu 0000:26:00.0: amdgpu: AGP: 267419648M 0x000000F800000000 - 0x0000FFFFFFFFFFFF
Dez 24 18:50:48 ... kernel: [drm] amdgpu: 2048M of VRAM memory ready
Dez 24 18:50:48 ... kernel: [drm] amdgpu: 6949M of GTT memory ready.
Dez 24 18:50:48 ... kernel: amdgpu: hwmgr_sw_init smu backed is smu10_smu
Dez 24 18:50:48 ... kernel: amdgpu 0000:26:00.0: amdgpu: Will use PSP to load VCN firmware
Dez 24 18:50:48 ... kernel: amdgpu 0000:26:00.0: amdgpu: RAS: optional ras ta ucode is not available
Dez 24 18:50:48 ... kernel: amdgpu 0000:26:00.0: amdgpu: RAP: optional rap ta ucode is not available
--- remark: probably: DVI is used instead of HDMI/DP (DRM content)
Dez 24 18:50:48 ... kernel: amdgpu 0000:26:00.0: amdgpu: Secure display: Generic Failure.
Dez 24 18:50:48 ... kernel: amdgpu 0000:26:00.0: amdgpu: SECUREDISPLAY: query securedisplay TA failed. ret 0x0
---
Dez 24 18:50:48 ... kernel: snd_hda_intel 0000:26:00.1: bound 0000:26:00.0 (ops amdgpu_dm_audio_component_bind_ops [amdgpu])
Dez 24 18:50:48 ... kernel: amdgpu: HMM registered 2048MB device memory
Dez 24 18:50:48 ... kernel: kfd kfd: amdgpu: Allocated 3969056 bytes on gart
Dez 24 18:50:48 ... kernel: kfd kfd: amdgpu: Total number of KFD nodes to be created: 1
Dez 24 18:50:48 ... kernel: amdgpu: Virtual CRAT table created for GPU
Dez 24 18:50:48 ... kernel: amdgpu: Topology: Add dGPU node [0x15d8:0x1002]
Dez 24 18:50:48 ... kernel: kfd kfd: amdgpu: added device 1002:15d8
Dez 24 18:50:48 ... kernel: amdgpu 0000:26:00.0: amdgpu: SE 1, SH per SE 1, CU per SH 11, active_cu_number 8
Dez 24 18:50:48 ... kernel: amdgpu 0000:26:00.0: amdgpu: ring gfx uses VM inv eng 0 on hub 0
Dez 24 18:50:48 ... kernel: amdgpu 0000:26:00.0: amdgpu: ring gfx_low uses VM inv eng 1 on hub 0
Dez 24 18:50:48 ... kernel: amdgpu 0000:26:00.0: amdgpu: ring gfx_high uses VM inv eng 4 on hub 0
Dez 24 18:50:48 ... kernel: amdgpu 0000:26:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 5 on hub 0
Dez 24 18:50:48 ... kernel: amdgpu 0000:26:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 6 on hub 0
Dez 24 18:50:48 ... kernel: amdgpu 0000:26:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 7 on hub 0
Dez 24 18:50:48 ... kernel: amdgpu 0000:26:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 8 on hub 0
Dez 24 18:50:48 ... kernel: amdgpu 0000:26:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 9 on hub 0
Dez 24 18:50:48 ... kernel: amdgpu 0000:26:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 10 on hub 0
Dez 24 18:50:48 ... kernel: amdgpu 0000:26:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 11 on hub 0
Dez 24 18:50:48 ... kernel: amdgpu 0000:26:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 12 on hub 0
Dez 24 18:50:48 ... kernel: amdgpu 0000:26:00.0: amdgpu: ring kiq_0.2.1.0 uses VM inv eng 13 on hub 0
Dez 24 18:50:48 ... kernel: amdgpu 0000:26:00.0: amdgpu: ring sdma0 uses VM inv eng 0 on hub 8
Dez 24 18:50:48 ... kernel: amdgpu 0000:26:00.0: amdgpu: ring vcn_dec uses VM inv eng 1 on hub 8
Dez 24 18:50:48 ... kernel: amdgpu 0000:26:00.0: amdgpu: ring vcn_enc0 uses VM inv eng 4 on hub 8
Dez 24 18:50:48 ... kernel: amdgpu 0000:26:00.0: amdgpu: ring vcn_enc1 uses VM inv eng 5 on hub 8
Dez 24 18:50:48 ... kernel: amdgpu 0000:26:00.0: amdgpu: ring jpeg_dec uses VM inv eng 6 on hub 8
Dez 24 18:50:48 ... kernel: [drm] Initialized amdgpu 3.54.0 20150101 for 0000:26:00.0 on minor 1
Dez 24 18:50:48 ... kernel: fbcon: amdgpudrmfb (fb0) is primary device
Dez 24 18:50:48 ... kernel: amdgpu 0000:26:00.0: [drm] fb0: amdgpudrmfb frame buffer device
--- remark: freezing the system
Dez 24 20:08:44 ... kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring comp_1.0.1 timeout, signaled seq=36222, emitted seq=36225

Offline

#36 2024-12-25 10:14:24

seth
Member
Registered: 2012-09-03
Posts: 60,466

Re: Issues with Mesa 24.3.x and amdgpu Vega graphics

Can we please see the context around the timeout beyond the amdgpu grep (bus errors, kernel oopses, …)?

Offline

#37 2024-12-25 12:15:00

orbit-oc
Member
Registered: 2024-12-15
Posts: 42

Re: Issues with Mesa 24.3.x and amdgpu Vega graphics

seth wrote:

Can we please see the context around the timeout beyond the amdgpu grep (bus errors, kernel oopses, …)?

That day I was in the process of permanently configuring the kernel parameter amdgpu.mcbp=0. The system then froze while I was doing that ;-). As far as I remember, several windows were open.
In my opinion there is nothing to be seen in the logs, except that rtkit-daemon is spamming the journal.
With journalctl -f I can see that Firefox triggers a lot of these entries. Someone remarked that in their opinion Chrome-based browsers could be responsible for the crashes examined here. I don't necessarily think so.

$ sudo journalctl --since="2024-12-24 20:01"

--- remark: amdgpu.mcbp=0
...
Dez 24 20:01:27 ... systemd[1]: systemd-timedated.service: Deactivated successfully.
Dez 24 20:03:27 ... systemd[1]: Starting Time & Date Service...
Dez 24 20:03:27 ... systemd[1]: Started Time & Date Service.
Dez 24 20:03:27 ... rtkit-daemon[1194]: Supervising 6 threads of 3 processes of 1 users.
Dez 24 20:03:27 ... rtkit-daemon[1194]: Supervising 6 threads of 3 processes of 1 users.
Dez 24 20:03:27 ... rtkit-daemon[1194]: Supervising 6 threads of 3 processes of 1 users.
Dez 24 20:03:27 ... rtkit-daemon[1194]: Supervising 6 threads of 3 processes of 1 users.
Dez 24 20:03:27 ... rtkit-daemon[1194]: Successfully made thread 12213 of process 12094 owned by '1000' RT at priority 10.
Dez 24 20:03:27 ... rtkit-daemon[1194]: Supervising 7 threads of 4 processes of 1 users.
Dez 24 20:03:28 ... rtkit-daemon[1194]: Supervising 7 threads of 4 processes of 1 users.
Dez 24 20:03:28 ... rtkit-daemon[1194]: Supervising 7 threads of 4 processes of 1 users.
Dez 24 20:03:30 ... rtkit-daemon[1194]: Supervising 7 threads of 4 processes of 1 users.
Dez 24 20:03:30 ... rtkit-daemon[1194]: Supervising 7 threads of 4 processes of 1 users.
Dez 24 20:03:30 ... rtkit-daemon[1194]: Supervising 7 threads of 4 processes of 1 users.
Dez 24 20:03:30 ... rtkit-daemon[1194]: Supervising 7 threads of 4 processes of 1 users.
Dez 24 20:03:30 ... rtkit-daemon[1194]: Supervising 7 threads of 4 processes of 1 users.
Dez 24 20:03:30 ... rtkit-daemon[1194]: Supervising 7 threads of 4 processes of 1 users.
Dez 24 20:03:30 ... rtkit-daemon[1194]: Supervising 7 threads of 4 processes of 1 users.
Dez 24 20:03:30 ... rtkit-daemon[1194]: Supervising 7 threads of 4 processes of 1 users.
Dez 24 20:03:57 ... systemd[1]: systemd-timedated.service: Deactivated successfully.
Dez 24 20:04:40 ... systemd[801]: Starting Xfce configuration service...
Dez 24 20:04:40 ... systemd[801]: Started Xfce configuration service.
Dez 24 20:04:48 ... sudo[12508]:  <user> : TTY=pts/0 ; PWD=/home/<user> ; USER=root ; COMMAND=/usr/bin/nano /etc/default/grub
Dez 24 20:04:48 ... sudo[12508]: pam_unix(sudo:session): session opened for user root(uid=0) by <user>(uid=1000)
--- remark: freezing the system
Dez 24 20:08:44 ... kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring comp_1.0.1 timeout, signaled seq=36222, emitted seq=36225

Offline

#38 2024-12-25 14:16:54

seth
Member
Registered: 2012-09-03
Posts: 60,466

Re: Issues with Mesa 24.3.x and amdgpu Vega graphics

That's even less than https://bbs.archlinux.org/viewtopic.php … 8#p2216478 - there seems to be no build-up and also no flip timeout etc. afterwards?
(Or did you reboot out of the freeze w/ the power button?)

Offline

#39 2024-12-25 14:33:41

orbit-oc
Member
Registered: 2024-12-15
Posts: 42

Re: Issues with Mesa 24.3.x and amdgpu Vega graphics

seth wrote:

Or did you reboot out of the freeze w/ the power button?

No. I used SysRq and REISUB as before. Before that, however, I had tested whether I could open a tty, which was not the case.
REISUB did not work as well as usual, since there were numerous errors in the file system afterwards (root and /home).

Offline

#40 2024-12-25 14:58:39

seth
Member
Registered: 2012-09-03
Posts: 60,466

Re: Issues with Mesa 24.3.x and amdgpu Vega graphics

Dez 24 20:04:48 ... sudo[12508]: pam_unix(sudo:session): session opened for user root(uid=0) by <user>(uid=1000)
--- remark: freezing the system
Dez 24 20:08:44 ... kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring comp_1.0.1 timeout, signaled seq=36222, emitted seq=36225

Did the system clearly freeze within those 4 minutes or maybe (briefly) after 20:08:44 ?

If the journal could not be synced to disk, there might have been various other issues, notably wrt.

This time I even had to use 'fsck' again to repair the filesystem.

nvme ?

You could go for another run on amdgpu.mcbp=0/24.3 and see whether the issue returns - the timeout there isn't necessarily fatal so even if it's still mesa/amdgpu I'm not sure that message indicates the cause.

Offline

#41 2024-12-25 15:51:48

orbit-oc
Member
Registered: 2024-12-15
Posts: 42

Re: Issues with Mesa 24.3.x and amdgpu Vega graphics

@seth
I can't give you the exact time. That would actually require another attempt.
No NVMe is involved. The system is on a SATA SSD. The system ran absolutely perfectly up to and including mesa 24.2.7.

Offline

#42 2024-12-29 18:55:35

Nicky726
Member
From: Czech Republic
Registered: 2008-02-15
Posts: 145

Re: Issues with Mesa 24.3.x and amdgpu Vega graphics

I also experienced issues with Mesa 24.3.x and amdgpu on a Ryzen 5 3400G with Radeon Vega Graphics. This is an older CPU+RAM in a recently bought motherboard, tested for about 2 days with memtest before transferring Arch on an SSD from a different machine.

The image on the monitor freezes, with no reaction to mouse or keyboard. The system is reachable via ssh, but it is in a "weird" state. A headless Qemu virtual machine was unaffected and could be shut down, and there were no apparent issues with CLI commands, but attempting to restart SDDM resulted in a black screen, some zombie processes and some cores being used at 100%. Attempting to reboot via command fails, so I have to reboot via the button. Frequency of issues: about every other day (the DE is not used all the time; issues happen with a user logged in to KDE on X).

Dmesg before the SDDM restart had messages such as these (on the 25th):

[122759.223381] INFO: task kworker/u33:7:19543 blocked for more than 122 seconds.
[122759.223393]       Not tainted 6.12.6-arch1-1 #1
[122759.223398] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[122759.223402] task:kworker/u33:7   state:D stack:0     pid:19543 tgid:19543 ppid:2      flags:0x00004000
[122759.223416] Workqueue: ttm ttm_bo_delayed_delete [ttm]
[122759.223438] Call Trace:
[122759.223442]  <TASK>
[122759.223451]  __schedule+0x3b0/0x12b0
[122759.223467]  ? srso_return_thunk+0x5/0x5f
[122759.223474]  ? __smpboot_create_thread+0x13f/0x140
[122759.223484]  ? srso_return_thunk+0x5/0x5f
[122759.223491]  ? timerqueue_add+0x71/0xc0
[122759.223504]  schedule+0x27/0xf0
[122759.223512]  schedule_timeout+0x12f/0x160
[122759.223525]  dma_fence_default_wait+0x1d8/0x250
[122759.223534]  ? __pfx_dma_fence_default_wait_cb+0x10/0x10
[122759.223545]  dma_fence_wait_timeout+0x108/0x140
[122759.223554]  dma_resv_wait_timeout+0xcc/0x1c0
[122759.223567]  ttm_bo_delayed_delete+0x2a/0x80 [ttm 00152e367bd49dad0e23264ca6c74ed471255b73]
[122759.223585]  process_one_work+0x17e/0x330
[122759.223597]  worker_thread+0x2ce/0x3f0
[122759.223607]  ? __pfx_worker_thread+0x10/0x10
[122759.223615]  kthread+0xd2/0x100
[122759.223624]  ? __pfx_kthread+0x10/0x10
[122759.223634]  ret_from_fork+0x34/0x50
[122759.223641]  ? __pfx_kthread+0x10/0x10
[122759.223650]  ret_from_fork_asm+0x1a/0x30
[122759.223668]  </TASK>
[122804.296972] amdgpu 0000:0a:00.0: amdgpu: Dumping IP State Completed

After the SDDM restart these appeared (on the 23rd):

[117786.873821] amdgpu 0000:0a:00.0: amdgpu: failed to write reg 1a6f4 wait reg 1a706
[117789.366829] ------------[ cut here ]------------
[117789.366833] WARNING: CPU: 0 PID: 19175 at drivers/gpu/drm/amd/amdgpu/../display/dc/clk_mgr/dcn10/rv1_clk_mgr_vbios_smu.c:119 rv1_vbios_smu_send_msg_with_param+0xa3/0xb0 [amdgpu]

Since the downgrade of mesa to 24.2.7-1, the system has been running fine for 3+ days.


"Although the masters make the rules
For the wise men and the fools
I got nothing, Ma, to live up to."

Offline

#43 2025-01-03 19:32:53

orbit-oc
Member
Registered: 2024-12-15
Posts: 42

Re: Issues with Mesa 24.3.x and amdgpu Vega graphics

Of course, we don't want this thread to disappear into oblivion. The view count shows that quite a few users seem to be affected.
In the meantime, there have also been further reports from @flemingfleming (thread id=301849) and @Nicky726.

The release calendar had scheduled another bugfix release for 2025/01/02, which was released today as 'mesa 24.3.3'. Here you can see the release notes:

https://docs.mesa3d.org/relnotes/24.3.3.html

I can't see a fix for our problem here, but I will certainly test this version again as soon as it is available.

Offline

#44 2025-01-04 12:55:23

zyn
Member
Registered: 2024-12-24
Posts: 3

Re: Issues with Mesa 24.3.x and amdgpu Vega graphics

24.3.3 seems quite promising: 9 hours with a mix of video and gaming, no freeze yet.

~> uname -a && uptime
Linux eritodesuku 6.12.8-zen1-1-zen #1 ZEN SMP PREEMPT_DYNAMIC Thu, 02 Jan 2025 22:52:21 +0000 x86_64 GNU/Linux
 20:53:26 up  9:45,  1 user,  load average: 0.67, 0.84, 0.76
~> glxinfo | grep OpenGL
OpenGL vendor string: AMD
OpenGL renderer string: AMD Radeon Vega 8 Graphics (radeonsi, raven, LLVM 18.1.8, DRM 3.59, 6.12.8-zen1-1-zen)
OpenGL core profile version string: 4.6 (Core Profile) Mesa 24.3.3-arch1.1
OpenGL core profile shading language version string: 4.60
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile
OpenGL core profile extensions:
OpenGL version string: 4.6 (Compatibility Profile) Mesa 24.3.3-arch1.1
OpenGL shading language version string: 4.60
OpenGL context flags: (none)
OpenGL profile mask: compatibility profile
OpenGL extensions:
OpenGL ES profile version string: OpenGL ES 3.2 Mesa 24.3.3-arch1.1
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.20
OpenGL ES profile extensions:

EDIT:
Spoke too soon, a freeze occurred again while switching virtual desktops.

The only related entry I have in the log:

Jan 04 22:11:57 eritodesuku kernel: amdgpu 0000:0a:00.0: amdgpu: Dumping IP State

Last edited by zyn (2025-01-04 14:17:28)

Offline

#45 2025-01-04 16:44:51

nek0panchi
Member
Registered: 2020-08-07
Posts: 6

Re: Issues with Mesa 24.3.x and amdgpu Vega graphics

I also have this problem.
I'm on an AMD Ryzen 5 3400G with Radeon Vega Graphics and 16GB of RAM. In most of the crashes there were no related error messages in the logs except for the occasional "amdgpu: Dumping IP State".
I wonder for how long we can keep holding mesa 24.2.7 until things start breaking.

Offline

#46 2025-01-04 17:04:34

kclisp
Member
Registered: 2025-01-04
Posts: 26

Re: Issues with Mesa 24.3.x and amdgpu Vega graphics

Just chiming in that I've also had the same problem for weeks (Radeon Vega), and downgrading mesa fixed the issue. It seems that there are a couple upstream issues already. Hopefully someone can bisect it.

https://gitlab.freedesktop.org/mesa/mesa/-/issues/12310
https://gitlab.freedesktop.org/drm/amd/-/issues/3861
https://gitlab.freedesktop.org/drm/amd/-/issues/3874

Navi 2x (Radeon RX) also has issues in drm/amd. Maybe related, maybe not.

Edit: From updates in the second link, this Vega issue is likely different from the RX issue.

Last edited by kclisp (2025-01-06 04:02:46)

Offline

#47 2025-01-04 21:46:46

orbit-oc
Member
Registered: 2024-12-15
Posts: 42

Re: Issues with Mesa 24.3.x and amdgpu Vega graphics

nek0panchi wrote:

I wonder for how long we can keep holding mesa 24.2.7 until things start breaking.

I wonder anyway why there is no stable version [24.2.8] besides the current developer version.


--- Test setup ---
Kernel: 6.6.69-1-lts
mesa: 24.3.3-1 - vulkan-radeon 24.3.3-1
Kernel parameter: amdgpu.mcbp=0
integrated gpu: AMD Radeon Vega 8 Graphics (radeonsi, raven, LLVM 18.1.8, DRM 3.54, 6.6.69-1-lts)

After only 1 hour 16 minutes, the system crashed again. Exact time (utc+1) = 21:01:31. Logged SysRq (REISUB) = 21:02:17
There is no entry between 20:38:44 and 21:02:16. No error message - nothing!

I'll go quiet again for now....

Offline

#48 2025-01-05 10:53:28

Lone_Wolf
Administrator
From: Netherlands, Europe
Registered: 2005-10-04
Posts: 13,155

Re: Issues with Mesa 24.3.x and amdgpu Vega graphics

Mesa calls .0 releases development releases, but .1 and up are stable releases.

So Mesa 24.3.x is the latest stable, 24.2.x is legacy and only gets bugfixes.

If someone wants the latest legacy version (currently 24.2.8) they need to build it themselves.

Unless someone who has this exact hardware and can reproduce the issue volunteers to (help) bisect this, the chance of it being solved is very low.

Currently no one has come forward.


Disliking systemd intensely, but not satisfied with alternatives so focusing on taming systemd.

clean chroot building not flexible enough ?
Try clean chroot manager by graysky

Offline

#49 2025-01-05 13:35:19

orbit-oc
Member
Registered: 2024-12-15
Posts: 42

Re: Issues with Mesa 24.3.x and amdgpu Vega graphics

Lone_Wolf wrote:

If someone wants the latest legacy version (currently 24.2.8) they need to build it themselves.

I can understand having to build the package (24.2.8) yourself, at least for Arch users. But as this Mesa version spreads through the distributions, the problem will spread rapidly and reach users who can't simply build a package themselves. By the way, I can't do that myself either. ;-)

Lone_Wolf wrote:

Currently no one has come forward.

@kclisp has pointed out that there are already reports in the Mesa issue tracker. One of them also refers to this thread.
It has already been stated here in the forum that the error messages available to me are not sufficient for any analysis. The last test then showed no error messages at all.

My knowledge of graphics processing at the system level is not sufficient to isolate the error. So I could only help by pointing out that integrated Vega graphics, i.e. AMD APUs of the Raven Ridge and Picasso series, are affected, and probably all versions. However, this has already been done right here, and the link has been posted in the Mesa issue tracker by @pacmancrashedagain.

However, as mesa 24.3.x and later spreads into the distributions, the bug will become more and more noticeable, and someone will be able to describe the problem better to the Mesa developers.

I will continue to follow this thread here, but probably have nothing more to contribute.

thanks to all...

Offline

#50 2025-01-05 15:12:18

Lone_Wolf
Administrator
From: Netherlands, Europe
Registered: 2005-10-04
Posts: 13,155

Re: Issues with Mesa 24.3.x and amdgpu Vega graphics

The error messages are not the only (and often not the best) way to determine the cause of the issue.

As long as there is a reproducible test case, bisecting is a tried and true technique to isolate the cause.

Two points are needed to start bisecting:
a point where things work (old/good)
and one where things break (new/bad)

For bisecting between multiple branches old & new work best.
A point between old and new is chosen, built and tested.

If the test shows the issue, the cause lies before it.
If the test doesn't show the problem, the cause is after it.

Each step cuts the number of possible causes in half.

If there are 6000 commits between old and new, that means approx. 13 tests are needed to pinpoint the cause.
Once the cause has been isolated people that understand the code can work on a solution.
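
To illustrate the halving, here is a small Python sketch of the idea. It is only a simulation under made-up assumptions: commits are plain numbers, commit 0 is the known-good point, commit 6000 is the known-bad point, and the culprit (4242) is invented. In practice each test means building and running that Mesa commit on the affected hardware, and git bisect does the bookkeeping.

```python
import math


def bisect_first_bad(commits, is_bad):
    """Find the first bad commit by repeated halving.

    commits[0] must be known good and commits[-1] known bad.
    Returns (first_bad_commit, number_of_tests_run).
    """
    good, bad = 0, len(commits) - 1
    tests = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        tests += 1
        if is_bad(commits[mid]):
            bad = mid   # issue shows up: the cause is at or before mid
        else:
            good = mid  # issue absent: the cause is after mid
    return commits[bad], tests


# Hypothetical history: 0 is old/good, 6000 is new/bad.
commits = list(range(6001))
culprit = 4242  # invented for the simulation


first_bad, tests = bisect_first_bad(commits, lambda c: c >= culprit)
print("first bad commit:", first_bad)
print("tests run:", tests)
print("upper bound:", math.ceil(math.log2(6000)))
```

The upper bound is the log2 of the number of commits, rounded up, which is where the approx. 13 tests for 6000 commits come from.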


If needed I (and others) can help build the to-be-tested versions, but I don't have Vega hardware so can't test.



Offline
