
#401 2025-02-01 17:28:42

Mechanicus
Member
Registered: 2025-01-13
Posts: 95

Re: Issues with Mesa 24.3.x and amdgpu Vega graphics

If anyone has some time, please test the linux-ring-recovery build: https://bbs.archlinux.org/viewtopic.php … 1#p2223911

Offline

#402 2025-02-01 17:45:09

grayich
Member
Registered: 2012-02-24
Posts: 4

Re: Issues with Mesa 24.3.x and amdgpu Vega graphics

Is it normal that after "cat /sys/kernel/debug/dri/1/amdgpu_gpu_recover" the GPU frequency rises to the maximum (1240 MHz) and does not drop?
GPU load is 99%, according to nvtop
Ryzen 3 3200G, Vega8

Offline

#403 2025-02-01 18:20:44

NuSkool
Member
Registered: 2015-03-23
Posts: 205

Re: Issues with Mesa 24.3.x and amdgpu Vega graphics

Mechanicus wrote:

If anyone has some time, please test the linux-ring-recovery build: https://bbs.archlinux.org/viewtopic.php … 1#p2223911

I'm about 20 hours into testing linux-amdgpu 6.13.arch1-2. This setup is working so well, including passing my overnight test, that I'd hate to stop testing early.

Considering these findings, https://bbs.archlinux.org/viewtopic.php … 6#p2223896 would you prefer I move on to testing 'linux-ring-recovery', or would you rather have the results from longer-term linux-amdgpu 6.13.arch1-2 testing?

I've also set the contents of /sys/module/drm/parameters/debug to '15'. Would there be any mesa debug logging available (location of file or data) other than the journal, and would you be interested in it at this point? If it's only in the journal, could I search/grep for a term that would filter it out?

Yes. 0xf in hexadecimal is 15 decimal.
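
For anyone picking a different drm.debug value, here is a minimal sketch that decodes a mask into its categories. The bit names are an assumption based on the kernel's DRM debug documentation (drm_print.h), so check them against your kernel version. The function name decode_drm_debug is only for illustration.

```shell
# Decode a drm.debug bitmask into category names (POSIX sh).
# ASSUMPTION: bit order follows the kernel's DRM debug categories;
# verify against the documentation for your kernel version.
decode_drm_debug() {
    mask=$(( $1 ))   # accepts decimal (15) or hex (0xf)
    bit=1
    out=""
    for name in CORE DRIVER KMS PRIME ATOMIC VBL STATE LEASE DP DRMRES; do
        # Append the category name when its bit is set in the mask.
        if [ $(( mask & bit )) -ne 0 ]; then
            out="${out:+$out }$name"
        fi
        bit=$(( bit * 2 ))
    done
    printf '%s\n' "$out"
}

decode_drm_debug 0xf   # prints: CORE DRIVER KMS PRIME
decode_drm_debug 0x2   # prints: DRIVER
```

So 0xf (15) turns on the four lowest categories, while 0x2 limits the output to driver messages only, which should be much less noisy.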

EDIT:
I've found I can consistently induce a graphical freeze in XFCE by entering the command 'radeontop' in 'Application Finder' ([Alt]+[F2]).
With my current setup, it's recoverable by switching tty, killing 'radeontop' and 'xinit', then rerunning 'startx'.
Since this isn't a complete system freeze with this setup anyway, it may not prove useful for producing a system freeze with different mesa setups.
I thought this was interesting nonetheless and worth passing on.

Last edited by NuSkool (2025-02-01 18:46:53)

Offline

#404 2025-02-01 18:37:30

flemingfleming
Member
Registered: 2024-12-27
Posts: 14

Re: Issues with Mesa 24.3.x and amdgpu Vega graphics

Mechanicus wrote:

AMDGPU
Build: linux-ring-recovery-6.13.arch1-1, linux-ring-recovery-headers-6.13.arch1-1
Included patches:
- Extend amdgpu_ring_soft_recovery function with PG control

What to check:
- sudo cat /sys/kernel/debug/dri/1/amdgpu_gpu_recover
- idle stability
- workflow stability
- glxgears window resizing
- vkcube window resizing

Kernel option to keep during testing period: fsck.mode=force

I'm able to cause the crash with this build: linux-ring-recovery-6.13.arch1-1.

I did try with drm.debug=0xf, but for some reason, if I do that, I can no longer cause the crash! So sorry about that. If this is the correct documentation, I could try setting it to some other value if you want?

(Ryzen 5 2500U (Raven Ridge))

Offline

#405 2025-02-01 19:43:58

Mechanicus
Member
Registered: 2025-01-13
Posts: 95

Re: Issues with Mesa 24.3.x and amdgpu Vega graphics

flemingfleming wrote:

I'm able to cause the crash with this build: linux-ring-recovery-6.13.arch1-1.

I did try with drm.debug=0xf, but for some reason, if I do that, I can no longer cause the crash! So sorry about that. If this is the correct documentation, I could try setting it to some other value if you want?

(Ryzen 5 2500U (Raven Ridge))

The documentation link is correct. Well, if enabling logging changes the behavior of the driver, that's another funny story. :)
I'm trying to find a better place to use the powergating state switch: I want it to be triggered only when there is a problem, not on every GPU operation.
Was the crash similar to the one on the unpatched kernel? This build was aimed at checking whether it fixes the amdgpu_gpu_recovery mechanism. But in case of a crash, it seems the recovery problem is entirely on the firmware side.

Last edited by Mechanicus (2025-02-01 20:27:42)

Offline

#406 2025-02-01 19:47:33

Mechanicus
Member
Registered: 2025-01-13
Posts: 95

Re: Issues with Mesa 24.3.x and amdgpu Vega graphics

NuSkool wrote:
Mechanicus wrote:

If anyone has some time, please test the linux-ring-recovery build: https://bbs.archlinux.org/viewtopic.php … 1#p2223911

I'm about 20 hours into testing linux-amdgpu 6.13.arch1-2. This setup is working so well, including passing my overnight test, that I'd hate to stop testing early.

Considering these findings, https://bbs.archlinux.org/viewtopic.php … 6#p2223896 would you prefer I move on to testing 'linux-ring-recovery', or would you rather have the results from longer-term linux-amdgpu 6.13.arch1-2 testing?

I've also set the contents of /sys/module/drm/parameters/debug to '15'. Would there be any mesa debug logging available (location of file or data) other than the journal, and would you be interested in it at this point? If it's only in the journal, could I search/grep for a term that would filter it out?

Yes. 0xf in hexadecimal is 15 decimal.

EDIT:
I've found I can consistently induce a graphical freeze in XFCE by entering the command 'radeontop' in 'Application Finder' ([Alt]+[F2]).
With my current setup, it's recoverable by switching tty, killing 'radeontop' and 'xinit', then rerunning 'startx'.
Since this isn't a complete system freeze with this setup anyway, it may not prove useful for producing a system freeze with different mesa setups.
I thought this was interesting nonetheless and worth passing on.

Please continue your test of linux-amdgpu 6.13.arch1-2.
Regarding the freeze with radeontop: that's a different story. The fixes included in this build, and the fix that is going to be merged into mainline, add extra register accesses, so running any application that also tries to read GPU registers could now be tricky...
Update: could you post the dmesg output from when radeontop was running?

Last edited by Mechanicus (2025-02-01 20:46:18)

Offline

#407 2025-02-01 20:14:07

Mechanicus
Member
Registered: 2025-01-13
Posts: 95

Re: Issues with Mesa 24.3.x and amdgpu Vega graphics

grayich wrote:

Is it normal that after "cat /sys/kernel/debug/dri/1/amdgpu_gpu_recover" the GPU frequency rises to the maximum (1240 MHz) and does not drop?
GPU load is 99%, according to nvtop
Ryzen 3 3200G, Vega8

On what build?

Offline

#408 2025-02-01 20:43:52

Mechanicus
Member
Registered: 2025-01-13
Posts: 95

Re: Issues with Mesa 24.3.x and amdgpu Vega graphics

AMDGPU
Build: linux-ring-recovery-6.13.1.arch1-2 - freezes
Included patches:
- Extend amdgpu_ring_soft_recovery function with PG control v2 (now tries to recover GFX and Compute rings)

What to check:
- stability

Kernel option to keep during testing period: fsck.mode=force

Last edited by Mechanicus (2025-02-01 22:34:09)

Offline

#409 2025-02-01 21:38:06

grayich
Member
Registered: 2012-02-24
Posts: 4

Re: Issues with Mesa 24.3.x and amdgpu Vega graphics

Mechanicus wrote:

On what build?

Any

6.12.10-arch1-1
mesa 24.3.4
mesa-test-git 25.0.0_devel.200908.66775c89fce-1

Offline

#410 2025-02-01 21:50:40

flemingfleming
Member
Registered: 2024-12-27
Posts: 14

Re: Issues with Mesa 24.3.x and amdgpu Vega graphics

Mechanicus wrote:

Was the crash similar to the one on the unpatched kernel? This build was aimed at checking whether it fixes the amdgpu_gpu_recovery mechanism.

Sorry for the confusion. I was testing and reporting the "original" freeze that happens randomly. It makes sense if this build isn't supposed to fix that (so far, only linux-amdgpu 6.13.arch1-2 seems to fix it).

All of the kernels I have tried so far have crashed on trying to read /sys/kernel/debug/dri/1/amdgpu_gpu_recover, regardless of logging output.

Mechanicus wrote:

Well, if enabling logging changes the behavior of the driver, that's another funny story. :)

It's probably just affecting the way I was triggering it. When I enable all the logging, performance goes down massively (from about 60 to 30 fps when playing videos and similar).

With your latest linux-ring-recovery-6.13.1.arch1-2, I triggered the second type of crash (the recovery crash) by reading /sys/kernel/debug/dri/1/amdgpu_gpu_recover with the kernel parameter drm.debug=0x2, and got some debug output that I've recorded here:

https://gist.github.com/fleming-2/dc963 … tcrash-log

Offline

#411 2025-02-01 22:03:09

Mechanicus
Member
Registered: 2025-01-13
Posts: 95

Re: Issues with Mesa 24.3.x and amdgpu Vega graphics

Thank you, @flemingfleming!

Offline

#412 2025-02-02 03:45:22

kode54
Member
Registered: 2013-10-21
Posts: 42

Re: Issues with Mesa 24.3.x and amdgpu Vega graphics

I tested linux-ring-recovery-6.13.arch1-1 with drm.debug=0x2, enabled by writing to the parameter after boot. I tested amdgpu_gpu_recovery. It managed to bring my displays back up almost immediately, but CRTC flipping hung, and the GPU never recovered. I generated a dmesg log:

https://gist.github.com/kode54/ddc47a5c … dacf482888

And from this excerpt, you can see a message being spammed from the moment I turned on drm.debug=0x2:

[   92.165875] amdgpu 0000:03:00.0: [drm:amdgpu_dm_crtc_vblank_control_worker [amdgpu]] dc_allow_idle_optimizations_internal: enabled
[   92.239661] amdgpu 0000:03:00.0: [drm:amdgpu_dm_crtc_vblank_control_worker [amdgpu]] dc_allow_idle_optimizations_internal: disabled
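
If the spam makes the log hard to read, a small filter like the one below could help. This is only a sketch: the pattern is taken from the two lines above, and the function name filter_idle_spam is made up for illustration.

```shell
# Drop the repeating vblank-worker lines so other messages stand out.
# The pattern comes from the excerpt above; adjust it if your log differs.
filter_idle_spam() {
    grep -v 'dc_allow_idle_optimizations_internal'
}

# Usage (reading the kernel log usually needs root):
#   sudo dmesg | filter_idle_spam
```

Filtering after the fact keeps the full log intact, so nothing is lost if the spammed message turns out to matter.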

Offline

#413 2025-02-02 04:53:18

NotAnArchUser
Member
Registered: 2025-01-25
Posts: 9

Re: Issues with Mesa 24.3.x and amdgpu Vega graphics

Mechanicus wrote:
NotAnArchUser wrote:

Mechanicus, I'm now testing patches proposed by Alex Deucher on GitLab. So I suppose my kernel is now equivalent to your linux-amdgpu 6.13.arch1-2 to some degree?

Correct. The differences are that my changes affect all chips and put the GPU into power-efficient mode faster.

Unfortunately, I got another freeze while using mpv last night (the freeze may be unrelated to this bug).

Last edited by NotAnArchUser (2025-02-02 11:17:17)

Offline

#414 2025-02-02 07:35:42

kode54
Member
Registered: 2013-10-21
Posts: 42

Re: Issues with Mesa 24.3.x and amdgpu Vega graphics

6.13.arch1-2 holding steady for me. This includes gaming and running mpv HEVC decoding using gpu-next output.

Offline

#415 2025-02-02 08:26:54

lpr1
Member
Registered: 2017-10-08
Posts: 109

Re: Issues with Mesa 24.3.x and amdgpu Vega graphics

Mechanicus wrote:
grayich wrote:

Is it normal that after "cat /sys/kernel/debug/dri/1/amdgpu_gpu_recover" the GPU frequency rises to the maximum (1240 MHz) and does not drop?
GPU load is 99%, according to nvtop
Ryzen 3 3200G, Vega8

On what build?

The same thing happens to me (also on 6.12.10-arch1-1 with mesa from the repo, on a 3400G). Interestingly, according to radeontop, only the graphics pipe is 100% utilized; everything else sits at 0% or a varying value.

Graphics pipe 100,00%
Event Engine   0,00%
Vertex Grouper + Tesselator   0,00%
Texture Addresser   0,00%
Shader Export   0,00%
Sequencer Instruction Cache   0,00%
Shader Interpolator   0,00%
Scan Converter   0,00%
Primitive Assembly   0,00%
Depth Block   0,00%
Color Block   0,00%
52M / 40M VRAM 129,43%
516M / 15950M GTT   3,24%
1,33G / 1,33G Memory Clock 100,00%
1,40G / 1,40G Shader Clock 100,00%                           

Confirmed with:

watch -n1 sudo 'cat /sys/kernel/debug/dri/1/amdgpu_pm_info | grep "MHz"'
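
To log just the shader clock over time instead of watching it, something like this might work. It is a rough sketch that assumes amdgpu_pm_info prints lines shaped like "1400 MHz (SCLK)"; the layout may differ between kernel versions, and the function name sclk_mhz is only illustrative.

```shell
# Print the SCLK value in MHz from amdgpu_pm_info-style text on stdin.
# ASSUMPTION: the clock appears on a line like "<number> MHz (SCLK)".
sclk_mhz() {
    sed -n 's/^[[:space:]]*\([0-9][0-9]*\) MHz (SCLK).*/\1/p'
}

# Usage (debugfs needs root):
#   sudo cat /sys/kernel/debug/dri/1/amdgpu_pm_info | sclk_mhz
```

If the number stays pinned at the maximum while the desktop is idle, that matches what is described above.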

Last edited by lpr1 (2025-02-02 08:30:48)

Offline

#416 2025-02-02 08:35:02

kode54
Member
Registered: 2013-10-21
Posts: 42

Re: Issues with Mesa 24.3.x and amdgpu Vega graphics

As an aside, also noting once again that amdgpu_top exists.

Offline

#417 2025-02-02 08:47:25

NuSkool
Member
Registered: 2015-03-23
Posts: 205

Re: Issues with Mesa 24.3.x and amdgpu Vega graphics

Report: @ ~35 hours on the following setup:

pacman -Q linux-amdgpu : linux-amdgpu 6.13.arch1-2
uname -rs                            : Linux 6.13.0-arch1-2-amdgpu
pacman -Q mesa                : mesa 1:24.3.4-1

Kernel parameters. (nothing added to normal config)
cat /proc/cmdline  : .............. rw loglevel=3 sysrq_always_enabled=1 amd_pstate=passive fsck.mode=force
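
A quick way to confirm that a requested option is really active is sketched below. The helper name has_kernel_param is hypothetical; it only does a whole-word match on the command line text.

```shell
# Succeeds if the given parameter appears as a whole word in a kernel command line.
#   $1: command line text
#   $2: parameter to look for, such as fsck.mode=force
has_kernel_param() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage on a live system:
#   has_kernel_param "$(cat /proc/cmdline)" fsck.mode=force && echo "fsck.mode=force is set"
```

Padding both sides with spaces keeps a match from succeeding on a longer parameter that merely contains the text.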


This test kernel seems pretty robust. I have not been able to induce a system freeze.

Regarding the freeze mentioned previously: https://bbs.archlinux.org/viewtopic.php … 8#p2223938
It seems to freeze only the XFCE session, and it is caused by trying to start the terminal application 'radeontop' from the XFCE "Application Finder", which is intended for apps listed in the menu. Entering other terminal apps does nothing, as if the commands were ignored altogether. During a freeze, I can switch tty, log in as a different user and start a new XFCE session via startx. Only the 'radeontop' application causes this freeze, and there's nothing logged to journal/dmesg. I also believe it's unrelated to mesa or the kernel.

Let me know if I can provide any additional info and/or switch testing to something different.

Last edited by NuSkool (2025-02-02 08:57:10)

Offline

#418 2025-02-02 09:19:16

Mechanicus
Member
Registered: 2025-01-13
Posts: 95

Re: Issues with Mesa 24.3.x and amdgpu Vega graphics

NotAnArchUser wrote:
Mechanicus wrote:
NotAnArchUser wrote:

Mechanicus, I'm now testing patches proposed by Alex Deucher on GitLab. So I suppose my kernel is now equivalent to your linux-amdgpu 6.13.arch1-2 to some degree?

Correct. The differences are that my changes affect all chips and put the GPU into power-efficient mode faster.

Unfortunately, I got another freeze while using mpv last night

That's why I didn't support the proposed "fix". And that's why we are here and doing experiments.

Offline

#419 2025-02-02 09:22:07

Mechanicus
Member
Registered: 2025-01-13
Posts: 95

Re: Issues with Mesa 24.3.x and amdgpu Vega graphics

NuSkool wrote:

Report: @ ~35 hours on the following setup:

pacman -Q linux-amdgpu : linux-amdgpu 6.13.arch1-2
uname -rs                            : Linux 6.13.0-arch1-2-amdgpu
pacman -Q mesa                : mesa 1:24.3.4-1

Kernel parameters. (nothing added to normal config)
cat /proc/cmdline  : .............. rw loglevel=3 sysrq_always_enabled=1 amd_pstate=passive fsck.mode=force


This test kernel seems pretty robust. I have not been able to induce a system freeze.

Regarding the freeze mentioned previously: https://bbs.archlinux.org/viewtopic.php … 8#p2223938
It seems to freeze only the XFCE session, and it is caused by trying to start the terminal application 'radeontop' from the XFCE "Application Finder", which is intended for apps listed in the menu. Entering other terminal apps does nothing, as if the commands were ignored altogether. During a freeze, I can switch tty, log in as a different user and start a new XFCE session via startx. Only the 'radeontop' application causes this freeze, and there's nothing logged to journal/dmesg. I also believe it's unrelated to mesa or the kernel.

Let me know if I can provide any additional info and/or switch testing to something different.

"there's nothing logged to journal/dmesg" - that's weird. I was expecting to see warnings about register read timeouts.

Offline

#420 2025-02-02 09:32:45

pacoandres
Member
Registered: 2020-03-05
Posts: 39

Re: Issues with Mesa 24.3.x and amdgpu Vega graphics

Sorry, but I'm getting lost in this thread.
By now I'm not sure whether I'm testing the same linux-amdgpu 6.13.arch1-2 as @NuSkool, or which version other people are testing. I've also lost track of which Mesa versions are being tested now.

If you think it could help, I've created a GitHub project just to use its issue tracker, so we can use one issue per test and have all the results gathered together. I think this could make the developers' and testers' job easier, and anyone who wants to join the search for a solution would have a place to start without reading this ever-growing thread.

This is the link to the project: https://github.com/pacoandres/laikm.
Anyone who wants permissions on it, just ask me. Any suggestions or criticisms are welcome.

For now, I think I'm testing the same configuration as @NuSkool, with no issues.

Offline

#421 2025-02-02 10:21:18

NotAnArchUser
Member
Registered: 2025-01-25
Posts: 9

Re: Issues with Mesa 24.3.x and amdgpu Vega graphics

Mechanicus wrote:

That's why I didn't support the proposed "fix". And that's why we are here and doing experiments.

Do you think I should try your kernel? If so, can you share it as patches, please?

Besides, I've realized I may have caused some confusion here, because I still experience freezes even with mesa 24.2.8.

Last edited by NotAnArchUser (2025-02-02 10:26:17)

Offline

#422 2025-02-02 10:38:10

kode54
Member
Registered: 2013-10-21
Posts: 42

Re: Issues with Mesa 24.3.x and amdgpu Vega graphics

Mechanicus wrote:

linux-amdgpu 6.13.arch1-2

What to check:
- idle stability
- workflow stability
- glxgears window resizing
- vkcube window resizing

What to report:
- stability
- performance
- temperatures
- power consumption (if available)
- dmesg (if possible)

Kernel option to keep during testing period: fsck.mode=force

- idle stability: OK
- workflow stability: OK
- glxgears window resizing: OK
- vkcube window resizing: OK both X11 and -wayland variants

- Also played a 4k HEVC movie for about 20 minutes to test decoder stability

- stability: Stable for all intents and purposes
- performance: Seems to be fine
- temperatures: chart here
- power consumption: not available to Beszel for some reason
- dmesg: gist

Kernel commandline: (/proc/cmdline)

splash quiet root=UUID=23028a63-b140-422f-a6d6-9fc61a90db87 rw rootfstype=btrfs rootflags=rw,relatime,compress=zstd:3,ssd,discard=async,space_cache=v2,subvolid=256,subvol=/@ lsm=landlock,lockdown,yama,integrity,apparmor,bpf usbcore.autosuspend=-1 fsck.mode=force

Offline

#423 2025-02-02 10:44:56

Mechanicus
Member
Registered: 2025-01-13
Posts: 95

Re: Issues with Mesa 24.3.x and amdgpu Vega graphics

NotAnArchUser wrote:
Mechanicus wrote:

That's why I didn't support the proposed "fix". And that's why we are here and doing experiments.

Do you think I should try your kernel? If so, can you share it as patches, please?

Besides, I've realized I may have caused some confusion here, because I still experience freezes even with mesa 24.2.8.

I keep the patches updated on my Linux branch. You can also download them from here: https://drive.google.com/drive/folders/ … drive_link
These patches are included in linux-amdgpu 6.13.arch1-2, the most stable one.

Last edited by Mechanicus (2025-02-02 10:55:41)

Offline

#424 2025-02-02 11:06:54

kode54
Member
Registered: 2013-10-21
Posts: 42

Re: Issues with Mesa 24.3.x and amdgpu Vega graphics

I'm building a new kernel from linux-zen in the Arch repos, with the only change being your latest three patches applied. Let me know if there's anything different I should test.

Offline

#425 2025-02-02 12:29:46

Mechanicus
Member
Registered: 2025-01-13
Posts: 95

Re: Issues with Mesa 24.3.x and amdgpu Vega graphics

AMDGPU testing
Build: linux-amdgpu-6.13.1.arch1-5 - freezes.
Included patches:
- Disallow GFXOFF during amdgpu_ring_commit in amdgpu_ib_schedule

Build: linux-amdgpu-testing-6.13.1.arch1-1 - freezes.
Included patches:
- Deny GFXOFF for amdgpu_job_run

Last edited by Mechanicus (2025-02-02 15:33:58)

Offline
