#351 Yesterday 10:45:23

pacmancrashedagain
Member
Registered: 2024-12-14
Posts: 19

Re: Issues with Mesa 24.3.x and amdgpu Vega graphics

@orbit-oc

Yes, you are 100% right, sorry, I always thought mine was Raven Ridge.

    Device: AMD Radeon Vega 8 Graphics (radeonsi, raven, ACO, DRM 3.54, 6.6.72-1-lts) (0x15d8)
OpenGL renderer string: AMD Radeon Vega 8 Graphics (radeonsi, raven, ACO, DRM 3.54, 6.6.72-1-lts)

Last edited by pacmancrashedagain (Yesterday 12:15:52)

Offline

#352 Yesterday 11:47:53

NotAnArchUser
Member
Registered: 2025-01-25
Posts: 6

Re: Issues with Mesa 24.3.x and amdgpu Vega graphics

Another freeze on an unpatched kernel with amdgpu.ppfeaturemask=0xffff7bcf (which includes the proposed 0xf7fff) after 3 days of uptime and active usage.

I've been using the PP_GFXOFF_MASK feature mask (0xf7fff) for more than a month and the freezes never stopped despite that. Now they just happen less frequently, and I'm not sure whether this mask alone is responsible for that.

Btw, with glxinfo | grep radeonsi my system reports a raven2 iGPU architecture, but the CPU itself (Athlon 300U) has the Picasso architecture.

P.S.: Using amdgpu.mes=1 had no effect on system stability. Dropping it now.
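
To make the masks in this post easier to compare, here is a minimal Python sketch that lists which feature bits a given amdgpu.ppfeaturemask value leaves cleared. The bit names are transcribed from enum PP_FEATURE_MASK in the kernel's amd_shared.h and should be treated as an assumption to verify against your own kernel tree; the helper name disabled_features is made up for illustration.

```python
# Bit names transcribed from enum PP_FEATURE_MASK in
# drivers/gpu/drm/amd/include/amd_shared.h (assumption: verify against your tree).
PP_FEATURE_BITS = {
    0x1: "PP_SCLK_DPM_MASK",
    0x2: "PP_MCLK_DPM_MASK",
    0x4: "PP_PCIE_DPM_MASK",
    0x8: "PP_SCLK_DEEP_SLEEP_MASK",
    0x10: "PP_POWER_CONTAINMENT_MASK",
    0x20: "PP_UVD_HANDSHAKE_MASK",
    0x40: "PP_SMC_VOLTAGE_CONTROL_MASK",
    0x80: "PP_VBI_TIME_SUPPORT_MASK",
    0x100: "PP_ULV_MASK",
    0x200: "PP_ENABLE_GFX_CG_THRU_SMU",
    0x400: "PP_CLOCK_STRETCH_MASK",
    0x800: "PP_OD_FUZZY_FAN_CONTROL_MASK",
    0x1000: "PP_SOCCLK_DPM_MASK",
    0x2000: "PP_DCEFCLK_DPM_MASK",
    0x4000: "PP_OVERDRIVE_MASK",
    0x8000: "PP_GFXOFF_MASK",
    0x10000: "PP_ACG_MASK",
    0x20000: "PP_STUTTER_MODE",
    0x40000: "PP_AVFS_MASK",
    0x80000: "PP_GFX_DCS_MASK",
}


def disabled_features(mask):
    """Return the names of known feature bits that are cleared in mask."""
    return [name for bit, name in sorted(PP_FEATURE_BITS.items()) if not mask & bit]


# The two masks mentioned in this post.
for value in (0xf7fff, 0xffff7bcf):
    print(hex(value), disabled_features(value))
```

With these assumed names, 0xf7fff clears only PP_GFXOFF_MASK, while 0xffff7bcf clears PP_GFXOFF_MASK plus three other bits, so the second mask does include the first.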

Offline

#353 Yesterday 12:27:42

Mechanicus
Member
Registered: 2025-01-13
Posts: 48

Re: Issues with Mesa 24.3.x and amdgpu Vega graphics

For All AMDGPU

Changelog: https://github.com/SeryogaBrigada/linux … 13-amdgpu/

Don't forget to select the test kernel manually in boot menu.
Download link: https://drive.google.com/drive/folders/ … KOx34jmcRx

Last edited by Mechanicus (Yesterday 20:09:52)

Offline

#354 Yesterday 12:43:07

pacoandres
Member
Registered: 2020-03-05
Posts: 20

Re: Issues with Mesa 24.3.x and amdgpu Vega graphics

As my system gets hot with the kernel parameter, I've monitored the APU power consumption with the Mesa patch and with the kernel parameter while working as usual. Each number is the average consumption over 2 hours of work.

  • Mesa patch with no kernel parameters ~8 W

  • Kernel parameter with no Mesa patch ~11 W
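
For anyone who wants to collect comparable numbers, here is a rough Python sketch of averaging power samples read from a hwmon attribute. It is not how the figures above were measured. The hwmon interface reports power in microwatts; the example path in the comment is an assumption, since the hwmon index and the attribute name (power1_average or power1_input) differ between systems, and the function names are made up for illustration.

```python
import time
from pathlib import Path


def read_microwatts(path):
    """Read one power sample (in microwatts) from a hwmon attribute file."""
    return int(Path(path).read_text().strip())


def average_watts(samples_uw):
    """Average a list of microwatt samples and convert the result to watts."""
    return sum(samples_uw) / len(samples_uw) / 1_000_000


def sample_average(path, count, interval_s=1.0):
    """Take count samples, interval_s seconds apart, and return the mean in watts."""
    samples = []
    for _ in range(count):
        samples.append(read_microwatts(path))
        time.sleep(interval_s)
    return average_watts(samples)


# Hypothetical usage; adjust the path to whatever your system exposes:
# sample_average("/sys/class/drm/card0/device/hwmon/hwmon2/power1_average", 7200)
```

Going by the two averages above, the kernel parameter costs roughly 3 W more than the Mesa patch on this APU, about 37% extra.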

Now I'm going to test @Mechanicus's kernel patch https://bbs.archlinux.org/viewtopic.php … 0#p2223480

Last edited by pacoandres (Yesterday 12:43:45)

Offline

#355 Yesterday 12:44:06

SnowF
Member
Registered: 2025-01-17
Posts: 10

Re: Issues with Mesa 24.3.x and amdgpu Vega graphics

NotAnArchUser wrote:

Another freeze on an unpatched kernel with amdgpu.ppfeaturemask=0xffff7bcf (which includes the proposed 0xf7fff) after 3 days of uptime and active usage.

I've been using the PP_GFXOFF_MASK feature mask (0xf7fff) for more than a month and the freezes never stopped despite that. Now they just happen less frequently, and I'm not sure whether this mask alone is responsible for that.

Btw, with glxinfo | grep radeonsi my system reports a raven2 iGPU architecture, but the CPU itself (Athlon 300U) has the Picasso architecture.

P.S.: Using amdgpu.mes=1 had no effect on system stability. Dropping it now.

Can you try the following and replicate a crash?

amdgpu.ppfeaturemask=0xfff77fff

Offline

#356 Yesterday 15:29:27

NotAnArchUser
Member
Registered: 2025-01-25
Posts: 6

Re: Issues with Mesa 24.3.x and amdgpu Vega graphics

SnowF wrote:

Can you try the following and replicate a crash?

I have no way to reliably reproduce the crash. Besides, why are you proposing this feature flag?

Still, PP_GFX_DCS_MASK will be turned off in my mask at least until next freeze. I don't think it'll do any harm.
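
As a sketch of the arithmetic, turning one feature off means clearing its bit in the existing mask. The snippet below assumes PP_GFX_DCS_MASK is 0x80000, as in the kernel's amd_shared.h (please verify against your tree); clear_feature is a made-up helper name.

```python
PP_GFX_DCS_MASK = 0x80000  # assumed value of the bit in amd_shared.h


def clear_feature(mask, bit):
    """Return mask with the given feature bit cleared, kept to 32 bits."""
    return mask & ~bit & 0xFFFFFFFF


# Start from the mask used earlier in the thread and drop PP_GFX_DCS_MASK.
new_mask = clear_feature(0xffff7bcf, PP_GFX_DCS_MASK)
print(f"amdgpu.ppfeaturemask=0x{new_mask:08x}")
```

Under that assumption the resulting parameter is amdgpu.ppfeaturemask=0xfff77bcf, which keeps every other bit of the previous mask unchanged.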

Offline

#357 Yesterday 17:42:16

pacoandres
Member
Registered: 2020-03-05
Posts: 20

Re: Issues with Mesa 24.3.x and amdgpu Vega graphics

Mechanicus wrote:
For All AMDGPU

I've found that in every driver the register access commands use a 100 ms delay to control GFXOFF (see https://github.com/torvalds/linux/blob/ … gfx.c#L812 and https://github.com/torvalds/linux/blob/ … gfx.c#L38).
The drivers control GFXOFF in the following way (https://github.com/torvalds/linux/blob/ … _0.c#L4287):
    amdgpu_gfx_off_ctrl(adev, false); /* turn GPU ON with delay */
    /* ... read/write registers ... */
    amdgpu_gfx_off_ctrl(adev, true);  /* turn GPU OFF for power saving with delay */

Since calls like that are executed in every internal queue (kernel interface queue, graphics queue, compute queue, video core queue), there is a possibility that a pending GFXOFF from one queue will be executed while registers in another queue are being accessed, resulting in a hang.
The current patch I made changes the delay from 100 ms to 0. Please test the updated build without extra kernel options. Please uninstall the previous build manually before installation.
Don't forget to select the test kernel manually in boot menu.
https://drive.google.com/drive/folders/ … KOx34jmcRx
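
The hypothesis quoted above can be illustrated with a toy timeline in Python. This is only a model of the described hazard, not the kernel's logic: the real amdgpu_gfx_off_ctrl has its own bookkeeping that is not reproduced here, and the function name, queue names, and timings are all invented for illustration.

```python
def find_hazards(events, delay_ms, cancel_pending_on_begin):
    """Replay (time_ms, queue, action) events and report unsafe power-offs.

    action is "begin" (the queue starts touching registers) or "end" (the
    queue is done and requests power-off after delay_ms). A hazard is a
    pending power-off that fires while some queue is still active.
    """
    pending_off_at = None  # time at which the delayed power-off fires
    active = set()         # queues currently accessing registers
    hazards = []
    for t, queue, action in sorted(events):
        # Fire a due power-off before handling the event at time t.
        if pending_off_at is not None and pending_off_at <= t:
            if active:
                hazards.append((pending_off_at, sorted(active)))
            pending_off_at = None
        if action == "begin":
            active.add(queue)
            if cancel_pending_on_begin:
                pending_off_at = None
        else:
            active.discard(queue)
            pending_off_at = t + delay_ms
    return hazards


# Invented timeline: "gfx" finishes at 10 ms, "compute" is busy from 50 to 200 ms.
events = [(0, "gfx", "begin"), (10, "gfx", "end"),
          (50, "compute", "begin"), (200, "compute", "end")]
print(find_hazards(events, 100, False))  # delayed off, never cancelled
print(find_hazards(events, 100, True))   # delayed off, cancelled on new use
print(find_hazards(events, 0, False))    # zero delay
```

In this toy timeline the 100 ms delay lets the power-off requested by the first queue land at 110 ms, while the second queue is active; a zero delay or cancelling the pending request avoids that. It says nothing about whether the real driver actually behaves this way.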

After approximately 1 hour 30 minutes, the system froze with this kernel, without parameters and with official Mesa 24.3.4.
The only log is:

ene 30 18:28:32 monelle kernel: amdgpu 0000:09:00.0: amdgpu: Dumping IP State
ene 30 18:28:41 monelle kernel: clocksource: Long readout interval, skipping watchdog check: cs_nsec: 3665974895 wd_nsec: 3665972630
ene 30 18:28:41 monelle kernel: sched: DL replenish lagged too much

(I think the last two lines aren't related to the bug).
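
For others collecting logs after a freeze, a small Python sketch that pulls the kernel messages of the previous boot and keeps only the amdgpu lines could look like this. It relies on journalctl's -k (kernel messages) and -b -1 (previous boot) options; the function names are made up for illustration, and a persistent journal is assumed.

```python
import subprocess


def amdgpu_lines(lines):
    """Keep only the log lines that mention amdgpu."""
    return [line for line in lines if "amdgpu" in line]


def previous_boot_kernel_log():
    """Return the kernel log of the previous boot as a list of lines."""
    result = subprocess.run(
        ["journalctl", "-k", "-b", "-1", "--no-pager"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()


# Typical usage after rebooting from a freeze:
# for line in amdgpu_lines(previous_boot_kernel_log()):
#     print(line)
```

Applied to the three lines above, the filter keeps only the "Dumping IP State" message.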

As a reminder, I have a Ryzen 5 3400G (Picasso architecture).

Offline

#358 Yesterday 17:47:33

Mechanicus
Member
Registered: 2025-01-13
Posts: 48

Re: Issues with Mesa 24.3.x and amdgpu Vega graphics

Big thanks, @pacoandres! Continuing the investigation.

Offline

#359 Yesterday 19:11:27

SnowF
Member
Registered: 2025-01-17
Posts: 10

Re: Issues with Mesa 24.3.x and amdgpu Vega graphics

Mechanicus wrote:
For All AMDGPU

I've found that in every driver the register access commands use a 100 ms delay to control GFXOFF (see https://github.com/torvalds/linux/blob/ … gfx.c#L812 and https://github.com/torvalds/linux/blob/ … gfx.c#L38).
The drivers control GFXOFF in the following way (https://github.com/torvalds/linux/blob/ … _0.c#L4287):
    amdgpu_gfx_off_ctrl(adev, false); /* turn GPU ON with delay */
    /* ... read/write registers ... */
    amdgpu_gfx_off_ctrl(adev, true);  /* turn GPU OFF for power saving with delay */

Since calls like that are executed in every internal queue (kernel interface queue, graphics queue, compute queue, video core queue), there is a possibility that a pending GFXOFF from one queue will be executed while registers in another queue are being accessed, resulting in a hang.
The current patch I made changes the delay from 100 ms to 0. Please test the updated build without extra kernel options. Please uninstall the previous build manually before installation.
Don't forget to select the test kernel manually in boot menu.
https://drive.google.com/drive/folders/ … KOx34jmcRx

Freeze while watching a 4K video, 20 to 25 minutes after boot, without kernel parameters. I also tried without hardware acceleration and it crashed even faster.
The only thing that still doesn't crash for me under any load level is amdgpu.ppfeaturemask=0xf7fff
Ryzen 3 3200G (Picasso) gfx9 Vega 8.

Last edited by SnowF (Yesterday 19:11:53)

Offline

#360 Yesterday 19:23:46

flemingfleming
Member
Registered: 2024-12-27
Posts: 7

Re: Issues with Mesa 24.3.x and amdgpu Vega graphics

I can now reproduce the issue more easily. Testing amdgpu.mes=1 seemed to work at first, but I was able to trigger the freeze deliberately, so it is only an improvement, not a fix. amdgpu.ppfeaturemask=0xfff73fff seems to work; I can't trigger a freeze with it.

I'll try the provided kernel next.

Offline

#361 Yesterday 19:54:08

Mechanicus
Member
Registered: 2025-01-13
Posts: 48

Re: Issues with Mesa 24.3.x and amdgpu Vega graphics

Thanks guys! I went deeper, cleaned the "temporary workarounds" out of the GFX9 part and moved some calls to the general part. If this works, the amdgpu driver will lose a lot of weight soon.
A new build is ready at the same link.

Last edited by Mechanicus (Yesterday 20:24:33)

Offline

#362 Yesterday 22:50:32

NuSkool
Member
Registered: 2015-03-23
Posts: 195

Re: Issues with Mesa 24.3.x and amdgpu Vega graphics

Test report for Lone_Wolf's latest patched mesa: https://bbs.archlinux.org/viewtopic.php … 0#p2223190

mesa-test-git 25.0.0_devel.200908.66775c89fce-1
All other relevant pkgs official Arch repos
No additional kernel parameters

Ran an 8-hour 4K video overnight, plus intentionally heavy GPU loads to try to induce a freeze.
Running well after 24 hours of testing. No freezing or anything abnormal to report.
This will be my "reliable fallback" setup until a fix is released.


Switching setup to test Mechanicus patched kernel (above) next, with the following setup:

linux-test 6.13.arch1-1  uname -rs: Linux 6.13.0-arch1-1-test
mesa 1:24.3.4-1
All other relevant pkgs official Arch repos
No kernel parameters


Hardware:
CPU: Ryzen 5 PRO 2400GE w/ Radeon Vega 11 Graphics   Zen level: v3
Mesa chipset detection*: raven

$ glxinfo | grep radeonsi
    Device: AMD Radeon Vega 11 Graphics (radeonsi, raven, ACO, DRM 3.59, 6.12.10-arch1-1) (0x15dd)
OpenGL renderer string: AMD Radeon Vega 11 Graphics (radeonsi, raven, ACO, DRM 3.59, 6.12.10-arch1-1)

* For details see: https://bbs.archlinux.org/viewtopic.php … 6#p2223446
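
Since several reports in this thread hinge on which chip name Mesa detects, here is a minimal Python sketch that extracts it from a radeonsi renderer string like the ones above. It assumes the chip name is the first field after "radeonsi" and the kernel release is the last, which matches the strings posted here but is not a documented format; parse_renderer is a made-up helper name.

```python
import re


def parse_renderer(renderer):
    """Extract the chip name and kernel release from a radeonsi renderer string.

    Returns None when the string has no "(radeonsi, ...)" group.
    """
    match = re.search(r"\(radeonsi, ([^)]*)\)", renderer)
    if match is None:
        return None
    fields = [field.strip() for field in match.group(1).split(",")]
    return {"chip": fields[0], "kernel": fields[-1]}


line = "AMD Radeon Vega 11 Graphics (radeonsi, raven, ACO, DRM 3.59, 6.12.10-arch1-1)"
print(parse_renderer(line))
```

For the string above this yields the chip "raven" and the kernel "6.12.10-arch1-1".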


EDIT:

I believe I've downloaded two different linux testing kernels that have the same "<name/version>pkg.tar.zst", at different times...
Could I get verification I'm testing the correct latest patched kernel please?

$ sudo md5sum /boot/vmlinuz-linux-test /boot/initramfs-linux-test.img           edit: Would this be/require reproducible?
01fbbd2254d12a8347102fe4c663d94d  /boot/vmlinuz-linux-test
85db26903d6d9f70ace421e0eeffe731  /boot/initramfs-linux-test.img

Or just the package name would work as well:

$ md5sum *
d132c21b04ddb401f1213e53dc47d0af  linux-test-6.13.arch1-1-x86_64.pkg.tar.zst
babfec3ec67a7ad42afd218ec77c02cf  linux-test-headers-6.13.arch1-1-x86_64.pkg.tar.zst
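
For comparing downloads across machines, hashing the package files is probably the more useful check, since the initramfs is generated locally and its checksum will generally differ from one system to the next. A minimal Python sketch of the same check, using only hashlib, follows; file_digest is a made-up helper name, and sha256 is used as the default simply because it is a stronger hash than md5.

```python
import hashlib


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash a file in chunks so large packages need not fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Hypothetical usage, matching the md5sum output above:
# file_digest("linux-test-6.13.arch1-1-x86_64.pkg.tar.zst", "md5")
```

Two people holding the same package file should get identical digests regardless of when they downloaded it.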

Last edited by NuSkool (Today 00:35:25)

Offline

#363 Yesterday 23:23:07

flemingfleming
Member
Registered: 2024-12-27
Posts: 7

Re: Issues with Mesa 24.3.x and amdgpu Vega graphics

I was able to crash the most recent patched kernel from @Mechanicus (I didn't test the previous one since a new one was uploaded)

Only relevant message in journal was

Jan 30 23:14:44 v330-14arr kernel: amdgpu 0000:03:00.0: amdgpu: Dumping IP State

Ryzen 5 2500U (Raven Ridge)

Offline

#364 Today 00:09:00

Mechanicus
Member
Registered: 2025-01-13
Posts: 48

Re: Issues with Mesa 24.3.x and amdgpu Vega graphics

ALL AMDGPU

Build: linux-amdgpu-6.13.arch1-1

Changelog:
- do not use 100ms delay for GFXOFF operations
- cleanup gfx_off workarounds in gfx_v9_0.c
- extend amdgpu_gfx_*_ring_begin_use/end_use functions with PG control
Details

What to report:
- stability
- performance
- temperatures
- power consumption (if available)

Download link

Last edited by Mechanicus (Today 00:36:37)

Offline

#365 Today 01:38:25

flemingfleming
Member
Registered: 2024-12-27
Posts: 7

Re: Issues with Mesa 24.3.x and amdgpu Vega graphics

Was able to trigger the crash on this latest kernel from @Mechanicus too. Pretty bad freeze actually, REISUB didn't work, no messages in journal this time.

(Ryzen 5 2500U (Raven Ridge))

Offline

#366 Today 02:14:40

NuSkool
Member
Registered: 2015-03-23
Posts: 195

Re: Issues with Mesa 24.3.x and amdgpu Vega graphics

Test report on Mechanicus patched kernel with the following setup:
This was the kernel before this new one "linux-amdgpu-6.13.arch1-1" was posted.

linux-test 6.13.arch1-1  uname -rs: Linux 6.13.0-arch1-1-test
official repo mesa 1:24.3.4-1
No kernel parameters

Ran about 3 hours before freezing.
No extra GPU stress/any testing other than watching youtube vids.
Nothing in journal to report.
Typical Mesa freeze, except that REISUB dropped to a console but did not reboot.

Last edited by NuSkool (Today 02:27:16)

Offline
