@orbit-oc
Yes, you are 100% right, sorry; I always thought mine was Raven Ridge.
Device: AMD Radeon Vega 8 Graphics (radeonsi, raven, ACO, DRM 3.54, 6.6.72-1-lts) (0x15d8)
OpenGL renderer string: AMD Radeon Vega 8 Graphics (radeonsi, raven, ACO, DRM 3.54, 6.6.72-1-lts)
Last edited by pacmancrashedagain (2025-01-30 12:15:52)
Another freeze on the unpatched kernel with amdgpu.ppfeaturemask=0xffff7bcf (which includes the proposed 0xf7fff) after 3 days of uptime and active usage.
I've been using the PP_GFXOFF_MASK feature mask (0xf7fff) for more than a month, and it never stopped the freezes. They just happen less frequently now, and I'm not sure this exact mask alone is responsible for that.
By the way, glxinfo | grep radeonsi reports a raven2 iGPU on my system, but the CPU itself (Athlon 300U) is Picasso.
P.S.: Using amdgpu.mes=1 had no effect on system stability. Dropping it now.
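To see which power features these masks actually switch off, here is a small Python sketch that decodes a ppfeaturemask value. The bit names and values are taken from the kernel's enum PP_FEATURE_MASK in amd_shared.h as I recall them, covering bits 0 to 19 only; please double-check them against the header for your kernel version before relying on the result.

```python
# Feature bits from the kernel's enum PP_FEATURE_MASK (amd_shared.h), as recalled.
# Bits above 0x80000 are not listed here and are ignored by the decoder.
PP_FEATURES = {
    0x1: "PP_SCLK_DPM_MASK",
    0x2: "PP_MCLK_DPM_MASK",
    0x4: "PP_PCIE_DPM_MASK",
    0x8: "PP_SCLK_DEEP_SLEEP_MASK",
    0x10: "PP_POWER_CONTAINMENT_MASK",
    0x20: "PP_UVD_HANDSHAKE_MASK",
    0x40: "PP_SMC_VOLTAGE_CONTROL_MASK",
    0x80: "PP_VBI_TIME_SUPPORT_MASK",
    0x100: "PP_ULV_MASK",
    0x200: "PP_ENABLE_GFX_CG_THRU_SMU",
    0x400: "PP_CLOCK_STRETCH_MASK",
    0x800: "PP_OD_FUZZY_FAN_CONTROL_MASK",
    0x1000: "PP_SOCCLK_DPM_MASK",
    0x2000: "PP_DCEFCLK_DPM_MASK",
    0x4000: "PP_OVERDRIVE_MASK",
    0x8000: "PP_GFXOFF_MASK",
    0x10000: "PP_ACG_MASK",
    0x20000: "PP_STUTTER_MODE",
    0x40000: "PP_AVFS_MASK",
    0x80000: "PP_GFX_DCS_MASK",
}


def disabled_features(mask):
    """Names of the known feature bits that are cleared in `mask`."""
    return [name for bit, name in PP_FEATURES.items() if not mask & bit]


for m in (0xF7FFF, 0xFFFF7BCF):
    print(hex(m), disabled_features(m))
```

With these values, 0xf7fff clears only PP_GFXOFF_MASK among bits 0 to 19, while 0xffff7bcf additionally clears PP_POWER_CONTAINMENT_MASK, PP_UVD_HANDSHAKE_MASK and PP_CLOCK_STRETCH_MASK.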
Changelog: https://github.com/SeryogaBrigada/linux … 13-amdgpu/
Don't forget to select the test kernel manually in the boot menu.
Download link: https://drive.google.com/drive/folders/ … KOx34jmcRx
Last edited by Mechanicus (2025-01-30 20:09:52)
Since my system runs hot with the kernel parameter, I monitored the APU power consumption with the Mesa patch and with the kernel parameter while working as usual. Each figure is the average consumption over 2 hours of work.
Mesa patch with no kernel parameters ~8 W
Kernel parameter with no Mesa patch ~11 W
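For anyone who wants to collect comparable numbers, this sketch averages power samples and computes the relative difference between two setups. The sysfs path in the comment is an assumption: amdgpu exposes power readings through hwmon in microwatts, but the exact file name (power1_average or power1_input) and whether it exists at all on a given APU varies by kernel and chip.

```python
import time


def sample_power(path, count, interval_s):
    """Read an hwmon power file `count` times, `interval_s` seconds apart (microwatts)."""
    samples = []
    for _ in range(count):
        with open(path) as f:
            samples.append(int(f.read().strip()))
        time.sleep(interval_s)
    return samples


def average_watts(samples_uw):
    """Mean of power samples given in microwatts, returned in watts."""
    return sum(samples_uw) / len(samples_uw) / 1_000_000


def percent_increase(base_w, new_w):
    """Relative increase of `new_w` over `base_w`, in percent."""
    return (new_w - base_w) / base_w * 100


# ASSUMED path, adjust for your system, e.g.:
#   /sys/class/drm/card0/device/hwmon/hwmon2/power1_average
# Two hours at one sample every 10 s is 720 samples:
#   average_watts(sample_power(path, 720, 10))
print(f"{percent_increase(8, 11):.1f}% more with the kernel parameter")
```

With the averages above, 11 W versus 8 W works out to a 37.5% increase.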
Now I'm going to test @Mechanicus's kernel patch https://bbs.archlinux.org/viewtopic.php … 0#p2223480
Last edited by pacoandres (2025-01-30 12:43:45)
Can you try the following and replicate a crash?
amdgpu.ppfeaturemask=0xfff77fff
I have no way to reliably reproduce the crash. Besides, why are you proposing this feature flag?
Still, PP_GFX_DCS_MASK will be turned off in my mask at least until next freeze. I don't think it'll do any harm.
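For reference, a mask like the proposed 0xfff77fff can be built by clearing individual feature bits from an all-ones value. A minimal sketch, assuming PP_GFXOFF_MASK = 0x8000 and PP_GFX_DCS_MASK = 0x80000 as I recall them from amd_shared.h:

```python
PP_GFXOFF_MASK = 0x8000    # values as recalled from amd_shared.h
PP_GFX_DCS_MASK = 0x80000


def clear_features(base, *bits):
    """Return `base` with the given feature bits cleared, as a 32-bit value."""
    result = base
    for bit in bits:
        result &= ~bit
    return result & 0xFFFFFFFF


mask = clear_features(0xFFFFFFFF, PP_GFXOFF_MASK, PP_GFX_DCS_MASK)
print(f"amdgpu.ppfeaturemask={mask:#x}")  # amdgpu.ppfeaturemask=0xfff77fff
```

Clearing only PP_GFXOFF_MASK from the same all-ones base would give 0xffff7fff instead.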
For All AMDGPU
I've found that in every driver, the register access commands use a 100 ms delay to control GFXOFF (see https://github.com/torvalds/linux/blob/ … gfx.c#L812 and https://github.com/torvalds/linux/blob/ … gfx.c#L38).
The drivers control GFXOFF in the following way (https://github.com/torvalds/linux/blob/ … _0.c#L4287):
amdgpu_gfx_off_ctrl(adev, false); <-- turn GPU ON with delay
... read/write registers ...
amdgpu_gfx_off_ctrl(adev, true); <-- turn GPU OFF for power saving with delay
Since calls like these execute in every internal queue (kernel interface queue, graphics queue, compute queue, video core queue), a pending GFXOFF from one queue may execute while registers are being accessed from another queue, resulting in a hang.
The current patch I made changes the delay from 100 ms to 0. Please test the updated build without extra kernel options, and uninstall the previous build manually before installing it.
Don't forget to select the test kernel manually in the boot menu.
https://drive.google.com/drive/folders/ … KOx34jmcRx
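To make the suspected race concrete, here is a toy Python model of the pattern above; it is not amdgpu code, and all names are illustrative. A refcounted gate enters GFXOFF delay_ms after the last user releases it, and power-on is modelled as immediate. The cancel_on_get flag is a hypothetical knob for comparison: it says whether a new "keep GFX on" request cancels an already scheduled GFXOFF. When it does not, a 100 ms deferred GFXOFF from queue A lands inside queue B's register accesses; with a 0 ms delay, GFXOFF is entered immediately and queue B's request simply powers GFX back on.

```python
# Toy model of the suspected GFXOFF race. NOT amdgpu code. Times in milliseconds.
class GfxOffGate:
    def __init__(self, delay_ms, cancel_on_get):
        self.delay_ms = delay_ms
        self.cancel_on_get = cancel_on_get  # hypothetical: does a new request cancel pending GFXOFF?
        self.req_count = 0       # users that currently need GFX powered on
        self.pending_at = None   # time at which a deferred GFXOFF fires
        self.gfx_off = True      # True while GFX is powered down

    def tick(self, now):
        """Fire the deferred GFXOFF once its deadline has passed."""
        if self.pending_at is not None and now >= self.pending_at:
            self.gfx_off = True
            self.pending_at = None

    def ctrl(self, enable, now):
        """enable=False keeps GFX on (refcount up); enable=True releases it."""
        self.tick(now)
        if not enable:
            self.req_count += 1
            self.gfx_off = False  # power-on modelled as immediate
            if self.cancel_on_get:
                self.pending_at = None
        else:
            self.req_count -= 1
            if self.req_count == 0:
                self.pending_at = now + self.delay_ms
                self.tick(now)  # with delay 0, GFXOFF is entered right away


def queue_b_hits_gfxoff(delay_ms, cancel_on_get):
    """True if queue B touches registers while GFX is powered down."""
    gate = GfxOffGate(delay_ms, cancel_on_get)
    gate.ctrl(False, 0)    # queue A: keep GFX on, access registers...
    gate.ctrl(True, 10)    # ...then release; GFXOFF gets scheduled
    gate.ctrl(False, 50)   # queue B: keep GFX on
    hit = False
    for t in range(50, 150, 10):  # queue B register accesses
        gate.tick(t)
        hit = hit or gate.gfx_off
    gate.ctrl(True, 150)   # queue B: release
    return hit


for delay, cancel in ((100, False), (100, True), (0, False)):
    print(f"delay={delay} ms, cancel_on_get={cancel}: hang={queue_b_hits_gfxoff(delay, cancel)}")
```

The model only shows why a deferred, uncancelled GFXOFF can overlap another queue's accesses; whether the real driver behaves this way is exactly what the test builds are meant to find out.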
After approximately 1 hour 30 minutes, the system froze with this kernel, no kernel parameters, and the official mesa 24.3.4.
The only log is:
ene 30 18:28:32 monelle kernel: amdgpu 0000:09:00.0: amdgpu: Dumping IP State
ene 30 18:28:41 monelle kernel: clocksource: Long readout interval, skipping watchdog check: cs_nsec: 3665974895 wd_nsec: 3665972630
ene 30 18:28:41 monelle kernel: sched: DL replenish lagged too much
(I think the last two lines aren't related to the bug).
As a reminder, I have a Ryzen 5 3400G (Picasso architecture).
Big thanks, @pacoandres! Continuing the investigation.
It froze while I was watching a 4K video, 20 to 25 minutes after boot. I also tried without hardware acceleration, and it crashed even faster. No kernel parameters.
The only thing that still doesn't crash my system under any load level is amdgpu.ppfeaturemask=0xf7fff
Ryzen 3 3200G (Picasso) gfx9 Vega 8.
Last edited by SnowF (2025-01-30 19:11:53)
I can now reproduce the issue more easily. amdgpu.mes=1 seemed to work at first, but I was able to trigger the freeze deliberately, so it is only an improvement. amdgpu.featuremask=0xfff73fff seems to work; I can't trigger a freeze with it.
I'll try the provided kernel next.
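The 0xfff73fff mask mentioned above differs from the 0xfff77fff proposal by a single bit. This sketch XORs two masks and names the differing bits. It assumes the amd_shared.h bit values as I recall them, and it assumes 0xfff7bfff is the driver's default ppfeaturemask (also my recollection; check amdgpu_drv.c for your kernel).

```python
# Only the bits relevant here, values as recalled from amd_shared.h.
KNOWN_BITS = {0x4000: "PP_OVERDRIVE_MASK", 0x8000: "PP_GFXOFF_MASK", 0x80000: "PP_GFX_DCS_MASK"}


def _bit_names(bits):
    """Name every set bit in `bits`, falling back to hex for unknown ones."""
    names = []
    bit = 1
    while bit <= bits:
        if bits & bit:
            names.append(KNOWN_BITS.get(bit, hex(bit)))
        bit <<= 1
    return names


def mask_diff(a, b):
    """Return (bits set only in a, bits set only in b) as name lists."""
    return _bit_names(a & ~b), _bit_names(b & ~a)


print(mask_diff(0xFFF77FFF, 0xFFF73FFF))  # proposal vs tested mask
print(mask_diff(0xFFF7BFFF, 0xFFF73FFF))  # assumed default vs tested mask
```

Under those assumptions, 0xfff73fff is the default mask with only GFXOFF turned off, while 0xfff77fff additionally turns overdrive on.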
Thanks, guys! I went deeper, cleaned the "temporary workarounds" out of the GFX9 part, and moved some calls into the common code. If this works, the amdgpu driver will lose a lot of weight soon.
New build is ready on the same link.
Last edited by Mechanicus (2025-01-30 20:24:33)
Test report for Lone_Wolf's latest patched mesa: https://bbs.archlinux.org/viewtopic.php … 0#p2223190
mesa-test-git 25.0.0_devel.200908.66775c89fce-1
All other relevant pkgs official Arch repos
No additional kernel parameters
Ran an 8-hour 4K video overnight, and also deliberately ran heavy GPU loads trying to induce a freeze.
Running well after 24 hours of testing. No freezes or anything abnormal to report.
This will be my "reliable fallback" setup until a fix is released.
Switching setup to test Mechanicus patched kernel (above) next, with the following setup:
linux-test 6.13.arch1-1 uname -rs: Linux 6.13.0-arch1-1-test
mesa 1:24.3.4-1
All other relevant pkgs official Arch repos
No kernel parameters
Hardware:
CPU: Ryzen 5 PRO 2400GE w/ Radeon Vega 11 Graphics Zen level: v3
Mesa chipset detection*: raven
$ glxinfo | grep radeonsi
Device: AMD Radeon Vega 11 Graphics (radeonsi, raven, ACO, DRM 3.59, 6.12.10-arch1-1) (0x15dd)
OpenGL renderer string: AMD Radeon Vega 11 Graphics (radeonsi, raven, ACO, DRM 3.59, 6.12.10-arch1-1)
* For details see: https://bbs.archlinux.org/viewtopic.php … 6#p2223446
EDIT:
I believe I've downloaded two different linux test kernels with the same "<name/version>pkg.tar.zst" file name at different times...
Could someone please verify that I'm testing the correct, latest patched kernel?
$ sudo md5sum /boot/vmlinuz-linux-test /boot/initramfs-linux-test.img
(Edit: would this be reproducible, or would it require a reproducible build?)
01fbbd2254d12a8347102fe4c663d94d /boot/vmlinuz-linux-test
85db26903d6d9f70ace421e0eeffe731 /boot/initramfs-linux-test.img
Or the package checksums would work just as well:
$ md5sum *
d132c21b04ddb401f1213e53dc47d0af linux-test-6.13.arch1-1-x86_64.pkg.tar.zst
babfec3ec67a7ad42afd218ec77c02cf linux-test-headers-6.13.arch1-1-x86_64.pkg.tar.zst
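A note on comparing hashes: on Arch the initramfs is generated locally by mkinitcpio, so its checksum is not expected to match anyone else's, whereas the package files (or the vmlinuz they install) should. A minimal Python sketch of the same check md5sum does, assuming the uploader publishes digests to compare against (the EXPECTED mapping in the comment is hypothetical):

```python
import hashlib


def md5_of(path, chunk_size=1 << 20):
    """Hex MD5 digest of a file, read in chunks (same result as md5sum)."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def mismatches(expected):
    """Return the files whose MD5 differs from the expected hex digest."""
    return [name for name, digest in expected.items() if md5_of(name) != digest.lower()]


# Hypothetical usage with digests published alongside the download:
#   EXPECTED = {"linux-test-6.13.arch1-1-x86_64.pkg.tar.zst": "<published md5>"}
#   print(mismatches(EXPECTED) or "all packages match")
```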
Last edited by NuSkool (2025-01-31 00:35:25)
Scripts I use: https://github.com/Cody-Learner
I was able to crash the most recent patched kernel from @Mechanicus (I didn't test the previous one since a new one was uploaded)
Only relevant message in journal was
Jan 30 23:14:44 v330-14arr kernel: amdgpu 0000:03:00.0: amdgpu: Dumping IP State
Ryzen 5 2500U (Raven Ridge)
Build: linux-amdgpu-6.13.arch1-1
Changelog:
- do not use 100ms delay for GFXOFF operations
- cleanup gfx_off workarounds in gfx_v9_0.c
- extend amdgpu_gfx_*_ring_begin_use/end_use functions with PG control
Details
What to report:
- stability
- performance
- temperatures
- power consumption (if available)
Last edited by Mechanicus (2025-01-31 00:36:37)
I was able to trigger the crash on this latest kernel from @Mechanicus too. It was a pretty bad freeze: REISUB didn't work, and there were no messages in the journal this time.
(Ryzen 5 2500U (Raven Ridge))
Test report on Mechanicus patched kernel with the following setup:
This is the kernel from before the new "linux-amdgpu-6.13.arch1-1" build was posted.
linux-test 6.13.arch1-1 uname -rs: Linux 6.13.0-arch1-1-test
official repo mesa 1:24.3.4-1
No kernel parameters
Ran about 3 hours before freezing.
No extra GPU stress or other testing, just watching YouTube videos.
Nothing in the journal to report.
A typical Mesa freeze, except that REISUB dropped to the console but did not reboot.
Last edited by NuSkool (2025-01-31 02:27:16)
It froze in less than 1 minute on an empty desktop, twice, with nothing in the logs.
I've been reading the documentation, but I can't find a way to enable verbose logging for amdgpu. Does anyone know if it's possible?
Try
echo 0xf > /sys/module/drm/parameters/debug
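The value written there is a bitmask of DRM debug categories. This sketch decodes it, using the drm.debug category bits as I recall them from the DRM documentation (DRMRES only exists on newer kernels). The same value can also be set at boot with drm.debug=0xf on the kernel command line; either way it can produce a lot of log output.

```python
# drm.debug category bits, as recalled from the DRM debug documentation.
DRM_DEBUG_BITS = {
    0x001: "CORE", 0x002: "DRIVER", 0x004: "KMS", 0x008: "PRIME",
    0x010: "ATOMIC", 0x020: "VBL", 0x040: "STATE", 0x080: "LEASE",
    0x100: "DP", 0x200: "DRMRES",
}


def drm_debug_categories(value):
    """Names of the drm.debug categories enabled by `value`."""
    return [name for bit, name in DRM_DEBUG_BITS.items() if value & bit]


def set_drm_debug(value, path="/sys/module/drm/parameters/debug"):
    """Same as `echo <value> > path`; needs root."""
    with open(path, "w") as f:
        f.write(hex(value))


print(drm_debug_categories(0xF))
```

So 0xf enables the CORE, DRIVER, KMS and PRIME categories.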
Thanks. I'll turn it on when testing.
glibc-2.40
Build: linux-amdgpu 6.13.arch1-2 - stable.
Included patches:
- do not use 100ms delay for GFXOFF operations
- extend amdgpu_gfx_*_ring_begin_use/end_use functions with PG control (updated)
Build: linux-ring-test 6.13.arch1-1 - freezes.
Included patches:
- Add-extended-ring-test-to-amdgpu_ring_test_helper
Build: linux-amdgpu 6.13.arch1-3 - still freeze.
Included patches:
- Do-not-use-100ms-delay-for-GFXOFF-operations
- Cleanup-gfx_off-workarounds-in-gfx_v9_0.c
- extend amdgpu_gfx_*_ring_begin_use/end_use functions with PG control (v2)
What to check:
- idle stability
- workflow stability
- glxgears window resizing
- vkcube window resizing
What to report:
- stability
- performance
- temperatures
- power consumption (if available)
- dmesg (if possible)
Kernel option to keep during testing period: fsck.mode=force
Last edited by Mechanicus (2025-02-04 17:11:38)
@Mechanicus
Hardware: Vega 8 / Ryzen 3 3200G (Picasso)
Kernel options: none
Session duration: 30m - 1h.
Crash: Yes
Triggers: window resizing (Steam), videos, switching workspaces, Discord, screen sharing, maximizing a window to full screen.
Build: linux-amdgpu 6.13.arch1-3, linux-amdgpu-headers-6.13.arch1-3
I tried about 5 times in a span of 4 hours.
The only thing that prevents my system from freezing: amdgpu.ppfeaturemask=0xf7fff
Last edited by SnowF (2025-01-31 17:44:56)
ALL AMDGPU
Build: linux-amdgpu 6.13.arch1-2, linux-amdgpu-headers-6.13.arch1-2
Included patches:
- do not use 100ms delay for GFXOFF operations
- extend amdgpu_gfx_*_ring_begin_use/end_use functions with PG control (updated)
Build: linux-ring-test 6.13.arch1-1, linux-ring-test-headers-6.13.arch1-1
Included patches:
- Add-extended-ring-test-to-amdgpu_ring_test_helper
Build: linux-amdgpu 6.13.arch1-3, linux-amdgpu-headers-6.13.arch1-3
Included patches:
- Do-not-use-100ms-delay-for-GFXOFF-operations
- Cleanup-gfx_off-workarounds-in-gfx_v9_0.c
- extend amdgpu_gfx_*_ring_begin_use/end_use functions with PG control (v2)
What to check:
- idle stability
- workflow stability
- glxgears window resizing
- vkcube window resizing
- sudo cat /sys/kernel/debug/dri/1/amdgpu_gpu_recover - may not work yet.
What to report:
- stability
- performance
- temperatures
- power consumption (if available)
- dmesg (if possible)
Kernel option to keep during testing period: fsck.mode=force
I've been testing linux-ring-test 6.13.arch1-1 for three hours with no freeze.
glxgears and vkcube window resizing work well. So does 'cat /sys/kernel/debug/dri/1/amdgpu_gpu_recover' (it returned 0 after a few seconds of black screen).
All the tests have been done in KDE plasma.
As for performance, I haven't noticed any lag, but my work isn't very heavy on the CPU or the GPU. I've also watched a few HD videos on YouTube without lag.
Power consumption is slightly higher than with the official kernel and patched Mesa: about 5 W more at idle (12 W vs 7 W), and the temperature is also higher.
I enabled amdgpu logging while testing glxgears, vkcube and the cat of amdgpu_gpu_recover. The relevant dmesg excerpt is here
I'll keep testing it.
Last edited by pacoandres (2025-01-31 19:17:35)
MR 33248 was merged to trunk in a slightly different form that only affects the raven and raven2 chipsets.
Mesa 25.0 hasn't been branched off yet, so the 25.0 release candidates and the stable release will have the change.
I have taken down the previous 25.0 builds and uploaded a new one with the change merged. I suggest that testers of the kernel parameters stick to mesa 24.3.x.
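To check which chip codename radeonsi reports on a given machine (the thread shows Picasso and Athlon parts reporting raven or raven2), here is a small parsing sketch. It assumes the "(radeonsi, <codename>, ..." format seen in the glxinfo output quoted earlier in the thread; on a live system you could feed it subprocess.run(["glxinfo"], capture_output=True, text=True).stdout.

```python
import re


def mesa_chip_names(glxinfo_text):
    """Chip codenames radeonsi reports in glxinfo output, e.g. 'raven' or 'raven2'."""
    return sorted(set(re.findall(r"\(radeonsi, ([a-z0-9_]+),", glxinfo_text)))


def reports_raven_family(names):
    """True if any reported codename is raven or raven2."""
    return any(name in ("raven", "raven2") for name in names)


sample = "Device: AMD Radeon Vega 8 Graphics (radeonsi, raven, ACO, DRM 3.54, 6.6.72-1-lts) (0x15d8)"
names = mesa_chip_names(sample)
print(names, reports_raven_family(names))
```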
Thanks, Lone_Wolf, your updated mesa package fixed it for my AMD Ryzen 5 3400G (Vega 11).
AMDGPU
Build: linux-amdgpu 6.13.arch1-4 - freezes.
Included patches:
- Do-not-use-100ms-delay-for-GFXOFF-operations
- Cleanup-gfx_off-workarounds-in-gfx_v9_0.c
- Modify-powergating-state-before-writing-to-KIQ
Details
Kernel option to keep during testing period: fsck.mode=force
Last edited by Mechanicus (2025-01-31 22:05:09)
Any feedback regarding linux-amdgpu 6.13.arch1-2?