@orbit-oc
Yes, you are 100% right, sorry. I always thought mine was Raven Ridge.
Device: AMD Radeon Vega 8 Graphics (radeonsi, raven, ACO, DRM 3.54, 6.6.72-1-lts) (0x15d8)
OpenGL renderer string: AMD Radeon Vega 8 Graphics (radeonsi, raven, ACO, DRM 3.54, 6.6.72-1-lts)
Last edited by pacmancrashedagain (Yesterday 12:15:52)
Offline
Another freeze on unpatched kernel with amdgpu.ppfeaturemask=0xffff7bcf (which includes proposed 0xf7fff) after 3 days of uptime and active usage.
I've been using the PP_GFXOFF_MASK feature mask (0xf7fff) for more than a month and it never stopped the freezes. They just happen less frequently now, and I'm not sure this mask alone is responsible for that.
Btw, with glxinfo | grep radeonsi my system reports the raven2 iGPU architecture, but the CPU itself (Athlon 300U) has the Picasso architecture.
P.S.: Using amdgpu.mes=1 had no effect on system stability. Dropping it now.
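For anyone comparing the masks posted in this thread, here is a small Python sketch that lists which power-play feature bits a given amdgpu.ppfeaturemask value leaves cleared. The bit table is my transcription of enum PP_FEATURE_MASK from the kernel's amd_shared.h and should be treated as an assumption; check it against the source of the kernel you actually run. The helper name disabled_features is made up for this example.

```python
# Assumed bit layout (transcribed from enum PP_FEATURE_MASK in the kernel's
# amd_shared.h; verify against your kernel version before relying on it).
PP_FEATURE_BITS = {
    "PP_SCLK_DPM_MASK": 0x1,
    "PP_MCLK_DPM_MASK": 0x2,
    "PP_PCIE_DPM_MASK": 0x4,
    "PP_SCLK_DEEP_SLEEP_MASK": 0x8,
    "PP_POWER_CONTAINMENT_MASK": 0x10,
    "PP_UVD_HANDSHAKE_MASK": 0x20,
    "PP_SMC_VOLTAGE_CONTROL_MASK": 0x40,
    "PP_VBI_TIME_SUPPORT_MASK": 0x80,
    "PP_ULV_MASK": 0x100,
    "PP_ENABLE_GFX_CG_THRU_SMU": 0x200,
    "PP_CLOCK_STRETCH_MASK": 0x400,
    "PP_OD_FUZZY_FAN_CONTROL_MASK": 0x800,
    "PP_SOCCLK_DPM_MASK": 0x1000,
    "PP_DCEFCLK_DPM_MASK": 0x2000,
    "PP_OVERDRIVE_MASK": 0x4000,
    "PP_GFXOFF_MASK": 0x8000,
    "PP_ACG_MASK": 0x10000,
    "PP_STUTTER_MODE": 0x20000,
    "PP_AVFS_MASK": 0x40000,
    "PP_GFX_DCS_MASK": 0x80000,
}


def disabled_features(mask: int) -> list:
    """Return the names of the known feature bits that are NOT set in mask."""
    return [name for name, bit in PP_FEATURE_BITS.items() if not mask & bit]


# Masks mentioned in this thread.
for mask in (0xF7FFF, 0xFFFF7BCF, 0xFFF77FFF):
    print(hex(mask), "clears", disabled_features(mask))
# 0xf7fff clears ['PP_GFXOFF_MASK']
# 0xffff7bcf clears ['PP_POWER_CONTAINMENT_MASK', 'PP_UVD_HANDSHAKE_MASK', 'PP_CLOCK_STRETCH_MASK', 'PP_GFXOFF_MASK']
# 0xfff77fff clears ['PP_GFXOFF_MASK', 'PP_GFX_DCS_MASK']
```

If the table is right, every mask here clears the GFXOFF bit, and they differ only in which other features they also switch off.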
Offline
Changelog: https://github.com/SeryogaBrigada/linux … 13-amdgpu/
Don't forget to select the test kernel manually in boot menu.
Download link: https://drive.google.com/drive/folders/ … KOx34jmcRx
Last edited by Mechanicus (Yesterday 20:09:52)
Offline
As my system gets hot with the kernel parameter, I've monitored the APU power consumption with the Mesa patch and with the kernel parameter while working as usual. Each figure is the average consumption over 2 hours of work.
Mesa patch with no kernel parameters ~8 W
Kernel parameter with no Mesa patch ~11 W
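To put the two averages side by side, a quick bit of arithmetic in Python. The two wattages are the approximate figures above; everything else is simply derived from them, so the result is only as precise as those readings.

```python
# Approximate 2-hour averages reported above, in watts.
mesa_patch_w = 8.0
kernel_param_w = 11.0
hours = 2.0

extra_w = kernel_param_w - mesa_patch_w   # additional average draw
extra_pct = 100 * extra_w / mesa_patch_w  # relative to the Mesa-patch setup
extra_wh = extra_w * hours                # additional energy over the test window

print(f"Kernel parameter draws about {extra_w:.1f} W more ({extra_pct:.1f} %),")
print(f"roughly {extra_wh:.1f} Wh extra over {hours:.0f} hours.")
```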
Now I'm going to test @Mechanicus's kernel patch https://bbs.archlinux.org/viewtopic.php … 0#p2223480
Last edited by pacoandres (Yesterday 12:43:45)
Offline
Can you try the following and replicate a crash?
amdgpu.ppfeaturemask=0xfff77fff
Offline
Can you try the following and replicate a crash?
I have no way to reliably reproduce the crash. Besides, why are you proposing this feature flag?
Still, PP_GFX_DCS_MASK will be turned off in my mask at least until next freeze. I don't think it'll do any harm.
Offline
For All AMDGPU
I've found that in every driver the register access commands use a 100 ms delay to control GFXOFF (see https://github.com/torvalds/linux/blob/ … gfx.c#L812 and https://github.com/torvalds/linux/blob/ … gfx.c#L38).
The drivers control GFXOFF in the following way (https://github.com/torvalds/linux/blob/ … _0.c#L4287):
amdgpu_gfx_off_ctrl(adev, false); <-- turn GPU ON with delay
... read/write registers ...
amdgpu_gfx_off_ctrl(adev, true); <-- turn GPU OFF for power saving with delay
Since calls like these are executed in every internal queue (kernel interface queue, graphics queue, compute queue, video core queue), there is a possibility that a pending GFXOFF from one queue will be executed while registers are being accessed in another queue, resulting in a hang.
My current patch changes the delay from 100 ms to 0. Please test the updated build without extra kernel options. Please uninstall the previous build manually before installation.
Don't forget to select the test kernel manually in boot menu.
https://drive.google.com/drive/folders/ … KOx34jmcRx
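To make the suspected interleaving easier to picture, here is a toy Python model of the scenario described above. It is deliberately naive and is not the driver's logic: there is one shared power flag and one pending "off" timer, with no reference counting and no timer cancellation (a real driver may well have both). The class ToyGfxOff, the TIMELINE events and the function hazards are all invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ToyGfxOff:
    """Hypothetical model: shared power state plus a single delayed 'off'."""
    delay_ms: int
    powered: bool = False
    pending_off_at: Optional[int] = None

    def tick(self, now: int) -> None:
        # Fire the pending power-off once its deadline has passed.
        if self.pending_off_at is not None and now >= self.pending_off_at:
            self.powered = False
            self.pending_off_at = None

    def ctrl(self, allow_off: bool, now: int) -> None:
        if allow_off:
            # Power saving allowed again: schedule the off after the delay.
            self.pending_off_at = now + self.delay_ms
        else:
            # Power on for register access. The pending timer is NOT
            # cancelled here; that omission is the hazard being illustrated.
            self.powered = True
        self.tick(now)


# (time in ms, queue, action): queue A finishes early, queue B starts late.
TIMELINE = [
    (0, "A", "on"),
    (10, "A", "off"),
    (105, "B", "on"),
    (110, "B", "access"),
    (120, "B", "off"),
]


def hazards(delay_ms: int) -> List[Tuple[int, str]]:
    """Return (time, queue) pairs where registers were touched while powered off."""
    gpu = ToyGfxOff(delay_ms)
    found = []
    for now, queue, action in TIMELINE:
        gpu.tick(now)
        if action == "on":
            gpu.ctrl(False, now)
        elif action == "off":
            gpu.ctrl(True, now)
        elif action == "access" and not gpu.powered:
            found.append((now, queue))
    return found


print("delay 100 ms:", hazards(100))  # delay 100 ms: [(110, 'B')]
print("delay 0 ms:  ", hazards(0))    # delay 0 ms:   []
```

In this toy timeline, queue A's delayed off (due at 110 ms) lands in the middle of queue B's register access, while a zero delay leaves nothing pending. That only shows the idea behind the patch; it says nothing about whether the real driver behaves this way.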
After approximately 1 hour 30 minutes, the system froze with this kernel, without parameters and with official mesa 24.3.4.
The only log is:
ene 30 18:28:32 monelle kernel: amdgpu 0000:09:00.0: amdgpu: Dumping IP State
ene 30 18:28:41 monelle kernel: clocksource: Long readout interval, skipping watchdog check: cs_nsec: 3665974895 wd_nsec: 3665972630
ene 30 18:28:41 monelle kernel: sched: DL replenish lagged too much
(I think the last two lines aren't related to the bug).
As a reminder, I've a Ryzen 5 3400G (Picasso architecture)
Offline
Big thanks, @pacoandres! Continuing the investigation.
Offline
It froze while I was watching a 4K video, 20~25 minutes after boot, without kernel parameters. I also tried without hardware acceleration and it crashed even faster.
The only thing that still doesn't crash me under any load level is amdgpu.ppfeaturemask=0xf7fff
Ryzen 3 3200G (Picasso) gfx9 Vega 8.
Last edited by SnowF (Yesterday 19:11:53)
Offline
I can now reproduce the issue more easily. Testing amdgpu.mes=1 seemed to work at first, but I was able to trigger the freeze deliberately, so it is only an improvement. amdgpu.ppfeaturemask=0xfff73fff seems to work; I can't trigger a freeze with it.
I'll try the provided kernel next.
Offline
Thanks guys! I went deeper, cleaned the "temporary workarounds" out of the GFX9 part and moved some calls to the general part. If this works, the amdgpu driver will lose a lot of weight soon.
New build is ready on the same link.
Last edited by Mechanicus (Yesterday 20:24:33)
Offline
Test report for Lone_Wolf's latest patched mesa: https://bbs.archlinux.org/viewtopic.php … 0#p2223190
mesa-test-git 25.0.0_devel.200908.66775c89fce-1
All other relevant pkgs official Arch repos
No additional kernel parameters
Ran an 8-hour 4K video overnight, and also intentionally heavy GPU loads to try to induce a freeze.
Running well after 24 hours of testing. No freezing or anything abnormal to report.
This will be my "reliable fallback" setup until a fix is released.
Switching setup to test Mechanicus patched kernel (above) next, with the following setup:
linux-test 6.13.arch1-1 uname -rs: Linux 6.13.0-arch1-1-test
mesa 1:24.3.4-1
All other relevant pkgs official Arch repos
No kernel parameters
Hardware:
CPU: Ryzen 5 PRO 2400GE w/ Radeon Vega 11 Graphics Zen level: v3
Mesa chipset detection*: raven
$ glxinfo | grep radeonsi
Device: AMD Radeon Vega 11 Graphics (radeonsi, raven, ACO, DRM 3.59, 6.12.10-arch1-1) (0x15dd)
OpenGL renderer string: AMD Radeon Vega 11 Graphics (radeonsi, raven, ACO, DRM 3.59, 6.12.10-arch1-1)
* For details see: https://bbs.archlinux.org/viewtopic.php … 6#p2223446
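Since several posts in this thread compare what Mesa detects (raven vs raven2) against the marketing name of the APU, here is a short Python sketch that pulls the chipset name, DRM version, kernel and PCI ID out of a glxinfo line. The regular expression only assumes the layout seen in the lines quoted in this thread; other Mesa versions may format the string differently, and the function name parse_radeonsi is hypothetical.

```python
import re

# Matches e.g. "(radeonsi, raven, ACO, DRM 3.54, 6.6.72-1-lts) (0x15d8)".
# The trailing PCI ID only appears on the "Device:" line, so it is optional.
RADEONSI_RE = re.compile(
    r"\(radeonsi, (?P<chip>\w+), .*?DRM (?P<drm>[\d.]+), (?P<kernel>[^)]+)\)"
    r"(?: \((?P<pci>0x[0-9a-fA-F]+)\))?"
)


def parse_radeonsi(line: str):
    """Return a dict with chip, drm, kernel and pci (or None if no match)."""
    match = RADEONSI_RE.search(line)
    return match.groupdict() if match else None


sample = (
    "Device: AMD Radeon Vega 8 Graphics "
    "(radeonsi, raven, ACO, DRM 3.54, 6.6.72-1-lts) (0x15d8)"
)
print(parse_radeonsi(sample))
# {'chip': 'raven', 'drm': '3.54', 'kernel': '6.6.72-1-lts', 'pci': '0x15d8'}
```

Feeding it the output of glxinfo | grep radeonsi line by line gives a consistent way to report the detected chipset alongside the CPU model.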
EDIT:
I believe I've downloaded two different linux testing kernels that have the same "<name/version>pkg.tar.zst", at different times...
Could I get verification I'm testing the correct latest patched kernel please?
$ sudo md5sum /boot/vmlinuz-linux-test /boot/initramfs-linux-test.img
01fbbd2254d12a8347102fe4c663d94d /boot/vmlinuz-linux-test
85db26903d6d9f70ace421e0eeffe731 /boot/initramfs-linux-test.img
(edit: Would this be/require reproducible?)
Or just the package name would work as well:
$ md5sum *
d132c21b04ddb401f1213e53dc47d0af linux-test-6.13.arch1-1-x86_64.pkg.tar.zst
babfec3ec67a7ad42afd218ec77c02cf linux-test-headers-6.13.arch1-1-x86_64.pkg.tar.zst
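For testers who want to script this comparison, a minimal Python sketch using hashlib. The checksums in EXPECTED are just the ones I computed above and are used as placeholders; the authoritative values would have to come from whoever uploaded the packages. The function names md5_of and check_packages are made up for this example.

```python
import hashlib
from pathlib import Path

# Placeholder reference values: the checksums posted above, NOT confirmed
# by the package uploader.
EXPECTED = {
    "linux-test-6.13.arch1-1-x86_64.pkg.tar.zst": "d132c21b04ddb401f1213e53dc47d0af",
    "linux-test-headers-6.13.arch1-1-x86_64.pkg.tar.zst": "babfec3ec67a7ad42afd218ec77c02cf",
}


def md5_of(path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_packages(expected: dict, directory=".") -> dict:
    """Map each file name to True if it exists and its MD5 matches."""
    results = {}
    for name, want in expected.items():
        path = Path(directory) / name
        results[name] = path.is_file() and md5_of(path) == want.lower()
    return results


if __name__ == "__main__":
    for name, ok in check_packages(EXPECTED).items():
        print("OK      " if ok else "MISMATCH", name)
```

MD5 is fine for telling two uploads apart, which is all that is needed here; it is not a protection against tampering.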
Last edited by NuSkool (Today 00:35:25)
Offline
I was able to crash the most recent patched kernel from @Mechanicus (I didn't test the previous one since a new one was uploaded)
Only relevant message in journal was
Jan 30 23:14:44 v330-14arr kernel: amdgpu 0000:03:00.0: amdgpu: Dumping IP State
Ryzen 5 2500U (Raven Ridge)
Offline
Build: linux-amdgpu-6.13.arch1-1
Changelog:
- do not use 100ms delay for GFXOFF operations
- cleanup gfx_off workarounds in gfx_v9_0.c
- extend amdgpu_gfx_*_ring_begin_use/end_use functions with PG control
Details
What to report:
- stability
- performance
- temperatures
- power consumption (if available)
Last edited by Mechanicus (Today 00:36:37)
Offline
Was able to trigger the crash on this latest kernel from @Mechanicus too. Pretty bad freeze actually, REISUB didn't work, no messages in journal this time.
(Ryzen 5 2500U (Raven Ridge))
Offline
Test report on Mechanicus patched kernel with the following setup:
This was the kernel before this new one "linux-amdgpu-6.13.arch1-1" was posted.
linux-test 6.13.arch1-1 uname -rs: Linux 6.13.0-arch1-1-test
official repo mesa 1:24.3.4-1
No kernel parameters
Ran about 3 hours before freezing.
No extra GPU stress/any testing other than watching youtube vids.
Nothing in journal to report.
A typical mesa freeze, except that REISUB dropped to the console but did not reboot.
Last edited by NuSkool (Today 02:27:16)
Offline