Thanks for the thorough update on the status @Mechanicus!
I'd be game to test the kernel if those commits are present in the AUR linux-git package: https://aur.archlinux.org/packages/linux-git
Looks like it'll build current mainline.
Both your links say the commits were made 2 weeks ago, and https://www.kernel.org/ says 'mainline: 6.13 2025-01-19', so it seems AUR linux-git should build with those commits?
I'd give the default PKGBUILD build a shot in a clean chroot, but this will take a long while on my hardware.
Any suggestions before I get started building a kernel?
If/when I get through building the kernel, any suggestions on Mesa-related packages or parameters to start with?
Last edited by NuSkool (2025-01-27 19:26:00)
Update: there is now a proposed MR in mesa to fix the bisected bug! https://gitlab.freedesktop.org/mesa/mes … ests/33248
@Lone_Wolf
Maybe we should test it?
Thanks for the thorough update on the status @Mechanicus!
I'd be game to test the kernel if those commits would be present in the AUR linux-git package. https://aur.archlinux.org/packages/linux-git.
Looks like it'll build current mainline.
Both your links say committed 2 weeks ago, and https://www.kernel.org/ says 'mainline: 6.13 2025-01-19', so seems AUR linux-git should build with those commits?
I'd give the default PKGBUILD build a shot in a clean chroot, but this will take a long while on my hardware.
Any suggestions before I get started building a kernel?
If/when I get through building the kernel, any suggestions on mesa related packages or parameters to start with?
I guess you need https://aur.archlinux.org/packages/linux-mainline, or just install linux-6.13 from the Arch Linux core-testing repo (https://archlinux.org/packages/core-tes … _64/linux/). With 6.13, no extra kernel options are needed to check the expectations.
Last edited by Mechanicus (2025-01-27 20:56:48)
@kclisp
If I didn't misread the commit, it looks slightly different from the patch. Hopefully it actually fixes the issue, because I still had one freeze with the patch.
Update: there is now a proposed MR in mesa to fix the bisected bug! https://gitlab.freedesktop.org/mesa/mes … ests/33248
@Lone_Wolf
Maybe we should test it?
Yup.
New binary uploaded: mesa trunk 84b660b9229 plus mesa MR 33248
Disliking systemd intensely, but not satisfied with alternatives so focusing on taming systemd.
Is clean chroot building not flexible enough?
Try clean-chroot-manager by graysky.
@Lone_Wolf
Thanks! Build seems stable with regards to my reproducer.
Linux-6.13 from core-testing: after 2.5 hours of running multiple browser windows (two with WebGL samples and two with YouTube hardware-accelerated playback), the system froze when switching between windows.
Test with amdgpu.enforce_isolation=1: the freeze when switching between multiple browser windows was reproduced, but the error messages are now different:
[ 681.153691] amdgpu 0000:07:00.0: amdgpu: failed to write reg 28b4 wait reg 28c6
[ 698.013230] amdgpu 0000:07:00.0: amdgpu: failed to write reg 1a6f4 wait reg 1a706
[ 714.563699] amdgpu 0000:07:00.0: amdgpu: failed to write reg 28b4 wait reg 28c6
[ 714.600222] amdgpu 0000:07:00.0: amdgpu: failed to write reg 28b4 wait reg 28c6
[ 721.573168] amdgpu 0000:07:00.0: amdgpu: Dumping IP State
[ 731.499225] iwlwifi 0000:04:00.0: Unhandled alg: 0x703
[ 747.269661] amdgpu 0000:07:00.0: amdgpu: failed to write reg 1a6f4 wait reg 1a706
[ 747.753099] amdgpu 0000:07:00.0: amdgpu: failed to write reg 1a6f4 wait reg 1a706
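When comparing runs with different amdgpu options, it may help to tally which register pairs show up in these timeouts. A minimal Python sketch, assuming dmesg lines in the format shown above; count_reg_failures is a hypothetical helper, and it says nothing about what the registers actually are:

```python
import re
from collections import Counter

# Matches the timeout lines above, e.g.
# "[ 681.153691] amdgpu 0000:07:00.0: amdgpu: failed to write reg 28b4 wait reg 28c6"
# Only the message text is matched, so timestamp/syslog prefixes do not matter.
REG_FAIL = re.compile(
    r"amdgpu: failed to write reg (?P<write>[0-9a-f]+) wait reg (?P<wait>[0-9a-f]+)"
)


def count_reg_failures(log_text: str) -> Counter:
    """Count (write reg, wait reg) pairs in 'failed to write reg' lines.

    Unrelated lines (e.g. 'Dumping IP State' or iwlwifi messages) are skipped.
    """
    counts: Counter = Counter()
    for line in log_text.splitlines():
        match = REG_FAIL.search(line)
        if match:
            counts[(match.group("write"), match.group("wait"))] += 1
    return counts
```

Fed the log above, it should report three hits each for 28b4/28c6 and 1a6f4/1a706, which makes it easy to see whether a different parameter shifts the failures to other registers.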
Test with amdgpu.cwsr_enable=0: no freezes so far, regardless of system load.
Important: with amdgpu.cwsr_enable=0 the WebGL Aquarium FPS in Google Chrome increased from 23 to 30.
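Before attributing a result to a parameter, it's worth confirming that the running kernel was actually booted with it: /proc/cmdline holds the command line the current kernel started with. A small Python sketch that extracts the amdgpu.* options from such a string; amdgpu_params is a hypothetical helper, and it assumes whitespace-separated, unquoted options:

```python
def amdgpu_params(cmdline: str) -> dict[str, str]:
    """Return {name: value} for every amdgpu.<name>=<value> option in a kernel command line.

    Assumes options are separated by whitespace and not quoted; if an option
    appears more than once, the last occurrence is kept in the result.
    """
    params: dict[str, str] = {}
    for token in cmdline.split():
        if token.startswith("amdgpu.") and "=" in token:
            name, value = token[len("amdgpu."):].split("=", 1)
            params[name] = value
    return params
```

For example, amdgpu_params(open("/proc/cmdline").read()).get("cwsr_enable") should return "0" on a kernel booted with amdgpu.cwsr_enable=0, and None if the option was never passed.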
Last edited by Mechanicus (2025-01-28 14:32:16)
Linux-6.13 from core-testing: after 2.5 hours of multiple browser windows (2 with WebGL samples and 2 with YouTube HW accelerated playback) the system froze when switching between windows.
Test with amdgpu.enforce_isolation=1: freeze when switching between multiple browser windows reproduced. But the error messages now different:
[ 681.153691] amdgpu 0000:07:00.0: amdgpu: failed to write reg 28b4 wait reg 28c6
[ 698.013230] amdgpu 0000:07:00.0: amdgpu: failed to write reg 1a6f4 wait reg 1a706
[ 714.563699] amdgpu 0000:07:00.0: amdgpu: failed to write reg 28b4 wait reg 28c6
[ 714.600222] amdgpu 0000:07:00.0: amdgpu: failed to write reg 28b4 wait reg 28c6
[ 721.573168] amdgpu 0000:07:00.0: amdgpu: Dumping IP State
[ 731.499225] iwlwifi 0000:04:00.0: Unhandled alg: 0x703
[ 747.269661] amdgpu 0000:07:00.0: amdgpu: failed to write reg 1a6f4 wait reg 1a706
[ 747.753099] amdgpu 0000:07:00.0: amdgpu: failed to write reg 1a6f4 wait reg 1a706
Testing with amdgpu.cwsr_enable=0 now.
Worked the whole day with amdgpu.cwsr_enable=0 and no freezes so far. This option seems to be the only one that makes any difference on my system; I'll see whether it freezes at some point in the future.
Worked whole day with amdgpu.cwsr_enable=0 , no freezes so far, this option seems like the only one that make any difference on my system, will see if it freezes at some point in the future.
I got the same result. We have probably found at least one problematic part: https://github.com/torvalds/linux/blob/ … r_gfx9.asm
We need more volunteers to test this flag. Then we can create a patch to disable CWSR for GFX8 and GFX9. @kclisp, @Lone_Wolf, would you like to take part in the party?
Last edited by Mechanicus (2025-01-28 14:48:42)
Got a freeze on 24.3.4-1 with amdgpu.cwsr_enable=0 within the first 5 minutes of use (nothing in the logs). Going back to Lone_Wolf's mesa-test-git 25.0.0_devel.200756.84b660b9229-1 with Marek's MR; it's the only build that hasn't crashed on me yet.
There is a more promising patch from Alex Deucher: https://gitlab.freedesktop.org/drm/amd/ … te_2755333
@Lone_Wolf, we should test it, since it only affects GFX9 and is not related to Mesa. Could you please prepare a kernel build?
Last edited by Mechanicus (2025-01-28 15:45:16)
Yesterday I got another freeze while testing the amdgpu.cwsr_enable=0 parameter. Now I'm testing amdgpu.mes=1. As a reminder, I'm on Void Linux.
2025-01-27T05:52:53.04480 kern.err: [ 5201.225840] [drm:atom_op_jump [amdgpu]] *ERROR* atombios stuck in loop for more than 20secs aborting
2025-01-27T05:52:53.04491 kern.err: [ 5201.226311] [drm:amdgpu_atom_execute_table_locked [amdgpu]] *ERROR* atombios stuck executing A4A6 (len 84, WS 0, PS 0) @ 0xA4DC
2025-01-27T05:52:53.04493 kern.err: [ 5201.226722] [drm:amdgpu_atom_execute_table_locked [amdgpu]] *ERROR* atombios stuck executing CF70 (len 525, WS 0, PS 0) @ 0xCFC1
By the way, some time ago I managed to get cat /sys/kernel/debug/dri/0/amdgpu_gpu_recover working by applying amdgpu.ppfeaturemask=0xffff7bcf (here I turned off PP_POWER_CONTAINMENT_MASK, PP_UVD_HANDSHAKE_MASK, PP_CLOCK_STRETCH_MASK and PP_GFXOFF_MASK).
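The four names listed correspond to the bits that are cleared in 0xffff7bcf. To check which power features any ppfeaturemask value turns off, here is a Python sketch. The bit table is recalled from enum PP_FEATURE_MASK in the kernel's drivers/gpu/drm/amd/include/amd_shared.h, so treat it as an assumption and verify it against your own kernel tree; disabled_features is a hypothetical helper:

```python
# Bit values of enum PP_FEATURE_MASK (assumed, as recalled from amd_shared.h).
# Bits above 0x80000 are not listed here and are simply ignored.
PP_FEATURE_BITS = {
    0x1: "PP_SCLK_DPM_MASK",
    0x2: "PP_MCLK_DPM_MASK",
    0x4: "PP_PCIE_DPM_MASK",
    0x8: "PP_SCLK_DEEP_SLEEP_MASK",
    0x10: "PP_POWER_CONTAINMENT_MASK",
    0x20: "PP_UVD_HANDSHAKE_MASK",
    0x40: "PP_SMC_VOLTAGE_CONTROL_MASK",
    0x80: "PP_VBI_TIME_SUPPORT_MASK",
    0x100: "PP_ULV_MASK",
    0x200: "PP_ENABLE_GFX_CG_THRU_SMU",
    0x400: "PP_CLOCK_STRETCH_MASK",
    0x800: "PP_OD_FUZZY_FAN_CONTROL_MASK",
    0x1000: "PP_SOCCLK_DPM_MASK",
    0x2000: "PP_DCEFCLK_DPM_MASK",
    0x4000: "PP_OVERDRIVE_MASK",
    0x8000: "PP_GFXOFF_MASK",
    0x10000: "PP_ACG_MASK",
    0x20000: "PP_STUTTER_MODE",
    0x40000: "PP_AVFS_MASK",
    0x80000: "PP_GFX_DCS_MASK",
}


def disabled_features(mask: int) -> list[str]:
    """Names of the known feature bits that are cleared (turned off) in `mask`."""
    return [name for bit, name in PP_FEATURE_BITS.items() if not mask & bit]
```

With this table, disabled_features(0xffff7bcf) returns exactly the four masks named above, and disabled_features(int("0xfff73fff", 16)) returns PP_OVERDRIVE_MASK, PP_GFXOFF_MASK and PP_GFX_DCS_MASK.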
Yesterday I've got another freeze testing amdgpu.cwsr_enable=0 parameter. Now I'm testing amdgpu.mes=1. Reminding everyone that I'm on Void Linux.
2025-01-27T05:52:53.04480 kern.err: [ 5201.225840] [drm:atom_op_jump [amdgpu]] *ERROR* atombios stuck in loop for more than 20secs aborting
2025-01-27T05:52:53.04491 kern.err: [ 5201.226311] [drm:amdgpu_atom_execute_table_locked [amdgpu]] *ERROR* atombios stuck executing A4A6 (len 84, WS 0, PS 0) @ 0xA4DC
2025-01-27T05:52:53.04493 kern.err: [ 5201.226722] [drm:amdgpu_atom_execute_table_locked [amdgpu]] *ERROR* atombios stuck executing CF70 (len 525, WS 0, PS 0) @ 0xCFC1
By the way some time ago I've managed to get cat /sys/kernel/debug/dri/0/amdgpu_gpu_recover working by applying amdgpu.ppfeaturemask=0xffff7bcf (here I turned off PP_POWER_CONTAINMENT_MASK, PP_UVD_HANDSHAKE_MASK, PP_CLOCK_STRETCH_MASK and PP_GFXOFF_MASK)
Could you compile your own kernel? Here is an updated fix from an AMD developer: https://gitlab.freedesktop.org/drm/amd/ … te_2755499
Regarding amdgpu_gpu_recover: the mask you applied just disabled those GPU power features, so it is not OK.
Last edited by Mechanicus (2025-01-28 18:57:53)
Linux-6.13 (build based on https://aur.archlinux.org/packages/linux-mainline + Arch Linux patches) with the updated GFXOFF patch (https://gitlab.freedesktop.org/drm/amd/ … te_2755499).
Download link: https://drive.google.com/drive/folders/ … KOx34jmcRx
Note: you need to select this kernel manually in the boot menu. It is not a replacement for the default package.
Last edited by Mechanicus (2025-01-28 19:38:30)
Until now I've been using 24.3.4 compiled with the patch, with no freezes or other issues.
I'm going to compile it with this patch and see what happens.
https://gitlab.freedesktop.org/mesa/mes … te_2755501
EDIT: I see now that it's not a patch for Mesa, but for the kernel driver.
Last edited by pacoandres (2025-01-28 18:46:07)
Till now I've been using 24.3.4 compiled with the patch with no freezes neither other issues.
I'm going to compile it with this patch and see what happens.
https://gitlab.freedesktop.org/mesa/mes … te_2755501
A link to a kernel with this patch is available in the previous comment.
pacoandres wrote:
Till now I've been using 24.3.4 compiled with the patch with no freezes neither other issues.
I'm going to compile it with this patch and see what happens.
https://gitlab.freedesktop.org/mesa/mes … te_2755501
Link for kernel with this patch is available in previous comment
Thanks.
System a froze using kernel linux-git 6.13.r8997.f34b580514c9-1 with official repo mesa and no additional kernel parameters.
I'll install Mechanicus's custom kernel (https://bbs.archlinux.org/viewtopic.php … 8#p2223048) for testing now.
EDIT: Change of plans.... Thanks Mechanicus for the huge time saver!
Last edited by NuSkool (2025-01-28 19:12:38)
System a froze using kernel linux-git 6.13.r8997.f34b580514c9-1 with official repo mesa and no additional kernel parameters.
I'll install Mechanicus https://bbs.archlinux.org/viewtopic.php … 8#p2223048 custom kernel for testing now.
EDIT: Change of plans.... Thanks Mechanicus for the huge time saver!
Thanks to all of you who accepted my point of view on the problem and participated in testing!
Regarding compilation time: you can improve it drastically by passing optimized parameters to makepkg, as I do here: https://github.com/SeryogaBrigada/Simpl … pdate#L127
Last edited by Mechanicus (2025-01-28 19:20:07)
OK, that didn't take long: it froze while I was verifying that I'd done everything right.
Reboot for another go...
@Mechanicus, I see a discrepancy*, so I'm double-checking: am I running the correct kernel for testing, and did you upload the correct kernel?
* Between 'pacman -Q linux-mainline' and 'uname -r'.
I'm used to seeing the output from those two commands match.
E.g., on a different Arch system:
$ pacman -Q linux ; uname -r
linux 6.12.9.arch1-1
6.12.9-arch1-1
Pacman log installing test kernel:
[2025-01-28T11:13:56-0800] [PACMAN] Running 'pacman --color=always -U linux-mainline-6.13-2-x86_64.pkg.tar.zst linux-mainline-headers-6.13-2-x86_64.pkg.tar.zst'
[2025-01-28T11:13:59-0800] [ALPM] transaction started
[2025-01-28T11:14:00-0800] [ALPM] installed linux-mainline (6.13-2)
[2025-01-28T11:14:02-0800] [ALPM] installed linux-mainline-headers (6.13-2)
And some verification:
$ pacman -Q linux-mainline ; uname -r
linux-mainline 6.13-2
6.13.0-arch1-2-mainline-gffd294d346d1-dirty
$ ls -1 /boot
efi
grub
GRUB-BU
amd-ucode.img
initramfs-linux-fallback.img
initramfs-linux-git-fallback.img
initramfs-linux-git.img
initramfs-linux.img
initramfs-linux-mainline-fallback.img
initramfs-linux-mainline.img
vmlinuz-linux
vmlinuz-linux-git
vmlinuz-linux-mainline
$ grep 'mainline' /boot/grub/grub.cfg
linux /boot/vmlinuz-linux-mainline root=UUID=60bc1026-da96-43b5-8963-eda5d63b8049 rw loglevel=3 sysrq_always_enabled=1 amd_pstate=passive fsck.mode=force
initrd /boot/initramfs-linux-mainline.img
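The linux line above is where kernel parameters such as amdgpu.cwsr_enable=0 end up. On a GRUB setup like this one they are normally added to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, after which grub.cfg is regenerated with grub-mkconfig -o /boot/grub/grub.cfg. A minimal Python sketch of just the text edit, assuming a single double-quoted GRUB_CMDLINE_LINUX_DEFAULT line; add_kernel_param is hypothetical, and editing the file by hand works just as well:

```python
import re

# The (assumed) single, double-quoted GRUB_CMDLINE_LINUX_DEFAULT="..." line.
CMDLINE_RE = re.compile(r'^(GRUB_CMDLINE_LINUX_DEFAULT=")([^"]*)(")', re.MULTILINE)


def add_kernel_param(grub_default: str, param: str) -> str:
    """Return /etc/default/grub text with `param` added to GRUB_CMDLINE_LINUX_DEFAULT.

    An existing option with the same name (the part before '=') is replaced
    rather than duplicated, so re-running with a new value is safe.
    """
    name = param.split("=", 1)[0]

    def update(match: re.Match) -> str:
        kept = [opt for opt in match.group(2).split() if opt.split("=", 1)[0] != name]
        return match.group(1) + " ".join(kept + [param]) + match.group(3)

    new_text, count = CMDLINE_RE.subn(update, grub_default, count=1)
    if count == 0:
        raise ValueError("no double-quoted GRUB_CMDLINE_LINUX_DEFAULT line found")
    return new_text
```

Replacing a same-named option instead of appending a second copy keeps the command line unambiguous when switching between test values.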
And a reply to:
you can drastically improve it by applying optimized parameters to makepkg
Yeah, thanks. I have 6 of 8 threads set up as jobs for clean chroot builds on this system. I ran out of root disk space, so I had to restart with the clean chroot in my home dir...
Compiling generic kernels with all the drivers is a big time sink on a weak system, and I didn't feel like slimming the config down to the essentials.
Last edited by NuSkool (2025-01-28 20:23:00)
@Mechanicus, I see a discrepancy* so double checking. Did I get/running your correct kernel for testing and verifying you uploaded the correct kernel?
* Between 'pacman -Q linux-mainline' and 'uname -r'.
And some verification:
$ pacman -Q linux-mainline ; uname -r
linux-mainline 6.13-2
6.13.0-arch1-2-mainline-gffd294d346d1-dirty
This is correct. uname -r should return 6.13.0-arch1-2-mainline-gffd294d346d1-dirty
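The two strings come from different places: pacman -Q shows the package's pkgver-pkgrel, while uname -r shows the release string compiled into the kernel. The extra suffix here (-arch1-2-mainline-gffd294d346d1-dirty) looks like local-version strings added at build time; a g<hash> part and -dirty are what the kernel's version scripts append for a git tree with uncommitted changes, which would fit a patched build, but that reading is an assumption. What should match is the upstream version. A Python sketch of that check; same_upstream and upstream_version are hypothetical helpers:

```python
import re


def upstream_version(text: str) -> tuple[int, ...]:
    """Leading dotted version number, padded to (major, minor, patch)."""
    match = re.match(r"(\d+(?:\.\d+)*)", text)
    if match is None:
        raise ValueError(f"no version number at start of {text!r}")
    parts = [int(p) for p in match.group(1).split(".")][:3]
    return tuple(parts + [0] * (3 - len(parts)))  # "6.13" -> (6, 13, 0)


def same_upstream(pkg_version: str, kernel_release: str) -> bool:
    """True if a `pacman -Q` version and a `uname -r` string name the same upstream kernel."""
    pkgver = pkg_version.rsplit("-", 1)[0]  # drop pkgrel: "6.13-2" -> "6.13"
    return upstream_version(pkgver) == upstream_version(kernel_release)
```

With the outputs quoted in this thread, same_upstream("6.13-2", "6.13.0-arch1-2-mainline-gffd294d346d1-dirty") and same_upstream("6.12.9.arch1-1", "6.12.9-arch1-1") both return True.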
Everyone using mesa 24.3.4, please report any change in behavior after applying the amdgpu.ppfeaturemask=0xfff73fff kernel parameter. This option disables the GFXOFF feature, so an increase in GPU power consumption is expected.
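A worked check of that value, assuming the driver's default ppfeaturemask is 0xfff7bfff (recalled from amdgpu_drv.c; verify it on your kernel) and that GFXOFF is bit 15 (0x8000): clearing only that bit from the default gives exactly 0xfff73fff. The sketch can also read the active mask from /sys/module/amdgpu/parameters/ppfeaturemask, assuming your kernel exposes it there; all helper names are hypothetical:

```python
PP_GFXOFF_MASK = 0x8000  # bit 15 of enum PP_FEATURE_MASK (assumed, from amd_shared.h)
ASSUMED_DEFAULT_MASK = 0xfff7bfff  # assumed driver default for amdgpu.ppfeaturemask

# Assumed location of the active value; not every kernel may expose it.
PPFEATUREMASK_PATH = "/sys/module/amdgpu/parameters/ppfeaturemask"


def without_gfxoff(mask: int) -> int:
    """Return `mask` with only the GFXOFF bit cleared."""
    return mask & ~PP_GFXOFF_MASK


def gfxoff_enabled(mask: int) -> bool:
    """True if the GFXOFF bit is set in `mask`."""
    return bool(mask & PP_GFXOFF_MASK)


def parse_mask(text: str) -> int:
    """Parse a mask such as '0xfff73fff' (base auto-detected from the 0x prefix)."""
    return int(text.strip(), 0)


def read_active_mask(path: str = PPFEATUREMASK_PATH) -> int:
    """Read the ppfeaturemask the running amdgpu driver was loaded with."""
    with open(path) as f:
        return parse_mask(f.read())
```

On a kernel booted with the suggested parameter, gfxoff_enabled(read_active_mask()) would be expected to return False.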
Like others, I also got a freeze with amdgpu.cwsr_enable=0; it took some time to get there, but it happened. Now I'll test amdgpu.ppfeaturemask=0xfff73fff.
Ran the following setup for testing:
linux-mainline 6.13-2 Mechanicus patched kernel
mesa 1:24.3.4-1 official repo mesa
Locked up twice: the first time within minutes, the second after several hours.
Added the following parameter to this setup for further testing:
amdgpu.ppfeaturemask=0xfff73fff Mechanicus kernel parameter
Last edited by NuSkool (2025-01-29 07:18:22)
I started experiencing these freezes on my system with gfx11 / RDNA 3 graphics (7700 XT) around the 25th, right around when I updated Mesa to 24.3.4, and also when I updated a Docker container with GPU access to Ubuntu 24.10.x, which presumably ships a 24.2.8 Mesa package.
I ended up rotating the 7700 XT out of my installed hardware, since I wasn't experiencing the freezes with a 6700 XT, and I'm not currently experiencing them with an RX 480 that I put in the machine I removed the 6700 XT from.
I'll consider testing the ppfeaturemask workaround as well, if that looks like it will fix it until the firmware is fixed.