@Amunn thank you for the update!
linux-amdgpu-testing-6.13.2.arch1-3 is ready for testing.
EDIT: on the 6.13.12 1-3 test kernel
@Amunn, just verifying: are you talking about the same kernel (with a typo) or two different kernels here?
6.13.2.arch1-3
6.13.12 1-3
Earlier today I read these as the same kernel. Only later did I notice the difference.
I've got some testing time available but don't want to waste time on a dead kernel...
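To make the difference between the two strings above explicit, here is a minimal Python sketch that pulls the numeric fields out of each string and reports the first place they disagree. The helper names `numeric_fields` and `first_difference` are made up for this illustration; they are not part of any Arch tooling.

```python
import re

def numeric_fields(version: str) -> list[int]:
    """Return every run of digits in the string as an integer, in order."""
    return [int(n) for n in re.findall(r"\d+", version)]

def first_difference(a: str, b: str):
    """Return (index, field_a, field_b) for the first differing numeric field, or None."""
    fa, fb = numeric_fields(a), numeric_fields(b)
    for i, (x, y) in enumerate(zip(fa, fb)):
        if x != y:
            return i, x, y
    if len(fa) != len(fb):
        # One string simply has more numeric fields than the other.
        i = min(len(fa), len(fb))
        return i, (fa[i] if i < len(fa) else None), (fb[i] if i < len(fb) else None)
    return None

# The two strings quoted in this post.
print(first_difference("6.13.2.arch1-3", "6.13.12 1-3"))  # → (2, 2, 12)
```

The third numeric field is where they part ways (2 versus 12), which is easy to miss when reading quickly.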
Scripts I Use : https://github.com/Cody-Learner
$ grep -m1 'model name' /proc/cpuinfo : AMD Ryzen 5 PRO 2400GE w/ Radeon Vega Graphics
$ glxinfo | grep Device : Device: AMD Radeon Vega 11 Graphics (radeonsi, raven, ACO, DRM 3.61, 6.13.9-rc1) (0x15dd)
$ sudo dmesg | awk '/drm/ && /gfx/' : [ 6.427009] [drm] add ip block number 6 <gfx_v9_0>
I have been testing @Mechanicus's latest patches individually; the conclusions are at https://github.com/pacoandres/laikm/issues/23
@NuSkool Yes, I tested 6.13.2.arch1-3. I was tired and wrote it out wrong.
I tested with the current packages from the repo every time.
Last edited by Amunn (2025-02-11 10:39:38)
glibc 2.41+r6+gcf88351b685d-1
Build:linux-amdgpu-testing-6.13.2.arch1-4 - freeze confirmed.
Included patches:
- Simplify GFXOFF handling
- Optimize mutex protected blocks in amdgpu_vm_flush
- Make amdgpu_vmid::current_gpu_reset_count atomic_t type
Full pipeline cleanup: all extra mutexes/workarounds removed
Build: linux-amdgpu-testing-6.13.2.arch1-5 - freezes.
Included patches:
- Optimize mutex protected blocks in amdgpu_vm_flush
- Make amdgpu_vmid::current_gpu_reset_count atomic_t type
- Make amdgpu_fence_driver::sync_seq atomic type
- Simplify GFXOFF handling v2
- Remove workaround for TLB seq race
- Remove jpeg1&vcn1 hardware bug workaround
What to check:
- stability
If stable:
- temperature
- average package power (sudo turbostat -s PkgWatt)
- performance, for example WebGL Aquarium fps for different numbers of fish
Kernel option to keep during testing period: fsck.mode=force
laikm from pacoandres for reporting issues
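For the package power item above, one way to get a single number is to let turbostat run for a while, save what it prints, and average the samples. A minimal Python sketch, assuming the saved text is a `PkgWatt` header line followed by one numeric reading per line (check your own capture, the layout may differ). The function name `average_pkg_watt` and the sample readings are invented for illustration.

```python
def average_pkg_watt(text: str) -> float:
    """Average every line that parses as a number; header and other lines are skipped."""
    samples = []
    for line in text.splitlines():
        try:
            samples.append(float(line.strip()))
        except ValueError:
            continue  # e.g. the "PkgWatt" header or an empty line
    if not samples:
        raise ValueError("no numeric samples found")
    return sum(samples) / len(samples)

# Invented sample capture, NOT real measurements.
capture = "PkgWatt\n2.50\n2.75\n3.00\n"
print(round(average_pkg_watt(capture), 2))  # → 2.75
```

Averaging over a longer run smooths out short spikes, which makes numbers from different testers easier to compare.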
Last edited by Mechanicus (2025-02-13 13:48:52)
Just thought i would say that after 6 days i had no crashes with the latest Mesa from @Lone_Wolf in this post https://bbs.archlinux.org/viewtopic.php … 0#p2223890, which if i got it right, it will be the regular Mesa 25.0 when it's released, and i'm constantly spawning new windows and i use mpv a lot.
I know it's not the end of it as people are doing a lot of work with these kernel builds but at least i think i have a stable system now with no more crashes so i would like to thank everyone for that.
Installed this today - and nothing else - and have worked the whole day w/o any issues, so day #1 successfully.
Currently testing 6.13.2.arch1-4, no crashes for around 3 h of use and 5h since boot.
I feel this is too early to conclude since I ran previous tests for up to 10h before crashing.
This seems like a performance upgrade over the mainline kernel, and almost as good as previous tests (might be just a feeling, as I have no concrete proof to back it up).
The fullscreen video thing (Steam store) I mentioned earlier is back, but it's better than it used to be.
Fish test:
fps: 60
canvas width: 1024
canvas height: 1024
idle: 2.5W
10k: 60 fps 11.4W
15k: 50 fps 12.8W
20k: 38 fps 12.8W
30k: 27 fps 12.5W
Highest temp reached = 61C
Idle = 42C (running stock cooler slow to reduce noise)
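To compare runs between kernels it can help to boil the fish test down to fps per watt. A small Python sketch using only the numbers posted above; `fps_per_watt` is a made-up helper name, and treating the idle reading as a baseline to subtract is my own assumption about how to read the figures.

```python
# (fish count, fps, package watts) copied from the fish test above
results = [
    (10_000, 60, 11.4),
    (15_000, 50, 12.8),
    (20_000, 38, 12.8),
    (30_000, 27, 12.5),
]
IDLE_WATTS = 2.5  # idle reading from the same post

def fps_per_watt(fps: float, watts: float) -> float:
    """Frames per second delivered per watt of package power."""
    return fps / watts

for fish, fps, watts in results:
    print(f"{fish:>6} fish: {fps_per_watt(fps, watts):.2f} fps/W, "
          f"{watts - IDLE_WATTS:.1f} W above idle")
```

If the 60 fps at 10k fish is a refresh-rate cap rather than the real limit, that row understates what the GPU could do, so the higher fish counts are probably the better ones to compare.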
EDIT: 4 more hours of light use, no crash yet. Will report again tomorrow.
Last edited by Amunn (2025-02-12 00:04:00)
linux-amdgpu-testing-6.13.2.arch1-4 froze after 5 min while I was reading documentation in Firefox. No other apps were open.
The only log line is
feb 12 12:02:35 monelle kernel: amdgpu 0000:09:00.0: amdgpu: Dumping IP State
So the linux-amdgpu-testing-6.13.2.arch1-4 kernel crashed after around 10h of light use over three boots.
I looked at the logs, and saw something possibly related 30min before the crash:
Feb 12 15:59:10 user fuzzel[12901]: png: libpng: iCCP: known incorrect sRGB profile
Feb 12 15:59:48 user fuzzel[13080]: png: libpng: iCCP: known incorrect sRGB profile
Feb 12 15:59:48 user fuzzel[13080]: png: libpng: iCCP: known incorrect sRGB profile
Feb 12 15:59:48 user fuzzel[13080]: png: libpng: iCCP: known incorrect sRGB profile
Feb 12 15:59:56 user fuzzel[13252]: png: libpng: iCCP: known incorrect sRGB profile
Feb 12 15:59:56 user fuzzel[13252]: png: libpng: iCCP: known incorrect sRGB profile
Feb 12 15:59:56 user fuzzel[13252]: png: libpng: iCCP: known incorrect sRGB profile
Feb 12 16:00:01 user fuzzel[13280]: png: libpng: iCCP: known incorrect sRGB profile
Feb 12 16:00:01 user fuzzel[13280]: png: libpng: iCCP: known incorrect sRGB profile
Feb 12 16:00:01 user fuzzel[13280]: png: libpng: iCCP: known incorrect sRGB profile
Feb 12 16:00:50 user systemd-coredump[14016]: Process 14010 (spotify) of user 1000 terminated ab>
Feb 12 16:00:50 user systemd[1]: Created slice Slice /system/systemd-coredump.
Feb 12 16:00:50 user systemd[1]: Started Process Core Dump (PID 14016/UID 0).
Feb 12 16:00:50 user systemd-coredump[14025]: Process 14017 (spotify) of user 1000 terminated ab>
Feb 12 16:00:50 user systemd[1]: Started Process Core Dump (PID 14025/UID 0).
Feb 12 16:00:50 user (sd-parse-elf)[14030]: Could not parse number of program headers from core >
Feb 12 16:00:50 user systemd-coredump[14027]: [?] Process 14017 (spotify) of user 1000 dumped co>
Module [dso] without build-id.
Module [dso]
Stack trace of thread 14017:
#0 0x0000765e2a45a624 n/a (n/a + 0x0)
#1 0x0000765e2a400ba0 n/a (n/a + 0x0)
#2 0x0000765e2a3e8582 n/a (n/a + 0x0)
#3 0x0000765e2a3e93bf n/a (n/a + 0x0)
#4 0x0000765e2a4e8419 n/a (n/a + 0x0)
#5 0x0000765e2a4e9714 n/a (n/a + 0x0)
#6 0x00006502f224cd70 n/a (n/a + 0x0)
ELF object binary architecture: AMD x86-64
Feb 12 16:00:50 user systemd-coredump[14024]: [?] Process 14010 (spotify) of user 1000 dumped co>
Stack trace of thread 14010:
#0 0x0000765e2a45a624 n/a (n/a + 0x0)
#1 0x0000765e2a400ba0 n/a (n/a + 0x0)
#2 0x0000765e2a3e8582 n/a (n/a + 0x0)
#3 0x0000765e2a3e93bf n/a (n/a + 0x0)
#4 0x0000765e2a4e8419 n/a (n/a + 0x0)
#5 0x0000765e2a4e9714 n/a (n/a + 0x0)
#6 0x00006502f224cd70 n/a (n/a + 0x0)
ELF object binary architecture: AMD x86-64
Feb 12 16:00:50 user systemd[1]: systemd-coredump@1-14025-0.service: Deactivated successfully.
Feb 12 16:00:50 user systemd[1]: systemd-coredump@1-14025-0.service: Consumed 312ms CPU time, 13>
Feb 12 16:00:50 user systemd[1]: systemd-coredump@0-14016-0.service: Deactivated successfully.
Feb 12 16:00:50 user systemd[1]: systemd-coredump@0-14016-0.service: Consumed 311ms CPU time, 13>
Feb 12 16:01:50 user systemd-coredump[14050]: Process 14043 (spotify) of user 1000 terminated ab>
Feb 12 16:01:50 user systemd[1]: Started Process Core Dump (PID 14050/UID 0).
Feb 12 16:01:50 user systemd-coredump[14051]: [?] Process 14043 (spotify) of user 1000 dumped co>
Stack trace of thread 14043:
#0 0x00007cf7ab85a624 n/a (n/a + 0x0)
#1 0x00007cf7ab800ba0 n/a (n/a + 0x0)
#2 0x00007cf7ab7e8582 n/a (n/a + 0x0)
#3 0x00007cf7ab7e93bf n/a (n/a + 0x0)
#4 0x00007cf7ab8e8419 n/a (n/a + 0x0)
#5 0x00007cf7ab8e9714 n/a (n/a + 0x0)
#6 0x000059a796c28d70 n/a (n/a + 0x0)
ELF object binary architecture: AMD x86-64
Feb 12 16:01:50 user systemd[1]: systemd-coredump@2-14050-0.service: Deactivated successfully.
Feb 12 16:01:50 user systemd[1]: systemd-coredump@2-14050-0.service: Consumed 301ms CPU time, 14>
Feb 12 16:23:05 user kernel: amdgpu 0000:0a:00.0: amdgpu: Dumping IP State
Feb 12 16:23:26 user pipewire[842]: spa.alsa: front:2p: (0 suppressed) snd_pcm_avail after recov>
Feb 12 16:23:26 user pipewire[842]: spa.alsa: front:2p: snd_pcm_mmap_commit error: Broken pipe
Feb 12 16:23:33 user pipewire[842]: spa.alsa: front:2p: snd_pcm_mmap_commit error: Broken pipe
Feb 12 16:23:37 user pipewire[842]: spa.alsa: front:2p: (10 suppressed) snd_pcm_avail after reco>
Feb 12 16:23:42 user kernel: amdgpu 0000:0a:00.0: amdgpu: failed to write reg 28b4 wait reg 28c6
Feb 12 16:23:59 user kernel: amdgpu 0000:0a:00.0: amdgpu: failed to write reg 1a6f4 wait reg 1a7>
Some irrelevant stuff removed for clarity.
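To put a number on how long before the GPU dump the coredumps happened, the timestamps can be subtracted directly. A minimal Python sketch over three lines copied from the excerpt above; the function names are invented, the year is an assumption taken from the "Last edited" dates in this thread, and `%b` parsing assumes English month names.

```python
from datetime import datetime

YEAR = 2025  # assumption: the journal lines carry no year

def stamp(line: str) -> datetime:
    """Parse the leading 'Mon DD HH:MM:SS' (15 characters) of a journal line."""
    return datetime.strptime(f"{YEAR} {line[:15]}", "%Y %b %d %H:%M:%S")

lines = [
    "Feb 12 16:01:50 user systemd-coredump[14050]: Process 14043 (spotify) of user 1000 terminated ab>",
    "Feb 12 16:23:05 user kernel: amdgpu 0000:0a:00.0: amdgpu: Dumping IP State",
    "Feb 12 16:23:42 user kernel: amdgpu 0000:0a:00.0: amdgpu: failed to write reg 28b4 wait reg 28c6",
]

def gap_before_gpu_dump(lines):
    """Time between the last coredump line and the first 'Dumping IP State' line."""
    last_coredump = None
    for line in lines:
        if "systemd-coredump[" in line:
            last_coredump = stamp(line)
        elif "amdgpu: Dumping IP State" in line:
            return stamp(line) - last_coredump if last_coredump else None
    return None

print(gap_before_gpu_dump(lines))  # → 0:21:15
```

A gap of about 21 minutes does not prove or rule out a connection; it only makes the "possibly related" timing concrete.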
pacmancrashedagain wrote:Just thought i would say that after 6 days i had no crashes with the latest Mesa from @Lone_Wolf in this post https://bbs.archlinux.org/viewtopic.php … 0#p2223890, which if i got it right, it will be the regular Mesa 25.0 when it's released, and i'm constantly spawning new windows and i use mpv a lot.
I know it's not the end of it as people are doing a lot of work with these kernel builds but at least i think i have a stable system now with no more crashes so i would like to thank everyone for that.
Installed this today - and nothing else - and have worked the whole day w/o any issues, so day #1 successfully.
2nd day w/o any issues! Still nothing else installed except @Lone_Wolf 's "25.0 binary", no special kernel parameters, no nothing
Report: linux-amdgpu-testing 6.13.2.arch1-4
Ran for ~4HR before freezing.
Start date time : Feb 11, 2025 12:10PM
Finish date time : Feb 11, 2025 ~04:00PM
pacman -Q linux-amdgpu-testing : linux-amdgpu-testing 6.13.2.arch1-4
uname -rs : Linux 6.13.2-arch1-4-amdgpu-testing
pacman -Q glibc : glibc 2.41+r6+gcf88351b685d-1
pacman -Q mesa : mesa 1:24.3.4-1
cat /proc/cmdline : ... rw loglevel=3 sysrq_always_enabled=1 amd_pstate=passive fsck.mode=force
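Given the earlier mix-up over version strings, it may be worth checking that the running kernel really is the installed test package. A minimal Python sketch that turns the `pacman -Q` version into the `uname -r` form; the mapping is inferred only from the single pair posted above, so treat it as a guess, and `expected_release` is a made-up name.

```python
def expected_release(pkgname: str, pkgver: str) -> str:
    """Guess the `uname -r` string for a kernel package, based on the pair posted above."""
    base, arch_part = pkgver.rsplit(".arch", 1)   # "6.13.2", "1-4"
    suffix = pkgname.removeprefix("linux-")       # "amdgpu-testing"
    return f"{base}-arch{arch_part}-{suffix}"

running = "6.13.2-arch1-4-amdgpu-testing"  # from `uname -rs` in this report
guess = expected_release("linux-amdgpu-testing", "6.13.2.arch1-4")
print(guess == running)  # → True
```

A mismatch would usually mean the machine was not rebooted into the new kernel after installing it.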
When the freeze occurred, I had just installed 'fbterm' and was trying to get 'qterminal' running in it, in a different tty than my desktop session.
Core dump:
systemd-coredump[5040]: Process 5038 (qterminal) of user 1000 terminated abnormally with signal 6/ABRT, processing...
systemd[1]: Created slice Slice /system/systemd-coredump.
systemd[1]: Started Process Core Dump (PID 5040/UID 0).
(sd-parse-elf)[5043]: Could not parse number of program headers from core file: invalid `Elf' handle
systemd-coredump[5041]: [?] Process 5038 (qterminal) of user 1000 dumped core.
Module [dso] without build-id.
Module [dso] without build-id.
Module [dso] without build-id.
Module [dso] without build-id.
Module [dso]
Module [dso] without build-id.
Stack trace of thread 5038:
#0 0x00007d9897aa5624 n/a (n/a + 0x0)
#1 0x00007d9897a4bba0 n/a (n/a + 0x0)
#2 0x00007d9897a33582 n/a (n/a + 0x0)
#3 0x00007d98974905c2 n/a ([dso] + 0x6db5c2)
#4 0x00007d989749154b n/a ([dso] + 0x6dc54b)
#5 0x00007d9897cdef5d n/a (n/a + 0x0)
#6 0x00007d9897d8acd8 n/a (n/a + 0x0)
#7 0x00007d9897558a7d n/a ([dso] + 0x7a3a7d)
#8 0x00007d9897d8ad6e n/a (n/a + 0x0)
#9 0x00007d98986f9ae6 n/a (n/a + 0x0)
#10 0x000059ec425662d3 n/a (n/a + 0x0)
#11 0x00007d9897a35488 n/a (n/a + 0x0)
#12 0x00007d9897a3554c n/a (n/a + 0x0)
#13 0x000059ec42569655 n/a (n/a + 0x0)
ELF object binary architecture: AMD x86-64
Last edited by NuSkool (2025-02-12 18:18:45)
linux-amdgpu-testing-6.13.2.arch1-5 is the last thing I can do for now without the power-gating state change workaround. In this build, 3 mutex locks/unlocks were removed from the pipeline, and the GFXOFF switch logic was refactored and simplified. There is no delay and there are no extra operations anymore.
I'm on 1-5 test now, 1h since boot. Testing more tomorrow.
So far it seems fast somehow: I have instant video playback, which I haven't experienced before on my current system.
I tried going through the thumbnails on the steam store with video on, they load and start right when I hover over.
Thanks for the effort btw!
I got a crash after about 30min today (1h without a crash yesterday).
edit: on the 6.13.2.arch1-5 kernel
2200G cpu
Feb 13 08:43:56 user kernel: amdgpu 0000:0a:00.0: amdgpu: Dumping IP State
Feb 13 08:44:39 user kernel: amdgpu 0000:0a:00.0: amdgpu: failed to write reg 28b4 wait reg 28c6
Feb 13 08:44:43 user pipewire[847]: pw.node: (alsa_output.pci-0000_0a_00.6.analog-stereo-58) graph xrun not-triggered (1 suppressed)
Feb 13 08:44:43 user pipewire[847]: pw.node: (alsa_output.pci-0000_0a_00.6.analog-stereo-58) xrun state:0x7ee7649ab008 pending:1/1 s:3083772418428 a:3083772425592 f:3083772466481 waiting:7164 process:40>
Feb 13 08:44:45 user pipewire[847]: spa.alsa: front:2p: (0 suppressed) snd_pcm_avail after recover: Broken pipe
Feb 13 08:44:53 user kernel: amdgpu 0000:0a:00.0: amdgpu: failed to write reg 1a6f4 wait reg 1a706
Last edited by Amunn (2025-02-13 07:50:23)
mesa 25.0.0 rc3 has been released with many fixes for radeonsi.
Although none of the fixes appear to be for raven / raven2 / computing, there may be people who prefer a binary closer to the next stable release.
I've uploaded a new mesa 25.0.0-rc3 binary.
Thanks to @Lone_Wolf for the updated package mesa-test-git-25.0.0_rc3.201182.3a8abfa39b7-1-x86_64.
I still wonder why mesa 24.3.5, which should have been released on 05.02.2025, is still not out.
Considering the severity of this bug, a separate version 24.3.4.1 could and should have been released long ago.
In the meantime further distributions are affected...
glibc 2.41+r6+gcf88351b685d-1
This build contains all optimization patches and powergating workaround. Let's see if it can work any better than the regular kernel.
Build: linux-amdgpu-testing-6.13.2.arch1-6 - stable.
Included patches:
- Optimize mutex protected blocks in amdgpu_vm_flush
- Make amdgpu_vmid::current_gpu_reset_count atomic_t type
- Make amdgpu_fence_driver::sync_seq atomic type
- Simplify GFXOFF handling v2
- Remove workaround for TLB seq race
- Remove jpeg1&vcn1 hardware bug workaround
- Deny GFXOFF in amdgpu_ring_commit using PG state change
What to check:
- stability
If stable:
- temperature
- average package power (sudo turbostat -s PkgWatt)
- performance, for example WebGL Aquarium fps for different numbers of fish
Kernel option to keep during testing period: fsck.mode=force
laikm from pacoandres for reporting issues
Last edited by Mechanicus (2025-03-01 13:49:55)
@Mechanicus Ok, on 1-6 now.
I installed mesa-test-git earlier. Am I doing the test correctly if I remove it and install mesa (24.3.4-1) and vulkan-radeon (same version number) instead?
I've been testing linux-amdgpu-testing-6.13.2.arch1-6 for about two hours with no freeze. I couldn't do any benchmark yet, maybe tomorrow with a long and heavy test.
@Amunn: if you downgrade mesa and vulkan-radeon to 24.3.4 the test conditions are the same as before you upgraded to git version.
OJaksch wrote:pacmancrashedagain wrote:Just thought i would say that after 6 days i had no crashes with the latest Mesa from @Lone_Wolf in this post https://bbs.archlinux.org/viewtopic.php … 0#p2223890, which if i got it right, it will be the regular Mesa 25.0 when it's released, and i'm constantly spawning new windows and i use mpv a lot.
I know it's not the end of it as people are doing a lot of work with these kernel builds but at least i think i have a stable system now with no more crashes so i would like to thank everyone for that.
Installed this today - and nothing else - and have worked the whole day w/o any issues, so day #1 successfully.
2nd day w/o any issues! Still nothing else installed except @Lone_Wolf 's "25.0 binary", no special kernel parameters, no nothing
3rd day w/o any issues. Installing mesa-test-git-25.0.0_rc3 now and continuing my test series...
6.13.2-arch1-6-amdgpu-testing
I couldn't freeze it with anything (except cat /sys/kernel/debug/dri/1/amdgpu_gpu_recover).
Overall, the system is working briskly.
I still have no crashes, and the aquarium test performance is increased by around 5-10%.
Power consumption is up by 10% as well, but at 14 W it's still not much. I will keep testing under light use with many Electron-based programs open. Thanks for the effort in developing the driver!
I've been working all day with linux-amdgpu-testing-6.13.2.arch1-6 and have no freezes.
Performance, consumption and temperature are slightly worse than with the other testing kernels, but as the difference is less than 5% it could be due to other causes.
Thanks for the effort.
3rd day w/o any issues. Installing mesa-test-git-25.0.0_rc3 now and continuing my test series...
4th day w/o any issues - 1st day with mesa-test-git-25.0.0_rc3, but I'm inclined to call it "fixed for me".
gfx_v9
AMD Ryzen 5 PRO 3400G with Radeon Vega Graphics
Advanced Micro Devices, Inc. [AMD/ATI] Picasso/Raven 2 [Radeon Vega Series / Radeon Vega Mobile Series] [1002:15d8]
This build aims to double-check where the powergating state change workaround should be applied. A crash is also a valid result.
Build: linux-amdgpu-testing-6.13.2.arch1-7 - crashed and confirmed that the workaround is placed correctly in 1-6 version.
Included patches:
- Optimize mutex protected blocks in amdgpu_vm_flush
- Simplify GFXOFF handling
- Add powergating state change workaround for GFX9 v3
- Remove workaround for TLB seq race
Last edited by Mechanicus (2025-02-15 08:17:31)