Report: linux-amdgpu-stable-6.13.1.arch1-3
System : Ryzen 5 PRO 2400GE w/ Radeon Vega 11 Graphics
GPU Type : APU
Family : Raven (RV)
ASIC Name : Raven
Chip Class : GFX9
This kernel has been reliable regardless of what I throw at it for testing.
Start date time : Feb 4, 2025 12:16PM
Finish date time : Feb 7, 2025 04:31PM
Downloaded md5sums : ead331e7c88365501a81420139b3938c linux-amdgpu-stable-6.13.1.arch1-3-x86_64.pkg.tar.zst
: 92ac39d68d004948e5f0e86a2bf14be4 linux-amdgpu-stable-headers-6.13.1.arch1-3-x86_64.pkg.tar.zst
pacman -Q linux-amdgpu-stable: linux-amdgpu-stable 6.13.1.arch1-3
uname -rs : Linux 6.13.1-arch1-3-amdgpu-stable
pacman -Q glibc : glibc 2.41+r2+g0a7c7a3e283a-1
cat /proc/cmdline : ... rw loglevel=3 sysrq_always_enabled=1 amd_pstate=passive fsck.mode=force
amdgpu_gpu_recover : Fails from xfce4-terminal in desktop session
: Successfully exits code 0 from console
Power Usage : Logged into console idle: 121.4F 3.8W
Power Usage : Idle desktop w/ chromium: 121.3F 4.5W
WebGL Aquarium Chromium : 500 fish 60fps 128.5F 10.2W
1000 fish 60fps 134.4F 10.5W
5000 fish 60fps 145.0F 12.0W
10000 fish 60fps 162.2F 14.1W
20000 fish ~50fps 168.8F 14.9W
30000 fish ~35fps 169.0F 13.7W
Thermal Throttling?
Shorter tests:
#2 20000 fish ~50fps 165.6F 14.6W
#2 30000 fish ~35fps 165.75F 13.6W
To get reasonably accurate, repeatable temperature and power figures for the WebGL Aquarium tests, I came up with the following.
I run this command once the temperature has stabilized a bit after changing test parameters. Press [Ctrl]+[C] to stop writing data to the test_results file.
rm -f test_results && watch -t -n1 "(sensors -f|grep -E 'SVI2_P_SoC:|Tctl:'|sed 's/[+°F]/ /g'|xargs|cut -d' ' -f 2,4)|& tee -a test_results"
Put this in a script named "results" somewhere in your PATH. It averages the two columns of numbers written to the "test_results" file.
#!/bin/bash
cat test_results
temp="$(awk '{ sum += $1; count++ } END { if (count > 0) print sum / count; }' test_results)F"
power="$(awk '{ sum += $2; count++ } END { if (count > 0) print sum / count; }' test_results)W"
printf '\n%s\n\n' "${temp} ${power}"
To recap: I run the long one-liner above, press [Ctrl]+[C] to stop it, then run "results" and copy-paste the last line into my report.
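As a possible variation on the "results" script, here is a minimal sketch that averages both columns in a single awk pass and skips any line that is not two plain numbers (for example, a stray error message captured by the tee). The function name average_results is only an illustration, and the sketch assumes each good line has the form "<temp> <power>" as produced by the one-liner above.

```shell
# Sketch only: average_results is a hypothetical name, not part of the original script.
# Assumes each data line is "<temp> <power>"; anything else is ignored.
average_results() {
    awk '$1 ~ /^[0-9]+(\.[0-9]+)?$/ && $2 ~ /^[0-9]+(\.[0-9]+)?$/ {
             temp += $1; power += $2; n++
         }
         END {
             # Print nothing when no valid samples were found.
             if (n > 0) printf "%.1fF %.1fW (%d samples)\n", temp / n, power / n, n
         }' "${1:-test_results}"
}
```

Called without arguments it reads test_results from the current directory; the sample count makes it easier to see how long a run the average covers.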
Last edited by NuSkool (2025-02-08 04:49:01)
Scripts I use: https://github.com/Cody-Learner
Offline
EDITED: Change from preliminary to final report.
Preliminary Report: linux-amdgpu-testing-6.13.1.arch1-18
About 3 hr in and still going strong. Nothing noticeable to report except amdgpu_gpu_recover.
Made it through some initial testing and will continue till around 24HR total.
This test kernel tested and ran well on my system for around 19 hours.
System : Ryzen 5 PRO 2400GE w/ Radeon Vega 11 Graphics
GPU Type : APU
Family : Raven (RV)
ASIC Name : Raven
Chip Class : GFX9
Start date time : Feb 7, 2025 05:40PM
Finish date time: Feb 8, 2025 01:09PM
Downloaded md5sums: e469c0f8e1a65b79c6a1a7a53a0511b6 linux-amdgpu-testing-6.13.1.arch1-18-x86_64.pkg.tar.zst
30cb59de0508613c591819a4f1576f92 linux-amdgpu-testing-headers-6.13.1.arch1-18-x86_64.pkg.tar.zst
pacman -Q linux-amdgpu-testing: linux-amdgpu-testing 6.13.1.arch1-18
uname -rs : Linux 6.13.1-arch1-18-amdgpu-testing
pacman -Q glibc : glibc 2.41+r2+g0a7c7a3e283a-1
cat /proc/cmdline : ... rw loglevel=3 sysrq_always_enabled=1 amd_pstate=passive fsck.mode=force
amdgpu_gpu_recover : From fresh reboot in console, runs successfully, exits 0, but running startx afterwards results in a system freeze.
From desktop session: screen went dark, came back momentarily, then went dark again and froze.
Power Usage : Logged in console idle
Power Usage : Idle desktop w/ chromium 128.3F 5.5W
WebGL Aquarium Chromium : 500 fish 60fps 131.1F 9.6W
1000 fish 60fps 138.3F 10.2W
5000 fish 60fps 145.2F 11.8W
10000 fish 60fps 161.8F 14.0W
20000 fish ~50fps 174.3F 15.3W
30000 fish ~35fps 170.0F 14.4W
I've switched kernels to test: linux-amdgpu-testing 6.13.2.arch1-1
Last edited by NuSkool (2025-02-08 21:28:25)
Offline
After a few hours I haven't been able to get any freeze on linux-amdgpu-testing-6.13.1.arch1-18.
More detailed results here: https://github.com/pacoandres/laikm/iss … 2644750658
Offline
Build: linux-amdgpu-testing-6.13.2.arch1-1 - freeze confirmed.
Included patches:
- Optimize mutex protected blocks in amdgpu_vm_flush
- Make amdgpu_vmid::current_gpu_reset_count atomic_t type
- Make amdgpu_fence_driver::sync_seq atomic type
- Reduce GFXOFF enable delay from 100ms to 10ms
laikm from pacoandres for reporting issues
Last edited by Mechanicus (Yesterday 12:42:14)
Offline
Preliminary Report: linux-amdgpu-testing 6.13.2.arch1-1
About 1HR into testing and seems good so far.
Ran through WebGL temp/power testing.
System froze up after about 7.5HR run time.
System : Ryzen 5 PRO 2400GE w/ Radeon Vega 11 Graphics
GPU Type : APU
Family : Raven (RV)
ASIC Name : Raven
Chip Class : GFX9
Start date time : Feb 8, 2025 01:30PM
Finish date time: Feb 8, 2025 08:55PM
$ md5sum linux*
ac3ac006b0a468d703c2905fa1541295 linux-amdgpu-testing-6.13.2.arch1-1-x86_64.pkg.tar.zst
1506140d0f8df66e7b9756668569e6fd linux-amdgpu-testing-headers-6.13.2.arch1-1-x86_64.pkg.tar.zst
pacman -Q linux-amdgpu-testing: linux-amdgpu-testing 6.13.2.arch1-1
uname -rs : Linux 6.13.2-1-amdgpu-testing
pacman -Q glibc : glibc 2.41+r2+g0a7c7a3e283a-1
cat /proc/cmdline : ... rw loglevel=3 sysrq_always_enabled=1 amd_pstate=passive fsck.mode=force
Power Usage : Idle desktop w/ chromium 123.0F 4.8W
WebGL Aquarium Chromium : 500 fish 60fps 133.9F 10.0W
1000 fish 60fps 136.2F 10.4W
5000 fish 60fps 146.2F 12.3W
10000 fish 60fps 160.6F 14.3W
15000 fish ~55fps 161.7F 14.0W
20000 fish ~42fps 165.2F 13.4W
30000 fish ~30fps 163.6F 12.2W
Thermal throttling >10000 fish.
Rerun on cool 124F system:
30000 fish ~38fps 165.6F 14.3W
amdgpu_gpu_recover : Fails from console
: Didn't try from DE session
Last edited by NuSkool (Yesterday 05:02:53)
Offline
@NuSkool results look promising! Especially 30000 fish fps!
Last edited by Mechanicus (2025-02-08 22:59:13)
Offline
I am wondering about the point of making the sequence number an atomic type, unless the design of the kernel isn't behaving as intended. agd5f in my bug report did indicate that the rings should only be accessed one thread per ring at a time. If that's not happening, something is clearly going weird with those sequence numbers.
Offline
I am wondering about the point of making the sequence number an atomic type, unless the design of the kernel isn't behaving as intended. agd5f in my bug report did indicate that the rings should only be accessed one thread per ring at a time. If that's not happening, something is clearly going weird with those sequence numbers.
From what I can see, the GFX rings are accessed from two places: 1) from the GFX ring itself; 2) from amdgpu_ring_mux, which writes data from the software rings to the GFX ring. With this design, amdgpu_ring_mux should probably be the only manager of the GFX rings, to protect against any possible data race.
This change is for checking the actual ring fences behavior.
Last edited by Mechanicus (Yesterday 09:23:35)
Offline
glibc 2.41+r6+gcf88351b685d-1
Stable build for regular use compiled with new glibc 2.41+r6+gcf88351b685d-1.
Build: linux-amdgpu-stable-6.13.2.arch1-1, linux-amdgpu-stable-headers-6.13.2.arch1-1
Included patches:
- Reduce GFXOFF enable delay from 100ms to 10ms
- Optimize mutex protected blocks in amdgpu_vm_flush
- Deny GFXOFF in amdgpu_ring_commit using PG state change
Offline
glibc 2.41+r6+gcf88351b685d-1
Build: linux-amdgpu-testing-6.13.2.arch1-2 - freeze confirmed.
Included patches:
- Reduce GFXOFF enable delay from 100ms to 10ms
- Optimize mutex protected blocks in amdgpu_vm_flush
- Make amdgpu_vmid::current_gpu_reset_count atomic_t type
- Make amdgpu_fence_driver::sync_seq atomic type
- Remove workaround for TLB seq race
- Make error code handling explicit
Build: linux-amdgpu-testing-6.13.2.arch1-3 - freezes.
Included patches:
- Reduce GFXOFF enable delay from 100ms to 10ms
- Optimize mutex protected blocks in amdgpu_vm_flush
- Make amdgpu_vmid::current_gpu_reset_count atomic_t type
- Make amdgpu_fence_driver::sync_seq atomic type
- Make error code handling explicit
- Make error code handling explicit. Part 2
What to check:
- stability
If stable:
- temperature
- average package power (sudo turbostat -s PkgWatt)
- performance - for example WebGL Aquarium fps for different amount of fish
Kernel option to keep during testing period: fsck.mode=force
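Before starting a test run, it may help to confirm that the option is really on the running kernel's command line. The following is a minimal sketch; the helper name has_kernel_option is hypothetical, and it only does a whole-word match against a string you pass in.

```shell
# Sketch only: has_kernel_option is a hypothetical helper name.
# $1 = option to look for, $2 = kernel command line text.
# Returns 0 if the option appears as a whole word, 1 otherwise.
has_kernel_option() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage on a live system:
#   has_kernel_option fsck.mode=force "$(cat /proc/cmdline)" && echo "fsck.mode=force is set"
```

Padding both sides with spaces avoids a false match on a longer option that merely starts with the same text.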
laikm from pacoandres for reporting issues
Last edited by Mechanicus (Today 22:18:53)
Offline
linux-amdgpu-stable-6.13.2.arch1-1: Tested for 3 hours and it seems stable. I've tried to trigger a freeze intentionally but haven't been able to.
Offline
I'm currently testing linux-amdgpu-testing 6.13.2.arch1-1.
Can confirm crash after about 10h of use with Spotify, Webcord, Bluetooth manager, Firefox and Steam running.
Only regular use, some very light 2d gaming. Seems faster than 6.13.1-7, which made games sluggish.
I'm not a developer, but I have an AMD 2200G processor and thought it might be useful to report.
systemd-coredump[11352]: [?] Process 11339 (spotify) of user 1000 dumped core.
Stack trace of thread 11339:
#0 0x00007f8d117ef624 n/a (n/a + 0x0)
#1 0x00007f8d11795ba0 n/a (n/a + 0x0)
#2 0x00007f8d1177d582 n/a (n/a + 0x0)
#3 0x00007f8d1177e3bf n/a (n/a + 0x0)
#4 0x00007f8d1187d419 n/a (n/a + 0x0)
#5 0x00007f8d1187e714 n/a (n/a + 0x0)
#6 0x000055b928e9ad70 n/a (n/a + 0x0)
ELF object binary architecture: AMD x86-64
Offline
@pacoandres thank you! The stable build should be stable; it is a backup if the testing kernel crashes.
@Amunn thank you! Please try linux-amdgpu-testing-6.13.2.arch1-2 version. I've just tried it on Ryzen 3 4300G and the graphics performance is now great even with epic graphics settings in Outlast Trials.
Last edited by Mechanicus (Yesterday 20:18:20)
Offline
Updated system to current glibc.
Switched to testing kernel: linux-amdgpu-testing-6.13.2.arch1-2
EDIT: System froze after ~3HR run time.
Start date time : Feb 9, 2025 12:23PM
Finish date time: Feb 9, 2025 03:34PM
uname -rs : Linux 6.13.2-arch1-2-amdgpu-testing
pacman -Q glibc : glibc 2.41+r6+gcf88351b685d-1
Last edited by NuSkool (Yesterday 23:42:05)
Offline
Just thought i would say that after 6 days i had no crashes with the latest Mesa from @Lone_Wolf in this post https://bbs.archlinux.org/viewtopic.php … 0#p2223890, which if i got it right, it will be the regular Mesa 25.0 when it's released, and i'm constantly spawning new windows and i use mpv a lot.
I know it's not the end of it as people are doing a lot of work with these kernel builds but at least i think i have a stable system now with no more crashes so i would like to thank everyone for that.
Online
pacmancrashedagain wrote: Just thought i would say that after 6 days i had no crashes with the latest Mesa from @Lone_Wolf in this post https://bbs.archlinux.org/viewtopic.php … 0#p2223890, which if i got it right, it will be the regular Mesa 25.0 when it's released, and i'm constantly spawning new windows and i use mpv a lot.
I know it's not the end of it as people are doing a lot of work with these kernel builds but at least i think i have a stable system now with no more crashes so i would like to thank everyone for that.
Thanks for reporting! What about vulkan-mesa-layers and vulkan-radeon? Is it okay to leave them at stable 24.2.7 while using mesa-25.x then?
Offline
pacmancrashedagain wrote: Just thought i would say that after 6 days i had no crashes with the latest Mesa from @Lone_Wolf in this post https://bbs.archlinux.org/viewtopic.php … 0#p2223890, which if i got it right, it will be the regular Mesa 25.0 when it's released, and i'm constantly spawning new windows and i use mpv a lot.
I know it's not the end of it as people are doing a lot of work with these kernel builds but at least i think i have a stable system now with no more crashes so i would like to thank everyone for that.
Thanks for reporting! What about vulkan-mesa-layers and vulkan-radeon? Is it okay to leave them at stable 24.2.7 while using mesa-25.x then?
I don't know how the mesa package works in detail, but when i install that package from LoneWolf, mesa-test-git 25.0.0_devel.200908.66775c89fce-1, it replaces vulkan-radeon.
So i presume that vulkan-radeon is already in that mesa-test-git package, if it's not, then i guess the best is to use that other package compiled by Lonewolf with Mesa 24.2.8 and then install the vulkan-radeon.
Online
from mesa-test-git PKGBUILD
provides=(mesa vulkan-intel vulkan-radeon vulkan-mesa-layers libva-mesa-driver vulkan-swrast vulkan-virtio mesa-vdpau vulkan-driver opengl-driver)
Repo mesa splits its functionality into multiple packages; mesa-test-git has all of it in one package.
To prevent problems there's also an extensive conflicts line that is meant to make it impossible to mix repo mesa subpackages with mesa-test-git.
conflicts=(mesa vulkan-intel vulkan-radeon vulkan-mesa-layers libva-mesa-driver vulkan-swrast mesa-vdpau vulkan-virtio
vulkan-nouveau mesa-libgl opencl-clover-mesa opencl-rusticl-mesa
TL;DR :
both mesa-test-git binaries replace repo mesa + vulkan stuff and more.
Mixing mesa versions is a bad idea.
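One way to spot a mixed setup is to compare the versions pacman reports for the installed Mesa pieces. This is only a sketch: same_version is a hypothetical helper, and it assumes its input is in the "name version" format that `pacman -Q` prints.

```shell
# Sketch only: same_version is a hypothetical helper name.
# Reads "name version" lines on stdin (the format printed by `pacman -Q`).
# Exits 0 if every listed package has the same version, 1 otherwise
# (including when there is no input at all).
same_version() {
    awk 'NF == 2 { seen[$2] = 1 }
         END {
             n = 0
             for (v in seen) n++
             exit (n == 1 ? 0 : 1)
         }'
}

# Usage with repo packages (adjust the package list to what is installed):
#   pacman -Q mesa vulkan-radeon vulkan-mesa-layers | same_version || echo "mesa versions differ"
```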
Disliking systemd intensely, but not satisfied with alternatives so focusing on taming systemd.
clean chroot building not flexible enough ?
Try clean chroot manager by graysky
Offline
OJaksch wrote: pacmancrashedagain wrote: Just thought i would say that after 6 days i had no crashes with the latest Mesa from @Lone_Wolf in this post https://bbs.archlinux.org/viewtopic.php … 0#p2223890, which if i got it right, it will be the regular Mesa 25.0 when it's released, and i'm constantly spawning new windows and i use mpv a lot.
I know it's not the end of it as people are doing a lot of work with these kernel builds but at least i think i have a stable system now with no more crashes so i would like to thank everyone for that.
Thanks for reporting! What about vulkan-mesa-layers and vulkan-radeon? Is it okay to leave them at stable 24.2.7 while using mesa-25.x then?
I don't know how the mesa package works in detail, but when i install that package from LoneWolf, mesa-test-git 25.0.0_devel.200908.66775c89fce-1, it replaces vulkan-radeon.
So i presume that vulkan-radeon is already in that mesa-test-git package, if it's not, then i guess the best is to use that other package compiled by Lonewolf with Mesa 24.2.8 and then install the vulkan-radeon.
Looks like you're right: it is a miraculous package and indeed bundles several other updated packages as well. From its PKGINFO:
...
provides = mesa
provides = vulkan-intel
provides = vulkan-radeon
provides = vulkan-mesa-layers
provides = libva-mesa-driver
provides = vulkan-swrast
provides = vulkan-virtio
provides = mesa-vdpau
provides = vulkan-driver
provides = opengl-driver
...
Will try tomorrow and report on one of the next days.
Offline
Ha, written at the same second
Mixing mesa versions is a bad idea.
Thanks for the clarification!
Offline
mesa-test-git 25.0.0_devel.200908.66775c89fce-1 (#329)
@pacmancrashedagain @OJaksch
I've had it running for 12 days now and a total of 117 hours in normal operation.
But it's about time for the overdue update to 24.3.5. I share @timber22's criticism. The problem has persisted since 05.12.2024!
https://bbs.archlinux.org/viewtopic.php … 0#p2223190
https://gitlab.freedesktop.org/mesa/mesa/-/issues/12310
Offline
Mechanicus wrote: @pacoandres thank you! The stable build should be stable; it is a backup if the testing kernel crashes.
@Amunn thank you! Please try linux-amdgpu-testing-6.13.2.arch1-2 version. I've just tried it on Ryzen 3 4300G and the graphics performance is now great even with epic graphics settings in Outlast Trials.
I got a crash after around 10h of use; the system froze when I hovered over an image in the Steam library.
Process 16850 (spotify) of user 1000 dumped core.
Module [dso] without build-id.
Stack trace of thread 16850:
#0 0x00007049264d5624 n/a ([dso] + 0x302624)
#1 0x000070492647bba0 n/a ([dso] + 0x2a8ba0)
#2 0x0000704926463582 n/a ([dso] + 0x290582)
#3 0x00007049264643bf n/a ([dso] + 0x2913bf)
#4 0x0000704926563419 n/a ([dso] + 0x390419)
#5 0x0000704926564714 n/a ([dso] + 0x391714)
#6 0x00005bca1351cd70 n/a (n/a + 0x0)
ELF object binary architecture: AMD x86-64
Last edited by Amunn (Today 18:33:09)
Offline
linux-amdgpu-testing-6.13.2.arch1-3 is ready for testing.
Last edited by Mechanicus (Today 19:32:58)
Offline
I got a crash after 1 min of opening Spotify and Steam; when I looked at the Steam library, it crashed immediately.
EDIT: on the 6.13.2.arch1-3 test kernel
Feb 01 16:27:31 user foot[937]: wayland: failed to read events from the Wayland socket: Broken pipe
Feb 01 16:27:31 user foot[937]: wayland: failed to flush wayland socket: Broken pipe
Feb 01 16:27:41 user systemd-coredump[26067]: [?] Process 25980 (firewall-applet) of user 1000 dumped core.
Stack trace of thread 25980:
#0 0x00007b02c98a5334 n/a (libc.so.6 + 0x96334)
#1 0x00007b02c984c120 raise (libc.so.6 + 0x3d120)
#2 0x00007b02c98334c3 abort (libc.so.6 + 0x244c3)
#3 0x00007b02c8098163 _ZNK14QMessageLogger5fatalEPKcz (libQt5Core.so.5 + 0x98163)
#4 0x00007b02c8733457 _ZN22QGuiApplicationPrivate25createPlatformIntegrationEv (libQt5Gui.so.5 + 0x133457)
#5 0x00007b02c8733b11 _ZN22QGuiApplicationPrivate21createEventDispatcherEv (libQt5Gui.so.5 + 0x133b11)
#6 0x00007b02c82b682d _ZN23QCoreApplicationPrivate4initEv (libQt5Core.so.5 + 0x2b682d)
#7 0x00007b02c8733bc7 _ZN22QGuiApplicationPrivate4initEv (libQt5Gui.so.5 + 0x133bc7)
EDIT2:
Rebooted and opened the same programs, but it didn't crash this time.
It seems to have fixed an issue I've had for a while: when I watch full-screen video in the Steam store, it no longer freezes for 2s like it used to.
System is Aorus B450 motherboard with a 2200G processor.
Last edited by Amunn (Today 20:03:55)
Offline
It crashed again after some light web browsing (Firefox, with Spotify, Webcord and Steam started and open in different sway desktops).
Feb 10 21:10:30 user kernel: amdgpu 0000:0a:00.0: amdgpu: Dumping IP State
Feb 10 21:10:31 user pipewire[852]: spa.alsa: front:1p: (0 suppressed) snd_pcm_avail after recover: Broken pipe
Feb 10 21:10:33 user pipewire[852]: spa.alsa: front:1p: snd_pcm_mmap_commit error: Broken pipe
Feb 10 21:10:34 user pipewire[852]: spa.alsa: front:1p: snd_pcm_mmap_commit error: Broken pipe
Feb 10 21:10:42 user pipewire[852]: spa.alsa: front:1p: snd_pcm_mmap_commit error: Broken pipe
Feb 10 21:10:46 user pipewire[852]: spa.alsa: front:1p: (9 suppressed) snd_pcm_avail after recover: Broken pipe
Feb 10 21:10:46 user pipewire[852]: spa.alsa: front:1p: snd_pcm_mmap_commit error: Broken pipe
Feb 10 21:10:47 user pipewire[852]: spa.alsa: front:1p: snd_pcm_mmap_commit error: Broken pipe
Feb 10 21:10:54 user pipewire[852]: spa.alsa: front:1p: (1 suppressed) snd_pcm_avail after recover: Broken pipe
Feb 10 21:10:54 user pipewire[852]: spa.alsa: front:1p: snd_pcm_mmap_commit error: Broken pipe
Feb 10 21:10:57 user pipewire[852]: spa.alsa: front:1p: snd_pcm_mmap_commit error: Broken pipe
Feb 10 21:10:59 user pipewire[852]: spa.alsa: front:1p: snd_pcm_mmap_commit error: Broken pipe
Feb 10 21:11:01 user pipewire[852]: spa.alsa: front:1p: snd_pcm_mmap_commit error: Broken pipe
Feb 10 21:11:03 user pipewire[852]: spa.alsa: front:1p: (4 suppressed) snd_pcm_avail after recover: Broken pipe
Feb 10 21:11:05 user kernel: amdgpu 0000:0a:00.0: amdgpu: failed to write reg 28b4 wait reg 28c6
Feb 10 21:11:10 user kernel: amdgpu 0000:0a:00.0: amdgpu: failed to write reg 28b4 wait reg 28c6
Feb 10 21:11:13 user pipewire[852]: spa.alsa: front:1p: snd_pcm_mmap_commit error: Broken pipe
Feb 10 21:11:16 user pipewire[852]: spa.alsa: front:1p: snd_pcm_mmap_commit error: Broken pipe
Feb 10 21:11:20 user kernel: amdgpu 0000:0a:00.0: amdgpu: failed to write reg 1a6f4 wait reg 1a706
Feb 10 21:11:21 user kernel: [drm:amdgpu_dm_atomic_check [amdgpu]] *ERROR* [CRTC:73:crtc-0] hw_done or flip_done timed out
Feb 10 21:11:22 user pipewire[852]: pw.node: (alsa_output.pci-0000_0a_00.6.analog-stereo-58) graph xrun not-triggered (1 suppressed)
Feb 10 21:11:22 user pipewire[852]: pw.node: (alsa_output.pci-0000_0a_00.6.analog-stereo-58) xrun state:0x7a9faa3c2008 pending:1/2 s:1224422094726 a:1225697246999 f:1225697286905 waiting:1275152273 proc>
Feb 10 21:11:22 user pipewire[852]: pw.node: (spotify-65) xrun state:0x7a9faa8ed008 pending:0/1 s:1231526568741 a:1231526582467 f:1224513262516 waiting:13726 process:18446744066696231665 status:awake
Feb 10 21:11:23 user pipewire[852]: spa.alsa: front:1p: (13 suppressed) snd_pcm_avail after recover: Broken pipe
Feb 10 21:11:23 user pipewire[852]: spa.alsa: front:1p: snd_pcm_mmap_commit error: Broken pipe
Feb 10 21:11:25 user kernel: amdgpu 0000:0a:00.0: amdgpu: failed to write reg 1a6f4 wait reg 1a706
Feb 10 21:11:27 user pipewire[852]: spa.alsa: front:1p: snd_pcm_mmap_commit error: Broken pipe
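In a log like the one above, the pipewire messages tend to bury the kernel lines that matter for the GPU hang. A small filter can pull out just the amdgpu kernel messages; this is a sketch, with amdgpu_kernel_lines as a hypothetical name, and it assumes the default journalctl line format shown in this post.

```shell
# Sketch only: amdgpu_kernel_lines is a hypothetical helper name.
# Keeps lines logged by the kernel that mention amdgpu; reads journal text on stdin.
amdgpu_kernel_lines() {
    grep -E ' kernel: .*amdgpu'
}

# Usage for the previous boot (the one that froze):
#   journalctl -b -1 --no-pager | amdgpu_kernel_lines
```

Applied to the excerpt above, this would keep the "Dumping IP State", "failed to write reg" and "flip_done timed out" lines and drop the pipewire noise.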
Offline