I have been running `mesa-test-git-25.0.0_devel.200908.66775c89fce-1-x86_64.pkg.tar.zst` for the last 5 days with no issues, apart from a single GPU-only crash moments ago (no HDMI output signal after that).
I was using KDE Plasma with 2 windows of VS Code, EasyEffects and Dolphin open, and was trying to play something in VLC. I saw weird patterns in the video, then it crashed within a few seconds.
Interestingly, unlike before, this was not a full crash: I was able to SSH into the PC and use it fine without any hiccups or slowdowns. I attempted a reboot from there and waited 15 minutes before finally hard resetting.
- AMD2400G Vega 11 iGPU HDMI
- 1GB VRAM / 15GB RAM
- Kernel: repo 6.13.2-arch1-1
- Mesa: mesa-test-git-25.0.0_devel.200908.66775c89fce-1-x86_64
I tried to reproduce the error, but it didn't happen again. I still saw the weird pattern in VLC, though not in any other player.
loqs wrote:
NuSkool wrote:
EDIT1: On the other hand, if a fix in the repo is days away maybe too late.
drm-amdgpu-gfx9-manually-control-gfxoff-for-cs-on-rv.patch is queued for 6.13.4. does that fix the issue for you NuSkool?
....
Is there/has there been a pre-built (or an AUR pkg, a PKGBUILD) kernel with those patches?
I'd be glad to test it along with repo mesa.
linux-6.13.3.arch1 with drm-amdgpu-gfx9-manually-control-gfxoff-for-cs-on-rv.patch applied:
linux-6.13.3.arch1-1.1-x86_64.pkg.tar.zst/linux-headers-6.13.3.arch1-1.1-x86_64.pkg.tar.zst
Is there consensus that the following would be a good indicator of whether the system has 'gfx9'?
$ sudo dmesg | grep '<gfx'
[ 6.366573] [drm] add ip block number 6 <gfx_v9_0>
@Lone_Wolf I think it may be worth adding this to #585 or an alternative command along with an explanation for users to ID their system for coverage.
Then maybe consider adding the entire #585 as a quote to the first (top?) post or your post #3? It may help users looking for solutions.
For X users, glxinfo | grep Device would be better as it shows chipset names; Wayland users will have to use eglinfo -B.
@pacmancrashedagain: you would need to change the contents of post #1. Are you willing to do that?
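For anyone who wants to script the dmesg check, here is a minimal sketch based on the log line quoted above. The function names `gfx_ip_block` and `is_gfx9` are made up for illustration, and the pattern assumes the kernel log uses the same `add ip block number N <gfx_vX_Y>` wording shown in this thread; other kernel versions may word it differently.

```shell
# Hypothetical helper: print the first gfx IP block name (e.g. gfx_v9_0)
# found in kernel log text read from stdin.
gfx_ip_block() {
    sed -n 's/.*add ip block number [0-9]* <\(gfx_v[0-9_]*\)>.*/\1/p' | head -n 1
}

# Hypothetical helper: succeed only if the given block name is in the gfx9 family.
is_gfx9() {
    case "$1" in
        gfx_v9_*) return 0 ;;
        *) return 1 ;;
    esac
}
```

Typical use would be something like `is_gfx9 "$(sudo dmesg | gfx_ip_block)" && echo "gfx9 system"`. Reading from stdin keeps the parsing separate from the privileged dmesg call, so it can also be run against a saved log.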
Disliking systemd intensely, but not satisfied with alternatives so focusing on taming systemd.
clean chroot building not flexible enough ?
Try clean chroot manager by graysky
@Lone_Wolf, sure. The content of the first post is no longer relevant and might be confusing for new people coming into this thread. So I should edit it with the content of post #585 and those commands to identify the gfx card/family, right?
I made an initial edit with all the information, I think. It still needs a few touches, but it looks decent now. Tell me if you want anything changed.
Last edited by pacmancrashedagain (2025-02-19 12:18:36)
For X users glxinfo | grep Device would be better as this has chipset names
Good hint! And I'm still at daily stable Arch except mesa-test-git-25.0.0_rc (now mesa-test-git-25.0.0_rc3.201182.3a8abfa39b7-1-x86_64.pkg.tar.zst) since 11th February. Still no issues. Yeah!
Device: AMD Radeon Vega 11 Graphics (radeonsi, raven, ACO, DRM 3.59, 6.13.2-arch1-1) (0x15d8)
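If you only want the chipset name out of that line, a small sketch like the following could work. The name `glx_chipset` is hypothetical, and the pattern assumes the radeonsi driver and the same `Device: ... (radeonsi, <chipset>, ...)` layout as the output posted here.

```shell
# Hypothetical helper: print the chipset name (second field inside the
# parentheses) from a glxinfo "Device:" line read from stdin.
glx_chipset() {
    sed -n 's/^ *Device: .*(radeonsi, \([^,]*\),.*/\1/p' | head -n 1
}
```

On X this would be used as `glxinfo | glx_chipset`. Nothing is printed if no matching Device line is found, which is also what happens with drivers other than radeonsi.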
@pacmancrashedagain Looks good, thanks.
NuSkool wrote:
loqs wrote:
drm-amdgpu-gfx9-manually-control-gfxoff-for-cs-on-rv.patch is queued for 6.13.4. does that fix the issue for you NuSkool?
....
Is there/has there been a pre-built (or an AUR pkg, a PKGBUILD) kernel with those patches?
I'd be glad to test it along with repo mesa.
linux-6.13.3.arch1 with drm-amdgpu-gfx9-manually-control-gfxoff-for-cs-on-rv.patch applied:
linux-6.13.3.arch1-1.1-x86_64.pkg.tar.zst/linux-headers-6.13.3.arch1-1.1-x86_64.pkg.tar.zst
I've just tried this kernel build and got two freezes when resizing glxgears. No log except the well-known
amdgpu 0000:09:00.0: amdgpu: Dumping IP State
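Since that one line is often the only trace, a quick way to check whether a boot ended with it is to search the kernel log for the message. A minimal sketch, assuming the message text is exactly as quoted above; the function name `count_ip_state_dumps` is hypothetical.

```shell
# Hypothetical helper: count "Dumping IP State" messages in log text on stdin.
# grep -c exits non-zero when the count is 0, so "|| true" keeps the
# function usable in scripts that stop on errors.
count_ip_state_dumps() {
    grep -c 'amdgpu: Dumping IP State' || true
}
```

After a hard reset, `journalctl -k -b -1 | count_ip_state_dumps` would check the kernel messages of the previous boot, provided the journal is persistent. A result of 0 does not prove there was no hang, since several reports in this thread had no log at all.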
linux-6.13.3.arch1 with drm-amdgpu-gfx9-manually-control-gfxoff-for-cs-on-rv.patch applied:
linux-6.13.3.arch1-1.1-x86_64.pkg.tar.zst/linux-headers-6.13.3.arch1-1.1-x86_64.pkg.tar.zst
@loqs Thanks! I got your kernel + repo mesa up and running a few minutes ago.
@Lone_Wolf @pacmancrashedagain It's great to see those changes; they should help with organization.
@pacoandres Well dagnabbit, that's not looking very encouraging. I'll continue and report when appropriate.
I did make it through the glxgears resize.
EDIT: The Vulkan issue was caused by using the wrong mesa package when I ran the test... removed details.
Last edited by NuSkool (Yesterday 02:18:56)
Scripts I use: https://github.com/Cody-Learner
$ glxinfo | grep Device
Device: AMD Radeon Vega 11 Graphics (radeonsi, raven, LLVM 19.1.7, DRM 3.60, 6.13.3-arch1-1.1) (0x15dd)
$ sudo dmesg | awk '/drm/ && /gfx/'
[ 6.427009] [drm] add ip block number 6 <gfx_v9_0>
REPORT: loqs's patched kernel from post #602.
$ pacman -Q linux
linux 6.13.3.arch1-1.1
$ uname -rs
Linux 6.13.3-arch1-1.1
$ pacman -Q mesa
mesa 1:24.3.4-1
$ cat /proc/cmdline
.... rw loglevel=3 sysrq_always_enabled=1 amd_pstate=passive fsck.mode=force
Froze after ~5.5 hours during ordinary web browser usage.
It may be worth mentioning that it did make it through the glxgears resizing and WebGL Aquarium tests.
Temperatures, power draw, and performance were all good.
Journal:
kernel: amdgpu 0000:0a:00.0: amdgpu: Dumping IP State
Last edited by NuSkool (2025-02-19 23:15:31)
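One thing worth double-checking in reports like this is that the running kernel really is the freshly installed package and not the one from before the reboot. Going by the pairs posted here (pacman's `6.13.3.arch1-1.1` next to uname's `6.13.3-arch1-1.1`), the two strings differ only in `.arch` versus `-arch`. Below is a minimal sketch of that comparison, assuming the same mapping holds for other versions of the linux package; both function names are hypothetical.

```shell
# Hypothetical helper: turn a pacman version of the linux package
# (e.g. 6.13.3.arch1-1.1) into the string `uname -r` would report.
pkgver_to_release() {
    printf '%s\n' "$1" | sed 's/\.arch/-arch/'
}

# Hypothetical helper: succeed if package version $1 corresponds to
# running kernel release $2.
kernel_matches() {
    [ "$(pkgver_to_release "$1")" = "$2" ]
}
```

It could be used as `kernel_matches "$(pacman -Q linux | awk '{print $2}')" "$(uname -r)" || echo "running kernel differs from installed package"`.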
Confirmed freeze with loqs's patched kernel (linux 6.13.3.arch1-1.1) and repo mesa (mesa 1:24.3.4-1) using my reproducer https://gitlab.freedesktop.org/mesa/mes … te_2733555.
Separately, I'm happy to report that mesa 25.0.0 has been released! It'll hopefully be in our repos soon.
It looks like the mesa developers have confirmed that kernel 6.13 patched with https://git.kernel.org/pub/scm/linux/ke … b7282cf5ba doesn't solve the problem:
https://gitlab.freedesktop.org/drm/amd/-/issues/3975
Maybe I'm wrong and it's a different problem, but the logs they provide look quite similar to some freeze logs I've stored (when there was any log).
EDIT: I've just updated the system with the repo packages (now it's kernel 6.13.3.arch1-1, firmware 20250210.5bc5868b-1 and mesa still 24.3.4) and can confirm that the system still freezes.
Last edited by pacoandres (2025-02-20 10:03:18)
It will probably only hit the repos in a couple of days, but Mesa 25.0 is officially released.
It looks that mesa developers have confirmed that kernel 6.13 patched with https://git.kernel.org/pub/scm/linux/ke … b7282cf5ba doesn't solve the problem:
https://gitlab.freedesktop.org/drm/amd/-/issues/3975
6.13.3.arch1 with the two 6.13 backports from [PATCH 1/2] drm/amdgpu/gfx9: manually control gfxoff for CS on RV
linux-6.13.3.arch1-1.2-x86_64.pkg.tar.zst/linux-headers-6.13.3.arch1-1.2-x86_64.pkg.tar.zst
I just want to chime in to say that I seem to be affected by this issue (applications occasionally locking up the CPU, freezing the whole system and forcing me to hard reboot), and I have a Ryzen 5 3600XT CPU, with a Radeon RX 5600XT graphics card.
This mesa patch, as I've understood it, will fix the issue for Raven APUs, but mine is a Matisse CPU. So I'm left uncertain if it's going to fix it for me. Can anyone offer some insight on this? Thanks.
I have very recently rolled my system back to the older mesa/vulkan-radeon/llvm-libs versions as suggested here, but it's too early to tell with confidence if that alleviates it for me.
Interestingly, since the downgrade to mesa 1:24.2.7 and llvm-libs 18.1.8 I have not seen any issues on my end so far, despite my matisse/navi10 based setup. Maybe it's just a coincidence, I don't rule out yet that issues still can occur later; if it does happen I'll just make a new thread about my problem.
Last edited by ParaSait (2025-02-20 18:14:08)
@ParaSait downgrading mesa doesn't fix the kernel bug. For your hardware you can use linux-amdgpu-stable-6.13.2.arch1-1 (workaround applied for all amdgpu supported chips) or linux-amdgpu-testing-6.13.2.arch1-13 (workaround + optimization patches)
Last edited by Mechanicus (2025-02-20 19:53:47)
loqs's linux 6.13.3.arch1-1.2 with mesa 1:24.3.4-1 seems stable with regards to my reproducer. Good to see. I will continue testing this in "normal" conditions.
OK, so... after the flurry of random problems I've been having with my PC, which I had suspected the kernel and mesa of causing, I decided to give memtest86+ a go, and it turns out that I clearly have faulty RAM. Crikey.
So, yeah, ignore everything I said here about this issue affecting my setup, it most likely has absolutely nothing to do with this. But thanks to everyone who wanted to help me out!
Last edited by ParaSait (Yesterday 02:04:34)
Ubuntu 24.04 LTS released
kernel 6.11
mesa 24.2
Do they have a mesa patch for 24.2?
Last edited by grayich (Yesterday 11:12:44)
Not needed; the issues started with mesa 24.3.
pacoandres wrote:
It looks that mesa developers have confirmed that kernel 6.13 patched with https://git.kernel.org/pub/scm/linux/ke … b7282cf5ba doesn't solve the problem:
https://gitlab.freedesktop.org/drm/amd/-/issues/3975
6.13.3.arch1 with the two 6.13 backports from [PATCH 1/2] drm/amdgpu/gfx9: manually control gfxoff for CS on RV
linux-6.13.3.arch1-1.2-x86_64.pkg.tar.zst/linux-headers-6.13.3.arch1-1.2-x86_64.pkg.tar.zst
I've been testing this kernel build for about 5 hours and it seems stable.
Performance and power consumption are the same as with the repo kernel and patched mesa.
Preliminary test results: loqs's patched linux 6.13.3.arch1-1.2 from post #613.
With around 17 hours of runtime it has been reliable, and it runs and performs well with repo mesa.
I'll continue testing this setup.
$ pacman -Q linux
linux 6.13.3.arch1-1.2
$ uname -sr
Linux 6.13.3-arch1-1.2
$ pacman -Q mesa
mesa 1:24.3.4-1
$ cat /proc/cmdline
... rw loglevel=3 sysrq_always_enabled=1 amd_pstate=passive fsck.mode=force

glxgears resize test: Passed @ 60fps
vkgears resize test:  Passed @ 60fps

Test                        Load         FPS      Temp      Power
Idle desktop w/ chromium    -            -        121.80F    5.40W
WebGL Aquarium (Chromium)   500 fish     60fps    133.65F    9.96W
                            1000 fish    60fps    135.33F   10.37W
                            5000 fish    60fps    143.26F   11.82W
                            10000 fish   60fps    153.07F   13.82W
                            15000 fish   ~54fps   161.44F   14.30W
                            20000 fish   ~43fps   159.34F   13.67W
                            30000 fish   ~30fps   157.87F   12.45W
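For readers more used to Celsius: assuming the `F` values above are temperatures in Fahrenheit, the usual conversion is C = (F - 32) * 5 / 9. A small sketch follows; the function name `f_to_c` is hypothetical.

```shell
# Hypothetical helper: convert a Fahrenheit reading to Celsius,
# printed with two decimals.
f_to_c() {
    awk -v f="$1" 'BEGIN { printf "%.2f\n", (f - 32) * 5 / 9 }'
}
```

With this, the idle reading of 121.80F works out to roughly 50C and the 161.44F peak at 15000 fish to roughly 72C.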
Last edited by NuSkool (Yesterday 15:38:55)
Preliminary test results: loqs's patched linux 6.13.3.arch1-1.2 from post #613.
Perhaps open an issue on Arch's GitLab instance asking for the two backports to be included in the next release of the linux package, as they are not in the upstream 6.13.4 release candidate?
Perhaps open an issue on Arch's gitlab instance ....
Sorry, until I figure out a way to get through the final confirmation step of registration without a cell phone, I have no access.
https://bbs.archlinux.org/viewtopic.php … 8#p2226958
Last edited by NuSkool (Yesterday 21:38:35)
@loqs @NuSkool
linux 6.13.4 and linux-lts 6.12.16 are released.
The back-ported patch drm-amdgpu-gfx9-manually-control-gfxoff-for-cs-on-rv.patch was dropped two days ago.
A patch drm-amdgpu-bump-version-for-rv-pco-compute-fix.patch seems to be included.
I do not understand the process.
Can anyone explain it? Is there a hint about it anywhere?
Could it be that the problem has been handed back to mesa and is now being handled differently than previously discussed here?
I don't know if I'm posting rubbish here...