I've been experiencing this exact issue as soon as I log in. I'm going to stay on 6.9 until a temporary patch is found, and then wait for the bug to get fixed, probably on XWayland's side. I'd also be willing to help bisect if need be, since I hit this bug more or less on boot.
Offline
I posted the issue to XWayland, where it was quickly moved to drm/amd and closed as a duplicate of https://gitlab.freedesktop.org/drm/amd/-/issues/3528. The discussion in that issue covers the XWayland regression as well as a regression in some native Linux games, though the relation between the two doesn't seem to be established yet. There are patches for a 6.10 kernel in the comments there.
As a short-term solution, using linux-lts (more convenient) or patching your kernel, either manually or via a pre-made package on the AUR (more control / more recent version), is probably the way to go. Fortunately the issue linked above does seem to be active, so hopefully it will be fixed soon enough.
Offline
That sounds like good news. I hope the issue is indeed the same. At the very end of the comments, someone reports that both their XWayland performance and their game performance issues were resolved, so it does sound like the same issue we found, even if it didn't look that way at first.
Desktop: Ryzen 7 1800X | AMD 7800XT | KDE Plasma
MacbookPro-2012 | MATE
Offline
From what I understand, this is indeed due to the "Clear Page Tracking" work that was introduced in kernel 6.10, as I mentioned earlier. Patches are provided for it. I haven't tested them yet, but apparently they do solve the issue we are talking about in this thread.
Some users still seem to have a performance regression in some games even with these patches (if I understood correctly), but that may be unrelated and doesn't affect us in particular.
For creating a temporary package for a patched 6.10 kernel, I suppose that the best approach would be to start from the official 'linux' package ( https://gitlab.archlinux.org/archlinux/ … ages/linux ) and modify the PKGBUILD file to apply the proposed patches.
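The approach described above could look roughly like the sketch below. It assumes the devtools package (which provides pkgctl) and pacman-contrib (which provides updpkgsums) are installed. The name stage_kernel_patches is a hypothetical helper, not an existing tool, and the final PKGBUILD edit is deliberately left manual because the file's layout may change between releases.

```shell
# Hypothetical helper: clone the official Arch 'linux' packaging repo and
# copy the given patch files next to its PKGBUILD.
stage_kernel_patches() {
  if [ "$#" -lt 1 ]; then
    echo "usage: stage_kernel_patches PATCH_FILE..." >&2
    return 2
  fi

  # Check every patch exists before touching the network.
  for skp_patch in "$@"; do
    if [ ! -f "$skp_patch" ]; then
      echo "missing patch file: $skp_patch" >&2
      return 1
    fi
  done

  # Creates ./linux containing the PKGBUILD and related files.
  pkgctl repo clone --protocol=https linux || return 1

  # Local files listed in source=() are looked up next to the PKGBUILD.
  cp -- "$@" linux/ || return 1

  echo "Patches staged in ./linux"
  echo "Next: add the patch file names to source=() in linux/PKGBUILD,"
  echo "then run: cd linux && updpkgsums && makepkg -s"
}
```

For example, `stage_kernel_patches 0001-drm-amdgpu-revert-clear-on-free.patch` would stage the revert patch named later in this thread. Before building, check that prepare() in the PKGBUILD actually applies the *.patch entries from source=(); makepkg may also ask you to import the kernel signing keys listed in validpgpkeys.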
Note: the latest xorg-xwayland-24.1.2-1 update seems to make this issue even worse.
Last edited by OpusOne (2024-08-07 21:15:48)
Offline
linux 6.10.3.arch1-2 with 0001-drm-amdgpu-revert-clear-on-free.patch applied:
linux-6.10.3.arch1-2.1-x86_64.pkg.tar.zst/linux-headers-6.10.3.arch1-2.1-x86_64.pkg.tar.zst
Edit:
linux 6.10.3.arch1-2 with 0001-drm-amdgpu-revert-clear-on-free.patch and 0001-drm-amdgpu-always-allocate-cleared-VRAM-for-GEM-allo.patch applied:
linux-6.10.3.arch1-2.2-x86_64.pkg.tar.zst/linux-headers-6.10.3.arch1-2.2-x86_64.pkg.tar.zst.
Last edited by loqs (2024-08-07 23:47:05)
Online
Thanks for the patched kernels.
I've been running this all day while working and so far have not been able to replicate the issue. Assuming others get the same results, it's likely this can be closed as solved in an upcoming release.
Desktop: Ryzen 7 1800X | AMD 7800XT | KDE Plasma
MacbookPro-2012 | MATE
Offline
Amazing. I've been having this issue for two freaking days and just stumbled on this thread by sheer accident. Resizing hexchat while sitting on Arch/OpenSUSE IRC channels - switching from one channel to another would cause it to happen. Then it would never go away till I rebooted. Thanks again for the patches. It was like using AOL in the 1990s waiting for pictures to load. You may have saved my sanity.
AMDGPU - 6800
Last edited by synthexic (2024-08-10 19:38:12)
Offline
6.10.4 still has the issue, so don't upgrade to that.
Desktop: Ryzen 7 1800X | AMD 7800XT | KDE Plasma
MacbookPro-2012 | MATE
Offline
Well, following the tickets where this issue is being tracked:
https://gitlab.freedesktop.org/drm/amd/-/issues/3528
https://gitlab.freedesktop.org/drm/amd/-/issues/3538
I'm not sure there is any projected date/version for the fix yet. The proposed patches do improve the situation to the point of making it appear "solved", but my guess is that the more long-term fix may require more work.
Also, it seems to affect relatively few people, oddly. So the push to fix it quickly may not be there.
Offline
Reading the bug reports, it sounds like the ReBAR issues that hit 6.9 still aren't completely fixed. Can those affected paper over it by enabling ReBAR? That could explain why some people (those who enabled ReBAR) aren't seeing this. I definitely don't see it, and I've had ReBAR active since the time when starting anything graphically intensive without it would crash the application.
Offline
I've had it enabled for a good while now if this is what you're referring to.
The last comment suggests that there are two different issues here, the one that we've raised in this thread and something similar related to BAR.
One of the developers said they are looking into that issue that was closed as a duplicate.
sudo dmesg | grep BAR=
[ 6.737430] [drm] Detected VRAM RAM=16368M, BAR=16384M
https://gitlab.freedesktop.org/drm/amd/ … te_2524205
Link to the comment that suggests there are two separate issues, one related to the xwayland / kernel issue we've found here and the other being the topic of the issue that I linked.
They shouldn't have closed the other one as a duplicate, as it doesn't seem to be.
Last edited by Nikolai5 (2024-08-14 10:30:44)
Desktop: Ryzen 7 1800X | AMD 7800XT | KDE Plasma
MacbookPro-2012 | MATE
Offline
For me:
[drm] Detected VRAM RAM=8176M, BAR=8192M
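The two dmesg lines posted above can be interpreted with a small helper like the one below. It is only a sketch: bar_status is a hypothetical name, and the rule it applies (a BAR at least as large as the reported VRAM suggests Resizable BAR is active, while a much smaller BAR, commonly 256M, suggests it is not) is a rule of thumb and an assumption, not something the driver states.

```shell
# Hypothetical helper: classify a "[drm] Detected VRAM RAM=...M, BAR=...M" line.
bar_status() {
  bs_pattern='s/.*RAM=\([0-9][0-9]*\)M, BAR=\([0-9][0-9]*\)M.*'
  # Extract the two sizes (in MiB) from the line passed as $1.
  bs_ram=$(printf '%s\n' "$1" | sed -n "${bs_pattern}/\1/p")
  bs_bar=$(printf '%s\n' "$1" | sed -n "${bs_pattern}/\2/p")

  if [ -z "$bs_ram" ] || [ -z "$bs_bar" ]; then
    echo "unrecognized line" >&2
    return 1
  fi

  # Assumption: BAR >= VRAM means the whole VRAM is CPU-visible.
  if [ "$bs_bar" -ge "$bs_ram" ]; then
    echo "resizable BAR looks active: BAR=${bs_bar}M covers RAM=${bs_ram}M"
  else
    echo "resizable BAR looks inactive: BAR=${bs_bar}M is smaller than RAM=${bs_ram}M"
  fi
}
```

To run it against the live log, something like `sudo dmesg | grep 'BAR=' | while read -r line; do bar_status "$line"; done` should work. Both outputs posted in this thread would be reported as active.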
Offline
Starting to look like this never gets fixed lol.
Offline
@synthexic It seems they're still testing the patches on the GitLab issue.
Test the patch yourself and add a comment to it to say whether it helps you or not.
Desktop: Ryzen 7 1800X | AMD 7800XT | KDE Plasma
MacbookPro-2012 | MATE
Offline
Let's hope they're still testing the patches. But if I'm not mistaken, these patches merely revert a feature that was introduced in amdgpu for kernel 6.10. The amdgpu maintainers may not want to get rid of this "feature" and may prefer to solve the performance issue while keeping it, which is a different story.
From what I found, this "clear page" thing has been a kind of "long-standing" feature that keeps getting canceled and then making a comeback.
The fact that they closed the ticket that was specific to this issue and kept only one ticket which was about some game performance (and which mixes another issue too) is IMO a very bad idea.
Offline
I agree Opus,
My monitor setup allows me to use X11 without issue, so that's the workaround that I'm using at the moment, though using that patch would probably be fine too.
So for the foreseeable future I'll be running KDE Plasma on X11. Thinking about it, you're right: a fix is probably a long way off, and I think we need to keep an eye on both of those GitLab issues.
One developer did say they were investigating the closed ticket. They may reopen it, or they might decide the root cause is the same, in which case it can stay merged with the game performance issue.
Desktop: Ryzen 7 1800X | AMD 7800XT | KDE Plasma
MacbookPro-2012 | MATE
Offline
We'll see. Meanwhile, I've seen changes in amdgpu for kernel 6.11 are introducing a new set of regressions. Doesn't look like a good time for amdgpu lately.
Offline
This bug doesn't get resolved for me by using X11. It just makes it less obvious. Building a kernel with the patch is the only thing that's worked for me. But imagine being a first time linux user - you just downloaded Fedora or Ubuntu or whatever kids do these days and your GPU takes 4 minutes to draw jpgs of your favorite waifu on your screen. What do you tell that person?
Sorry, it's gonna be another 1-2 months till they test this bug fix that reverts 18 lines of code that should've never been in there to start with because they've never managed to make this feature work for a year?
The fix for this has been pretty obvious for over a month now. You gotta agree this is taking waaaaaaaaaaaaaaay longer than bugs usually take to get addressed. Especially given the size of the offending code and how easy the resolution is.
Offline
Well we don't want the post to turn into a moaning session as it doesn't help and won't be well received by the people here that actually want to help.
But yes I do agree, it's bad that it happened and that it is taking so long and that they closed an issue ticket when it shouldn't have been, and that for a new user coming to Linux it would be a terrible first impression.
We just need to do what we can to help push the bug ticket down the right path to resolution in a polite way.
Desktop: Ryzen 7 1800X | AMD 7800XT | KDE Plasma
MacbookPro-2012 | MATE
Offline
Agreed. This isn't looking really good, but no need to be nasty/fussy about it. That won't help.
I don't know in detail how the amdgpu project is maintained. I was under the impression that it was mainly AMD developers with a few external contributors, but I don't know who has the final say (is the maintainer part of AMD?).
Since amdgpu is a full part of the kernel, I also don't know whether Linus himself has the final say on amdgpu patches.
Offline
It seems the patches should be arriving in 6.11, if this is any indication
https://gitlab.freedesktop.org/agd5f/li … 7ce3fcef5d
https://www.phoronix.com/news/Linux-6.11-rc7-AMDGPU-Fix
Hopefully there's a better solution found than simply disabling the feature, but I have no clue when that might happen.
Offline
https://git.kernel.org/pub/scm/linux/ke … ions.patch is queued for 6.10.10
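Putting the version reports from this thread together (6.9 fine, 6.10.3 and 6.10.4 affected, fix queued for 6.10.10), a quick check of whether a given kernel falls in the affected range might look like the sketch below. The helper name kernel_affected is hypothetical, and the range is an assumption: it treats all of 6.10.0 through 6.10.9 as affected, assumes the queued patch really shipped in 6.10.10, and treats 6.11 and later as fixed.

```shell
# Hypothetical helper: succeed (exit 0) if the version string looks like
# a 6.10.x kernel older than 6.10.10, e.g. "6.10.4-arch2-1" or "6.10.3.arch1-2".
kernel_affected() {
  ka_ver=${1%%-*}            # drop "-arch2-1" style suffixes
  ka_major=${ka_ver%%.*}
  ka_rest=${ka_ver#*.}
  ka_minor=${ka_rest%%.*}
  case $ka_rest in
    *.*) ka_patch=${ka_rest#*.}; ka_patch=${ka_patch%%.*} ;;
    *)   ka_patch=0 ;;       # "6.10" with no patch level
  esac

  # Refuse anything that did not parse into three plain numbers.
  for ka_n in "$ka_major" "$ka_minor" "$ka_patch"; do
    case $ka_n in
      ''|*[!0-9]*) echo "cannot parse version: $1" >&2; return 2 ;;
    esac
  done

  [ "$ka_major" -eq 6 ] && [ "$ka_minor" -eq 10 ] && [ "$ka_patch" -lt 10 ]
}
```

A typical use would be `kernel_affected "$(uname -r)" && echo "consider linux-lts or a patched build"`.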
Online
https://git.kernel.org/pub/scm/linux/ke … ions.patch is queued for 6.10.10
That's good news.
I don't know enough of amdgpu to really understand why adding this flag: AMDGPU_GEM_CREATE_VRAM_CLEARED solves the issue, or if there may be other consequences of doing so?
Offline
loqs wrote: https://git.kernel.org/pub/scm/linux/ke … ions.patch is queued for 6.10.10
That's good news.
I don't know enough of amdgpu to really understand why adding this flag: AMDGPU_GEM_CREATE_VRAM_CLEARED solves the issue, or if there may be other consequences of doing so?
Supposedly the flag adds some latency, but ensures that the VRAM is always cleared after some operation. The optimal solution would be to figure out why not clearing VRAM causes this issue, but that fix will likely come much later. For now, keeping XWayland performance from being so badly impacted is more important than avoiding a small bit of extra latency on all AMD GPUs. This is not so much a fix as a quick patch to make the kernel usable for those of us with this issue.
Offline
Yes I realize it's probably just temporary. I didn't fully grasp the extent of the underlying problems.
My understanding is that it was precisely this page clearing feature that caused the problem. Setting this flag before allocating any page would certainly add latency at allocation time, but otherwise the clearing would happen later down the pipe, with many more pages at once, which is what would cause the issue. But possibly I didn't quite get it. My secondary understanding is that VRAM pages are cleared at all for security reasons, which is why they don't want to get rid of this feature altogether and are just trying to handle it in a more efficient way.
But again, my understanding is probably very partial and limited about this.
Offline