Hey, so recently I've been having pretty bad issues with my system:
In pretty much every game I have tested (DotA2, Horizon Forbidden West and a few others) my system will either completely freeze after a minute or two, or sometimes the game will run fine for 10, sometimes 20 minutes, and then the system completely locks up. No access to other ttys or anything anymore.
I also get absolutely 0 logs from said freezes, which confuses me even more.
I've tried a bunch of things already, different kernels, older kernels, older Linux-Firmware, and I'm a bit clueless as to what else it could be.
System:
7800X3D
Radeon 6950XT
32GB RAM
Arch (Tested kernels: 6.8.9/6.8.9-zen, 6.9.2&6.9.3 both zen and non-zen)
Linux-Firmware I tested up to 20240115.9b6d0b08-2 in the past and saw 0 improvement.
Running Hyprland (also happens with KDE though, which I use to run Games in HDR)
On Horizon Forbidden West I also get the known issues with ring timeout on my GPU, and thought that this is what ultimately leads to the system freeze. On DotA I get no such thing, and it just freezes up.
I'd be super grateful for any ideas that get me to the root of this problem, as currently I pretty much have a paperweight gaming PC that can't play any games.
Offline
I also get absolutely 0 logs from said freezes, which confuses me even more.
You're rebooting with the power button?
=> https://wiki.archlinux.org/title/Keyboa … el_(SysRq)
Then see https://wiki.archlinux.org/title/Ryzen#Troubleshooting but from the symptom description, also monitor your temperatures.
Possibly run memtest86+ (for a day or until you hit errors)
You do have https://archlinux.org/packages/?name=amd-ucode (also see https://wiki.archlinux.org/title/Microcode )?
And see https://wiki.archlinux.org/title/Stress_testing to maybe isolate the causing component.
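Regarding the SysRq suggestion above: whether a given SysRq function responds at all depends on the bitmask in /proc/sys/kernel/sysrq. The following is a minimal sketch (not from the thread) that decodes that value using the bit meanings from the kernel's SysRq documentation. The function names are made up for illustration.

```python
from pathlib import Path

# Bit meanings as listed in the kernel's admin-guide/sysrq documentation.
SYSRQ_BITS = {
    2: "control of console logging level",
    4: "control of keyboard (SAK, unraw)",
    8: "debugging dumps of processes etc.",
    16: "sync command",
    32: "remount read-only",
    64: "signalling of processes (term, kill, oom-kill)",
    128: "reboot/poweroff",
    256: "nicing of all RT tasks",
}


def decode_sysrq(value: int) -> list[str]:
    """Return the SysRq function groups enabled by a bitmask value."""
    if value == 0:
        return []  # SysRq completely disabled
    if value == 1:
        return ["all functions enabled"]
    return [name for bit, name in SYSRQ_BITS.items() if value & bit]


def read_sysrq(path: str = "/proc/sys/kernel/sysrq") -> int:
    """Read the current SysRq bitmask from procfs."""
    return int(Path(path).read_text().strip())
```

Calling `decode_sysrq(read_sysrq())` on the affected machine shows which key combinations could work at all. If the list lacks the reboot or sync groups, an unresponsive SysRq does not by itself prove that the kernel is hung.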
Offline
I have to use the reset button, as even SysRq is completely unresponsive in that state.
Temps are fine, even done 24 hours of Furmark and 24 hours of Memtest earlier this week and nothing was weird/not working.
amd-ucode is also installed.
I'm always thinking it's my GPU, I just find it weird that furmark runs without issues but pretty much any game I throw at it brings my system to a halt.
Offline
If it's not the GPU and not the RAM, the natural contender is the CPU, see the ryzen link I posted, there's a bunch of known caveats w/ the architecture.
Offline
So I tried those Ryzen things (which don't really seem to apply to Ryzen 7 though) and then let mprime run all night. Nothing happened - everything passed. Tried DotA again, crashed after 2 minutes. So it only seems to happen in games..
Offline
Overall power draw exceeds PSU limits?
Are you running OOM and the system starts thrashing? Do you have physical swap (partition or file, not only zswap or zram and certainly not both)?
Does it happen in an openbox session?
Is it only steam/wine games?
Are sauerbraten, xonotic or warsow affected?
Do you have any heavy-duty, non-steam, no-network required offline games you could test?
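For the swap question above, one way to see what kind of swap is active is to look at /proc/swaps. This is a rough sketch, assuming the usual five-column layout of that file (Filename, Type, Size, Used, Priority); the helper names are hypothetical.

```python
def parse_swaps(text: str) -> list[dict]:
    """Parse the contents of /proc/swaps into a list of entries."""
    entries = []
    for line in text.splitlines()[1:]:  # first line is the column header
        parts = line.split()
        if len(parts) < 5:
            continue  # skip blank or malformed lines
        name, kind, size_kb, used_kb, priority = parts[:5]
        entries.append({
            "name": name,
            "type": kind,
            "size_kb": int(size_kb),
            "used_kb": int(used_kb),
            "priority": int(priority),
        })
    return entries


def has_physical_swap(entries: list[dict]) -> bool:
    """True if at least one swap area is not a zram device."""
    return any(not e["name"].startswith("/dev/zram") for e in entries)
```

Feeding it `open("/proc/swaps").read()` tells you whether only zram is in use. zswap does not appear in this file; its state is exposed separately under /sys/module/zswap/parameters/enabled.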
Offline
PSU is a 1000W one, actually a bit overkill for what I got.
Not running OOM.
I don't really have much else to test except Steam, I'll see if I can download something overnight. I thought it was Proton at first, but DotA is Linux Native. Also just crashed in Last Epoch after ~2 Minutes. Here's my PC stats when it happened: https://i.imgur.com/hUkOxwi.png
All looks pretty normal, RAM isn't even half full, temps are low.
I also tried running Sauerbraten, played a few rounds myself and watched 32 Bots hash it out in several matches and it ran fine, which is confusing me even more now - but that game also barely had any impact on my hardware.
Last edited by yasuman (2024-06-03 15:58:48)
Offline
Please replace the oversized image w/ a link.
If we assume a GPU issue, you could try
1. https://wiki.archlinux.org/title/AMDGPU#Freezes_with_%22[drm]_IP_block:gmc_v8_0_is_hung!%22_kernel_error
2. Limit the GPU, https://wiki.archlinux.org/title/AMDGPU#Overclocking / https://wiki.archlinux.org/title/AMDGPU#Power_profiles so the game cannot move it to its limits
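Before limiting the GPU as in point 2, it can help to check which performance level amdgpu is currently forced to. A small sketch, assuming the sysfs file power_dpm_force_performance_level described on the linked AMDGPU wiki page; the card index differs between systems, so the code scans all cards, and the function name is invented for this example.

```python
from pathlib import Path


def read_performance_levels(drm_root: str = "/sys/class/drm") -> dict[str, str]:
    """Map each DRM card that exposes the amdgpu DPM file to its current level."""
    pattern = "card*/device/power_dpm_force_performance_level"
    levels = {}
    for path in sorted(Path(drm_root).glob(pattern)):
        card = path.parent.parent.name  # e.g. "card0"
        levels[card] = path.read_text().strip()
    return levels
```

On a default setup this typically reports `auto`. Changing the level means writing to the same file as root, which this sketch deliberately does not do.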
Last edited by seth (2024-06-03 20:32:56)
Offline
So before any changes I made I tested another game: Shadow of the Tomb Raider. Realized I have it on Epic and Steam, so I installed both and played an hour each without it crashing - every now and then I just got weird green lines across my screen for a split second. Seems like my GPU is artifacting.
Then I started DotA2 again and it crashed on the loading screen..
Now I set my GPU to power-saving, and the game is running so far, just with absolutely horrible performance. At this point I think my GPU might simply be broken..
Offline
Did you try https://wiki.archlinux.org/title/AMDGPU#Freezes_with_%22[drm]_IP_block:gmc_v8_0_is_hung!%22_kernel_error ?
(sorry, the url doesn't work as link)?
And stupid question: did you forget to attach the dedicated power supply for the GPU (or is it/did it come loose)?
Offline
Tried that as well, thought it worked because DotA didn't crash for an hour - but now it's back to regular crashing.
Didn't forget to plug it in, and it's been working fine for a year now. This only started happening around 2-3 weeks ago, I think.
Maybe I'll slap Windows on an empty SSD, if it happens there too then my GPU is definitely dying.
Offline
D.U.S.T. problem? Everything clean? Have you tried to just blow more air against the system?
If you can't RMA the GPU, limiting its max freqs "somewhat" but not all the way down to powersaving levels might allow you to continue to use it (for a while)
Offline
Okay, the problem now definitely seems to be somehow with Linux.
Slapped Windows on an old SSD, and after 5 hours of DotA I haven't seen so much as an artifact appear. The question just is, what's causing it.. I haven't changed anything on my system recently.
Tried a few other games, and it all just ran without a hitch.
Maybe I'll just reinstall my Arch or something.
Offline
Have you tried the LTS kernel?
Do you use the GPU for audio?
Let's throw some stuff against the wall:
amdgpu.audio=0 amdgpu.sg_display=0 amdgpu.ppfeaturemask=0xffffbffd amdgpu.runpm=0 amdgpu.bapm=0 amdgpu.aspm=0 pcie_aspm=off
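When testing a list like this, it is worth confirming after the reboot that every parameter actually reached the kernel. A minimal sketch that compares the list above against a kernel command line; the sample command line in the usage note is hypothetical.

```python
# Parameters exactly as suggested in the post above.
REQUESTED = [
    "amdgpu.audio=0",
    "amdgpu.sg_display=0",
    "amdgpu.ppfeaturemask=0xffffbffd",
    "amdgpu.runpm=0",
    "amdgpu.bapm=0",
    "amdgpu.aspm=0",
    "pcie_aspm=off",
]


def missing_params(cmdline: str, wanted: list[str]) -> list[str]:
    """Return the wanted parameters that do not appear on the command line."""
    present = set(cmdline.split())
    return [p for p in wanted if p not in present]


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the command line of the running kernel."""
    with open(path) as f:
        return f.read().strip()
```

`missing_params(read_cmdline(), REQUESTED)` returns an empty list when everything was applied. For a made-up command line such as `"root=/dev/sda2 rw amdgpu.audio=0 pcie_aspm=off"` it returns the five remaining amdgpu parameters.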
Offline
Tried LTS (6.6) and it still happens.
Yep, using GPU for audio.
Added all those kernel-parameters and it still crashed..
I tried downgrading the radeon drivers and mesa yesterday to a version from back when I'm sure this didn't happen (like a month ago) - but after doing that I couldn't get past SDDM so I must've done something wrong there. I feel like it's a regression somewhere, I just don't know where.
Offline
I just don't know where.
Journal?
Added all those kernel-parameters and it still crashed … Yep, using GPU for audio.
Ftr, after adding those parameters the GPU audio was no longer available?
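On the journal question: after a hard reset, the place to look is the kernel log of the previous boot. A hedged sketch that builds the corresponding journalctl call; it assumes a persistent journal, and the function names and the default priority are choices made for this example. With a hard freeze the final messages may never reach the disk, which would match the "0 logs" observation earlier in the thread.

```python
import subprocess


def previous_boot_kernel_log_cmd(priority: str = "warning") -> list[str]:
    """Build a journalctl command for kernel messages of the previous boot."""
    return [
        "journalctl",
        "-b", "-1",        # previous boot
        "-k",              # kernel messages only
        "-p", priority,    # this priority and more severe
        "--no-pager",
    ]


def previous_boot_kernel_log(priority: str = "warning") -> str:
    """Run the command and return its output (needs journalctl on PATH)."""
    result = subprocess.run(
        previous_boot_kernel_log_cmd(priority),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout
```

Printing `previous_boot_kernel_log()` right after the forced reboot shows whatever amdgpu or other kernel warnings were flushed before the lockup.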
Offline
Yeah, no longer had audio after it. I removed all of the parameters again now since it just kept crashing anyway.
I have a feeling it might actually be related to Hyprland somehow, but I have to do some testing for that first.
Offline
Hey there,
I came across this thread while trying to debug a similar issue (game freezing, no accessible tty) on my system, which is very close to your own: 7800x3d (recently upgraded from 5800x - was not having this issue) and 7800xt.
I tried a lot of the same things but eventually for me narrowed it down to using "gamemode" - I think based on one of your screens I saw you're also using this?
There is also some evidence to suggest that you may need to enable Resizeable BAR: https://bbs.archlinux.org/viewtopic.php?id=295456 - and it might also be why gamemode is crashing (have to investigate now that I've enabled it on my system).
Not sure if helpful but curious to hear if you ended up finding out what it was!
Offline
Hey.
Since I didn't really update this post: The issue was 100% with Hyprland it turned out. I have not figured out why or how though. But I've since fully switched to KDE and haven't had a single crash in any game I have been playing since then.
Once Hyprland has a few more major versions I'll try it out again, but currently it seems it somehow breaks my ability to game - and debugging also wasn't really possible as it was hard freezing everything.
Things like ReBAR etc are all on. Gamemode I just tried to see if it fixes it on Hyprland, which it didn't. So now I'm just stress free using KDE for gaming. (Still on Hyprland for work, because it's much faster to work with)
Offline
Please change the thread title (edit first post) to reflect your findings.
Prepending [Solved ] may not be appropriate, maybe [Workedaround] ?
(there's a character limit so you may have to shorten the title first before adding stuff)
Online
I don't mean to necro a resolved thread two weeks later but I have a nearly identical build and have been running into the same issues. I've tried everything in this thread including gamemode removal, kernel parameters, etc. This all started for me 2-3 weeks ago and I've been too busy with work to revisit the problem. I've updated to the latest kernel, 6.10.7. Temps, voltages, case health (connections, dust, thermal paste) are all good. I can game for 1-5 minutes and my entire system will freeze and I need to use the power button to hard reset.
I swapped in my Windows 11 drive and removed my Linux one and could game infinitely, so it's absolutely an Arch thing. The kicker compared to the topic creator though is I've always been on KDE. I don't even have hyprland installed. I've been tearing my hair out trying to fix this problem, something obviously went wrong along the way but like you rolling back everything -- LTS Kernel, Proton GE, MESA, you name it -- doesn't do anything. My computer fully locks up when I game now, end of story, on Arch.
Offline
https://bbs.archlinux.org/viewtopic.php?id=298978 # you're on slightly different HW and EOS
https://bbs.archlinux.org/viewtopic.php … 8#p2175238 # ryzen troubleshooting
https://bbs.archlinux.org/viewtopic.php … 2#p2189602 # ReBAR
Offline