For a while I have been experiencing freezes while playing games on Steam.
First my system stutters for a while, and eventually it freezes completely, forcing me to reboot. It appears this only happens when playing Proton games, but I have only tested a couple of games that run natively, so I can't be 100% sure.
Audio keeps playing but I can't switch to a tty. I can, however, ssh into the system, where I can interact with it quite smoothly. Also, after I reboot I am able to play the games without experiencing any issues. This only works if I use SysRq, and not if I force the PC off and turn it back on again.
I have been experiencing this for months and have updated my system dozens of times. I have also had the issue on the linux, linux-lts and linux-zen kernels.
I have an AMD CPU and an Nvidia GPU. I have tried turning off fTPM. I use i3.
While ssh'ed into the PC I was able to get this output from dmesg: txt file on 0x0.st
I don't know if this has anything to do with the issue, but I have tried setting up PCI passthrough and have run gpu-passthrough-manager. I only have one GPU, don't ask why I did this.
I wasn't completely sure which subforum to use so I just posted here, sorry if it's incorrect.
Last edited by bigbusta (2025-01-07 10:56:00)
Offline

There're bus errors on 06:00 and 00:03 and your nvidia GPU bails.
I only have one gpu, don’t ask why i did this.
Please post your Xorg log, https://wiki.archlinux.org/title/Xorg#General and the output of
lspci -tvnn
Offline

[    52.753] (**) NVIDIA(0): Option "RegistryDwords" "PowerMizerEnable=0x1; PerfLevelSrc=0x2222; PowerMizerDefaultAC=0x1"
Do you apply other power-related features to the GPU?
rd.driver.pre=vfio-pci amd_iommu=on
Why's that?
[  6691.530] (II) config/udev: Adding input device Microsoft X-Box 360 pad 0 (/dev/input/js1)
[  6691.530] (II) No input driver specified, ignoring this device.
[  6691.530] (II) This device may have been added with another device file.
[  6691.590] (II) config/udev: Adding input device Microsoft X-Box 360 pad 0 (/dev/input/event20)
[  6691.590] (II) No input driver specified, ignoring this device.
[  6691.590] (II) This device may have been added with another device file.
[  6936.768] (EE) NVIDIA(0): The NVIDIA X driver has encountered an error; attempting to
[  6936.768] (EE) NVIDIA(0):     recover...
[  6943.769] (WW) NVIDIA: Wait for channel idle timed out.
[  6943.770] (EE) NVIDIA(GPU-0): Push buffer DMA allocation failed
[  6943.770] (EE) NVIDIA(0): Failed to allocate push buffer
[  6943.770] (EE) NVIDIA(0): Error recovery failed.
[  6943.770] (EE) NVIDIA(0):  *** Aborting ***
So about 4 minutes after plugging in the Xbox controller, the nvidia GPU drops out - I assume this is because you started some game?
Is the system overclocked or underpowered? Did you forget to plug the dedicated power supply into the GPU?
If you inspect the system journal for the relevant boot, what's ahead of that "Failed to query display engine channel state" storm that likely wiped the ringbuffer?
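To pull only the driver failures out of a long Xorg log like the excerpt above, a small filter can help. This is a minimal sketch: the function name nvidia_xorg_errors is made up for illustration, and the default log path is an assumption (a rootless Xorg usually logs to ~/.local/share/xorg/Xorg.0.log, a root-run server to /var/log/Xorg.0.log).

```shell
# Hypothetical helper: print only the NVIDIA error (EE) and warning (WW)
# lines from an Xorg log. The default path is an assumption; pass your own.
nvidia_xorg_errors() {
    local log=${1:-"$HOME/.local/share/xorg/Xorg.0.log"}
    grep -E '\((EE|WW)\) NVIDIA' "$log"
}

# Usage (commented out so sourcing this has no side effects):
# nvidia_xorg_errors /var/log/Xorg.0.log
```

The timestamps in the first column are seconds since the X server started, so the gap between two events can be read off directly from the filtered lines.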
Offline
Do you apply other power-related features to the GPU?
I don't think so. If I do, it is not something I have done on purpose.
rd.driver.pre=vfio-pci amd_iommu=on
Why's that?
Most likely something that was done when I set up PCI passthrough, potentially by gpu-passthrough-manager.
So about 4 minutes after plugging the XBox Controller, the nvidia GPU drops out - I assume this is because you started some game?
Yes, it was a game that was running through Proton.
Is the system overclocked or underpowered? Did you forget to plug the dedicated power supply into the GPU?
The GPU is not overclocked at all, and the CPU is only "overclocked" through the built-in ASUS optimal setting. The power supply shouldn't be a problem, since I was using it for a while before the problem started occurring. My PC not being good enough for the games I'm playing also shouldn't be a problem, since the problem occurs in games that should be able to run on pretty much any PC.
If you inspect the system journal for the relevant boot, what's ahead of that "Failed to query display engine channel state" storm that likely wiped the ringbuffer?
journalctl output for the relevant boot
I'm not entirely sure when "Failed to query display engine channel state" happened, but I can say that the line
Dec 16 20:03:05 mathias-computer sudo[54329]:  mathias : TTY=pts/0 ; PWD=/home/mathias ; USER=root ; COMMAND=/usr/bin/reboot
is when I rebooted through ssh. Right above it there is this:
Dec 16 19:53:35 mathias-computer systemd-coredump[50083]: Process 50081 (wine64-preloade) of user 1000 dumped core.
                                                          
                                                          Module /home/mathias/.local/share/Steam/steamapps/common/Proton - Experimental/files/bin/wine64-preloader without build-id.
                                                          Stack trace of thread 50081:
                                                          #0  0x00006ffca77cf026 _start (/home/mathias/.local/share/Steam/steamapps/common/Proton - Experimental/files/bin/wine64-preloader + 0x1026)
                                                          ELF object binary architecture: AMD x86-64
Which would suggest that Proton crashed?
Also, I forgot to say this in my original post, but there has been one time where my PC recovered from the stutters.
Last edited by bigbusta (2024-12-18 18:01:14)
Offline

Most likely something that was done when I set up PCI passthrough, potentially by gpu-passthrough-manager.
Remove that, notably the explicit AMD IOMMU parameter.
There's no indication of any problems aside from the Proton crash, but you're running wireplumber and pulseaudio.
Just to get that out of the way: replace PA w/ pipewire-pulse!
If that's not it: does this also affect the non-experimental proton branch?
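To confirm that the passthrough parameters mentioned above are really gone after editing the bootloader config and rebooting, the running kernel's command line can be checked. A minimal sketch: passthrough_params is a hypothetical name, and besides the two parameters quoted in this thread the pattern also looks for vfio-pci.ids=, intel_iommu= and iommu=, which are my additions.

```shell
# Hypothetical helper: list kernel command-line tokens that look like
# leftovers from a PCI passthrough setup. Reads /proc/cmdline by default.
passthrough_params() {
    local cmdline=${1:-/proc/cmdline}
    tr ' ' '\n' < "$cmdline" |
        grep -E '^(rd\.driver\.pre=vfio-pci|vfio-pci\.ids=|amd_iommu=|intel_iommu=|iommu=)'
}

# Usage: prints nothing (and returns non-zero) when no such token is present.
# passthrough_params
```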
Offline
I removed the AMD IOMMU parameter and replaced pulseaudio with pipewire-pulse, but neither had any effect.
I also found out that I have had the problem with a game that runs natively on Linux (not through Proton), so it doesn't seem to be a problem with Proton at all.
I have a dual-boot with Windows, and it didn't seem to be a problem there.
Here are the logs from a crash after the attempted fixes:
lspci
journalctl
xorg
Offline

https://archlinux.org/packages/extra/x8 … wire-jack/
Do you have any more sound daemons around? 
I have a dual-boot with windows
3rd link below. Mandatory.
Disable it (it's NOT the BIOS setting!) and reboot windows and linux twice for voodoo reasons.
The journal snippet covers ~10 minutes and, next to the jackd crash, only has
Dec 20 16:26:08 mathias-computer steam-runtime-steam-remote[5768]: steam-runtime-steam-remote: Steam is not running: No such device or address
as mildly relevant entry?
Meanwhile the X11 log still has
[   906.369] (EE) NVIDIA(0): The NVIDIA X driver has encountered an error; attempting to
[   906.369] (EE) NVIDIA(0):     recover...
[   906.464] (II) NVIDIA(0): Error recovery was successful.
[   921.673] (EE) NVIDIA(0): The NVIDIA X driver has encountered an error; attempting to
[   921.673] (EE) NVIDIA(0):     recover...
[   928.674] (WW) NVIDIA: Wait for channel idle timed out.
[   928.675] (EE) NVIDIA(GPU-0): Push buffer DMA allocation failed
[   928.675] (EE) NVIDIA(0): Failed to allocate push buffer
[   928.675] (EE) NVIDIA(0): Error recovery failed.
[   928.675] (EE) NVIDIA(0):  *** Aborting ***
5-9 minutes after the xbox controller shows up and about one minute after the journal segment ends - do you reboot the system w/ the power button?
Don't. Try to switch the VT, ssh into the system, frenetically press ctrl+alt+del or https://wiki.archlinux.org/title/Keyboa … el_(SysRq) + REISUB
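Whether REISUB can work at all depends on the kernel.sysrq setting, which is either 0 (off), 1 (everything allowed) or a bitmask. The sketch below checks a given value against the bits the REISUB keys need; the function name is hypothetical, and the bit values are taken from the kernel's SysRq documentation as I understand it, so please verify them against your kernel version.

```shell
# Hypothetical helper: given the value of /proc/sys/kernel/sysrq, report
# whether every key of the REISUB sequence is permitted.
# Bit values (from the kernel's sysrq documentation):
#   4 = keyboard control (R), 16 = sync (S), 32 = remount read-only (U),
#   64 = process signalling (E, I), 128 = reboot/poweroff (B)
sysrq_allows_reisub() {
    local mask=$1
    local need=$((4 | 16 | 32 | 64 | 128))   # 244
    [ "$mask" -eq 1 ] || [ $((mask & need)) -eq "$need" ]
}

# Usage:
# sysrq_allows_reisub "$(cat /proc/sys/kernel/sysrq)" && echo "REISUB available"
```

If the check fails, running "sysctl kernel.sysrq=1" as root enables all SysRq functions until the next reboot.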
Offline
I disabled fast startup and rebooted twice (once in Windows and once in Linux), but it seemingly had no effect.
Dec 20 16:26:08 mathias-computer steam-runtime-steam-remote[5768]: steam-runtime-steam-remote: Steam is not running: No such device or address
It seems that this error is printed when starting Steam, which is very weird since Steam still starts.
do you reboot the system w/ the power button?
I ssh into my PC and run "sudo reboot". Sometimes this works on its own, but sometimes I use SysRq on top of that. I don't know if this is the best way.
Offline

Don't know if this is the best way.
The best way is one that preserves the journal 
I'm asking because the journal ends abruptly and the Xorg log has an error after the end of the posted journal.
Instead of running "sudo reboot", post the journal of that boot:
sudo journalctl -b | curl -F 'file=@-' 0x0.st
(this uploads the journal to 0x0.st and gets you a short URL to share)
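If the machine has already been rebooted, the same upload can be pointed at the previous boot instead, provided the journal is stored persistently and the shutdown was clean enough to flush it. A minimal sketch, with boot_journal as a hypothetical wrapper (the JOURNALCTL override exists only so the sketch can be exercised without a real journal):

```shell
# Hypothetical helper: dump the journal of an earlier boot.
# Offset -1 is the boot before the current one, -2 the one before that.
# May need root or membership in the systemd-journal group to see everything.
boot_journal() {
    local offset=${1:--1}
    "${JOURNALCTL:-journalctl}" -b "$offset" --no-pager
}

# Usage, reusing the upload command from above:
# journalctl --list-boots                      # find the boot that froze
# boot_journal -1 | curl -F 'file=@-' 0x0.st
```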
Offline
So I believe I have found the cause.
TL;DR:
The problem is caused by overloading the GPU, for instance by having picom running in the background while playing high-performance games.
Consider using a lightweight window manager such as openbox while playing games, since it "fixed" the issue for me.
I tried switching to the openbox window manager to test whether the window manager I was using, i3, was the issue.
It kind of fixed the problem, since the system never froze completely but only stuttered. Since openbox is very lightweight and I had nothing running in the background, it made me think the problem is overloading my GPU.
The fact that low-performance games run fine with no issues supports this.
I also saw a post on this forum which concluded that picom was the issue, but since picom is quite performance-heavy I still think overloading the GPU is the issue. Also, stopping picom didn't fix it for me.
Obviously I can't know 100% that this is the issue, but I will mark it as solved anyway.
Offline

Is the system overclocked or underpowered? Did you forget to plug the dedicated power supply into the GPU?
The other offender would be temperature - does the GPU overheat when running "heavy" tasks?
Does it make a difference whether you run picom on the glx or xrender backend?
Offline
The other offender would be temperature - does the GPU overheat when running "heavy" tasks?
It doesn't seem to be temperature, since when I observed stuttering the temperature seemed to be the same as on average, or even lower. Also, my GPU always seemed to be around 80 degrees Celsius.
Does it make a difference whether you run picom on the glx or xrender backend?
Since it doesn't make a difference whether I run picom or not, I don't think the backend would make a difference? I could check anyway.
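For the temperature question above, logging samples while the game runs gives firmer numbers than watching by eye. A minimal sketch, assuming the proprietary driver's nvidia-smi is installed; max_gpu_temp is a hypothetical helper that expects the three columns requested in the usage comment.

```shell
# Hypothetical helper: read nvidia-smi CSV samples (timestamp, temperature,
# utilization) on stdin and print the highest temperature seen.
max_gpu_temp() {
    awk -F',' '{ t = $2 + 0; if (NR == 1 || t > max) max = t } END { if (NR) print max }'
}

# Usage: sample once per second while the game runs (e.g. over ssh), then
# inspect the log afterwards.
# nvidia-smi --query-gpu=timestamp,temperature.gpu,utilization.gpu \
#            --format=csv,noheader,nounits -l 1 > gpu.csv
# max_gpu_temp < gpu.csv
```

The utilization column in the same log also shows whether the GPU load drops at the moment of the freeze.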
Offline

Since it doesn't make a difference whether i run picom or not
Sorry, missed that.
But that puts a dent into your theory since i3 isn't particularly heavy on the GPU either.
You do get the problem w/ i3 w/o picom but you don't get it w/ openbox even w/ picom running?
Might be more related to the tiling aspect (enforcing window sizes, breaking fullscreen etc et pp) or an i3 specific bug.
Does the awesome wm cause this as well? Maybe depending on whether running in tiling or stacking mode?
Offline
Hello.
I think I have the same problem. I am running an Nvidia GTX 1080 Ti with the default proprietary nvidia drivers (version 570), kernel version 6.13.2, and Plasma 6 (Wayland).
My system works fine when not doing GPU-heavy tasks; Linux-native games in Steam also work fine (Portal, Black Mesa).
For games played on Proton it is as follows: I have played Elden Ring via Steam Proton and AC Odyssey via Lutris. Both games launch just fine, but after 1-5 minutes of playing my screen freezes. The sound keeps playing but distorts after some time. Switching to another tty works; then I look up the PID of the game and kill it. Funny thing is that when I use btop in the other tty, it lags really hard and crashes with a core dump.
The same thing also happens on a Bazzite install...
I got a small improvement by disabling the GSP firmware (NVreg_EnableGpuFirmware=0): the game now runs a tiny bit longer, first freezes for a few seconds, then goes back to normal, then freezes again.
What's also interesting: I can hear coil whine from my GPU when it is under stress, and as soon as the game freezes the coil whine goes away, suggesting the graphics card just doesn't "process" the game anymore? nvidia-smi in the other tty also says that there is no load on the GPU anymore.
What I have not tried so far is switching drivers or changing Proton versions, DXVK versions, etc.
Any ideas where to look first? Is it the driver, Vulkan, Proton or Wayland/KDE?
Also, I am very inexperienced at looking at logs and/or finding the right ones. Any tips or links for this?
Greetings, potato4u
edit: nvm, just had a crash on Black Mesa running natively as well
Last edited by potato4u (2025-02-09 15:14:19)
Offline

https://wiki.archlinux.org/title/Steam/ … _emulation
Have you tried to downclock the GPU?
Offline