Hi, I'm new on the Arch forums, so I don't know whether this fits better here or in hardware/kernel, but honestly it fits both...
I've been having issues with my system for a while, and I think the issues have been unrelated but result in the same symptom. When I play World of Warcraft, my entire system freezes without even being under a particularly high load (a lowish-power CPU paired with a big GPU in a CPU-intensive game, and so on). I don't think it's about power draw, as I've had the GPU and CPU running at full utilization in other games.
The first occurrence was solved by later releases of RADV. I had deduced that it came down to the Vulkan loader and that AMDVLK didn't have the same issue. I also recently found this section:
https://wiki.archlinux.org/title/Vulkan … lkan_games
It seems to describe the same issue I originally had. That issue disappeared about a year ago for me, with an update to RADV.
Note: when I had this issue, it was only a GPU hang, so my system was still accessible from the network and I could restart the display manager from another computer through SSH.
Now I've had another issue that results in the entire system freezing. I think it's unrelated, but the context is nice IMO. I was playing Warframe and the system would suddenly freeze at random times. A workaround I found was to run the game in a nested gamescope session: instead of freezing my system, it would just stutter heavily.
With the release of World of Warcraft: The War Within this week, I've been playing WoW again and ran into freezes in that game as well. I tried applying the same workaround using gamescope with no luck, I tried switching to AMDVLK with no luck, I tried switching kernels, I enabled every compatibility option in-game, and I tried switching to DXVK instead of VKD3D. The issue keeps coming up, and it's frustrating as I'm supposed to tank for my guild in the upcoming raid :/
What the two recent and still ongoing issues have in common, which makes me think they're the same issue, is that neither shows any logs leading up to the freeze in journalctl and that the system is now NOT accessible from the network - it's completely dead. This also makes me think it might be a kernel problem or even a hardware problem - at least a very low-level problem. The freeze happens anywhere from 20 minutes to ~3 hours after I launch the games.
What I did find odd in the logs is that this line appears hundreds of times from the moment I launch World of Warcraft:
Aug 28 23:16:39 desktop env[8116]: 00e0:err:seh:dispatch_exception unknown exception (code=c0000420) raised
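To see how often each exception code shows up, a small script can tally them from the journal output. This is only a sketch: the regular expression is modelled on the single line quoted above, the function names are made up for illustration, and the lookup table lists just two NTSTATUS values (0xC0000420 is STATUS_ASSERTION_FAILURE in the Windows status code list).

```python
import re
from collections import Counter

# Pattern modelled on the one journal line quoted in this post (assumption:
# other Wine builds may word the message differently).
EXCEPTION_RE = re.compile(
    r"dispatch_exception unknown exception \(code=([0-9a-fA-F]+)\) raised"
)

# Deliberately tiny lookup of NTSTATUS values; extend as needed.
KNOWN_CODES = {
    0xC0000005: "STATUS_ACCESS_VIOLATION",
    0xC0000420: "STATUS_ASSERTION_FAILURE",
}


def count_exception_codes(lines):
    """Return a Counter mapping exception code (int) to number of occurrences."""
    counts = Counter()
    for line in lines:
        match = EXCEPTION_RE.search(line)
        if match:
            counts[int(match.group(1), 16)] += 1
    return counts


def describe(code):
    """Return a symbolic name for a code, or 'unknown' if it is not in the table."""
    return KNOWN_CODES.get(code, "unknown")
```

Feeding it the lines of `journalctl -b` (for example by iterating over `sys.stdin`) gives a quick count per code, which makes it easier to tell whether the message rate changes between sessions.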
My system:
OS: Arch Linux (obviously)
CPU: AMD Ryzen 3600x
GPU: AMD Radeon RX 6950 XT
PSU: be quiet! 1000W
Memory: 16 GB DDR4
Motherboard: X570-A PRO
Offline
What the two recent and still in effect issues have in common which makes me think its the same issue is that they both do not show any logs leading up to the freeze in journalctl and that now the system is NOT accessible from the network - its completely dead.
Not if you're rebooting with the power button. See whether https://wiki.archlinux.org/title/Keyboa … el_(SysRq) works (NB: it has to be explicitly enabled before the freeze!)
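Since SysRq has to be enabled beforehand, it may help to check which functions the current setting allows. The sketch below decodes the value of /proc/sys/kernel/sysrq using the bitmask described in the kernel's SysRq documentation; the function names and the short labels are my own.

```python
from pathlib import Path

# Bit meanings as listed in the kernel's SysRq documentation.
SYSRQ_BITS = {
    2: "console log level control",
    4: "keyboard control (SAK, unraw)",
    8: "debugging dumps of processes",
    16: "sync",
    32: "remount read-only",
    64: "signalling of processes (term, kill, oom-kill)",
    128: "reboot/poweroff",
    256: "nicing of all RT tasks",
}


def decode_sysrq(value):
    """Return the list of SysRq function groups enabled by a kernel.sysrq value."""
    if value == 0:
        return []  # SysRq completely disabled
    if value == 1:
        return list(SYSRQ_BITS.values())  # everything enabled
    return [name for bit, name in SYSRQ_BITS.items() if value & bit]


def read_sysrq(path="/proc/sys/kernel/sysrq"):
    """Read the current setting from procfs."""
    return int(Path(path).read_text().strip())
```

Calling `decode_sysrq(read_sysrq())` before starting the game shows whether the reboot or sync keys would do anything at all during a freeze.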
Ryzen has a tendency to be undervolted and doesn't like RCU callbacks, see https://wiki.archlinux.org/title/Ryzen#Troubleshooting
freezes without even being under a particularly high load
That would hint at the C-states: the CPU can hiccup when waking up. Try limiting them to 1 (keep 'em busy), and if that stabilizes the system, exclude C6 and see whether you get away with that.
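To see which idle states the kernel actually exposes before and after limiting them, the cpuidle entries in sysfs can be listed. This is a rough sketch that assumes the standard cpuidle layout under /sys/devices/system/cpu/cpu0/cpuidle; the function name is made up. Limiting the states themselves is usually done with a kernel parameter such as processor.max_cstate=1, which is my assumption about what "limit them to 1" refers to here.

```python
from pathlib import Path


def list_cstates(cpuidle_dir="/sys/devices/system/cpu/cpu0/cpuidle"):
    """Return (dir name, state name, exit latency in us, disabled) per idle state."""
    base = Path(cpuidle_dir)
    states = []
    # Sort numerically so that state10 comes after state9, not after state1.
    state_dirs = sorted(base.glob("state[0-9]*"), key=lambda p: int(p.name[5:]))
    for state_dir in state_dirs:
        name = (state_dir / "name").read_text().strip()
        latency = int((state_dir / "latency").read_text().strip())
        disable_file = state_dir / "disable"
        disabled = disable_file.exists() and disable_file.read_text().strip() == "1"
        states.append((state_dir.name, name, latency, disabled))
    return states
```

If the deeper states no longer show up (or show as disabled) after a reboot with the parameter set, the limit is in effect and a freeze under those conditions would point away from the C-states.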
Offline
Would hint at the c-states, the CPU can hiccup when waking up, try to limit them to 1 (keep 'em busy) and if that stabilizes the system, exclude 6 and see whether you get away with that.
Sorry, I didn't see the reply until late last night. I'm running the game with it set to 1, and what I find notable is that instead of the mystery exception, I'm getting lines at the same rate directly from vkd3d. That seems unrelated, though?
Aug 30 16:13:14 desktop env[4180]: 1108.001:0984:0a3c:info:vkd3d-proton:vkd3d_pipeline_library_disk_thread_main: Pipeline cache marked dirty. Flush is scheduled.
Aug 30 16:13:15 desktop env[4180]: 1109.001:0984:0a3c:info:vkd3d-proton:vkd3d_pipeline_library_disk_thread_main: Flushing disk cache (wakeup counter since last flush = 2). It seems like application has stopped creating new PSOs for the time being.
I'll update this post if the system freezes again.
Last edited by mast3r_waf1z (2024-08-30 14:48:10)
Offline
Is that all, no surrounding drm/gpu warnings/errors?
Does https://archlinux.org/packages/extra/x86_64/vkmark/ or just vkcube cause anything similar? (It could just be a Proton thing; you could try https://wiki.archlinux.org/title/Steam/ … _emulation)
Offline
Is that all, no surrounding drm/gpu warnings/errors?
Nope, but my system just froze in the middle of a dungeon after ~40 minutes. I started the game up again after restarting, and the original message is back in journalctl. Odd...
Does https://archlinux.org/packages/extra/x86_64/vkmark/ or just vkcube cause anything similar (could just be a proton thing, you could try https://wiki.archlinux.org/title/Steam/ … _emulation
I'm running wine-ge through Lutris, so I'm not using Proton directly. I can switch to Proton and try that method, though?
Speaking of which, during my freeze just now I forgot to check SysRq. I'll do that if it happens again.
Offline
The environment example is for the steam launch options, https://wiki.archlinux.org/title/Steam#Launch_options
But you can probably export the variable globally (or at least to Lutris)
Offline
The environment example is for the steam launch options, https://wiki.archlinux.org/title/Steam#Launch_options
But you can probalby globally export the variable (or at least to lutris)
Yeah, I just tried; WoW doesn't like it.
Edit: I don't actually have more time to work on this now. I'll update the post again tomorrow.
Last edited by mast3r_waf1z (2024-08-30 15:09:34)
Offline
OK, progress: I tried to use SysRq to kill the system while it was frozen, and it did not work.
I had observed that the output of the SysRq help command shows up in the journalctl logs, so I pressed it during the freeze and checked after the subsequent reboot. There is no mention of SysRq in journalctl.
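For anyone repeating this check, the search can be scripted against the kernel log of the previous boot. The sketch below only builds the journalctl command (`-k` for kernel messages, `-b -1` for the boot before the current one) and filters lines for "sysrq"; both function names are made up, and it assumes the journal is persistent so that the previous boot is still available.

```python
import subprocess


def previous_boot_kernel_log_cmd():
    """Command that prints kernel messages from the previous boot."""
    return ["journalctl", "-k", "-b", "-1", "--no-pager"]


def find_sysrq_lines(lines):
    """Return every line that mentions SysRq, ignoring case."""
    return [line for line in lines if "sysrq" in line.lower()]


def sysrq_lines_from_previous_boot():
    """Run journalctl and return matching lines (needs a persistent journal)."""
    result = subprocess.run(
        previous_boot_kernel_log_cmd(),
        capture_output=True,
        text=True,
        check=False,
    )
    return find_sysrq_lines(result.stdout.splitlines())
```

An empty result after pressing the help key during the freeze means the keypress either never reached the kernel or the message never made it to disk, which is consistent with what was observed here.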
Another thing I've found is that the filesystem has been oddly slow all of a sudden since today, which makes me suspect the NVMe drive I'm using for my root partition. I mean, if the root partition dies, it would make sense that kernel messages wouldn't be visible in the logs.
But this doesn't explain why SysRq doesn't work... odd. This was just a quick test before I continue testing tomorrow, but honestly I think I might take a spare NVMe drive and `dd` the current one onto it to see if it makes any difference?
Please share any input before then; I'll check this thread again tomorrow.
Offline
IO issues/errors would show up in the journal before any crash - also check https://wiki.archlinux.org/title/SMART and run a simple https://wiki.archlinux.org/title/Benchmarking#dd
(Be VERY careful with the parameters you pass to dd, it'll unconditionally nuke your data if you screw up)
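If passing parameters to dd feels too risky for a first look, a read-only throughput check can be sketched in a few lines. This is not the wiki's dd benchmark, just a rough stand-in that reads an existing file sequentially and reports MiB/s; the function name and defaults are made up, and results will be inflated if the file is already in the page cache.

```python
import time


def read_throughput_mib_s(path, block_size=1024 * 1024, max_bytes=None):
    """Read a file sequentially and return (bytes read, MiB per second).

    The file is opened read-only, so nothing on disk is modified.
    """
    total = 0
    start = time.perf_counter()
    with open(path, "rb", buffering=0) as f:
        while max_bytes is None or total < max_bytes:
            chunk = f.read(block_size)
            if not chunk:
                break  # end of file
            total += len(chunk)
    # Guard against a zero interval on very small files.
    elapsed = max(time.perf_counter() - start, 1e-9)
    return total, (total / (1024 * 1024)) / elapsed
```

Pointing it at a large file on the suspect drive and again at one on a known-good drive gives a crude comparison; a drive that is dramatically slower than expected is worth a closer look with SMART.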
Offline
So, the issue seems to be solved; I think the NVMe drive might have been dying. I don't know why there's nothing in the journal, but copying my disk over to a spare drive stopped the issue in both games.
For reference, both NVMe drives were once part of a RAID 0 in a computer I bought for spare parts a few years ago.
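For checking whether a drive like this really is on its way out, smartctl can emit JSON that is easy to summarize. The sketch below assumes the report was produced by something like `smartctl -a -j` on an NVMe device, and that the key names (`smart_status`, `nvme_smart_health_information_log`, `critical_warning`, `media_errors`, `percentage_used`) match what your smartmontools version prints; treat them as an assumption and verify against real output. The function names are made up.

```python
import json


def summarize_nvme_health(report):
    """Pick a few health fields out of a parsed smartctl JSON report."""
    log = report.get("nvme_smart_health_information_log", {})
    return {
        "passed": report.get("smart_status", {}).get("passed"),
        "critical_warning": log.get("critical_warning"),
        "media_errors": log.get("media_errors"),
        "percentage_used": log.get("percentage_used"),
    }


def looks_suspicious(summary):
    """Flag a failed overall status, any critical warning, or any media errors."""
    return (
        summary["passed"] is False
        or bool(summary["critical_warning"])
        or bool(summary["media_errors"])
    )


def summarize_from_text(json_text):
    """Convenience wrapper for raw JSON text as printed by smartctl."""
    return summarize_nvme_health(json.loads(json_text))
```

A clean summary does not prove the drive is healthy (nothing showed up in the journal here either), but a non-zero media error count or a critical warning would support the dying-drive theory.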
Last edited by mast3r_waf1z (2024-09-01 00:37:20)
Offline