I recently purchased a new GPU, and I have been having hard crashes when playing several games and, in hindsight, when I Alt-Tab. I run Wayland and PipeWire with KDE as my DE. Following some suggestions on this post to change to the LTS kernel, I managed to get the hard crashes down to a soft crash, but they are still happening. I would like to note before I go further that this install is very old and has been through lots of hardware from Intel, AMD, and Nvidia, although I strongly believe it is in a good state, as my previous GPU, an RX 580, was working perfectly fine. The LTS kernel is the only kernel I have tried that hasn't hard-crashed; I have also tried linux-zen 6.10.2, linux 6.10.2 and linux 6.9.9. The journalctl logs from the kernels that hard-crashed seem useless, but here is a log from Linux 6.9.9: https://hastebin.skyra.pw/efozeyovik.yaml and here is the dmesg from the soft crash in the LTS kernel: https://hastebin.skyra.pw/laqasijagi.yaml. The last two lines seem important to me:
[ 307.002397] kwin_w:sh_opt0[1378]: segfault at 78382014dce4 ip 00007838a063956c sp 00007838925f7c60 error 6 in radeonsi_dri.so[7838a0414000+156a000] likely on CPU 1 (core 1, socket 0)
[ 307.002415] Code: 76 91 e9 ea fe ff ff 0f 1f 80 00 00 00 00 55 31 c0 ba 01 00 00 00 48 89 e5 41 56 41 55 41 54 49 89 fc 53 48 8d 9f d4 9c 03 00 <f0> 0f b1 13 85 c0 0f 85 b8 00 00 00 4d 8b ac 24 d8 9c 03 00 4d 85
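For anyone trying to read a segfault line like the one above, here is a rough Python sketch that subtracts the mapping base from the instruction pointer to get an offset inside radeonsi_dri.so, and decodes the low bits of the x86 page-fault error code. The function name `parse_segfault` and the regular expression are my own illustration, not part of any tool. The offset is relative to the mapped segment shown in the brackets, so it may not line up directly with a file offset in the library.

```python
import re

# The segfault line from the dmesg output above, without the timestamp.
LINE = (
    "kwin_w:sh_opt0[1378]: segfault at 78382014dce4 ip 00007838a063956c "
    "sp 00007838925f7c60 error 6 in radeonsi_dri.so[7838a0414000+156a000] "
    "likely on CPU 1 (core 1, socket 0)"
)

# Hypothetical pattern for the "segfault at ... in object[base+size]" layout.
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) "
    r"sp (?P<sp>[0-9a-f]+) error (?P<error>[0-9a-f]+) "
    r"in (?P<obj>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def parse_segfault(line):
    """Return the faulting object, the ip offset into its mapping, and error bits."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    ip = int(m["ip"], 16)
    base = int(m["base"], 16)
    error = int(m["error"], 16)
    return {
        "object": m["obj"],
        "offset": ip - base,                   # ip relative to the mapping base
        "page_present": bool(error & 0x1),     # bit 0: 0 means page not present
        "write": bool(error & 0x2),            # bit 1: fault on a write access
        "user_mode": bool(error & 0x4),        # bit 2: fault raised in user mode
    }


info = parse_segfault(LINE)
print(f"{info['object']}+{info['offset']:#x}")  # radeonsi_dri.so+0x22556c
print(info["write"], info["user_mode"], info["page_present"])  # True True False
```

Read this way, error 6 is a user-mode write to a page that was not mapped, which points at the userspace driver library rather than at the kernel itself.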
I first worried that the issue might've been a hardware issue, but I don't believe that is likely, because the kernel appears to be panicking (I can't toggle Caps Lock) and because of the above segfault from KDE. Furthermore, here is the output of dmesg | grep amdgpu:
[ 4.085024] [drm] amdgpu kernel modesetting enabled.
[ 4.085128] amdgpu: Virtual CRAT table created for CPU
[ 4.085142] amdgpu: Topology: Add CPU node
[ 4.085268] amdgpu 0000:03:00.0: enabling device (0006 -> 0007)
[ 4.089302] amdgpu 0000:03:00.0: amdgpu: Fetched VBIOS from VFCT
[ 4.089304] amdgpu: ATOM BIOS: 113-3HS23KXT143W210508
[ 4.144172] amdgpu 0000:03:00.0: vgaarb: deactivate vga console
[ 4.144178] amdgpu 0000:03:00.0: amdgpu: Trusted Memory Zone (TMZ) feature disabled as experimental (default)
[ 4.144276] amdgpu 0000:03:00.0: amdgpu: VRAM: 8176M 0x0000008000000000 - 0x00000081FEFFFFFF (8176M used)
[ 4.144279] amdgpu 0000:03:00.0: amdgpu: GART: 512M 0x0000000000000000 - 0x000000001FFFFFFF
[ 4.144282] amdgpu 0000:03:00.0: amdgpu: AGP: 267894784M 0x0000008400000000 - 0x0000FFFFFFFFFFFF
[ 4.144466] [drm] amdgpu: 8176M of VRAM memory ready
[ 4.144467] [drm] amdgpu: 7929M of GTT memory ready.
[ 5.708391] amdgpu 0000:03:00.0: amdgpu: STB initialized to 2048 entries
[ 5.709180] amdgpu 0000:03:00.0: amdgpu: Will use PSP to load VCN firmware
[ 5.912280] amdgpu 0000:03:00.0: amdgpu: RAS: optional ras ta ucode is not available
[ 5.934033] amdgpu 0000:03:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
[ 5.934054] amdgpu 0000:03:00.0: amdgpu: smu driver if version = 0x0000000f, smu fw if version = 0x00000013, smu fw program = 0, version = 0x003b3100 (59.49.0)
[ 5.934058] amdgpu 0000:03:00.0: amdgpu: SMU driver if version not matched
[ 5.934087] amdgpu 0000:03:00.0: amdgpu: use vbios provided pptable
[ 5.983165] amdgpu 0000:03:00.0: amdgpu: SMU is initialized successfully!
[ 6.266164] amdgpu: HMM registered 8176MB device memory
[ 6.267386] kfd kfd: amdgpu: Allocated 3969056 bytes on gart
[ 6.267401] kfd kfd: amdgpu: Total number of KFD nodes to be created: 1
[ 6.267556] amdgpu: Virtual CRAT table created for GPU
[ 6.268131] amdgpu: Topology: Add dGPU node [0x73ef:0x1002]
[ 6.268132] kfd kfd: amdgpu: added device 1002:73ef
[ 6.268151] amdgpu 0000:03:00.0: amdgpu: SE 2, SH per SE 2, CU per SH 8, active_cu_number 32
[ 6.268541] amdgpu 0000:03:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[ 6.268543] amdgpu 0000:03:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[ 6.268545] amdgpu 0000:03:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[ 6.268547] amdgpu 0000:03:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
[ 6.268548] amdgpu 0000:03:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
[ 6.268550] amdgpu 0000:03:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
[ 6.268551] amdgpu 0000:03:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
[ 6.268553] amdgpu 0000:03:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
[ 6.268554] amdgpu 0000:03:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
[ 6.268556] amdgpu 0000:03:00.0: amdgpu: ring kiq_0.2.1.0 uses VM inv eng 11 on hub 0
[ 6.268558] amdgpu 0000:03:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[ 6.268559] amdgpu 0000:03:00.0: amdgpu: ring sdma1 uses VM inv eng 13 on hub 0
[ 6.268561] amdgpu 0000:03:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 8
[ 6.268562] amdgpu 0000:03:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1 on hub 8
[ 6.268564] amdgpu 0000:03:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4 on hub 8
[ 6.268566] amdgpu 0000:03:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 8
[ 6.269553] amdgpu 0000:03:00.0: amdgpu: Using BACO for runtime pm
[ 6.270040] [drm] Initialized amdgpu 3.54.0 20150101 for 0000:03:00.0 on minor 1
[ 6.279346] fbcon: amdgpudrmfb (fb0) is primary device
[ 6.279352] amdgpu 0000:03:00.0: [drm] fb0: amdgpudrmfb frame buffer device
[ 8.545527] snd_hda_intel 0000:03:00.1: bound 0000:03:00.0 (ops amdgpu_dm_audio_component_bind_ops [amdgpu])
EDIT: Terraria might be the only game hard-crashing my system after the re-install, and following this Steam post I managed to stop Terraria from crashing. I haven't seen any soft crashes of KDE since I did some stuff with my drivers, although I'm not confident they are gone.
EDIT 2: After further searching, lots of testing, and more crashes, I think it may have been my NVMe SSD. Since adding
nvme_core.default_ps_max_latency_us=0
to my kernel arguments, I have not suffered any crashes. So it doesn't appear to have anything to do with my GPU or my Ryzen CPU; the timing with the new GPU was likely unrelated and coincidental. I want to thank Laceflower on the Archlinux Discord for helping me out massively with this; I wouldn't have been able to find this out without her help.
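To confirm the argument actually reached the running kernel after a reboot, something like the following Python sketch could be used. It looks for a named parameter on a kernel command line string. The helper names `kernel_param` and `current_cmdline` and the sample string are hypothetical. It splits on whitespace only, so it does not handle quoted values containing spaces.

```python
PARAM = "nvme_core.default_ps_max_latency_us"

# Hypothetical sample command line, for illustration only.
SAMPLE = "rw quiet nvme_core.default_ps_max_latency_us=0"


def kernel_param(cmdline, name):
    """Return the value given for `name`, '' for a bare flag, or None if absent.

    If the parameter appears more than once, the last occurrence is returned.
    """
    value = None
    for token in cmdline.split():
        key, _, val = token.partition("=")
        if key == name:
            value = val
    return value


def current_cmdline(path="/proc/cmdline"):
    """Read the command line the running kernel was booted with (Linux only)."""
    with open(path, encoding="utf-8") as f:
        return f.read().strip()


print(kernel_param(SAMPLE, PARAM))          # 0
print(kernel_param("rw quiet", PARAM))      # None
```

On the affected machine, `kernel_param(current_cmdline(), PARAM)` should return `"0"` once the bootloader configuration has been updated and the system rebooted.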
Last edited by TokyoStarz (2024-08-09 05:53:55)
The Gentoo wiki has a great article that I think has solved my issue https://wiki.gentoo.org/wiki/AMDGPU#Fre … ic_Crashes
I did a full system reinstall and it seems to be doing well: no crashes so far, and surprisingly much better GPU performance.
Just kidding. I think the issue comes down to me running a game while either having Discord open or being in a Discord call.
It might just be Terraria. I don't have many leads, as logging seems to be useless in pretty much everything. It only occurs on Wayland, though: if I run the game on Wayland through Gamescope, it works, despite the worst performance I have ever seen Terraria run at, and if I use Xorg, it also works. I am unsure whether it's Terraria, a driver, Wayland, or KDE, because as far as I can tell the logs are not very helpful.
Last edited by TokyoStarz (2024-08-05 04:38:17)