
#1 2024-12-27 13:36:11

safe049
Member
From: Shanxi,China
Registered: 2024-05-02
Posts: 105
Website

[SOLVED] AMDGPU reset when playing games

It even happens with very small games like Madness Nexus Project 2,
and I tested CS2 too.
It happens shortly after the game shows up (around 30 seconds in).
When it happens, all graphics freeze; after a while the screen becomes glitched, then I get kicked back to the login screen and the GPU is reset.

dmesg: https://0x0.st/8sKU.txt
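A rough way to pull the reset-related lines out of a saved dmesg or journal dump like the one linked above, as a Python sketch. The matched substrings (ring timeouts, "GPU reset", amdgpu_job_timedout, "GPU recovery") are assumptions based on typical amdgpu hang messages, not a complete list, and `find_reset_lines` is a hypothetical helper name; adjust the patterns to what your log actually contains.

```python
import re
import sys

# Assumed patterns for typical amdgpu hang/reset messages; not exhaustive.
RESET_PATTERNS = [
    r"ring \S+ timeout",   # e.g. "ring gfx timeout"
    r"amdgpu_job_timedout",
    r"GPU reset",
    r"GPU recovery",
]
_RESET_RE = re.compile("|".join(RESET_PATTERNS))


def find_reset_lines(log_text: str) -> list[str]:
    """Return lines that mention amdgpu and match one of the reset patterns."""
    hits = []
    for line in log_text.splitlines():
        if "amdgpu" in line and _RESET_RE.search(line):
            hits.append(line.strip())
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python find_resets.py dmesg.txt
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        for hit in find_reset_lines(f.read()):
            print(hit)
```

Running it against the downloaded log shows the timestamps of the first timeout/reset lines, which can be lined up with when the game was started.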
neofetch:

(base)   dynamo   ~ git-[ main]-  neofetch
                   -`                    dynamo@archyyds
                  .o+`                   ---------------
                 `ooo/                   OS: Arch Linux x86_64
                `+oooo:                  Kernel: 6.11.3-273-tkg-bore
               `+oooooo:                 Uptime: 14 mins
               -+oooooo+:                Packages: 2857 (pacman), 9 (flatpak)
             `/:-:++oooo+:               Shell: zsh 5.9
            `/++++/+++++++:              Resolution: 1920x1080
           `/++++++++++++++:             DE: Hyprland
          `/+++ooooooooooooo/`           Theme: Adwaita [GTK3]
         ./ooosssso++osssssso+`          Icons: breeze [GTK2/3]
        .oossssso-````/ossssss+`         Terminal: alacritty
       -osssssso.      :ssssssso.        CPU: Intel i5-6600K (4) @ 3.900GHz
      :osssssss/        osssso+++.       GPU: AMD ATI Radeon RX Vega 56/64
     /ossssssss/        +ssssooo/-       Memory: 4637MiB / 15930MiB
   `/ossssso+/:-        -:/+osssso+-
  `+sso+:-`                 `.-/+oso:
 `++:.                           `-/+/
 .`                                 `/

Using a Vega 56 (not overclocked).

rocm-smi:

======================================== ROCm System Management Interface ========================================
================================================== Concise Info ==================================================
Device  Node  IDs              Temp    Power     Partitions          SCLK  MCLK  Fan  Perf  PwrCap  VRAM%  GPU%
              (DID,     GUID)  (Edge)  (Socket)  (Mem, Compute, ID)
==================================================================================================================
0       1     0x687f,   51005  42.0°C  4.0W      N/A, N/A, 0         None  None  0%   auto  260.0W  7%     0%
==================================================================================================================
============================================== End of ROCm SMI Log ===============================================

I noticed something strange: in LACT, overclocking is enabled even though I never turned it on manually.
When I try to disable overclocking support in LACT,
it says:

2024-12-27T13:28:21.097121Z  WARN lact_gui::app: Got error from daemon, end of client boundary

Caused by:
Overclocking was not enabled through LACT (file at /etc/modprobe.d/99-amdgpu-overdrive.conf does not exist)

lspci -nnk:

(base)   dynamo   ~ git-[ main]-  lspci -nnk
00:00.0 Host bridge [0600]: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Host Bridge/DRAM Registers [8086:191f] (rev 07)
	Subsystem: ASRock Incorporation Device [1849:191f]
	Kernel driver in use: skl_uncore
	Kernel modules: ie31200_edac
00:01.0 PCI bridge [0604]: Intel Corporation 6th-10th Gen Core Processor PCIe Controller (x16) [8086:1901] (rev 07)
	Subsystem: ASRock Incorporation Device [1849:1901]
	Kernel driver in use: pcieport
00:14.0 USB controller [0c03]: Intel Corporation 100 Series/C230 Series Chipset Family USB 3.0 xHCI Controller [8086:a12f] (rev 31)
	Subsystem: ASRock Incorporation Device [1849:a12f]
	Kernel driver in use: xhci_hcd
	Kernel modules: xhci_pci
00:14.2 Signal processing controller [1180]: Intel Corporation 100 Series/C230 Series Chipset Family Thermal Subsystem [8086:a131] (rev 31)
	Subsystem: ASRock Incorporation Device [1849:a131]
	Kernel driver in use: intel_pch_thermal
	Kernel modules: intel_pch_thermal
00:16.0 Communication controller [0780]: Intel Corporation 100 Series/C230 Series Chipset Family MEI Controller #1 [8086:a13a] (rev 31)
	Subsystem: ASRock Incorporation Device [1849:a13a]
	Kernel driver in use: mei_me
	Kernel modules: mei_me
00:17.0 SATA controller [0106]: Intel Corporation Q170/Q150/B150/H170/H110/Z170/CM236 Chipset SATA Controller [AHCI Mode] [8086:a102] (rev 31)
	Subsystem: ASRock Incorporation Device [1849:a102]
	Kernel driver in use: ahci
00:1c.0 PCI bridge [0604]: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #5 [8086:a114] (rev f1)
	Subsystem: ASRock Incorporation Device [1849:a114]
	Kernel driver in use: pcieport
00:1d.0 PCI bridge [0604]: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #9 [8086:a118] (rev f1)
	Subsystem: ASRock Incorporation Device [1849:a118]
	Kernel driver in use: pcieport
00:1d.3 PCI bridge [0604]: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #12 [8086:a11b] (rev f1)
	Subsystem: ASRock Incorporation Device [1849:a11b]
	Kernel driver in use: pcieport
00:1f.0 ISA bridge [0601]: Intel Corporation B150 Chipset LPC/eSPI Controller [8086:a148] (rev 31)
	Subsystem: ASRock Incorporation Device [1849:a148]
00:1f.2 Memory controller [0580]: Intel Corporation 100 Series/C230 Series Chipset Family Power Management Controller [8086:a121] (rev 31)
	Subsystem: ASRock Incorporation Device [1849:a121]
00:1f.3 Audio device [0403]: Intel Corporation 100 Series/C230 Series Chipset Family HD Audio Controller [8086:a170] (rev 31)
	Subsystem: ASRock Incorporation Device [1849:1151]
	Kernel driver in use: snd_hda_intel
	Kernel modules: snd_hda_intel, snd_soc_avs
00:1f.4 SMBus [0c05]: Intel Corporation 100 Series/C230 Series Chipset Family SMBus [8086:a123] (rev 31)
	Subsystem: ASRock Incorporation Device [1849:a123]
	Kernel driver in use: i801_smbus
	Kernel modules: i2c_i801
01:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Vega 10 PCIe Bridge [1022:1470] (rev c3)
	Kernel driver in use: pcieport
02:00.0 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Vega 10 PCIe Bridge [1022:1471]
	Subsystem: Advanced Micro Devices, Inc. [AMD] Vega 10 PCIe Bridge [1022:1471]
	Kernel driver in use: pcieport
03:00.0 VGA compatible controller [0300]: Advanced Micro Devices, Inc. [AMD/ATI] Vega 10 XL/XT [Radeon RX Vega 56/64] [1002:687f] (rev c3)
	Subsystem: ASUSTeK Computer Inc. Device [1043:0555]
	Kernel driver in use: amdgpu
	Kernel modules: amdgpu
03:00.1 Audio device [0403]: Advanced Micro Devices, Inc. [AMD/ATI] Vega 10 HDMI Audio [Radeon Vega 56/64] [1002:aaf8]
	Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Vega 10 HDMI Audio [Radeon Vega 56/64] [1002:aaf8]
	Kernel driver in use: snd_hda_intel
	Kernel modules: snd_hda_intel
05:00.0 Ethernet controller [0200]: Qualcomm Atheros Killer E2400 Gigabit Ethernet Controller [1969:e0a1] (rev 10)
	Subsystem: ASRock Incorporation Device [1849:e0a1]
	Kernel driver in use: alx
	Kernel modules: alx
06:00.0 Network controller [0280]: Intel Corporation Wireless 7260 [8086:08b1] (rev bb)
	Subsystem: Intel Corporation Dual Band Wireless-AC 7260 [Wilkins Peak 2] [8086:c470]
	Kernel driver in use: iwlwifi
	Kernel modules: iwlwifi
(base)   dynamo   ~ git-[ main]- 

GRUB kernel parameters:

GRUB_CMDLINE_LINUX_DEFAULT="loglevel=5 amdgpu.aspm=0 amdgpu.runpm=0 amdgpu.bapm=0 amdgpu.noretry=0 amdgpu.ppfeaturemask=0xffffcff8 intel_iommu=on iommu=pt video=efifb:off pcie_acs_override=downstream,multifunction vfio-pci.ids=1002:687f,1002:aaf8"

(I disabled vfio in mkinitcpio.conf and don't have any other configs.)
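To see what this command line actually asks of each module, here is a small Python sketch that splits it into key/value pairs and checks vfio-pci.ids against the device IDs from the lspci -nnk output above (1002:687f for the Vega 10 GPU, 1002:aaf8 for its HDMI audio). The helper names are hypothetical, and shlex.split is only an approximation of the kernel's own parameter parsing. Note that vfio-pci.ids only has an effect if vfio-pci is loaded before amdgpu claims the card, which may be why the GPU still shows amdgpu as the driver in use here.

```python
import shlex
from typing import Optional

# The GRUB_CMDLINE_LINUX_DEFAULT value from the post.
CMDLINE = (
    "loglevel=5 amdgpu.aspm=0 amdgpu.runpm=0 amdgpu.bapm=0 amdgpu.noretry=0 "
    "amdgpu.ppfeaturemask=0xffffcff8 intel_iommu=on iommu=pt video=efifb:off "
    "pcie_acs_override=downstream,multifunction "
    "vfio-pci.ids=1002:687f,1002:aaf8"
)

# Vendor:device IDs of the Vega 56 and its HDMI audio function, from lspci -nnk.
GPU_IDS = {"1002:687f", "1002:aaf8"}


def parse_cmdline(cmdline: str) -> dict[str, Optional[str]]:
    """Split a kernel command line into {key: value}; bare flags map to None."""
    params: dict[str, Optional[str]] = {}
    for token in shlex.split(cmdline):
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def module_params(params: dict[str, Optional[str]], module: str) -> dict[str, Optional[str]]:
    """Return the options addressed to one module, e.g. 'amdgpu.runpm' -> 'runpm'."""
    prefix = module + "."
    return {k[len(prefix):]: v for k, v in params.items() if k.startswith(prefix)}


def vfio_claims(params: dict[str, Optional[str]], device_ids: set[str]) -> list[str]:
    """List which of the given PCI IDs appear in vfio-pci.ids."""
    ids = params.get("vfio-pci.ids") or ""
    return sorted(set(ids.split(",")) & device_ids)


if __name__ == "__main__":
    p = parse_cmdline(CMDLINE)
    print(module_params(p, "amdgpu"))
    print(vfio_claims(p, GPU_IDS))
```

The second printed line is `['1002:687f', '1002:aaf8']`, i.e. the command line does name this exact GPU and its audio function for vfio-pci.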

Some of these parameters come from this post: https://bbs.archlinux.org/viewtopic.php?id=299883
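amdgpu.ppfeaturemask is a bitmask where each bit enables one PowerPlay feature, so the value can be decoded bit by bit. A minimal Python sketch, assuming the bit layout below, which mirrors enum PP_FEATURE_MASK in the kernel's drivers/gpu/drm/amd/include/amd_shared.h (verify it against your kernel tree). If those names are right, 0xffffcff8 sets PP_OVERDRIVE_MASK, which would fit LACT showing overclocking as enabled even though its own modprobe.d file does not exist, and it clears the SCLK, MCLK, PCIe, SOCCLK and DCEFCLK DPM bits.

```python
# Assumed bit layout, mirroring enum PP_FEATURE_MASK in the kernel's
# drivers/gpu/drm/amd/include/amd_shared.h; verify against your kernel tree.
PP_FEATURE_BITS = {
    0x0001: "PP_SCLK_DPM_MASK",
    0x0002: "PP_MCLK_DPM_MASK",
    0x0004: "PP_PCIE_DPM_MASK",
    0x0008: "PP_SCLK_DEEP_SLEEP_MASK",
    0x0010: "PP_POWER_CONTAINMENT_MASK",
    0x0020: "PP_UVD_HANDSHAKE_MASK",
    0x0040: "PP_SMC_VOLTAGE_CONTROL_MASK",
    0x0080: "PP_VBI_TIME_SUPPORT_MASK",
    0x0100: "PP_ULV_MASK",
    0x0200: "PP_ENABLE_GFX_CG_THRU_SMU",
    0x0400: "PP_CLOCK_STRETCH_MASK",
    0x0800: "PP_OD_FUZZY_FAN_CONTROL_MASK",
    0x1000: "PP_SOCCLK_DPM_MASK",
    0x2000: "PP_DCEFCLK_DPM_MASK",
    0x4000: "PP_OVERDRIVE_MASK",
    0x8000: "PP_GFXOFF_MASK",
}
PP_OVERDRIVE_MASK = 0x4000


def disabled_features(mask: int) -> list[str]:
    """Names of the known feature bits that are cleared in mask."""
    return [name for bit, name in PP_FEATURE_BITS.items() if not mask & bit]


def overdrive_enabled(mask: int) -> bool:
    """True if the overdrive (manual clock/voltage control) bit is set."""
    return bool(mask & PP_OVERDRIVE_MASK)


if __name__ == "__main__":
    mask = int("0xffffcff8", 0)  # base 0 accepts the 0x prefix
    print(f"overdrive: {overdrive_enabled(mask)}")
    print("disabled:", ", ".join(disabled_features(mask)))
```

Reading /sys/module/amdgpu/parameters/ppfeaturemask on a boot without the parameter gives the running kernel's default to compare the decoded list against.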

I'm going to replace the Vega 56 as soon as I can;
this crap is just unstable.

## Solution:
Switching to the Manjaro kernel: https://github.com/manjaro-kernels/linu … g/6.12.7-1

Last edited by safe049 (2025-01-27 03:30:24)


std::cout << "I use Arch BTW" << endl;


#2 2024-12-27 15:10:59

safe049

Switching to the stable linux kernel doesn't work.



#3 2024-12-27 15:40:58

seth
Member
Registered: 2012-09-03
Posts: 61,581

There's a whole slew of recent threads that hinge on either the 6.12 kernel (apparently not your case) or the mesa 24.3 release, so try downgrading to mesa 24.2.7.


#4 2024-12-28 03:00:54

safe049

Switched to mesa 24.2.7; it doesn't help.
All the mesa packages:

(base)   dynamo   ~ git-[ main]-  sudo pacman -Qs mesa
local/glu 9.0.3-2
    Mesa OpenGL utility library
local/lib32-glu 9.0.3-2
    Mesa OpenGL utility library (32 bits)
local/lib32-mesa 1:24.2.7-1
    Open-source OpenGL drivers - 32-bit
local/lib32-vulkan-mesa-layers 1:24.2.7-1
    Mesa's Vulkan layers - 32-bit
local/mesa 1:24.2.7-1
    Open-source OpenGL drivers
local/mesa-demos 9.0.0-5
    Mesa demos
local/mesa-utils 9.0.0-5
    Essential Mesa utilities
local/vulkan-mesa-layers 1:24.2.7-1
    Mesa's Vulkan layers
(base)   dynamo   ~ git-[ main]- 
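To double-check that a downgrade really covers the whole Mesa stack, a small Python sketch that parses pacman -Qs output like the listing above and flags Mesa packages whose upstream version differs from the one you expect. The package set in MESA_FAMILY and the helper names are assumptions. vulkan-radeon and lib32-vulkan-radeon (RADV, typically the Vulkan driver Proton ends up using on AMD) are built from the same mesa source but do not appear in the listing above, since pacman -Qs only matches names and descriptions; check them separately with pacman -Q vulkan-radeon lib32-vulkan-radeon.

```python
# Packages assumed to come from the mesa source package and to share its version.
MESA_FAMILY = {
    "mesa", "lib32-mesa",
    "vulkan-mesa-layers", "lib32-vulkan-mesa-layers",
    "vulkan-radeon", "lib32-vulkan-radeon",
}


def parse_qs(output: str) -> dict[str, str]:
    """Map package name -> full version from `pacman -Qs` output."""
    pkgs = {}
    for line in output.splitlines():
        if line.startswith("local/"):
            name, version = line[len("local/"):].split()[:2]
            pkgs[name] = version
    return pkgs


def upstream_version(version: str) -> str:
    """Strip the epoch ('1:') and pkgrel ('-1'): '1:24.2.7-1' -> '24.2.7'."""
    return version.split(":", 1)[-1].rsplit("-", 1)[0]


def mismatched(pkgs: dict[str, str], expected: str) -> list[str]:
    """Installed Mesa-family packages whose upstream version is not `expected`."""
    return sorted(
        name for name, ver in pkgs.items()
        if name in MESA_FAMILY and upstream_version(ver) != expected
    )
```

For the listing above, `mismatched(parse_qs(text), "24.2.7")` returns an empty list; glu, mesa-demos and mesa-utils are separate projects and are ignored on purpose.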

New journal:
http://0x0.st/8sAR.txt

Tested on Windows:
in Windows, Battlefield 1 runs completely fine.

Minecraft (Java Edition) with a high-end shader also runs okay.
I assume it only happens under Proton?

Last edited by safe049 (2024-12-28 03:13:43)



#5 2024-12-28 05:08:19

safe049

Downgrading vulkan-radeon (including lib32-vulkan-radeon) delays the crash (at least the game loads and I can play for a while),
but it still crashes.
I will try the LTS kernel.
https://0x0.st/8smf.txt


Edit: switching to LTS doesn't help.

Disabling the IOMMU doesn't help either.

Tried clearing all GRUB kernel parameters;
still doesn't work.

I can run an 8B LLM with ollama just fine.

Last edited by safe049 (2024-12-28 05:39:33)



#6 2024-12-28 08:00:09

seth

Looking at your journal, I'd indeed primarily blame vmlinuz-linux611-tkg-bore; please try with the repo kernels.


#7 2024-12-28 13:24:42

safe049

Switching to the stable kernel without any kernel parameters solved the problem.



#8 2024-12-28 19:41:17

seth

"stable kernel" like "LTS" (long term support, not "linux totally stable") or just the regular 6.12 kernel instead of the tkg one?


#9 2024-12-29 04:10:39

safe049

Well, previously I switched to just the regular 6.12 kernel instead of the tkg one,

but today it still crashed in game.
I found this post with literally the same error as mine, also on a Vega 56: https://bbs.archlinux.org/viewtopic.php?id=288107
I installed the Linux 6.12.7-1-MANJARO kernel and everything is okay now.

Maybe the Manjaro kernel does have some patch, I guess.



#10 2024-12-29 07:14:36

safe049

Nah, it still happens, just less frequently.
I tried this approach from Reddit:
lower the max clock to 1375 MHz.
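For reference, lowering the top core clock like this goes through amdgpu's pp_od_clk_voltage sysfs file. A dry-run Python sketch, assuming the Vega 10 format from the kernel's amdgpu documentation: the file lists OD_SCLK states as "<level>: <clock>MHz <voltage>mV", a state is changed by writing "s <level> <MHz> <mV>", and "c" commits. Writing needs the overdrive bit in amdgpu.ppfeaturemask, root, and (per the same docs) power_dpm_force_performance_level set to manual first. The card0 path, the helper names, and the choice to clamp every state above the cap while keeping each state's voltage are assumptions; the driver may reject tables it considers invalid. The script only prints the commands.

```python
import os
import re

# Assumed row format of the OD_SCLK section, e.g. "7:       1590Mhz       1200mV".
_ROW = re.compile(r"^\s*(\d+):\s*(\d+)\s*MHz\s+(\d+)\s*mV", re.IGNORECASE)


def parse_od_sclk(text: str) -> dict[int, tuple[int, int]]:
    """Return {level: (MHz, mV)} for the OD_SCLK section of pp_od_clk_voltage."""
    table: dict[int, tuple[int, int]] = {}
    in_sclk = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("OD_") and stripped.endswith(":"):
            # Section header such as OD_SCLK:, OD_MCLK:, OD_RANGE:
            in_sclk = stripped == "OD_SCLK:"
            continue
        match = _ROW.match(line)
        if in_sclk and match:
            level, mhz, mv = map(int, match.groups())
            table[level] = (mhz, mv)
    return table


def cap_sclk_commands(table: dict[int, tuple[int, int]], max_mhz: int) -> list[str]:
    """Commands that clamp every SCLK state above max_mhz, keeping its voltage."""
    cmds = [
        f"s {level} {max_mhz} {mv}"
        for level, (mhz, mv) in sorted(table.items())
        if mhz > max_mhz
    ]
    return cmds + ["c"] if cmds else []


if __name__ == "__main__":
    path = "/sys/class/drm/card0/device/pp_od_clk_voltage"  # card index assumed
    if os.path.exists(path):
        with open(path) as f:
            for cmd in cap_sclk_commands(parse_od_sclk(f.read()), 1375):
                print(cmd)  # dry run: printed, not written
```

Each printed line would then be written to the same file as root, one write per line, with "c" last.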

I also flashed a Vega 64 VBIOS onto my Vega 56;
that delays the crash, but it still happens.

I'd like to flash this VBIOS, which people say is stable: https://www.reddit.com/r/Amd/comments/g4asqi/psa_how_to_fix_your_unstable_gigabyte_vega_56/

but I have a ROG Strix Vega 56,
which has 2 DP, 1 DVI and 2 HDMI outputs, and I use 1 DVI and 1 HDMI;
when I flash that VBIOS,
my HDMI screen receives no signal.

Flashing back to the original VBIOS and using a 1375 MHz clock doesn't work.
Journal:
https://0x0.st/8sLn.txt

Last edited by safe049 (2024-12-29 07:37:44)



#11 2024-12-29 16:04:08

seth

Yeah, I was gonna say that "maybe manjaro kernel did have some patch i guess" is beyond implausible, but it was late and I was tired ;)

Restrict your tests to the LTS kernel; there /is/ something going on with amdgpu in the 6.12 kernels.
Does the board allow any power/voltage control for the GPU?
Did you maybe forget to attach the dedicated 6/8-pin power supply? :P
With regard to the apparently specific context, does https://wiki.archlinux.org/title/Steam/ … _emulation help?


#12 2025-01-27 03:30:07

safe049

safe049 wrote:

well previously i switched to just the regular 6.12 kernel instead of the tkg one

but today it still crash in game
and i found this post with litreally same error with me and with vega56: https://bbs.archlinux.org/viewtopic.php?id=288107
i installed the Linux 6.12.7-1-MANJARO kernel and everything turned okay

maybe manjaro kernel did have some patch i guess


Uhhh, it is solved.

Using those launch parameters (the GRUB kernel parameters from my first post) with the Manjaro kernel worked.

