First of all, it crashed three times, so there are three journals. Because it even crashed once while I was posting this topic, I can't tell which one is the first and which is the third.
GPU: ROG Radeon Vega 56 8G OC (not overclocked)
edit2: [Solution: set radeon.dpm=0]
edit3: It happened again with the same issue, even without lact.
Journals:
edit:
**Detailed one**
http://0x0.st/X3aI.txt
Filtered error lines:
Sep 17 21:43:01 archyyds kernel: amdgpu 0000:03:00.0: amdgpu: Failed to send message: 0x39, ret value: 0xffffffff
Sep 17 21:43:01 archyyds kernel: amdgpu 0000:03:00.0: amdgpu: Failed to send message: 0x3a, ret value: 0xffffffff
Sep 17 21:43:01 archyyds lact[1177]: 2024-09-17T13:43:01.157387Z WARN lact_daemon::server::gpu_controller::fan_control: GPU temperature is beyond critical values! 511°C
Sep 17 21:43:01 archyyds kernel: amdgpu: [powerplay] Failed message: 0x5, input parameter: 0x2000000, error code: 0xffffffff
Sep 17 21:43:01 archyyds kernel: snd_hda_intel 0000:03:00.1: Unable to change power state from D3hot to D0, device inaccessible
Sep 17 21:43:01 archyyds kernel: amdgpu 0000:03:00.0: amdgpu: Failed to send message: 0x39, ret value: 0xffffffff
Sep 17 21:43:01 archyyds kernel: amdgpu 0000:03:00.0: amdgpu: Failed to send message: 0x3a, ret value: 0xffffffff
Sep 17 21:43:01 archyyds kernel: snd_hda_intel 0000:03:00.1: CORB reset timeout#2, CORBRP = 65535
Sep 17 21:43:02 archyyds kernel: amdgpu 0000:03:00.0: amdgpu: Failed to send message: 0x39, ret value: 0xffffffff
Sep 17 21:43:02 archyyds kernel: amdgpu 0000:03:00.0: amdgpu: Failed to send message: 0x3a, ret value: 0xffffffff
Sep 17 21:43:02 archyyds kernel: amdgpu 0000:03:00.0: amdgpu: Failed to send message: 0x39, ret value: 0xffffffff
Sep 17 21:43:02 archyyds kernel: amdgpu 0000:03:00.0: amdgpu: Failed to send message: 0x3a, ret value: 0xffffffff
Sep 17 21:43:03 archyyds kernel: amdgpu 0000:03:00.0: amdgpu: Failed to send message: 0x39, ret value: 0xffffffff
Sep 17 21:43:03 archyyds kernel: amdgpu 0000:03:00.0: amdgpu: Failed to send message: 0x3a, ret value: 0xffffffff
Sep 17 21:43:03 archyyds kernel: amdgpu 0000:03:00.0: amdgpu: Failed to send message: 0x39, ret value: 0xffffffff
Sep 17 21:43:03 archyyds kernel: amdgpu 0000:03:00.0: amdgpu: Failed to send message: 0x3a, ret value: 0xffffffff
Sep 17 21:43:04 archyyds kernel: amdgpu 0000:03:00.0: amdgpu: Failed to send message: 0x39, ret value: 0xffffffff
Sep 17 21:43:04 archyyds kernel: amdgpu 0000:03:00.0: amdgpu: Failed to send message: 0x3a, ret value: 0xffffffff
Sep 17 21:43:04 archyyds kernel: amdgpu 0000:03:00.0: amdgpu: Failed to send message: 0x39, ret value: 0xffffffff
Sep 17 21:43:04 archyyds kernel: amdgpu 0000:03:00.0: amdgpu: Failed to send message: 0x3a, ret value: 0xffffffff
Sep 17 21:43:05 archyyds kernel: amdgpu 0000:03:00.0: amdgpu: Failed to send message: 0x39, ret value: 0xffffffff
Sep 17 21:43:05 archyyds kernel: amdgpu 0000:03:00.0: amdgpu: Failed to send message: 0x3a, ret value: 0xffffffff
Sep 17 21:43:05 archyyds kernel: amdgpu 0000:03:00.0: amdgpu: Failed to send message: 0x39, ret value: 0xffffffff
Sep 17 21:43:05 archyyds kernel: amdgpu 0000:03:00.0: amdgpu: Failed to send message: 0x3a, ret value: 0xffffffff
Sep 17 21:43:06 archyyds kernel: amdgpu 0000:03:00.0: amdgpu: Failed to send message: 0x39, ret value: 0xffffffff
Sep 17 21:43:06 archyyds kernel: amdgpu 0000:03:00.0: amdgpu: Failed to send message: 0x3a, ret value: 0xffffffff
Sep 17 21:43:06 archyyds kernel: amdgpu 0000:03:00.0: amdgpu: Failed to send message: 0x39, ret value: 0xffffffff
Sep 17 21:43:06 archyyds kernel: amdgpu 0000:03:00.0: amdgpu: Failed to send message: 0x3a, ret value: 0xffffffff
Sep 17 21:43:07 archyyds kernel: amdgpu 0000:03:00.0: amdgpu: Failed to send message: 0x39, ret value: 0xffffffff
Sep 17 21:43:07 archyyds kernel: amdgpu 0000:03:00.0: amdgpu: Failed to send message: 0x3a, ret value: 0xffffffff
Sep 17 21:43:07 archyyds kernel: amdgpu 0000:03:00.0: amdgpu: Failed to send message: 0x39, ret value: 0xffffffff
Sep 17 21:43:07 archyyds kernel: amdgpu 0000:03:00.0: amdgpu: Failed to send message: 0x3a, ret value: 0xffffffff
Sep 17 21:43:08 archyyds kernel: amdgpu 0000:03:00.0: amdgpu: Failed to send message: 0x39, ret value: 0xffffffff
Sep 17 21:43:08 archyyds kernel: amdgpu 0000:03:00.0: amdgpu: Failed to send message: 0x3a, ret value: 0xffffffff
Sep 17 21:43:08 archyyds kernel: amdgpu 0000:03:00.0: amdgpu: Failed to send message: 0x39, ret value: 0xffffffff
Sep 17 21:43:08 archyyds kernel: amdgpu 0000:03:00.0: amdgpu: Failed to send message: 0x3a, ret value: 0xffffffff
Sep 17 21:43:09 archyyds kernel: amdgpu 0000:03:00.0: amdgpu: Failed to send message: 0x39, ret value: 0xffffffff
Sep 17 21:43:09 archyyds kernel: amdgpu 0000:03:00.0: amdgpu: Failed to send message: 0x3a, ret value: 0xffffffff
Sep 17 21:43:09 archyyds kernel: amdgpu 0000:03:00.0: amdgpu: Failed to send message: 0x39, ret value: 0xffffffff
Sep 17 21:43:09 archyyds kernel: amdgpu 0000:03:00.0: amdgpu: Failed to send message: 0x3a, ret value: 0xffffffff
Sep 17 21:43:10 archyyds kernel: amdgpu 0000:03:00.0: amdgpu: Failed to send message: 0x39, ret value: 0xffffffff
Sep 17 21:43:10 archyyds kernel: amdgpu 0000:03:00.0: amdgpu: Failed to send message: 0x3a, ret value: 0xffffffff
Sep 17 21:43:10 archyyds kernel: amdgpu 0000:03:00.0: amdgpu: Failed to send message: 0x39, ret value: 0xffffffff
Sep 17 21:43:10 archyyds kernel: amdgpu 0000:03:00.0: amdgpu: Failed to send message: 0x3a, ret value: 0xffffffff
Sep 17 21:43:11 archyyds kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_low timeout, signaled seq=8243286, emitted seq=8243288
Sep 17 21:43:11 archyyds kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process RDR2.exe pid 18275 thread RDR2.exe:cs0 pid 18409
Sep 17 21:43:11 archyyds kernel: amdgpu 0000:03:00.0: amdgpu: GPU reset begin!
Sep 17 21:43:11 archyyds lact[1177]: thread 'main' panicked at lact-daemon/src/server/gpu_controller/mod.rs:488:22:
Sep 17 21:43:11 archyyds lact[1177]: Could not get temperature by given key
Sep 17 21:43:11 archyyds lact[1177]: note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
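The filtered log is mostly the same two SMU failures (0x39 and 0x3a, both returning 0xffffffff) repeating until the ring timeout and GPU reset. A minimal sketch for condensing a journal like this into per-message counts; `summarize_smu_failures` is a hypothetical helper, and it only matches the `Failed to send message:` format, not the `[powerplay] Failed message:` lines.

```python
import re
from collections import Counter

# Matches amdgpu lines such as:
# "amdgpu 0000:03:00.0: amdgpu: Failed to send message: 0x39, ret value: 0xffffffff"
SMU_FAIL = re.compile(
    r"Failed to send message: (0x[0-9a-fA-F]+), ret value: (0x[0-9a-fA-F]+)"
)


def summarize_smu_failures(journal_text: str) -> Counter:
    """Count failed SMU messages per (message ID, return value) pair."""
    counts: Counter = Counter()
    for line in journal_text.splitlines():
        match = SMU_FAIL.search(line)
        if match:
            # Normalize hex case so 0x3A and 0x3a are counted together.
            counts[(match.group(1).lower(), match.group(2).lower())] += 1
    return counts


if __name__ == "__main__":
    sample = (
        "kernel: amdgpu 0000:03:00.0: amdgpu: Failed to send message: 0x39, ret value: 0xffffffff\n"
        "kernel: amdgpu 0000:03:00.0: amdgpu: Failed to send message: 0x3a, ret value: 0xffffffff\n"
        "kernel: amdgpu 0000:03:00.0: amdgpu: Failed to send message: 0x39, ret value: 0xffffffff\n"
        "kernel: amdgpu 0000:03:00.0: amdgpu: GPU reset begin!\n"
    )
    for (msg, ret), n in sorted(summarize_smu_failures(sample).items()):
        print(f"message {msg} -> {ret}: {n}x")
```

To try it on a real log, one could pass the output of `journalctl -k -b -1` (kernel messages from the previous boot) as the `journal_text` argument.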
------------------------------------
http://0x0.st/X3m1.txt [-1]
I found out that I can still connect to the machine over SSH after it crashes.
I used rocm-smi to check the GPU status,
and all the values are nonsense:
the fan reads over 100% (it actually isn't), the clock reads 3000%, and the power draw reads 56239 W. I will try to capture a full output and post it in a comment.
edit: Yes, I got one while playing RDR2.
The temperature is around 72 °C, which is not high. It cannot really be 500+ °C; that would burn my house down.
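The impossible values fit a GPU that has stopped answering on the PCIe bus: reads from an inaccessible PCI device normally come back as all ones, and the log above already shows `ret value: 0xffffffff` and `CORBRP = 65535` (0xFFFF). A small sketch of the arithmetic, assuming (this is not confirmed anywhere in the thread) that the driver extracts the Vega 10 temperature as a 9-bit field with mask 0x1FF. Under that assumption an all-ones register decodes to exactly the 511°C that lact reported. The helper names are hypothetical.

```python
ALL_ONES_32 = 0xFFFFFFFF  # what a read from a device that is off the bus returns

# Assumption: the temperature field is 9 bits wide (mask 0x1FF).
TEMP_FIELD_MASK = 0x1FF


def decode_temperature_c(raw_register: int) -> int:
    """Extract the temperature in degrees Celsius from a raw 32-bit read.

    The real field also sits at a bit offset; that is ignored here because
    shifting an all-ones value still leaves the masked bits all set.
    """
    return raw_register & TEMP_FIELD_MASK


def low_bits(raw: int, width: int) -> int:
    """Keep the lowest `width` bits, e.g. for a 16-bit register such as CORBRP."""
    return raw & ((1 << width) - 1)


if __name__ == "__main__":
    print(decode_temperature_c(ALL_ONES_32))  # 511, matching lact's "511°C"
    print(low_bits(ALL_ONES_32, 16))          # 65535, matching "CORBRP = 65535"
```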
inxi:
dynamo ~ git-[ master]- inxi -Gxxx
Graphics:
Device-1: AMD Vega 10 XL/XT [Radeon RX 56/64] vendor: ASUSTeK driver: amdgpu
v: kernel arch: GCN-5 pcie: speed: 8 GT/s lanes: 16 ports: active: DVI-D-1
empty: DP-1, DP-2, HDMI-A-1, HDMI-A-2 bus-ID: 03:00.0 chip-ID: 1002:687f
class-ID: 0300
Device-2: Aveo USB2.0 Camera driver: snd-usb-audio,uvcvideo type: USB
rev: 2.0 speed: 480 Mb/s lanes: 1 bus-ID: 1-4:4 chip-ID: 1871:0142
class-ID: 0102
Display: x11 server: X.Org v: 21.1.13 with: Xwayland v: 24.1.2
compositor: Picom v: git-89c2c driver: X: loaded: amdgpu
unloaded: modesetting,vesa alternate: fbdev dri: radeonsi gpu: amdgpu
display-ID: :0 screens: 1
Screen-1: 0 s-res: 1920x1080 s-dpi: 96 s-size: 508x285mm (20.00x11.22")
s-diag: 582mm (22.93")
Monitor-1: DVI-D-1 mapped: DVI-D-0 model: Acer G223HQL
serial: LYHCN0022400 res: 1920x1080 hz: 60 dpi: 102
size: 477x268mm (18.78x10.55") diag: 547mm (21.5") modes: max: 1920x1080
min: 720x400
API: EGL v: 1.5 hw: drv: amd radeonsi platforms: device: 0 drv: radeonsi
device: 1 drv: swrast gbm: drv: kms_swrast surfaceless: drv: radeonsi x11:
drv: radeonsi inactive: wayland
API: OpenGL v: 4.6 compat-v: 4.5 vendor: amd mesa v: 24.2.2-arch1.1
glx-v: 1.4 direct-render: yes renderer: AMD Radeon RX Vega (radeonsi vega10
LLVM 18.1.8 DRM 3.54 6.6.51-1-lts) device-ID: 1002:687f
API: Vulkan v: 1.3.295 layers: 10 surfaces: xcb,xlib device: 0
type: discrete-gpu hw: amd driver: mesa radv device-ID: 1002:687f
lspci -k:
dynamo ~ git-[ master]- lspci -k
00:00.0 Host bridge: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Host Bridge/DRAM Registers (rev 07)
Subsystem: ASRock Incorporation Device 191f
Kernel driver in use: skl_uncore
Kernel modules: ie31200_edac
00:01.0 PCI bridge: Intel Corporation 6th-10th Gen Core Processor PCIe Controller (x16) (rev 07)
Subsystem: ASRock Incorporation Device 1901
Kernel driver in use: pcieport
00:14.0 USB controller: Intel Corporation 100 Series/C230 Series Chipset Family USB 3.0 xHCI Controller (rev 31)
Subsystem: ASRock Incorporation Device a12f
Kernel driver in use: xhci_hcd
Kernel modules: xhci_pci
00:14.2 Signal processing controller: Intel Corporation 100 Series/C230 Series Chipset Family Thermal Subsystem (rev 31)
Subsystem: ASRock Incorporation Device a131
Kernel driver in use: intel_pch_thermal
Kernel modules: intel_pch_thermal
00:16.0 Communication controller: Intel Corporation 100 Series/C230 Series Chipset Family MEI Controller #1 (rev 31)
Subsystem: ASRock Incorporation Device a13a
Kernel driver in use: mei_me
Kernel modules: mei_me
00:17.0 SATA controller: Intel Corporation Q170/Q150/B150/H170/H110/Z170/CM236 Chipset SATA Controller [AHCI Mode] (rev 31)
Subsystem: ASRock Incorporation Device a102
Kernel driver in use: ahci
00:1c.0 PCI bridge: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #5 (rev f1)
Subsystem: ASRock Incorporation Device a114
Kernel driver in use: pcieport
00:1d.0 PCI bridge: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #9 (rev f1)
Subsystem: ASRock Incorporation Device a118
Kernel driver in use: pcieport
00:1d.3 PCI bridge: Intel Corporation 100 Series/C230 Series Chipset Family PCI Express Root Port #12 (rev f1)
Subsystem: ASRock Incorporation Device a11b
Kernel driver in use: pcieport
00:1f.0 ISA bridge: Intel Corporation B150 Chipset LPC/eSPI Controller (rev 31)
Subsystem: ASRock Incorporation Device a148
00:1f.2 Memory controller: Intel Corporation 100 Series/C230 Series Chipset Family Power Management Controller (rev 31)
Subsystem: ASRock Incorporation Device a121
00:1f.3 Audio device: Intel Corporation 100 Series/C230 Series Chipset Family HD Audio Controller (rev 31)
Subsystem: ASRock Incorporation Device 1151
Kernel driver in use: snd_hda_intel
Kernel modules: snd_hda_intel, snd_soc_avs
00:1f.4 SMBus: Intel Corporation 100 Series/C230 Series Chipset Family SMBus (rev 31)
Subsystem: ASRock Incorporation Device a123
Kernel driver in use: i801_smbus
Kernel modules: i2c_i801
01:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Vega 10 PCIe Bridge (rev c3)
Kernel driver in use: pcieport
02:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD] Vega 10 PCIe Bridge
Subsystem: Advanced Micro Devices, Inc. [AMD] Vega 10 PCIe Bridge
Kernel driver in use: pcieport
03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Vega 10 XL/XT [Radeon RX Vega 56/64] (rev c3)
Subsystem: ASUSTeK Computer Inc. Device 0555
Kernel driver in use: amdgpu
Kernel modules: amdgpu
03:00.1 Audio device: Advanced Micro Devices, Inc. [AMD/ATI] Vega 10 HDMI Audio [Radeon Vega 56/64]
Subsystem: Advanced Micro Devices, Inc. [AMD/ATI] Vega 10 HDMI Audio [Radeon Vega 56/64]
Kernel driver in use: snd_hda_intel
Kernel modules: snd_hda_intel
05:00.0 Ethernet controller: Qualcomm Atheros Killer E2400 Gigabit Ethernet Controller (rev 10)
Subsystem: ASRock Incorporation Device e0a1
Kernel driver in use: alx
Kernel modules: alx
06:00.0 Network controller: Intel Corporation Wireless 7260 (rev bb)
Subsystem: Intel Corporation Dual Band Wireless-AC 7260 [Wilkins Peak 2]
Kernel driver in use: iwlwifi
Kernel modules: iwlwifi
dynamo ~ git-[ master]-
It crashes frequently while gaming, and sometimes even on boot or during normal use.
My Mesa and amdgpu packages are all up to date (as of 2024-09-17):
SEARCH AMD IN PACMAN:
dynamo ~ git-[ master]- sudo pacman -Qs amd
Deploying root access for dynamo. Password pls:
local/composable-kernel 6.0.2-1
High Performance Composable Kernel for AMD GPUs
local/hip-runtime-amd 6.0.2-4
Heterogeneous Interface for Portability ROCm
local/lact 0.5.4-2
AMDGPU Controller application
local/lib32-vulkan-radeon 1:24.2.2-1
Open-source Vulkan driver for AMD GPUs - 32-bit
local/libteam 1.32-2
Library for controlling team network device
local/miopen-hip 6.0.2-1
AMD's Machine Intelligence Library (HIP backend)
local/nvtop 3.1.0-1
GPUs process monitoring for AMD, Intel and NVIDIA
local/rocm-core 6.0.2-2
AMD ROCm core package (version files)
local/rocm-hip-libraries 6.0.2-1
Develop certain applications using HIP and libraries for AMD platforms
local/rocm-hip-runtime 6.0.2-1
Packages to run HIP applications on the AMD platform
local/rocm-hip-sdk 6.0.2-1
Develop applications using HIP and libraries for AMD platforms
local/rocm-opencl-runtime 6.0.2-1
OpenCL implementation for AMD
local/vulkan-radeon 1:24.2.2-1
Open-source Vulkan driver for AMD GPUs
local/xf86-video-amdgpu 23.0.0-2 (xorg-drivers)
X.org amdgpu video driver
dynamo ~ git-[ master]-
SEARCH MESA IN PACMAN:
dynamo ~ git-[ master]- sudo pacman -Qs mesa
local/glu 9.0.3-2
Mesa OpenGL utility library
local/lib32-glu 9.0.3-2
Mesa OpenGL utility library (32 bits)
local/lib32-mesa 1:24.2.2-1
Open-source OpenGL drivers - 32-bit
local/lib32-vulkan-mesa-layers 1:24.2.2-1
Mesa's Vulkan layers - 32-bit
local/libva-mesa-driver 1:24.2.2-1
Open-source VA-API drivers
local/mesa 1:24.2.2-1
Open-source OpenGL drivers
local/mesa-demos 9.0.0-4
Mesa demos
local/mesa-utils 9.0.0-4
Essential Mesa utilities
local/mesa-vdpau 1:24.2.2-1
Open-source VDPAU drivers
local/vulkan-mesa-layers 1:24.2.2-1
Mesa's Vulkan layers
dynamo ~ git-[ master]-
rocminfo:
ROCk module is loaded
=====================
HSA System Attributes
=====================
Runtime Version: 1.1
System Timestamp Freq.: 1000.000000MHz
Sig. Max Wait Duration: 18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model: LARGE
System Endianness: LITTLE
Mwaitx: DISABLED
DMAbuf Support: YES
==========
HSA Agents
==========
*******
Agent 1
*******
Name: Intel(R) Core(TM) i5-6600K CPU @ 3.50GHz
Uuid: CPU-XX
Marketing Name: Intel(R) Core(TM) i5-6600K CPU @ 3.50GHz
Vendor Name: CPU
Feature: None specified
Profile: FULL_PROFILE
Float Round Mode: NEAR
Max Queue Number: 0(0x0)
Queue Min Size: 0(0x0)
Queue Max Size: 0(0x0)
Queue Type: MULTI
Node: 0
Device Type: CPU
Cache Info:
L1: 32768(0x8000) KB
Chip ID: 0(0x0)
ASIC Revision: 0(0x0)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 3900
BDFID: 0
Internal Node ID: 0
Compute Unit: 4
SIMDs per CU: 0
Shader Engines: 0
Shader Arrs. per Eng.: 0
WatchPts on Addr. Ranges:1
Features: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: FINE GRAINED
Size: 16315928(0xf8f618) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 2
Segment: GLOBAL; FLAGS: KERNARG, FINE GRAINED
Size: 16315928(0xf8f618) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
Pool 3
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 16315928(0xf8f618) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: TRUE
ISA Info:
*******
Agent 2
*******
Name: gfx900
Uuid: GPU-02151d935c522884
Marketing Name: AMD Radeon RX Vega
Vendor Name: AMD
Feature: KERNEL_DISPATCH
Profile: BASE_PROFILE
Float Round Mode: NEAR
Max Queue Number: 128(0x80)
Queue Min Size: 64(0x40)
Queue Max Size: 131072(0x20000)
Queue Type: MULTI
Node: 1
Device Type: GPU
Cache Info:
L1: 16(0x10) KB
L2: 4096(0x1000) KB
Chip ID: 26751(0x687f)
ASIC Revision: 1(0x1)
Cacheline Size: 64(0x40)
Max Clock Freq. (MHz): 1590
BDFID: 768
Internal Node ID: 1
Compute Unit: 56
SIMDs per CU: 4
Shader Engines: 4
Shader Arrs. per Eng.: 1
WatchPts on Addr. Ranges:4
Coherent Host Access: FALSE
Features: KERNEL_DISPATCH
Fast F16 Operation: TRUE
Wavefront Size: 64(0x40)
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Max Waves Per CU: 40(0x28)
Max Work-item Per CU: 2560(0xa00)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
Max fbarriers/Workgrp: 32
Packet Processor uCode:: 472
SDMA engine uCode:: 434
IOMMU Support:: None
Pool Info:
Pool 1
Segment: GLOBAL; FLAGS: COARSE GRAINED
Size: 8372224(0x7fc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 2
Segment: GLOBAL; FLAGS: EXTENDED FINE GRAINED
Size: 8372224(0x7fc000) KB
Allocatable: TRUE
Alloc Granule: 4KB
Alloc Alignment: 4KB
Accessible by all: FALSE
Pool 3
Segment: GROUP
Size: 64(0x40) KB
Allocatable: FALSE
Alloc Granule: 0KB
Alloc Alignment: 0KB
Accessible by all: FALSE
ISA Info:
ISA 1
Name: amdgcn-amd-amdhsa--gfx900:xnack-
Machine Models: HSA_MACHINE_MODEL_LARGE
Profiles: HSA_PROFILE_BASE
Default Rounding Mode: NEAR
Default Rounding Mode: NEAR
Fast f16: TRUE
Workgroup Max Size: 1024(0x400)
Workgroup Max Size per Dimension:
x 1024(0x400)
y 1024(0x400)
z 1024(0x400)
Grid Max Size: 4294967295(0xffffffff)
Grid Max Size per Dimension:
x 4294967295(0xffffffff)
y 4294967295(0xffffffff)
z 4294967295(0xffffffff)
FBarrier Max Size: 32
*** Done ***
fastfetch:
dynamo ~ git-[ master]- fastfetch
DISTRO  Arch Linux x86_64
│ ├ Linux 6.6.51-1-lts
│ ├ 2051 (pacman), 1 (flatpak)
│ └ zsh 5.9
DE/WM  bspwm (X11)
│ ├ Colloid-dark [GTK3/4]
│ ├ WhiteSur (24px)
│ ├ alacritty (11pt)
│ └ alacritty 0.13.2
│ ├ Intel(R) Core(TM) i5-6600K (4) @ 3.90 GHz
│ ├ AMD Radeon RX Vega
│ ├ 1920x1080 @ 60Hz
│ ├ 2.51 GiB / 15.56 GiB (16%)
│ ├ 0 B / 16.00 GiB (0%)
│ ├ 22 mins
│ └ 1920x1080 @ 60Hz
AUDIO  Built-in Audio Analog Stereo
I presume this is some kind of driver issue, because before the recent Mesa and amdgpu package updates my GPU didn't seem to fail like this. When it crashes, the rest of the PC keeps running: the CPU and everything else still work, and the machine can be used over SSH, but the desktop can no longer be reached via remote control. The graphics card itself still appears to be running: the temperature is not high (about 60 °C), the RGB lighting still works, and the fan is still spinning.
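Since the machine stays reachable over SSH after a crash, one way to test the "GPU fell off the bus" theory is to read the card's PCI vendor ID live from sysfs. A minimal sketch, assuming the bus ID 0000:03:00.0 from the lspci output above and that the sysfs `config` file performs a live config-space read. A responding card should report AMD's vendor ID 0x1002 (the first half of chip-ID 1002:687f), while 0xFFFF suggests the device is no longer answering. The function names are hypothetical.

```python
from pathlib import Path

AMD_VENDOR_ID = 0x1002   # from chip-ID 1002:687f in the inxi output
NOT_RESPONDING = 0xFFFF  # all ones: typical result when the device is off the bus

# Assumption: bus ID taken from "lspci -k" above; adjust for other machines.
DEFAULT_CONFIG = Path("/sys/bus/pci/devices/0000:03:00.0/config")


def read_vendor_id(config_path: Path = DEFAULT_CONFIG) -> int:
    """Return the 16-bit vendor ID stored little-endian at offset 0 of config space."""
    with open(config_path, "rb") as f:
        header = f.read(2)
    if len(header) < 2:
        raise OSError(f"short read from {config_path}")
    return int.from_bytes(header, "little")


def describe(vendor_id: int) -> str:
    """Turn a vendor ID into a short human-readable verdict."""
    if vendor_id == NOT_RESPONDING:
        return "device not responding (reads return all ones)"
    if vendor_id == AMD_VENDOR_ID:
        return "AMD device responding"
    return f"unexpected vendor ID 0x{vendor_id:04x}"


if __name__ == "__main__":
    try:
        print(describe(read_vendor_id()))
    except OSError as exc:  # includes FileNotFoundError on other machines
        print(f"could not read config space: {exc}")
```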
PS: I'm going to school tomorrow, so I can't reply to any comments until Friday or Saturday.
edit3: It happened again with basically the same error in journalctl. Any ideas?
Last edited by safe049 (2024-09-22 14:56:42)
std::cout << "I use Arch BTW" << endl;
Surprisingly, it completed a full glmark2 run. Here is the result:
dynamo ~ git-[ master]- glmark2
=======================================================
glmark2 2023.01
=======================================================
OpenGL Information
GL_VENDOR: AMD
GL_RENDERER: AMD Radeon RX Vega (radeonsi, vega10, LLVM 18.1.8, DRM 3.54, 6.6.51-1-lts)
GL_VERSION: 4.6 (Compatibility Profile) Mesa 24.2.2-arch1.1
Surface Config: buf=32 r=8 g=8 b=8 a=8 depth=24 stencil=0 samples=0
Surface Size: 800x600 windowed
=======================================================
[build] use-vbo=false: FPS: 7381 FrameTime: 0.135 ms
[build] use-vbo=true: FPS: 9492 FrameTime: 0.105 ms
[texture] texture-filter=nearest: FPS: 9175 FrameTime: 0.109 ms
[texture] texture-filter=linear: FPS: 9074 FrameTime: 0.110 ms
[texture] texture-filter=mipmap: FPS: 9043 FrameTime: 0.111 ms
[shading] shading=gouraud: FPS: 9237 FrameTime: 0.108 ms
[shading] shading=blinn-phong-inf: FPS: 9243 FrameTime: 0.108 ms
[shading] shading=phong: FPS: 9284 FrameTime: 0.108 ms
[shading] shading=cel: FPS: 9356 FrameTime: 0.107 ms
[bump] bump-render=high-poly: FPS: 9193 FrameTime: 0.109 ms
[bump] bump-render=normals: FPS: 9405 FrameTime: 0.106 ms
[bump] bump-render=height: FPS: 9354 FrameTime: 0.107 ms
[effect2d] kernel=0,1,0;1,-4,1;0,1,0;: FPS: 9624 FrameTime: 0.104 ms
[effect2d] kernel=1,1,1,1,1;1,1,1,1,1;1,1,1,1,1;: FPS: 9326 FrameTime: 0.107 ms
[pulsar] light=false:quads=5:texture=false: FPS: 9673 FrameTime: 0.103 ms
[desktop] blur-radius=5:effect=blur:passes=1:separable=true:windows=4: FPS: 8298 FrameTime: 0.121 ms
[desktop] effect=shadow:windows=4: FPS: 8237 FrameTime: 0.121 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 1089 FrameTime: 0.919 ms
[buffer] columns=200:interleave=false:update-dispersion=0.9:update-fraction=0.5:update-method=subdata: FPS: 1340 FrameTime: 0.746 ms
[buffer] columns=200:interleave=true:update-dispersion=0.9:update-fraction=0.5:update-method=map: FPS: 1190 FrameTime: 0.840 ms
[ideas] speed=duration: FPS: 5477 FrameTime: 0.183 ms
[jellyfish] <default>: FPS: 9514 FrameTime: 0.105 ms
[terrain] <default>: FPS: 2249 FrameTime: 0.445 ms
[shadow] <default>: FPS: 8382 FrameTime: 0.119 ms
[refract] <default>: FPS: 4100 FrameTime: 0.244 ms
[conditionals] fragment-steps=0:vertex-steps=0: FPS: 9174 FrameTime: 0.109 ms
[conditionals] fragment-steps=5:vertex-steps=0: FPS: 9678 FrameTime: 0.103 ms
[conditionals] fragment-steps=0:vertex-steps=5: FPS: 9698 FrameTime: 0.103 ms
[function] fragment-complexity=low:fragment-steps=5: FPS: 9744 FrameTime: 0.103 ms
[function] fragment-complexity=medium:fragment-steps=5: FPS: 9795 FrameTime: 0.102 ms
[loop] fragment-loop=false:fragment-steps=5:vertex-steps=5: FPS: 9814 FrameTime: 0.102 ms
[loop] fragment-steps=5:fragment-uniform=false:vertex-steps=5: FPS: 9812 FrameTime: 0.102 ms
[loop] fragment-steps=5:fragment-uniform=true:vertex-steps=5: FPS: 9877 FrameTime: 0.101 ms
=======================================================
glmark2 Score: 8039
=======================================================
dynamo ~ git-[ master]-
I got a better and more detailed journalctl; editing my post now.
Anyone?
Try disabling the lact daemon/the general overclocking attempts you appear to be doing.
I removed lact and it worked for quite a while, but it still happens. BTW, is it related to the D-Bus freedesktop error? That error is always present from start to end.
http://0x0.st/X3Rc.txt
New Journalctl:
Sep 21 15:02:42 archyyds systemd[1169]: Started dbus-:1.5-org.gnome.ScreenSaver@630.service.
Sep 21 15:02:42 archyyds gjs[88875]: Failed to resolve shell name: GDBus.Error:org.freedesktop.DBus.Error.NameHasNoOwner: The name does not have an owner
Sep 21 15:03:12 archyyds systemd[1169]: Started dbus-:1.5-org.gnome.ScreenSaver@631.service.
Sep 21 15:03:12 archyyds gjs[89023]: Failed to resolve shell name: GDBus.Error:org.freedesktop.DBus.Error.NameHasNoOwner: The name does not have an owner
Sep 21 15:03:18 archyyds systemd[1]: run-docker-runtime\x2drunc-moby-5304a3fac327db4d7e94e431df1edfefa5bc4a5a52f5cd0b441824a8acdab386-runc.26Us8x.mount: Deactivated successfully.
Sep 21 15:03:24 archyyds kernel: amdgpu: [powerplay] Failed message: 0x37, input parameter: 0x0, error code: 0xffffffff
Sep 21 15:03:24 archyyds kernel: amdgpu 0000:03:00.0: amdgpu: Failed to send message: 0x26, ret value: 0xffffffff
Sep 21 15:03:24 archyyds kernel: amdgpu 0000:03:00.0: amdgpu: Failed to send message: 0x61, ret value: 0xffffffff
Sep 21 15:03:24 archyyds kernel: amdgpu 0000:03:00.0: amdgpu: Failed to send message: 0x46, ret value: 0xffffffff
Sep 21 15:03:24 archyyds kernel: amdgpu 0000:03:00.0: amdgpu: Failed to send message: 0x63, ret value: 0xffffffff
Sep 21 15:03:24 archyyds kernel: amdgpu 0000:03:00.0: amdgpu: Failed to send message: 0x39, ret value: 0xffffffff
Sep 21 15:03:24 archyyds kernel: amdgpu 0000:03:00.0: amdgpu: Failed to send message: 0x3a, ret value: 0xffffffff
Sep 21 15:03:24 archyyds kernel: snd_hda_intel 0000:03:00.1: Unable to change power state from D3hot to D0, device inaccessible
Sep 21 15:03:24 archyyds kernel: snd_hda_intel 0000:03:00.1: CORB reset timeout#2, CORBRP = 65535
I'm now trying this parameter from https://discussion.fedoraproject.org/t/ … s/69068/10, a thread about the same issue:
amdgpu.dpm=0
Hope it works.
It did work, but it also disabled the sensors on my GPU, which makes fan control unavailable. Are there any other methods?
edit:
Switched to radeon.dpm=0 following suggestions on the Arch Wiki; the sensors work again. Testing.
Last edited by safe049 (2024-09-21 08:13:30)
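A module parameter such as `amdgpu.dpm=0` or `radeon.dpm=0` only affects the module named before the dot, and the `lspci -k` output above shows the Vega bound to `amdgpu`, so a `radeon.*` option would only matter if the `radeon` module were driving the card. To confirm which options actually reached the running kernel, one can inspect /proc/cmdline. A minimal sketch that groups such arguments by module; `parse_module_params` is a hypothetical helper, and quoted values containing spaces are deliberately not handled.

```python
def parse_module_params(cmdline: str) -> dict[str, dict[str, str]]:
    """Group 'module.param=value' kernel arguments by module name.

    Simplified: splits on whitespace, so quoted values with spaces are not supported.
    """
    params: dict[str, dict[str, str]] = {}
    for arg in cmdline.split():
        key, sep, value = arg.partition("=")
        module, dot, name = key.partition(".")
        # Keep only "module.param=value" style arguments; skip root=, rw, quiet, etc.
        if sep and dot and module and name:
            params.setdefault(module, {})[name] = value
    return params


if __name__ == "__main__":
    try:
        with open("/proc/cmdline") as f:
            cmdline = f.read()
    except OSError:
        cmdline = ""
    found = parse_module_params(cmdline)
    for module in ("amdgpu", "radeon"):
        print(module, found.get(module, {}))
```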
Very smooth since setting radeon.dpm=0. Marking as solved.
It's happening again with the same issue, without lact. Any ideas how to fix this?
Using amdgpu.dpm=0 seems to solve the issue, but it disables the sensors and makes fan control impossible.
If there is a way to make the sensors work with that parameter, that would also be okay.
Last edited by safe049 (2024-09-22 15:31:52)
Well, amdgpu.dpm=0 is just annoying: the fan doesn't even spin at around 75 °C, maybe because it is unable to detect the temperature. Is there a way to make the sensors available with amdgpu.dpm=0 set?
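Since the sensors reportedly disappear with `amdgpu.dpm=0`, a quick script can show whether the card exposes a temperature at all. A minimal sketch, assuming the standard hwmon sysfs layout (`hwmon/hwmon*/temp1_input`, in millidegrees Celsius, under the card's device directory) and that the Vega is `card0`; both are assumptions to check on the actual machine. It also treats 511 °C as an invalid all-ones read rather than a real temperature. The function names are hypothetical.

```python
from pathlib import Path
from typing import Optional

# Assumption: the Vega is card0; check /sys/class/drm/ on the actual machine.
DEFAULT_DEVICE_DIR = Path("/sys/class/drm/card0/device")

# 511 degC is what an all-ones read decodes to (see the lact warning in the log).
BOGUS_TEMP_C = 511


def read_gpu_temp_c(device_dir: Path = DEFAULT_DEVICE_DIR) -> Optional[float]:
    """Return the first hwmon temp1_input in degC, or None if no sensor is exposed."""
    for sensor in sorted(device_dir.glob("hwmon/hwmon*/temp1_input")):
        try:
            millidegrees = int(sensor.read_text().strip())
        except (OSError, ValueError):
            continue  # unreadable or garbled sensor file; try the next one
        return millidegrees / 1000.0
    return None


def classify(temp_c: Optional[float]) -> str:
    """Describe a reading, flagging a missing sensor or an impossible value."""
    if temp_c is None:
        return "no temperature sensor exposed"
    if temp_c >= BOGUS_TEMP_C:
        return "invalid reading (device likely inaccessible)"
    return f"{temp_c:.1f} degC"


if __name__ == "__main__":
    print(classify(read_gpu_temp_c()))
```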
Using radeon.dpm=0 with the radeon ATI drivers seems to have solved the problem. Testing.
Tested games:
Ready or Not
Battlefield 1
RDR2
CS2
TF2
RimWorld
etc.