Sorry if this is the wrong place to post this, but my RX 6600 XT is overheating whenever it reports as 70C, causing the entire system to (gracefully) shut down. The relevant journalctl log entries are here:
Jun 02 18:21:25 arch-dirt-pc kernel: amdgpu 0000:2d:00.0: amdgpu: ERROR: GPU over temperature range(SW CTF) detected!
Jun 02 18:21:25 arch-dirt-pc kernel: amdgpu 0000:2d:00.0: amdgpu: ERROR: System is going to shutdown due to GPU SW CTF!
This is very annoying when I'm playing games (especially ones that can't cap their framerate).
This error has happened on Void Linux too, so it is not entirely Arch-specific.
The CPU is only around 45C when this happens (my CPU cooler is much more than I need). When I check by hand after taking off the side panel, it does not feel very warm inside, though this might be because the system has had a minute or so to cool down.
All of the fans on the GPU and in the case are spinning, and the GPU has a large amount of breathing room to get air from, so I'm confused why this is happening.
It seems odd that it would just panic at 70C, though. Most components can go up to 100C before panicking, and they would thermal throttle before that.
Does anyone know how I can change the thermal limit or make it thermal throttle more aggressively?
Here's my fastfetch output in case it helps, and I would be willing to post dmesg or other utilities if needed.
ieatdirt@arch-dirt-pc
---------------------
OS: Arch Linux x86_64
Host: MS-7B86 (5.0)
Kernel: Linux 6.9.3-arch1-1
Uptime: 1 hour, 13 mins
Packages: 1084 (pacman)
Shell: zsh 5.9
Display (DisplayPort-0): 2560x1440 @ 165Hz *
Display (HDMI-A-0): 1920x1080 @ 60Hz
WM: i3 (X11)
Theme: Adwaita-dark [GTK3]
Icons: Adwaita-dark [GTK3]
Font: DejaVu Sans (11pt) [GTK3]
Cursor: Adwaita
Terminal: alacritty 0.13.2
Terminal Font: Liberation Mono (12.0pt)
CPU: AMD Ryzen 7 5700X (16) @ 3.40 GHz
GPU: AMD Radeon RX 6600 XT @ 0.09 GHz [Discrete]
Memory: 3.42 GiB / 31.27 GiB (11%)
Swap: 0 B / 4.00 GiB (0%)
Disk (/): 12.30 GiB / 19.52 GiB (63%) - ext4
Disk (/backupthings): 274.00 GiB / 915.82 GiB (30%) - ext4
Disk (/home): 282.45 GiB / 1.77 TiB (16%) - ext4
Local IP (wlan0): 192.168.1.38/24 *
Locale: en_US.UTF-8
Last edited by ieatdirt (2024-06-04 14:17:54)
With further testing, using
while true; do
sensors
sleep 2
clear
done
it appears that the GPU isn't actually overheating at what btop calls 70C. It's actually overheating when "junction" reaches 110C, which it very easily does at ~100W power draw. I moved my wifi card to a lower PCIe slot, which seems to have helped somewhat, but it still overheats. My wifi card, which sits about 6 cm directly below the GPU (and has no active cooling), says that it is only 30C, which means the temperature inside the case can't be over that.
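For comparing the different GPU readings against their limits without going through a monitoring tool, something like the sketch below could work. It reads the amdgpu hwmon files directly. The function name `gpu_temps` is made up for illustration, and the hwmon directory is passed in because both the card index and the hwmon index can differ between systems and boots. amdgpu usually labels its sensors edge, junction and mem; which of those btop displays is an assumption, not something confirmed here.

```shell
# Print label, current value and critical limit for each temperature sensor
# found in an amdgpu hwmon directory.
# $1: hwmon directory, e.g. /sys/class/drm/card0/device/hwmon/hwmon3
#     (the exact path is an assumption; check what exists on your system)
gpu_temps() {
    dir=$1
    for label_file in "$dir"/temp*_label; do
        [ -r "$label_file" ] || continue
        base=${label_file%_label}
        label=$(cat "$label_file")
        cur=$(cat "${base}_input")
        # hwmon reports temperatures in millidegrees Celsius.
        if [ -r "${base}_crit" ]; then
            crit="$(( $(cat "${base}_crit") / 1000 ))C"
        else
            crit="n/a"
        fi
        printf '%s: %dC (crit %s)\n' "$label" "$(( cur / 1000 ))" "$crit"
    done
}
```

It could be called as `gpu_temps /sys/class/drm/card0/device/hwmon/hwmon*`, assuming the GPU is card0 and has a single hwmon directory.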
Sensors also says that the fans are not spinning at their max even with 135W power draw and 110C junction temperature. Maybe a way to force the fan to run faster could help offset the temperature?
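One way to force the fan by hand, as a rough sketch: the amdgpu hwmon interface exposes `pwm1_enable` (1 for manual, 2 for automatic) and `pwm1` (the duty value, normally 0 to 255). The function names below are hypothetical, the hwmon directory is passed in because its path varies, and writing to these files on real hardware needs root. Whether manual control behaves well on a given card and kernel is an assumption to verify, so keep watching the temperatures while testing.

```shell
# Set the GPU fan to a fixed percentage through the amdgpu hwmon files.
# $1: hwmon directory (path varies per system), $2: percent, 0-100
set_gpu_fan() {
    dir=$1
    pct=$2
    # pwm1_max is normally 255; fall back to that if the file is missing.
    max=$(cat "$dir/pwm1_max" 2>/dev/null || echo 255)
    echo 1 > "$dir/pwm1_enable"               # 1 = manual fan control
    echo $(( pct * max / 100 )) > "$dir/pwm1"
}

# Hand control back to the driver's automatic fan control.
# $1: hwmon directory
reset_gpu_fan() {
    echo 2 > "$1/pwm1_enable"                 # 2 = automatic fan control
}
```

For example, `set_gpu_fan /sys/class/drm/card0/device/hwmon/hwmon3 80` as root would request 80%, and `reset_gpu_fan` with the same directory restores the default behaviour. The path here is only an example.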
Checking the wiki page here: https://wiki.archlinux.org/title/Fan_sp … an_control
I see amdgpu-fan, which seems to work. It gives me a bit of concern that the last commit was 3 years ago, but it works fine with the matrix I set, and the junction temperature is only reaching 90C. I just wonder why the builtin fan control is so stupid.
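To illustrate what a temperature-to-speed matrix does, here is a minimal sketch of a fan curve that linearly interpolates between points. This is not taken from amdgpu-fan and may not match how it works internally. The function name `fan_speed_for` and the `temp:speed` pair format are invented for the example, and the curve values in the usage note are made up, not the matrix used above.

```shell
# Interpolate a fan speed (percent) for a temperature from "temp:speed" pairs.
# $1: temperature in C; remaining args: pairs sorted by ascending temperature.
# Uses integer arithmetic, so results are rounded toward zero.
fan_speed_for() {
    t=$1
    shift
    prev_t=
    prev_s=
    for pair in "$@"; do
        pt=${pair%%:*}
        ps=${pair##*:}
        if [ "$t" -le "$pt" ]; then
            if [ -z "$prev_t" ]; then
                # At or below the first point: clamp to its speed.
                echo "$ps"
            else
                # Between two points: linear interpolation.
                echo $(( prev_s + (ps - prev_s) * (t - prev_t) / (pt - prev_t) ))
            fi
            return
        fi
        prev_t=$pt
        prev_s=$ps
    done
    # Above the last point: clamp to its speed.
    echo "$prev_s"
}
```

With a made-up curve, `fan_speed_for 75 50:30 70:60 90:100` prints 70, halfway between the 60% and 100% points on that segment.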
Last edited by ieatdirt (2024-06-04 14:17:39)
Please use [code][/code] tags.
What does "sensors" report wrt the thermal limits?
Do you overclock the system (or otherwise manually interfere w/ the factory-defaults)?
To blow more air at the GPU, see https://wiki.archlinux.org/title/Fan_sp … an_control
Edit: F5**, by 4 minutes
Last edited by seth (2024-06-04 14:22:39)