I have a Radeon RX 6800 XT, and when I'm not using my GPU for gaming, I use it to mine Ethereum. To keep both noise and temperatures reasonable, I manually set a power cap of 160 W and a fixed fan speed of 50%. This usually results in edge/junction temperatures in the mid-50s °C and memory temperatures in the low 70s °C.
Today, when checking my temperatures, I noticed that they were sky high (98°C memory temperature, 89°C edge temperature, 99°C junction temperature) and the GPU was consuming 240W.
To set the power cap, I used the following commands:
# Set manual performance control
echo manual > /sys/class/drm/card0/device/power_dpm_force_performance_level
# Set power cap to 160 W (power1_cap is in microwatts; run as root)
echo 160000000 > /sys/class/drm/card0/device/hwmon/hwmon*/power1_cap
However, when I checked the actual value in power1_cap, it returned this:
cat /sys/class/drm/card0/device/hwmon/hwmon*/power1_cap
1271490560
That's a 1271 W power "cap"!
My kernel version is 5.13.7-arch1-1. I tested a few other 5.13 releases as well, and they all behave the same: no matter which value I write to power1_cap, it always reads back 1271490560.
Then I tested with 5.12.15.arch1-1, and it behaves as expected:
cat /sys/class/drm/card0/device/hwmon/hwmon*/power1_cap
160000000
I believe this issue could be very dangerous and could damage people's hardware. My GPU ran for several hours at these temperatures before I noticed; I can only hope there is no permanent damage.
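Because the 5.13 kernels silently ignored the written value, it helps to read power1_cap back right after writing it. The following is a minimal bash sketch, not a tested tool: the function name `set_power_cap` is hypothetical, and it assumes the hwmon directory is passed in explicitly (on real hardware something like /sys/class/drm/card0/device/hwmon/hwmon0, although the hwmon index can differ) and that it runs as root. As in the commands above, the value is written in microwatts.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a power cap (in watts) to an amdgpu hwmon
# directory and verify that the value actually reads back.
# power1_cap is expressed in microwatts, so 160 W -> 160000000.
set_power_cap() {
    local watts="$1"   # requested cap in watts
    local dir="$2"     # hwmon directory (assumed to be passed explicitly)
    local want=$(( watts * 1000000 ))
    local got

    # Writing to the real sysfs file requires root.
    echo "$want" > "$dir/power1_cap" || return 1

    # Read the value back; a buggy driver may silently ignore the write.
    got=$(cat "$dir/power1_cap")
    if [ "$got" != "$want" ]; then
        echo "power1_cap mismatch: wrote ${want} uW, read back '${got}' uW" >&2
        return 1
    fi
    echo "power1_cap set to ${watts} W"
}
```

For example, as root: `set_power_cap 160 /sys/class/drm/card0/device/hwmon/hwmon0`. On the affected 5.13 kernels, where the file always reads back 1271490560, this should report a mismatch instead of leaving the card effectively uncapped without warning.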
Update:
The issue has apparently already been reported here: https://gitlab.freedesktop.org/drm/amd/-/issues/1657