Good day!
I'm facing bizarre GPU behavior on my Arch system when launching games or benchmarks. In most cases the GPU works normally at 100%, but sometimes (absolutely randomly) it runs at around 30% of its power: fps are low and power consumption is about 19 W when it should be up to 170 W.
Example: I run Unigine Superposition "4K Optimized" and get 55-56 fps, then I quit and start the benchmark again (I do nothing else) and get 17-18 fps. It can be good 5 times and bad 3 times. It's the same with regular gaming: one time it's OK, another time it's unplayable. dmesg is silent, with no new output.
What I did: I've already tried different Mesa versions, kernels and window managers, plus a fresh install (3 times), so I thought maybe something was wrong with my card and tested it on Windows, where it ran like a charm. I went back to Linux and picked Debian (testing), and it worked well, with none of the problems I faced on Arch. I know nothing about Debian and it took me hours to set it up properly.
I really don't know where to look for a solution. I'm now sure it's an Arch problem; could something be wrong with the firmware?
I'd be grateful for some hints on where to look. Tell me what you need and I'll provide everything if it helps.
Best regards,
Skłorpią
Last edited by sklorpion (2021-11-17 18:52:44)
sklorpion wrote:
Dmesg is silent no new output or so
How about the systemd journal?
Para todos todo, para nosotros nada
When I run
journalctl -f
I'm spammed with:
lis 14 19:02:46 archie sudo[25718]: pam_systemd_home(sudo:account): systemd-homed is not available: Unit dbus-org.freedesktop.home1.service not found.
lis 14 19:02:46 archie sudo[25718]: caca : PWD=/home/caca ; USER=root ; COMMAND=/usr/bin/nvme smart-log /dev/nvme0
lis 14 19:02:46 archie sudo[25718]: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=1000)
lis 14 19:02:46 archie sudo[25718]: pam_unix(sudo:session): session closed for user root
lis 14 19:02:46 archie dbus-daemon[578]: [system] Activating via systemd: service name='org.freedesktop.home1' unit='dbus-org.freedesktop.home1.service' requested by ':1.543' (uid=0 pid=25752 comm="sudo nvme smart-log /dev/nvme0 ")
lis 14 19:02:46 archie dbus-daemon[578]: [system] Activation via systemd failed for unit 'dbus-org.freedesktop.home1.service': Unit dbus-org.freedesktop.home1.service not found.
I don't know how to focus journalctl on the GPU only, but I managed to cut out that noise.
I solved the above with https://bbs.archlinux.org/viewtopic.php?id=258297, which left me with this:
lis 14 19:30:25 archie sudo[27469]: caca : PWD=/home/caca ; USER=root ; COMMAND=/usr/bin/nvme smart-log /dev/nvme0
lis 14 19:30:25 archie sudo[27469]: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=1000)
lis 14 19:30:25 archie sudo[27469]: pam_unix(sudo:session): session closed for user root
I killed my conky, where I exec
sudo nvme smart-log /dev/nvme0 | grep temperature | awk '{printf $3}'
and it stopped spamming 10x a second. Now I could analyse the journal.
It worked 12 times in a row, but then it broke for 1 pass and gave me this:
lis 14 19:42:38 archie systemd[1]: Starting Cleanup of Temporary Directories...
lis 14 19:42:38 archie systemd[1]: systemd-tmpfiles-clean.service: Deactivated successfully.
lis 14 19:42:38 archie systemd[1]: Finished Cleanup of Temporary Directories.
Then again 7 good and 1 bad with no output at all, so my effort was pointless.
I feel like it happens less often now, but I still know nothing.
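One way to focus the journal on the GPU, as a rough sketch: `journalctl -k` limits the output to kernel messages, `-b` to the current boot, and `-f` follows new entries; the stream can then be filtered for driver names. The helper name `gpu_journal_filter` and the pattern it matches are assumptions for illustration only, not something from this thread.

```bash
#!/bin/bash
# Hypothetical helper: keep only lines that mention the GPU driver.
# The pattern is an assumption; extend it if your driver logs differently.
gpu_journal_filter() {
    grep -iE 'amdgpu|\[drm\]'
}

# Usage (not run here):
#   journalctl -k -b -f | gpu_journal_filter
```

Filtering kernel messages only also keeps sudo and PAM entries out of the way, so a quiet result here would suggest the driver really logs nothing when the fps drop.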
Is the amdgpu kernel module in use when the performance is poor? The -k option for lspci will show this.
Off-topic:
sklorpion wrote:
sudo nvme smart-log /dev/nvme0 | grep temperature | awk '{printf $3}'
Did you know awk can pattern match?
sudo nvme smart-log /dev/nvme0 | awk '/temperature/{printf $3}'
Head_on_a_Stick wrote:
Is the amdgpu kernel module in use when the performance is poor? The -k option for lspci will show this.
Yes, I ran
watch -n 0.5 "lspci -k | grep 'Kernel driver in use'"
and nothing changes when performance is lower.
I need to find a better way to test it; I need to automate it. So far I've done about 20 tests while running radeontop and all were good. I will update in 24h.
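A minimal sketch of how the driver check could be automated: the function below reads `lspci -k` output on stdin and prints the kernel driver bound to each GPU-like device. The name `gpu_driver_from_lspci` is hypothetical, and the device class strings it looks for are an assumption about how lspci labels the card.

```bash
#!/bin/bash
# Hypothetical helper: read `lspci -k` output on stdin and print the kernel
# driver bound to each GPU-like device (VGA, 3D or Display controller).
gpu_driver_from_lspci() {
    awk '
        /VGA compatible controller|3D controller|Display controller/ { gpu = 1; next }
        /^[^ \t]/ { gpu = 0 }   # any other device header ends the GPU block
        gpu && /Kernel driver in use:/ { print $NF }
    '
}

# Usage (not run here): log a timestamp and the driver every half second.
#   while sleep 0.5; do
#       printf '%s %s\n' "$(date +%T)" "$(lspci -k | gpu_driver_from_lspci)"
#   done
```

Logging with timestamps makes it possible to line up any driver change with a slow benchmark run afterwards, instead of watching the screen during the run.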
Off-topic:
Head_on_a_Stick wrote:
Did you know awk can pattern match?
No, I didn't. Thank you.
So... I've made a simple script to test my GPU. It runs Unigine Superposition 50 times; 15 runs are bad and the rest have good fps. The best thing, and this is hilarious, is that if I turn on radeontop I get 50 out of 50 good results. I don't know if it's luck or what, but I'm sure I'll go nuts with this one.
During all runs I checked whether the kernel driver in use is amdgpu: always true.
My suspicion is that the GPU may sometimes enter some kind of low-power state or some other crazy state?
It was pure luck: I then had 6 bad runs in a row...
Last edited by sklorpion (2021-11-15 19:24:55)
We could try tweaking the module parameters, for example
amdgpu.runpm=0
^ That kernel command line parameter disables "runtime power management control for dGPUs in PX/HG laptops", which sounds like it might be relevant.
Check the parameters applied in Debian, either manually (/sys/module/amdgpu/parameters/) or with modinfo(8).
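To compare the parameters between the two installations, something along these lines might help. It is only a sketch: `dump_module_params` and the output file names are made up for illustration, and parameters that are not readable by the current user are skipped.

```bash
#!/bin/bash
# Hypothetical helper: print "name=value" for every readable parameter file
# in a module's parameters directory (defaults to amdgpu).
dump_module_params() {
    params_dir="${1:-/sys/module/amdgpu/parameters}"
    for param_file in "$params_dir"/*; do
        [ -r "$param_file" ] || continue
        printf '%s=%s\n' "${param_file##*/}" "$(cat "$param_file" 2>/dev/null)"
    done
}

# Usage (not run here): save one dump per distribution, then compare.
#   dump_module_params > amdgpu-params-arch.txt      # on Arch
#   dump_module_params > amdgpu-params-debian.txt    # on Debian
#   diff amdgpu-params-arch.txt amdgpu-params-debian.txt
```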
I'm running a script at boot on my 6700XT that sets things to "3D_FULL_SCREEN" or "COMPUTE" in a file "pp_power_profile_mode" in /sys/class/drm/card0/device/. That's solving some weird stutter issues I have here.
Doing that change manually on the command line looks like this:
cd /sys/class/drm/card0/device/
echo manual | sudo tee power_dpm_force_performance_level
echo 1 | sudo tee pp_power_profile_mode
While you are in that /sys/class/drm/card0/device/ location, take a look at the contents of the pp_power_profile_mode file. There's a '*' next to the name of the profile that's currently used by the driver.
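A small sketch for reading the active profile without scanning the whole table by eye. The layout of the file differs between GPU generations, so this relies only on the '*' marker described above; the function name `active_power_profile` is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: print the line of pp_power_profile_mode that carries
# the '*' marker, i.e. the profile currently in use.
active_power_profile() {
    profile_file="${1:-/sys/class/drm/card0/device/pp_power_profile_mode}"
    grep -F '*' "$profile_file"
}

# Usage (not run here):
#   active_power_profile
#   active_power_profile /sys/class/drm/card1/device/pp_power_profile_mode
```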
Here's an example shell script that searches for the right sub-folder in /sys/class/drm and does the change:
#!/bin/bash
if (( $UID != 0 )); then
    echo "$0: needs to run as root!" 1>&2
    exit 1
fi
for device in /sys/class/drm/card?/device; do
    if [[ -e "$device"/pp_power_profile_mode ]]; then
        echo manual > "$device"/power_dpm_force_performance_level
        echo 1 > "$device"/pp_power_profile_mode
        # The other power profile modes are:
        # 1 = 3D_FULL_SCREEN
        # 4 = VR
        # 5 = COMPUTE
    fi
done
For my card here, the "3D_FULL_SCREEN" setting mostly solves the stutter issues but I found examples where it didn't fully work. The problem was still there with old games that don't stress the card much. Using the "COMPUTE" setting solves the stutter issues fully for me.
Using "COMPUTE", the card doesn't seem to use any in-between speeds for the core clock. It's either the lowest when idle on the desktop, or its max clock speed. The memory clock speed still seems to work like normal, with the card using lower speeds, for example, when scrolling in the web browser. I was uncomfortable with the card's max core boost speed being used with the COMPUTE mode, so I then looked into how undervolting/underclocking works and limited the max core clock of my card by a lot.
There's a bug report that might be about this problem:
https://gitlab.freedesktop.org/drm/amd/-/issues/1500
Here's the documentation for the files /sys/class/drm:
Ropid, thank you. First tests are promising. Great post, links and hints.
I tried switching between "0 BOOTUP_DEFAULT", "1 3D_FULL_SCREEN*", "2 POWER_SAVING", "3 VIDEO" and "5 COMPUTE". There is no difference in performance between them at all, but when I choose "0 BOOTUP_DEFAULT" I hit my problem from time to time, and if I choose anything else it works well.
Strangely, while running
watch -n 1 cat pp_power_profile_mode
and benchmarking on "0 BOOTUP_DEFAULT" with poor fps, the system does not change pp_power_profile_mode.
I'm at 50 runs now and all were correct; I hope to do about 100 more to see if it really helped. YES, after 100 more benchmarks it works.
Head_on_a_Stick, thank you, but
amdgpu.runpm=0
doesn't work. I'll look at modinfo tomorrow.
If someone is looking for a test script that runs Unigine Superposition (windowed 1920x1080) in a loop for 20 seconds to get the average fps and then exits, here is my rough code:
#!/bin/bash
# Directory where the screenshots and OCR text files are written.
cd ~/path/where/to/write/screenshots || exit 1
x=1
while [ "$x" -le 50 ]
do
    # Notify about the test number.
    notify-send "Starting test number $x"
    sleep 1
    # Click the RUN button in Unigine Superposition. The values are the X and Y
    # position of the button; get them with: xdotool getmouselocation --shell
    xdotool mousemove 2450 1352
    xdotool click 1
    # I need 20 s for Unigine to start; this depends on overall system speed.
    sleep 20
    # GPU power consumption; this part is optional and can be cut out.
    wartoscWatt=$(sensors | grep 'power1:' | cut -c15-16)
    # Take a screenshot cropped to a 130x81 rectangle whose upper-left corner
    # is at X:2424 Y:344 on my screen. That area shows FPS, Min, Max and Avg FPS.
    import -window root -crop 130x81+2424+344 "$x".png
    # Negate the colors (black to white) for better OCR and reduce the quality
    # (which truly is pointless).
    convert "$x".png -quality 80% -negate "$x".png
    # Run OCR without error output: 1.png becomes t1.txt.
    tesseract -c debug_file=/dev/null "$x".png t"$x"
    # Close Unigine by pressing the Esc key.
    xdotool key 'Escape'
    # Print the test number, average fps and power consumption.
    echo "Run no. $x $(grep Avg t"$x".txt) power consumption $wartoscWatt"
    x=$((x + 1))
done
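The power reading in the script depends on fixed character columns in the `sensors` output. A possible alternative, sketched here under assumptions, is to read the value from the card's hwmon directory in sysfs, where amdgpu reports power in microwatts. The function name `gpu_power_watts` is hypothetical, the default path assumes the card is card0, and the file name `power1_average` may not be present on every card or kernel.

```bash
#!/bin/bash
# Hypothetical helper: convert a hwmon power reading (microwatts) to whole watts.
# The default path and the file name power1_average are assumptions.
gpu_power_watts() {
    for power_file in "${1:-/sys/class/drm/card0/device/hwmon}"/hwmon*/power1_average; do
        [ -r "$power_file" ] || continue
        microwatts=$(cat "$power_file")
        echo $(( microwatts / 1000000 ))
        return 0
    done
    return 1    # no readable power file found
}

# Usage (not run here), as a drop-in for the sensors line:
#   wartoscWatt=$(gpu_power_watts)
```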
Last edited by sklorpion (2021-11-16 21:49:03)
This is it! Can be marked as SOLVED!
I ran 50 tests with "0 BOOTUP_DEFAULT" and got 10 low-fps results.
I ran 200 tests with "1 3D_FULL_SCREEN" and all were good.
Ropid's post #8 is the solution, thank you!