#1 2020-12-25 11:27:03

scurrvy2020
Member
Registered: 2019-09-10
Posts: 8

[Solved] amdgpu issue with 5600 XT, poor performance

I have a 5600 XT running on Arch Linux with mesa-git. The system hangs on boot unless I pass

amdgpu.ppfeaturemask=1

or

amdgpu.dpm=0

on the kernel command line. I've tried multiple kernel and mesa versions, and these are the only options I've found that allow the system to boot. Only very occasionally, roughly once in 20-30 attempts, does the system boot without the flags.

These settings hurt graphics performance; for example, Star Wars Battlefront II drops from 144+ FPS to 30-40 FPS on the menu screen.
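For anyone wondering why the mask value matters so much, here is a minimal sketch that decodes the lowest bits of a ppfeaturemask value. The bit names are recalled from the kernel's amd_shared.h (enum PP_FEATURE_MASK) and should be treated as assumptions; check them against your own kernel source. Only three bits are listed, and the helper name is made up for illustration.

```python
# Bit names recalled from drivers/gpu/drm/amd/include/amd_shared.h.
# ASSUMPTION: verify against your kernel source; only the low three bits are listed.
PP_FEATURE_BITS = {
    0x1: "PP_SCLK_DPM_MASK",   # graphics clock DPM
    0x2: "PP_MCLK_DPM_MASK",   # memory clock DPM
    0x4: "PP_PCIE_DPM_MASK",   # PCIe link DPM
}


def decode_ppfeaturemask(mask):
    """Return a dict telling which of the listed feature bits are set in mask."""
    return {name: bool(mask & bit) for bit, name in PP_FEATURE_BITS.items()}


if __name__ == "__main__":
    # The two values that appear in this thread.
    for value in (0x1, 0xFFFFFFFF):
        print(hex(value), decode_ppfeaturemask(value))
```

If those names are right, amdgpu.ppfeaturemask=1 leaves only graphics clock DPM enabled, which would be consistent with the large FPS drop.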


The log from a failed boot:

Dec 25 12:38:15 user kernel: [drm] amdgpu kernel modesetting enabled.
Dec 25 12:38:15 user kernel: amdgpu: Ignoring ACPI CRAT on non-APU system
Dec 25 12:38:15 user kernel: amdgpu: Topology: Add CPU node
Dec 25 12:38:15 user kernel: fb0: switching to amdgpudrmfb from EFI VGA
Dec 25 12:38:15 user kernel: amdgpu 0000:0c:00.0: vgaarb: deactivate vga console
Dec 25 12:38:15 user kernel: amdgpu 0000:0c:00.0: enabling device (0006 -> 0007)
Dec 25 12:38:15 user kernel: amdgpu 0000:0c:00.0: amdgpu: Trusted Memory Zone (TMZ) feature disabled as experimental (default)
Dec 25 12:38:15 user kernel: amdgpu 0000:0c:00.0: amdgpu: Fetched VBIOS from VFCT
Dec 25 12:38:15 user kernel: amdgpu 0000:0c:00.0: BAR 2: releasing [mem 0x7ff0000000-0x7ff01fffff 64bit pref]
Dec 25 12:38:15 user kernel: amdgpu 0000:0c:00.0: BAR 0: releasing [mem 0x7fe0000000-0x7fefffffff 64bit pref]
Dec 25 12:38:15 user kernel: [drm:amdgpu_device_resize_fb_bar [amdgpu]] *ERROR* Problem resizing BAR0 (-22).
Dec 25 12:38:15 user kernel: amdgpu 0000:0c:00.0: BAR 0: assigned [mem 0x7fe0000000-0x7fefffffff 64bit pref]
Dec 25 12:38:15 user kernel: amdgpu 0000:0c:00.0: BAR 2: assigned [mem 0x7ff0000000-0x7ff01fffff 64bit pref]
Dec 25 12:38:15 user kernel: amdgpu 0000:0c:00.0: amdgpu: VRAM: 6128M 0x0000008000000000 - 0x000000817EFFFFFF (6128M used)
Dec 25 12:38:15 user kernel: amdgpu 0000:0c:00.0: amdgpu: GART: 512M 0x0000000000000000 - 0x000000001FFFFFFF
Dec 25 12:38:15 user kernel: [drm] amdgpu: 6128M of VRAM memory ready
Dec 25 12:38:15 user kernel: [drm] amdgpu: 6128M of GTT memory ready.
Dec 25 12:38:15 user kernel: amdgpu 0000:0c:00.0: amdgpu: RAS: optional ras ta ucode is not available
Dec 25 12:38:15 user kernel: amdgpu 0000:0c:00.0: amdgpu: RAP: optional rap ta ucode is not available
Dec 25 12:38:15 user kernel: amdgpu 0000:0c:00.0: amdgpu: smu driver if version = 0x00000036, smu fw if version = 0x00000037, smu fw version = 0x002a3d00 (42.61.0)
Dec 25 12:38:15 user kernel: amdgpu 0000:0c:00.0: amdgpu: SMU driver if version not matched
Dec 25 12:38:15 user kernel: amdgpu 0000:0c:00.0: amdgpu: use vbios provided pptable
Dec 25 12:38:15 user kernel: amdgpu 0000:0c:00.0: amdgpu: smc_dpm_info table revision(format.content): 4.5
Dec 25 12:38:15 user kernel: amdgpu 0000:0c:00.0: amdgpu: failed send message: EnableAllSmuFeatures (6)         param: 0x00000000 response 0xfffffffb
Dec 25 12:38:15 user kernel: amdgpu 0000:0c:00.0: amdgpu: Failed to enable requested dpm features!
Dec 25 12:38:15 user kernel: amdgpu 0000:0c:00.0: amdgpu: Failed to setup smc hw!
Dec 25 12:38:15 user kernel: [drm:amdgpu_device_init.cold [amdgpu]] *ERROR* hw_init of IP block <smu> failed -5
Dec 25 12:38:15 user kernel: amdgpu 0000:0c:00.0: amdgpu: amdgpu_device_ip_init failed
Dec 25 12:38:15 user kernel: amdgpu 0000:0c:00.0: amdgpu: Fatal error during GPU init
Dec 25 12:38:15 user kernel: amdgpu: probe of 0000:0c:00.0 failed with error -5

The BAR0 *ERROR* does not seem to be a real issue; I can get rid of it by disabling Above 4G Decoding in the motherboard BIOS.

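The real problem appears to be SMC hardware initialization: the EnableAllSmuFeatures message fails, and everything after that ("Failed to setup smc hw!", hw_init of <smu> failed -5) follows from it.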
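To pick the relevant lines out of a long journal, something like the sketch below can help. The patterns are taken from the failed-boot log above and are illustrative, not an exhaustive list of amdgpu failure messages; the function name is hypothetical.

```python
import re

# Patterns derived from the failed-boot log in this post (illustrative only).
FAILURE_PATTERNS = [
    re.compile(r"failed send message: (\w+)"),
    re.compile(r"\*ERROR\* hw_init of IP block <(\w+)> failed (-?\d+)"),
    re.compile(r"probe of (\S+) failed with error (-?\d+)"),
]


def find_amdgpu_failures(log_text):
    """Return the captured groups of every log line matching a known failure pattern."""
    hits = []
    for line in log_text.splitlines():
        for pattern in FAILURE_PATTERNS:
            match = pattern.search(line)
            if match:
                hits.append(match.groups())
                break  # one hit per line is enough
    return hits


SAMPLE_LOG = """\
Dec 25 12:38:15 user kernel: [drm:amdgpu_device_resize_fb_bar [amdgpu]] *ERROR* Problem resizing BAR0 (-22).
Dec 25 12:38:15 user kernel: amdgpu 0000:0c:00.0: amdgpu: failed send message: EnableAllSmuFeatures (6)         param: 0x00000000 response 0xfffffffb
Dec 25 12:38:15 user kernel: [drm:amdgpu_device_init.cold [amdgpu]] *ERROR* hw_init of IP block <smu> failed -5
Dec 25 12:38:15 user kernel: amdgpu: probe of 0000:0c:00.0 failed with error -5
"""

if __name__ == "__main__":
    for hit in find_amdgpu_failures(SAMPLE_LOG):
        print(hit)
```

The BAR0 line is deliberately not matched, since it also shows up on boots that succeed.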

I tried some older firmware versions as well as linux-firmware-git. None of them changed the behavior.

One observation: with

amdgpu.ppfeaturemask=1

I cannot read the mem temperature from lm-sensors (it shows +0.0°C):

amdgpu-pci-0c00
Adapter: PCI adapter
vddgfx:      800.00 mV 
fan1:        1325 RPM  (min =    0 RPM, max = 2970 RPM)
edge:         +55.0°C  (crit = +100.0°C, hyst = -273.1°C)
                       (emerg = +105.0°C)
junction:     +57.0°C  (crit = +110.0°C, hyst = -273.1°C)
                       (emerg = +115.0°C)
mem:           +0.0°C  (crit = +105.0°C, hyst = -273.1°C)
                       (emerg = +110.0°C)
power1:       12.00 W  (cap = 160.00 W)
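A small sketch for spotting this kind of dead reading automatically. It parses text in the layout shown above and flags any temperature that reads exactly 0.0°C. Treating 0.0 as "sensor not reported" is an assumption on my part, and the function names are hypothetical.

```python
import re

# Matches label lines such as "edge:         +55.0°C  (crit = ...)".
# Continuation lines start with whitespace and are skipped.
TEMP_LINE = re.compile(r"^(\w+):\s+\+?(-?\d+(?:\.\d+)?)°C")


def read_temps(sensors_output):
    """Map each temperature label to its reading in degrees Celsius."""
    temps = {}
    for line in sensors_output.splitlines():
        match = TEMP_LINE.match(line)
        if match:
            temps[match.group(1)] = float(match.group(2))
    return temps


def zero_readings(temps):
    """Labels whose reading is exactly 0.0 (ASSUMPTION: likely not reported)."""
    return [name for name, value in temps.items() if value == 0.0]


SAMPLE = """\
amdgpu-pci-0c00
Adapter: PCI adapter
vddgfx:      800.00 mV
fan1:        1325 RPM  (min =    0 RPM, max = 2970 RPM)
edge:         +55.0°C  (crit = +100.0°C, hyst = -273.1°C)
                       (emerg = +105.0°C)
junction:     +57.0°C  (crit = +110.0°C, hyst = -273.1°C)
                       (emerg = +115.0°C)
mem:           +0.0°C  (crit = +105.0°C, hyst = -273.1°C)
                       (emerg = +110.0°C)
power1:       12.00 W  (cap = 160.00 W)
"""

if __name__ == "__main__":
    print(zero_readings(read_temps(SAMPLE)))
```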

Last edited by scurrvy2020 (2020-12-31 08:32:50)

#2 2020-12-27 03:39:55

scurrvy2020
Member
Registered: 2019-09-10
Posts: 8

Re: [Solved] amdgpu issue with 5600 XT, poor performance

I went through and tried every navi10_smc.bin SMC firmware that has been released, and did not notice any difference with any of them.

release	md5sum	firmware
linux-firmware-20190923	ca8b8ef19533560979d28fcd100dadce	amdgpu/navi10_smc.bin
linux-firmware-20191215	ca8b8ef19533560979d28fcd100dadce	amdgpu/navi10_smc.bin
linux-firmware-20200122	c11beaf3cd5da0704cdf4ecaf781ad2f	amdgpu/navi10_smc.bin
linux-firmware-20200316	632de739379e484c0233f6808cba2c7f	amdgpu/navi10_smc.bin
linux-firmware-20200421	632de739379e484c0233f6808cba2c7f	amdgpu/navi10_smc.bin
linux-firmware-20200519	632de739379e484c0233f6808cba2c7f	amdgpu/navi10_smc.bin
linux-firmware-20200619	764c88a6d8c1ebb9d48b58026b0e786f	amdgpu/navi10_smc.bin
linux-firmware-20200721	764c88a6d8c1ebb9d48b58026b0e786f	amdgpu/navi10_smc.bin
linux-firmware-20200817	2dd196e77ddc762d6f2bc44f842d4bfd	amdgpu/navi10_smc.bin
linux-firmware-20200918	2dd196e77ddc762d6f2bc44f842d4bfd	amdgpu/navi10_smc.bin
linux-firmware-20201022	c0d776c360f898df13808c7e90fa5e66	amdgpu/navi10_smc.bin
linux-firmware-20201118	c0d776c360f898df13808c7e90fa5e66	amdgpu/navi10_smc.bin
linux-firmware-20201218	392e903462ad03b47ed15e15b2a2c4fb	amdgpu/navi10_smc.bin
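Several of those releases ship the same binary, so the number of distinct firmware builds is smaller than the table suggests. A minimal sketch that groups the releases by checksum, using the values from the table above. md5_of_file shows one way such a checksum can be computed with hashlib; both function names are hypothetical.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# (linux-firmware release, md5sum of amdgpu/navi10_smc.bin), copied from the table.
RELEASES = [
    ("20190923", "ca8b8ef19533560979d28fcd100dadce"),
    ("20191215", "ca8b8ef19533560979d28fcd100dadce"),
    ("20200122", "c11beaf3cd5da0704cdf4ecaf781ad2f"),
    ("20200316", "632de739379e484c0233f6808cba2c7f"),
    ("20200421", "632de739379e484c0233f6808cba2c7f"),
    ("20200519", "632de739379e484c0233f6808cba2c7f"),
    ("20200619", "764c88a6d8c1ebb9d48b58026b0e786f"),
    ("20200721", "764c88a6d8c1ebb9d48b58026b0e786f"),
    ("20200817", "2dd196e77ddc762d6f2bc44f842d4bfd"),
    ("20200918", "2dd196e77ddc762d6f2bc44f842d4bfd"),
    ("20201022", "c0d776c360f898df13808c7e90fa5e66"),
    ("20201118", "c0d776c360f898df13808c7e90fa5e66"),
    ("20201218", "392e903462ad03b47ed15e15b2a2c4fb"),
]


def group_by_checksum(releases):
    """Map each checksum to the list of releases that ship that exact binary."""
    groups = {}
    for release, checksum in releases:
        groups.setdefault(checksum, []).append(release)
    return groups


if __name__ == "__main__":
    for checksum, names in group_by_checksum(RELEASES).items():
        print(checksum, ", ".join(names))
```

So the 13 releases come down to 7 distinct navi10_smc.bin builds.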

I also tried booting an Ubuntu 20.04.1 live USB, and it also hung on boot.

I logged stats on my boot success rate: on average it takes 6 attempts to boot correctly, and the most it has taken is 18. This is the dmesg from a successful boot:

[    0.000000] Command line: amdgpu.ppfeaturemask=0xffffffff rw root=UUID=f98ac5ce-2a7a-49c9-a3ee-8366fba9c8af amdgpu.gpu_recovery=1 amdgpu.audio=0 amdgpu.lockup_timeout=3000 pci=noaer initrd=/boot/amd-ucode.img initrd=/boot/initramfs-linux.img
[    0.000000] Kernel command line: amdgpu.ppfeaturemask=0xffffffff rw root=UUID=f98ac5ce-2a7a-49c9-a3ee-8366fba9c8af amdgpu.gpu_recovery=1 amdgpu.audio=0 amdgpu.lockup_timeout=3000 pci=noaer initrd=/boot/amd-ucode.img initrd=/boot/initramfs-linux.img
[    1.865902] [drm] amdgpu kernel modesetting enabled.
[    1.865963] amdgpu: Ignoring ACPI CRAT on non-APU system
[    1.865971] amdgpu: Topology: Add CPU node
[    1.866053] fb0: switching to amdgpudrmfb from EFI VGA
[    1.866134] amdgpu 0000:0c:00.0: vgaarb: deactivate vga console
[    1.866160] amdgpu 0000:0c:00.0: enabling device (0006 -> 0007)
[    1.866222] amdgpu 0000:0c:00.0: amdgpu: Trusted Memory Zone (TMZ) feature disabled as experimental (default)
[    1.867337] amdgpu 0000:0c:00.0: BAR 2: releasing [mem 0x7ff0000000-0x7ff01fffff 64bit pref]
[    1.867338] amdgpu 0000:0c:00.0: BAR 0: releasing [mem 0x7fe0000000-0x7fefffffff 64bit pref]
[    1.867396] [drm:amdgpu_device_resize_fb_bar [amdgpu]] *ERROR* Problem resizing BAR0 (-22).
[    1.867399] amdgpu 0000:0c:00.0: BAR 0: assigned [mem 0x7fe0000000-0x7fefffffff 64bit pref]
[    1.867407] amdgpu 0000:0c:00.0: BAR 2: assigned [mem 0x7ff0000000-0x7ff01fffff 64bit pref]
[    1.867418] amdgpu 0000:0c:00.0: amdgpu: VRAM: 6128M 0x0000008000000000 - 0x000000817EFFFFFF (6128M used)
[    1.867419] amdgpu 0000:0c:00.0: amdgpu: GART: 512M 0x0000000000000000 - 0x000000001FFFFFFF
[    1.867485] [drm] amdgpu: 6128M of VRAM memory ready
[    1.867486] [drm] amdgpu: 6128M of GTT memory ready.
[    2.677070] amdgpu 0000:0c:00.0: amdgpu: RAS: optional ras ta ucode is not available
[    2.697101] amdgpu 0000:0c:00.0: amdgpu: smu driver if version = 0x00000036, smu fw if version = 0x00000037, smu fw version = 0x002a3d00 (42.61.0)
[    2.697103] amdgpu 0000:0c:00.0: amdgpu: SMU driver if version not matched
[    2.697171] amdgpu 0000:0c:00.0: amdgpu: use vbios provided pptable
[    2.697172] amdgpu 0000:0c:00.0: amdgpu: smc_dpm_info table revision(format.content): 4.5
[    2.732366] amdgpu 0000:0c:00.0: amdgpu: SMU is initialized successfully!
[    2.797921] amdgpu: Topology: Add dGPU node [0x731f:0x1002]
[    2.797924] amdgpu 0000:0c:00.0: amdgpu: SE 2, SH per SE 2, CU per SH 10, active_cu_number 36
[    2.800705] fbcon: amdgpudrmfb (fb0) is primary device
[    2.932139] amdgpu 0000:0c:00.0: [drm] fb0: amdgpudrmfb frame buffer device
[    2.980052] amdgpu 0000:0c:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[    2.980053] amdgpu 0000:0c:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[    2.980054] amdgpu 0000:0c:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[    2.980055] amdgpu 0000:0c:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
[    2.980056] amdgpu 0000:0c:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
[    2.980056] amdgpu 0000:0c:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
[    2.980057] amdgpu 0000:0c:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
[    2.980057] amdgpu 0000:0c:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
[    2.980058] amdgpu 0000:0c:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
[    2.980058] amdgpu 0000:0c:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
[    2.980059] amdgpu 0000:0c:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[    2.980060] amdgpu 0000:0c:00.0: amdgpu: ring sdma1 uses VM inv eng 13 on hub 0
[    2.980060] amdgpu 0000:0c:00.0: amdgpu: ring vcn_dec uses VM inv eng 0 on hub 1
[    2.980061] amdgpu 0000:0c:00.0: amdgpu: ring vcn_enc0 uses VM inv eng 1 on hub 1
[    2.980062] amdgpu 0000:0c:00.0: amdgpu: ring vcn_enc1 uses VM inv eng 4 on hub 1
[    2.980062] amdgpu 0000:0c:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 1
[    2.980791] [drm] Initialized amdgpu 3.39.0 20150101 for 0000:0c:00.0 on minor 0
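The boot statistics quoted before the log can be turned into a rough per-attempt success probability. This is a back-of-the-envelope sketch that assumes every boot attempt is independent with the same success chance (a geometric distribution), which may well not hold for a flaky hardware link. Function names are hypothetical.

```python
def success_probability(mean_attempts):
    """Per-attempt success chance implied by the mean number of attempts (p = 1 / mean)."""
    return 1.0 / mean_attempts


def prob_needing_at_least(attempts, p):
    """Chance that the first success takes `attempts` tries or more: (1 - p) ** (attempts - 1)."""
    return (1.0 - p) ** (attempts - 1)


if __name__ == "__main__":
    p = success_probability(6)            # mean of 6 attempts, from the post
    worst = prob_needing_at_least(18, p)  # worst observed run, from the post
    print(f"per-attempt success chance: {p:.3f}")
    print(f"chance of needing 18+ attempts: {worst:.3f}")
```

Under that assumption each attempt succeeds about one time in six, and a run of 18 attempts is unusual but not implausible, at somewhere around 4 to 5 percent.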

Any suggestions are appreciated.

#3 2020-12-27 08:31:00

Ttz_ztT
Member
Registered: 2015-10-03
Posts: 19

Re: [Solved] amdgpu issue with 5600 XT, poor performance

5700 XT here, running fine with mesa.
Why do you use mesa-git in the first place? Does it run with mesa?
mesa-git normally needs more -git packages to run smoothly (if -git isn't broken).

Have you tried with BIOS/UEFI defaults?

Last edited by Ttz_ztT (2020-12-27 08:31:58)

#4 2020-12-27 15:31:03

scurrvy2020
Member
Registered: 2019-09-10
Posts: 8

Re: [Solved] amdgpu issue with 5600 XT, poor performance

I also have a 5700 XT in a different machine, and it runs just fine.

I tried mesa-git and the amd-drm-next kernel to see if any patches in the experimental releases would improve the situation.

I tried the BIOS/UEFI defaults as well.

I have a PCIe riser cable. I'm not getting any errors with it, but I wonder if it could be part of the issue?

#5 2020-12-27 19:24:16

Xabre
Member
From: Serbia
Registered: 2009-03-19
Posts: 755

Re: [Solved] amdgpu issue with 5600 XT, poor performance

It's certainly worth checking, since the internet is full of people complaining about riser cables and the weirdness that occurs when using them. For example, a PCIe gen 4 slot on the mobo with a PCIe 4 GPU over a riser cable that supports only PCIe 3: in theory it should work, in practice it's a pain to set up.
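One way to check what the link actually negotiated is to read the PCI sysfs attributes for the device. A minimal sketch, assuming the GPU sits at 0000:0c:00.0 as in the logs earlier in the thread. The attribute names are the standard Linux PCI sysfs ones; the function names are hypothetical. A lower current speed is not always a fault, since the link may be down-clocked at idle, and the GPU may sit behind an internal bridge whose upstream link is the one that matters.

```python
from pathlib import Path

# Standard PCI sysfs attributes describing the negotiated and maximum link.
LINK_ATTRS = (
    "current_link_speed",
    "max_link_speed",
    "current_link_width",
    "max_link_width",
)


def read_link_status(device_dir):
    """Read the link attributes of one PCI device; missing attributes become None."""
    device = Path(device_dir)
    status = {}
    for attr in LINK_ATTRS:
        path = device / attr
        status[attr] = path.read_text().strip() if path.exists() else None
    return status


def link_is_degraded(status):
    """True when the negotiated speed or width differs from the advertised maximum."""
    return (
        status["current_link_speed"] != status["max_link_speed"]
        or status["current_link_width"] != status["max_link_width"]
    )


if __name__ == "__main__":
    # ASSUMPTION: device address taken from the logs in this thread.
    status = read_link_status("/sys/bus/pci/devices/0000:0c:00.0")
    print(status, "degraded:", link_is_degraded(status))
```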

#6 2020-12-31 08:31:41

scurrvy2020
Member
Registered: 2019-09-10
Posts: 8

Re: [Solved] amdgpu issue with 5600 XT, poor performance

I changed the riser cable and the issue went away.

So it's confirmed: this was due to the riser cable.

Thanks everyone for your input!
