#1 2019-11-16 07:18:15

jaydenhawkes123
Member
Registered: 2019-11-16
Posts: 8

Cannot boot with amdgpu dpm disabled, random crashes with dpm enabled

I have always had stability problems with my 390X: the graphics driver hangs and my screens go black, the keyboard lights (Caps Lock etc.) don't change when they're supposed to, and sometimes the last few seconds of audio are played and repeated.

This happens with the radeon driver during normal workloads, and with the amdgpu driver only when gaming. (I think the same thing happened back when I tried amdgpu-pro, but I'm not sure, and I don't really want a proprietary driver.)

I wanted to try booting with amdgpu.dpm=0 as a kernel parameter, but I just get a black screen on boot. Trying it together with amdgpu.dc=0 gives me this error: https://imgur.com/u1QUaCn
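A quick way to confirm which parameters the running kernel actually received (for example, whether amdgpu.dpm=0 made it onto the command line) is to read /proc/cmdline. The sketch below uses a hypothetical helper, `parse_cmdline`, written for illustration. It assumes simple space-separated `name=value` tokens without quoted values.

```python
from pathlib import Path


def parse_cmdline(text: str) -> dict:
    """Split a kernel command line into {name: value} pairs.

    Bare flags such as 'rw' map to an empty string. Only the first '='
    separates name from value, so root=UUID=... keeps 'UUID=...' intact.
    Assumes no quoted values containing spaces.
    """
    params = {}
    for token in text.split():
        name, _, value = token.partition("=")
        params[name] = value
    return params


def main() -> None:
    # /proc/cmdline holds the parameters the running kernel was booted with.
    cmdline = Path("/proc/cmdline")
    if not cmdline.exists():
        print("/proc/cmdline not found (not running on Linux?)")
        return
    params = parse_cmdline(cmdline.read_text())
    for key in ("amdgpu.dpm", "amdgpu.dc"):
        print(f"{key} = {params.get(key, '<not set>')}")


if __name__ == "__main__":
    main()
```

If a key shows `<not set>` after a reboot, the parameter never reached the kernel, which usually means the bootloader entry was not updated.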

I have also tried moving from RADV to AMDVLK, both X11 and Wayland, most major distros, and switching to the Linux zen kernel.

When I'm using AMDVLK and Arma 3 (Proton edition) crashes, my screens go black and then sometimes recover, with the game closed and Discord asking me whether I want to switch to the new sound device it has detected (my graphics card).

Any help would be greatly appreciated.

Last edited by jaydenhawkes123 (2019-11-20 08:14:39)

#2 2019-11-16 13:24:02

Lone_Wolf
Member
From: Netherlands, Europe
Registered: 2005-10-04
Posts: 11,868

The screenshot indicates your processor may have firmware issues.

Often those issues are solved or worked around by updating the microcode used in the processor.
These updates can come with UEFI/BIOS firmware updates from motherboard manufacturers, but manufacturers often stop releasing those updates rather quickly.

Linux developers, AMD and Intel have designed a mechanism that allows the kernel to perform microcode updates for their respective processors.

The https://wiki.archlinux.org/index.php/Microcode page details how that's done on archlinux, other distros should have similar pages.
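To check whether the update took effect, you can compare the microcode revision the kernel reports in /proc/cpuinfo before and after setting it up. Below is a minimal sketch, assuming an x86 system where each CPU entry carries a `microcode` line. `microcode_revisions` is a hypothetical helper. A higher revision after the reboot suggests the update was loaded. An unchanged one may simply mean the board firmware already shipped the newest version.

```python
from pathlib import Path


def microcode_revisions(cpuinfo: str) -> set:
    """Return the distinct 'microcode' revisions listed in /proc/cpuinfo text.

    On x86 Linux each logical CPU has a line like 'microcode\t: 0x27'.
    """
    revisions = set()
    for line in cpuinfo.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "microcode":
            revisions.add(value.strip())
    return revisions


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        revs = microcode_revisions(cpuinfo.read_text())
        print("microcode revision(s):", ", ".join(sorted(revs)) or "<none reported>")
```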


Whether microcode updates will solve the issues is unknown, but there's a good chance they'll improve stability of the system.


Disliking systemd intensely, but not satisfied with alternatives so focusing on taming systemd.


(A works at time B)  && (time C > time B ) ≠  (A works at time C)

#3 2019-11-16 21:15:14

jaydenhawkes123
Member
Registered: 2019-11-16
Posts: 8

Doing the microcode update has removed the firmware bug warning I was seeing before, thanks. I still can't boot with dpm disabled, but maybe now I won't need to. I will try some gaming later today.

#4 2019-11-20 07:03:28

jaydenhawkes123
Member
Registered: 2019-11-16
Posts: 8

Unfortunately, although updating the microcode has gotten rid of the occasional firmware bug warning, it has not solved my issue: games are still crashing.

#5 2019-11-20 07:07:37

jaydenhawkes123
Member
Registered: 2019-11-16
Posts: 8

I have just decided to check my GPU temperature to see if that's the problem and....

amdgpu-pci-0100
Adapter: PCI adapter
vddgfx:      1000.00 mV
fan1:             N/A  (min =    0 RPM, max =    0 RPM)
edge:         +60.0 C  (crit = +104000.0 C, hyst = -273.1 C)
power1:       41.16 W  (cap = 230.00 W)

Why is my critical temperature thousands of degrees C?
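Some context for the odd crit value: the kernel's hwmon sysfs files store temperatures as integer millidegrees Celsius and power as integer microwatts. sensors divides by 1000 and 1,000,000 when printing. A crit of 104000.0 C therefore means the raw file held 104000000. That looks like 104 °C scaled by 1000 once too often, not a real limit. The sketch below reads the raw values directly. It assumes the GPU is `card0` with its sensors under /sys/class/drm/card0/device/hwmon/, and the helper names are hypothetical.

```python
from pathlib import Path


def millidegrees_to_celsius(raw: str) -> float:
    """hwmon temp*_input and temp*_crit files hold integer millidegrees Celsius."""
    return int(raw.strip()) / 1000


def microwatts_to_watts(raw: str) -> float:
    """hwmon power*_average and power*_cap files hold integer microwatts."""
    return int(raw.strip()) / 1_000_000


# File name -> converter; presumably the attributes behind the edge and power1 lines.
ATTRIBUTES = {
    "temp1_input": millidegrees_to_celsius,
    "temp1_crit": millidegrees_to_celsius,
    "power1_average": microwatts_to_watts,
    "power1_cap": microwatts_to_watts,
}


def read_amdgpu_hwmon(card: str = "card0") -> dict:
    """Read raw hwmon values for a DRM card and convert them to C / W.

    Assumes the GPU is `card` and exposes its sensors under
    /sys/class/drm/<card>/device/hwmon/hwmon*/. Missing files are skipped.
    """
    readings = {}
    base = Path(f"/sys/class/drm/{card}/device/hwmon")
    if not base.is_dir():
        return readings
    for hwmon_dir in sorted(base.glob("hwmon*")):
        for name, convert in ATTRIBUTES.items():
            path = hwmon_dir / name
            if path.exists():
                readings[name] = convert(path.read_text())
    return readings


if __name__ == "__main__":
    for name, value in read_amdgpu_hwmon().items():
        print(f"{name}: {value}")
```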

#6 2019-11-20 07:27:54

jaydenhawkes123
Member
Registered: 2019-11-16
Posts: 8

Also, I just noticed a very large red systemd error in my journalctl output. I don't think it was there a few years ago when I was having the same problems, but maybe it's relevant: pastebin

#7 2019-11-21 19:54:12

Lone_Wolf
Member
From: Netherlands, Europe
Registered: 2005-10-04
Posts: 11,868

Are you using https://www.archlinux.org/packages/extr … 4/tracker/ ?

What WM/DE are you using?

Can you try with simple WMs like openbox or twm ?


#8 2019-11-21 20:54:51

jaydenhawkes123
Member
Registered: 2019-11-16
Posts: 8

Currently I'm using i3, but I have the same problem with GNOME, KDE, dwm and other DEs/WMs. I'm not sure what tracker is, but I do have it installed.

#9 2019-11-22 07:09:41

jaydenhawkes123
Member
Registered: 2019-11-16
Posts: 8

Some extra information requested by a user on a different forum:
inxi -Fxxxza --no-host

System:    Kernel: 5.3.11-zen1-1-zen x86_64 bits: 64 compiler: gcc v: 9.2.0 
           parameters: BOOT_IMAGE=/boot/vmlinuz-linux-zen root=UUID=85183542-ef05-4aba-a8b0-506ee1f8e714 rw 
           radeon.cik_support=0 radeon.si_support=0 amdgpu.cik_support=1 amdgpu.si_support=0 radeon.dpm=0 
           Desktop: i3 4.17.1 info: polybar dm: LightDM 1.30.0 Distro: Arch Linux 
Machine:   Type: Desktop System: ASUS product: All Series v: N/A serial: <filter> 
           Mobo: ASUSTeK model: MAXIMUS VII HERO v: Rev 1.xx serial: <filter> UEFI: American Megatrends v: 3503 
           date: 04/18/2018 
CPU:       Topology: Quad Core model: Intel Core i7-4790K bits: 64 type: MT MCP arch: Haswell family: 6 model-id: 3C (60) 
           stepping: 3 microcode: 27 L2 cache: 8192 KiB 
           flags: avx avx2 lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx bogomips: 63963 
           Speed: 3999 MHz min/max: 800/4400 MHz Core speeds (MHz): 1: 3998 2: 3998 3: 4000 4: 3999 5: 4000 6: 4004 7: 4004 
           8: 4003 
           Vulnerabilities: Type: itlb_multihit status: KVM: Split huge pages 
           Type: l1tf mitigation: PTE Inversion; VMX: conditional cache flushes, SMT vulnerable 
           Type: mds mitigation: Clear CPU buffers; SMT vulnerable 
           Type: meltdown mitigation: PTI 
           Type: spec_store_bypass mitigation: Speculative Store Bypass disabled via prctl and seccomp 
           Type: spectre_v1 mitigation: usercopy/swapgs barriers and __user pointer sanitization 
           Type: spectre_v2 mitigation: Full generic retpoline, IBPB: conditional, IBRS_FW, STIBP: conditional, RSB filling 
           Type: tsx_async_abort status: Not affected 
Graphics:  Device-1: Advanced Micro Devices [AMD/ATI] Hawaii XT / Grenada XT [Radeon R9 290X/390X] vendor: Micro-Star MSI 
           driver: amdgpu v: kernel bus ID: 01:00.0 chip ID: 1002:67b0 
           Display: x11 server: X.Org 1.20.5 driver: amdgpu resolution: 1920x1080~60Hz, 1920x1080~60Hz 
           OpenGL: renderer: AMD Radeon R9 390 Series (HAWAII DRM 3.33.0 5.3.11-zen1-1-zen LLVM 9.0.0) v: 4.5 Mesa 19.2.4 
           direct render: Yes 
Audio:     Device-1: Intel 9 Series Family HD Audio vendor: ASUSTeK driver: snd_hda_intel v: kernel bus ID: 00:1b.0 
           chip ID: 8086:8ca0 
           Device-2: Advanced Micro Devices [AMD/ATI] Hawaii HDMI Audio [Radeon R9 290/290X / 390/390X] vendor: Micro-Star MSI 
           driver: snd_hda_intel v: kernel bus ID: 01:00.1 chip ID: 1002:aac8 
           Device-3: Logitech HD Webcam C910 type: USB driver: snd-usb-audio,uvcvideo bus ID: 2-4:2 chip ID: 046d:0821 
           serial: <filter> 
           Sound Server: ALSA v: k5.3.11-zen1-1-zen 
Network:   Device-1: Intel Ethernet I218-V vendor: ASUSTeK driver: e1000e v: 3.2.6-k port: f040 bus ID: 00:19.0 
           chip ID: 8086:15a1 
           IF: eno1 state: up speed: 1000 Mbps duplex: full mac: <filter> 
Drives:    Local Storage: total: 6.13 TiB used: 4.37 TiB (71.3%) 
           ID-1: /dev/sda vendor: Seagate model: ST3500418AS size: 465.76 GiB block size: physical: 512 B logical: 512 B 
           speed: 3.0 Gb/s rotation: 7200 rpm serial: <filter> rev: CC38 temp: 36 C scheme: MBR 
           ID-2: /dev/sdb vendor: Crucial model: CT240M500SSD1 size: 223.57 GiB block size: physical: 4096 B logical: 512 B 
           speed: 6.0 Gb/s serial: <filter> rev: MU05 temp: 36 C scheme: GPT 
           ID-3: /dev/sdc vendor: Seagate model: ST2000DX002-2DV164 size: 1.82 TiB block size: physical: 4096 B logical: 512 B 
           speed: 6.0 Gb/s rotation: 7200 rpm serial: <filter> rev: CC41 temp: 38 C scheme: GPT 
           ID-4: /dev/sdd vendor: Seagate model: ST2000DX002-2DV164 size: 1.82 TiB block size: physical: 4096 B logical: 512 B 
           speed: 6.0 Gb/s rotation: 7200 rpm serial: <filter> rev: CC41 temp: 37 C scheme: GPT 
           ID-5: /dev/sde vendor: Western Digital model: WD2003FZEX-00SRLA0 size: 1.82 TiB block size: physical: 4096 B 
           logical: 512 B speed: 6.0 Gb/s rotation: 7200 rpm serial: <filter> rev: 1A01 temp: 40 C 
Partition: ID-1: / raw size: 214.77 GiB size: 210.39 GiB (97.96%) used: 184.19 GiB (87.5%) fs: ext4 dev: /dev/sdb3 
           ID-2: swap-1 size: 7.80 GiB used: 0 KiB (0.0%) fs: swap swappiness: 60 (default) cache pressure: 100 (default) 
           dev: /dev/sdb4 
Sensors:   System Temperatures: cpu: 29.8 C mobo: 27.8 C gpu: amdgpu temp: 58 C 
           Fan Speeds (RPM): cpu: 0 
Info:      Processes: 229 Uptime: 10h 53m Memory: 15.58 GiB used: 3.69 GiB (23.7%) Init: systemd v: 243 Compilers: gcc: 9.2.0 
           Shell: bash v: 5.0.11 running in: terminator inxi: 3.0.36 

journalctl -b0 -p3:

-- Logs begin at Thu 2019-09-26 07:25:51 AEST, end at Fri 2019-11-22 18:03:11 AEDT. --
Nov 21 19:46:21 Jayden-Desktop kernel: ACPI BIOS Error (bug): Could not resolve symbol [\_SB.PCI0.SAT0.SPT4._GTF.DSSP], AE_NOT_FOUND (20190703/psargs-330)
Nov 21 19:46:21 Jayden-Desktop kernel: ACPI Error: Aborting method \_SB.PCI0.SAT0.SPT4._GTF due to previous error (AE_NOT_FOUND) (20190703/psparse-529)
Nov 21 19:46:21 Jayden-Desktop kernel: ACPI BIOS Error (bug): Could not resolve symbol [\_SB.PCI0.SAT0.SPT4._GTF.DSSP], AE_NOT_FOUND (20190703/psargs-330)
Nov 21 19:46:21 Jayden-Desktop kernel: ACPI Error: Aborting method \_SB.PCI0.SAT0.SPT4._GTF due to previous error (AE_NOT_FOUND) (20190703/psparse-529)
Nov 21 19:46:21 Jayden-Desktop systemd-modules-load[363]: Failed to find module 'vboxdrv'
Nov 21 19:46:21 Jayden-Desktop systemd-modules-load[363]: Failed to find module 'vboxpci'
Nov 21 19:46:21 Jayden-Desktop systemd-modules-load[363]: Failed to find module 'vboxnetadp'
Nov 21 19:46:21 Jayden-Desktop systemd-modules-load[363]: Failed to find module 'vboxnetflt'
Nov 21 19:46:21 Jayden-Desktop systemd-udevd[394]: could not read from '/sys/module/pcc_cpufreq/initstate': No such device
Nov 21 19:46:25 Jayden-Desktop systemd[807]: PAM failed: User account has expired
Nov 21 19:46:25 Jayden-Desktop systemd[807]: user@964.service: Failed to set up PAM session: Operation not permitted
Nov 21 19:46:25 Jayden-Desktop systemd[807]: user@964.service: Failed at step PAM spawning /usr/lib/systemd/systemd: Operation not permitted
Nov 21 19:46:25 Jayden-Desktop systemd[1]: Failed to start User Manager for UID 964.
Nov 21 19:46:54 Jayden-Desktop lightdm[827]: gkr-pam: unable to locate daemon control file
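Note that `-b0` only covers the current boot. After a black-screen hang and a reboot, the messages from the boot that crashed are in `journalctl -b -1`, provided the journal is stored on disk. Adding `-k` restricts the output to kernel messages, which is where amdgpu reports GPU problems. Below is a small sketch that runs that command and keeps only GPU-related lines. The keyword list (`amdgpu`, `[drm`) is an assumption about what is worth keeping, and the function names are hypothetical.

```python
import subprocess


def gpu_lines(lines, keywords=("amdgpu", "[drm")) -> list:
    """Keep only log lines that mention one of the (assumed) GPU keywords."""
    return [line for line in lines if any(k in line for k in keywords)]


def previous_boot_gpu_messages() -> list:
    # -b -1 selects the previous boot (the one that crashed),
    # -k limits output to kernel messages, --no-pager prints everything at once.
    result = subprocess.run(
        ["journalctl", "-b", "-1", "-k", "--no-pager"],
        capture_output=True,
        text=True,
        check=False,
    )
    return gpu_lines(result.stdout.splitlines())
```

After rebooting from a crash, printing the result of `previous_boot_gpu_messages()` gives a much shorter list to post than the full journal.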

Last edited by jaydenhawkes123 (2019-11-22 20:32:46)

#10 2019-11-22 18:26:37

Ropid
Member
Registered: 2015-03-09
Posts: 1,069

jaydenhawkes123 wrote:

I have just decided to check my GPU temperature to see if that's the problem and....

amdgpu-pci-0100
Adapter: PCI adapter
vddgfx:      1000.00 mV
fan1:             N/A  (min =    0 RPM, max =    0 RPM)
edge:         +60.0 C  (crit = +104000.0 C, hyst = -273.1 C)
power1:       41.16 W  (cap = 230.00 W)

Why is my critical temperature thousands of degrees C?

What's worrying is the 60°C temperature you are seeing while your card appears to be idle or close to idle at just 40W power usage. Maybe your problems are because the card overheats when it's under load?

You should research what's going on with the card's cooling and your PC case's cooling. Maybe things are stuffed with dust. If so, just cleaning might make your crash problems disappear.

If everything is clean and the fans are all working, then you could look into replacing the thermal paste on the GPU core. Your card is several years old, and years-old thermal paste can cause these kinds of temperature issues. There should be a video on YouTube about how to disassemble your exact card, so that you can get a feel for what "replacing thermal paste" means exactly.

The "crit" temperature being totally off doesn't really mean anything. It's not a measured temperature, just a threshold value set in the driver.

#11 2019-11-22 20:22:10

2ManyDogs
Forum Moderator
Registered: 2012-01-15
Posts: 4,645

jaydenhawkes123, please edit your post and use [ code ] tags when posting output. This makes the output much easier to read.

https://wiki.archlinux.org/index.php/Co … s_and_code
https://bbs.archlinux.org/help.php#bbcode


How to post. A sincere effort to use modest and proper language and grammar is a sign of respect toward the community.

#12 2019-11-22 20:36:39

jaydenhawkes123
Member
Registered: 2019-11-16
Posts: 8

The fans all spin, so I tried cleaning the dust out of my card. There was a bit, but I don't think it was enough to block any of the airflow. My card is running at 45 °C at 36 W right now (it started at 36 but is slowly rising); the high temperature before might have been from gaming earlier or something. Just in case that is the problem, I will try replacing the thermal paste on my card. Is there any way in software to tell if a crash is an overheating problem?
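One rough, software-only check is to log the GPU temperature to a file once per second while gaming. Each line is forced to disk so the log survives a hard hang, and you read the last lines after the next crash. If the final readings are well below the card's limits, overheating becomes a less likely explanation. This is a sketch: it assumes the GPU is `card0` and that amdgpu exposes its temperature as `temp1_input` (millidegrees Celsius) under /sys/class/drm/card0/device/hwmon/. The function names are hypothetical.

```python
import os
import time
from datetime import datetime
from pathlib import Path
from typing import Optional


def find_temp_file(card: str = "card0") -> Optional[Path]:
    # Assumption: the GPU is `card` and amdgpu exposes its temperature as
    # temp1_input (integer millidegrees Celsius) under the card's hwmon directory.
    base = Path(f"/sys/class/drm/{card}/device/hwmon")
    if not base.is_dir():
        return None
    matches = sorted(base.glob("hwmon*/temp1_input"))
    return matches[0] if matches else None


def format_line(timestamp: datetime, millidegrees: int) -> str:
    """One log line: ISO timestamp plus temperature in degrees Celsius."""
    return f"{timestamp.isoformat(timespec='seconds')} {millidegrees / 1000:.1f}C\n"


def log_temperature(log_path: str, interval: float = 1.0, card: str = "card0") -> None:
    """Append a reading every `interval` seconds until interrupted (Ctrl+C)."""
    temp_file = find_temp_file(card)
    if temp_file is None:
        raise FileNotFoundError(f"no hwmon temp1_input found for {card}")
    with open(log_path, "a") as log:
        while True:
            millidegrees = int(temp_file.read_text())
            log.write(format_line(datetime.now(), millidegrees))
            log.flush()
            # fsync pushes the line to disk, so the last readings survive a hard hang.
            os.fsync(log.fileno())
            time.sleep(interval)
```

For example, `log_temperature(os.path.expanduser("~/gpu-temp.log"))` can run in a terminal while playing. After a crash and reboot, `tail ~/gpu-temp.log` shows the last readings. The log only sees the single sensor the driver exposes, so a normal-looking log does not completely rule out heat elsewhere on the card.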
