I have always had stability problems with my 390X: the graphics driver hangs and my screens go black, the keyboard lights (Caps Lock etc.) don't change when they are supposed to, and sometimes the last few seconds of audio are played on repeat.
This happens with the radeon driver during normal workloads, and with the amdgpu driver but only when gaming. (I think the same thing happened back when I tried amdgpu-pro, but I'm not sure, and I don't really want a proprietary driver.)
I wanted to try booting with amdgpu.dpm=0 as a kernel parameter, but I just get a black screen on boot. Trying this alongside amdgpu.dc=0 gives me this error: https://imgur.com/u1QUaCn
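When a boot with extra parameters does come up (or is reachable over SSH while the display stays black), it can help to confirm which amdgpu/radeon options the running kernel actually received. A minimal Python sketch, assuming a Linux system that exposes /proc/cmdline; the helper name module_params is made up for illustration:

```python
from pathlib import Path


def module_params(cmdline: str, modules=("amdgpu", "radeon")) -> dict:
    """Return {"module.option": "value"} for the given modules found in a kernel command line."""
    found = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        module = name.split(".", 1)[0]
        # Only keep tokens of the form module.option=value for the modules of interest.
        if sep and "." in name and module in modules:
            found[name] = value
    return found


if __name__ == "__main__":
    # /proc/cmdline holds the command line the running kernel was booted with.
    cmdline_path = Path("/proc/cmdline")
    if cmdline_path.exists():
        for key, value in sorted(module_params(cmdline_path.read_text()).items()):
            print(f"{key}={value}")
```

If amdgpu.dpm=0 does not show up in the output, the parameter never reached the kernel and the bootloader configuration is the first thing to check.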
I have also tried switching from radv to amdvlk, both X11 and Wayland, most major distros, and the Linux zen kernel.
When Arma 3 (Proton edition) crashes while I'm using amdvlk, my screens go black, then sometimes recover with the game closed and Discord asking me if I want to switch to the new sound device it has detected (my graphics card).
Any help would be greatly appreciated.
Last edited by jaydenhawkes123 (2019-11-20 08:14:39)
Offline
The screenshot indicates your processor may have firmware issues.
Often those issues are solved or worked around by updating the microcode used in the processor.
These updates can come with UEFI/BIOS firmware updates from motherboard manufacturers, but manufacturers often stop releasing those rather fast.
Linux developers, AMD and Intel have designed a mechanism that allows the kernel to perform microcode updates for their respective processors.
The https://wiki.archlinux.org/index.php/Microcode page details how that's done on Arch Linux; other distros should have similar pages.
Whether microcode updates will solve the issues is unknown, but there's a good chance they'll improve stability of the system.
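To see whether a microcode update actually took effect, the revision the kernel reports can be compared before and after. A small Python sketch, assuming an x86 Linux system where /proc/cpuinfo lists a "microcode" field per logical CPU; the function name is made up for illustration:

```python
import re
from pathlib import Path


def microcode_revisions(cpuinfo: str) -> set:
    """Collect the distinct 'microcode' values listed in /proc/cpuinfo text."""
    return set(re.findall(r"^microcode\s*:\s*(\S+)", cpuinfo, flags=re.MULTILINE))


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    if path.exists():
        revisions = microcode_revisions(path.read_text())
        # Every logical CPU would normally report the same revision.
        print("microcode revision(s):", ", ".join(sorted(revisions)) or "not reported")
```

Run it once before installing the microcode package and once after rebooting with it; a changed value suggests the early update was applied.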
Disliking systemd intensely, but not satisfied with alternatives so focusing on taming systemd.
(A works at time B) && (time C > time B ) ≠ (A works at time C)
Offline
Doing the microcode update has removed the firmware bug I have been seeing before, thanks. Still can't boot with dpm disabled but maybe now I won't need to, will try to do some gaming later today.
Offline
Unfortunately, although updating the microcode has gotten rid of the occasional firmware bug warning, it has not solved my issue; games are still crashing.
Offline
I have just decided to try and check my GPU temperature to see if that's the problem and....
amdgpu-pci-0100
Adapter: PCI adapter
vddgfx: 1000.00 mV
fan1: N/A (min = 0 RPM, max = 0 RPM)
edge: +60.0 C (crit = +104000.0 C, hyst = -273.1 C)
power1: 41.16 W (cap = 230.00 W)
Why is my critical temperature thousands of degrees C?
Offline
Also, I just noticed a very large red systemd error in my journalctl output. I don't think this was there a few years ago when I was having the same problems, but maybe it's relevant: pastebin
Offline
Are you using https://www.archlinux.org/packages/extr … 4/tracker/ ?
What WM/DE are you using ?
Can you try with simple WMs like openbox or twm ?
Offline
Currently I'm using i3, but I have the same problem with GNOME, KDE, dwm and other DEs. I'm not sure what tracker is, but I do have it installed.
Offline
Some extra information requested by a user on a different forum:
inxi -Fxxxza --no-host
System: Kernel: 5.3.11-zen1-1-zen x86_64 bits: 64 compiler: gcc v: 9.2.0
parameters: BOOT_IMAGE=/boot/vmlinuz-linux-zen root=UUID=85183542-ef05-4aba-a8b0-506ee1f8e714 rw
radeon.cik_support=0 radeon.si_support=0 amdgpu.cik_support=1 amdgpu.si_support=0 radeon.dpm=0
Desktop: i3 4.17.1 info: polybar dm: LightDM 1.30.0 Distro: Arch Linux
Machine: Type: Desktop System: ASUS product: All Series v: N/A serial: <filter>
Mobo: ASUSTeK model: MAXIMUS VII HERO v: Rev 1.xx serial: <filter> UEFI: American Megatrends v: 3503
date: 04/18/2018
CPU: Topology: Quad Core model: Intel Core i7-4790K bits: 64 type: MT MCP arch: Haswell family: 6 model-id: 3C (60)
stepping: 3 microcode: 27 L2 cache: 8192 KiB
flags: avx avx2 lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx bogomips: 63963
Speed: 3999 MHz min/max: 800/4400 MHz Core speeds (MHz): 1: 3998 2: 3998 3: 4000 4: 3999 5: 4000 6: 4004 7: 4004
8: 4003
Vulnerabilities: Type: itlb_multihit status: KVM: Split huge pages
Type: l1tf mitigation: PTE Inversion; VMX: conditional cache flushes, SMT vulnerable
Type: mds mitigation: Clear CPU buffers; SMT vulnerable
Type: meltdown mitigation: PTI
Type: spec_store_bypass mitigation: Speculative Store Bypass disabled via prctl and seccomp
Type: spectre_v1 mitigation: usercopy/swapgs barriers and __user pointer sanitization
Type: spectre_v2 mitigation: Full generic retpoline, IBPB: conditional, IBRS_FW, STIBP: conditional, RSB filling
Type: tsx_async_abort status: Not affected
Graphics: Device-1: Advanced Micro Devices [AMD/ATI] Hawaii XT / Grenada XT [Radeon R9 290X/390X] vendor: Micro-Star MSI
driver: amdgpu v: kernel bus ID: 01:00.0 chip ID: 1002:67b0
Display: x11 server: X.Org 1.20.5 driver: amdgpu resolution: 1920x1080~60Hz, 1920x1080~60Hz
OpenGL: renderer: AMD Radeon R9 390 Series (HAWAII DRM 3.33.0 5.3.11-zen1-1-zen LLVM 9.0.0) v: 4.5 Mesa 19.2.4
direct render: Yes
Audio: Device-1: Intel 9 Series Family HD Audio vendor: ASUSTeK driver: snd_hda_intel v: kernel bus ID: 00:1b.0
chip ID: 8086:8ca0
Device-2: Advanced Micro Devices [AMD/ATI] Hawaii HDMI Audio [Radeon R9 290/290X / 390/390X] vendor: Micro-Star MSI
driver: snd_hda_intel v: kernel bus ID: 01:00.1 chip ID: 1002:aac8
Device-3: Logitech HD Webcam C910 type: USB driver: snd-usb-audio,uvcvideo bus ID: 2-4:2 chip ID: 046d:0821
serial: <filter>
Sound Server: ALSA v: k5.3.11-zen1-1-zen
Network: Device-1: Intel Ethernet I218-V vendor: ASUSTeK driver: e1000e v: 3.2.6-k port: f040 bus ID: 00:19.0
chip ID: 8086:15a1
IF: eno1 state: up speed: 1000 Mbps duplex: full mac: <filter>
Drives: Local Storage: total: 6.13 TiB used: 4.37 TiB (71.3%)
ID-1: /dev/sda vendor: Seagate model: ST3500418AS size: 465.76 GiB block size: physical: 512 B logical: 512 B
speed: 3.0 Gb/s rotation: 7200 rpm serial: <filter> rev: CC38 temp: 36 C scheme: MBR
ID-2: /dev/sdb vendor: Crucial model: CT240M500SSD1 size: 223.57 GiB block size: physical: 4096 B logical: 512 B
speed: 6.0 Gb/s serial: <filter> rev: MU05 temp: 36 C scheme: GPT
ID-3: /dev/sdc vendor: Seagate model: ST2000DX002-2DV164 size: 1.82 TiB block size: physical: 4096 B logical: 512 B
speed: 6.0 Gb/s rotation: 7200 rpm serial: <filter> rev: CC41 temp: 38 C scheme: GPT
ID-4: /dev/sdd vendor: Seagate model: ST2000DX002-2DV164 size: 1.82 TiB block size: physical: 4096 B logical: 512 B
speed: 6.0 Gb/s rotation: 7200 rpm serial: <filter> rev: CC41 temp: 37 C scheme: GPT
ID-5: /dev/sde vendor: Western Digital model: WD2003FZEX-00SRLA0 size: 1.82 TiB block size: physical: 4096 B
logical: 512 B speed: 6.0 Gb/s rotation: 7200 rpm serial: <filter> rev: 1A01 temp: 40 C
Partition: ID-1: / raw size: 214.77 GiB size: 210.39 GiB (97.96%) used: 184.19 GiB (87.5%) fs: ext4 dev: /dev/sdb3
ID-2: swap-1 size: 7.80 GiB used: 0 KiB (0.0%) fs: swap swappiness: 60 (default) cache pressure: 100 (default)
dev: /dev/sdb4
Sensors: System Temperatures: cpu: 29.8 C mobo: 27.8 C gpu: amdgpu temp: 58 C
Fan Speeds (RPM): cpu: 0
Info: Processes: 229 Uptime: 10h 53m Memory: 15.58 GiB used: 3.69 GiB (23.7%) Init: systemd v: 243 Compilers: gcc: 9.2.0
Shell: bash v: 5.0.11 running in: terminator inxi: 3.0.36
journalctl -b0 -p3:
-- Logs begin at Thu 2019-09-26 07:25:51 AEST, end at Fri 2019-11-22 18:03:11 AEDT. --
Nov 21 19:46:21 Jayden-Desktop kernel: ACPI BIOS Error (bug): Could not resolve symbol [\_SB.PCI0.SAT0.SPT4._GTF.DSSP], AE_NOT_FOUND (20190703/psargs-330)
Nov 21 19:46:21 Jayden-Desktop kernel: ACPI Error: Aborting method \_SB.PCI0.SAT0.SPT4._GTF due to previous error (AE_NOT_FOUND) (20190703/psparse-529)
Nov 21 19:46:21 Jayden-Desktop kernel: ACPI BIOS Error (bug): Could not resolve symbol [\_SB.PCI0.SAT0.SPT4._GTF.DSSP], AE_NOT_FOUND (20190703/psargs-330)
Nov 21 19:46:21 Jayden-Desktop kernel: ACPI Error: Aborting method \_SB.PCI0.SAT0.SPT4._GTF due to previous error (AE_NOT_FOUND) (20190703/psparse-529)
Nov 21 19:46:21 Jayden-Desktop systemd-modules-load[363]: Failed to find module 'vboxdrv'
Nov 21 19:46:21 Jayden-Desktop systemd-modules-load[363]: Failed to find module 'vboxpci'
Nov 21 19:46:21 Jayden-Desktop systemd-modules-load[363]: Failed to find module 'vboxnetadp'
Nov 21 19:46:21 Jayden-Desktop systemd-modules-load[363]: Failed to find module 'vboxnetflt'
Nov 21 19:46:21 Jayden-Desktop systemd-udevd[394]: could not read from '/sys/module/pcc_cpufreq/initstate': No such device
Nov 21 19:46:25 Jayden-Desktop systemd[807]: PAM failed: User account has expired
Nov 21 19:46:25 Jayden-Desktop systemd[807]: user@964.service: Failed to set up PAM session: Operation not permitted
Nov 21 19:46:25 Jayden-Desktop systemd[807]: user@964.service: Failed at step PAM spawning /usr/lib/systemd/systemd: Operation not permitted
Nov 21 19:46:25 Jayden-Desktop systemd[1]: Failed to start User Manager for UID 964.
Nov 21 19:46:54 Jayden-Desktop lightdm[827]: gkr-pam: unable to locate daemon control file
Last edited by jaydenhawkes123 (2019-11-22 20:32:46)
Offline
I have just decided to try and check my GPU temperature to see if that's the problem and....
amdgpu-pci-0100
Adapter: PCI adapter
vddgfx: 1000.00 mV
fan1: N/A (min = 0 RPM, max = 0 RPM)
edge: +60.0 C (crit = +104000.0 C, hyst = -273.1 C)
power1: 41.16 W (cap = 230.00 W)
Why is my critical temperature thousands of degrees C?
What's worrying is the 60°C temperature you are seeing while your card appears to be idle or close to idle at just 40W power usage. Maybe your problems are because the card overheats when it's under load?
You should research what's going on with the card's cooling and your PC case's cooling. Maybe things are stuffed with dust. Just cleaning would then perhaps mean that your crash problems disappear.
If everything is clean and the fans are all working, then you could look into replacing the thermal paste used on the GPU core. Your card is several years old, and years-old thermal paste can cause these kinds of temperature issues. There should be a video on YouTube about how to disassemble your exact card so that you can get a feel for what "replacing thermal paste" means exactly.
The "crit" temperature being totally off doesn't really mean anything. It's not a measured temperature, just a threshold value that the driver reports.
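For anyone curious where such a number comes from: the Linux hwmon sysfs files store temperatures as integer millidegrees Celsius, and tools like sensors divide by 1000 for display. A displayed crit of 104000.0 C would fit a raw value that is 1000 times too large (that is, an intended 104 C), but that is only a guess and nothing in this thread confirms it. A rough Python sketch for looking at the raw values, assuming the usual /sys/class/hwmon layout; the function names are made up for illustration:

```python
from pathlib import Path


def millidegrees_to_celsius(raw: str) -> float:
    """hwmon temperature files hold integer millidegrees Celsius."""
    return int(raw.strip()) / 1000.0


def read_hwmon_temps(root: str = "/sys/class/hwmon") -> list:
    """Return (chip name, file name, raw value, degrees C) for each temp*_input / temp*_crit file."""
    rows = []
    for chip in sorted(Path(root).glob("hwmon*")):
        name_file = chip / "name"
        chip_name = name_file.read_text().strip() if name_file.exists() else chip.name
        for pattern in ("temp*_input", "temp*_crit"):
            for sensor in sorted(chip.glob(pattern)):
                try:
                    raw = sensor.read_text()
                    rows.append((chip_name, sensor.name, raw.strip(), millidegrees_to_celsius(raw)))
                except (OSError, ValueError):
                    continue  # some sensors cannot be read or return nothing usable
    return rows


if __name__ == "__main__":
    for chip_name, file_name, raw, celsius in read_hwmon_temps():
        print(f"{chip_name:10} {file_name:14} raw={raw:>10} -> {celsius:.1f} C")
```

Comparing the raw temp1_crit against temp1_input for the amdgpu chip shows whether only the threshold is scaled oddly while the live reading is sane.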
Offline
jaydenhawkes123, please edit your post and use [ code ] tags when posting output. This makes the output much easier to read.
https://wiki.archlinux.org/index.php/Co … s_and_code
https://bbs.archlinux.org/help.php#bbcode
How to post. A sincere effort to use modest and proper language and grammar is a sign of respect toward the community.
Offline
The fans do all spin, so I tried cleaning out the dust in my card. There was a bit, but I don't think it was enough to clog up any of the airways. My card is running at 45 C at 36 W right now (it started at 36 C but is slowly rising); the high temperature before might have been from me gaming earlier or something. Just in case that is the problem, I will try replacing the thermal paste on my card. Is there any way in software to tell if a crash is an overheating problem?
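One software-only approach that might help here is to log the GPU temperature to disk every second while gaming and look at the last lines after a crash. Below is a minimal Python sketch of that idea. It assumes the card shows up as a hwmon chip named "amdgpu" (as in the sensors output above) and that temp1_input is the edge sensor; the function names and the log file name are made up for illustration.

```python
import os
import time
from pathlib import Path


def find_temp_file(chip_name: str = "amdgpu", root: str = "/sys/class/hwmon"):
    """Return the first temp1_input belonging to a hwmon chip with the given name, or None."""
    for chip in sorted(Path(root).glob("hwmon*")):
        name_file = chip / "name"
        temp_file = chip / "temp1_input"
        if name_file.exists() and temp_file.exists() and name_file.read_text().strip() == chip_name:
            return temp_file
    return None


def format_entry(timestamp: float, raw: str) -> str:
    """Format one log line; raw is the millidegree value read from hwmon."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(timestamp))
    return f"{stamp} {int(raw.strip()) / 1000.0:.1f} C"


def log_temperature(temp_file: Path, log_path: str, interval: float = 1.0, samples=None) -> None:
    """Append a reading every `interval` seconds; fsync so the last line survives a hard hang."""
    taken = 0
    with open(log_path, "a") as log:
        while samples is None or taken < samples:
            log.write(format_entry(time.time(), temp_file.read_text()) + "\n")
            log.flush()
            os.fsync(log.fileno())
            taken += 1
            time.sleep(interval)


if __name__ == "__main__":
    sensor = find_temp_file()
    if sensor is None:
        print("no amdgpu temperature sensor found")
    else:
        log_temperature(sensor, "gpu-temp.log")  # hypothetical log file name; stop with Ctrl+C
```

If the final readings before a hang are nowhere near the card's limit, overheating looks less likely, though a log like this cannot rule it out (hot spots away from the edge sensor would not show). Errors from the crashed session can also be checked afterwards with journalctl -b -1 -p3, which reads the previous boot.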
Offline