
#1 2020-03-22 07:41:41

cypher_zero
Member
Registered: 2014-10-23
Posts: 50

[SOLVED?] Frequent Freezes/Crashes with AMD 5700 XT

I purchased an AMD 5700 XT about a month ago, and the system has been freezing ever since I installed it. It replaced an Nvidia GTX 980, which was working without issue. I've been troubleshooting this off and on since then, but I've run out of ideas.

The freezes seem to happen mostly when the GPU is being taxed, such as while gaming, though I have also run into crashes while doing more trivial things.

Here's what I was getting in `journalctl` around this:

Mar 22 00:49:56 cypherArchDsk kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=2536301, emitted seq=2536303
Mar 22 00:49:56 cypherArchDsk kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process DOOMEternalx64v pid 7073 thread DOOMEternalx64v pid 7073
Mar 22 00:49:56 cypherArchDsk kernel: [drm] GPU recovery disabled.
Mar 22 00:49:56 cypherArchDsk org_kde_powerdevil[1851]: powerdevil: Enforcing inhibition from ":1.13" "My SDL application" with cookie 92 and reason "Playing a game"
Mar 22 00:49:56 cypherArchDsk org_kde_powerdevil[1851]: powerdevil: Added change screen settings
Mar 22 00:49:56 cypherArchDsk org_kde_powerdevil[1851]: powerdevil: Added interrupt session
Mar 22 00:49:56 cypherArchDsk org_kde_powerdevil[1851]: powerdevil: Disabling DPMS due to inhibition
Mar 22 00:49:56 cypherArchDsk org_kde_powerdevil[1851]: powerdevil: Can't contact ck
Mar 22 00:49:58 cypherArchDsk kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
Mar 22 00:49:58 cypherArchDsk kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
Mar 22 00:50:01 cypherArchDsk kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
Mar 22 00:50:01 cypherArchDsk kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=625958, emitted seq=625960
Mar 22 00:50:01 cypherArchDsk kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
Mar 22 00:50:01 cypherArchDsk kernel: [drm] GPU recovery disabled.

When this happens, a hard reboot of the system is required to get it to come back.
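For anyone comparing their own logs against this one, here is a minimal sketch that pulls the ring-timeout lines out of saved journal text. The function name `find_ring_timeouts` and the regular expression are my own illustration, written only against the line format visible in the excerpt above; other kernel versions may word the message differently.

```python
import re

# Pattern written against the amdgpu timeout lines quoted above.
# It is an assumption that other kernels use the same wording.
TIMEOUT_RE = re.compile(
    r"ring (?P<ring>\S+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)


def find_ring_timeouts(journal_text):
    """Return (ring, signaled_seq, emitted_seq) for each ring-timeout line."""
    results = []
    for line in journal_text.splitlines():
        match = TIMEOUT_RE.search(line)
        if match:
            results.append(
                (match["ring"], int(match["signaled"]), int(match["emitted"]))
            )
    return results


# Three lines copied from the journal excerpt above.
sample = """\
Mar 22 00:49:56 cypherArchDsk kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=2536301, emitted seq=2536303
Mar 22 00:49:56 cypherArchDsk kernel: [drm] GPU recovery disabled.
Mar 22 00:50:01 cypherArchDsk kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=625958, emitted seq=625960
"""

for ring, signaled, emitted in find_ring_timeouts(sample):
    # The gap is simply emitted minus signaled sequence numbers.
    print(f"{ring}: signaled={signaled} emitted={emitted} gap={emitted - signaled}")
```

Feeding it the output of `journalctl -k` saved to a file should give a quick list of which rings timed out and when the sequence numbers stopped advancing.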

Since then, I've done a few things to try to alleviate the issue:

* Added the following kernel parameters:

amdgpu.gpu_recovery=1 amdgpu.lockup_timeout=3000

* Installed and enabled `amdgpu-fan` from the AUR
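To double-check that the kernel parameters actually reached the running kernel, something like the sketch below can help. It reads `/proc/cmdline` and lists any `amdgpu.*` options it finds. The helper names `amdgpu_params` and `running_amdgpu_params` are made up for illustration, and the `BOOT_IMAGE` and `root` values in the example string are placeholders, not taken from this system.

```python
def amdgpu_params(cmdline):
    """Return amdgpu.* options from a kernel command line as a dict."""
    params = {}
    for token in cmdline.split():
        if token.startswith("amdgpu."):
            # Split "amdgpu.name=value" into name and value.
            name, _, value = token[len("amdgpu."):].partition("=")
            params[name] = value
    return params


def running_amdgpu_params(path="/proc/cmdline"):
    """Read the command line of the currently running kernel."""
    with open(path) as f:
        return amdgpu_params(f.read())


# Hypothetical command line; only the two amdgpu options come from this post.
example = (
    "BOOT_IMAGE=/vmlinuz-linux root=/dev/sda2 rw "
    "amdgpu.gpu_recovery=1 amdgpu.lockup_timeout=3000"
)
print(amdgpu_params(example))
```

On a live system, calling `running_amdgpu_params()` should show the same two options if the bootloader configuration was regenerated and the machine rebooted.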

I'm still getting the freezes, but the GPU does recover now. I'm hoping I can prevent it from freezing at all.

Here's what I'm getting in `journalctl` now:

Mar 22 02:40:32 cypherArchDsk kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=1827550, emitted seq=1827552
Mar 22 02:40:32 cypherArchDsk kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process DOOMEternalx64v pid 10080 thread DOOMEternalx64v pid 10080
Mar 22 02:40:32 cypherArchDsk kernel: amdgpu 0000:45:00.0: GPU reset begin!
Mar 22 02:40:32 cypherArchDsk kernel: ------------[ cut here ]------------
Mar 22 02:40:32 cypherArchDsk kernel: WARNING: CPU: 15 PID: 5937 at drivers/gpu/drm/amd/amdgpu/../display/dc/dcn20/dcn20_resource.c:2959 dcn20_validate_bandwidth+0x99/0xb0 [amdgpu]
Mar 22 02:40:32 cypherArchDsk kernel: Modules linked in: fuse xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 br_netfilter bridge stp llc nls_iso8859_1 nls_cp437 vfat fat ext4 wmi_bmof mxm_wmi mbcache jbd2>
Mar 22 02:40:32 cypherArchDsk kernel:  k10temp i2c_piix4 soundcore ccp i2c_algo_bit atlantic rng_core rfkill dca wmi pinctrl_amd gpio_amdpt evdev mac_hid acpi_cpufreq vboxnetflt(OE) vboxnetadp(OE) vboxdrv(OE) drm uhid sg crypto_user agpgart ip_tables x_tables sd_mod hid_logitech_hidpp hid_logitech_dj sr_mod cdrom uas usb_s>
Mar 22 02:40:32 cypherArchDsk kernel: CPU: 15 PID: 5937 Comm: kworker/15:0 Tainted: G        W  OE     5.5.10-arch1-1 #1
Mar 22 02:40:32 cypherArchDsk kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X399 Professional Gaming, BIOS P3.30 08/14/2018
Mar 22 02:40:32 cypherArchDsk kernel: Workqueue: events drm_sched_job_timedout [gpu_sched]
Mar 22 02:40:32 cypherArchDsk kernel: RIP: 0010:dcn20_validate_bandwidth+0x99/0xb0 [amdgpu]
Mar 22 02:40:32 cypherArchDsk kernel: Code: 00 00 00 5d 41 5c e9 c6 f5 ff ff 31 d2 f2 0f 11 85 70 21 00 00 48 89 ee 4c 89 e7 e8 b1 f5 ff ff 89 c2 22 95 b8 1d 00 00 75 04 <0f> 0b eb b3 c6 85 b8 1d 00 00 00 89 d0 eb a8 0f 1f 84 00 00 00 00
Mar 22 02:40:32 cypherArchDsk kernel: RSP: 0018:ffffa8c1c0b4baf8 EFLAGS: 00010246
Mar 22 02:40:32 cypherArchDsk kernel: RAX: 0000000000000001 RBX: 0000000000000000 RCX: 00000000060a2a0f
Mar 22 02:40:32 cypherArchDsk kernel: RDX: 0000000000000000 RSI: ffff95037d3f21a0 RDI: 00000000000321a0
Mar 22 02:40:32 cypherArchDsk kernel: RBP: ffff94f969870000 R08: 0000000000000006 R09: 0000000000000000
Mar 22 02:40:32 cypherArchDsk kernel: R10: 0000000000000001 R11: 0000000100000001 R12: ffff950335ca0000
Mar 22 02:40:32 cypherArchDsk kernel: R13: ffff94fa74902c80 R14: 0000000000000000 R15: ffff95034207cc00
Mar 22 02:40:32 cypherArchDsk kernel: FS:  0000000000000000(0000) GS:ffff95037d3c0000(0000) knlGS:0000000000000000
Mar 22 02:40:32 cypherArchDsk kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 22 02:40:32 cypherArchDsk kernel: CR2: 00007f4f286f9ca4 CR3: 00000004fd88c000 CR4: 00000000003406e0
Mar 22 02:40:32 cypherArchDsk kernel: Call Trace:
Mar 22 02:40:32 cypherArchDsk kernel:  dc_validate_global_state+0x28a/0x310 [amdgpu]
Mar 22 02:40:32 cypherArchDsk kernel:  ? drm_modeset_lock+0x31/0xb0 [drm]
Mar 22 02:40:32 cypherArchDsk kernel:  amdgpu_dm_atomic_check+0x5d8/0x870 [amdgpu]
Mar 22 02:40:32 cypherArchDsk kernel:  drm_atomic_check_only+0x578/0x800 [drm]
Mar 22 02:40:32 cypherArchDsk kernel:  drm_atomic_commit+0x13/0x50 [drm]
Mar 22 02:40:32 cypherArchDsk kernel:  drm_atomic_helper_disable_all+0x175/0x190 [drm_kms_helper]
Mar 22 02:40:32 cypherArchDsk kernel:  drm_atomic_helper_suspend+0x73/0x120 [drm_kms_helper]
Mar 22 02:40:32 cypherArchDsk kernel:  dm_suspend+0x1c/0x60 [amdgpu]
Mar 22 02:40:32 cypherArchDsk kernel:  amdgpu_device_ip_suspend_phase1+0x83/0xe0 [amdgpu]
Mar 22 02:40:32 cypherArchDsk kernel:  ? _raw_spin_lock+0x13/0x30
Mar 22 02:40:32 cypherArchDsk kernel:  amdgpu_device_ip_suspend+0x1c/0x60 [amdgpu]
Mar 22 02:40:32 cypherArchDsk kernel:  amdgpu_device_pre_asic_reset+0x191/0x1a4 [amdgpu]
Mar 22 02:40:32 cypherArchDsk kernel:  amdgpu_device_gpu_recover+0x2ee/0xa0b [amdgpu]
Mar 22 02:40:32 cypherArchDsk kernel:  amdgpu_job_timedout+0x103/0x130 [amdgpu]
Mar 22 02:40:32 cypherArchDsk kernel:  drm_sched_job_timedout+0x3e/0x90 [gpu_sched]
Mar 22 02:40:32 cypherArchDsk kernel:  process_one_work+0x1e1/0x3d0
Mar 22 02:40:32 cypherArchDsk kernel:  worker_thread+0x4a/0x3d0
Mar 22 02:40:32 cypherArchDsk kernel:  kthread+0xfb/0x130
Mar 22 02:40:32 cypherArchDsk kernel:  ? process_one_work+0x3d0/0x3d0
Mar 22 02:40:32 cypherArchDsk kernel:  ? kthread_park+0x90/0x90
Mar 22 02:40:32 cypherArchDsk kernel:  ret_from_fork+0x22/0x40
Mar 22 02:40:32 cypherArchDsk kernel: ---[ end trace 6d1a2cf063733ef1 ]---

EDIT: It's also worth noting that even after the recovery happens, GPU performance is pretty terrible until the system is rebooted cleanly.

System Info:

# screenfetch -n
 OS: Arch Linux 
 Kernel: x86_64 Linux 5.5.10-arch1-1
 Uptime: 52m
 Packages: 1538
 Shell: zsh 5.8
 Resolution: 6400x1440
 DE: KDE 5.68.0 / Plasma 5.18.3
 WM: KWin
 GTK Theme: Breeze [GTK2/3]
 Icon Theme: breeze-dark
 Disk: 2.8T / 3.5T (81%)
 CPU: AMD Ryzen Threadripper 1900X 8-Core @ 16x 3.8GHz
 GPU: AMD NAVI10 (DRM 3.36.0, 5.5.10-arch1-1, LLVM 9.0.1)
 RAM: 5608MiB / 64243MiB

# lspci -v -s 45:00.0                                     
45:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT] (rev c1) (prog-if 00 [VGA controller])
	Subsystem: Sapphire Technology Limited Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT]
	Flags: bus master, fast devsel, latency 0, IRQ 129
	Memory at 80000000 (64-bit, prefetchable) [size=256M]
	Memory at 90000000 (64-bit, prefetchable) [size=2M]
	I/O ports at 3000 [size=256]
	Memory at 9e700000 (32-bit, non-prefetchable) [size=512K]
	Expansion ROM at 000c0000 [disabled] [size=128K]
	Capabilities: <access denied>
	Kernel driver in use: amdgpu
	Kernel modules: amdgpu

Any suggestions would be greatly appreciated!

EDIT: After some more Googling, I think I may be running into a known bug: https://gitlab.freedesktop.org/drm/amd/issues/892

Last edited by cypher_zero (2020-03-22 21:31:52)


#2 2020-03-22 21:31:33

cypher_zero
Member
Registered: 2014-10-23
Posts: 50

Re: [SOLVED?] Frequent Freezes/Crashes with AMD 5700 XT

I believe I've solved the issue. It appears that the card needed a second, separate power line from the PSU to provide sufficient power.  Hoping I'm not declaring victory too soon, but I just played Doom Eternal for over an hour and no crashes; here's hoping that solved it.


#3 2020-04-23 12:47:08

nabos
Member
Registered: 2020-04-23
Posts: 1

Re: [SOLVED?] Frequent Freezes/Crashes with AMD 5700 XT

cypher_zero wrote:

I believe I've solved the issue. It appears that the card needed a second, separate power line from the PSU to provide sufficient power.  Hoping I'm not declaring victory too soon, but I just played Doom Eternal for over an hour and no crashes; here's hoping that solved it.

Any updates on this?
