#1 2021-11-12 22:54:23

DemonicSavage
Member
Registered: 2016-10-20
Posts: 7

AMD RX 6700-XT GPU randomly turns screens black, requiring a reboot.

Starting a few days ago, my GPU seemingly crashes at random, after anywhere from a few minutes to a few hours of uptime.
The computer otherwise keeps working, but there is no video output at all and the monitors lose signal.

I have a Ryzen 5 5600X, a RX 6700-XT, and 32 GiB of RAM.

Here are some relevant logs from journalctl:

Nov 12 18:31:28 Belphegor kernel: [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR* [CRTC:80:crtc-1] flip_done timed out
Nov 12 18:31:30 Belphegor kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=703488, emitted seq=703490
Nov 12 18:31:30 Belphegor kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Discord pid 86701 thread Discord:cs0 pid 86709
Nov 12 18:31:30 Belphegor kernel: amdgpu 0000:08:00.0: amdgpu: GPU reset begin!
Nov 12 18:31:34 Belphegor kernel: amdgpu 0000:08:00.0: amdgpu: failed to suspend display audio
Nov 12 18:31:34 Belphegor kernel: amdgpu 0000:08:00.0: amdgpu: SMU: response:0xFFFFFFFF for index:41 param:0x00000000 message:DisallowGfxOff?
Nov 12 18:31:34 Belphegor kernel: amdgpu 0000:08:00.0: amdgpu: Failed to disable gfxoff!
Nov 12 18:31:34 Belphegor kernel: [drm] REG_WAIT timeout 1us * 1000 tries - dcn20_dpp_pg_control line:434
Nov 12 18:31:34 Belphegor kernel: [drm] REG_WAIT timeout 1us * 1000 tries - dcn20_hubp_pg_control line:508
Nov 12 18:31:34 Belphegor kernel: [drm:dcn20_wait_for_blank_complete [amdgpu]] *ERROR* DC: failed to blank crtc!
Nov 12 18:31:34 Belphegor kernel: [drm] REG_WAIT timeout 1us * 1000 tries - dcn20_dpp_pg_control line:442
Nov 12 18:31:34 Belphegor kernel: [drm] REG_WAIT timeout 1us * 1000 tries - dcn20_hubp_pg_control line:516
Nov 12 18:31:34 Belphegor kernel: [drm:dcn20_wait_for_blank_complete [amdgpu]] *ERROR* DC: failed to blank crtc!
Nov 12 18:31:34 Belphegor kernel: [drm:psp_ring_cmd_submit [amdgpu]] *ERROR* ring_buffer_start = 000000002e2d4f10; ring_buffer_end = 0000000075bea464; write_frame = 00000000457a5668
Nov 12 18:31:34 Belphegor kernel: [drm:psp_ring_cmd_submit [amdgpu]] *ERROR* write_frame is pointing to address out of bounds
Nov 12 18:31:35 Belphegor kernel: [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Nov 12 18:31:35 Belphegor kernel: [drm] REG_WAIT timeout 1us * 100000 tries - optc1_disable_crtc line:544
Nov 12 18:31:35 Belphegor kernel: BUG: unable to handle page fault for address: ffffb7aba05206f8
Nov 12 18:31:35 Belphegor kernel: #PF: supervisor read access in kernel mode
Nov 12 18:31:35 Belphegor kernel: #PF: error_code(0x0000) - not-present page
Nov 12 18:31:35 Belphegor kernel: PGD 100000067 P4D 100000067 PUD 0
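
For anyone collecting the same evidence, here is a minimal sketch of a filter that keeps only the error-looking lines from a kernel log like the one above. The function name amdgpu_errors and the choice of patterns are assumptions made for illustration; they are not part of amdgpu or journalctl.

```shell
# Hypothetical helper (not from the thread): read kernel log text on stdin
# and print only the lines that look like the failures quoted above.
# The pattern list is an assumption; extend it for other messages.
amdgpu_errors() {
    grep -E '\*ERROR\*|GPU reset|REG_WAIT timeout|BUG:'
}
```

It reads from stdin, so the kernel log of the previous boot (the one that crashed) can be piped in:

    journalctl -k -b -1 --no-pager | amdgpu_errors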

Offline

#2 2021-11-13 07:49:35

m6x
Member
From: Germany
Registered: 2020-04-01
Posts: 15

Re: AMD RX 6700-XT GPU randomly turns screens black, requiring a reboot.

I've had this kind of problem for a long time with 2 different PCs and AMD GPUs (the older one was Zen1 based with a Vega 64 and the new one is Zen3 based with a RX 5700 XT). I've never been able to completely solve the issue, so it still occurs, but it occurs very rarely now so that I can tolerate it.
I've searched quite a bit for solutions to this and it's really mostly voodoo stuff. But I'll tell you the things I've tried personally, some of which might help according to other posts on the web, but they also might not help in your specific case. You probably have to try several things until the problem either goes away or almost goes away for your specific setup.
So in no particular order, these are the things I've done which mitigated the problem for me at least (it's likely that some of those things are completely irrelevant, but after doing all that the situation improved massively for me, so I'll just keep it that way):

  • Ensure you have the latest UEFI firmware and software updates in general

  • Disable any overclocking if you have any

  • Use the kernel parameters:

    amdgpu.noretry=0 amdgpu.lockup_timeout=1000 amdgpu.gpu_recovery=1 amdgpu.audio=0

    (audio=0 will disable the HDMI audio feature). You can also try

    iommu=pt

    or

    iommu=soft

    (IOMMU must be on or auto in UEFI). Another thing to try:

    pcie_aspm=off

    (that will disable a PCIe power management feature). I also use

    processor.max_cstate=5

    because my Ryzens seem to generally have issues with the C6 power saving state. But that has probably nothing to do with the GPU issue.

  • In UEFI, set PCIe slot generation from "Auto" to "Gen4" (which is probably what you have) and set power supply current idle control to "Typical". You can also try to disable even more power saving features in UEFI, but for me that didn't really help.

  • Try

    echo "high" > /sys/class/drm/card0/device/power_dpm_force_performance_level

    (default is "auto")
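
A minimal sketch of the last tip as a reusable function, under a few assumptions: the name set_dpm_level is made up for illustration, the default path is the card0 one from the tip (the card index may differ on machines with more than one GPU), and only a deliberately narrow subset of levels is accepted.

```shell
# Hypothetical helper: write a performance level to the amdgpu sysfs knob.
# $1 = level, $2 = optional path to the knob (defaults to the card0 path).
set_dpm_level() {
    level="$1"
    knob="${2:-/sys/class/drm/card0/device/power_dpm_force_performance_level}"
    case "$level" in
        auto|low|high) ;;   # narrow subset, chosen for this sketch
        *) echo "unsupported level: $level" >&2; return 1 ;;
    esac
    # Writing the real knob needs root. Note that "sudo echo ... > file" does
    # not help, because the redirection is done by the unprivileged shell.
    printf '%s\n' "$level" > "$knob"
}
```

Run from a root shell, usage would look like this (the value does not survive a reboot, so it has to be reapplied):

    set_dpm_level high
    cat /sys/class/drm/card0/device/power_dpm_force_performance_level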

As you can see, most of the tips I've found have to do with power saving. With the above tips, this issue now appears only very rarely (about once every 1-3 months) instead of every 1-2 days, which is of course a massive improvement and makes the system very usable. I still haven't found an explanation for why this occurs to begin with. Faulty hardware could also be a factor. There are even more voodoo tips out there. Good luck. ;)
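
Since several of the tips above are kernel parameters, it helps to confirm after a reboot that they actually reached the kernel. The following is a small sketch, assuming the parameters were added through the boot loader; the function name has_kernel_param is hypothetical.

```shell
# Hypothetical helper: succeed if the exact parameter (e.g. "iommu=pt")
# appears as a whole token on the kernel command line.
# $1 = parameter, $2 = optional file to read instead of /proc/cmdline.
has_kernel_param() {
    param="$1"
    cmdline_file="${2:-/proc/cmdline}"
    # One token per line, then an exact fixed-string match on the whole line.
    tr -s ' ' '\n' < "$cmdline_file" | grep -Fxq -- "$param"
}
```

For example:

    has_kernel_param amdgpu.gpu_recovery=1 && echo "active"

Matching whole tokens avoids false positives such as iommu=p matching iommu=pt.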

Last edited by m6x (2021-11-13 07:50:27)


int pi = 3;

Offline

#3 2021-11-13 08:07:59

orlfman
Member
Registered: 2007-11-20
Posts: 98

Re: AMD RX 6700-XT GPU randomly turns screens black, requiring a reboot.

Have you tried your system in Windows? If you have the same problems in Windows, it's probably a bad video card.

Offline

#4 2021-11-22 19:14:06

prurigro
Member
Registered: 2008-03-14
Posts: 11

Re: AMD RX 6700-XT GPU randomly turns screens black, requiring a reboot.

This started happening to me with my 6800x-xt with 5.15.x using the zen kernel. Downgrading to the zen version of 5.14 resolves the issue for me, and I'm currently testing the stock kernel to see if it happens there too.

EDIT: I should add that while it seems like the computer continues to be functional, what actually happens for me is the display turns off and the computer reboots, but the display doesn't turn back on until I power off and start it back up.

EDIT 2: Looks like stock 5.15.x has the same issue :(

EDIT 3: On the chance it's not the GPU, I should also add that I have a Ryzen 5900x
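
When comparing kernels like this, it is easy to lose track of which series is actually booted. A tiny sketch for that, assuming release strings shaped like the ones uname -r prints; the function name kernel_series is hypothetical.

```shell
# Hypothetical helper: print the major.minor series of a kernel release.
# $1 = release string; defaults to the running kernel.
kernel_series() {
    release="${1:-$(uname -r)}"
    printf '%s\n' "$release" | cut -d. -f1,2
}
```

Called without arguments it reports the running kernel, so after a downgrade it should print 5.14 rather than 5.15:

    kernel_series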

Last edited by prurigro (2021-11-22 19:56:31)

Offline

#5 2021-11-24 14:50:32

Yukiseekyo
Member
Registered: 2017-12-07
Posts: 11

Re: AMD RX 6700-XT GPU randomly turns screens black, requiring a reboot.

I have the same issue, but only when I either lock the screen or blank it. I could still SSH into the machine, but couldn't shut it down.

Here are my specs:

OS: Arch Linux x86_64
Kernel: 5.14.16-zen1-1-zen
CPU: Intel Xeon E5-1650 v2 (12) @ 4.000GHz
GPU: AMD ATI Radeon RX 6600/6600 XT/6600M
Memory: 765MiB / 32037MiB

Offline

#6 Today 14:33:09

a1ex
Member
From: Germany
Registered: 2007-02-16
Posts: 89

Re: AMD RX 6700-XT GPU randomly turns screens black, requiring a reboot.

Yep, 6600XT reporting in as broken.


When it tries to turn on the screens from standby (suspend or just idle, doesn't matter) they just flicker and go back off immediately. Turning one off and on might yield a picture, but it's frozen.
Journal gets spammed with this:

[drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3

After a while, timeout call traces into amdgpu and gpu_sched appear in the journal. I don't see a way to recover from there except forcing the machine off.

Offline
