#1 2021-11-12 22:54:23

DemonicSavage
Member
Registered: 2016-10-20
Posts: 8

AMD RX 6700-XT GPU randomly turns screens black, requiring a reboot.

Starting a few days ago, my GPU seemingly crashes at random after a few minutes or hours of uptime.
The computer otherwise keeps working, but there is no video output at all and the monitors lose signal.

I have a Ryzen 5 5600X, an RX 6700 XT, and 32 GiB of RAM.

Here are some useful logs from journalctl:

Nov 12 18:31:28 Belphegor kernel: [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] *ERROR* [CRTC:80:crtc-1] flip_done timed out
Nov 12 18:31:30 Belphegor kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=703488, emitted seq=703490
Nov 12 18:31:30 Belphegor kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Discord pid 86701 thread Discord:cs0 pid 86709
Nov 12 18:31:30 Belphegor kernel: amdgpu 0000:08:00.0: amdgpu: GPU reset begin!
Nov 12 18:31:34 Belphegor kernel: amdgpu 0000:08:00.0: amdgpu: failed to suspend display audio
Nov 12 18:31:34 Belphegor kernel: amdgpu 0000:08:00.0: amdgpu: SMU: response:0xFFFFFFFF for index:41 param:0x00000000 message:DisallowGfxOff?
Nov 12 18:31:34 Belphegor kernel: amdgpu 0000:08:00.0: amdgpu: Failed to disable gfxoff!
Nov 12 18:31:34 Belphegor kernel: [drm] REG_WAIT timeout 1us * 1000 tries - dcn20_dpp_pg_control line:434
Nov 12 18:31:34 Belphegor kernel: [drm] REG_WAIT timeout 1us * 1000 tries - dcn20_hubp_pg_control line:508
Nov 12 18:31:34 Belphegor kernel: [drm:dcn20_wait_for_blank_complete [amdgpu]] *ERROR* DC: failed to blank crtc!
Nov 12 18:31:34 Belphegor kernel: [drm] REG_WAIT timeout 1us * 1000 tries - dcn20_dpp_pg_control line:442
Nov 12 18:31:34 Belphegor kernel: [drm] REG_WAIT timeout 1us * 1000 tries - dcn20_hubp_pg_control line:516
Nov 12 18:31:34 Belphegor kernel: [drm:dcn20_wait_for_blank_complete [amdgpu]] *ERROR* DC: failed to blank crtc!
Nov 12 18:31:34 Belphegor kernel: [drm:psp_ring_cmd_submit [amdgpu]] *ERROR* ring_buffer_start = 000000002e2d4f10; ring_buffer_end = 0000000075bea464; write_frame = 00000000457a5668
Nov 12 18:31:34 Belphegor kernel: [drm:psp_ring_cmd_submit [amdgpu]] *ERROR* write_frame is pointing to address out of bounds
Nov 12 18:31:35 Belphegor kernel: [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Nov 12 18:31:35 Belphegor kernel: [drm] REG_WAIT timeout 1us * 100000 tries - optc1_disable_crtc line:544
Nov 12 18:31:35 Belphegor kernel: BUG: unable to handle page fault for address: ffffb7aba05206f8
Nov 12 18:31:35 Belphegor kernel: #PF: supervisor read access in kernel mode
Nov 12 18:31:35 Belphegor kernel: #PF: error_code(0x0000) - not-present page
Nov 12 18:31:35 Belphegor kernel: PGD 100000067 P4D 100000067 PUD 0
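
For anyone collecting similar logs, here is a minimal sketch of how the GPU-related lines could be pulled out of the journal. The function name filter_gpu_lines is made up for illustration; journalctl's -k (kernel messages only) and -b -1 (previous boot) are standard options, and the previous boot is usually the interesting one after a forced reboot.

```shell
#!/bin/sh
# Keep only the lines that mention amdgpu or the DRM subsystem.
# Reads log text on stdin, so it works on saved logs as well as live output.
filter_gpu_lines() {
    grep -E 'amdgpu|\[drm'
}

# Example (kernel messages from the previous boot):
#   journalctl -k -b -1 --no-pager | filter_gpu_lines
```

The filter is deliberately broad; it only narrows the journal down and does not try to judge which line is the root cause.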

#2 2021-11-13 07:49:35

m6x
Member
From: Germany
Registered: 2020-04-01
Posts: 18

I've had this kind of problem for a long time with two different PCs and AMD GPUs (the older one was Zen 1 based with a Vega 64, and the new one is Zen 3 based with an RX 5700 XT). I've never been able to completely solve the issue, so it still occurs, but rarely enough now that I can tolerate it.
I've searched quite a bit for solutions to this and it's really mostly voodoo stuff. But I'll tell you the things I've tried personally, some of which might help according to other posts on the web, but they also might not help in your specific case. You probably have to try several things until the problem either goes away or almost goes away for your specific setup.
So in no particular order, these are the things I've done which mitigated the problem for me at least (it's likely that some of those things are completely irrelevant, but after doing all that the situation improved massively for me, so I'll just keep it that way):

  • Ensure you have the latest UEFI firmware and software updates in general

  • Disable any overclocking if you have any

  • Use the kernel parameters:

    amdgpu.noretry=0 amdgpu.lockup_timeout=1000 amdgpu.gpu_recovery=1 amdgpu.audio=0

    (audio=0 will disable the HDMI audio feature). You can also try

    iommu=pt

    or

    iommu=soft

    (IOMMU must be on or auto in UEFI). Another thing to try:

    pcie_aspm=off

    (that will disable a PCIe power management feature). I also use

    processor.max_cstate=5

    because my Ryzens seem to generally have issues with the C6 power saving state. But that probably has nothing to do with the GPU issue.

  • In UEFI, set PCIe slot generation from "Auto" to "Gen4" (which is probably what you have) and set power supply current idle control to "Typical". You can also try to disable even more power saving features in UEFI, but for me that didn't really help.

  • Try

    echo "high" > /sys/class/drm/card0/device/power_dpm_force_performance_level

    (default is "auto")
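
The last tip could be wrapped in a small helper like the sketch below. The function name set_perf_level and its optional second argument are made up for illustration, and the accepted values are only a subset of what the driver supports. Writing to the real sysfs node requires root.

```shell
#!/bin/sh
# Hypothetical helper: write a DPM performance level to a sysfs-style file.
# $1 = level, $2 = target file (defaults to card0's sysfs node).
set_perf_level() {
    level=$1
    node=${2:-/sys/class/drm/card0/device/power_dpm_force_performance_level}
    case $level in
        auto|low|high|manual) ;;
        *) echo "unsupported level: $level" >&2; return 1 ;;
    esac
    # Must run as root when the target is the real sysfs node.
    printf '%s\n' "$level" > "$node"
}
```

From a normal user shell, "sudo echo high > file" does not work because the redirection is performed by the unprivileged shell; piping into tee (echo high | sudo tee /sys/class/drm/card0/device/power_dpm_force_performance_level) is the usual way around that. The setting does not survive a reboot.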

Most of the tips I've found have to do with various power saving features, as you can see. With the above tips the issue now appears only very rarely (about once every 1-3 months) instead of every 1-2 days, which is of course a massive improvement and makes the system very usable. I still have found no explanation for why this occurs to begin with. Faulty hardware could also be a cause. There are even more voodoo tips out there. Good luck. ;)
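
Since kernel parameters only take effect after the bootloader configuration is regenerated and the machine is rebooted, it can be worth confirming that they actually arrived. A minimal sketch, assuming the running kernel's command line is read from /proc/cmdline; the function name check_cmdline is made up for illustration.

```shell
#!/bin/sh
# Report whether each given parameter appears as a whole token in a
# kernel command line file. $1 = file (e.g. /proc/cmdline), rest = parameters.
check_cmdline() {
    file=$1
    shift
    missing=0
    for param in "$@"; do
        # Split the command line into one token per line and compare exactly.
        if tr ' ' '\n' < "$file" | grep -qxF -- "$param"; then
            echo "present: $param"
        else
            echo "missing: $param"
            missing=1
        fi
    done
    return "$missing"
}

# Example:
#   check_cmdline /proc/cmdline amdgpu.gpu_recovery=1 iommu=pt pcie_aspm=off
```

The function returns a non-zero status if any parameter is missing, so it can be used in a script as well as interactively.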

Last edited by m6x (2021-11-13 07:50:27)


int pi = 3;

#3 2021-11-13 08:07:59

orlfman
Member
Registered: 2007-11-20
Posts: 99

Have you tried your system in Windows? If you have the same problems in Windows, it's probably a bad video card.

#4 2021-11-22 19:14:06

prurigro
Member
Registered: 2008-03-14
Posts: 16

This started happening to me with my 6800 XT on 5.15.x using the zen kernel. Downgrading to the zen version of 5.14 resolves the issue for me, and I'm currently testing the stock kernel to see if it happens there too.

EDIT: I should add that while it seems like the computer continues to be functional, what actually happens for me is the display turns off and the computer reboots, but the display doesn't turn back on until I power off and start it back up.

EDIT 2: Looks like stock 5.15.x has the same issue :(

EDIT 3: On the chance it's not the GPU, I should also add that I have a Ryzen 5900X

Last edited by prurigro (2021-11-22 19:56:31)

#5 2021-11-24 14:50:32

Yukiseekyo
Member
Registered: 2017-12-07
Posts: 12

I have the same issue, but only when I either lock the screen or blank it. I could still ssh into the machine, but I couldn't shut it down.

Here are my specs:

OS: Arch Linux x86_64
Kernel: 5.14.16-zen1-1-zen
CPU: Intel Xeon E5-1650 v2 (12) @ 4.000GHz
GPU: AMD ATI Radeon RX 6600/6600 XT/6600M
Memory: 765MiB / 32037MiB

#6 2021-12-02 14:33:09

a1ex
Member
From: Germany
Registered: 2007-02-16
Posts: 90

Yep, 6600XT reporting in as broken.


When it tries to turn on the screens from standby (suspend or just idle, doesn't matter) they just flicker and go back off immediately. Turning one off and on might yield a picture, but it's frozen.
Journal gets spammed with this:

[drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3

After a while, timeout call traces into amdgpu and gpu_sched appear in the journal. I don't see a way to recover from there except forcing the machine off.

#7 2021-12-02 23:16:03

prurigro
Member
Registered: 2008-03-14
Posts: 16

This is sounding like a number of very similar but different issues. @a1ex, are you only seeing this on 5.15? (I see @Yukiseekyo is hitting theirs on 5.14.)

#8 2021-12-03 04:55:22

Yukiseekyo
Member
Registered: 2017-12-07
Posts: 12

Only on 5.15; 5.14 works fine for me, and 5.16rc works as well.

#9 2021-12-03 16:06:09

prurigro
Member
Registered: 2008-03-14
Posts: 16

Good to hear that 5.16 isn't triggering it for you! Hopefully that means they solved whatever the issue was.

#10 2021-12-04 15:25:52

a1ex
Member
From: Germany
Registered: 2007-02-16
Posts: 90

That gives me hope. The best I could find is this bisection effort, which apparently still failed in the end:
https://lore.kernel.org/regressions/8e4 … uin.co.uk/

I haven't yet found a pattern with this issue. It tends to work for a while after a reboot, then some display wakeup suddenly kills the GPU driver.

From the description and the versions, this is very likely a different issue from the thread starter's, though. (I first noticed it on some 5.15 version.)

#11 2021-12-06 18:39:57

prurigro
Member
Registered: 2008-03-14
Posts: 16

@a1ex: Yeah, it seems like that lines up more closely with your issue. It doesn't seem out of the question that more than one issue would have appeared in 5.15 considering all the AMD changes, but hopefully they're connected enough to be fixed together. Have you tested the 5.16 release candidate yet?

#12 2021-12-12 09:30:15

giulivo
Member
Registered: 2014-01-05
Posts: 4

a1ex wrote:

Yep, 6600XT reporting in as broken.

When it tries to turn on the screens from standby (suspend or just idle, doesn't matter) they just flicker and go back off immediately. Turning one off and on might yield a picture, but it's frozen.
Journal gets spammed with this:

[drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3

After a while, timeout call traces into amdgpu and gpu_sched appear in the journal. I don't see a way to recover from there except forcing the machine off.

I am seeing this same behavior with an XFX Swift 6600 XT and kernel 5.15.7. I haven't tried rolling back the kernel or any of the other power saving "optimizations" suggested in previous posts.

What seems to have improved the situation is switching the "power profile" to "compute" in /sys/class/drm/card0/device/pp_power_profile_mode, which appears to prevent both the GPU and memory clocks from dropping below a certain frequency. Maybe that has some indirect consequences on power saving features.
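
A minimal sketch of how the profile switch could be scripted, under some assumptions: reading pp_power_profile_mode prints a table whose rows start with a numeric index followed by the profile name (with "*" marking the active one), and the index for COMPUTE can differ between GPU generations. The function name profile_index is made up for illustration.

```shell
#!/bin/sh
# Print the numeric index of a named profile from a pp_power_profile_mode-style
# table. $1 = file to read, $2 = profile name (e.g. COMPUTE).
profile_index() {
    awk -v name="$2" '
        $1 ~ /^[0-9]+$/ {
            n = $2
            sub(/[*:]+$/, "", n)   # drop the "*" active marker and trailing ":"
            if (n == name) { print $1; exit }
        }
    ' "$1"
}

# Example (as root), assuming card0 is the affected GPU:
#   dev=/sys/class/drm/card0/device
#   idx=$(profile_index "$dev/pp_power_profile_mode" COMPUTE)
#   echo manual > "$dev/power_dpm_force_performance_level"
#   echo "$idx" > "$dev/pp_power_profile_mode"
```

As far as I recall from the amdgpu documentation, power_dpm_force_performance_level has to be set to manual before a profile index is accepted; treat that step as an assumption and check the kernel docs for your version. The setting is lost on reboot.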

#13 2021-12-12 14:55:24

stanczew
Member
Registered: 2021-03-02
Posts: 16

I'm not sure if my issue is connected to various ones described in this thread, but it was so frustrating for me that I'll post my observations anyway – maybe they'll help someone.

I have a 5700 XT that I've been using for 1.5 years without many issues. However recently (about a month ago, similarly to OP) I started seeing much more frequent crashes/hangs. In some cases the screen turned black, but the PC could still be pinged and Magic SysRq keys worked. But most of the time the PC completely froze – not even SysRq could reboot it, I had to physically hit the reset switch. And worst of all, in such cases there were not even any logs in the journal.

Around that time Arch updated the main kernel from 5.14.16 to 5.15.2. So, I tried with a newer kernel (linux-mainline 5.16rc3), and with an older one (linux-lts 5.10.83), but in both cases the hangs still occurred.
I also tried linking this to Firefox, as I saw page crashes (mostly videos/streams) happen more often than usual, and most of the time when the PC hung I was actively using Firefox. But I managed to catch a hang with Firefox closed, so that was not it.

Defeated, I started thinking about swapping my PC components one by one to see if it's related to any specific piece of hardware. But then I got one soft crash, and in the journal there were some logs about X crashing. This gave me an idea; I took another look at pacman.log and found this:

[2021-11-10T09:37:35+0100] [ALPM] upgraded xorg-server-common (1.20.13-3 -> 21.1.1-2)
[2021-11-10T09:37:35+0100] [ALPM] upgraded xorg-server (1.20.13-3 -> 21.1.1-2)

It's a major version update (which introduced some regressions), and this date is awfully close to the moment when I started seeing the hangs.
Since I wanted to try out Wayland for some time, I decided to just do it and see if it has any impact on the issue. I've been running it for 4 days now, and the hard hangs are gone. I had two soft lockups during this time, but they might just be different issues.
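
The same kind of check can be repeated for any suspect date. A minimal sketch, assuming pacman's default log location /var/log/pacman.log and the "[ALPM] upgraded" entry format shown above; the function name upgrades_on is made up for illustration.

```shell
#!/bin/sh
# List "[ALPM] upgraded" entries for a given day.
# $1 = date as YYYY-MM-DD, $2 = log file (defaults to pacman's usual log path).
upgrades_on() {
    grep -F "[$1" "${2:-/var/log/pacman.log}" | grep -F '[ALPM] upgraded'
}

# Example:
#   upgrades_on 2021-11-10
```

Comparing that list against the first day the hangs appeared narrows down which package updates are worth testing a rollback for.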

My suggestion is that if you have been experiencing more frequent hangs since around a month ago, there is a possibility that they were introduced by the update of xorg-server to 21.1.1.
To confirm/deny this, you would need to build the older version of xorg-server (1.20.13) and see if the problem persists there. (Personally I'm not interested in doing this, as I'm happily on Wayland now – and I'm just hoping I didn't speak too soon about there being no hangs here.)
Or maybe I'm completely wrong and I just got lucky. In any case, I guess it's worth trying if you're out of other options.

#14 2021-12-13 18:06:57

jnsgruk
Member
Registered: 2021-12-13
Posts: 1

I've had two hard lock ups on a 3800X / 5700XT system today. Never had issues with either, but since 5.15.7 it looks like something is bork.

Running Plasma & Chrome both times it happened. Output from the kernel ring buffer/dmesg is here: https://pastebin.com/ULeT68h2

Snippet here:

[ 4948.937249] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring vcn_dec timeout, signaled seq=419914, emitted seq=419916
[ 4948.937594] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process chrome pid 4894 thread chrome:cs0 pid 4914
[ 4948.937916] amdgpu 0000:2f:00.0: amdgpu: GPU reset begin!
[ 4952.937938] amdgpu 0000:2f:00.0: amdgpu: failed to suspend display audio
[ 4953.384351] [drm] Register(0) [mmUVD_POWER_STATUS] failed to reach value 0x00000001 != 0x00000002
[ 4953.636890] [drm] Register(0) [mmUVD_RBC_RB_RPTR] failed to reach value 0x00000290 != 0x00000230
[ 4953.888882] [drm] Register(0) [mmUVD_POWER_STATUS] failed to reach value 0x00000001 != 0x00000002
[ 4953.915067] [drm] free PSP TMR buffer
[ 4953.948647] amdgpu 0000:2f:00.0: amdgpu: BACO reset
[ 4957.098186] amdgpu 0000:2f:00.0: amdgpu: GPU reset succeeded, trying to resume
[ 4957.098372] [drm] PCIE GART of 512M enabled (table at 0x0000008001FA4000).
[ 4957.098397] [drm] VRAM is lost due to GPU reset!
[ 4957.098691] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[ 4957.098974] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[ 4957.099142] [drm] PSP is resuming...
[ 4957.138749] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[ 4957.267584] [drm] reserve 0x900000 from 0x81f1800000 for PSP TMR
[ 4957.306834] amdgpu 0000:2f:00.0: amdgpu: RAS: optional ras ta ucode is not available
[ 4957.310998] amdgpu 0000:2f:00.0: amdgpu: RAP: optional rap ta ucode is not available
[ 4957.310999] amdgpu 0000:2f:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
[ 4957.311001] amdgpu 0000:2f:00.0: amdgpu: SMU is resuming...
[ 4957.313401] amdgpu 0000:2f:00.0: amdgpu: SMU is resumed successfully!
[ 4957.474956] [drm] kiq ring mec 2 pipe 1 q 0
[ 4957.475502] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[ 4957.475974] [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
[ 4957.476958] [drm] VCN decode and encode initialized successfully(under DPG Mode).
[ 4957.477295] [drm] JPEG decode initialized successfully.
[ 4957.477358] amdgpu 0000:2f:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[ 4957.477360] amdgpu 0000:2f:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[ 4957.477361] amdgpu 0000:2f:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[ 4957.477362] amdgpu 0000:2f:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
[ 4957.477363] amdgpu 0000:2f:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
[ 4957.477364] amdgpu 0000:2f:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
[ 4957.477365] amdgpu 0000:2f:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
[ 4957.477366] amdgpu 0000:2f:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
[ 4957.477367] amdgpu 0000:2f:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
[ 4957.477368] amdgpu 0000:2f:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
[ 4957.477369] amdgpu 0000:2f:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[ 4957.477370] amdgpu 0000:2f:00.0: amdgpu: ring sdma1 uses VM inv eng 13 on hub 0
[ 4957.477371] amdgpu 0000:2f:00.0: amdgpu: ring vcn_dec uses VM inv eng 0 on hub 1
[ 4957.477372] amdgpu 0000:2f:00.0: amdgpu: ring vcn_enc0 uses VM inv eng 1 on hub 1
[ 4957.477372] amdgpu 0000:2f:00.0: amdgpu: ring vcn_enc1 uses VM inv eng 4 on hub 1
[ 4957.477373] amdgpu 0000:2f:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 1
[ 4957.480317] amdgpu 0000:2f:00.0: amdgpu: recover vram bo from shadow start
[ 4957.480386] amdgpu 0000:2f:00.0: amdgpu: recover vram bo from shadow done
[ 4957.480393] [drm] Skip scheduling IBs!
[ 4957.480394] [drm] Skip scheduling IBs!
[ 4957.480406] amdgpu 0000:2f:00.0: amdgpu: GPU reset(1) succeeded!
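
The first two lines of a snippet like the one above are usually the most informative: which ring timed out and which process had submitted the job. A minimal sketch for extracting just those two fields from dmesg or journal output; the function names timed_out_rings and blamed_processes are made up for illustration.

```shell
#!/bin/sh
# Print the ring name from each "ring <name> timeout" line read on stdin.
timed_out_rings() {
    sed -n 's/.*\*ERROR\* ring \([^ ]*\) timeout.*/\1/p'
}

# Print the process name from each "Process information" line read on stdin.
blamed_processes() {
    sed -n 's/.*Process information: process \([^ ]*\) pid .*/\1/p'
}

# Example:
#   dmesg | timed_out_rings | sort | uniq -c
#   dmesg | blamed_processes | sort | uniq -c
```

The named process is only the one whose job timed out; it is not necessarily the cause of the hang.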

Have downgraded to using linux-lts for now

#15 2021-12-14 11:30:48

azurite27
Member
Registered: 2021-12-14
Posts: 4

I have a 6600 XT with an i5-8600 CPU, using KDE + SDDM.
After logging in from SDDM, every time I try to shut down my computer in KDE, the DP output loses signal, but the computer's power LED does not go out.
Even after I downgraded to linux-lts, which is a 5.10.x kernel, SDDM won't appear.
If I use linux-hardened, which is a 5.14.x kernel, it enters KDE and shuts down normally.

This is the log produced using the latest kernel (5.15.7 at the time of writing):

kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1 timeout, signaled seq=437, emitted seq=438
Dec 14 19:11:18 Falcon-PC kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
Dec 14 19:11:18 Falcon-PC kernel: amdgpu 0000:03:00.0: amdgpu: GPU reset begin!
Dec 14 19:11:18 Falcon-PC kernel: [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Dec 14 19:11:18 Falcon-PC kernel: [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Dec 14 19:11:19 Falcon-PC kernel: [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Dec 14 19:11:19 Falcon-PC kernel: [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Dec 14 19:11:19 Falcon-PC kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=357, emitted seq=358
Dec 14 19:11:19 Falcon-PC kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
Dec 14 19:11:19 Falcon-PC kernel: amdgpu 0000:03:00.0: amdgpu: GPU reset begin!
Dec 14 19:11:19 Falcon-PC kernel: amdgpu 0000:03:00.0: amdgpu: Bailing on TDR for s_job:165, as another already in progress
Dec 14 19:11:19 Falcon-PC kernel: [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Dec 14 19:11:19 Falcon-PC kernel: [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Dec 14 19:11:19 Falcon-PC kernel: [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Dec 14 19:11:19 Falcon-PC kernel: [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Dec 14 19:11:20 Falcon-PC kernel: [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Dec 14 19:11:20 Falcon-PC kernel: [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Dec 14 19:11:20 Falcon-PC kernel: [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Dec 14 19:11:20 Falcon-PC kernel: [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Dec 14 19:11:20 Falcon-PC kernel: [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Dec 14 19:11:20 Falcon-PC kernel: [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Dec 14 19:11:21 Falcon-PC kernel: [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Dec 14 19:11:21 Falcon-PC kernel: [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Dec 14 19:11:21 Falcon-PC kernel: [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Dec 14 19:11:21 Falcon-PC kernel: [drm] perform_link_training_with_retries: Link training attempt 1 of 4 failed
Dec 14 19:11:30 Falcon-PC kernel: [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Dec 14 19:11:30 Falcon-PC kernel: [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Dec 14 19:11:30 Falcon-PC kernel: [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Dec 14 19:11:30 Falcon-PC kernel: [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Dec 14 19:11:30 Falcon-PC kernel: [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Dec 14 19:11:31 Falcon-PC kernel: [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Dec 14 19:11:31 Falcon-PC kernel: [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Dec 14 19:11:31 Falcon-PC kernel: [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Dec 14 19:11:31 Falcon-PC kernel: [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Dec 14 19:11:31 Falcon-PC kernel: [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Dec 14 19:11:32 Falcon-PC kernel: [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Dec 14 19:11:32 Falcon-PC kernel: [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Dec 14 19:11:32 Falcon-PC kernel: [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Dec 14 19:11:32 Falcon-PC kernel: [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Dec 14 19:11:32 Falcon-PC kernel: [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Dec 14 19:11:32 Falcon-PC kernel: [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Dec 14 19:11:33 Falcon-PC kernel: [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Dec 14 19:11:33 Falcon-PC kernel: [drm] perform_link_training_with_retries: Link training attempt 2 of 4 failed
Dec 14 19:11:41 Falcon-PC kernel: [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Dec 14 19:11:42 Falcon-PC kernel: [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Dec 14 19:11:42 Falcon-PC kernel: [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Dec 14 19:11:42 Falcon-PC kernel: [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Dec 14 19:11:42 Falcon-PC kernel: [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Dec 14 19:11:42 Falcon-PC kernel: [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Dec 14 19:11:42 Falcon-PC kernel: [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Dec 14 19:11:43 Falcon-PC kernel: [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Dec 14 19:11:43 Falcon-PC kernel: [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Dec 14 19:11:43 Falcon-PC kernel: [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Dec 14 19:11:43 Falcon-PC kernel: [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Dec 14 19:11:43 Falcon-PC kernel: [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Dec 14 19:11:44 Falcon-PC kernel: [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Dec 14 19:11:44 Falcon-PC kernel: [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Dec 14 19:11:44 Falcon-PC kernel: [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Dec 14 19:11:44 Falcon-PC kernel: [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Dec 14 19:11:44 Falcon-PC kernel: [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Dec 14 19:11:44 Falcon-PC kernel: [drm] perform_link_training_with_retries: Link training attempt 3 of 4 failed
Dec 14 19:11:53 Falcon-PC kernel: [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Dec 14 19:11:53 Falcon-PC kernel: [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Dec 14 19:11:53 Falcon-PC kernel: [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Dec 14 19:11:54 Falcon-PC kernel: [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Dec 14 19:11:54 Falcon-PC kernel: [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Dec 14 19:11:54 Falcon-PC kernel: [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Dec 14 19:11:54 Falcon-PC kernel: [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Dec 14 19:11:54 Falcon-PC kernel: [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Dec 14 19:11:55 Falcon-PC kernel: [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Dec 14 19:11:55 Falcon-PC kernel: [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Dec 14 19:11:55 Falcon-PC kernel: [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Dec 14 19:11:55 Falcon-PC kernel: [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Dec 14 19:11:55 Falcon-PC kernel: [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Dec 14 19:11:55 Falcon-PC kernel: [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Dec 14 19:11:56 Falcon-PC kernel: [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Dec 14 19:11:56 Falcon-PC kernel: [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Dec 14 19:11:56 Falcon-PC kernel: [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Dec 14 19:11:56 Falcon-PC kernel: [drm] enabling link 1 failed: 15
Dec 14 19:12:00 Falcon-PC kernel: [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Dec 14 19:12:05 Falcon-PC kernel: [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Dec 14 19:12:07 Falcon-PC dbus-daemon[364]: [system] Activating via systemd: service name='org.freedesktop.nm_dispatcher' unit='dbus-org.freedesktop.nm-dispatcher.service' requested by ':1.3' (uid=0 pid=365 comm="/usr/bin/NetworkManager --no-daemon ")
Dec 14 19:12:07 Falcon-PC dbus-daemon[364]: [system] Activation via systemd failed for unit 'dbus-org.freedesktop.nm-dispatcher.service': Refusing activation, D-Bus is shutting down.
Dec 14 19:12:09 Falcon-PC kernel: [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Dec 14 19:12:15 Falcon-PC kernel: [drm:dc_dmub_srv_wait_idle [amdgpu]] *ERROR* Error waiting for DMUB idle: status=3
Dec 14 19:12:17 Falcon-PC kernel: amdgpu 0000:03:00.0: amdgpu: Failed to disable gfxoff!
Dec 14 19:12:29 Falcon-PC kernel: [drm:drm_atomic_helper_wait_for_flip_done] *ERROR* [CRTC:72:crtc-0] flip_done timed out
Dec 14 19:12:32 Falcon-PC kernel: amdgpu 0000:03:00.0: amdgpu: SMU: I'm not done with your previous command!
Dec 14 19:12:33 Falcon-PC systemd[1]: sddm.service: State 'stop-sigterm' timed out. Killing.
Dec 14 19:12:33 Falcon-PC systemd[1]: sddm.service: Killing process 389 (sddm) with signal SIGKILL.
Dec 14 19:12:33 Falcon-PC systemd[1]: sddm.service: Killing process 985 (Xorg) with signal SIGKILL.
Dec 14 19:12:33 Falcon-PC systemd[1]: sddm.service: Killing process 402 (QDBusConnection) with signal SIGKILL.
Dec 14 19:12:33 Falcon-PC systemd[1]: sddm.service: Main process exited, code=killed, status=9/KILL
░░ Subject: Unit process exited
░░ Defined-By: systemd
░░ Support: https://lists.freedesktop.org/mailman/listinfo/systemd-devel
░░ 
░░ An ExecStart= process belonging to unit sddm.service has exited.
░░ 
░░ The process' exit code is 'killed' and its exit status is 9.
Dec 14 19:12:35 Falcon-PC kernel: amdgpu 0000:03:00.0: amdgpu: SMU: I'm not done with your previous command!
Dec 14 19:12:36 Falcon-PC kernel: amdgpu 0000:03:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
Dec 14 19:12:36 Falcon-PC kernel: [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KGQ disable failed
Dec 14 19:12:37 Falcon-PC kernel: amdgpu 0000:03:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
Dec 14 19:12:37 Falcon-PC kernel: [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KCQ disable failed
Dec 14 19:12:37 Falcon-PC kernel: [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* failed to halt cp gfx
Dec 14 19:12:40 Falcon-PC kernel: amdgpu 0000:03:00.0: amdgpu: SMU: I'm not done with your previous command!
Dec 14 19:12:40 Falcon-PC kernel: amdgpu 0000:03:00.0: amdgpu: Failed to disable smu features.
Dec 14 19:12:40 Falcon-PC kernel: amdgpu 0000:03:00.0: amdgpu: Fail to disable dpm features!
Dec 14 19:12:40 Falcon-PC kernel: [drm:amdgpu_device_ip_suspend_phase2 [amdgpu]] *ERROR* suspend of IP block <smu> failed -62
Dec 14 19:12:40 Falcon-PC kernel: [drm] free PSP TMR buffer
Dec 14 19:12:41 Falcon-PC kernel: [drm] psp gfx command DESTROY_TMR(0x7) failed and response status is (0x80000306)
Dec 14 19:12:41 Falcon-PC kernel: amdgpu 0000:03:00.0: amdgpu: MODE1 reset
Dec 14 19:12:41 Falcon-PC kernel: amdgpu 0000:03:00.0: amdgpu: GPU mode1 reset
Dec 14 19:12:41 Falcon-PC kernel: amdgpu 0000:03:00.0: amdgpu: GPU smu mode1 reset
Dec 14 19:12:44 Falcon-PC kernel: amdgpu 0000:03:00.0: amdgpu: SMU: I'm not done with your previous command!
Dec 14 19:12:44 Falcon-PC kernel: amdgpu 0000:03:00.0: amdgpu: GPU mode1 reset failed
Dec 14 19:12:44 Falcon-PC kernel: amdgpu 0000:03:00.0: amdgpu: ASIC reset failed with error, -62 for drm dev, 0000:03:00.0
Dec 14 19:12:46 Falcon-PC kernel: [drm:drm_crtc_commit_wait] *ERROR* flip_done timed out
Dec 14 19:12:46 Falcon-PC kernel: [drm:drm_atomic_helper_wait_for_dependencies] *ERROR* [CRTC:72:crtc-0] commit wait timed out
Dec 14 19:12:55 Falcon-PC kernel: amdgpu 0000:03:00.0: amdgpu: GPU reset succeeded, trying to resume
Dec 14 19:12:55 Falcon-PC kernel: [drm] PCIE GART of 512M enabled (table at 0x0000008000300000).
Dec 14 19:12:55 Falcon-PC kernel: [drm] VRAM is lost due to GPU reset!
Dec 14 19:12:55 Falcon-PC kernel: [drm] PSP is resuming...
Dec 14 19:12:56 Falcon-PC kernel: [drm] failed to load ucode SMC(0x18) 
Dec 14 19:12:56 Falcon-PC kernel: [drm] psp gfx command LOAD_IP_FW(0x6) failed and response status is (0x80000306)
Dec 14 19:12:56 Falcon-PC kernel: [drm] reserve 0xa00000 from 0x81fe000000 for PSP TMR
Dec 14 19:12:56 Falcon-PC kernel: [drm:drm_crtc_commit_wait] *ERROR* flip_done timed out
Dec 14 19:12:56 Falcon-PC kernel: [drm:drm_atomic_helper_wait_for_dependencies] *ERROR* [CONNECTOR:93:DP-2] commit wait timed out
Dec 14 19:12:57 Falcon-PC dbus-daemon[364]: [system] Activating via systemd: service name='org.freedesktop.nm_dispatcher' unit='dbus-org.freedesktop.nm-dispatcher.service' requested by ':1.3' (uid=0 pid=365 comm="/usr/bin/NetworkManager --no-daemon ")
Dec 14 19:12:57 Falcon-PC dbus-daemon[364]: [system] Activation via systemd failed for unit 'dbus-org.freedesktop.nm-dispatcher.service': Refusing activation, D-Bus is shutting down.
Dec 14 19:12:58 Falcon-PC kernel: [drm] psp gfx command AUTOLOAD_RLC(0x21) failed and response status is (0x0)
Dec 14 19:12:58 Falcon-PC kernel: [drm:psp_load_non_psp_fw [amdgpu]] *ERROR* Failed to start rlc autoload
Dec 14 19:12:58 Falcon-PC kernel: [drm:psp_resume [amdgpu]] *ERROR* PSP resume failed
Dec 14 19:12:58 Falcon-PC kernel: [drm:amdgpu_device_fw_loading [amdgpu]] *ERROR* resume of IP block <psp> failed -22
Dec 14 19:12:58 Falcon-PC kernel: [drm] Skip scheduling IBs!
Dec 14 19:12:58 Falcon-PC kernel: [drm] Skip scheduling IBs!
Dec 14 19:12:58 Falcon-PC kernel: [drm] Skip scheduling IBs!
Dec 14 19:12:58 Falcon-PC kernel: amdgpu 0000:03:00.0: amdgpu: GPU reset(1) failed
Dec 14 19:12:58 Falcon-PC kernel: amdgpu 0000:03:00.0: amdgpu: GPU reset end with ret = -22
Dec 14 19:12:58 Falcon-PC kernel: snd_hda_intel 0000:03:00.1: refused to change power state from D0 to D3hot
Dec 14 19:13:06 Falcon-PC kernel: [drm:drm_crtc_commit_wait] *ERROR* flip_done timed out
Dec 14 19:13:06 Falcon-PC kernel: [drm:drm_atomic_helper_wait_for_dependencies] *ERROR* [PLANE:60:plane-4] commit wait timed out
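
A log like the one above is easier to read once the repeated DMUB messages are folded together. A minimal sketch, assuming the default journalctl short format ("Mon DD HH:MM:SS host ..."); the function name collapse_repeats is made up for illustration.

```shell
#!/bin/sh
# Strip the "Mon DD HH:MM:SS host " prefix, then collapse consecutive
# identical messages into a single line prefixed with a repeat count.
collapse_repeats() {
    sed -E 's/^[A-Z][a-z]{2} +[0-9]+ [0-9]{2}:[0-9]{2}:[0-9]{2} [^ ]+ //' | uniq -c
}

# Example:
#   journalctl -k -b -1 --no-pager | collapse_repeats
```

Only consecutive duplicates are merged, so the order of events (reset begin, link training attempts, reset failure) stays intact.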

E3-1231 v3, B85M-F, HD7850. KDE User.

#16 2021-12-21 10:06:10

giulivo
Member
Registered: 2014-01-05
Posts: 4

I am now testing the same kernel with the patch from https://patchwork.freedesktop.org/patch/466321/ applied, as suggested in https://gitlab.freedesktop.org/drm/amd/-/issues/1824, and I have not seen the issue for a couple of days.

#17 2022-01-13 01:26:20

prurigro
Member
Registered: 2008-03-14
Posts: 16

So I've been testing 5.16.0 today and so far I haven't had any issues for almost 5 hours now, which is longer than my system ever lasted on 5.15. I also noticed that 5.16.0 includes the patch mentioned by @giulivo (https://patchwork.freedesktop.org/patch/466321/), which is promising. I'll report back if I do run into any lockups, or (hopefully) after a couple of days if everything continues working smoothly.

#18 2022-01-13 04:57:02

prurigro
Member
Registered: 2008-03-14
Posts: 16

Well, I came back at the end of the evening and I could ssh into my computer, but the display was dead and it required a full reboot to fix. I guess 5.16.0 is no bueno for me :(
