
#1 2021-05-31 08:06:08

nattravnen
Member
Registered: 2021-04-11
Posts: 11

AMD GPU black screen and reset

Hello,

I was using my machine when it suddenly froze. After that I got a black screen with some colored pixels scattered all over it, although I could still move the mouse cursor. I was able to log into a TTY (which responded fine) and reboot from there. Looking at the journalctl logs from roughly the time everything started, I found entries related to my AMD GPU:

May 31 09:10:37 xunilhcra kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
May 31 09:10:37 xunilhcra kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=545189, emitted seq=545191
May 31 09:10:37 xunilhcra kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xorg pid 972 thread Xorg:cs0 pid 973
May 31 09:10:37 xunilhcra kernel: amdgpu 0000:0b:00.0: amdgpu: GPU reset begin!
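
For anyone who wants to pull the same kind of lines out of their own journal, here is a minimal sketch. It assumes the message format is exactly the one shown above; the function name `find_ring_timeouts` and the "pending" figure (emitted minus signaled, which as far as I understand is the number of submitted jobs that never completed) are my own additions, not something the driver reports.

```python
import re

# Matches the amdgpu_job_timedout message format seen in the log above.
RING_TIMEOUT = re.compile(
    r"ring (?P<ring>\S+) timeout, "
    r"signaled seq=(?P<signaled>\d+), emitted seq=(?P<emitted>\d+)"
)


def find_ring_timeouts(lines):
    """Return (ring, signaled, emitted, pending) for each ring timeout line."""
    hits = []
    for line in lines:
        m = RING_TIMEOUT.search(line)
        if m:
            signaled = int(m.group("signaled"))
            emitted = int(m.group("emitted"))
            # pending = jobs emitted to the ring but not yet signaled (assumption)
            hits.append((m.group("ring"), signaled, emitted, emitted - signaled))
    return hits


sample = [
    "May 31 09:10:37 xunilhcra kernel: [drm:amdgpu_job_timedout [amdgpu]] "
    "*ERROR* ring gfx_0.0.0 timeout, signaled seq=545189, emitted seq=545191",
    "May 31 09:10:37 xunilhcra kernel: amdgpu 0000:0b:00.0: amdgpu: GPU reset begin!",
]
print(find_ring_timeouts(sample))  # [('gfx_0.0.0', 545189, 545191, 2)]
```

To run it on a real log, the list could be replaced with the lines of `journalctl -k -b -1` (kernel messages from the previous boot) read from standard input.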

Any help on this?

Cheers

Offline

#2 2021-05-31 08:19:29

V1del
Forum Moderator
Registered: 2012-10-16
Posts: 24,547

Re: AMD GPU black screen and reset

Which card, which general system, and which kernel? You'll probably want to look for similar issues at https://gitlab.freedesktop.org/drm/amd/-/issues and follow up there and/or create a new issue.

Offline

#3 2021-06-01 10:07:31

nattravnen
Member
Registered: 2021-04-11
Posts: 11

Re: AMD GPU black screen and reset

My graphics card is an AMD Radeon RX 3600 XT Mech OC, kernel version 5.12.8.arch1-1. I'll have a look at https://gitlab.freedesktop.org/drm/amd/-/issues for similar issues; otherwise I'm going to create one.

Thanks.

Offline

#4 2021-06-01 16:04:22

Commander
Member
Registered: 2011-02-12
Posts: 43

Re: AMD GPU black screen and reset

nattravnen wrote:

My graphics card is an AMD Radeon RX 3600 XT Mech OC, kernel version 5.12.8.arch1-1. I'll have a look at https://gitlab.freedesktop.org/drm/amd/-/issues for similar issues; otherwise I'm going to create one.

Thanks.

Did you find anything good?

I started having the same issues on my 6900 XT.
They began just a few days ago while I was running mesa/radeon-git. I went back to the regular version and still get the same issues.

https://gist.github.com/CommanderAlchem … d09c89dbbb
https://gist.github.com/CommanderAlchem … b19d496439

I thought my card had died, since it would not boot at all after a hang, but after switching GPU and updating the BIOS it suddenly came back to life. The hangs remain, though. Now I'm unsure whether it's a hardware issue that suddenly appeared or an actual driver issue.

Offline

#5 2021-06-02 09:19:20

nattravnen
Member
Registered: 2021-04-11
Posts: 11

Re: AMD GPU black screen and reset

Hi,

I did find some posts, some of them from 2016, that look very similar to the issue I'm running into. However, I'm not in a position to say that it is indeed the root cause. The only thing I'm really certain of is that every time this problem occurs, I'm using Firefox.

I'll do more research and let you know. If you find out anything, please keep us updated.

Thanks.

Offline

#6 2021-06-03 12:31:39

Supay
Member
From: Eastbourne, UK
Registered: 2015-10-08
Posts: 11

Re: AMD GPU black screen and reset

I have been having similar issues for a while on my 3400G, but they were throwing me off because they only ever occurred when playing Unity-based games. They occurred occasionally around 1.5 years ago and went away for a while after a mesa package update, but they have returned with a vengeance over the last few months and I am now experiencing them regularly. I have tried fresh installs, but nothing changed.

Initially it was a black screen with an instant system reboot, so sudden that I struggled to get anything from the logs or journalctl. I have been experimenting with kernel parameters found elsewhere online and have got it to the point where it doesn't always black-screen and reboot. Instead it sometimes flickers to a black screen but recovers, or hangs without rebooting automatically, so I am getting log data now. That data also looks very similar to what other Radeon users reported in amdgpu/mesa bugs that were common a few years back. I have switched from X11 to Wayland and the same issue persists.

Below is when it flickers and recovers.

Jun 03 13:17:19 james-desktop kernel: amdgpu 0000:30:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00141051
Jun 03 13:17:19 james-desktop kernel: amdgpu 0000:30:00.0: amdgpu:          Faulty UTCL2 client ID: TCP (0x8)
Jun 03 13:17:19 james-desktop kernel: amdgpu 0000:30:00.0: amdgpu:          MORE_FAULTS: 0x1
Jun 03 13:17:19 james-desktop kernel: amdgpu 0000:30:00.0: amdgpu:          WALKER_ERROR: 0x0
Jun 03 13:17:19 james-desktop kernel: amdgpu 0000:30:00.0: amdgpu:          PERMISSION_FAULTS: 0x5
Jun 03 13:17:19 james-desktop kernel: amdgpu 0000:30:00.0: amdgpu:          MAPPING_ERROR: 0x0
Jun 03 13:17:19 james-desktop kernel: amdgpu 0000:30:00.0: amdgpu:          RW: 0x1
Jun 03 13:17:21 james-desktop rtkit-daemon[569]: Supervising 9 threads of 5 processes of 1 users.
Jun 03 13:17:21 james-desktop rtkit-daemon[569]: Supervising 9 threads of 5 processes of 1 users.
Jun 03 13:17:21 james-desktop rtkit-daemon[569]: Supervising 9 threads of 5 processes of 1 users.
Jun 03 13:17:21 james-desktop rtkit-daemon[569]: Successfully made thread 1129 of process 1129 owned by '1000' high priority at nice level -10.
Jun 03 13:17:21 james-desktop rtkit-daemon[569]: Supervising 10 threads of 6 processes of 1 users.
Jun 03 13:17:22 james-desktop kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=624, emitted seq=626
Jun 03 13:17:22 james-desktop kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process steamwebhelper pid 900 thread steamwebhe:cs0 pid 921
Jun 03 13:17:22 james-desktop kernel: amdgpu 0000:30:00.0: amdgpu: GPU reset begin!
Jun 03 13:17:22 james-desktop kernel: amdgpu 0000:30:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x114d2fac0 flags=0x0070]
Jun 03 13:17:22 james-desktop kernel: amdgpu 0000:30:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x114d2fae0 flags=0x0070]
Jun 03 13:17:22 james-desktop kernel: amdgpu 0000:30:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x114d2fb00 flags=0x0070]
Jun 03 13:17:22 james-desktop kernel: amdgpu 0000:30:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x114d2fb20 flags=0x0070]
Jun 03 13:17:22 james-desktop kernel: amdgpu 0000:30:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x114d2fb40 flags=0x0070]
Jun 03 13:17:22 james-desktop kernel: amdgpu 0000:30:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x114d2fb60 flags=0x0070]
Jun 03 13:17:22 james-desktop kernel: amdgpu 0000:30:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x114d2fb80 flags=0x0070]
Jun 03 13:17:22 james-desktop kernel: amdgpu 0000:30:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x114d2fba0 flags=0x0070]
Jun 03 13:17:22 james-desktop kernel: amdgpu 0000:30:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x114d2fbc0 flags=0x0070]
Jun 03 13:17:22 james-desktop kernel: amdgpu 0000:30:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x114d2fbe0 flags=0x0070]
Jun 03 13:17:22 james-desktop kernel: [drm] free PSP TMR buffer
Jun 03 13:17:22 james-desktop kernel: amdgpu 0000:30:00.0: amdgpu: MODE2 reset
Jun 03 13:17:22 james-desktop kernel: amdgpu 0000:30:00.0: amdgpu: GPU reset succeeded, trying to resume
Jun 03 13:17:22 james-desktop kernel: [drm] PCIE GART of 1024M enabled (table at 0x000000F400900000).
Jun 03 13:17:22 james-desktop kernel: [drm] PSP is resuming...
Jun 03 13:17:22 james-desktop kernel: [drm] reserve 0x400000 from 0xf40fc00000 for PSP TMR
Jun 03 13:17:22 james-desktop kernel: amdgpu 0000:30:00.0: amdgpu: RAS: optional ras ta ucode is not available
Jun 03 13:17:22 james-desktop kernel: amdgpu 0000:30:00.0: amdgpu: RAP: optional rap ta ucode is not available
Jun 03 13:17:23 james-desktop kernel: [drm] kiq ring mec 2 pipe 1 q 0
Jun 03 13:17:23 james-desktop kernel: [drm] VCN decode and encode initialized successfully(under SPG Mode).
Jun 03 13:17:23 james-desktop kernel: amdgpu 0000:30:00.0: amdgpu: ring gfx uses VM inv eng 0 on hub 0
Jun 03 13:17:23 james-desktop kernel: amdgpu 0000:30:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
Jun 03 13:17:23 james-desktop kernel: amdgpu 0000:30:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
Jun 03 13:17:23 james-desktop kernel: amdgpu 0000:30:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
Jun 03 13:17:23 james-desktop kernel: amdgpu 0000:30:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
Jun 03 13:17:23 james-desktop kernel: amdgpu 0000:30:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
Jun 03 13:17:23 james-desktop kernel: amdgpu 0000:30:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
Jun 03 13:17:23 james-desktop kernel: amdgpu 0000:30:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
Jun 03 13:17:23 james-desktop kernel: amdgpu 0000:30:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
Jun 03 13:17:23 james-desktop kernel: amdgpu 0000:30:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
Jun 03 13:17:23 james-desktop kernel: amdgpu 0000:30:00.0: amdgpu: ring sdma0 uses VM inv eng 0 on hub 1
Jun 03 13:17:23 james-desktop kernel: amdgpu 0000:30:00.0: amdgpu: ring vcn_dec uses VM inv eng 1 on hub 1
Jun 03 13:17:23 james-desktop kernel: amdgpu 0000:30:00.0: amdgpu: ring vcn_enc0 uses VM inv eng 4 on hub 1
Jun 03 13:17:23 james-desktop kernel: amdgpu 0000:30:00.0: amdgpu: ring vcn_enc1 uses VM inv eng 5 on hub 1
Jun 03 13:17:23 james-desktop kernel: amdgpu 0000:30:00.0: amdgpu: ring jpeg_dec uses VM inv eng 6 on hub 1
Jun 03 13:17:23 james-desktop kernel: amdgpu 0000:30:00.0: amdgpu: recover vram bo from shadow start
Jun 03 13:17:23 james-desktop kernel: amdgpu 0000:30:00.0: amdgpu: recover vram bo from shadow done
Jun 03 13:17:23 james-desktop kernel: [drm] Skip scheduling IBs!
Jun 03 13:17:23 james-desktop kernel: amdgpu 0000:30:00.0: amdgpu: GPU reset(2) succeeded!
Jun 03 13:17:23 james-desktop kernel: [drm] Skip scheduling IBs!
Jun 03 13:17:23 james-desktop kernel: [drm] Skip scheduling IBs!
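
The ten IO_PAGE_FAULT entries in the log above are evenly spaced, which is easier to see with a few lines of Python. This is only a sketch, assuming the AMD-Vi line format shown above; the helper names `fault_addresses` and `strides` are made up for illustration. The sample uses the first three addresses from the log.

```python
import re

# Matches the AMD-Vi IO_PAGE_FAULT event format seen in the log above.
PAGE_FAULT = re.compile(
    r"IO_PAGE_FAULT domain=0x[0-9a-fA-F]+ address=0x(?P<addr>[0-9a-fA-F]+)"
)


def fault_addresses(lines):
    """Return the faulting addresses as integers, in log order."""
    addresses = []
    for line in lines:
        m = PAGE_FAULT.search(line)
        if m:
            addresses.append(int(m.group("addr"), 16))
    return addresses


def strides(addresses):
    """Return the difference between each pair of consecutive addresses."""
    return [b - a for a, b in zip(addresses, addresses[1:])]


prefix = "Jun 03 13:17:22 james-desktop kernel: amdgpu 0000:30:00.0: AMD-Vi: Event logged "
sample = [
    prefix + "[IO_PAGE_FAULT domain=0x0000 address=0x114d2fac0 flags=0x0070]",
    prefix + "[IO_PAGE_FAULT domain=0x0000 address=0x114d2fae0 flags=0x0070]",
    prefix + "[IO_PAGE_FAULT domain=0x0000 address=0x114d2fb00 flags=0x0070]",
]
print([hex(s) for s in strides(fault_addresses(sample))])  # ['0x20', '0x20']
```

A constant 0x20 step looks like one contiguous access that faulted repeatedly rather than scattered accesses, but that reading is an assumption on my part and not something the log states.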

And below is when it seems to just completely hang.

Jun 03 11:51:32 james-desktop kernel: watchdog: BUG: soft lockup - CPU#1 stuck for 328s! [BattleTech:8039]
Jun 03 11:51:32 james-desktop kernel: Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace sunrpc nfs_ssc fscache snd_seq_dummy snd_hrtimer snd_seq cfg80211 uvcvideo snd_usb_audio v>
Jun 03 11:51:32 james-desktop kernel:  gpio_amdpt mac_hid gpio_generic acpi_cpufreq drm fuse agpgart bpf_preload ip_tables x_tables btrfs blake2b_generic libcrc32c crc32c_generic xor xhci_pci raid6_pq crc32c_in>
Jun 03 11:51:32 james-desktop kernel: CPU: 1 PID: 8039 Comm: BattleTech Tainted: G             L    5.12.8-arch1-1 #1
Jun 03 11:51:32 james-desktop kernel: Hardware name: Micro-Star International Co., Ltd. MS-7A40/B450I GAMING PLUS AC (MS-7A40), BIOS A.F4 04/19/2021
Jun 03 11:51:32 james-desktop kernel: RIP: 0010:native_queued_spin_lock_slowpath+0x1ce/0x200
Jun 03 11:51:32 james-desktop kernel: Code: c1 ef 12 83 e0 03 83 ef 01 48 c1 e0 05 48 63 ff 48 05 80 d5 02 00 48 03 04 fd 00 e9 63 a8 48 89 08 8b 41 08 85 c0 75 09 f3 90 <8b> 41 08 85 c0 74 f7 48 8b 39 48 85 ff>
Jun 03 11:51:32 james-desktop kernel: RSP: 0018:ffffac6e8415bc88 EFLAGS: 00000246
Jun 03 11:51:32 james-desktop kernel: RAX: 0000000000000000 RBX: ffff95f329e58000 RCX: ffff95f32ee6d580
Jun 03 11:51:32 james-desktop kernel: RDX: ffff95f000c96444 RSI: 0000000000080000 RDI: 0000000000000003
Jun 03 11:51:32 james-desktop kernel: RBP: ffffac6e8415bcf0 R08: 0000000000080000 R09: 0000000000000000
Jun 03 11:51:32 james-desktop kernel: R10: 00007f4a7b9246f8 R11: 0000000000000000 R12: 00007f4a7b9246f8
Jun 03 11:51:32 james-desktop kernel: R13: ffffac6e8415bd40 R14: ffff95f000c96440 R15: ffffac6e8415bd78
Jun 03 11:51:32 james-desktop kernel: FS:  00007f4a23213640(0000) GS:ffff95f32ee40000(0000) knlGS:0000000000000000
Jun 03 11:51:32 james-desktop kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 03 11:51:32 james-desktop kernel: CR2: 00007fa1481c1c00 CR3: 00000000514d6000 CR4: 00000000003506e0
Jun 03 11:51:32 james-desktop kernel: Call Trace:
Jun 03 11:51:32 james-desktop kernel:  _raw_spin_lock+0x21/0x30
Jun 03 11:51:32 james-desktop kernel:  futex_wait_setup+0x62/0xe0
Jun 03 11:51:32 james-desktop kernel:  futex_wait+0xe0/0x250
Jun 03 11:51:32 james-desktop kernel:  do_futex+0x180/0xb20
Jun 03 11:51:32 james-desktop kernel:  __do_sys_futex+0x90/0x1c0
Jun 03 11:51:32 james-desktop kernel:  do_syscall_64+0x33/0x40
Jun 03 11:51:32 james-desktop kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xae
Jun 03 11:51:32 james-desktop kernel: RIP: 0033:0x7f4a7dce48ca
Jun 03 11:51:32 james-desktop kernel: Code: 24 08 e8 a9 cb ff ff 4c 8b 54 24 18 45 31 c0 44 89 e2 89 c5 8b 74 24 08 48 8b 7c 24 10 41 b9 ff ff ff ff b8 ca 00 00 00 0f 05 <89> ef 48 89 44 24 08 e8 fa cb ff ff 48>
Jun 03 11:51:32 james-desktop kernel: RSP: 002b:00007f4a23212bf0 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
Jun 03 11:51:32 james-desktop kernel: RAX: ffffffffffffffda RBX: 00007f4a7b9246f8 RCX: 00007f4a7dce48ca
Jun 03 11:51:32 james-desktop kernel: RDX: 0000000000000000 RSI: 0000000000000189 RDI: 00007f4a7b9246f8
Jun 03 11:51:32 james-desktop kernel: RBP: 0000000000000000 R08: 0000000000000000 R09: 00000000ffffffff
Jun 03 11:51:32 james-desktop kernel: R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
Jun 03 11:51:32 james-desktop kernel: R13: fffffffeffffffff R14: 00007f4a7b9246f8 R15: 000000000000000a
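
To compare soft lockup reports between machines, it can help to reduce each one to its list of kernel function names. Below is a minimal sketch, assuming the journal prefix and frame format shown above (`function+0xoffset/0xsize`); `call_trace` is a hypothetical helper name. The sample uses the first three frames and the closing RIP line from the trace above.

```python
import re

# One stack frame as printed in the journal: "kernel:  func+0x21/0x30"
FRAME = re.compile(
    r"kernel:\s+(?P<func>[A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+\s*$"
)


def call_trace(lines):
    """Return the function names listed after the first 'Call Trace:' line."""
    frames = []
    in_trace = False
    for line in lines:
        if not in_trace:
            in_trace = line.rstrip().endswith("Call Trace:")
            continue
        m = FRAME.search(line)
        if not m:
            break  # first non-frame line (e.g. the user-space RIP) ends the trace
        frames.append(m.group("func"))
    return frames


prefix = "Jun 03 11:51:32 james-desktop kernel: "
sample = [
    prefix + "Call Trace:",
    prefix + " _raw_spin_lock+0x21/0x30",
    prefix + " futex_wait_setup+0x62/0xe0",
    prefix + " futex_wait+0xe0/0x250",
    prefix + "RIP: 0033:0x7f4a7dce48ca",
]
print(call_trace(sample))  # ['_raw_spin_lock', 'futex_wait_setup', 'futex_wait']
```

In the trace above every frame is in the futex path and none is in amdgpu, so this particular report shows where the BattleTech thread was stuck but does not by itself point at the GPU driver.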

I have had it occur both with Firefox running and without it, so I'm not sure whether that is a factor.

Last edited by Supay (2021-06-03 12:32:25)

Offline
