#1 2021-08-04 22:14:04

croxis
Member
Registered: 2015-01-29
Posts: 8

AMDGPU refuses to wake up after system sleep or hibernate

Linux 5.13.7-arch1-1 x86_64
Mesa 21.1.6
VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT] (rev c4)   
CPU is Ryzen 1700

I am able to SSH in, so it isn't a full system crash.

Kernel parameters

title   Arch Linux
linux   /vmlinuz-linux
initrd  /amd-ucode.img
initrd  /initramfs-linux.img
options root=PARTUUID=25d80518-3e72-402c-ace0-4f2de69ca92f rootflags=subvol=root_arch rw amd_iommu=on idle=nomwait resume=UUID=535a2ae2-f59b-47e2-abb7-da5423f0ae82

Here is the journalctl output from what I hope is the relevant section, captured after a sleep attempt.

Aug 04 14:56:51 babylon kernel: amdgpu 0000:29:00.0: amdgpu: RAS: optional ras ta ucode is not available
Aug 04 14:56:51 babylon kernel: amdgpu 0000:29:00.0: amdgpu: RAP: optional rap ta ucode is not available
Aug 04 14:56:51 babylon kernel: amdgpu 0000:29:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
Aug 04 14:56:51 babylon kernel: amdgpu 0000:29:00.0: amdgpu: SMU is resuming...
Aug 04 14:56:51 babylon kernel: r8169 0000:2a:00.0 enp42s0: Link is Up - 1Gbps/Full - flow control rx/tx
Aug 04 14:56:51 babylon kernel: ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Aug 04 14:56:51 babylon kernel: ata3.00: configured for UDMA/133
Aug 04 14:56:51 babylon kernel: ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Aug 04 14:56:51 babylon kernel: ata1.00: configured for UDMA/133
Aug 04 14:56:51 babylon kernel: amdgpu 0000:29:00.0: amdgpu: message:          RunBtc (58)         param: 0x00000000 is timeout (no response)
Aug 04 14:56:51 babylon kernel: amdgpu 0000:29:00.0: amdgpu: RunBtc failed!
Aug 04 14:56:51 babylon kernel: amdgpu 0000:29:00.0: amdgpu: Failed to setup smc hw!
Aug 04 14:56:51 babylon kernel: [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <smu> failed -62
Aug 04 14:56:51 babylon kernel: amdgpu 0000:29:00.0: amdgpu: amdgpu_device_ip_resume failed (-62).
Aug 04 14:56:51 babylon kernel: PM: dpm_run_callback(): pci_pm_resume+0x0/0xe0 returns -62
Aug 04 14:56:51 babylon kernel: amdgpu 0000:29:00.0: PM: failed to resume async: error -62
Aug 04 14:56:51 babylon kernel: OOM killer enabled.
Aug 04 14:56:51 babylon kernel: Restarting tasks ... done.
Aug 04 14:56:51 babylon kernel: PM: suspend exit
Aug 04 14:56:51 babylon kernel: audit: type=1130 audit(1628114211.643:240): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-networkd-wait-online comm="systemd" exe="/usr/lib/systemd/systemd" hostna>
Aug 04 14:56:51 babylon kernel: audit: type=1130 audit(1628114211.643:241): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=cups-browsed comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? term>
Aug 04 14:56:51 babylon kernel: audit: type=1130 audit(1628114211.650:242): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=rpc-statd-notify comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? >
Aug 04 14:56:51 babylon kernel: audit: type=1130 audit(1628114211.656:243): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-suspend comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? t>
Aug 04 14:56:51 babylon kernel: audit: type=1131 audit(1628114211.656:244): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-suspend comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? t>
Aug 04 14:56:51 babylon kernel: snd_hda_intel 0000:29:00.1: refused to change power state from D0 to D3hot
Aug 04 14:56:51 babylon audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-networkd-wait-online comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=f>
Aug 04 14:56:51 babylon audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=cups-browsed comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Aug 04 14:56:51 babylon audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=rpc-statd-notify comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Aug 04 14:56:51 babylon audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-suspend comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Aug 04 14:56:51 babylon audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-suspend comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Aug 04 14:56:51 babylon rtkit-daemon[1500]: The canary thread is apparently starving. Taking action.
Aug 04 14:56:51 babylon systemd-networkd-wait-online[1174]: Timeout occurred while waiting for network connectivity.
Aug 04 14:56:51 babylon rtkit-daemon[1500]: Demoting known real-time threads.
Aug 04 14:56:51 babylon systemd-resolved[1175]: Clock change detected. Flushing caches.
Aug 04 14:56:51 babylon rtkit-daemon[1500]: Successfully demoted thread 1507 of process 1490.
Aug 04 14:56:51 babylon systemd-sleep[6756]: System returned from sleep state.
Aug 04 14:56:51 babylon rtkit-daemon[1500]: Successfully demoted thread 1490 of process 1490.
Aug 04 14:56:51 babylon systemd[1]: systemd-networkd-wait-online.service: Main process exited, code=exited, status=1/FAILURE
Aug 04 14:56:51 babylon rtkit-daemon[1500]: Successfully demoted thread 1505 of process 1491.
Aug 04 14:56:51 babylon systemd[1]: systemd-networkd-wait-online.service: Failed with result 'exit-code'.
Aug 04 14:56:51 babylon rtkit-daemon[1500]: Successfully demoted thread 1491 of process 1491.
Aug 04 14:56:51 babylon systemd[1]: Failed to start Wait for Network to be Configured.
Aug 04 14:56:51 babylon rtkit-daemon[1500]: Successfully demoted thread 1506 of process 1492.
Aug 04 14:56:51 babylon systemd[1]: Reached target Network is Online.
Aug 04 14:56:51 babylon rtkit-daemon[1500]: Successfully demoted thread 1492 of process 1492.
Aug 04 14:56:51 babylon systemd[1]: Started Make remote CUPS printers available locally.
Aug 04 14:56:51 babylon rtkit-daemon[1500]: Demoted 6 threads.
Aug 04 14:56:51 babylon systemd[1]: Reached target Multi-User System.
Aug 04 14:56:51 babylon sm-notify[7113]: Version 2.5.4 starting
Aug 04 14:56:51 babylon systemd[1]: Reached target Graphical Interface.
Aug 04 14:56:51 babylon avahi-daemon[1057]: Withdrawing address record for 192.168.1.180 on enp42s0.
Aug 04 14:56:51 babylon systemd[1]: Starting Notify NFS peers of a restart...
Aug 04 14:56:51 babylon avahi-daemon[1057]: Leaving mDNS multicast group on interface enp42s0.IPv4 with address 192.168.1.180.
Aug 04 14:56:51 babylon systemd[1]: Started Notify NFS peers of a restart.
Aug 04 14:56:51 babylon avahi-daemon[1057]: Interface enp42s0.IPv4 no longer relevant for mDNS.
Aug 04 14:56:51 babylon systemd[1]: systemd-suspend.service: Deactivated successfully.
Aug 04 14:56:51 babylon systemd[1]: Finished System Suspend.
Aug 04 14:56:51 babylon upowerd[1450]: treating change event as add on /sys/devices/pci0000:00/0000:00:01.3/0000:03:00.2/0000:20:08.0/0000:26:00.0/usb4/4-2
Aug 04 14:56:51 babylon systemd[1]: Stopped target Sleep.
Aug 04 14:56:51 babylon systemd[1]: Reached target Suspend.
Aug 04 14:56:51 babylon upowerd[1450]: treating change event as add on /sys/devices/pci0000:00/0000:00:01.3/0000:03:00.2/0000:20:08.0/0000:26:00.0/usb3/3-2
Aug 04 14:56:51 babylon systemd[1]: Startup finished in 16.597s (firmware) + 3.211s (loader) + 4.055s (kernel) + 1.497s (initrd) + 2min 7.562s (userspace) = 2min 32.924s.
Aug 04 14:56:51 babylon systemd[1]: Stopped target Suspend.
Aug 04 14:56:51 babylon systemd-logind[1072]: Operation 'sleep' finished.
Aug 04 14:56:51 babylon upowerd[1450]: treating change event as add on /sys/devices/pci0000:00/0000:00:01.3/0000:03:00.0/usb1/1-1
Aug 04 14:56:51 babylon upowerd[1450]: treating change event as add on /sys/devices/pci0000:00/0000:00:01.3/0000:03:00.2/0000:20:08.0/0000:26:00.0/usb4/4-2
Aug 04 14:56:51 babylon upowerd[1450]: treating change event as add on /sys/devices/pci0000:00/0000:00:01.3/0000:03:00.2/0000:20:08.0/0000:26:00.0/usb3/3-2
Aug 04 14:56:51 babylon systemd-networkd[1173]: enp42s0: DHCP lease lost
Aug 04 14:56:51 babylon systemd-networkd[1173]: enp42s0: DHCPv6 lease lost
Aug 04 14:56:51 babylon systemd-networkd[1173]: enp42s0: Reset carrier
Aug 04 14:56:51 babylon systemd-networkd[1173]: lo: Reset carrier
Aug 04 14:56:51 babylon akonadi_davgroupware_resource[3586]: org.kde.pim.davresource: Unable to fetch collections 300 "There was a problem with the request.\nHTTP error (0)."
Aug 04 14:56:51 babylon akonadi_davgroupware_resource[3587]: org.kde.pim.davresource: Unable to fetch collections 300 "There was a problem with the request.\nHTTP error (0)."
Aug 04 14:56:51 babylon upowerd[1450]: treating change event as add on /sys/devices/pci0000:00/0000:00:01.3/0000:03:00.0/usb1/1-1
Aug 04 14:56:51 babylon systemd-networkd[1173]: enp42s0: DHCPv4 address 192.168.1.180/22 via 192.168.0.1
Aug 04 14:56:51 babylon avahi-daemon[1057]: Joining mDNS multicast group on interface enp42s0.IPv4 with address 192.168.1.180.
Aug 04 14:56:51 babylon avahi-daemon[1057]: New relevant interface enp42s0.IPv4 for mDNS.
Aug 04 14:56:51 babylon avahi-daemon[1057]: Registering new address record for 192.168.1.180 on enp42s0.IPv4.
Aug 04 14:56:52 babylon systemd-resolved[1175]: Clock change detected. Flushing caches.
Aug 04 14:56:52 babylon kernel: amdgpu: Move buffer fallback to memcpy unavailable
Aug 04 14:56:52 babylon kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to process the buffer list -19!
Aug 04 14:56:52 babylon kernel: amdgpu: Move buffer fallback to memcpy unavailable
Aug 04 14:56:52 babylon kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to process the buffer list -19!
Aug 04 14:56:52 babylon kernel: amdgpu: Move buffer fallback to memcpy unavailable
Aug 04 14:56:52 babylon kernel: amdgpu: Move buffer fallback to memcpy unavailable
Aug 04 14:56:52 babylon kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to process the buffer list -19!
Aug 04 14:56:52 babylon kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to process the buffer list -19!
Aug 04 14:56:52 babylon kernel: amdgpu: Move buffer fallback to memcpy unavailable
Aug 04 14:56:52 babylon kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to process the buffer list -19!
Aug 04 14:56:52 babylon kernel: amdgpu: Move buffer fallback to memcpy unavailable
Aug 04 14:56:52 babylon kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to process the buffer list -19!
Aug 04 14:56:52 babylon kernel: amdgpu: Move buffer fallback to memcpy unavailable
Aug 04 14:56:52 babylon kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to process the buffer list -19!
Aug 04 14:56:52 babylon kernel: amdgpu: Move buffer fallback to memcpy unavailable
Aug 04 14:56:52 babylon kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to process the buffer list -19!
Aug 04 14:56:52 babylon kernel: amdgpu: Move buffer fallback to memcpy unavailable
Aug 04 14:56:52 babylon kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to process the buffer list -19!
Aug 04 14:56:53 babylon kernel: amdgpu: Move buffer fallback to memcpy unavailable
Aug 04 14:56:53 babylon kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to process the buffer list -19!
Aug 04 14:56:59 babylon systemd[1]: libvirtd.service: Deactivated successfully.
Aug 04 14:56:59 babylon audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=libvirtd comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Aug 04 14:56:59 babylon systemd[1]: libvirtd.service: Unit process 1308 (dnsmasq) remains running after unit stopped.
Aug 04 14:56:59 babylon systemd[1]: libvirtd.service: Unit process 1309 (dnsmasq) remains running after unit stopped.
Aug 04 14:56:59 babylon systemd[1]: libvirtd.service: Unit process 1339 (mount.ntfs) remains running after unit stopped.
Aug 04 14:56:59 babylon systemd[1]: libvirtd.service: Consumed 1.550s CPU time.
Aug 04 14:56:59 babylon kernel: audit: type=1131 audit(1628114219.868:245): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=libvirtd comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal>
Aug 04 14:57:02 babylon kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1 timeout, signaled seq=669, emitted seq=671
Aug 04 14:57:02 babylon kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=2252, emitted seq=2254
Aug 04 14:57:02 babylon kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
Aug 04 14:57:02 babylon kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
Aug 04 14:57:02 babylon kernel: amdgpu 0000:29:00.0: amdgpu: GPU reset begin!
Aug 04 14:57:02 babylon kernel: amdgpu 0000:29:00.0: amdgpu: GPU reset begin!
Aug 04 14:57:02 babylon kernel: amdgpu 0000:29:00.0: amdgpu: Bailing on TDR for s_job:643, as another already in progress
Aug 04 14:57:02 babylon kernel: BUG: unable to handle page fault for address: ffff8f8ed0238000
Aug 04 14:57:02 babylon kernel: #PF: supervisor write access in kernel mode
Aug 04 14:57:03 babylon kernel: #PF: error_code(0x0003) - permissions violation
Aug 04 14:57:03 babylon kernel: PGD 57a601067 P4D 57a601067 PUD 100fae063 PMD 1103dc063 PTE 8000000110238161
Aug 04 14:57:03 babylon kernel: Oops: 0003 [#1] PREEMPT SMP NOPTI
Aug 04 14:57:03 babylon kernel: CPU: 10 PID: 372 Comm: kworker/10:2 Tainted: P        W  OE     5.13.7-arch1-1 #1
Aug 04 14:57:03 babylon kernel: Hardware name: Micro-Star International Co., Ltd. MS-7A32/X370 GAMING PRO CARBON (MS-7A32), BIOS 1.NX 07/07/2020
Aug 04 14:57:03 babylon kernel: Workqueue: events drm_sched_job_timedout [gpu_sched]
Aug 04 14:57:03 babylon kernel: RIP: 0010:kfd_gtt_sa_free+0x39/0x80 [amdgpu]
Aug 04 14:57:03 babylon kernel: Code: f5 53 48 89 fb 0f 1f 44 00 00 4c 8d a3 70 01 00 00 4c 89 e7 e8 68 86 2d de 8b 45 00 3b 45 04 77 16 48 8b 93 68 01 00 00 89 c1 <f0> 48 0f b3 0a 83 c0 01 39 45 04 73 ea 4c 89>
Aug 04 14:57:03 babylon kernel: RSP: 0018:ffffa921816f7d10 EFLAGS: 00010282
Aug 04 14:57:03 babylon kernel: RAX: 000000000803e000 RBX: ffff8f8ecc3c8000 RCX: 000000000803e000
Aug 04 14:57:03 babylon kernel: RDX: ffff8f8ecf230400 RSI: ffff8f8ecce14660 RDI: ffff8f8ecc3c8170
Aug 04 14:57:03 babylon kernel: RBP: ffff8f8ecce14660 R08: 0000000000000000 R09: ffffa921816f7d98
Aug 04 14:57:03 babylon kernel: R10: ffffffffffffffff R11: ffffffffffffffff R12: ffff8f8ecc3c8170
Aug 04 14:57:03 babylon kernel: R13: 0000000000000000 R14: ffff8f8ec180b000 R15: ffff8f8ec180b0c8
Aug 04 14:57:03 babylon kernel: FS:  0000000000000000(0000) GS:ffff8f95aec80000(0000) knlGS:0000000000000000
Aug 04 14:57:03 babylon kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 04 14:57:03 babylon kernel: CR2: ffff8f8ed0238000 CR3: 00000001c70d0000 CR4: 00000000003506e0
Aug 04 14:57:03 babylon kernel: Call Trace:
Aug 04 14:57:03 babylon kernel:  stop_cpsch+0x94/0xc0 [amdgpu]
Aug 04 14:57:03 babylon kernel:  kgd2kfd_suspend.part.0+0x2f/0x40 [amdgpu]
Aug 04 14:57:03 babylon kernel:  kgd2kfd_pre_reset+0x3f/0x50 [amdgpu]
Aug 04 14:57:03 babylon kernel:  amdgpu_device_gpu_recover.cold+0x291/0x865 [amdgpu]
Aug 04 14:57:03 babylon kernel:  amdgpu_job_timedout+0x12d/0x150 [amdgpu]
Aug 04 14:57:03 babylon kernel:  drm_sched_job_timedout+0x70/0xf0 [gpu_sched]
Aug 04 14:57:03 babylon kernel:  process_one_work+0x1e3/0x3b0
Aug 04 14:57:03 babylon kernel:  worker_thread+0x50/0x3b0
Aug 04 14:57:03 babylon kernel:  ? process_one_work+0x3b0/0x3b0
Aug 04 14:57:03 babylon kernel:  kthread+0x133/0x160
Aug 04 14:57:03 babylon kernel:  ? set_kthread_struct+0x40/0x40
Aug 04 14:57:03 babylon kernel:  ret_from_fork+0x22/0x30
Aug 04 14:57:03 babylon kernel: Modules linked in: rfcomm snd_seq_dummy snd_hrtimer snd_seq xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT xt_tcpudp nft_compat nft_chain_nat nf_nat bridge stp llc nft_reject_>
Aug 04 14:57:03 babylon kernel:  snd_timer mdio_devres igb sp5100_tco snd libphy pcspkr k10temp i2c_piix4 joydev mc soundcore dca wmi gpio_amdpt pinctrl_amd gpio_generic mac_hid acpi_cpufreq zfs(POE) zunicode(P>
Aug 04 14:57:03 babylon kernel:  snd_timer mdio_devres igb sp5100_tco snd libphy pcspkr k10temp i2c_piix4 joydev mc soundcore dca wmi gpio_amdpt pinctrl_amd gpio_generic mac_hid acpi_cpufreq zfs(POE) zunicode(P>
Aug 04 14:57:03 babylon kernel: CR2: ffff8f8ed0238000
Aug 04 14:57:03 babylon kernel: ---[ end trace ff429f9ab37d5705 ]---
Aug 04 14:57:03 babylon kernel: RIP: 0010:kfd_gtt_sa_free+0x39/0x80 [amdgpu]
Aug 04 14:57:03 babylon kernel: Code: f5 53 48 89 fb 0f 1f 44 00 00 4c 8d a3 70 01 00 00 4c 89 e7 e8 68 86 2d de 8b 45 00 3b 45 04 77 16 48 8b 93 68 01 00 00 89 c1 <f0> 48 0f b3 0a 83 c0 01 39 45 04 73 ea 4c 89>
Aug 04 14:57:03 babylon kernel: RSP: 0018:ffffa921816f7d10 EFLAGS: 00010282
Aug 04 14:57:03 babylon kernel: RAX: 000000000803e000 RBX: ffff8f8ecc3c8000 RCX: 000000000803e000
Aug 04 14:57:03 babylon kernel: RDX: ffff8f8ecf230400 RSI: ffff8f8ecce14660 RDI: ffff8f8ecc3c8170
Aug 04 14:57:03 babylon kernel: RBP: ffff8f8ecce14660 R08: 0000000000000000 R09: ffffa921816f7d98
Aug 04 14:57:03 babylon kernel: R10: ffffffffffffffff R11: ffffffffffffffff R12: ffff8f8ecc3c8170
Aug 04 14:57:03 babylon kernel: R13: 0000000000000000 R14: ffff8f8ec180b000 R15: ffff8f8ec180b0c8
Aug 04 14:57:03 babylon kernel: FS:  0000000000000000(0000) GS:ffff8f95aec80000(0000) knlGS:0000000000000000
Aug 04 14:57:03 babylon kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 04 14:57:03 babylon kernel: CR2: ffff8f8ed0238000 CR3: 00000001c70d0000 CR4: 00000000003506e0
Aug 04 14:57:12 babylon kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=2252, emitted seq=2254
Aug 04 14:57:12 babylon kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
Aug 04 14:57:12 babylon kernel: amdgpu 0000:29:00.0: amdgpu: GPU reset begin!
Aug 04 14:57:12 babylon kernel: amdgpu 0000:29:00.0: amdgpu: Bailing on TDR for s_job:644, as another already in progress
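
For anyone reading along: the negative numbers in these messages are kernel errno values. Below is a minimal lookup sketch for the codes that show up in this thread. `errno_name` is a made-up helper name, and the numbers are the generic Linux values (a few architectures number them differently).

```shell
# Hypothetical helper: translate an errno number from a kernel log line
# into its symbolic name. Only the codes seen in this thread are covered.
errno_name() {
    # Strip a leading minus sign so both "62" and "-62" are accepted.
    case "${1#-}" in
        5)   echo "EIO (I/O error)" ;;
        19)  echo "ENODEV (no such device)" ;;
        22)  echo "EINVAL (invalid argument)" ;;
        62)  echo "ETIME (timer expired)" ;;
        110) echo "ETIMEDOUT (connection timed out)" ;;
        *)   echo "unknown" ;;
    esac
}

errno_name -62   # ETIME (timer expired)
errno_name -19   # ENODEV (no such device)
```

Read this way, the `-62` on resume suggests the SMU did not answer in time, and the later `-19` suggests the driver no longer treats the device as usable.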

#2 2021-08-11 20:37:21

johnny.honu
Member
Registered: 2021-04-18
Posts: 17

Re: AMDGPU refuses to wake up after system sleep or hibernate

Same problem here on my Asus ZenBook 14.

#3 2021-09-02 08:00:27

coxackie
Member
Registered: 2019-11-10
Posts: 8

Re: AMDGPU refuses to wake up after system sleep or hibernate

I have the same problem on an Alienware Aurora R10 with an AMD Radeon RX 5700 GPU.

Logs:

```
Sep 01 17:18:23 archer kernel: amdgpu 0000:0d:00.0: amdgpu: message:          RunBtc (58)         param: 0x00000000 is timeout (no response)
Sep 01 17:18:23 archer kernel: amdgpu 0000:0d:00.0: amdgpu: RunBtc failed!
Sep 01 17:18:23 archer kernel: amdgpu 0000:0d:00.0: amdgpu: Failed to setup smc hw!
Sep 01 17:18:23 archer kernel: [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <smu> failed -62
Sep 01 17:18:23 archer kernel: amdgpu 0000:0d:00.0: amdgpu: amdgpu_device_ip_resume failed (-62).
Sep 01 17:18:23 archer kernel: PM: dpm_run_callback(): pci_pm_resume+0x0/0xe0 returns -62
Sep 01 17:18:23 archer kernel: amdgpu 0000:0d:00.0: PM: failed to resume async: error -62
Sep 01 17:18:23 archer kernel: amdgpu: Move buffer fallback to memcpy unavailable
Sep 01 17:18:23 archer kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to process the buffer list -19!
Sep 01 17:18:23 archer kernel: amdgpu: Move buffer fallback to memcpy unavailable
Sep 01 17:18:23 archer kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to process the buffer list -19!
Sep 01 17:18:23 archer kernel: amdgpu: Move buffer fallback to memcpy unavailable
```
If anyone has any solution, please share!
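
An excerpt like the one above can be pulled out of the kernel journal with a small filter. This is only a sketch: `amdgpu_resume_errors` is a hypothetical helper name, and the match patterns are guesses based on the messages posted in this thread, so adjust them as needed.

```shell
# Hypothetical helper: keep only amdgpu/PM lines that look like failures.
# Reads log text on stdin; the patterns are assumptions taken from this thread.
amdgpu_resume_errors() {
    grep -E 'amdgpu|PM: ' | grep -Ei 'fail|error|timeout|returns -[0-9]+'
}

# On a real system, feed it the kernel messages of the current boot:
#   journalctl -k -b | amdgpu_resume_errors
# Demo on three lines copied from the log above (the last one is filtered out):
amdgpu_resume_errors <<'EOF'
Sep 01 17:18:23 archer kernel: amdgpu 0000:0d:00.0: amdgpu: RunBtc failed!
Sep 01 17:18:23 archer kernel: PM: dpm_run_callback(): pci_pm_resume+0x0/0xe0 returns -62
Sep 01 17:18:23 archer kernel: amdgpu: Move buffer fallback to memcpy unavailable
EOF
```

If the machine had to be hard reset, the failed resume is in the previous boot, so `journalctl -k -b -1` is the one to pipe in.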

#4 2022-01-18 11:00:54

zesko
Member
Registered: 2021-08-22
Posts: 11

Re: AMDGPU refuses to wake up after system sleep or hibernate

I have the same issue.
Linux Kernel: 5.16.1
GPU: RX 5700, connected to dual monitors via DisplayPort.

Jan 18 11:45:41 zesko kernel: amdgpu 0000:09:00.0: amdgpu: RunBtc failed!
Jan 18 11:45:41 zesko kernel: amdgpu 0000:09:00.0: amdgpu: Failed to setup smc hw!
Jan 18 11:45:41 zesko kernel: [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <smu> failed -62
Jan 18 11:45:41 zesko kernel: amdgpu 0000:09:00.0: amdgpu: amdgpu_device_ip_resume failed (-62).
Jan 18 11:45:41 zesko kernel: PM: dpm_run_callback(): pci_pm_resume+0x0/0xe0 returns -62
Jan 18 11:45:41 zesko kernel: amdgpu 0000:09:00.0: PM: failed to resume async: error -62
Jan 18 11:45:41 zesko kernel: amdgpu: Move buffer fallback to memcpy unavailable
Jan 18 11:45:41 zesko kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to process the buffer list -19!
Jan 18 11:45:42 zesko kernel: amdgpu: Move buffer fallback to memcpy unavailable
Jan 18 11:45:42 zesko kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to process the buffer list -19!
Jan 18 11:45:51 zesko kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=2405, emitted seq=2407
Jan 18 11:45:51 zesko kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1 timeout, signaled seq=716, emitted seq=718
Jan 18 11:45:51 zesko kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
Jan 18 11:45:51 zesko kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
Jan 18 11:45:52 zesko kernel: amdgpu 0000:09:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
Jan 18 11:45:52 zesko kernel: [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KGQ disable failed
Jan 18 11:45:52 zesko kernel: amdgpu 0000:09:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
Jan 18 11:45:52 zesko kernel: [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KCQ disable failed
Jan 18 11:45:52 zesko kernel: [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* failed to halt cp gfx
Jan 18 11:45:57 zesko kernel: amdgpu 0000:09:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x0000003A SMN_C2PMSG_82:0x00000000
Jan 18 11:45:57 zesko kernel: amdgpu 0000:09:00.0: amdgpu: Failed to enter BACO state!
Jan 18 11:45:57 zesko kernel: amdgpu 0000:09:00.0: amdgpu: ASIC reset failed with error, -62 for drm dev, 0000:09:00.0
Jan 18 11:46:02 zesko kernel: amdgpu 0000:09:00.0: amdgpu: RunBtc failed!
Jan 18 11:46:02 zesko kernel: amdgpu 0000:09:00.0: amdgpu: Failed to setup smc hw!
Jan 18 11:46:02 zesko kernel: [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <smu> failed -62

Last edited by zesko (2022-01-18 11:06:35)

#5 2022-01-18 15:53:26

doctorvi
Member
Registered: 2022-01-18
Posts: 1

Re: AMDGPU refuses to wake up after system sleep or hibernate

Same problem here on an HP laptop with a Ryzen 4700U, on 5.16.1-arch1-1.

Jan 18 16:32:37 Nudel kernel: amdgpu 0000:04:00.0: amdgpu: couldn't schedule ib on ring <sdma0>
Jan 18 16:32:37 Nudel kernel: [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
Jan 18 16:32:37 Nudel kernel: amdgpu 0000:04:00.0: amdgpu: couldn't schedule ib on ring <sdma0>
Jan 18 16:32:37 Nudel kernel: amdgpu 0000:04:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:4 pasid:32779, for process kscreenlocker_g pid 1301 thread kscreenloc:cs0 pid 1303)
Jan 18 16:32:37 Nudel kernel: amdgpu 0000:04:00.0: amdgpu:   in page starting at address 0x00008001078d0000 from IH client 0x1b (UTCL2)
Jan 18 16:32:37 Nudel kernel: amdgpu 0000:04:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00441051
Jan 18 16:32:37 Nudel kernel: amdgpu 0000:04:00.0: amdgpu:          Faulty UTCL2 client ID: TCP (0x8)
Jan 18 16:32:37 Nudel kernel: amdgpu 0000:04:00.0: amdgpu:          MORE_FAULTS: 0x1
Jan 18 16:32:37 Nudel kernel: amdgpu 0000:04:00.0: amdgpu:          WALKER_ERROR: 0x0
Jan 18 16:32:37 Nudel kernel: amdgpu 0000:04:00.0: amdgpu:          PERMISSION_FAULTS: 0x5
Jan 18 16:32:37 Nudel kernel: amdgpu 0000:04:00.0: amdgpu:          MAPPING_ERROR: 0x0
Jan 18 16:32:37 Nudel kernel: amdgpu 0000:04:00.0: amdgpu:          RW: 0x1
Jan 18 16:32:37 Nudel kernel: amdgpu 0000:04:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:4 pasid:32779, for process kscreenlocker_g pid 1301 thread kscreenloc:cs0 pid 1303)
Jan 18 16:32:37 Nudel kernel: amdgpu 0000:04:00.0: amdgpu:   in page starting at address 0x00008001078d1000 from IH client 0x1b (UTCL2)
Jan 18 16:32:37 Nudel kernel: amdgpu 0000:04:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00441051
Jan 18 16:32:37 Nudel kernel: amdgpu 0000:04:00.0: amdgpu:          Faulty UTCL2 client ID: TCP (0x8)
Jan 18 16:32:37 Nudel kernel: amdgpu 0000:04:00.0: amdgpu:          MORE_FAULTS: 0x1
Jan 18 16:32:37 Nudel kernel: amdgpu 0000:04:00.0: amdgpu:          WALKER_ERROR: 0x0
Jan 18 16:32:37 Nudel kernel: amdgpu 0000:04:00.0: amdgpu:          PERMISSION_FAULTS: 0x5
Jan 18 16:32:37 Nudel kernel: amdgpu 0000:04:00.0: amdgpu:          MAPPING_ERROR: 0x0
Jan 18 16:32:37 Nudel kernel: amdgpu 0000:04:00.0: amdgpu:          RW: 0x1
Jan 18 16:32:37 Nudel kernel: amdgpu 0000:04:00.0: amdgpu: couldn't schedule ib on ring <sdma0>
Jan 18 16:32:37 Nudel kernel: amdgpu 0000:04:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:4 pasid:32779, for process kscreenlocker_g pid 1301 thread kscreenloc:cs0 pid 1303)
Jan 18 16:32:37 Nudel kernel: amdgpu 0000:04:00.0: amdgpu:   in page starting at address 0x00008001078d3000 from IH client 0x1b (UTCL2)
Jan 18 16:32:37 Nudel kernel: amdgpu 0000:04:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00441051
Jan 18 16:32:37 Nudel kernel: amdgpu 0000:04:00.0: amdgpu:          Faulty UTCL2 client ID: TCP (0x8)
Jan 18 16:32:37 Nudel kernel: amdgpu 0000:04:00.0: amdgpu:          MORE_FAULTS: 0x1
Jan 18 16:32:37 Nudel kernel: amdgpu 0000:04:00.0: amdgpu:          WALKER_ERROR: 0x0
Jan 18 16:32:37 Nudel kernel: amdgpu 0000:04:00.0: amdgpu:          PERMISSION_FAULTS: 0x5
Jan 18 16:32:37 Nudel kernel: amdgpu 0000:04:00.0: amdgpu:          MAPPING_ERROR: 0x0
Jan 18 16:32:37 Nudel kernel: amdgpu 0000:04:00.0: amdgpu:          RW: 0x1
Jan 18 16:32:37 Nudel kernel: amdgpu 0000:04:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:4 pasid:32779, for process kscreenlocker_g pid 1301 thread kscreenloc:cs0 pid 1303)
Jan 18 16:32:37 Nudel kernel: amdgpu 0000:04:00.0: amdgpu:   in page starting at address 0x00008001078d5000 from IH client 0x1b (UTCL2)
Jan 18 16:32:37 Nudel kernel: amdgpu 0000:04:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00441051
Jan 18 16:32:37 Nudel kernel: amdgpu 0000:04:00.0: amdgpu:          Faulty UTCL2 client ID: TCP (0x8)
Jan 18 16:32:37 Nudel kernel: amdgpu 0000:04:00.0: amdgpu:          MORE_FAULTS: 0x1
Jan 18 16:32:37 Nudel kernel: amdgpu 0000:04:00.0: amdgpu:          WALKER_ERROR: 0x0
Jan 18 16:32:37 Nudel kernel: amdgpu 0000:04:00.0: amdgpu:          PERMISSION_FAULTS: 0x5
Jan 18 16:32:37 Nudel kernel: amdgpu 0000:04:00.0: amdgpu:          MAPPING_ERROR: 0x0
Jan 18 16:32:37 Nudel kernel: amdgpu 0000:04:00.0: amdgpu:          RW: 0x1
Jan 18 16:32:37 Nudel kernel: [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
Jan 18 16:32:37 Nudel kernel: amdgpu 0000:04:00.0: amdgpu: couldn't schedule ib on ring <sdma0>
... Lot of the same 
Jan 18 16:32:37 Nudel kernel: amdgpu 0000:04:00.0: amdgpu: couldn't schedule ib on ring <sdma0>
Jan 18 16:32:37 Nudel kernel: [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
Jan 18 16:32:55 Nudel systemd-logind[481]: Lid opened.
Jan 18 16:32:57 Nudel kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=1768, emitted seq=1771
Jan 18 16:32:57 Nudel kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process kscreenlocker_g pid 1301 thread kscreenloc:cs0 pid 1303
Jan 18 16:32:57 Nudel kernel: amdgpu 0000:04:00.0: amdgpu: GPU reset begin!
Jan 18 16:32:57 Nudel kernel: [drm] free PSP TMR buffer
Jan 18 16:32:57 Nudel kernel: amdgpu 0000:04:00.0: amdgpu: MODE2 reset
Jan 18 16:32:57 Nudel kernel: amdgpu 0000:04:00.0: amdgpu: GPU reset succeeded, trying to resume
Jan 18 16:32:57 Nudel kernel: [drm] PCIE GART of 1024M enabled.
Jan 18 16:32:57 Nudel kernel: [drm] PTB located at 0x000000F400900000
Jan 18 16:32:57 Nudel kernel: [drm] PSP is resuming...
Jan 18 16:32:57 Nudel kernel: [drm] reserve 0x400000 from 0xf41f800000 for PSP TMR
Jan 18 16:32:57 Nudel kernel: amdgpu 0000:04:00.0: amdgpu: RAS: optional ras ta ucode is not available
Jan 18 16:32:57 Nudel kernel: amdgpu 0000:04:00.0: amdgpu: RAP: optional rap ta ucode is not available
Jan 18 16:32:57 Nudel kernel: amdgpu 0000:04:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
Jan 18 16:32:57 Nudel kernel: amdgpu 0000:04:00.0: amdgpu: SMU is resuming...
Jan 18 16:32:57 Nudel kernel: amdgpu 0000:04:00.0: amdgpu: dpm has been disabled
Jan 18 16:32:57 Nudel kernel: amdgpu 0000:04:00.0: amdgpu: SMU is resumed successfully!
Jan 18 16:32:57 Nudel kernel: [drm] DMUB hardware initialized: version=0x0101001C
Jan 18 16:32:58 Nudel kernel: [drm] kiq ring mec 2 pipe 1 q 0
Jan 18 16:32:58 Nudel kernel: amdgpu 0000:04:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
Jan 18 16:32:58 Nudel kernel: [drm:amdgpu_gfx_enable_kcq.cold [amdgpu]] *ERROR* KCQ enable failed
Jan 18 16:32:58 Nudel kernel: [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <gfx_v9_0> failed -110
Jan 18 16:32:58 Nudel kernel: amdgpu 0000:04:00.0: amdgpu: GPU reset(3) failed
Jan 18 16:32:58 Nudel kernel: amdgpu 0000:04:00.0: amdgpu: GPU reset end with ret = -110
Jan 18 16:33:08 Nudel kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=1771, emitted seq=1771
Jan 18 16:33:08 Nudel kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process kscreenlocker_g pid 1301 thread kscreenloc:cs0 pid 1303
Jan 18 16:33:08 Nudel kernel: amdgpu 0000:04:00.0: amdgpu: GPU reset begin!

#6 2022-04-19 11:33:36

ts
Member
Registered: 2021-04-22
Posts: 10

Re: AMDGPU refuses to wake up after system sleep or hibernate

This just started happening to me on the LTS kernel (5.15.34-1-lts), which I updated to on 4/18/2022. I am passing along more details in case they help.

 
arch kernel: PM: Some devices failed to suspend, or early wake event detected
Apr 19 06:07:38 kernel: amdgpu 0000:0b:00.0: PM: failed to suspend async: error -5
Apr 19 06:07:38  kernel: PM: dpm_run_callback(): pci_pm_suspend+0x0/0x160 returns -5
Apr 19 06:07:38 kernel: PM: pci_pm_suspend(): amdgpu_pmops_suspend+0x0/0x70 [amdgpu] returns -5
Apr 19 06:07:38 kernel: amdgpu 0000:0b:00.0: amdgpu: Failed to enter BACO state!

My system info is as follows:

 Kernel: 5.15.34-1-lts arch: x86_64 bits: 64 Desktop: KDE Plasma v: 5.24.4
    Distro: Arch Linux
Machine:
  Type: Desktop Mobo: ASUSTeK model: TUF GAMING X570-PLUS (WI-FI) v: Rev X.0x
    serial: <superuser required> UEFI: American Megatrends v: 4021
    date: 08/10/2021
CPU:
  Info: 8-core model: AMD Ryzen 7 3800X bits: 64 type: MT MCP cache:
    L2: 4 MiB
  Speed (MHz): avg: 2300 min/max: 2200/4559 cores: 1: 2050 2: 2053 3: 2197
    4: 2196 5: 2795 6: 1861 7: 1862 8: 1862 9: 2196 10: 2196 11: 3588 12: 2051
    13: 2396 14: 2052 15: 3590 16: 1870
Graphics:
  Device-1: AMD Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT]
    driver: amdgpu v: kernel
  Display: x11 server: X.Org v: 21.1.3 with: Xwayland v: 22.1.1 driver: X:
    loaded: modesetting unloaded: vesa gpu: amdgpu resolution: 3440x1440~60Hz
  OpenGL:
    renderer: AMD Radeon RX 5600 XT (navi10 LLVM 13.0.1 DRM 3.42 5.15.34-1-lts)
    v: 4.6 Mesa 22.0.1
Audio:
  Device-1: AMD Navi 10 HDMI Audio driver: snd_hda_intel
  Device-2: AMD Starship/Matisse HD Audio driver: snd_hda_intel
  Sound Server-1: ALSA v: k5.15.34-1-lts running: yes
  Sound Server-2: PipeWire v: 0.3.50 running: yes
Network:
  Device-1: Intel Wireless-AC 9260 driver: iwlwifi
  IF: wlp4s0 state: up mac: <filter>
  Device-2: Realtek RTL8111/8168/8411 PCI Express Gigabit Ethernet
    driver: r8169
  IF: enp5s0 state: up speed: 1000 Mbps duplex: full mac: <filter>
  IF-ID-1: wg-mullvad state: unknown speed: N/A duplex: N/A mac: N/A
Bluetooth:
  Device-1: Intel Wireless-AC 9260 Bluetooth Adapter type: USB driver: btusb
  Report: rfkill ID: hci0 state: up address: see --recommends
Drives:
  Local Storage: total: 2.69 TiB used: 922.26 GiB (33.4%)
  ID-1: /dev/sda vendor: Seagate model: ST2000DM008-2FR102 size: 1.82 TiB
  ID-2: /dev/sdb vendor: Kingston model: SA400S37480G size: 447.13 GiB
  ID-3: /dev/sdc vendor: Kingston model: SA400S37480G size: 447.13 GiB
Partition:
  ID-1: / size: 402.57 GiB used: 180.77 GiB (44.9%) fs: ext4 dev: /dev/sdb2
  ID-2: /boot size: 511 MiB used: 84.8 MiB (16.6%) fs: vfat dev: /dev/sdb1
Swap:
  Alert: No swap data was found.
Sensors:
  System Temperatures: cpu: N/A mobo: N/A gpu: amdgpu temp: 51.0 C
  Fan Speeds (RPM): N/A gpu: amdgpu fan: 0
Info:
  Processes: 350 Uptime: 18m Memory: 62.78 GiB used: 3.85 GiB (6.1%)
  Shell: Zsh inxi: 3.3.15

#7 2022-04-19 13:49:32

seth
Member
Registered: 2012-09-03
Posts: 59,345

Re: AMDGPU refuses to wake up after system sleep or hibernate

amdgpu.aspm=0 amdgpu.runpm=0 amdgpu.bapm=0 pcie_aspm=off

Possibly

amdgpu.dpm=0

but that's known for causing the GPU to fail to power up entirely.

https://wiki.archlinux.org/title/Kernel_parameters - I recommend testing them transiently by editing the command line at the bootloader rather than altering the configuration statically (because of the dpm situation).
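
To confirm that parameters typed in at the bootloader actually reached the kernel, the running command line can be checked after boot. A rough sketch, assuming a POSIX shell: `check_cmdline` is a made-up helper, and the demo string is invented for illustration.

```shell
# Hypothetical helper: report which of the suggested parameters appear
# as whole words on the given kernel command line.
check_cmdline() {
    cmdline=$1
    for p in amdgpu.aspm=0 amdgpu.runpm=0 amdgpu.bapm=0 pcie_aspm=off; do
        # Pad with spaces so a parameter only matches as a complete token.
        case " $cmdline " in
            *" $p "*) echo "$p: set" ;;
            *)        echo "$p: not set" ;;
        esac
    done
}

# On a real system:
#   check_cmdline "$(cat /proc/cmdline)"
# Demo with an invented command line (not from any poster's machine):
check_cmdline "root=/dev/sda2 rw amdgpu.runpm=0 pcie_aspm=off"
```

The values the driver ended up with should also be readable under `/sys/module/amdgpu/parameters/` (e.g. `runpm`), which is a useful cross-check.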

#8 2022-04-23 22:58:11

ts
Member
Registered: 2021-04-22
Posts: 10

Re: AMDGPU refuses to wake up after system sleep or hibernate

I updated to the current stable release and the problem still exists.

For me, Bluetooth is now throwing errors, and running the commands below is a workaround:

 
sudo systemctl stop bluetooth.service
sudo systemctl disable bluetooth.service

#9 2023-02-27 02:47:56

jonathannerat
Member
Registered: 2023-02-27
Posts: 1

Re: AMDGPU refuses to wake up after system sleep or hibernate

Just sharing in case it helps someone. In my case, suspend worked but hibernate didn't. Hibernating sometimes powered the machine off and sometimes didn't (the screen went off, but the power indicator stayed on). After waking from hibernation, the desktop showed for a moment and then the session exited to my login manager.

I managed to get it working by removing the `kms` hook from the `HOOKS` array in `mkinitcpio.conf` and regenerating the initramfs (`sudo mkinitcpio -P`).
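
The edit described above can be previewed before touching the real file. A minimal sketch, assuming a sed that supports `-E` and a single-line `HOOKS=(...)` entry. `remove_kms_hook` is a hypothetical helper, and the sample line is illustrative, not copied from anyone's actual config.

```shell
# Hypothetical helper: print the given config with "kms" dropped from the
# HOOKS=(...) line. The file itself is not modified.
remove_kms_hook() {
    sed -E '/^HOOKS=\(/ {
        s/(\(|[[:space:]])kms([[:space:]]|\))/\1\2/
        s/[[:space:]]+/ /g
        s/\( /(/
        s/ \)/)/
    }' "$1"
}

# Demo on a throwaway file with an illustrative HOOKS line:
sample=$(mktemp)
printf '%s\n' 'HOOKS=(base udev autodetect modconf kms keyboard block filesystems fsck)' > "$sample"
remove_kms_hook "$sample"
rm -f "$sample"
```

To preview against the real file, something like `remove_kms_hook /etc/mkinitcpio.conf | diff /etc/mkinitcpio.conf -` shows the one-line change. After that, make the edit by hand and run `sudo mkinitcpio -P` as described.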

#10 2023-02-27 06:50:12

seth
Member
Registered: 2012-09-03
Posts: 59,345

Re: AMDGPU refuses to wake up after system sleep or hibernate

Do you have a hybrid graphics system? (lspci)
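
A quick way to answer that is to count the display-class devices `lspci` reports. More than one usually points to a hybrid setup. A sketch only: `count_gpus` is a made-up helper, and the sample line is the GPU from the first post with the bus address taken from its log.

```shell
# Hypothetical helper: count display-class devices in lspci output on stdin.
count_gpus() {
    grep -cE 'VGA compatible controller|3D controller|Display controller'
}

# On a real system:
#   lspci | count_gpus
# Demo with the GPU line from post #1:
count_gpus <<'EOF'
29:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT] (rev c4)
EOF
```

`lspci -k` additionally lists the kernel driver bound to each device, which helps tell the two GPUs apart if there is more than one.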
