#1 2021-08-04 22:14:04

croxis
Member
Registered: 2015-01-29
Posts: 8

AMDGPU refuses to wake up after system sleep or hibernate

Linux 5.13.7-arch1-1 x86_64
Mesa 21.1.6
VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT] (rev c4)   
CPU is Ryzen 1700

I can still ssh in, so it isn't a full system crash.
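Because the machine still answers over ssh, the amdgpu resume messages can be pulled from the current boot's kernel log remotely. This is a minimal sketch that assumes the systemd journal is in use. The function name `filter_resume_errors` and its grep pattern are illustrative choices, not part of the original post:

```shell
#!/bin/sh
# Hypothetical helper: read kernel log lines on stdin and keep only the
# amdgpu driver messages and the power-management ("PM: ") messages,
# which is where resume failures such as "RunBtc failed!" show up.
filter_resume_errors() {
    # grep exits with status 1 when nothing matches; treat that as "no errors".
    grep -E 'amdgpu|PM: ' || true
}

# Example use over ssh after a failed wake-up (kernel messages, current boot):
#   journalctl -k -b 0 --no-pager | filter_resume_errors
```

The filter is deliberately broad. It keeps the harmless "ucode is not available" lines along with the real failures, so the SMU timeout and the `-62` resume errors stay in context.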

Kernel parameters (systemd-boot loader entry):

title   Arch Linux
linux   /vmlinuz-linux
initrd  /amd-ucode.img
initrd  /initramfs-linux.img
options root=PARTUUID=25d80518-3e72-402c-ace0-4f2de69ca92f rootflags=subvol=root_arch rw amd_iommu=on idle=nomwait resume=UUID=535a2ae2-f59b-47e2-abb7-da5423f0ae82

Here is what I hope is the relevant section of the journalctl log after a sleep attempt.

Aug 04 14:56:51 babylon kernel: amdgpu 0000:29:00.0: amdgpu: RAS: optional ras ta ucode is not available
Aug 04 14:56:51 babylon kernel: amdgpu 0000:29:00.0: amdgpu: RAP: optional rap ta ucode is not available
Aug 04 14:56:51 babylon kernel: amdgpu 0000:29:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
Aug 04 14:56:51 babylon kernel: amdgpu 0000:29:00.0: amdgpu: SMU is resuming...
Aug 04 14:56:51 babylon kernel: r8169 0000:2a:00.0 enp42s0: Link is Up - 1Gbps/Full - flow control rx/tx
Aug 04 14:56:51 babylon kernel: ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Aug 04 14:56:51 babylon kernel: ata3.00: configured for UDMA/133
Aug 04 14:56:51 babylon kernel: ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
Aug 04 14:56:51 babylon kernel: ata1.00: configured for UDMA/133
Aug 04 14:56:51 babylon kernel: amdgpu 0000:29:00.0: amdgpu: message:          RunBtc (58)         param: 0x00000000 is timeout (no response)
Aug 04 14:56:51 babylon kernel: amdgpu 0000:29:00.0: amdgpu: RunBtc failed!
Aug 04 14:56:51 babylon kernel: amdgpu 0000:29:00.0: amdgpu: Failed to setup smc hw!
Aug 04 14:56:51 babylon kernel: [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <smu> failed -62
Aug 04 14:56:51 babylon kernel: amdgpu 0000:29:00.0: amdgpu: amdgpu_device_ip_resume failed (-62).
Aug 04 14:56:51 babylon kernel: PM: dpm_run_callback(): pci_pm_resume+0x0/0xe0 returns -62
Aug 04 14:56:51 babylon kernel: amdgpu 0000:29:00.0: PM: failed to resume async: error -62
Aug 04 14:56:51 babylon kernel: OOM killer enabled.
Aug 04 14:56:51 babylon kernel: Restarting tasks ... done.
Aug 04 14:56:51 babylon kernel: PM: suspend exit
Aug 04 14:56:51 babylon kernel: audit: type=1130 audit(1628114211.643:240): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-networkd-wait-online comm="systemd" exe="/usr/lib/systemd/systemd" hostna>
Aug 04 14:56:51 babylon kernel: audit: type=1130 audit(1628114211.643:241): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=cups-browsed comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? term>
Aug 04 14:56:51 babylon kernel: audit: type=1130 audit(1628114211.650:242): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=rpc-statd-notify comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? >
Aug 04 14:56:51 babylon kernel: audit: type=1130 audit(1628114211.656:243): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-suspend comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? t>
Aug 04 14:56:51 babylon kernel: audit: type=1131 audit(1628114211.656:244): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-suspend comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? t>
Aug 04 14:56:51 babylon kernel: snd_hda_intel 0000:29:00.1: refused to change power state from D0 to D3hot
Aug 04 14:56:51 babylon audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-networkd-wait-online comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=f>
Aug 04 14:56:51 babylon audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=cups-browsed comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Aug 04 14:56:51 babylon audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=rpc-statd-notify comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Aug 04 14:56:51 babylon audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-suspend comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Aug 04 14:56:51 babylon audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=systemd-suspend comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Aug 04 14:56:51 babylon rtkit-daemon[1500]: The canary thread is apparently starving. Taking action.
Aug 04 14:56:51 babylon systemd-networkd-wait-online[1174]: Timeout occurred while waiting for network connectivity.
Aug 04 14:56:51 babylon rtkit-daemon[1500]: Demoting known real-time threads.
Aug 04 14:56:51 babylon systemd-resolved[1175]: Clock change detected. Flushing caches.
Aug 04 14:56:51 babylon rtkit-daemon[1500]: Successfully demoted thread 1507 of process 1490.
Aug 04 14:56:51 babylon systemd-sleep[6756]: System returned from sleep state.
Aug 04 14:56:51 babylon rtkit-daemon[1500]: Successfully demoted thread 1490 of process 1490.
Aug 04 14:56:51 babylon systemd[1]: systemd-networkd-wait-online.service: Main process exited, code=exited, status=1/FAILURE
Aug 04 14:56:51 babylon rtkit-daemon[1500]: Successfully demoted thread 1505 of process 1491.
Aug 04 14:56:51 babylon systemd[1]: systemd-networkd-wait-online.service: Failed with result 'exit-code'.
Aug 04 14:56:51 babylon rtkit-daemon[1500]: Successfully demoted thread 1491 of process 1491.
Aug 04 14:56:51 babylon systemd[1]: Failed to start Wait for Network to be Configured.
Aug 04 14:56:51 babylon rtkit-daemon[1500]: Successfully demoted thread 1506 of process 1492.
Aug 04 14:56:51 babylon systemd[1]: Reached target Network is Online.
Aug 04 14:56:51 babylon rtkit-daemon[1500]: Successfully demoted thread 1492 of process 1492.
Aug 04 14:56:51 babylon systemd[1]: Started Make remote CUPS printers available locally.
Aug 04 14:56:51 babylon rtkit-daemon[1500]: Demoted 6 threads.
Aug 04 14:56:51 babylon systemd[1]: Reached target Multi-User System.
Aug 04 14:56:51 babylon sm-notify[7113]: Version 2.5.4 starting
Aug 04 14:56:51 babylon systemd[1]: Reached target Graphical Interface.
Aug 04 14:56:51 babylon avahi-daemon[1057]: Withdrawing address record for 192.168.1.180 on enp42s0.
Aug 04 14:56:51 babylon systemd[1]: Starting Notify NFS peers of a restart...
Aug 04 14:56:51 babylon avahi-daemon[1057]: Leaving mDNS multicast group on interface enp42s0.IPv4 with address 192.168.1.180.
Aug 04 14:56:51 babylon systemd[1]: Started Notify NFS peers of a restart.
Aug 04 14:56:51 babylon avahi-daemon[1057]: Interface enp42s0.IPv4 no longer relevant for mDNS.
Aug 04 14:56:51 babylon systemd[1]: systemd-suspend.service: Deactivated successfully.
Aug 04 14:56:51 babylon systemd[1]: Finished System Suspend.
Aug 04 14:56:51 babylon upowerd[1450]: treating change event as add on /sys/devices/pci0000:00/0000:00:01.3/0000:03:00.2/0000:20:08.0/0000:26:00.0/usb4/4-2
Aug 04 14:56:51 babylon systemd[1]: Stopped target Sleep.
Aug 04 14:56:51 babylon systemd[1]: Reached target Suspend.
Aug 04 14:56:51 babylon upowerd[1450]: treating change event as add on /sys/devices/pci0000:00/0000:00:01.3/0000:03:00.2/0000:20:08.0/0000:26:00.0/usb3/3-2
Aug 04 14:56:51 babylon systemd[1]: Startup finished in 16.597s (firmware) + 3.211s (loader) + 4.055s (kernel) + 1.497s (initrd) + 2min 7.562s (userspace) = 2min 32.924s.
Aug 04 14:56:51 babylon systemd[1]: Stopped target Suspend.
Aug 04 14:56:51 babylon systemd-logind[1072]: Operation 'sleep' finished.
Aug 04 14:56:51 babylon upowerd[1450]: treating change event as add on /sys/devices/pci0000:00/0000:00:01.3/0000:03:00.0/usb1/1-1
Aug 04 14:56:51 babylon upowerd[1450]: treating change event as add on /sys/devices/pci0000:00/0000:00:01.3/0000:03:00.2/0000:20:08.0/0000:26:00.0/usb4/4-2
Aug 04 14:56:51 babylon upowerd[1450]: treating change event as add on /sys/devices/pci0000:00/0000:00:01.3/0000:03:00.2/0000:20:08.0/0000:26:00.0/usb3/3-2
Aug 04 14:56:51 babylon systemd-networkd[1173]: enp42s0: DHCP lease lost
Aug 04 14:56:51 babylon systemd-networkd[1173]: enp42s0: DHCPv6 lease lost
Aug 04 14:56:51 babylon systemd-networkd[1173]: enp42s0: Reset carrier
Aug 04 14:56:51 babylon systemd-networkd[1173]: lo: Reset carrier
Aug 04 14:56:51 babylon akonadi_davgroupware_resource[3586]: org.kde.pim.davresource: Unable to fetch collections 300 "There was a problem with the request.\nHTTP error (0)."
Aug 04 14:56:51 babylon akonadi_davgroupware_resource[3587]: org.kde.pim.davresource: Unable to fetch collections 300 "There was a problem with the request.\nHTTP error (0)."
Aug 04 14:56:51 babylon upowerd[1450]: treating change event as add on /sys/devices/pci0000:00/0000:00:01.3/0000:03:00.0/usb1/1-1
Aug 04 14:56:51 babylon systemd-networkd[1173]: enp42s0: DHCPv4 address 192.168.1.180/22 via 192.168.0.1
Aug 04 14:56:51 babylon avahi-daemon[1057]: Joining mDNS multicast group on interface enp42s0.IPv4 with address 192.168.1.180.
Aug 04 14:56:51 babylon avahi-daemon[1057]: New relevant interface enp42s0.IPv4 for mDNS.
Aug 04 14:56:51 babylon avahi-daemon[1057]: Registering new address record for 192.168.1.180 on enp42s0.IPv4.
Aug 04 14:56:52 babylon systemd-resolved[1175]: Clock change detected. Flushing caches.
Aug 04 14:56:52 babylon kernel: amdgpu: Move buffer fallback to memcpy unavailable
Aug 04 14:56:52 babylon kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to process the buffer list -19!
Aug 04 14:56:52 babylon kernel: amdgpu: Move buffer fallback to memcpy unavailable
Aug 04 14:56:52 babylon kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to process the buffer list -19!
Aug 04 14:56:52 babylon kernel: amdgpu: Move buffer fallback to memcpy unavailable
Aug 04 14:56:52 babylon kernel: amdgpu: Move buffer fallback to memcpy unavailable
Aug 04 14:56:52 babylon kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to process the buffer list -19!
Aug 04 14:56:52 babylon kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to process the buffer list -19!
Aug 04 14:56:52 babylon kernel: amdgpu: Move buffer fallback to memcpy unavailable
Aug 04 14:56:52 babylon kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to process the buffer list -19!
Aug 04 14:56:52 babylon kernel: amdgpu: Move buffer fallback to memcpy unavailable
Aug 04 14:56:52 babylon kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to process the buffer list -19!
Aug 04 14:56:52 babylon kernel: amdgpu: Move buffer fallback to memcpy unavailable
Aug 04 14:56:52 babylon kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to process the buffer list -19!
Aug 04 14:56:52 babylon kernel: amdgpu: Move buffer fallback to memcpy unavailable
Aug 04 14:56:52 babylon kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to process the buffer list -19!
Aug 04 14:56:52 babylon kernel: amdgpu: Move buffer fallback to memcpy unavailable
Aug 04 14:56:52 babylon kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to process the buffer list -19!
Aug 04 14:56:53 babylon kernel: amdgpu: Move buffer fallback to memcpy unavailable
Aug 04 14:56:53 babylon kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to process the buffer list -19!
Aug 04 14:56:59 babylon systemd[1]: libvirtd.service: Deactivated successfully.
Aug 04 14:56:59 babylon audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=libvirtd comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Aug 04 14:56:59 babylon systemd[1]: libvirtd.service: Unit process 1308 (dnsmasq) remains running after unit stopped.
Aug 04 14:56:59 babylon systemd[1]: libvirtd.service: Unit process 1309 (dnsmasq) remains running after unit stopped.
Aug 04 14:56:59 babylon systemd[1]: libvirtd.service: Unit process 1339 (mount.ntfs) remains running after unit stopped.
Aug 04 14:56:59 babylon systemd[1]: libvirtd.service: Consumed 1.550s CPU time.
Aug 04 14:56:59 babylon kernel: audit: type=1131 audit(1628114219.868:245): pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=libvirtd comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal>
Aug 04 14:57:02 babylon kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1 timeout, signaled seq=669, emitted seq=671
Aug 04 14:57:02 babylon kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=2252, emitted seq=2254
Aug 04 14:57:02 babylon kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
Aug 04 14:57:02 babylon kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
Aug 04 14:57:02 babylon kernel: amdgpu 0000:29:00.0: amdgpu: GPU reset begin!
Aug 04 14:57:02 babylon kernel: amdgpu 0000:29:00.0: amdgpu: GPU reset begin!
Aug 04 14:57:02 babylon kernel: amdgpu 0000:29:00.0: amdgpu: Bailing on TDR for s_job:643, as another already in progress
Aug 04 14:57:02 babylon kernel: BUG: unable to handle page fault for address: ffff8f8ed0238000
Aug 04 14:57:02 babylon kernel: #PF: supervisor write access in kernel mode
Aug 04 14:57:03 babylon kernel: #PF: error_code(0x0003) - permissions violation
Aug 04 14:57:03 babylon kernel: PGD 57a601067 P4D 57a601067 PUD 100fae063 PMD 1103dc063 PTE 8000000110238161
Aug 04 14:57:03 babylon kernel: Oops: 0003 [#1] PREEMPT SMP NOPTI
Aug 04 14:57:03 babylon kernel: CPU: 10 PID: 372 Comm: kworker/10:2 Tainted: P        W  OE     5.13.7-arch1-1 #1
Aug 04 14:57:03 babylon kernel: Hardware name: Micro-Star International Co., Ltd. MS-7A32/X370 GAMING PRO CARBON (MS-7A32), BIOS 1.NX 07/07/2020
Aug 04 14:57:03 babylon kernel: Workqueue: events drm_sched_job_timedout [gpu_sched]
Aug 04 14:57:03 babylon kernel: RIP: 0010:kfd_gtt_sa_free+0x39/0x80 [amdgpu]
Aug 04 14:57:03 babylon kernel: Code: f5 53 48 89 fb 0f 1f 44 00 00 4c 8d a3 70 01 00 00 4c 89 e7 e8 68 86 2d de 8b 45 00 3b 45 04 77 16 48 8b 93 68 01 00 00 89 c1 <f0> 48 0f b3 0a 83 c0 01 39 45 04 73 ea 4c 89>
Aug 04 14:57:03 babylon kernel: RSP: 0018:ffffa921816f7d10 EFLAGS: 00010282
Aug 04 14:57:03 babylon kernel: RAX: 000000000803e000 RBX: ffff8f8ecc3c8000 RCX: 000000000803e000
Aug 04 14:57:03 babylon kernel: RDX: ffff8f8ecf230400 RSI: ffff8f8ecce14660 RDI: ffff8f8ecc3c8170
Aug 04 14:57:03 babylon kernel: RBP: ffff8f8ecce14660 R08: 0000000000000000 R09: ffffa921816f7d98
Aug 04 14:57:03 babylon kernel: R10: ffffffffffffffff R11: ffffffffffffffff R12: ffff8f8ecc3c8170
Aug 04 14:57:03 babylon kernel: R13: 0000000000000000 R14: ffff8f8ec180b000 R15: ffff8f8ec180b0c8
Aug 04 14:57:03 babylon kernel: FS:  0000000000000000(0000) GS:ffff8f95aec80000(0000) knlGS:0000000000000000
Aug 04 14:57:03 babylon kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 04 14:57:03 babylon kernel: CR2: ffff8f8ed0238000 CR3: 00000001c70d0000 CR4: 00000000003506e0
Aug 04 14:57:03 babylon kernel: Call Trace:
Aug 04 14:57:03 babylon kernel:  stop_cpsch+0x94/0xc0 [amdgpu]
Aug 04 14:57:03 babylon kernel:  kgd2kfd_suspend.part.0+0x2f/0x40 [amdgpu]
Aug 04 14:57:03 babylon kernel:  kgd2kfd_pre_reset+0x3f/0x50 [amdgpu]
Aug 04 14:57:03 babylon kernel:  amdgpu_device_gpu_recover.cold+0x291/0x865 [amdgpu]
Aug 04 14:57:03 babylon kernel:  amdgpu_job_timedout+0x12d/0x150 [amdgpu]
Aug 04 14:57:03 babylon kernel:  drm_sched_job_timedout+0x70/0xf0 [gpu_sched]
Aug 04 14:57:03 babylon kernel:  process_one_work+0x1e3/0x3b0
Aug 04 14:57:03 babylon kernel:  worker_thread+0x50/0x3b0
Aug 04 14:57:03 babylon kernel:  ? process_one_work+0x3b0/0x3b0
Aug 04 14:57:03 babylon kernel:  kthread+0x133/0x160
Aug 04 14:57:03 babylon kernel:  ? set_kthread_struct+0x40/0x40
Aug 04 14:57:03 babylon kernel:  ret_from_fork+0x22/0x30
Aug 04 14:57:03 babylon kernel: Modules linked in: rfcomm snd_seq_dummy snd_hrtimer snd_seq xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT xt_tcpudp nft_compat nft_chain_nat nf_nat bridge stp llc nft_reject_>
Aug 04 14:57:03 babylon kernel:  snd_timer mdio_devres igb sp5100_tco snd libphy pcspkr k10temp i2c_piix4 joydev mc soundcore dca wmi gpio_amdpt pinctrl_amd gpio_generic mac_hid acpi_cpufreq zfs(POE) zunicode(P>
Aug 04 14:57:03 babylon kernel:  snd_timer mdio_devres igb sp5100_tco snd libphy pcspkr k10temp i2c_piix4 joydev mc soundcore dca wmi gpio_amdpt pinctrl_amd gpio_generic mac_hid acpi_cpufreq zfs(POE) zunicode(P>
Aug 04 14:57:03 babylon kernel: CR2: ffff8f8ed0238000
Aug 04 14:57:03 babylon kernel: ---[ end trace ff429f9ab37d5705 ]---
Aug 04 14:57:03 babylon kernel: RIP: 0010:kfd_gtt_sa_free+0x39/0x80 [amdgpu]
Aug 04 14:57:03 babylon kernel: Code: f5 53 48 89 fb 0f 1f 44 00 00 4c 8d a3 70 01 00 00 4c 89 e7 e8 68 86 2d de 8b 45 00 3b 45 04 77 16 48 8b 93 68 01 00 00 89 c1 <f0> 48 0f b3 0a 83 c0 01 39 45 04 73 ea 4c 89>
Aug 04 14:57:03 babylon kernel: RSP: 0018:ffffa921816f7d10 EFLAGS: 00010282
Aug 04 14:57:03 babylon kernel: RAX: 000000000803e000 RBX: ffff8f8ecc3c8000 RCX: 000000000803e000
Aug 04 14:57:03 babylon kernel: RDX: ffff8f8ecf230400 RSI: ffff8f8ecce14660 RDI: ffff8f8ecc3c8170
Aug 04 14:57:03 babylon kernel: RBP: ffff8f8ecce14660 R08: 0000000000000000 R09: ffffa921816f7d98
Aug 04 14:57:03 babylon kernel: R10: ffffffffffffffff R11: ffffffffffffffff R12: ffff8f8ecc3c8170
Aug 04 14:57:03 babylon kernel: R13: 0000000000000000 R14: ffff8f8ec180b000 R15: ffff8f8ec180b0c8
Aug 04 14:57:03 babylon kernel: FS:  0000000000000000(0000) GS:ffff8f95aec80000(0000) knlGS:0000000000000000
Aug 04 14:57:03 babylon kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Aug 04 14:57:03 babylon kernel: CR2: ffff8f8ed0238000 CR3: 00000001c70d0000 CR4: 00000000003506e0
Aug 04 14:57:12 babylon kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=2252, emitted seq=2254
Aug 04 14:57:12 babylon kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
Aug 04 14:57:12 babylon kernel: amdgpu 0000:29:00.0: amdgpu: GPU reset begin!
Aug 04 14:57:12 babylon kernel: amdgpu 0000:29:00.0: amdgpu: Bailing on TDR for s_job:644, as another already in progress

#2 2021-08-11 20:37:21

johnny.honu
Member
Registered: 2021-04-18
Posts: 17

Re: AMDGPU refuses to wake up after system sleep or hibernate

Same problem here on my Asus ZenBook 14.

#3 2021-09-02 08:00:27

coxackie
Member
Registered: 2019-11-10
Posts: 4

Re: AMDGPU refuses to wake up after system sleep or hibernate

I have the same problem on an Alienware Aurora R10 with an AMD Radeon RX 5700 GPU.

Logs:

```
Sep 01 17:18:23 archer kernel: amdgpu 0000:0d:00.0: amdgpu: message:          RunBtc (58)         param: 0x00000000 is timeout (no response)
Sep 01 17:18:23 archer kernel: amdgpu 0000:0d:00.0: amdgpu: RunBtc failed!
Sep 01 17:18:23 archer kernel: amdgpu 0000:0d:00.0: amdgpu: Failed to setup smc hw!
Sep 01 17:18:23 archer kernel: [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <smu> failed -62
Sep 01 17:18:23 archer kernel: amdgpu 0000:0d:00.0: amdgpu: amdgpu_device_ip_resume failed (-62).
Sep 01 17:18:23 archer kernel: PM: dpm_run_callback(): pci_pm_resume+0x0/0xe0 returns -62
Sep 01 17:18:23 archer kernel: amdgpu 0000:0d:00.0: PM: failed to resume async: error -62
Sep 01 17:18:23 archer kernel: amdgpu: Move buffer fallback to memcpy unavailable
Sep 01 17:18:23 archer kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to process the buffer list -19!
Sep 01 17:18:23 archer kernel: amdgpu: Move buffer fallback to memcpy unavailable
Sep 01 17:18:23 archer kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to process the buffer list -19!
Sep 01 17:18:23 archer kernel: amdgpu: Move buffer fallback to memcpy unavailable
```
If anyone has a solution, please share!

#4 2022-01-18 11:00:54

zesko
Member
Registered: 2021-08-22
Posts: 9

Re: AMDGPU refuses to wake up after system sleep or hibernate

I have the same issue.
Linux kernel: 5.16.1
GPU: RX 5700, connected to two monitors via DisplayPort.

Jan 18 11:45:41 zesko kernel: amdgpu 0000:09:00.0: amdgpu: RunBtc failed!
Jan 18 11:45:41 zesko kernel: amdgpu 0000:09:00.0: amdgpu: Failed to setup smc hw!
Jan 18 11:45:41 zesko kernel: [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <smu> failed -62
Jan 18 11:45:41 zesko kernel: amdgpu 0000:09:00.0: amdgpu: amdgpu_device_ip_resume failed (-62).
Jan 18 11:45:41 zesko kernel: PM: dpm_run_callback(): pci_pm_resume+0x0/0xe0 returns -62
Jan 18 11:45:41 zesko kernel: amdgpu 0000:09:00.0: PM: failed to resume async: error -62
Jan 18 11:45:41 zesko kernel: amdgpu: Move buffer fallback to memcpy unavailable
Jan 18 11:45:41 zesko kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to process the buffer list -19!
Jan 18 11:45:42 zesko kernel: amdgpu: Move buffer fallback to memcpy unavailable
Jan 18 11:45:42 zesko kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to process the buffer list -19!
Jan 18 11:45:51 zesko kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=2405, emitted seq=2407
Jan 18 11:45:51 zesko kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma1 timeout, signaled seq=716, emitted seq=718
Jan 18 11:45:51 zesko kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
Jan 18 11:45:51 zesko kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process  pid 0 thread  pid 0
Jan 18 11:45:52 zesko kernel: amdgpu 0000:09:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
Jan 18 11:45:52 zesko kernel: [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KGQ disable failed
Jan 18 11:45:52 zesko kernel: amdgpu 0000:09:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
Jan 18 11:45:52 zesko kernel: [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* KCQ disable failed
Jan 18 11:45:52 zesko kernel: [drm:gfx_v10_0_hw_fini [amdgpu]] *ERROR* failed to halt cp gfx
Jan 18 11:45:57 zesko kernel: amdgpu 0000:09:00.0: amdgpu: SMU: I'm not done with your previous command: SMN_C2PMSG_66:0x0000003A SMN_C2PMSG_82:0x00000000
Jan 18 11:45:57 zesko kernel: amdgpu 0000:09:00.0: amdgpu: Failed to enter BACO state!
Jan 18 11:45:57 zesko kernel: amdgpu 0000:09:00.0: amdgpu: ASIC reset failed with error, -62 for drm dev, 0000:09:00.0
Jan 18 11:46:02 zesko kernel: amdgpu 0000:09:00.0: amdgpu: RunBtc failed!
Jan 18 11:46:02 zesko kernel: amdgpu 0000:09:00.0: amdgpu: Failed to setup smc hw!
Jan 18 11:46:02 zesko kernel: [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <smu> failed -62

Last edited by zesko (2022-01-18 11:06:35)

#5 2022-01-18 15:53:26

doctorvi
Member
Registered: 2022-01-18
Posts: 1

Re: AMDGPU refuses to wake up after system sleep or hibernate

Same problem here on an HP laptop with a Ryzen 4700U, running 5.16.1-arch1-1.

Jan 18 16:32:37 Nudel kernel: amdgpu 0000:04:00.0: amdgpu: couldn't schedule ib on ring <sdma0>
Jan 18 16:32:37 Nudel kernel: [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
Jan 18 16:32:37 Nudel kernel: amdgpu 0000:04:00.0: amdgpu: couldn't schedule ib on ring <sdma0>
Jan 18 16:32:37 Nudel kernel: amdgpu 0000:04:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:4 pasid:32779, for process kscreenlocker_g pid 1301 thread kscreenloc:cs0 pid 1303)
Jan 18 16:32:37 Nudel kernel: amdgpu 0000:04:00.0: amdgpu:   in page starting at address 0x00008001078d0000 from IH client 0x1b (UTCL2)
Jan 18 16:32:37 Nudel kernel: amdgpu 0000:04:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00441051
Jan 18 16:32:37 Nudel kernel: amdgpu 0000:04:00.0: amdgpu:          Faulty UTCL2 client ID: TCP (0x8)
Jan 18 16:32:37 Nudel kernel: amdgpu 0000:04:00.0: amdgpu:          MORE_FAULTS: 0x1
Jan 18 16:32:37 Nudel kernel: amdgpu 0000:04:00.0: amdgpu:          WALKER_ERROR: 0x0
Jan 18 16:32:37 Nudel kernel: amdgpu 0000:04:00.0: amdgpu:          PERMISSION_FAULTS: 0x5
Jan 18 16:32:37 Nudel kernel: amdgpu 0000:04:00.0: amdgpu:          MAPPING_ERROR: 0x0
Jan 18 16:32:37 Nudel kernel: amdgpu 0000:04:00.0: amdgpu:          RW: 0x1
Jan 18 16:32:37 Nudel kernel: amdgpu 0000:04:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:4 pasid:32779, for process kscreenlocker_g pid 1301 thread kscreenloc:cs0 pid 1303)
Jan 18 16:32:37 Nudel kernel: amdgpu 0000:04:00.0: amdgpu:   in page starting at address 0x00008001078d1000 from IH client 0x1b (UTCL2)
Jan 18 16:32:37 Nudel kernel: amdgpu 0000:04:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00441051
Jan 18 16:32:37 Nudel kernel: amdgpu 0000:04:00.0: amdgpu:          Faulty UTCL2 client ID: TCP (0x8)
Jan 18 16:32:37 Nudel kernel: amdgpu 0000:04:00.0: amdgpu:          MORE_FAULTS: 0x1
Jan 18 16:32:37 Nudel kernel: amdgpu 0000:04:00.0: amdgpu:          WALKER_ERROR: 0x0
Jan 18 16:32:37 Nudel kernel: amdgpu 0000:04:00.0: amdgpu:          PERMISSION_FAULTS: 0x5
Jan 18 16:32:37 Nudel kernel: amdgpu 0000:04:00.0: amdgpu:          MAPPING_ERROR: 0x0
Jan 18 16:32:37 Nudel kernel: amdgpu 0000:04:00.0: amdgpu:          RW: 0x1
Jan 18 16:32:37 Nudel kernel: amdgpu 0000:04:00.0: amdgpu: couldn't schedule ib on ring <sdma0>
Jan 18 16:32:37 Nudel kernel: amdgpu 0000:04:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:4 pasid:32779, for process kscreenlocker_g pid 1301 thread kscreenloc:cs0 pid 1303)
Jan 18 16:32:37 Nudel kernel: amdgpu 0000:04:00.0: amdgpu:   in page starting at address 0x00008001078d3000 from IH client 0x1b (UTCL2)
Jan 18 16:32:37 Nudel kernel: amdgpu 0000:04:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00441051
Jan 18 16:32:37 Nudel kernel: amdgpu 0000:04:00.0: amdgpu:          Faulty UTCL2 client ID: TCP (0x8)
Jan 18 16:32:37 Nudel kernel: amdgpu 0000:04:00.0: amdgpu:          MORE_FAULTS: 0x1
Jan 18 16:32:37 Nudel kernel: amdgpu 0000:04:00.0: amdgpu:          WALKER_ERROR: 0x0
Jan 18 16:32:37 Nudel kernel: amdgpu 0000:04:00.0: amdgpu:          PERMISSION_FAULTS: 0x5
Jan 18 16:32:37 Nudel kernel: amdgpu 0000:04:00.0: amdgpu:          MAPPING_ERROR: 0x0
Jan 18 16:32:37 Nudel kernel: amdgpu 0000:04:00.0: amdgpu:          RW: 0x1
Jan 18 16:32:37 Nudel kernel: amdgpu 0000:04:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:4 pasid:32779, for process kscreenlocker_g pid 1301 thread kscreenloc:cs0 pid 1303)
Jan 18 16:32:37 Nudel kernel: amdgpu 0000:04:00.0: amdgpu:   in page starting at address 0x00008001078d5000 from IH client 0x1b (UTCL2)
Jan 18 16:32:37 Nudel kernel: amdgpu 0000:04:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00441051
Jan 18 16:32:37 Nudel kernel: amdgpu 0000:04:00.0: amdgpu:          Faulty UTCL2 client ID: TCP (0x8)
Jan 18 16:32:37 Nudel kernel: amdgpu 0000:04:00.0: amdgpu:          MORE_FAULTS: 0x1
Jan 18 16:32:37 Nudel kernel: amdgpu 0000:04:00.0: amdgpu:          WALKER_ERROR: 0x0
Jan 18 16:32:37 Nudel kernel: amdgpu 0000:04:00.0: amdgpu:          PERMISSION_FAULTS: 0x5
Jan 18 16:32:37 Nudel kernel: amdgpu 0000:04:00.0: amdgpu:          MAPPING_ERROR: 0x0
Jan 18 16:32:37 Nudel kernel: amdgpu 0000:04:00.0: amdgpu:          RW: 0x1
Jan 18 16:32:37 Nudel kernel: [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
Jan 18 16:32:37 Nudel kernel: amdgpu 0000:04:00.0: amdgpu: couldn't schedule ib on ring <sdma0>
... (many more of the same lines)
Jan 18 16:32:37 Nudel kernel: amdgpu 0000:04:00.0: amdgpu: couldn't schedule ib on ring <sdma0>
Jan 18 16:32:37 Nudel kernel: [drm:amdgpu_job_run [amdgpu]] *ERROR* Error scheduling IBs (-22)
Jan 18 16:32:55 Nudel systemd-logind[481]: Lid opened.
Jan 18 16:32:57 Nudel kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=1768, emitted seq=1771
Jan 18 16:32:57 Nudel kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process kscreenlocker_g pid 1301 thread kscreenloc:cs0 pid 1303
Jan 18 16:32:57 Nudel kernel: amdgpu 0000:04:00.0: amdgpu: GPU reset begin!
Jan 18 16:32:57 Nudel kernel: [drm] free PSP TMR buffer
Jan 18 16:32:57 Nudel kernel: amdgpu 0000:04:00.0: amdgpu: MODE2 reset
Jan 18 16:32:57 Nudel kernel: amdgpu 0000:04:00.0: amdgpu: GPU reset succeeded, trying to resume
Jan 18 16:32:57 Nudel kernel: [drm] PCIE GART of 1024M enabled.
Jan 18 16:32:57 Nudel kernel: [drm] PTB located at 0x000000F400900000
Jan 18 16:32:57 Nudel kernel: [drm] PSP is resuming...
Jan 18 16:32:57 Nudel kernel: [drm] reserve 0x400000 from 0xf41f800000 for PSP TMR
Jan 18 16:32:57 Nudel kernel: amdgpu 0000:04:00.0: amdgpu: RAS: optional ras ta ucode is not available
Jan 18 16:32:57 Nudel kernel: amdgpu 0000:04:00.0: amdgpu: RAP: optional rap ta ucode is not available
Jan 18 16:32:57 Nudel kernel: amdgpu 0000:04:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
Jan 18 16:32:57 Nudel kernel: amdgpu 0000:04:00.0: amdgpu: SMU is resuming...
Jan 18 16:32:57 Nudel kernel: amdgpu 0000:04:00.0: amdgpu: dpm has been disabled
Jan 18 16:32:57 Nudel kernel: amdgpu 0000:04:00.0: amdgpu: SMU is resumed successfully!
Jan 18 16:32:57 Nudel kernel: [drm] DMUB hardware initialized: version=0x0101001C
Jan 18 16:32:58 Nudel kernel: [drm] kiq ring mec 2 pipe 1 q 0
Jan 18 16:32:58 Nudel kernel: amdgpu 0000:04:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed (-110)
Jan 18 16:32:58 Nudel kernel: [drm:amdgpu_gfx_enable_kcq.cold [amdgpu]] *ERROR* KCQ enable failed
Jan 18 16:32:58 Nudel kernel: [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <gfx_v9_0> failed -110
Jan 18 16:32:58 Nudel kernel: amdgpu 0000:04:00.0: amdgpu: GPU reset(3) failed
Jan 18 16:32:58 Nudel kernel: amdgpu 0000:04:00.0: amdgpu: GPU reset end with ret = -110
Jan 18 16:33:08 Nudel kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=1771, emitted seq=1771
Jan 18 16:33:08 Nudel kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process kscreenlocker_g pid 1301 thread kscreenloc:cs0 pid 1303
Jan 18 16:33:08 Nudel kernel: amdgpu 0000:04:00.0: amdgpu: GPU reset begin!

#6 2022-04-19 11:33:36

ts
Member
Registered: 2021-04-22
Posts: 10

Re: AMDGPU refuses to wake up after system sleep or hibernate

This just started happening to me on my LTS kernel (5.15.34-1-lts), which I updated on 4/18/2022. I'm posting my details in case they help.

 
arch kernel: PM: Some devices failed to suspend, or early wake event detected
Apr 19 06:07:38 kernel: amdgpu 0000:0b:00.0: PM: failed to suspend async: error -5
Apr 19 06:07:38  kernel: PM: dpm_run_callback(): pci_pm_suspend+0x0/0x160 returns -5
Apr 19 06:07:38 kernel: PM: pci_pm_suspend(): amdgpu_pmops_suspend+0x0/0x70 [amdgpu] returns -5
Apr 19 06:07:38 kernel: amdgpu 0000:0b:00.0: amdgpu: Failed to enter BACO state!

My system info (from inxi) is as follows:

 Kernel: 5.15.34-1-lts arch: x86_64 bits: 64 Desktop: KDE Plasma v: 5.24.4
    Distro: Arch Linux
Machine:
  Type: Desktop Mobo: ASUSTeK model: TUF GAMING X570-PLUS (WI-FI) v: Rev X.0x
    serial: <superuser required> UEFI: American Megatrends v: 4021
    date: 08/10/2021
CPU:
  Info: 8-core model: AMD Ryzen 7 3800X bits: 64 type: MT MCP cache:
    L2: 4 MiB
  Speed (MHz): avg: 2300 min/max: 2200/4559 cores: 1: 2050 2: 2053 3: 2197
    4: 2196 5: 2795 6: 1861 7: 1862 8: 1862 9: 2196 10: 2196 11: 3588 12: 2051
    13: 2396 14: 2052 15: 3590 16: 1870
Graphics:
  Device-1: AMD Navi 10 [Radeon RX 5600 OEM/5600 XT / 5700/5700 XT]
    driver: amdgpu v: kernel
  Display: x11 server: X.Org v: 21.1.3 with: Xwayland v: 22.1.1 driver: X:
    loaded: modesetting unloaded: vesa gpu: amdgpu resolution: 3440x1440~60Hz
  OpenGL:
    renderer: AMD Radeon RX 5600 XT (navi10 LLVM 13.0.1 DRM 3.42 5.15.34-1-lts)
    v: 4.6 Mesa 22.0.1
Audio:
  Device-1: AMD Navi 10 HDMI Audio driver: snd_hda_intel
  Device-2: AMD Starship/Matisse HD Audio driver: snd_hda_intel
  Sound Server-1: ALSA v: k5.15.34-1-lts running: yes
  Sound Server-2: PipeWire v: 0.3.50 running: yes
Network:
  Device-1: Intel Wireless-AC 9260 driver: iwlwifi
  IF: wlp4s0 state: up mac: <filter>
  Device-2: Realtek RTL8111/8168/8411 PCI Express Gigabit Ethernet
    driver: r8169
  IF: enp5s0 state: up speed: 1000 Mbps duplex: full mac: <filter>
  IF-ID-1: wg-mullvad state: unknown speed: N/A duplex: N/A mac: N/A
Bluetooth:
  Device-1: Intel Wireless-AC 9260 Bluetooth Adapter type: USB driver: btusb
  Report: rfkill ID: hci0 state: up address: see --recommends
Drives:
  Local Storage: total: 2.69 TiB used: 922.26 GiB (33.4%)
  ID-1: /dev/sda vendor: Seagate model: ST2000DM008-2FR102 size: 1.82 TiB
  ID-2: /dev/sdb vendor: Kingston model: SA400S37480G size: 447.13 GiB
  ID-3: /dev/sdc vendor: Kingston model: SA400S37480G size: 447.13 GiB
Partition:
  ID-1: / size: 402.57 GiB used: 180.77 GiB (44.9%) fs: ext4 dev: /dev/sdb2
  ID-2: /boot size: 511 MiB used: 84.8 MiB (16.6%) fs: vfat dev: /dev/sdb1
Swap:
  Alert: No swap data was found.
Sensors:
  System Temperatures: cpu: N/A mobo: N/A gpu: amdgpu temp: 51.0 C
  Fan Speeds (RPM): N/A gpu: amdgpu fan: 0
Info:
  Processes: 350 Uptime: 18m Memory: 62.78 GiB used: 3.85 GiB (6.1%)
  Shell: Zsh inxi: 3.3.15

#7 2022-04-19 13:49:32

seth
Member
Registered: 2012-09-03
Posts: 29,924

Re: AMDGPU refuses to wake up after system sleep or hibernate

amdgpu.aspm=0 amdgpu.runpm=0 amdgpu.bapm=0 pcie_aspm=off

Possibly

amdgpu.dpm=0

but that one is on record for causing the GPU to fail to power up entirely.

See https://wiki.archlinux.org/title/Kernel_parameters. I recommend testing them transiently by editing the command line at the boot loader rather than changing the configuration permanently (because of the dpm situation).
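After booting once with the parameters added at the boot loader, it is worth confirming that they actually reached the kernel. This is a minimal sketch that compares `/proc/cmdline`, which holds the command line the running kernel was booted with, against the list suggested above. It prints each parameter that is missing. The `missing_params` helper is hypothetical, not an existing tool:

```shell
#!/bin/sh
# Hypothetical helper: print each wanted parameter that does NOT appear
# as a whole space-separated word in the given kernel command line.
missing_params() {
    cmdline="$1"
    shift
    for p in "$@"; do
        # Pad with spaces so "amdgpu.runpm=0" does not match "amdgpu.runpm=01".
        case " $cmdline " in
            *" $p "*) ;;              # present, nothing to report
            *) printf '%s\n' "$p" ;;  # absent, report it
        esac
    done
}

# On a running system, check the parameters suggested in this thread.
if [ -r /proc/cmdline ]; then
    missing_params "$(cat /proc/cmdline)" \
        amdgpu.aspm=0 amdgpu.runpm=0 amdgpu.bapm=0 pcie_aspm=off
fi
```

Empty output means every listed parameter was passed. Add `amdgpu.dpm=0` to the list only if you are deliberately testing it, given the caveat above.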

#8 2022-04-23 22:58:11

ts
Member
Registered: 2021-04-22
Posts: 10

Re: AMDGPU refuses to wake up after system sleep or hibernate

I updated to the current stable release and the problem still exists.

For me, Bluetooth is now throwing errors, and running the following is a workaround:

 
sudo systemctl stop bluetooth.service
sudo systemctl disable bluetooth.service

Powered by FluxBB