After upgrading to linux 4.16, my Arch Linux system can no longer resume from suspend. The screen gets covered with artefacts and the monitor becomes unresponsive. Ctrl+Alt+F1 does not work either. The only way to reboot is with the kernel's magic SysRq keys.
Downgrading to linux-lts fixes the issue.
GPU model: RX 480
Journalctl logs:
kernel: amdgpu 0000:01:00.0: GPU fault detected: 147 0x00004802
kernel: amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00000000
kernel: amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x08048002
kernel: amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x08048002
kernel: amdgpu 0000:01:00.0: VM fault (0x02, vmid 4) at page 0, read from 'TC4' (0x54433400) (72)
kernel: amdgpu 0000:01:00.0: GPU fault detected: 147 0x00004802
kernel: amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00000000
kernel: amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x08048002
kernel: amdgpu 0000:01:00.0: VM fault (0x02, vmid 4) at page 0, read from 'TC4' (0x54433400) (72)
kernel: amdgpu 0000:01:00.0: GPU fault detected: 147 0x0d004801
kernel: amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x0906C3F5
kernel: amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x08088001
kernel: amdgpu 0000:01:00.0: VM fault (0x01, vmid 4) at page 151438325, read from 'TC6' (0x54433600) (136)
kernel: amdgpu 0000:01:00.0: GPU fault detected: 147 0x0f104801
kernel: amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x0906C3E2
kernel: amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x08048001
kernel: amdgpu 0000:01:00.0: VM fault (0x01, vmid 4) at page 151438306, read from 'TC4' (0x54433400) (72)
kernel: amdgpu 0000:01:00.0: GPU fault detected: 147 0x00004802
kernel: amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00000000
kernel: amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x08048002
kernel: amdgpu 0000:01:00.0: VM fault (0x02, vmid 4) at page 0, read from 'TC4' (0x54433400) (72)
kernel: amdgpu 0000:01:00.0: GPU fault detected: 147 0x00004802
kernel: amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00000000
kernel: amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x08048002
kernel: amdgpu 0000:01:00.0: VM fault (0x02, vmid 4) at page 0, read from 'TC4' (0x54433400) (72)
hornet kernel: amdgpu 0000:01:00.0: GPU fault detected: 147 0x0f084801
kernel: amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x0902E3E1
kernel: amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0A048001
kernel: amdgpu 0000:01:00.0: VM fault (0x01, vmid 5) at page 151184353, read from 'TC4' (0x54433400) (72)
kernel: amdgpu 0000:01:00.0: GPU fault detected: 147 0x00004802
kernel: amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00000000
kernel: amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0C048002
kernel: amdgpu 0000:01:00.0: VM fault (0x02, vmid 6) at page 0, read from 'TC4' (0x54433400) (72)
kernel: amdgpu 0000:01:00.0: GPU fault detected: 147 0x00004802
kernel: amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00000000
kernel: amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0C048002
kernel: amdgpu 0000:01:00.0: VM fault (0x02, vmid 6) at page 0, read from 'TC4' (0x54433400) (72)
hornet kernel: amdgpu 0000:01:00.0: GPU fault detected: 147 0x00004802
kernel: amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00000000
kernel: amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0E048002
kernel: amdgpu 0000:01:00.0: VM fault (0x02, vmid 7) at page 0, read from 'TC4' (0x54433400) (72)
kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, last signaled seq=481, last emitted seq=483
Disabling amdgpu.dc with the amdgpu.dc=0 kernel parameter did not fix the issue.
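When testing a kernel parameter like this, it can help to confirm that the option really reached the running kernel and is spelled as intended. The following is a minimal Python sketch, not an official tool: the function name `amdgpu_params` is made up for illustration, and it only assumes that `/proc/cmdline` holds the boot command line on a Linux system.

```python
from pathlib import Path


def amdgpu_params(cmdline: str) -> dict:
    """Return the amdgpu.* options found on a kernel command line."""
    params = {}
    for token in cmdline.split():
        if token.startswith("amdgpu.") and "=" in token:
            # "amdgpu.dc=0" -> key "dc", value "0"
            key, value = token[len("amdgpu."):].split("=", 1)
            params[key] = value
    return params


if __name__ == "__main__":
    proc_cmdline = Path("/proc/cmdline")
    if proc_cmdline.exists():
        # Command line the running kernel was booted with.
        print(amdgpu_params(proc_cmdline.read_text()))
```

If the option you set does not appear under the expected key, the bootloader configuration was probably not regenerated or the name was mistyped.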
Last edited by igo95862 (2018-10-06 06:29:06)
Welcome to the Arch Linux forums, igo95862.
Please use code tags for commands and their outputs. As you mentioned that the issue occurs only after suspend, it is possibly https://lkml.org/lkml/2018/4/21/185
Otherwise, see https://bbs.archlinux.org/viewtopic.php … 1#p1782871 for bisecting between 4.15 and 4.16.
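For anyone unfamiliar with bisecting, the idea is a binary search over the commits between a known-good and a known-bad kernel: build the midpoint, test suspend/resume, and halve the range. Below is a rough Python sketch of that search only. The commit names `c0` to `c15` and the `resume_fails` check are hypothetical stand-ins; in practice `git bisect` does the bookkeeping and the "test" is building and booting a kernel.

```python
from typing import Callable, Sequence


def first_bad(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Binary search for the first bad commit.

    Assumes commits[0] is known good, commits[-1] is known bad, and every
    commit after the first bad one is also bad.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid
        else:
            good = mid
    return commits[bad]


# Hypothetical history: c0 stands in for 4.15, c15 for 4.16.
commits = [f"c{i}" for i in range(16)]
tested = []


def resume_fails(commit: str) -> bool:
    """Stand-in for 'build this kernel, suspend, try to resume'."""
    tested.append(commit)
    return int(commit[1:]) >= 11  # pretend c11 introduced the regression


culprit = first_bad(commits, resume_fails)
print(culprit, "found after", len(tested), "tests")
```

The number of kernels to build grows with the logarithm of the commit range, which is why bisecting a whole release cycle is tedious but feasible.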
I face a similar problem (also with RX 480 GPU, kernel version is 4.16.4). But the log entries differ somewhat...
May 01 19:55:33 foo kernel: amdgpu 0000:09:00.0: 00000000e2f83a89 unpin not necessary
May 01 19:55:33 foo kernel: amdgpu 0000:09:00.0: 0000000088dc85ea unpin not necessary
...
May 01 19:55:33 foo upowerd[839]: unhandled action 'unbind' on /sys/devices/pci0000:00/0000:00:07.1/0000:0a:00.3/usb3/3-1/3-1:1.0
...
May 01 19:55:37 foo gnome-shell[835]: Failed to set CRTC mode 1920x1080: Invalid argument
May 01 19:55:37 foo gnome-shell[835]: Failed to flip: Device or resource busy
May 01 19:55:37 foo gnome-shell[835]: Failed to set CRTC mode 1920x1080: Invalid argument
May 01 19:55:37 foo gnome-shell[835]: Failed to set CRTC mode 1920x1080: Invalid argument
...
After waking up, there are no artefacts, but gnome-shell seems frozen. The mouse pointer still moves, but the screen does not respond to any mouse or keyboard input.
Problem persists in kernel 4.16.5.
I've added code tags.
Good to know that the issue is being worked on. I guess I will run the LTS kernel until it is resolved.
Can you try 4.17-rc4 when it is released, using linux-mainline from the AUR, Unofficial_user_repositories#miffe, or linux-git from the AUR?
The patches for the issue I linked are now in the mainline tree, which should have the rc4 release in the next few hours.
If the issue remains, then please perform the bisection, as it will be a separate issue that needs investigation.
Edit:
4.17-rc4, not 4.16-rc4
Last edited by loqs (2018-05-12 10:43:07)
I noticed the same behaviour after upgrading to kernel 4.16 (RX580).
The problem still seems to persist in 4.16.8.
I guess this bug needs to go upstream.
However, 4.17 will use the AMD DC driver, so I am wondering if it is worth the effort.
Maybe it helps to force the 4.16 kernel to use the DC driver by setting the kernel parameter amdgpu.dc=1?
I will give it a try at least.
@ArthurBorsboom you could try 4.17-rc4 to see if the issue is fixed in 4.17.
The issue still occurs with kernel 4.18.1 + RX580.
When I suspend the system and resume, the screen shows a lot of artifacts and I cannot do anything except a hard reset.
The crash still occurs in kernel 4.18.4 + RX580.
Below seems to be the 'interesting part' of dmesg.
Aug 24 12:43:17 z97 kernel: amdgpu 0000:01:00.0: GPU fault detected: 147 0x04b8c801
Aug 24 12:43:17 z97 kernel: amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x0AB11897
Aug 24 12:43:17 z97 kernel: amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x040C8001
Aug 24 12:43:17 z97 kernel: amdgpu 0000:01:00.0: VM fault (0x01, vmid 2, pasid 0) at page 179378327, read from 'TC2' (0x54433200) (200)
Aug 24 12:43:17 z97 kernel: amdgpu 0000:01:00.0: GPU fault detected: 147 0x06a8c801
Aug 24 12:43:17 z97 kernel: amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x0AB152D5
Aug 24 12:43:17 z97 kernel: amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x040C8001
Aug 24 12:43:17 z97 kernel: amdgpu 0000:01:00.0: VM fault (0x01, vmid 2, pasid 0) at page 179393237, read from 'TC2' (0x54433200) (200)
Aug 24 12:43:17 z97 kernel: amdgpu 0000:01:00.0: GPU fault detected: 147 0x06a8c801
Aug 24 12:43:17 z97 kernel: amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x0AB152D5
Aug 24 12:43:17 z97 kernel: amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x040C8001
Aug 24 12:43:17 z97 kernel: amdgpu 0000:01:00.0: VM fault (0x01, vmid 2, pasid 0) at page 179393237, read from 'TC2' (0x54433200) (200)
Aug 24 12:43:17 z97 kernel: amdgpu 0000:01:00.0: GPU fault detected: 146 0x0000480c
Aug 24 12:43:17 z97 kernel: amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x00000000
Aug 24 12:43:17 z97 kernel: amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0404800C
Aug 24 12:43:17 z97 kernel: amdgpu 0000:01:00.0: VM fault (0x0c, vmid 2, pasid 0) at page 0, read from 'TC4' (0x54433400) (72)
Aug 24 12:43:17 z97 kernel: amdgpu 0000:01:00.0: GPU fault detected: 147 0x06a8c801
Aug 24 12:43:17 z97 kernel: amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x0AB152D5
Aug 24 12:43:17 z97 kernel: amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x040C8001
Aug 24 12:43:17 z97 kernel: amdgpu 0000:01:00.0: VM fault (0x01, vmid 2, pasid 0) at page 179393237, read from 'TC2' (0x54433200) (200)
Aug 24 12:43:17 z97 kernel: amdgpu 0000:01:00.0: GPU fault detected: 147 0x06a8c801
Aug 24 12:43:17 z97 kernel: amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x0AB152D5
Aug 24 12:43:17 z97 kernel: amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x040C8001
Aug 24 12:43:17 z97 kernel: amdgpu 0000:01:00.0: VM fault (0x01, vmid 2, pasid 0) at page 179393237, read from 'TC2' (0x54433200) (200)
Aug 24 12:43:17 z97 kernel: amdgpu 0000:01:00.0: GPU fault detected: 147 0x06a8c801
Aug 24 12:43:17 z97 kernel: amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x0AB152D5
Aug 24 12:43:17 z97 kernel: amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x040C8001
Aug 24 12:43:17 z97 kernel: amdgpu 0000:01:00.0: VM fault (0x01, vmid 2, pasid 0) at page 179393237, read from 'TC2' (0x54433200) (200)
Aug 24 12:43:17 z97 kernel: amdgpu 0000:01:00.0: GPU fault detected: 147 0x06a8c801
Aug 24 12:43:17 z97 kernel: amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x0AB152D5
Aug 24 12:43:17 z97 kernel: amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x040C8001
Aug 24 12:43:17 z97 kernel: amdgpu 0000:01:00.0: VM fault (0x01, vmid 2, pasid 0) at page 179393237, read from 'TC2' (0x54433200) (200)
Aug 24 12:43:17 z97 kernel: amdgpu 0000:01:00.0: GPU fault detected: 147 0x06a8c801
Aug 24 12:43:17 z97 kernel: amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x0AB152D5
Aug 24 12:43:17 z97 kernel: amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x040C8001
Aug 24 12:43:17 z97 kernel: amdgpu 0000:01:00.0: VM fault (0x01, vmid 2, pasid 0) at page 179393237, read from 'TC2' (0x54433200) (200)
Aug 24 12:43:17 z97 kernel: amdgpu 0000:01:00.0: GPU fault detected: 147 0x06a8c801
Aug 24 12:43:17 z97 kernel: amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x0AB152D5
Aug 24 12:43:17 z97 kernel: amdgpu 0000:01:00.0: VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x040C8001
Aug 24 12:43:17 z97 kernel: amdgpu 0000:01:00.0: VM fault (0x01, vmid 2, pasid 0) at page 179393237, read from 'TC2' (0x54433200) (200)
Aug 24 12:43:28 z97 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, last signaled seq=810, last emitted seq=812
Aug 24 12:43:28 z97 kernel: [drm] GPU recovery disabled.
The last line gives hope for a possible workaround. Maybe the GPU recovery functionality can be enabled by a kernel boot parameter.
I will search and report back later for this.
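The "VM fault" lines in the logs in this thread follow a regular pattern, so they can be condensed with a small script when comparing kernels. The Python sketch below is based only on the line format visible in the logs here (with and without the `pasid` field). The names `FAULT_RE` and `parse_fault` are made up for illustration. Judging from the log, the hex word after the client name appears to be the same name packed as ASCII bytes (0x54433200 is 'T', 'C', '2', NUL), and the page number is the decimal form of VM_CONTEXT1_PROTECTION_FAULT_ADDR.

```python
import re

# Pattern taken from the log lines in this thread; "pasid" is optional
# because the 4.16 logs do not print it.
FAULT_RE = re.compile(
    r"VM fault \(0x(?P<status>[0-9a-fA-F]+), vmid (?P<vmid>\d+)(?:, pasid \d+)?\) "
    r"at page (?P<page>\d+), (?P<access>read|write) from "
    r"'(?P<client>[^']*)' \(0x(?P<client_id>[0-9a-fA-F]+)\)"
)


def parse_fault(line: str):
    """Return the fields of a 'VM fault' line, or None if it is not one."""
    m = FAULT_RE.search(line)
    if m is None:
        return None
    # Decode the 32-bit client id as four ASCII bytes, dropping NUL padding.
    raw = int(m["client_id"], 16).to_bytes(4, "big")
    return {
        "vmid": int(m["vmid"]),
        "page": int(m["page"]),
        "page_hex": f"0x{int(m['page']):08X}",
        "access": m["access"],
        "client": m["client"],
        "client_from_id": raw.rstrip(b"\0").decode("ascii", "replace"),
    }


sample = [
    "kernel: amdgpu 0000:01:00.0: VM fault (0x01, vmid 2, pasid 0) at page "
    "179393237, read from 'TC2' (0x54433200) (200)",
    "kernel: amdgpu 0000:01:00.0: VM fault (0x02, vmid 4) at page 0, "
    "read from 'TC4' (0x54433400) (72)",
]
for line in sample:
    print(parse_fault(line))
```

Feeding the output of `journalctl -k` through `parse_fault` line by line gives a quick view of which vmid and page keep faulting before the ring timeout.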
To enable GPU recovery, the kernel boot parameter apparently has to be set as follows.
amdgpu.gpu_recovery=1
1=enable
0=disable
-1=auto
default is auto
Unfortunately, it does not seem to help as a workaround for recovering from a failed suspend/resume attempt.
For now I have disabled suspend completely.
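To check which value the driver actually ended up with after booting, the module parameter can be read back. This is a minimal Python sketch under an assumption I have not verified on every kernel version: that amdgpu exposes the parameter as a readable file under `/sys/module/amdgpu/parameters/`. The helper names `read_module_param` and `describe_gpu_recovery` are made up for illustration.

```python
from pathlib import Path

# Meanings as listed above for amdgpu.gpu_recovery.
GPU_RECOVERY_MEANING = {1: "enabled", 0: "disabled", -1: "auto"}


def read_module_param(name, module="amdgpu", sysfs_root="/sys/module"):
    """Return the integer value of a module parameter, or None if unreadable."""
    path = Path(sysfs_root) / module / "parameters" / name
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        # Missing file, no permission, or non-integer content.
        return None


def describe_gpu_recovery(value):
    """Translate a gpu_recovery value into the meaning listed above."""
    if value is None:
        return "unknown (parameter not readable)"
    return GPU_RECOVERY_MEANING.get(value, f"unrecognised value {value}")


print("gpu_recovery:", describe_gpu_recovery(read_module_param("gpu_recovery")))
```

If this prints "auto" even though amdgpu.gpu_recovery=1 was added, the boot entry was most likely not updated.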