#1 2018-05-01 17:22:40

igo95862
Member
Registered: 2018-05-01
Posts: 6

amdgpu driver crashes upon resuming from suspend. linux 4.16 [solved]

After upgrading to Linux 4.16, my Arch Linux system can no longer resume from suspend. The screen gets covered with artefacts and the monitor becomes unresponsive. Switching to a TTY with Ctrl+Alt+F1 does not work either. The only way to reboot is with the Magic SysRq keys.

Downgrading to linux-lts fixes the issue.

GPU model: RX 480

Journalctl logs:

 kernel: amdgpu 0000:01:00.0: GPU fault detected: 147 0x00004802
 kernel: amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
 kernel: amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x08048002
 kernel: amdgpu 0000:01:00.0: VM fault (0x02, vmid 4) at page 0, read from 'TC4' (0x54433400) (72)
 kernel: amdgpu 0000:01:00.0: GPU fault detected: 147 0x00004802
 kernel: amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
 kernel: amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x08048002
 kernel: amdgpu 0000:01:00.0: VM fault (0x02, vmid 4) at page 0, read from 'TC4' (0x54433400) (72)
 kernel: amdgpu 0000:01:00.0: GPU fault detected: 147 0x0d004801
 kernel: amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x0906C3F5
 kernel: amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x08088001
 kernel: amdgpu 0000:01:00.0: VM fault (0x01, vmid 4) at page 151438325, read from 'TC6' (0x54433600) (136)
 kernel: amdgpu 0000:01:00.0: GPU fault detected: 147 0x0f104801
 kernel: amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x0906C3E2
 kernel: amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x08048001
 kernel: amdgpu 0000:01:00.0: VM fault (0x01, vmid 4) at page 151438306, read from 'TC4' (0x54433400) (72)
 kernel: amdgpu 0000:01:00.0: GPU fault detected: 147 0x00004802
 kernel: amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
 kernel: amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x08048002
 kernel: amdgpu 0000:01:00.0: VM fault (0x02, vmid 4) at page 0, read from 'TC4' (0x54433400) (72)
 kernel: amdgpu 0000:01:00.0: GPU fault detected: 147 0x00004802
 kernel: amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
 kernel: amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x08048002
 kernel: amdgpu 0000:01:00.0: VM fault (0x02, vmid 4) at page 0, read from 'TC4' (0x54433400) (72)
 hornet kernel: amdgpu 0000:01:00.0: GPU fault detected: 147 0x0f084801
 kernel: amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x0902E3E1
 kernel: amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0A048001
 kernel: amdgpu 0000:01:00.0: VM fault (0x01, vmid 5) at page 151184353, read from 'TC4' (0x54433400) (72)
 kernel: amdgpu 0000:01:00.0: GPU fault detected: 147 0x00004802
 kernel: amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
 kernel: amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0C048002
 kernel: amdgpu 0000:01:00.0: VM fault (0x02, vmid 6) at page 0, read from 'TC4' (0x54433400) (72)
 kernel: amdgpu 0000:01:00.0: GPU fault detected: 147 0x00004802
 kernel: amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
 kernel: amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0C048002
 kernel: amdgpu 0000:01:00.0: VM fault (0x02, vmid 6) at page 0, read from 'TC4' (0x54433400) (72)
 hornet kernel: amdgpu 0000:01:00.0: GPU fault detected: 147 0x00004802
 kernel: amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
 kernel: amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0E048002
 kernel: amdgpu 0000:01:00.0: VM fault (0x02, vmid 7) at page 0, read from 'TC4' (0x54433400) (72)
 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, last signaled seq=481, last emitted seq=483

Disabling the DC display code with the amdgpu.dc=0 kernel parameter did not fix the issue.
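Comparing the register dumps in this log with the decoded "VM fault" lines suggests how they relate: the page number is simply VM_CONTEXT1_PROTECTION_FAULT_ADDR in decimal (0x0906C3F5 = 151438325), the client tag 0x54433400 is the ASCII string "TC4", and the protection code, client ID and vmid appear to be bit fields of VM_CONTEXT1_PROTECTION_FAULT_STATUS. The Python sketch below decodes them under that assumption. The bit positions were inferred from the values posted in this thread, not taken from AMD documentation, and the helper names are hypothetical.

```python
# Hypothetical helpers. Bit positions are ASSUMED: they were inferred by
# matching the STATUS values in this thread against the decoded
# "VM fault (...)" lines, not read from AMD register documentation.

def decode_fault_status(status: int) -> dict:
    """Split a VM_CONTEXT1_PROTECTION_FAULT_STATUS value into fields."""
    return {
        "protections": status & 0xFF,          # first value in "VM fault (0x02, ..."
        "client_id": (status >> 12) & 0xFF,    # trailing "(72)" on the fault line
        "write": bool((status >> 24) & 0x1),   # False corresponds to "read from"
        "vmid": (status >> 25) & 0xF,          # "vmid 4"
    }


def client_name(tag: int) -> str:
    """Turn a client tag such as 0x54433400 into its ASCII name."""
    return tag.to_bytes(4, "big").rstrip(b"\x00").decode("ascii")


if __name__ == "__main__":
    # Values copied from one fault in the log above (FAULT_ADDR 0x0906C3F5).
    print(decode_fault_status(0x08088001))
    print(int("0x0906C3F5", 16), client_name(0x54433600))
```

For example, decoding 0x08088001 gives protections 1, client 136 and vmid 4, which matches the "VM fault (0x01, vmid 4) at page 151438325, read from 'TC6' ... (136)" line.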

Last edited by igo95862 (2018-10-06 06:29:06)


#2 2018-05-01 17:43:24

loqs
Member
Registered: 2014-03-06
Posts: 17,373

Re: amdgpu driver crashes upon resuming from suspend. linux 4.16 [solved]

Welcome to the Arch Linux forums, igo95862.
Please use code tags for commands and their outputs.
As you mentioned the issue occurs only after suspend, it is possibly https://lkml.org/lkml/2018/4/21/185;
otherwise see https://bbs.archlinux.org/viewtopic.php … 1#p1782871 for bisecting between 4.15 and 4.16.


#3 2018-05-01 18:08:51

thorstenhirsch
Member
Registered: 2005-08-03
Posts: 102

Re: amdgpu driver crashes upon resuming from suspend. linux 4.16 [solved]

I am facing a similar problem (also with an RX 480 GPU, kernel 4.16.4), but the log entries differ somewhat:

May 01 19:55:33 foo kernel: amdgpu 0000:09:00.0: 00000000e2f83a89 unpin not necessary
May 01 19:55:33 foo kernel: amdgpu 0000:09:00.0: 0000000088dc85ea unpin not necessary
...
May 01 19:55:33 foo upowerd[839]: unhandled action 'unbind' on /sys/devices/pci0000:00/0000:00:07.1/0000:0a:00.3/usb3/3-1/3-1:1.0
...
May 01 19:55:37 foo gnome-shell[835]: Failed to set CRTC mode 1920x1080: Invalid argument
May 01 19:55:37 foo gnome-shell[835]: Failed to flip: Device or resource busy
May 01 19:55:37 foo gnome-shell[835]: Failed to set CRTC mode 1920x1080: Invalid argument
May 01 19:55:37 foo gnome-shell[835]: Failed to set CRTC mode 1920x1080: Invalid argument
...

After waking up there are no artefacts, but gnome-shell seems frozen: the mouse pointer still moves, yet the screen does not respond to any mouse or keyboard input.


#4 2018-05-01 18:31:31

thorstenhirsch
Member
Registered: 2005-08-03
Posts: 102

Re: amdgpu driver crashes upon resuming from suspend. linux 4.16 [solved]

Problem persists in kernel 4.16.5.


#5 2018-05-03 18:26:01

igo95862
Member
Registered: 2018-05-01
Posts: 6

Re: amdgpu driver crashes upon resuming from suspend. linux 4.16 [solved]

loqs wrote:

Welcome to the Arch Linux forums, igo95862.
Please use code tags for commands and their outputs.
As you mentioned the issue occurs only after suspend, it is possibly https://lkml.org/lkml/2018/4/21/185;
otherwise see https://bbs.archlinux.org/viewtopic.php … 1#p1782871 for bisecting between 4.15 and 4.16.

I've added code tags.

Good to know that the issue is being worked on. I will run the LTS kernel until it is resolved.


#6 2018-05-06 17:06:55

loqs
Member
Registered: 2014-03-06
Posts: 17,373

Re: amdgpu driver crashes upon resuming from suspend. linux 4.16 [solved]

Can you try 4.17-rc4 when it is released, using linux-mainline from the AUR, Unofficial_user_repositories#miffe, or linux-git from the AUR?
The patches for the issue I linked are now in the mainline tree, which should see the rc4 release in the next few hours.
If the issue remains, please perform the bisection, as it will then be a separate issue that needs investigation.
Edit:
4.17-rc4, not 4.16-rc4

Last edited by loqs (2018-05-12 10:43:07)


#7 2018-05-12 10:29:31

ArthurBorsboom
Member
Registered: 2014-05-20
Posts: 55

Re: amdgpu driver crashes upon resuming from suspend. linux 4.16 [solved]

I noticed the same behaviour after upgrading to kernel 4.16 (RX580).
The problem still seems to persist in 4.16.8.

I guess this bug needs to be reported upstream.
However, 4.17 will use the AMD DC driver, so I am wondering if it is worth the effort.

Maybe it helps to force the 4.16 kernel to use the DC driver by setting the kernel parameter amdgpu.dc=1?
I will at least give it a try.
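A kernel parameter such as amdgpu.dc=1 only has an effect if it actually reaches the kernel command line, so it is worth confirming that before drawing conclusions from a test. A minimal Python sketch, assuming a Linux system where /proc/cmdline holds the command line the running kernel was booted with; `cmdline_param` is a hypothetical helper, not part of any tool mentioned in this thread.

```python
import shlex
from pathlib import Path
from typing import Optional


def cmdline_param(cmdline: str, name: str) -> Optional[str]:
    """Return the value of `name=value` on a kernel command line, or None.

    If the parameter appears more than once, this sketch returns the last
    occurrence (a simplifying choice; check your bootloader config for
    duplicates rather than relying on it).
    """
    value = None
    for token in shlex.split(cmdline):
        key, sep, val = token.partition("=")
        if sep and key == name:
            value = val
    return value


if __name__ == "__main__":
    proc = Path("/proc/cmdline")
    if proc.exists():
        print("amdgpu.dc =", cmdline_param(proc.read_text(), "amdgpu.dc"))
```

A result of None means the parameter was not passed at all, so the driver used its built-in default.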


#8 2018-05-12 10:42:39

loqs
Member
Registered: 2014-03-06
Posts: 17,373

Re: amdgpu driver crashes upon resuming from suspend. linux 4.16 [solved]

@ArthurBorsboom you could try 4.17-rc4 to see if the issue is fixed in 4.17.


#9 2018-08-19 13:48:00

ArthurBorsboom
Member
Registered: 2014-05-20
Posts: 55

Re: amdgpu driver crashes upon resuming from suspend. linux 4.16 [solved]

The issue still occurs with kernel 4.18.1 + RX580.

I suspend the system and resume; the screen shows a lot of artifacts and I am unable to do anything except a hard reset.


#10 2018-08-24 10:53:05

ArthurBorsboom
Member
Registered: 2014-05-20
Posts: 55

Re: amdgpu driver crashes upon resuming from suspend. linux 4.16 [solved]

The crash still occurs in kernel 4.18.4 + RX580.

Below is what seems to be the 'interesting part' of dmesg:

Aug 24 12:43:17 z97 kernel: amdgpu 0000:01:00.0: GPU fault detected: 147 0x04b8c801
Aug 24 12:43:17 z97 kernel: amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x0AB11897
Aug 24 12:43:17 z97 kernel: amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x040C8001
Aug 24 12:43:17 z97 kernel: amdgpu 0000:01:00.0: VM fault (0x01, vmid 2, pasid 0) at page 179378327, read from 'TC2' (0x54433200) (200)
Aug 24 12:43:17 z97 kernel: amdgpu 0000:01:00.0: GPU fault detected: 147 0x06a8c801
Aug 24 12:43:17 z97 kernel: amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x0AB152D5
Aug 24 12:43:17 z97 kernel: amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x040C8001
Aug 24 12:43:17 z97 kernel: amdgpu 0000:01:00.0: VM fault (0x01, vmid 2, pasid 0) at page 179393237, read from 'TC2' (0x54433200) (200)
Aug 24 12:43:17 z97 kernel: amdgpu 0000:01:00.0: GPU fault detected: 147 0x06a8c801
Aug 24 12:43:17 z97 kernel: amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x0AB152D5
Aug 24 12:43:17 z97 kernel: amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x040C8001
Aug 24 12:43:17 z97 kernel: amdgpu 0000:01:00.0: VM fault (0x01, vmid 2, pasid 0) at page 179393237, read from 'TC2' (0x54433200) (200)
Aug 24 12:43:17 z97 kernel: amdgpu 0000:01:00.0: GPU fault detected: 146 0x0000480c
Aug 24 12:43:17 z97 kernel: amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00000000
Aug 24 12:43:17 z97 kernel: amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0404800C
Aug 24 12:43:17 z97 kernel: amdgpu 0000:01:00.0: VM fault (0x0c, vmid 2, pasid 0) at page 0, read from 'TC4' (0x54433400) (72)
Aug 24 12:43:17 z97 kernel: amdgpu 0000:01:00.0: GPU fault detected: 147 0x06a8c801
Aug 24 12:43:17 z97 kernel: amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x0AB152D5
Aug 24 12:43:17 z97 kernel: amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x040C8001
Aug 24 12:43:17 z97 kernel: amdgpu 0000:01:00.0: VM fault (0x01, vmid 2, pasid 0) at page 179393237, read from 'TC2' (0x54433200) (200)
Aug 24 12:43:17 z97 kernel: amdgpu 0000:01:00.0: GPU fault detected: 147 0x06a8c801
Aug 24 12:43:17 z97 kernel: amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x0AB152D5
Aug 24 12:43:17 z97 kernel: amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x040C8001
Aug 24 12:43:17 z97 kernel: amdgpu 0000:01:00.0: VM fault (0x01, vmid 2, pasid 0) at page 179393237, read from 'TC2' (0x54433200) (200)
Aug 24 12:43:17 z97 kernel: amdgpu 0000:01:00.0: GPU fault detected: 147 0x06a8c801
Aug 24 12:43:17 z97 kernel: amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x0AB152D5
Aug 24 12:43:17 z97 kernel: amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x040C8001
Aug 24 12:43:17 z97 kernel: amdgpu 0000:01:00.0: VM fault (0x01, vmid 2, pasid 0) at page 179393237, read from 'TC2' (0x54433200) (200)
Aug 24 12:43:17 z97 kernel: amdgpu 0000:01:00.0: GPU fault detected: 147 0x06a8c801
Aug 24 12:43:17 z97 kernel: amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x0AB152D5
Aug 24 12:43:17 z97 kernel: amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x040C8001
Aug 24 12:43:17 z97 kernel: amdgpu 0000:01:00.0: VM fault (0x01, vmid 2, pasid 0) at page 179393237, read from 'TC2' (0x54433200) (200)
Aug 24 12:43:17 z97 kernel: amdgpu 0000:01:00.0: GPU fault detected: 147 0x06a8c801
Aug 24 12:43:17 z97 kernel: amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x0AB152D5
Aug 24 12:43:17 z97 kernel: amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x040C8001
Aug 24 12:43:17 z97 kernel: amdgpu 0000:01:00.0: VM fault (0x01, vmid 2, pasid 0) at page 179393237, read from 'TC2' (0x54433200) (200)
Aug 24 12:43:17 z97 kernel: amdgpu 0000:01:00.0: GPU fault detected: 147 0x06a8c801
Aug 24 12:43:17 z97 kernel: amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x0AB152D5
Aug 24 12:43:17 z97 kernel: amdgpu 0000:01:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x040C8001
Aug 24 12:43:17 z97 kernel: amdgpu 0000:01:00.0: VM fault (0x01, vmid 2, pasid 0) at page 179393237, read from 'TC2' (0x54433200) (200)
Aug 24 12:43:28 z97 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, last signaled seq=810, last emitted seq=812
Aug 24 12:43:28 z97 kernel: [drm] GPU recovery disabled.

The last line suggests a possible workaround: maybe GPU recovery can be enabled with a kernel boot parameter.
I will search for this and report back later.
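With logs this long, most entries are the same fault repeated. A short Python sketch that counts the decoded fault lines per (vmid, page, client) makes the distinct faults easier to spot. It assumes the "VM fault (...)" line format seen in this thread (the ", pasid N" part is optional because the 4.16 logs omit it); `summarize_faults` is a hypothetical helper, not part of any existing tool.

```python
import re
from collections import Counter

# Matches lines such as:
# "VM fault (0x01, vmid 2, pasid 0) at page 179393237, read from 'TC2' ..."
# "VM fault (0x02, vmid 4) at page 0, read from 'TC4' ..."
FAULT_RE = re.compile(
    r"VM fault \((0x[0-9a-fA-F]+), vmid (\d+)(?:, pasid (\d+))?\) "
    r"at page (\d+), (read|write) from '([^']+)'"
)


def summarize_faults(log_text: str) -> Counter:
    """Count VM fault lines per (vmid, page, client); other lines are ignored."""
    counts = Counter()
    for line in log_text.splitlines():
        m = FAULT_RE.search(line)
        if m:
            _protections, vmid, _pasid, page, _rw, client = m.groups()
            counts[(int(vmid), int(page), client)] += 1
    return counts
```

Since resuming ends in a hard reset, the kernel messages of interest are usually those of the previous boot, e.g. the output of `journalctl -k -b -1` (this requires a persistent journal), which can be passed to `summarize_faults` as a string and inspected with `.most_common()`.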


#11 2018-08-24 20:31:36

ArthurBorsboom
Member
Registered: 2014-05-20
Posts: 55

Re: amdgpu driver crashes upon resuming from suspend. linux 4.16 [solved]

GPU recovery can apparently be enabled with the following kernel boot parameter:

amdgpu.gpu_recovery=1

1 = enable
0 = disable
-1 = auto (default)

Unfortunately, it does not seem to work as a workaround for recovering from a failed suspend/resume.
For now I have disabled suspend completely.
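Before concluding that amdgpu.gpu_recovery=1 does not help, it may be worth confirming which value the loaded driver actually picked up, since the log in the previous post printed "GPU recovery disabled". A minimal Python sketch, assuming the amdgpu module exposes its readable parameters under /sys/module/amdgpu/parameters/ (the usual sysfs location for module parameters); the function names are hypothetical and the value meanings are the ones listed in the post above.

```python
from pathlib import Path
from typing import Optional

# Meanings as listed in the post above (default is -1, "auto").
GPU_RECOVERY_MODES = {1: "enable", 0: "disable", -1: "auto"}


def describe_gpu_recovery(raw: str) -> str:
    """Map a raw parameter value such as '-1\\n' to its listed meaning."""
    value = int(raw.strip())
    return GPU_RECOVERY_MODES.get(value, f"unknown ({value})")


def read_gpu_recovery(
    path: str = "/sys/module/amdgpu/parameters/gpu_recovery",
) -> Optional[str]:
    """Return the loaded driver's setting, or None if the file is absent."""
    p = Path(path)
    if not p.exists():
        return None
    return describe_gpu_recovery(p.read_text())


if __name__ == "__main__":
    print("gpu_recovery:", read_gpu_recovery())
```

A result of "auto" would mean the boot parameter did not reach the driver, while None means the parameter file is not exposed (for example, amdgpu is not loaded).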

