#1 2018-06-25 08:45:31

tcn
Member
Registered: 2011-09-30
Posts: 55

[drm] GPU recovery disabled.

Hi!

This is a surprisingly long-standing problem: it has been present since 4.15 and persists all the way up to 4.17. After resuming from sleep (echo -n mem > /sys/power/state), amdgpu is dead (always, reliably).
Here's what dmesg has to say about it:

[   42.802559] PM: suspend exit
[   42.824332] amdgpu 0000:41:00.0: GPU fault detected: 147 0x0bd84802
[   42.824338] amdgpu 0000:41:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x0034F97B
[   42.824341] amdgpu 0000:41:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x0C048002
[   42.824345] amdgpu 0000:41:00.0: VM fault (0x02, vmid 6) at page 3471739, read from 'TC0' (0x54433000) (72)
[   52.956306] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, last signaled seq=1287, last emitted seq=1289
[   52.956316] [drm] IP block:gfx_v8_0 is hung!
[   52.956362] [drm] GPU recovery disabled.
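
A side note on reading the fault lines above: the decimal page number in the "VM fault" message is the same value as the hex VM_CONTEXT1_PROTECTION_FAULT_ADDR, and the hex value printed after 'TC0' is just those characters as ASCII bytes. A minimal sketch for decoding both in plain sh; the function names page_from_addr and decode_client are made up for illustration and are not part of any existing tool:

```shell
# Hypothetical helpers for reading amdgpu VM fault lines.

# Convert the hex fault address to the decimal page number shown in dmesg.
page_from_addr() {
    printf '%d\n' "$1"
}

# Turn a hex client ID such as 0x54433000 into its ASCII characters,
# skipping zero padding bytes. Assumes an even number of hex digits.
# The body runs in a subshell so the variables do not leak.
decode_client() (
    hex=${1#0x}
    out=""
    [ $(( ${#hex} % 2 )) -eq 0 ] || exit 1
    while [ -n "$hex" ]; do
        byte=${hex%"${hex#??}"}    # first two hex digits
        hex=${hex#??}              # remainder
        [ "$byte" = "00" ] && continue
        # hex byte -> octal escape -> character
        out="$out$(printf "\\$(printf '%03o' "0x$byte")")"
    done
    printf '%s\n' "$out"
)

page_from_addr 0x0034F97B    # prints 3471739
decode_client 0x54433000     # prints TC0
```

This does not explain the fault, but it helps when comparing several logs: the address and client fields can be matched up without converting by hand.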

I've also seen fault 146, but other than that it mostly looks the same. 4.14-lts (with dc=0) works fine. I also tried the amd staging next kernel from the AUR.

Hardware: Zenith Extreme, 1950X, RX 460 (Polaris 11).

Any advice? Report upstream?

thx
tcn

Last edited by tcn (2018-06-25 08:45:53)

#2 2018-06-25 12:44:59

loqs
Member
Registered: 2014-03-06
Posts: 17,323

Re: [drm] GPU recovery disabled.

How recently did you try linux-amd-staging-git? If linux-amd-staging-drm-next-git has the issue, I would suggest reporting it upstream.
Also could you elaborate on all the combinations of amdgpu.dc and kernel version you have tried and the results.
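
To keep track of which kernel and amdgpu.dc combinations were tried, something like the following could log one line per attempt. This is only a sketch: dc_from_cmdline and record_run are hypothetical helper names, and it assumes the option is passed on the kernel command line (amdgpu.dc=...) rather than through a modprobe.d file.

```shell
# Print the amdgpu.dc= value found in a kernel command line string,
# or "unset" if the option is absent. If it appears more than once,
# the last occurrence is reported.
dc_from_cmdline() (
    value=unset
    for word in $1; do
        case $word in
            amdgpu.dc=*) value=${word#amdgpu.dc=} ;;
        esac
    done
    printf '%s\n' "$value"
)

# Emit one tab-separated line: kernel release, dc setting, and a
# free-form result typed in by hand.
record_run() {
    printf '%s\tdc=%s\t%s\n' \
        "$(uname -r)" \
        "$(dc_from_cmdline "$(cat /proc/cmdline)")" \
        "$1"
}
```

After each test boot, a call such as record_run "dead after resume" >> amdgpu-tests.log appends one row, and the resulting file can be pasted into a bug report.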

#3 2018-06-25 13:39:56

tcn
Member
Registered: 2011-09-30
Posts: 55

Re: [drm] GPU recovery disabled.

loqs wrote:

How recently did you try linux-amd-staging-git?

A couple of days ago. However, while this looks like it's 4.18...

==> Making package: linux-amd-staging-drm-next-git 4.18.754874.486e4f30e2a3-1 (Mon 25 Jun 2018 03:29:39 PM CEST)

...what I actually end up with is 4.17-rc5-something. I just deleted the AUR git (branch is master, BTW) and retried: same (yay -S linux-amd-staging-drm-next-git).

loqs wrote:

Also could you elaborate on all the combinations of amdgpu.dc and kernel version you have tried and the results.

Well, I don't have dmesg for every combination at hand, but let's put it this way: so far I haven't found any combination that wouldn't immediately crash.

How can I tell whether DC *is actually* enabled?

#4 2018-06-25 16:48:01

loqs
Member
Registered: 2014-03-06
Posts: 17,323

Re: [drm] GPU recovery disabled.

tcn wrote:
==> Making package: linux-amd-staging-drm-next-git 4.18.754874.486e4f30e2a3-1 (Mon 25 Jun 2018 03:29:39 PM CEST)

...what I actually end up with is 4.17-rc5-something. I just deleted the AUR git (branch is master, BTW) and retried: same (yay -S linux-amd-staging-drm-next-git).

_kernel_rel=4.18
....
  echo ${_kernel_rel}.$(git rev-list --count HEAD).$(git rev-parse --short HEAD)

Yes, the PKGBUILD sets the version to 4.18 even though, according to the git tree and the top Makefile, it is based on a 4.17 release candidate.
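
To check which version the source tree actually builds, independent of the pkgver the PKGBUILD reports, the top-level Makefile can be read directly. A sketch, assuming the usual VERSION / PATCHLEVEL / SUBLEVEL / EXTRAVERSION assignments at the top of the kernel Makefile; the function name kernel_version_from_makefile is made up for illustration:

```shell
# Read a kernel top-level Makefile on stdin and print the version it
# declares, e.g. "4.17.0-rc5" for VERSION=4, PATCHLEVEL=17, SUBLEVEL=0,
# EXTRAVERSION=-rc5.
kernel_version_from_makefile() {
    awk -F' *= *' '
        $1 == "VERSION"      { v = $2 }
        $1 == "PATCHLEVEL"   { p = $2 }
        $1 == "SUBLEVEL"     { s = $2 }
        $1 == "EXTRAVERSION" { e = $2 }
        END { print v "." p "." s e }
    '
}
```

From the root of the checked-out source it would be called as kernel_version_from_makefile < Makefile. Running make kernelversion in the same directory should report the same string, and uname -r shows the release of the kernel that is currently booted.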

tcn wrote:
loqs wrote:

Also could you elaborate on all the combinations of amdgpu.dc and kernel version you have tried and the results.

Well, I don't have dmesg for every combination at hand, but let's put it this way: so far I haven't found any combination that wouldn't immediately crash.

How can I tell whether DC *is actually* enabled?

parm:           dc:Display Core driver (1 = enable, 0 = disable, -1 = auto (default)) (int)

If you are not specifying the option, it falls back to autodetection for your GPU. I would expect autodetection to enable it on 4.17+.
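
To see what the dc parameter is set to on the running system, the loaded module's parameter can be read from sysfs. A sketch, assuming the parameter is exported at /sys/module/amdgpu/parameters/dc (check that the file exists on your kernel); describe_dc is a made-up helper that maps the value to the meanings from the parm line above:

```shell
# Map the dc parameter value to the meanings listed in the parm line.
describe_dc() {
    case $1 in
        1)  echo "enabled" ;;
        0)  echo "disabled" ;;
        -1) echo "auto (driver decides per GPU)" ;;
        *)  echo "unknown value: $1" ;;
    esac
}

param=/sys/module/amdgpu/parameters/dc   # assumed sysfs path
if [ -r "$param" ]; then
    describe_dc "$(cat "$param")"
else
    echo "amdgpu dc parameter not readable (module not loaded?)"
fi
```

Note that -1 only says autodetection is in effect, not what it picked. As far as I know, the driver also logs a "Display Core initialized" line in dmesg when DC is in use, so dmesg | grep -i "display core" may be the more direct check; the exact wording can differ between kernel versions.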
