#1 2019-01-07 15:23:42

Coriolis
Member
From: Russian Federation
Registered: 2015-07-26
Posts: 27

"GPU fault detected" with amdgpu and opencl-amd in blender OpenCL

I recently bought a Sapphire RX 590 Nitro+. I'm using the open source amdgpu driver together with the standalone proprietary AMD OpenCL driver (package opencl-amd 18.50 from the AUR), as advised on the AMDGPU wiki page.

Everything works fine (games, benchmarks, stress tests) except for OpenCL in Blender (both stable 2.79 and the 2.8 beta). OpenCL rendering works correctly until I do one specific thing: add subsurface scattering anywhere in my node tree. Then, as soon as the OpenCL kernel recompilation finishes, my display output hangs completely (although sometimes I can still switch to other ttys).

I'm not sure whether this is a Blender bug (I couldn't find any other reports like this) or something related to amdgpu, considering that the RX 590 is very new and did not work with amdgpu at all before the Linux 4.20 patches. I haven't tried the full amdgpu-pro driver, because right now the AUR package can't be installed due to the missing binfmt-support package (and it probably wouldn't work anyway, since it requires downgrading the kernel and xorg, which I can't do because earlier kernel versions don't support my card). I haven't tried mesa-opencl either, since Blender doesn't support it.

What I've tried so far:
Using the amdgpu.dpm=0 kernel parameter, since the RX 590 didn't work with pre-4.20 Linux due to problems with dynamic power management.
Downgrading my kernel (as expected, my card didn't work at all).
Downgrading libdrm, since there have been problems with opencl-amd and libdrm.
Checking for hardware problems (a bad PCI-e connection, or the card not being pushed in completely).
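
For the first item, it can help to confirm that the parameter actually reached the kernel. The following is only a minimal sketch: the helper name kernel_params is hypothetical, and it assumes a simple whitespace-separated command line without quoted values.

```python
def kernel_params(cmdline: str) -> dict:
    """Split a kernel command line into {name: value} pairs.

    Hypothetical helper for illustration; quoted values that contain
    spaces are not handled.
    """
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        # Flags without "=" (for example "quiet") are stored with value None.
        params[key] = value if sep else None
    return params


# Made-up sample command line; on a running system the text would come
# from /proc/cmdline instead.
sample = "root=/dev/sda2 rw quiet amdgpu.dpm=0"
dpm_setting = kernel_params(sample).get("amdgpu.dpm")
```

If dpm_setting comes back as None on the real command line, the boot loader entry was not updated and the test with dpm disabled did not actually happen.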

I have this in my journal every time it happens:

Jan 06 23:36:14 white kernel: amdgpu 0000:07:00.0: GPU fault detected: 146 0x0e924814 for process blender pid 24453 thread blender pid 24453
Jan 06 23:36:14 white kernel: amdgpu 0000:07:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x001813D2
Jan 06 23:36:14 white kernel: amdgpu 0000:07:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x05048014
Jan 06 23:36:14 white kernel: amdgpu 0000:07:00.0: VM fault (0x14, vmid 2, pasid 32794) at page 1577938, write from 'TC4' (0x54433400) (72)
Jan 06 23:36:14 white kernel: amdgpu 0000:07:00.0: GPU fault detected: 146 0x0f420814 for process blender pid 24453 thread blender pid 24453
Jan 06 23:36:14 white kernel: amdgpu 0000:07:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x0018237E
Jan 06 23:36:14 white kernel: amdgpu 0000:07:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x05048014
Jan 06 23:36:14 white kernel: amdgpu 0000:07:00.0: VM fault (0x14, vmid 2, pasid 32794) at page 1581950, write from 'TC4' (0x54433400) (72)
Jan 06 23:36:14 white kernel: amdgpu 0000:07:00.0: GPU fault detected: 146 0x0dca0414 for process blender pid 24453 thread blender pid 24453
Jan 06 23:36:14 white kernel: amdgpu 0000:07:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x0018239F
Jan 06 23:36:14 white kernel: amdgpu 0000:07:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x05088014
Jan 06 23:36:14 white kernel: amdgpu 0000:07:00.0: VM fault (0x14, vmid 2, pasid 32794) at page 1581983, write from 'TC6' (0x54433600) (136)
Jan 06 23:36:14 white kernel: amdgpu 0000:07:00.0: GPU fault detected: 146 0x0deac414 for process blender pid 24453 thread blender pid 24453
Jan 06 23:36:14 white kernel: amdgpu 0000:07:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x001823A9
Jan 06 23:36:14 white kernel: amdgpu 0000:07:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x05084014
Jan 06 23:36:14 white kernel: amdgpu 0000:07:00.0: VM fault (0x14, vmid 2, pasid 32794) at page 1581993, write from 'TC7' (0x54433700) (132)
Jan 06 23:36:14 white kernel: amdgpu 0000:07:00.0: GPU fault detected: 146 0x0f528814 for process blender pid 24453 thread blender pid 24453
Jan 06 23:36:14 white kernel: amdgpu 0000:07:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x001829F5
Jan 06 23:36:14 white kernel: amdgpu 0000:07:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x050C8014
Jan 06 23:36:14 white kernel: amdgpu 0000:07:00.0: VM fault (0x14, vmid 2, pasid 32794) at page 1583605, write from 'TC2' (0x54433200) (200)
Jan 06 23:36:14 white kernel: amdgpu 0000:07:00.0: GPU fault detected: 146 0x0d220814 for process blender pid 24453 thread blender pid 24453
Jan 06 23:36:14 white kernel: amdgpu 0000:07:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x001823B5
Jan 06 23:36:14 white kernel: amdgpu 0000:07:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x050C4014
Jan 06 23:36:14 white kernel: amdgpu 0000:07:00.0: VM fault (0x14, vmid 2, pasid 32794) at page 1582005, write from 'TC3' (0x54433300) (196)
Jan 06 23:36:14 white kernel: amdgpu 0000:07:00.0: GPU fault detected: 146 0x0ec28414 for process blender pid 24453 thread blender pid 24453
Jan 06 23:36:14 white kernel: amdgpu 0000:07:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x001823B5
Jan 06 23:36:14 white kernel: amdgpu 0000:07:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x050C4014
Jan 06 23:36:14 white kernel: amdgpu 0000:07:00.0: VM fault (0x14, vmid 2, pasid 32794) at page 1582005, write from 'TC3' (0x54433300) (196)
Jan 06 23:36:14 white kernel: amdgpu 0000:07:00.0: GPU fault detected: 146 0x0f920814 for process blender pid 24453 thread blender pid 24453
Jan 06 23:36:14 white kernel: amdgpu 0000:07:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x00182366
Jan 06 23:36:14 white kernel: amdgpu 0000:07:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x050C8014
Jan 06 23:36:14 white kernel: amdgpu 0000:07:00.0: VM fault (0x14, vmid 2, pasid 32794) at page 1581926, write from 'TC2' (0x54433200) (200)
Jan 06 23:36:14 white kernel: amdgpu 0000:07:00.0: GPU fault detected: 146 0x0dfa4814 for process blender pid 24453 thread blender pid 24453
Jan 06 23:36:14 white kernel: amdgpu 0000:07:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x001829DF
Jan 06 23:36:14 white kernel: amdgpu 0000:07:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x05088014
Jan 06 23:36:14 white kernel: amdgpu 0000:07:00.0: VM fault (0x14, vmid 2, pasid 32794) at page 1583583, write from 'TC6' (0x54433600) (136)
Jan 06 23:36:14 white kernel: amdgpu 0000:07:00.0: GPU fault detected: 146 0x0dca0414 for process blender pid 24453 thread blender pid 24453
Jan 06 23:36:14 white kernel: amdgpu 0000:07:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x001829B9
Jan 06 23:36:14 white kernel: amdgpu 0000:07:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x05088014
Jan 06 23:36:14 white kernel: amdgpu 0000:07:00.0: VM fault (0x14, vmid 2, pasid 32794) at page 1583545, write from 'TC6' (0x54433600) (136)
Jan 06 23:36:14 white kernel: amdgpu 0000:07:00.0: IH ring buffer overflow (0x00088D80, 0x00009E90, 0x00008D90)
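
To make the log above easier to read, here is a rough sketch of how the raw register values map to the summary lines. The bit layout in decode_status is an assumption inferred from the log itself (it reproduces the 0x14, vmid 2, "write" and the trailing client number for every entry above), not something taken from the driver source. The function names are hypothetical.

```python
import re

ADDR_RE = re.compile(r"VM_CONTEXT1_PROTECTION_FAULT_ADDR\s+0x([0-9A-Fa-f]+)")
STATUS_RE = re.compile(r"VM_CONTEXT1_PROTECTION_FAULT_STATUS\s+0x([0-9A-Fa-f]+)")


def decode_status(status: int) -> dict:
    """Split a FAULT_STATUS value into fields (assumed bit layout)."""
    return {
        "protections": status & 0xFF,     # printed as 0x14 in the log
        "mc_id": (status >> 12) & 0x1FF,  # trailing number, e.g. (72)
        "write": bool((status >> 24) & 0x1),
        "vmid": (status >> 25) & 0xF,
    }


def client_name(raw: int) -> str:
    """Turn the client word (e.g. 0x54433400) into ASCII text ('TC4')."""
    return raw.to_bytes(4, "big").rstrip(b"\x00").decode("ascii")


def decode_faults(log_text: str) -> list:
    """Pair each FAULT_ADDR with the FAULT_STATUS that follows it."""
    addrs = [int(m, 16) for m in ADDR_RE.findall(log_text)]
    statuses = [int(m, 16) for m in STATUS_RE.findall(log_text)]
    # FAULT_ADDR in hex is the same number as the decimal "page" value.
    return [dict(page=a, **decode_status(s)) for a, s in zip(addrs, statuses)]


sample = """
amdgpu 0000:07:00.0:   VM_CONTEXT1_PROTECTION_FAULT_ADDR   0x001813D2
amdgpu 0000:07:00.0:   VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x05048014
"""
faults = decode_faults(sample)
```

For the first entry this yields page 1577938, vmid 2, a write access and client number 72, matching the "VM fault" summary line. Decoded this way, every fault in the excerpt is a write in vmid 2 coming from one of the TC clients (TC2, TC3, TC4, TC6, TC7), at pages clustered between 1577938 and 1583605.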
