#26 2021-05-21 04:13:05

huohuo
Member
Registered: 2016-04-20
Posts: 28

Re: AMD Ryzen 2700U graphics errors

Hi, I'm also getting a similar issue, but only when running CERN ROOT in GUI mode. I downgraded to 5.11.arch2-1, but that didn't solve it.
CPU/GPU: AMD Ryzen 7 4800U with Radeon Graphics
Current kernel: 5.12.5
BTW, I also tried the "iommu=pt" kernel parameter, but it didn't help.

May 21 11:20:08 kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences tim>
May 21 11:20:09 systemd-timesyncd[438]: Initial synchronization to time server [2001:1600:3:6::123>
May 21 11:20:13 kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences tim>
May 21 11:20:13 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, signaled seq=>
May 21 11:20:13 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process Xo>
May 21 11:20:13 kernel: amdgpu 0000:03:00.0: amdgpu: GPU reset begin!
May 21 11:20:13 kernel: [drm] free PSP TMR buffer
May 21 11:20:13 kernel: amdgpu 0000:03:00.0: amdgpu: MODE2 reset
May 21 11:20:13 kernel: amdgpu 0000:03:00.0: amdgpu: GPU reset succeeded, trying to resume
May 21 11:20:13 kernel: [drm] PCIE GART of 1024M enabled (table at 0x000000F400900000).
May 21 11:20:13 kernel: [drm] PSP is resuming...
May 21 11:20:13 kernel: [drm] reserve 0x400000 from 0xf41f800000 for PSP TMR
May 21 11:20:13 kernel: amdgpu 0000:03:00.0: amdgpu: RAS: optional ras ta ucode is not available
May 21 11:20:13 kernel: amdgpu 0000:03:00.0: amdgpu: RAP: optional rap ta ucode is not available
May 21 11:20:13 kernel: amdgpu 0000:03:00.0: amdgpu: SMU is resuming...
May 21 11:20:13 kernel: amdgpu 0000:03:00.0: amdgpu: SMU is resumed successfully!
May 21 11:20:14 kernel: [drm] kiq ring mec 2 pipe 1 q 0
May 21 11:20:14 kernel: [drm] DMUB hardware initialized: version=0x01020008
May 21 11:20:14 kernel: [drm] VCN decode and encode initialized successfully(under DPG Mode).
May 21 11:20:14 kernel: [drm] JPEG decode initialized successfully.
May 21 11:20:14 kernel: amdgpu 0000:03:00.0: amdgpu: ring gfx uses VM inv eng 0 on hub 0
May 21 11:20:14 kernel: amdgpu 0000:03:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
May 21 11:20:14 kernel: amdgpu 0000:03:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
May 21 11:20:14 kernel: amdgpu 0000:03:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
May 21 11:20:14 kernel: amdgpu 0000:03:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
May 21 11:20:14 kernel: amdgpu 0000:03:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
May 21 11:20:14 kernel: amdgpu 0000:03:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
May 21 11:20:14 kernel: amdgpu 0000:03:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
May 21 11:20:14 kernel: amdgpu 0000:03:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
May 21 11:20:14 kernel: amdgpu 0000:03:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
May 21 11:20:14 kernel: amdgpu 0000:03:00.0: amdgpu: ring sdma0 uses VM inv eng 0 on hub 1
May 21 11:20:14 kernel: amdgpu 0000:03:00.0: amdgpu: ring vcn_dec uses VM inv eng 1 on hub 1
May 21 11:20:14 kernel: amdgpu 0000:03:00.0: amdgpu: ring vcn_enc0 uses VM inv eng 4 on hub 1
May 21 11:20:14 kernel: amdgpu 0000:03:00.0: amdgpu: ring vcn_enc1 uses VM inv eng 5 on hub 1
May 21 11:20:14 kernel: amdgpu 0000:03:00.0: amdgpu: ring jpeg_dec uses VM inv eng 6 on hub 1
May 21 11:20:14 /usr/lib/gdm-x-session[945]: (II) event18 - Logitech B330/M330/M3: SYN_DROPPED eve>
May 21 11:20:14 /usr/lib/gdm-x-session[945]: amdgpu: The CS has been cancelled because the context>
May 21 11:20:14 kernel: amdgpu 0000:03:00.0: amdgpu: recover vram bo from shadow start
May 21 11:20:14 kernel: amdgpu 0000:03:00.0: amdgpu: recover vram bo from shadow done
May 21 11:20:14 kernel: [drm] Skip scheduling IBs!
May 21 11:20:14 kernel: [drm] Skip scheduling IBs!
May 21 11:20:14 kernel: [drm] Skip scheduling IBs!
May 21 11:20:14 kernel: [drm] Skip scheduling IBs!
May 21 11:20:14 kernel: [drm] Skip scheduling IBs!
May 21 11:20:14 kernel: [drm] Skip scheduling IBs!
May 21 11:20:14 kernel: [drm] Skip scheduling IBs!
May 21 11:20:14 kernel: [drm] Skip scheduling IBs!
May 21 11:20:14 kernel: [drm] Skip scheduling IBs!
May 21 11:20:14 kernel: [drm] Skip scheduling IBs!
May 21 11:20:14 kernel: [drm] Skip scheduling IBs!
May 21 11:20:14 kernel: [drm] Skip scheduling IBs!
May 21 11:20:14 kernel: [drm] Skip scheduling IBs!
May 21 11:20:14 kernel: [drm] Skip scheduling IBs!
May 21 11:20:14 kernel: amdgpu 0000:03:00.0: amdgpu: GPU reset(2) succeeded!
May 21 11:20:14 kernel: [drm] Skip scheduling IBs!
May 21 11:20:14 kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
May 21 11:20:14 kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
May 21 11:20:14 kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
May 21 11:20:14 /usr/lib/gdm-x-session[945]: amdgpu: The CS has been cancelled because the context>
May 21 11:20:14 /usr/lib/gdm-x-session[945]: amdgpu: The CS has been cancelled because the context>
May 21 11:20:14 /usr/lib/gdm-x-session[945]: amdgpu: The CS has been cancelled because the context>
May 21 11:20:14 systemd[928]: run-user-120.mount: Deactivated successfully.
May 21 11:20:14 systemd[1]: user-runtime-dir@120.service: Deactivated successfully.
May 21 11:20:14 kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
May 21 11:20:14 kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
May 21 11:20:14 kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
May 21 11:20:14 kernel: audit: type=1131 audit(1621567214.371:90): pid=1 uid=0 auid=4294967295 ses>
May 21 11:20:14 audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 msg='unit=user-r>
May 21 11:20:14 /usr/lib/gdm-x-session[945]: amdgpu: The CS has been cancelled because the context>
May 21 11:20:14 /usr/lib/gdm-x-session[945]: amdgpu: The CS has been cancelled because the context>
May 21 11:20:14 systemd[1]: Stopped User Runtime Directory /run/user/120.
May 21 11:20:14 /usr/lib/gdm-x-session[945]: amdgpu: The CS has been cancelled because the context>
May 21 11:20:14 /usr/lib/gdm-x-session[945]: amdgpu: The CS has been cancelled because the context>
May 21 11:20:14 systemd[1]: Removed slice User Slice of UID 120.
May 21 11:20:14 systemd[1]: user-120.slice: Consumed 4.783s CPU time.
May 21 11:20:14 kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
May 21 11:20:14 kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!
May 21 11:20:14 kernel: [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!

#27 2021-05-21 06:17:18

LucidComplex
Member
Registered: 2014-08-21
Posts: 12

Re: AMD Ryzen 2700U graphics errors

Tried 5.10.15 and it still crashes.

I don't remember it crashing back then... Possibly not the kernel?

#28 2021-05-21 06:57:00

V1del
Forum Moderator
Registered: 2012-10-16
Posts: 21,427

Re: AMD Ryzen 2700U graphics errors

Mesa also saw a few big version bumps and could be a likely candidate.

#29 2021-05-21 07:50:45

LucidComplex
Member
Registered: 2014-08-21
Posts: 12

Re: AMD Ryzen 2700U graphics errors

I've downgraded to mesa 21.0.3-2.

Checking journalctl, the earliest gfxhub0 page faults I can find are from May 13. I upgraded mesa on the same day (and rebooted). Looks like this is indeed the culprit.
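To line up the first appearance of the faults with a package change, as done above, the journal can be compared against pacman's own log. A minimal sketch, assuming the default log file `/var/log/pacman.log` and its usual `[timestamp] [ALPM] upgraded pkg (old -> new)` line format; `pkg_history` is a hypothetical helper, not part of pacman:

```shell
# pkg_history LOG PKG
# Print "timestamp action old -> new" for every upgrade or downgrade of PKG
# recorded in a pacman.log-style file. Hypothetical helper, not part of pacman.
# PKG is used inside an extended regex, so names containing "+" or "." may
# need escaping.
pkg_history() {
    log=$1
    pkg=$2
    # Keep only transactions for exactly this package (not e.g. lib32-mesa).
    grep -E "\[ALPM\] (upgraded|downgraded) $pkg \(" "$log" |
        # Reduce "[ts] [ALPM] action name (old -> new)" to "ts action old -> new".
        sed -E 's/^\[([^]]+)\] \[ALPM\] ([a-z]+) [^ ]+ \((.*)\)$/\1 \2 \3/'
}
```

For example, `pkg_history /var/log/pacman.log mesa` and `pkg_history /var/log/pacman.log linux-firmware` can be read next to the oldest matching journal entry from `journalctl --grep gfxhub0 -o short-iso --no-pager | head -n 1`, which searches every boot the journal still retains.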

#30 2021-05-21 08:10:59

huohuo
Member
Registered: 2016-04-20
Posts: 28

Re: AMD Ryzen 2700U graphics errors

LucidComplex wrote:

I've downgraded to mesa 21.0.3-2.

Checking journalctl, the earliest gfxhub0 page faults I can find are from May 13. I upgraded mesa on the same day (and rebooted). Looks like this is indeed the culprit.

Does downgrading mesa work?

#31 2021-05-21 08:16:11

LucidComplex
Member
Registered: 2014-08-21
Posts: 12

Re: AMD Ryzen 2700U graphics errors

Still observing. If I don't get a crash this session, I'll report back.

#32 2021-05-21 08:31:47

codicodi
Member
Registered: 2021-05-21
Posts: 3

Re: AMD Ryzen 2700U graphics errors

According to folks on other distros, this may be a firmware issue (possibly introduced in linux-firmware-20210426, which reached Arch on May 5th).
https://forums.opensuse.org/showthread. … ost3029836
https://forum.manjaro.org/t/system-freq … /62139/114

#33 2021-05-21 10:05:33

huohuo
Member
Registered: 2016-04-20
Posts: 28

Re: AMD Ryzen 2700U graphics errors

codicodi wrote:

According to folks on other distros, this may be a firmware issue (possibly introduced in linux-firmware-20210426, which reached Arch on May 5th).
https://forums.opensuse.org/showthread. … ost3029836
https://forum.manjaro.org/t/system-freq … /62139/114

I tried downgrading to linux-firmware-20210426, but the issue is still there.

#34 2021-05-21 11:09:15

codicodi
Member
Registered: 2021-05-21
Posts: 3

Re: AMD Ryzen 2700U graphics errors

huohuo wrote:
codicodi wrote:

According to folks on other distros, this may be a firmware issue (possibly introduced in linux-firmware-20210426, which reached Arch on May 5th).
https://forums.opensuse.org/showthread. … ost3029836
https://forum.manjaro.org/t/system-freq … /62139/114

I tried downgrading to linux-firmware-20210426, but the issue is still there.

Try downgrading to 20210315.3568f96-2. As I said above, I think the issues started with 20210426 and weren't fixed yet in 20210511.

I've been running 20210315.3568f96-2 myself for a few hours now and everything appears to be in order, although it's too early to be sure...
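If the older package is no longer in the local cache (`/var/cache/pacman/pkg`), it can usually be fetched from the Arch Linux Archive instead. A minimal sketch of how such a URL is put together, assuming the archive's `packages/<first letter>/<name>/` layout and the `.pkg.tar.zst` extension used by packages of this period; `ala_url` is a hypothetical helper:

```shell
# ala_url NAME VERSION ARCH
# Build an Arch Linux Archive URL for one package file. Hypothetical helper.
# Assumes the layout packages/<first letter>/<name>/<name>-<version>-<arch>.pkg.tar.zst;
# much older packages may use a different extension (e.g. .pkg.tar.xz).
ala_url() {
    name=$1
    version=$2   # full pkgver-pkgrel, e.g. 20210315.3568f96-2
    arch=$3      # e.g. "any" for linux-firmware
    first=$(printf '%s' "$name" | cut -c1)
    printf 'https://archive.archlinux.org/packages/%s/%s/%s-%s-%s.pkg.tar.zst\n' \
        "$first" "$name" "$name" "$version" "$arch"
}
```

pacman accepts a URL for `-U`, so something like `sudo pacman -U "$(ala_url linux-firmware 20210315.3568f96-2 any)"` should install that version; it's worth opening the URL in a browser first to confirm the file exists under that exact name.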

Last edited by codicodi (2021-05-21 11:09:35)

#35 2021-05-21 20:00:07

cyberrumor
Member
Registered: 2019-04-21
Posts: 12

Re: AMD Ryzen 2700U graphics errors

Just popping in to say I have the same issue. I thought it was a hardware problem until I found this thread. I'm also on Sway. I've been having games crash, and I recently noticed graphical corruption in Alacritty, which caused a freeze that my session recovered from automatically. I think you're right that it has something to do with acceleration. I'm on a 3400g, using its integrated graphics.

sudo dmesg
[53507.534738] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
[53509.653244] amdgpu 0000:0a:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:2 pasid:32769, for process sway pid 7837 thread sway:cs0 pid 7845)
[53509.653257] amdgpu 0000:0a:00.0: amdgpu:   in page starting at address 0x800106000000 from client 27
[53509.653266] amdgpu 0000:0a:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00201031
[53509.653270] amdgpu 0000:0a:00.0: amdgpu: 	 Faulty UTCL2 client ID: TCP (0x8)
[53509.653273] amdgpu 0000:0a:00.0: amdgpu: 	 MORE_FAULTS: 0x1
[53509.653275] amdgpu 0000:0a:00.0: amdgpu: 	 WALKER_ERROR: 0x0
[53509.653278] amdgpu 0000:0a:00.0: amdgpu: 	 PERMISSION_FAULTS: 0x3
[53509.653280] amdgpu 0000:0a:00.0: amdgpu: 	 MAPPING_ERROR: 0x0
[53509.653282] amdgpu 0000:0a:00.0: amdgpu: 	 RW: 0x0
[53509.653289] amdgpu 0000:0a:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:2 pasid:32769, for process sway pid 7837 thread sway:cs0 pid 7845)
[53509.653295] amdgpu 0000:0a:00.0: amdgpu:   in page starting at address 0x800106003000 from client 27
[53509.653303] amdgpu 0000:0a:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00201031
[53509.653305] amdgpu 0000:0a:00.0: amdgpu: 	 Faulty UTCL2 client ID: TCP (0x8)
[53509.653308] amdgpu 0000:0a:00.0: amdgpu: 	 MORE_FAULTS: 0x1
[53509.653310] amdgpu 0000:0a:00.0: amdgpu: 	 WALKER_ERROR: 0x0
[53509.653312] amdgpu 0000:0a:00.0: amdgpu: 	 PERMISSION_FAULTS: 0x3
[53509.653314] amdgpu 0000:0a:00.0: amdgpu: 	 MAPPING_ERROR: 0x0
[53509.653316] amdgpu 0000:0a:00.0: amdgpu: 	 RW: 0x0
[53509.653321] amdgpu 0000:0a:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:2 pasid:32769, for process sway pid 7837 thread sway:cs0 pid 7845)
[53509.653326] amdgpu 0000:0a:00.0: amdgpu:   in page starting at address 0x800106004000 from client 27
[53509.653333] amdgpu 0000:0a:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00201031
[53509.653336] amdgpu 0000:0a:00.0: amdgpu: 	 Faulty UTCL2 client ID: TCP (0x8)
[53509.653338] amdgpu 0000:0a:00.0: amdgpu: 	 MORE_FAULTS: 0x1
[53509.653340] amdgpu 0000:0a:00.0: amdgpu: 	 WALKER_ERROR: 0x0
[53509.653342] amdgpu 0000:0a:00.0: amdgpu: 	 PERMISSION_FAULTS: 0x3
[53509.653344] amdgpu 0000:0a:00.0: amdgpu: 	 MAPPING_ERROR: 0x0
[53509.653346] amdgpu 0000:0a:00.0: amdgpu: 	 RW: 0x0
[53509.653351] amdgpu 0000:0a:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:2 pasid:32769, for process sway pid 7837 thread sway:cs0 pid 7845)
[53509.653356] amdgpu 0000:0a:00.0: amdgpu:   in page starting at address 0x800106001000 from client 27
[53509.653363] amdgpu 0000:0a:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00201031
[53509.653365] amdgpu 0000:0a:00.0: amdgpu: 	 Faulty UTCL2 client ID: TCP (0x8)
[53509.653368] amdgpu 0000:0a:00.0: amdgpu: 	 MORE_FAULTS: 0x1
[53509.653370] amdgpu 0000:0a:00.0: amdgpu: 	 WALKER_ERROR: 0x0
[53509.653372] amdgpu 0000:0a:00.0: amdgpu: 	 PERMISSION_FAULTS: 0x3
[53509.653374] amdgpu 0000:0a:00.0: amdgpu: 	 MAPPING_ERROR: 0x0
[53509.653376] amdgpu 0000:0a:00.0: amdgpu: 	 RW: 0x0
[53509.653380] amdgpu 0000:0a:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:2 pasid:32769, for process sway pid 7837 thread sway:cs0 pid 7845)
[53509.653386] amdgpu 0000:0a:00.0: amdgpu:   in page starting at address 0x800106002000 from client 27
[53509.653393] amdgpu 0000:0a:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00201031
[53509.653395] amdgpu 0000:0a:00.0: amdgpu: 	 Faulty UTCL2 client ID: TCP (0x8)
[53509.653397] amdgpu 0000:0a:00.0: amdgpu: 	 MORE_FAULTS: 0x1
[53509.653399] amdgpu 0000:0a:00.0: amdgpu: 	 WALKER_ERROR: 0x0
[53509.653402] amdgpu 0000:0a:00.0: amdgpu: 	 PERMISSION_FAULTS: 0x3
[53509.653404] amdgpu 0000:0a:00.0: amdgpu: 	 MAPPING_ERROR: 0x0
[53509.653406] amdgpu 0000:0a:00.0: amdgpu: 	 RW: 0x0
[53509.653410] amdgpu 0000:0a:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:2 pasid:32769, for process sway pid 7837 thread sway:cs0 pid 7845)
[53509.653415] amdgpu 0000:0a:00.0: amdgpu:   in page starting at address 0x800106009000 from client 27
[53509.653422] amdgpu 0000:0a:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00201031
[53509.653424] amdgpu 0000:0a:00.0: amdgpu: 	 Faulty UTCL2 client ID: TCP (0x8)
[53509.653427] amdgpu 0000:0a:00.0: amdgpu: 	 MORE_FAULTS: 0x1
[53509.653429] amdgpu 0000:0a:00.0: amdgpu: 	 WALKER_ERROR: 0x0
[53509.653431] amdgpu 0000:0a:00.0: amdgpu: 	 PERMISSION_FAULTS: 0x3
[53509.653433] amdgpu 0000:0a:00.0: amdgpu: 	 MAPPING_ERROR: 0x0
[53509.653435] amdgpu 0000:0a:00.0: amdgpu: 	 RW: 0x0
[53509.653439] amdgpu 0000:0a:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:2 pasid:32769, for process sway pid 7837 thread sway:cs0 pid 7845)
[53509.653444] amdgpu 0000:0a:00.0: amdgpu:   in page starting at address 0x80010600c000 from client 27
[53509.653451] amdgpu 0000:0a:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00201031
[53509.653453] amdgpu 0000:0a:00.0: amdgpu: 	 Faulty UTCL2 client ID: TCP (0x8)
[53509.653455] amdgpu 0000:0a:00.0: amdgpu: 	 MORE_FAULTS: 0x1
[53509.653458] amdgpu 0000:0a:00.0: amdgpu: 	 WALKER_ERROR: 0x0
[53509.653460] amdgpu 0000:0a:00.0: amdgpu: 	 PERMISSION_FAULTS: 0x3
[53509.653462] amdgpu 0000:0a:00.0: amdgpu: 	 MAPPING_ERROR: 0x0
[53509.653464] amdgpu 0000:0a:00.0: amdgpu: 	 RW: 0x0
[53509.653468] amdgpu 0000:0a:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:2 pasid:32769, for process sway pid 7837 thread sway:cs0 pid 7845)
[53509.653473] amdgpu 0000:0a:00.0: amdgpu:   in page starting at address 0x80010600e000 from client 27
[53509.653479] amdgpu 0000:0a:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00201031
[53509.653482] amdgpu 0000:0a:00.0: amdgpu: 	 Faulty UTCL2 client ID: TCP (0x8)
[53509.653484] amdgpu 0000:0a:00.0: amdgpu: 	 MORE_FAULTS: 0x1
[53509.653486] amdgpu 0000:0a:00.0: amdgpu: 	 WALKER_ERROR: 0x0
[53509.653488] amdgpu 0000:0a:00.0: amdgpu: 	 PERMISSION_FAULTS: 0x3
[53509.653490] amdgpu 0000:0a:00.0: amdgpu: 	 MAPPING_ERROR: 0x0
[53509.653492] amdgpu 0000:0a:00.0: amdgpu: 	 RW: 0x0
[53509.653496] amdgpu 0000:0a:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:2 pasid:32769, for process sway pid 7837 thread sway:cs0 pid 7845)
[53509.653501] amdgpu 0000:0a:00.0: amdgpu:   in page starting at address 0x800106006000 from client 27
[53509.653508] amdgpu 0000:0a:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00201031
[53509.653510] amdgpu 0000:0a:00.0: amdgpu: 	 Faulty UTCL2 client ID: TCP (0x8)
[53509.653513] amdgpu 0000:0a:00.0: amdgpu: 	 MORE_FAULTS: 0x1
[53509.653515] amdgpu 0000:0a:00.0: amdgpu: 	 WALKER_ERROR: 0x0
[53509.653517] amdgpu 0000:0a:00.0: amdgpu: 	 PERMISSION_FAULTS: 0x3
[53509.653519] amdgpu 0000:0a:00.0: amdgpu: 	 MAPPING_ERROR: 0x0
[53509.653521] amdgpu 0000:0a:00.0: amdgpu: 	 RW: 0x0
[53509.653525] amdgpu 0000:0a:00.0: amdgpu: [gfxhub0] retry page fault (src_id:0 ring:0 vmid:2 pasid:32769, for process sway pid 7837 thread sway:cs0 pid 7845)
[53509.653530] amdgpu 0000:0a:00.0: amdgpu:   in page starting at address 0x80010600b000 from client 27
[53509.653537] amdgpu 0000:0a:00.0: amdgpu: VM_L2_PROTECTION_FAULT_STATUS:0x00201031
[53509.653539] amdgpu 0000:0a:00.0: amdgpu: 	 Faulty UTCL2 client ID: TCP (0x8)
[53509.653541] amdgpu 0000:0a:00.0: amdgpu: 	 MORE_FAULTS: 0x1
[53509.653543] amdgpu 0000:0a:00.0: amdgpu: 	 WALKER_ERROR: 0x0
[53509.653545] amdgpu 0000:0a:00.0: amdgpu: 	 PERMISSION_FAULTS: 0x3
[53509.653547] amdgpu 0000:0a:00.0: amdgpu: 	 MAPPING_ERROR: 0x0
[53509.653549] amdgpu 0000:0a:00.0: amdgpu: 	 RW: 0x0
[53512.654702] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
[53517.774691] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
[53519.697569] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
[53545.331994] kauditd_printk_skb: 21 callbacks suppressed

Versions of my packages below.

pacman -Qs amd
local/amd-ucode 20210511.7685cf4-1
    Microcode update image for AMD CPUs
local/amdvlk 2021.Q2.3-1
    AMD's standalone Vulkan driver
local/lib32-amdvlk 2021.Q2.3-1
    AMD's standalone Vulkan driver
local/lib32-opencl-mesa 21.1.0-1
    OpenCL support for AMD/ATI Radeon mesa drivers (32-bit)
local/opencl-mesa 21.1.0-1
    OpenCL support for AMD/ATI Radeon mesa drivers
local/xf86-video-amdgpu 19.1.0-2 (xorg-drivers)
    X.org amdgpu video driver
pacman -Qs linux-firmware
local/linux-firmware 20210511.7685cf4-1
    Firmware files for Linux
pacman -Qi linux
Name            : linux
Version         : 5.12.5.arch1-1
...

#36 2021-05-22 04:56:02

LucidComplex
Member
Registered: 2014-08-21
Posts: 12

Re: AMD Ryzen 2700U graphics errors

LucidComplex wrote:

Still observing. If I don't get a crash this session, I'll report back.

Downgrading mesa to 21.0.3-2 fixed my issues! No crashes last session.

#37 2021-05-22 11:22:47

Interplanetary
Member
Registered: 2021-05-22
Posts: 2

Re: AMD Ryzen 2700U graphics errors

*ERROR* Waiting for fences timed out!

first appeared in my journalctl on March 23 (although it might have caused crashes before that), with

mesa 20.3.4-3
linux-lts 5.10.25-1
linux-firmware 20210315.3568f96-1

Still having this problem today with

mesa 21.1.0-1
linux-lts 5.10.38-1
linux-firmware 20210511.7685cf4-1

using a Ryzen 7 3700U with Vega 10.

I also randomly get kernel panics when entering sleep mode; I don't know whether that's related.

#38 2021-05-22 12:20:38

greenfoo
Member
Registered: 2019-12-05
Posts: 8

Re: AMD Ryzen 2700U graphics errors

EDIT: I didn't notice the second page of comments in this thread before replying... so basically everything I'm saying here has already been discussed, oops...


---- Original message ---

I'm experiencing the same issue: random GPU resets that start with this message appearing on journalctl:

*ERROR* Waiting for fences timed out!

This error was reported to the amdgpu maintainers a year ago. The report went almost silent after that, but judging by that thread's history it seems to have "resurrected" recently.

In particular, comments there point to a Manjaro thread where some people seem to have fixed the issue by downgrading linux-firmware.

I just did, and haven't had an issue for several hours (but should wait a few days just in case):

sudo pacman -U /var/cache/pacman/pkg/linux-firmware-20210315.3568f96-2-any.pkg.tar.zst

(...and don't forget to add "linux-firmware" to "IgnorePkg" in "/etc/pacman.conf" to tell pacman not to upgrade it from now on)
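The `IgnorePkg` step can also be scripted. A minimal sketch, assuming the stock Arch `/etc/pacman.conf`, which ships a commented `#IgnorePkg   =` line in its `[options]` section; `ignore_pkg` is a hypothetical helper that relies on GNU `sed -i`, so back up the file before running it as root:

```shell
# ignore_pkg CONF PKG
# Add PKG to the IgnorePkg list of a pacman.conf-style file. Hypothetical helper.
# Assumes the file has either an active "IgnorePkg = ..." line or the stock
# commented "#IgnorePkg   =" placeholder; any other layout is left untouched.
ignore_pkg() {
    conf=$1
    pkg=$2
    # Already listed? Then there is nothing to do.
    if grep -Eq "^IgnorePkg[[:space:]]*=(.*[[:space:]])?$pkg([[:space:]]|\$)" "$conf"; then
        return 0
    fi
    if grep -Eq '^IgnorePkg[[:space:]]*=' "$conf"; then
        # Append to the existing, uncommented IgnorePkg list.
        sed -i "s/^\(IgnorePkg[[:space:]]*=.*\)\$/\1 $pkg/" "$conf"
    else
        # Activate the stock placeholder with PKG as its only entry.
        sed -i "s/^#IgnorePkg[[:space:]]*=.*\$/IgnorePkg = $pkg/" "$conf"
    fi
}
```

After defining the function in a root shell, `ignore_pkg /etc/pacman.conf linux-firmware` pins the package until the line is removed again. For a one-off skip rather than a permanent pin, `pacman -Syu --ignore linux-firmware` leaves the package alone for that single run.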

UPDATE:
It's been two days now without crashes (which was never the case before downgrading). However, others in this thread say they still get crashes even with the downgraded "linux-firmware-20210315" package.
For reference (in case it helps), I'm using a Vega 8 GPU (Ryzen 3500U) with these kernel and mesa versions:
- mesa 21.1.0
- linux 5.12.5

Last edited by greenfoo (2021-05-23 19:57:42)

#39 2021-05-22 15:17:55

cyberrumor
Member
Registered: 2019-04-21
Posts: 12

Re: AMD Ryzen 2700U graphics errors

I am still getting crashes with the downgraded linux-firmware, version 20210315.3568f96-2.

EDIT:

I'm also still getting crashes with mesa 21.0.3-2. I'm starting to wonder if this is a BIOS bug. I'm on a Gigabyte B450I with BIOS revision F61a and a 3400g. I once saw another thread mentioning that enabling XMP on this board causes the iGPU voltage to drop. Is anyone else running XMP? With XMP disabled I still get the issue, but significantly less often.

EDIT 2:

Slapping this reddit thread down since it's relevant.

Last edited by cyberrumor (2021-05-22 23:33:35)

#40 2021-05-22 17:34:56

danielem
Member
Registered: 2020-12-27
Posts: 25

Re: AMD Ryzen 2700U graphics errors

cyberrumor wrote:

I'm starting to wonder if this is a BIOS bug

I've had my Ryzen 2700U for two years already and the issues started only recently, so it's hard to blame the BIOS (although it isn't completely impossible).

#41 2021-05-23 00:58:50

cyberrumor
Member
Registered: 2019-04-21
Posts: 12

Re: AMD Ryzen 2700U graphics errors

It looks like the fix is incoming. I'm not sure whether it has been pulled into linux-mainline in the AUR yet, but I'm going to build it and give it a shot.

#42 2021-05-23 05:58:09

LucidComplex
Member
Registered: 2014-08-21
Posts: 12

Re: AMD Ryzen 2700U graphics errors

Just experienced the bug again with downgraded mesa. Hope that fix works!

#43 2021-05-24 19:51:33

srslizness
Member
Registered: 2021-05-14
Posts: 12

Re: AMD Ryzen 2700U graphics errors

Ok, so, I'm back with news!

This is my current setup, based on comments here: linux 5.11.6.arch1-1 AND linux-firmware 20210315.3568f96-2

With these, the crashes have stopped completely: it's now been a few days of constant heavy use of my computer without a single crash or any sign of the bug in my journal.

CPU: AMD Ryzen 7 PRO 3700U w/ Radeon Vega Mobile Gfx

It is possible linux-firmware is the sole culprit, but I'm staying downgraded on linux as well just to be sure, until I can test the fix mentioned above. I hope this helps someone!

#44 2021-05-28 23:27:02

cyberrumor
Member
Registered: 2019-04-21
Posts: 12

Re: AMD Ryzen 2700U graphics errors

That's great news! I'm still on the official up-to-date linux-firmware from Arch repos, and my only change was compiling linux-mainline from the aur, and booting from it. I'm using kernel 5.13-rc2-1. I've now gone 5 days without the issue. I'm betting the fix comes from different packages depending on which CPU you have, which would explain the differing experiences we have. So, 3400g = fixed in 5.13. Everything else seems to be fixed via downgrade of linux-firmware.

EDIT:

Just crashed again, but that was the longest period of stability since I started seeing the issue. Am currently building 5.13-rc3, will report back.

EDIT 2:

Built 5.13-rc4 instead, still crashing.

Last edited by cyberrumor (2021-06-01 15:03:01)

#45 2021-05-31 07:40:12

7nm
Member
Registered: 2019-02-08
Posts: 15

Re: AMD Ryzen 2700U graphics errors

FWIW, this issue isn't fixed for me yet. I've tried using almost every kernel in the LTS series from 5.10.23 to 5.10.41, 5.11.x, and 5.12.x but system freezes still happen. They happen less frequently in 5.12.x but they still happen.

Here are some logs.

https://paste.ack.tf/d5db46

https://paste.ack.tf/715fcc

#46 2021-05-31 16:22:57

srslizness
Member
Registered: 2021-05-14
Posts: 12

Re: AMD Ryzen 2700U graphics errors

7nm wrote:

FWIW, this issue isn't fixed for me yet. I've tried using almost every kernel in the LTS series from 5.10.23 to 5.10.41, 5.11.x, and 5.12.x but system freezes still happen. They happen less frequently in 5.12.x but they still happen.

Here are some logs.

https://paste.ack.tf/d5db46

https://paste.ack.tf/715fcc

Did you also downgrade linux-firmware?

#47 2021-06-01 20:28:25

7nm
Member
Registered: 2019-02-08
Posts: 15

Re: AMD Ryzen 2700U graphics errors

srslizness wrote:

Did you also downgrade linux-firmware?

To which version should I downgrade? The oldest version I have locally is linux-firmware-20210315.

Also, here's another system freeze log from today. Kernel version 5.12.8-arch1-1

https://paste.ack.tf/7343c3

EDIT: Another day, another crash

[86728.466221] myhost kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
[86733.586224] myhost kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
[86733.586361] myhost kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered
[86738.706215] myhost kernel: [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for fences timed out!
[86743.616759] myhost kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout, but soft recovered

So much for AMDGPU being better for Linux. My system freezes every day and I have to do hard reboots.

Last edited by 7nm (2021-06-02 06:12:42)

#48 2021-06-02 06:50:52

LucidComplex
Member
Registered: 2014-08-21
Posts: 12

Re: AMD Ryzen 2700U graphics errors

I have my linux-firmware at version 20210315.3568f96-2. No crashes since. I've since ignored that package and am updating fine.

#49 2021-06-02 12:17:29

7nm
Member
Registered: 2019-02-08
Posts: 15

Re: AMD Ryzen 2700U graphics errors

LucidComplex wrote:

I have my linux-firmware at version 20210315.3568f96-2. No crashes since. I've since ignored that package and am updating fine.

Ok, I'll downgrade to 20210315.3568f96-2 and report back if there are any improvements.

My laptop is basically a big paperweight right now and journalctl is filled with amdgpu errors.

#50 2021-06-03 02:09:36

lpr1
Member
Registered: 2017-10-08
Posts: 68

Re: AMD Ryzen 2700U graphics errors

Posting as a reminder to myself and as confirmation: it seems to be Vega-related. I have an R3200G using the iGPU, and it crashes when I use Chromium. I downgraded to "linux-firmware-20210426"; let's see how it goes.

I've been using this PC for about a year now and never had such an issue before.
