#76 2023-10-21 06:50:10

seth
Member
Registered: 2012-09-03
Posts: 58,155

Re: Random GPU crashes amdgpu across 2 different GPUs

I've tried various kernel parameters. Nothing has helped.

https://bbs.archlinux.org/viewtopic.php?id=57855

Did you try https://bbs.archlinux.org/viewtopic.php … 5#p2124945 ?

Offline

#77 2023-10-21 10:18:48

peldax
Member
Registered: 2023-10-08
Posts: 3

Re: Random GPU crashes amdgpu across 2 different GPUs

Here is a snippet of the relevant journal log from my retry after updating to kernel 6.5.8. I retry after every kernel release because I suspect an amdgpu issue. The error message is the same as it has always been.

říj 21 12:02:54 kernel: amdgpu 0000:0c:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:7 pasid:32779, for process WolfOldBlood_x6 pid 8887 thread WolfOldBlo:cs0 pid 8938)
říj 21 12:02:54 kernel: amdgpu 0000:0c:00.0: amdgpu:   in page starting at address 0x000003c942200000 from client 0x1b (UTCL2)
říj 21 12:02:54 kernel: amdgpu 0000:0c:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00701430
říj 21 12:02:54 kernel: amdgpu 0000:0c:00.0: amdgpu:          Faulty UTCL2 client ID: SQC (data) (0xa)
říj 21 12:02:54 kernel: amdgpu 0000:0c:00.0: amdgpu:          MORE_FAULTS: 0x0
říj 21 12:02:54 kernel: amdgpu 0000:0c:00.0: amdgpu:          WALKER_ERROR: 0x0
říj 21 12:02:54 kernel: amdgpu 0000:0c:00.0: amdgpu:          PERMISSION_FAULTS: 0x3
říj 21 12:02:54 kernel: amdgpu 0000:0c:00.0: amdgpu:          MAPPING_ERROR: 0x0
říj 21 12:02:54 kernel: amdgpu 0000:0c:00.0: amdgpu:          RW: 0x0
říj 21 12:03:04 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=111964, emitted seq=111966
říj 21 12:03:04 kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process WolfOldBlood_x6 pid 8887 thread WolfOldBlo:cs0 pid 8938
říj 21 12:03:04 kernel: amdgpu 0000:0c:00.0: amdgpu: GPU reset begin!
říj 21 12:03:08 kernel: amdgpu 0000:0c:00.0: amdgpu: failed to suspend display audio
říj 21 12:03:08 kernel: amdgpu 0000:0c:00.0: amdgpu: MODE1 reset
říj 21 12:03:08 kernel: amdgpu 0000:0c:00.0: amdgpu: GPU mode1 reset
říj 21 12:03:08 kernel: amdgpu 0000:0c:00.0: amdgpu: GPU smu mode1 reset
říj 21 12:03:09 kernel: amdgpu 0000:0c:00.0: amdgpu: GPU reset succeeded, trying to resume
říj 21 12:03:09 kernel: amdgpu 0000:0c:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
říj 21 12:03:09 kernel: amdgpu 0000:0c:00.0: amdgpu: SMU is resuming...
říj 21 12:03:09 kernel: amdgpu 0000:0c:00.0: amdgpu: smu driver if version = 0x00000040, smu fw if version = 0x00000041, smu fw program = 0, version = 0x003a5800 (58.88.0)
říj 21 12:03:09 kernel: amdgpu 0000:0c:00.0: amdgpu: SMU driver if version not matched
říj 21 12:03:09 kernel: amdgpu 0000:0c:00.0: amdgpu: use vbios provided pptable
říj 21 12:03:09 kernel: amdgpu 0000:0c:00.0: amdgpu: SMU is resumed successfully!
říj 21 12:03:09 kernel: amdgpu 0000:0c:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
říj 21 12:03:09 kernel: amdgpu 0000:0c:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
říj 21 12:03:09 kernel: amdgpu 0000:0c:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
říj 21 12:03:09 kernel: amdgpu 0000:0c:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
říj 21 12:03:09 kernel: amdgpu 0000:0c:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
říj 21 12:03:09 kernel: amdgpu 0000:0c:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
říj 21 12:03:09 kernel: amdgpu 0000:0c:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
říj 21 12:03:09 kernel: amdgpu 0000:0c:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
říj 21 12:03:09 kernel: amdgpu 0000:0c:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
říj 21 12:03:09 kernel: amdgpu 0000:0c:00.0: amdgpu: ring kiq_0.2.1.0 uses VM inv eng 11 on hub 0
říj 21 12:03:09 kernel: amdgpu 0000:0c:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
říj 21 12:03:09 kernel: amdgpu 0000:0c:00.0: amdgpu: ring sdma1 uses VM inv eng 13 on hub 0
říj 21 12:03:09 kernel: amdgpu 0000:0c:00.0: amdgpu: ring sdma2 uses VM inv eng 14 on hub 0
říj 21 12:03:09 kernel: amdgpu 0000:0c:00.0: amdgpu: ring sdma3 uses VM inv eng 15 on hub 0
říj 21 12:03:09 kernel: amdgpu 0000:0c:00.0: amdgpu: ring vcn_dec_0 uses VM inv eng 0 on hub 8
říj 21 12:03:09 kernel: amdgpu 0000:0c:00.0: amdgpu: ring vcn_enc_0.0 uses VM inv eng 1 on hub 8
říj 21 12:03:09 kernel: amdgpu 0000:0c:00.0: amdgpu: ring vcn_enc_0.1 uses VM inv eng 4 on hub 8
říj 21 12:03:09 kernel: amdgpu 0000:0c:00.0: amdgpu: ring vcn_dec_1 uses VM inv eng 5 on hub 8
říj 21 12:03:09 kernel: amdgpu 0000:0c:00.0: amdgpu: ring vcn_enc_1.0 uses VM inv eng 6 on hub 8
říj 21 12:03:09 kernel: amdgpu 0000:0c:00.0: amdgpu: ring vcn_enc_1.1 uses VM inv eng 7 on hub 8
říj 21 12:03:09 kernel: amdgpu 0000:0c:00.0: amdgpu: ring jpeg_dec uses VM inv eng 8 on hub 8
říj 21 12:03:09 kernel: amdgpu 0000:0c:00.0: amdgpu: recover vram bo from shadow start
říj 21 12:03:09 kernel: amdgpu 0000:0c:00.0: amdgpu: recover vram bo from shadow done
říj 21 12:03:09 kernel: amdgpu 0000:0c:00.0: amdgpu: GPU reset(2) succeeded!
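
For anyone wanting to cross-check the fault lines, the GCVM_L2_PROTECTION_FAULT_STATUS word can be unpacked by hand. This is only a minimal sketch: the bit positions are an assumption inferred from the values printed in this log (they reproduce PERMISSION_FAULTS 0x3, client ID 0xa and vmid 7 for 0x00701430), not taken from register documentation, and `decode_fault_status` is a hypothetical helper name.

```shell
# Decode a GCVM_L2_PROTECTION_FAULT_STATUS value into its fields.
# ASSUMPTION: the bit layout below is inferred from the decoded values in
# the log; other GPU generations may use a different layout.
decode_fault_status() {
    status=$(( $1 ))
    printf 'MORE_FAULTS=0x%x\n'       $(( status & 0x1 ))
    printf 'WALKER_ERROR=0x%x\n'      $(( (status >> 1) & 0x7 ))
    printf 'PERMISSION_FAULTS=0x%x\n' $(( (status >> 4) & 0xf ))
    printf 'MAPPING_ERROR=0x%x\n'     $(( (status >> 8) & 0x1 ))
    printf 'CLIENT_ID=0x%x\n'         $(( (status >> 9) & 0x1ff ))
    printf 'RW=0x%x\n'                $(( (status >> 18) & 0x1 ))
    printf 'VMID=%d\n'                $(( (status >> 20) & 0xf ))
}

# Status word from the log above.
decode_fault_status 0x00701430
```

If the assumed layout is wrong for a given card, the decoded fields will disagree with the lines the driver prints right below the status word, which makes it easy to spot.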

Offline

#78 2023-10-25 17:52:22

Rena
Member
From: Ontario
Registered: 2015-05-08
Posts: 13

Re: Random GPU crashes amdgpu across 2 different GPUs

This is the first I've heard of amd_iommu. Where do people find these?

If it happens again I'll post the full log, but for now it's "fixed" by downgrading some packages and using the LTS kernel:

composable-kernel 5.5.1-2
xf86-video-amdgpu 21.0.0-2
vulkan-radeon 23.1.0-1
amdvlk 2023.Q3.2-1
mesa 23.1.0-1

Probably not all of these needed to be downgraded (or are needed at all) but this is what got things back to being usable for now.

Here's the part of the log from when the first error appears to when I used Magic SysRq:

Oct 25 13:22:12 greymon kernel: amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:2 pasid:32773, for process blender pid 3829 thread blender:cs0 pid 3856)
Oct 25 13:22:12 greymon kernel: amdgpu 0000:03:00.0: amdgpu:   in page starting at address 0x0000800010000000 from client 0x1b (UTCL2)
Oct 25 13:22:12 greymon kernel: amdgpu 0000:03:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00201431
Oct 25 13:22:12 greymon kernel: amdgpu 0000:03:00.0: amdgpu: \x09 Faulty UTCL2 client ID: SQC (data) (0xa)
Oct 25 13:22:12 greymon kernel: amdgpu 0000:03:00.0: amdgpu: \x09 MORE_FAULTS: 0x1
Oct 25 13:22:12 greymon kernel: amdgpu 0000:03:00.0: amdgpu: \x09 WALKER_ERROR: 0x0
Oct 25 13:22:12 greymon kernel: amdgpu 0000:03:00.0: amdgpu: \x09 PERMISSION_FAULTS: 0x3
Oct 25 13:22:12 greymon kernel: amdgpu 0000:03:00.0: amdgpu: \x09 MAPPING_ERROR: 0x0
Oct 25 13:22:12 greymon kernel: amdgpu 0000:03:00.0: amdgpu: \x09 RW: 0x0
Oct 25 13:22:12 greymon kernel: amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:24 vmid:2 pasid:32773, for process blender pid 3829 thread blender:cs0 pid 3856)
Oct 25 13:22:12 greymon kernel: amdgpu 0000:03:00.0: amdgpu:   in page starting at address 0x0000800010000000 from client 0x1b (UTCL2)
Oct 25 13:22:12 greymon kernel: amdgpu 0000:03:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00241051
Oct 25 13:22:12 greymon kernel: amdgpu 0000:03:00.0: amdgpu: \x09 Faulty UTCL2 client ID: TCP (0x8)
Oct 25 13:22:12 greymon kernel: amdgpu 0000:03:00.0: amdgpu: \x09 MORE_FAULTS: 0x1
Oct 25 13:22:12 greymon kernel: amdgpu 0000:03:00.0: amdgpu: \x09 WALKER_ERROR: 0x0
Oct 25 13:22:12 greymon kernel: amdgpu 0000:03:00.0: amdgpu: \x09 PERMISSION_FAULTS: 0x5
Oct 25 13:22:12 greymon kernel: amdgpu 0000:03:00.0: amdgpu: \x09 MAPPING_ERROR: 0x0
Oct 25 13:22:12 greymon kernel: amdgpu 0000:03:00.0: amdgpu: \x09 RW: 0x1
Oct 25 13:22:12 greymon kernel: amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:40 vmid:2 pasid:32773, for process blender pid 3829 thread blender:cs0 pid 3856)
Oct 25 13:22:12 greymon kernel: amdgpu 0000:03:00.0: amdgpu:   in page starting at address 0x0000003f80000000 from client 0x1b (UTCL2)
Oct 25 13:22:12 greymon kernel: amdgpu 0000:03:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
Oct 25 13:22:12 greymon kernel: amdgpu 0000:03:00.0: amdgpu: \x09 Faulty UTCL2 client ID: CB/DB (0x0)
Oct 25 13:22:12 greymon kernel: amdgpu 0000:03:00.0: amdgpu: \x09 MORE_FAULTS: 0x0
Oct 25 13:22:12 greymon kernel: amdgpu 0000:03:00.0: amdgpu: \x09 WALKER_ERROR: 0x0
Oct 25 13:22:12 greymon kernel: amdgpu 0000:03:00.0: amdgpu: \x09 PERMISSION_FAULTS: 0x0
Oct 25 13:22:12 greymon kernel: amdgpu 0000:03:00.0: amdgpu: \x09 MAPPING_ERROR: 0x0
Oct 25 13:22:12 greymon kernel: amdgpu 0000:03:00.0: amdgpu: \x09 RW: 0x0
Oct 25 13:22:12 greymon kernel: amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:40 vmid:2 pasid:32773, for process blender pid 3829 thread blender:cs0 pid 3856)
Oct 25 13:22:12 greymon kernel: amdgpu 0000:03:00.0: amdgpu:   in page starting at address 0x0000003f80000000 from client 0x1b (UTCL2)
Oct 25 13:22:12 greymon kernel: amdgpu 0000:03:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
Oct 25 13:22:12 greymon kernel: amdgpu 0000:03:00.0: amdgpu: \x09 Faulty UTCL2 client ID: CB/DB (0x0)
Oct 25 13:22:12 greymon kernel: amdgpu 0000:03:00.0: amdgpu: \x09 MORE_FAULTS: 0x0
Oct 25 13:22:12 greymon kernel: amdgpu 0000:03:00.0: amdgpu: \x09 WALKER_ERROR: 0x0
Oct 25 13:22:12 greymon kernel: amdgpu 0000:03:00.0: amdgpu: \x09 PERMISSION_FAULTS: 0x0
Oct 25 13:22:12 greymon kernel: amdgpu 0000:03:00.0: amdgpu: \x09 MAPPING_ERROR: 0x0
Oct 25 13:22:12 greymon kernel: amdgpu 0000:03:00.0: amdgpu: \x09 RW: 0x0
Oct 25 13:22:12 greymon kernel: amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:40 vmid:2 pasid:32773, for process blender pid 3829 thread blender:cs0 pid 3856)
Oct 25 13:22:12 greymon kernel: amdgpu 0000:03:00.0: amdgpu:   in page starting at address 0x0000003f80000000 from client 0x1b (UTCL2)
Oct 25 13:22:12 greymon kernel: amdgpu 0000:03:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
Oct 25 13:22:12 greymon kernel: amdgpu 0000:03:00.0: amdgpu: \x09 Faulty UTCL2 client ID: CB/DB (0x0)
Oct 25 13:22:12 greymon kernel: amdgpu 0000:03:00.0: amdgpu: \x09 MORE_FAULTS: 0x0
Oct 25 13:22:12 greymon kernel: amdgpu 0000:03:00.0: amdgpu: \x09 WALKER_ERROR: 0x0
Oct 25 13:22:12 greymon kernel: amdgpu 0000:03:00.0: amdgpu: \x09 PERMISSION_FAULTS: 0x0
Oct 25 13:22:12 greymon kernel: amdgpu 0000:03:00.0: amdgpu: \x09 MAPPING_ERROR: 0x0
Oct 25 13:22:12 greymon kernel: amdgpu 0000:03:00.0: amdgpu: \x09 RW: 0x0
Oct 25 13:22:12 greymon kernel: amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:40 vmid:2 pasid:32773, for process blender pid 3829 thread blender:cs0 pid 3856)
Oct 25 13:22:12 greymon kernel: amdgpu 0000:03:00.0: amdgpu:   in page starting at address 0x0000003f80000000 from client 0x1b (UTCL2)
Oct 25 13:22:12 greymon kernel: amdgpu 0000:03:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
Oct 25 13:22:12 greymon kernel: amdgpu 0000:03:00.0: amdgpu: \x09 Faulty UTCL2 client ID: CB/DB (0x0)
Oct 25 13:22:12 greymon kernel: amdgpu 0000:03:00.0: amdgpu: \x09 MORE_FAULTS: 0x0
Oct 25 13:22:12 greymon kernel: amdgpu 0000:03:00.0: amdgpu: \x09 WALKER_ERROR: 0x0
Oct 25 13:22:12 greymon kernel: amdgpu 0000:03:00.0: amdgpu: \x09 PERMISSION_FAULTS: 0x0
Oct 25 13:22:12 greymon kernel: amdgpu 0000:03:00.0: amdgpu: \x09 MAPPING_ERROR: 0x0
Oct 25 13:22:12 greymon kernel: amdgpu 0000:03:00.0: amdgpu: \x09 RW: 0x0
Oct 25 13:22:12 greymon kernel: amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:40 vmid:2 pasid:32773, for process blender pid 3829 thread blender:cs0 pid 3856)
Oct 25 13:22:12 greymon kernel: amdgpu 0000:03:00.0: amdgpu:   in page starting at address 0x0000003f80000000 from client 0x1b (UTCL2)
Oct 25 13:22:12 greymon kernel: amdgpu 0000:03:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
Oct 25 13:22:12 greymon kernel: amdgpu 0000:03:00.0: amdgpu: \x09 Faulty UTCL2 client ID: CB/DB (0x0)
Oct 25 13:22:12 greymon kernel: amdgpu 0000:03:00.0: amdgpu: \x09 MORE_FAULTS: 0x0
Oct 25 13:22:12 greymon kernel: amdgpu 0000:03:00.0: amdgpu: \x09 WALKER_ERROR: 0x0
Oct 25 13:22:12 greymon kernel: amdgpu 0000:03:00.0: amdgpu: \x09 PERMISSION_FAULTS: 0x0
Oct 25 13:22:12 greymon kernel: amdgpu 0000:03:00.0: amdgpu: \x09 MAPPING_ERROR: 0x0
Oct 25 13:22:12 greymon kernel: amdgpu 0000:03:00.0: amdgpu: \x09 RW: 0x0
Oct 25 13:22:12 greymon kernel: amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:40 vmid:2 pasid:32773, for process blender pid 3829 thread blender:cs0 pid 3856)
Oct 25 13:22:12 greymon kernel: amdgpu 0000:03:00.0: amdgpu:   in page starting at address 0x0000003f80000000 from client 0x1b (UTCL2)
Oct 25 13:22:12 greymon kernel: amdgpu 0000:03:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
Oct 25 13:22:12 greymon kernel: amdgpu 0000:03:00.0: amdgpu: \x09 Faulty UTCL2 client ID: CB/DB (0x0)
Oct 25 13:22:12 greymon kernel: amdgpu 0000:03:00.0: amdgpu: \x09 MORE_FAULTS: 0x0
Oct 25 13:22:12 greymon kernel: amdgpu 0000:03:00.0: amdgpu: \x09 WALKER_ERROR: 0x0
Oct 25 13:22:12 greymon kernel: amdgpu 0000:03:00.0: amdgpu: \x09 PERMISSION_FAULTS: 0x0
Oct 25 13:22:12 greymon kernel: amdgpu 0000:03:00.0: amdgpu: \x09 MAPPING_ERROR: 0x0
Oct 25 13:22:12 greymon kernel: amdgpu 0000:03:00.0: amdgpu: \x09 RW: 0x0
Oct 25 13:22:12 greymon kernel: amdgpu 0000:03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:40 vmid:2 pasid:32773, for process blender pid 3829 thread blender:cs0 pid 3856)
Oct 25 13:22:12 greymon kernel: amdgpu 0000:03:00.0: amdgpu:   in page starting at address 0x0000003f80000000 from client 0x1b (UTCL2)
Oct 25 13:22:12 greymon kernel: amdgpu 0000:03:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00000000
Oct 25 13:22:12 greymon kernel: amdgpu 0000:03:00.0: amdgpu: \x09 Faulty UTCL2 client ID: CB/DB (0x0)
Oct 25 13:22:12 greymon kernel: amdgpu 0000:03:00.0: amdgpu: \x09 MORE_FAULTS: 0x0
Oct 25 13:22:12 greymon kernel: amdgpu 0000:03:00.0: amdgpu: \x09 WALKER_ERROR: 0x0
Oct 25 13:22:12 greymon kernel: amdgpu 0000:03:00.0: amdgpu: \x09 PERMISSION_FAULTS: 0x0
Oct 25 13:22:12 greymon kernel: amdgpu 0000:03:00.0: amdgpu: \x09 MAPPING_ERROR: 0x0
Oct 25 13:22:12 greymon kernel: amdgpu 0000:03:00.0: amdgpu: \x09 RW: 0x0
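
To get a quick overview of which clients fault most often, the log can be tallied instead of read line by line. A rough sketch, assuming the `Faulty UTCL2 client ID:` wording shown in the log above; `count_fault_clients` is a made-up helper name.

```shell
# Count page-fault entries per faulting client in kernel log text on stdin.
# Prints "<count> <client>" lines, most frequent first.
count_fault_clients() {
    awk -F'Faulty UTCL2 client ID: ' '
        NF > 1 { n[$2]++ }                      # $2 is the text after the marker
        END    { for (c in n) printf "%d %s\n", n[c], c }
    ' | sort -rn
}

# Usage (assumes the faults are in the kernel log of the current boot):
#   journalctl -k -b | count_fault_clients
```

For the excerpt above this reports seven CB/DB entries and one each for SQC (data) and TCP.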

Last edited by Rena (2023-10-25 17:55:54)

Offline

#79 2023-11-04 06:41:15

Schlunze
Member
Registered: 2013-10-03
Posts: 53

Re: Random GPU crashes amdgpu across 2 different GPUs

My workaround for the problem is to set the kernel parameter

amdgpu.ppfeaturemask=0xfffd3fff

It works in my case for the 7950X iGPU.
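
For reference, the mask is a bit field, so it is easy to see which features a given value switches off. A small sketch; `cleared_bits` is a hypothetical helper, and the feature names in the comments are my reading of the kernel's `amd_shared.h`, so please verify them against your kernel source.

```shell
# List the bit positions (0-31) that are zero in a ppfeaturemask value,
# i.e. the power features that the mask turns off.
cleared_bits() {
    mask=$(( $1 ))
    bit=0
    out=""
    while [ "$bit" -lt 32 ]; do
        if [ $(( (mask >> bit) & 1 )) -eq 0 ]; then
            out="$out${out:+ }$bit"
        fi
        bit=$(( bit + 1 ))
    done
    printf '%s\n' "$out"
}

cleared_bits 0xfffd3fff   # prints: 14 15 17
# ASSUMPTION (from amd_shared.h as I remember it):
#   bit 14 = PP_OVERDRIVE_MASK, bit 15 = PP_GFXOFF_MASK, bit 17 = PP_STUTTER_MODE
# The value currently in effect can be read with:
#   cat /sys/module/amdgpu/parameters/ppfeaturemask
```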

Offline

#80 2023-12-11 23:47:48

andyturfer
Member
Registered: 2021-01-08
Posts: 90

Re: Random GPU crashes amdgpu across 2 different GPUs

I installed Fedora 39 KDE on my laptop (about a month ago), and haven't experienced this issue a single time. I haven't added any special kernel parameters. Looks like this issue is specific to Arch Linux.

Offline

#81 2023-12-12 07:08:07

seth
Member
Registered: 2012-09-03
Posts: 58,155

Re: Random GPU crashes amdgpu across 2 different GPUs

zram, zswap or THP?
https://bbs.archlinux.org/viewtopic.php … 7#p2110357

zgrep TRANSPARENT_HUGEPAGE /proc/config.gz # on F39

Offline

#82 2023-12-13 01:47:16

andyturfer
Member
Registered: 2021-01-08
Posts: 90

Re: Random GPU crashes amdgpu across 2 different GPUs

seth wrote:

zram, zswap or THP?
https://bbs.archlinux.org/viewtopic.php … 7#p2110357

zgrep TRANSPARENT_HUGEPAGE /proc/config.gz # on F39

There is no /proc/config.gz on my Fedora 39 install.

From the "THP" link you sent:

$ cat /sys/kernel/mm/transparent_hugepage/enabled
always [madvise] never
$ cat /sys/kernel/mm/transparent_hugepage/defrag
always defer defer+madvise [madvise] never
$ cat /proc/sys/vm/compaction_proactiveness
20

On another laptop I have running Arch (with KDE), the first value is different:

$ cat /sys/kernel/mm/transparent_hugepage/enabled
[always] madvise never

I don't know how to check zram or zswap.

Offline

#83 2023-12-13 07:41:27

seth
Member
Registered: 2012-09-03
Posts: 58,155

Re: Random GPU crashes amdgpu across 2 different GPUs

https://forums.fedoraforum.org/showthre … ra-kernels
Great.

But you could simply try to follow the lead and set THP to "never" as a broadsword attempt and see whether that stabilizes the system.

https://wiki.archlinux.org/title/Zswap
https://wiki.archlinux.org/title/Zram

cat /sys/module/zswap/parameters/enabled
swapon # prints all swap devices
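
A short sketch of the THP part, in case it helps: the sysfs files list all choices and put the active one in brackets. `active_choice` is a made-up helper name; the `tee` line changes the setting only until the next reboot.

```shell
# Print the active choice (the bracketed word) from a sysfs multi-choice
# line such as "always [madvise] never".
active_choice() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

# Usage:
#   active_choice "$(cat /sys/kernel/mm/transparent_hugepage/enabled)"
# Switch THP off for the running system (needs root, lost on reboot):
#   echo never | sudo tee /sys/kernel/mm/transparent_hugepage/enabled
# To make it persistent, add transparent_hugepage=never to the kernel command line.
```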

Offline

#84 2023-12-13 14:17:02

andyturfer
Member
Registered: 2021-01-08
Posts: 90

Re: Random GPU crashes amdgpu across 2 different GPUs

seth wrote:

https://forums.fedoraforum.org/showthre … ra-kernels
Great.

But you could simply try to follow the lead and set THP to "never" as a broadsword attempt and see whether that stabilizes the system.

https://wiki.archlinux.org/title/Zswap
https://wiki.archlinux.org/title/Zram

cat /sys/module/zswap/parameters/enabled
swapon # prints all swap devices

I'll try this as soon as I find some time to re-install Arch on this system (it's currently running Fedora).

Offline

#85 2023-12-13 14:29:45

seth
Member
Registered: 2012-09-03
Posts: 58,155

Re: Random GPU crashes amdgpu across 2 different GPUs

You certainly want to record the latter two things on the Fedora system to spot differences itr.
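
One way to record them so the two installs can be diffed later. This is only a sketch: `snapshot_mm_settings` and its optional root-prefix argument are hypothetical, and the file list is just the settings mentioned in this thread.

```shell
# Print "path: value" for each memory tunable discussed in this thread.
# Optional first argument: a root prefix, useful for reading a copied tree.
snapshot_mm_settings() {
    root=${1:-}
    for f in \
        /sys/kernel/mm/transparent_hugepage/enabled \
        /sys/kernel/mm/transparent_hugepage/defrag \
        /proc/sys/vm/compaction_proactiveness \
        /sys/module/zswap/parameters/enabled
    do
        if [ -r "$root$f" ]; then
            printf '%s: %s\n' "$f" "$(cat "$root$f")"
        else
            printf '%s: (not present)\n' "$f"
        fi
    done
}

# Usage:
#   snapshot_mm_settings > fedora.txt    # on the Fedora install
#   snapshot_mm_settings > arch.txt      # later, on Arch
#   diff fedora.txt arch.txt
# The swapon output can be appended to the same files by hand.
```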

Offline

#86 2023-12-21 00:02:02

orlfman
Member
Registered: 2007-11-20
Posts: 141

Re: Random GPU crashes amdgpu across 2 different GPUs

I opened a report on AMD's DRM GitLab here: https://gitlab.freedesktop.org/drm/amd/-/issues/3067

I've been having these issues too.

Offline

#87 2024-01-07 22:21:37

duskfox
Member
Registered: 2023-10-06
Posts: 2

Re: Random GPU crashes amdgpu across 2 different GPUs

duskfox wrote:
UFODivebomb wrote:

I've been encountering these crashes as well. Only in Baldur's Gate 3. T_T

`amd_iommu=off` reduced the frequency of the crashes but they still occurred.

Trying `amdgpu.mcbp=0`.

Same issue with Baldur's Gate 3. I'm using both amd_iommu=off and amdgpu.mcbp=0; the crashes are reduced in frequency, but they still happen.

The issue happened again on kernel 6.6.10-arch1-1. I had removed both boot parameters in the meantime, because on 6.6 I had never had a crash and I thought the issue was resolved upstream.
Over the next few days I'll retry with the parameters.
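
When re-testing, it is worth confirming that the parameters really are active after the reboot. A minimal sketch; `has_kernel_param` is a hypothetical helper, and the second argument exists only so the check can be tried on a sample string.

```shell
# Succeed if the kernel command line contains the exact parameter in $1.
# $2 is optional and defaults to the contents of /proc/cmdline.
has_kernel_param() {
    cmdline=${2-$(cat /proc/cmdline)}
    case " $cmdline " in
        *" $1 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Usage:
#   for p in amd_iommu=off amdgpu.mcbp=0; do
#       if has_kernel_param "$p"; then echo "$p: set"; else echo "$p: missing"; fi
#   done
```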

Offline

#88 2024-06-14 15:12:53

hezz
Member
Registered: 2024-06-14
Posts: 1

Re: Random GPU crashes amdgpu across 2 different GPUs

Hi folks, I'm wondering whether anybody has tried to rule out a faulty HDMI cable as the cause of the issue. I was constantly getting the same page fault errors on a laptop with a Radeon 6600 running Fedora. After changing the external monitor connection from HDMI to DP, the issue seems to have gone away.

Offline
