Hello,
Since a recent update (I'm not sure exactly which one), the amdgpu module crashes while playing video in Firefox, and the system stops responding until a hard reboot. The crash is random: it can happen after 1 second or after 10 minutes. I don't know where to look to find a fix. Here is some information about my configuration:
CPU and GPU are AMD with integrated graphics:
$ lshw -class cpu
*-cpu
description: CPU
product: AMD Ryzen 7 7840U w/ Radeon 780M Graphics
vendor: Advanced Micro Devices [AMD]
physical id: 4
bus info: cpu@0
version: 25.116.1
serial: Unknown
slot: FP8
size: 1333MHz
capacity: 5132MHz
width: 64 bits
clock: 100MHz
capabilities: lm fpu fpu_exception wp vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp x86-64 constant_tsc rep_good amd_lbr_v2 nopl xtopology nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba perfmon_v2 ibrs ibpb stibp ibrs_enhanced vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local user_shstk avx512_bf16 clzero irperf xsaveerptr rdpru wbnoinvd cppc arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold v_vmsave_vmload vgif x2avic v_spec_ctrl vnmi avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid overflow_recov succor smca fsrm flush_l1d amd_lbr_pmc_freeze cpufreq
configuration: cores=8 enabledcores=8 microcode=175128836 threads=16
The kernel, firefox, ffmpeg and mesa packages are up to date:
$ uname -a
Linux dagobah 6.10.2-arch1-1 #1 SMP PREEMPT_DYNAMIC Sat, 27 Jul 2024 16:49:55 +0000 x86_64 GNU/Linux
$ pacman -Q firefox
firefox 128.0.3-1
$ pacman -Q ffmpeg
ffmpeg 2:7.0.1-2
$ pacman -Q mesa
mesa 1:24.1.5-1
Here is an excerpt of the dmesg log for the crash (the complete dmesg is here: https://www.claudex.be/owncloud/index.p … FomktxiMBJ ):
aoû 03 13:15:02 dagobah kernel: ------------[ cut here ]------------
aoû 03 13:15:02 dagobah kernel: WARNING: CPU: 6 PID: 119 at drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c:630 amdgpu_irq_put+0x46/0x70 [amdgpu]
aoû 03 13:15:02 dagobah kernel: Modules linked in: snd_seq_dummy snd_hrtimer rfcomm snd_seq snd_seq_device cmac algif_hash algif_skcipher af_alg bnep dm_crypt cbc encrypted_keys trusted asn1_encoder vfat fat intel_rapl_msr amd_atl intel_rapl_common snd_sof_amd_acp63 snd_sof_amd_vangogh snd_sof_amd_rembrandt snd_sof_amd_renoir snd_sof_amd_acp snd_sof_pci snd_sof_xtensa_dsp snd_sof mt7921e snd_sof_utils mt7921_common snd_pci_ps snd_hda_codec_realtek mt792x_lib snd_amd_sdw_acpi soundwire_amd kvm_amd mt76_connac_lib snd_hda_codec_generic soundwire_generic_allocation snd_hda_scodec_component snd_hda_codec_hdmi mt76 soundwire_bus snd_hda_intel mousedev kvm snd_soc_core snd_intel_dspcfg snd_compress snd_intel_sdw_acpi crct10dif_pclmul crc32_pclmul ac97_bus mac80211 hid_sensor_als polyval_clmulni snd_pcm_dmaengine snd_hda_codec polyval_generic snd_rpl_pci_acp6x hid_sensor_trigger snd_acp_pci industrialio_triggered_buffer gf128mul snd_hda_core snd_acp_legacy_common kfifo_buf libarc4 ghash_clmulni_intel sha512_ssse3 snd_pci_acp6x snd_hwdep
aoû 03 13:15:02 dagobah kernel: cros_usbpd_charger hid_sensor_iio_common cros_ec_debugfs cros_ec_chardev cros_usbpd_logger cros_usbpd_notify cros_ec_sysfs gpio_cros_ec industrialio sha256_ssse3 btusb snd_pcm sha1_ssse3 cros_ec_dev cfg80211 btrtl snd_pci_acp5x aesni_intel snd_timer snd_rn_pci_acp3x btintel joydev hid_sensor_hub hid_multitouch amd_pmf crypto_simd ucsi_acpi btbcm snd_acp_config snd hid_generic typec_ucsi cryptd amdtee btmtk sp5100_tco cros_ec_lpcs snd_soc_acpi cros_ec bluetooth rapl wmi_bmof pcspkr thunderbolt typec soundcore ccp snd_pci_acp3x rfkill k10temp i2c_piix4 roles amd_sfh i2c_hid_acpi platform_profile i2c_hid tee amd_pmc mac_hid pkcs8_key_parser i2c_dev crypto_user loop nfnetlink ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 amdgpu dm_mod amdxcp i2c_algo_bit drm_ttm_helper ttm drm_exec serio_raw gpu_sched atkbd libps2 drm_suballoc_helper vivaldi_fmap nvme drm_buddy drm_display_helper nvme_core xhci_pci crc32c_intel i8042 video cec xhci_pci_renesas nvme_auth serio wmi
aoû 03 13:15:02 dagobah kernel: CPU: 6 PID: 119 Comm: kworker/u64:2 Tainted: G W 6.10.2-arch1-1 #1 a727c214dbee27eb0624871a8199f6116f5b74c2
aoû 03 13:15:02 dagobah kernel: Hardware name: Framework Laptop 13 (AMD Ryzen 7040Series)/FRANMDCP07, BIOS 03.05 03/29/2024
aoû 03 13:15:02 dagobah kernel: Workqueue: amdgpu-reset-dev drm_sched_job_timedout [gpu_sched]
aoû 03 13:15:02 dagobah kernel: RIP: 0010:amdgpu_irq_put+0x46/0x70 [amdgpu]
aoû 03 13:15:02 dagobah kernel: Code: c0 74 33 48 8b 4e 10 48 83 39 00 74 29 89 d1 48 8d 04 88 8b 08 85 c9 74 11 f0 ff 08 74 07 31 c0 e9 1a 8a 54 e9 e9 5a fd ff ff <0f> 0b b8 ea ff ff ff e9 09 8a 54 e9 b8 ea ff ff ff e9 ff 89 54 e9
aoû 03 13:15:02 dagobah kernel: RSP: 0018:ffffb842c0597ca0 EFLAGS: 00010246
aoû 03 13:15:02 dagobah kernel: RAX: ffff9f97c6765988 RBX: ffff9f97d2400000 RCX: 0000000000000000
aoû 03 13:15:02 dagobah kernel: RDX: 0000000000000000 RSI: ffff9f97d2400c60 RDI: ffff9f97d2400000
aoû 03 13:15:02 dagobah kernel: RBP: ffff9f97d2400000 R08: 0000000000000000 R09: 0000000000000006
aoû 03 13:15:02 dagobah kernel: R10: ffffb842c4bcf000 R11: ffffb842c4bcf000 R12: 0000000000001050
aoû 03 13:15:02 dagobah kernel: R13: ffff9f97d2444928 R14: ffff9f98c7e13000 R15: ffff9f97d24105e8
aoû 03 13:15:02 dagobah kernel: FS: 0000000000000000(0000) GS:ffff9f9f1e500000(0000) knlGS:0000000000000000
aoû 03 13:15:02 dagobah kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
aoû 03 13:15:02 dagobah kernel: CR2: 000000ba05612000 CR3: 0000000529220000 CR4: 0000000000f50ef0
aoû 03 13:15:02 dagobah kernel: PKRU: 55555554
aoû 03 13:15:02 dagobah kernel: Call Trace:
aoû 03 13:15:02 dagobah kernel: <TASK>
aoû 03 13:15:02 dagobah kernel: ? amdgpu_irq_put+0x46/0x70 [amdgpu 87e5d16f77823e9f47fc89f193778492a8b9aa26]
aoû 03 13:15:02 dagobah kernel: ? __warn.cold+0x8e/0xe8
aoû 03 13:15:02 dagobah kernel: ? amdgpu_irq_put+0x46/0x70 [amdgpu 87e5d16f77823e9f47fc89f193778492a8b9aa26]
aoû 03 13:15:02 dagobah kernel: ? report_bug+0xff/0x140
aoû 03 13:15:02 dagobah kernel: ? handle_bug+0x3c/0x80
aoû 03 13:15:02 dagobah kernel: ? exc_invalid_op+0x17/0x70
aoû 03 13:15:02 dagobah kernel: ? asm_exc_invalid_op+0x1a/0x20
aoû 03 13:15:02 dagobah kernel: ? amdgpu_irq_put+0x46/0x70 [amdgpu 87e5d16f77823e9f47fc89f193778492a8b9aa26]
aoû 03 13:15:02 dagobah kernel: gmc_v11_0_hw_fini+0x24/0x90 [amdgpu 87e5d16f77823e9f47fc89f193778492a8b9aa26]
aoû 03 13:15:02 dagobah kernel: gmc_v11_0_suspend+0xe/0x20 [amdgpu 87e5d16f77823e9f47fc89f193778492a8b9aa26]
aoû 03 13:15:02 dagobah kernel: amdgpu_device_ip_suspend_phase2+0x10c/0x1a0 [amdgpu 87e5d16f77823e9f47fc89f193778492a8b9aa26]
aoû 03 13:15:02 dagobah kernel: ? amdgpu_device_ip_suspend_phase1+0x70/0xd0 [amdgpu 87e5d16f77823e9f47fc89f193778492a8b9aa26]
aoû 03 13:15:02 dagobah kernel: amdgpu_device_ip_suspend+0x40/0x70 [amdgpu 87e5d16f77823e9f47fc89f193778492a8b9aa26]
aoû 03 13:15:02 dagobah kernel: amdgpu_device_pre_asic_reset+0xd0/0x290 [amdgpu 87e5d16f77823e9f47fc89f193778492a8b9aa26]
aoû 03 13:15:02 dagobah kernel: amdgpu_device_gpu_recover.cold+0x465/0xacc [amdgpu 87e5d16f77823e9f47fc89f193778492a8b9aa26]
aoû 03 13:15:02 dagobah kernel: amdgpu_job_timedout+0x18e/0x1d0 [amdgpu 87e5d16f77823e9f47fc89f193778492a8b9aa26]
aoû 03 13:15:02 dagobah kernel: drm_sched_job_timedout+0x7e/0x110 [gpu_sched 0cbff37b8d4e86680ac8ca4c08ff4097f61b1dba]
aoû 03 13:15:02 dagobah kernel: process_one_work+0x17b/0x330
aoû 03 13:15:02 dagobah kernel: worker_thread+0x2e2/0x410
aoû 03 13:15:02 dagobah kernel: ? __pfx_worker_thread+0x10/0x10
aoû 03 13:15:02 dagobah kernel: kthread+0xcf/0x100
aoû 03 13:15:02 dagobah kernel: ? __pfx_kthread+0x10/0x10
aoû 03 13:15:02 dagobah kernel: ret_from_fork+0x31/0x50
aoû 03 13:15:02 dagobah kernel: ? __pfx_kthread+0x10/0x10
aoû 03 13:15:02 dagobah kernel: ret_from_fork_asm+0x1a/0x30
aoû 03 13:15:02 dagobah kernel: </TASK>
aoû 03 13:15:02 dagobah kernel: ---[ end trace 0000000000000000 ]---
Thanks for your help
Last edited by claudex (2024-08-11 08:06:38)
aoû 03 13:14:40 dagobah kernel: amdgpu 0000:c1:00.0: amdgpu: [mmhub] page fault (src_id:0 ring:8 vmid:1 pasid:32789)
aoû 03 13:14:40 dagobah kernel: amdgpu 0000:c1:00.0: amdgpu: in process RDD Process pid 2259 thread firefox:cs0 pid 2284)
aoû 03 13:14:40 dagobah kernel: amdgpu 0000:c1:00.0: amdgpu: in page starting at address 0x00008001249e9000 from client 18
aoû 03 13:14:40 dagobah kernel: amdgpu 0000:c1:00.0: amdgpu: MMVM_L2_PROTECTION_FAULT_STATUS:0x00103A11
aoû 03 13:14:40 dagobah kernel: amdgpu 0000:c1:00.0: amdgpu: Faulty UTCL2 client ID: unknown (0x1d)
aoû 03 13:14:40 dagobah kernel: amdgpu 0000:c1:00.0: amdgpu: MORE_FAULTS: 0x1
aoû 03 13:14:40 dagobah kernel: amdgpu 0000:c1:00.0: amdgpu: WALKER_ERROR: 0x0
aoû 03 13:14:40 dagobah kernel: amdgpu 0000:c1:00.0: amdgpu: PERMISSION_FAULTS: 0x1
aoû 03 13:14:40 dagobah kernel: amdgpu 0000:c1:00.0: amdgpu: MAPPING_ERROR: 0x0
aoû 03 13:14:40 dagobah kernel: amdgpu 0000:c1:00.0: amdgpu: RW: 0x0
aoû 03 13:14:40 dagobah kernel: amdgpu 0000:c1:00.0: amdgpu: [mmhub] page fault (src_id:0 ring:8 vmid:1 pasid:32789)
This is likely where the error originates.
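For readers who want to check the fault fields by hand, here is a minimal sketch that splits an MMVM_L2_PROTECTION_FAULT_STATUS value into the fields the driver prints. The bit positions are an assumption: they reproduce the values shown in the log above for 0x00103A11, but they are not stated anywhere in this thread. The function name is made up for illustration.

```shell
#!/usr/bin/env bash
# Decode a fault status value into the fields printed in the kernel log.
# ASSUMPTION: the bit offsets below are inferred; they match the log excerpt
# above but should be checked against the driver headers for your GPU.
decode_fault_status() {
    local status=$(( $1 ))   # accepts hex such as 0x00103A11
    printf 'MORE_FAULTS: 0x%x\n'       $(( status & 0x1 ))
    printf 'WALKER_ERROR: 0x%x\n'      $(( (status >> 1) & 0x7 ))
    printf 'PERMISSION_FAULTS: 0x%x\n' $(( (status >> 4) & 0xf ))
    printf 'MAPPING_ERROR: 0x%x\n'     $(( (status >> 8) & 0x1 ))
    printf 'CLIENT_ID: 0x%x\n'         $(( (status >> 9) & 0x1ff ))
    printf 'RW: 0x%x\n'                $(( (status >> 18) & 0x1 ))
}

# Value taken from the log excerpt above.
decode_fault_status 0x00103A11
```

With that value the sketch reports a permission fault from client ID 0x1d and no walker or mapping error, which agrees with the lines in the log.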
Your command line has amdgpu.sg_display=0.
Does it help if you remove that?
How much memory is the iGPU allowed to use, and does increasing the amount make a difference?
For X, the output of glxinfo -B will help; for Wayland, use eglinfo -B
(both come with the mesa-utils package).
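A quick way to answer the first two questions is sketched below. It assumes the amdgpu driver exposes the VRAM size in bytes under /sys/class/drm/card*/device/mem_info_vram_total. The helper names are made up for illustration.

```shell
#!/usr/bin/env bash
# Print every amdgpu.* parameter found in a kernel command line string.
amdgpu_params() {
    tr ' ' '\n' <<< "$1" | grep '^amdgpu\.'
}

# Convert a byte count (as stored in mem_info_vram_total) to MiB.
bytes_to_mib() {
    echo $(( $1 / 1024 / 1024 ))
}

# Show the parameters of the running kernel, if /proc/cmdline is readable.
[ -r /proc/cmdline ] && amdgpu_params "$(cat /proc/cmdline)"

# Show the memory reserved for each amdgpu device that reports it.
for f in /sys/class/drm/card*/device/mem_info_vram_total; do
    [ -r "$f" ] && echo "$f: $(bytes_to_mib "$(cat "$f")") MiB"
done
```

If the first command prints nothing, no amdgpu parameter is set on the command line.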
Your command line has amdgpu.sg_display=0.
Thanks, I had completely forgotten about that parameter. I added it because the screen flickered, but the flickering seems to be gone now that I have removed it. I will test some videos and report whether it changes anything.
How much memory is the iGPU allowed to use, and does increasing the amount make a difference?
I have 512 MiB. I cannot change it on my Framework laptop.
For X, the output of glxinfo -B will help; for Wayland, use eglinfo -B
Here it is:
$ eglinfo -B
GBM platform:
_amdgpu_device_initialize: amdgpu_query_info(ACCEL_WORKING) failed (-13)
amdgpu: amdgpu_device_initialize failed.
EGL API version: 1.5
EGL vendor string: Mesa Project
EGL version string: 1.5
EGL client APIs: OpenGL OpenGL_ES
OpenGL core profile vendor: Mesa
OpenGL core profile renderer: llvmpipe (LLVM 18.1.8, 256 bits)
OpenGL core profile version: 4.5 (Core Profile) Mesa 24.1.5-arch1.1
OpenGL core profile shading language version: 4.50
OpenGL compatibility profile vendor: Mesa
OpenGL compatibility profile renderer: llvmpipe (LLVM 18.1.8, 256 bits)
OpenGL compatibility profile version: 4.5 (Compatibility Profile) Mesa 24.1.5-arch1.1
OpenGL compatibility profile shading language version: 4.50
OpenGL ES profile vendor: Mesa
OpenGL ES profile renderer: llvmpipe (LLVM 18.1.8, 256 bits)
OpenGL ES profile version: OpenGL ES 3.2 Mesa 24.1.5-arch1.1
OpenGL ES profile shading language version: OpenGL ES GLSL ES 3.20
Wayland platform:
EGL API version: 1.5
EGL vendor string: Mesa Project
EGL version string: 1.5
EGL client APIs: OpenGL OpenGL_ES
OpenGL core profile vendor: AMD
OpenGL core profile renderer: AMD Radeon 780M (radeonsi, gfx1103_r1, LLVM 18.1.8, DRM 3.57, 6.10.2-arch1-1)
OpenGL core profile version: 4.6 (Core Profile) Mesa 24.1.5-arch1.1
OpenGL core profile shading language version: 4.60
OpenGL compatibility profile vendor: AMD
OpenGL compatibility profile renderer: AMD Radeon 780M (radeonsi, gfx1103_r1, LLVM 18.1.8, DRM 3.57, 6.10.2-arch1-1)
OpenGL compatibility profile version: 4.6 (Compatibility Profile) Mesa 24.1.5-arch1.1
OpenGL compatibility profile shading language version: 4.60
OpenGL ES profile vendor: AMD
OpenGL ES profile renderer: AMD Radeon 780M (radeonsi, gfx1103_r1, LLVM 18.1.8, DRM 3.57, 6.10.2-arch1-1)
OpenGL ES profile version: OpenGL ES 3.2 Mesa 24.1.5-arch1.1
OpenGL ES profile shading language version: OpenGL ES GLSL ES 3.20
X11 platform:
EGL API version: 1.5
EGL vendor string: Mesa Project
EGL version string: 1.5
EGL client APIs: OpenGL OpenGL_ES
OpenGL core profile vendor: AMD
OpenGL core profile renderer: AMD Radeon 780M (radeonsi, gfx1103_r1, LLVM 18.1.8, DRM 3.57, 6.10.2-arch1-1)
OpenGL core profile version: 4.6 (Core Profile) Mesa 24.1.5-arch1.1
OpenGL core profile shading language version: 4.60
OpenGL compatibility profile vendor: AMD
OpenGL compatibility profile renderer: AMD Radeon 780M (radeonsi, gfx1103_r1, LLVM 18.1.8, DRM 3.57, 6.10.2-arch1-1)
OpenGL compatibility profile version: 4.6 (Compatibility Profile) Mesa 24.1.5-arch1.1
OpenGL compatibility profile shading language version: 4.60
OpenGL ES profile vendor: AMD
OpenGL ES profile renderer: AMD Radeon 780M (radeonsi, gfx1103_r1, LLVM 18.1.8, DRM 3.57, 6.10.2-arch1-1)
OpenGL ES profile version: OpenGL ES 3.2 Mesa 24.1.5-arch1.1
OpenGL ES profile shading language version: OpenGL ES GLSL ES 3.20
Surfaceless platform:
EGL API version: 1.5
EGL vendor string: Mesa Project
EGL version string: 1.5
EGL client APIs: OpenGL OpenGL_ES
OpenGL core profile vendor: AMD
OpenGL core profile renderer: AMD Radeon 780M (radeonsi, gfx1103_r1, LLVM 18.1.8, DRM 3.57, 6.10.2-arch1-1)
OpenGL core profile version: 4.6 (Core Profile) Mesa 24.1.5-arch1.1
OpenGL core profile shading language version: 4.60
OpenGL compatibility profile vendor: AMD
OpenGL compatibility profile renderer: AMD Radeon 780M (radeonsi, gfx1103_r1, LLVM 18.1.8, DRM 3.57, 6.10.2-arch1-1)
OpenGL compatibility profile version: 4.6 (Compatibility Profile) Mesa 24.1.5-arch1.1
OpenGL compatibility profile shading language version: 4.60
OpenGL ES profile vendor: AMD
OpenGL ES profile renderer: AMD Radeon 780M (radeonsi, gfx1103_r1, LLVM 18.1.8, DRM 3.57, 6.10.2-arch1-1)
OpenGL ES profile version: OpenGL ES 3.2 Mesa 24.1.5-arch1.1
OpenGL ES profile shading language version: OpenGL ES GLSL ES 3.20
Device platform:
Device #0:
Platform Device platform:
EGL API version: 1.5
EGL vendor string: Mesa Project
EGL version string: 1.5
EGL client APIs: OpenGL OpenGL_ES
OpenGL core profile vendor: AMD
OpenGL core profile renderer: AMD Radeon 780M (radeonsi, gfx1103_r1, LLVM 18.1.8, DRM 3.57, 6.10.2-arch1-1)
OpenGL core profile version: 4.6 (Core Profile) Mesa 24.1.5-arch1.1
OpenGL core profile shading language version: 4.60
OpenGL compatibility profile vendor: AMD
OpenGL compatibility profile renderer: AMD Radeon 780M (radeonsi, gfx1103_r1, LLVM 18.1.8, DRM 3.57, 6.10.2-arch1-1)
OpenGL compatibility profile version: 4.6 (Compatibility Profile) Mesa 24.1.5-arch1.1
OpenGL compatibility profile shading language version: 4.60
OpenGL ES profile vendor: AMD
OpenGL ES profile renderer: AMD Radeon 780M (radeonsi, gfx1103_r1, LLVM 18.1.8, DRM 3.57, 6.10.2-arch1-1)
OpenGL ES profile version: OpenGL ES 3.2 Mesa 24.1.5-arch1.1
OpenGL ES profile shading language version: OpenGL ES GLSL ES 3.20
Device #1:
Platform Device platform:
EGL API version: 1.5
EGL vendor string: Mesa Project
EGL version string: 1.5
EGL client APIs: OpenGL OpenGL_ES
OpenGL core profile vendor: Mesa
OpenGL core profile renderer: llvmpipe (LLVM 18.1.8, 256 bits)
OpenGL core profile version: 4.5 (Core Profile) Mesa 24.1.5-arch1.1
OpenGL core profile shading language version: 4.50
OpenGL compatibility profile vendor: Mesa
OpenGL compatibility profile renderer: llvmpipe (LLVM 18.1.8, 256 bits)
OpenGL compatibility profile version: 4.5 (Compatibility Profile) Mesa 24.1.5-arch1.1
OpenGL compatibility profile shading language version: 4.50
OpenGL ES profile vendor: Mesa
OpenGL ES profile renderer: llvmpipe (LLVM 18.1.8, 256 bits)
OpenGL ES profile version: OpenGL ES 3.2 Mesa 24.1.5-arch1.1
OpenGL ES profile shading language version: OpenGL ES GLSL ES 3.20
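The output above is long, so a short filter can make it easier to see which platform uses which renderer. This is only a sketch: it assumes each platform header line ends in "platform:", as in the output above, and the function name is made up. A platform that reports llvmpipe is using Mesa's software renderer rather than the GPU, as the GBM platform does here.

```shell
#!/usr/bin/env bash
# Read eglinfo -B output on stdin and print one "platform -> renderer" line
# per platform, using the core profile renderer string.
renderers_per_platform() {
    awk '
        /platform:[ \t]*$/ {
            platform = $0
            sub(/^[ \t]+/, "", platform)      # drop indentation, if any
            sub(/:[ \t]*$/, "", platform)     # drop the trailing colon
        }
        /OpenGL core profile renderer:/ {
            sub(/^.*renderer: /, "")
            print platform " -> " $0
        }
    '
}

# Run against the live system only when eglinfo is installed.
command -v eglinfo >/dev/null && eglinfo -B 2>/dev/null | renderers_per_platform
```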
The issue is still there. I found a similar issue report which says that this happens with outdated firmware. However, my firmware is more recent than the version listed there, so it should work. In any case, I have manually installed the latest amdgpu firmware, since it has been updated since the last release (which is older than the kernel 6.10 release), and I will see whether it changes anything.
https://gitlab.freedesktop.org/mesa/mesa/-/issues/11414
https://wiki.debian.org/InstallingDebia … es#Display
The firmware doesn't change anything. However, the crash seems to be more frequent (or to happen only) on battery with the power profile set to power-saver. With the power profile set to performance, it doesn't crash. I don't know which parameter could help narrow down the issue.
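One way to narrow this down might be to record the power state alongside each test run. The sketch below assumes the profile is managed by power-profiles-daemon (so that powerprofilesctl get works) and that the amdgpu DPM level is readable from power_dpm_force_performance_level in the device's sysfs directory. The function names are made up for illustration.

```shell
#!/usr/bin/env bash
# Print the amdgpu DPM performance level for a DRM device directory
# (e.g. /sys/class/drm/card1/device), or "unknown" if it is not readable.
dpm_level() {
    local f="$1/power_dpm_force_performance_level"
    if [ -r "$f" ]; then cat "$f"; else echo unknown; fi
}

# Print a timestamped line with the active power profile and DPM level.
# ASSUMPTION: powerprofilesctl is the tool managing the power profile.
snapshot() {
    local profile=unknown d
    command -v powerprofilesctl >/dev/null && profile="$(powerprofilesctl get 2>/dev/null)"
    for d in /sys/class/drm/card*/device; do
        echo "$(date '+%F %T') profile=${profile:-unknown} $d dpm=$(dpm_level "$d")"
    done
}

snapshot
```

Running the snapshot before each playback test, on battery and on AC, would give a record to compare against the crash times in the journal.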
I also seem to be getting the same error since a recent update (I updated on Aug 3rd; the prior update was likely on Jul 20th). However, I'm running a 7800X3D and using the iGPU for output (with a dGPU via "PRIME" for 3D tasks), so I'm unsure how battery or power profiles play into my similar situation, since I'm on a desktop platform.
I have the same versions of firefox (128.0.3-1) and mesa (1:24.1.5-1), and the crashes are likewise randomly timed (after about 30 minutes to 1 hour for me). I tried downgrading mesa to 24.1.3 but saw no change in the crashes. Older versions of Firefox refuse to load newer profiles on launch, so I haven't yet tested them with the manual override.
Strangely, I also get green blocky artifacts when decoding VP9 content on YouTube. They sometimes increase in frequency and spread, and a crash follows soon after, whereas HEVC/x265 content in mpv is stable (no crashes or artifacts). I can also confirm that removing amdgpu.sg_display=0 from the kernel options made no difference (I had added it for the same reason). Maybe the issue is related to hardware decoding in Firefox, given that mpv and every other workload are stable?
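To test the hardware-decode theory, one option is to turn hardware video decoding off in Firefox and see whether the crashes stop. The sketch below writes the preference into user.js in a profile directory. It assumes media.hardware-video-decoding.enabled is the preference that controls this in the Firefox version in use. The function name and the profile path in the usage comment are hypothetical.

```shell
#!/usr/bin/env bash
# Add a preference to user.js in the given Firefox profile directory that
# disables hardware video decoding on the next start. The line is appended
# only once, so the function is safe to call repeatedly.
# ASSUMPTION: media.hardware-video-decoding.enabled is the relevant pref.
disable_hw_decode() {
    local profile_dir="$1"
    local pref='user_pref("media.hardware-video-decoding.enabled", false);'
    grep -qxF "$pref" "$profile_dir/user.js" 2>/dev/null \
        || echo "$pref" >> "$profile_dir/user.js"
}

# Usage (the profile directory name is a placeholder):
# disable_hw_decode "$HOME/.mozilla/firefox/xxxxxxxx.default-release"
```

If playback stays stable with decoding done in software, that points at the video decode path rather than at rendering in general. Remove the line from user.js, and reset the preference in about:config, to undo the change.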
The log looks nearly, if not exactly, identical:
Aug 05 15:00:14 desktop kernel: amdgpu 0000:6b:00.0: amdgpu: [mmhub] page fault (src_id:0 ring:8 vmid:2 pasid:32783)
Aug 05 15:00:14 desktop kernel: amdgpu 0000:6b:00.0: amdgpu: in process RDD Process pid 5760 thread firefox:cs0 pid 5947
Aug 05 15:00:14 desktop kernel: amdgpu 0000:6b:00.0: amdgpu: in page starting at address 0x000080010fc38000 from client 0x12 (VMC)
Aug 05 15:00:14 desktop kernel: amdgpu 0000:6b:00.0: amdgpu: MMVM_L2_PROTECTION_FAULT_STATUS:0x00203811
Aug 05 15:00:14 desktop kernel: amdgpu 0000:6b:00.0: amdgpu: Faulty UTCL2 client ID: VCN (0x1c)
Aug 05 15:00:14 desktop kernel: amdgpu 0000:6b:00.0: amdgpu: MORE_FAULTS: 0x1
Aug 05 15:00:14 desktop kernel: amdgpu 0000:6b:00.0: amdgpu: WALKER_ERROR: 0x0
Aug 05 15:00:14 desktop kernel: amdgpu 0000:6b:00.0: amdgpu: PERMISSION_FAULTS: 0x1
Aug 05 15:00:14 desktop kernel: amdgpu 0000:6b:00.0: amdgpu: MAPPING_ERROR: 0x0
Aug 05 15:00:14 desktop kernel: amdgpu 0000:6b:00.0: amdgpu: RW: 0x0
Aug 05 15:00:14 desktop kernel: amdgpu 0000:6b:00.0: amdgpu: [mmhub] page fault (src_id:0 ring:8 vmid:2 pasid:32783)
Aug 05 15:00:14 desktop kernel: amdgpu 0000:6b:00.0: amdgpu: in process RDD Process pid 5760 thread firefox:cs0 pid 5947
Aug 05 15:00:14 desktop kernel: amdgpu 0000:6b:00.0: amdgpu: in page starting at address 0x000080010fc3a000 from client 0x12 (VMC)
Aug 05 15:00:14 desktop kernel: amdgpu 0000:6b:00.0: amdgpu: MMVM_L2_PROTECTION_FAULT_STATUS:0x00000000
Aug 05 15:00:14 desktop kernel: amdgpu 0000:6b:00.0: amdgpu: Faulty UTCL2 client ID: MP0 (0x0)
Aug 05 15:00:14 desktop kernel: amdgpu 0000:6b:00.0: amdgpu: MORE_FAULTS: 0x0
Aug 05 15:00:14 desktop kernel: amdgpu 0000:6b:00.0: amdgpu: WALKER_ERROR: 0x0
Aug 05 15:00:14 desktop kernel: amdgpu 0000:6b:00.0: amdgpu: PERMISSION_FAULTS: 0x0
Aug 05 15:00:14 desktop kernel: amdgpu 0000:6b:00.0: amdgpu: MAPPING_ERROR: 0x0
Aug 05 15:00:14 desktop kernel: amdgpu 0000:6b:00.0: amdgpu: RW: 0x0
Aug 05 15:00:14 desktop kernel: amdgpu 0000:6b:00.0: amdgpu: [mmhub] page fault (src_id:0 ring:8 vmid:2 pasid:32783)
Aug 05 15:00:14 desktop kernel: amdgpu 0000:6b:00.0: amdgpu: in process RDD Process pid 5760 thread firefox:cs0 pid 5947
Aug 05 15:00:14 desktop kernel: amdgpu 0000:6b:00.0: amdgpu: in page starting at address 0x000080010fa74000 from client 0x12 (VMC)
Aug 05 15:00:14 desktop kernel: amdgpu 0000:6b:00.0: amdgpu: MMVM_L2_PROTECTION_FAULT_STATUS:0x00000000
Aug 05 15:00:14 desktop kernel: amdgpu 0000:6b:00.0: amdgpu: Faulty UTCL2 client ID: MP0 (0x0)
Aug 05 15:00:14 desktop kernel: amdgpu 0000:6b:00.0: amdgpu: MORE_FAULTS: 0x0
Aug 05 15:00:14 desktop kernel: amdgpu 0000:6b:00.0: amdgpu: WALKER_ERROR: 0x0
Aug 05 15:00:14 desktop kernel: amdgpu 0000:6b:00.0: amdgpu: PERMISSION_FAULTS: 0x0
Aug 05 15:00:14 desktop kernel: amdgpu 0000:6b:00.0: amdgpu: MAPPING_ERROR: 0x0
Aug 05 15:00:14 desktop kernel: amdgpu 0000:6b:00.0: amdgpu: RW: 0x0
Aug 05 15:00:14 desktop kernel: amdgpu 0000:6b:00.0: amdgpu: [mmhub] page fault (src_id:0 ring:8 vmid:2 pasid:32783)
Aug 05 15:00:14 desktop kernel: amdgpu 0000:6b:00.0: amdgpu: in process RDD Process pid 5760 thread firefox:cs0 pid 5947
Aug 05 15:00:14 desktop kernel: amdgpu 0000:6b:00.0: amdgpu: in page starting at address 0x000080010fa70000 from client 0x12 (VMC)
Aug 05 15:00:14 desktop kernel: amdgpu 0000:6b:00.0: amdgpu: MMVM_L2_PROTECTION_FAULT_STATUS:0x00000000
Aug 05 15:00:14 desktop kernel: amdgpu 0000:6b:00.0: amdgpu: Faulty UTCL2 client ID: MP0 (0x0)
Aug 05 15:00:14 desktop kernel: amdgpu 0000:6b:00.0: amdgpu: MORE_FAULTS: 0x0
Aug 05 15:00:14 desktop kernel: amdgpu 0000:6b:00.0: amdgpu: WALKER_ERROR: 0x0
Aug 05 15:00:14 desktop kernel: amdgpu 0000:6b:00.0: amdgpu: PERMISSION_FAULTS: 0x0
Aug 05 15:00:14 desktop kernel: amdgpu 0000:6b:00.0: amdgpu: MAPPING_ERROR: 0x0
Aug 05 15:00:14 desktop kernel: amdgpu 0000:6b:00.0: amdgpu: RW: 0x0
Aug 05 15:00:14 desktop kernel: amdgpu 0000:6b:00.0: amdgpu: [mmhub] page fault (src_id:0 ring:8 vmid:2 pasid:32783)
Aug 05 15:00:14 desktop kernel: amdgpu 0000:6b:00.0: amdgpu: in process RDD Process pid 5760 thread firefox:cs0 pid 5947
Aug 05 15:00:14 desktop kernel: amdgpu 0000:6b:00.0: amdgpu: in page starting at address 0x000080010fa78000 from client 0x12 (VMC)
Aug 05 15:00:14 desktop kernel: amdgpu 0000:6b:00.0: amdgpu: MMVM_L2_PROTECTION_FAULT_STATUS:0x00000000
Aug 05 15:00:14 desktop kernel: amdgpu 0000:6b:00.0: amdgpu: Faulty UTCL2 client ID: MP0 (0x0)
Aug 05 15:00:14 desktop kernel: amdgpu 0000:6b:00.0: amdgpu: MORE_FAULTS: 0x0
Aug 05 15:00:14 desktop kernel: amdgpu 0000:6b:00.0: amdgpu: WALKER_ERROR: 0x0
Aug 05 15:00:14 desktop kernel: amdgpu 0000:6b:00.0: amdgpu: PERMISSION_FAULTS: 0x0
Aug 05 15:00:14 desktop kernel: amdgpu 0000:6b:00.0: amdgpu: MAPPING_ERROR: 0x0
Aug 05 15:00:14 desktop kernel: amdgpu 0000:6b:00.0: amdgpu: RW: 0x0
Aug 05 15:00:14 desktop kernel: amdgpu 0000:6b:00.0: amdgpu: [mmhub] page fault (src_id:0 ring:8 vmid:2 pasid:32783)
Aug 05 15:00:14 desktop kernel: amdgpu 0000:6b:00.0: amdgpu: in process RDD Process pid 5760 thread firefox:cs0 pid 5947
Aug 05 15:00:14 desktop kernel: amdgpu 0000:6b:00.0: amdgpu: in page starting at address 0x000080010fa79000 from client 0x12 (VMC)
Aug 05 15:00:14 desktop kernel: amdgpu 0000:6b:00.0: amdgpu: MMVM_L2_PROTECTION_FAULT_STATUS:0x00000000
Aug 05 15:00:14 desktop kernel: amdgpu 0000:6b:00.0: amdgpu: Faulty UTCL2 client ID: MP0 (0x0)
Aug 05 15:00:14 desktop kernel: amdgpu 0000:6b:00.0: amdgpu: MORE_FAULTS: 0x0
Aug 05 15:00:14 desktop kernel: amdgpu 0000:6b:00.0: amdgpu: WALKER_ERROR: 0x0
Aug 05 15:00:14 desktop kernel: amdgpu 0000:6b:00.0: amdgpu: PERMISSION_FAULTS: 0x0
Aug 05 15:00:14 desktop kernel: amdgpu 0000:6b:00.0: amdgpu: MAPPING_ERROR: 0x0
Aug 05 15:00:14 desktop kernel: amdgpu 0000:6b:00.0: amdgpu: RW: 0x0
Aug 05 15:00:24 desktop kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring vcn_dec_0 timeout, signaled seq=167531, emitted seq=167534
Aug 05 15:00:24 desktop kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process RDD Process pid 5760 thread firefox:cs0 pid 5947
Aug 05 15:00:24 desktop kernel: amdgpu 0000:6b:00.0: amdgpu: GPU reset begin!
Aug 05 15:00:24 desktop kernel: [drm] Register(0) [mmUVD_POWER_STATUS] failed to reach value 0x00000001 != 0x00000002n
Aug 05 15:00:24 desktop kernel: [drm] Register(0) [mmUVD_RBC_RB_RPTR] failed to reach value 0x000003e0 != 0x00000320n
Aug 05 15:00:24 desktop kernel: [drm] Register(0) [mmUVD_POWER_STATUS] failed to reach value 0x00000001 != 0x00000002n
Aug 05 15:00:24 desktop kernel: amdgpu 0000:6b:00.0: amdgpu: MODE2 reset
Aug 05 15:00:24 desktop kernel: amdgpu 0000:6b:00.0: amdgpu: GPU reset succeeded, trying to resume
Aug 05 15:00:24 desktop kernel: [drm] PCIE GART of 1024M enabled (table at 0x000000F41FC00000).
Aug 05 15:00:24 desktop kernel: [drm] VRAM is lost due to GPU reset!
Aug 05 15:00:24 desktop kernel: amdgpu 0000:6b:00.0: amdgpu: PSP is resuming...
Aug 05 15:00:24 desktop systemd-udevd[750]: /etc/udev/rules.d/51-android.rules:1 Invalid key 'UBSYSTEM'.
Aug 05 15:00:24 desktop systemd-udevd[750]: /etc/udev/rules.d/65-kvm.rules:1 Ignoring NAME="%k", as it will take no effect.
Aug 05 15:00:24 desktop kernel: amdgpu 0000:6b:00.0: amdgpu: reserve 0xa00000 from 0xf41e000000 for PSP TMR
Aug 05 15:00:25 desktop kernel: amdgpu 0000:6b:00.0: amdgpu: RAS: optional ras ta ucode is not available
Aug 05 15:00:25 desktop kernel: amdgpu 0000:6b:00.0: amdgpu: RAP: optional rap ta ucode is not available
Aug 05 15:00:25 desktop kernel: amdgpu 0000:6b:00.0: amdgpu: SECUREDISPLAY: securedisplay ta ucode is not available
Aug 05 15:00:25 desktop kernel: amdgpu 0000:6b:00.0: amdgpu: SMU is resuming...
Aug 05 15:00:25 desktop kernel: amdgpu 0000:6b:00.0: amdgpu: SMU is resumed successfully!
Aug 05 15:00:25 desktop kernel: [drm] DMUB hardware initialized: version=0x05001900
Aug 05 15:00:26 desktop kernel: amdgpu 0000:6b:00.0: [drm] enabling link 1 failed: 15
Aug 05 15:00:26 desktop kernel: [drm] kiq ring mec 2 pipe 1 q 0
Aug 05 15:00:26 desktop kernel: amdgpu 0000:6b:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring vcn_dec_0 test failed (-110)
Aug 05 15:00:26 desktop kernel: [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block <vcn_v3_0> failed -110
Aug 05 15:00:26 desktop kernel: amdgpu 0000:6b:00.0: amdgpu: GPU reset(1) failed
Aug 05 15:00:26 desktop kernel: amdgpu 0000:6b:00.0: amdgpu: GPU reset end with ret = -110
Aug 05 15:00:26 desktop kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* GPU Recovery Failed: -110
Aug 05 15:00:27 desktop kernel: [drm] Register(0) [mmUVD_POWER_STATUS] failed to reach value 0x00000001 != 0x00000002n
Aug 05 15:00:27 desktop kernel: [drm] Register(0) [mmUVD_RBC_RB_RPTR] failed to reach value 0x00000010 != 0x00000000n
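To compare machines, it can help to pull only the ring timeout lines out of a long journal. The sketch below counts timeouts per ring from log text on stdin. It assumes the message format shown in the log above ("*ERROR* ring NAME timeout"), and the function name is made up.

```shell
#!/usr/bin/env bash
# Read kernel log lines on stdin and print how many timeouts each ring had.
ring_timeouts() {
    sed -n 's/.*\*ERROR\* ring \([^ ]*\) timeout.*/\1/p' | sort | uniq -c
}

# Demo on a line copied from the log above.
ring_timeouts <<'EOF'
Aug 05 15:00:24 desktop kernel: [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring vcn_dec_0 timeout, signaled seq=167531, emitted seq=167534
Aug 05 15:00:24 desktop kernel: amdgpu 0000:6b:00.0: amdgpu: GPU reset begin!
EOF

# Usage on a real system, for the kernel messages of the previous boot:
# journalctl -k -b -1 | ring_timeouts
```

In the log above the only ring that times out is vcn_dec_0, the video decode ring.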
I also see this error since kernel 6.10. There is a workaround patch from https://gitlab.freedesktop.org/drm/amd/-/issues/2339:
---
drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index c556c8b65..e8c6092e5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -180,8 +180,9 @@ void amdgpu_bo_placement_from_domain(struct amdgpu_bo *abo, u32 domain)
* When GTT is just an alternative to VRAM make sure that we
* only use it as fallback and still try to fill up VRAM first.
*/
- if (domain & abo->preferred_domains & AMDGPU_GEM_DOMAIN_VRAM)
- places[c].flags |= TTM_PL_FLAG_FALLBACK;
+ if (domain & abo->preferred_domains & AMDGPU_GEM_DOMAIN_VRAM &&
+ !(adev->flags & AMD_IS_APU))
+ places[c].flags |= TTM_PL_FLAG_FALLBACK;
c++;
}
--
Or don't use the 6.10-series for now.
From my understanding of the issue https://gitlab.freedesktop.org/drm/amd/-/issues/3437, that is not the root cause. There is a mesa patch, https://gitlab.freedesktop.org/mesa/mes … ests/30510, which I'll try to build if I have time before the release of 24.2.
FWIW, I built and installed mesa-git (2 days ago, i.e. @ab72be6c5e9) and it's been fully stable for me so far. I guess this might be a simple workaround until 24.2 comes out.
I've built the staging branch for 24.1 (which should result in 24.1.6) to limit other issues, and it also seems stable for the moment. I'll run some tests to see whether it stays stable for a few hours: https://gitlab.freedesktop.org/mesa/mes … type=heads