Based on my confused understanding trying to follow all this, the patch that fixes freezes on 'Mesa chipset detection*: raven' hardware has been merged. Therefore the AUR 'mesa-git' package has this fix.
Implying 'mesa-git' may have the patch applied that Lone_Wolf used in his latest patched mesa.
I git cloned the mesa source from: https://gitlab.freedesktop.org/mesa/mesa.git
Used my 'git-rollback' script to display the commits because I don't really know how to use git for anything useful.
The output was huge, so I copied it to a file for grep, and eventually came up with this:
commit 3b78dcec058e85321f636f353ad5c23c986e3a11
Author: Marek Olšák <maraeo@gmail.com>
Date: Mon Jan 27 15:24:21 2025 -0500
radeonsi: disallow compute queues on Raven/Raven2 due to hangs
Fixes: 58b512ddd6e - radeonsi: execute clears at resource allocation using compute instead of gfx
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12310
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33248>
src/gallium/drivers/radeonsi/si_pipe.c
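For anyone repeating the search, the "dump the log to a file, then grep" step can be narrowed a bit. A minimal sketch, assuming the file is plain `git log` output in which every entry starts with a `commit <hash>` line; `commits_matching` is a hypothetical helper name, not something git provides:

```shell
# Print the hash of each commit in a saved `git log` dump whose header or
# message contains PATTERN (case-insensitive regular expression).
# Usage: commits_matching PATTERN LOGFILE
commits_matching() {
    awk -v pat="$1" '
        # A new entry starts: remember its hash and reset the "printed" flag.
        /^commit [0-9a-f]+/ { hash = $2; printed = 0; next }
        # First matching line inside an entry: print that entry once.
        hash != "" && !printed && tolower($0) ~ tolower(pat) {
            print hash
            printed = 1
        }
    ' "$2"
}
```

Something like `git log > mesa.log; commits_matching raven mesa.log` would then list candidate commits. Asking git directly with `git log -i --grep=raven -- src/gallium/drivers/radeonsi/si_pipe.c` should give a similar result without the intermediate file.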
And the code applied to si_pipe.c (I've also verified this code was present in the clean chroot build):
$ sed -n '518,526p;527q' mesa/src/gallium/drivers/radeonsi/si_pipe.c
   sctx->has_graphics = sscreen->info.gfx_level == GFX6 ||
                        /* Compute queues hang on Raven and derivatives, see:
                         * https://gitlab.freedesktop.org/mesa/mesa/-/issues/12310 */
                        ((sscreen->info.family == CHIP_RAVEN ||
                          sscreen->info.family == CHIP_RAVEN2) &&
                         !sscreen->info.has_dedicated_vram) ||
                        !(flags & PIPE_CONTEXT_COMPUTE_ONLY);
Last night I built 'mesa-git' in a clean chroot and installed it.
Passed the initial freeze test including overnight play of an 8 hour, 4k vid in browser.
If this is correct, the 'mesa-git' AUR package may be an alternative solution for users needing a temp workaround until a fix is available in the official repos.
Any feedback regarding linux-amdgpu 6.13.arch1-2?
I can test this in a few hours...
Last edited by NuSkool (2025-01-31 20:21:44)
Scripts I use: https://github.com/Cody-Learner
Offline
linux-amdgpu 6.13.arch1-4 freezes. No messages in the log, though I didn't apply the suggested debug parameter from earlier. I'll try linux-amdgpu 6.13.arch1-2 next.
Offline
pacoandres wrote: I've been reading documentation but I can't find a way to force verbose logs for amdgpu. Does anyone know if it's possible?
Try
echo 0xf > /sys/module/drm/parameters/debug
I had to use this due to permission issues:
echo '0xf' | sudo tee /sys/module/drm/parameters/debug
That results in this though. Is this expected?
sudo cat /sys/module/drm/parameters/debug
15
Before changing it, it had a single zero, '0'.
I can edit the file to contain just this '0xf' if necessary...
I'm currently testing: linux-amdgpu 6.13.arch1-2 with repo mesa.
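For reference, the reason the plain redirect runs into permission issues is that the redirection is done by the unprivileged calling shell, so the file is opened before any privileged command gets involved. A rough sketch of a wrapper, assuming the parameter file is world-readable (if it is not, the final `cat` needs `sudo` too); `set_param` is a made-up helper name:

```shell
# Write VALUE to a kernel parameter file and print what the kernel reports back.
# Piping into `tee` lets the (possibly privileged) tee process open the file,
# instead of the calling shell doing it through a `>` redirection.
# Usage: set_param VALUE FILE
set_param() {
    if [ -w "$2" ]; then
        # Already writable (e.g. running as root): no sudo needed.
        printf '%s\n' "$1" | tee "$2" > /dev/null
    else
        printf '%s\n' "$1" | sudo tee "$2" > /dev/null
    fi
    cat "$2"
}
```

`set_param 0xf /sys/module/drm/parameters/debug` turns the debug output on and `set_param 0 /sys/module/drm/parameters/debug` turns it back off. The setting does not survive a reboot.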
Last edited by NuSkool (2025-01-31 22:07:10)
Offline
That results in this though. Is this expected?
sudo cat /sys/module/drm/parameters/debug
15
Yes. 0xf in hexadecimal is 15 decimal.
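The value is a bitmask, so 15 simply means the four lowest debug categories are switched on. A small sketch to decode it, assuming the usual drm.debug bit assignment (0x1 core, 0x2 driver, 0x4 KMS, 0x8 PRIME); check `modinfo drm` on your kernel before relying on it. `drm_debug_categories` is a hypothetical name:

```shell
# List which of the four lowest drm.debug categories are enabled in MASK.
# MASK may be decimal (15) or 0x-prefixed hex (0xf).
# Bit meanings are an assumption taken from the drm module's parameter help.
# Usage: drm_debug_categories MASK
drm_debug_categories() {
    mask=$(( $1 ))
    out=""
    [ $(( mask & 0x1 )) -ne 0 ] && out="$out core"
    [ $(( mask & 0x2 )) -ne 0 ] && out="$out driver"
    [ $(( mask & 0x4 )) -ne 0 ] && out="$out kms"
    [ $(( mask & 0x8 )) -ne 0 ] && out="$out prime"
    # Drop the leading space before printing.
    printf '%s\n' "${out# }"
}
```

Both `drm_debug_categories 0xf` and `drm_debug_categories 15` give the same list, which is why the file showing 15 after writing 0xf is nothing to worry about.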
Offline
Preliminary test results for:
linux-amdgpu 6.13.arch1-2 uname -rs: Linux 6.13.0-arch1-2-amdgpu
mesa 1:24.3.4-1
cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-linux-amdgpu root=UUID=60bc1026-da96-43b5-8963-eda5d63b8049 rw loglevel=3 sysrq_always_enabled=1 amd_pstate=passive fsck.mode=force
dmesg: http://0x0.st/88Gu.txt
OK - idle stability
OK - workflow stability
OK - glxgears window resizing
OK - vkcube window resizing
Ran a 4k fullscreen vid while running both glxgears, vkcube at the same time while resizing...
I'll run this setup for 24 hours or until there is anything further to report.
Maybe this kernel is a solution you're looking for or the right direction for further work?
Last edited by NuSkool (2025-01-31 22:46:26)
Offline
linux-amdgpu 6.13.arch1-2 seems to be working so far! I haven't been able to cause the freeze. Performance and GPU temperature look normal, and the test applications glxgears and vkcube resize fine. I'll keep running this and see if it stays working.
(Ryzen 5 2500U (Raven Ridge))
Offline
Maybe this kernel is a solution you're looking for or the right direction for further work?
At least this is an option. This build is an adaptation of the change that an AMD engineer is going to merge into the kernel. The differences are:
- the change of the powergating state for compute units is applied to all AMD GPUs, not limited to Raven;
- the gfx_off timeout is removed, so the GPU enters its power-efficient mode sooner, which should reduce overall power consumption.
But I still think there might be a better option to fix the instability. Let's see if the linux-ring-test 6.13.arch1-1 approach also works.
Offline
I'm also getting those freezes. A workaround that works for me when it happens:
- press CTRL+ALT+1 and then CTRL+ALT+2 repeatedly for a couple of minutes until the screen comes back on. On GNOME, CTRL+ALT+2 should bring you back to the desktop without needing to re-enter the password.
my system
- gpu: AMD 7900 XTX
- cpu: 7800X3D
uname -a
Linux arch-pc 6.12.8-arch1-1 #1 SMP PREEMPT_DYNAMIC Thu, 02 Jan 2025 22:52:26 +0000 x86_64 GNU/Linux
mesa 1:24.3.3-1
Jan 06 11:24:24 arch-pc syncthing[1825]: [4FNSN] INFO: Established secure connection to REZZLIU at 10.10.10.5:22000-10.10.10.30:22000/tcp-client/TLS1.3-TLS_AES_128_GCM_SHA256/LAN-P10-60O5MKVCC0MJ28OCCCBKQC57BK Jan 06 11:24:24 arch-pc syncthing[1825]: [4FNSN] INFO: Device REZZLIU client is "syncthing v1.27.3" named "Pixel 6a" at 10.10.10.5:22000-10.10.10.30:22000/tcp-client/TLS1.3-TLS_AES_128_GCM_SHA256/LAN-P10-60O5MKVCC0MJ28OCCCBKQC57BK Jan 06 11:28:19 arch-pc rtkit-daemon[1410]: Supervising 10 threads of 7 processes of 1 users. Jan 06 11:28:19 arch-pc rtkit-daemon[1410]: Supervising 10 threads of 7 processes of 1 users. Jan 06 11:28:21 arch-pc rtkit-daemon[1410]: Supervising 10 threads of 7 processes of 1 users. Jan 06 11:28:21 arch-pc rtkit-daemon[1410]: Supervising 10 threads of 7 processes of 1 users. Jan 06 11:29:58 arch-pc kernel: amdgpu 0000:03:00.0: [drm] *ERROR* [CRTC:79:crtc-0] flip_done timed out Jan 06 11:31:55 arch-pc syncthing[1825]: [4FNSN] INFO: Lost primary connection to REZZLIU at 10.10.10.5:22000-10.10.10.30:22000/tcp-client/TLS1.3-TLS_AES_128_GCM_SHA256/LAN-P10-60O5MKVCC0MJ28OCCCBKQC57BK: read timeout (0 remain) Jan 06 11:31:55 arch-pc syncthing[1825]: [4FNSN] INFO: Connection to REZZLIU at 10.10.10.5:22000-10.10.10.30:22000/tcp-client/TLS1.3-TLS_AES_128_GCM_SHA256/LAN-P10-60O5MKVCC0MJ28OCCCBKQC57BK closed: read timeout Jan 06 11:31:57 arch-pc syncthing[1825]: [4FNSN] INFO: Established secure connection to REZZLIU at 10.10.10.5:22000-10.10.10.30:22000/quic-server/TLS1.3-TLS_AES_128_GCM_SHA256/LAN-P20-60O5O9OOK7OAU5E8G37OS9DNBU Jan 06 11:31:57 arch-pc syncthing[1825]: [4FNSN] INFO: Device REZZLIU client is "syncthing v1.27.3" named "Pixel 6a" at 10.10.10.5:22000-10.10.10.30:22000/quic-server/TLS1.3-TLS_AES_128_GCM_SHA256/LAN-P20-60O5O9OOK7OAU5E8G37OS9DNBU Jan 06 11:32:22 arch-pc syncthing[1825]: [4FNSN] INFO: Established secure connection to REZZLIU at 
10.10.10.5:22000-10.10.10.30:22000/tcp-client/TLS1.3-TLS_AES_128_GCM_SHA256/LAN-P10-60O5OCMCEI48QBG9FDKOAD4O6E Jan 06 11:32:22 arch-pc syncthing[1825]: [4FNSN] INFO: Additional connection (+1) for device REZZLIU at 10.10.10.5:22000-10.10.10.30:22000/tcp-client/TLS1.3-TLS_AES_128_GCM_SHA256/LAN-P10-60O5OCMCEI48QBG9FDKOAD4O6E Jan 06 11:32:22 arch-pc syncthing[1825]: [4FNSN] INFO: Lost primary connection to REZZLIU at 10.10.10.5:22000-10.10.10.30:22000/quic-server/TLS1.3-TLS_AES_128_GCM_SHA256/LAN-P20-60O5O9OOK7OAU5E8G37OS9DNBU: replacing connection (1 remain) Jan 06 11:35:28 arch-pc kernel: amdgpu 0000:03:00.0: [drm] *ERROR* flip_done timed out Jan 06 11:35:28 arch-pc kernel: amdgpu 0000:03:00.0: [drm] *ERROR* [CRTC:79:crtc-0] commit wait timed out Jan 06 11:35:38 arch-pc kernel: amdgpu 0000:03:00.0: [drm] *ERROR* flip_done timed out Jan 06 11:35:38 arch-pc kernel: amdgpu 0000:03:00.0: [drm] *ERROR* [PLANE:76:plane-6] commit wait timed out Jan 06 11:35:38 arch-pc kernel: ------------[ cut here ]------------ Jan 06 11:35:38 arch-pc kernel: WARNING: CPU: 0 PID: 1179 at drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:8622 amdgpu_dm_atomic_commit_tail+0x3b4f/0x3c30 [amdgpu] Jan 06 11:35:38 arch-pc kernel: Modules linked in: snd_seq_dummy snd_hrtimer snd_seq amd_atl intel_rapl_msr intel_rapl_common amdgpu ext4 mbcache vfat jbd2 fat snd_hda_codec_hdmi mt7921e snd_hda_intel snd_intel_dspcfg mt7921_common amdxcp snd_intel_sdw_acpi btusb drm_exec snd_usb_audio mt792x_lib uvcvideo btrtl gpu_sched snd_hda_codec mt76_connac_lib snd_usbmidi_lib videobuf2_vmalloc drm_buddy kvm_amd btintel uvc snd_ump snd_hda_core mt76 btbcm videobuf2_memops i2c_algo_bit snd_rawmidi spd5118 drm_suballoc_helper videobuf2_v4l2 snd_hwdep snd_seq_device btmtk drm_ttm_helper kvm mac80211 atlantic ttm snd_pcm videobuf2_common bluetooth drm_display_helper snd_timer rapl macsec videodev libarc4 wmi_bmof pcspkr snd i2c_piix4 ptp cec k10temp i2c_smbus mc pl2303 soundcore crc16 pps_core cfg80211 
mousedev gpio_amdpt joydev gpio_generic rfkill mac_hid loop nfnetlink zram 842_decompress 842_compress lz4hc_compress lz4_compress ip_tables x_tables dm_crypt cbc encrypted_keys trusted asn1_encoder tee crct10dif_pclmul crc32_pclmul polyval_clmulni Jan 06 11:35:38 arch-pc kernel: polyval_generic ghash_clmulni_intel hid_generic sha512_ssse3 sha256_ssse3 sha1_ssse3 aesni_intel gf128mul nvme crypto_simd cryptd ccp usbhid sp5100_tco nvme_core nvme_auth video wmi btrfs blake2b_generic libcrc32c crc32c_generic crc32c_intel xor raid6_pq dm_mod crypto_user Jan 06 11:35:38 arch-pc kernel: CPU: 0 UID: 0 PID: 1179 Comm: systemd-logind Not tainted 6.12.8-arch1-1 #1 099de49ddaebb26408f097c48b36e50b2c8e21c9 Jan 06 11:35:38 arch-pc kernel: Hardware name: Micro-Star International Co., Ltd. MS-7D69/MEG X670E ACE (MS-7D69), BIOS 1.90 08/10/2023 Jan 06 11:35:38 arch-pc kernel: RIP: 0010:amdgpu_dm_atomic_commit_tail+0x3b4f/0x3c30 [amdgpu] Jan 06 11:35:38 arch-pc kernel: Code: 7c e2 e9 dc fd ff ff 49 8d 87 50 31 04 00 c6 85 38 fe ff ff 00 48 89 85 48 fe ff ff e9 d8 cb ff ff 0f 0b e9 fc f2 ff ff 0f 0b <0f> 0b e9 12 f3 ff ff 0f 0b e9 11 cc ff ff 48 c7 85 28 fe ff ff 00 Jan 06 11:35:38 arch-pc kernel: RSP: 0018:ffffae86c23a75b0 EFLAGS: 00010086 Jan 06 11:35:38 arch-pc kernel: RAX: 0000000000000001 RBX: 0000000000000286 RCX: ffff8db801082118 Jan 06 11:35:38 arch-pc kernel: RDX: 0000000000000001 RSI: 0000000000000297 RDI: ffff8db842c80178 Jan 06 11:35:38 arch-pc kernel: RBP: ffffae86c23a7800 R08: ffffae86c23a749c R09: 0000000000000000 Jan 06 11:35:38 arch-pc kernel: R10: ffffae86c23a7508 R11: ffffae86c23a750c R12: ffffae86c23a7668 Jan 06 11:35:38 arch-pc kernel: R13: 0000000000000000 R14: ffff8dba3d761000 R15: ffff8db801082000 Jan 06 11:35:38 arch-pc kernel: FS: 00007cc05b0e2900(0000) GS:ffff8dc6d8200000(0000) knlGS:0000000000000000 Jan 06 11:35:38 arch-pc kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 Jan 06 11:35:38 arch-pc kernel: CR2: 0000714aae6c4400 CR3: 00000001407b8000 
CR4: 0000000000f50ef0 Jan 06 11:35:38 arch-pc kernel: PKRU: 55555554 Jan 06 11:35:38 arch-pc kernel: Call Trace: Jan 06 11:35:38 arch-pc kernel: <TASK> Jan 06 11:35:38 arch-pc kernel: ? amdgpu_dm_atomic_commit_tail+0x3b4f/0x3c30 [amdgpu fb97feb5a7216969a6c4e39cc61cb53691cdacb2] Jan 06 11:35:38 arch-pc kernel: ? __warn.cold+0x93/0xf6 Jan 06 11:35:38 arch-pc kernel: ? amdgpu_dm_atomic_commit_tail+0x3b4f/0x3c30 [amdgpu fb97feb5a7216969a6c4e39cc61cb53691cdacb2] Jan 06 11:35:38 arch-pc kernel: ? report_bug+0xff/0x140 Jan 06 11:35:38 arch-pc kernel: ? handle_bug+0x58/0x90 Jan 06 11:35:38 arch-pc kernel: ? exc_invalid_op+0x17/0x70 Jan 06 11:35:38 arch-pc kernel: ? asm_exc_invalid_op+0x1a/0x20 Jan 06 11:35:38 arch-pc kernel: ? amdgpu_dm_atomic_commit_tail+0x3b4f/0x3c30 [amdgpu fb97feb5a7216969a6c4e39cc61cb53691cdacb2] Jan 06 11:35:38 arch-pc kernel: commit_tail+0x91/0x130 Jan 06 11:35:38 arch-pc kernel: drm_atomic_helper_commit+0x11a/0x140 Jan 06 11:35:38 arch-pc kernel: drm_atomic_commit+0xa6/0xe0 Jan 06 11:35:38 arch-pc kernel: ? __pfx___drm_printfn_info+0x10/0x10 Jan 06 11:35:38 arch-pc kernel: drm_client_modeset_commit_atomic+0x203/0x250 Jan 06 11:35:38 arch-pc kernel: drm_client_modeset_commit_locked+0x5a/0x160 Jan 06 11:35:38 arch-pc kernel: ? srso_alias_return_thunk+0x5/0xfbef5 Jan 06 11:35:38 arch-pc kernel: __drm_fb_helper_restore_fbdev_mode_unlocked+0x5e/0xd0 Jan 06 11:35:38 arch-pc kernel: drm_fb_helper_set_par+0x30/0x40 Jan 06 11:35:38 arch-pc kernel: fb_set_var+0x25c/0x460 Jan 06 11:35:38 arch-pc kernel: ? srso_alias_return_thunk+0x5/0xfbef5 Jan 06 11:35:38 arch-pc kernel: ? update_load_avg+0x7e/0x7b0 Jan 06 11:35:38 arch-pc kernel: ? srso_alias_return_thunk+0x5/0xfbef5 Jan 06 11:35:38 arch-pc kernel: ? srso_alias_return_thunk+0x5/0xfbef5 Jan 06 11:35:38 arch-pc kernel: ? sched_clock_cpu+0xf/0x1d0 Jan 06 11:35:38 arch-pc kernel: ? srso_alias_return_thunk+0x5/0xfbef5 Jan 06 11:35:38 arch-pc kernel: ? 
psi_group_change+0x13b/0x310 Jan 06 11:35:38 arch-pc kernel: fbcon_blank+0x271/0x330 Jan 06 11:35:38 arch-pc kernel: do_unblank_screen+0xad/0x150 Jan 06 11:35:38 arch-pc kernel: complete_change_console+0x54/0x120 Jan 06 11:35:38 arch-pc kernel: vt_ioctl+0xec3/0x12c0 Jan 06 11:35:38 arch-pc kernel: tty_ioctl+0xe2/0x8a0 Jan 06 11:35:38 arch-pc kernel: ? srso_alias_return_thunk+0x5/0xfbef5 Jan 06 11:35:38 arch-pc kernel: ? __seccomp_filter+0x303/0x520 Jan 06 11:35:38 arch-pc kernel: __x64_sys_ioctl+0x91/0xd0 Jan 06 11:35:38 arch-pc kernel: do_syscall_64+0x82/0x190 Jan 06 11:35:38 arch-pc kernel: ? srso_alias_return_thunk+0x5/0xfbef5 Jan 06 11:35:38 arch-pc kernel: ? syscall_exit_to_user_mode+0x37/0x1c0 Jan 06 11:35:38 arch-pc kernel: ? srso_alias_return_thunk+0x5/0xfbef5 Jan 06 11:35:38 arch-pc kernel: ? do_syscall_64+0x8e/0x190 Jan 06 11:35:38 arch-pc kernel: ? srso_alias_return_thunk+0x5/0xfbef5 Jan 06 11:35:38 arch-pc kernel: ? evdev_ioctl+0x6f/0x90 Jan 06 11:35:38 arch-pc kernel: ? srso_alias_return_thunk+0x5/0xfbef5 Jan 06 11:35:38 arch-pc kernel: ? syscall_exit_to_user_mode+0x37/0x1c0 Jan 06 11:35:38 arch-pc kernel: ? srso_alias_return_thunk+0x5/0xfbef5 Jan 06 11:35:38 arch-pc kernel: ? do_syscall_64+0x8e/0x190 Jan 06 11:35:38 arch-pc kernel: ? srso_alias_return_thunk+0x5/0xfbef5 Jan 06 11:35:38 arch-pc kernel: ? do_syscall_64+0x8e/0x190 Jan 06 11:35:38 arch-pc kernel: ? srso_alias_return_thunk+0x5/0xfbef5 Jan 06 11:35:38 arch-pc kernel: ? do_syscall_64+0x8e/0x190 Jan 06 11:35:38 arch-pc kernel: ? srso_alias_return_thunk+0x5/0xfbef5 Jan 06 11:35:38 arch-pc kernel: ? do_syscall_64+0x8e/0x190 Jan 06 11:35:38 arch-pc kernel: ? srso_alias_return_thunk+0x5/0xfbef5 Jan 06 11:35:38 arch-pc kernel: ? do_syscall_64+0x8e/0x190 Jan 06 11:35:38 arch-pc kernel: ? 
srso_alias_return_thunk+0x5/0xfbef5 Jan 06 11:35:38 arch-pc kernel: entry_SYSCALL_64_after_hwframe+0x76/0x7e Jan 06 11:35:38 arch-pc kernel: RIP: 0033:0x7cc05ab23ced Jan 06 11:35:38 arch-pc kernel: Code: 04 25 28 00 00 00 48 89 45 c8 31 c0 48 8d 45 10 c7 45 b0 10 00 00 00 48 89 45 b8 48 8d 45 d0 48 89 45 c0 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1a 48 8b 45 c8 64 48 2b 04 25 28 00 00 00 Jan 06 11:35:38 arch-pc kernel: RSP: 002b:00007ffe55f13050 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 Jan 06 11:35:38 arch-pc kernel: RAX: ffffffffffffffda RBX: 000000000000001f RCX: 00007cc05ab23ced Jan 06 11:35:38 arch-pc kernel: RDX: 0000000000000001 RSI: 0000000000005605 RDI: 000000000000001f Jan 06 11:35:38 arch-pc kernel: RBP: 00007ffe55f130a0 R08: 00007ffe55f13030 R09: 00005ce766311ed0 Jan 06 11:35:38 arch-pc kernel: R10: 00007ffe55f13080 R11: 0000000000000246 R12: 0000000000000000 Jan 06 11:35:38 arch-pc kernel: R13: 00007ffe55f13130 R14: 00005ce766310be0 R15: 00005ce766313120 Jan 06 11:35:38 arch-pc kernel: </TASK> Jan 06 11:35:38 arch-pc kernel: ---[ end trace 0000000000000000 ]--- Jan 06 11:35:38 arch-pc kernel: rfkill: input handler enabled Jan 06 11:35:38 arch-pc systemd-logind[1179]: New session 5 of user gdm. Jan 06 11:35:38 arch-pc gsd-media-keys[2064]: Unable to get default source Jan 06 11:35:38 arch-pc gsd-media-keys[2064]: Unable to get default sink Jan 06 11:35:38 arch-pc systemd[1]: Created slice User Slice of UID 120. Jan 06 11:35:38 arch-pc systemd[1]: Starting User Runtime Directory /run/user/120... Jan 06 11:35:39 arch-pc systemd[1]: Finished User Runtime Directory /run/user/120. Jan 06 11:35:39 arch-pc systemd[1]: Starting User Manager for UID 120... 
Jan 06 11:35:39 arch-pc (systemd)[7755]: pam_warn(systemd-user:setcred): function=[pam_sm_setcred] flags=0x8002 service=[systemd-user] terminal=[] user=[gdm] ruser=[<unknown>] rhost=[<unknown>] Jan 06 11:35:39 arch-pc (systemd)[7755]: pam_unix(systemd-user:session): session opened for user gdm(uid=120) by gdm(uid=0) Jan 06 11:35:39 arch-pc systemd-logind[1179]: New session 6 of user gdm. Jan 06 11:35:39 arch-pc systemd[7755]: Queued start job for default target Main User Target. Jan 06 11:35:39 arch-pc systemd[7755]: Created slice User Application Slice. Jan 06 11:35:39 arch-pc systemd[7755]: Reached target Paths. Jan 06 11:35:39 arch-pc systemd[7755]: Reached target Timers. Jan 06 11:35:39 arch-pc systemd[7755]: Starting D-Bus User Message Bus Socket... Jan 06 11:35:39 arch-pc systemd[7755]: Listening on GnuPG network certificate management daemon. Jan 06 11:35:39 arch-pc systemd[7755]: Starting GCR ssh-agent wrapper... Jan 06 11:35:39 arch-pc systemd[7755]: Listening on GNOME Keyring daemon. Jan 06 11:35:39 arch-pc systemd[7755]: Listening on GnuPG cryptographic agent and passphrase cache (access for web browsers). Jan 06 11:35:39 arch-pc systemd[7755]: Listening on GnuPG cryptographic agent and passphrase cache (restricted). Jan 06 11:35:39 arch-pc systemd[7755]: Listening on GnuPG cryptographic agent (ssh-agent emulation). Jan 06 11:35:39 arch-pc systemd[7755]: Listening on GnuPG cryptographic agent and passphrase cache. Jan 06 11:35:39 arch-pc systemd[7755]: Listening on GnuPG public key management service. Jan 06 11:35:39 arch-pc systemd[7755]: Listening on p11-kit server. Jan 06 11:35:39 arch-pc systemd[7755]: Listening on PipeWire PulseAudio. Jan 06 11:35:39 arch-pc systemd[7755]: Listening on PipeWire Multimedia System Sockets. Jan 06 11:35:39 arch-pc systemd[7755]: Listening on D-Bus User Message Bus Socket. Jan 06 11:35:39 arch-pc systemd[7755]: Listening on GCR ssh-agent wrapper. Jan 06 11:35:39 arch-pc systemd[7755]: Reached target Sockets. 
Jan 06 11:35:39 arch-pc systemd[7755]: Reached target Basic System. Jan 06 11:35:39 arch-pc systemd[1]: Started User Manager for UID 120. Jan 06 11:35:39 arch-pc systemd[7755]: Starting Update XDG user dir configuration... Jan 06 11:35:39 arch-pc systemd[1]: Started Session 5 of User gdm. Jan 06 11:35:39 arch-pc systemd[7755]: Finished Update XDG user dir configuration. Jan 06 11:35:39 arch-pc systemd[7755]: Reached target Main User Target. Jan 06 11:35:39 arch-pc systemd[7755]: Startup finished in 158ms. Jan 06 11:35:39 arch-pc systemd[7755]: Created slice User Core Session Slice. Jan 06 11:35:39 arch-pc systemd[7755]: Starting D-Bus User Message Bus... Jan 06 11:35:39 arch-pc dbus-broker-launch[7777]: Policy to allow eavesdropping in /usr/share/dbus-1/session.conf +31: Eavesdropping is deprecated and ignored Jan 06 11:35:39 arch-pc dbus-broker-launch[7777]: Policy to allow eavesdropping in /usr/share/dbus-1/session.conf +33: Eavesdropping is deprecated and ignored Jan 06 11:35:39 arch-pc systemd[7755]: Started D-Bus User Message Bus. 
Jan 06 11:35:39 arch-pc dbus-broker-launch[7777]: Ready Jan 06 11:35:39 arch-pc /usr/lib/gdm-wayland-session[7781]: dbus-daemon[7781]: [session uid=120 pid=7781 pidfd=5] Activating service name='org.freedesktop.systemd1' requested by ':1.2' (uid=120 pid=7782 comm="/usr/lib/gnome-session-binary --autostart /usr/sha") Jan 06 11:35:39 arch-pc /usr/lib/gdm-wayland-session[7781]: dbus-daemon[7781]: [session uid=120 pid=7781 pidfd=5] Activated service 'org.freedesktop.systemd1' failed: Process org.freedesktop.systemd1 exited with status 1 Jan 06 11:35:39 arch-pc gnome-session[7782]: gnome-session-binary[7782]: WARNING: Could not check if unit gnome-session-wayland@gnome-login.target is active: Error calling StartServiceByName for org.freedesktop.systemd1: Process org.freedesktop.systemd1 exited with status 1 Jan 06 11:35:39 arch-pc gnome-session-binary[7782]: WARNING: Could not check if unit gnome-session-wayland@gnome-login.target is active: Error calling StartServiceByName for org.freedesktop.systemd1: Process org.freedesktop.systemd1 exited with status 1 Jan 06 11:35:39 arch-pc gnome-session[7782]: gnome-session-binary[7782]: WARNING: Desktop file /usr/share/gdm/greeter/autostart/orca-autostart.desktop for application orca-autostart.desktop could not be parsed or references a missing TryExec binary Jan 06 11:35:39 arch-pc gnome-session-binary[7782]: WARNING: Desktop file /usr/share/gdm/greeter/autostart/orca-autostart.desktop for application orca-autostart.desktop could not be parsed or references a missing TryExec binary Jan 06 11:35:39 arch-pc gnome-shell[7794]: Running GNOME Shell (using mutter 47.3) as a Wayland display server Jan 06 11:35:39 arch-pc rtkit-daemon[1410]: Supervising 10 threads of 7 processes of 1 users. Jan 06 11:35:39 arch-pc rtkit-daemon[1410]: Successfully made thread 7812 of process 7794 owned by '120' high priority at nice level -15. Jan 06 11:35:39 arch-pc rtkit-daemon[1410]: Supervising 11 threads of 8 processes of 2 users. 
Jan 06 11:35:39 arch-pc gnome-shell[7794]: Thread 'KMS thread' will be using high priority scheduling Jan 06 11:35:39 arch-pc gnome-shell[7794]: Device '/dev/dri/card1' prefers shadow buffer Jan 06 11:35:39 arch-pc gnome-shell[7794]: Added device '/dev/dri/card1' (amdgpu) using atomic mode setting. Jan 06 11:35:39 arch-pc gnome-shell[7794]: Device '/dev/dri/card0' prefers shadow buffer Jan 06 11:35:39 arch-pc gnome-shell[7794]: Added device '/dev/dri/card0' (amdgpu) using atomic mode setting. Jan 06 11:35:39 arch-pc gnome-shell[7794]: Created gbm renderer for '/dev/dri/card1' Jan 06 11:35:39 arch-pc gnome-shell[7794]: Created gbm renderer for '/dev/dri/card0' Jan 06 11:35:39 arch-pc gnome-shell[7794]: Boot VGA GPU /dev/dri/card1 selected as primary Jan 06 11:35:39 arch-pc gnome-shell[7794]: Obtained a high priority EGL context Jan 06 11:35:39 arch-pc /usr/lib/gdm-wayland-session[7781]: dbus-daemon[7781]: [session uid=120 pid=7781 pidfd=5] Activating service name='org.a11y.Bus' requested by ':1.4' (uid=120 pid=7794 comm="/usr/bin/gnome-shell") Jan 06 11:35:39 arch-pc /usr/lib/gdm-wayland-session[7781]: dbus-daemon[7781]: [session uid=120 pid=7781 pidfd=5] Successfully activated service 'org.a11y.Bus' Jan 06 11:35:39 arch-pc gnome-shell[7794]: Using public X11 display :1024, (using :1025 for managed services) Jan 06 11:35:39 arch-pc gnome-shell[7794]: Using Wayland display name 'wayland-0' Jan 06 11:35:39 arch-pc /usr/lib/gdm-wayland-session[7862]: dbus-daemon[7862]: Activating service name='org.a11y.atspi.Registry' requested by ':1.0' (uid=120 pid=7794 comm="/usr/bin/gnome-shell") Jan 06 11:35:39 arch-pc /usr/lib/gdm-wayland-session[7862]: dbus-daemon[7862]: Successfully activated service 'org.a11y.atspi.Registry' Jan 06 11:35:39 arch-pc /usr/lib/gdm-wayland-session[7865]: SpiRegistry daemon is running with well-known name - org.a11y.atspi.Registry Jan 06 11:35:39 arch-pc gnome-shell[7794]: Unset XDG_SESSION_ID, getCurrentSessionProxy() called outside a user 
session. Asking logind directly. Jan 06 11:35:39 arch-pc gnome-shell[7794]: Will monitor session 5 Jan 06 11:35:39 arch-pc systemd[1]: Starting Locale Service... Jan 06 11:35:39 arch-pc /usr/lib/gdm-wayland-session[7781]: dbus-daemon[7781]: [session uid=120 pid=7781 pidfd=5] Activating service name='org.gnome.Shell.Screencast' requested by ':1.3' (uid=120 pid=7794 comm="/usr/bin/gnome-shell") Jan 06 11:35:39 arch-pc /usr/lib/gdm-wayland-session[7781]: dbus-daemon[7781]: [session uid=120 pid=7781 pidfd=5] Activating service name='org.freedesktop.impl.portal.PermissionStore' requested by ':1.3' (uid=120 pid=7794 comm="/usr/bin/gnome-shell") Jan 06 11:35:39 arch-pc /usr/lib/gdm-wayland-session[7781]: dbus-daemon[7781]: [session uid=120 pid=7781 pidfd=5] Successfully activated service 'org.freedesktop.impl.portal.PermissionStore' Jan 06 11:35:39 arch-pc /usr/lib/gdm-wayland-session[7781]: dbus-daemon[7781]: [session uid=120 pid=7781 pidfd=5] Activating service name='org.gnome.Shell.Notifications' requested by ':1.3' (uid=120 pid=7794 comm="/usr/bin/gnome-shell") Jan 06 11:35:39 arch-pc systemd[1]: Started Locale Service. Jan 06 11:35:39 arch-pc org.gnome.Shell.desktop[7794]: Window manager warning: Failed to parse saved session file: Failed to open file “/var/lib/gdm/.config/mutter/sessions/10569fa8b512a31f8b173618853922793800000077820000.ms”: No such file or directory Jan 06 11:35:39 arch-pc gnome-shell[7794]: Failed to launch ibus-daemon: Failed to execute child process “ibus-daemon” (No such file or directory) Jan 06 11:35:39 arch-pc gnome-shell[7794]: Error looking up permission: GDBus.Error:org.freedesktop.portal.Error.NotFound: No entry for geolocation Jan 06 11:35:39 arch-pc systemd[1]: Starting Hostname Service... 
Jan 06 11:35:39 arch-pc /usr/lib/gdm-wayland-session[7781]: dbus-daemon[7781]: [session uid=120 pid=7781 pidfd=5] Activating service name='org.freedesktop.systemd1' requested by ':1.20' (uid=120 pid=7981 comm="/usr/lib/gsd-sharing") Jan 06 11:35:39 arch-pc /usr/lib/gdm-wayland-session[7781]: dbus-daemon[7781]: [session uid=120 pid=7781 pidfd=5] Activated service 'org.freedesktop.systemd1' failed: Process org.freedesktop.systemd1 exited with status 1 Jan 06 11:35:39 arch-pc /usr/lib/gdm-wayland-session[7781]: dbus-daemon[7781]: [session uid=120 pid=7781 pidfd=5] Successfully activated service 'org.gnome.Shell.Notifications' Jan 06 11:35:39 arch-pc gsd-sharing[7981]: Failed to StopUnit service: GDBus.Error:org.freedesktop.DBus.Error.Spawn.ChildExited: Process org.freedesktop.systemd1 exited with status 1 Jan 06 11:35:39 arch-pc gsd-sharing[7981]: Failed to StopUnit service: GDBus.Error:org.freedesktop.DBus.Error.Spawn.ChildExited: Process org.freedesktop.systemd1 exited with status 1 Jan 06 11:35:39 arch-pc gnome-shell[7794]: Failed to create color profile from colord profile: Error opening file /home/fred/.local/share/icc/edid-3677f87b350ebb514504413952b427ce.icc: Permission denied Jan 06 11:35:39 arch-pc gnome-shell[7794]: No permission to control network connections: Polkit.Error: GDBus.Error:org.freedesktop.PolicyKit1.Error.Failed: Action org.freedesktop.NetworkManager.network-control is not registered Jan 06 11:35:39 arch-pc systemd[1]: Starting Location Lookup Service... Jan 06 11:35:39 arch-pc systemd[7755]: Started PipeWire Multimedia Service. Jan 06 11:35:39 arch-pc systemd[7755]: Started Multimedia Service Session Manager. Jan 06 11:35:39 arch-pc systemd[7755]: Started PipeWire PulseAudio.
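Most of a journal dump like the one above is unrelated noise (syncthing, systemd user session startup). A minimal sketch for cutting it down to the GPU errors, assuming the input has one journal entry per line as `journalctl` prints it; `amdgpu_errors` is just an illustrative name:

```shell
# Keep only kernel lines that mention amdgpu or drm and carry an *ERROR* tag.
# Reads journal text on stdin, one entry per line.
# Usage: journalctl -k -b -1 --no-pager | amdgpu_errors
amdgpu_errors() {
    grep -E 'kernel: .*(amdgpu|drm).*\*ERROR\*'
}
```

On the log above this would keep the `flip_done timed out` and `commit wait timed out` lines. The kernel WARNING and call trace do not carry an `*ERROR*` tag, so they would need a separate search.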
@beholder, could you try the mentioned builds on your configuration? You've mentioned your problem in https://bbs.archlinux.org/viewtopic.php … 1#p2218651 comment. And I can say that it has the same root cause.
Last edited by Mechanicus (2025-02-01 09:42:11)
Offline
Bad news: linux-ring-test 6.13.arch1-1 froze during a video meeting with firefox, okular and dolphin as the only open windows.
No logs this time, just the 'amdgpu: Dumping IP State'. I forgot to enable amdgpu logs.
When the meeting ends I'll test Linux 6.13.0-arch1-2-amdgpu
Last edited by pacoandres (2025-02-01 09:54:11)
Offline
Bad news: linux-ring-test 6.13.arch1-1 froze during a video meeting with firefox, okular and dolphin as the only open windows.
No logs this time, I forgot to enable amdgpu logs.
When the meeting ends I'll test Linux 6.13.0-arch1-2-amdgpu
Well... That means there are fewer options available.
Last edited by Mechanicus (2025-02-01 09:57:47)
Offline
Based on my confused understanding trying to follow all this, the patch that fixes freezes on 'Mesa chipset detection*: raven' hardware has been merged. Therefore the AUR 'mesa-git' package has this fix.
Implying 'mesa-git' may have the patch applied that Lone_Wolf used in his latest patched mesa.
I git cloned the mesa source from: https://gitlab.freedesktop.org/mesa/mesa.git
All fresh mesa trunk builds will have the raven/raven2 changes, but with mesa 25.0 branched off and mesa trunk now at 25.1, the differences between mesa 25.0.x and mesa 25.1 trunk will grow quickly.
A better approach would be to adjust mesa-git to build the 25.0 branch instead of main.
Replace lines 93-96 of the mesa-git PKGBUILD with
source=(
'mesa::git+https://gitlab.freedesktop.org/mesa/mesa.git#branch=25.0'
'LICENSE'
)
to achieve that.
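To double-check the edit before building, the branch can be read back out of the source entry. This is only a sketch, assuming the entry uses makepkg's `#branch=` fragment syntax exactly as in the array above; `source_branch` is a hypothetical helper, not part of makepkg:

```shell
# Print the branch named in a makepkg VCS source entry, or "(default)" when
# the entry carries no #branch= fragment.
# Usage: source_branch 'name::git+URL#branch=NAME'
source_branch() {
    case "$1" in
        # Strip everything up to and including the last "#branch=".
        *'#branch='*) printf '%s\n' "${1##*#branch=}" ;;
        *) printf '%s\n' '(default)' ;;
    esac
}
```

For the entry above it prints `25.0`. An entry without a fragment prints `(default)`, meaning makepkg would build whatever the repository's default branch is.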
Disliking systemd intensely, but not satisfied with alternatives so focusing on taming systemd.
clean chroot building not flexible enough ?
Try clean chroot manager by graysky
Offline
I've also been getting those freezes on a Ryzen 5 PRO 2400G for months. Using the zen kernel seems to make the freezes less frequent, but it hasn't solved the problem.
I understood from this thread that this is a major issue... but I haven't understood whether there is an identified workaround or a fix in the release pipeline.
Is this only affecting Arch systems as far as you know? I'm considering what to do, since working with random freezes is becoming frustrating... I don't know whether to wait for a fix, change distro, or replace the PC entirely.
Offline
Getting the freezes with my 7700 XT, but not with my 6700 XT or RX 480. Seems to be exacerbated by Beszel probing my GPU with rocm-smi every 4.3 seconds for graphs.
Adding `amdgpu.ppfeaturemask=0xfff73fff` to mask GFXOFF seemed to help, but now I am trying the linux-amdgpu 6.13-arch1-2.
So far, the only problems I've had with this kernel:
1) /dev/dri identifies my dGPU as `card1` and not `card0`.
2) My iGPU is most definitely disabled.
3) Running `sudo cat /sys/kernel/debug/dri/1/amdgpu_gpu_recover` causes a total reboot.
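On the `amdgpu.ppfeaturemask` value mentioned above: it appears to be the driver default with a single bit cleared. A quick sketch of the arithmetic, assuming the default mask is 0xfff7bfff and that GFXOFF is bit 15 (0x8000, `PP_GFXOFF_MASK` in the kernel's amd_shared.h); both are assumptions to verify against your own kernel. `clear_feature_bits` is a made-up helper name:

```shell
# Clear the bits given in CLEAR from MASK and print the result as 0x-hex.
# Both arguments may be decimal or 0x-prefixed hex.
# Usage: clear_feature_bits MASK CLEAR
clear_feature_bits() {
    printf '0x%x\n' $(( $1 & ~$2 ))
}
```

`clear_feature_bits 0xfff7bfff 0x8000` gives the mask used above. On a running system the current mask can be read from /sys/module/amdgpu/parameters/ppfeaturemask and passed in as the first argument instead of the assumed default.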
Offline
I've also been getting those freezes on a Ryzen 5 PRO 2400G for months. Using the zen kernel seems to make the freezes less frequent, but it hasn't solved the problem.
I understood from this thread that this is a major issue... but I haven't understood whether there is an identified workaround or a fix in the release pipeline.
Is this only affecting Arch systems as far as you know? I'm considering what to do, since working with random freezes is becoming frustrating... I don't know whether to wait for a fix, change distro, or replace the PC entirely.
As far as I understand, this is mainly affecting Arch systems because they have the latest updates, but until this issue is fixed it could spread to other distributions as they update Mesa to the latest versions.
That's what I've understood; I'm not sure it's really so.
Last edited by pacoandres (2025-02-01 11:37:39)
Offline
Hi. I have AMD Ryzen 3 2200G (Vega 8, Raven Ridge, Zen/GCN5). I have been experiencing freezes since 2024-12-16. Program versions:
linux 6.12.10.arch1-1
linux-headers 6.12.10.arch1-1
linux-firmware 20250109.7673dffd-1
With Mesa version 1:24.3.4-1 from the official repositories, I got a freeze within a couple of hours. After installing @Lone_Wolf's Mesa package on 2025-01-26
mesa-test-git 25.0.0_devel.200729.61e289d0ca0-1
no freezes. How can I help?
I understood from this thread that this is a major issue... but I haven't understood whether there is an identified workaround or a fix in the release pipeline.
@Lone_Wolf's Mesa packages are a pretty solid workaround, at least for me. Search for links to them in this thread.
we are not condemned to write ugly code
Offline
Found another issue with the linux-amdgpu 6.13-arch1-2, which is probably just a hardware issue in disguise.
I had to reboot my system twice to get the boot menu to load up, since it's defaulting by alphabetic sorting to a different kernel. After rebooting back to linux-amdgpu, it failed to initialize my motherboard's integrated Bluetooth radio, which is in a Mediatek M.2 mini card.
Feb 01 03:15:29 copycat kernel: usb usb3-port6: attempt power cycle
Feb 01 03:15:30 copycat kernel: usb usb3-port6: unable to enumerate USB device
Offline
Found another issue with the linux-amdgpu 6.13-arch1-2, which is probably just a hardware issue in disguise.
I had to reboot my system twice to get the boot menu to load up, since it's defaulting by alphabetic sorting to a different kernel. After rebooting back to linux-amdgpu, it failed to initialize my motherboard's integrated Bluetooth radio, which is in a Mediatek M.2 mini card.
Feb 01 03:15:29 copycat kernel: usb usb3-port6: attempt power cycle
Feb 01 03:15:30 copycat kernel: usb usb3-port6: unable to enumerate USB device
Interesting... It might be a Linux 6.13 issue. Let's focus on the amdgpu driver for now.
Offline
Horo86 wrote: I've also been getting those freezes on a Ryzen 5 PRO 2400G for months. Using the zen kernel seems to make the freezes less frequent, but it didn't solve the problem.
I understood from this thread that this is a major issue... but I haven't understood whether there is any identified workaround or any fix in the release pipeline. Is this only affecting Arch systems as far as you know? I'm considering what to do, since working with random freezes is becoming frustrating... I don't know whether to wait for a fix, change distro, or replace the PC completely.
As far as I understand, this mainly affects Arch systems because they have the latest updates, but until this issue gets fixed it could propagate to other distributions as they update Mesa to the latest versions.
That's what I've understood; I'm not sure it's right.
There are reports (some even in this thread) from SUSE and Void Linux users who have exactly the same issues.
Mesa 24.3.x has severe issues on some AMD GPU chipsets, especially Raven and Raven2.
There are strong signs that Mesa 24.3.x exposed a kernel bug that has been present for much longer.
AMD Mesa and kernel devs are working on figuring this out with the help of the posters in this thread.
A mitigation landed in Mesa trunk about a week ago and will be in Mesa 25.0, which is at release candidate 0 now and should be released as stable in a few weeks.
Currently there are 2 workarounds:
- Downgrade to Mesa 24.2.8 using this 24.2.8 binary
- Switch to a Mesa 25.0 build with the fix, like 25.0 binary
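To check which side of those two workarounds an installed package falls on, here is a minimal sketch that parses an Arch-style Mesa version string and tests whether it belongs to the 24.3.x series. The function names are made up for illustration, and the "24.3.x" rule only encodes what is reported in this thread; it is not an official list of affected releases.

```python
import re


def parse_mesa_version(pkgver: str) -> tuple[int, int, int]:
    """Extract (major, minor, patch) from an Arch-style version string.

    Handles an optional epoch ("1:") and trailing suffixes such as
    "-1" or "_devel.200729.61e289d0ca0-1".
    """
    without_epoch = pkgver.split(":", 1)[-1]
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", without_epoch)
    if match is None:
        raise ValueError(f"unrecognised Mesa version: {pkgver!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return (major, minor, patch)


def in_reported_series(pkgver: str) -> bool:
    """True if the version is in the 24.3.x series named in this thread.

    Assumption: only the series number is checked; whether a given
    build actually freezes depends on hardware and kernel as well.
    """
    major, minor, _ = parse_mesa_version(pkgver)
    return (major, minor) == (24, 3)


if __name__ == "__main__":
    for version in ("1:24.3.4-1", "24.2.8", "25.0.0_devel.200729.61e289d0ca0-1"):
        print(version, in_reported_series(version))
```

The version strings in the loop are the ones quoted in this thread; feed it whatever your package manager reports for mesa.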
Disliking systemd intensely, but not satisfied with alternatives so focusing on taming systemd.
clean chroot building not flexible enough ?
Try clean chroot manager by graysky
Offline
linux-amdgpu-6.13-arch1-2 also freezes my system.
Checks with glxgears, vkcube, and 'cat recover' worked well, but after the checks, when I tried to look at dmesg, the system froze.
This time I could get into a console after a while, but the desktop (KDE Plasma on Wayland) was completely frozen.
This is the dmesg output. Up to 230.001866 the logs relate to 'cat recover'; after that they relate to the freeze (sorry, I again forgot to enable verbose logging. I'll try to reproduce with it enabled).
https://pastebin.com/raw/igjyLk9B
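For anyone who wants to look only at the part of such a log after a given timestamp, a rough sketch follows. It assumes the usual "[  seconds.micros] message" dmesg prefix; the function name is hypothetical and the two sample lines are invented placeholders, not taken from the paste.

```python
import re

# Matches the leading "[  230.001866]" style timestamp of a dmesg line.
TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]")


def lines_after(dmesg_text: str, cutoff: float) -> list[str]:
    """Return the dmesg lines whose timestamp is strictly greater than cutoff.

    Lines without a recognisable timestamp are skipped.
    """
    kept = []
    for line in dmesg_text.splitlines():
        match = TIMESTAMP.match(line)
        if match and float(match.group(1)) > cutoff:
            kept.append(line)
    return kept


if __name__ == "__main__":
    # Invented sample text, only to show the call shape.
    sample = "[  229.500000] placeholder line A\n[  231.000000] placeholder line B"
    for line in lines_after(sample, 230.001866):
        print(line)
```

With a saved copy of the log, something like lines_after(open("dmesg.txt").read(), 230.001866) would keep only the entries after the 'cat recover' test.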
Last edited by pacoandres (2025-02-01 12:36:47)
Offline
The Mesa 25.0 / git workaround I saw in the MRs list affects gfx9 / Raven only. Should I bother installing mesa-git anyway? The Mesa 24.3.4 upgrade did coincide with my issues.
However, I also have a Docker container that runs an Ubuntu 24.10 based image, running the Wolf from Games on Whales container, which apparently idles on the GPU in the background when there are no clients connected. It presumably is using the latest Mesa from Ubuntu 24.10 as of when I built the image, which is 24.2.8.
I've experienced the freezes when nothing is running on the GPU except for Wolf, which is running on the aforementioned mesa 24.2.8. However, there was another extenuating issue with that, in that I had Beszel probing my GPU sensors every 4.3 seconds by repeatedly running the rocm-smi script. This script would then cause lockups after a while.
Offline
linux-amdgpu-6.13-arch1-2 also freezes my system.
Checks with glxgears, vkcube, and 'cat recover' worked well, but after the checks, when I tried to look at dmesg, the system froze.
This time I could get into a console after a while, but the desktop (KDE Plasma on Wayland) was completely frozen.
This is the dmesg output. Up to 230.001866 the logs relate to 'cat recover'; after that they relate to the freeze (sorry, I again forgot to enable verbose logging. I'll try to reproduce with it enabled).
https://pastebin.com/raw/igjyLk9B
Thank you for the detailed answer. Let's check stability without amdgpu_gpu_reset. This functionality seems to be broken in a different place.
Offline
AMDGPU
Build: linux-ring-recovery-6.13.arch1-1 - freezes.
Included patches:
- Extend amdgpu_ring_soft_recovery function with PG control
What to check:
- sudo cat /sys/kernel/debug/dri/1/amdgpu_gpu_recover
- idle stability
- workflow stability
- glxgears window resizing
- vkcube window resizing
Kernel option to keep during testing period: fsck.mode=force
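The first check above hard-codes DRI index 1, which may differ between machines. As a hedged sketch, the helper below only locates candidate amdgpu_gpu_recover files under the debugfs DRI directory so you know which index to use. The function name is made up here; the helper does not read the file, since per the checklist that is the action to run manually with sudo.

```python
from pathlib import Path


def find_recover_files(debug_root: str = "/sys/kernel/debug/dri") -> list[Path]:
    """List every <debug_root>/<index>/amdgpu_gpu_recover path, sorted.

    debugfs is normally accessible to root only, which is why the
    checklist uses sudo; run this with the same privileges.
    """
    root = Path(debug_root)
    return sorted(root.glob("*/amdgpu_gpu_recover"))


if __name__ == "__main__":
    for path in find_recover_files():
        print(path)
```

Each printed path can then be used in place of /sys/kernel/debug/dri/1/amdgpu_gpu_recover in the first check.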
Last edited by Mechanicus (2025-02-02 10:49:44)
Offline
Mechanicus, I'm now testing patches proposed by Alex Deucher on GitLab. So I suppose my kernel is now equivalent to your linux-amdgpu 6.13.arch1-2 to some degree?
Can you try the following and replicate a crash?
amdgpu.ppfeaturemask=0xfff77fff
This feature mask, namely PP_GFX_DCS_MASK, had no effect on our problem.
Offline
The only thing that prevents my system from freezing: amdgpu.ppfeaturemask=0xf7fff
I have yet to experience the freeze with the amdgpu.ppfeaturemask=0xfff73fff you suggested. I will keep running it, but after about 2 days it seems fine.
Temps are the same as before, 30-39 °C (when the stock AMD cooler kicks in with the default curve), in normal light workloads (Blender, Firefox etc.).
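Since three different ppfeaturemask values are floating around in these posts, it may help to see which bits each one actually clears. This is plain bit arithmetic, assuming the value is treated as a 32-bit mask; the helper name is made up, and the mapping from bit positions to named features (such as the PP_GFX_DCS_MASK mentioned above) lives in the kernel's amdgpu headers and is not reproduced here.

```python
def cleared_bits(mask: int) -> list[int]:
    """Return positions (0 = least significant) of bits that are 0 in a 32-bit mask."""
    return [bit for bit in range(32) if not (mask >> bit) & 1]


if __name__ == "__main__":
    # The three values quoted in this thread.
    for value in (0xFFF77FFF, 0xFFF73FFF, 0xF7FFF):
        print(f"{value:#010x}: cleared bits {cleared_bits(value)}")
```

Note that 0xf7fff is a shorter value than the other two, so all of its upper bits (20 to 31) count as cleared, while bit 19 stays set. The masks are therefore not simple variations of each other.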
Offline
Mechanicus, I'm now testing patches proposed by Alex Deucher on GitLab. So I suppose my kernel is now equivalent to your linux-amdgpu 6.13.arch1-2 to some degree?
Correct. The differences are that my changes affect all chips and put the GPU into power-efficient mode faster.
Last edited by Mechanicus (2025-02-01 17:27:35)
Offline