
#376 2025-01-31 20:10:30

NuSkool
Member
Registered: 2015-03-23
Posts: 205

Re: Issues with Mesa 24.3.x and amdgpu Vega graphics

Based on my confused understanding trying to follow all this, the patch that fixes freezes on 'Mesa chipset detection*: raven' hardware has been merged. Therefore the AUR 'mesa-git' package has this fix.

That implies 'mesa-git' may already include the patch that Lone_Wolf used in his latest patched mesa.

I git cloned the mesa source from: https://gitlab.freedesktop.org/mesa/mesa.git

I used my 'git-rollback' script to display the commits because I don't really know how to use git for anything useful.
The output was huge, so I copied it to a file for grep and eventually came up with this:

commit 3b78dcec058e85321f636f353ad5c23c986e3a11
Author: Marek Olšák <maraeo@gmail.com>
Date:   Mon Jan 27 15:24:21 2025 -0500

    radeonsi: disallow compute queues on Raven/Raven2 due to hangs
    
    Fixes: 58b512ddd6e - radeonsi: execute clears at resource allocation using compute instead of gfx
    Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12310
    
    Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
    Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33248>

src/gallium/drivers/radeonsi/si_pipe.c

And the code applied to si_pipe.c (I've also verified that this code was present in the clean chroot build):

$ sed -n '518,526p;527q'  mesa/src/gallium/drivers/radeonsi/si_pipe.c

   sctx->has_graphics = sscreen->info.gfx_level == GFX6 ||
                        /* Compute queues hang on Raven and derivatives, see:
                         * https://gitlab.freedesktop.org/mesa/mesa/-/issues/12310 */
                        ((sscreen->info.family == CHIP_RAVEN ||
                          sscreen->info.family == CHIP_RAVEN2) &&
                         !sscreen->info.has_dedicated_vram) ||
                        !(flags & PIPE_CONTEXT_COMPUTE_ONLY);
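
To check whether a given checkout already contains that commit without reading through the whole log, git can answer the question directly. A minimal sketch, assuming the clone lives in a directory called mesa as above; the has_commit helper name is made up for this example:

```sh
# has_commit DIR COMMIT
# Succeeds (exit status 0) when COMMIT is reachable from HEAD of the
# git checkout in DIR, i.e. the checked-out tree includes that commit.
has_commit() {
    git -C "$1" merge-base --is-ancestor "$2" HEAD
}

# Example, assuming the clone is in ./mesa:
#   has_commit mesa 3b78dcec058e85321f636f353ad5c23c986e3a11 && echo "fix present"
#
# To search commit messages instead of dumping the whole log:
#   git -C mesa log --oneline -i --grep='raven' -- src/gallium/drivers/radeonsi/si_pipe.c
```

Note that this only matches the exact commit id. A release branch that received the fix as a cherry-pick carries it under a different id, so there the message search or a look at si_pipe.c itself is the more reliable check.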

Last night I built 'mesa-git' in a clean chroot and installed it.
It passed the initial freeze test, including overnight playback of an 8-hour 4K video in the browser.

If this is correct, the 'mesa-git' AUR package may be an alternative for users who need a temporary workaround until a fix is available in the official repos.

Any feedback regarding linux-amdgpu 6.13.arch1-2?

I can test this in a few hours...

Last edited by NuSkool (2025-01-31 20:21:44)

Offline

#377 2025-01-31 21:06:24

flemingfleming
Member
Registered: 2024-12-27
Posts: 14

Re: Issues with Mesa 24.3.x and amdgpu Vega graphics

linux-amdgpu 6.13.arch1-4 freezes. No messages in the log, though I didn't apply the suggested debug parameter from earlier. I'll try linux-amdgpu 6.13.arch1-2 next.

Offline

#378 2025-01-31 22:01:23

NuSkool
Member
Registered: 2015-03-23
Posts: 205

Re: Issues with Mesa 24.3.x and amdgpu Vega graphics

Mechanicus wrote:
pacoandres wrote:

I've been reading documentation but I can't find a way to force verbose logs for amdgpu. Does any one know if it's possible?

Try

echo 0xf > /sys/module/drm/parameters/debug

I had to use this due to permission issues:

echo '0xf' | sudo tee /sys/module/drm/parameters/debug

That results in this though. Is this expected?

sudo cat /sys/module/drm/parameters/debug
15

Before changing it, it had a single zero, '0'.

I can edit the file to contain just this '0xf' if necessary...

I'm currently testing: linux-amdgpu 6.13.arch1-2 with repo mesa.

Last edited by NuSkool (2025-01-31 22:07:10)

Offline

#379 2025-01-31 22:09:35

Mechanicus
Member
Registered: 2025-01-13
Posts: 95

Re: Issues with Mesa 24.3.x and amdgpu Vega graphics

NuSkool wrote:

That results in this though. Is this expected?

sudo cat /sys/module/drm/parameters/debug
15

Yes. 0xf in hexadecimal is 15 decimal.
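
The kernel prints the parameter back in decimal, so writing 0xf and reading 15 is the same value. Each bit of that value turns on one category of DRM debug messages. A small sketch that decodes the lowest four bits; the category names (core, driver, kms, prime) are assumed from the kernel's drm.debug parameter description and are worth checking against your kernel version, and decode_drm_debug is a made-up helper name:

```sh
# decode_drm_debug MASK
# Print MASK in decimal followed by the names of the low four debug
# categories it enables. MASK may be given as hex (0xf) or decimal (15).
# Category names are an assumption; verify them for your kernel.
decode_drm_debug() {
    mask=$(( $1 ))
    out=""
    bit=1
    for name in core driver kms prime; do
        if [ $(( mask & bit )) -ne 0 ]; then
            out="${out:+$out }$name"
        fi
        bit=$(( bit * 2 ))
    done
    printf '%d: %s\n' "$mask" "$out"
}

decode_drm_debug 0xf   # prints: 15: core driver kms prime
```

To set the mask at boot instead of at runtime, drm.debug=0xf can be added to the kernel command line.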

Offline

#380 2025-01-31 22:36:40

NuSkool
Member
Registered: 2015-03-23
Posts: 205

Re: Issues with Mesa 24.3.x and amdgpu Vega graphics

Preliminary test results for:
linux-amdgpu 6.13.arch1-2    uname -rs: Linux 6.13.0-arch1-2-amdgpu
mesa 1:24.3.4-1

cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-linux-amdgpu root=UUID=60bc1026-da96-43b5-8963-eda5d63b8049 rw loglevel=3 sysrq_always_enabled=1 amd_pstate=passive fsck.mode=force

dmesg: http://0x0.st/88Gu.txt

OK - idle stability
OK - workflow stability
OK - glxgears window resizing
OK - vkcube window resizing

I ran a 4K fullscreen video with glxgears and vkcube running at the same time, while resizing their windows...

I'll run this setup for 24 hours or until there is anything further to report.
Maybe this kernel is a solution you're looking for or the right direction for further work?

Last edited by NuSkool (2025-01-31 22:46:26)

Offline

#381 2025-02-01 03:22:59

flemingfleming
Member
Registered: 2024-12-27
Posts: 14

Re: Issues with Mesa 24.3.x and amdgpu Vega graphics

linux-amdgpu 6.13.arch1-2 seems to be working so far! I haven't been able to cause the freeze. Performance and GPU temperature look normal, and the test applications glxgears and vkcube resize fine. I'll keep running this and see if it stays working.

(Ryzen 5 2500U (Raven Ridge))

Offline

#382 2025-02-01 09:30:24

Mechanicus
Member
Registered: 2025-01-13
Posts: 95

Re: Issues with Mesa 24.3.x and amdgpu Vega graphics

NuSkool wrote:

Maybe this kernel is a solution you're looking for or the right direction for further work?

At least this is an option. This build is an adaptation of the change that an AMD engineer is going to merge into the kernel. The differences are:
- the powergate state change for the compute unit is applied to all AMD GPUs, not limited to Raven;
- the gfx_off timeout is removed, so the GPU enters its power-efficient mode faster, which should reduce overall power consumption.

But I still think there might be a better option to fix the instability. Let's see if the linux-ring-test 6.13.arch1-1 approach also works.

Offline

#383 2025-02-01 09:39:59

Mechanicus
Member
Registered: 2025-01-13
Posts: 95

Re: Issues with Mesa 24.3.x and amdgpu Vega graphics

beholder wrote:

I'm also getting those freezes. A workaround that works for me when it happens is this:

- press CTRL+ALT+1 and then CTRL+ALT+2 repeatedly for a couple of minutes until the screen comes back on. On GNOME, CTRL+ALT+2 should bring you back to the desktop without needing to re-enter the password.

my system

- gpu: AMD 7900 XTX
- cpu: 7800X3D

uname -a
Linux arch-pc 6.12.8-arch1-1 #1 SMP PREEMPT_DYNAMIC Thu, 02 Jan 2025 22:52:26 +0000 x86_64 GNU/Linux

mesa 1:24.3.3-1

Jan 06 11:24:24 arch-pc syncthing[1825]: [4FNSN] INFO: Established secure connection to REZZLIU at 10.10.10.5:22000-10.10.10.30:22000/tcp-client/TLS1.3-TLS_AES_128_GCM_SHA256/LAN-P10-60O5MKVCC0MJ28OCCCBKQC57BK
Jan 06 11:24:24 arch-pc syncthing[1825]: [4FNSN] INFO: Device REZZLIU client is "syncthing v1.27.3" named "Pixel 6a" at 10.10.10.5:22000-10.10.10.30:22000/tcp-client/TLS1.3-TLS_AES_128_GCM_SHA256/LAN-P10-60O5MKVCC0MJ28OCCCBKQC57BK
Jan 06 11:28:19 arch-pc rtkit-daemon[1410]: Supervising 10 threads of 7 processes of 1 users.
Jan 06 11:28:19 arch-pc rtkit-daemon[1410]: Supervising 10 threads of 7 processes of 1 users.
Jan 06 11:28:21 arch-pc rtkit-daemon[1410]: Supervising 10 threads of 7 processes of 1 users.
Jan 06 11:28:21 arch-pc rtkit-daemon[1410]: Supervising 10 threads of 7 processes of 1 users.
Jan 06 11:29:58 arch-pc kernel: amdgpu 0000:03:00.0: [drm] *ERROR* [CRTC:79:crtc-0] flip_done timed out
Jan 06 11:31:55 arch-pc syncthing[1825]: [4FNSN] INFO: Lost primary connection to REZZLIU at 10.10.10.5:22000-10.10.10.30:22000/tcp-client/TLS1.3-TLS_AES_128_GCM_SHA256/LAN-P10-60O5MKVCC0MJ28OCCCBKQC57BK: read timeout (0 remain)
Jan 06 11:31:55 arch-pc syncthing[1825]: [4FNSN] INFO: Connection to REZZLIU at 10.10.10.5:22000-10.10.10.30:22000/tcp-client/TLS1.3-TLS_AES_128_GCM_SHA256/LAN-P10-60O5MKVCC0MJ28OCCCBKQC57BK closed: read timeout
Jan 06 11:31:57 arch-pc syncthing[1825]: [4FNSN] INFO: Established secure connection to REZZLIU at 10.10.10.5:22000-10.10.10.30:22000/quic-server/TLS1.3-TLS_AES_128_GCM_SHA256/LAN-P20-60O5O9OOK7OAU5E8G37OS9DNBU
Jan 06 11:31:57 arch-pc syncthing[1825]: [4FNSN] INFO: Device REZZLIU client is "syncthing v1.27.3" named "Pixel 6a" at 10.10.10.5:22000-10.10.10.30:22000/quic-server/TLS1.3-TLS_AES_128_GCM_SHA256/LAN-P20-60O5O9OOK7OAU5E8G37OS9DNBU
Jan 06 11:32:22 arch-pc syncthing[1825]: [4FNSN] INFO: Established secure connection to REZZLIU at 10.10.10.5:22000-10.10.10.30:22000/tcp-client/TLS1.3-TLS_AES_128_GCM_SHA256/LAN-P10-60O5OCMCEI48QBG9FDKOAD4O6E
Jan 06 11:32:22 arch-pc syncthing[1825]: [4FNSN] INFO: Additional connection (+1) for device REZZLIU at 10.10.10.5:22000-10.10.10.30:22000/tcp-client/TLS1.3-TLS_AES_128_GCM_SHA256/LAN-P10-60O5OCMCEI48QBG9FDKOAD4O6E
Jan 06 11:32:22 arch-pc syncthing[1825]: [4FNSN] INFO: Lost primary connection to REZZLIU at 10.10.10.5:22000-10.10.10.30:22000/quic-server/TLS1.3-TLS_AES_128_GCM_SHA256/LAN-P20-60O5O9OOK7OAU5E8G37OS9DNBU: replacing connection (1 remain)
Jan 06 11:35:28 arch-pc kernel: amdgpu 0000:03:00.0: [drm] *ERROR* flip_done timed out
Jan 06 11:35:28 arch-pc kernel: amdgpu 0000:03:00.0: [drm] *ERROR* [CRTC:79:crtc-0] commit wait timed out
Jan 06 11:35:38 arch-pc kernel: amdgpu 0000:03:00.0: [drm] *ERROR* flip_done timed out
Jan 06 11:35:38 arch-pc kernel: amdgpu 0000:03:00.0: [drm] *ERROR* [PLANE:76:plane-6] commit wait timed out
Jan 06 11:35:38 arch-pc kernel: ------------[ cut here ]------------
Jan 06 11:35:38 arch-pc kernel: WARNING: CPU: 0 PID: 1179 at drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:8622 amdgpu_dm_atomic_commit_tail+0x3b4f/0x3c30 [amdgpu]
Jan 06 11:35:38 arch-pc kernel: Modules linked in: snd_seq_dummy snd_hrtimer snd_seq amd_atl intel_rapl_msr intel_rapl_common amdgpu ext4 mbcache vfat jbd2 fat snd_hda_codec_hdmi mt7921e snd_hda_intel snd_intel_dspcfg mt7921_common amdxcp snd_intel_sdw_acpi btusb drm_exec snd_usb_audio mt792x_lib uvcvideo btrtl gpu_sched snd_hda_codec mt76_connac_lib snd_usbmidi_lib videobuf2_vmalloc drm_buddy kvm_amd btintel uvc snd_ump snd_hda_core mt76 btbcm videobuf2_memops i2c_algo_bit snd_rawmidi spd5118 drm_suballoc_helper videobuf2_v4l2 snd_hwdep snd_seq_device btmtk drm_ttm_helper kvm mac80211 atlantic ttm snd_pcm videobuf2_common bluetooth drm_display_helper snd_timer rapl macsec videodev libarc4 wmi_bmof pcspkr snd i2c_piix4 ptp cec k10temp i2c_smbus mc pl2303 soundcore crc16 pps_core cfg80211 mousedev gpio_amdpt joydev gpio_generic rfkill mac_hid loop nfnetlink zram 842_decompress 842_compress lz4hc_compress lz4_compress ip_tables x_tables dm_crypt cbc encrypted_keys trusted asn1_encoder tee crct10dif_pclmul crc32_pclmul polyval_clmulni
Jan 06 11:35:38 arch-pc kernel:  polyval_generic ghash_clmulni_intel hid_generic sha512_ssse3 sha256_ssse3 sha1_ssse3 aesni_intel gf128mul nvme crypto_simd cryptd ccp usbhid sp5100_tco nvme_core nvme_auth video wmi btrfs blake2b_generic libcrc32c crc32c_generic crc32c_intel xor raid6_pq dm_mod crypto_user
Jan 06 11:35:38 arch-pc kernel: CPU: 0 UID: 0 PID: 1179 Comm: systemd-logind Not tainted 6.12.8-arch1-1 #1 099de49ddaebb26408f097c48b36e50b2c8e21c9
Jan 06 11:35:38 arch-pc kernel: Hardware name: Micro-Star International Co., Ltd. MS-7D69/MEG X670E ACE (MS-7D69), BIOS 1.90 08/10/2023
Jan 06 11:35:38 arch-pc kernel: RIP: 0010:amdgpu_dm_atomic_commit_tail+0x3b4f/0x3c30 [amdgpu]
Jan 06 11:35:38 arch-pc kernel: Code: 7c e2 e9 dc fd ff ff 49 8d 87 50 31 04 00 c6 85 38 fe ff ff 00 48 89 85 48 fe ff ff e9 d8 cb ff ff 0f 0b e9 fc f2 ff ff 0f 0b <0f> 0b e9 12 f3 ff ff 0f 0b e9 11 cc ff ff 48 c7 85 28 fe ff ff 00
Jan 06 11:35:38 arch-pc kernel: RSP: 0018:ffffae86c23a75b0 EFLAGS: 00010086
Jan 06 11:35:38 arch-pc kernel: RAX: 0000000000000001 RBX: 0000000000000286 RCX: ffff8db801082118
Jan 06 11:35:38 arch-pc kernel: RDX: 0000000000000001 RSI: 0000000000000297 RDI: ffff8db842c80178
Jan 06 11:35:38 arch-pc kernel: RBP: ffffae86c23a7800 R08: ffffae86c23a749c R09: 0000000000000000
Jan 06 11:35:38 arch-pc kernel: R10: ffffae86c23a7508 R11: ffffae86c23a750c R12: ffffae86c23a7668
Jan 06 11:35:38 arch-pc kernel: R13: 0000000000000000 R14: ffff8dba3d761000 R15: ffff8db801082000
Jan 06 11:35:38 arch-pc kernel: FS:  00007cc05b0e2900(0000) GS:ffff8dc6d8200000(0000) knlGS:0000000000000000
Jan 06 11:35:38 arch-pc kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jan 06 11:35:38 arch-pc kernel: CR2: 0000714aae6c4400 CR3: 00000001407b8000 CR4: 0000000000f50ef0
Jan 06 11:35:38 arch-pc kernel: PKRU: 55555554
Jan 06 11:35:38 arch-pc kernel: Call Trace:
Jan 06 11:35:38 arch-pc kernel:  <TASK>
Jan 06 11:35:38 arch-pc kernel:  ? amdgpu_dm_atomic_commit_tail+0x3b4f/0x3c30 [amdgpu fb97feb5a7216969a6c4e39cc61cb53691cdacb2]
Jan 06 11:35:38 arch-pc kernel:  ? __warn.cold+0x93/0xf6
Jan 06 11:35:38 arch-pc kernel:  ? amdgpu_dm_atomic_commit_tail+0x3b4f/0x3c30 [amdgpu fb97feb5a7216969a6c4e39cc61cb53691cdacb2]
Jan 06 11:35:38 arch-pc kernel:  ? report_bug+0xff/0x140
Jan 06 11:35:38 arch-pc kernel:  ? handle_bug+0x58/0x90
Jan 06 11:35:38 arch-pc kernel:  ? exc_invalid_op+0x17/0x70
Jan 06 11:35:38 arch-pc kernel:  ? asm_exc_invalid_op+0x1a/0x20
Jan 06 11:35:38 arch-pc kernel:  ? amdgpu_dm_atomic_commit_tail+0x3b4f/0x3c30 [amdgpu fb97feb5a7216969a6c4e39cc61cb53691cdacb2]
Jan 06 11:35:38 arch-pc kernel:  commit_tail+0x91/0x130
Jan 06 11:35:38 arch-pc kernel:  drm_atomic_helper_commit+0x11a/0x140
Jan 06 11:35:38 arch-pc kernel:  drm_atomic_commit+0xa6/0xe0
Jan 06 11:35:38 arch-pc kernel:  ? __pfx___drm_printfn_info+0x10/0x10
Jan 06 11:35:38 arch-pc kernel:  drm_client_modeset_commit_atomic+0x203/0x250
Jan 06 11:35:38 arch-pc kernel:  drm_client_modeset_commit_locked+0x5a/0x160
Jan 06 11:35:38 arch-pc kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Jan 06 11:35:38 arch-pc kernel:  __drm_fb_helper_restore_fbdev_mode_unlocked+0x5e/0xd0
Jan 06 11:35:38 arch-pc kernel:  drm_fb_helper_set_par+0x30/0x40
Jan 06 11:35:38 arch-pc kernel:  fb_set_var+0x25c/0x460
Jan 06 11:35:38 arch-pc kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Jan 06 11:35:38 arch-pc kernel:  ? update_load_avg+0x7e/0x7b0
Jan 06 11:35:38 arch-pc kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Jan 06 11:35:38 arch-pc kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Jan 06 11:35:38 arch-pc kernel:  ? sched_clock_cpu+0xf/0x1d0
Jan 06 11:35:38 arch-pc kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Jan 06 11:35:38 arch-pc kernel:  ? psi_group_change+0x13b/0x310
Jan 06 11:35:38 arch-pc kernel:  fbcon_blank+0x271/0x330
Jan 06 11:35:38 arch-pc kernel:  do_unblank_screen+0xad/0x150
Jan 06 11:35:38 arch-pc kernel:  complete_change_console+0x54/0x120
Jan 06 11:35:38 arch-pc kernel:  vt_ioctl+0xec3/0x12c0
Jan 06 11:35:38 arch-pc kernel:  tty_ioctl+0xe2/0x8a0
Jan 06 11:35:38 arch-pc kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Jan 06 11:35:38 arch-pc kernel:  ? __seccomp_filter+0x303/0x520
Jan 06 11:35:38 arch-pc kernel:  __x64_sys_ioctl+0x91/0xd0
Jan 06 11:35:38 arch-pc kernel:  do_syscall_64+0x82/0x190
Jan 06 11:35:38 arch-pc kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Jan 06 11:35:38 arch-pc kernel:  ? syscall_exit_to_user_mode+0x37/0x1c0
Jan 06 11:35:38 arch-pc kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Jan 06 11:35:38 arch-pc kernel:  ? do_syscall_64+0x8e/0x190
Jan 06 11:35:38 arch-pc kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Jan 06 11:35:38 arch-pc kernel:  ? evdev_ioctl+0x6f/0x90
Jan 06 11:35:38 arch-pc kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Jan 06 11:35:38 arch-pc kernel:  ? syscall_exit_to_user_mode+0x37/0x1c0
Jan 06 11:35:38 arch-pc kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Jan 06 11:35:38 arch-pc kernel:  ? do_syscall_64+0x8e/0x190
Jan 06 11:35:38 arch-pc kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Jan 06 11:35:38 arch-pc kernel:  ? do_syscall_64+0x8e/0x190
Jan 06 11:35:38 arch-pc kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Jan 06 11:35:38 arch-pc kernel:  ? do_syscall_64+0x8e/0x190
Jan 06 11:35:38 arch-pc kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Jan 06 11:35:38 arch-pc kernel:  ? do_syscall_64+0x8e/0x190
Jan 06 11:35:38 arch-pc kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Jan 06 11:35:38 arch-pc kernel:  ? do_syscall_64+0x8e/0x190
Jan 06 11:35:38 arch-pc kernel:  ? srso_alias_return_thunk+0x5/0xfbef5
Jan 06 11:35:38 arch-pc kernel:  entry_SYSCALL_64_after_hwframe+0x76/0x7e
Jan 06 11:35:38 arch-pc kernel: RIP: 0033:0x7cc05ab23ced
Jan 06 11:35:38 arch-pc kernel: Code: 04 25 28 00 00 00 48 89 45 c8 31 c0 48 8d 45 10 c7 45 b0 10 00 00 00 48 89 45 b8 48 8d 45 d0 48 89 45 c0 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1a 48 8b 45 c8 64 48 2b 04 25 28 00 00 00
Jan 06 11:35:38 arch-pc kernel: RSP: 002b:00007ffe55f13050 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Jan 06 11:35:38 arch-pc kernel: RAX: ffffffffffffffda RBX: 000000000000001f RCX: 00007cc05ab23ced
Jan 06 11:35:38 arch-pc kernel: RDX: 0000000000000001 RSI: 0000000000005605 RDI: 000000000000001f
Jan 06 11:35:38 arch-pc kernel: RBP: 00007ffe55f130a0 R08: 00007ffe55f13030 R09: 00005ce766311ed0
Jan 06 11:35:38 arch-pc kernel: R10: 00007ffe55f13080 R11: 0000000000000246 R12: 0000000000000000
Jan 06 11:35:38 arch-pc kernel: R13: 00007ffe55f13130 R14: 00005ce766310be0 R15: 00005ce766313120
Jan 06 11:35:38 arch-pc kernel:  </TASK>
Jan 06 11:35:38 arch-pc kernel: ---[ end trace 0000000000000000 ]---
Jan 06 11:35:38 arch-pc kernel: rfkill: input handler enabled
Jan 06 11:35:38 arch-pc systemd-logind[1179]: New session 5 of user gdm.
Jan 06 11:35:38 arch-pc gsd-media-keys[2064]: Unable to get default source
Jan 06 11:35:38 arch-pc gsd-media-keys[2064]: Unable to get default sink
Jan 06 11:35:38 arch-pc systemd[1]: Created slice User Slice of UID 120.
Jan 06 11:35:38 arch-pc systemd[1]: Starting User Runtime Directory /run/user/120...
Jan 06 11:35:39 arch-pc systemd[1]: Finished User Runtime Directory /run/user/120.
Jan 06 11:35:39 arch-pc systemd[1]: Starting User Manager for UID 120...
Jan 06 11:35:39 arch-pc (systemd)[7755]: pam_warn(systemd-user:setcred): function=[pam_sm_setcred] flags=0x8002 service=[systemd-user] terminal=[] user=[gdm] ruser=[<unknown>] rhost=[<unknown>]
Jan 06 11:35:39 arch-pc (systemd)[7755]: pam_unix(systemd-user:session): session opened for user gdm(uid=120) by gdm(uid=0)
Jan 06 11:35:39 arch-pc systemd-logind[1179]: New session 6 of user gdm.
Jan 06 11:35:39 arch-pc systemd[7755]: Queued start job for default target Main User Target.
Jan 06 11:35:39 arch-pc systemd[7755]: Created slice User Application Slice.
Jan 06 11:35:39 arch-pc systemd[7755]: Reached target Paths.
Jan 06 11:35:39 arch-pc systemd[7755]: Reached target Timers.
Jan 06 11:35:39 arch-pc systemd[7755]: Starting D-Bus User Message Bus Socket...
Jan 06 11:35:39 arch-pc systemd[7755]: Listening on GnuPG network certificate management daemon.
Jan 06 11:35:39 arch-pc systemd[7755]: Starting GCR ssh-agent wrapper...
Jan 06 11:35:39 arch-pc systemd[7755]: Listening on GNOME Keyring daemon.
Jan 06 11:35:39 arch-pc systemd[7755]: Listening on GnuPG cryptographic agent and passphrase cache (access for web browsers).
Jan 06 11:35:39 arch-pc systemd[7755]: Listening on GnuPG cryptographic agent and passphrase cache (restricted).
Jan 06 11:35:39 arch-pc systemd[7755]: Listening on GnuPG cryptographic agent (ssh-agent emulation).
Jan 06 11:35:39 arch-pc systemd[7755]: Listening on GnuPG cryptographic agent and passphrase cache.
Jan 06 11:35:39 arch-pc systemd[7755]: Listening on GnuPG public key management service.
Jan 06 11:35:39 arch-pc systemd[7755]: Listening on p11-kit server.
Jan 06 11:35:39 arch-pc systemd[7755]: Listening on PipeWire PulseAudio.
Jan 06 11:35:39 arch-pc systemd[7755]: Listening on PipeWire Multimedia System Sockets.
Jan 06 11:35:39 arch-pc systemd[7755]: Listening on D-Bus User Message Bus Socket.
Jan 06 11:35:39 arch-pc systemd[7755]: Listening on GCR ssh-agent wrapper.
Jan 06 11:35:39 arch-pc systemd[7755]: Reached target Sockets.
Jan 06 11:35:39 arch-pc systemd[7755]: Reached target Basic System.
Jan 06 11:35:39 arch-pc systemd[1]: Started User Manager for UID 120.
Jan 06 11:35:39 arch-pc systemd[7755]: Starting Update XDG user dir configuration...
Jan 06 11:35:39 arch-pc systemd[1]: Started Session 5 of User gdm.
Jan 06 11:35:39 arch-pc systemd[7755]: Finished Update XDG user dir configuration.
Jan 06 11:35:39 arch-pc systemd[7755]: Reached target Main User Target.
Jan 06 11:35:39 arch-pc systemd[7755]: Startup finished in 158ms.
Jan 06 11:35:39 arch-pc systemd[7755]: Created slice User Core Session Slice.
Jan 06 11:35:39 arch-pc systemd[7755]: Starting D-Bus User Message Bus...
Jan 06 11:35:39 arch-pc dbus-broker-launch[7777]: Policy to allow eavesdropping in /usr/share/dbus-1/session.conf +31: Eavesdropping is deprecated and ignored
Jan 06 11:35:39 arch-pc dbus-broker-launch[7777]: Policy to allow eavesdropping in /usr/share/dbus-1/session.conf +33: Eavesdropping is deprecated and ignored
Jan 06 11:35:39 arch-pc systemd[7755]: Started D-Bus User Message Bus.
Jan 06 11:35:39 arch-pc dbus-broker-launch[7777]: Ready
Jan 06 11:35:39 arch-pc /usr/lib/gdm-wayland-session[7781]: dbus-daemon[7781]: [session uid=120 pid=7781 pidfd=5] Activating service name='org.freedesktop.systemd1' requested by ':1.2' (uid=120 pid=7782 comm="/usr/lib/gnome-session-binary --autostart /usr/sha")
Jan 06 11:35:39 arch-pc /usr/lib/gdm-wayland-session[7781]: dbus-daemon[7781]: [session uid=120 pid=7781 pidfd=5] Activated service 'org.freedesktop.systemd1' failed: Process org.freedesktop.systemd1 exited with status 1
Jan 06 11:35:39 arch-pc gnome-session[7782]: gnome-session-binary[7782]: WARNING: Could not check if unit gnome-session-wayland@gnome-login.target is active: Error calling StartServiceByName for org.freedesktop.systemd1: Process org.freedesktop.systemd1 exited with status 1
Jan 06 11:35:39 arch-pc gnome-session-binary[7782]: WARNING: Could not check if unit gnome-session-wayland@gnome-login.target is active: Error calling StartServiceByName for org.freedesktop.systemd1: Process org.freedesktop.systemd1 exited with status 1
Jan 06 11:35:39 arch-pc gnome-session[7782]: gnome-session-binary[7782]: WARNING: Desktop file /usr/share/gdm/greeter/autostart/orca-autostart.desktop for application orca-autostart.desktop could not be parsed or references a missing TryExec binary
Jan 06 11:35:39 arch-pc gnome-session-binary[7782]: WARNING: Desktop file /usr/share/gdm/greeter/autostart/orca-autostart.desktop for application orca-autostart.desktop could not be parsed or references a missing TryExec binary
Jan 06 11:35:39 arch-pc gnome-shell[7794]: Running GNOME Shell (using mutter 47.3) as a Wayland display server
Jan 06 11:35:39 arch-pc rtkit-daemon[1410]: Supervising 10 threads of 7 processes of 1 users.
Jan 06 11:35:39 arch-pc rtkit-daemon[1410]: Successfully made thread 7812 of process 7794 owned by '120' high priority at nice level -15.
Jan 06 11:35:39 arch-pc rtkit-daemon[1410]: Supervising 11 threads of 8 processes of 2 users.
Jan 06 11:35:39 arch-pc gnome-shell[7794]: Thread 'KMS thread' will be using high priority scheduling
Jan 06 11:35:39 arch-pc gnome-shell[7794]: Device '/dev/dri/card1' prefers shadow buffer
Jan 06 11:35:39 arch-pc gnome-shell[7794]: Added device '/dev/dri/card1' (amdgpu) using atomic mode setting.
Jan 06 11:35:39 arch-pc gnome-shell[7794]: Device '/dev/dri/card0' prefers shadow buffer
Jan 06 11:35:39 arch-pc gnome-shell[7794]: Added device '/dev/dri/card0' (amdgpu) using atomic mode setting.
Jan 06 11:35:39 arch-pc gnome-shell[7794]: Created gbm renderer for '/dev/dri/card1'
Jan 06 11:35:39 arch-pc gnome-shell[7794]: Created gbm renderer for '/dev/dri/card0'
Jan 06 11:35:39 arch-pc gnome-shell[7794]: Boot VGA GPU /dev/dri/card1 selected as primary
Jan 06 11:35:39 arch-pc gnome-shell[7794]: Obtained a high priority EGL context
Jan 06 11:35:39 arch-pc /usr/lib/gdm-wayland-session[7781]: dbus-daemon[7781]: [session uid=120 pid=7781 pidfd=5] Activating service name='org.a11y.Bus' requested by ':1.4' (uid=120 pid=7794 comm="/usr/bin/gnome-shell")
Jan 06 11:35:39 arch-pc /usr/lib/gdm-wayland-session[7781]: dbus-daemon[7781]: [session uid=120 pid=7781 pidfd=5] Successfully activated service 'org.a11y.Bus'
Jan 06 11:35:39 arch-pc gnome-shell[7794]: Using public X11 display :1024, (using :1025 for managed services)
Jan 06 11:35:39 arch-pc gnome-shell[7794]: Using Wayland display name 'wayland-0'
Jan 06 11:35:39 arch-pc /usr/lib/gdm-wayland-session[7862]: dbus-daemon[7862]: Activating service name='org.a11y.atspi.Registry' requested by ':1.0' (uid=120 pid=7794 comm="/usr/bin/gnome-shell")
Jan 06 11:35:39 arch-pc /usr/lib/gdm-wayland-session[7862]: dbus-daemon[7862]: Successfully activated service 'org.a11y.atspi.Registry'
Jan 06 11:35:39 arch-pc /usr/lib/gdm-wayland-session[7865]: SpiRegistry daemon is running with well-known name - org.a11y.atspi.Registry
Jan 06 11:35:39 arch-pc gnome-shell[7794]: Unset XDG_SESSION_ID, getCurrentSessionProxy() called outside a user session. Asking logind directly.
Jan 06 11:35:39 arch-pc gnome-shell[7794]: Will monitor session 5
Jan 06 11:35:39 arch-pc systemd[1]: Starting Locale Service...
Jan 06 11:35:39 arch-pc /usr/lib/gdm-wayland-session[7781]: dbus-daemon[7781]: [session uid=120 pid=7781 pidfd=5] Activating service name='org.gnome.Shell.Screencast' requested by ':1.3' (uid=120 pid=7794 comm="/usr/bin/gnome-shell")
Jan 06 11:35:39 arch-pc /usr/lib/gdm-wayland-session[7781]: dbus-daemon[7781]: [session uid=120 pid=7781 pidfd=5] Activating service name='org.freedesktop.impl.portal.PermissionStore' requested by ':1.3' (uid=120 pid=7794 comm="/usr/bin/gnome-shell")
Jan 06 11:35:39 arch-pc /usr/lib/gdm-wayland-session[7781]: dbus-daemon[7781]: [session uid=120 pid=7781 pidfd=5] Successfully activated service 'org.freedesktop.impl.portal.PermissionStore'
Jan 06 11:35:39 arch-pc /usr/lib/gdm-wayland-session[7781]: dbus-daemon[7781]: [session uid=120 pid=7781 pidfd=5] Activating service name='org.gnome.Shell.Notifications' requested by ':1.3' (uid=120 pid=7794 comm="/usr/bin/gnome-shell")
Jan 06 11:35:39 arch-pc systemd[1]: Started Locale Service.
Jan 06 11:35:39 arch-pc org.gnome.Shell.desktop[7794]: Window manager warning: Failed to parse saved session file: Failed to open file “/var/lib/gdm/.config/mutter/sessions/10569fa8b512a31f8b173618853922793800000077820000.ms”: No such file or directory
Jan 06 11:35:39 arch-pc gnome-shell[7794]: Failed to launch ibus-daemon: Failed to execute child process “ibus-daemon” (No such file or directory)
Jan 06 11:35:39 arch-pc gnome-shell[7794]: Error looking up permission: GDBus.Error:org.freedesktop.portal.Error.NotFound: No entry for geolocation
Jan 06 11:35:39 arch-pc systemd[1]: Starting Hostname Service...
Jan 06 11:35:39 arch-pc /usr/lib/gdm-wayland-session[7781]: dbus-daemon[7781]: [session uid=120 pid=7781 pidfd=5] Activating service name='org.freedesktop.systemd1' requested by ':1.20' (uid=120 pid=7981 comm="/usr/lib/gsd-sharing")
Jan 06 11:35:39 arch-pc /usr/lib/gdm-wayland-session[7781]: dbus-daemon[7781]: [session uid=120 pid=7781 pidfd=5] Activated service 'org.freedesktop.systemd1' failed: Process org.freedesktop.systemd1 exited with status 1
Jan 06 11:35:39 arch-pc /usr/lib/gdm-wayland-session[7781]: dbus-daemon[7781]: [session uid=120 pid=7781 pidfd=5] Successfully activated service 'org.gnome.Shell.Notifications'
Jan 06 11:35:39 arch-pc gsd-sharing[7981]: Failed to StopUnit service: GDBus.Error:org.freedesktop.DBus.Error.Spawn.ChildExited: Process org.freedesktop.systemd1 exited with status 1
Jan 06 11:35:39 arch-pc gsd-sharing[7981]: Failed to StopUnit service: GDBus.Error:org.freedesktop.DBus.Error.Spawn.ChildExited: Process org.freedesktop.systemd1 exited with status 1
Jan 06 11:35:39 arch-pc gnome-shell[7794]: Failed to create color profile from colord profile: Error opening file /home/fred/.local/share/icc/edid-3677f87b350ebb514504413952b427ce.icc: Permission denied
Jan 06 11:35:39 arch-pc gnome-shell[7794]: No permission to control network connections: Polkit.Error: GDBus.Error:org.freedesktop.PolicyKit1.Error.Failed: Action org.freedesktop.NetworkManager.network-control is not registered
Jan 06 11:35:39 arch-pc systemd[1]: Starting Location Lookup Service...
Jan 06 11:35:39 arch-pc systemd[7755]: Started PipeWire Multimedia Service.
Jan 06 11:35:39 arch-pc systemd[7755]: Started Multimedia Service Session Manager.
Jan 06 11:35:39 arch-pc systemd[7755]: Started PipeWire PulseAudio.
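
In a journal dump this long, the handful of amdgpu lines is easy to miss among the session startup noise. A minimal sketch of a filter that keeps only kernel lines from amdgpu marked *ERROR*; the amdgpu_errors name is made up, and the pattern is based only on the line format seen in the log above:

```sh
# amdgpu_errors
# Read log lines on stdin and print only kernel messages from the
# amdgpu driver that carry the *ERROR* marker.
amdgpu_errors() {
    grep -E 'kernel: amdgpu .*\*ERROR\*'
}

# Demo on two lines taken from the log above; only the second is printed.
amdgpu_errors <<'EOF'
Jan 06 11:28:21 arch-pc rtkit-daemon[1410]: Supervising 10 threads of 7 processes of 1 users.
Jan 06 11:29:58 arch-pc kernel: amdgpu 0000:03:00.0: [drm] *ERROR* [CRTC:79:crtc-0] flip_done timed out
EOF
```

After a freeze that needed a reboot, the kernel messages of the previous boot can be fed through it with: journalctl -k -b -1 | amdgpu_errors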

@beholder, could you try the builds mentioned above on your configuration? You described your problem in the https://bbs.archlinux.org/viewtopic.php … 1#p2218651 comment, and I can say that it has the same root cause.

Last edited by Mechanicus (2025-02-01 09:42:11)

Offline

#384 2025-02-01 09:47:00

pacoandres
Member
Registered: 2020-03-05
Posts: 39

Re: Issues with Mesa 24.3.x and amdgpu Vega graphics

Bad news: linux-ring-test 6.13.arch1-1 froze during a video meeting, with firefox, okular and dolphin as the only open windows.
No logs this time, just the 'amdgpu: Dumping IP State'. I forgot to enable amdgpu logs.

When the meeting ends I'll test Linux 6.13.0-arch1-2-amdgpu

Last edited by pacoandres (2025-02-01 09:54:11)

Offline

#385 2025-02-01 09:55:36

Mechanicus
Member
Registered: 2025-01-13
Posts: 95

Re: Issues with Mesa 24.3.x and amdgpu Vega graphics

pacoandres wrote:

Bad news: linux-ring-test 6.13.arch1-1 froze during a video meeting, with firefox, okular and dolphin as the only open windows.
No logs this time, I forgot to enable amdgpu logs.

When the meeting ends I'll test Linux 6.13.0-arch1-2-amdgpu

Well... That means there are fewer options available.

Last edited by Mechanicus (2025-02-01 09:57:47)

Offline

#386 2025-02-01 10:44:40

Lone_Wolf
Administrator
From: Netherlands, Europe
Registered: 2005-10-04
Posts: 13,252

Re: Issues with Mesa 24.3.x and amdgpu Vega graphics

NuSkool wrote:

Based on my confused understanding trying to follow all this, the patch that fixes freezes on 'Mesa chipset detection*: raven' hardware has been merged. Therefore the AUR 'mesa-git' package has this fix.

That implies 'mesa-git' may already include the patch that Lone_Wolf used in his latest patched mesa.

I git cloned the mesa source from: https://gitlab.freedesktop.org/mesa/mesa.git

All fresh mesa trunk builds will have the raven/raven2 changes, but with mesa 25.0 branched off and mesa trunk now at 25.1, the differences between mesa 25.0.x and mesa 25.1 trunk will increase fast.

A better approach would be to adjust mesa-git to build the 25.0 branch instead of main.

Replace lines 93-96 of the mesa-git PKGBUILD with

source=(
    'mesa::git+https://gitlab.freedesktop.org/mesa/mesa.git#branch=25.0'
    'LICENSE'
)

to achieve that.
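
Since line numbers in the PKGBUILD can shift between revisions, it may be worth checking which branch the edited file actually points at before starting a long build. A minimal sketch; pkgbuild_branch is a hypothetical helper, and it only understands the #branch= fragment used above:

```sh
# pkgbuild_branch FILE
# Print the branch named in the first '#branch=' source fragment of FILE.
# Prints nothing when no such fragment exists (the source then follows
# the repository's default branch).
pkgbuild_branch() {
    sed -n 's|.*#branch=\([A-Za-z0-9._/-]*\).*|\1|p' "$1" | head -n 1
}

# Example: run in the directory that holds the edited PKGBUILD.
#   pkgbuild_branch PKGBUILD
```

With the source array shown above, the helper prints 25.0; an empty result means the edit did not take effect.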


Disliking systemd intensely, but not satisfied with alternatives so focusing on taming systemd.

clean chroot building not flexible enough ?
Try clean chroot manager by graysky

Offline

#387 2025-02-01 11:18:43

Horo86
Member
Registered: 2025-01-25
Posts: 2

Re: Issues with Mesa 24.3.x and amdgpu Vega graphics

I've also been getting those freezes on a Ryzen 5 PRO 2400G for months. Using the zen kernel seems to make the freezes less frequent, but it hasn't solved the problem.
I understood from this thread that this is a major issue, but I haven't understood whether there is an identified workaround or a fix in the release pipeline.

Is this only affecting Arch systems, as far as you know? I'm considering what to do, since working with random freezes is becoming frustrating. I don't know whether to wait for a fix, change distro, or replace the PC entirely.

Offline

#388 2025-02-01 11:31:04

kode54
Member
Registered: 2013-10-21
Posts: 42

Re: Issues with Mesa 24.3.x and amdgpu Vega graphics

Getting the freezes with my 7700 XT, but not with my 6700 XT or RX 480. Seems to be exacerbated by Beszel probing my GPU with rocm-smi every 4.3 seconds for graphs.

Adding `amdgpu.ppfeaturemask=0xfff73fff` to mask GFXOFF seemed to help, but now I am trying the linux-amdgpu 6.13-arch1-2.

So far, the only problems I've had with this kernel:

1) /dev/dri identifies my dGPU as `card1` and not `card0`.
2) My iGPU is most definitely disabled.
3) Running `sudo cat /sys/kernel/debug/dri/1/amdgpu_gpu_recover` causes a total reboot.

Offline

#389 2025-02-01 11:36:11

pacoandres
Member
Registered: 2020-03-05
Posts: 39

Re: Issues with Mesa 24.3.x and amdgpu Vega graphics

Horo86 wrote:

I've also been getting those freezes on a Ryzen 5 PRO 2400G for months. Using the zen kernel seems to make the freezes less frequent, but it hasn't solved the problem.
I understood from this thread that this is a major issue... but I haven't understood whether there is an identified workaround or a fix in the release pipeline.

Is this only affecting Arch systems as far as you know? I'm considering what to do, since working with random freezes is becoming frustrating... I don't know whether to wait for a fix, change distro, or replace the PC entirely.

As far as I understand, this is mainly affecting Arch systems because they have the latest updates, but until this issue gets fixed it could spread to other distributions as they update Mesa to the latest versions.

That's what I've understood, but I'm not sure it's right.

Last edited by pacoandres (2025-02-01 11:37:39)

Offline

#390 2025-02-01 11:49:57

beroal
Member
From: Ukraine
Registered: 2009-06-07
Posts: 377

Re: Issues with Mesa 24.3.x and amdgpu Vega graphics

Hi. I have AMD Ryzen 3 2200G (Vega 8, Raven Ridge, Zen/GCN5). I have been experiencing freezes since 2024-12-16. Program versions:

linux 6.12.10.arch1-1
linux-headers 6.12.10.arch1-1
linux-firmware 20250109.7673dffd-1

With Mesa version 1:24.3.4-1 from the official repositories, I got a freeze within a couple of hours. After installing @Lone_Wolf's Mesa package on 2025-01-26

mesa-test-git 25.0.0_devel.200729.61e289d0ca0-1

no freezes. How can I help?

Horo86 wrote:

I understood from this thread that this is a major issue... but I haven't understood whether there is an identified workaround or a fix in the release pipeline.

@Lone_Wolf's Mesa packages are a pretty solid workaround, at least for me. Search for links to them in this thread.


we are not condemned to write ugly code

Offline

#391 2025-02-01 11:52:39

kode54
Member
Registered: 2013-10-21
Posts: 42

Re: Issues with Mesa 24.3.x and amdgpu Vega graphics

Found another issue with the linux-amdgpu 6.13-arch1-2, which is probably just a hardware issue in disguise.

I had to reboot my system twice to get the boot menu to load up, since it's defaulting by alphabetic sorting to a different kernel. After rebooting back to linux-amdgpu, it failed to initialize my motherboard's integrated Bluetooth radio, which is in a Mediatek M.2 mini card.

Feb 01 03:15:29 copycat kernel: usb usb3-port6: attempt power cycle
Feb 01 03:15:30 copycat kernel: usb usb3-port6: unable to enumerate USB device

Offline

#392 2025-02-01 12:12:28

Mechanicus
Member
Registered: 2025-01-13
Posts: 95

Re: Issues with Mesa 24.3.x and amdgpu Vega graphics

kode54 wrote:

Found another issue with the linux-amdgpu 6.13-arch1-2, which is probably just a hardware issue in disguise.

I had to reboot my system twice to get the boot menu to load up, since it's defaulting by alphabetic sorting to a different kernel. After rebooting back to linux-amdgpu, it failed to initialize my motherboard's integrated Bluetooth radio, which is in a Mediatek M.2 mini card.

Feb 01 03:15:29 copycat kernel: usb usb3-port6: attempt power cycle
Feb 01 03:15:30 copycat kernel: usb usb3-port6: unable to enumerate USB device

Interesting... It might be a Linux 6.13 issue. Let's focus on the amdgpu driver for now.

Offline

#393 2025-02-01 12:27:02

Lone_Wolf
Administrator
From: Netherlands, Europe
Registered: 2005-10-04
Posts: 13,252

Re: Issues with Mesa 24.3.x and amdgpu Vega graphics

pacoandres wrote:
Horo86 wrote:

I've also been getting those freezes on a Ryzen 5 PRO 2400G for months. Using the zen kernel seems to make the freezes less frequent, but it hasn't solved the problem.
I understood from this thread that this is a major issue... but I haven't understood whether there is an identified workaround or a fix in the release pipeline.

Is this only affecting Arch systems as far as you know? I'm considering what to do, since working with random freezes is becoming frustrating... I don't know whether to wait for a fix, change distro, or replace the PC entirely.

As far as I understand, this is mainly affecting Arch systems because they have the latest updates, but until this issue gets fixed it could spread to other distributions as they update Mesa to the latest versions.

That's what I've understood, but I'm not sure it's right.

There are reports (some even in this thread) from SUSE and Void Linux users who have exactly the same issues.

Mesa 24.3.x has severe issues on some AMD GPU chipsets, especially Raven and Raven2.
There are strong signs that Mesa 24.3.x exposed a kernel bug that has been present for much longer.
AMD Mesa and kernel devs are working on figuring this out with the help of the posters in this thread.

A mitigation landed in Mesa trunk about a week ago and will be in Mesa 25.0, which is at release candidate 0 now and should be released as stable in a few weeks.

Currently there are 2 workarounds:

Downgrade to mesa 24.2.8 using this 24.2.8 binary
Switch to a mesa 25.0 build with the fix, like 25.0 binary
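
To check quickly whether an installed mesa falls in the affected 24.3.x series, something like the sketch below should do. `mesa_is_affected` is a hypothetical helper name; it only looks at the version string (dropping an epoch prefix such as `1:`) and knows nothing about which GPU is in the machine.

```shell
# Hypothetical helper: succeed if a package version string is in the 24.3.x series.
# $1 is a version such as "1:24.3.4-1" or "24.2.8-1".
mesa_is_affected() {
    ver=${1#*:}          # drop an epoch prefix like "1:" if present
    case $ver in
        24.3.*) return 0 ;;   # affected series according to this thread
        *)      return 1 ;;
    esac
}
```

On Arch it could be called as `mesa_is_affected "$(pacman -Q mesa | awk '{print $2}')" && echo "mesa 24.3.x installed"`.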



Offline

#394 2025-02-01 12:32:27

pacoandres
Member
Registered: 2020-03-05
Posts: 39

Re: Issues with Mesa 24.3.x and amdgpu Vega graphics

linux-amdgpu-6.13-arch1-2 also freezes my system.

Checks with glxgears, vkcube, and 'cat recover' worked well, but after the checks, when I tried to look at dmesg, the system froze.
This time I could get into a console after a while, but the desktop (KDE Plasma on Wayland) is completely frozen.

This is the dmesg output. Up to 230.001866 the logs are related to 'cat recover'; after that they are related to the freeze (sorry, I again forgot to enable verbose logging. I'll try to reproduce with it enabled).
https://pastebin.com/raw/igjyLk9B

Last edited by pacoandres (2025-02-01 12:36:47)

Offline

#395 2025-02-01 12:37:05

kode54
Member
Registered: 2013-10-21
Posts: 42

Re: Issues with Mesa 24.3.x and amdgpu Vega graphics

The mesa 25.0 / git workaround I saw in the MRs list affects gfx9 / Raven only, should I bother installing mesa-git anyway? Mesa 24.3.4 upgrade did coincide with my issues.

However, I also have a Docker container that runs an Ubuntu 24.10 based image, running the Wolf from Games on Whales container, which apparently idles on the GPU in the background when there are no clients connected. It presumably is using the latest Mesa from Ubuntu 24.10 as of when I built the image, which is 24.2.8.

I've experienced the freezes when nothing is running on the GPU except for Wolf, which is running on the aforementioned mesa 24.2.8. However, there was another extenuating issue with that, in that I had Beszel probing my GPU sensors every 4.3 seconds by repeatedly running the rocm-smi script. This script would then cause lockups after a while.

Offline

#396 2025-02-01 12:41:23

Mechanicus
Member
Registered: 2025-01-13
Posts: 95

Re: Issues with Mesa 24.3.x and amdgpu Vega graphics

pacoandres wrote:

linux-amdgpu-6.13-arch1-2 also freezes my system.

Checks with glxgears, vkcube, and 'cat recover' worked well, but after the checks, when I tried to look at dmesg, the system froze.
This time I could get into a console after a while, but the desktop (KDE Plasma on Wayland) is completely frozen.

This is the dmesg output. Up to 230.001866 the logs are related to 'cat recover'; after that they are related to the freeze (sorry, I again forgot to enable verbose logging. I'll try to reproduce with it enabled).
https://pastebin.com/raw/igjyLk9B

Thank you for the detailed answer. Let's check stability without amdgpu_gpu_reset. This functionality seems to be broken in a different place.

Offline

#397 2025-02-01 14:12:57

Mechanicus
Member
Registered: 2025-01-13
Posts: 95

Re: Issues with Mesa 24.3.x and amdgpu Vega graphics

AMDGPU
Build: linux-ring-recovery-6.13.arch1-1 - freezes.
Included patches:
- Extend amdgpu_ring_soft_recovery function with PG control

What to check:
- sudo cat /sys/kernel/debug/dri/1/amdgpu_gpu_recover
- idle stability
- workflow stability
- glxgears window resizing
- vkcube window resizing

Kernel option to keep during testing period: fsck.mode=force

Last edited by Mechanicus (2025-02-02 10:49:44)

Offline

#398 2025-02-01 17:03:26

NotAnArchUser
Member
Registered: 2025-01-25
Posts: 9

Re: Issues with Mesa 24.3.x and amdgpu Vega graphics

Mechanicus, I'm now testing patches proposed by Alex Deucher on GitLab. So I suppose my kernel is now equivalent to your linux-amdgpu 6.13.arch1-2 to some degree?

SnowF wrote:

Can you try the following and replicate a crash?

amdgpu.ppfeaturemask=0xfff77fff

This feature mask, namely PP_GFX_DCS_MASK, had no effect on our problem.

Offline

#399 2025-02-01 17:24:44

lpr1
Member
Registered: 2017-10-08
Posts: 109

Re: Issues with Mesa 24.3.x and amdgpu Vega graphics

SnowF wrote:

The only thing that prevents my system from freezing: amdgpu.ppfeaturemask=0xf7fff

I have yet to experience a freeze with the amdgpu.ppfeaturemask=0xfff73fff you suggested. I will keep running it, but after about 2 days it seems fine.

Temps are the same as before, 30-39 °C (when the stock AMD cooler kicks in with the default curve), in normal light workloads (Blender, Firefox etc.).

Offline

#400 2025-02-01 17:27:15

Mechanicus
Member
Registered: 2025-01-13
Posts: 95

Re: Issues with Mesa 24.3.x and amdgpu Vega graphics

NotAnArchUser wrote:

Mechanicus, I'm now testing patches proposed by Alex Deucher on GitLab. So I suppose my kernel is now equivalent to your linux-amdgpu 6.13.arch1-2 to some degree?

Correct. The differences are that my changes affect all chips and put the GPU into power-efficient mode faster.

Last edited by Mechanicus (2025-02-01 17:27:35)

Offline
