Sir, respectfully, I don't understand what you're saying:
a) What is gromit?
b) How/where do I use "amdgpu.dcdebugmask=0x10"? Is it possible to undo that at a later stage as well?
c) What is a bisect?
I mean to say that I'd love to help, I just don't know how to.
(I'm still learning Linux)
a) The user who has been posting kernel builds in this thread, e.g. https://bbs.archlinux.org/viewtopic.php … 1#p2210441
b) A kernel parameter: you add it to the kernel command line in your bootloader configuration to alter kernel behaviour. To undo it, remove it from the same place.
c) Bisecting is the process of finding the exact change that caused an issue: you start from a known-good and a known-bad point and iterate until the culprit is found. gromit is doing the heavy lifting here. If you (collectively, it doesn't have to be you individually) give feedback on each kernel he provides, we will eventually land on the exact culprit. For more info on the underlying process: https://wiki.archlinux.org/title/Bisect … s_with_Git (that's just for interest and can be somewhat involved). If you want to help, test gromit's builds and report whether you still see the issue.
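For question b), a minimal sketch assuming the bootloader is GRUB with Arch's default file layout (on systemd-boot you would instead append the parameter to the `options` line of your entry in /boot/loader/entries/). The existing parameters shown are only an example of what may already be on that line; keep whatever you have there. To undo the change later, delete the parameter again and rerun the same regeneration command.

```shell
# /etc/default/grub: append the parameter inside the existing quotes
# ("loglevel=3 quiet" stands in for whatever is already there)
GRUB_CMDLINE_LINUX_DEFAULT="loglevel=3 quiet amdgpu.dcdebugmask=0x10"

# regenerate GRUB's configuration so the change applies on the next boot
sudo grub-mkconfig -o /boot/grub/grub.cfg
```

After rebooting, `cat /proc/cmdline` shows whether the running kernel picked up the parameter.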
FWIW, someone on Reddit suggested 58a261bfc96763a851cb48b203ed57da37e157b8. I'm not on a system where I can trivially do a build with a revert right now, but it might help speed things up for @gromit.
Last edited by V1del (2024-11-26 10:19:34)
> FWIW someone on reddit suggested 58a261bfc96763a851cb48b203ed57da37e157b8 not on a system where I can trivially do a build with a revert right now, but it might help speed things up for @gromit
6.12.1.arch1 with 58a261bfc96763a851cb48b203ed57da37e157b8 reverted which required 23d16ede33a4db4973468bf6652a09da5efd1468 to be reverted first:
linux-6.12.1.arch1-1.1-x86_64.pkg.tar.zst/linux-headers-6.12.1.arch1-1.1-x86_64.pkg.tar.zst
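For anyone wanting to reproduce such a build from a kernel git tree, a sketch of the revert order described above, assuming a checkout of the stable kernel tree (which carries the v6.12.1 tag). How loqs actually produced the Arch package is not stated here, so treat this as one possible route rather than the exact recipe.

```shell
# start from the 6.12.1 release in a stable-tree checkout
git checkout v6.12.1
# 23d16ede... has to be reverted first, then 58a261bf...
git revert --no-edit 23d16ede33a4db4973468bf6652a09da5efd1468
git revert --no-edit 58a261bfc96763a851cb48b203ed57da37e157b8
```

Testers who only want the prebuilt packages can install both files in one go, e.g. `sudo pacman -U linux-6.12.1.arch1-1.1-x86_64.pkg.tar.zst linux-headers-6.12.1.arch1-1.1-x86_64.pkg.tar.zst`, and then reboot into that kernel.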
Last edited by loqs (2024-11-26 13:31:30)
I haven't joined the bisect party yet, but I can confirm that "amdgpu.dcdebugmask=0x10" resolved it!
I could reproduce the issue consistently before, and it is completely gone with this option.
Operating System: Arch Linux
KDE Plasma Version: 6.2.3
KDE Frameworks Version: 6.8.0
Qt Version: 6.8.0
Kernel Version: 6.12.1-arch1-1 (64-bit)
Graphics Platform: Wayland
Processors: 12 × AMD Ryzen 5 PRO 6650U with Radeon Graphics
Memory: 30.7 GiB of RAM
Graphics Processor: AMD Radeon Graphics
glxinfo:
Vendor: AMD (0x1002)
Device: AMD Radeon Graphics (radeonsi, rembrandt, LLVM 18.1.8, DRM 3.59, 6.12.1-arch1-1) (0x1681)
Version: 24.2.7
Accelerated: yes
Video memory: 512MB
Unified memory: no
Preferred profile: core (0x1)
Max core profile version: 4.6
Max compat profile version: 4.6
Max GLES1 profile version: 1.1
Max GLES[23] profile version: 3.2
If others are curious, PSR means Panel Self Refresh. Not sure what it does... More info at https://bbs.archlinux.org/viewtopic.php?id=276352
My logs before adding the 0x10 option above (with it, this line is gone):
> kernel: [drm] PSR support 1, DC PSR ver 0, sink PSR ver 3 DPCD caps 0x30 su_y_granularity 4
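Why that line disappears: amdgpu.dcdebugmask is a bitmask, and in the kernel's amd_shared.h the value 0x10 (bit 4) is DC_DISABLE_PSR. A small sketch of testing that bit for an arbitrary mask value; the helper name is made up for illustration.

```shell
#!/bin/sh
# amdgpu.dcdebugmask is a bitmask; 0x10 (bit 4) is DC_DISABLE_PSR.
# Other bits control unrelated display-core debug options and may be OR-ed in.

# psr_disabled_by_mask MASK (hypothetical helper): succeeds if MASK has bit 0x10 set.
psr_disabled_by_mask() {
    # $1 is substituted as text, so a hex value such as 0x12 is parsed as a constant
    [ $(( $1 & 0x10 )) -ne 0 ]
}

for m in 0x10 0x12 0x2; do
    if psr_disabled_by_mask "$m"; then
        echo "$m: PSR disabled"
    else
        echo "$m: PSR left enabled"
    fi
done
```

This prints "PSR disabled" for 0x10 and 0x12 (which also has the 0x10 bit set) and "PSR left enabled" for 0x2.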
Last edited by gcb (2024-11-26 14:18:43)
It's a power optimisation on laptops: you can basically tell the GPU to only render out updates to content that's actually changing while the rest remains static, so the GPU does less work for small changes, which conserves power. Disabling it means the GPU has to repaint the entire screen for every update again, leading to higher power usage.
See https://gitlab.freedesktop.org/drm/amd/-/issues/3742 and https://gitlab.freedesktop.org/drm/amd/-/issues/3658 as well; apparently it might help to disable VRR or switch to 60Hz if you currently use a higher refresh rate.
Last edited by V1del (2024-11-26 15:14:34)
I am also having these slowdowns/glitches on Hyprland, with artifacts on the screen. I'm going to downgrade to see if that fixes it; this started after an update that most likely included the kernel.
> 6.12.1.arch1 with 58a261bfc96763a851cb48b203ed57da37e157b8 reverted which required 23d16ede33a4db4973468bf6652a09da5efd1468 to be reverted first:
> linux-6.12.1.arch1-1.1-x86_64.pkg.tar.zst/linux-headers-6.12.1.arch1-1.1-x86_64.pkg.tar.zst
I can confirm this eliminates the artifacts on my Radeon 660M (Ryzen 5 6650U integrated graphics).
Did someone already find a report for this issue on the lists or in the DRM Gitlab?
> See https://gitlab.freedesktop.org/drm/amd/-/issues/3742 and https://gitlab.freedesktop.org/drm/amd/-/issues/3658 as well, apparently it might help if you disable VRR/switch to 60Hz if you currently have a higher refreshrate.
Chances are it's one (or both) of these two.
I'm also affected.
Linux 6.12
GNOME 47 upon Wayland
ThinkPad X13 with AMD Ryzen 6850U (RDNA2)
Turning PSR off with the well-known boot option amdgpu.dcdebugmask=0x10 fixes the graphical glitches (similar picture).
Upstream report about graphical glitches. Everyone else is complaining about frame timing or performance, but our issue appears to be graphical glitches.
Last edited by hoschi (2024-11-30 18:41:04)
@hoschi have you tried with 58a261bfc96763a851cb48b203ed57da37e157b8 reverted or if that does not resolve the issue bisecting between 6.11 and 6.12? Does the patch attached to https://gitlab.freedesktop.org/drm/amd/ … te_2680869 have any effect? Is the issue still present in amdgpu-drm-next for you?
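For reference, a bisect between 6.11 and 6.12 maps onto git's bisect commands roughly as follows; a sketch assuming a clone of the mainline kernel tree, with v6.11 as the last known-good release and v6.12 as the first known-bad one. Each step still means building, installing and booting the kernel git checks out, which is the part gromit's prebuilt kernels take over.

```shell
git bisect start
git bisect bad v6.12      # first release showing the glitches
git bisect good v6.11     # last release without them
# git now checks out a commit roughly halfway between the two;
# build and boot it, test, then report the result:
git bisect good           # or: git bisect bad
# repeat until git reports "<sha> is the first bad commit", then clean up:
git bisect reset
```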
I too can confirm amdgpu.dcdebugmask=0x10 works
I had multiple freezes per day with a Vega 8 iGPU; I can also confirm that amdgpu.dcdebugmask=0x10 seems to solve the issue.
Last edited by lpr1 (2024-12-12 13:34:38)
Please avoid bumping the thread with "me too" posts.
Kernel 6.12.4.arch1-1 still has this issue, unfortunately. V1del seems to be correct: my desktop with an RX 6600 XT works just fine, but my laptop with a 660M (it reports itself as a 680M) has this issue. My laptop technically has a dGPU, but it has no MUX chip, so it renders on the iGPU (not sure if this is relevant, but I'm mentioning it just in case). As mentioned, switching to 60Hz improves the issue but does not solve it; VRR seems to have no effect.
Video of Problem - https://youtu.be/m_s9LB6e76g
Last edited by LumpyArbuckle (2024-12-15 19:56:49)
Can you try installing the kernel loqs provided so we might have an additional datapoint that that commit is the actual issue?
It appears loqs's kernel solves the problem.
Glad I found this thread. My AMD 9950X with iGPU enabled is experiencing this as well but it persists even when booting with amdgpu.dcdebugmask=0x10:
For reference:
Linux version 6.12.7-arch1-1 (linux@archlinux) (gcc (GCC) 14.2.1 20240910, GNU ld (GNU Binutils) 2.43.0) #1 SMP PREEMPT_DYNAMIC Fri, 27 Dec 2024 14:24:37 +0000
Command line: BOOT_IMAGE=/vmlinuz-linux root=UUID=08e3e8f3-f050-43fc-8b43-e9ed21443418 rw loglevel=3 quiet audit=0 amdgpu.dcdebugmask=0x10
...
------------[ cut here ]------------
workqueue: WQ_MEM_RECLAIM sdma0:drm_sched_run_job_work [gpu_sched] is flushing !WQ_MEM_RECLAIM events:amdgpu_device_delay_enable_gfx_off [amdgpu]
WARNING: CPU: 23 PID: 993 at kernel/workqueue.c:3704 check_flush_dependency+0xfc/0x120
Modules linked in: overlay amd_atl intel_rapl_msr intel_rapl_common snd_hda_codec_realtek kvm_amd snd_hda_codec_generic snd_hda_scodec_component snd_hda_codec_hdmi kvm snd_hda_intel crct10dif_pclmul snd_intel_dspcfg crc32_pclmul snd_intel_sdw_acpi polyval_clmulni polyval_generic snd_hda_codec ghash_clmulni_intel sha512_ssse3 snd_hda_core sha256_ssse3 sha1_ssse3 snd_hwdep spd5118 ip6t_REJECT aesni_intel sp5100_tco snd_pcm nf_reject_ipv6 r8169 gf128mul crypto_simd snd_timer realtek cryptd i2c_piix4 snd mdio_devres xt_hl wmi_bmof rapl pcspkr ccp i2c_smbus soundcore libphy ip6t_rt gpio_amdpt gpio_generic cfg80211 mousedev joydev ipt_REJECT nf_reject_ipv4 mac_hid xt_multiport xt_comment rfkill xt_limit xt_addrtype xt_tcpudp xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ip6table_filter ip6_tables iptable_filter xt_iprange xt_mark xt_NFQUEUE k10temp nct6683 dm_mod loop nfnetlink ip_tables x_tables ext4 crc32c_generic mbcache jbd2 hid_microsoft ff_memless hid_generic nvme nvme_core crc32c_intel
usbhid nvme_auth amdgpu video wmi amdxcp i2c_algo_bit drm_ttm_helper ttm drm_exec gpu_sched drm_suballoc_helper drm_buddy drm_display_helper cec crc16
CPU: 23 UID: 0 PID: 993 Comm: kworker/u128:2 Not tainted 6.12.7-arch1-1 #1 9e77c5d99557be92f482a3ac6317d887bb3ffaf9
Hardware name: Micro-Star International Co., Ltd. MS-7E16/X670E GAMING PLUS WIFI (MS-7E16), BIOS 1.93 12/02/2024
Workqueue: sdma0 drm_sched_run_job_work [gpu_sched]
RIP: 0010:check_flush_dependency+0xfc/0x120
Code: 8b 45 18 48 8d b2 c0 00 00 00 49 89 e8 48 8d 8b c0 00 00 00 48 c7 c7 10 32 0f 8f c6 05 c9 3a 16 02 01 48 89 c2 e8 04 8e fd ff <0f> 0b e9 1f ff ff ff 80 3d b4 3a 16 02 00 75 93 e9 4a ff ff ff 66
RSP: 0018:ffffad3d05bebc68 EFLAGS: 00010086
RAX: 0000000000000000 RBX: ffffa0b940050c00 RCX: 0000000000000027
RDX: ffffa0c44e3a18c8 RSI: 0000000000000001 RDI: ffffa0c44e3a18c0
RBP: ffffffffc0370b00 R08: 0000000000000000 R09: ffffad3d05bebae8
R10: ffffa0c47dcd1768 R11: 0000000000000003 R12: ffffa0b94907a140
R13: ffffa0b94ac1d2c0 R14: ffffad3d05bebc98 R15: 0000000000000001
FS: 0000000000000000(0000) GS:ffffa0c44e380000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ca7028fe1dc CR3: 0000000c05a22000 CR4: 0000000000f50ef0
PKRU: 55555554
Call Trace:
<TASK>
? check_flush_dependency+0xfc/0x120
? __warn.cold+0x93/0xf6
? check_flush_dependency+0xfc/0x120
? report_bug+0xff/0x140
? console_unlock+0x9d/0x140
? handle_bug+0x58/0x90
? exc_invalid_op+0x17/0x70
? asm_exc_invalid_op+0x1a/0x20
? __pfx_amdgpu_device_delay_enable_gfx_off+0x10/0x10 [amdgpu e86c125fc0d1d107466a775e7b7301e5c757afc6]
? check_flush_dependency+0xfc/0x120
? check_flush_dependency+0xfc/0x120
__flush_work+0x110/0x2c0
cancel_delayed_work_sync+0x5e/0x80
amdgpu_gfx_off_ctrl+0xad/0x140 [amdgpu e86c125fc0d1d107466a775e7b7301e5c757afc6]
amdgpu_ring_alloc+0x40/0x60 [amdgpu e86c125fc0d1d107466a775e7b7301e5c757afc6]
amdgpu_ib_schedule+0xf0/0x730 [amdgpu e86c125fc0d1d107466a775e7b7301e5c757afc6]
amdgpu_job_run+0x8e/0x1f0 [amdgpu e86c125fc0d1d107466a775e7b7301e5c757afc6]
drm_sched_run_job_work+0x259/0x3f0 [gpu_sched 29c1ee69cf658188cc04c71397c34eee6e156b12]
process_one_work+0x17b/0x330
worker_thread+0x2ce/0x3f0
? __pfx_worker_thread+0x10/0x10
kthread+0xcf/0x100
? __pfx_kthread+0x10/0x10
ret_from_fork+0x31/0x50
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1a/0x30
</TASK>
---[ end trace 0000000000000000 ]---
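When reporting whether amdgpu.dcdebugmask=0x10 helps, it is worth confirming the running kernel actually received it, as the "Command line:" entry in the log above does. A sketch of a helper that pulls the value out of a kernel command line; the function name is hypothetical, and with no argument it reads /proc/cmdline.

```shell
#!/bin/sh
# dcdebugmask_from_cmdline [CMDLINE] (hypothetical helper): print the value of
# amdgpu.dcdebugmask found in CMDLINE, defaulting to the running kernel's
# /proc/cmdline. Prints nothing if the parameter is absent.
# The body runs in a subshell so that `set -f` (no globbing while splitting
# the command line into words) does not leak into the caller.
dcdebugmask_from_cmdline() (
    set -f
    if [ "$#" -gt 0 ]; then
        cmdline=$1
    else
        cmdline=$(cat /proc/cmdline)
    fi
    for word in $cmdline; do
        case $word in
            amdgpu.dcdebugmask=*) printf '%s\n' "${word#*=}" ;;
        esac
    done
)

# Example: the command line from the 9950X report above
dcdebugmask_from_cmdline 'BOOT_IMAGE=/vmlinuz-linux root=UUID=08e3e8f3-f050-43fc-8b43-e9ed21443418 rw loglevel=3 quiet audit=0 amdgpu.dcdebugmask=0x10'
```

The example prints 0x10; running `dcdebugmask_from_cmdline` without arguments checks the currently booted kernel instead.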
> 6.12.1.arch1 with 58a261bfc96763a851cb48b203ed57da37e157b8 reverted which required 23d16ede33a4db4973468bf6652a09da5efd1468 to be reverted first:
I also built 6.12.7 with these two commits reverted, but it did not help either. Most of the time, journalctl is populated with warnings like the one above, but once a day or so, the display goes black and the monitor fails to detect a signal. A reboot fixes it, but there appears to be no other recovery option. The journalctl output is different when this occurs:
BUG: unable to handle page fault for address: ffffec4c967ee0b4
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 185dd41067 P4D 185dd41067 PUD 0
Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
CPU: 25 UID: 0 PID: 8893 Comm: kworker/u130:7 Tainted: G W 6.12.6-arch1-1 #1 be8168881006593767299fff7299891c69c41600
Tainted: [W]=WARN
Hardware name: Micro-Star International Co., Ltd. MS-7E16/X670E GAMING PLUS WIFI (MS-7E16), BIOS 1.93 12/02/2024
Workqueue: writeback wb_workfn (flush-259:0)
RIP: 0010:filemap_get_folios_tag+0xc0/0x240
Code: 00 00 74 e8 49 81 ff 02 04 00 00 0f 84 10 01 00 00 4d 85 ff 0f 84 4b 01 00 00 41 f6 c7 01 75 c3 e8 a5 1e e5 ff 0f 1f 44 00 00 <41> 8b 47 34 85 c0 0f 84 e6 00 00 00 41 8b 47 34 85 c0 0f 84 da 00
RSP: 0018:ffffb3894e04b6e8 EFLAGS: 00010202
RAX: 0000000000000002 RBX: 0000000000000000 RCX: 000000000000051e
RDX: ffff9d6c4596c280 RSI: ffffec4c967ee080 RDI: 0000000000000023
RBP: ffffffffffffffff R08: ffffffffffffffff R09: 0000000000000000
R10: 0000000000000228 R11: ffffffffffffffff R12: ffffb3894e04b6f0
R13: ffffb3894e04b790 R14: ffffec4d967ee074 R15: ffffec4c967ee080
FS: 0000000000000000(0000) GS:ffff9d825d880000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffec4c967ee0b4 CR3: 00000007c5e22000 CR4: 0000000000f50ef0
PKRU: 55555554
Call Trace:
<TASK>
? __die_body.cold+0x19/0x27
? page_fault_oops+0x15a/0x2d0
? search_bpf_extables+0x5f/0x80
? exc_page_fault+0x18a/0x190
? asm_exc_page_fault+0x26/0x30
? filemap_get_folios_tag+0xc0/0x240
? filemap_get_folios_tag+0xbb/0x240
mpage_prepare_extent_to_map+0x109/0x4d0 [ext4 3b603f7da8dbf49224f63ea4207efa6e0905036e]
ext4_do_writepages+0x331/0xc50 [ext4 3b603f7da8dbf49224f63ea4207efa6e0905036e]
ext4_writepages+0xad/0x170 [ext4 3b603f7da8dbf49224f63ea4207efa6e0905036e]
do_writepages+0x7e/0x270
__writeback_single_inode+0x41/0x340
? wbc_detach_inode+0x116/0x240
writeback_sb_inodes+0x21d/0x4e0
__writeback_inodes_wb+0x4c/0xf0
wb_writeback+0x193/0x310
wb_workfn+0x34b/0x440
process_one_work+0x17b/0x330
worker_thread+0x2ce/0x3f0
? __pfx_worker_thread+0x10/0x10
kthread+0xcf/0x100
? __pfx_kthread+0x10/0x10
ret_from_fork+0x31/0x50
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1a/0x30
</TASK>
Modules linked in: overlay dm_crypt cbc encrypted_keys trusted asn1_encoder tee amd_atl intel_rapl_msr intel_rapl_common snd_hda_codec_realtek kvm_amd snd_hda_codec_generic snd_hda_scodec_component kvm snd_hda_codec_hdmi crct10dif_pclmul crc32_pclmul polyval_clmulni snd_hda_intel polyval_generic ghash_clmulni_intel snd_intel_dspcfg sha512_ssse3 snd_intel_sdw_acpi sha256_ssse3 snd_hda_codec sha1_ssse3 aesni_intel snd_hda_core spd5118 gf128mul ip6t_REJECT snd_hwdep nf_reject_ipv6 crypto_simd sp5100_tco snd_pcm r8169 cryptd xt_hl snd_timer realtek ip6t_rt wmi_bmof snd i2c_piix4 mdio_devres rapl ccp soundcore i2c_smbus pcspkr cfg80211 libphy ipt_REJECT mousedev nf_reject_ipv4 joydev rfkill xt_multiport gpio_amdpt xt_comment gpio_generic mac_hid xt_limit xt_addrtype xt_tcpudp xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ip6table_filter ip6_tables iptable_filter xt_iprange xt_mark xt_NFQUEUE k10temp nct6683 dm_mod loop nfnetlink ip_tables x_tables ext4 crc32c_generic mbcache jbd2
hid_microsoft ff_memless hid_generic nvme crc32c_intel nvme_core usbhid nvme_auth amdgpu video wmi amdxcp i2c_algo_bit drm_ttm_helper ttm drm_exec gpu_sched drm_suballoc_helper drm_buddy drm_display_helper cec crc16
CR2: ffffec4c967ee0b4
---[ end trace 0000000000000000 ]---
RIP: 0010:filemap_get_folios_tag+0xc0/0x240
Code: 00 00 74 e8 49 81 ff 02 04 00 00 0f 84 10 01 00 00 4d 85 ff 0f 84 4b 01 00 00 41 f6 c7 01 75 c3 e8 a5 1e e5 ff 0f 1f 44 00 00 <41> 8b 47 34 85 c0 0f 84 e6 00 00 00 41 8b 47 34 85 c0 0f 84 da 00
RSP: 0018:ffffb3894e04b6e8 EFLAGS: 00010202
RAX: 0000000000000002 RBX: 0000000000000000 RCX: 000000000000051e
RDX: ffff9d6c4596c280 RSI: ffffec4c967ee080 RDI: 0000000000000023
RBP: ffffffffffffffff R08: ffffffffffffffff R09: 0000000000000000
R10: 0000000000000228 R11: ffffffffffffffff R12: ffffb3894e04b6f0
R13: ffffb3894e04b790 R14: ffffec4d967ee074 R15: ffffec4c967ee080
FS: 0000000000000000(0000) GS:ffff9d825d880000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffec4c967ee0b4 CR3: 00000007c5e22000 CR4: 0000000000f50ef0
PKRU: 55555554
note: kworker/u130:7[8893] exited with irqs disabled
------------[ cut here ]------------
WARNING: CPU: 25 PID: 8893 at kernel/exit.c:886 do_exit+0x8d3/0xad0
Modules linked in: overlay dm_crypt cbc encrypted_keys trusted asn1_encoder tee amd_atl intel_rapl_msr intel_rapl_common snd_hda_codec_realtek kvm_amd snd_hda_codec_generic snd_hda_scodec_component kvm snd_hda_codec_hdmi crct10dif_pclmul crc32_pclmul polyval_clmulni snd_hda_intel polyval_generic ghash_clmulni_intel snd_intel_dspcfg sha512_ssse3 snd_intel_sdw_acpi sha256_ssse3 snd_hda_codec sha1_ssse3 aesni_intel snd_hda_core spd5118 gf128mul ip6t_REJECT snd_hwdep nf_reject_ipv6 crypto_simd sp5100_tco snd_pcm r8169 cryptd xt_hl snd_timer realtek ip6t_rt wmi_bmof snd i2c_piix4 mdio_devres rapl ccp soundcore i2c_smbus pcspkr cfg80211 libphy ipt_REJECT mousedev nf_reject_ipv4 joydev rfkill xt_multiport gpio_amdpt xt_comment gpio_generic mac_hid xt_limit xt_addrtype xt_tcpudp xt_conntrack nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ip6table_filter ip6_tables iptable_filter xt_iprange xt_mark xt_NFQUEUE k10temp nct6683 dm_mod loop nfnetlink ip_tables x_tables ext4 crc32c_generic mbcache jbd2
hid_microsoft ff_memless hid_generic nvme crc32c_intel nvme_core usbhid nvme_auth amdgpu video wmi amdxcp i2c_algo_bit drm_ttm_helper ttm drm_exec gpu_sched drm_suballoc_helper drm_buddy drm_display_helper cec crc16
CPU: 25 UID: 0 PID: 8893 Comm: kworker/u130:7 Tainted: G D W 6.12.6-arch1-1 #1 be8168881006593767299fff7299891c69c41600
Tainted: [D]=DIE, [W]=WARN
Hardware name: Micro-Star International Co., Ltd. MS-7E16/X670E GAMING PLUS WIFI (MS-7E16), BIOS 1.93 12/02/2024
Workqueue: writeback wb_workfn (flush-259:0)
RIP: 0010:do_exit+0x8d3/0xad0
Code: f6 e8 e1 e2 ff ff e9 90 fd ff ff 4c 89 e6 bf 05 06 00 00 e8 7f 35 01 00 e9 47 f8 ff ff 48 89 df e8 82 c7 13 00 e9 4a f9 ff ff <0f> 0b e9 a1 f7 ff ff 0f 0b e9 5e f7 ff ff 4c 89 e6 48 89 df e8 d4
RSP: 0018:ffffb3894e04bec8 EFLAGS: 00010282
RAX: 0000000400000000 RBX: ffff9d6c4596c280 RCX: 0000000000000000
RDX: 0000000000000001 RSI: 0000000000002710 RDI: ffff9d6b07a5c200
RBP: ffff9d6b072ca880 R08: 0000000000000000 R09: ffffb3894e04bdb8
R10: ffff9d825dcc9168 R11: 0000000000000003 R12: 0000000000000009
R13: ffff9d6b07a5c200 R14: 0000000000000000 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff9d825d880000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffec4c967ee0b4 CR3: 00000007c5e22000 CR4: 0000000000f50ef0
PKRU: 55555554
Call Trace:
<TASK>
? do_exit+0x8d3/0xad0
? __warn.cold+0x93/0xf6
? do_exit+0x8d3/0xad0
? report_bug+0xff/0x140
? handle_bug+0x58/0x90
? exc_invalid_op+0x17/0x70
? asm_exc_invalid_op+0x1a/0x20
? do_exit+0x8d3/0xad0
? do_exit+0x6d/0xad0
make_task_dead+0x90/0x90
rewind_stack_and_make_dead+0x16/0x20
RIP: 0000:0x0
Code: Unable to access opcode bytes at 0xffffffffffffffd6.
RSP: 0000:0000000000000000 EFLAGS: 00000000 ORIG_RAX: 0000000000000000
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
</TASK>
---[ end trace 0000000000000000 ]---
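When the display goes black and only a reboot helps, the kernel log of the session that died can usually still be read after rebooting, assuming the journal is persistent (normally the case on a default Arch install). A sketch; messages from the very last moments may be missing if the system hung before they were written to disk.

```shell
# kernel messages from the previous boot (-b -1), i.e. the session that died
journalctl -k -b -1

# narrow it down to amdgpu/drm messages and kernel warnings or oopses
journalctl -k -b -1 | grep -Ei 'amdgpu|drm|BUG|Oops|WARNING'
```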
EDIT: Something else I noticed is that using the iGPU with this bug affects the stability of the system. For example, I cannot compile Chromium; I get coredumps. If I disable the iGPU and use my Radeon PCIe card, it is rock solid.
Last edited by graysky (2024-12-28 23:26:24)
CPU-optimized Linux-ck packages @ Repo-ck • AUR packages • Zsh and other configs
graysky, could this be a different issue with similar or the same symptoms, if the revert does not help? Was the issue introduced with 6.12? Can you bisect?
I'd like to, but I have no last-good commit, as this is a new machine.
This issue is confirmed on a different motherboard (X870E based vs X670E based), see below. Here again, the monitor goes dead, but switching to another tty brings it back to life.
@loqs - since the hardware is new to me, do you have a suggestion for a starting point for a bisect? Using a guess-and-check for that seems crude. Given that the hardware is pretty new, it could be that it never worked correctly.
------------[ cut here ]------------
workqueue: WQ_MEM_RECLAIM sdma0:drm_sched_run_job_work [gpu_sched] is flushing !WQ_MEM_RECLAIM events:amdgpu_device_delay_enable_gfx_off [amdgpu]
WARNING: CPU: 11 PID: 472093 at kernel/workqueue.c:3704 check_flush_dependency+0xfc/0x120
Modules linked in: overlay amd_atl intel_rapl_msr intel_rapl_common kvm_amd kvm crct10dif_pclmul crc32_pclmul snd_hda_codec_hdmi polyval_clmulni snd_hda_intel polyval_generic snd_intel_dspcfg ghash_clmulni_intel snd_intel_sdw_a>
crc32c_intel nvme_core usbhid nvme_auth amdgpu video wmi amdxcp i2c_algo_bit drm_ttm_helper ttm drm_exec gpu_sched drm_suballoc_helper drm_buddy drm_display_helper cec crc16
CPU: 11 UID: 0 PID: 472093 Comm: kworker/u128:2 Not tainted 6.12.7-arch1-1 #1 9e77c5d99557be92f482a3ac6317d887bb3ffaf9
Hardware name: Micro-Star International Co., Ltd. MS-7E49/MPG X870E CARBON WIFI (MS-7E49), BIOS 1.A21 12/18/2024
Workqueue: sdma0 drm_sched_run_job_work [gpu_sched]
RIP: 0010:check_flush_dependency+0xfc/0x120
Code: 8b 45 18 48 8d b2 c0 00 00 00 49 89 e8 48 8d 8b c0 00 00 00 48 c7 c7 10 32 2f b9 c6 05 c9 3a 16 02 01 48 89 c2 e8 04 8e fd ff <0f> 0b e9 1f ff ff ff 80 3d b4 3a 16 02 00 75 93 e9 4a ff ff ff 66
RSP: 0018:ffffb8a7e37cfc68 EFLAGS: 00010086
RAX: 0000000000000000 RBX: ffff9f65c0050c00 RCX: 0000000000000027
RDX: ffff9f7c3f5a18c8 RSI: 0000000000000001 RDI: ffff9f7c3f5a18c0
RBP: ffffffffc0436b00 R08: 0000000000000000 R09: ffffb8a7e37cfae8
R10: ffff9f7c9dcca668 R11: 0000000000000003 R12: ffff9f66197da140
R13: ffff9f65caa3c540 R14: ffffb8a7e37cfc98 R15: 0000000000000001
FS: 0000000000000000(0000) GS:ffff9f7c3f580000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000059648cd68bd8 CR3: 0000000360674000 CR4: 0000000000f50ef0
PKRU: 55555554
Call Trace:
<TASK>
? check_flush_dependency+0xfc/0x120
? __warn.cold+0x93/0xf6
? check_flush_dependency+0xfc/0x120
? report_bug+0xff/0x140
? console_unlock+0x9d/0x140
? handle_bug+0x58/0x90
? exc_invalid_op+0x17/0x70
? asm_exc_invalid_op+0x1a/0x20
? __pfx_amdgpu_device_delay_enable_gfx_off+0x10/0x10 [amdgpu e86c125fc0d1d107466a775e7b7301e5c757afc6]
? check_flush_dependency+0xfc/0x120
? check_flush_dependency+0xfc/0x120
__flush_work+0x110/0x2c0
cancel_delayed_work_sync+0x5e/0x80
amdgpu_gfx_off_ctrl+0xad/0x140 [amdgpu e86c125fc0d1d107466a775e7b7301e5c757afc6]
amdgpu_ring_alloc+0x40/0x60 [amdgpu e86c125fc0d1d107466a775e7b7301e5c757afc6]
amdgpu_ib_schedule+0xf0/0x730 [amdgpu e86c125fc0d1d107466a775e7b7301e5c757afc6]
amdgpu_job_run+0x8e/0x1f0 [amdgpu e86c125fc0d1d107466a775e7b7301e5c757afc6]
drm_sched_run_job_work+0x259/0x3f0 [gpu_sched 29c1ee69cf658188cc04c71397c34eee6e156b12]
process_one_work+0x17b/0x330
worker_thread+0x2ce/0x3f0
? __pfx_worker_thread+0x10/0x10
kthread+0xcf/0x100
? __pfx_kthread+0x10/0x10
ret_from_fork+0x31/0x50
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1a/0x30
</TASK>
---[ end trace 0000000000000000 ]---
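On the question of a starting point: bisecting is a binary search, so it needs one kernel known to be good and one known to be bad, and without a known-good kernel there is nothing to search between. A sketch of the idea in plain shell, where the commit names and `is_bad` are hypothetical stand-ins for "build, boot and test this kernel".

```shell
#!/bin/sh
# Binary search over commits ordered oldest to newest, where the first one
# is known good and the last one is known bad.

# nth N ARGS...: print the N-th (0-based) of ARGS
nth() {
    n=$1; shift
    shift "$n"
    printf '%s\n' "$1"
}

# first_bad TEST COMMIT...: print the first COMMIT for which TEST succeeds
# (TEST succeeding = "issue reproduces"). Assumes first COMMIT good, last bad.
first_bad() {
    test_cmd=$1; shift
    good=0
    bad=$(( $# - 1 ))
    # invariant: commit number $good is good, commit number $bad is bad
    while [ $(( bad - good )) -gt 1 ]; do
        mid=$(( (good + bad) / 2 ))
        if "$test_cmd" "$(nth "$mid" "$@")"; then
            bad=$mid    # reproduces: culprit is at or before mid
        else
            good=$mid   # clean: culprit is after mid
        fi
    done
    nth "$bad" "$@"
}

# Hypothetical test: pretend the regression landed in c5.
is_bad() { [ "${1#c}" -ge 5 ]; }

first_bad is_bad c0 c1 c2 c3 c4 c5 c6 c7
```

Here only c3, c5 and c4 get tested before c5 is printed. With N commits between the two endpoints, about log2(N) test kernels are needed, e.g. roughly 14 for 10,000 commits.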
Last edited by graysky (2025-01-01 15:39:28)
I'd also say it's a different issue... did you test linux-mainline/-git or similar? FWIW, in most cases I've seen so far, that WQ message is a red herring with no functional impact. (I've also had it reliably once per boot since the 6.12 kernels, with absolutely no functional effects.)
Last edited by V1del (2025-01-02 18:24:38)
I have not tested linux-mainline. For now I'm just using an old PCIe card with the on-board graphics disabled.
Here you can find a precompiled version of the mainline kernel if you want to try it:
sudo pacman -U https://pkgbuild.com/~gromit/linux-bisection-kernels/linux-mainline-6.13rc5-1-x86_64.pkg.tar.zst