1. Try to disable C-states in the UEFI.
2. Test a completely different software stack (e.g. a *slightly* older Ubuntu live system) and see whether the GPU resets there as well. The other thread is pretty similar, so this might actually be a software problem (likely kernel or Mesa).
Offline
I am not sure whether my BIOS has a C-state setting. These are the pages with power-related settings; is any of this what you mean?
These are two images of the BIOS settings:
- https://0x0.st/86-2.jpg
- https://0x0.st/86-p.jpg
Is popOS! 22.04 good, or shall I take Ubuntu 22.04, or something more recent?
Offline
It really doesn't matter; just make sure that HW acceleration is actually invoked.
Nothing there looks like C-states, though the PPC adjustment might be interesting…
Offline
I ran popOS! 22.04 LTS and got interesting results.
- When I played a YouTube video in Firefox (with hardware acceleration enabled), it worked without a problem.
- When I played a video using mpv --hwdec=auto, it crashed.
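One way to confirm that hardware decoding was actually invoked is to scan the player's terminal output. The sketch below assumes mpv announces an active hardware decoder with a line like "Using hardware decoding (vaapi)."; that wording, the helper name active_hwdec and the sample text are assumptions for illustration, not output from this thread.

```python
import re
from typing import Optional

# Assumption: mpv reports an active hardware decoder with a line such as
# "Using hardware decoding (vaapi)."; the wording may differ between versions.
HWDEC_RE = re.compile(r"Using hardware decoding \(([^)]+)\)")


def active_hwdec(mpv_output: str) -> Optional[str]:
    """Return the hardware decoder named in mpv's output, or None if absent."""
    match = HWDEC_RE.search(mpv_output)
    return match.group(1) if match else None


# Hypothetical captured output, only to exercise the helper.
sample = "VO: [gpu] 1920x1080 vaapi[nv12]\nUsing hardware decoding (vaapi).\n"
print(active_hwdec(sample))  # prints: vaapi
```

Feeding it the captured output of the mpv --hwdec=auto run would show which decoder was picked, or None if playback fell back to software decoding.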
Last edited by prankish (2025-06-15 19:01:30)
Offline
So FF on popOS! was likely simply not accelerated?
It also means there is *very* likely a problem with the hardware. I suspect that the moment the APU draws some power, it starves the CPU.
Can you cause similar freezes w/o the video HWA, e.g. by running some more demanding GL/Vulkan demo (or game)?
Have you looked into the PPC adjustment setting?
Offline
I ran https://benchmark.unigine.com/superposition and https://aur.archlinux.org/packages/gputest; both ran fine without any crashes.
I will look into the PPC adjustment settings and see whether they make any difference.
Offline
There's https://bbs.archlinux.org/viewtopic.php?id=306510 - I don't quite know yet how to make sense of it, but apparently the crash only happens when an external output is in use.
Can you confirm that?
Offline
I have tried the different P_States, but that didn't change anything.
I have a mini PC with only an integrated GPU, so there is only one way to connect it to a monitor.
Last edited by prankish (2025-06-27 17:18:37)
Offline
Please check whether the monitor is 10bpc (edid-decode, xrandr --props will list that, too) - do you have another monitor?
Offline
I am using a single monitor
❯ find /sys/class/drm/*/edid -exec edid-decode {} \; 2>/dev/null
EDID of '/sys/class/drm/card1-DP-1/edid' was empty.
EDID of '/sys/class/drm/card1-DP-2/edid' was empty.
EDID of '/sys/class/drm/card1-DP-3/edid' was empty.
EDID of '/sys/class/drm/card1-DP-4/edid' was empty.
EDID of '/sys/class/drm/card1-DP-5/edid' was empty.
EDID of '/sys/class/drm/card1-DP-6/edid' was empty.
EDID of '/sys/class/drm/card1-DP-7/edid' was empty.
edid-decode (hex):
00 ff ff ff ff ff ff 00 10 ac 96 a1 4c 43 49 30
0c 1f 01 03 80 3c 22 78 ea 50 95 a8 54 4e a5 26
0f 50 54 a5 4b 00 71 4f 81 80 a9 c0 a9 40 d1 c0
e1 00 01 01 01 01 08 e8 00 30 f2 70 5a 80 b0 58
8a 00 55 50 21 00 00 1e 00 00 00 ff 00 43 36 50
31 4d 34 33 0a 20 20 20 20 20 00 00 00 fc 00 44
45 4c 4c 20 53 32 37 32 31 51 53 0a 00 00 00 fd
00 28 3c 82 89 3c 00 0a 20 20 20 20 20 20 01 d0
02 03 45 f1 54 61 01 02 03 04 05 06 07 10 11 12
14 15 16 1f 20 21 5d 5e 5f 23 09 07 07 83 01 00
00 6d 03 0c 00 10 00 38 44 20 00 60 03 02 01 67
d8 5d c4 01 78 80 01 e4 0f 01 00 00 68 1a 00 00
01 01 28 3c e6 56 5e 00 a0 a0 a0 29 50 30 20 35
00 55 50 21 00 00 1a 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 2e
----------------
Block 0, Base EDID:
EDID Structure Version & Revision: 1.3
Vendor & Product Identification:
Manufacturer: DEL
Model: 41366
Serial Number: 810107724 (0x3049434c)
Made in: week 12 of 2021
Basic Display Parameters & Features:
Digital display
Maximum image size: 60 cm x 34 cm
Gamma: 2.20
DPMS levels: Standby Suspend Off
RGB color display
First detailed timing is the preferred timing
Color Characteristics:
Red : 0.6572, 0.3291
Green: 0.3046, 0.6445
Blue : 0.1503, 0.0595
White: 0.3134, 0.3291
Established Timings I & II:
IBM : 720x400 70.081663 Hz 9:5 31.467 kHz 28.320000 MHz
DMT 0x04: 640x480 59.940476 Hz 4:3 31.469 kHz 25.175000 MHz
DMT 0x06: 640x480 75.000000 Hz 4:3 37.500 kHz 31.500000 MHz
DMT 0x09: 800x600 60.316541 Hz 4:3 37.879 kHz 40.000000 MHz
DMT 0x0b: 800x600 75.000000 Hz 4:3 46.875 kHz 49.500000 MHz
DMT 0x10: 1024x768 60.003840 Hz 4:3 48.363 kHz 65.000000 MHz
DMT 0x12: 1024x768 75.028582 Hz 4:3 60.023 kHz 78.750000 MHz
DMT 0x24: 1280x1024 75.024675 Hz 5:4 79.976 kHz 135.000000 MHz
Standard Timings:
DMT 0x15: 1152x864 75.000000 Hz 4:3 67.500 kHz 108.000000 MHz
DMT 0x23: 1280x1024 60.019740 Hz 5:4 63.981 kHz 108.000000 MHz
DMT 0x53: 1600x900 60.000000 Hz 16:9 60.000 kHz 108.000000 MHz (RB)
DMT 0x33: 1600x1200 60.000000 Hz 4:3 75.000 kHz 162.000000 MHz
DMT 0x52: 1920x1080 60.000000 Hz 16:9 67.500 kHz 148.500000 MHz
GTF : 2048x1280 60.000000 Hz 16:10 79.500 kHz 221.328000 MHz
Detailed Timing Descriptors:
DTD 1: 3840x2160 60.000000 Hz 16:9 135.000 kHz 594.000000 MHz (597 mm x 336 mm)
Hfront 176 Hsync 88 Hback 296 Hpol P
Vfront 8 Vsync 10 Vback 72 Vpol P
Display Product Serial Number: 'C6P1M43'
Display Product Name: 'DELL S2721QS'
Display Range Limits:
Monitor ranges (GTF): 40-60 Hz V, 130-137 kHz H, max dotclock 600 MHz
Extension blocks: 1
Checksum: 0xd0
----------------
Block 1, CTA-861 Extension Block:
Revision: 3
Underscans IT Video Formats by default
Basic audio support
Supports YCbCr 4:4:4
Supports YCbCr 4:2:2
Native detailed modes: 1
Video Data Block:
VIC 97: 3840x2160 60.000000 Hz 16:9 135.000 kHz 594.000000 MHz
VIC 1: 640x480 59.940476 Hz 4:3 31.469 kHz 25.175000 MHz
VIC 2: 720x480 59.940060 Hz 4:3 31.469 kHz 27.000000 MHz
VIC 3: 720x480 59.940060 Hz 16:9 31.469 kHz 27.000000 MHz
VIC 4: 1280x720 60.000000 Hz 16:9 45.000 kHz 74.250000 MHz
VIC 5: 1920x1080i 60.000000 Hz 16:9 33.750 kHz 74.250000 MHz
VIC 6: 1440x480i 59.940060 Hz 4:3 15.734 kHz 27.000000 MHz
VIC 7: 1440x480i 59.940060 Hz 16:9 15.734 kHz 27.000000 MHz
VIC 16: 1920x1080 60.000000 Hz 16:9 67.500 kHz 148.500000 MHz
VIC 17: 720x576 50.000000 Hz 4:3 31.250 kHz 27.000000 MHz
VIC 18: 720x576 50.000000 Hz 16:9 31.250 kHz 27.000000 MHz
VIC 20: 1920x1080i 50.000000 Hz 16:9 28.125 kHz 74.250000 MHz
VIC 21: 1440x576i 50.000000 Hz 4:3 15.625 kHz 27.000000 MHz
VIC 22: 1440x576i 50.000000 Hz 16:9 15.625 kHz 27.000000 MHz
VIC 31: 1920x1080 50.000000 Hz 16:9 56.250 kHz 148.500000 MHz
VIC 32: 1920x1080 24.000000 Hz 16:9 27.000 kHz 74.250000 MHz
VIC 33: 1920x1080 25.000000 Hz 16:9 28.125 kHz 74.250000 MHz
VIC 93: 3840x2160 24.000000 Hz 16:9 54.000 kHz 297.000000 MHz
VIC 94: 3840x2160 25.000000 Hz 16:9 56.250 kHz 297.000000 MHz
VIC 95: 3840x2160 30.000000 Hz 16:9 67.500 kHz 297.000000 MHz
Audio Data Block:
Linear PCM:
Max channels: 2
Supported sample rates (kHz): 48 44.1 32
Supported sample sizes (bits): 24 20 16
Speaker Allocation Data Block:
FL/FR - Front Left/Right
Vendor-Specific Data Block (HDMI), OUI 00-0C-03:
Source physical address: 1.0.0.0
DC_36bit
DC_30bit
DC_Y444
Maximum TMDS clock: 340 MHz
Extended HDMI video details:
HDMI VICs:
HDMI VIC 3: 3840x2160 24.000000 Hz 16:9 54.000 kHz 297.000000 MHz
HDMI VIC 2: 3840x2160 25.000000 Hz 16:9 56.250 kHz 297.000000 MHz
HDMI VIC 1: 3840x2160 30.000000 Hz 16:9 67.500 kHz 297.000000 MHz
Vendor-Specific Data Block (HDMI Forum), OUI C4-5D-D8:
Version: 1
Maximum TMDS Character Rate: 600 MHz
SCDC Present
Supports 10-bits/component Deep Color 4:2:0 Pixel Encoding
YCbCr 4:2:0 Capability Map Data Block:
VIC 97: 3840x2160 60.000000 Hz 16:9 135.000 kHz 594.000000 MHz
Vendor-Specific Data Block (AMD), OUI 00-00-1A:
Version: 1
Feature Caps: 0x01
Minimum Refresh Rate: 40 Hz
Maximum Refresh Rate: 60 Hz
Flags 1.x: 0xe6 (MCCS)
Detailed Timing Descriptors:
DTD 2: 2560x1440 59.950550 Hz 16:9 88.787 kHz 241.500000 MHz (597 mm x 336 mm)
Hfront 48 Hsync 32 Hback 80 Hpol P
Vfront 3 Vsync 5 Vback 33 Vpol N
Checksum: 0x2e Unused space in Extension Block: 40 bytes
EDID of '/sys/class/drm/card1-HDMI-A-2/edid' was empty.
EDID of '/sys/class/drm/card1-Writeback-1/edid' was empty.
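To answer the 10bpc question from a dump like the one above without reading it line by line, the decoded text can be filtered for deep-colour markers. This is a minimal sketch: the marker list only covers strings that appear in this thread's edid-decode output (DC_30bit, DC_36bit, "10-bits/component"), and the helper name deep_color_lines is made up for illustration.

```python
from typing import List

# Assumption: these substrings are enough to spot deep-colour support.
# They are the ones visible in the edid-decode output posted in this thread.
DEEP_COLOR_MARKERS = ("DC_30bit", "DC_36bit", "10-bits/component")


def deep_color_lines(edid_decode_output: str) -> List[str]:
    """Return the stripped lines that mention a deep-colour capability."""
    return [
        line.strip()
        for line in edid_decode_output.splitlines()
        if any(marker in line for marker in DEEP_COLOR_MARKERS)
    ]


# Excerpt of the HDMI vendor blocks from the DELL S2721QS dump.
dell_excerpt = """\
    Source physical address: 1.0.0.0
    DC_36bit
    DC_30bit
    DC_Y444
    Supports 10-bits/component Deep Color 4:2:0 Pixel Encoding
"""
for line in deep_color_lines(dell_excerpt):
    print(line)
```

An empty result only means none of these markers were found; it does not prove the panel is limited to 8bpc.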
Offline
Supports 10-bits/component Deep Color 4:2:0 Pixel Encoding
No, the idea was whether you have another monitor - instead of a fancy 4k wide-gamut display, maybe some sad 1280x1024 office panel from 2008 or so… the other thread links this very much to that specific monitor, and yours is 10bpc capable as well…
Offline
I think I might have an old one lying around somewhere. I will give it a try and see what I get.
Offline
I tried a Samsung 2494HM screen, but I got the same issue.
Offline
Just to be sure, can you post the edid of that monitor as well?
Offline
I found an even older monitor, but the crash happened with it as well.
The EDID of the oldest monitor I tried is below:
❯ find /sys/class/drm/*/edid -exec edid-decode {} \; 2>/dev/null
EDID of '/sys/class/drm/card1-DP-1/edid' was empty.
EDID of '/sys/class/drm/card1-DP-2/edid' was empty.
EDID of '/sys/class/drm/card1-DP-3/edid' was empty.
EDID of '/sys/class/drm/card1-DP-4/edid' was empty.
EDID of '/sys/class/drm/card1-DP-5/edid' was empty.
EDID of '/sys/class/drm/card1-DP-6/edid' was empty.
EDID of '/sys/class/drm/card1-DP-7/edid' was empty.
edid-decode (hex):
00 ff ff ff ff ff ff 00 4c 2d 6d 07 45 43 32 42
26 14 01 03 80 30 1b 78 2a 78 f1 a6 55 48 9b 26
12 50 54 bf ef 80 71 4f 81 00 81 40 81 80 95 00
95 0f a9 40 b3 00 02 3a 80 18 71 38 2d 40 58 2c
45 00 dd 0c 11 00 00 1e 01 1d 00 72 51 d0 1e 20
6e 28 55 00 dd 0c 11 00 00 1e 00 00 00 fd 00 38
4b 1e 51 11 00 0a 20 20 20 20 20 20 00 00 00 fc
00 53 4d 42 58 32 32 33 31 0a 20 20 20 20 01 39
02 03 1c f1 48 90 04 1f 05 14 13 12 03 23 09 07
07 83 01 00 00 66 03 0c 00 10 00 80 01 1d 80 d0
72 1c 16 20 10 2c 25 80 dd 0c 11 00 00 9e 01 1d
80 18 71 1c 16 20 58 2c 25 00 dd 0c 11 00 00 9e
01 1d 00 bc 52 d0 1e 20 b8 28 55 40 dd 0c 11 00
00 1e 8c 0a d0 90 20 40 31 20 0c 40 55 00 dd 0c
11 00 00 18 8c 0a d0 8a 20 e0 2d 10 10 3e 96 00
dd 0c 11 00 00 18 00 00 00 00 00 00 00 00 00 46
----------------
Block 0, Base EDID:
EDID Structure Version & Revision: 1.3
Vendor & Product Identification:
Manufacturer: SAM
Model: 1901
Serial Number: 1110590277 (0x42324345)
Made in: week 38 of 2010
Basic Display Parameters & Features:
Digital display
Maximum image size: 48 cm x 27 cm
Gamma: 2.20
DPMS levels: Off
RGB color display
First detailed timing is the preferred timing
Color Characteristics:
Red : 0.6494, 0.3349
Green: 0.2832, 0.6054
Blue : 0.1513, 0.0732
White: 0.3125, 0.3291
Established Timings I & II:
IBM : 720x400 70.081663 Hz 9:5 31.467 kHz 28.320000 MHz
DMT 0x04: 640x480 59.940476 Hz 4:3 31.469 kHz 25.175000 MHz
Apple : 640x480 66.666667 Hz 4:3 35.000 kHz 30.240000 MHz
DMT 0x05: 640x480 72.808802 Hz 4:3 37.861 kHz 31.500000 MHz
DMT 0x06: 640x480 75.000000 Hz 4:3 37.500 kHz 31.500000 MHz
DMT 0x08: 800x600 56.250000 Hz 4:3 35.156 kHz 36.000000 MHz
DMT 0x09: 800x600 60.316541 Hz 4:3 37.879 kHz 40.000000 MHz
DMT 0x0a: 800x600 72.187572 Hz 4:3 48.077 kHz 50.000000 MHz
DMT 0x0b: 800x600 75.000000 Hz 4:3 46.875 kHz 49.500000 MHz
Apple : 832x624 74.551266 Hz 4:3 49.726 kHz 57.284000 MHz
DMT 0x10: 1024x768 60.003840 Hz 4:3 48.363 kHz 65.000000 MHz
DMT 0x11: 1024x768 70.069359 Hz 4:3 56.476 kHz 75.000000 MHz
DMT 0x12: 1024x768 75.028582 Hz 4:3 60.023 kHz 78.750000 MHz
DMT 0x24: 1280x1024 75.024675 Hz 5:4 79.976 kHz 135.000000 MHz
Apple : 1152x870 75.061550 Hz 192:145 68.681 kHz 100.000000 MHz
Standard Timings:
DMT 0x15: 1152x864 75.000000 Hz 4:3 67.500 kHz 108.000000 MHz
DMT 0x1c: 1280x800 59.810326 Hz 16:10 49.702 kHz 83.500000 MHz
DMT 0x20: 1280x960 60.000000 Hz 4:3 60.000 kHz 108.000000 MHz
DMT 0x23: 1280x1024 60.019740 Hz 5:4 63.981 kHz 108.000000 MHz
DMT 0x2f: 1440x900 59.887445 Hz 16:10 55.935 kHz 106.500000 MHz
DMT 0x30: 1440x900 74.984427 Hz 16:10 70.635 kHz 136.750000 MHz
DMT 0x33: 1600x1200 60.000000 Hz 4:3 75.000 kHz 162.000000 MHz
DMT 0x3a: 1680x1050 59.954250 Hz 16:10 65.290 kHz 146.250000 MHz
Detailed Timing Descriptors:
DTD 1: 1920x1080 60.000000 Hz 16:9 67.500 kHz 148.500000 MHz (477 mm x 268 mm)
Hfront 88 Hsync 44 Hback 148 Hpol P
Vfront 4 Vsync 5 Vback 36 Vpol P
DTD 2: 1280x720 60.000000 Hz 16:9 45.000 kHz 74.250000 MHz (477 mm x 268 mm)
Hfront 110 Hsync 40 Hback 220 Hpol P
Vfront 5 Vsync 5 Vback 20 Vpol P
Display Range Limits:
Monitor ranges (GTF): 56-75 Hz V, 30-81 kHz H, max dotclock 170 MHz
Display Product Name: 'SMBX2231'
Extension blocks: 1
Checksum: 0x39
----------------
Block 1, CTA-861 Extension Block:
Revision: 3
Underscans IT Video Formats by default
Basic audio support
Supports YCbCr 4:4:4
Supports YCbCr 4:2:2
Native detailed modes: 1
Video Data Block:
VIC 16: 1920x1080 60.000000 Hz 16:9 67.500 kHz 148.500000 MHz (native)
VIC 4: 1280x720 60.000000 Hz 16:9 45.000 kHz 74.250000 MHz
VIC 31: 1920x1080 50.000000 Hz 16:9 56.250 kHz 148.500000 MHz
VIC 5: 1920x1080i 60.000000 Hz 16:9 33.750 kHz 74.250000 MHz
VIC 20: 1920x1080i 50.000000 Hz 16:9 28.125 kHz 74.250000 MHz
VIC 19: 1280x720 50.000000 Hz 16:9 37.500 kHz 74.250000 MHz
VIC 18: 720x576 50.000000 Hz 16:9 31.250 kHz 27.000000 MHz
VIC 3: 720x480 59.940060 Hz 16:9 31.469 kHz 27.000000 MHz
Audio Data Block:
Linear PCM:
Max channels: 2
Supported sample rates (kHz): 48 44.1 32
Supported sample sizes (bits): 24 20 16
Speaker Allocation Data Block:
FL/FR - Front Left/Right
Vendor-Specific Data Block (HDMI), OUI 00-0C-03:
Source physical address: 1.0.0.0
Supports_AI
Detailed Timing Descriptors:
DTD 3: 1920x1080i 50.000000 Hz 16:9 28.125 kHz 74.250000 MHz (477 mm x 268 mm)
Hfront 528 Hsync 44 Hback 148 Hpol P
Vfront 2 Vsync 5 Vback 15 Vpol P Vfront +0.5 Odd Field
Vfront 2 Vsync 5 Vback 15 Vpol P Vback +0.5 Even Field
DTD 4: 1920x1080i 60.000000 Hz 16:9 33.750 kHz 74.250000 MHz (477 mm x 268 mm)
Hfront 88 Hsync 44 Hback 148 Hpol P
Vfront 2 Vsync 5 Vback 15 Vpol P Vfront +0.5 Odd Field
Vfront 2 Vsync 5 Vback 15 Vpol P Vback +0.5 Even Field
DTD 5: 1280x720 50.000000 Hz 16:9 37.500 kHz 74.250000 MHz (477 mm x 268 mm)
Hfront 440 Hsync 40 Hback 220 Hpol P
Vfront 5 Vsync 5 Vback 20 Vpol P
DTD 6: 720x576 50.000000 Hz 5:4 31.250 kHz 27.000000 MHz (477 mm x 268 mm)
Hfront 12 Hsync 64 Hback 68 Hpol N
Vfront 5 Vsync 5 Vback 39 Vpol N
DTD 7: 720x480 59.940060 Hz 3:2 31.469 kHz 27.000000 MHz (477 mm x 268 mm)
Hfront 16 Hsync 62 Hback 60 Hpol N
Vfront 9 Vsync 6 Vback 30 Vpol N
Checksum: 0x46 Unused space in Extension Block: 9 bytes
EDID of '/sys/class/drm/card1-HDMI-A-2/edid' was empty.
EDID of '/sys/class/drm/card1-Writeback-1/edid' was empty.
Offline
I have upgraded to kernel `Linux 6.16.8-arch3-1` because I heard it included some gpu fixes, but still no luck.
Is this something I should be posting on Mesa's issue tracker or on Linux's amdgpu bug tracker?
Offline
https://aur.archlinux.org/packages/linux-drm-tip-git
Since you still experience the symptoms of https://gitlab.freedesktop.org/mesa/mesa/-/issues/12528 but w/ vcn_v4_0, while that bug (and apparently the specific patch) was for vcn_v4_0_5, you should report this upstream and point that out - maybe the same patch needs to be extended to some vcn_v4_0 code branch.
Edit: the monitor was moot, btw - the old one doesn't have fancy 10bpc
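To point out in a report which VCN block the kernel is actually driving (vcn_v4_0 here versus vcn_v4_0_5 in the referenced bug), the relevant facts can be pulled out of the journal text. The sketch below relies only on two message shapes that appear in the journal excerpts quoted in this thread; the function names are hypothetical.

```python
import re
from typing import List, Tuple

# Message shapes taken from the journal excerpts in this thread.
TIMEOUT_RE = re.compile(r"ring (\S+) timeout, signaled seq=(\d+), emitted seq=(\d+)")
IP_BLOCK_RE = re.compile(r"IP block <([^>]+)>")


def ring_timeouts(journal: str) -> List[Tuple[str, int, int]]:
    """Return (ring, signaled seq, emitted seq) for every ring timeout line."""
    return [
        (m.group(1), int(m.group(2)), int(m.group(3)))
        for m in TIMEOUT_RE.finditer(journal)
    ]


def ip_blocks(journal: str) -> List[str]:
    """Return the distinct IP block names mentioned, in order of appearance."""
    seen: List[str] = []
    for name in IP_BLOCK_RE.findall(journal):
        if name not in seen:
            seen.append(name)
    return seen


excerpt = (
    "amdgpu 0000:c4:00.0: amdgpu: ring vcn_unified_0 timeout, signaled seq=44, emitted seq=47\n"
    "amdgpu 0000:c4:00.0: amdgpu: resume of IP block <vcn_v4_0> failed -110\n"
)
print(ring_timeouts(excerpt))
print(ip_blocks(excerpt))
```

Running it over the full journal would list every timed-out ring together with the IP block that failed to resume, which is the detail worth stating in the upstream report.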
Last edited by seth (2025-09-29 08:17:31)
Offline
When you say upstream, do you mean Mesa or the Linux kernel?
I would also really appreciate it if you could help me with what to include in the report, as I do not know what is relevant.
Offline
https://gitlab.freedesktop.org/mesa/mesa/-/issues/ is the relevant bugtracker here; AMD handles their bugs there (from all I can tell; you might get responses on the kernel bugzilla or LKML as well).
Explain the context (FF video playback) and symptoms, reference this thread (https://bbs.archlinux.org/viewtopic.php?id=306025), the other, seemingly related AMDGPU bug (https://gitlab.freedesktop.org/mesa/mesa/-/issues/12528) and your journal (https://0x0.st/8EjZ.txt), and post these excerpts:
Jun 03 19:27:32 prankish kernel: [drm] Fence fallback timer expired on ring vcn_unified_0
Jun 03 19:27:41 prankish kernel: amdgpu 0000:c4:00.0: amdgpu: Dumping IP State
Jun 03 19:27:41 prankish kernel: amdgpu 0000:c4:00.0: amdgpu: Dumping IP State Completed
Jun 03 19:27:41 prankish kernel: amdgpu 0000:c4:00.0: amdgpu: ring vcn_unified_0 timeout, signaled seq=83, emitted seq=86
Jun 03 19:27:41 prankish kernel: amdgpu 0000:c4:00.0: amdgpu: GPU reset begin!
Jun 03 19:27:42 prankish kernel: [drm] Register(0) [regUVD_POWER_STATUS] failed to reach value 0x00000001 != 0x00000002n
Jun 03 19:27:42 prankish kernel: [drm] Register(0) [regUVD_RB_RPTR] failed to reach value 0x00000080 != 0x00000000n
Jun 03 19:27:42 prankish kernel: [drm] Register(0) [regUVD_POWER_STATUS] failed to reach value 0x00000001 != 0x00000002n
Jun 03 19:27:42 prankish kernel: amdgpu 0000:c4:00.0: amdgpu: MODE2 reset
Jun 03 19:27:42 prankish kernel: amdgpu 0000:c4:00.0: amdgpu: GPU reset succeeded, trying to resume
Jun 03 19:27:42 prankish kernel: [drm] PCIE GART of 512M enabled (table at 0x000000803FD00000).
Jun 03 19:27:42 prankish kernel: amdgpu 0000:c4:00.0: amdgpu: SMU is resuming...
Jun 03 19:27:42 prankish kernel: amdgpu 0000:c4:00.0: amdgpu: SMU is resumed successfully!
Jun 03 19:27:42 prankish kernel: [drm] DMUB hardware initialized: version=0x08004E00
Jun 03 19:27:42 prankish kernel: amdgpu 0000:c4:00.0: [drm] REG_WAIT timeout 1us * 1000 tries - dcn314_dsc_pg_control line:237
Jun 03 19:27:42 prankish kernel: amdgpu 0000:c4:00.0: [drm] REG_WAIT timeout 1us * 1000 tries - dcn314_dsc_pg_control line:245
Jun 03 19:27:42 prankish kernel: amdgpu 0000:c4:00.0: [drm] REG_WAIT timeout 1us * 1000 tries - dcn314_dsc_pg_control line:253
Jun 03 19:27:42 prankish kernel: amdgpu 0000:c4:00.0: [drm] REG_WAIT timeout 1us * 1000 tries - dcn314_dsc_pg_control line:261
Jun 03 19:27:42 prankish kernel: amdgpu 0000:c4:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
Jun 03 19:27:42 prankish kernel: amdgpu 0000:c4:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
Jun 03 19:27:42 prankish kernel: amdgpu 0000:c4:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
Jun 03 19:27:42 prankish kernel: amdgpu 0000:c4:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 6 on hub 0
Jun 03 19:27:42 prankish kernel: amdgpu 0000:c4:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 7 on hub 0
Jun 03 19:27:42 prankish kernel: amdgpu 0000:c4:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 8 on hub 0
Jun 03 19:27:42 prankish kernel: amdgpu 0000:c4:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 9 on hub 0
Jun 03 19:27:42 prankish kernel: amdgpu 0000:c4:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 10 on hub 0
Jun 03 19:27:42 prankish kernel: amdgpu 0000:c4:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 11 on hub 0
Jun 03 19:27:42 prankish kernel: amdgpu 0000:c4:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
Jun 03 19:27:42 prankish kernel: amdgpu 0000:c4:00.0: amdgpu: ring vcn_unified_0 uses VM inv eng 0 on hub 8
Jun 03 19:27:42 prankish kernel: amdgpu 0000:c4:00.0: amdgpu: ring jpeg_dec uses VM inv eng 1 on hub 8
Jun 03 19:27:42 prankish kernel: amdgpu 0000:c4:00.0: amdgpu: ring mes_kiq_3.1.0 uses VM inv eng 13 on hub 0
Jun 03 19:27:42 prankish kernel: amdgpu 0000:c4:00.0: amdgpu: GPU reset(8) succeeded!
and
Jun 11 18:50:36 prankish kernel: amdgpu 0000:c4:00.0: amdgpu: Dumping IP State
Jun 11 18:50:36 prankish kernel: amdgpu 0000:c4:00.0: amdgpu: Dumping IP State Completed
Jun 11 18:50:36 prankish kernel: amdgpu 0000:c4:00.0: amdgpu: [drm] AMDGPU device coredump file has been created
Jun 11 18:50:36 prankish kernel: amdgpu 0000:c4:00.0: amdgpu: [drm] Check your /sys/class/drm/card1/device/devcoredump/data
Jun 11 18:50:36 prankish kernel: amdgpu 0000:c4:00.0: amdgpu: ring vcn_unified_0 timeout, signaled seq=44, emitted seq=47
Jun 11 18:50:36 prankish kernel: amdgpu 0000:c4:00.0: amdgpu: Process information: process mpv pid 5635 thread mpv:cs0 pid 5651
Jun 11 18:50:36 prankish kernel: amdgpu 0000:c4:00.0: amdgpu: GPU reset begin!
Jun 11 18:50:36 prankish kernel: amdgpu 0000:c4:00.0: amdgpu: MODE2 reset
Jun 11 18:50:36 prankish kernel: amdgpu 0000:c4:00.0: amdgpu: GPU reset succeeded, trying to resume
Jun 11 18:50:36 prankish kernel: amdgpu 0000:c4:00.0: amdgpu: SMU is resuming...
Jun 11 18:50:36 prankish kernel: amdgpu 0000:c4:00.0: amdgpu: SMU is resumed successfully!
Jun 11 18:50:36 prankish kernel: amdgpu 0000:c4:00.0: amdgpu: [mmhub] page fault (src_id:0 ring:169 vmid:0 pasid:0)
Jun 11 18:50:36 prankish kernel: amdgpu 0000:c4:00.0: amdgpu: in page starting at address 0x0000000000fff000 from client 18
Jun 11 18:50:36 prankish kernel: amdgpu 0000:c4:00.0: amdgpu: MMVM_L2_PROTECTION_FAULT_STATUS:0x00043952
Jun 11 18:50:36 prankish kernel: amdgpu 0000:c4:00.0: amdgpu: Faulty UTCL2 client ID: unknown (0x1c)
Jun 11 18:50:36 prankish kernel: amdgpu 0000:c4:00.0: amdgpu: MORE_FAULTS: 0x0
Jun 11 18:50:36 prankish kernel: amdgpu 0000:c4:00.0: amdgpu: WALKER_ERROR: 0x1
Jun 11 18:50:36 prankish kernel: amdgpu 0000:c4:00.0: amdgpu: PERMISSION_FAULTS: 0x5
Jun 11 18:50:36 prankish kernel: amdgpu 0000:c4:00.0: amdgpu: MAPPING_ERROR: 0x1
Jun 11 18:50:36 prankish kernel: amdgpu 0000:c4:00.0: amdgpu: RW: 0x1
Jun 11 18:50:37 prankish kernel: amdgpu 0000:c4:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring vcn_unified_0 test failed (-110)
Jun 11 18:50:37 prankish kernel: amdgpu 0000:c4:00.0: amdgpu: resume of IP block <vcn_v4_0> failed -110
Jun 11 18:50:37 prankish kernel: amdgpu 0000:c4:00.0: amdgpu: GPU reset(6) failed
Jun 11 18:50:37 prankish kernel: amdgpu 0000:c4:00.0: amdgpu: GPU reset end with ret = -110
Jun 11 18:50:37 prankish kernel: amdgpu 0000:c4:00.0: amdgpu: GPU Recovery Failed: -110
Jun 11 18:50:47 prankish kernel: amdgpu 0000:c4:00.0: amdgpu: Dumping IP State
Jun 11 18:50:47 prankish kernel: amdgpu 0000:c4:00.0: amdgpu: Dumping IP State Completed
Jun 11 18:50:47 prankish kernel: amdgpu 0000:c4:00.0: amdgpu: [drm] AMDGPU device coredump file has been created
Jun 11 18:50:47 prankish kernel: amdgpu 0000:c4:00.0: amdgpu: [drm] Check your /sys/class/drm/card1/device/devcoredump/data
Jun 11 18:50:47 prankish kernel: amdgpu 0000:c4:00.0: amdgpu: ring vcn_unified_0 timeout, signaled seq=47, emitted seq=47
Jun 11 18:50:47 prankish kernel: amdgpu 0000:c4:00.0: amdgpu: Process information: process mpv pid 5635 thread mpv:cs0 pid 5651
Jun 11 18:50:47 prankish kernel: amdgpu 0000:c4:00.0: amdgpu: GPU reset begin!
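The page-fault lines in the second excerpt print MMVM_L2_PROTECTION_FAULT_STATUS both raw (0x00043952) and split into fields. As a worked example, the sketch below decodes the raw value. The bit layout is an assumption, chosen because it reproduces the field values amdgpu logged above; it is not taken from AMD register documentation, and the function name is made up.

```python
from typing import Dict


def decode_l2_fault_status(status: int) -> Dict[str, int]:
    """Split a fault status word into fields (assumed bit layout, see text)."""
    return {
        "MORE_FAULTS": status & 0x1,               # bit 0
        "WALKER_ERROR": (status >> 1) & 0x7,       # bits 1-3
        "PERMISSION_FAULTS": (status >> 4) & 0xF,  # bits 4-7
        "MAPPING_ERROR": (status >> 8) & 0x1,      # bit 8
        "CID": (status >> 9) & 0x1FF,              # bits 9-17, client ID
        "RW": (status >> 18) & 0x1,                # bit 18
    }


fields = decode_l2_fault_status(0x00043952)
for name, value in fields.items():
    print(f"{name}: {value:#x}")
```

With this layout the decoded values match the log: WALKER_ERROR 0x1, PERMISSION_FAULTS 0x5, MAPPING_ERROR 0x1, RW 0x1 and client ID 0x1c.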
They'll most likely ask for the /sys/class/drm/card1/device/devcoredump/data (after such a crash)
Answer their questions, be kind, bring a cookie - you're gonna be fine
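Since the devcoredump file lives in sysfs and is unlikely to survive a reboot, it makes sense to copy it somewhere safe right after a crash. A minimal sketch, assuming the path from the kernel message above and that the file is readable by the user running the script (root may be needed); the function name and destination file name are hypothetical.

```python
import shutil
from pathlib import Path

# Path as printed by the kernel in the journal excerpt; card1 is specific
# to this machine and may differ elsewhere.
DEFAULT_DUMP = Path("/sys/class/drm/card1/device/devcoredump/data")


def save_devcoredump(dest: Path, source: Path = DEFAULT_DUMP) -> bool:
    """Copy the device coredump to dest; return False if no dump exists."""
    if not source.exists():
        return False
    # Stream the contents, since sysfs files may not report a useful size.
    with source.open("rb") as src, dest.open("wb") as dst:
        shutil.copyfileobj(src, dst)
    return True
```

Calling save_devcoredump(Path("amdgpu-devcoredump.bin")) after a crash would leave a file that can be attached to the upstream issue; a False result means no dump was present at that path.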
Offline
I greatly appreciate the detailed info, seth. I will create an issue there and post the link here.
If I reach a solution, I will also update the thread.
Offline
This is the link for the issue on the mesa issue tracker: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14015
Offline