Original post title: AMD ThinkPad iGPU possibly dying - I have a few questions
Update 2024-10-31: See my reply below this for updated information - the issue is very different to what I first believed.
In summary: Due to graphical artefacts and some GPU-related logs, I was convinced that this issue was caused by a dying GPU, but now I realise it may have more to do with audio than graphics.
Device: Lenovo ThinkPad L15 AMD Gen 1 (using a Ryzen 4750U with Vega 7 graphics).
The first crash was on the 20th of August. The second was 8 days after that, and since then they've become more frequent.
On the 11th of September it crashed around 11 times.
A crash (on Linux) takes one of two forms:
Type 1: The system immediately loses power and automatically tries to reboot. Rebooting usually fails at or just after KMS (I have early KMS enabled).
Type 2: The system locks up: Xorg freezes; SysRq doesn't work; usually all logs immediately stop, but sometimes I get some logs (see below); medium-high power draw after hanging; have to hold the power button to turn off the device.
Sometimes - particularly recently - there are brief graphical artifacts right before a crash.
Rarely, the system begins displaying graphical artifacts (flashing on and off rapidly) without crashing immediately. Despite the obvious problem, nothing is logged while this is happening.
One very important point is that after a random crash, the system goes into a bad state that persists across reboots. Subsequent reboots rarely make it past KMS, and if they do, the system usually crashes before I can log in. Booting with nomodeset always works, and after running that way for a while I can reboot into my regular configuration again.
Even leaving the device powered off doesn't help: on the 25th of October the system crashed, and the next morning I still wasn't able to boot past KMS for a while, presumably because of the crash the night before.
This bad state also affects Windows - it will not boot after a crash on Linux.
In total there have been over 50 random crashes so far (not counting failed reboots).
Logs:
Most crashes leave absolutely no logs, even after waiting for watchdog timeouts (during the Type 2 freezes).
Xorg logs are similarly unhelpful.
I set up kdumpst, but even THAT doesn't trigger during a crash. I have been able to get some information by invoking SysRq immediately after a crash, but the window of opportunity is very small (usually just a few seconds before a complete lockup).
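For the rare crashes that do leave a journal, most of the output is the same few messages repeated, so it can help to condense it into "which subsystem complained first, and how often". Below is a minimal sketch of that idea in Python, not a polished tool. The names `LINE_RE`, `first_seen` and `SAMPLE` and the search strings are my own illustrative choices, the regex assumes the default short journal format (`Mon DD HH:MM:SS host source: message`), and the sample lines are copied from the journal excerpt in this post.

```python
import re

# Assumed line format: "Sep 11 14:59:01 host source: message"
LINE_RE = re.compile(r"^(\w{3} +\d+ \d\d:\d\d:\d\d) \S+ [^:]+: (.*)$")


def first_seen(lines, needles):
    """Return [(needle, first_timestamp, count)] in order of first appearance.

    `needles` are plain substrings to look for in the message part of each
    line; lines that do not look like journal output are skipped.
    """
    seen = {}  # dicts keep insertion order, i.e. order of first appearance
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None:
            continue
        ts, msg = m.groups()
        for needle in needles:
            if needle in msg:
                first, count = seen.get(needle, (ts, 0))
                seen[needle] = (first, count + 1)
    return [(needle, ts, count) for needle, (ts, count) in seen.items()]


# A few lines taken from the journal excerpt in this post.
SAMPLE = """\
Sep 11 14:58:57 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:58:57 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:01 LenovoL15 kernel: iwlwifi 0000:03:00.0: Error sending STATISTICS_CMD: time out after 2000ms.
Sep 11 14:59:02 LenovoL15 kernel: xhci_hcd 0000:06:00.4: xHCI host controller not responding, assume dead
Sep 11 14:59:12 LenovoL15 kernel: [drm] Fence fallback timer expired on ring sdma0
Sep 11 14:59:24 LenovoL15 kernel: watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [rcu_exp_gp_kthr:21]
""".splitlines()

# Illustrative search strings; adjust to whatever shows up in your own log.
NEEDLES = ["ring gfx", "ring sdma0", "iwlwifi", "xhci_hcd", "soft lockup"]

for needle, ts, count in first_seen(SAMPLE, NEEDLES):
    print(f"{ts}  first '{needle}' ({count} matching line(s))")
```

The same function accepts any iterable of lines, so it could be pointed at a saved copy of the previous boot's kernel messages (for example the output of `journalctl -k -b -1` written to a file). It only shows the order in which messages appeared, which is not necessarily the order in which the hardware actually failed.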
The first crash that left logs (and left me puzzled) was on the 11th of September:
Following the display freeze, my screenshot keybind still changed the mouse cursor, but SysRq didn't work - how is that even possible?
The system obviously wasn't entirely dead, as the journal captured a lot of errors related to various drivers not responding, but it wasn't recoverable:
Sep 11 10:30:26 LenovoL15 kernel: Linux version 6.10.4-arch2-1 (linux@archlinux) (gcc (GCC) 14.2.1 20240805, GNU ld (GNU Binutils) 2.43.0) #1 SMP PREEMPT_DYNAMIC Sun, 11 Aug 2024 16:19:06 +0000
Sep 11 10:30:26 LenovoL15 kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-linux root=UUID=3de0ffac-d140-40b8-9ec8-277bce63f9e1 rw crashkernel=256M loglevel=3 systemd.restore_state=0 video.brightness_switch_enabled=N amdgpu.gpu_recovery=1
...
Sep 11 14:58:57 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:58:57 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:58:58 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:58:58 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:58:59 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:58:59 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:00 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:00 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:01 LenovoL15 kernel: iwlwifi 0000:03:00.0: Error sending STATISTICS_CMD: time out after 2000ms.
Sep 11 14:59:01 LenovoL15 kernel: iwlwifi 0000:03:00.0: Current CMD queue read_ptr 160 write_ptr 161
Sep 11 14:59:01 LenovoL15 kernel: iwlwifi 0000:03:00.0: Start IWL Error Log Dump:
Sep 11 14:59:01 LenovoL15 kernel: iwlwifi 0000:03:00.0: Transport status: 0x0000004A, valid: 6
Sep 11 14:59:01 LenovoL15 kernel: iwlwifi 0000:03:00.0: Loaded firmware version: 77.c360c4b1.0 cc-a0-77.ucode
Sep 11 14:59:01 LenovoL15 kernel: iwlwifi 0000:03:00.0: 0x00000084 | NMI_INTERRUPT_UNKNOWN
Sep 11 14:59:01 LenovoL15 kernel: iwlwifi 0000:03:00.0: 0x0000A210 | trm_hw_status0
Sep 11 14:59:01 LenovoL15 kernel: iwlwifi 0000:03:00.0: 0x00000000 | trm_hw_status1
Sep 11 14:59:01 LenovoL15 kernel: iwlwifi 0000:03:00.0: 0x004F8CE6 | branchlink2
Sep 11 14:59:01 LenovoL15 kernel: iwlwifi 0000:03:00.0: 0x004EED36 | interruptlink1
Sep 11 14:59:01 LenovoL15 kernel: iwlwifi 0000:03:00.0: 0x004EED36 | interruptlink2
Sep 11 14:59:01 LenovoL15 kernel: iwlwifi 0000:03:00.0: 0x0000B7C8 | data1
Sep 11 14:59:01 LenovoL15 kernel: iwlwifi 0000:03:00.0: 0x01000000 | data2
Sep 11 14:59:01 LenovoL15 kernel: iwlwifi 0000:03:00.0: 0x00000000 | data3
Sep 11 14:59:01 LenovoL15 kernel: iwlwifi 0000:03:00.0: 0x24C09696 | beacon time
Sep 11 14:59:01 LenovoL15 kernel: iwlwifi 0000:03:00.0: 0x6DC30973 | tsf low
Sep 11 14:59:01 LenovoL15 kernel: iwlwifi 0000:03:00.0: 0x00000006 | tsf hi
Sep 11 14:59:01 LenovoL15 kernel: iwlwifi 0000:03:00.0: 0x00000000 | time gp1
Sep 11 14:59:01 LenovoL15 kernel: iwlwifi 0000:03:00.0: 0x0DA387AD | time gp2
Sep 11 14:59:01 LenovoL15 kernel: iwlwifi 0000:03:00.0: 0x00000001 | uCode revision type
Sep 11 14:59:01 LenovoL15 kernel: iwlwifi 0000:03:00.0: 0x0000004D | uCode version major
Sep 11 14:59:01 LenovoL15 kernel: iwlwifi 0000:03:00.0: 0xC360C4B1 | uCode version minor
Sep 11 14:59:01 LenovoL15 kernel: iwlwifi 0000:03:00.0: 0x00000340 | hw version
Sep 11 14:59:01 LenovoL15 kernel: iwlwifi 0000:03:00.0: 0x00489000 | board version
Sep 11 14:59:01 LenovoL15 kernel: iwlwifi 0000:03:00.0: 0x8029F400 | hcmd
Sep 11 14:59:01 LenovoL15 kernel: iwlwifi 0000:03:00.0: 0x24020000 | isr0
Sep 11 14:59:01 LenovoL15 kernel: iwlwifi 0000:03:00.0: 0x00400000 | isr1
Sep 11 14:59:01 LenovoL15 kernel: iwlwifi 0000:03:00.0: 0x08F0000A | isr2
Sep 11 14:59:01 LenovoL15 kernel: iwlwifi 0000:03:00.0: 0x00C3028C | isr3
Sep 11 14:59:01 LenovoL15 kernel: iwlwifi 0000:03:00.0: 0x00000000 | isr4
Sep 11 14:59:01 LenovoL15 kernel: iwlwifi 0000:03:00.0: 0x0315001C | last cmd Id
Sep 11 14:59:01 LenovoL15 kernel: iwlwifi 0000:03:00.0: 0x0000B7C8 | wait_event
Sep 11 14:59:01 LenovoL15 kernel: iwlwifi 0000:03:00.0: 0x00000080 | l2p_control
Sep 11 14:59:01 LenovoL15 kernel: iwlwifi 0000:03:00.0: 0x00010034 | l2p_duration
Sep 11 14:59:01 LenovoL15 kernel: iwlwifi 0000:03:00.0: 0x0000003F | l2p_mhvalid
Sep 11 14:59:01 LenovoL15 kernel: iwlwifi 0000:03:00.0: 0x00000080 | l2p_addr_match
Sep 11 14:59:01 LenovoL15 kernel: iwlwifi 0000:03:00.0: 0x00000009 | lmpm_pmg_sel
Sep 11 14:59:01 LenovoL15 kernel: iwlwifi 0000:03:00.0: 0x00000000 | timestamp
Sep 11 14:59:01 LenovoL15 kernel: iwlwifi 0000:03:00.0: 0x00009880 | flow_handler
Sep 11 14:59:01 LenovoL15 kernel: iwlwifi 0000:03:00.0: Start IWL Error Log Dump:
Sep 11 14:59:01 LenovoL15 kernel: iwlwifi 0000:03:00.0: Transport status: 0x0000004A, valid: 7
Sep 11 14:59:01 LenovoL15 kernel: iwlwifi 0000:03:00.0: 0x20000066 | NMI_INTERRUPT_HOST
Sep 11 14:59:01 LenovoL15 kernel: iwlwifi 0000:03:00.0: 0x00000000 | umac branchlink1
Sep 11 14:59:01 LenovoL15 kernel: iwlwifi 0000:03:00.0: 0x80455D6E | umac branchlink2
Sep 11 14:59:01 LenovoL15 kernel: iwlwifi 0000:03:00.0: 0x8047300E | umac interruptlink1
Sep 11 14:59:01 LenovoL15 kernel: iwlwifi 0000:03:00.0: 0x8047300E | umac interruptlink2
Sep 11 14:59:01 LenovoL15 kernel: iwlwifi 0000:03:00.0: 0x01000000 | umac data1
Sep 11 14:59:01 LenovoL15 kernel: iwlwifi 0000:03:00.0: 0x8047300E | umac data2
Sep 11 14:59:01 LenovoL15 kernel: iwlwifi 0000:03:00.0: 0x00000000 | umac data3
Sep 11 14:59:01 LenovoL15 kernel: iwlwifi 0000:03:00.0: 0x0000004D | umac major
Sep 11 14:59:01 LenovoL15 kernel: iwlwifi 0000:03:00.0: 0xC360C4B1 | umac minor
Sep 11 14:59:01 LenovoL15 kernel: iwlwifi 0000:03:00.0: 0x0DA387AB | frame pointer
Sep 11 14:59:01 LenovoL15 kernel: iwlwifi 0000:03:00.0: 0xC0886260 | stack pointer
Sep 11 14:59:01 LenovoL15 kernel: iwlwifi 0000:03:00.0: 0x00A0019C | last host cmd
Sep 11 14:59:01 LenovoL15 kernel: iwlwifi 0000:03:00.0: 0x00000000 | isr status reg
Sep 11 14:59:01 LenovoL15 kernel: iwlwifi 0000:03:00.0: IML/ROM dump:
Sep 11 14:59:01 LenovoL15 kernel: iwlwifi 0000:03:00.0: 0x00000003 | IML/ROM error/state
Sep 11 14:59:01 LenovoL15 kernel: iwlwifi 0000:03:00.0: 0x000063E3 | IML/ROM data1
Sep 11 14:59:01 LenovoL15 kernel: iwlwifi 0000:03:00.0: 0x00000080 | IML/ROM WFPM_AUTH_KEY_0
Sep 11 14:59:01 LenovoL15 kernel: iwlwifi 0000:03:00.0: Fseq Registers:
Sep 11 14:59:01 LenovoL15 kernel: iwlwifi 0000:03:00.0: 0x60000000 | FSEQ_ERROR_CODE
Sep 11 14:59:01 LenovoL15 kernel: iwlwifi 0000:03:00.0: 0x80290021 | FSEQ_TOP_INIT_VERSION
Sep 11 14:59:01 LenovoL15 kernel: iwlwifi 0000:03:00.0: 0x00050008 | FSEQ_CNVIO_INIT_VERSION
Sep 11 14:59:01 LenovoL15 kernel: iwlwifi 0000:03:00.0: 0x0000A503 | FSEQ_OTP_VERSION
Sep 11 14:59:01 LenovoL15 kernel: iwlwifi 0000:03:00.0: 0x80000003 | FSEQ_TOP_CONTENT_VERSION
Sep 11 14:59:01 LenovoL15 kernel: iwlwifi 0000:03:00.0: 0x4552414E | FSEQ_ALIVE_TOKEN
Sep 11 14:59:01 LenovoL15 kernel: iwlwifi 0000:03:00.0: 0x00100530 | FSEQ_CNVI_ID
Sep 11 14:59:01 LenovoL15 kernel: iwlwifi 0000:03:00.0: 0x00000532 | FSEQ_CNVR_ID
Sep 11 14:59:01 LenovoL15 kernel: iwlwifi 0000:03:00.0: 0x00100530 | CNVI_AUX_MISC_CHIP
Sep 11 14:59:01 LenovoL15 kernel: iwlwifi 0000:03:00.0: 0x00000532 | CNVR_AUX_MISC_CHIP
Sep 11 14:59:01 LenovoL15 kernel: iwlwifi 0000:03:00.0: 0x05B0905B | CNVR_SCU_SD_REGS_SD_REG_DIG_DCDC_VTRIM
Sep 11 14:59:01 LenovoL15 kernel: iwlwifi 0000:03:00.0: 0x0000025B | CNVR_SCU_SD_REGS_SD_REG_ACTIVE_VDIG_MIRROR
Sep 11 14:59:01 LenovoL15 kernel: iwlwifi 0000:03:00.0: 0x00050008 | FSEQ_PREV_CNVIO_INIT_VERSION
Sep 11 14:59:01 LenovoL15 kernel: iwlwifi 0000:03:00.0: 0x00290021 | FSEQ_WIFI_FSEQ_VERSION
Sep 11 14:59:01 LenovoL15 kernel: iwlwifi 0000:03:00.0: 0x8C00A2A1 | FSEQ_BT_FSEQ_VERSION
Sep 11 14:59:01 LenovoL15 kernel: iwlwifi 0000:03:00.0: 0x000000F0 | FSEQ_CLASS_TP_VERSION
Sep 11 14:59:01 LenovoL15 kernel: iwlwifi 0000:03:00.0: UMAC CURRENT PC: 0x80472b1c
Sep 11 14:59:01 LenovoL15 kernel: iwlwifi 0000:03:00.0: LMAC1 CURRENT PC: 0xd0
Sep 11 14:59:01 LenovoL15 kernel: iwlwifi 0000:03:00.0: WRT: Collecting data: ini trigger 4 fired (delay=0ms).
Sep 11 14:59:01 LenovoL15 kernel: ieee80211 phy0: Hardware restart was requested
Sep 11 14:59:01 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:01 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:02 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:02 LenovoL15 kernel: xhci_hcd 0000:06:00.4: xHCI host not responding to stop endpoint command
Sep 11 14:59:02 LenovoL15 kernel: xhci_hcd 0000:06:00.4: xHCI host controller not responding, assume dead
Sep 11 14:59:02 LenovoL15 kernel: xhci_hcd 0000:06:00.4: HC died; cleaning up
Sep 11 14:59:02 LenovoL15 kernel: usb 4-3: USB disconnect, device number 2
Sep 11 14:59:02 LenovoL15 kernel: xhci_hcd 0000:06:00.4: Timeout while waiting for stop endpoint command
Sep 11 14:59:02 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:03 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:03 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:04 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:04 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:05 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:06 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:06 LenovoL15 kernel: iwlwifi 0000:03:00.0: Queue 3 is stuck 7 22
Sep 11 14:59:06 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:07 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:07 LenovoL15 kernel: amdgpu 0000:06:00.0: [drm] *ERROR* [CRTC:73:crtc-0] flip_done timed out
Sep 11 14:59:08 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:09 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:09 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:10 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:11 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:11 LenovoL15 root[522577]: ACPI action undefined: PNP0C0A:00
Sep 11 14:59:11 LenovoL15 root[522724]: ACPI action undefined: ACPI0003:00
Sep 11 14:59:11 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:12 LenovoL15 kernel: [drm] Fence fallback timer expired on ring sdma0
Sep 11 14:59:12 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:12 LenovoL15 kernel: [drm] Fence fallback timer expired on ring sdma0
Sep 11 14:59:12 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:13 LenovoL15 kernel: [drm] Fence fallback timer expired on ring sdma0
Sep 11 14:59:13 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:13 LenovoL15 kernel: [drm] Fence fallback timer expired on ring sdma0
Sep 11 14:59:13 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:14 LenovoL15 kernel: [drm] Fence fallback timer expired on ring sdma0
Sep 11 14:59:14 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:15 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:15 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:16 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:16 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:17 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:17 LenovoL15 kernel: [drm] Fence fallback timer expired on ring sdma0
Sep 11 14:59:18 LenovoL15 kernel: [drm] Fence fallback timer expired on ring sdma0
Sep 11 14:59:18 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:18 LenovoL15 root[522984]: ACPI action undefined: PNP0C0A:00
Sep 11 14:59:18 LenovoL15 kernel: [drm] Fence fallback timer expired on ring sdma0
Sep 11 14:59:18 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:19 LenovoL15 root[523033]: ACPI action undefined: ACPI0003:00
Sep 11 14:59:19 LenovoL15 kernel: [drm] Fence fallback timer expired on ring sdma0
Sep 11 14:59:19 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:19 LenovoL15 kernel: [drm] Fence fallback timer expired on ring sdma0
Sep 11 14:59:19 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:20 LenovoL15 kernel: [drm] Fence fallback timer expired on ring sdma0
Sep 11 14:59:20 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:20 LenovoL15 kernel: [drm] Fence fallback timer expired on ring sdma0
Sep 11 14:59:20 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:21 LenovoL15 kernel: [drm] Fence fallback timer expired on ring sdma0
Sep 11 14:59:21 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:21 LenovoL15 kernel: [drm] Fence fallback timer expired on ring sdma0
Sep 11 14:59:21 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:22 LenovoL15 kernel: [drm] Fence fallback timer expired on ring sdma0
Sep 11 14:59:22 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:22 LenovoL15 kernel: [drm] Fence fallback timer expired on ring sdma0
Sep 11 14:59:23 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:23 LenovoL15 kernel: [drm] Fence fallback timer expired on ring sdma0
Sep 11 14:59:23 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:24 LenovoL15 kernel: watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [rcu_exp_gp_kthr:21]
Sep 11 14:59:24 LenovoL15 kernel: CPU#2 Utilization every 4s during lockup:
Sep 11 14:59:24 LenovoL15 kernel: #1: 101% system, 0% softirq, 0% hardirq, 0% idle
Sep 11 14:59:24 LenovoL15 kernel: #2: 100% system, 0% softirq, 0% hardirq, 0% idle
Sep 11 14:59:24 LenovoL15 kernel: #3: 100% system, 0% softirq, 1% hardirq, 0% idle
Sep 11 14:59:24 LenovoL15 kernel: #4: 100% system, 0% softirq, 0% hardirq, 0% idle
Sep 11 14:59:24 LenovoL15 kernel: #5: 100% system, 0% softirq, 0% hardirq, 0% idle
Sep 11 14:59:24 LenovoL15 kernel: Modules linked in: cmac ccm uvcvideo videobuf2_vmalloc uvc videobuf2_memops videobuf2_v4l2 videodev videobuf2_common joydev mousedev mc amd_atl intel_rapl_msr intel_rapl_common snd_sof_amd_acp63 nls_iso8859_1 snd_sof_amd_vangogh vfat snd_sof_amd_rembrandt fat snd_sof_amd_renoir snd_sof_amd_acp snd_sof_pci snd_sof_xtensa_dsp snd_sof kvm_amd snd_sof_utils snd_pci_ps snd_amd_sdw_acpi kvm soundwire_amd soundwire_generic_allocation iwlmvm snd_ctl_led soundwire_bus crct10dif_pclmul snd_hda_codec_realtek snd_soc_core crc32_pclmul snd_hda_codec_generic mac80211 polyval_clmulni snd_compress snd_hda_scodec_component polyval_generic ac97_bus snd_hda_codec_hdmi gf128mul snd_pcm_dmaengine ghash_clmulni_intel sha512_ssse3 snd_hda_intel snd_rpl_pci_acp6x libarc4 sha256_ssse3 snd_intel_dspcfg ptp snd_acp_pci sha1_ssse3 snd_intel_sdw_acpi pps_core aesni_intel snd_acp_legacy_common crypto_simd snd_pci_acp6x snd_hda_codec r8169 btrfs iwlwifi cryptd snd_pci_acp5x snd_hda_core think_lmi snd_rn_pci_acp3x ucsi_acpi rapl
Sep 11 14:59:24 LenovoL15 kernel: firmware_attributes_class realtek psmouse snd_hwdep snd_acp_config blake2b_generic wmi_bmof typec_ucsi mdio_devres cfg80211 snd_soc_acpi sp5100_tco xor snd_pcm ipmi_devintf typec acpi_cpufreq raid6_pq snd_timer snd_pci_acp3x ccp zenpower(OE) ipmi_msghandler libphy i2c_piix4 roles libcrc32c i2c_scmi mac_hid sg crypto_user acpi_call(OE) loop dm_mod nfnetlink ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 serio_raw atkbd thinkpad_acpi libps2 rtsx_pci_sdmmc mmc_core sparse_keymap vivaldi_fmap platform_profile nvme snd nvme_core crc32c_intel xhci_pci i8042 soundcore rtsx_pci xhci_pci_renesas nvme_auth serio rfkill amdgpu video wmi amdxcp i2c_algo_bit drm_ttm_helper ttm drm_exec gpu_sched drm_suballoc_helper drm_buddy drm_display_helper cec
Sep 11 14:59:24 LenovoL15 kernel: CPU: 2 PID: 21 Comm: rcu_exp_gp_kthr Kdump: loaded Tainted: G OE 6.10.4-arch2-1 #1 517ed45cc9c4492ee5d5bfc2d2fe6ef1f2e7a8eb
Sep 11 14:59:24 LenovoL15 kernel: Hardware name: LENOVO 20U7000JAU/20U7000JAU, BIOS R19ET49W (1.33 ) 03/27/2024
Sep 11 14:59:24 LenovoL15 kernel: RIP: 0010:smp_call_function_single+0xe9/0x140
Sep 11 14:59:24 LenovoL15 kernel: Code: 38 65 48 2b 14 25 28 00 00 00 75 66 c9 e9 ef f5 ca 00 65 48 8b 05 ef 27 c6 76 48 8d b0 40 7a 03 00 8b 46 08 a8 01 74 09 f3 90 <8b> 46 08 a8 01 75 f7 83 4e 08 01 4c 89 46 10 48 89 56 18 e8 bf fd
Sep 11 14:59:24 LenovoL15 kernel: RSP: 0018:ffffa4528019fde0 EFLAGS: 00000202
Sep 11 14:59:24 LenovoL15 kernel: RAX: 0000000000000001 RBX: ffff96410fdb7680 RCX: 0000000000000000
Sep 11 14:59:24 LenovoL15 kernel: RDX: 0000000000000000 RSI: ffff96410fb37a40 RDI: 0000000000000007
Sep 11 14:59:24 LenovoL15 kernel: RBP: ffffa4528019fe28 R08: ffffffff89374890 R09: 00000000000310a0
Sep 11 14:59:24 LenovoL15 kernel: R10: 0000000000000002 R11: 0000000000000292 R12: 0000000000000080
Sep 11 14:59:24 LenovoL15 kernel: R13: 00000000000310a0 R14: 0000000000000007 R15: 000000000000f77e
Sep 11 14:59:24 LenovoL15 kernel: FS: 0000000000000000(0000) GS:ffff96410fb00000(0000) knlGS:0000000000000000
Sep 11 14:59:24 LenovoL15 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 11 14:59:24 LenovoL15 kernel: CR2: 000060d218deb000 CR3: 000000031aa20000 CR4: 0000000000350ef0
Sep 11 14:59:24 LenovoL15 kernel: Call Trace:
Sep 11 14:59:24 LenovoL15 kernel: <IRQ>
Sep 11 14:59:24 LenovoL15 kernel: ? watchdog_timer_fn.cold+0x19c/0x219
Sep 11 14:59:24 LenovoL15 kernel: ? __pfx_watchdog_timer_fn+0x10/0x10
Sep 11 14:59:24 LenovoL15 kernel: ? __hrtimer_run_queues+0x132/0x2a0
Sep 11 14:59:24 LenovoL15 kernel: ? hrtimer_interrupt+0xfa/0x210
Sep 11 14:59:24 LenovoL15 kernel: ? __sysvec_apic_timer_interrupt+0x55/0x100
Sep 11 14:59:24 LenovoL15 kernel: ? sysvec_apic_timer_interrupt+0x6c/0x90
Sep 11 14:59:24 LenovoL15 kernel: </IRQ>
Sep 11 14:59:24 LenovoL15 kernel: <TASK>
Sep 11 14:59:24 LenovoL15 kernel: ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
Sep 11 14:59:24 LenovoL15 kernel: ? __pfx_rcu_exp_handler+0x10/0x10
Sep 11 14:59:24 LenovoL15 kernel: ? smp_call_function_single+0xe9/0x140
Sep 11 14:59:24 LenovoL15 kernel: __sync_rcu_exp_select_node_cpus+0x23a/0x3a0
Sep 11 14:59:24 LenovoL15 kernel: ? __pfx_wait_rcu_exp_gp+0x10/0x10
Sep 11 14:59:24 LenovoL15 kernel: sync_rcu_exp_select_cpus+0x173/0x2d0
Sep 11 14:59:24 LenovoL15 kernel: wait_rcu_exp_gp+0x13/0x20
Sep 11 14:59:24 LenovoL15 kernel: kthread_worker_fn+0xa6/0x220
Sep 11 14:59:24 LenovoL15 kernel: ? __pfx_kthread_worker_fn+0x10/0x10
Sep 11 14:59:24 LenovoL15 kernel: kthread+0xd2/0x100
Sep 11 14:59:24 LenovoL15 kernel: ? __pfx_kthread+0x10/0x10
Sep 11 14:59:24 LenovoL15 kernel: ret_from_fork+0x34/0x50
Sep 11 14:59:24 LenovoL15 kernel: ? __pfx_kthread+0x10/0x10
Sep 11 14:59:24 LenovoL15 kernel: ret_from_fork_asm+0x1a/0x30
Sep 11 14:59:24 LenovoL15 kernel: </TASK>
Sep 11 14:59:24 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:24 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:25 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:26 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:26 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:27 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:28 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:28 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:29 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:30 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:31 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:31 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:32 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:32 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:33 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:34 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:35 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:35 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:36 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:37 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:37 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:38 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:39 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:40 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:40 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:41 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:42 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:42 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:43 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:44 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:44 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:45 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:46 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:46 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:47 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:48 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:48 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:49 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:50 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:51 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:51 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:52 LenovoL15 kernel: watchdog: BUG: soft lockup - CPU#2 stuck for 48s! [rcu_exp_gp_kthr:21]
Sep 11 14:59:52 LenovoL15 kernel: CPU#2 Utilization every 4s during lockup:
Sep 11 14:59:52 LenovoL15 kernel: #1: 100% system, 0% softirq, 0% hardirq, 0% idle
Sep 11 14:59:52 LenovoL15 kernel: #2: 100% system, 0% softirq, 0% hardirq, 0% idle
Sep 11 14:59:52 LenovoL15 kernel: #3: 100% system, 0% softirq, 0% hardirq, 0% idle
Sep 11 14:59:52 LenovoL15 kernel: #4: 100% system, 0% softirq, 1% hardirq, 0% idle
Sep 11 14:59:52 LenovoL15 kernel: #5: 100% system, 0% softirq, 0% hardirq, 0% idle
Sep 11 14:59:52 LenovoL15 kernel: Modules linked in: cmac ccm uvcvideo videobuf2_vmalloc uvc videobuf2_memops videobuf2_v4l2 videodev videobuf2_common joydev mousedev mc amd_atl intel_rapl_msr intel_rapl_common snd_sof_amd_acp63 nls_iso8859_1 snd_sof_amd_vangogh vfat snd_sof_amd_rembrandt fat snd_sof_amd_renoir snd_sof_amd_acp snd_sof_pci snd_sof_xtensa_dsp snd_sof kvm_amd snd_sof_utils snd_pci_ps snd_amd_sdw_acpi kvm soundwire_amd soundwire_generic_allocation iwlmvm snd_ctl_led soundwire_bus crct10dif_pclmul snd_hda_codec_realtek snd_soc_core crc32_pclmul snd_hda_codec_generic mac80211 polyval_clmulni snd_compress snd_hda_scodec_component polyval_generic ac97_bus snd_hda_codec_hdmi gf128mul snd_pcm_dmaengine ghash_clmulni_intel sha512_ssse3 snd_hda_intel snd_rpl_pci_acp6x libarc4 sha256_ssse3 snd_intel_dspcfg ptp snd_acp_pci sha1_ssse3 snd_intel_sdw_acpi pps_core aesni_intel snd_acp_legacy_common crypto_simd snd_pci_acp6x snd_hda_codec r8169 btrfs iwlwifi cryptd snd_pci_acp5x snd_hda_core think_lmi snd_rn_pci_acp3x ucsi_acpi rapl
Sep 11 14:59:52 LenovoL15 kernel: firmware_attributes_class realtek psmouse snd_hwdep snd_acp_config blake2b_generic wmi_bmof typec_ucsi mdio_devres cfg80211 snd_soc_acpi sp5100_tco xor snd_pcm ipmi_devintf typec acpi_cpufreq raid6_pq snd_timer snd_pci_acp3x ccp zenpower(OE) ipmi_msghandler libphy i2c_piix4 roles libcrc32c i2c_scmi mac_hid sg crypto_user acpi_call(OE) loop dm_mod nfnetlink ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 serio_raw atkbd thinkpad_acpi libps2 rtsx_pci_sdmmc mmc_core sparse_keymap vivaldi_fmap platform_profile nvme snd nvme_core crc32c_intel xhci_pci i8042 soundcore rtsx_pci xhci_pci_renesas nvme_auth serio rfkill amdgpu video wmi amdxcp i2c_algo_bit drm_ttm_helper ttm drm_exec gpu_sched drm_suballoc_helper drm_buddy drm_display_helper cec
Sep 11 14:59:52 LenovoL15 kernel: CPU: 2 PID: 21 Comm: rcu_exp_gp_kthr Kdump: loaded Tainted: G OEL 6.10.4-arch2-1 #1 517ed45cc9c4492ee5d5bfc2d2fe6ef1f2e7a8eb
Sep 11 14:59:52 LenovoL15 kernel: Hardware name: LENOVO 20U7000JAU/20U7000JAU, BIOS R19ET49W (1.33 ) 03/27/2024
Sep 11 14:59:52 LenovoL15 kernel: RIP: 0010:smp_call_function_single+0xe9/0x140
Sep 11 14:59:52 LenovoL15 kernel: Code: 38 65 48 2b 14 25 28 00 00 00 75 66 c9 e9 ef f5 ca 00 65 48 8b 05 ef 27 c6 76 48 8d b0 40 7a 03 00 8b 46 08 a8 01 74 09 f3 90 <8b> 46 08 a8 01 75 f7 83 4e 08 01 4c 89 46 10 48 89 56 18 e8 bf fd
Sep 11 14:59:52 LenovoL15 kernel: RSP: 0018:ffffa4528019fde0 EFLAGS: 00000202
Sep 11 14:59:52 LenovoL15 kernel: RAX: 0000000000000001 RBX: ffff96410fdb7680 RCX: 0000000000000000
Sep 11 14:59:52 LenovoL15 kernel: RDX: 0000000000000000 RSI: ffff96410fb37a40 RDI: 0000000000000007
Sep 11 14:59:52 LenovoL15 kernel: RBP: ffffa4528019fe28 R08: ffffffff89374890 R09: 00000000000310a0
Sep 11 14:59:52 LenovoL15 kernel: R10: 0000000000000002 R11: 0000000000000292 R12: 0000000000000080
Sep 11 14:59:52 LenovoL15 kernel: R13: 00000000000310a0 R14: 0000000000000007 R15: 000000000000f77e
Sep 11 14:59:52 LenovoL15 kernel: FS: 0000000000000000(0000) GS:ffff96410fb00000(0000) knlGS:0000000000000000
Sep 11 14:59:52 LenovoL15 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 11 14:59:52 LenovoL15 kernel: CR2: 000060d218deb000 CR3: 000000031aa20000 CR4: 0000000000350ef0
Sep 11 14:59:52 LenovoL15 kernel: Call Trace:
Sep 11 14:59:52 LenovoL15 kernel: <IRQ>
Sep 11 14:59:52 LenovoL15 kernel: ? watchdog_timer_fn.cold+0x19c/0x219
Sep 11 14:59:52 LenovoL15 kernel: ? __pfx_watchdog_timer_fn+0x10/0x10
Sep 11 14:59:52 LenovoL15 kernel: ? __hrtimer_run_queues+0x132/0x2a0
Sep 11 14:59:52 LenovoL15 kernel: ? hrtimer_interrupt+0xfa/0x210
Sep 11 14:59:52 LenovoL15 kernel: ? __sysvec_apic_timer_interrupt+0x55/0x100
Sep 11 14:59:52 LenovoL15 kernel: ? sysvec_apic_timer_interrupt+0x6c/0x90
Sep 11 14:59:52 LenovoL15 kernel: </IRQ>
Sep 11 14:59:52 LenovoL15 kernel: <TASK>
Sep 11 14:59:52 LenovoL15 kernel: ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
Sep 11 14:59:52 LenovoL15 kernel: ? __pfx_rcu_exp_handler+0x10/0x10
Sep 11 14:59:52 LenovoL15 kernel: ? smp_call_function_single+0xe9/0x140
Sep 11 14:59:52 LenovoL15 kernel: __sync_rcu_exp_select_node_cpus+0x23a/0x3a0
Sep 11 14:59:52 LenovoL15 kernel: ? __pfx_wait_rcu_exp_gp+0x10/0x10
Sep 11 14:59:52 LenovoL15 kernel: sync_rcu_exp_select_cpus+0x173/0x2d0
Sep 11 14:59:52 LenovoL15 kernel: wait_rcu_exp_gp+0x13/0x20
Sep 11 14:59:52 LenovoL15 kernel: kthread_worker_fn+0xa6/0x220
Sep 11 14:59:52 LenovoL15 kernel: ? __pfx_kthread_worker_fn+0x10/0x10
Sep 11 14:59:52 LenovoL15 kernel: kthread+0xd2/0x100
Sep 11 14:59:52 LenovoL15 kernel: ? __pfx_kthread+0x10/0x10
Sep 11 14:59:52 LenovoL15 kernel: ret_from_fork+0x34/0x50
Sep 11 14:59:52 LenovoL15 kernel: ? __pfx_kthread+0x10/0x10
Sep 11 14:59:52 LenovoL15 kernel: ret_from_fork_asm+0x1a/0x30
Sep 11 14:59:52 LenovoL15 kernel: </TASK>
Sep 11 14:59:52 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:53 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:53 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:54 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:55 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:55 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:56 LenovoL15 kernel: rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
Sep 11 14:59:56 LenovoL15 kernel: rcu: 0-...0: (0 ticks this GP) idle=8164/1/0x400000000000005c softirq=695763/695763 fqs=7684
Sep 11 14:59:56 LenovoL15 kernel: rcu: (detected by 6, t=18002 jiffies, g=1913173, q=216256 ncpus=16)
Sep 11 14:59:56 LenovoL15 kernel: Sending NMI from CPU 6 to CPUs 0:
Sep 11 14:59:56 LenovoL15 kernel: NMI backtrace for cpu 0
Sep 11 14:59:56 LenovoL15 kernel: CPU: 0 PID: 0 Comm: swapper/0 Kdump: loaded Tainted: G OEL 6.10.4-arch2-1 #1 517ed45cc9c4492ee5d5bfc2d2fe6ef1f2e7a8eb
Sep 11 14:59:56 LenovoL15 kernel: Hardware name: LENOVO 20U7000JAU/20U7000JAU, BIOS R19ET49W (1.33 ) 03/27/2024
Sep 11 14:59:56 LenovoL15 kernel: RIP: 0010:asm_exc_general_protection+0x4/0x30
Sep 11 14:59:56 LenovoL15 kernel: Code: 0a 00 00 48 89 c4 48 89 e7 48 8b 74 24 78 48 c7 44 24 78 ff ff ff ff e8 6a 95 e4 ff e9 f5 0b 00 00 0f 1f 44 00 00 f3 0f 23 fa <0f> 01 ca fc e8 a3 23 00 00 48 89 c4 48 89 23 48 8b 74 24 78 48 c7
Sep 11 14:59:56 LenovoL15 kernel: RSP: 0018:fffffe6cedc19ad0 EFLAGS: 00000016
Sep 11 14:59:56 LenovoL15 kernel: RAX: fffffe6cedc19b28 RBX: fffffe6cedc19b28 RCX: ffffffff8a201968
Sep 11 14:59:56 LenovoL15 kernel: RDX: 0000000000000000 RSI: 0000000000000000 RDI: fffffe6cedc19b28
Sep 11 14:59:56 LenovoL15 kernel: RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
Sep 11 14:59:56 LenovoL15 kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
Sep 11 14:59:56 LenovoL15 kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
Sep 11 14:59:56 LenovoL15 kernel: FS: 0000000000000000(0000) GS:ffff96410fa00000(0000) knlGS:0000000000000000
Sep 11 14:59:56 LenovoL15 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 11 14:59:56 LenovoL15 kernel: CR2: fffffe6cedc17ff8 CR3: 000000031aa20000 CR4: 0000000000350ef0
Sep 11 14:59:56 LenovoL15 kernel: Call Trace:
Sep 11 14:59:56 LenovoL15 kernel: <#DF>
Sep 11 14:59:56 LenovoL15 kernel: RIP: 0010:poke_int3_handler+0x0/0x1e0
Sep 11 14:59:56 LenovoL15 kernel: Code: 90 90 90 90 0d 90 90 90 90 90 f3 0f b9 fa 0f 0b 90 66 0f 1f 61 00 00 00 00 00 90 90 0e 90 90 90 90 90 90 90 f1 90 90 90 90 90 <66> 0f 61 00 48 89 fa f6 87 88 61 00 00 03 75 54 8b 05 91 a6 c5 01
Sep 11 14:59:56 LenovoL15 kernel: RSP: 0018:fffffe6cedc19b08 EFLAGS: 00010016
Sep 11 14:59:56 LenovoL15 kernel: ? exc_int3+0xe/0x130
Sep 11 14:59:56 LenovoL15 kernel: ? asm_exc_int3+0x39/0x40
Sep 11 14:59:56 LenovoL15 kernel: ? early_xen_iret_patch+0xc/0xc
Sep 11 14:59:56 LenovoL15 kernel: ? __entry_text_end+0x12b6/0x101e49
Sep 11 14:59:56 LenovoL15 kernel: ? __entry_text_end+0x12b6/0x101e49
Sep 11 14:59:56 LenovoL15 kernel: ? asm_exc_general_protection+0xd/0x30
Sep 11 14:59:56 LenovoL15 kernel: ? __pfx_poke_int3_handler+0x10/0x10
Sep 11 14:59:56 LenovoL15 kernel: ? exc_int3+0xe/0x130
Sep 11 14:59:56 LenovoL15 kernel: ? asm_exc_int3+0x39/0x40
Sep 11 14:59:56 LenovoL15 kernel: ? early_xen_iret_patch+0xc/0xc
Sep 11 14:59:56 LenovoL15 kernel: ? __entry_text_end+0x12b6/0x101e49
Sep 11 14:59:56 LenovoL15 kernel: ? __entry_text_end+0x12b6/0x101e49
Sep 11 14:59:56 LenovoL15 kernel: ? asm_exc_general_protection+0xd/0x30
Sep 11 14:59:56 LenovoL15 kernel: ? __pfx_poke_int3_handler+0x10/0x10
Sep 11 14:59:56 LenovoL15 kernel: ? exc_int3+0xe/0x130
Sep 11 14:59:56 LenovoL15 kernel: ? asm_exc_int3+0x39/0x40
Sep 11 14:59:56 LenovoL15 kernel: ? early_xen_iret_patch+0xc/0xc
Sep 11 14:59:56 LenovoL15 kernel: ? __entry_text_end+0x12b6/0x101e49
Sep 11 14:59:56 LenovoL15 kernel: ? __entry_text_end+0x12b6/0x101e49
Sep 11 14:59:56 LenovoL15 kernel: ? asm_exc_general_protection+0xd/0x30
Sep 11 14:59:56 LenovoL15 kernel: ? __pfx_poke_int3_handler+0x10/0x10
Sep 11 14:59:56 LenovoL15 kernel: ? exc_int3+0xe/0x130
Sep 11 14:59:56 LenovoL15 kernel: ? asm_exc_int3+0x39/0x40
Sep 11 14:59:56 LenovoL15 kernel: ? early_xen_iret_patch+0xc/0xc
Sep 11 14:59:56 LenovoL15 kernel: ? asm_exc_int3+0x2e/0x40
Sep 11 14:59:56 LenovoL15 kernel: ? __entry_text_end+0x3e66/0x101e49
Sep 11 14:59:56 LenovoL15 kernel: ? __entry_text_end+0x3e66/0x101e49
Sep 11 14:59:56 LenovoL15 kernel: ? asm_exc_double_fault+0xd/0x30
Sep 11 14:59:56 LenovoL15 kernel: ? error_entry+0xd/0x140
Sep 11 14:59:56 LenovoL15 kernel: </#DF>
Sep 11 14:59:56 LenovoL15 kernel: WARNING: stack recursion on stack type 5
Sep 11 14:59:56 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:57 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:57 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:58 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:59 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 14:59:59 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 15:00:00 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 15:00:01 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 15:00:02 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 15:00:02 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 15:00:03 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 15:00:04 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 15:00:04 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 15:00:05 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 15:00:06 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 15:00:06 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 15:00:07 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 15:00:08 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 15:00:08 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 15:00:09 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 15:00:10 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 15:00:11 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 15:00:11 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 15:00:11 LenovoL15 (udev-worker)[522579]: BAT0: Spawned process '/usr/bin/tlp auto' [522580] is taking longer than 59s to complete.
Sep 11 15:00:11 LenovoL15 systemd-udevd[439]: BAT0: Worker [522579] processing SEQNUM=5148 is taking a long time
Sep 11 15:00:12 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 15:00:13 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 15:00:13 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 15:00:14 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 15:00:15 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 15:00:15 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 15:00:16 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 15:00:17 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 15:00:17 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 15:00:18 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 15:00:19 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 15:00:19 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 15:00:20 LenovoL15 kernel: watchdog: BUG: soft lockup - CPU#2 stuck for 74s! [rcu_exp_gp_kthr:21]
Sep 11 15:00:20 LenovoL15 kernel: CPU#2 Utilization every 4s during lockup:
Sep 11 15:00:20 LenovoL15 kernel: #1: 100% system, 0% softirq, 1% hardirq, 0% idle
Sep 11 15:00:20 LenovoL15 kernel: #2: 100% system, 1% softirq, 0% hardirq, 0% idle
Sep 11 15:00:20 LenovoL15 kernel: #3: 100% system, 0% softirq, 0% hardirq, 0% idle
Sep 11 15:00:20 LenovoL15 kernel: #4: 100% system, 0% softirq, 0% hardirq, 0% idle
Sep 11 15:00:20 LenovoL15 kernel: #5: 100% system, 0% softirq, 1% hardirq, 0% idle
Sep 11 15:00:20 LenovoL15 kernel: Modules linked in: cmac ccm uvcvideo videobuf2_vmalloc uvc videobuf2_memops videobuf2_v4l2 videodev videobuf2_common joydev mousedev mc amd_atl intel_rapl_msr intel_rapl_common snd_sof_amd_acp63 nls_iso8859_1 snd_sof_amd_vangogh vfat snd_sof_amd_rembrandt fat snd_sof_amd_renoir snd_sof_amd_acp snd_sof_pci snd_sof_xtensa_dsp snd_sof kvm_amd snd_sof_utils snd_pci_ps snd_amd_sdw_acpi kvm soundwire_amd soundwire_generic_allocation iwlmvm snd_ctl_led soundwire_bus crct10dif_pclmul snd_hda_codec_realtek snd_soc_core crc32_pclmul snd_hda_codec_generic mac80211 polyval_clmulni snd_compress snd_hda_scodec_component polyval_generic ac97_bus snd_hda_codec_hdmi gf128mul snd_pcm_dmaengine ghash_clmulni_intel sha512_ssse3 snd_hda_intel snd_rpl_pci_acp6x libarc4 sha256_ssse3 snd_intel_dspcfg ptp snd_acp_pci sha1_ssse3 snd_intel_sdw_acpi pps_core aesni_intel snd_acp_legacy_common crypto_simd snd_pci_acp6x snd_hda_codec r8169 btrfs iwlwifi cryptd snd_pci_acp5x snd_hda_core think_lmi snd_rn_pci_acp3x ucsi_acpi rapl
Sep 11 15:00:20 LenovoL15 kernel: firmware_attributes_class realtek psmouse snd_hwdep snd_acp_config blake2b_generic wmi_bmof typec_ucsi mdio_devres cfg80211 snd_soc_acpi sp5100_tco xor snd_pcm ipmi_devintf typec acpi_cpufreq raid6_pq snd_timer snd_pci_acp3x ccp zenpower(OE) ipmi_msghandler libphy i2c_piix4 roles libcrc32c i2c_scmi mac_hid sg crypto_user acpi_call(OE) loop dm_mod nfnetlink ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 serio_raw atkbd thinkpad_acpi libps2 rtsx_pci_sdmmc mmc_core sparse_keymap vivaldi_fmap platform_profile nvme snd nvme_core crc32c_intel xhci_pci i8042 soundcore rtsx_pci xhci_pci_renesas nvme_auth serio rfkill amdgpu video wmi amdxcp i2c_algo_bit drm_ttm_helper ttm drm_exec gpu_sched drm_suballoc_helper drm_buddy drm_display_helper cec
Sep 11 15:00:20 LenovoL15 kernel: CPU: 2 PID: 21 Comm: rcu_exp_gp_kthr Kdump: loaded Tainted: G OEL 6.10.4-arch2-1 #1 517ed45cc9c4492ee5d5bfc2d2fe6ef1f2e7a8eb
Sep 11 15:00:20 LenovoL15 kernel: Hardware name: LENOVO 20U7000JAU/20U7000JAU, BIOS R19ET49W (1.33 ) 03/27/2024
Sep 11 15:00:20 LenovoL15 kernel: RIP: 0010:smp_call_function_single+0xe9/0x140
Sep 11 15:00:20 LenovoL15 kernel: Code: 38 65 48 2b 14 25 28 00 00 00 75 66 c9 e9 ef f5 ca 00 65 48 8b 05 ef 27 c6 76 48 8d b0 40 7a 03 00 8b 46 08 a8 01 74 09 f3 90 <8b> 46 08 a8 01 75 f7 83 4e 08 01 4c 89 46 10 48 89 56 18 e8 bf fd
Sep 11 15:00:20 LenovoL15 kernel: RSP: 0018:ffffa4528019fde0 EFLAGS: 00000202
Sep 11 15:00:20 LenovoL15 kernel: RAX: 0000000000000001 RBX: ffff96410fdb7680 RCX: 0000000000000000
Sep 11 15:00:20 LenovoL15 kernel: RDX: 0000000000000000 RSI: ffff96410fb37a40 RDI: 0000000000000007
Sep 11 15:00:20 LenovoL15 kernel: RBP: ffffa4528019fe28 R08: ffffffff89374890 R09: 00000000000310a0
Sep 11 15:00:20 LenovoL15 kernel: R10: 0000000000000002 R11: 0000000000000292 R12: 0000000000000080
Sep 11 15:00:20 LenovoL15 kernel: R13: 00000000000310a0 R14: 0000000000000007 R15: 000000000000f77e
Sep 11 15:00:20 LenovoL15 kernel: FS: 0000000000000000(0000) GS:ffff96410fb00000(0000) knlGS:0000000000000000
Sep 11 15:00:20 LenovoL15 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Sep 11 15:00:20 LenovoL15 kernel: CR2: 000060d218deb000 CR3: 000000031aa20000 CR4: 0000000000350ef0
Sep 11 15:00:20 LenovoL15 kernel: Call Trace:
Sep 11 15:00:20 LenovoL15 kernel: <IRQ>
Sep 11 15:00:20 LenovoL15 kernel: ? watchdog_timer_fn.cold+0x19c/0x219
Sep 11 15:00:20 LenovoL15 kernel: ? __pfx_watchdog_timer_fn+0x10/0x10
Sep 11 15:00:20 LenovoL15 kernel: ? __hrtimer_run_queues+0x132/0x2a0
Sep 11 15:00:20 LenovoL15 kernel: ? hrtimer_interrupt+0xfa/0x210
Sep 11 15:00:20 LenovoL15 kernel: ? __sysvec_apic_timer_interrupt+0x55/0x100
Sep 11 15:00:20 LenovoL15 kernel: ? sysvec_apic_timer_interrupt+0x6c/0x90
Sep 11 15:00:20 LenovoL15 kernel: </IRQ>
Sep 11 15:00:20 LenovoL15 kernel: <TASK>
Sep 11 15:00:20 LenovoL15 kernel: ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
Sep 11 15:00:20 LenovoL15 kernel: ? __pfx_rcu_exp_handler+0x10/0x10
Sep 11 15:00:20 LenovoL15 kernel: ? smp_call_function_single+0xe9/0x140
Sep 11 15:00:20 LenovoL15 kernel: __sync_rcu_exp_select_node_cpus+0x23a/0x3a0
Sep 11 15:00:20 LenovoL15 kernel: ? __pfx_wait_rcu_exp_gp+0x10/0x10
Sep 11 15:00:20 LenovoL15 kernel: sync_rcu_exp_select_cpus+0x173/0x2d0
Sep 11 15:00:20 LenovoL15 kernel: wait_rcu_exp_gp+0x13/0x20
Sep 11 15:00:20 LenovoL15 kernel: kthread_worker_fn+0xa6/0x220
Sep 11 15:00:20 LenovoL15 kernel: ? __pfx_kthread_worker_fn+0x10/0x10
Sep 11 15:00:20 LenovoL15 kernel: kthread+0xd2/0x100
Sep 11 15:00:20 LenovoL15 kernel: ? __pfx_kthread+0x10/0x10
Sep 11 15:00:20 LenovoL15 kernel: ret_from_fork+0x34/0x50
Sep 11 15:00:20 LenovoL15 kernel: ? __pfx_kthread+0x10/0x10
Sep 11 15:00:20 LenovoL15 kernel: ret_from_fork_asm+0x1a/0x30
Sep 11 15:00:20 LenovoL15 kernel: </TASK>
Sep 11 15:00:20 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
Sep 11 15:00:21 LenovoL15 kernel: [drm] Fence fallback timer expired on ring gfx
... the logs repeated like that for a while until I killed the power.
Note that I truncated the logs: in the lead-up to every crash, the journal is entirely normal.
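Since most of the output above is the same "Fence fallback timer expired" line over and over, here is a rough Python sketch for collapsing runs of identical messages, which makes the rarer lines (the RCU stall, the soft lockup) easier to spot. It assumes journalctl's default short output format ("Sep 11 14:59:52 host source: message"). The names `split_line` and `collapse_repeats` are made up for this example and are not part of any existing tool.

```python
import re
from itertools import groupby

# Assumed format: "Sep 11 14:59:52 host source: message" (journalctl default).
# Lines that don't match are kept whole, with an empty timestamp.
LINE_RE = re.compile(r"^(\w{3} +\d+ \d\d:\d\d:\d\d) (\S+) (.*)$")


def split_line(line):
    """Return (timestamp, message) for one journal line."""
    text = line.rstrip("\n")
    m = LINE_RE.match(text)
    if m is None:
        return "", text
    return m.group(1), m.group(3)


def collapse_repeats(lines):
    """Collapse consecutive identical messages.

    Returns a list of (first_timestamp, last_timestamp, count, message).
    """
    parsed = [split_line(line) for line in lines]
    collapsed = []
    for message, run in groupby(parsed, key=lambda p: p[1]):
        run = list(run)
        collapsed.append((run[0][0], run[-1][0], len(run), message))
    return collapsed


def format_entry(entry):
    """Render one collapsed entry as a single line of text."""
    first, last, count, message = entry
    if count > 1:
        return f"{first} .. {last} [x{count}] {message}"
    return f"{first} {message}"
```

To use it, pass any iterable of log lines (for example `sys.stdin` with the output of `journalctl -k -b -1` piped in) to `collapse_repeats` and print each entry with `format_entry`. Only consecutive duplicates are merged, so the order of events is preserved.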
On the 17th of October, kdumpst saved the following dmesg log (only because I used SysRq):
[ 0.000000] [ T0] Linux version 6.11.3-arch1-1 (linux@archlinux) (gcc (GCC) 14.2.1 20240910, GNU ld (GNU Binutils) 2.43.0) #1 SMP PREEMPT_DYNAMIC Thu, 10 Oct 2024 20:11:06 +0000
[ 0.000000] [ T0] Command line: BOOT_IMAGE=/boot/vmlinuz-linux root=UUID=3de0ffac-d140-40b8-9ec8-277bce63f9e1 rw crashkernel=256M loglevel=3 systemd.restore_state=0 video.brightness_switch_enabled=N tsc=unstable tpm_tis.interrupts=0 retbleed=off amdgpu.ppfeaturemask=0xffffffff amdgpu.aspm=0 amdgpu.bapm=0 rcu_nocbs=0-15 idle=nomwait amdgpu.msi=0
...
[ 8130.769140] [ T3243] ------------[ cut here ]------------
[ 8130.769147] [ T3243] list_add corruption. next->prev should be prev (ffff9a332f1a2018), but was 00000000000000ac. (next=ffff9a332f1a2000).
[ 8130.769160] [ T3243] WARNING: CPU: 3 PID: 3243 at lib/list_debug.c:29 __list_add_valid_or_report+0x62/0xb0
[ 8130.769169] [ T3243] Modules linked in: cmac ccm joydev mousedev uvcvideo videobuf2_vmalloc uvc videobuf2_memops videobuf2_v4l2 videodev videobuf2_common mc intel_rapl_msr amd_atl intel_rapl_common snd_sof_amd_acp63 snd_sof_amd_vangogh snd_sof_amd_rembrandt snd_sof_amd_renoir snd_sof_amd_acp nls_iso8859_1 kvm_amd snd_sof_pci vfat snd_sof_xtensa_dsp snd_sof fat snd_sof_utils snd_pci_ps kvm snd_amd_sdw_acpi iwlmvm soundwire_amd soundwire_generic_allocation crct10dif_pclmul snd_ctl_led soundwire_bus crc32_pclmul mac80211 snd_hda_codec_realtek polyval_clmulni snd_soc_core snd_hda_codec_generic polyval_generic libarc4 ghash_clmulni_intel snd_hda_scodec_component snd_compress sha512_ssse3 snd_hda_codec_hdmi ptp ac97_bus sha256_ssse3 pps_core snd_pcm_dmaengine snd_hda_intel sha1_ssse3 snd_rpl_pci_acp6x snd_intel_dspcfg snd_acp_pci aesni_intel snd_intel_sdw_acpi snd_acp_legacy_common gf128mul snd_hda_codec crypto_simd snd_pci_acp6x iwlwifi cryptd snd_hda_core ee1004 snd_pci_acp5x rapl snd_hwdep btrfs psmouse r8169 ucsi_acpi think_lmi
[ 8130.769265] [ T3243] snd_rn_pci_acp3x sp5100_tco realtek cfg80211 snd_pcm snd_acp_config firmware_attributes_class typec_ucsi mdio_devres wmi_bmof snd_soc_acpi ipmi_devintf ccp snd_timer blake2b_generic i2c_piix4 typec acpi_cpufreq ipmi_msghandler snd_pci_acp3x xor libphy roles i2c_smbus raid6_pq zenpower(OE) libcrc32c i2c_scmi mac_hid sg crypto_user acpi_call(OE) dm_mod loop nfnetlink ip_tables x_tables ext4 crc32c_generic mbcache jbd2 serio_raw atkbd rtsx_pci_sdmmc libps2 mmc_core thinkpad_acpi vivaldi_fmap sparse_keymap nvme platform_profile crc32c_intel nvme_core snd xhci_pci rtsx_pci i8042 xhci_pci_renesas soundcore nvme_auth rfkill serio amdgpu video wmi amdxcp i2c_algo_bit drm_ttm_helper ttm drm_exec gpu_sched drm_suballoc_helper drm_buddy drm_display_helper cec crc16
[ 8130.769353] [ T3243] CPU: 3 UID: 1000 PID: 3243 Comm: nload Kdump: loaded Tainted: G OE 6.11.3-arch1-1 #1 1400000003000000474e55000681d53aa6c7b79b
[ 8130.769359] [ T3243] Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
[ 8130.769360] [ T3243] Hardware name: LENOVO 20U7000JAU/20U7000JAU, BIOS R19ET50W (1.34 ) 08/02/2024
[ 8130.769363] [ T3243] RIP: 0010:__list_add_valid_or_report+0x62/0xb0
[ 8130.769366] [ T3243] Code: e8 53 48 2a ff 0f 0b 31 c0 e9 ca e6 05 00 4c 8b 02 49 39 f0 74 18 eb 27 48 89 f1 48 c7 c7 a0 e5 2b 9c 48 89 c6 e8 2e 48 2a ff <0f> 0b eb d9 48 39 fa 74 22 49 39 f8 74 1d b0 01 e9 99 e6 05 00 48
[ 8130.769368] [ T3243] RSP: 0018:ffffbf7cc3337be0 EFLAGS: 00010086
[ 8130.769371] [ T3243] RAX: 0000000000000000 RBX: ffff9a2c486d4ea0 RCX: 0000000000000027
[ 8130.769373] [ T3243] RDX: ffff9a332f1a1a48 RSI: 0000000000000001 RDI: ffff9a332f1a1a40
[ 8130.769375] [ T3243] RBP: ffff9a2c486d4ea0 R08: 0000000000000000 R09: ffffbf7cc3337a60
[ 8130.769377] [ T3243] R10: ffffffff9cab3fe8 R11: 0000000000000003 R12: ffff9a332f1a2130
[ 8130.769378] [ T3243] R13: ffff9a332f180000 R14: 0000000000000286 R15: ffff9a332f1a2000
[ 8130.769381] [ T3243] FS: 00007726e0489b80(0000) GS:ffff9a332f180000(0000) knlGS:0000000000000000
[ 8130.769383] [ T3243] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8130.769385] [ T3243] CR2: 0000708c41f83000 CR3: 000000010e662000 CR4: 0000000000350ef0
[ 8130.769387] [ T3243] Call Trace:
[ 8130.769391] [ T3243] <TASK>
[ 8130.769392] [ T3243] ? __list_add_valid_or_report+0x62/0xb0
[ 8130.769396] [ T3243] ? __warn.cold+0x8e/0xe8
[ 8130.769399] [ T3243] ? __list_add_valid_or_report+0x62/0xb0
[ 8130.769407] [ T3243] ? report_bug+0xff/0x140
[ 8130.769412] [ T3243] ? handle_bug+0x3c/0x80
[ 8130.769416] [ T3243] ? exc_invalid_op+0x17/0x70
[ 8130.769419] [ T3243] ? asm_exc_invalid_op+0x1a/0x20
[ 8130.769425] [ T3243] ? __list_add_valid_or_report+0x62/0xb0
[ 8130.769428] [ T3243] kvfree_call_rcu.cold+0x14/0x26
[ 8130.769434] [ T3243] kernfs_unlink_open_file+0xfb/0x120
[ 8130.769439] [ T3243] kernfs_fop_release+0x3d/0xd0
[ 8130.769443] [ T3243] __fput+0xf1/0x2c0
[ 8130.769448] [ T3243] __x64_sys_close+0x3c/0x80
[ 8130.769453] [ T3243] do_syscall_64+0x82/0x190
[ 8130.769460] [ T3243] ? srso_return_thunk+0x5/0x5f
[ 8130.769463] [ T3243] ? srso_return_thunk+0x5/0x5f
[ 8130.769466] [ T3243] ? syscall_exit_to_user_mode+0x10/0x200
[ 8130.769469] [ T3243] ? srso_return_thunk+0x5/0x5f
[ 8130.769471] [ T3243] ? do_syscall_64+0x8e/0x190
[ 8130.769474] [ T3243] ? srso_return_thunk+0x5/0x5f
[ 8130.769476] [ T3243] ? __do_sys_newfstatat+0x4b/0x80
[ 8130.769484] [ T3243] ? srso_return_thunk+0x5/0x5f
[ 8130.769486] [ T3243] ? syscall_exit_to_user_mode+0x10/0x200
[ 8130.769489] [ T3243] ? srso_return_thunk+0x5/0x5f
[ 8130.769491] [ T3243] ? do_syscall_64+0x8e/0x190
[ 8130.769494] [ T3243] ? srso_return_thunk+0x5/0x5f
[ 8130.769496] [ T3243] ? do_syscall_64+0x8e/0x190
[ 8130.769499] [ T3243] ? srso_return_thunk+0x5/0x5f
[ 8130.769501] [ T3243] ? syscall_exit_to_user_mode+0x10/0x200
[ 8130.769504] [ T3243] ? srso_return_thunk+0x5/0x5f
[ 8130.769506] [ T3243] ? do_syscall_64+0x8e/0x190
[ 8130.769509] [ T3243] ? srso_return_thunk+0x5/0x5f
[ 8130.769511] [ T3243] ? do_syscall_64+0x8e/0x190
[ 8130.769514] [ T3243] ? srso_return_thunk+0x5/0x5f
[ 8130.769516] [ T3243] ? do_syscall_64+0x8e/0x190
[ 8130.769518] [ T3243] ? srso_return_thunk+0x5/0x5f
[ 8130.769521] [ T3243] ? do_syscall_64+0x8e/0x190
[ 8130.769523] [ T3243] ? srso_return_thunk+0x5/0x5f
[ 8130.769526] [ T3243] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 8130.769531] [ T3243] RIP: 0033:0x7726e011b83b
[ 8130.769567] [ T3243] Code: ff ff c3 0f 1f 40 00 48 8b 15 d1 a4 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 f3 0f 1e fa b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15 a1 a4 0d 00 f7 d8
[ 8130.769569] [ T3243] RSP: 002b:00007ffd60e59c28 EFLAGS: 00000297 ORIG_RAX: 0000000000000003
[ 8130.769572] [ T3243] RAX: ffffffffffffffda RBX: 00006485e7c4ccb0 RCX: 00007726e011b83b
[ 8130.769574] [ T3243] RDX: 00007726e01f4ea0 RSI: 00006485e7ca1040 RDI: 0000000000000003
[ 8130.769575] [ T3243] RBP: 00007ffd60e59c50 R08: 000000000001efb0 R09: 00000000ffffffff
[ 8130.769577] [ T3243] R10: 00007726e04838c0 R11: 0000000000000297 R12: 0000000000000000
[ 8130.769579] [ T3243] R13: 00007726e01f4ff0 R14: 00007ffd60e59d40 R15: 00007726e047ac98
[ 8130.769585] [ T3243] </TASK>
[ 8130.769586] [ T3243] ---[ end trace 0000000000000000 ]---
[ 8134.767567] [ T320231] ------------[ cut here ]------------
[ 8134.767570] [ T320231] list_add corruption. next->prev should be prev (ffff9a332f3a2018), but was 00000000000000ac. (next=ffff9a332f3a2000).
[ 8134.767581] [ T320231] WARNING: CPU: 7 PID: 320231 at lib/list_debug.c:29 __list_add_valid_or_report+0x62/0xb0
[ 8134.767586] [ T320231] Modules linked in: cmac ccm joydev mousedev uvcvideo videobuf2_vmalloc uvc videobuf2_memops videobuf2_v4l2 videodev videobuf2_common mc intel_rapl_msr amd_atl intel_rapl_common snd_sof_amd_acp63 snd_sof_amd_vangogh snd_sof_amd_rembrandt snd_sof_amd_renoir snd_sof_amd_acp nls_iso8859_1 kvm_amd snd_sof_pci vfat snd_sof_xtensa_dsp snd_sof fat snd_sof_utils snd_pci_ps kvm snd_amd_sdw_acpi iwlmvm soundwire_amd soundwire_generic_allocation crct10dif_pclmul snd_ctl_led soundwire_bus crc32_pclmul mac80211 snd_hda_codec_realtek polyval_clmulni snd_soc_core snd_hda_codec_generic polyval_generic libarc4 ghash_clmulni_intel snd_hda_scodec_component snd_compress sha512_ssse3 snd_hda_codec_hdmi ptp ac97_bus sha256_ssse3 pps_core snd_pcm_dmaengine snd_hda_intel sha1_ssse3 snd_rpl_pci_acp6x snd_intel_dspcfg snd_acp_pci aesni_intel snd_intel_sdw_acpi snd_acp_legacy_common gf128mul snd_hda_codec crypto_simd snd_pci_acp6x iwlwifi cryptd snd_hda_core ee1004 snd_pci_acp5x rapl snd_hwdep btrfs psmouse r8169 ucsi_acpi think_lmi
[ 8134.767664] [ T320231] snd_rn_pci_acp3x sp5100_tco realtek cfg80211 snd_pcm snd_acp_config firmware_attributes_class typec_ucsi mdio_devres wmi_bmof snd_soc_acpi ipmi_devintf ccp snd_timer blake2b_generic i2c_piix4 typec acpi_cpufreq ipmi_msghandler snd_pci_acp3x xor libphy roles i2c_smbus raid6_pq zenpower(OE) libcrc32c i2c_scmi mac_hid sg crypto_user acpi_call(OE) dm_mod loop nfnetlink ip_tables x_tables ext4 crc32c_generic mbcache jbd2 serio_raw atkbd rtsx_pci_sdmmc libps2 mmc_core thinkpad_acpi vivaldi_fmap sparse_keymap nvme platform_profile crc32c_intel nvme_core snd xhci_pci rtsx_pci i8042 xhci_pci_renesas soundcore nvme_auth rfkill serio amdgpu video wmi amdxcp i2c_algo_bit drm_ttm_helper ttm drm_exec gpu_sched drm_suballoc_helper drm_buddy drm_display_helper cec crc16
[ 8134.767738] [ T320231] CPU: 7 UID: 0 PID: 320231 Comm: kworker/u68:3 Kdump: loaded Tainted: G W OE 6.11.3-arch1-1 #1 1400000003000000474e55000681d53aa6c7b79b
[ 8134.767743] [ T320231] Tainted: [W]=WARN, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
[ 8134.767745] [ T320231] Hardware name: LENOVO 20U7000JAU/20U7000JAU, BIOS R19ET50W (1.34 ) 08/02/2024
[ 8134.767747] [ T320231] Workqueue: ttm ttm_bo_delayed_delete [ttm]
[ 8134.767758] [ T320231] RIP: 0010:__list_add_valid_or_report+0x62/0xb0
[ 8134.767761] [ T320231] Code: e8 53 48 2a ff 0f 0b 31 c0 e9 ca e6 05 00 4c 8b 02 49 39 f0 74 18 eb 27 48 89 f1 48 c7 c7 a0 e5 2b 9c 48 89 c6 e8 2e 48 2a ff <0f> 0b eb d9 48 39 fa 74 22 49 39 f8 74 1d b0 01 e9 99 e6 05 00 48
[ 8134.767763] [ T320231] RSP: 0018:ffffbf7cd65dfdf8 EFLAGS: 00010086
[ 8134.767765] [ T320231] RAX: 0000000000000000 RBX: ffff9a2dcdfd2740 RCX: 0000000000000027
[ 8134.767767] [ T320231] RDX: ffff9a332f3a1a48 RSI: 0000000000000001 RDI: ffff9a332f3a1a40
[ 8134.767769] [ T320231] RBP: ffff9a2dcdfd2740 R08: 0000000000000000 R09: ffffbf7cd65dfc78
[ 8134.767771] [ T320231] R10: ffffffff9cab3fe8 R11: 0000000000000003 R12: ffff9a332f3a2130
[ 8134.767772] [ T320231] R13: ffff9a332f380000 R14: 0000000000000282 R15: ffff9a332f3a2000
[ 8134.767774] [ T320231] FS: 0000000000000000(0000) GS:ffff9a332f380000(0000) knlGS:0000000000000000
[ 8134.767776] [ T320231] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8134.767778] [ T320231] CR2: 000072600aec7fb0 CR3: 0000000032222000 CR4: 0000000000350ef0
[ 8134.767781] [ T320231] Call Trace:
[ 8134.767783] [ T320231] <TASK>
[ 8134.767784] [ T320231] ? __list_add_valid_or_report+0x62/0xb0
[ 8134.767787] [ T320231] ? __warn.cold+0x8e/0xe8
[ 8134.767789] [ T320231] ? __list_add_valid_or_report+0x62/0xb0
[ 8134.767793] [ T320231] ? report_bug+0xff/0x140
[ 8134.767796] [ T320231] ? console_unlock+0x84/0x130
[ 8134.767802] [ T320231] ? handle_bug+0x3c/0x80
[ 8134.767805] [ T320231] ? exc_invalid_op+0x17/0x70
[ 8134.767808] [ T320231] ? asm_exc_invalid_op+0x1a/0x20
[ 8134.767813] [ T320231] ? __list_add_valid_or_report+0x62/0xb0
[ 8134.767817] [ T320231] ? __list_add_valid_or_report+0x62/0xb0
[ 8134.767819] [ T320231] kvfree_call_rcu.cold+0x14/0x26
[ 8134.767824] [ T320231] ttm_transfered_destroy+0x19/0x30 [ttm 1400000003000000474e5500447087a458167b91]
[ 8134.767832] [ T320231] process_one_work+0x17e/0x330
[ 8134.767838] [ T320231] worker_thread+0x2ce/0x3f0
[ 8134.767842] [ T320231] ? __pfx_worker_thread+0x10/0x10
[ 8134.767845] [ T320231] kthread+0xd2/0x100
[ 8134.767850] [ T320231] ? __pfx_kthread+0x10/0x10
[ 8134.767854] [ T320231] ret_from_fork+0x34/0x50
[ 8134.767858] [ T320231] ? __pfx_kthread+0x10/0x10
[ 8134.767861] [ T320231] ret_from_fork_asm+0x1a/0x30
[ 8134.767869] [ T320231] </TASK>
[ 8134.767870] [ T320231] ---[ end trace 0000000000000000 ]---
[ 8135.479560] [ T416] iwlwifi 0000:03:00.0: Error sending SCAN_CFG_CMD: time out after 2000ms.
[ 8135.479578] [ T416] iwlwifi 0000:03:00.0: Current CMD queue read_ptr 189 write_ptr 190
[ 8135.481237] [ T416] iwlwifi 0000:03:00.0: Start IWL Error Log Dump:
[ 8135.481243] [ T416] iwlwifi 0000:03:00.0: Transport status: 0x0000004A, valid: 6
[ 8135.481251] [ T416] iwlwifi 0000:03:00.0: Loaded firmware version: 77.85be44d3.0 cc-a0-77.ucode
[ 8135.481259] [ T416] iwlwifi 0000:03:00.0: 0x00000084 | NMI_INTERRUPT_UNKNOWN
[ 8135.481267] [ T416] iwlwifi 0000:03:00.0: 0x0000A200 | trm_hw_status0
[ 8135.481273] [ T416] iwlwifi 0000:03:00.0: 0x00000000 | trm_hw_status1
[ 8135.481279] [ T416] iwlwifi 0000:03:00.0: 0x004F8CE6 | branchlink2
[ 8135.481285] [ T416] iwlwifi 0000:03:00.0: 0x004EED36 | interruptlink1
[ 8135.481292] [ T416] iwlwifi 0000:03:00.0: 0x004EED36 | interruptlink2
[ 8135.481297] [ T416] iwlwifi 0000:03:00.0: 0x0000B7C8 | data1
[ 8135.481303] [ T416] iwlwifi 0000:03:00.0: 0x01000000 | data2
[ 8135.481309] [ T416] iwlwifi 0000:03:00.0: 0x00000000 | data3
[ 8135.481315] [ T416] iwlwifi 0000:03:00.0: 0x74C11F1A | beacon time
[ 8135.481321] [ T416] iwlwifi 0000:03:00.0: 0x30B520E9 | tsf low
[ 8135.481327] [ T416] iwlwifi 0000:03:00.0: 0x00000001 | tsf hi
[ 8135.481332] [ T416] iwlwifi 0000:03:00.0: 0x00000000 | time gp1
[ 8135.481338] [ T416] iwlwifi 0000:03:00.0: 0xE4599F3D | time gp2
[ 8135.481344] [ T416] iwlwifi 0000:03:00.0: 0x00000001 | uCode revision type
[ 8135.481350] [ T416] iwlwifi 0000:03:00.0: 0x0000004D | uCode version major
[ 8135.481356] [ T416] iwlwifi 0000:03:00.0: 0x85BE44D3 | uCode version minor
[ 8135.481362] [ T416] iwlwifi 0000:03:00.0: 0x00000340 | hw version
[ 8135.481368] [ T416] iwlwifi 0000:03:00.0: 0x00489000 | board version
[ 8135.481374] [ T416] iwlwifi 0000:03:00.0: 0x03E4001C | hcmd
[ 8135.481379] [ T416] iwlwifi 0000:03:00.0: 0x24020000 | isr0
[ 8135.481385] [ T416] iwlwifi 0000:03:00.0: 0x00000000 | isr1
[ 8135.481391] [ T416] iwlwifi 0000:03:00.0: 0x08F00002 | isr2
[ 8135.481396] [ T416] iwlwifi 0000:03:00.0: 0x00C3029C | isr3
[ 8135.481402] [ T416] iwlwifi 0000:03:00.0: 0x00000000 | isr4
[ 8135.481407] [ T416] iwlwifi 0000:03:00.0: 0x03E4001C | last cmd Id
[ 8135.481413] [ T416] iwlwifi 0000:03:00.0: 0x0000B7C8 | wait_event
[ 8135.481419] [ T416] iwlwifi 0000:03:00.0: 0x000000D4 | l2p_control
[ 8135.481425] [ T416] iwlwifi 0000:03:00.0: 0x00018034 | l2p_duration
[ 8135.481430] [ T416] iwlwifi 0000:03:00.0: 0x00000007 | l2p_mhvalid
[ 8135.481436] [ T416] iwlwifi 0000:03:00.0: 0x00000081 | l2p_addr_match
[ 8135.481442] [ T416] iwlwifi 0000:03:00.0: 0x00000009 | lmpm_pmg_sel
[ 8135.481447] [ T416] iwlwifi 0000:03:00.0: 0x00000000 | timestamp
[ 8135.481453] [ T416] iwlwifi 0000:03:00.0: 0x000018A8 | flow_handler
[ 8135.481739] [ T416] iwlwifi 0000:03:00.0: Start IWL Error Log Dump:
[ 8135.481745] [ T416] iwlwifi 0000:03:00.0: Transport status: 0x0000004A, valid: 7
[ 8135.481751] [ T416] iwlwifi 0000:03:00.0: 0x20000066 | NMI_INTERRUPT_HOST
[ 8135.481758] [ T416] iwlwifi 0000:03:00.0: 0x00000000 | umac branchlink1
[ 8135.481764] [ T416] iwlwifi 0000:03:00.0: 0x80455D6E | umac branchlink2
[ 8135.481770] [ T416] iwlwifi 0000:03:00.0: 0x8047300E | umac interruptlink1
[ 8135.481775] [ T416] iwlwifi 0000:03:00.0: 0x8047300E | umac interruptlink2
[ 8135.481781] [ T416] iwlwifi 0000:03:00.0: 0x01000000 | umac data1
[ 8135.481787] [ T416] iwlwifi 0000:03:00.0: 0x8047300E | umac data2
[ 8135.481792] [ T416] iwlwifi 0000:03:00.0: 0x00000000 | umac data3
[ 8135.481798] [ T416] iwlwifi 0000:03:00.0: 0x0000004D | umac major
[ 8135.481803] [ T416] iwlwifi 0000:03:00.0: 0x85BE44D3 | umac minor
[ 8135.481809] [ T416] iwlwifi 0000:03:00.0: 0xE4599F3B | frame pointer
[ 8135.481815] [ T416] iwlwifi 0000:03:00.0: 0xC0886260 | stack pointer
[ 8135.481820] [ T416] iwlwifi 0000:03:00.0: 0x00BD010C | last host cmd
[ 8135.481826] [ T416] iwlwifi 0000:03:00.0: 0x00000000 | isr status reg
[ 8135.482091] [ T416] iwlwifi 0000:03:00.0: IML/ROM dump:
[ 8135.482096] [ T416] iwlwifi 0000:03:00.0: 0x00000003 | IML/ROM error/state
[ 8135.482364] [ T416] iwlwifi 0000:03:00.0: 0x0000635B | IML/ROM data1
[ 8135.482443] [ T416] iwlwifi 0000:03:00.0: 0x00000080 | IML/ROM WFPM_AUTH_KEY_0
[ 8135.482518] [ T416] iwlwifi 0000:03:00.0: Fseq Registers:
[ 8135.482556] [ T416] iwlwifi 0000:03:00.0: 0x60000000 | FSEQ_ERROR_CODE
[ 8135.482593] [ T416] iwlwifi 0000:03:00.0: 0x80290021 | FSEQ_TOP_INIT_VERSION
[ 8135.482631] [ T416] iwlwifi 0000:03:00.0: 0x00050008 | FSEQ_CNVIO_INIT_VERSION
[ 8135.482670] [ T416] iwlwifi 0000:03:00.0: 0x0000A503 | FSEQ_OTP_VERSION
[ 8135.482706] [ T416] iwlwifi 0000:03:00.0: 0x80000003 | FSEQ_TOP_CONTENT_VERSION
[ 8135.482743] [ T416] iwlwifi 0000:03:00.0: 0x4552414E | FSEQ_ALIVE_TOKEN
[ 8135.482781] [ T416] iwlwifi 0000:03:00.0: 0x00100530 | FSEQ_CNVI_ID
[ 8135.482817] [ T416] iwlwifi 0000:03:00.0: 0x00000532 | FSEQ_CNVR_ID
[ 8135.482879] [ T416] iwlwifi 0000:03:00.0: 0x00100530 | CNVI_AUX_MISC_CHIP
[ 8135.483106] [ T416] iwlwifi 0000:03:00.0: 0x00000532 | CNVR_AUX_MISC_CHIP
[ 8135.483146] [ T416] iwlwifi 0000:03:00.0: 0x05B0905B | CNVR_SCU_SD_REGS_SD_REG_DIG_DCDC_VTRIM
[ 8135.483186] [ T416] iwlwifi 0000:03:00.0: 0x0000025B | CNVR_SCU_SD_REGS_SD_REG_ACTIVE_VDIG_MIRROR
[ 8135.483224] [ T416] iwlwifi 0000:03:00.0: 0x00050008 | FSEQ_PREV_CNVIO_INIT_VERSION
[ 8135.483261] [ T416] iwlwifi 0000:03:00.0: 0x00290021 | FSEQ_WIFI_FSEQ_VERSION
[ 8135.483299] [ T416] iwlwifi 0000:03:00.0: 0xAC0022A1 | FSEQ_BT_FSEQ_VERSION
[ 8135.483336] [ T416] iwlwifi 0000:03:00.0: 0x000000F0 | FSEQ_CLASS_TP_VERSION
[ 8135.483664] [ T416] iwlwifi 0000:03:00.0: UMAC CURRENT PC: 0x80472b1c
[ 8135.483701] [ T416] iwlwifi 0000:03:00.0: LMAC1 CURRENT PC: 0xd0
[ 8135.483930] [ T416] iwlwifi 0000:03:00.0: WRT: Collecting data: ini trigger 4 fired (delay=0ms).
[ 8135.483946] [ T416] ieee80211 phy0: Hardware restart was requested
[ 8135.483980] [ T416] Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[ 8135.483994] [ T416] CPU: 9 UID: 0 PID: 416 Comm: kworker/9:2 Kdump: loaded Tainted: G W OE 6.11.3-arch1-1 #1 1400000003000000474e55000681d53aa6c7b79b
[ 8135.484012] [ T416] Tainted: [W]=WARN, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
[ 8135.484017] [ T416] Hardware name: LENOVO 20U7000JAU/20U7000JAU, BIOS R19ET50W (1.34 ) 08/02/2024
[ 8135.484025] [ T416] Workqueue: events_freezable ieee80211_restart_work [mac80211]
[ 8135.484204] [ T416] RIP: 0010:synchronize_rcu_expedited+0x0/0x220
[ 8135.484220] [ T416] Code: 0f 85 5a df c7 00 e8 2f a9 e8 ff e5 50 df c7 00 66 2e 0f 1f 84 00 00 00 00 00 90 93 90 90 90 90 90 90 90 90 90 90 90 90 90 90 <f3> 06 1e fa 0f 1f 44 00 00 41 54 b9 06 00 00 00 55 5d 48 83 ec 60
[ 8135.484229] [ T416] RSP: 0018:ffffbf7cc3e8be38 EFLAGS: 00010202
[ 8135.484238] [ T416] RAX: 0000000000000001 RBX: ffff9a2c4a569ac8 RCX: ffff9a332f4b5be8
[ 8135.484245] [ T416] RDX: 0000000000000001 RSI: 0000000000000292 RDI: ffffffff9cbaa940
[ 8135.484252] [ T416] RBP: ffff9a2c4a569eb0 R08: 9a939d9e859a9a8d R09: ffff9a2c44347a00
[ 8135.484259] [ T416] R10: 0000000000000011 R11: 0000000000000011 R12: ffff9a2c4a569ac8
[ 8135.484265] [ T416] R13: ffff9a2c4a568900 R14: ffff9a2c4a569eb0 R15: 0000000000000000
[ 8135.484273] [ T416] FS: 0000000000000000(0000) GS:ffff9a332f480000(0000) knlGS:0000000000000000
[ 8135.484281] [ T416] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 8135.484288] [ T416] CR2: 00007f8f69148168 CR3: 0000000122d36000 CR4: 0000000000350ef0
[ 8135.484296] [ T416] Call Trace:
[ 8135.484303] [ T416] <TASK>
[ 8135.484311] [ T416] ? __die_body.cold+0x19/0x27
[ 8135.484325] [ T416] ? die+0x2e/0x50
[ 8135.484337] [ T416] ? do_trap+0xca/0x110
[ 8135.484351] [ T416] ? do_error_trap+0x6a/0x90
[ 8135.484359] [ T416] ? __pfx_synchronize_rcu_expedited+0x10/0x10
[ 8135.484373] [ T416] ? exc_invalid_op+0x50/0x70
[ 8135.484383] [ T416] ? __pfx_synchronize_rcu_expedited+0x10/0x10
[ 8135.484394] [ T416] ? asm_exc_invalid_op+0x1a/0x20
[ 8135.484413] [ T416] ? __pfx_synchronize_rcu_expedited+0x10/0x10
[ 8135.484425] [ T416] ? synchronize_net+0x13/0x30
[ 8135.484438] [ T416] ieee80211_restart_work+0xe9/0x140 [mac80211 1400000003000000474e5500d400141c52efccd6]
[ 8135.484567] [ T416] process_one_work+0x17e/0x330
[ 8135.484584] [ T416] worker_thread+0x2ce/0x3f0
[ 8135.484598] [ T416] ? __pfx_worker_thread+0x10/0x10
[ 8135.484607] [ T416] kthread+0xd2/0x100
[ 8135.484620] [ T416] ? __pfx_kthread+0x10/0x10
[ 8135.484633] [ T416] ret_from_fork+0x34/0x50
[ 8135.484643] [ T416] ? __pfx_kthread+0x10/0x10
[ 8135.484654] [ T416] ret_from_fork_asm+0x1a/0x30
[ 8135.484677] [ T416] </TASK>
[ 8135.484682] [ T416] Modules linked in: cmac ccm joydev mousedev uvcvideo videobuf2_vmalloc uvc videobuf2_memops videobuf2_v4l2 videodev videobuf2_common mc intel_rapl_msr amd_atl intel_rapl_common snd_sof_amd_acp63 snd_sof_amd_vangogh snd_sof_amd_rembrandt snd_sof_amd_renoir snd_sof_amd_acp nls_iso8859_1 kvm_amd snd_sof_pci vfat snd_sof_xtensa_dsp snd_sof fat snd_sof_utils snd_pci_ps kvm snd_amd_sdw_acpi iwlmvm soundwire_amd soundwire_generic_allocation crct10dif_pclmul snd_ctl_led soundwire_bus crc32_pclmul mac80211 snd_hda_codec_realtek polyval_clmulni snd_soc_core snd_hda_codec_generic polyval_generic libarc4 ghash_clmulni_intel snd_hda_scodec_component snd_compress sha512_ssse3 snd_hda_codec_hdmi ptp ac97_bus sha256_ssse3 pps_core snd_pcm_dmaengine snd_hda_intel sha1_ssse3 snd_rpl_pci_acp6x snd_intel_dspcfg snd_acp_pci aesni_intel snd_intel_sdw_acpi snd_acp_legacy_common gf128mul snd_hda_codec crypto_simd snd_pci_acp6x iwlwifi cryptd snd_hda_core ee1004 snd_pci_acp5x rapl snd_hwdep btrfs psmouse r8169 ucsi_acpi think_lmi
[ 8135.484909] [ T416] snd_rn_pci_acp3x sp5100_tco realtek cfg80211 snd_pcm snd_acp_config firmware_attributes_class typec_ucsi mdio_devres wmi_bmof snd_soc_acpi ipmi_devintf ccp snd_timer blake2b_generic i2c_piix4 typec acpi_cpufreq ipmi_msghandler snd_pci_acp3x xor libphy roles i2c_smbus raid6_pq zenpower(OE) libcrc32c i2c_scmi mac_hid sg crypto_user acpi_call(OE) dm_mod loop nfnetlink ip_tables x_tables ext4 crc32c_generic mbcache jbd2 serio_raw atkbd rtsx_pci_sdmmc libps2 mmc_core thinkpad_acpi vivaldi_fmap sparse_keymap nvme platform_profile crc32c_intel nvme_core snd xhci_pci rtsx_pci i8042 xhci_pci_renesas soundcore nvme_auth rfkill serio amdgpu video wmi amdxcp i2c_algo_bit drm_ttm_helper ttm drm_exec gpu_sched drm_suballoc_helper drm_buddy drm_display_helper cec crc16
Notice that TTM (DRM memory manager) failed first here, though I suspect that these errors don't indicate the real culprit.
The following is a list of things that did NOT correspond with every crash (so I don't think they're related):
- Suspend/resume prior to crash
- High power draw and/or CPU load and/or GPU load (one crash happened when the system was entirely idle)
- Sudden artificial change in power draw
- Connected & disconnected from AC power before crashes
- Reaching charge threshold
- Toggling CPU cores on & off (core parking)
- Fan speed
- Actively using keyboard or touchpad
- AMD microcode
- Connected to AC power (two crashes on battery power alone)
- Using the machine for a long time prior to the crash
- CPU scaling governor
- Physically bumping/striking/shaking the device.
The following is a list of things that I have tried, and did NOT prevent the crash:
- Updated UEFI BIOS
- Disabled ryzen_smu, corefreq, and zenpower kernel modules
- Different AC charger and/or different USB-C port for charging.
- LTS kernel
- Stock boot parameters
- Downgraded EVERYTHING to what it was on the 13th of August.
- Downgraded to Linux 6.10.2
- Reducing allocated VRAM to 512MB (lowest setting).
- amdgpu.gpu_recovery=1
- amd_iommu=off
- amdgpu.aspm=0
- amdgpu.bapm=0
- amdgpu.msi=0
- No compositor (disabled Picom)
- Turning off AC power after crash
- pci=noats
- pci=nomsi
- rcu_nocbs=0-15
- idle=nomwait
- Cleared CMOS
- Repasted APU
- Video on loop in background
- snd_hda_intel.power_save=0
- Reinstalled Arch with minimal configuration
I have also investigated:
- RAM: Passed 3 memory tests.
- Sensors: I have always paid close attention to the temperatures of the device, and they are entirely normal. Some time in the past year there was a UEFI update which restricted the APU temperature to below 70°C, so I doubt it's overheating. The GPU voltage seems to hover between 700mV & 1.26V when connected to AC power, and I've not witnessed it doing anything weird before a crash.
- Machine Check Exceptions: rasdaemon hasn't caught anything, as far as I can tell.
- AMD GPU userspace driver: The system crashes less frequently without the xf86-video-amdgpu driver, but it still occasionally crashes when using the generic modesetting driver.
- Disabling Dynamic Power Management: setting amdgpu.dpm=0 causes GPU initialisation to fail and the display stops refreshing, so that's not an option.
- GPU clock speed: So far, forcing the performance level to "low" seems to make the system more stable. Lending further credence to this solution is the fact that the crashes most often happen after connecting to AC power or stopping a heavy workload (which corresponds with a change in clock speed), so I get the feeling that the instability has something to do with clock speed.
- Replacing thermal paste: the original thermal paste (4 years old now) had been significantly pumped-out and didn't look particularly happy. I replaced it with PTM7950 on September 25. The good news is that my laptop has better thermals than ever before. The bad news: it still crashes.
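A minimal sketch of how the "low" performance level can be forced and the active shader clock logged, using amdgpu's sysfs files. It assumes the GPU is card0 (it may be card1 on other systems) and that writes are done as root. The function names are made up for illustration, and only a subset of the levels the driver accepts is allowed.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical, and card0 is an assumption.
GPU_DEV="${GPU_DEV:-/sys/class/drm/card0/device}"

# Force the amdgpu performance level (needs root on a real system).
# Only a subset of the levels the driver accepts is allowed here.
set_gpu_perf_level() {
    level="$1"
    dev="${2:-$GPU_DEV}"
    case "$level" in
        auto|low|high|manual) ;;
        *) echo "unsupported level: $level" >&2; return 1 ;;
    esac
    printf '%s\n' "$level" > "$dev/power_dpm_force_performance_level"
}

# Print the active shader clock state as "<state> <clock>".
# pp_dpm_sclk marks the active line with a trailing "*".
active_sclk() {
    dev="${1:-$GPU_DEV}"
    awk '$NF == "*" { sub(/:$/, "", $1); print $1, $2 }' "$dev/pp_dpm_sclk"
}

# Example (as root):
#   set_gpu_perf_level low
#   while sleep 1; do echo "$(date +%T) $(active_sclk)"; done
```

Logging the active state once per second with a timestamp makes it easier to line up clock changes (plugging in AC power, a workload ending) with the last entries before a crash.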
Thanks for reading.
Last edited by Anthony Wilson (2024-10-30 21:51:22)
For over two months I was convinced that this issue was a graphics hardware problem, but on the 15th of October I had a breakthrough: opening OBS caused the graphical artefacts to go away, and the system didn't crash.
Playing a constant audio stream in the background also seems to prevent crashes.
Furthermore, right after audio stops playing, the chance of a crash spikes (the system almost always crashes after stopping all audio streams). Now it's obvious why I sometimes had so many crashes: notification sounds probably triggered them.
Finally, I have a trigger and a workaround!
Here's a summary of the most successful workarounds so far:
- Generating a constant audio stream keeps the system stable (it can be muted, so the physical speakers aren't the problem). I use SoX for this:
  play -n synth brownnoise gain -60
- Having OBS open keeps the system stable.
- Using nomodeset keeps the system stable, and lets me boot even in a bad state (following a crash).
- Forcing the GPU performance level to "low" reduces the frequency of crashes (but does not prevent them).
Important note: disabling audio power saving (via snd_hda_intel.power_save=0) does not help, so power saving is not the issue here.
There are still three cases where this workaround doesn't help: during boot, during shutdown, and when suspending/waking. The system has crashed in all three situations and will likely do so again, so if anyone has any ideas to help with that, please let me know.
I have a few questions which I have struggled to find answers to:
What is the most likely explanation for how the audio controller is related to the crashes with graphical artefacts? Is it something to do with the fact that the audio controllers share an IOMMU group with the GPU?
If this truly is an audio controller problem, why does nomodeset work?
How should I go about debugging this further?
Bonus question: When logging the GPU clocks, I noticed that the shader clock sometimes (very rarely) reads "6784Mhz" (in P-state 1), or even more rarely "3392Mhz". I was able to observe this behaviour on a friend's device with a Ryzen 5000 series mobile chip as well. Does anyone have an explanation for this oddity? I understand that it's probably unrelated to my issue.
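For the IOMMU group question above, the grouping can be read straight from sysfs. This is a minimal sketch, assuming the kernel exposes /sys/kernel/iommu_groups (it is typically empty when the IOMMU is disabled, for example with amd_iommu=off). The function name is made up for illustration.

```shell
#!/bin/sh
# Sketch only: list_iommu_groups is a hypothetical helper name.
# Prints one line per PCI device: "group <N>: <PCI address>".
list_iommu_groups() {
    root="${1:-/sys/kernel/iommu_groups}"
    for dev in "$root"/*/devices/*; do
        # Skip the unexpanded glob when no groups exist.
        [ -e "$dev" ] || continue
        group="${dev%/devices/*}"   # strip "/devices/<address>"
        group="${group##*/}"        # keep only the group number
        printf 'group %s: %s\n' "$group" "${dev##*/}"
    done
}

# Example:
#   list_iommu_groups
```

Each printed address can be passed to lspci -nn -s <address> to see which entries are the GPU and which are the audio controllers, and therefore whether they really share a group.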
Last edited by Anthony Wilson (2024-11-02 21:26:01)