#1 2024-03-03 21:07:06

mesaprotector
Member
Registered: 2024-03-03
Posts: 163

[SOLVED] Repeated kernel problems/freezes since 6.7.6 (Nvidia 550)

EDIT: If you've been directed to this thread, the tl;dr is: don't use the proprietary Nvidia 550 driver. Use the 550 open driver or an older driver. 555 does not fix the issue.

I'm dual-booting Arch with W11 on a fairly new Lenovo LOQ 15IRH8. I recently added a second SSD and moved my Arch install over to it with rsync (my media folder is still mounted from the original drive), with no apparent problems. At the same time I changed mkinitcpio to use a systemd-based initramfs, lz4 compression, and only six hooks (systemd autodetect modconf block filesystems fsck); I also set up hibernation, which I hadn't used before. I boot via EFISTUB with a UKI.
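For readers trying to reproduce this setup, the initramfs changes described above would look roughly like this in /etc/mkinitcpio.conf. It's a sketch: the poster didn't share the actual file, and the UKI/EFISTUB side (the preset file and kernel command line) isn't shown.

```shell
# /etc/mkinitcpio.conf (fragment), reconstructed from the description above.
# mkinitcpio sources this file as bash, so HOOKS is a bash array.

# systemd-based initramfs with only six hooks
HOOKS=(systemd autodetect modconf block filesystems fsck)

# compress the generated image with lz4
COMPRESSION="lz4"
```

After editing the file, the images have to be regenerated, e.g. with `mkinitcpio -P` to rebuild all presets.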

On the 28th, my laptop froze when waking up from S3 sleep after a couple of hours: it woke up, but there was no display and no input. I didn't think much of it and did a hard shutdown. I saw nothing relevant in the logs.

On the 1st, the laptop froze after being left idle (in S0) for a while. I woke up the screen and it immediately froze, but it responded to REISUB. Looking through the logs, I found a kernel oops from about 45 minutes before I tried waking it up; it's shown at the end of this message.
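REISUB is the magic SysRq sequence: unRaw, tErminate, kIll, Sync, remount read-only (U), reBoot. It only works if the kernel.sysrq sysctl permits those functions. According to the kernel's sysrq documentation the value is a bitmask, so here is a small bash sketch that computes the mask for exactly those keys; the variable and function names, and the drop-in file name in the usage comment, are arbitrary choices.

```shell
# SysRq function bits, per the kernel's admin-guide sysrq documentation.
SYSRQ_KEYBOARD=4    # r: unRaw (keyboard control)
SYSRQ_SIGNAL=64     # e, i: tErminate / kIll all processes
SYSRQ_SYNC=16       # s: Sync filesystems
SYSRQ_REMOUNT=32    # u: remount all filesystems read-only
SYSRQ_REBOOT=128    # b: reBoot

# Print the kernel.sysrq value that enables exactly the REISUB keys.
reisub_mask() {
    echo $(( SYSRQ_KEYBOARD | SYSRQ_SIGNAL | SYSRQ_SYNC | SYSRQ_REMOUNT | SYSRQ_REBOOT ))
}

# Usage (as root), persisted through a sysctl drop-in:
#   echo "kernel.sysrq = $(reisub_mask)" > /etc/sysctl.d/99-sysrq.conf
#   sysctl --system
```

The result is 244; a value of 1 would instead enable every SysRq function.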

On the 2nd, the laptop froze again after I entered "shutdown -r" in a terminal. REISUB didn't work and I had to do a hard shutdown; nothing in the logs.

Today, the laptop froze again after "shutdown -r". It had booted kernel 6.7.7, and I was rebooting after updating to 6.7.8. This time it displayed a kernel panic message, which doesn't seem to have been saved in any logs. I read enough of it to tell that the RIP was different and that it ended with "systemd exited with irqs disabled", but I didn't look at it carefully.
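Panics during shutdown often never make it to disk, which fits what happened here, but oopses like the one below usually do end up in the journal. Here is a rough bash filter for pulling the identifying lines out of a previous boot's kernel log, assuming the journal is persistent; `oops_summary` is a made-up helper name.

```shell
# Print the lines that identify a kernel oops in kernel log text on stdin:
# the "BUG:"/"Oops:" header lines, the first kernel-mode RIP (code segment
# 0010), and the "exited with irqs disabled" note.
oops_summary() {
    awk '
        /BUG: |Oops: /              { print; next }
        /RIP: 0010:/ && !seen_rip   { print; seen_rip = 1; next }
        /exited with irqs disabled/ { print }
    '
}

# Usage (the previous boot's kernel messages):
#   journalctl -k -b -1 | oops_summary
```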

Someone on Reddit (/r/linuxquestions), with a different Lenovo laptop running Arch, had a very similar kernel panic a couple days ago, which makes me think this is a bug that isn't specific to my install.

Mar 01 13:33:28 Marojejy kernel: BUG: unable to handle page fault for address: 0000000000007fa4
Mar 01 13:33:28 Marojejy kernel: #PF: supervisor read access in kernel mode
Mar 01 13:33:28 Marojejy kernel: #PF: error_code(0x0000) - not-present page
Mar 01 13:33:28 Marojejy kernel: PGD 0 P4D 0 
Mar 01 13:33:28 Marojejy kernel: Oops: 0000 [#1] PREEMPT SMP NOPTI
Mar 01 13:33:28 Marojejy kernel: CPU: 5 PID: 99583 Comm: kworker/5:0 Tainted: P           OE      6.7.6-arch1-2 #1 36a1d137df2a95849ad6b7232a6023837991924e
Mar 01 13:33:28 Marojejy kernel: Hardware name: LENOVO 82XV/LNVNB161216, BIOS LZCN23WW 02/22/2023
Mar 01 13:33:28 Marojejy kernel: Workqueue: cgroup_destroy css_free_rwork_fn
Mar 01 13:33:28 Marojejy kernel: RIP: 0010:rb_first+0xf/0x30
Mar 01 13:33:28 Marojejy kernel: Code: 10 c3 cc cc cc cc 0f 1f 44 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 48 8b 07 48 85 c0 74 14 48 89 c2 <48> 8b 40 10 48 85 c0 75 f4 48 89 d0 c3 cc cc cc cc 31 d2 eb f4 66
Mar 01 13:33:28 Marojejy kernel: RSP: 0018:ffffa7e0ce943dd8 EFLAGS: 00010202
Mar 01 13:33:28 Marojejy kernel: RAX: 0000000000007f94 RBX: ffff9bee29b45300 RCX: 000000000080007c
Mar 01 13:33:28 Marojejy kernel: RDX: 0000000000007f94 RSI: 0000000000000000 RDI: ffff9bee020548a8
Mar 01 13:33:28 Marojejy kernel: RBP: ffff9bee0817c300 R08: 0000000000000000 R09: 000000000080007c
Mar 01 13:33:28 Marojejy kernel: R10: ffff9bee28ee8260 R11: fefefefefefefeff R12: 0000000000000000
Mar 01 13:33:28 Marojejy kernel: R13: ffff9bee020548a8 R14: ffff9bee12637800 R15: ffff9bee12637890
Mar 01 13:33:28 Marojejy kernel: FS:  0000000000000000(0000) GS:ffff9bf19f940000(0000) knlGS:0000000000000000
Mar 01 13:33:28 Marojejy kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 01 13:33:28 Marojejy kernel: CR2: 0000000000007fa4 CR3: 000000017aa72000 CR4: 0000000000f50ef0
Mar 01 13:33:28 Marojejy kernel: PKRU: 55555554
Mar 01 13:33:28 Marojejy kernel: Call Trace:
Mar 01 13:33:28 Marojejy kernel:  <TASK>
Mar 01 13:33:28 Marojejy kernel:  ? __die+0x23/0x70
Mar 01 13:33:28 Marojejy kernel:  ? page_fault_oops+0x171/0x4e0
Mar 01 13:33:28 Marojejy kernel:  ? exc_page_fault+0x7f/0x180
Mar 01 13:33:28 Marojejy kernel:  ? asm_exc_page_fault+0x26/0x30
Mar 01 13:33:28 Marojejy kernel:  ? rb_first+0xf/0x30
Mar 01 13:33:28 Marojejy kernel:  simple_xattrs_free+0x29/0x90
Mar 01 13:33:28 Marojejy kernel:  kernfs_put.part.0+0x60/0x150
Mar 01 13:33:28 Marojejy kernel:  css_free_rwork_fn+0x131/0x430
Mar 01 13:33:28 Marojejy kernel:  process_one_work+0x178/0x350
Mar 01 13:33:28 Marojejy kernel:  worker_thread+0x30f/0x450
Mar 01 13:33:28 Marojejy kernel:  ? __pfx_worker_thread+0x10/0x10
Mar 01 13:33:28 Marojejy kernel:  kthread+0xe5/0x120
Mar 01 13:33:28 Marojejy kernel:  ? __pfx_kthread+0x10/0x10
Mar 01 13:33:28 Marojejy kernel:  ret_from_fork+0x31/0x50
Mar 01 13:33:28 Marojejy kernel:  ? __pfx_kthread+0x10/0x10
Mar 01 13:33:28 Marojejy kernel:  ret_from_fork_asm+0x1b/0x30
Mar 01 13:33:28 Marojejy kernel:  </TASK>
Mar 01 13:33:28 Marojejy kernel: Modules linked in: xt_mark xt_connmark wireguard curve25519_x86_64 libchacha20poly1305 chacha_x86_64 poly1305_x86_64 libcurve25519_generic libchacha ip6_udp_tunnel udp_tunnel ip6table_security ip6table_raw ip6table_mangle ip6table_nat iptable_security iptable_raw iptable_mangle iptable_nat nf_nat rfcomm snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio xt_comment ccm algif_aead crypto_null des3_ede_x86_64 cbc des_generic libdes cmac algif_skcipher md4 bnep algif_hash af_alg snd_sof_pci_intel_tgl snd_sof_intel_hda_common soundwire_intel ip6t_REJECT nf_reject_ipv6 snd_sof_intel_hda_mlink soundwire_cadence xt_hl snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp ip6t_rt snd_sof snd_sof_utils snd_soc_hdac_hda ipt_REJECT nf_reject_ipv4 snd_hda_ext_core snd_soc_acpi_intel_match xt_LOG nf_log_syslog snd_soc_acpi soundwire_generic_allocation soundwire_bus snd_soc_core snd_compress ac97_bus snd_pcm_dmaengine snd_hda_codec_hdmi xt_limit xt_addrtype xt_tcpudp xt_conntrack nf_conntrack nf_defrag_ipv6
Mar 01 13:33:28 Marojejy kernel:  nf_defrag_ipv4 libcrc32c ip6table_filter ip6_tables iptable_filter nvidia_drm(POE) intel_uncore_frequency nvidia_modeset(POE) intel_uncore_frequency_common intel_tcc_cooling x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm rtw89_8852be irqbypass rtw89_8852b i915 crct10dif_pclmul rtw89_pci crc32_pclmul polyval_clmulni polyval_generic gf128mul rtw89_core uvcvideo ghash_clmulni_intel sha512_ssse3 sha256_ssse3 drm_buddy videobuf2_vmalloc i2c_algo_bit sha1_ssse3 uvc videobuf2_memops aesni_intel iTCO_wdt snd_hda_intel ttm videobuf2_v4l2 btusb hid_multitouch mac80211 intel_pmc_bxt snd_intel_dspcfg crypto_simd processor_thermal_device_pci btrtl cryptd snd_intel_sdw_acpi processor_thermal_device serio_raw videodev joydev mousedev btintel rapl atkbd drm_display_helper iTCO_vendor_support snd_hda_codec pkcs8_key_parser processor_thermal_wt_hint videobuf2_common btbcm libarc4 libps2 vfat mei_pxp intel_cstate ideapad_laptop processor_thermal_rfim intel_rapl_msr pmt_telemetry fat mei_hdcp nvidia_uvm(POE)
Mar 01 13:33:28 Marojejy kernel:  hid_generic vivaldi_fmap pmt_class wmi_bmof nvidia_wmi_ec_backlight r8169 processor_thermal_rapl btmtk intel_uncore snd_hda_core mc intel_lpss_pci pcspkr i2c_i801 realtek intel_rapl_common spi_nor sparse_keymap cec cfg80211 intel_lpss snd_hwdep processor_thermal_wt_req mdio_devres bluetooth mtd libphy i2c_smbus snd_pcm processor_thermal_power_floor intel_gtt usbhid idma64 platform_profile mei_me ucsi_acpi snd_timer video mei typec_ucsi snd processor_thermal_mbox int3403_thermal i8042 int3400_thermal i2c_hid_acpi typec ecdh_generic rfkill i2c_hid soundcore roles intel_vsec acpi_thermal_rel intel_pmc_core serio pinctrl_tigerlake wmi int340x_thermal_zone acpi_pad mac_hid nvidia(POE) crypto_user fuse loop dm_mod nfnetlink ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 nvme xhci_pci nvme_core crc32c_intel spi_intel_pci xhci_pci_renesas spi_intel nvme_auth
Mar 01 13:33:28 Marojejy kernel: CR2: 0000000000007fa4
Mar 01 13:33:28 Marojejy kernel: ---[ end trace 0000000000000000 ]---
Mar 01 13:33:28 Marojejy kernel: RIP: 0010:rb_first+0xf/0x30
Mar 01 13:33:28 Marojejy kernel: Code: 10 c3 cc cc cc cc 0f 1f 44 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 48 8b 07 48 85 c0 74 14 48 89 c2 <48> 8b 40 10 48 85 c0 75 f4 48 89 d0 c3 cc cc cc cc 31 d2 eb f4 66
Mar 01 13:33:28 Marojejy kernel: RSP: 0018:ffffa7e0ce943dd8 EFLAGS: 00010202
Mar 01 13:33:28 Marojejy kernel: RAX: 0000000000007f94 RBX: ffff9bee29b45300 RCX: 000000000080007c
Mar 01 13:33:28 Marojejy kernel: RDX: 0000000000007f94 RSI: 0000000000000000 RDI: ffff9bee020548a8
Mar 01 13:33:28 Marojejy kernel: RBP: ffff9bee0817c300 R08: 0000000000000000 R09: 000000000080007c
Mar 01 13:33:28 Marojejy kernel: R10: ffff9bee28ee8260 R11: fefefefefefefeff R12: 0000000000000000
Mar 01 13:33:28 Marojejy kernel: R13: ffff9bee020548a8 R14: ffff9bee12637800 R15: ffff9bee12637890
Mar 01 13:33:28 Marojejy kernel: FS:  0000000000000000(0000) GS:ffff9bf19f940000(0000) knlGS:0000000000000000
Mar 01 13:33:28 Marojejy kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 01 13:33:28 Marojejy kernel: CR2: 0000000000007fa4 CR3: 000000017aa72000 CR4: 0000000000f50ef0
Mar 01 13:33:28 Marojejy kernel: PKRU: 55555554
Mar 01 13:33:28 Marojejy kernel: note: kworker/5:0[99583] exited with irqs disabled

Last edited by mesaprotector (2024-07-05 21:15:10)

#2 2024-03-04 05:23:33

bbaa
Member
Registered: 2024-03-04
Posts: 3

I'm having the same problem.
Some logs here:
https://fars.ee/xvpp
https://fars.ee/4cWR

#3 2024-03-04 05:48:05

bbaa
Member
Registered: 2024-03-04
Posts: 3

I also dual boot with Windows 11, and automatically mount an NTFS partition.
Maybe it's an ntfs3 driver bug.
Edit: the panic also occurs without the ntfs3 driver.

Last edited by bbaa (2024-03-06 04:41:17)

#4 2024-03-04 20:57:57

cjvth
Member
Registered: 2024-03-04
Posts: 2

Same problem. Lenovo laptop, auto-mounting an NTFS partition from Win10.

kernel: BUG: unable to handle page fault for address: 00000000000029af
kernel: #PF: supervisor read access in kernel mode
kernel: #PF: error_code(0x0000) - not-present page

Nothing is logged after that.

#5 2024-03-04 22:30:35

loqs
Member
Registered: 2014-03-06
Posts: 18,078

Have you tried bisecting the issue between 6.7.6 and 6.7.7?
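To follow that suggestion, the bisection has to run in the linux-stable tree, where each point release is tagged (v6.7.6, v6.7.7 and so on). The bash sketch below only starts the search; every step after that means building and booting the checked-out kernel, then marking it good or bad, which can't be automated for a bug that takes days to show up. `start_bisect` is a hypothetical helper; the git subcommands are standard.

```shell
# Start a bisection between a known-good and a known-bad ref (tags such as
# v6.7.6 in the linux-stable tree). Run it inside the clone.
start_bisect() {
    local good=$1 bad=$2
    git bisect start "$bad" "$good" || return 1
    # git has checked out a commit roughly halfway between the two refs.
    echo "Now testing $(git rev-parse --short HEAD). Build and boot it, then run"
    echo "  git bisect good    or    git bisect bad"
    echo "and repeat until git reports the first bad commit; finish with git bisect reset."
}

# Usage: start_bisect <last good tag> <first bad tag>
```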

#6 2024-03-04 22:45:37

gnox
Member
Registered: 2013-05-18
Posts: 83

I have the same laptop. Since 6.7 I've seen that kernel panic only twice, both times when rebooting. I'd recommend updating your BIOS and using thermald (that laptop is a small oven).
Yours:

LENOVO 82XV/LNVNB161216, BIOS LZCN23WW 02/22/2023

Current, updated version on my laptop:

LENOVO 82XV/LNVNB161216, BIOS LZCN34WW 12/19/2023
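If you want to compare your own firmware against the versions above, the kernel exposes the same DMI strings under /sys/class/dmi/id, readable without root. A small bash sketch; `bios_info` is a hypothetical helper, and the optional directory argument only exists so it can be tested against fake files.

```shell
# Print "vendor version date" for the system firmware, read from the
# kernel's DMI sysfs files (default directory: /sys/class/dmi/id).
bios_info() {
    local dir=${1:-/sys/class/dmi/id} f
    for f in bios_vendor bios_version bios_date; do
        [[ -r "$dir/$f" ]] || { echo "cannot read $dir/$f" >&2; return 1; }
    done
    printf '%s %s %s\n' "$(<"$dir/bios_vendor")" "$(<"$dir/bios_version")" "$(<"$dir/bios_date")"
}

# Usage: bios_info    # prints vendor, BIOS version and BIOS date
```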

#7 2024-03-05 00:26:19

mesaprotector
Member
Registered: 2024-03-03
Posts: 163

loqs wrote:

Have you tried bisecting the issue between 6.7.6 and 6.7.7?

It definitely happens with:
6.7.6.arch1-1
6.7.6.arch1-2
6.7.7.arch1-1

No issues so far on 6.7.8, but it's only been a day and a half, and it apparently took several days before a crash on 6.7.6.arch1-1 (I updated to it on the morning of the 23rd). If I find a way to reproduce the bug on demand, I'm happy to test it on all recent versions; as it is, all I can really do is wait. The only thing all of my freezes have in common is that they involved some part of the system being in a low-power state. I'd be interested to know whether that's the case for other people.

gnox wrote:

I have the same laptop. Since 6.7 I've seen that kernel panic only twice, both times when rebooting. I'd recommend updating your BIOS and using thermald (that laptop is a small oven).

My temps are oddly good (as I type this, 35 °C for my CPU and 25/31 °C for my two drives). Even running a stress test, nothing ever gets close to hot, and it runs cooler on Linux than on Windows. I should check DIMM temperatures, given the error, but the Arch wiki's guide for that doesn't work for me. All I know is that they're normal on Windows 11. I'd rather not do a BIOS update for many reasons, but if this keeps happening and there really is no other explanation, then I guess I will.

For whatever it's worth, I did a full system backup on the afternoon of the 28th (just before the first crash, but after the update to 6.7.6), so if there's any useful information I can look up there, let me know.

#8 2024-03-05 03:02:49

gnox
Member
Registered: 2013-05-18
Posts: 83

mesaprotector wrote:

My temps are oddly good (as I type this, 35 °C for my CPU and 25/31 °C for my two drives). Even running a stress test, nothing ever gets close to hot, and it runs cooler on Linux than on Windows. I should check DIMM temperatures, given the error, but the Arch wiki's guide for that doesn't work for me. All I know is that they're normal on Windows 11. I'd rather not do a BIOS update for many reasons, but if this keeps happening and there really is no other explanation, then I guess I will.

For whatever it's worth, I did a full system backup on the afternoon of the 28th (just before the first crash, but after the update to 6.7.6), so if there's any useful information I can look up there, let me know.

Well, since I bought that laptop last August there have been 3 BIOS updates. One of them fixed a strange fan behavior: the fans ran constantly when the laptop was in iGPU mode (set in the BIOS), when logically that should happen in hybrid/dGPU mode. The only time I got low temps was before that update, with the fans running constantly; since then it normally averages about 45 °C.

I also replaced the NVMe drive and the Wi-Fi card just after I bought it, and the error happened even so. The one thing I see that looks different is the memory, which is supposedly quad channel: on Windows (HWiNFO) the speed appears to be calculated as dual channel, but on Linux it appears divided by 4, as if it were running at low speed.

$sudo lshw -c memory
...
*-memory
       description: System Memory
       physical id: 28
       slot: System board or motherboard
       size: 16GiB
     *-bank:0
          description: SODIMM Synchronous 5600 MHz (0.2 ns)
          product: M425R1GB4BB0-CWMOD
          size: 8GiB
          ...
          clock: 1305MHz (0.8ns)
     *-bank:1
          description: SODIMM Synchronous 5600 MHz (0.2 ns)
          product: M425R1GB4BB0-CWMOD
          ...
          size: 8GiB
          width: 64 bits
          clock: 1305MHz (0.8ns)

In Windows (HWiNFO) it appears as ~25xx MHz.

1305 MHz * 4 = 5220

$sudo dmidecode -t memory
...
Memory Device
        ...
    Size: 8 GB
    Form Factor: SODIMM
    Set: None
    Locator: Controller1-ChannelA-DIMM0
    Bank Locator: BANK 0
    Type: DDR5
    Type Detail: Synchronous
    Speed: 5600 MT/s
        ...
    Configured Memory Speed: 5200 MT/s
        ...
Memory Device
       ...
    Configured Memory Speed: 5200 MT/s
       ...

But I don't know if lshw shows the correct speed directly or if it is doing something different.
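The arithmetic above can be checked with a few lines of bash. DDR memory makes two transfers per I/O clock cycle, so the 5200 MT/s that dmidecode reports as the configured speed corresponds to a 2600 MHz I/O clock, in the same range as the ~25xx MHz HWiNFO shows. lshw's 1305 MHz is about a quarter of the transfer rate; why lshw reports that figure isn't established in this thread, so the factor of 4 is the poster's observation rather than a rule. The helper names are hypothetical.

```shell
# DDR = double data rate: two transfers per I/O clock cycle,
# so MT/s = 2 x I/O clock (MHz).
io_clock_mhz() {            # MT/s -> I/O clock in MHz
    echo $(( $1 / 2 ))
}

scaled_clock() {            # clock (MHz) times a factor, e.g. lshw's value x 4
    echo $(( $1 * $2 ))
}

# Print each "Configured Memory Speed" value (MT/s) from
# `dmidecode -t memory` output on stdin, one line per DIMM.
configured_speeds() {
    awk -F': ' '/Configured Memory Speed:/ { split($2, v, " "); print v[1] }'
}

# Usage:
#   sudo dmidecode -t memory | configured_speeds   # 5200 for both DIMMs above
#   io_clock_mhz 5200                               # 2600
#   scaled_clock 1305 4                             # 5220
```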

#9 2024-03-05 08:31:31

seth
Member
Registered: 2012-09-03
Posts: 59,043

I also dual boot with Windows 11

3rd link below. Mandatory.
Disable it (it's NOT the BIOS setting!) and reboot Windows and Linux twice for voodoo reasons.

Maybe it is a ntfs3 driver bug.

https://wiki.archlinux.org/title/NTFS-3G

I read enough of the message to tell that the RIP was different and ended with "systemd exited with irqs disabled", but I didn't look at it carefully.

https://bbs.archlinux.org/viewtopic.php … 9#p2154609

#10 2024-03-05 15:45:39

torvic9
Member
Registered: 2022-08-26
Posts: 8

#11 2024-03-05 18:10:33

~tfa
Member
From: Germany
Registered: 2021-11-03
Posts: 14

I have the same oops on the LTS kernel. It appears randomly, on roughly every 2nd or 3rd shutdown, and prevents the system from powering off. I'll try the numa=off kernel option.
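Before concluding anything from a run with numa=off, it's worth confirming the parameter actually reached the running kernel; editing the wrong boot entry, or forgetting to rebuild a UKI, is an easy mistake. A minimal bash check against /proc/cmdline; `has_kernel_param` is a hypothetical name.

```shell
# Succeed if $1 appears as an exact, whole word on a kernel command line,
# read from $2 (default /proc/cmdline). "numa" does not match "numa=off".
has_kernel_param() {
    local param=$1 file=${2:-/proc/cmdline} word
    local -a words=()
    [[ -r "$file" ]] || return 2
    read -r -a words < "$file" || true
    for word in "${words[@]}"; do
        [[ "$word" == "$param" ]] && return 0
    done
    return 1
}

# Usage: has_kernel_param numa=off && echo "numa=off is active"
```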

Mär 05 16:18:49 discordiae kernel: BUG: unable to handle page fault for address: 0000000000023704
Mär 05 16:18:49 discordiae kernel: #PF: supervisor read access in kernel mode
Mär 05 16:18:49 discordiae kernel: #PF: error_code(0x0000) - not-present page
Mär 05 16:18:49 discordiae kernel: PGD 0 P4D 0 
Mär 05 16:18:49 discordiae kernel: Oops: 0000 [#1] PREEMPT SMP NOPTI
Mär 05 16:18:49 discordiae kernel: CPU: 2 PID: 1151 Comm: systemd Tainted: P     U  W  OE      6.6.20-1-lts #1 6e375b09266a17eb8bcc1d35607d8a8f02f567ee
Mär 05 16:18:49 discordiae kernel: Hardware name: LENOVO 83BU/LNVNB161216, BIOS MBCN26WW 07/25/2023
Mär 05 16:18:49 discordiae kernel: RIP: 0010:rb_first+0xf/0x30
Mär 05 16:18:49 discordiae kernel: Code: 10 c3 cc cc cc cc 0f 1f 44 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 48 8b 07 48 85 c0 74 14 48 89 c2 <48> 8b 40 10 48 85 c0 75 f4 48 89 d0 c3 cc cc cc cc 31 d2 eb f4 66
Mär 05 16:18:49 discordiae kernel: RSP: 0018:ffffc900077afcc0 EFLAGS: 00010202
Mär 05 16:18:49 discordiae kernel: RAX: 00000000000236f4 RBX: ffff888176cedf00 RCX: 000000008100003d
Mär 05 16:18:49 discordiae kernel: RDX: 00000000000236f4 RSI: 0000000000000000 RDI: ffff88811bc0ed58
Mär 05 16:18:49 discordiae kernel: RBP: ffff88811edd9700 R08: 0000000000000000 R09: 000000008100003d
Mär 05 16:18:49 discordiae kernel: R10: ffff888263e9e360 R11: fefefefefefefeff R12: 0000000000000000
Mär 05 16:18:49 discordiae kernel: R13: ffff88811bc0ed58 R14: ffffffff8c2dacd8 R15: ffff88832ce6d800
Mär 05 16:18:49 discordiae kernel: FS:  000079e1b5dbd880(0000) GS:ffff88888f680000(0000) knlGS:0000000000000000
Mär 05 16:18:49 discordiae kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mär 05 16:18:49 discordiae kernel: CR2: 0000000000023704 CR3: 0000000101b7a000 CR4: 0000000000f50ee0
Mär 05 16:18:49 discordiae kernel: PKRU: 55555554
Mär 05 16:18:49 discordiae kernel: Call Trace:
Mär 05 16:18:49 discordiae kernel:  <TASK>
Mär 05 16:18:49 discordiae kernel:  ? __die+0x23/0x70
Mär 05 16:18:49 discordiae kernel:  ? page_fault_oops+0x171/0x4e0
Mär 05 16:18:49 discordiae kernel:  ? exc_page_fault+0x7f/0x180
Mär 05 16:18:49 discordiae kernel:  ? asm_exc_page_fault+0x26/0x30
Mär 05 16:18:49 discordiae kernel:  ? rb_first+0xf/0x30
Mär 05 16:18:49 discordiae kernel:  simple_xattrs_free+0x29/0x90
Mär 05 16:18:49 discordiae kernel:  kernfs_put.part.0+0x60/0x150
Mär 05 16:18:49 discordiae kernel:  kernfs_remove_by_name_ns+0x81/0xd0
Mär 05 16:18:49 discordiae kernel:  cgroup_addrm_files+0x2c6/0x350
Mär 05 16:18:49 discordiae kernel:  cgroup_destroy_locked+0xf7/0x1b0
Mär 05 16:18:49 discordiae kernel:  cgroup_rmdir+0x2b/0xd0
Mär 05 16:18:49 discordiae kernel:  kernfs_iop_rmdir+0x50/0x80
Mär 05 16:18:49 discordiae kernel:  vfs_rmdir+0x97/0x200
Mär 05 16:18:49 discordiae kernel:  do_rmdir+0x175/0x1b0
Mär 05 16:18:49 discordiae kernel:  __x64_sys_unlinkat+0x4e/0x70
Mär 05 16:18:49 discordiae kernel:  do_syscall_64+0x5d/0x90
Mär 05 16:18:49 discordiae kernel:  ? do_syscall_64+0x6c/0x90
Mär 05 16:18:49 discordiae kernel:  ? do_syscall_64+0x6c/0x90
Mär 05 16:18:49 discordiae kernel:  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
Mär 05 16:18:49 discordiae kernel: RIP: 0033:0x79e1b671a04b
Mär 05 16:18:49 discordiae kernel: Code: 77 05 c3 0f 1f 40 00 48 8b 15 e1 bc 0d 00 f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 07 01 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d b5 bc 0d 00 f7 d8 64 89 01 48
Mär 05 16:18:49 discordiae kernel: RSP: 002b:00007ffd28662158 EFLAGS: 00000206 ORIG_RAX: 0000000000000107
Mär 05 16:18:49 discordiae kernel: RAX: ffffffffffffffda RBX: 0000555f22157980 RCX: 000079e1b671a04b
Mär 05 16:18:49 discordiae kernel: RDX: 0000000000000200 RSI: 0000555f22147ffb RDI: 000000000000000d
Mär 05 16:18:49 discordiae kernel: RBP: 00007ffd28662440 R08: 0000555f22147fe8 R09: 0000000000000000
Mär 05 16:18:49 discordiae kernel: R10: 0000555f220be790 R11: 0000000000000206 R12: 0000000000000063
Mär 05 16:18:49 discordiae kernel: R13: 0000000000000006 R14: 0000000000000030 R15: 000079e1b68a3a70
Mär 05 16:18:49 discordiae kernel:  </TASK>
Mär 05 16:18:49 discordiae kernel: Modules linked in: usbhid dm_crypt cbc encrypted_keys trusted asn1_encoder tee rfcomm snd_seq_dummy snd_hrtimer snd_seq ccm snd_seq_device cmac algif_hash algif_skcipher af_alg snd_ctl_led snd_soc_skl_hda_dsp snd_soc_intel_hda_dsp_common snd_soc_hdac_hdmi snd_sof_probes snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio snd_soc_dmic snd_sof_pci_intel_tgl snd_sof_intel_hda_common soundwire_intel snd_sof_intel_hda_mlink soundwire_cadence snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp snd_sof snd_sof_utils snd_soc_hdac_hda snd_hda_ext_core snd_soc_acpi_intel_match snd_soc_acpi snd_hda_codec_hdmi soundwire_generic_allocation soundwire_bus bnep uvcvideo videobuf2_vmalloc uvc btusb videobuf2_memops videobuf2_v4l2 btrtl btintel btbcm videodev btmtk videobuf2_common bluetooth mc ecdh_generic hid_sensor_als hid_sensor_trigger industrialio_triggered_buffer kfifo_buf hid_sensor_iio_common industrialio hid_sensor_custom hid_sensor_hub intel_ishtp_hid intel_uncore_frequency intel_uncore_frequency_common
Mär 05 16:18:49 discordiae kernel:  intel_tcc_cooling x86_pkg_temp_thermal joydev mousedev intel_powerclamp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul snd_hda_scodec_tas2781_i2c polyval_clmulni iwlmvm snd_soc_tas2781_fmwlib polyval_generic snd_soc_tas2781_comlib gf128mul i915 snd_hda_intel ghash_clmulni_intel snd_soc_core snd_intel_dspcfg mac80211 sha512_ssse3 snd_intel_sdw_acpi sha256_ssse3 snd_hda_codec sha1_ssse3 snd_compress aesni_intel snd_hda_core ac97_bus iTCO_wdt crypto_simd drm_buddy snd_hwdep processor_thermal_device_pci hid_multitouch pmt_telemetry processor_thermal_device i2c_algo_bit cryptd snd_pcm_dmaengine intel_pmc_bxt libarc4 iwlwifi mei_hdcp mei_pxp iTCO_vendor_support pmt_class intel_rapl_msr rapl vfat snd_pcm ttm ucsi_acpi processor_thermal_rfim nvidia_drm(POE) wmi_bmof intel_cstate nvidia_wmi_ec_backlight mei_me fat spi_nor drm_display_helper intel_lpss_pci i2c_i801 snd_timer processor_thermal_mbox ideapad_laptop typec_ucsi intel_ish_ipc cfg80211 nvidia_modeset(POE) intel_uncore thunderbolt mtd pcspkr
Mär 05 16:18:49 discordiae kernel:  intel_lpss mei platform_profile cec typec i2c_smbus processor_thermal_rapl snd intel_ishtp intel_vsec intel_gtt idma64 intel_rapl_common roles rfkill soundcore i2c_hid_acpi crc8 video i2c_hid int3403_thermal int3400_thermal int340x_thermal_zone intel_hid acpi_tad wmi acpi_thermal_rel sparse_keymap acpi_pad mac_hid nvidia_uvm(POE) nvidia(POE) i2c_dev fuse crypto_user acpi_call(OE) loop dm_mod nfnetlink bpf_preload ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 rtsx_pci_sdmmc serio_raw mmc_core atkbd libps2 nvme vivaldi_fmap nvme_core spi_intel_pci rtsx_pci crc32c_intel xhci_pci spi_intel nvme_common i8042 xhci_pci_renesas serio [last unloaded: coretemp]
Mär 05 16:18:49 discordiae kernel: CR2: 0000000000023704
Mär 05 16:18:49 discordiae kernel: ---[ end trace 0000000000000000 ]---
Mär 05 16:18:49 discordiae kernel: RIP: 0010:rb_first+0xf/0x30
Mär 05 16:18:49 discordiae kernel: Code: 10 c3 cc cc cc cc 0f 1f 44 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 48 8b 07 48 85 c0 74 14 48 89 c2 <48> 8b 40 10 48 85 c0 75 f4 48 89 d0 c3 cc cc cc cc 31 d2 eb f4 66
Mär 05 16:18:49 discordiae kernel: RSP: 0018:ffffc900077afcc0 EFLAGS: 00010202
Mär 05 16:18:49 discordiae kernel: RAX: 00000000000236f4 RBX: ffff888176cedf00 RCX: 000000008100003d
Mär 05 16:18:49 discordiae kernel: RDX: 00000000000236f4 RSI: 0000000000000000 RDI: ffff88811bc0ed58
Mär 05 16:18:49 discordiae kernel: RBP: ffff88811edd9700 R08: 0000000000000000 R09: 000000008100003d
Mär 05 16:18:49 discordiae kernel: R10: ffff888263e9e360 R11: fefefefefefefeff R12: 0000000000000000
Mär 05 16:18:49 discordiae kernel: R13: ffff88811bc0ed58 R14: ffffffff8c2dacd8 R15: ffff88832ce6d800
Mär 05 16:18:49 discordiae kernel: FS:  000079e1b5dbd880(0000) GS:ffff88888f680000(0000) knlGS:0000000000000000
Mär 05 16:18:49 discordiae kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mär 05 16:18:49 discordiae kernel: CR2: 0000000000023704 CR3: 0000000101b7a000 CR4: 0000000000f50ee0
Mär 05 16:18:49 discordiae kernel: PKRU: 55555554
Mär 05 16:18:49 discordiae kernel: note: systemd[1151] exited with irqs disabled

#12 2024-03-05 18:17:25

mesaprotector
Member
Registered: 2024-03-03
Posts: 163

This doesn't seem to be an Arch-specific bug: I found this report from someone running 6.7.6 on the openSUSE forums, with exactly the same RIP as mine, @bbaa's, and @~tfa's. I'm not sure of the procedure now: should it be reported on the main Linux kernel bug tracker? (I still haven't had any issues since moving to 6.7.8, but others may have.)
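Matching the RIP is a good first check; comparing the reliable frames of the call trace (the ones not prefixed with `?`) shows how far the similarity goes. In the traces posted in this thread, for example, the frames start with simple_xattrs_free and kernfs_put.part.0 before diverging. A rough bash/awk sketch, assuming journal-style lines with a `kernel:` prefix as pasted here; `trace_frames` is a made-up helper name.

```shell
# Print the function names of the reliable call-trace frames of the first
# oops in kernel log text on stdin. Frames marked "?" are unwinder guesses
# and are skipped, as are register-dump lines (tokens ending in ":").
trace_frames() {
    awk '
        /Call Trace:/          { in_trace = 1; next }
        in_trace && /<\/TASK>/ { exit }
        in_trace {
            sub(/.*kernel: +/, "")                   # drop the journal prefix
            if (NF == 0 || $1 == "?" || $1 ~ /^</ || $1 ~ /:$/) next
            sub(/\+0x.*/, "", $1)                    # drop +offset/size
            print $1
        }
    '
}

# Usage: compare two saved oopses frame by frame
#   diff <(trace_frames < oops1.txt) <(trace_frames < oops2.txt)
```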

I do notice that an awful lot of us with this bug are using gaming laptops, which makes me suspect Nvidia even though it doesn't really match the traces. Usually, if something fails catastrophically on Arch, the Nvidia drivers are the first thing I'd suspect. And unlike the kernel, which I updated five and a half days before the first crash, I updated to the 550 branch of the Nvidia drivers on the 28th, the very day of my first crash.

Last edited by mesaprotector (2024-03-05 18:28:22)

#13 2024-03-05 18:18:45

seth
Member
Registered: 2012-09-03
Posts: 59,043

If this is the NUMA bug, it's already been addressed, so you'd check for that first (especially if you can trigger this somewhat reliably).

#14 2024-03-06 00:59:08

godman180
Member
Registered: 2024-03-06
Posts: 3

Hi all,

I was getting similar bugs and I think the issue is related to the Nvidia driver. I installed nvidia-open-dkms and the problem went away for me. I also have numa=off on my kernel command line, just in case.
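A quick way to confirm which Nvidia module is actually loaded after a switch like this: every oops in this thread reports `Tainted: P ... OE`, and the module lists show nvidia(POE). P means a module with a proprietary license was loaded. The open kernel modules are dual MIT/GPL licensed, so with nvidia-open-dkms the P flag should go away (assuming no other proprietary modules are loaded), while O (out-of-tree) and possibly E (unsigned) remain. A bash sketch that decodes /proc/sys/kernel/tainted; the helper name is made up, and the letter table follows the kernel's tainted-kernels documentation.

```shell
# Kernel taint flag letters in bit order (bit 0 = P, bit 12 = O, bit 13 = E),
# following the kernel's tainted-kernels documentation up to bit 18.
TAINT_LETTERS=PFSRMBUDAWCIOELKXTN

# Print the set flags for a taint value (default: the running kernel's
# /proc/sys/kernel/tainted), or "none" if the kernel is untainted.
taint_flags() {
    local value=${1:-$(</proc/sys/kernel/tainted)}
    local flags="" bit
    for (( bit = 0; bit < ${#TAINT_LETTERS}; bit++ )); do
        (( (value >> bit) & 1 )) && flags+=${TAINT_LETTERS:bit:1}
    done
    echo "${flags:-none}"
}

# Usage:
#   taint_flags          # check the running kernel
#   taint_flags 12289    # POE: proprietary, out-of-tree, unsigned
```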

I was able to reproduce the issue using

stress-ng --class vm --all 1

though sometimes I needed to Ctrl+C and rerun it a couple of times before the panics started. Rebooting would also hang about half the time. I tried numa=off and an LTS kernel; neither worked. Once I switched the Nvidia driver to nvidia-open-dkms, I could no longer reproduce the issue, which makes me think that fixed it.
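Since the panics sometimes only showed up after interrupting and rerunning stress-ng, a loop with a per-round time limit saves doing that by hand. A bash sketch: the round count and duration are arbitrary defaults, each round is ended by stress-ng's --timeout option, and `repeat_vm_stress` is a made-up name. On an affected system this is meant to trigger a crash, so save your work first.

```shell
# Run the stress-ng invocation from the post repeatedly, each round
# limited to a fixed duration. Arguments: rounds, seconds per round.
repeat_vm_stress() {
    local rounds=${1:-5} secs=${2:-60} i
    for (( i = 1; i <= rounds; i++ )); do
        echo "round $i/$rounds"
        # stress-ng's own exit status is ignored: what we're looking for
        # is a kernel oops or hang, not a failing stressor.
        stress-ng --class vm --all 1 --timeout "${secs}s" || true
    done
}

# Usage: repeat_vm_stress 10 60
#   then look for oopses with: journalctl -k | grep -A5 'BUG:'
```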

#15 2024-03-06 13:40:26

~tfa
Member
From: Germany
Registered: 2021-11-03
Posts: 14

I hit the bug again today with numa=off enabled, so it's probably not the root cause.

Mär 06 14:33:29 discordiae kernel: BUG: unable to handle page fault for address: 00000000000109c1
Mär 06 14:33:29 discordiae kernel: #PF: supervisor read access in kernel mode
Mär 06 14:33:29 discordiae kernel: #PF: error_code(0x0000) - not-present page
Mär 06 14:33:29 discordiae kernel: PGD 0 P4D 0 
Mär 06 14:33:29 discordiae kernel: Oops: 0000 [#1] PREEMPT SMP NOPTI
Mär 06 14:33:29 discordiae kernel: CPU: 4 PID: 6424 Comm: systemd Tainted: P     U  W  OE      6.6.20-1-lts #1 6e375b09266a17eb8bcc1d35607d8a8f02f567ee
Mär 06 14:33:29 discordiae kernel: Hardware name: LENOVO 83BU/LNVNB161216, BIOS MBCN26WW 07/25/2023
Mär 06 14:33:29 discordiae kernel: RIP: 0010:rb_first+0xf/0x30
Mär 06 14:33:29 discordiae kernel: Code: 10 c3 cc cc cc cc 0f 1f 44 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 48 8b 07 48 85 c0 74 14 48 89 c2 <48> 8b 40 10 48 85 c0 75 f4 48 89 d0 c3 cc cc cc cc 31 d2 eb f4 66
Mär 06 14:33:29 discordiae kernel: RSP: 0018:ffffc9000d1d7cd8 EFLAGS: 00010206
Mär 06 14:33:29 discordiae kernel: RAX: 00000000000109b1 RBX: ffff888168e55600 RCX: 00000000810000b8
Mär 06 14:33:29 discordiae kernel: RDX: 00000000000109b1 RSI: 0000000000000000 RDI: ffff888162ef9e48
Mär 06 14:33:29 discordiae kernel: RBP: ffff888168e55200 R08: 0000000000000000 R09: 00000000810000b8
Mär 06 14:33:29 discordiae kernel: R10: ffff88810f1855d0 R11: ffffc9000d1d7cd8 R12: 0000000000000000
Mär 06 14:33:29 discordiae kernel: R13: ffff888162ef9e48 R14: 0000000000000002 R15: 0000000000000000
Mär 06 14:33:29 discordiae kernel: FS:  000079df3f14e880(0000) GS:ffff88888f700000(0000) knlGS:0000000000000000
Mär 06 14:33:29 discordiae kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mär 06 14:33:29 discordiae kernel: CR2: 00000000000109c1 CR3: 0000000139072000 CR4: 0000000000f50ee0
Mär 06 14:33:29 discordiae kernel: PKRU: 55555554
Mär 06 14:33:29 discordiae kernel: Call Trace:
Mär 06 14:33:29 discordiae kernel:  <TASK>
Mär 06 14:33:29 discordiae kernel:  ? __die+0x23/0x70
Mär 06 14:33:29 discordiae kernel:  ? page_fault_oops+0x171/0x4e0
Mär 06 14:33:29 discordiae kernel:  ? __slab_free+0xf1/0x380
Mär 06 14:33:29 discordiae kernel:  ? exc_page_fault+0x7f/0x180
Mär 06 14:33:29 discordiae kernel:  ? asm_exc_page_fault+0x26/0x30
Mär 06 14:33:29 discordiae kernel:  ? rb_first+0xf/0x30
Mär 06 14:33:29 discordiae kernel:  simple_xattrs_free+0x29/0x90
Mär 06 14:33:29 discordiae kernel:  kernfs_put.part.0+0x60/0x150
Mär 06 14:33:29 discordiae kernel:  evict+0xd4/0x1e0
Mär 06 14:33:29 discordiae kernel:  __dentry_kill+0xe6/0x190
Mär 06 14:33:29 discordiae kernel:  shrink_dentry_list+0x85/0x180
Mär 06 14:33:29 discordiae kernel:  shrink_dcache_parent+0xd0/0x120
Mär 06 14:33:29 discordiae kernel:  vfs_rmdir+0xb0/0x200
Mär 06 14:33:29 discordiae kernel:  do_rmdir+0x175/0x1b0
Mär 06 14:33:29 discordiae kernel:  __x64_sys_rmdir+0x42/0x70
Mär 06 14:33:29 discordiae kernel:  do_syscall_64+0x5d/0x90
Mär 06 14:33:29 discordiae kernel:  ? kmem_cache_free+0x22/0x3a0
Mär 06 14:33:29 discordiae kernel:  ? __call_rcu_common+0xf4/0x740
Mär 06 14:33:29 discordiae kernel:  ? syscall_exit_to_user_mode+0x2b/0x40
Mär 06 14:33:29 discordiae kernel:  ? do_syscall_64+0x6c/0x90
Mär 06 14:33:29 discordiae kernel:  ? do_syscall_64+0x6c/0x90
Mär 06 14:33:29 discordiae kernel:  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
Mär 06 14:33:29 discordiae kernel: RIP: 0033:0x79df3fb1977b
Mär 06 14:33:29 discordiae kernel: Code: f0 ff ff 73 01 c3 48 8b 0d b2 c5 0d 00 f7 d8 64 89 01 48 83 c8 ff c3 0f 1f 84 00 00 00 00 00 f3 0f 1e fa b8 54 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 05 c3 0f 1f 40 00 48 8b 15 81 c5 0d 00 f7 d8
Mär 06 14:33:29 discordiae kernel: RSP: 002b:00007fff7bcc0028 EFLAGS: 00000246 ORIG_RAX: 0000000000000054
Mär 06 14:33:29 discordiae kernel: RAX: ffffffffffffffda RBX: 0000587f26676510 RCX: 000079df3fb1977b
Mär 06 14:33:29 discordiae kernel: RDX: 0000000000000000 RSI: 0000000000000001 RDI: 0000587f26672660
Mär 06 14:33:29 discordiae kernel: RBP: 00007fff7bcc0080 R08: 000079df3f6aa90a R09: 0000000000000007
Mär 06 14:33:29 discordiae kernel: R10: 0000587f266cde50 R11: 0000000000000246 R12: 0000000000000001
Mär 06 14:33:29 discordiae kernel: R13: 0000000000000000 R14: 0000587f26672660 R15: 0000000000000000
Mär 06 14:33:29 discordiae kernel:  </TASK>
Mär 06 14:33:29 discordiae kernel: Modules linked in: udp_diag tcp_diag inet_diag rfcomm snd_seq_dummy snd_hrtimer snd_seq snd_seq_device snd_ctl_led snd_soc_skl_hda_dsp snd_soc_intel_hda_dsp_common snd_sof_probes snd_soc_hdac_hdmi snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio cmac algif_hash algif_skcipher af_alg snd_hda_codec_hdmi snd_soc_dmic snd_sof_pci_intel_tgl snd_sof_intel_hda_common soundwire_intel snd_sof_intel_hda_mlink soundwire_cadence snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp snd_sof snd_sof_utils snd_soc_hdac_hda snd_hda_ext_core snd_soc_acpi_intel_match snd_soc_acpi soundwire_generic_allocation soundwire_bus bnep uvcvideo videobuf2_vmalloc uvc videobuf2_memops btusb videobuf2_v4l2 btrtl videodev btintel btbcm btmtk videobuf2_common mc bluetooth ecdh_generic hid_sensor_als hid_sensor_trigger industrialio_triggered_buffer kfifo_buf hid_sensor_iio_common hid_sensor_custom industrialio hid_sensor_hub intel_ishtp_hid intel_uncore_frequency intel_uncore_frequency_common intel_tcc_cooling x86_pkg_temp_thermal
Mär 06 14:33:29 discordiae kernel:  intel_powerclamp coretemp kvm_intel mousedev joydev kvm irqbypass crct10dif_pclmul crc32_pclmul polyval_clmulni polyval_generic gf128mul iwlmvm ghash_clmulni_intel i915 sha512_ssse3 sha256_ssse3 mac80211 sha1_ssse3 snd_hda_scodec_tas2781_i2c aesni_intel snd_soc_tas2781_fmwlib snd_hda_intel snd_soc_tas2781_comlib drm_buddy snd_intel_dspcfg crypto_simd i2c_algo_bit cryptd snd_intel_sdw_acpi libarc4 ttm snd_soc_core snd_hda_codec drm_display_helper iTCO_wdt nvidia_drm(POE) snd_compress rapl nvidia_modeset(POE) mei_pxp hid_multitouch snd_hda_core cec intel_pmc_bxt ac97_bus iwlwifi mei_hdcp iTCO_vendor_support intel_gtt snd_pcm_dmaengine processor_thermal_device_pci ideapad_laptop pmt_telemetry snd_hwdep vfat intel_cstate pmt_class intel_rapl_msr platform_profile wmi_bmof nvidia_wmi_ec_backlight spi_nor processor_thermal_device cfg80211 fat intel_uncore mei_me video ucsi_acpi snd_pcm processor_thermal_rfim typec_ucsi pcspkr i2c_i801 mtd intel_ish_ipc processor_thermal_mbox typec snd_timer thunderbolt mei
Mär 06 14:33:29 discordiae kernel:  intel_lpss_pci i2c_smbus processor_thermal_rapl rfkill intel_lpss intel_ishtp roles idma64 snd intel_vsec intel_rapl_common soundcore int3403_thermal crc8 i2c_hid_acpi int340x_thermal_zone i2c_hid int3400_thermal intel_hid acpi_thermal_rel wmi sparse_keymap acpi_pad acpi_tad mac_hid nvidia_uvm(POE) nvidia(POE) i2c_dev fuse dm_mod crypto_user acpi_call(OE) loop nfnetlink bpf_preload ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 usbhid serio_raw rtsx_pci_sdmmc atkbd mmc_core libps2 nvme vivaldi_fmap xhci_pci spi_intel_pci rtsx_pci nvme_core spi_intel crc32c_intel xhci_pci_renesas i8042 nvme_common serio
Mär 06 14:33:29 discordiae kernel: CR2: 00000000000109c1
Mär 06 14:33:29 discordiae kernel: ---[ end trace 0000000000000000 ]---
Mär 06 14:33:29 discordiae kernel: RIP: 0010:rb_first+0xf/0x30
Mär 06 14:33:29 discordiae kernel: Code: 10 c3 cc cc cc cc 0f 1f 44 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 48 8b 07 48 85 c0 74 14 48 89 c2 <48> 8b 40 10 48 85 c0 75 f4 48 89 d0 c3 cc cc cc cc 31 d2 eb f4 66
Mär 06 14:33:29 discordiae kernel: RSP: 0018:ffffc9000d1d7cd8 EFLAGS: 00010206
Mär 06 14:33:29 discordiae kernel: RAX: 00000000000109b1 RBX: ffff888168e55600 RCX: 00000000810000b8
Mär 06 14:33:29 discordiae kernel: RDX: 00000000000109b1 RSI: 0000000000000000 RDI: ffff888162ef9e48
Mär 06 14:33:29 discordiae kernel: RBP: ffff888168e55200 R08: 0000000000000000 R09: 00000000810000b8
Mär 06 14:33:29 discordiae kernel: R10: ffff88810f1855d0 R11: ffffc9000d1d7cd8 R12: 0000000000000000
Mär 06 14:33:29 discordiae kernel: R13: ffff888162ef9e48 R14: 0000000000000002 R15: 0000000000000000
Mär 06 14:33:29 discordiae kernel: FS:  000079df3f14e880(0000) GS:ffff88888f700000(0000) knlGS:0000000000000000
Mär 06 14:33:29 discordiae kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mär 06 14:33:29 discordiae kernel: CR2: 00000000000109c1 CR3: 0000000139072000 CR4: 0000000000f50ee0
Mär 06 14:33:29 discordiae kernel: PKRU: 55555554
Mär 06 14:33:29 discordiae kernel: note: systemd[6424] exited with irqs disabled

The system runs stably for hours, through switches between suspend and idle and multiple monitor connects and disconnects; the issue only arises at shutdown or reboot. I am on nvidia 550 as well (nvidia-dkms-tkg), in a Wayland session on a hybrid GPU setup.

#16 2024-03-06 14:08:45

seth
Member
Registered: 2012-09-03
Posts: 59,043

Re: [SOLVED] Repeated kernel problems/freezes since 6.7.6 (Nvidia 550)

With the recent hiccups in the nvidia blob, you could also try reverting to the 535xx drivers (dkms + utils) from the https://wiki.archlinux.org/title/Arch_Linux_Archive hmm

#17 2024-03-06 14:59:21

torvic9
Member
Registered: 2022-08-26
Posts: 8

Re: [SOLVED] Repeated kernel problems/freezes since 6.7.6 (Nvidia 550)

Linux 6.7.9 has just been released, which should bring the fix for this issue.

I can also confirm that I can only reproduce the issue on my old nVidia-equipped system, but not on Intel or amdgpu graphics.

EDIT: instead of using 'numa=off', it might be better to use 'numa_balancing=off'.

Last edited by torvic9 (2024-03-06 15:40:28)
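To check whether the NUMA-balancing workaround is actually in effect after a reboot, a minimal Python sketch can read the standard sysctl file /proc/sys/kernel/numa_balancing, where 0 means disabled and any non-zero value means a balancing mode is active. Note that the kernel's parameter documentation lists `enable` and `disable` as the accepted values of the `numa_balancing=` boot option, so `numa_balancing=disable` is likely the spelling the kernel actually accepts; `sysctl kernel.numa_balancing=0` switches it off at runtime until the next boot. The function name `numa_balancing_enabled` is hypothetical.

```python
from pathlib import Path
from typing import Optional

# Standard sysctl file behind `sysctl kernel.numa_balancing`.
# 0 means disabled; any non-zero value means some balancing mode is enabled.
NUMA_BALANCING_SYSCTL = Path("/proc/sys/kernel/numa_balancing")


def numa_balancing_enabled(path: Path = NUMA_BALANCING_SYSCTL) -> Optional[bool]:
    """Return the current NUMA balancing state, or None if the sysctl is
    absent or unreadable (e.g. a kernel built without CONFIG_NUMA_BALANCING)."""
    try:
        value = int(path.read_text().strip())
    except OSError:
        return None
    return value != 0


if __name__ == "__main__":
    state = numa_balancing_enabled()
    if state is None:
        print("kernel.numa_balancing is not available on this kernel")
    else:
        print("NUMA balancing is", "enabled" if state else "disabled")
```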

#18 2024-03-06 16:31:00

bbaa
Member
Registered: 2024-03-04
Posts: 3

Re: [SOLVED] Repeated kernel problems/freezes since 6.7.6 (Nvidia 550)

Some kdump analysis shows that the kernel panic seems to be related to files in /sys/fs/cgroup.
I can confirm that running a getfattr loop over the files in /sys/fs/cgroup does not cause a panic/oops after uninstalling the nvidia driver.

Type "apropos word" to search for commands related to "word"...

      KERNEL: vmlinux  [TAINTED]                
    DUMPFILE: crashdump-2024-03-05-14-01-18  [PARTIAL DUMP]
        CPUS: 16
        DATE: Tue Mar  5 13:59:43 CST 2024
      UPTIME: 00:27:26
LOAD AVERAGE: 0.32, 0.09, 0.07
       TASKS: 923
    NODENAME: archhometx
     RELEASE: 6.7.8-zen1-1-zen
     VERSION: #1 ZEN SMP PREEMPT_DYNAMIC Mon, 04 Mar 2024 15:22:56 +0000
     MACHINE: x86_64  (3193 Mhz)
      MEMORY: 31.2 GB
       PANIC: "Oops: 0000 [#1] PREEMPT SMP NOPTI" (check log for details)
         PID: 1091
     COMMAND: "systemd"
        TASK: ffff93e8e5180000  [THREAD_INFO: ffff93e8e5180000]
         CPU: 10
       STATE: TASK_RUNNING (PANIC)

crash> bt -FFsx
PID: 1091     TASK: ffff93e8e5180000  CPU: 10   COMMAND: "systemd"
 #0 [ffffb2400c7fba38] machine_kexec+0x1d0 at ffffffff92a907a0
    ffffb2400c7fba40: 00007f41e1776880 0000000000000000 
    ffffb2400c7fba50: 0000000042005000 ffff93e802005000 
    ffffb2400c7fba60: 0000000042004000 0000000000000000 
    ffffb2400c7fba70: a622253d73297b00 ffffb2400c7fba98 
    ffffb2400c7fba80: ffffb2400c7fbc28 0000000000000046 
    ffffb2400c7fba90: __crash_kexec+123 
 #1 [ffffb2400c7fba90] __crash_kexec+0x7b at ffffffff92bf712b
    ffffb2400c7fba98: [ffff93e901c944d8:dentry] [ffff93e901c94b20:dentry] 
    ffffb2400c7fbaa8: [ffff93e8c01aa900:kmalloc-256] [ffff93e965473628:Acpi-State] 
    ffffb2400c7fbab8: [ffff93e8de22d300:kernfs_node_cache] [ffff93e965472200:kernfs_node_cache] 
    ffffb2400c7fbac8: 0000000000000000 [ffff93e8f0899cf0:kmalloc-16] 
    ffffb2400c7fbad8: 0000000000000000 0000000000000001 
    ffffb2400c7fbae8: 0000000000000008 00000000810000c8 
    ffffb2400c7fbaf8: 0000000000000008 0000000000000000 
    ffffb2400c7fbb08: [ffff93e965473628:Acpi-State] ffffffffffffffff 
    ffffb2400c7fbb18: rb_first+15      0000000000000010 
    ffffb2400c7fbb28: 0000000000010202 ffffb2400c7fbcd0 
    ffffb2400c7fbb38: 0000000000000018 a622253d73297b00 
    ffffb2400c7fbb48: 0000000000000009 crash_kexec+44   
 #2 [ffffb2400c7fbb50] crash_kexec+0x2c at ffffffff92bf9cac
    ffffb2400c7fbb58: oops_end+212     
 #3 [ffffb2400c7fbb58] oops_end+0xd4 at ffffffff92a3ecd4
    ffffb2400c7fbb60: 0000000000000009 ffffb2400c7fbc28 
    ffffb2400c7fbb70: 0000000000000018 page_fault_oops+405 
 #4 [ffffb2400c7fbb78] page_fault_oops+0x195 at ffffffff92aa4e25
    ffffb2400c7fbb80: 0000000000000000 0000000000000000 
    ffffb2400c7fbb90: 0000000000000000 0000000000000046 
    ffffb2400c7fbba0: 0000000000000000 0000000000000000 
    ffffb2400c7fbbb0: 0000000000000000 0000000000000000 
    ffffb2400c7fbbc0: a622253d73297b00 ffffb2400c7fbc28 
    ffffb2400c7fbbd0: 0000000000000018 0000000000000000 
    ffffb2400c7fbbe0: 0000000000000000 0000000000000000 
    ffffb2400c7fbbf0: 0000000000000000 exc_page_fault+127 
 #5 [ffffb2400c7fbbf8] exc_page_fault+0x7f at ffffffff939e352f
    ffffb2400c7fbc00: 0000000000000000 0000000000000000 
    ffffb2400c7fbc10: 0000000000000000 0000000000000000 
    ffffb2400c7fbc20: asm_exc_page_fault+38 
 #6 [ffffb2400c7fbc20] asm_exc_page_fault+0x26 at ffffffff93c012a6
    [exception RIP: rb_first+15]
    RIP: ffffffff939c69cf  RSP: ffffb2400c7fbcd0  RFLAGS: 00010202
    RAX: 0000000000000008  RBX: ffff93e965472200  RCX: 00000000810000c8
    RDX: 0000000000000008  RSI: 0000000000000000  RDI: ffff93e965473628
    RBP: ffff93e8de22d300   R8: 0000000000000001   R9: 0000000000000000
    R10: ffff93e8f0899cf0  R11: 0000000000000000  R12: ffff93e965473628
    R13: ffff93e8c01aa900  R14: ffff93e901c94b20  R15: ffff93e901c944d8
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
    ffffb2400c7fbc28: [ffff93e901c944d8:dentry] [ffff93e901c94b20:dentry] 
    ffffb2400c7fbc38: [ffff93e8c01aa900:kmalloc-256] [ffff93e965473628:Acpi-State] 
    ffffb2400c7fbc48: [ffff93e8de22d300:kernfs_node_cache] [ffff93e965472200:kernfs_node_cache] 
    ffffb2400c7fbc58: 0000000000000000 [ffff93e8f0899cf0:kmalloc-16] 
    ffffb2400c7fbc68: 0000000000000000 0000000000000001 
    ffffb2400c7fbc78: 0000000000000008 00000000810000c8 
    ffffb2400c7fbc88: 0000000000000008 0000000000000000 
    ffffb2400c7fbc98: [ffff93e965473628:Acpi-State] ffffffffffffffff 
    ffffb2400c7fbca8: rb_first+15      0000000000000010 
    ffffb2400c7fbcb8: 0000000000010202 ffffb2400c7fbcd0 
    ffffb2400c7fbcc8: 0000000000000018 simple_xattrs_free+126 
 #7 [ffffb2400c7fbcd0] simple_xattrs_free+0x7e at ffffffff92f1906e
    ffffb2400c7fbcd8: [ffff93e965472200:kernfs_node_cache] [ffff93e8de22d300:kernfs_node_cache] 
    ffffb2400c7fbce8: [ffff93e8c01aa910:kmalloc-256] [ffff93e8c01aa900:kmalloc-256] 
    ffffb2400c7fbcf8: kernfs_put+96    
 #8 [ffffb2400c7fbcf8] kernfs_put+0x60 at ffffffff92fb9c80
    ffffb2400c7fbd00: [ffff93e91bc85440:inode_cache] [ffff93e91bc85558:inode_cache] 
    ffffb2400c7fbd10: [ffff93e91bc85548:inode_cache] kernfs_sops      
    ffffb2400c7fbd20: evict+216        
 #9 [ffffb2400c7fbd20] evict+0xd8 at ffffffff92eff008
    ffffb2400c7fbd28: [ffff93e901c97800:dentry] [ffff93e901c94a80:dentry] 
    ffffb2400c7fbd38: [ffff93e901c97858:dentry] [ffff93e901c97858:dentry] 
    ffffb2400c7fbd48: __dentry_kill+234 
#10 [ffffb2400c7fbd48] __dentry_kill+0xea at ffffffff92ef83da
    ffffb2400c7fbd50: [ffff93e901c94a80:dentry] ffffb2400c7fbdd0 
    ffffb2400c7fbd60: [ffff93e901c97800:dentry] shrink_dentry_list+142 
#11 [ffffb2400c7fbd68] shrink_dentry_list+0x8e at ffffffff92efa9ce
    ffffb2400c7fbd70: 0000000000002466 [ffff93e901c94a80:dentry] 
    ffffb2400c7fbd80: [ffff93e901c94480:dentry] [ffff93e901c94b20:dentry] 
    ffffb2400c7fbd90: [ffff93e901c94b20:dentry] shrink_dcache_parent+658 
#12 [ffffb2400c7fbd98] shrink_dcache_parent+0x292 at ffffffff92efb022
    ffffb2400c7fbda0: [ffff93e901c94a80:dentry] [ffff93e901c94b20:dentry] 
    ffffb2400c7fbdb0: [ffff93e901c94ad8:dentry] 0000000000000000 
    ffffb2400c7fbdc0: [ffff93e901c94a80:dentry] 000000000000000f 
    ffffb2400c7fbdd0: [ffff93e901c94500:dentry] [ffff93e901c97f40:dentry] 
    ffffb2400c7fbde0: a622253d73297b00 [ffff93e901c94a80:dentry] 
    ffffb2400c7fbdf0: 0000000000000000 [ffff93e91bf19440:inode_cache] 
    ffffb2400c7fbe00: [ffff93e901c94ad8:dentry] [ffff93e901c94a80:dentry] 
    ffffb2400c7fbe10: 0000000000000000 vfs_rmdir+240    
#13 [ffffb2400c7fbe18] vfs_rmdir+0xf0 at ffffffff92ee17f0
    ffffb2400c7fbe20: [ffff93e8c0cc5000:names_cache] 0000000000000000 
    ffffb2400c7fbe30: 00000000ffffff9c 0000000000000002 
    ffffb2400c7fbe40: do_rmdir+504     
#14 [ffffb2400c7fbe40] do_rmdir+0x1f8 at ffffffff92eea778
    ffffb2400c7fbe48: 00000000c0cc5000 [ffff93e8c17e8b60:mnt_cache] 
    ffffb2400c7fbe58: [ffff93e92a3326c0:dentry] 00000017a30c23ea 
    ffffb2400c7fbe68: [ffff93e8c0cc506a:names_cache] a622253d73297b00 
    ffffb2400c7fbe78: 00006405f632e830 ffffb2400c7fbf48 
    ffffb2400c7fbe88: 0000000000000000 0000000000000000 
    ffffb2400c7fbe98: 0000000000000000 __x64_sys_rmdir+66 
#15 [ffffb2400c7fbea0] __x64_sys_rmdir+0x42 at ffffffff92eea862
    ffffb2400c7fbea8: ffffb2400c7fbf58 do_syscall_64+100 
#16 [ffffb2400c7fbeb0] do_syscall_64+0x64 at ffffffff939dc464
    ffffb2400c7fbeb8: 000002a800000028 00000000ffffffea 
    ffffb2400c7fbec8: a622253d73297b00 ffffb2400c7fbf58 
    ffffb2400c7fbed8: ffffb2400c7fbf18 0000000000000000 
    ffffb2400c7fbee8: ffffb2400c7fbf58 ffffb2400c7fbf48 
    ffffb2400c7fbef8: srso_alias_return_thunk+5 syscall_exit_to_user_mode+43 
    ffffb2400c7fbf08: srso_alias_return_thunk+5 do_syscall_64+112 
    ffffb2400c7fbf18: srso_alias_return_thunk+5 syscall_exit_to_user_mode+43 
    ffffb2400c7fbf28: srso_alias_return_thunk+5 do_syscall_64+112 
    ffffb2400c7fbf38: 0000000000000000 0000000000000000 
    ffffb2400c7fbf48: 0000000000000000 entry_SYSCALL_64_after_hwframe+110 
#17 [ffffb2400c7fbf50] entry_SYSCALL_64_after_hwframe+0x6e at ffffffff93c0012a
    RIP: 00007f41e211977b  RSP: 00007ffc46082138  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: 00006405f644b330  RCX: 00007f41e211977b
    RDX: 0000000000000000  RSI: 0000000000000001  RDI: 00006405f632e830
    RBP: 00007ffc46082190   R8: 00007f41e1caa90a   R9: 0000000000000007
    R10: 00006405f6325d80  R11: 0000000000000246  R12: 0000000000000001
    R13: 0000000000000000  R14: 00006405f632e830  R15: 0000000000000000
    ORIG_RAX: 0000000000000054  CS: 0033  SS: 002b

crash> struct dentry ffff93e901c97800
struct dentry {
  d_flags = 32772,
  d_seq = {
    seqcount = {
      sequence = 6
    }
  },
  d_hash = {
    next = 0x0,
    pprev = 0x0
  },
  d_parent = 0xffff93e901c94a80,
  d_name = {
    {
      {
        hash = 941515971,
        len = 11
      },
      hash_len = 48186156227
    },
    name = 0xffff93e901c97838 "memory.high"
  },
  d_inode = 0x0,
  d_iname = "memory.high\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000",
  d_lockref = {
    {
      lock_count = 18446743523953737728,
      {
        lock = {
          {
            rlock = {
              raw_lock = {
                {
                  val = {
                    counter = 0
                  },
                  {
                    locked = 0 '\000',
                    pending = 0 '\000'
                  },
                  {
                    locked_pending = 0,
                    tail = 0
                  }
                }
              }
            }
          }
        },
        count = -128
      }
    }
  },
  d_op = 0xffffffff93e90980 <kernfs_dops>,
  d_sb = 0xffff93e8c72cb000,
  d_time = 41,
  d_fsdata = 0x0,
  {
    d_lru = {
      next = 0xffff93e901c97880,
      prev = 0xffff93e901c97880
    },
    d_wait = 0xffff93e901c97880
  },
  d_child = {
    next = 0xffff93e901c97f50,
    prev = 0xffff93e901c94b20
  },
  d_subdirs = {
    next = 0xffff93e901c978a0,
    prev = 0xffff93e901c978a0
  },
  d_u = {
    d_alias = {
      next = 0x0,
      pprev = 0x0
    },
    d_in_lookup_hash = {
      next = 0x0,
      pprev = 0x0
    },
    d_rcu = {
      next = 0x0,
      func = 0x0
    }
  }
}
      KERNEL: vmlinux  [TAINTED]                
    DUMPFILE: crashdump-2024-03-06-13-02-28  [PARTIAL DUMP]
        CPUS: 16
        DATE: Wed Mar  6 13:00:52 CST 2024
      UPTIME: 00:00:09
LOAD AVERAGE: 2.96, 0.61, 0.20
       TASKS: 464
    NODENAME: archhometx
     RELEASE: 6.7.8-zen1-1-zen
     VERSION: #1 ZEN SMP PREEMPT_DYNAMIC Mon, 04 Mar 2024 15:22:56 +0000
     MACHINE: x86_64  (3193 Mhz)
      MEMORY: 31.2 GB
       PANIC: "Oops: 0000 [#1] PREEMPT SMP NOPTI" (check log for details)
         PID: 1
     COMMAND: "systemd"
        TASK: ffff9834002e58c0  [THREAD_INFO: ffff9834002e58c0]
         CPU: 5
       STATE: TASK_RUNNING (PANIC)
crash> bt -FFsx
PID: 1        TASK: ffff9834002e58c0  CPU: 5    COMMAND: "systemd"
 #0 [ffffb330c0097908] machine_kexec+0x1d0 at ffffffff948907a0
    ffffb330c0097910: 00007617dfb3e880 0000000000000000 
    ffffb330c0097920: 0000000045005000 ffff983345005000 
    ffffb330c0097930: 0000000045004000 0000000000000000 
    ffffb330c0097940: 70b7f6ff9e35e000 ffffb330c0097968 
    ffffb330c0097950: ffffb330c0097af8 0000000000000046 
    ffffb330c0097960: __crash_kexec+123 
 #1 [ffffb330c0097960] __crash_kexec+0x7b at ffffffff949f712b
    ffffb330c0097968: [ffff98340477a600:filp] 0000000000000000 
    ffffb330c0097978: [ffff98340155b220:Acpi-State] 0000000000000000 
    ffffb330c0097988: .LC4+107         000000000000015b 
    ffffb330c0097998: 0000000000000000 .LC4+107         
    ffffb330c00979a8: 0000000000000000 0000000000000000 
    ffffb330c00979b8: 0000000000000000 0000000000000000 
    ffffb330c00979c8: 0000000000000000 .LC4+107         
    ffffb330c00979d8: [ffff98340155b220:Acpi-State] ffffffffffffffff 
    ffffb330c00979e8: simple_xattr_get+49 0000000000000010 
    ffffb330c00979f8: 0000000000010202 ffffb330c0097ba8 
    ffffb330c0097a08: 0000000000000018 70b7f6ff9e35e000 
    ffffb330c0097a18: 0000000000000009 crash_kexec+44   
 #2 [ffffb330c0097a20] crash_kexec+0x2c at ffffffff949f9cac
    ffffb330c0097a28: oops_end+212     
 #3 [ffffb330c0097a28] oops_end+0xd4 at ffffffff9483ecd4
    ffffb330c0097a30: 0000000000000009 ffffb330c0097af8 
    ffffb330c0097a40: 0000000000000173 page_fault_oops+405 
 #4 [ffffb330c0097a48] page_fault_oops+0x195 at ffffffff948a4e25
    ffffb330c0097a50: 0000000000003fe0 0000000000000000 
    ffffb330c0097a60: 0000000000000001 0000000000000000 
    ffffb330c0097a70: 0000000000000000 0000000000000000 
    ffffb330c0097a80: 0000000000000000 0000000000000000 
    ffffb330c0097a90: 70b7f6ff9e35e000 ffffb330c0097af8 
    ffffb330c0097aa0: 0000000000000173 0000000000000000 
    ffffb330c0097ab0: 0000000000000000 0000000000000000 
    ffffb330c0097ac0: 0000000000000000 exc_page_fault+127 
 #5 [ffffb330c0097ac8] exc_page_fault+0x7f at ffffffff957e352f
    ffffb330c0097ad0: 0000000000000000 0000000000000000 
    ffffb330c0097ae0: 0000000000000000 0000000000000000 
    ffffb330c0097af0: asm_exc_page_fault+38 
 #6 [ffffb330c0097af0] asm_exc_page_fault+0x26 at ffffffff95a012a6
    [exception RIP: simple_xattr_get+49]
    RIP: ffffffff94d18b91  RSP: ffffb330c0097ba8  RFLAGS: 00010202
    RAX: 0000000000000000  RBX: 000000000000015b  RCX: 0000000000000000
    RDX: 0000000000000000  RSI: ffffffff961ba97a  RDI: ffff98340155b220
    RBP: ffffffff961ba97a   R8: 0000000000000000   R9: 0000000000000000
    R10: ffffffff961ba97a  R11: 0000000000000000  R12: 0000000000000000
    R13: ffff98340155b220  R14: 0000000000000000  R15: ffff98340477a600
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
    ffffb330c0097af8: [ffff98340477a600:filp] 0000000000000000 
    ffffb330c0097b08: [ffff98340155b220:Acpi-State] 0000000000000000 
    ffffb330c0097b18: .LC4+107         000000000000015b 
    ffffb330c0097b28: 0000000000000000 .LC4+107         
    ffffb330c0097b38: 0000000000000000 0000000000000000 
    ffffb330c0097b48: 0000000000000000 0000000000000000 
    ffffb330c0097b58: 0000000000000000 .LC4+107         
    ffffb330c0097b68: [ffff98340155b220:Acpi-State] ffffffffffffffff 
    ffffb330c0097b78: simple_xattr_get+49 0000000000000010 
    ffffb330c0097b88: 0000000000010202 ffffb330c0097ba8 
    ffffb330c0097b98: 0000000000000018 simple_xattr_get+41 
    ffffb330c0097ba8: [ffff983404d5ab08:inode_cache] [ffff983402651c80:dentry] 
    ffffb330c0097bb8: 0000000000000000 0000000000000000 
    ffffb330c0097bc8: .LC4+107         __vfs_getxattr+130 
 #7 [ffffb330c0097bd0] __vfs_getxattr+0x82 at ffffffff94d15702
    ffffb330c0097bd8: .LC4+116         capability_hooks+280 
    ffffb330c0097be8: [ffff983402651c80:dentry] nop_mnt_idmap    
    ffffb330c0097bf8: ffffb330c0097c48 ffffb330c0097d70 
    ffffb330c0097c08: cap_inode_need_killpriv+30 
 #8 [ffffb330c0097c08] cap_inode_need_killpriv+0x1e at ffffffff94dfdbbe
    ffffb330c0097c10: security_inode_need_killpriv+48 
 #9 [ffffb330c0097c10] security_inode_need_killpriv+0x30 at ffffffff94e02f70
    ffffb330c0097c18: 0000000000000000 [ffff983402651c80:dentry] 
    ffffb330c0097c28: dentry_needs_remove_privs+50 
#10 [ffffb330c0097c28] dentry_needs_remove_privs+0x32 at ffffffff94d02292
    ffffb330c0097c30: nop_mnt_idmap    [ffff983402651c80:dentry] 
    ffffb330c0097c40: do_truncate+112  
#11 [ffffb330c0097c40] do_truncate+0x70 at ffffffff94cc3fd0
    ffffb330c0097c48: 000000000000a068 0000000000000000 
    ffffb330c0097c58: 0000000000000000 0000000000000000 
    ffffb330c0097c68: 0000000000000000 0000000000000000 
    ffffb330c0097c78: 0000000000000000 0000000000000000 
    ffffb330c0097c88: 0000000000000000 [ffff98340477a600:filp] 
    ffffb330c0097c98: 70b7f6ff9e35e000 nop_mnt_idmap    
    ffffb330c0097ca8: [ffff983404d5ab08:inode_cache] 0000000000008241 
    ffffb330c0097cb8: 0000000000000000 path_openat+4140 
#12 [ffffb330c0097cc0] path_openat+0x102c at ffffffff94ce74dc
    ffffb330c0097cc8: ffff983400000002 ffffffff00000040 
    ffffb330c0097cd8: __entry_text_end+1056566 ffff983400008241 
    ffffb330c0097ce8: [ffff983402651800:dentry] [ffff9834002e58c0:task_struct] 
    ffffb330c0097cf8: [ffff983404d5b018:inode_cache] 0000004181b67d70 
    ffffb330c0097d08: [ffff983402651800:dentry] ffffb330c0097d10 
    ffffb330c0097d18: 0000000000000000 ffffb330c0097d20 
    ffffb330c0097d28: ffffb330c0097d20 70b7f6ff9e35e000 
    ffffb330c0097d38: 0000000000000021 ffffb330c0097e80 
    ffffb330c0097d48: ffffb330c0097d70 0000000000000001 
    ffffb330c0097d58: ffffb330c0097e9c 0000000000000000 
    ffffb330c0097d68: do_filp_open+179 
#13 [ffffb330c0097d68] do_filp_open+0xb3 at ffffffff94ce9e83
    ffffb330c0097d70: [ffff983404220a20:mnt_cache] [ffff983402651c80:dentry] 
    ffffb330c0097d80: 000000164a71cdd5 [ffff98340843104d:names_cache] 
    ffffb330c0097d90: [ffff983414fb2520:mnt_cache] [ffff98340785ac00:dentry] 
    ffffb330c0097da0: [ffff983404d5ab08:inode_cache] 0000000200000301 
    ffffb330c0097db0: 0000000000000000 00001bd400002096 
    ffffb330c0097dc0: 0000000000000000 0000000000000000 
    ffffb330c0097dd0: ffffb330c0097dd8 0000000000000000 
    ffffb330c0097de0: 0000000000000000 0000000000000000 
    ffffb330c0097df0: 0000000000000000 0000000000000000 
    ffffb330c0097e00: 0000000000000000 0000000000000000 
    ffffb330c0097e10: 0000000000000000 0000000000000000 
    ffffb330c0097e20: 0000000000000000 0000000000000000 
    ffffb330c0097e30: 0000000000000000 [ffff983408431000:names_cache] 
    ffffb330c0097e40: 0000000000000000 ffffff9c00000002 
    ffffb330c0097e50: 000041ed00000000 70b7f6ff9e35e000 
    ffffb330c0097e60: 0000000000000021 [ffff983408431000:names_cache] 
    ffffb330c0097e70: 0000000000000000 0000000000000000 
    ffffb330c0097e80: 00000000ffffff9c __x64_sys_openat+469 
#14 [ffffb330c0097e88] __x64_sys_openat+0x1d5 at ffffffff94cc1955
    ffffb330c0097e90: 0000000000000000 0000824195b04110 
    ffffb330c0097ea0: 00000002000081b6 0000000100000300 
    ffffb330c0097eb0: 70b7f6ff9e35e000 ffffb330c0097f58 
    ffffb330c0097ec0: ffffb330c0097f48 0000000000000000 
    ffffb330c0097ed0: do_syscall_64+100 
#15 [ffffb330c0097ed0] do_syscall_64+0x64 at ffffffff957dc464
    ffffb330c0097ed8: 0000000000000000 0000000000000000 
    ffffb330c0097ee8: srso_alias_return_thunk+5 __x64_sys_name_to_handle_at+188 
    ffffb330c0097ef8: [ffff983404220a20:mnt_cache] [ffff983402651800:dentry] 
    ffffb330c0097f08: ffffb330c0097f58 ffffb330c0097f48 
    ffffb330c0097f18: srso_alias_return_thunk+5 syscall_exit_to_user_mode+43 
    ffffb330c0097f28: srso_alias_return_thunk+5 do_syscall_64+112 
    ffffb330c0097f38: 0000000000000000 0000000000000000 
    ffffb330c0097f48: 0000000000000000 entry_SYSCALL_64_after_hwframe+110 
#16 [ffffb330c0097f50] entry_SYSCALL_64_after_hwframe+0x6e at ffffffff95a0012a
    RIP: 00007617dff18d42  RSP: 00007ffcba84f720  RFLAGS: 00000202
    RAX: ffffffffffffffda  RBX: 0000000000080241  RCX: 00007617dff18d42
    RDX: 0000000000080241  RSI: 00005ac64727d7b0  RDI: 00000000ffffff9c
    RBP: 00005ac64727d7b0   R8: 0000000000000004   R9: 0000000000000001
    R10: 00000000000001b6  R11: 0000000000000202  R12: 00007617e0323ec1
    R13: 00007617e0323ec1  R14: 0000000000000001  R15: 0000000000000000
    ORIG_RAX: 0000000000000101  CS: 0033  SS: 002b
crash> struct dentry ffff983402651c80
struct dentry {
  d_flags = 4718596,
  d_seq = {
    seqcount = {
      sequence = 2
    }
  },
  d_hash = {
    next = 0x0,
    pprev = 0xffff983b3cf4e398
  },
  d_parent = 0xffff983402651800,
  d_name = {
    {
      {
        hash = 1248972245,
        len = 22
      },
      hash_len = 95738252757
    },
    name = 0xffff983402651cb8 "cgroup.subtree_control"
  },
  d_inode = 0xffff983404d5ab08,
  d_iname = "cgroup.subtree_control\000\000\000\000\000\000\000\000\000",
  d_lockref = {
    {
      lock_count = 8589934592,
      {
        lock = {
          {
            rlock = {
              raw_lock = {
                {
                  val = {
                    counter = 0
                  },
                  {
                    locked = 0 '\000',
                    pending = 0 '\000'
                  },
                  {
                    locked_pending = 0,
                    tail = 0
                  }
                }
              }
            }
          }
        },
        count = 2
      }
    }
  },
  d_op = 0xffffffff95c90980 <kernfs_dops>,
  d_sb = 0xffff9834091ad000,
  d_time = 41,
  d_fsdata = 0x0,
  {
    d_lru = {
      next = 0xffff983402650b00,
      prev = 0xffff983402650980
    },
    d_wait = 0xffff983402650b00
  },
  d_child = {
    next = 0xffff983402650990,
    prev = 0xffff983402650b10
  },
  d_subdirs = {
    next = 0xffff983402651d20,
    prev = 0xffff983402651d20
  },
  d_u = {
    d_alias = {
      next = 0x0,
      pprev = 0xffff983404d5ac40
    },
    d_in_lookup_hash = {
      next = 0x0,
      pprev = 0xffff983404d5ac40
    },
    d_rcu = {
      next = 0x0,
      func = 0xffff983404d5ac40
    }
  }
}
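A minimal sketch of the kind of getfattr loop described in this post, written in Python with only the standard library (`os.walk`, `os.listxattr`, `os.getxattr`). It reads every extended attribute on every file and directory under /sys/fs/cgroup, and additionally asks for `security.capability`, the attribute that `cap_inode_need_killpriv()` looks up (that function appears in the second backtrace above). The helper name `read_all_xattrs` and the default pass count are assumptions for illustration. On a system still running the affected 550 driver this may trigger the same oops, so only run it where a crash is acceptable.

```python
import os
import sys
from typing import Iterator, Tuple

# cap_inode_need_killpriv() queries this name (see the second backtrace),
# so ask for it explicitly even when listxattr() does not report it.
EXTRA_NAMES = ("security.capability",)


def read_all_xattrs(root: str = "/sys/fs/cgroup") -> Iterator[Tuple[str, str, bytes]]:
    """Yield (path, attribute, value) for every readable xattr under root.

    Roughly what a shell loop of `getfattr -d -m - FILE` does. Errors such as
    ENODATA (attribute not set) or ENOENT (cgroup removed meanwhile) are skipped.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        # Visit the directory itself plus every file entry inside it.
        paths = [dirpath] + [os.path.join(dirpath, f) for f in filenames]
        for path in paths:
            try:
                names = set(os.listxattr(path, follow_symlinks=False))
            except OSError:
                continue
            names.update(EXTRA_NAMES)
            for attr in sorted(names):
                try:
                    value = os.getxattr(path, attr, follow_symlinks=False)
                except OSError:
                    continue
                yield path, attr, value


if __name__ == "__main__":
    # Hypothetical pass count; the post does not say how long the loop ran.
    passes = int(sys.argv[1]) if len(sys.argv) > 1 else 3
    for i in range(passes):
        count = sum(1 for _ in read_all_xattrs())
        print(f"pass {i + 1}: read {count} attribute(s)")
```

Passing a larger count as the first argument repeats the walk, which is closer to a stress loop than a single pass.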

#19 2024-03-06 21:19:30

~tfa
Member
From: Germany
Registered: 2021-11-03
Posts: 14

Re: [SOLVED] Repeated kernel problems/freezes since 6.7.6 (Nvidia 550)

seth wrote:

With the recent hiccups in the nvidia blob, you could also try reverting to the 535xx drivers (dkms + utils) from the https://wiki.archlinux.org/title/Arch_Linux_Archive hmm

This is the way to go, but on 550 I have the best cold-suspend behaviour I have ever had with nvidia. I can connect an external monitor at work and it suspends, yet I can still run apps on the dGPU.

I will try another thing: since it may be related to control groups and there is always that systemd error at the end, I will disable nvidia-powerd, and if that doesn't help, disable nvidia-persistenced as well. As far as I know, the former enables boosting of the GPU and the latter keeps VRAM between context switches.

Last edited by ~tfa (2024-03-06 21:34:57)

#20 2024-03-07 11:32:53

mesaprotector
Member
Registered: 2024-03-03
Posts: 163

Re: [SOLVED] Repeated kernel problems/freezes since 6.7.6 (Nvidia 550)

https://nvidia.custhelp.com/app/answers … ruary-2024

This sure looks like the update that caused problems. Notice the mentions of bugs connected to NULL pointer dereferences in Linux. Looks like Nvidia tried to fix some vulnerabilities and broke a bunch of things instead.

(I did switch to nvidia-open, but if that doesn't fix it, I know I can just go back to the 545 version and stay there forever lol. Thankfully I don't game on Linux, so up-to-date drivers aren't a must.)

#21 2024-03-07 12:17:03

~tfa
Member
From: Germany
Registered: 2021-11-03
Posts: 14

Re: [SOLVED] Repeated kernel problems/freezes since 6.7.6 (Nvidia 550)

I too reverted to 545 and the issue is gone. I did not test my daemon idea, because after the last update of 550, Unity in a Vulkan context kept crashing on load. The same happened with the vulkan-beta drivers.
I guess this thread is solved then, at least the riddle. :)

#22 2024-03-12 22:40:56

thesword
Member
Registered: 2024-03-11
Posts: 2

Re: [SOLVED] Repeated kernel problems/freezes since 6.7.6 (Nvidia 550)

I have the same issue. At first I thought the crash was triggered by file-system corruption caused by a faulty RAM stick in my laptop.
I have two PCs with the 550 Nvidia drivers, and only the laptop is affected by the issue.

#23 2024-03-12 23:30:58

mesaprotector
Member
Registered: 2024-03-03
Posts: 163

Re: [SOLVED] Repeated kernel problems/freezes since 6.7.6 (Nvidia 550)

thesword wrote:

I have the same issue. At first I thought the crash was triggered by file-system corruption caused by a faulty RAM stick in my laptop.
I have two PCs with the 550 Nvidia drivers, and only the laptop is affected by the issue.

It's a memory-allocation problem on laptops, specifically with the 550 branch of the Nvidia drivers. The only current fix is rolling back to 545 or earlier (which, at least for me, required using nvidia-dkms). I did try nvidia-open for a bit, but it had a different, unrelated bug I didn't want to deal with.

I could mark this thread as solved - is that the protocol here? It won't truly be solved until Nvidia fixes the bug in a future driver update.
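After downgrading or switching to nvidia-open, the old module can stay loaded until the next reboot, so it helps to check what is actually running. A minimal Python sketch reads /proc/driver/nvidia/version and reports the loaded version and whether it is the open kernel module. The exact wording of the `NVRM version:` line (in particular the `Open Kernel Module` marker used by the open modules) is an assumption based on typical output, so compare it with your own file; the helper names and the `AFFECTED_BRANCHES` set (550 and 555, the proprietary branches reported in this thread) are hypothetical.

```python
import re
from pathlib import Path
from typing import Optional, Tuple

VERSION_FILE = Path("/proc/driver/nvidia/version")

# Proprietary branches reported as affected in this thread.
AFFECTED_BRANCHES = {550, 555}

# First dotted version number on the line, e.g. "550.54.14".
_VERSION_RE = re.compile(r"\b(\d+)\.\d+(?:\.\d+)?\b")


def loaded_nvidia_driver(text: str) -> Optional[Tuple[int, str, bool]]:
    """Return (branch, full_version, is_open) parsed from the NVRM line, or None."""
    for line in text.splitlines():
        if line.startswith("NVRM version:"):
            match = _VERSION_RE.search(line)
            if match is None:
                return None
            # Assumed marker: the open modules identify themselves this way.
            is_open = "Open Kernel Module" in line
            return int(match.group(1)), match.group(0), is_open
    return None


def is_affected(branch: int, is_open: bool) -> bool:
    """True for the proprietary module on a branch reported in this thread."""
    return branch in AFFECTED_BRANCHES and not is_open


if __name__ == "__main__":
    if not VERSION_FILE.exists():
        print("no NVIDIA kernel module loaded")
    else:
        info = loaded_nvidia_driver(VERSION_FILE.read_text())
        if info is None:
            print("could not parse", VERSION_FILE)
        else:
            branch, version, is_open = info
            flavour = "open" if is_open else "proprietary"
            note = " (branch reported as affected in this thread)" if is_affected(branch, is_open) else ""
            print(f"loaded: {version}, {flavour} module{note}")
```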

#24 2024-03-13 08:12:10

seth
Member
Registered: 2012-09-03
Posts: 59,043

Re: [SOLVED] Repeated kernel problems/freezes since 6.7.6 (Nvidia 550)

Basically yes: you found an answer, and even if that answer is a "downgrade and wait for nvidia to fix their shit" workaround, there's nothing left to do in this context, and others researching the same problem will benefit (so it helps them if the thread shows up as [SOLVED] in Google).

#25 2024-04-05 11:58:30

seth
Member
Registered: 2012-09-03
Posts: 59,043

Re: [SOLVED] Repeated kernel problems/freezes since 6.7.6 (Nvidia 550)

Update, for anyone affected by this:
If you feel adventurous enough to try the latest nvidia driver again: the evidence increasingly points at the nvidia_uvm module.
This module is mostly relevant for CUDA and for using the GPU in containers, but also for https://wiki.archlinux.org/title/NVIDIA … with_NVENC, and it is e.g. used by GIMP for HW acceleration.

If you critically rely on that, blacklisting the module isn't a viable solution, but it would still be awesome if you could confirm the condition.
AFAICT the module also comes with a ton of parameters, so if you have it around, posting the output of

modinfo nvidia_uvm

might shed further light.

https://wiki.archlinux.org/title/Kernel … and_line_2

module_blacklist=nvidia_uvm

And if anyone here happens to post at https://forums.developer.nvidia.com/t/s … /284772/27 you might want to forward this, particularly if you could confirm it.
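To confirm the condition after rebooting, it helps to verify that the parameter actually reached the kernel and that nvidia_uvm is not loaded. A minimal Python sketch reads /proc/cmdline (collecting every comma-separated name from `module_blacklist=` tokens) and /proc/modules (where the first field of each line is the module name). The helper names are hypothetical, and the check only matches the exact name `nvidia_uvm` as written on the command line.

```python
from pathlib import Path
from typing import Set

MODULE = "nvidia_uvm"


def blacklisted_modules(cmdline: str) -> Set[str]:
    """Collect module names from every module_blacklist= token (comma-separated)."""
    names: Set[str] = set()
    for token in cmdline.split():
        if token.startswith("module_blacklist="):
            value = token.split("=", 1)[1]
            names.update(name for name in value.split(",") if name)
    return names


def loaded_modules(proc_modules: str) -> Set[str]:
    """Module names from /proc/modules; the first field of each line is the name."""
    return {line.split()[0] for line in proc_modules.splitlines() if line.strip()}


def _read(path: str) -> str:
    # Return an empty string if the file is missing or unreadable.
    try:
        return Path(path).read_text()
    except OSError:
        return ""


if __name__ == "__main__":
    cmdline = _read("/proc/cmdline")
    modules = _read("/proc/modules")
    print(f"{MODULE} blacklisted on cmdline:", MODULE in blacklisted_modules(cmdline))
    print(f"{MODULE} currently loaded:", MODULE in loaded_modules(modules))
```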
