#1 2024-03-18 19:12:08

fistrosan
Member
Registered: 2020-04-01
Posts: 171

random freezes

Hi all,

In the last couple of days I have had two random freezes that forced me to perform an unclean reboot of my system. After going through the journal of the last boot where the system froze, it seems the nvidia driver is the offender, as right before the freeze happens I get:

Mar 18 19:43:03 p15v kernel: BUG: unable to handle page fault for address: ffff927b81871fe8
Mar 18 19:43:03 p15v kernel: #PF: supervisor write access in kernel mode
Mar 18 19:43:03 p15v kernel: #PF: error_code(0x0003) - permissions violation
Mar 18 19:43:03 p15v kernel: PGD 40fa01067 P4D 40fa01067 PUD 14191d063 PMD 13f2da063 PTE 8000000141871021
Mar 18 19:43:03 p15v kernel: Oops: 0003 [#1] PREEMPT SMP NOPTI
Mar 18 19:43:03 p15v kernel: CPU: 0 PID: 24958 Comm: kworker/0:2 Tainted: P           OE      6.6.21-1-lts #1 0c0a74bb77159d2e130f727f514cce3b101bcba5
Mar 18 19:43:03 p15v kernel: Hardware name: LENOVO 21D8000PGE/21D8000PGE, BIOS N3EET19W (1.05 ) 03/31/2022
Mar 18 19:43:03 p15v kernel: Workqueue: kacpi_notify acpi_os_execute_deferred
Mar 18 19:43:03 p15v kernel: RIP: 0010:_nv044009rm+0x10/0x30 [nvidia]
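
For readers unfamiliar with the `#PF: error_code(0x0003)` line above, here is a minimal sketch of how the low bits of an x86 page-fault error code map to the wording the kernel prints. The table and the function name `decode_pf_error` are illustrative only, and the sketch leaves out the higher bits (reserved-bit and instruction-fetch faults).

```python
# Low three bits of the x86 page-fault error code:
# (mask, meaning when set, meaning when clear)
PF_BITS = [
    (0x01, "protection violation", "page not present"),
    (0x02, "write access", "read access"),
    (0x04, "user mode", "supervisor (kernel) mode"),
]


def decode_pf_error(code):
    """Return a human-readable description for each of the low three bits."""
    return [set_txt if code & mask else clear_txt
            for mask, set_txt, clear_txt in PF_BITS]


print(decode_pf_error(0x0003))
```

For 0x0003 this gives a protection violation on a write from kernel mode, which matches the "supervisor write access in kernel mode" and "permissions violation" lines in the journal.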

I am now running the latest kernel instead of the lts kernel to see if the problem persists. Meanwhile, any ideas about what is going on? Here is the complete journal:

http://0x0.st/XrAT.txt

Offline

#2 2024-03-18 21:03:34

jl2
Member
From: 47° 18' N 8° 34' E
Registered: 2022-06-01
Posts: 275
Website

Re: random freezes

Is this the same issue? https://bbs.archlinux.org/viewtopic.php?id=293451

Last edited by jl2 (2024-03-18 21:03:46)


Why I run Arch? To "BTW I run Arch" the guy one grade younger.
And to let my siblings and cousins laugh at Arsch Linux...

Offline

#3 2024-03-19 17:57:29

fistrosan
Member
Registered: 2020-04-01
Posts: 171

Re: random freezes

The symptoms are similar, but I don't really think it is the same issue because I am not using nvidia-dkms but rather the regular nvidia and nvidia-lts drivers. I could give the zswap kernel parameter a try though.

Offline

#4 2024-03-19 19:37:19

fistrosan
Member
Registered: 2020-04-01
Posts: 171

Re: random freezes

Adding zswap.enabled=0 to the kernel parameters did not prevent the system from freezing. It happened again just a few minutes ago. Here is the journal from that boot. It seems to me to be exactly the same problem as before.

https://0x0.st/Xrj0.txt

Offline

#5 2024-03-19 19:55:53

seth
Member
Registered: 2012-09-03
Posts: 51,553

Re: random freezes

The problem in the other thread is nvidia 550xx; the backtraces seem to change depending on zram/zswap presence, but yours is close enough, and actually (I saw this for the first time)

Mar 19 20:25:54 p15v kernel: RIP: 0010:_nv044009rm+0x10/0x30 [nvidia]
Mar 19 20:25:54 p15v kernel: Code: 00 00 00 00 00 0f 1f 44 00 00 66 0f 1f 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 66 0f 1f 00 48 83 ec 08 48 83 ed 10 48 8d 7d 08 <48> c7 45 08 00 00 00 00 e8 b3 4d 6f ff 48 8b 45 08 48 83 c4 08 48
Mar 19 20:25:54 p15v kernel: RSP: 0018:ffffb3c9855c3d18 EFLAGS: 00010282
Mar 19 20:25:54 p15v kernel: RAX: 0000000000000000 RBX: ffffb3c980dcf8e8 RCX: ffff8c543f233b68
Mar 19 20:25:54 p15v kernel: RDX: ffff8c4d06139a08 RSI: 00000000000000c0 RDI: ffff8c4d0a394fe8
Mar 19 20:25:54 p15v kernel: RBP: ffff8c4d0a394fe0 R08: 6e6d5e686f62606a R09: ffff8c4d42a49c40
Mar 19 20:25:54 p15v kernel: R10: 000000000000000d R11: fefefefefefefeff R12: 0000000000000004
Mar 19 20:25:54 p15v kernel: R13: 0000000000000000 R14: ffffb3c980d91008 R15: ffff8c4d047d8008
Mar 19 20:25:54 p15v kernel: FS:  0000000000000000(0000) GS:ffff8c543f200000(0000) knlGS:0000000000000000
Mar 19 20:25:54 p15v kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 19 20:25:54 p15v kernel: CR2: ffff8c4d0a394fe8 CR3: 0000000425e20000 CR4: 0000000000f50ef0
Mar 19 20:25:54 p15v kernel: PKRU: 55555554
Mar 19 20:25:54 p15v kernel: Call Trace:
Mar 19 20:25:54 p15v kernel:  <TASK>
Mar 19 20:25:54 p15v kernel:  ? __die+0x23/0x70
Mar 19 20:25:54 p15v kernel:  ? page_fault_oops+0x171/0x4e0
Mar 19 20:25:54 p15v kernel:  ? exc_page_fault+0x175/0x180
Mar 19 20:25:54 p15v kernel:  ? asm_exc_page_fault+0x26/0x30
Mar 19 20:25:54 p15v kernel:  ? _nv044009rm+0x10/0x30 [nvidia 9d21cae964dcd2576bf1d7a6ddd75f027c67c580]
Mar 19 20:25:54 p15v kernel:  _nv014559rm+0x4d/0x90 [nvidia 9d21cae964dcd2576bf1d7a6ddd75f027c67c580]
Mar 19 20:25:54 p15v kernel:  _nv049696rm+0x18/0x60 [nvidia 9d21cae964dcd2576bf1d7a6ddd75f027c67c580]
Mar 19 20:25:54 p15v kernel:  _nv026805rm+0x61/0x90 [nvidia 9d21cae964dcd2576bf1d7a6ddd75f027c67c580]
Mar 19 20:25:54 p15v kernel:  rm_acpi_nvpcf_notify+0x1c/0xe0 [nvidia 9d21cae964dcd2576bf1d7a6ddd75f027c67c580]
Mar 19 20:25:54 p15v kernel:  ? __slab_free+0xf1/0x380
Mar 19 20:25:54 p15v kernel:  acpi_ev_notify_dispatch+0x4b/0x70
Mar 19 20:25:54 p15v kernel:  acpi_os_execute_deferred+0x17/0x30
Mar 19 20:25:54 p15v kernel:  process_one_work+0x178/0x350
Mar 19 20:25:54 p15v kernel:  worker_thread+0x30f/0x450
Mar 19 20:25:54 p15v kernel:  ? __pfx_worker_thread+0x10/0x10
Mar 19 20:25:54 p15v kernel:  kthread+0xe5/0x120
Mar 19 20:25:54 p15v kernel:  ? __pfx_kthread+0x10/0x10
Mar 19 20:25:54 p15v kernel:  ret_from_fork+0x31/0x50
Mar 19 20:25:54 p15v kernel:  ? __pfx_kthread+0x10/0x10
Mar 19 20:25:54 p15v kernel:  ret_from_fork_asm+0x1b/0x30
Mar 19 20:25:54 p15v kernel:  </TASK>
Mar 19 20:25:54 p15v kernel: Modules linked in: snd_seq_dummy snd_seq snd_seq_device ccm 8021q garp mrp stp llc iptable_filter snd_ctl_led snd_soc_skl_hda_dsp snd_soc_intel_hda_dsp_common snd_soc_hdac_hdmi snd_sof_probes snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic joydev tun ip6table_mangle xt_MASQUERADE xt_mark iptable_mangle ip6table_nat ip6_tables iptable_nat nf_nat uvcvideo nf_conntrack btusb videobuf2_vmalloc uvc nf_defrag_ipv6 btrtl nf_defrag_ipv4 btintel videobuf2_memops libcrc32c videobuf2_v4l2 btbcm btmtk mousedev xt_tcpudp videodev bluetooth snd_soc_dmic videobuf2_common mc ecdh_generic mei_hdcp mei_wdt mei_pxp iTCO_wdt intel_pmc_bxt iTCO_vendor_support intel_rapl_msr pmt_telemetry pmt_class intel_uncore_frequency intel_uncore_frequency_common snd_sof_pci_intel_tgl intel_tcc_cooling snd_sof_intel_hda_common x86_pkg_temp_thermal intel_powerclamp soundwire_intel coretemp snd_sof_intel_hda_mlink soundwire_cadence kvm_intel snd_sof_intel_hda snd_sof_pci snd_sof_xtensa_dsp kvm snd_sof snd_sof_utils
Mar 19 20:25:54 p15v kernel:  snd_soc_hdac_hda irqbypass snd_hda_ext_core snd_soc_acpi_intel_match rapl snd_soc_acpi intel_cstate soundwire_generic_allocation soundwire_bus intel_uncore snd_soc_core snd_compress ac97_bus snd_pcm_dmaengine think_lmi psmouse firmware_attributes_class wmi_bmof snd_hda_intel pcspkr snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec snd_hda_core spi_nor snd_hwdep mtd thinkpad_acpi snd_pcm mei_me i2c_i801 ledtrig_audio e1000e snd_timer platform_profile vfat i2c_smbus mei snd soundcore fat int3403_thermal iwlmvm i915 mac80211 libarc4 iwlwifi drm_buddy i2c_algo_bit ttm processor_thermal_device_pci processor_thermal_device cfg80211 drm_display_helper processor_thermal_rfim ucsi_acpi cec processor_thermal_mbox intel_hid typec_ucsi nvidia_drm(POE) processor_thermal_rapl typec intel_rapl_common int3400_thermal nvidia_modeset(POE) thunderbolt rfkill acpi_thermal_rel intel_vsec int340x_thermal_zone roles sparse_keymap acpi_pad acpi_tad mac_hid intel_gtt igen6_edac nvidia_uvm(POE) nvidia(POE) fuse crypto_user loop
Mar 19 20:25:54 p15v kernel:  nfnetlink bpf_preload ip_tables x_tables ext4 crc32c_generic crc16 mbcache jbd2 usbhid dm_crypt cbc encrypted_keys trusted asn1_encoder tee dm_mod crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic gf128mul ghash_clmulni_intel sha512_ssse3 sha256_ssse3 serio_raw sha1_ssse3 atkbd sdhci_pci aesni_intel libps2 cqhci vivaldi_fmap sdhci crypto_simd nvme spi_intel_pci cryptd xhci_pci mmc_core nvme_core spi_intel xhci_pci_renesas nvme_common i8042 video serio wmi
Mar 19 20:25:54 p15v kernel: CR2: ffff8c4d0a394fe8
Mar 19 20:25:54 p15v kernel: ---[ end trace 0000000000000000 ]---
Mar 19 20:25:54 p15v kernel: RIP: 0010:_nv044009rm+0x10/0x30 [nvidia]

Is straight up nvidia.
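
To pull the same conclusion out of a long journal dump, a small sketch like the following could list the symbol and module named on each `RIP:` line. The regular expression and the helper name `faulting_symbols` are assumptions made for illustration, based only on the line format shown in the trace above; they are not part of any kernel tooling.

```python
import re

# Matches e.g. "RIP: 0010:_nv044009rm+0x10/0x30 [nvidia]".
# The trailing "[module]" part is absent for core kernel symbols.
RIP_RE = re.compile(
    r"RIP: [0-9a-f]{4}:(?P<symbol>[^\s+]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?: \[(?P<module>\w+)[^\]]*\])?"
)


def faulting_symbols(journal_text):
    """Return unique (symbol, module) pairs from RIP lines, in order of appearance."""
    seen = []
    for line in journal_text.splitlines():
        match = RIP_RE.search(line)
        if match:
            pair = (match.group("symbol"), match.group("module"))
            if pair not in seen:
                seen.append(pair)
    return seen


sample = """\
Mar 19 20:25:54 p15v kernel: RIP: 0010:_nv044009rm+0x10/0x30 [nvidia]
Mar 19 20:25:54 p15v kernel:  rm_acpi_nvpcf_notify+0x1c/0xe0 [nvidia 9d21cae964dcd2576bf1d7a6ddd75f027c67c580]
Mar 19 20:25:54 p15v kernel: RIP: 0010:_nv044009rm+0x10/0x30 [nvidia]
"""
print(faulting_symbols(sample))
```

A module of `None` would mean the faulting instruction was in the core kernel rather than in a loadable module.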

Mitigation would be to return to the 545xx or 535xx dkms module from the ALA, but they will not build (unpatched) against 6.8, so you'd have to use the LTS kernel.

Offline

#6 2024-04-05 18:08:24

fistrosan
Member
Registered: 2020-04-01
Posts: 171

Re: random freezes

So, after starting with a clean Arch install and avoiding installing the nvidia drivers, the problem repeats itself (kernel freezes) with the nouveau driver (see the journal at https://0x0.st/XibG.txt). Now, I don't know whether this boot features the same issue as the original journal I posted here. There are some errors pertaining to picom as well. I am starting to think that my nvidia card is fried. As a temporary solution I have blacklisted nouveau in my grub configuration so that I am exclusively using the Alder Lake intel card. If I knew which package installs "nouveau" I would just uninstall it from my system. Does anyone know? It clearly cannot be xf86-video-nouveau because I don't have that one installed and yet nouveau loads unless I blacklist it.

Offline

#7 2024-04-05 20:07:44

seth
Member
Registered: 2012-09-03
Posts: 51,553

Re: random freezes

https://bbs.archlinux.org/viewtopic.php?id=294349 - the nouveau kernel module is part of the kernel package.

Fwiw, zswap is still enabled (try to disable it) and there's reason to assume the problem in the nvidia package is nvidia_uvm, so "module_blacklist=nvidia_uvm" might help you out w/ that (it's however at this point still based on a sample size of "1"…)
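
Whether those two parameters actually made it onto the kernel command line can be checked with a rough sketch like this one. The helper names and the sample command line are hypothetical, the parser ignores quoted values containing spaces, and treating "0", "n" and "N" as "off" is an assumption about how the boolean is spelled.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into a dict; bare flags map to None."""
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        params[key] = value if sep else None
    return params


def check_mitigations(cmdline):
    """Report whether zswap is switched off and nvidia_uvm is blacklisted."""
    params = parse_cmdline(cmdline)
    blacklist = (params.get("module_blacklist") or "").split(",")
    return {
        "zswap_disabled": params.get("zswap.enabled") in ("0", "n", "N"),
        "nvidia_uvm_blacklisted": "nvidia_uvm" in blacklist,
    }


# Hypothetical command line, for illustration only.
sample = "root=/dev/mapper/root rw quiet zswap.enabled=0 module_blacklist=nvidia_uvm"
print(check_mitigations(sample))
```

On a running system the string would come from /proc/cmdline, and /sys/module/zswap/parameters/enabled shows the state zswap actually ended up in, which is the more reliable of the two.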

Offline
