#1 2024-03-06 22:36:54

tom8o
Member
Registered: 2021-01-12
Posts: 7

[SOLVED] Crashes, freezes, kernel panic in Thinkpad T15p Gen 3

The Cause Was NVIDIA 550.x Drivers (As of 19 Mar 2024)

For now, it appears that the proprietary NVIDIA drivers are broken: some driver updates with vulnerability fixes released on 28 February 2024 seem to have introduced the problem. The temporary fix seems to be avoiding the nvidia driver package, and possibly nvidia-lts. A manual downgrade to 545.x or switching to nvidia-open might help, but I have not tested either yet.

See below for a detailed description. One of the replies below has links to other threads on the subject, both here and on the official NVIDIA forums.

Introduction to the issue(s)

Greetings. Over the last several weeks, I have been experiencing a host of severe crashes on my Arch Linux installation on my Lenovo ThinkPad T15p Gen 3. These generally freeze the laptop so completely that I cannot reach a TTY and am forced to hold the power button to shut it down. In addition to these events that force a power cycle, I am also experiencing crashes while using Firefox, some games (Terraria), and the official Discord client from the repositories. I am seeing other anomalous behavior as well, including crashes of individual applications, occasional disconnections of my Bluetooth keyboard (a Keychron), and problems at shutdown: instead of powering off (whether triggered from the KDE Plasma GUI or from terminal commands, including 'reboot'), the computer prints a lot of console output and then hangs on a blinking cursor or on a systemd message (to the tune of "A stop job is running...") with no shell prompt. The system-wide crashes have become more frequent over time and now occur almost daily.

One of these seemingly spontaneous system-wide crashes occurred during a system update, forcing me, as a last resort, to reinstall all packages by chroot-ing in from a rescue ISO and running 'pacman -Qqn | pacman -S - --overwrite=*' (as described on this Arch Wiki page). I had to do this because a dazzling number of important system files were present on disk without being attributed to any package (at least, that was the case for the handful I checked with 'pacman -Qo'). The behavior I describe in the previous paragraph occurred both before and after this, with no change I can discern. If this makes my system ineligible for support here, I apologize.
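Checking files one at a time with 'pacman -Qo' gets tedious. Below is a minimal Python sketch (Python 3.9+) that flags paths no installed package claims. It assumes pacman's usual English output format ("/path is owned by pkg version"); the helper names are hypothetical and not part of any tool.

```python
import subprocess
from typing import Optional


def parse_qo_line(line: str) -> Optional[str]:
    """Return the owning package from one line of `pacman -Qo` output, or None.

    Assumes the English output format, e.g. (version illustrative)
    "/usr/bin/ls is owned by coreutils 9.4-3" -> "coreutils".
    """
    marker = " is owned by "
    if marker not in line:
        return None
    rest = line.split(marker, 1)[1].split()
    return rest[0] if rest else None


def find_unowned(paths: list[str]) -> list[str]:
    """Return the paths that pacman does not attribute to any package.

    Paths that do not exist are reported too, since pacman fails on them as well.
    """
    unowned = []
    for path in paths:
        result = subprocess.run(["pacman", "-Qo", path], capture_output=True, text=True)
        owner = parse_qo_line(result.stdout)
        # pacman exits non-zero and prints "error: No package owns ..." for orphans
        if result.returncode != 0 or owner is None:
            unowned.append(path)
    return unowned
```

Run inside the installed system (or the chroot), for example `print(find_unowned(["/usr/bin/bash", "/usr/lib/libc.so.6"]))`; an empty list means every path checked is owned by some package.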

I did not make any major hardware changes to my laptop setup before these issues began. Furthermore, I did not switch kernels (I use the standard linux package) or drivers (nvidia). Unfortunately, I do not know whether a specific kernel or driver version caused these problems.

System details

The short version is that I have been running into these issues on KDE Plasma 5 and 6 (Wayland), on a laptop with an integrated Intel GPU and a discrete NVIDIA GPU, using the proprietary nvidia driver. Arch is the only OS on this system; it is installed on separate / and /home partitions, both formatted as ext4, on the laptop's single SSD. I can provide /etc/fstab if desired.
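On a hybrid Intel/NVIDIA laptop like this one, it can help to confirm which kernel driver is actually bound to each GPU, for example before and after switching to nvidia-open or disabling the dGPU. A minimal Python sketch, assuming the standard Linux sysfs layout under /sys/bus/pci/devices; the function name and the small vendor table are illustrative, not from any existing tool.

```python
from pathlib import Path

# Well-known PCI vendor IDs as they appear in sysfs "vendor" files.
VENDORS = {"0x8086": "Intel", "0x10de": "NVIDIA"}


def gpu_drivers(sysfs_root: str = "/sys/bus/pci/devices") -> list[tuple[str, str, str]]:
    """List (PCI address, vendor, bound kernel driver) for display-class devices."""
    gpus = []
    for dev in sorted(Path(sysfs_root).iterdir()):
        pci_class = (dev / "class").read_text().strip()
        # PCI base class 0x03 = display controller (VGA, 3D controller, ...)
        if not pci_class.startswith("0x03"):
            continue
        vendor_id = (dev / "vendor").read_text().strip()
        # "driver" is a symlink to the bound driver's directory, absent if unbound
        driver_link = dev / "driver"
        driver = driver_link.resolve().name if driver_link.exists() else "(none)"
        gpus.append((dev.name, VENDORS.get(vendor_id, vendor_id), driver))
    return gpus
```

Calling `gpu_drivers()` lists one tuple per GPU. The NVIDIA entry should name the nvidia module when the proprietary driver is in use, and something else (or "(none)") when it is not.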

Neofetch output:

OS: Arch Linux x86_64 
Host: 21DA000PUS ThinkPad T15p Gen 3 
Kernel: 6.7.8-arch1-1 
Uptime: 17 mins 
Packages: 2448 (pacman), 12 (flatpak) 
Shell: zsh 5.9 
Resolution: 1920x1080 
DE: Plasma 6.0.1 
WM: kwin 
Theme: [Plasma], Adwaita-dark [GTK2/3] 
Icons: [Plasma], breeze-dark [GTK2/3]
Terminal: yakuake
CPU: 12th Gen Intel i7-12800H (20) @ 4.700GHz 
GPU: Intel Alder Lake-P GT2 [Iris Xe Graphics] 
GPU: NVIDIA GeForce RTX 3050 Mobile 
Memory: 6457MiB / 15661MiB 

(It is important to note that this problem was present prior to kernel upgrades and before the release of KDE Plasma 6.)

Pastebins of journalctl

Because the problem is so scattershot, and both the crashes and the system's behavior during them are unpredictable, I cannot provide a comprehensive set of logs covering every situation described above. However, I have extracted three journalctl logs (generally using 'sudo journalctl -b -1' after rebooting from a catastrophic failure) that I hope will be of some use. I have looked over them myself but did not notice anything clearly diagnostic. Earlier today, after rebooting from one catastrophic failure, I received a warning that my root partition (ext4) was completely full; I immediately cleared the pacman cache to free some space. I am unsure whether that temporary lack of space caused any of this; at worst, I suspect it has only been an issue since I reinstalled all packages from the rescue ISO. Regrettably, I did not note what kind of crash preceded each of these logs, although I believe the first log is from a kernel panic (indicated by a blinking Caps Lock light on the laptop's built-in keyboard).

First log (2 Mar 2024)
Second log (6 Mar 2024 16:17)
Third log (6 Mar 2024 16:44)
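For anyone collecting similar logs: the kernel messages from the boot that crashed can be pulled with journalctl's -b -1 (previous boot) and -k (kernel messages only) options and filtered for typical oops/panic markers. A minimal Python sketch; the marker list is an assumption and not exhaustive ("NVRM:" is the prefix the NVIDIA kernel module uses for its messages). Reading journals from earlier boots generally needs root or membership in a group such as systemd-journal.

```python
import re
import subprocess

# Illustrative (not exhaustive) markers of kernel oopses, panics and NVIDIA
# driver messages; "NVRM:" is the NVIDIA kernel module's log prefix.
SUSPICIOUS = re.compile(r"BUG:|#PF:|Oops|Kernel panic|general protection fault|NVRM:")


def suspicious_lines(journal_text: str) -> list[str]:
    """Return the lines of a journal dump that match any marker."""
    return [line for line in journal_text.splitlines() if SUSPICIOUS.search(line)]


def previous_boot_kernel_log() -> str:
    """Kernel messages from the previous boot (-b -1), printed without a pager."""
    result = subprocess.run(
        ["journalctl", "-b", "-1", "-k", "--no-pager"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Typical use after a forced reboot: `print("\n".join(suspicious_lines(previous_boot_kernel_log())))`. Matches are only leads; the surrounding lines in the full log usually carry the call trace.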

Final thoughts

I suspect, due to the diversity of the ways in which my computer is malfunctioning, that there might be a hardware failure going on. Please let me know which hardware tests I should be running if that is indeed a possibility I should explore. Similarly, please notify me if there are any other logs (dmesg etc.) that might be of use. Some of the error messages in logs from 6 Mar (today) might be related to Vivaldi, which randomly stopped functioning almost completely on my system several weeks ago (takes minutes to load and crashes often), although resolving this is not in the scope of this thread.

Thank you in advance for your time. I will respond to any replies as soon as I am able.

Last edited by tom8o (2024-03-19 05:57:42)

#2 2024-03-13 20:47:46

jason98893
Member
Registered: 2023-02-22
Posts: 5

I'd love to hear a response on this from someone more knowledgeable, as I'm having what sounds like an identical issue on a ThinkPad P1 Gen 5. I'm running kernel 6.7.9, but if I had to speculate, this has been happening since 6.7.6 or so.

- Sporadic "slow freeze". I typically notice it when switching windows in i3, when a "ghost" overlay of a window from the previous workspace appears.
- After that, ordinary commands take forever to run, if they run at all. Windows gradually freeze until I have to hard reset.
- I've updated several times and the updates complete successfully. I haven't installed anything major recently that I can think of, and there have been no hardware changes.

These errors from journalctl are really the only lead I know how to check for:

Mar 13 14:59:30 machine1 kernel: BUG: unable to handle page fault for address: ffff900d1a298fe8
Mar 13 14:59:30 machine1 kernel: #PF: supervisor write access in kernel mode
Mar 13 14:59:30 machine1 kernel: #PF: error_code(0x0003) - permissions violation
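The two #PF lines are the kernel decoding the x86 page-fault error code. As a worked example, here is a minimal Python sketch of that decoding, covering only the low five bits (the X86_PF_* flags in the Linux kernel source); the helper names are hypothetical. For error_code(0x0003), bits 0 and 1 are set and bit 2 is clear: a protection (permissions) violation on a present page, caused by a write, made in supervisor (kernel) mode, which matches the messages above.

```python
import re
from typing import Optional

# Low bits of the x86 page-fault error code (X86_PF_* flags in the Linux
# kernel). Each entry: bit -> (meaning if set, meaning if clear or None).
PF_BITS = {
    0: ("protection violation", "not-present page"),
    1: ("write access", "read access"),
    2: ("user-mode access", "supervisor-mode access"),
    3: ("reserved bit set in page table", None),
    4: ("instruction fetch", None),
}


def decode_pf_error(code: int) -> list[str]:
    """Translate a page-fault error code into human-readable flags."""
    flags = []
    for bit, (if_set, if_clear) in PF_BITS.items():
        if code & (1 << bit):
            flags.append(if_set)
        elif if_clear is not None:
            flags.append(if_clear)
    return flags


def error_code_from_line(line: str) -> Optional[int]:
    """Extract the hex value from a kernel '#PF: error_code(0x....)' line."""
    match = re.search(r"error_code\((0x[0-9a-fA-F]+)\)", line)
    return int(match.group(1), 16) if match else None
```

Applied to the third log line above, `decode_pf_error(error_code_from_line(line))` returns `['protection violation', 'write access', 'supervisor-mode access']`. This only explains what kind of fault occurred; it does not identify which driver caused it.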

Everything else seems normal. I'd love to know what additional tests I can run. I ran stress for a while and found no issues.

#3 2024-03-18 20:05:10

mesaprotector
Member
Registered: 2024-03-03
Posts: 34

If you're both using the 550 branch of Nvidia drivers, that's very likely the problem: https://bbs.archlinux.org/viewtopic.php … 0#p2155420. You can go back to 545 (which will probably require a kernel downgrade or using the LTS kernel), or just disable the dGPU, either way we all just have to wait for a driver that doesn't bug everything. Some people have said nvidia-open helps. Nvidia is aware of the issue as of today (see this forum post).

Last edited by mesaprotector (2024-03-18 20:11:21)

#4 2024-03-19 05:52:03

tom8o
Member
Registered: 2021-01-12
Posts: 7

mesaprotector wrote:

If you're both using the 550 branch of Nvidia drivers, that's very likely the problem: https://bbs.archlinux.org/viewtopic.php … 0#p2155420. You can go back to 545 (which will probably require a kernel downgrade or using the LTS kernel), or just disable the dGPU, either way we all just have to wait for a driver that doesn't bug everything. Some people have said nvidia-open helps. Nvidia is aware of the issue as of today (see this forum post).

Hello, I just saw this. It seems very likely that this is the cause, as I have experienced much of the anomalous behavior and the kernel bugs described in the links you sent. Thank you; this has been driving me up the wall on a daily basis for weeks.

I will mark the thread as solved, at least for the present.

Since that's likely to kill activity in this thread, let me provide a few additional pieces of information to help others isolate this problem.

I am having issues with a wide variety of tasks, even running sudo for very quick tasks in a terminal (e.g. 'sudo pacman -S nvidia-lts' hangs and cannot be interrupted with Ctrl-C), in addition to the random application crashes, plasmashell crashes, and seemingly system-wide crashes mentioned in my original post.

You are indeed correct that I was on the NVIDIA 550 drivers. After making this post I briefly switched to the LTS kernel (linux-lts) with nvidia-lts, but it did not help; I think the proprietary nvidia-lts package in the repositories is now also on the 550 branch.
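One way to check which driver branch the installed packages are actually on is to read 'pacman -Q' output. A minimal Python sketch, assuming the "name version" output format with an optional epoch prefix (such as "1:"); the default package list and the helper names are illustrative assumptions.

```python
import re
import subprocess
from typing import Optional

# Matches "name [epoch:]major.rest" as printed by `pacman -Q`.
VERSION_RE = re.compile(r"^(\S+)\s+(?:\d+:)?(\d+)\.")


def driver_branch(pacman_q_line: str) -> Optional[int]:
    """Return the major version (driver branch, e.g. 550) or None if unparsable."""
    match = VERSION_RE.match(pacman_q_line.strip())
    return int(match.group(2)) if match else None


def installed_nvidia_branches(
    packages: tuple[str, ...] = ("nvidia", "nvidia-lts", "nvidia-open"),
) -> dict[str, Optional[int]]:
    """Map each installed package to its branch; missing packages are skipped.

    pacman reports packages that are not installed on stderr,
    so only stdout lines are parsed.
    """
    result = subprocess.run(["pacman", "-Q", *packages], capture_output=True, text=True)
    return {
        line.split()[0]: driver_branch(line)
        for line in result.stdout.splitlines()
        if line.strip()
    }
```

For instance, a line such as "nvidia 550.54.14-4" (version illustrative) yields 550, so `installed_nvidia_branches()` would show at a glance whether both nvidia and nvidia-lts are on the 550 branch.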

For future reference: 25 hours of memtest86+ testing on my 16 GB of RAM completed 36 passes with 0 errors, with all 10 available tests enabled as far as I can tell. I had not checked SSD or CPU health yet, but I was close to resorting to a full reinstall, so thank you for saving me from that hassle.

#5 2024-04-24 00:12:47

omarabid
Member
Registered: 2021-04-08
Posts: 2

Same issue on a Dell XPS; the kernel panics because of Nvidia.
