#1 2022-11-06 05:53:23

anant
Member
Registered: 2022-11-06
Posts: 18

[SOLVED] Frequent Crashes on Lenovo Legion 5 Laptop (AMD + NVIDIA)

I just got a new Lenovo Legion 5 laptop with an integrated AMD GPU and a dedicated NVIDIA GeForce RTX 3050 Ti Mobile. Ever since I finished installation, the laptop crashes at least once every 4-12 hours. I have checked the dmesg, journalctl, and the Xorg logs but can't see anything that might be causing it.

I am using X with dwm and the proprietary NVIDIA drivers with optimus-manager. I first thought that perhaps it was something to do with the AMD or NVIDIA drivers because I have a very similar setup on an HP Pavilion (with an Intel Core i7) that I've been using for years that works just fine. I've tried using the nvidia, nvidia-lts, and nvidia-dkms packages, but it crashes in each case.

Most of the crashes were happening when I left the laptop running overnight. So I thought perhaps it was because the screen was dying, and I set up a cron job with xdotool to simulate mouse movements every hour to keep the screen alive. But that didn't seem to fix the problem. It also crashed once while I was actively working, so keeping the screen alive might not have been the issue. At the time it crashed, I had two web browsers running (with a few, but not an obscene number of, tabs open), and a couple of terminals and PDFs open. The CPU and memory usage (according to htop) was as expected: low.

I then thought that perhaps something was overheating, so I started logging the output from the sensors command from lm_sensors and from nvidia-smi. None of them seem to show anything that might explain why the laptop is crashing.

It feels like it crashes sooner when I'm running my Ubuntu VM (through VirtualBox), but I can't tell whether that's actually true. I have read through various forum posts both here and for other distros to see if there are any known solutions, but I can't seem to figure it out.

Here is the output of uname -a:

Linux aghq 5.15.77-1-lts #1 SMP Thu, 03 Nov 2022 17:26:01 +0000 x86_64 GNU/Linux

Here are outputs of various commands (links go to pastebin):

For the journalctl logs, I ran

$ journalctl --no-pager -b -1

The output was too long for pastebin, so I had to split it up into a few different pastes:

In the journalctl output, you'll notice that I'm running a couple of cron jobs to log a bunch of stuff. I'm using this command to log what's happening with the graphics card (at least that's what I understood from reading the wiki):

nvidia-smi --query-gpu=timestamp,name,pci.bus_id,driver_version,pstate,pcie.link.gen.max,pcie.link.gen.current,temperature.gpu,utilization.gpu,utilization.memory,memory.total,memory.free,memory.used --format=csv,noheader,nounits

Here are the logs from this command.

Please let me know if it would be helpful to add in any more logs / information. I've been reading through the wiki and the forum posts for the past week or so but haven't been able to solve the problem (or even isolate what the issue is). Any help would be really really appreciated!

Last edited by anant (2022-12-01 22:26:22)

#2 2022-11-06 06:54:12

cfr
Member
From: Cymru
Registered: 2011-11-27
Posts: 7,130

Re: [SOLVED] Frequent Crashes on Lenovo Legion 5 Laptop (AMD + NVIDIA)

Have you tried the current kernel? Is your firmware up to date?


CLI Paste | How To Ask Questions

Arch Linux | x86_64 | GPT | EFI boot | refind | stub loader | systemd | LVM2 on LUKS
Lenovo x270 | Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz | Intel Wireless 8265/8275 | US keyboard w/ Euro | 512G NVMe INTEL SSDPEKKF512G7L

#3 2022-11-06 15:14:02

anant
Member
Registered: 2022-11-06
Posts: 18

Re: [SOLVED] Frequent Crashes on Lenovo Legion 5 Laptop (AMD + NVIDIA)

I don't think I tried the regular 'linux' kernel. I have booted into it now. Will be back with an update in a few hours.

#4 2022-11-07 13:22:42

anant
Member
Registered: 2022-11-06
Posts: 18

Re: [SOLVED] Frequent Crashes on Lenovo Legion 5 Laptop (AMD + NVIDIA)

Time for an update. The machine has been alive for about 22 hours and 20 minutes at this point! All I did was boot into the linux kernel instead of the linux-lts kernel. Thanks for your help cfr! I'm marking this as solved for now and hopefully won't have to re-open it.

#5 2022-11-07 23:05:06

anant
Member
Registered: 2022-11-06
Posts: 18

Re: [SOLVED] Frequent Crashes on Lenovo Legion 5 Laptop (AMD + NVIDIA)

Might have replied a little too soon. It just crashed again after about 30 hours and 15 minutes (first entry in journalctl was timed at Nov 06 10:01:57 and the last entry there was timed at Nov 07 16:16:02). I'm going to try to cycle through the nvidia drivers again (I'm currently running nvidia-dkms but I can try the nvidia package instead). I'll come back with an update in a couple of days.
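
The time between the first and last journal entries of a boot can be computed rather than estimated. A minimal sketch, assuming GNU date and that the year (2022) is supplied by hand, since the journal's default output omits it; the `elapsed` helper name is made up for illustration:

```shell
# elapsed START END: print the time between two timestamps as "Hh Mm Ss".
# Relies on date's -d option; TZ=UTC keeps daylight-saving changes out of the result.
elapsed() {
    local start end diff
    start=$(TZ=UTC date -d "$1" +%s) || return 1
    end=$(TZ=UTC date -d "$2" +%s) || return 1
    diff=$(( end - start ))
    printf '%dh %dm %ds\n' $(( diff / 3600 )) $(( diff % 3600 / 60 )) $(( diff % 60 ))
}

# First and last journal entries of the boot described above.
elapsed '2022-11-06 10:01:57' '2022-11-07 16:16:02'   # prints: 30h 14m 5s
```

`journalctl --list-boots` prints the first and last entry of every recorded boot, which is a convenient source for the two timestamps.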

#6 2022-11-13 16:32:53

anant
Member
Registered: 2022-11-06
Posts: 18

Re: [SOLVED] Frequent Crashes on Lenovo Legion 5 Laptop (AMD + NVIDIA)

I was coming back to post an update saying things were fixed but it crashed again. And I'm not at all sure why (perhaps there is information in a log file that I'm overlooking). Here is what I've tried so far:

At the time of my last update, I was using the nvidia-dkms package with the linux kernel (the linux-lts kernel crashes much quicker). That combination crashed less than a day after booting (boot started at Nov 08 14:47:01 and the last entry in journalctl was at Nov 09 01:44:36).

I then switched to the nvidia package with the linux kernel. That combination was the most promising. I booted up on Nov 09 08:10:02 and the last entry in journalctl was timestamped Nov 12 20:39:52 (so it lasted over three days). During this time, I was running VirtualBox as well as some fairly calculation-intensive code and the machine seemed to be handling things well.

At this time, I'm not sure what exactly is causing it to crash. The dmesg and journalctl output don't seem to be much different from what I posted the first time (happy to post it here if someone wants to look at it). I did disable my cron jobs that were logging the output from nvidia-smi and sensors (it didn't seem like there were any anomalies there). But if anyone has any ideas on where else I could check, I would greatly appreciate the help in trying to figure out what is going wrong.

#7 2022-11-13 19:00:20

cfr
Member
From: Cymru
Registered: 2011-11-27
Posts: 7,130

Re: [SOLVED] Frequent Crashes on Lenovo Legion 5 Laptop (AMD + NVIDIA)

Maybe try booting some live distro to see if the same thing happens there?
Have you tried the nouveau drivers?

I'd probably run some hardware tests: memtest and smartctl or similar.


#8 2022-11-15 16:59:50

anant
Member
Registered: 2022-11-06
Posts: 18

Re: [SOLVED] Frequent Crashes on Lenovo Legion 5 Laptop (AMD + NVIDIA)

smartctl seems to work fine and I ran memtest for about 25 hours and it passed about 11 times. I haven't tried the nouveau drivers. Perhaps I should try them next?

What would running a live distro check for, just curious?

Also, how likely is it that this is a graphics card thing (either just nvidia or the nvidia-amd combo)? Are there any other theories about what might be wrong?

#9 2022-11-17 22:09:01

anant
Member
Registered: 2022-11-06
Posts: 18

Re: [SOLVED] Frequent Crashes on Lenovo Legion 5 Laptop (AMD + NVIDIA)

I tried the nouveau drivers, which caused the display to be all garbled - it was just a lot of colors on the external screens (which are connected to the dGPU) and was not usable at all. The integrated screen (which is connected to the iGPU) worked fine, but that makes sense. I thought maybe I had missed a step while setting up the nvidia drivers, so I set them up again just to be sure. Apparently I hadn't, because it just crashed again.

#10 2022-11-18 00:15:44

topcat01
Member
Registered: 2019-09-17
Posts: 123

Re: [SOLVED] Frequent Crashes on Lenovo Legion 5 Laptop (AMD + NVIDIA)

Can you try disconnecting all the external monitors and completely disabling the nvidia card from your optimus setup (including unloading the module), i.e., run only with the integrated AMD GPU + internal display? Do you get crashes then?

#11 2022-11-18 02:31:26

anant
Member
Registered: 2022-11-06
Posts: 18

Re: [SOLVED] Frequent Crashes on Lenovo Legion 5 Laptop (AMD + NVIDIA)

I have disconnected the external monitors. Just to be sure, I reinstalled Arch and have not installed any of the nvidia packages. I also ran `rmmod nouveau`. I'm hoping that was all I had to do to make sure that the nvidia modules are not loaded (from what I can tell, the nvidia drivers are not loaded right now). I left the rest of the install process untouched. I'm not sure if there is anything specific I should be monitoring - if you have suggestions, happy to hear them. In any case, I'll be back in a few days with an update (the longest time my laptop stayed alive was right around 3.5 days so we'll see if it makes it that far this time).
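
Whether any NVIDIA-related kernel module is still loaded can be read off the lsmod output. A small sketch; `gpu_modules` is a hypothetical helper name, and matching on the `nvidia` and `nouveau` name prefixes is an assumption about which modules matter here:

```shell
# gpu_modules: read `lsmod` output on stdin and print the names of loaded
# modules whose name starts with "nvidia" or "nouveau". Prints nothing if
# none are loaded. NR > 1 skips the lsmod header line.
gpu_modules() {
    awk 'NR > 1 && $1 ~ /^(nvidia|nouveau)/ { print $1 }'
}

# Typical use on the running system:
#   lsmod | gpu_modules
```

Empty output means neither driver is loaded at that moment. Note that `rmmod` only lasts until the module is loaded again, for example at the next boot, so the check is worth repeating after a reboot.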

#12 2022-11-18 14:55:07

anant
Member
Registered: 2022-11-06
Posts: 18

Re: [SOLVED] Frequent Crashes on Lenovo Legion 5 Laptop (AMD + NVIDIA)

That was quick - it crashed again and this time with none of the nvidia drivers (as far as I can tell at least). I came back this morning and I had a black screen with a blinking cursor that wouldn't accept any input (this is the exact same as every time it has crashed). I was able to switch over to tty3 (tty2 was just the black screen) but as soon as I entered my username, the screen went black with a blinking cursor. So it's not a graphics card issue?!?

#13 2022-11-18 20:11:04

anant
Member
Registered: 2022-11-06
Posts: 18

Re: [SOLVED] Frequent Crashes on Lenovo Legion 5 Laptop (AMD + NVIDIA)

Just had a thought - last night's crash happened while I was running several programs doing reasonably heavy computing, and it was one of the quickest crashes. Still, if there is a correlation between the amount of system resources I'm using and the time it takes to crash, I don't see it (sometimes it crashes when there is a lot running and sometimes it doesn't crash for a while). I have restarted the computer and am logging the data from `sensors` and `uptime` to see if that will yield any insight. Earlier I was logging this data once every minute; now I'm logging it once every second. Hopefully, if there is some kind of temperature / voltage / usage spike, this will help catch it. Any other ideas? Is there anything else I can test for? Is there something different that needs to be done on an AMD CPU/GPU on Arch (I've only ever had Intel)?
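
For anyone wanting to reproduce this kind of once-per-second logging, here is one possible sketch. It is not the script used in this thread; the `log_every` helper and the log file name are made up for illustration:

```shell
# log_every INTERVAL LOGFILE COUNT CMD...
# Run CMD every INTERVAL seconds, COUNT times (0 = until interrupted),
# appending its output to LOGFILE under a timestamp header.
log_every() {
    local interval=$1 logfile=$2 count=$3 i=0
    shift 3
    while [ "$count" -eq 0 ] || [ "$i" -lt "$count" ]; do
        {
            printf '=== %s ===\n' "$(date '+%F %T')"
            "$@" 2>&1
        } >> "$logfile"
        i=$(( i + 1 ))
        sleep "$interval"
    done
}

# Example (runs until interrupted with Ctrl-C):
#   log_every 1 "$HOME/crash-sensors.log" 0 sh -c 'uptime; sensors'
```

On a hard freeze the last few entries may still be sitting in memory and never reach the disk; running `sync` after each write is one way to narrow that window, at the cost of extra I/O.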

#14 2022-11-18 23:17:27

topcat01
Member
Registered: 2019-09-17
Posts: 123

Re: [SOLVED] Frequent Crashes on Lenovo Legion 5 Laptop (AMD + NVIDIA)

I would probably update the firmware and also install the amd-ucode package.

#15 2022-11-19 00:46:30

cfr
Member
From: Cymru
Registered: 2011-11-27
Posts: 7,130

Re: [SOLVED] Frequent Crashes on Lenovo Legion 5 Laptop (AMD + NVIDIA)

Running a live distro would just be an attempt to narrow down the possibilities. If it works fine there, for example, it is less likely to be a hardware problem. (Especially if it is a significantly different distro from Arch.)

Agree about the firmware and ucode.

Do you get crashes if you take X out of it (or Wayland, if you're using that) i.e. boot to multi-user.target & stay there?

Can you ssh into the system when things go down?

Edit: see https://wiki.archlinux.org/title/Ryzen re. microcode and, possibly, the troubleshooting section.

Last edited by cfr (2022-11-19 01:10:16)


#16 2022-11-19 18:51:16

anant
Member
Registered: 2022-11-06
Posts: 18

Re: [SOLVED] Frequent Crashes on Lenovo Legion 5 Laptop (AMD + NVIDIA)

topcat01 wrote:

I would probably update the firmware and also install the amd-ucode package.

I will go check out the amd-ucode package and the firmware again.

It hasn't crashed since I started logging sensor data again (I wish I had kept my earlier logging going); it has been up for over 27 hours. I'm kind of hoping that the crash conditions will reproduce themselves and that they'll get caught in the sensor logs (since now I'm logging every second as opposed to every minute like I was doing previously).

cfr wrote:

Running a live distro would just be an attempt to narrow down the possibilities. If it works fine there, for example, it is less likely to be a hardware problem. (Especially if it is a significantly different distro from Arch.)

I did consider installing Ubuntu or some other distro completely (instead of just booting into a live distro). I was going to say that I will try that next, but the amd-ucode fix seems faster so I might try that first.

What bugs me is that there is no indication of what the problem is since I can't find anything in the log files. That means that I don't know how to reproduce the problem conditions or how long to wait before being reasonably sure that the problem is fixed since it is crashing after very different lengths of time.

cfr wrote:

Do you get crashes if you take X out of it (or Wayland, if you're using that) i.e. boot to multi-user.target & stay there?

Can you ssh into the system when things go down?

I have only seen crashes while I'm in X. I haven't tried using Wayland because my understanding was that it doesn't play well with nvidia. I'm not sure what multi-user.target is. I'll go look it up. Also, I haven't set up an openssh server on the computer - I can definitely try to do that.

cfr wrote:

Edit: see https://wiki.archlinux.org/title/Ryzen re. microcode and, possibly, the troubleshooting section.

This is interesting. I will go read this more closely - looks promising.

Again, thank you so much! I'm hoping that, with your help, I'm able to resolve this soon and get back to being able to focus on the computations I need to run on this machine. Be back soon with an update.

#17 2022-11-19 23:16:39

anant
Member
Registered: 2022-11-06
Posts: 18

Re: [SOLVED] Frequent Crashes on Lenovo Legion 5 Laptop (AMD + NVIDIA)

The firmware seems to be updated, but I did install the amd-ucode package. PSA to my future self when I forget: make sure to update the bootloader configuration so that it actually loads the microcode. I'm running grub, so I can run:

# grub-mkconfig -o /boot/grub/grub.cfg

Anyway, I've rebooted now. I checked

journalctl -k --grep=microcode

and that gave me

Nov 19 17:39:06 ag kernel: microcode: CPU0: patch_level=0x0a50000c
Nov 19 17:39:06 ag kernel: microcode: CPU1: patch_level=0x0a50000c
Nov 19 17:39:06 ag kernel: microcode: CPU2: patch_level=0x0a50000c
...
Nov 19 17:39:06 ag kernel: microcode: CPU15: patch_level=0x0a50000c
Nov 19 17:39:06 ag kernel: microcode: Microcode Update Driver: v2.2.

The Microcode wiki page says that the output should look like:

microcode: microcode updated early to new patch_level=0x0700010f
microcode: CPU0: patch_level=0x0700010f
microcode: CPU1: patch_level=0x0700010f
microcode: CPU2: patch_level=0x0700010f
microcode: CPU3: patch_level=0x0700010f
microcode: Microcode Update Driver: v2.2.

I'm a little concerned that I didn't see the "microcode updated early to new patch_level" line.
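
One quick sanity check on this output is that every core reports the same patch level. A rough sketch; `patch_levels` is a hypothetical helper, and the line format is assumed to match the journal excerpt above:

```shell
# patch_levels: read `journalctl -k --grep=microcode` output on stdin and
# print each distinct per-CPU patch level followed by the number of CPUs
# reporting it. Lines without a "CPUn: patch_level=" field are ignored.
patch_levels() {
    grep -o 'CPU[0-9]*: patch_level=0x[0-9a-f]*' \
        | sed 's/.*patch_level=//' \
        | sort | uniq -c | awk '{ print $2, $1 }'
}

# Typical use:
#   journalctl -k --grep=microcode --no-pager | patch_levels
```

A single line of output means all cores agree. These lines alone do not say whether that level came from the firmware or from the amd-ucode image.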

Anyway, waiting for the next crash (although, hopefully that fixes things so that I can go back to re-installing all the nvidia drivers).

#18 2022-11-21 20:41:07

topcat01
Member
Registered: 2019-09-17
Posts: 123

Re: [SOLVED] Frequent Crashes on Lenovo Legion 5 Laptop (AMD + NVIDIA)

The "updated early" message is only shown if the microcode package has updates newer than what is already in the firmware.

#19 2022-11-21 20:51:02

anant
Member
Registered: 2022-11-06
Posts: 18

Re: [SOLVED] Frequent Crashes on Lenovo Legion 5 Laptop (AMD + NVIDIA)

Does that mean installing the microcode package had no effect since it was already in the firmware? I checked journalctl again and the lines showing the microcode patch_level have been there since the beginning. So maybe that wasn't the issue? (I don't understand microcode, other than that it's supposed to make the processor "more stable".)

#20 2022-11-21 20:53:02

topcat01
Member
Registered: 2019-09-17
Posts: 123

Re: [SOLVED] Frequent Crashes on Lenovo Legion 5 Laptop (AMD + NVIDIA)

If the vendor firmware update installed all relevant microcode then yes. However, with the package you automatically get on-the-fly CPU microcode updates.

I know how to parse Intel microcode from their update package (which intel-ucode bundles) and check against the version reported in the journal. AMD must have a similar process in case you are curious.

Last edited by topcat01 (2022-11-21 20:55:12)

#21 2022-11-21 21:02:02

anant
Member
Registered: 2022-11-06
Posts: 18

Re: [SOLVED] Frequent Crashes on Lenovo Legion 5 Laptop (AMD + NVIDIA)

I see. Well, I don't know how to tell if the microcode fixed whatever is causing the issue. On the Ryzen page that cfr mentioned above, it does list random reboots as one of the issues (although mine is random freezes). It also says that something like

kernel: mce: [Hardware Error]: Machine check events logged
kernel: mce: [Hardware Error]: CPU 22: Machine Check: 0 Bank 1: bc800800060c0859
kernel: mce: [Hardware Error]: TSC 0 ADDR 7ea8f5b00 MISC d012000000000000 IPID 100b000000000 
kernel: mce: [Hardware Error]: PROCESSOR 2:a20f10 TIME 1636645367 SOCKET 0 APIC d microcode a201016

should show up in the dmesg logs. I wasn't saving my dmesg logs, so I don't know if this is what was happening, and I don't know if there is anything else I can do other than wait for it to crash again (if and when it does).
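
Even without saved dmesg output, the journal usually keeps kernel messages from earlier boots when persistent journaling is enabled, which the `journalctl --no-pager -b -1` call earlier in this thread suggests is the case here. A minimal sketch for searching them; `mce_lines` is a hypothetical helper, and the pattern is taken from the example lines above:

```shell
# mce_lines: read kernel log text on stdin and print only the
# machine-check ("mce: [Hardware Error]") lines.
mce_lines() {
    grep -E 'mce: \[Hardware Error\]'
}

# Typical use, checking the kernel log of the three previous boots:
#   for b in -1 -2 -3; do journalctl -k -b "$b" --no-pager | mce_lines; done
```

No output is not proof that nothing happened: if the machine freezes hard, the final messages may never be written to disk.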

#22 2022-11-21 21:05:28

topcat01
Member
Registered: 2019-09-17
Posts: 123

Re: [SOLVED] Frequent Crashes on Lenovo Legion 5 Laptop (AMD + NVIDIA)

Also, since you're using linux-lts, you might consider giving linux a try.

#23 2022-11-21 21:11:06

anant
Member
Registered: 2022-11-06
Posts: 18

Re: [SOLVED] Frequent Crashes on Lenovo Legion 5 Laptop (AMD + NVIDIA)

I've been using linux for a couple of weeks now.

#24 2022-11-21 21:12:51

topcat01
Member
Registered: 2019-09-17
Posts: 123

Re: [SOLVED] Frequent Crashes on Lenovo Legion 5 Laptop (AMD + NVIDIA)

Cool, so it happens on both kernels. More likely to be either hardware or (nvidia) driver then.

#25 2022-11-21 21:21:20

anant
Member
Registered: 2022-11-06
Posts: 18

Re: [SOLVED] Frequent Crashes on Lenovo Legion 5 Laptop (AMD + NVIDIA)

I've had a crash with no nvidia drivers installed either. I think that was before I had amd-ucode set up though (but, from what I'm understanding, the microcode was already built into the firmware). Sounds like it's leaning towards being a hardware problem then. hmm
