#1 2021-04-16 20:29:59

keibak
Member
Registered: 2017-05-24
Posts: 48

[SOLVED] nvidia driver causes kernel panic

Hi,

Since today my system isn't booting anymore. The last message is

Starting WPA

I believe this message has nothing to do with the problem. Yesterday I installed an nvidia driver update to version 465. The system boots to a tty if I add the kernel parameter

module_blacklist=nvidia

I also tried downgrading, but I had no success there.

Are there any known issues with the current nvidia driver?

Sorry for any inaccuracies. Because of the above, I'm typing the errors from memory.

Last edited by keibak (2021-07-21 19:51:34)

#2 2021-04-16 21:50:29

jasonwryan
Anarchist
From: .nz
Registered: 2009-05-09
Posts: 30,424
Website

Re: [SOLVED] nvidia driver causes kernel panic

Your system is booting, it just isn't making it to the graphical target (whatever that is). Boot into rescue mode and read your journal.
https://wiki.archlinux.org/index.php/Sy … _boot_into


Arch + dwm   •   Mercurial repos  •   Surfraw

Registered Linux User #482438

#3 2021-04-16 21:59:19

keibak
Member
Registered: 2017-05-24
Posts: 48

Re: [SOLVED] nvidia driver causes kernel panic

jasonwryan wrote:

Your system is booting, it just isn't making it to the graphical target (whatever that is)

Well, the system is dead on boot: No tty, keyboard not responding, no answer on num lock. That's kind of a bummer.

journalctl doesn't show any hint when nvidia is blacklisted. I'm curious what rescue mode will bring up.

#4 2021-04-17 04:46:25

vostreltsov
Member
Registered: 2012-11-15
Posts: 1

Re: [SOLVED] nvidia driver causes kernel panic

Same here, couldn't boot using the nvidia graphics card. Switched to the integrated video card and xf86-video-intel for now :(

#5 2021-04-17 08:49:16

seth
Member
Registered: 2012-09-03
Posts: 50,012

Re: [SOLVED] nvidia driver causes kernel panic

Make sure to look at the failing boot, not the current one.
Also: can you boot the multi-user.target w/o blacklisting the nvidia module? (2nd link below)
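
A minimal sketch of reading the failed boot instead of the current one, assuming the journal is persistent (stored under /var/log/journal). The function name failed_boot_errors is made up for this example; the journalctl options themselves are standard.

```shell
# Print kernel messages of priority "err" or worse for one boot.
# $1 is the boot offset: -1 is the previous boot, -2 the one before it.
# Defaults to -1 when no offset is given.
failed_boot_errors() {
    journalctl --boot="${1:--1}" --dmesg --priority=err --no-pager
}
```

After booting with module_blacklist=nvidia, `journalctl --list-boots` shows which offset belongs to the boot that froze, and `failed_boot_errors -1` then prints its kernel errors if the frozen boot was the one directly before.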

#6 2021-04-17 09:13:43

keibak
Member
Registered: 2017-05-24
Posts: 48

Re: [SOLVED] nvidia driver causes kernel panic

The system is booting into rescue mode. Continuing normal boot results in a freeze.

Looking into the log of a frozen boot, I can see a kernel panic:

BUG: kernel NULL pointer dereference
Call Trace:
 ? _nv0155556rm [nvidia]
note: Xorg[607] exited with preempt_count 1
Fixing recursive fault but reboot is needed!

After the panic the log goes on for some time and finally ends with

Tried to start Xorg before previous instance exited
Attempt 3 starting the Display server on vt 1 failed
Could not start Display server on vt 1

I'm not sure where to go from here. Previously I tried downgrading to the previous nvidia driver and kernel version, but still ended up w/o proper boot.

Boot log was bb6f6eae3ff1406487165551c8b6d384 2021-04-17 10:45

#7 2021-04-17 09:22:50

seth
Member
Registered: 2012-09-03
Posts: 50,012

Re: [SOLVED] nvidia driver causes kernel panic

"rescue" or "multi-user"?
It looks like Xorg triggers an nvidia kernel module bug. Please post your Xorg log (in the multi-user.target you have network access and can use the tip in the first link below).

Also, what is your graphical.target (some DM, I assume)?

#8 2021-04-17 09:34:55

keibak
Member
Registered: 2017-05-24
Posts: 48

Re: [SOLVED] nvidia driver causes kernel panic

The panic appears in rescue, possibly after resuming boot. My DM is sddm.

The log's timestamp is right after the mentioned boot, so it should have been created by the run above: Xorg.log

#9 2021-04-17 09:54:18

keibak
Member
Registered: 2017-05-24
Posts: 48

Re: [SOLVED] nvidia driver causes kernel panic

This is the panic from rescue:

Apr 17 11:43:52 Angband kernel: BUG: kernel NULL pointer dereference, address: 0000000000000170
Apr 17 11:43:52 Angband kernel: #PF: supervisor read access in kernel mode
Apr 17 11:43:52 Angband kernel: #PF: error_code(0x0000) - not-present page
Apr 17 11:43:52 Angband kernel: PGD 0 P4D 0 
Apr 17 11:43:52 Angband kernel: Oops: 0000 [#1] PREEMPT SMP PTI
Apr 17 11:43:52 Angband kernel: CPU: 4 PID: 543 Comm: Xorg Tainted: P           OE     5.11.14-arch1-1 #1
Apr 17 11:43:52 Angband kernel: Hardware name: System manufacturer System Product Name/PRIME Z370-A, BIOS 2401 07/12/2019
Apr 17 11:43:52 Angband kernel: RIP: 0010:_nv015534rm+0x1b6/0x330 [nvidia]
Apr 17 11:43:52 Angband kernel: Code: 8b 87 68 05 00 00 ba 01 00 00 00 be 02 00 00 00 e8 cf 70 f2 e9 41 83 c5 01 41 83 fd 1f 0f 84 0b 01 00 00 48 8b 45 10 44 89 ee <48> 8b b8 70 01 00 00 48 8b 87 d8 04 00 00 e8 a7 70 f2 e9 89 c3 48
Apr 17 11:43:52 Angband kernel: RSP: 0018:ffffbbde00fb39a0 EFLAGS: 00010293
Apr 17 11:43:52 Angband kernel: RAX: 0000000000000000 RBX: 0000000000004000 RCX: 0000000000000002
Apr 17 11:43:52 Angband kernel: RDX: 0000000000000004 RSI: 0000000000000002 RDI: 0000000000000000
Apr 17 11:43:52 Angband kernel: RBP: ffff9bb04dcf2dd0 R08: 0000000000000001 R09: ffff9bb04dcf2cb8
Apr 17 11:43:52 Angband kernel: R10: ffff9bb050fe8008 R11: 0000000010100000 R12: 0000000000004000
Apr 17 11:43:52 Angband kernel: R13: 0000000000000002 R14: ffff9bb0568d0010 R15: 0000000000000800
Apr 17 11:43:52 Angband kernel: FS:  00007fe20e433940(0000) GS:ffff9bb76eb00000(0000) knlGS:0000000000000000
Apr 17 11:43:52 Angband kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 17 11:43:52 Angband kernel: CR2: 0000000000000170 CR3: 000000011635a003 CR4: 00000000003706e0
Apr 17 11:43:52 Angband kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Apr 17 11:43:52 Angband kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Apr 17 11:43:52 Angband kernel: Call Trace:
Apr 17 11:43:52 Angband kernel:  ? _nv015556rm+0x7fd/0x1020 [nvidia]
Apr 17 11:43:52 Angband kernel:  ? _nv027154rm+0x22c/0x4f0 [nvidia]
Apr 17 11:43:52 Angband kernel:  ? _nv017786rm+0x303/0x5e0 [nvidia]
Apr 17 11:43:52 Angband kernel:  ? _nv017788rm+0xe1/0x220 [nvidia]
Apr 17 11:43:52 Angband kernel:  ? _nv022828rm+0xed/0x220 [nvidia]
Apr 17 11:43:52 Angband kernel:  ? _nv023064rm+0x30/0x60 [nvidia]
Apr 17 11:43:52 Angband kernel:  ? _nv000704rm+0x16da/0x22b0 [nvidia]
Apr 17 11:43:52 Angband kernel:  ? rm_init_adapter+0xc5/0xe0 [nvidia]
Apr 17 11:43:52 Angband kernel:  ? kthread_create_on_node+0x51/0x70
Apr 17 11:43:52 Angband kernel:  ? nv_open_device+0x122/0x8a0 [nvidia]
Apr 17 11:43:52 Angband kernel:  ? nvidia_open+0x297/0x540 [nvidia]
Apr 17 11:43:52 Angband kernel:  ? kobj_lookup+0xf0/0x160
Apr 17 11:43:52 Angband kernel:  ? nvidia_frontend_open+0x53/0xa0 [nvidia]
Apr 17 11:43:52 Angband kernel:  ? chrdev_open+0xca/0x240
Apr 17 11:43:52 Angband kernel:  ? cdev_device_add+0x90/0x90
Apr 17 11:43:52 Angband kernel:  ? do_dentry_open+0x14e/0x380
Apr 17 11:43:52 Angband kernel:  ? path_openat+0xb67/0x1010
Apr 17 11:43:52 Angband kernel:  ? simple_xattr_get+0x65/0x90
Apr 17 11:43:52 Angband kernel:  ? do_filp_open+0x9c/0x140
Apr 17 11:43:52 Angband kernel:  ? do_sys_openat2+0xb1/0x160
Apr 17 11:43:52 Angband kernel:  ? __x64_sys_openat+0x54/0x90
Apr 17 11:43:52 Angband kernel:  ? do_syscall_64+0x33/0x40
Apr 17 11:43:52 Angband kernel:  ? entry_SYSCALL_64_after_hwframe+0x44/0xa9
Apr 17 11:43:52 Angband kernel: Modules linked in: nvidia_drm(POE) nvidia_modeset(POE) nvidia(POE) intel_rapl_msr intel_rapl_common eeepc_wmi asus_wmi iTCO_wdt intel_pmc_bxt mei_hdcp ee1004 iTCO_vendor_support sparse_keymap wmi_bmof intel_wmi_thunderbolt mxm_wmi snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi x86_pkg_temp_thermal ledtrig_audio intel_powerclamp coretemp kvm_intel snd_hda_intel snd_intel_dspcfg soundwire_intel soundwire_generic_allocation soundwire_cadence wl(POE) kvm snd_hda_codec snd_hda_core snd_hwdep soundwire_bus nls_iso8859_1 irqbypass crct10dif_pclmul vfat crc32_pclmul ghash_clmulni_intel fat snd_soc_core aesni_intel crypto_simd cryptd snd_compress glue_helper rapl ac97_bus snd_pcm_dmaengine intel_cstate cfg80211 drm_kms_helper snd_pcm intel_uncore cec snd_timer snd syscopyarea mei_me sysfillrect i2c_i801 sysimgblt pcspkr e1000e i2c_smbus rfkill mei soundcore fb_sys_fops video wmi acpi_pad mac_hid vboxnetflt(OE) vboxnetadp(OE) vboxdrv(OE) drm uhid sg crypto_user fuse

Full boot log

#10 2021-04-17 09:55:35

seth
Member
Registered: 2012-09-03
Posts: 50,012

Re: [SOLVED] nvidia driver causes kernel panic

Crashes during the driver init, so it's likely not sddm.
Do you get away w/ the lts kernel? (don't forget nvidia-lts)

#11 2021-04-17 09:58:54

keibak
Member
Registered: 2017-05-24
Posts: 48

Re: [SOLVED] nvidia driver causes kernel panic

I get the same with linux-lts. Also it's not booting into fallback.

Are the driver's versions identical? I see
* nvidia-465.24.02
* nvidia-lts-465.24.02

Last edited by keibak (2021-04-17 10:01:39)

#12 2021-04-17 10:08:51

keibak
Member
Registered: 2017-05-24
Posts: 48

Re: [SOLVED] nvidia driver causes kernel panic

Seconds ago, nvidia-lts was updated from 465.24.02-1 to 465.24.02-2. Sadly the issue persists and the lts boot still results in a kernel panic.

#13 2021-04-17 12:59:02

seth
Member
Registered: 2012-09-03
Posts: 50,012

Re: [SOLVED] nvidia driver causes kernel panic

Does the GPU properly register in "lspci"?
You can check whether the GPU responds to nvidia-smi, but I don't think the DDX driver is at fault, so this will likely just cause the same crash.

Ensure the GPU is properly seated and that you're using the PCIe x16 slot; if there's a dedicated 6/8-pin power connector, check that as well.
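
A small sketch of the lspci check, assuming pciutils is installed. The helper name gpu_devices is hypothetical; `lspci -k` additionally prints the kernel driver bound to each device.

```shell
# List display controllers together with the lines that follow them
# in "lspci -k" output (subsystem, kernel driver in use, kernel modules).
gpu_devices() {
    lspci -k | grep -E -A 3 'VGA compatible controller|3D controller'
}
```

If the card is registered, it should show up here even when booted with module_blacklist=nvidia; in that case no "Kernel driver in use" line for nvidia is expected.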
Then try the 390xx series, https://aur.archlinux.org/packages/nvidia-390xx-dkms/ & https://aur.archlinux.org/packages/nvidia-390xx-utils/

#14 2021-04-17 13:48:27

benfl
Member
Registered: 2018-04-15
Posts: 4

Re: [SOLVED] nvidia driver causes kernel panic

Just so you're aware, there's some discussion of the problem on reddit. You might be able to get things in a working state if you downgrade not only the driver version, but also the linux kernel (to version 5.11.13.arch1-1).

#15 2021-04-17 14:51:11

keibak
Member
Registered: 2017-05-24
Posts: 48

Re: [SOLVED] nvidia driver causes kernel panic

Tried to downgrade again. This time the graphical interface comes up.

The command to downgrade all required packages at once is

pacman -U nvidia-460.67-7-x86_64.pkg.tar.zst nvidia-utils-460.67-1-x86_64.pkg.tar.zst nvidia-lts-1\:460.67-6-x86_64.pkg.tar.zst libxnvctrl-460.67-1-x86_64.pkg.tar.zst nvidia-settings-460.67-1-x86_64.pkg.tar.zst lib32-nvidia-utils-460.67-1-x86_64.pkg.tar.zst linux-5.11.13.arch1-1-x86_64.pkg.tar.zst linux-lts-5.10.30-1-x86_64.pkg.tar.zst

I've never pressed the computer's reset button as often as in the last day.

#16 2021-04-17 16:03:58

loqs
Member
Registered: 2014-03-06
Posts: 17,197

Re: [SOLVED] nvidia driver causes kernel panic

If you install from the ALA
https://archive.archlinux.org/packages/ … kg.tar.zst
https://archive.archlinux.org/packages/ … kg.tar.zst
https://archive.archlinux.org/packages/ … kg.tar.zst
and remove nvidia 460.67-7 and nvidia-lts 1:460.67-6 so that DKMS generates modules matching each kernel, check that it still works. Then try upgrading one of the kernels and see if that works.
This would confirm that the issue is triggered by the nvidia package update alone, not by a combination of nvidia and kernel update.

#17 2021-04-18 08:01:43

keibak
Member
Registered: 2017-05-24
Posts: 48

Re: [SOLVED] nvidia driver causes kernel panic

I'm not very keen to install those packages. The way from freeze at boot to a working system has been hard.

#18 2021-04-18 13:09:32

seth
Member
Registered: 2012-09-03
Posts: 50,012

Re: [SOLVED] nvidia driver causes kernel panic

Since it seems to affect 465+lts as well, it's most likely the nvidia module alone. If this doesn't get fixed soon, though, you probably want to install the nvidia-460-dkms and nvidia-460-utils packages so you don't have to hold the kernel forever.

#19 2021-04-18 16:56:52

loqs
Member
Registered: 2014-03-06
Posts: 17,197

Re: [SOLVED] nvidia driver causes kernel panic

Has the issue been reported to Nvidia?  Looking through the Nvidia forum the only possibility I can see is https://forums.developer.nvidia.com/t/o … ded/172865 which is for 460, but the backtrace and RIP are different so I am doubtful that it is the same issue.

#20 2021-04-18 17:28:06

seth
Member
Registered: 2012-09-03
Posts: 50,012

Re: [SOLVED] nvidia driver causes kernel panic

Also the reporter provided a patch for 460, so doesn't fit the present pattern at all.
There are a bunch of similar backtraces w/ random old drivers because the pattern is "try to access the GPU, crash in one of nvidia's obfuscated symbols", but skimming the recent posts and going w/ a 465 regression theory I don't think it's reported.

#21 2021-04-19 17:30:36

keibak
Member
Registered: 2017-05-24
Posts: 48

Re: [SOLVED] nvidia driver causes kernel panic

Still not sure how to classify this. This issue hit me with 465 drivers. That's the same version the reddit users are complaining about.

The nvidia forum gives quite a few hits when searching for kernel NULL pointer dereference. It seems the reports are about versions 455 and 460.

#22 2021-04-19 19:40:17

seth
Member
Registered: 2012-09-03
Posts: 50,012

Re: [SOLVED] nvidia driver causes kernel panic

The nullptr derefs happen all the time. You can look at the RIP code to ballpark them, but if it's some "_nv123456rm" path, there's no way to say whether there's a relation between crashes in different driver versions. Just present your backtrace - it's nvidia's job to decode their private functions and assess whether this is a known issue.
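
For presenting or comparing a backtrace, the oops can be cut down to the RIP line and the frames inside the nvidia module. A rough sketch only: oops_summary is a hypothetical helper name, and the filter assumes nothing beyond the log format shown in post #9.

```shell
# Reduce a kernel oops to the faulting instruction pointer (RIP) and
# the stack frames that belong to the nvidia module.
# Reads the log on stdin and writes the matching lines to stdout.
oops_summary() {
    grep -E 'RIP: |\[nvidia\]'
}
```

Usage could be `journalctl -b -1 -k | oops_summary`. The "Modules linked in" line is not matched, because it lists the module as nvidia(POE) rather than [nvidia].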

#23 2021-04-20 08:17:46

MetalMatze
Member
From: Berlin
Registered: 2013-04-24
Posts: 3
Website

Re: [SOLVED] nvidia driver causes kernel panic

loqs wrote:

If you install from the ALA
https://archive.archlinux.org/packages/ … kg.tar.zst
https://archive.archlinux.org/packages/ … kg.tar.zst
https://archive.archlinux.org/packages/ … kg.tar.zst
and remove nvidia 460.67-7 and nvidia-lts 1:460.67-6 so that DKMS generates modules matching each kernel, check that it still works. Then try upgrading one of the kernels and see if that works.
This would confirm that the issue is triggered by the nvidia package update alone, not by a combination of nvidia and kernel update.

Thanks a lot for these links and I can confirm that throughout the last 4 days I had the same problems running a NVIDIA Corporation GP106 [GeForce GTX 1060 6GB].

I got it working with a downgrade for now as well.
I got the kernel from this folder in the ALA and the nvidia packages from another folder in the ALA.
It's working flawlessly again with:

linux-5.11.13.arch1-1-x86_64.pkg.tar.zst
linux-headers-5.11.13.arch1-1-x86_64.pkg.tar.zst
nvidia-460.67-5-x86_64.pkg.tar.zst
nvidia-dkms-460.67-1-x86_64.pkg.tar.zst
nvidia-settings-460.67-1-x86_64.pkg.tar.zst
nvidia-utils-460.67-1-x86_64.pkg.tar.zst

What's the way forward with this though? Are we all simply going to have to wait for new drivers to arrive and then test them in the hope that they work?
How do you handle updates in the meantime? Basically pacman -Syu and then downgrade these few packages afterwards?

Last edited by MetalMatze (2021-04-20 08:23:27)

#24 2021-04-20 08:38:32

V1del
Forum Moderator
Registered: 2012-10-16
Posts: 21,428

Re: [SOLVED] nvidia driver causes kernel panic

You can put them on the ignore list, but assuming it's just a nvidia driver regression you should be able to keep the kernel up to date (... and you don't need the dkms and the nvidia package both).
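
As an illustration of the ignore list, a fragment of /etc/pacman.conf that holds back only the driver packages. The package set is an assumption based on the packages named in this thread; adjust it to what is actually installed.

```ini
[options]
IgnorePkg = nvidia-dkms nvidia-utils lib32-nvidia-utils nvidia-settings
```

With this in place, pacman -Syu skips these packages and prints a warning for each ignored upgrade. The prebuilt nvidia package is built against one specific kernel version, so holding the driver while the kernel keeps updating assumes the DKMS variant is the one installed.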

People that run into this and want to actually bring it forward should update everything, provoke the issue, ssh in and run nvidia-bug-report.sh, then email the resulting file to the address mentioned there and/or create a thread on https://forums.developer.nvidia.com/c/g … /linux/148 with the file attached.
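
The reporting step could look roughly like this, assuming sshd was enabled on the affected machine beforehand and sudo is set up there. The function name and its ssh destination argument are placeholders; nvidia-bug-report.sh ships with the driver and by default writes nvidia-bug-report.log.gz into the current directory.

```shell
# Run NVIDIA's report script on the affected machine over ssh.
# $1 is the ssh destination, e.g. user@host (placeholder).
collect_nvidia_report() {
    # -t allocates a terminal so that sudo can prompt for a password
    ssh -t "$1" 'sudo nvidia-bug-report.sh'
}
```

The resulting file stays on the remote machine and still has to be copied off before attaching it to a report.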

#25 2021-04-21 07:34:51

ammonium
Member
Registered: 2021-04-21
Posts: 10

Re: [SOLVED] nvidia driver causes kernel panic

MetalMatze wrote:

I got it working with a downgrade for now as well.
I got the kernel from this folder in the ALA and the nvidia packages from another folder in the ALA.
It's working flawlessly again with:

linux-5.11.13.arch1-1-x86_64.pkg.tar.zst
linux-headers-5.11.13.arch1-1-x86_64.pkg.tar.zst
nvidia-460.67-5-x86_64.pkg.tar.zst
nvidia-dkms-460.67-1-x86_64.pkg.tar.zst
nvidia-settings-460.67-1-x86_64.pkg.tar.zst
nvidia-utils-460.67-1-x86_64.pkg.tar.zst

Thanks!! After a whole week trying to solve this problem downgrading all these packages to this version made it work for me too. I tried only downgrading the driver before and it didn't work, maybe the kernel got incompatible with this driver or vice versa?

Weirdly it requires nvidia-dkms, is that normal?

Also, how would you check when a fix is released for this issue?

For the record, I have a GTX 10 series and an Intel 6700, in case this Intel+NVIDIA combination is the issue with these drivers.

Last edited by ammonium (2021-04-21 07:38:43)
