I can't get the nvidia driver working on the linux kernel, even though it works fine on linux-lts.
I installed the nvidia package, but the driver is not loading.
Some outputs:
nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
sudo modprobe nvidia -vv
modprobe: INFO: custom logging function 0x55c4ba15faf0 registered
insmod /lib/modules/6.1.1-arch1-1/extramodules/nvidia.ko.xz
modprobe: INFO: Failed to insert module '/lib/modules/6.1.1-arch1-1/extramodules/nvidia.ko.xz': No such device
modprobe: ERROR: could not insert 'nvidia': No such device
modprobe: INFO: context 0x55c4ba875460 released
lspci -k
00:00.0 Host bridge: Intel Corporation Coffee Lake HOST and DRAM Controller (rev 0c)
Subsystem: Acer Incorporated [ALI] Device 1301
Kernel driver in use: skl_uncore
00:02.0 VGA compatible controller: Intel Corporation WhiskeyLake-U GT2 [UHD Graphics 620] (rev 02)
Subsystem: Acer Incorporated [ALI] Device 1301
Kernel driver in use: i915
Kernel modules: i915
02:00.0 3D controller: NVIDIA Corporation GP108M [GeForce MX250] (rev a1)
Subsystem: Acer Incorporated [ALI] Device 1301
Kernel modules: nouveau, nvidia_drm, nvidia
pacman -Q | grep nvidia
lib32-nvidia-utils 525.60.11-1
nvidia 525.60.11-5
nvidia-prime 1.0-4
nvidia-settings 525.60.11-2
nvidia-utils 525.60.11-1
uname -r
6.1.1-arch1-1
sudo dmesg | grep -E 'nvidia|NVRM'
[ 3.391036] nvidia: loading out-of-tree module taints kernel.
[ 3.391052] nvidia: module license 'NVIDIA' taints kernel.
[ 3.461850] nvidia: module verification failed: signature and/or required key missing - tainting kernel
[ 3.739474] nvidia-nvlink: Nvlink Core is being initialized, major device number 510
[ 3.740893] nvidia 0000:02:00.0: Unable to change power state from D3cold to D0, device inaccessible
[ 3.741087] NVRM: This is a 64-bit BAR mapped above 4GB by the system
NVRM: BIOS or the Linux kernel, but the PCI bridge
NVRM: immediately upstream of this GPU does not define
NVRM: a matching prefetchable memory window.
[ 3.741090] NVRM: This may be due to a known Linux kernel bug. Please
NVRM: see the README section on 64-bit BARs for additional
NVRM: information.
[ 3.741091] nvidia: probe of 0000:02:00.0 failed with error -1
[ 3.741111] NVRM: The NVIDIA probe routine failed for 1 device(s).
[ 3.741112] NVRM: None of the NVIDIA devices were initialized.
[ 3.741284] nvidia-nvlink: Unregistered Nvlink Core, major device number 510
[ 4.224423] nvidia-nvlink: Nvlink Core is being initialized, major device number 510
Setting ibt=off does nothing. Setting "acpi_osi=Windows 2009" fixes the driver issue but breaks the touchpad and the power button.
Is there a way to resolve this?
Offline
Online
Tried, no luck
Offline
No luck
Did you unload the module beforehand?
Also, the commands there are a bit BS: "sudo echo" won't work, because the output redirection is performed by your own unprivileged shell, not by sudo.
lspci -k
sudo modprobe -r nvidia # does this cause any errors?
echo 1 | sudo tee '/sys/bus/pci/devices/0000:02:00.0/remove'
echo 1 | sudo tee /sys/bus/pci/rescan
sudo modprobe -v nvidia # does this indicate any action?
lspci -k
Online
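The remove/rescan steps above can be wrapped into one function that also reports whether the device came back. This is only a rough sketch: the name pci_rescan_device and the SYSFS variable are made up for illustration (SYSFS defaults to /sys and is overridable only so the logic can be dry-run against a scratch directory). It has to run as root, since it writes to sysfs directly instead of going through sudo tee.

```shell
# Sketch: detach a PCI device, rescan the bus, report whether it reappeared.
# pci_rescan_device and SYSFS are hypothetical names, not part of any tool.
pci_rescan_device() {
  addr=$1                              # e.g. 0000:02:00.0
  sysfs=${SYSFS:-/sys}                 # assumption: real sysfs unless overridden
  dev="$sysfs/bus/pci/devices/$addr"

  if [ -e "$dev/remove" ]; then
    echo 1 > "$dev/remove"             # detach the device from the PCI bus
  fi
  echo 1 > "$sysfs/bus/pci/rescan"     # ask the kernel to re-enumerate the bus

  if [ -d "$dev" ]; then
    echo "$addr: present after rescan"
  else
    echo "$addr: missing after rescan"
    return 1
  fi
}
```

Called as "pci_rescan_device 0000:02:00.0" from a root shell, a "missing after rescan" result would match an lspci -k listing in which the 02:00.0 entry no longer shows up.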
before:
lspci -k
00:00.0 Host bridge: Intel Corporation Coffee Lake HOST and DRAM Controller (rev 0c)
Subsystem: Acer Incorporated [ALI] Device 1301
Kernel driver in use: skl_uncore
00:02.0 VGA compatible controller: Intel Corporation WhiskeyLake-U GT2 [UHD Graphics 620] (rev 02)
Subsystem: Acer Incorporated [ALI] Device 1301
Kernel driver in use: i915
Kernel modules: i915
02:00.0 3D controller: NVIDIA Corporation GP108M [GeForce MX250] (rev a1)
Subsystem: Acer Incorporated [ALI] Device 1301
Kernel modules: nouveau, nvidia_drm, nvidia
after:
lspci -k
00:00.0 Host bridge: Intel Corporation Coffee Lake HOST and DRAM Controller (rev 0c)
Subsystem: Acer Incorporated [ALI] Device 1301
Kernel driver in use: skl_uncore
00:02.0 VGA compatible controller: Intel Corporation WhiskeyLake-U GT2 [UHD Graphics 620] (rev 02)
Subsystem: Acer Incorporated [ALI] Device 1301
Kernel driver in use: i915
Kernel modules: i915
So rescanning doesn't bring the GPU back. There are no errors with sudo modprobe -r nvidia (I think it does nothing in my case, since the driver isn't loaded), and the output of sudo modprobe -v nvidia is the same as before:
insmod /lib/modules/6.1.1-arch1-1/extramodules/nvidia.ko.xz
modprobe: ERROR: could not insert 'nvidia': No such device
Offline
The device is actually completely gone after the rescan.
What if you add "pcie_aspm=off" to the kernel parameters?
There're no BIOS updates available for the device?
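When testing kernel parameters like this one, it is worth confirming after the reboot that the parameter actually reached the running kernel. A minimal sketch follows; has_kparam is a made-up helper, and its optional second argument exists only so the check can be tried on a saved copy of the command line. It splits on spaces, so it only handles parameters that contain none (a quoted value such as "acpi_osi=Windows 2009" would need different handling).

```shell
# Sketch: succeed if a kernel parameter appears verbatim on the command line.
# has_kparam is a hypothetical helper name.
has_kparam() {
  param=$1
  file=${2:-/proc/cmdline}             # the running kernel's command line
  # One parameter per line, then require an exact whole-line match.
  tr ' ' '\n' < "$file" | grep -Fxq -- "$param"
}

# Usage:
#   has_kparam pcie_aspm=off && echo "active" || echo "not on the command line"
```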
Online
"pcie_aspm=off"
Nothing's changed.
There're no BIOS updates available for the device?
There is an update but I don't have windows installed so updating may be problematic.
Offline
And as I said, the driver works fine on linux-lts + nvidia-lts. I don't understand why linux + nvidia (and linux + nvidia-dkms) won't work.
Offline
I actually missed that.
I guess things started to fail w/ 6.0?
https://bbs.archlinux.org/viewtopic.php … 1#p2062391
However, do you get
[ 3.741087] NVRM: This is a 64-bit BAR mapped above 4GB by the system
NVRM: BIOS or the Linux kernel, but the PCI bridge
NVRM: immediately upstream of this GPU does not define
NVRM: a matching prefetchable memory window.
on the lts kernel as well?
Online
I guess things started to fail w/ 6.0?
No, it started before 6.0 and hasn't been fixed since.
And no, I don't have this error in my dmesg on the lts kernel.
Offline
Test and warning aren't new.
Please post the dmesg from either kernel (no grepping)
Online
dmesg from linux:
https://pastebin.com/PRxtrWF0
dmesg from linux-lts:
https://pastebin.com/2xeLbZ8t
Offline
The BAR config is exactly the same between the kernels, so this is probably a misinterpretation and a red herring; the only real problem is
[ 3.665434] nvidia 0000:02:00.0: Unable to change power state from D3cold to D0, device inaccessible
Add "rcutree.rcu_idle_gp_delay=1" to the kernel parameters.
If that doesn't help, add i915, nvidia, nvidia_modeset, nvidia_uvm and nvidia_drm to the initramfs (preferably without the kms hook; remove it if it's there).
If that doesn't help, keep i915 but remove the nvidia modules from the initramfs.
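For the initramfs variants, the relevant settings are the MODULES and HOOKS arrays in /etc/mkinitcpio.conf, followed by regenerating the images with "sudo mkinitcpio -P". The following is a small sketch for checking the current state before and after editing; initramfs_gpu_config is a made-up name, and it assumes the array syntax shown in the comments.

```shell
# Target state for the first variant (lines in /etc/mkinitcpio.conf):
#   MODULES=(i915 nvidia nvidia_modeset nvidia_uvm nvidia_drm)
#   HOOKS=(...)      # unchanged, except kms removed if it is listed
# Second variant: MODULES=(i915)
# Afterwards regenerate the images: sudo mkinitcpio -P

# Sketch: print the MODULES line and say whether the kms hook is configured.
# initramfs_gpu_config is a hypothetical helper name.
initramfs_gpu_config() {
  conf=${1:-/etc/mkinitcpio.conf}
  grep -E '^[[:space:]]*MODULES=' "$conf"
  if grep -Eq '^[[:space:]]*HOOKS=.*[( ]kms[ )]' "$conf"; then
    echo "kms hook: present"
  else
    echo "kms hook: absent"
  fi
}
```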
Also
zgrep PREEMPT /proc/config.gz
linux:
[ 0.109706] rcu: Preemptible hierarchical RCU implementation.
[ 0.109707] rcu: RCU restricting CPUs from NR_CPUS=320 to nr_cpu_ids=8.
[ 0.109709] rcu: RCU priority boosting: priority 1 delay 500 ms.
[ 0.109714] Trampoline variant of Tasks RCU enabled.
[ 0.109715] Rude variant of Tasks RCU enabled.
[ 0.109717] Tracing variant of Tasks RCU enabled.
[ 0.109721] rcu: RCU calculated value of scheduler-enlistment delay is 30 jiffies.
[ 0.109727] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=8
[ 0.118666] rcu: srcu_init: Setting srcu_struct sizes based on contention.
[ 0.131129] rcu: Hierarchical SRCU implementation.
[ 0.131129] rcu: Max phase no-delay instances is 1000.
linux-lts:
[ 0.139534] rcu: Hierarchical RCU implementation.
[ 0.139537] rcu: RCU restricting CPUs from NR_CPUS=320 to nr_cpu_ids=8.
[ 0.139539] Rude variant of Tasks RCU enabled.
[ 0.139540] Tracing variant of Tasks RCU enabled.
[ 0.139542] rcu: RCU calculated value of scheduler-enlistment delay is 10 jiffies.
[ 0.139543] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=8
[ 0.168081] rcu: Hierarchical SRCU implementation.
Online
zgrep PREEMPT /proc/config.gz
CONFIG_PREEMPT_BUILD=y
# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
CONFIG_PREEMPT=y
CONFIG_PREEMPT_COUNT=y
CONFIG_PREEMPTION=y
CONFIG_PREEMPT_DYNAMIC=y
CONFIG_PREEMPT_RCU=y
CONFIG_HAVE_PREEMPT_DYNAMIC=y
CONFIG_HAVE_PREEMPT_DYNAMIC_CALL=y
CONFIG_PREEMPT_NOTIFIERS=y
CONFIG_DRM_I915_PREEMPT_TIMEOUT=640
# CONFIG_DEBUG_PREEMPT is not set
# CONFIG_PREEMPT_TRACER is not set
# CONFIG_PREEMPTIRQ_DELAY_TEST is not set
I'll try the kernel parameter as soon as I can.
Offline
Is the config from the lts kernel, the main kernel, or both (i.e. no difference at all)?
Online
It's from the main, can't reboot the machine at the moment
Offline
Ok, I tried the rcutree.rcu_idle_gp_delay=1 parameter, then the initramfs module changes, but nothing changed.
Offline
Do you have an updated journal (and also one for "acpi_osi=Windows 2009")?
I suspect some sort of race condition and the lts kernel loaded the GPU modules ~0.5s before the main kernel.
Even if that's not it, there's hopefully some sort of pattern between the good and bad cases.
Can you somewhat track down the kernel version when this started (maybe in your pacman log)?
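One way to dig the kernel and driver history out of the pacman log is to filter the [ALPM] transaction lines for the relevant packages. This is a sketch only: kernel_history is a made-up name, and the package list is an assumption based on the packages mentioned in this thread.

```shell
# Sketch: list install/upgrade/downgrade events for the kernel and driver packages.
# kernel_history is a hypothetical helper name.
kernel_history() {
  log=${1:-/var/log/pacman.log}
  # The trailing ' \(' keeps linux-headers, nvidia-utils etc. out of the result.
  grep -E '\[ALPM\] (installed|upgraded|downgraded) (linux|linux-lts|nvidia|nvidia-lts|nvidia-dkms) \(' "$log"
}

# Usage:
#   kernel_history | tail -n 40
```

Matching the dates in that list against when the GPU last worked should narrow down the first bad kernel version.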
Online
I'm not sure what you meant by updated journal.
I'll try to trace back the kernel version in which the bug first occurred.
Here's dmesg for linux + acpi_osi=! "acpi_osi=Windows 2009" on which the driver loads successfully: https://pastebin.com/HW0rctVg
Offline
One taken after adding the modules to the initramfs (the latest log seems to have i915 there) and "rcutree.rcu_idle_gp_delay=1".
The nvidia module loads even later in the most recent ("good") journal, so that's probably not it.
I'll say that your best bet is probably to update the BIOS, but just to be sure: have you tried this w/o apparmor?
Online
have you tried this w/o apparmor?
I haven't. I'll give it a shot
Offline
Disabling apparmor didn't help
Offline
your best bet is probably to update the BIOS
Okay, I installed Windows and updated the UEFI to the latest version. But nothing changed.
Do you have any other ideas? Or maybe you know someone I could ask?
Offline
What happens if you blacklist all nvidia drivers (check "lsmod", do NOT use the "install /bin/true" approach) and explicitly load them after the boot?
https://wiki.archlinux.org/title/Kernel … acklisting
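A rough sketch of the blacklist approach: the "blacklist" keyword in modprobe.d only stops automatic, alias-based loading, so an explicit modprobe afterwards still works. The helper name write_nvidia_blacklist and the file name in the usage comment are made up for illustration.

```shell
# Sketch: write plain "blacklist" entries for the four nvidia modules.
# write_nvidia_blacklist is a hypothetical helper name.
write_nvidia_blacklist() {
  out=$1
  for mod in nvidia nvidia_modeset nvidia_uvm nvidia_drm; do
    printf 'blacklist %s\n' "$mod"
  done > "$out"
}

# Usage (as root; the file name is an arbitrary example):
#   write_nvidia_blacklist /etc/modprobe.d/nvidia-manual.conf
# After rebooting:
#   lsmod | grep nvidia          # should print nothing
#   sudo modprobe -v nvidia_drm  # explicit load, bypasses the blacklist
```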
Online
blacklist all nvidia drivers and explicitly load them
Tried that, gives the same
insmod /lib/modules/6.1.1-arch1-1/extramodules/nvidia.ko.xz
modprobe: ERROR: could not insert 'nvidia_modeset': No such device
insmod /lib/modules/6.1.1-arch1-1/extramodules/nvidia.ko.xz
modprobe: ERROR: could not insert 'nvidia_drm': No such device
and so on
Offline