Issues :
`nvidia-smi` says "No devices were found"
journalctl shows that [nvidia_drm] has an *ERROR* "Failed to allocate NvKmsKapiDevice"
journalctl also shows [nvidia_drm] "Failed to register device"
It's a brand new laptop, and while I've used nvidia on Arch for years, this is my first time with a 5xxx gen and the "-open" drivers
The card is visible in all the usual places :
- `lspci -k -d ::03xx` shows 2 entries :
* 00:02.0 VGA controller: Intel Corporation Arrow Lake-S [Intel graphics] / Kernel driver in use: i915 / Kernel modules: i915, xe
* 02:00.0 VGA controller: Nvidia Corporation GB203M / GN22-X0 [GeForce RTX 5080 Max-Q / Mobile] / Kernel driver in use: nvidia / Kernel modules: nouveau, nvidia_drm, nvidia
- `lsmod | grep nvidia` gives :
nvidia_drm 147456 0
drm_ttm_helper 16384 2 nvidia_drm,xe
nvidia_uvm 4009984 0
nvidia_modeset 2174976 1 nvidia_drm
video 81920 4 ideapad_laptop,xe,i915,nvidia_modeset
nvidia 12951552 2 nvidia_uvm,nvidia_modeset
Config
Arch version : freshly installed yesterday from a 2025.06.01 ISO
Arch kernel: 6.14.10-arch1-1
Nvidia drivers :
- nvidia-open-dkms 575.57.08-1
- I also tried nvidia-open, nvidia-open-beta from the AUR (both are also 575.57.08-1) and even nvidia (which is not supposed to work because this is a 5xxx card)
Laptop settings : I tried dgpu only, and also Dynamic mode
I have disabled nouveau with
/etc/modprobe.d/blacklist-nouveau.conf
blacklist nouveau
options nouveau modeset=0
/etc/mkinitcpio.conf has
MODULES=(nvidia nvidia_modeset nvidia_uvm nvidia_drm) for early loading (I also tried without, same result)
I also removed `kms` from the HOOKS, as suggested by the wiki. No difference there.
Bootloader : systemd-boot, with the following kernel parameters :
options root=PARTUUID=<the UUID> rw nvidia-drm.modeset=1 nvidia-drm.fbdev=1 (I tried without fbdev, and without either parameter (I believe those are forced by nvidia-utils anyway), with the same result)
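To rule out a bootloader entry that was edited but never picked up, the running kernel's command line can be checked directly. A minimal sketch, assuming a POSIX shell; the helper name `has_kernel_param` is hypothetical, and it only does an exact string match (it does not treat `-` and `_` as equivalent):

```shell
# has_kernel_param PARAM [FILE]
# Exit status 0 if PARAM appears as a whole word in FILE
# (FILE defaults to /proc/cmdline, the command line of the running kernel).
has_kernel_param() {
    tr ' ' '\n' < "${2:-/proc/cmdline}" | grep -qxF -- "$1"
}

# Example use on a live system:
#   has_kernel_param nvidia-drm.modeset=1 && echo "modeset is on the command line"
```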
I have reached the end of my knowledge and have run out of ideas of what to try next.
SOLUTION
This seems to be related to an upstream problem with 575 driver series
Context :
- https://github.com/NVIDIA/open-gpu-kern … issues/876
- https://bbs.archlinux.org/viewtopic.php?id=306130
Solution : downgrade to latest 570 drivers
- HOWTO from the wiki (tl;dr : `pacman -U <older package URL>`, e.g. : https://archive.archlinux.org/packages/ … g.tar.zst)
- Make sure to downgrade both nvidia-open and nvidia-utils
- Don't forget to prevent the packages from updating by adding IgnorePkg=nvidia-open-dkms nvidia-utils in /etc/pacman.conf
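After editing /etc/pacman.conf it is easy to check that the pin is really there. The sketch below is only an illustration: the helper name `is_pinned` is hypothetical, and it matches exact package names only (pacman itself also accepts glob patterns in IgnorePkg, which this does not handle).

```shell
# is_pinned CONF PKG
# Exit status 0 if PKG is listed on an IgnorePkg line of CONF.
# Sketch only: exact-name matching, no glob support.
is_pinned() {
    awk -v pkg="$2" '
        /^[ \t]*IgnorePkg[ \t]*=/ {
            sub(/^[^=]*=/, "")          # drop everything up to the "=" sign
            n = split($0, names)        # split the rest on whitespace
            for (i = 1; i <= n; i++)
                if (names[i] == pkg) found = 1
        }
        END { exit(found ? 0 : 1) }
    ' "$1"
}

# Example use:
#   is_pinned /etc/pacman.conf nvidia-open-dkms && echo "nvidia-open-dkms is pinned"
#   is_pinned /etc/pacman.conf nvidia-utils     && echo "nvidia-utils is pinned"
```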
Last edited by Jubijub (2025-06-13 09:08:32)
Note: in the bios I forced "discrete graphics"
So here are the files in details :
sudo journalctl -b => http://0x0.st/8gC4.txt
Bingo, it also shows the firmware crash, same as the topic you pointed out. I tried to downgrade to 570.86.16-2 (there were errors during mkinitcpio; I did pick the -open-dkms version).
New error message : "NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running."
Logs with the 570.86 drivers :
- sudo journalctl -b => http://0x0.st/8gCl.txt
- lspci -k => http://0x0.st/8gCk.txt
Last edited by Jubijub (2025-06-09 17:47:32)
Why did you downgrade that far, and did the modules actually build?
second paste link is bad.
Last edited by Scimmia (2025-06-09 17:50:24)
1/ I took the latest 570 driver which is from 31-Jan-2025, there was no driver in between (the 575.57 is from June 1st : https://archive.archlinux.org/packages/ … open-dkms/ )
2/ given the errors, I am not sure the module fully built
3/ I fixed the link
The last version there is https://archive.archlinux.org/packages/ … kg.tar.zst - the listing is alphabetical and with that 570.16 > 570.153, but that's of course numerically false.
Try to install that version, make sure to have the linux-headers package installed and, if in doubt, post the output of `dkms status` and the dkms build log.
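The alphabetical-listing trap described above is easy to reproduce locally. A small sketch, assuming a `sort` that supports the `-V` (version sort) option, as GNU coreutils does; the two function names are made up for illustration:

```shell
# Read version strings on stdin, one per line, and print the "latest" one.

# Plain byte-wise ordering, like an alphabetical directory listing.
latest_lexical() {
    LC_ALL=C sort | tail -n 1
}

# Version-aware ordering: each numeric component is compared as a number.
latest_by_version() {
    sort -V | tail -n 1
}

# Example use:
#   printf '%s\n' 570.86.16 570.153.02 | latest_lexical      # picks 570.86.16
#   printf '%s\n' 570.86.16 570.153.02 | latest_by_version   # picks 570.153.02
```

On Arch, `vercmp` from pacman compares two versions the way the package manager does, which is the safer reference when release suffixes are involved.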
I'm stupid
Indeed, with 570.153 it worked well: nvidia-smi reported the card as expected with the laptop in dGPU-only mode. In dynamic mode it didn't; it complains that it cannot talk to the driver.
Now I need to figure out that whole Optimus thing.
How do I maintain this, that is, how am I supposed to update the system (or not) until the upstream 575 driver gets fixed ? Is it ok to add IgnorePkg=nvidia-open-dkms nvidia-utils in /etc/pacman.conf ?
is it ok to add IgnorePkg=nvidia-open-dkms nvidia-utils in /etc/pacman.conf ?
For the time being, yes. dkms will rebuild the module w/ every kernel update, but you need to pay some attention that this still works (notably w/ the upcoming 6.15 update)
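One way to "pay attention" after a kernel update is to check that dkms reports the module as installed for the kernel you are about to boot. This is a rough sketch under two assumptions: that the dkms module is named `nvidia`, and that `dkms status` prints lines of the form `nvidia/570.153.02, 6.14.10-arch1-1, x86_64: installed`. The helper name `dkms_built_for` is hypothetical.

```shell
# dkms_built_for MODULE KERNEL
# Reads `dkms status` output on stdin; exit status 0 if MODULE is reported
# as installed for KERNEL.
# Assumed line format: "<module>/<version>, <kernel>, <arch>: installed"
dkms_built_for() {
    awk -v m="$1" -v k="$2" '
        index($0, m "/") == 1 && index($0, ", " k ",") > 0 && /: installed/ { ok = 1 }
        END { exit(ok ? 0 : 1) }
    '
}

# Example use:
#   dkms status | dkms_built_for nvidia "$(uname -r)" || echo "module not built for running kernel"
```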
If the firmware and driver no longer crash, nvidia-smi should™ wake the GPU and get a response.
Please post your complete system journal for the new boot:
sudo journalctl -b | curl -F 'file=@-' 0x0.st
In dynamic mode : sudo journalctl -b => http://0x0.st/8EQh.txt (not working)
In dGPU mode : sudo journalctl -b => http://0x0.st/8EQF.txt (WAI)
In Dynamic mode `nvidia-smi` doesn't work, and I get this NVRM: No NVIDIA GPU found.
From dmesg it seems it tries 4 times to initialize Nvlink Core, and then it unregisters Nvlink Core
jun 11 17:29:57 archlinux kernel: nvidia: loading out-of-tree module taints kernel.
jun 11 17:29:57 archlinux kernel: nvidia: module verification failed: signature and/or required key missing - tainting kernel
jun 11 17:29:57 archlinux kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 240
jun 11 17:29:57 archlinux kernel: NVRM: No NVIDIA GPU found.
jun 11 17:29:57 archlinux kernel: nvidia-nvlink: Unregistered Nvlink Core, major device number 240
jun 11 17:29:57 archlinux kernel: usb 3-2: New USB device found, idVendor=17ef, idProduct=f006, bcdDevice= 0.34
jun 11 17:29:57 archlinux kernel: usb 3-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
jun 11 17:29:57 archlinux kernel: usb 3-2: Product: Idea5003
jun 11 17:29:57 archlinux kernel: usb 3-2: Manufacturer: ZEPHYR
jun 11 17:29:57 archlinux kernel: usb 3-2: SerialNumber: 0123456789ABCDEF
jun 11 17:29:57 archlinux kernel: usbcore: registered new interface driver usbhid
jun 11 17:29:57 archlinux kernel: usbhid: USB HID core driver
jun 11 17:29:57 archlinux kernel: hid-generic 0003:17EF:F006.0001: hiddev96,hidraw0: USB HID v1.10 Device [ZEPHYR Idea5003] on usb-0000:80:14.0-2/input1
jun 11 17:29:57 archlinux kernel: nvidia-nvlink: Nvlink Core is being initialized, major device number 240
jun 11 17:29:57 archlinux kernel: NVRM: No NVIDIA GPU found.
jun 11 17:33:26 archlinux kernel: pci 0000:02:00.0: [10de:2c59] type 00 class 0x030000 PCIe Legacy Endpoint
jun 11 17:33:26 archlinux kernel: pci 0000:02:00.1: [10de:22e9] type 00 class 0x040300 PCIe Endpoint
That's your nvidia GPU - whatever the "dynamic mode" is, the device isn't listed there. At all.
The cause is
jun 11 17:29:57 archlinux kernel: pci 0000:00:06.0: [8086:ae4d] type 01 class 0x060400 PCIe Root Port
jun 11 17:29:57 archlinux kernel: pci 0000:00:06.0: PCI bridge to [bus 02-03]
jun 11 17:29:57 archlinux kernel: pci 0000:00:06.0: bridge window [io 0x9000-0x9fff]
jun 11 17:29:57 archlinux kernel: pci 0000:00:06.0: bridge window [mem 0x80000000-0x840fffff]
jun 11 17:29:57 archlinux kernel: pci 0000:00:06.0: bridge window [mem 0x6000000000-0x6401ffffff 64bit pref]
jun 11 17:29:57 archlinux kernel: pci 0000:00:06.0: broken device, retraining non-functional downstream link at 2.5GT/s
jun 11 17:29:57 archlinux kernel: pci 0000:00:06.0: retraining failed
(that's the controller where the nvidia GPU is wired)
This would seem to be a problem w/ the firmware.
=> What are your options there?
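The two checks made by eye above (is a device with NVIDIA's PCI vendor ID 10de enumerated at all, and did a PCIe port fail link retraining) can be scripted against a saved journal. A sketch only, assuming journal lines shaped like the excerpts quoted in this thread; the function names are made up for illustration:

```shell
# Both helpers read kernel journal text on stdin.

# Print "<pci address> <device id>" for every enumerated device whose
# vendor ID is 10de (NVIDIA).
nvidia_pci_devices() {
    sed -n 's/.*kernel: pci \([0-9a-f:.]*\): \[10de:\([0-9a-f]*\)].*/\1 \2/p'
}

# Print the address of every PCI port that logged "retraining failed".
failed_pci_links() {
    sed -n 's/.*kernel: pci \([0-9a-f:.]*\): retraining failed.*/\1/p' | sort -u
}

# Example use:
#   sudo journalctl -b -k | nvidia_pci_devices
#   sudo journalctl -b -k | failed_pci_links
```

Empty output from the first helper together with a hit from the second matches the dynamic-mode boot analysed above.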
Thanks for the explanation.
My options are, as on most laptops with an nvidia card : Dynamic, Discrete and UMA, which should respectively mean "both GPUs turned on", "discrete nvidia GPU only" and "CPU-integrated GPU only".
I have found no other setting that could be relevant to this problem. (The Lenovo Legion BIOS is pretty terse; there are not many options compared to a desktop motherboard.)
Edit: scratch that. I tried to activate "Lenovo performance"...the computer had trouble booting 2-3 times, then it booted alright. Lenovo performance is still disabled, but now the GPU is waking up properly even in dynamic mode.
I have no explanation for this, maybe the computer needed a faulty reboot to reset something or whatever.
Now I will try and survive on the pinned driver version, hoping that nvidia fixes their drivers quickly.
Last edited by Jubijub (2025-06-12 20:03:26)
\o/
Please always remember to mark resolved threads by editing your initial post's subject - so others will know that there's no task left, but maybe a solution to find.
Thanks.
I'd suggest to also subscribe to https://bbs.archlinux.org/viewtopic.php?id=306130 to monitor the overall situation.
is it ok to add IgnorePkg=nvidia-open-dkms nvidia-utils in /etc/pacman.conf ?
For the time being, yes. dkms will rebuild the module w/ every kernel update, but you need to pay some attention that this still works (notably w/ the upcoming 6.15 update)
And you got further "lucky" here, as 570.153.02 was patched for kernel 6.15.
https://gitlab.archlinux.org/archlinux/ … 77a062daec
You likely won't need to worry until 6.16 kernel comes around.
Also note that Arch packages New Feature Branch drivers (575 currently)...cuz we be early adopters rollin' rollin'...
The 570 series driver is the current Production Branch driver.
Now I will try and survive on the pinned driver version, hoping that nvidia fixes their drivers quickly.
Someone probably has/will package 570 in the AUR. I'd expect at least one more 570 release before 580 Production Branch release, but that's pure speculation based on historical pattern.
It's easy to build/carry/patch locally in near-perpetuity (see 390 series) as well if you really wanna geek out.
\o/
Please always remember to mark resolved threads by editing your initial post's subject - so others will know that there's no task left, but maybe a solution to find.
Thanks.
I'd suggest to also subscribe to https://bbs.archlinux.org/viewtopic.php?id=306130 to monitor the overall situation.
done, and thanks A LOT for your help on this !
I'd expect at least one more 570 release before 580 Production Branch release
570.169 Production Branch driver was released today.