
#1 2021-01-31 14:57:27

drybalka
Member
Registered: 2019-05-27
Posts: 10

modprobe nvidia "No such device" after remove/rescan

I am trying to get nvidia-xrun working on my laptop. It works, but only once per boot. After manually repeating the steps from the script, I narrowed the problem down to 'modprobe' not being able to find the card after it has been removed from the PCI bus and then rescanned. The shortest (non-)working example is the following:

Just after boot and login, the card is visible and the nvidia module is not loaded:

$ lspci | grep NVIDIA
01:00.0 3D controller: NVIDIA Corporation GP107M [GeForce GTX 1050 Ti Mobile] (rev a1)
$ lsmod | grep nvidia
 

At this point I can load and unload the nvidia module without problems as many times as I like:

$ sudo modprobe nvidia
$ lsmod | grep nvidia
nvidia              34144256  0
$ sudo modprobe -r nvidia
$ lsmod | grep nvidia
 

The dmesg log is:

[  227.844178] nvidia: loading out-of-tree module taints kernel.
[  227.844187] nvidia: module license 'NVIDIA' taints kernel.
[  227.844188] Disabling lock debugging due to kernel taint
[  227.853395] nvidia: module verification failed: signature and/or required key missing - tainting kernel
[  227.866409] nvidia-nvlink: Nvlink Core is being initialized, major device number 234

[  229.038560] nvidia 0000:01:00.0: enabling device (0006 -> 0007)
[  229.155129] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  460.39  Thu Jan 21 21:54:06 UTC 2021

The nvidia-xrun script then manages power to the nvidia card through the PCI sysfs interface. Here 0000:01:00.0 is the PCI address of the card and 0000:00:01.0 is the PCI controller it is attached to. To power it off:

$ sudo tee /sys/bus/pci/devices/0000:01:00.0/remove <<<1
$ lspci | grep NVIDIA
 

To power it on again:

$ sudo tee /sys/bus/pci/devices/0000:00:01.0/power/control <<<on
$ sudo tee /sys/bus/pci/rescan <<<1
$ sudo tee /sys/bus/pci/devices/0000:01:00.0/power/control <<<on
$ lspci | grep NVIDIA
01:00.0 3D controller: NVIDIA Corporation GP107M [GeForce GTX 1050 Ti Mobile] (rev a1)
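
For repeated testing, the power-off and power-on steps above can be wrapped in two small shell functions. This is only a sketch of the manual sequence, not the nvidia-xrun script itself. The names `sysfs_write`, `gpu_power_off`, `gpu_power_on` and the variables `SYSFS_PCI`, `GPU_ID`, `BRIDGE_ID` are made up for illustration; `SYSFS_PCI` exists only so the functions can be pointed at a scratch directory for a dry run.

```shell
#!/bin/sh
# Sketch of the remove/rescan cycle from the post (hypothetical helper names).
# SYSFS_PCI is an assumption added for dry runs; the real tree is /sys/bus/pci.
SYSFS_PCI="${SYSFS_PCI:-/sys/bus/pci}"
GPU_ID="${GPU_ID:-0000:01:00.0}"        # the card
BRIDGE_ID="${BRIDGE_ID:-0000:00:01.0}"  # the controller it is attached to

# Write a value to a sysfs attribute, failing loudly if the file is missing.
sysfs_write() {
    # $1 = value, $2 = path
    if [ ! -e "$2" ]; then
        echo "missing: $2" >&2
        return 1
    fi
    printf '%s\n' "$1" > "$2"
}

# Remove the card from the PCI bus.
gpu_power_off() {
    sysfs_write 1 "$SYSFS_PCI/devices/$GPU_ID/remove"
}

# Wake the controller, rescan the bus, then wake the card.
# Stops at the first step that fails.
gpu_power_on() {
    sysfs_write on "$SYSFS_PCI/devices/$BRIDGE_ID/power/control" &&
    sysfs_write 1 "$SYSFS_PCI/rescan" &&
    sysfs_write on "$SYSFS_PCI/devices/$GPU_ID/power/control"
}
```

On real hardware these would have to run as root, for example `gpu_power_off; gpu_power_on; modprobe nvidia`, which reproduces the sequence shown here in one line.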

However, now the modprobe command fails:

$ sudo modprobe nvidia
modprobe: ERROR: could not insert 'nvidia': No such device

The dmesg log is:

[  813.452711] nvidia-nvlink: Nvlink Core is being initialized, major device number 234

[  813.453102] nvidia 0000:01:00.0: enabling device (0000 -> 0003)
[  813.453243] NVRM: The NVIDIA GPU 0000:01:00.0
               NVRM: (PCI ID: 10de:1c8c) installed in this system has
               NVRM: fallen off the bus and is not responding to commands.
[  813.453313] nvidia: probe of 0000:01:00.0 failed with error -1
[  813.453326] NVRM: The NVIDIA probe routine failed for 1 device(s).
[  813.453326] NVRM: None of the NVIDIA devices were initialized.
[  813.453478] nvidia-nvlink: Unregistered the Nvlink Core, major device number 234
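
One visible difference between the two logs is the "enabling device" line: (0006 -> 0007) on the working load versus (0000 -> 0003) after the rescan. These values appear to be the PCI command register before and after the driver enables the device, so decoding the low bits may help with the comparison. The following is a minimal sketch; `decode_pci_command` is a hypothetical helper name, and only the three lowest bits (I/O space, memory space, bus master) are decoded.

```shell
#!/bin/sh
# Decode the low three bits of a PCI command register value.
# $1 = register value in hex without the 0x prefix, e.g. 0006
# Bit 0 = I/O space enable, bit 1 = memory space enable, bit 2 = bus master enable.
decode_pci_command() {
    val=$(( 0x$1 ))
    out=""
    if [ $(( val & 1 )) -ne 0 ]; then out="$out io"; fi
    if [ $(( val & 2 )) -ne 0 ]; then out="$out mem"; fi
    if [ $(( val & 4 )) -ne 0 ]; then out="$out busmaster"; fi
    out="${out# }"            # drop the leading space, if any
    echo "${out:-none}"       # print "none" when no bit is set
}

decode_pci_command 0006   # before the first, working load
decode_pci_command 0000   # before the failing load after rescan
```

For the values in the logs, 0006 decodes to memory space plus bus master, while 0000 has none of the three bits set, so the card seems to come back from the rescan with its command register cleared. The live value could probably be read with `setpci -s 01:00.0 COMMAND` from pciutils for comparison.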

All my further attempts to revive the card have failed, and only restarting the laptop helps. Does anyone have any advice on the matter?
