
#1 2023-11-25 14:21:42

Tim-Rex
Member
Registered: 2023-11-25
Posts: 7

Blacklisting nvidia doesn't prevent nvidia-modprobe trying [SOLVED]

I have a multi-gpu setup with both nvidia and amdgpu drivers installed.
I need to be able to boot with various combinations of those drivers, blacklisted or not.

Specifically:
nvidia
nvidia + amdgpu
nouveau
nouveau + amdgpu

I manage this largely by setting rd.driver.blacklist and modprobe.blacklist boot options accordingly (among other things).
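A minimal Python sketch of how the four profiles could map onto those options. It assumes each profile simply blacklists every GPU module it doesn't want, and that the same list goes to both rd.driver.blacklist (initramfs) and modprobe.blacklist (real root). The profile names and the PROFILES/GPU_MODULES tables are hypothetical. The nvidia module list is the one from the nouveau-only boot mode in this post.

```python
# Hypothetical tables: driver families and the modules that belong to them.
NVIDIA_MODULES = ["nvidia", "nvidia_uvm", "nvidia_drm", "nvidia_modeset"]
GPU_MODULES = {
    "nvidia": NVIDIA_MODULES,
    "nouveau": ["nouveau"],
    "amdgpu": ["amdgpu"],
}

# Hypothetical profile names: each lists the families allowed to load.
PROFILES = {
    "nvidia": {"nvidia"},
    "nvidia+amdgpu": {"nvidia", "amdgpu"},
    "nouveau": {"nouveau"},
    "nouveau+amdgpu": {"nouveau", "amdgpu"},
}


def blacklist_options(profile: str) -> str:
    """Build rd.driver.blacklist and modprobe.blacklist options for one profile."""
    allowed = PROFILES[profile]
    blocked = [
        module
        for family, modules in GPU_MODULES.items()
        if family not in allowed
        for module in modules
    ]
    joined = ",".join(blocked)
    # The same list is used for the initramfs and for the real root.
    return f"rd.driver.blacklist={joined} modprobe.blacklist={joined}"


if __name__ == "__main__":
    for name in PROFILES:
        print(f"{name}: {blacklist_options(name)}")
```

For the nouveau profile this yields the same option string as the nouveau-only boot mode shown in this post.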


I use the following for a nouveau-only boot mode:

rd.driver.blacklist=nvidia,nvidia_uvm,nvidia_drm,nvidia_modeset,amdgpu modprobe.blacklist=nvidia,nvidia_uvm,nvidia_drm,nvidia_modeset,amdgpu

That largely works: only the nouveau drivers load, as expected.
Unfortunately nvidia-modprobe has other ideas.

According to the nvidia-modprobe documentation, the nvidia userspace components will helpfully try to load the nvidia drivers as and when necessary.

I can see that in action whenever I run vulkaninfo. Of course the drivers fail to load (since nouveau already has ownership) and that should be expected.
But I'd like for nvidia-modprobe to not be so helpful.

An strace of vulkaninfo confirms the behaviour. The Vulkan ICDs for nVidia and Radeon are being picked up:

/usr/share/vulkan/icd.d/nvidia_icd.json
/usr/share/vulkan/icd.d/radeon_icd.x86_64.json

The nvidia ICD references /usr/lib/libGLX_nvidia.so.0
I can see various other nvidia libs being pulled in at this time, and then it calls out to /usr/bin/nvidia-modprobe.
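The loader learns which library to open from each manifest's ICD "library_path" field. A minimal sketch that lists that mapping, assuming the standard ICD manifest layout (a top-level "ICD" object containing "library_path"). The function name is hypothetical.

```python
import json
from pathlib import Path

ICD_DIR = Path("/usr/share/vulkan/icd.d")


def icd_libraries(icd_dir: Path = ICD_DIR) -> dict:
    """Map each ICD manifest file name to the library_path it names."""
    libraries = {}
    for manifest in sorted(icd_dir.glob("*.json")):
        data = json.loads(manifest.read_text())
        # ICD manifests keep the driver library under "ICD" -> "library_path".
        libraries[manifest.name] = data.get("ICD", {}).get("library_path", "")
    return libraries


if __name__ == "__main__":
    for name, library in icd_libraries().items():
        print(f"{name} -> {library}")
```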

In fact, just calling vulkaninfo causes nvidia-modprobe to be hit repeatedly (32 times, to be exact).
This takes considerable time (43 seconds) and adds noticeable load while it processes udev rules and whatever else.
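One way to get a count like that is to trace only execve calls and tally the nvidia-modprobe entries. A minimal sketch, assuming the log was captured with something like `strace -f -e trace=execve -o vk.log vulkaninfo` (the vk.log name is hypothetical), so that each line looks like `PID execve("/usr/bin/nvidia-modprobe", ...)`. It counts every exec attempt, whether or not it succeeded.

```python
import re
import sys

# Matches strace lines such as:
#   1234 execve("/usr/bin/nvidia-modprobe", ["nvidia-modprobe", ...], ...) = 0
MODPROBE_EXEC = re.compile(r'execve\("[^"]*nvidia-modprobe"')


def count_modprobe_execs(lines) -> int:
    """Count execve() attempts of nvidia-modprobe, successful or not."""
    return sum(1 for line in lines if MODPROBE_EXEC.search(line))


if __name__ == "__main__":
    log_path = sys.argv[1] if len(sys.argv) > 1 else "vk.log"  # hypothetical name
    with open(log_path) as log:
        print(count_modprobe_execs(log))
```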



So... the question becomes: what is the best way to stop nvidia-modprobe from running?

Option 1)
I can move nvidia_icd.json out of the way so that it doesn't get picked up, and that does solve the issue in this particular scenario, but I'd prefer not to be shuffling ICDs around to accommodate whichever boot profile (and blacklisting) I'm running at the time.

Option 2)
I could just remove/rename nvidia-modprobe so that it doesn't get pinged.
That seems to work, though vulkaninfo still tries to load libGLX_nvidia.so.0 and fails (quickly, at least):

$ vulkaninfo --summary
ERROR: [Loader Message] Code 0 : loader_scanned_icd_add: Could not get 'vkCreateInstance' via 'vk_icdGetInstanceProcAddr' for ICD libGLX_nvidia.so.0
ERROR: [Loader Message] Code 0 : setup_loader_term_phys_devs:  Failed to detect any valid GPUs in the current config
ERROR at /vulkan-sdk/1.3.268.0/source/Vulkan-Tools/vulkaninfo/vulkaninfo.h:237:vkEnumeratePhysicalDevices failed with ERROR_INITIALIZATION_FAILED
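Option 1 could in principle be automated rather than done by hand: check the kernel command line at boot and rename nvidia_icd.json only when nvidia is blacklisted (module_blacklist is checked too, as a kernel-level alternative). A minimal sketch, assuming the loader ignores manifests that don't end in .json (so a .disabled suffix hides the file) and that it runs as root early in boot. The helper names and the .disabled suffix are hypothetical.

```python
from pathlib import Path

ICD = Path("/usr/share/vulkan/icd.d/nvidia_icd.json")
DISABLED = ICD.with_name(ICD.name + ".disabled")  # hypothetical suffix


def nvidia_blacklisted(cmdline: str) -> bool:
    """True if 'nvidia' is listed in modprobe.blacklist= or module_blacklist=."""
    for option in cmdline.split():
        key, _, value = option.partition("=")
        if key in ("modprobe.blacklist", "module_blacklist"):
            if "nvidia" in value.split(","):
                return True
    return False


def plan_icd_move(cmdline: str, enabled: Path = ICD, disabled: Path = DISABLED):
    """Return the (src, dst) rename needed for this boot, or None if none is needed."""
    if nvidia_blacklisted(cmdline):
        return (enabled, disabled) if enabled.exists() else None
    return (disabled, enabled) if disabled.exists() else None


if __name__ == "__main__":
    move = plan_icd_move(Path("/proc/cmdline").read_text())
    if move is not None:
        src, dst = move
        src.rename(dst)  # needs root; assumes the loader skips non-.json files
```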

Now that I've written all of this out, it strikes me that
a) The ICD for the radeon device doesn't give any trouble. I guess the radeon drivers aren't trying to be quite as helpful.
b) There is no ICD for mesa/nouveau (it doesn't support Vulkan; I genuinely thought it did. Looking forward to NVK.)


I still don't fully understand why vulkaninfo balks at not being able to get vkCreateInstance from libGLX_nvidia.so.0, but has no such trouble with libvulkan_radeon.so.

I need to check out my Fedora box, as I don't recall seeing this issue there either..



Oh and.. Arch is the best.  I made the switch this week.

Last edited by Tim-Rex (2023-11-26 05:31:15)


#2 2023-11-25 16:34:46

Lone_Wolf
Administrator
From: Netherlands, Europe
Registered: 2005-10-04
Posts: 14,885

Re: Blacklisting nvidia doesn't prevent nvidia-modprobe trying [SOLVED]

Verify that you have vulkan-mesa-layers & lib32-vulkan-mesa-layers installed.
This will install several vulkan layers including the VK_LAYER_MESA_device_select layer.

Run

$ MESA_VK_DEVICE_SELECT=list  vulkaninfo

It will show which hardware & drivers are seen on your system.
There are several options to select a specific device or driver, see https://docs.mesa3d.org/envvars.html#vu … -variables
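The same check from a script, as a minimal sketch: it only sets MESA_VK_DEVICE_SELECT=list for a vulkaninfo child process and returns whatever it prints. The helper names are hypothetical, and the output is returned raw rather than parsed, because its format isn't described here.

```python
import os
import shutil
import subprocess


def device_select_env(value: str = "list") -> dict:
    """Copy of the current environment with MESA_VK_DEVICE_SELECT set."""
    env = dict(os.environ)
    env["MESA_VK_DEVICE_SELECT"] = value
    return env


def list_vulkan_devices() -> str:
    """Run vulkaninfo with MESA_VK_DEVICE_SELECT=list and return its raw output."""
    if shutil.which("vulkaninfo") is None:
        raise FileNotFoundError("vulkaninfo is not in PATH")
    result = subprocess.run(
        ["vulkaninfo"], env=device_select_env("list"), capture_output=True, text=True
    )
    # Keep both streams; this sketch doesn't assume which one carries the list.
    return result.stdout + result.stderr


if __name__ == "__main__":
    print(list_vulkan_devices())
```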


Disliking systemd intensely, but not satisfied with alternatives so focusing on taming systemd.

clean chroot building not flexible enough ?
Try clean chroot manager by graysky


#3 2023-11-26 03:55:17

Tim-Rex
Member
Registered: 2023-11-25
Posts: 7

Re: Blacklisting nvidia doesn't prevent nvidia-modprobe trying [SOLVED]

I have vulkan-mesa-layers installed; I discovered they were missing while writing my post but failed to mention it.
The implicit layer manifest VkLayer_MESA_device_select.json is in place.

I've also found that nvidia-modprobe seems to no longer be misbehaving, and I can't quite account for why. Perhaps a reboot after installing vulkan-mesa-layers did the trick.


$ MESA_VK_DEVICE_SELECT=list  vulkaninfo
ERROR: [Loader Message] Code 0 : loader_scanned_icd_add: Could not get 'vkCreateInstance' via 'vk_icdGetInstanceProcAddr' for ICD libGLX_nvidia.so.0
ERROR: [Loader Message] Code 0 : setup_loader_term_phys_devs:  Failed to detect any valid GPUs in the current config
ERROR at /vulkan-sdk/1.3.268.0/source/Vulkan-Tools/vulkaninfo/vulkaninfo.h:237:vkEnumeratePhysicalDevices failed with ERROR_INITIALIZATION_FAILED
$ find /usr/share/vulkan/ -type f
/usr/share/vulkan/icd.d/nvidia_icd.json
/usr/share/vulkan/icd.d/radeon_icd.x86_64.json
/usr/share/vulkan/explicit_layer.d/VkLayer_INTEL_nullhw.json
/usr/share/vulkan/explicit_layer.d/VkLayer_MESA_overlay.json
/usr/share/vulkan/implicit_layer.d/nvidia_layers.json
/usr/share/vulkan/implicit_layer.d/VkLayer_MESA_device_select.json
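To see what those implicit layers actually load, the manifests can be read directly. A minimal sketch, assuming the usual layer manifest layout (a "layer" object, or a "layers" list, with "name", "library_path" and, for implicit layers, "disable_environment"). The function name is hypothetical. It reports which disable variables each manifest declares rather than guessing whether they are set.

```python
import json
from pathlib import Path

LAYER_DIR = Path("/usr/share/vulkan/implicit_layer.d")


def implicit_layers(layer_dir: Path = LAYER_DIR) -> list:
    """Return (layer name, library_path, disable variable names) per layer."""
    found = []
    for manifest in sorted(layer_dir.glob("*.json")):
        data = json.loads(manifest.read_text())
        # A manifest holds either a single "layer" object or a "layers" list.
        for layer in data.get("layers") or [data.get("layer", {})]:
            disable = sorted(layer.get("disable_environment", {}))
            found.append((layer.get("name"), layer.get("library_path"), disable))
    return found


if __name__ == "__main__":
    for name, library, disable in implicit_layers():
        print(f"{name}: {library} (disable vars: {', '.join(disable) or 'none'})")
```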

Note that I do not have lib32-vulkan-mesa installed. It seems those packages are in the multilib repo, but I'm not running any 32-bit applications.
Is this actually required?


#4 2023-11-26 04:29:05

Tim-Rex
Member
Registered: 2023-11-25
Posts: 7

Re: Blacklisting nvidia doesn't prevent nvidia-modprobe trying [SOLVED]

Okay, I've found the issue.

I needed to install lavapipe (vulkan-swrast)

Everything looks good!


#5 2023-11-26 11:33:45

Lone_Wolf
Administrator
From: Netherlands, Europe
Registered: 2005-10-04
Posts: 14,885

Re: Blacklisting nvidia doesn't prevent nvidia-modprobe trying [SOLVED]

Note that I do not have lib32-vulkan-mesa installed. It seems those packages are in the multilib repo, but I'm not running any 32-bit applications.

Steam & Wine are the two most common applications that require 32-bit support; if you don't use them, you don't need it.

No clue why vulkaninfo fails for you without vulkan-swrast.



#6 2023-11-26 11:50:52

Tim-Rex
Member
Registered: 2023-11-25
Posts: 7

Re: Blacklisting nvidia doesn't prevent nvidia-modprobe trying [SOLVED]

Darn...
In my excitement at getting lavapipe running, I thought I had resolved the nvidia-modprobe issue. Sadly not.

When I'm running the nouveau drivers, the Vulkan loader still asks the nVidia ICD what it offers, and that triggers nvidia-modprobe to try to load the drivers despite the boot-time blacklisting. And it turns out that removing nvidia-modprobe prevents the nvidia drivers from loading when I do want them (booting without nouveau).

The same thing happens with the likes of eglinfo and gbminfo, but less egregiously (they only poll the driver once, while vulkaninfo does so repeatedly).


Tim-Rex wrote:

So.. the question becomes...  What is the best way to stop nvidia-modprobe from happening?

Option 1)
I can move nvidia_icd.json out of the way so that it doesn't get picked up, and that does solve the issue in this particular scenario, but I'd prefer not to be shuffling ICDs around to accommodate whichever boot profile (and blacklisting) I'm running at the time.

Option 2)
I could just remove/rename nvidia-modprobe so that it doesn't get pinged.
That seems to work, though vulkaninfo still tries to load libGLX_nvidia.so.0 and fails (quickly at least)

$ vulkaninfo --summary
ERROR: [Loader Message] Code 0 : loader_scanned_icd_add: Could not get 'vkCreateInstance' via 'vk_icdGetInstanceProcAddr' for ICD libGLX_nvidia.so.0
ERROR: [Loader Message] Code 0 : setup_loader_term_phys_devs:  Failed to detect any valid GPUs in the current config
ERROR at /vulkan-sdk/1.3.268.0/source/Vulkan-Tools/vulkaninfo/vulkaninfo.h:237:vkEnumeratePhysicalDevices failed with ERROR_INITIALIZATION_FAILED

So... Option 2 doesn't work the way I wanted it to: the nvidia drivers don't load at all in this scenario (I had thought the drivers would load automatically if not explicitly blacklisted).

Perhaps what I need here is two separate initramfs configurations.
I'm not sure if this works the way I think it does but, I'm thinking:

- One initramfs with the nvidia drivers built in for early init. This should sidestep the need for nvidia-modprobe to run on demand, maybe?
- A second initramfs without nvidia drivers, purely for nouveau boot scenarios

This way I could safely move nvidia-modprobe out of the way and not have it cause issues in either scenario.
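On Arch, modules are pulled into an initramfs through the MODULES array of a mkinitcpio config, so the two images would differ mainly in that one line. A minimal sketch generating it, assuming the nouveau image simply leaves MODULES empty. The variant names are hypothetical, and whether early-loaded nvidia modules actually remove the need for nvidia-modprobe is exactly the open question here; this code doesn't settle it.

```python
# Module order follows the commonly documented early-loading list for nvidia.
NVIDIA_EARLY = ["nvidia", "nvidia_modeset", "nvidia_uvm", "nvidia_drm"]


def modules_line(variant: str) -> str:
    """Return the mkinitcpio MODULES=(...) line for a hypothetical image variant."""
    if variant == "nvidia":
        modules = NVIDIA_EARLY
    elif variant == "nouveau":
        modules = []  # no nvidia modules baked in; nouveau is left to load as usual
    else:
        raise ValueError(f"unknown variant: {variant!r}")
    return f"MODULES=({' '.join(modules)})"


if __name__ == "__main__":
    for variant in ("nvidia", "nouveau"):
        print(f"{variant}: {modules_line(variant)}")
```

Each line would go into its own config file, which mkinitcpio can build from with `-c <config> -g <image>`.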


I'm not too clear on how Fedora packages manage this, nvidia-modprobe isn't part of their distribution. nVidia have a ton of documentation on the various ways to package and distribute their driver but much of that is above my head..  I know Fedora use akmods and perhaps that sidesteps the need for nvidia-modprobe entirely.

I think this is going to be a problem for another day.

Last edited by Tim-Rex (2023-11-26 11:51:54)


#7 2023-11-26 17:03:50

seth
Member
From: Don't DM me only for attention
Registered: 2012-09-03
Posts: 74,254

Re: Blacklisting nvidia doesn't prevent nvidia-modprobe trying [SOLVED]

"modprobe.blacklist" only ignores the device aliases: the module won't be loaded automatically when matching HW is found, but it can still be loaded by other means.
"module_blacklist=modname1,modname2,modname3" will prevent those modules from being loaded no matter what.

I didn't read the rest of the thread, so I don't know whether there's a better solution than blacklisting modules.
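The difference can be checked against the current boot. A minimal sketch that reads both options from a kernel command line string (for example the contents of /proc/cmdline) and labels a module according to the behaviour described in this post. The function names and the returned labels are hypothetical; module names are compared literally (no dash/underscore normalisation), and repeated options are simply merged.

```python
def blacklists(cmdline: str) -> dict:
    """Collect modules named by modprobe.blacklist= and module_blacklist= options."""
    found = {"modprobe.blacklist": set(), "module_blacklist": set()}
    for option in cmdline.split():
        key, sep, value = option.partition("=")
        if sep and key in found:
            found[key].update(m for m in value.split(",") if m)
    return found


def load_policy(module: str, cmdline: str) -> str:
    """Describe how the kernel command line restricts loading of one module."""
    lists = blacklists(cmdline)
    if module in lists["module_blacklist"]:
        return "never loaded"  # refused no matter who asks
    if module in lists["modprobe.blacklist"]:
        return "not auto-loaded by alias; explicit loading still possible"
    return "unrestricted"


if __name__ == "__main__":
    with open("/proc/cmdline") as f:
        cmdline = f.read()
    for name in ("nvidia", "nouveau", "amdgpu"):
        print(f"{name}: {load_policy(name, cmdline)}")
```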


#8 2023-11-27 12:15:59

Tim-Rex
Member
Registered: 2023-11-25
Posts: 7

Re: Blacklisting nvidia doesn't prevent nvidia-modprobe trying [SOLVED]

Very near to a complete solution here.
Long story short, it should be entirely sufficient to do away with nvidia-modprobe, except that when booting with only the nvidia drivers they fail to fully initialise.

This results in the various ICDs for EGL/Vulkan failing to return any useful device. nvidia-modprobe usually handles this as a fallback, but it gets in the way when we're not running the nvidia drivers.

This also causes problems for GDM since it's unable to locate a suitable EGL device on startup and falls back into X11 mode rather than wayland.

I've found that the nvidia drivers need a little nudge to get them to initialise properly, either by manually running nvidia-modprobe (e.g. if you just moved/renamed it rather than removing it entirely) or by running vulkaninfo as root.

There are probably other (more correct) ways to get the drivers to complete their initialisation, but I'm waiting to see what comes of another thread on the subject.
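One way to see what the nudge changes is to compare the loaded nvidia modules before and after running nvidia-modprobe or vulkaninfo as root. A minimal sketch reading /proc/modules, whose first field on each line is the module name. It assumes the incomplete initialisation shows up as a missing module, which the thread doesn't confirm (it could equally be missing device nodes, which this doesn't check). The function names are hypothetical.

```python
from pathlib import Path

NVIDIA_MODULES = ("nvidia", "nvidia_modeset", "nvidia_uvm", "nvidia_drm")


def loaded_modules(proc_modules_text: str) -> set:
    """Module names are the first field of each /proc/modules line."""
    return {line.split()[0] for line in proc_modules_text.splitlines() if line.strip()}


def missing_nvidia_modules(proc_modules_text: str) -> list:
    """List the nvidia modules that are not currently loaded, in a fixed order."""
    loaded = loaded_modules(proc_modules_text)
    return [m for m in NVIDIA_MODULES if m not in loaded]


if __name__ == "__main__":
    missing = missing_nvidia_modules(Path("/proc/modules").read_text())
    print("missing:", ", ".join(missing) or "none")
```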
