Hi,
I have a fairly old computer (from around 2008) with an old Nvidia GPU. I just added a fairly new Nvidia graphics card (GTX 1060) to it, because I want to do some CUDA work on it.
As far as I understand from what I have read, I need to replace nouveau with the proprietary Nvidia driver to be able to use CUDA. Nouveau works fine on my old graphics card, so I would rather stick with it if I can. However, I wouldn't mind replacing it with the proprietary driver if that works.
So I tried to install the proprietary driver. But when I install it, nouveau seems to be removed for both graphics cards. (I made that mistake earlier and chose to reinstall from scratch.) My display stopped working, even after I tried to un-blacklist nouveau.
The old and new Nvidia drivers also seem to target different kernel versions, so I guess they are not compatible with each other. If I install the old version, my old graphics card (the one connected to the display) would work, but not the new one for CUDA, and vice versa. But I want both a functional display and CUDA.
So my question is: how can I use CUDA on my new GPU while keeping the display working on my old graphics card?
Thanks for your help!
Cheers,
Last edited by dmidge (2019-01-06 21:56:26)
Looking at dependencies for cuda it may be possible.
Try installing nvidia-utils but NOT nvidia .
In order to keep nouveau working you'll probably have to manually remove /usr/share/X11/xorg.conf.d/10-nvidia-drm-outputclass.conf .
Edit: also install opencl-nvidia .
Last edited by Lone_Wolf (2018-12-08 12:50:51)
Disliking systemd intensely, but not satisfied with alternatives so focusing on taming systemd.
(A works at time B) && (time C > time B ) ≠ (A works at time C)
Hi Lone_Wolf,
Thanks for your help!
Unfortunately, I ended up with a black screen at reboot time.
For both GPU, lspci says that the module in use is nouveau. (I just renamed /usr/share/X11/xorg.conf.d/10-nvidia-drm-outputclass.conf to /usr/share/X11/xorg.conf.d/10-nvidia-drm-outputclass.confDel, in case I need to restore it. I don't think it would change anything.)
With dmesg, I get audit information. Nouveau loads and, a bit later, a core dump seems to be triggered. I guess it comes from sddm (the display manager that starts the KDE Plasma desktop), because that name pops up in some of the messages.
That is consistent with what I see in systemctl, where its status is failed with "6/ABRT" (abort, I guess).
With journalctl -u sddm, I get a bit more info. The first error is "Failed to read display number from pipe", which comes just after the command "Running: /usr/bin/X -nolisten tcp auth [...]" ([...] because I shortened the line)
I renamed /etc/X11/xorg.conf to /etc/X11/xorg.confDel and managed to get the display back. I will now test whether I can compile some CUDA code.
Last edited by dmidge (2018-12-08 18:45:10)
Sadly, CUDA is still not working and I don't know how to fix it. Even nvidia-smi cannot access the card...
Actually, at this stage, I am not even sure that I can make CUDA work, even though I wouldn't mind scrapping my old GPU...
Okay. Installing linux419-nvidia and blacklisting nouveau gets CUDA working. But now my first card doesn't work and I only have a CLI.
How can I make nouveau work again alongside that?
You normally can't use both. The two kernel modules clash with each other so nouveau needs to be blacklisted to be able to use the nvidia module.
Maybe there's a way to have a certain kernel module only get used for one PCIe device but not another one?
That is what I am looking for. Is there a way? Maybe through a proper xorg.conf? Or some special modprobe.conf file?
Looking online, it may be more of a job for udev. But I don't know.
Last edited by dmidge (2018-12-09 20:46:08)
This needs to be solved way before Xorg.conf is applied, custom udev rules might be "exactly what the doctor ordered" .
Let's hope some people that know how to write those respond.
@Lone_Wolf: I think you are right, but do you have any idea how to find someone I could poke for help with it?
(Merry Christmas to whom it applies. )
Last edited by dmidge (2018-12-25 19:43:56)
Of course you can use the 1060 for display AND cuda and throw out the old GPU.
One of the legacy versions may support both GPUs. See https://www.nvidia.com/object/unix.html and install with pacman if you find a suitable version.
Finally, I think it may be possible to run nouveau and nvidia side by side.
Make sure the old GPU is the one picked up by BIOS/UEFI. Typically a matter of rearranging them in PCIe slots.
Restore nouveau because that's the graphics stack you will use. If installation of nvidia removed some conflicting packages like mesa, reinstall them (check /var/log/pacman.log for that). If it installed module blacklists, remove them and blacklist nvidia instead (run "pacman -Ql nvidia" to find out what files it installed).
Once you have nouveau graphics running, reserve the 1060 like for GPU passthrough so that nouveau doesn't touch it during boot.
Manually load nvidia kernel module and see what happens. It won't touch the old GPU used by nouveau. It may bitch about nouveau being loaded. If you are lucky, it may still bind to the 1060 though and work with it.
Depending on how exactly the nvidia package is found to interfere with nouveau, some fiddling may be necessary to prevent problems at future updates.
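To find leftover blacklists by hand, a search over the modprobe configuration directories can help. This is a minimal sketch, assuming the standard /etc/modprobe.d and /usr/lib/modprobe.d locations; the function name find_blacklists is made up for this example.

```shell
# List modprobe configuration lines that blacklist a given module.
# Usage: find_blacklists MODULE [DIR...]
# Without DIR arguments the standard modprobe.d locations are searched.
find_blacklists() {
    module=$1
    shift
    [ "$#" -gt 0 ] || set -- /etc/modprobe.d /usr/lib/modprobe.d
    for dir in "$@"; do
        [ -d "$dir" ] || continue
        # -r: recurse, -n: show line numbers, -E: extended regex.
        # Commented-out lines are skipped because the match is anchored.
        grep -rnE "^[[:space:]]*blacklist[[:space:]]+${module}([[:space:]]|\$)" "$dir" || true
    done
    return 0
}

# Example on a real system:
#   find_blacklists nouveau
#   find_blacklists nvidia
```

Each hit is printed as file:line:text, which shows which file would need editing or overriding.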
@mich41:
Well, I can't remove the old GPU. It is soldered onto the motherboard (as in a lot of laptops). And sadly, there is no driver version that supports both. It seems that is also because Nvidia's old GPU drivers don't support the "new" Linux kernels.
Sadly, it is very tough to get nouveau working again once the nvidia proprietary drivers are loaded. But it finally worked. Then I tried to load the nvidia module by hand. It just won't load because of the conflict with nouveau (and nouveau can use the new card). So I would need a way to unbind nouveau from this specific card while keeping it running on the other one, and then try to load the nvidia driver. Which is exactly the problem we are trying to solve with udev.
You could try to pass your cuda gpu to a virtual machine and only install the nvidia drivers there.
| alias CUTF='LANG=en_XX.UTF-8@POSIX ' |
Sadly, it is very tough to get nouveau working again once the nvidia proprietary drivers are loaded. But it finally worked.
Great, getting nouveau to work without uninstalling nvidia is half the job done.
Then I tried to load the nvidia module by hand. It just won't load because of the conflict with nouveau (and nouveau can use the new card). So I would need a way to unbind nouveau from this specific card while keeping it running on the other one, and then try to load the nvidia driver. Which is exactly the problem we are trying to solve with udev.
I don't think it can be achieved with a udev rule. Problem is, once any of these kernel modules is loaded, it will bind to both devices, regardless of which device triggered the loading of this module. Later the other module will find both devices busy and fail to work. I don't think there is any way to stop that, except by binding the dummy pci-stub driver to the 1060 beforehand, like people do to reserve GPUs for VM passthrough. (Well, technically, you could edit nouveau code and recompile, but come on).
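For reference, reserving a device this way is usually done with the pci-stub.ids kernel parameter, which takes the vendor:device ID that lspci -nn prints in brackets. This is a rough sketch, assuming pci-stub is built into the kernel so the parameter is honoured at boot; stub_param_for_slot is a hypothetical helper name and 03:00.0 is only a placeholder slot.

```shell
# Read `lspci -nn` output on stdin and print the pci-stub kernel parameter
# for the device at slot $1 (for example 03:00.0).
stub_param_for_slot() {
    slot=$1
    # Keep only the line for the requested slot, then extract the last
    # [vvvv:dddd] group, which is the vendor:device ID.
    id=$(awk -v s="$slot" '$1 == s' |
         sed -n 's/.*\[\([0-9a-f]\{4\}:[0-9a-f]\{4\}\)\].*/\1/p' |
         head -n 1)
    [ -n "$id" ] || return 1
    printf 'pci-stub.ids=%s\n' "$id"
}

# Example on a real system:
#   lspci -nn | stub_param_for_slot 03:00.0
```

The printed string would go on the kernel command line in the bootloader configuration.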
If you are lucky, it may be enough to unbind nouveau from the 1060 after it has loaded and then load nvidia, but this approach may also fail for reasons including nouveau being buggy and crashing or nvidia failing to properly initialize a device previously initialized by nouveau.
To try it anyway, make sure that no X server is running on the 1060 and do
# lspci | grep VGA
01:05.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] RS780L [Radeon 3000] <- note the numbers on the left
# echo 0000:01:05.0 > /sys/module/nouveau/drivers/pci\:nouveau/unbind <- yes, you need to add 0000:
This is obviously an integrated Radeon GPU, you are going to use numbers from your 1060.
If this succeeds and the machine doesn't crash, reload nvidia
rmmod nvidia
modprobe nvidia
Post any errors printed by modprobe or dmesg after nvidia loading, if the machine survives.
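The same unbind step can be wrapped in a small helper that adds the 0000: domain and refuses to write when the device is not bound to the driver in question. This is only a sketch: pci_full_addr, unbind_pci and the SYSFS_ROOT variable are names invented here, and /sys/bus/pci/drivers/<driver> is the directory that the /sys/module/<driver>/drivers/pci:<driver> path links to.

```shell
# Normalise a short lspci slot ("03:00.0") to the full sysfs address
# ("0000:03:00.0"); addresses that already carry a domain pass through.
pci_full_addr() {
    case $1 in
        ????:??:??.?) printf '%s\n' "$1" ;;
        ??:??.?)      printf '0000:%s\n' "$1" ;;
        *)            return 1 ;;
    esac
}

# Unbind device $2 from driver $1. SYSFS_ROOT defaults to /sys and only
# exists so the function can be dry-run against a fake directory tree.
unbind_pci() {
    driver=$1
    addr=$(pci_full_addr "$2") || return 1
    drvdir="${SYSFS_ROOT:-/sys}/bus/pci/drivers/$driver"
    if [ ! -e "$drvdir/$addr" ]; then
        echo "$addr is not bound to $driver" >&2
        return 1
    fi
    printf '%s\n' "$addr" > "$drvdir/unbind"
}

# Example on a real system (as root, with no X server on that GPU):
#   unbind_pci nouveau 03:00.0
```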
Hi,
Thanks for the details, mich41.
So, as asked, I tried what you said. I loaded both nvidia and nouveau. Both are loaded at the same time as shown:
# lsmod
Module Size Used by
nouveau 2187264 1
mxm_wmi 16384 1 nouveau
wmi 28672 2 mxm_wmi,nouveau
i2c_algo_bit 16384 1 nouveau
ttm 126976 1 nouveau
nvidia_drm 53248 0
nvidia_modeset 1040384 1 nvidia_drm
nvidia 17313792 1 nvidia_modeset
[...]
This way, I have access to the nouveau folder:
# ls /sys/module/nouveau/drivers/pci\:nouveau
0000:02:00.0 bind module new_id remove_id uevent unbind
The lspci shows:
# lspci | grep VGA
02:00.0 VGA compatible controller: NVIDIA Corporation GT218M [GeForce 315M] (rev a2)
03:00.0 VGA compatible controller: NVIDIA Corporation GP107 [GeForce GTX 1050 Ti] (rev a1)
and then, I get this weird error:
# echo 0000:03:00.0 > /sys/module/nouveau/drivers/pci\:nouveau/unbind
-bash: echo: write error: No such device
I just don't get it...
(Btw, sorry, it is a 1050, not a 1060. I was hesitating between both during my spending adventure...)
# ls /sys/module/nouveau/drivers/pci\:nouveau
0000:02:00.0 bind module new_id remove_id uevent unbind
That suggests only the 315M is bound to the nouveau module.
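Check the output of lspci -k | grep -A 3 VGA (the "Kernel driver in use" line is printed below each device line, so a plain grep VGA would hide it).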
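Since lspci -k prints the driver on an indented line under each device, a short awk filter can pair each VGA device with its driver. This is a sketch, assuming the usual "Kernel driver in use:" wording in the output; vga_drivers is a made-up name.

```shell
# Read `lspci -k` output on stdin and print "SLOT DRIVER" for each VGA
# device, or "SLOT (none)" when no kernel driver is bound to it.
vga_drivers() {
    awk '
        /^[0-9a-f]/ {                      # unindented line: a new device
            if (slot != "") print slot, "(none)"
            slot = ($0 ~ /VGA compatible controller/) ? $1 : ""
        }
        /Kernel driver in use:/ && slot != "" {
            print slot, $NF                # last field is the driver name
            slot = ""
        }
        END { if (slot != "") print slot, "(none)" }
    '
}

# Example on a real system:
#   lspci -k | vga_drivers
```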
That suggests only the 315M is bound to the nouveau module.
That's how it looks.
I was probably wrong in saying that the first driver to load will grab both GPUs. Actually, since the old GPU is no longer supported by NVIDIA, it seems entirely reasonable that if the proprietary driver loads first, it will completely ignore that old GPU and then nouveau will be able to pick it up later.
So nvidia comes first and binds to the 1050Ti, then nouveau takes the internal GPU. The display is driven by nouveau and, guess what, I think there is a chance that cuda may work on the 1050Ti.
I suggest the following checks:
- dmesg | grep -i nvidia : it may say something about which GPUs it found and enabled
- ls /sys/module/nvidia/drivers/pci\:nvidia : I think this or a similar directory should exist and contain 0000:03:00.0
- ls /sys/module/nouveau/drivers/pci\:nouveau : this directory still shouldn't contain 0000:03:00.0
- nvidia-smi -L : from the nvidia-utils package, lists all GPUs currently accessible through the proprietary driver
- try to run any random cuda demo
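The sysfs part of these checks can also be scripted by reading each device's driver link. This is a sketch under the assumption that the display GPU is 0000:02:00.0 and the compute GPU is 0000:03:00.0, as in the lspci output earlier in the thread; bound_driver, check_split and SYSFS_ROOT are names invented for this example.

```shell
# Print the kernel driver bound to PCI device $1 (full address such as
# 0000:03:00.0), or "none". SYSFS_ROOT is only overridden for dry runs.
bound_driver() {
    link="${SYSFS_ROOT:-/sys}/bus/pci/devices/$1/driver"
    if [ -L "$link" ]; then
        # The link points at .../bus/pci/drivers/<name>; keep the last part.
        basename "$(readlink "$link")"
    else
        echo none
    fi
}

# Check that $1 (display GPU) is on nouveau and $2 (compute GPU) on nvidia.
check_split() {
    display=$1
    compute=$2
    [ "$(bound_driver "$display")" = nouveau ] ||
        { echo "display GPU $display is not on nouveau"; return 1; }
    [ "$(bound_driver "$compute")" = nvidia ] ||
        { echo "compute GPU $compute is not on nvidia"; return 1; }
    echo "ok: $display -> nouveau, $compute -> nvidia"
}

# Example on a real system:
#   check_split 0000:02:00.0 0000:03:00.0
```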
Good news, it works!
The X11 display is very slow, however, as if it were rendered on the CPU instead of the GPU... (Even moving the mouse is not fluid, with the computer otherwise doing nothing.)
How can I check that the old GPU is actually used for X rendering? (The KDE desktop is launched through startx.)
Actually, nvidia-smi shows this line:
0 911 G /usr/bin/plasmashell 45MiB
That means that X11 is using the wrong GPU. How can I change that?
Set up an xorg config that explicitly specifies the device X should render to, something like
Section "Device"
Identifier "nouveau"
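#alternatively, if the nouveau X driver is installed, use it instead of modesetting
#Driver "nouveau"
Driver "modesetting"
#bus:device:function of the display GPU, in decimal (lspci's 02:00.0 becomes PCI:2:0:0)
BusID "PCI:2:0:0"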
EndSection
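The BusID value is written as decimal bus:device:function, whereas lspci prints those numbers in hexadecimal, so slots above 09 need converting. Here is a small sketch of that conversion; lspci_to_busid is a made-up helper name.

```shell
# Convert an lspci slot (hex "bus:device.function", with an optional
# "0000:" domain prefix) to the decimal "PCI:bus:device:function" form
# used by the BusID option in xorg.conf.
lspci_to_busid() {
    slot=${1#????:}        # drop a leading four-digit domain if present
    bus=${slot%%:*}
    rest=${slot#*:}
    dev=${rest%%.*}
    fn=${rest#*.}
    # printf interprets the 0x-prefixed arguments as hexadecimal numbers.
    printf 'PCI:%d:%d:%d\n' "0x$bus" "0x$dev" "0x$fn"
}

# Example: lspci_to_busid 02:00.0
```

For the GT218M at 02:00.0 in this thread, that gives PCI:2:0:0.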
@V1del:
I was afraid of getting an answer that involved generating an xorg.conf. I usually have bad experiences with them, for some reason. But it went quite smoothly this time!
I regenerated the Xorg file with:
# Xorg :0 -configure
Then I deleted everything related to the new Nvidia GPU, which gave the following configuration:
Section "ServerLayout"
Identifier "X.org Configured"
Screen 0 "Screen0" 0 0
Screen 1 "Screen1" RightOf "Screen0"
InputDevice "Mouse0" "CorePointer"
InputDevice "Keyboard0" "CoreKeyboard"
EndSection
Section "Files"
ModulePath "/usr/lib/xorg/modules"
FontPath "/usr/share/fonts/misc"
FontPath "/usr/share/fonts/TTF"
FontPath "/usr/share/fonts/OTF"
FontPath "/usr/share/fonts/Type1"
FontPath "/usr/share/fonts/100dpi"
FontPath "/usr/share/fonts/75dpi"
EndSection
Section "Module"
Load "glx"
EndSection
Section "InputDevice"
Identifier "Keyboard0"
Driver "kbd"
EndSection
Section "InputDevice"
Identifier "Mouse0"
Driver "mouse"
Option "Protocol" "auto"
Option "Device" "/dev/input/mice"
Option "ZAxisMapping" "4 5 6 7"
EndSection
Section "Monitor"
Identifier "Monitor0"
VendorName "Monitor Vendor"
ModelName "Monitor Model"
EndSection
Section "Monitor"
Identifier "Monitor1"
VendorName "Monitor Vendor"
ModelName "Monitor Model"
EndSection
Section "Device"
### Available Driver options are:-
### Values: <i>: integer, <f>: float, <bool>: "True"/"False",
### <string>: "String", <freq>: "<f> Hz/kHz/MHz",
### <percent>: "<f>%"
### [arg]: arg optional
#Option "SWcursor" # [<bool>]
#Option "HWcursor" # [<bool>]
#Option "NoAccel" # [<bool>]
#Option "ShadowFB" # [<bool>]
#Option "VideoKey" # <i>
#Option "WrappedFB" # [<bool>]
#Option "GLXVBlank" # [<bool>]
#Option "ZaphodHeads" # <str>
#Option "PageFlip" # [<bool>]
#Option "SwapLimit" # <i>
#Option "AsyncUTSDFS" # [<bool>]
#Option "AccelMethod" # <str>
#Option "DRI" # <i>
Identifier "Card0"
Driver "nouveau"
BusID "PCI:2:0:0"
EndSection
Section "Screen"
Identifier "Screen0"
Device "Card0"
Monitor "Monitor0"
SubSection "Display"
Viewport 0 0
Depth 1
EndSubSection
SubSection "Display"
Viewport 0 0
Depth 4
EndSubSection
SubSection "Display"
Viewport 0 0
Depth 8
EndSubSection
SubSection "Display"
Viewport 0 0
Depth 15
EndSubSection
SubSection "Display"
Viewport 0 0
Depth 16
EndSubSection
SubSection "Display"
Viewport 0 0
Depth 24
EndSubSection
EndSection
Section "Screen"
Identifier "Screen1"
Device "Card1"
Monitor "Monitor1"
SubSection "Display"
Viewport 0 0
Depth 1
EndSubSection
SubSection "Display"
Viewport 0 0
Depth 4
EndSubSection
SubSection "Display"
Viewport 0 0
Depth 8
EndSubSection
SubSection "Display"
Viewport 0 0
Depth 15
EndSubSection
SubSection "Display"
Viewport 0 0
Depth 16
EndSubSection
SubSection "Display"
Viewport 0 0
Depth 24
EndSubSection
EndSection
And that works! Thank you all, guys!
Summary of the steps to solve the problem (for those who come next):
- Install nouveau and make a working X configuration for the display.
- Install the nvidia driver for the new card (it does not support the old one).
- From a tty console, unload all nouveau modules (blacklist them). Neither the X display nor the nouveau module should be loaded at startup.
- Load the nvidia module.
- Then load the nouveau module (after the nvidia module; the order is important).
- Then edit the X configuration file to use only the old GPU for the display.
- Then startx!
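The module-loading part of these steps could be wrapped as below. This is only a sketch: load_in_order and the RUN variable are made up for this example, and by default the function just prints the commands so they can be reviewed before being run as root from a tty with no X server running.

```shell
# Print the manual load sequence from the summary, or execute it when
# RUN=1 is set: nouveau out, nvidia in, nouveau back in, in that order.
load_in_order() {
    for cmd in "modprobe -r nouveau" "modprobe nvidia" "modprobe nouveau"; do
        if [ "${RUN:-0}" = 1 ]; then
            $cmd || return 1      # stop at the first failing step
        else
            echo "$cmd"
        fi
    done
}

# Review first:   load_in_order
# Then, as root:  RUN=1 load_in_order && startx
```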
That's cumbersome. I would definitely give this a try, it should load the proprietary module before nouveau is loaded automatically.
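One mechanism that fits this description is a modprobe softdep, which asks modprobe to load one module before another. Whether that is what was meant here is an assumption; the sketch below, with the made-up function name write_gpu_order and file name gpu-order.conf, only shows what such a rule could look like. Any remaining nouveau blacklist would still need to be removed for nouveau to load automatically at all.

```shell
# Write a modprobe.d fragment asking modprobe to load nvidia before nouveau.
# $1 is the target directory (normally /etc/modprobe.d, which needs root).
write_gpu_order() {
    dir=${1:-/etc/modprobe.d}
    cat > "$dir/gpu-order.conf" <<'EOF'
# Load the proprietary module first so it claims the GPU it supports,
# leaving the legacy GPU for nouveau.
softdep nouveau pre: nvidia
EOF
}

# Example, writing to a scratch directory for review first:
#   write_gpu_order /tmp && cat /tmp/gpu-order.conf
```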
FWIW regarding the xorg config, you really don't need all that cruft, my example as written should suffice and contain everything you need to make xorg do the correct thing.