#1 2022-01-17 07:01:47

Dennis
Member
Registered: 2014-11-04
Posts: 56

[gave up] CUDA not available (5.16.0-arch1-1 with nvidia-dkms 495)?

I have upgraded an older mainboard with a GTX 1660 Ti that I previously used in another Arch system, where it was fully operational, including CUDA.

Now on this machine with a fresh Arch installation, I can't seem to get it working with CUDA despite installing all the drivers and packages.

Kernel:

[dennis@0xDBServer ~]$ uname -a
Linux 0xDBServer 5.16.0-arch1-1 #1 SMP PREEMPT Mon, 10 Jan 2022 20:11:47 +0000 x86_64 GNU/Linux

CPU:

[dennis@0xDBServer ~]$ lscpu
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         36 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  4
  On-line CPU(s) list:   0-3
Vendor ID:               GenuineIntel
  Model name:            Intel(R) Core(TM)2 Quad CPU    Q6600  @ 2.40GHz
    CPU family:          6
    Model:               15
    Thread(s) per core:  1
    Core(s) per socket:  4
    Socket(s):           1
    Stepping:            11
    CPU max MHz:         2403.0000
    CPU min MHz:         1603.0000
    BogoMIPS:            4801.07
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ht
                         tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl cpuid aperfmperf pni dtes64 monitor
                         ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm lahf_lm pti tpr_shadow vnmi flexpriority vpid dtherm
Virtualization features: 
  Virtualization:        VT-x
Caches (sum of all):     
  L1d:                   128 KiB (4 instances)
  L1i:                   128 KiB (4 instances)
  L2:                    8 MiB (2 instances)
NUMA:                    
  NUMA node(s):          1
  NUMA node0 CPU(s):     0-3
Vulnerabilities:         
  Itlb multihit:         KVM: Mitigation: VMX disabled
  L1tf:                  Mitigation; PTE Inversion; VMX EPT disabled
  Mds:                   Vulnerable: Clear CPU buffers attempted, no microcode; SMT disabled
  Meltdown:              Mitigation; PTI
  Spec store bypass:     Vulnerable
  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:            Mitigation; Full generic retpoline, STIBP disabled, RSB filling
  Srbds:                 Not affected
  Tsx async abort:       Not affected

Mainboard:

[dennis@0xDBServer ~]$ sudo dmidecode
# dmidecode 3.3
Getting SMBIOS data from sysfs.
SMBIOS 2.5 present.
54 structures occupying 1991 bytes.
Table at 0x000FB4F0.

Handle 0x0000, DMI type 0, 24 bytes
BIOS Information
	Vendor: American Megatrends Inc.
	Version: V1.7
	Release Date: 07/29/2008
	Address: 0xF0000
	Runtime Size: 64 kB
	ROM Size: 512 kB
	Characteristics:
		ISA is supported
		PCI is supported
		PNP is supported
		APM is supported
		BIOS is upgradeable
		BIOS shadowing is allowed
		ESCD support is available
		Boot from CD is supported
		Selectable boot is supported
		BIOS ROM is socketed
		EDD is supported
		5.25"/1.2 MB floppy services are supported (int 13h)
		3.5"/720 kB floppy services are supported (int 13h)
		3.5"/2.88 MB floppy services are supported (int 13h)
		Print screen service is supported (int 5h)
		8042 keyboard services are supported (int 9h)
		Serial services are supported (int 14h)
		Printer services are supported (int 17h)
		CGA/mono video services are supported (int 10h)
		ACPI is supported
		USB legacy is supported
		LS-120 boot is supported
		ATAPI Zip drive boot is supported
		BIOS boot specification is supported
		Targeted content distribution is supported
	BIOS Revision: 8.13

Handle 0x0001, DMI type 1, 27 bytes
System Information
	Manufacturer: MSI
	Product Name: MS-7350
	Version: 1.0
	Serial Number: To Be Filled By O.E.M.
	UUID: Not Present
	Wake-up Type: Power Switch
	SKU Number: To Be Filled By O.E.M.
	Family: To Be Filled By O.E.M.

Relevant packages:

[dennis@0xDBServer ~]$ pacman -Q | grep cuda
cuda 11.5.1-1
cuda-tools 11.5.1-1

[dennis@0xDBServer ~]$ pacman -Q | grep nvidia
lib32-nvidia-cg-toolkit 3.1-7
lib32-nvidia-utils 495.46-1
lib32-opencl-nvidia 495.46-1
nvidia-cg-toolkit 3.1-6
nvidia-dkms 495.46-2
nvidia-settings 495.46-2
nvidia-utils 495.46-2
opencl-nvidia 495.46-2

I have made sure the drivers are loaded and included in the initramfs:

[dennis@0xDBServer ~]$ cat /etc/mkinitcpio.conf | grep MODULES=\(nvidia
MODULES=(nvidia nvidia_modeset nvidia_uvm nvidia_drm)

[dennis@0xDBServer ~]$ lsmod | grep nvidia
nvidia_drm             73728  2
nvidia_uvm           2560000  0
nvidia_modeset       1155072  4 nvidia_drm
nvidia              36970496  175 nvidia_uvm,nvidia_modeset
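
The module check above can be made a little more systematic. The following is only a sketch: the function name missing_nvidia_modules is made up for illustration, and the module list is simply copied from the MODULES=(...) line quoted above.

```shell
# Print each expected NVIDIA kernel module that is missing from lsmod-style
# text on stdin. The list mirrors the MODULES=(...) line quoted above.
# The body runs in a subshell "( ... )" so no variables leak to the caller.
missing_nvidia_modules() (
    # The first column of lsmod output is the module name; drop the header row.
    loaded=$(awk '$1 != "Module" { print $1 }')
    for mod in nvidia nvidia_modeset nvidia_uvm nvidia_drm; do
        # -x matches whole lines only, so "nvidia" does not match "nvidia_drm".
        printf '%s\n' "$loaded" | grep -qxF "$mod" || echo "$mod"
    done
)

# Usage on a live system (no output means all four modules are loaded):
#   lsmod | missing_nvidia_modules
```

With the lsmod output shown above, this would print nothing, since all four modules are present.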

I have also checked that I am in the video group because I recalled this used to be necessary in the past for some things:

[dennis@0xDBServer ~]$ groups
games systemd-journal video uucp lp input audio wheel dennis

One of the packages also said upon installation to try nvidia-modprobe if CUDA is not available, so I added that to my autostart:

[dennis@0xDBServer ~]$ cat ~/.config/lxsession/LXDE/autostart 
@lxpanel --profile LXDE
@pcmanfm --desktop --profile LXDE
@xscreensaver -no-splash
@sudo nvidia-modprobe -c 0 -u
@conky

Screenshot from nvidia-settings stating that the CUDA cores are there:
[image: nvidia-settings screenshot]

Additional version info via nvidia-smi:

[dennis@0xDBServer ~]$ nvidia-smi
Mon Jan 17 07:04:55 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 495.46       Driver Version: 495.46       CUDA Version: 11.5     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0  On |                  N/A |
|  0%   42C    P8     9W / 130W |    140MiB /  5943MiB |      3%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A       402      G   /usr/lib/Xorg                     138MiB |
+-----------------------------------------------------------------------------+

And yet, despite all of these checks, CUDA is not available (e.g. in Blender), and the deviceQuery sample fails as well:

[dennis@0xDBServer ~]$ /opt/cuda/samples/1_Utilities/deviceQuery/deviceQuery 
/opt/cuda/samples/1_Utilities/deviceQuery/deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 802
-> system not yet initialized
Result = FAIL
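
For reference, the number printed by deviceQuery is a CUDA runtime error code. Below is a small lookup sketch for a few codes that tend to come up with setup problems. The function name cuda_error_name is hypothetical, and the numeric values are quoted from the cudaError enum as I remember it, so they should be verified against driver_types.h in the installed toolkit (assumed here to live under /opt/cuda/include).

```shell
# Map a CUDA runtime error number to its enum name.
# Assumption: values match the cudaError enum in driver_types.h of the
# installed toolkit; only a handful of setup-related codes are listed.
cuda_error_name() {
    case "$1" in
        0)   echo "cudaSuccess" ;;
        3)   echo "cudaErrorInitializationError" ;;
        35)  echo "cudaErrorInsufficientDriver" ;;
        100) echo "cudaErrorNoDevice" ;;
        802) echo "cudaErrorSystemNotReady" ;;
        803) echo "cudaErrorSystemDriverMismatch" ;;
        804) echo "cudaErrorCompatNotSupportedOnDevice" ;;
        999) echo "cudaErrorUnknown" ;;
        *)   echo "unknown code $1" ;;
    esac
}

# Usage:
#   cuda_error_name 802
```

So 802 is neither "no device found" (100) nor "driver too old" (35), which suggests the card and driver version are recognized but something in the driver stack did not finish initializing.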

What am I missing?
Could the mainboard be too old for CUDA?

(Could this be the same problem as in an older thread from 2020, where CUDA was unavailable with a specific kernel version? See https://bbs.archlinux.org/viewtopic.php?id=260036 for that thread.)

Last edited by Dennis (2022-01-18 14:10:23)

#2 2022-01-17 08:02:07

V1del
Forum Moderator
Registered: 2012-10-16
Posts: 21,719

Did you enable passwordless sudo execution for that modprobe command to succeed? Might want to do something akin to https://wiki.archlinux.org/title/NVIDIA … with_NVENC instead.

#3 2022-01-17 08:06:19

Dennis
Member
Registered: 2014-11-04
Posts: 56

V1del wrote:

Did you enable passwordless sudo execution for that modprobe command to succeed? Might want to do something akin to https://wiki.archlinux.org/title/NVIDIA … with_NVENC instead.

Yes, I have passwordless execution of that command set up in my sudoers file. I have also already tried the udev rule you linked, and I even tried running the nvidia-modprobe command manually before trying to access CUDA.

#4 2022-01-17 14:41:03

Dennis
Member
Registered: 2014-11-04
Posts: 56

While setting up dual boot on the same machine, I tried CUDA under Windows. Windows 10 would not let me install the Nvidia drivers at all, so I used Windows 8.1, where CUDA worked with Nvidia driver 472.xxxx. Maybe I need to try an older kernel and an older Nvidia driver version on Arch as well; I saw the 470 driver series in the AUR.

#5 2022-01-17 20:55:15

Dennis
Member
Registered: 2014-11-04
Posts: 56

I have now tried linux-lts510 and the nvidia-470xx packages from the AUR, but the issue remains the same. I do not know what else to look for.

#6 2022-01-17 21:31:03

V1del
Forum Moderator
Registered: 2012-10-16
Posts: 21,719

My only suggestion is to check whether you need to enable the nvidia-persistenced systemd service. You probably don't intend to have xorg running all the time, and afaik the nvidia driver basically suspends the card when xorg isn't running, though you apparently did have xorg active in these first attempts. The error message from the CUDA sample also reads as if it's "just" not entirely initialized by the time you try to run the example.

#7 2022-01-18 07:51:55

Dennis
Member
Registered: 2014-11-04
Posts: 56

I have enabled and started the nvidia-persistenced systemd service as you suggested, and I also added my user to the nvidia-persistenced group (because the files under /var/run/nvidia-persistenced are owned by that group). I also made sure again that my user can access the device files under /dev/nvidia*.
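
A check like the device-file one described above could look roughly as follows. This is a sketch only; inaccessible_nodes is a made-up helper name, and it tests plain read/write permission for the current user, nothing NVIDIA-specific.

```shell
# Print every path given as an argument that the current user cannot both
# read and write. Missing paths are reported as well.
# The body runs in a subshell "( ... )" so the loop variable does not leak.
inaccessible_nodes() (
    for node in "$@"; do
        if [ ! -r "$node" ] || [ ! -w "$node" ]; then
            echo "$node"
        fi
    done
)

# Usage on a live system (no output means every node is readable and writable):
#   inaccessible_nodes /dev/nvidia*
```

If the glob matches nothing, the shell passes the literal pattern /dev/nvidia* through and the function reports it, which would itself indicate that the device nodes were never created.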

Unfortunately the issue remains. I tried inside and outside xorg.

However, a "dmesg | grep error" revealed "nvidia-nvswitch: probe of 0000:00:14.0 failed with error -22". I will have to investigate what that means.
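
One way to follow up on a message like this is to pull out the driver name, PCI address and error code, then ask lspci what actually sits at that address. The helper below is a sketch with a made-up name (parse_probe_failures); it only assumes the "probe of ... failed with error ..." wording quoted above. Kernel error -22 corresponds to EINVAL (invalid argument).

```shell
# Read kernel log lines on stdin and print "driver pci-address error-code"
# for every line of the form
#   <driver>: probe of <pci-address> failed with error <code>
# An optional prefix such as a dmesg timestamp is ignored.
parse_probe_failures() {
    sed -nE 's/^(.*[[:space:]])?([^[:space:]:]+): probe of ([0-9a-fA-F:.]+) failed with error (-?[0-9]+).*$/\2 \3 \4/p'
}

# Usage on a live system (reading dmesg may require root):
#   sudo dmesg | parse_probe_failures
#   lspci -nn -s 0000:00:14.0    # show which device is at the reported address
```

If lspci shows an unrelated chipset device at 00:14.0, the nvswitch probe failure may just be the driver rejecting a device it does not handle, but that is a guess and would need checking on the actual machine.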

Last edited by Dennis (2022-01-18 07:52:13)

#8 2022-01-18 14:12:15

Dennis
Member
Registered: 2014-11-04
Posts: 56

Well, I have given up on this combination of mainboard and GPU. I replaced the card with an even older GeForce 8800 GTS, which I am running with the nouveau driver. I just can't use that machine for CUDA work. Thanks again for all the suggestions.
