Recently, after an update, CUDA stopped working entirely on the linux-ck kernel. Using OBS with NVENC enabled fails, and so does using CUDA with TensorFlow. However, as soon as I boot into the standard linux kernel, everything works fine.
I'm using nvidia-dkms so the driver works on both kernels. I do have linux-ck-headers, and dkms seems to install the module for it without problems, so nothing else seems wrong except that no part of CUDA works on the linux-ck kernel.
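A quick way to double-check the dkms side, as a minimal sketch only: the helper below (dkms_installed_for is a made-up name, not part of dkms) reads `dkms status` output on stdin and succeeds if some line mentions the given kernel release and is marked installed. It assumes that each status line contains both the kernel release and the word "installed", which may not hold for every dkms version.

```shell
# Hypothetical helper, not part of dkms.
# Succeeds if the dkms status text on stdin has a line that mentions
# kernel release $1 and is marked "installed".
# Assumption: release string and the word "installed" appear on one line.
dkms_installed_for() {
    grep -F -- "$1" | grep -q 'installed'
}
```

Possible usage: dkms status | dkms_installed_for "$(uname -r)" && echo "module installed for the running kernel". A passing check only says the module was built and installed, not that CUDA can use it.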
Any help is appreciated.
EDIT: Tried on linux-zen and it seems to work fine there, so I'm a bit clueless as to why it won't work on linux-ck.
Last edited by SilverMight (2018-08-01 17:13:41)

Are you using modprobed-db also? Maybe you are missing a module. If you can share the project I can try reproducing it locally.
Last edited by inglor (2018-07-30 10:26:14)
I don't believe so. However, I have tried running nvidia-modprobe, to no avail. I'll give that a shot.
The kernel can be found at https://aur.archlinux.org/packages/linux-ck/

Sorry, I wasn't clear. If you tell me the steps to reproduce it, I can give it a go on my PC, which has linux-ck with DKMS and CUDA available. This is why I was asking for a project.
[..]Trying to use OBS with NVENC enabled doesn't work, using CUDA with TensorFlow doesn't work either, etc. etc. However, as soon as I boot into the standard linux kernel, everything works fine. [..]
Is this coming from a project?
My bad, yes. TensorFlow is pretty large, so I'd recommend installing OBS (sudo pacman -S obs-studio), going to File -> Settings -> Output, changing the recording encoder from Software to Hardware (NVENC), and then hitting Start Recording.
Same here.
When I use linux-ck-haswell from repo-ck, I can't run deviceQuery from the CUDA samples:
$ cd "cuda sample's directory"
$ ./bin/x86_64/linux/release/deviceQuery
./bin/x86_64/linux/release/deviceQuery Starting...
 CUDA Device Query (Runtime API) version (CUDART static linking)
cudaGetDeviceCount returned 30
-> unknown error
Result = FAIL
However, I have installed nvidia-dkms and it works:
$ lsmod | grep nvidia
nvidia_drm             45056  10
nvidia_modeset       1093632  8 nvidia_drm
nvidia              14061568  825 nvidia_modeset
drm_kms_helper        196608  2 nvidia_drm,i915
drm                   466944  13 drm_kms_helper,nvidia_drm,i915
ipmi_msghandler        57344  2 ipmi_devintf,nvidia
If I switch back to the linux kernel, then CUDA works fine.
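To make the lsmod check above repeatable, here is a small sketch (missing_nvidia_modules is a hypothetical name) that reads lsmod-style text on stdin and prints every expected NVIDIA module that is not loaded. Including nvidia_uvm in the list is my own assumption: CUDA normally relies on that module and it does not show up in the listing above, but nothing in this thread confirms it is related to the failure.

```shell
# Hypothetical helper: read lsmod-style output on stdin and print each
# expected NVIDIA module that is missing from it (prints nothing if all
# are present). nvidia_uvm is included as an assumption, see the note above.
missing_nvidia_modules() {
    # First column of lsmod output is the module name.
    loaded="$(awk '{print $1}')"
    for mod in nvidia nvidia_modeset nvidia_drm nvidia_uvm; do
        # -x matches the whole line, so "nvidia" does not match "nvidia_drm".
        printf '%s\n' "$loaded" | grep -qx "$mod" || echo "$mod"
    done
}
```

Possible usage: lsmod | missing_nvidia_modules. If it prints a module name, trying modprobe on that module and reading dmesg might show why it did not load.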
Just tried that and got the same results as you.
./deviceQuery Starting...
 CUDA Device Query (Runtime API) version (CUDART static linking)
cudaGetDeviceCount returned 30
-> unknown error
Result = FAIL

Same here (linux-ck with nvidia-dkms, which is supposed to be working fine). Could it be that nvidia-dkms (the module) is built with gcc8 and CUDA only supports gcc7?
I don't think so, since it works fine on any other kernel except the -ck one.

I enabled NUMA on the linux-ck kernel, recompiled, and it works fine.
$ ./deviceQuery
./deviceQuery Starting...
 CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "GeForce GTX 1080"
  CUDA Driver Version / Runtime Version          9.2 / 9.2
  CUDA Capability Major/Minor version number:    6.1
  Total amount of global memory:                 8116 MBytes (8510701568 bytes)
  (20) Multiprocessors, (128) CUDA Cores/MP:     2560 CUDA Cores
  GPU Max Clock rate:                            1835 MHz (1.84 GHz)
  Memory Clock rate:                             5005 Mhz
  Memory Bus Width:                              256-bit
  L2 Cache Size:                                 2097152 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  2048
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Compute Preemption:            Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 66 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 9.2, CUDA Runtime Version = 9.2, NumDevs = 1
Result = PASS
$ uname -a
Linux tiamat 4.17.11-1-ck #1 SMP PREEMPT Wed Aug 1 07:42:19 BST 2018 x86_64 GNU/Linux
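For anyone who wants to check whether their kernel was built with NUMA before recompiling, a minimal sketch follows. numa_status is a hypothetical helper name; it assumes the config is readable at /proc/config.gz (only present when the kernel exposes its config there) or that you pass a path to a plain or gzipped config file yourself.

```shell
# Hypothetical helper: report whether a kernel config enables NUMA.
# Usage: numa_status [config-file]   (defaults to /proc/config.gz)
# Prints "enabled", "disabled", or "unknown" (file not readable).
numa_status() {
    cfg="${1:-/proc/config.gz}"
    if [ ! -r "$cfg" ]; then
        echo "unknown"
        return 2
    fi
    # gzip -cdf decompresses gzip data and passes plain text through as-is,
    # so both /proc/config.gz and an unpacked config file work.
    if gzip -cdf < "$cfg" | grep -q '^CONFIG_NUMA=y$'; then
        echo "enabled"
    else
        echo "disabled"
        return 1
    fi
}
```

Running numa_status with no argument on the booted kernel should print "disabled" on a kernel built without NUMA and "enabled" after a rebuild with it turned on.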
But linux-ck's PKGBUILD says that enabling this feature is not recommended on single-CPU platforms.

With CUDA you are using the GPU as a processor, so it is not a single CPU platform anymore.
Just compiled with NUMA, and it definitely fixed the issue. Thanks for the fix.

I will add a comment to the PKGBUILD for CUDA users and reference this discussion. Thank you.