Hello,
I am trying to set up a new machine with python-tensorflow-cuda, but it will not pick up my GPU.
I am using the onboard GPU for X11 (it switched to this from Wayland when I installed the NVIDIA drivers).
nvidia-smi picks up the GPU:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.26       Driver Version: 430.26       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 2070    Off  | 00000000:02:00.0 Off |                  N/A |
| 20%   31C    P8    24W / 175W |      0MiB /  7982MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
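(For anyone comparing versions: the banner row reports the driver version and the highest CUDA version that driver supports, which is not necessarily the CUDA version TensorFlow was built against. A small sketch for pulling both numbers out of that row; the function name is mine:)

```python
import re

def parse_smi_banner(line):
    """Extract (driver_version, cuda_version) from the nvidia-smi banner row,
    e.g. '| NVIDIA-SMI 430.26  Driver Version: 430.26  CUDA Version: 10.2 |'.
    Returns None if the line does not look like the banner."""
    m = re.search(r"Driver Version:\s*([\d.]+)\s+CUDA Version:\s*([\d.]+)", line)
    return (m.group(1), m.group(2)) if m else None
```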
But when I try from Python:
>>> import tensorflow as tf
>>> tf.test.gpu_device_name()
I get:
2019-06-16 11:48:53.785108: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2019-06-16 11:48:53.815245: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3601000000 Hz
2019-06-16 11:48:53.815537: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5609983ba5f0 executing computations on platform Host. Devices:
2019-06-16 11:48:53.815550: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): <undefined>, <undefined>
Apologies if I missed any rules; I'm not used to posting in the forum.
Bastiaan
*EDIT*
I also have these diagnostics:
[bquast@home ~]$ python
Python 3.7.3 (default, Mar 26 2019, 21:43:19)
[GCC 8.2.1 20181127] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> tf.test.gpu_device_name()
2019-06-16 14:29:38.713557: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-06-16 14:29:38.921056: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3601000000 Hz
2019-06-16 14:29:38.924513: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55922eb466b0 executing computations on platform Host. Devices:
2019-06-16 14:29:38.924541: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): <undefined>, <undefined>
2019-06-16 14:29:38.925998: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2019-06-16 14:29:39.656600: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-06-16 14:29:39.657674: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55923006a470 executing computations on platform CUDA. Devices:
2019-06-16 14:29:39.657697: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): GeForce RTX 2070, Compute Capability 7.5
2019-06-16 14:29:39.657813: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-06-16 14:29:39.658087: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties:
name: GeForce RTX 2070 major: 7 minor: 5 memoryClockRate(GHz): 1.62
pciBusID: 0000:02:00.0
2019-06-16 14:29:39.658160: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcudart.so.10.0'; dlerror: libcudart.so.10.0: cannot open shared object file: No such file or directory
2019-06-16 14:29:39.658193: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcublas.so.10.0'; dlerror: libcublas.so.10.0: cannot open shared object file: No such file or directory
2019-06-16 14:29:39.658222: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcufft.so.10.0'; dlerror: libcufft.so.10.0: cannot open shared object file: No such file or directory
2019-06-16 14:29:39.658249: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcurand.so.10.0'; dlerror: libcurand.so.10.0: cannot open shared object file: No such file or directory
2019-06-16 14:29:39.658277: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcusolver.so.10.0'; dlerror: libcusolver.so.10.0: cannot open shared object file: No such file or directory
2019-06-16 14:29:39.658306: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcusparse.so.10.0'; dlerror: libcusparse.so.10.0: cannot open shared object file: No such file or directory
2019-06-16 14:29:39.660468: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-06-16 14:29:39.660489: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1663] Cannot dlopen some GPU libraries. Skipping registering GPU devices...
2019-06-16 14:29:39.660504: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-06-16 14:29:39.660509: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187] 0
2019-06-16 14:29:39.660512: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0: N
''
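(Side note for anyone hitting the same thing: the "Could not dlopen" lines above mean this TensorFlow build was linked against the CUDA 10.0 sonames while the installed toolkit apparently ships different ones. You can reproduce the loader check outside TensorFlow with a quick ctypes probe; the helper name is mine, and the library names are taken from the log:)

```python
import ctypes

# Library sonames taken from the dlopen errors in the log above.
CUDA_LIBS = [
    "libcudart.so.10.0",
    "libcublas.so.10.0",
    "libcufft.so.10.0",
    "libcurand.so.10.0",
    "libcusolver.so.10.0",
    "libcusparse.so.10.0",
    "libcudnn.so.7",
]

def can_dlopen(name):
    """Return True if the dynamic loader can find and open the library."""
    try:
        ctypes.CDLL(name)
        return True
    except OSError:
        return False

if __name__ == "__main__":
    for lib in CUDA_LIBS:
        print(f"{lib}: {'OK' if can_dlopen(lib) else 'MISSING'}")
```

If any of these print MISSING, TensorFlow will skip registering the GPU exactly as in the log.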
Last edited by bquast (2019-06-18 14:07:23)
I see the same thing. It was working for me with python-tensorflow-opt-cuda-1.13.1-5 but then stopped seeing my GPU and started reporting lack of CPU optimizations when I upgraded to 1.14.0rc1-1. It also reports False for tf.test.is_built_with_cuda(), which I think is reporting a compile-time flag.
I've filed a bug report at #62916, so someone will probably figure it out soon or point out what we're doing wrong. Downgrading to 1.13.1-5 works in the meantime.
Last edited by potatoe (2019-06-17 05:50:54)
I compiled it from source with Bazel (from Git master).
I now get the output below, so it seems to work (though only after a restart).
[bquast@home ~]$ python
Python 3.7.3 (default, Mar 26 2019, 21:43:19)
[GCC 8.2.1 20181127] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> tf.test.gpu_device_name()
2019-06-18 16:04:29.221380: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2019-06-18 16:04:29.254772: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3601000000 Hz
2019-06-18 16:04:29.255350: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x562e5107cfc0 executing computations on platform Host. Devices:
2019-06-18 16:04:29.255363: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): <undefined>, <undefined>
2019-06-18 16:04:29.258328: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2019-06-18 16:04:29.368594: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-06-18 16:04:29.368916: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x562e51970450 executing computations on platform CUDA. Devices:
2019-06-18 16:04:29.368929: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): GeForce RTX 2070, Compute Capability 7.5
2019-06-18 16:04:29.369016: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-06-18 16:04:29.369251: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1645] Found device 0 with properties:
name: GeForce RTX 2070 major: 7 minor: 5 memoryClockRate(GHz): 1.62
pciBusID: 0000:02:00.0
2019-06-18 16:04:29.369800: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.1
2019-06-18 16:04:29.377125: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10
2019-06-18 16:04:29.380814: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10
2019-06-18 16:04:29.382200: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10
2019-06-18 16:04:29.388848: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10
2019-06-18 16:04:29.390114: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10
2019-06-18 16:04:29.402750: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2019-06-18 16:04:29.402831: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-06-18 16:04:29.403134: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-06-18 16:04:29.403380: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1773] Adding visible gpu devices: 0
2019-06-18 16:04:29.403491: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.1
2019-06-18 16:04:29.404074: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1186] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-06-18 16:04:29.404083: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1192] 0
2019-06-18 16:04:29.404087: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1205] 0: N
2019-06-18 16:04:29.404203: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-06-18 16:04:29.404473: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-06-18 16:04:29.404784: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1331] Created TensorFlow device (/device:GPU:0 with 7484 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2070, pci bus id: 0000:02:00.0, compute capability: 7.5)
'/device:GPU:0'
>>>
A package version bump (1.14.0rc1-2) has been released and fixes the problem for me (i.e. the -cuda package includes CUDA support again).
tf.test.is_built_with_cuda() and tf.test.is_gpu_available() both return True now, and it is indeed working and using the GPU for operations when training models, etc.
Also, the CPU optimizations are back with the python-tensorflow-opt-cuda package (or at least I no longer get the warning about my CPU supporting more instructions than the binary was compiled for).
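(If you want to check whether your own CPU actually has the instruction sets that warning mentions, which the opt package assumes, you can read /proc/cpuinfo. A small sketch; the helper names are mine:)

```python
def cpu_flags(cpuinfo_text):
    """Parse the first 'flags' line of /proc/cpuinfo text into a set of flag names."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()

def missing_optimizations(cpuinfo_text,
                          wanted=("sse4_1", "sse4_2", "avx", "avx2", "fma")):
    """Return the wanted instruction-set flags the CPU does NOT report."""
    flags = cpu_flags(cpuinfo_text)
    return [f for f in wanted if f not in flags]

if __name__ == "__main__":
    with open("/proc/cpuinfo") as f:
        print(missing_optimizations(f.read()))  # [] means the opt build is safe
```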
Ah, thank you potatoe, I will try installing it this way.
Also thanks for the point about *opt*; I looked for what the difference was but couldn't actually find it, so this makes sense.