
#1 2024-06-02 16:57:55

grneir
Member
Registered: 2024-04-15
Posts: 8

Tensorflow and CUDA

I have installed the latest versions of CUDA and python-tensorflow, but when I import tensorflow in Python the GPU (EDIT: 2070 Super) is not recognized as a device. I summarize my current configuration:

$ pacman -Q|grep nvidia
nvidia 550.78-7
nvidia-utils 550.78-1
opencl-nvidia 550.78-1

$ pacman -Q|grep cuda
cuda 12.5.0-1
cuda-tools 12.5.0-1

$ pacman -Q cudnn
cudnn 9.1.1.17-1

$ pacman -Q|grep tensorflow
python-tensorflow-cuda 2.16.1-6
python-tensorflow-estimator 2.15.0-2
tensorflow-cuda 2.16.1-6

In python:

$ python 
Python 3.12.3 (main, Apr 23 2024, 09:16:07) [GCC 13.2.1 20240417] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
2024-06-02 18:51:51.474493: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
>>> tf.config.list_logical_devices()
2024-06-02 18:51:55.765132: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-06-02 18:51:55.895450: W tensorflow/core/common_runtime/gpu/gpu_device.cc:2251] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
[LogicalDevice(name='/device:CPU:0', device_type='CPU')]
>>> 

Is there anything I am obviously missing?
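Since the warning above complains that TensorFlow "Cannot dlopen some GPU libraries", a quick stdlib check of which CUDA-related libraries the dynamic loader can find at all might help (a sketch; the names are the usual sonames, and `find_library` returns None for anything missing):

```python
import ctypes.util

def find_gpu_libs(names=("cudart", "cublas", "cudnn")):
    """Map each library name to the soname the loader resolves, or None if missing."""
    return {name: ctypes.util.find_library(name) for name in names}

print(find_gpu_libs())
```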

EDIT: I forgot to add the customary CUDA test

$ ./deviceQuery 
./deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "NVIDIA GeForce RTX 2070 SUPER"
  CUDA Driver Version / Runtime Version          12.4 / 12.5
  CUDA Capability Major/Minor version number:    7.5
  Total amount of global memory:                 7960 MBytes (8346206208 bytes)
  (040) Multiprocessors, (064) CUDA Cores/MP:    2560 CUDA Cores
  GPU Max Clock rate:                            1800 MHz (1.80 GHz)
  Memory Clock rate:                             7001 Mhz
  Memory Bus Width:                              256-bit
  L2 Cache Size:                                 4194304 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total shared memory per multiprocessor:        65536 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  1024
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 3 copy engine(s)
  Run time limit on kernels:                     No
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Managed Memory:                Yes
  Device supports Compute Preemption:            Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 7 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 12.4, CUDA Runtime Version = 12.5, NumDevs = 1
Result = PASS

Last edited by grneir (2024-06-02 19:14:41)


#2 2024-06-02 19:35:46

yataro
Member
Registered: 2024-03-09
Posts: 77

Re: Tensorflow and CUDA

python-tensorflow-cuda needs a rebuild because cudnn bumped its soversion. I'll file a bug for this.


#3 2024-06-02 21:21:17

yataro
Member
Registered: 2024-03-09
Posts: 77

Re: Tensorflow and CUDA

The bug turned out to be more complicated than a trivial soversion bump. I reported it here: https://gitlab.archlinux.org/archlinux/ … -/issues/8


#4 2024-06-03 03:58:41

grneir
Member
Registered: 2024-04-15
Posts: 8

Re: Tensorflow and CUDA

Indeed. Thank you for diagnosing it.

If I symlink /usr/lib/libcudnn.so.8 to /usr/lib/libcudnn.so.9, the GPU is recognized by tf.config.list_logical_devices(), but a tensorflow model fit then fails, because the package was compiled against cudnn 8.9.6.

Downgrading cudnn to 8.9.7 from the archive solved the problem.


#5 2024-06-03 14:00:04

FuzzySPb
Member
Registered: 2013-01-21
Posts: 62

Re: Tensorflow and CUDA

grneir, thanks for your message. I ran into the same thing today, and downgrading to cudnn 8.9.7.29-1 indeed solved the GPU detection problem for me too.

But... now I'm a bit confused about how to diagnose such issues, because even now, when the GPU is detected successfully, I see

2024-06-03 15:54:03.238039: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-06-03 15:54:03.460631: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
2024-06-03 15:54:03.461219: I external/local_xla/xla/stream_executor/cuda/cuda_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero. See more at https://github.com/torvalds/linux/blob/v6.0/Documentation/ABI/testing/sysfs-bus-pci#L344-L355
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'), PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]

I.e. it looks like a problem is still there... But initially I had the same message as you. So, just curious: how was the problem with cudnn found?
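(Side note: that repeated NUMA line appears to be informational only — the kernel reports -1 for the card's NUMA node, which is normal on single-socket desktop boards, and TensorFlow just substitutes node 0. The raw value can be read straight from sysfs; the PCI address below is hypothetical, the real one shows up in lspci:)

```python
from pathlib import Path

def pci_numa_node(pci_addr):
    """Return the numa_node sysfs value for a PCI device, or None if the path is absent."""
    p = Path(f"/sys/bus/pci/devices/{pci_addr}/numa_node")
    return int(p.read_text()) if p.exists() else None

print(pci_numa_node("0000:07:00.0"))  # hypothetical address; -1 is what triggers the message
```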


#6 2024-06-03 16:36:36

FuzzySPb
Member
Registered: 2013-01-21
Posts: 62

Re: Tensorflow and CUDA

Actually it still doesn't fully work for me; something isn't right with CUDA. Yes, I can initialize tensorflow with the GPU, but my code crashes here and there, always with the same error message:

/usr/include/c++/13.2.1/bits/stl_vector.h:1125: constexpr std::vector<_Tp, _Alloc>::reference std::vector<_Tp, _Alloc>::operator[](size_type) [with _Tp = pybind11::object; _Alloc = std::allocator<pybind11::object>; reference = pybind11::object&; size_type = long unsigned int]: Assertion '__n < this->size()' failed.

The code seems to be good, as it worked fine previously...

Last edited by FuzzySPb (2024-06-03 16:40:14)


#7 2024-06-04 00:41:27

grneir
Member
Registered: 2024-04-15
Posts: 8

Re: Tensorflow and CUDA

I have not thoroughly tested the resulting configuration. I ran a relatively simple training session of a DNN model and the results essentially matched what I got on a different host with Ubuntu, driver 535 and CUDA 12.3.

But I'm curious; I have some ten small examples and I'll run them all, maybe I can find some issue.

The safe approach might be to downgrade the driver to 535, CUDA to 12.3 and cuDNN to 8. I had started doing that, but I stopped because there were quite a few dependencies, for example gcc12.


#8 2024-06-04 11:05:39

FuzzySPb
Member
Registered: 2013-01-21
Posts: 62

Re: Tensorflow and CUDA

grneir wrote:

The safe approach might be to downgrade the driver to 535, CUDA to 12.3 and cuDNN to 8. I had started doing that, but I stopped because there were quite a few dependencies, for example gcc12.

Yes, this isn't feasible for me due to the gcc12 dependency...
I'll try switching off the GPU and running on the CPU, but I've never done this trick (need to google how) and I'm not sure it will help with this kind of underlying error.
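(For the record, hiding the GPU from tensorflow appears to be just an environment variable, set before the import — a sketch:)

```python
import os

# Hide all CUDA devices from tensorflow; must be set before `import tensorflow`
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

# import tensorflow as tf   # would now report only CPU devices
```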

Last edited by FuzzySPb (2024-06-04 11:06:36)


#9 2024-06-15 15:11:56

yataro
Member
Registered: 2024-03-09
Posts: 77

Re: Tensorflow and CUDA

The bug is fixed now; you should try again!


#10 2024-06-15 18:30:37

crocowhile
Member
Registered: 2009-10-18
Posts: 60

Re: Tensorflow and CUDA

Where is this fixed? The package being distributed is still 9.1.1.17-1


#11 2024-06-15 20:18:10

FuzzySPb
Member
Registered: 2013-01-21
Posts: 62

Re: Tensorflow and CUDA

yataro wrote:

The bug is fixed now; you should try again!

My pacman doesn't see the updated version yet; the mirrors probably need some time to sync. But thanks for your help anyway, much appreciated.

Last edited by FuzzySPb (2024-06-15 20:18:26)


#12 2024-06-15 22:08:16

yataro
Member
Registered: 2024-03-09
Posts: 77

Re: Tensorflow and CUDA

crocowhile wrote:

Where is this fixed? The package being distributed is still 9.1.1.17-1

It's not a cudnn issue but an issue with the tensorflow-cuda build.


#13 2024-06-15 22:29:43

yataro
Member
Registered: 2024-03-09
Posts: 77

Re: Tensorflow and CUDA

FuzzySPb wrote:

My pacman doesn't see the updated version yet; the mirrors probably need some time to sync. But thanks for your help anyway, much appreciated.

In case of confusion: you should look for an update to tensorflow-cuda/python-tensorflow-cuda.


#14 2024-06-19 16:54:54

FuzzySPb
Member
Registered: 2013-01-21
Posts: 62

Re: Tensorflow and CUDA

yataro wrote:

In case of confusion: you should look for an update to tensorflow-cuda/python-tensorflow-cuda.

Yes, no confusion there; my repo was just 4 days behind for some reason...

But I still have a problem. I removed and re-installed python-tensorflow-opt-cuda with all its dependencies. It didn't change the behavior, and I still get an error:

/usr/include/c++/13.2.1/bits/stl_vector.h:1125: constexpr std::vector<_Tp, _Alloc>::reference std::vector<_Tp, _Alloc>::operator[](size_type) [with _Tp = pybind11::object; _Alloc = std::allocator<pybind11::object>; reference = pybind11::object&; size_type = long unsigned int]: Assertion '__n < this->size()' failed.
Aborted (core dumped)

It is a bit strange to me because:
- I have gcc13 installed as a cuda dependency, but I have the recent one, 13.3.0-1, so the reference to 13.2.1 looks a bit odd.
- I tried downgrading gcc13 to the 13.2.1 version, but it didn't change anything.
- It appears to be related to the pybind11 package somehow. I have the recent version of that too, 2.12.0-4, and it is a dependency of tensorflow-opt-cuda.

So I'm a bit lost now, wondering what is broken in my system...
The list of packages that were re-installed:
cuda-12.5.0-1  cudnn-9.1.1.17-1  gcc13-13.3.0-1  gcc13-libs-13.3.0-1  pybind11-2.12.0-4  python-keras-3.3.3-1  python-keras-applications-1.0.8-10  python-pycuda-2024.1-2 python-tensorboard_plugin_wit-1.8.1-8  python-tensorflow-estimator-2.15.0-2  tensorboard-2.16.2-2  tensorflow-opt-cuda-2.16.1-8  python-tensorflow-opt-cuda-2.16.1-8

--- UPDATED ---
I also tried the packages without "opt" (i.e. tensorflow-cuda and python-tensorflow-cuda); they fail with the same error.
Finally, I installed plain tensorflow and python-tensorflow, but they fail with the same error as well... So it is not about CUDA. Maybe someone can give me a hint about what might have gone wrong with my system?

Last edited by FuzzySPb (2024-06-19 17:17:32)


#15 2024-06-19 21:18:40

yataro
Member
Registered: 2024-03-09
Posts: 77

Re: Tensorflow and CUDA

@FuzzySPb Can you try it on a clean system/VM?
It's hard to say anything without a call stack; can you run it via gdb with a breakpoint at abort?
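If gdb is awkward, the stdlib faulthandler module is a lighter-weight alternative: it prints the Python-level traceback when the process dies on a fatal signal (it won't show the native frames that gdb's bt gives, though). A sketch:

```python
import faulthandler

# After this, SIGSEGV/SIGABRT/SIGBUS/etc. dump the Python traceback of every thread
faulthandler.enable()

# ... then run the crashing model code as usual ...
```

The same thing works without editing any code via `python -X faulthandler model.py`.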


#16 2024-06-19 21:39:09

FuzzySPb
Member
Registered: 2013-01-21
Posts: 62

Re: Tensorflow and CUDA

I don't have another system with nvidia, so a clean run isn't a realistic option.
But I think I can run it via gdb tomorrow (if you could advise the right command, that would be useful, as I'd need to google it otherwise).


#17 2024-06-19 21:48:13

yataro
Member
Registered: 2024-03-09
Posts: 77

Re: Tensorflow and CUDA

But you said you still have the problem with regular tensorflow? I guess you can run that one without nvidia/cuda.

gdb --args program ...
> run
# trigger/wait for crash...
> bt

Last edited by yataro (2024-06-19 21:58:13)


#18 2024-06-20 07:10:12

FuzzySPb
Member
Registered: 2013-01-21
Posts: 62

Re: Tensorflow and CUDA

yataro wrote:

But you said you still have a problem with regular tensorflow? I guess you can run it without nvidia/cuda with regular one.

Yes, this is true, and I did the last tests with plain tensorflow.
Here is the gdb output (a bit tricky for me to read it for the first time... so I'm posting it here and will also try to understand it myself). And thanks for the example command.

[username@hostname test_folder]$ gdb --args python model.py
GNU gdb (GDB) 14.2
Copyright (C) 2023 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from python...

This GDB supports auto-downloading debuginfo from the following URLs:
  <https://debuginfod.archlinux.org>
Enable debuginfod for this session? (y or [n]) y
Debuginfod has been enabled.
To make this setting permanent, add 'set debuginfod enabled on' to .gdbinit.
Reading symbols from /home/username/.cache/debuginfod_client/a6ccdcbdebe1cd722ec11273c98fb76c19a1ce22/debuginfo...
(gdb) run
Starting program: /usr/bin/python model.py
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
2024-06-20 08:06:00.320313: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
[Detaching after vfork from child process 4814]
[Detaching after vfork from child process 4816]
[New Thread 0x7fff78c006c0 (LWP 4817)]
/usr/lib/python3.12/site-packages/keras/src/layers/convolutional/base_conv.py:107: UserWarning: Do not pass an `input_shape`/`input_dim` argument to a layer. When using Sequential models, prefer using an `Input(shape)` object as the first layer in the model instead.
  super().__init__(activity_regularizer=activity_regularizer, **kwargs)
/usr/include/c++/13.2.1/bits/stl_vector.h:1125: constexpr std::vector<_Tp, _Alloc>::reference std::vector<_Tp, _Alloc>::operator[](size_type) [with _Tp = pybind11::object; _Alloc = std::allocator<pybind11::object>; reference = pybind11::object&; size_type = long unsigned int]: Assertion '__n < this->size()' failed.

Thread 1 "python" received signal SIGABRT, Aborted.
__pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
44            return INTERNAL_SYSCALL_ERROR_P (ret) ? INTERNAL_SYSCALL_ERRNO (ret) : 0;
(gdb) bt
#0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
#1  0x00007ffff76a8eb3 in __pthread_kill_internal (threadid=<optimized out>, signo=6) at pthread_kill.c:78
#2  0x00007ffff7650a30 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3  0x00007ffff76384c3 in __GI_abort () at abort.c:79
#4  0x00007ffff6ad2d60 in std::__glibcxx_assert_fail (file=<optimized out>, line=<optimized out>, function=<optimized out>, condition=<optimized out>)
    at /usr/src/debug/gcc/gcc/libstdc++-v3/src/c++11/assert_fail.cc:41
#5  0x00007fffcb6e111f in ?? () from /usr/lib/python3.12/site-packages/optree/_C.cpython-312-x86_64-linux-gnu.so
#6  0x00007fffcb6d5971 in ?? () from /usr/lib/python3.12/site-packages/optree/_C.cpython-312-x86_64-linux-gnu.so
#7  0x00007fffcb6c0e8e in ?? () from /usr/lib/python3.12/site-packages/optree/_C.cpython-312-x86_64-linux-gnu.so
#8  0x00007fffcb6ace4a in ?? () from /usr/lib/python3.12/site-packages/optree/_C.cpython-312-x86_64-linux-gnu.so
#9  0x00007ffff79b0a87 in cfunction_call (func=0x7fffcbc7a7f0, args=0x7fffa9125900, kwargs=0x0) at Objects/methodobject.c:537
#10 0x00007ffff7980abb in _PyObject_MakeTpCall (tstate=0x7ffff7e29438 <_PyRuntime+459704>, callable=0x7fffcbc7a7f0, args=<optimized out>, nargs=<optimized out>, keywords=<optimized out>) at Objects/call.c:240
#11 0x00007ffff798931f in _PyEval_EvalFrameDefault (tstate=<optimized out>, frame=<optimized out>, throwflag=<optimized out>) at Python/bytecodes.c:2714
#12 0x00007ffff7983846 in _PyEval_EvalFrame (tstate=0x7ffff7e29438 <_PyRuntime+459704>, frame=0x7ffff7f97310, throwflag=0) at ./Include/internal/pycore_ceval.h:89
#13 _PyEval_Vector (tstate=0x7ffff7e29438 <_PyRuntime+459704>, func=0x7fffcb41e340, locals=0x0, args=0x7fffffffd510, argcount=<optimized out>, kwnames=0x0) at Python/ceval.c:1683
#14 _PyFunction_Vectorcall (func=0x7fffcb41e340, stack=0x7fffffffd510, nargsf=<optimized out>, kwnames=0x0) at Objects/call.c:419
#15 _PyObject_FastCallDictTstate (tstate=<optimized out>, callable=0x7fffcb41e340, args=0x7fffffffd510, nargsf=<optimized out>, kwargs=<optimized out>) at Objects/call.c:133
#16 0x00007ffff79bfaf7 in _PyObject_Call_Prepend (tstate=0x7ffff7e29438 <_PyRuntime+459704>, callable=0x7fffcb41e340, obj=0x7fffa961ec30, args=0x7ffff7dcba98 <_PyRuntime+76312>, kwargs=0x7fffa91259c0)
    at Objects/call.c:508
#17 slot_tp_init (self=0x7fffa961ec30, args=0x7ffff7dcba98 <_PyRuntime+76312>, kwds=0x7fffa91259c0) at Objects/typeobject.c:9020
#18 0x00007ffff7980e71 in type_call (type=<optimized out>, args=0x7ffff7dcba98 <_PyRuntime+76312>, kwds=0x7fffa91259c0) at Objects/typeobject.c:1676
#19 0x00007ffff79c2a5a in _PyObject_Call (tstate=0x7ffff7e29438 <_PyRuntime+459704>, callable=0x55555911b770, args=0x7ffff7dcba98 <_PyRuntime+76312>, kwargs=<optimized out>) at Objects/call.c:367
#20 0x00007ffff798e71b in PyCFunction_Call (callable=0x55555911b770, args=0x7ffff7dcba98 <_PyRuntime+76312>, kwargs=0x7fffa91259c0) at Objects/call.c:387
#21 _PyEval_EvalFrameDefault (tstate=<optimized out>, frame=<optimized out>, throwflag=<optimized out>) at Python/bytecodes.c:3262
#22 0x00007ffff79838e1 in _PyObject_FastCallDictTstate (tstate=0x7ffff7e29438 <_PyRuntime+459704>, callable=0x7fffcb41ca40, args=0x7fffffffd890, nargsf=<optimized out>, kwargs=<optimized out>)
    at Objects/call.c:144
#23 0x00007ffff79bfaf7 in _PyObject_Call_Prepend (tstate=0x7ffff7e29438 <_PyRuntime+459704>, callable=0x7fffcb41ca40, obj=0x7fffab939280, args=0x7ffff7dcba98 <_PyRuntime+76312>, kwargs=0x7fffa91256c0)
    at Objects/call.c:508
#24 slot_tp_init (self=0x7fffab939280, args=0x7ffff7dcba98 <_PyRuntime+76312>, kwds=0x7fffa91256c0) at Objects/typeobject.c:9020
#25 0x00007ffff7980a68 in type_call (type=<optimized out>, args=0x7ffff7dcba98 <_PyRuntime+76312>, kwds=0x7fffa91256c0) at Objects/typeobject.c:1676
#26 _PyObject_MakeTpCall (tstate=0x7ffff7e29438 <_PyRuntime+459704>, callable=0x55555911c670, args=<optimized out>, nargs=<optimized out>, keywords=<optimized out>) at Objects/call.c:240
#27 0x00007ffff798931f in _PyEval_EvalFrameDefault (tstate=<optimized out>, frame=<optimized out>, throwflag=<optimized out>) at Python/bytecodes.c:2714
#28 0x00007ffff79838e1 in _PyObject_FastCallDictTstate (tstate=0x7ffff7e29438 <_PyRuntime+459704>, callable=0x7fffcae8a7a0, args=0x7fffffffdbd0, nargsf=<optimized out>, kwargs=<optimized out>)
    at Objects/call.c:144
#29 0x00007ffff79bfaf7 in _PyObject_Call_Prepend (tstate=0x7ffff7e29438 <_PyRuntime+459704>, callable=0x7fffcae8a7a0, obj=0x7fffa98e8740, args=0x7ffff7dcba98 <_PyRuntime+76312>, kwargs=0x7fffa90ce7c0)
    at Objects/call.c:508
#30 slot_tp_init (self=0x7fffa98e8740, args=0x7ffff7dcba98 <_PyRuntime+76312>, kwds=0x7fffa90ce7c0) at Objects/typeobject.c:9020
#31 0x00007ffff7980a68 in type_call (type=<optimized out>, args=0x7ffff7dcba98 <_PyRuntime+76312>, kwds=0x7fffa90ce7c0) at Objects/typeobject.c:1676
#32 _PyObject_MakeTpCall (tstate=0x7ffff7e29438 <_PyRuntime+459704>, callable=0x5555593d39e0, args=<optimized out>, nargs=<optimized out>, keywords=<optimized out>) at Objects/call.c:240
#33 0x00007ffff798931f in _PyEval_EvalFrameDefault (tstate=<optimized out>, frame=<optimized out>, throwflag=<optimized out>) at Python/bytecodes.c:2714
#34 0x00007ffff7a4d0f5 in PyEval_EvalCode (co=0x555555651740, globals=<optimized out>, locals=0x7ffff760dd00) at Python/ceval.c:578
#35 0x00007ffff7a703ea in run_eval_code_obj (tstate=tstate@entry=0x7ffff7e29438 <_PyRuntime+459704>, co=co@entry=0x555555651740, globals=globals@entry=0x7ffff760dd00, locals=locals@entry=0x7ffff760dd00)
    at Python/pythonrun.c:1722
#36 0x00007ffff7a6b2ef in run_mod (mod=mod@entry=0x5555556698e8, filename=filename@entry=0x7ffff71720d0, globals=globals@entry=0x7ffff760dd00, locals=locals@entry=0x7ffff760dd00,
    flags=flags@entry=0x7fffffffe0d0, arena=arena@entry=0x7ffff752fe10) at Python/pythonrun.c:1743
#37 0x00007ffff7a85924 in pyrun_file (fp=fp@entry=0x555555591c10, filename=filename@entry=0x7ffff71720d0, start=start@entry=257, globals=globals@entry=0x7ffff760dd00, locals=locals@entry=0x7ffff760dd00,
    closeit=closeit@entry=1, flags=0x7fffffffe0d0) at Python/pythonrun.c:1643
--Type <RET> for more, q to quit, c to continue without paging--
#38 0x00007ffff7a84c51 in _PyRun_SimpleFileObject (fp=0x555555591c10, filename=0x7ffff71720d0, closeit=1, flags=0x7fffffffe0d0) at Python/pythonrun.c:433
#39 0x00007ffff7a8480f in _PyRun_AnyFileObject (fp=0x555555591c10, filename=0x7ffff71720d0, closeit=1, flags=0x7fffffffe0d0) at Python/pythonrun.c:78
#40 0x00007ffff7a7d034 in pymain_run_file_obj (program_name=0x7ffff71aef30, filename=0x7ffff71720d0, skip_source_first_line=0) at Modules/main.c:360
#41 pymain_run_file (config=0x7ffff7dcc018 <_PyRuntime+77720>) at Modules/main.c:379
#42 pymain_run_python (exitcode=0x7fffffffe0a4) at Modules/main.c:629
#43 Py_RunMain () at Modules/main.c:709
#44 0x00007ffff7a3860c in Py_BytesMain (argc=<optimized out>, argv=<optimized out>) at Modules/main.c:763
#45 0x00007ffff7639c88 in __libc_start_call_main (main=main@entry=0x555555555120 <main>, argc=argc@entry=2, argv=argv@entry=0x7fffffffe338) at ../sysdeps/nptl/libc_start_call_main.h:58
#46 0x00007ffff7639d4c in __libc_start_main_impl (main=0x555555555120 <main>, argc=2, argv=0x7fffffffe338, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffe328)
    at ../csu/libc-start.c:360
#47 0x0000555555555045 in _start ()
(gdb) exit
A debugging session is active.

        Inferior 1 [process 4802] will be killed.

Quit anyway? (y or n) y
[username@hostname test_folder]$


#19 2024-06-20 07:22:08

FuzzySPb
Member
Registered: 2013-01-21
Posts: 62

Re: Tensorflow and CUDA

If I got it right, I have libstdc++-v3 with version 11 while I should have 13?.. (can't re-check right now which package is installed)


#20 2024-06-20 20:31:45

yataro
Member
Registered: 2024-03-09
Posts: 77

Re: Tensorflow and CUDA

There is a failing assertion in std::vector::operator[]; it comes from optree (a transitive dependency of tensorflow). Are you sure your system is not broken and is fully updated? Is there any other simple model that you can test?
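A way to poke optree in isolation might be something like this (a sketch; `tree_flatten` is optree's basic entry point, the example tree is made up, and optree may not be installed at all):

```python
def check_optree():
    """Flatten a small nested structure with optree, or report that it's missing."""
    try:
        import optree
    except ImportError:
        return "optree not installed"
    # tree_flatten returns (leaves, treespec); if the crash is in optree,
    # even this minimal call may abort
    leaves, treespec = optree.tree_flatten({"a": 1, "b": [2, 3]})
    return leaves

print(check_optree())
```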


#21 2024-06-21 14:07:35

FuzzySPb
Member
Registered: 2013-01-21
Posts: 62

Re: Tensorflow and CUDA

yataro wrote:

There is a failing assertion in std::vector::operator[]; it comes from optree (a transitive dependency of tensorflow). Are you sure your system is not broken and is fully updated? Is there any other simple model that you can test?

Yes, my system is up to date, I'm pretty sure of it.
As for the model: it isn't anything complex, actually. I don't think it is about the model, because I have used it for 2-3 years already without modifications (I only re-train it from time to time) and it worked fine. But I didn't use it from mid-April, and when I came back in May, I found that my tensorflow couldn't use the GPU anymore.
The GPU problem was fixed after the tensorflow update, but then this assertion in std::vector::operator[] appeared. (I think I saw it when I switched off the GPU for a test previously, but I had no time to look deeper into it before. So this problem was probably introduced somewhere in April-May.)

Is my system broken? Good question.
When I first hit this GPU topic I did one stupid thing: I thought it was about keras, so I tried to upgrade it via pip. Then I removed it with pip and installed it back from the Arch repo. It was indeed a bad move, and it affected the packages namex, optree and keras.
Unfortunately, I don't remember whether I saw this assertion before the keras update or not.
I tried to remove all python-* packages and re-install them again, but the problem is still here.

Last edited by FuzzySPb (2024-06-21 14:16:50)


#22 2024-06-21 16:09:51

FuzzySPb
Member
Registered: 2013-01-21
Posts: 62

Re: Tensorflow and CUDA

yataro wrote:

@FuzzySPb Can you try it on a clean system/vm?

I managed to try it on a clean system and the problem was reproduced: I got exactly the same error on a different machine that never had python and tensorflow installed before.
I.e. it isn't a problem with my setup. I think it might be because compilation was done with gcc 13.2.1 while the current repo has 13.3, but I'm not sure, and I have no idea which package is actually at fault here.

Last edited by FuzzySPb (2024-06-24 08:05:27)


#23 2024-06-26 09:25:43

FuzzySPb
Member
Registered: 2013-01-21
Posts: 62

Re: Tensorflow and CUDA

I got newly updated packages (tensorflow, pycuda) today, and they brought completely new behavior and a new error message:

A module that was compiled using NumPy 1.x cannot be run in
NumPy 2.0.0 as it may crash. To support both 1.x and 2.x
versions of NumPy, modules must be compiled with NumPy 2.0.
Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.

I think this addresses my problem. But the strange thing is that I already have pybind11 2.12.0-6. I need to check which exact module produces this message and how to get rid of it.
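(To see the versions Python itself resolves, independently of pacman, the stdlib can report them — a sketch; the package names are just the ones suspected here:)

```python
from importlib import metadata

def installed_versions(names=("numpy", "pybind11", "optree")):
    """Return the importable distribution version for each name, or None if absent."""
    out = {}
    for name in names:
        try:
            out[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            out[name] = None
    return out

print(installed_versions())
```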

Last edited by FuzzySPb (2024-06-26 09:27:58)


#24 2024-06-26 17:26:22

FuzzySPb
Member
Registered: 2013-01-21
Posts: 62

Re: Tensorflow and CUDA

Ok, it seems it isn't solved after all.
I checked and found that numpy was upgraded to 2.0.0 only 4 days ago, so I did a quick rollback to 1.26.4, and that immediately brought back the initial error.
So, with numpy 1.26.4 it gives

/usr/include/c++/13.2.1/bits/stl_vector.h:1125: constexpr std::vector<_Tp, _Alloc>::reference std::vector<_Tp, _Alloc>::operator[](size_type) [with _Tp = pybind11::object; _Alloc = std::allocator<pybind11::object>; reference = pybind11::object&; size_type = long unsigned int]: Assertion '__n < this->size()' failed.

And with numpy 2.0.0 it gives

ImportError: 
A module that was compiled using NumPy 1.x cannot be run in
NumPy 2.0.0 as it may crash. To support both 1.x and 2.x
versions of NumPy, modules must be compiled with NumPy 2.0.
Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.

If you are a user of the module, the easiest solution will be to
downgrade to 'numpy<2' or try to upgrade the affected module.
We expect that some modules will need time to support NumPy 2.



A module that was compiled using NumPy 1.x cannot be run in
NumPy 2.0.0 as it may crash. To support both 1.x and 2.x
versions of NumPy, modules must be compiled with NumPy 2.0.
Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.

If you are a user of the module, the easiest solution will be to
downgrade to 'numpy<2' or try to upgrade the affected module.
We expect that some modules will need time to support NumPy 2.

Traceback (most recent call last):
  File "./model.py", line 2, in <module>
    import tensorflow as tf
  File "/usr/lib/python3.12/site-packages/tensorflow/__init__.py", line 44, in <module>
    from tensorflow._api.v2 import __internal__
  File "/usr/lib/python3.12/site-packages/tensorflow/_api/v2/__internal__/__init__.py", line 8, in <module>
    from tensorflow._api.v2.__internal__ import autograph
  File "/usr/lib/python3.12/site-packages/tensorflow/_api/v2/__internal__/autograph/__init__.py", line 8, in <module>
    from tensorflow.python.autograph.core.ag_ctx import control_status_ctx # line: 34
  File "/usr/lib/python3.12/site-packages/tensorflow/python/autograph/core/ag_ctx.py", line 21, in <module>
    from tensorflow.python.autograph.utils import ag_logging
  File "/usr/lib/python3.12/site-packages/tensorflow/python/autograph/utils/__init__.py", line 17, in <module>
    from tensorflow.python.autograph.utils.context_managers import control_dependency_on_returns
  File "/usr/lib/python3.12/site-packages/tensorflow/python/autograph/utils/context_managers.py", line 19, in <module>
    from tensorflow.python.framework import ops
  File "/usr/lib/python3.12/site-packages/tensorflow/python/framework/ops.py", line 49, in <module>
    from tensorflow.python.client import pywrap_tf_session
  File "/usr/lib/python3.12/site-packages/tensorflow/python/client/pywrap_tf_session.py", line 19, in <module>
    from tensorflow.python.client._pywrap_tf_session import *
Traceback (most recent call last):
  File "/usr/lib/python3.12/site-packages/numpy/core/_multiarray_umath.py", line 44, in __getattr__
    raise ImportError(msg)
ImportError: 
A module that was compiled using NumPy 1.x cannot be run in
NumPy 2.0.0 as it may crash. To support both 1.x and 2.x
versions of NumPy, modules must be compiled with NumPy 2.0.
Some module may need to rebuild instead e.g. with 'pybind11>=2.12'.

If you are a user of the module, the easiest solution will be to
downgrade to 'numpy<2' or try to upgrade the affected module.
We expect that some modules will need time to support NumPy 2.


Traceback (most recent call last):
  File "./model.py", line 2, in <module>
    import tensorflow as tf
  File "/usr/lib/python3.12/site-packages/tensorflow/__init__.py", line 44, in <module>
    from tensorflow._api.v2 import __internal__
  File "/usr/lib/python3.12/site-packages/tensorflow/_api/v2/__internal__/__init__.py", line 8, in <module>
    from tensorflow._api.v2.__internal__ import autograph
  File "/usr/lib/python3.12/site-packages/tensorflow/_api/v2/__internal__/autograph/__init__.py", line 8, in <module>
    from tensorflow.python.autograph.core.ag_ctx import control_status_ctx # line: 34
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/tensorflow/python/autograph/core/ag_ctx.py", line 21, in <module>
    from tensorflow.python.autograph.utils import ag_logging
  File "/usr/lib/python3.12/site-packages/tensorflow/python/autograph/utils/__init__.py", line 17, in <module>
    from tensorflow.python.autograph.utils.context_managers import control_dependency_on_returns
  File "/usr/lib/python3.12/site-packages/tensorflow/python/autograph/utils/context_managers.py", line 20, in <module>
    from tensorflow.python.ops import tensor_array_ops
  File "/usr/lib/python3.12/site-packages/tensorflow/python/ops/tensor_array_ops.py", line 36, in <module>
    from tensorflow.python.ops import array_ops
  File "/usr/lib/python3.12/site-packages/tensorflow/python/ops/array_ops.py", line 22, in <module>
    from tensorflow.dtensor.python import api as d_api
  File "/usr/lib/python3.12/site-packages/tensorflow/dtensor/python/api.py", line 21, in <module>
    from tensorflow.dtensor.python import dtensor_device
  File "/usr/lib/python3.12/site-packages/tensorflow/dtensor/python/dtensor_device.py", line 33, in <module>
    from tensorflow.python.framework import sparse_tensor
  File "/usr/lib/python3.12/site-packages/tensorflow/python/framework/sparse_tensor.py", line 28, in <module>
    from tensorflow.python.framework import override_binary_operator
  File "/usr/lib/python3.12/site-packages/tensorflow/python/framework/override_binary_operator.py", line 24, in <module>
    from tensorflow.python.ops.numpy_ops import np_dtypes
  File "/usr/lib/python3.12/site-packages/tensorflow/python/ops/numpy_ops/np_dtypes.py", line 30, in <module>
    complex_ = np.complex_
               ^^^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/numpy/__init__.py", line 397, in __getattr__
    raise AttributeError(
AttributeError: `np.complex_` was removed in the NumPy 2.0 release. Use `np.complex128` instead.. Did you mean: 'complex64'?

And I have pybind11 2.12.0-6 installed.

The same situation is on a clean system.
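(The np.complex_ failure itself is easy to work around in one's own code while waiting for a rebuild — np.complex_ was just an alias for np.complex128 in NumPy 1.x — though that obviously doesn't fix tensorflow's internal use of it. A sketch, tolerating a missing numpy:)

```python
def complex_alias():
    """Return np.complex_ where it still exists (NumPy 1.x), else np.complex128."""
    try:
        import numpy as np
    except ImportError:
        return None  # numpy not installed at all
    # getattr falls back to the canonical NumPy 2.x name
    return getattr(np, "complex_", np.complex128)

print(complex_alias())
```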

Last edited by FuzzySPb (2024-06-26 17:26:57)


#25 2024-06-26 23:23:33

crocowhile
Member
Registered: 2009-10-18
Posts: 60

Re: Tensorflow and CUDA

yataro wrote:
crocowhile wrote:

Where is this fixed? The package being distributed is still 9.1.1.17-1

It's not cudnn issue but issue with tensorflow-cuda build.

I see. I was confused because my problem was with the tensorflow cuda docker container. I can confirm that cudnn 9.1.1 works well with the latest https://hub.docker.com/layers/tensorflo … xt=explore.

