#1 2020-07-30 14:59:44

joanmanel
Member
Registered: 2012-11-06
Posts: 232

Tensorflow GPU won't work after system update (that updated cudnn)

Everything was working fine until this morning, when I did a system update. The previous system update was last week.

Updated packages that might affect TensorFlow:

kernel from 5.7.9 to 5.7.10
cuda from 10.2 to 11.0
cudnn from 7.6 to 8.0
nvidia from 450.77-3 to 450.57-4
mesa from 20.1.3 to 20.1.4
pycuda-headers from 2019.1.2-5 to 2019.1.2-6
python from 3.8.3 to 3.8.4
tensorflow-opt-cuda from 2.2.0 to 2.3.0rc2-2

When I create the model there are some new messages, but everything seems OK. When I start training, I get the following error:

2020-07-27 12:58:16.554013: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcublas.so.11
2020-07-27 12:58:16.871243: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcudnn.so.8
2020-07-27 12:58:17.050553: F tensorflow/stream_executor/cuda/cuda_dnn.cc:1186] Check failed: cudnnSetRNNMatrixMathType(rnn_desc.get(), math_type) == CUDNN_STATUS_SUCCESS (3 vs. 0)
Aborted (core dumped) 

The problem seems to be directly related to cudnn, which made a big jump from 7.6 to 8.0. According to the TensorFlow website, the last working cudnn version seems to be 7.6, and I have no idea whether TF 2.3 works with 8.0. It's not listed as one of the tested builds (https://www.tensorflow.org/install/source#gpu).

The nvidia-smi output looks OK: it shows the right driver and CUDA version 11.0.

Training on the CPU works fine; the error only appears when I use the GPU.
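
For anyone wanting to reproduce the CPU-versus-GPU check, here is a minimal sketch that asks TensorFlow which GPUs it has registered. The helper name `visible_gpus` is hypothetical; `tf.config.list_physical_devices` is the TensorFlow 2.x call assumed here. An empty list would match the "works but not GPU accelerated" situation.

```python
def visible_gpus():
    """Return the names of GPUs TensorFlow has registered.

    Returns None if TensorFlow cannot be imported at all.
    (visible_gpus is a hypothetical helper, not part of TensorFlow.)
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None
    # An empty list means TensorFlow is running CPU-only.
    return [device.name for device in tf.config.list_physical_devices("GPU")]


if __name__ == "__main__":
    gpus = visible_gpus()
    if gpus is None:
        print("TensorFlow is not importable in this environment")
    elif not gpus:
        print("TensorFlow sees no GPU; training would run on the CPU")
    else:
        print("GPUs visible to TensorFlow:", gpus)
```

Note that this only shows whether a GPU is registered. It would not catch the cudnn 8.0 abort above, which happens later, when the RNN descriptor is set up.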

I previously posted that this was solved because downgrading cudnn to 7.6 seemed to work, but it doesn't: with 7.6 the GPU is not enabled. TensorFlow "works", but without GPU acceleration.

When I run the code with all packages up to date except cudnn, which is at 7.6, I get the following error:

2020-07-30 15:50:02.399292: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudnn.so.8'; dlerror: libcudnn.so.8: cannot open shared object file: No such file or directory
2020-07-30 15:50:02.399305: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1753] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...

So I guess something is still trying to load cudnn 8.
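
To see which of these libraries the dynamic loader can actually find, independently of TensorFlow, a rough sketch using Python's standard `ctypes` module is shown here. The helper name `probe_libraries` is hypothetical. `libcudnn.so.8` and `libcublas.so.11` are taken from the logs above, while `libcudnn.so.7` is an assumption about what the cudnn 7.6 package provides.

```python
import ctypes


def probe_libraries(names):
    """Try to dlopen each shared library and report whether it loaded.

    probe_libraries is a hypothetical helper; it only checks that the
    loader can open the file, not that TensorFlow can use it.
    """
    results = {}
    for name in names:
        try:
            ctypes.CDLL(name)
            results[name] = True
        except OSError:
            # Raised when the library is missing or cannot be loaded.
            results[name] = False
    return results


if __name__ == "__main__":
    # libcudnn.so.7 is an assumed soname for cudnn 7.6.
    candidates = ["libcudnn.so.7", "libcudnn.so.8", "libcublas.so.11"]
    for lib, found in probe_libraries(candidates).items():
        print(f"{lib}: {'loadable' if found else 'NOT loadable'}")
```

If `libcudnn.so.8` is reported as not loadable while the installed TensorFlow build asks for it, that would be consistent with the "Could not load dynamic library" warning above.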
