
#1 2019-04-14 18:43:43

Stiege
Member
Registered: 2013-09-24
Posts: 10

Confused about loading shared objects

Hi all,

I've been trying to get TensorFlow 2.0-alpha working, and in the course of that I've come across something I don't fully understand. I'd appreciate it if someone could fill in the gaps.

In short, the question is why some shared libraries seem to be available to Python and not others:

[stiege@archie ~]$ locate libcublas.so.10
/opt/cuda/targets/x86_64-linux/lib/libcublas.so.10
/opt/cuda/targets/x86_64-linux/lib/libcublas.so.10.1
/opt/cuda/targets/x86_64-linux/lib/libcublas.so.10.1.0
/opt/cuda/targets/x86_64-linux/lib/libcublas.so.10.1.0.105

[stiege@archie ~]$ ls -l /opt/cuda/lib64
lrwxrwxrwx 1 root root 24 Mar 26 23:16 /opt/cuda/lib64 -> targets/x86_64-linux/lib
[stiege@archie ~]$ cat /etc/ld.so.conf.d/cuda.conf 
/opt/cuda/lib64
/opt/cuda/nvvm/lib64
/opt/cuda/extras/CUPTI/lib64

[stiege@archie ~]$ ls -l /opt/cuda/lib64/ | grep libcublas.so
lrwxrwxrwx 1 root root        23 Mar 26 23:16 libcublas.so -> libcublas.so.10.1.0.105
lrwxrwxrwx 1 root root        23 Apr 14 19:12 libcublas.so.10 -> libcublas.so.10.1.0.105
lrwxrwxrwx 1 root root        23 Mar 26 23:16 libcublas.so.10.1 -> libcublas.so.10.1.0.105
lrwxrwxrwx 1 root root        23 Mar 26 23:16 libcublas.so.10.1.0 -> libcublas.so.10.1.0.105
-rwxr-xr-x 1 root root  78315120 Mar 26 23:16 libcublas.so.10.1.0.105

In [9]: cdll.LoadLibrary("libcublas.so.10") # works
Out[9]: <CDLL 'libcublas.so.10', handle 55ef42256e30 at 0x7fead9e8efd0>

In [10]: cdll.LoadLibrary("libcublas.so.10.1") # fails                                 
---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
<ipython-input-10-7a08087e910f> in <module>
...
OSError: libcublas.so.10.1: cannot open shared object file: No such file or directory
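
The same check can be repeated for several names at once. This is only a sketch: `try_load` is a hypothetical helper (not part of ctypes), and it relies on `ctypes.CDLL` raising `OSError` when the dynamic loader cannot find a name.

```python
import ctypes

def try_load(names):
    """Return {name: True/False} depending on whether the loader finds each name."""
    results = {}
    for name in names:
        try:
            ctypes.CDLL(name)       # same lookup as cdll.LoadLibrary(name)
            results[name] = True
        except OSError:             # "cannot open shared object file"
            results[name] = False
    return results

if __name__ == "__main__":
    for name, ok in try_load(["libcublas.so.10", "libcublas.so.10.1"]).items():
        print(f"{name}: {'loads' if ok else 'not found'}")
```

The result depends entirely on the machine it runs on, so no expected output is shown.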

My comment on the TensorFlow issue shows a bit more investigation (https://github.com/tensorflow/tensorflo … -482829280), in particular that this works as expected for libcrypto, but I'm really unsure why I'm unable to load libcublas.so.10.1. I think it may have something to do with ldconfig only processing libcublas.so.10, but why is that?

[stiege@archie ~]$ sudo ldconfig -v | grep cublas
ldconfig: Path `/usr/lib64' given more than once
ldconfig: Can't stat /usr/libx32: No such file or directory
	libcublas.so.10 -> libcublas.so.10.1.0.105
	libcublasLt.so.10 -> libcublasLt.so.10.1.0.105
[stiege@archie ~]$ ldconfig -p | grep cublas.so
	libcublas.so.10 (libc6,x86-64) => /opt/cuda/lib64/libcublas.so.10
	libcublas.so (libc6,x86-64) => /opt/cuda/lib64/libcublas.so

Going by the directory listing of /opt/cuda/lib64/ above, I'd expect all of those names to show up here, not just libcublas.so and libcublas.so.10.
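
To compare the cache against a directory listing, the `ldconfig -p` output can be split into names and paths. A minimal sketch, assuming the line format shown above; `parse_cache_listing` and the `ENTRY` pattern are illustrative names, not an existing API.

```python
import re

# One cache line looks like:
#   libcublas.so.10 (libc6,x86-64) => /opt/cuda/lib64/libcublas.so.10
ENTRY = re.compile(r"^\s*(\S+)\s+\((.*?)\)\s+=>\s+(\S+)\s*$")

def parse_cache_listing(text):
    """Turn `ldconfig -p` output into a list of (name, flags, path) tuples."""
    entries = []
    for line in text.splitlines():
        match = ENTRY.match(line)
        if match:  # lines that are not cache entries (e.g. the header) are skipped
            entries.append(match.groups())
    return entries

sample = (
    "\tlibcublas.so.10 (libc6,x86-64) => /opt/cuda/lib64/libcublas.so.10\n"
    "\tlibcublas.so (libc6,x86-64) => /opt/cuda/lib64/libcublas.so\n"
)
names = [name for name, _flags, _path in parse_cache_listing(sample)]
print(names)  # ['libcublas.so.10', 'libcublas.so']
```

Feeding it the real output (for example from `subprocess.run(["ldconfig", "-p"], capture_output=True, text=True).stdout`) gives the set of names the loader can resolve from the cache.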

From http://man7.org/linux/man-pages/man8/ld.so.8.html

From the cache file /etc/ld.so.cache, which contains a compiled
list of candidate shared objects previously found in the augmented
library path.

So it looks like all of those entries in /opt/cuda/lib64/ are supposed to appear in the cache. What am I missing?

Last edited by Stiege (2019-04-14 19:19:14)


#2 2019-04-14 19:51:16

loqs
Member
Registered: 2014-03-06
Posts: 17,194

Re: Confused about loading shared objects

https://bugs.archlinux.org/task/62282? If you use tensorflow-cuda from extra, does that have the issue?


#3 2019-04-14 19:54:21

Stiege
Member
Registered: 2013-09-24
Posts: 10

Re: Confused about loading shared objects

loqs wrote:

https://bugs.archlinux.org/task/62282? If you use tensorflow-cuda from extra, does that have the issue?

Thanks for your response, but that's not really my question (although it was how I discovered the problem). TensorFlow 2.0-alpha requires CUDA 10.0, not the 10.1 I have. I understand that; what I don't understand is why some shared objects appear in /etc/ld.so.cache and others do not, when according to the documentation they all should.

This is a good link though, so thank you; I'm especially confused by this:

... but it requires a new release of glibc (ldconfig)


---

And while this seems to be what you'd expect:

(venv36) [stiege@archie ~]$ sudo ln -s /opt/cuda/lib64/libcublas.so.10.1.0.105 /usr/lib/libcublas.so.10.0

---

In [1]: import ctypes                                                           

In [2]: ctypes.cdll.LoadLibrary("libcublas.so.10.0")                            
Out[2]: <CDLL 'libcublas.so.10.0', handle 562b65d9d750 at 0x7fb76c25ddd8>

Why on earth is ldconfig trying to make this link:

(venv36) [stiege@archie ~]$ sudo ldconfig -v | grep libcublas
[sudo] password for stiege: 
ldconfig: Path `/usr/lib64' given more than once
ldconfig: Can't stat /usr/libx32: No such file or directory
	libcublas.so.10 -> libcublas.so.10.1.0.105
	libcublasLt.so.10 -> libcublasLt.so.10.1.0.105
	libcublas.so.10 -> libcublas.so.10.0 (changed)

And now I have an extra entry in the cache? It persists even after cleaning the cache, for as long as the symlink in /usr/lib/ remains.

(venv36) [stiege@archie ~]$ sudo ldconfig -p | grep libcublas
	libcublasLt.so.10 (libc6,x86-64) => /opt/cuda/lib64/libcublasLt.so.10
	libcublasLt.so (libc6,x86-64) => /opt/cuda/lib64/libcublasLt.so
	libcublas.so.10 (libc6,x86-64) => /opt/cuda/lib64/libcublas.so.10
	libcublas.so.10 (libc6,x86-64) => /usr/lib/libcublas.so.10
	libcublas.so (libc6,x86-64) => /opt/cuda/lib64/libcublas.so

What makes the following entries so special compared with libcublas.so.10.1?

(venv36) [stiege@archie ~]$ ldconfig -p | grep -v -P '.*\.so\.\d+\s' | grep -v -P '.*\.so\s' | grep -v 'usr\/lib'
3120 libs found in cache `/etc/ld.so.cache'
	libnvrtc.so.10.1 (libc6,x86-64) => /opt/cuda/lib64/libnvrtc.so.10.1
	libnvrtc-builtins.so.10.1 (libc6,x86-64) => /opt/cuda/lib64/libnvrtc-builtins.so.10.1
	libcupti.so.10.1 (libc6,x86-64) => /opt/cuda/extras/CUPTI/lib64/libcupti.so.10.1
	libcuinj64.so.10.1 (libc6,x86-64) => /opt/cuda/lib64/libcuinj64.so.10.1
	libcudart.so.10.1 (libc6,x86-64) => /opt/cuda/lib64/libcudart.so.10.1
	libaccinj64.so.10.1 (libc6,x86-64) => /opt/cuda/lib64/libaccinj64.so.10.1
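
For anyone who finds the grep pipeline above hard to read, here is a rough Python equivalent of its three `grep -v` stages. It is a sketch only; the function name is made up for illustration.

```python
import re

def multi_component_outside_usr_lib(lines):
    """Keep only the lines that all three `grep -v` stages would let through."""
    kept = []
    for line in lines:
        if re.search(r"\.so\.\d+\s", line):   # ".so.N" followed by whitespace
            continue
        if re.search(r"\.so\s", line):        # bare ".so" followed by whitespace
            continue
        if "usr/lib" in line:                 # anything mentioning usr/lib
            continue
        kept.append(line)
    return kept

sample = [
    "\tlibcublas.so.10 (libc6,x86-64) => /opt/cuda/lib64/libcublas.so.10",
    "\tlibcublas.so (libc6,x86-64) => /opt/cuda/lib64/libcublas.so",
    "\tlibcudart.so.10.1 (libc6,x86-64) => /opt/cuda/lib64/libcudart.so.10.1",
]
for line in multi_component_outside_usr_lib(sample):
    print(line)
```

With this sample only the libcudart.so.10.1 line survives, which is the same kind of entry the pipeline prints: names with more than one version component that live outside /usr/lib.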

And why is it the opposite situation for libcudart?

(venv36) [stiege@archie ~]$ locate libcudart.so
/opt/cuda/doc/man/man7/libcudart.so.7
/opt/cuda/targets/x86_64-linux/lib/libcudart.so
/opt/cuda/targets/x86_64-linux/lib/libcudart.so.10
/opt/cuda/targets/x86_64-linux/lib/libcudart.so.10.1
/opt/cuda/targets/x86_64-linux/lib/libcudart.so.10.1.105

---

In [8]: ctypes.cdll.LoadLibrary("libcudart.so.10.1")                            
Out[8]: <CDLL 'libcudart.so.10.1', handle 555ccda760a0 at 0x7f3ee4b9b390>

In [9]: ctypes.cdll.LoadLibrary("libcudart.so.10")                              
---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
...
OSError: libcudart.so.10: cannot open shared object file: No such file or directory

Last edited by Stiege (2019-04-15 21:12:49)


#4 2019-04-15 21:04:50

Stiege
Member
Registered: 2013-09-24
Posts: 10

Re: Confused about loading shared objects

I think I found the issue?

https://github.com/lattera/glibc/blob/m … #L884-L907

	     You should always do this:
		libfoo.so -> SONAME -> Arbitrary package-chosen name.
	     e.g. libfoo.so -> libfoo.so.1 -> libfooimp.so.9.99.
	     Given a SONAME of libfoo.so.1.
	     You should *never* do this:
		libfoo.so -> libfooimp.so.9.99

Whereas in my first post, libcublas.so clearly points straight at libcublas.so.10.1.0.105 instead of going through the SONAME link:

[stiege@archie ~]$ ls -l /opt/cuda/lib64/ | grep libcublas.so
lrwxrwxrwx 1 root root        23 Mar 26 23:16 libcublas.so -> libcublas.so.10.1.0.105
lrwxrwxrwx 1 root root        23 Apr 14 19:12 libcublas.so.10 -> libcublas.so.10.1.0.105
lrwxrwxrwx 1 root root        23 Mar 26 23:16 libcublas.so.10.1 -> libcublas.so.10.1.0.105
lrwxrwxrwx 1 root root        23 Mar 26 23:16 libcublas.so.10.1.0 -> libcublas.so.10.1.0.105
-rwxr-xr-x 1 root root  78315120 Mar 26 23:16 libcublas.so.10.1.0.105
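
The rule quoted from the glibc source can be checked mechanically. The sketch below assumes a POSIX system (it creates symlinks) and that the SONAME is already known; `link_skips_soname` is a hypothetical helper, and the file names are the ones from the glibc comment.

```python
import os
import tempfile

def link_skips_soname(dev_link, soname):
    """True if the dev link (libfoo.so) points at something other than the SONAME."""
    return os.path.basename(os.readlink(dev_link)) != soname

# Recreate both layouts from the glibc comment in a scratch directory.
with tempfile.TemporaryDirectory() as tmp:
    real = os.path.join(tmp, "libfooimp.so.9.99")
    open(real, "w").close()  # empty stand-in for the real library file

    # Recommended: libfoo.so -> libfoo.so.1 (SONAME) -> libfooimp.so.9.99
    good = os.path.join(tmp, "good")
    os.mkdir(good)
    os.symlink(real, os.path.join(good, "libfoo.so.1"))
    os.symlink("libfoo.so.1", os.path.join(good, "libfoo.so"))

    # Discouraged: libfoo.so -> libfooimp.so.9.99
    bad = os.path.join(tmp, "bad")
    os.mkdir(bad)
    os.symlink(real, os.path.join(bad, "libfoo.so"))

    good_result = link_skips_soname(os.path.join(good, "libfoo.so"), "libfoo.so.1")
    bad_result = link_skips_soname(os.path.join(bad, "libfoo.so"), "libfoo.so.1")

print(good_result, bad_result)  # False True
```

Applied to the listing above with a SONAME of libcublas.so.10, the libcublas.so link would be reported as skipping the SONAME.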

---
Submitted bug fix:
https://bugs.archlinux.org/task/62361

Last edited by Stiege (2019-04-15 21:12:58)


#5 2019-04-15 22:13:31

loqs
Member
Registered: 2014-03-06
Posts: 17,194

Re: Confused about loading shared objects

Stiege wrote:

I do not see a patch attached to that bug report.  It also does not specify the package version.


#6 2019-04-15 22:18:23

Stiege
Member
Registered: 2013-09-24
Posts: 10

Re: Confused about loading shared objects

Sorry, I meant bug report. I'm working on some code now to modify the symlinks and add the intermediary SONAME.


#7 2019-04-16 21:30:16

Stiege
Member
Registered: 2013-09-24
Posts: 10

Re: Confused about loading shared objects

So not really sure in the end:

diff --git a/cuda/trunk/PKGBUILD b/cuda/trunk/PKGBUILD
index d11f84d7..9b93aff1 100644
--- a/cuda/trunk/PKGBUILD
+++ b/cuda/trunk/PKGBUILD
@@ -54,16 +54,16 @@ package() {
   # We have to be weird about this since for some reason the ELF SONAME is incorrect or at least partially incorrect for some libs.                                                              
   # Best we can do is copy those libs to *.so.10.1 variants, patchelf the SONAME and hope for the best.                                                                                          
   # Their installer used to perform this for us but now it's all manual and I think this is what we'll be stuck with for now.                                                                    
-  cd "${pkgdir}/opt/cuda/targets/x86_64-linux/lib"
-  find "${pkgdir}"/opt/cuda/targets/x86_64-linux/lib -type l -name "*.so.10" ! -path "*stubs/*" -print0 | while read -rd '' _lib; do                                                             
-    _current_soname=$(basename ${_lib})
-    if [ ! -f "${_current_soname}.1" ]; then
-      echo "copying ${_current_soname} to ${_current_soname}.1 version"
-      cp ${_current_soname} "${_current_soname}.1"
-      echo "patching ${_current_soname}.1 SONAME to match ${_current_soname}.1"
-      patchelf --set-soname "${_current_soname}.1" "${_current_soname}.1"
-    fi
-  done
+#  cd "${pkgdir}/opt/cuda/targets/x86_64-linux/lib"
+#  find "${pkgdir}"/opt/cuda/targets/x86_64-linux/lib -type l -name "*.so.10" ! -path "*stubs/*" -print0 | while read -rd '' _lib; do                                                            
+#    _current_soname=$(basename ${_lib})
+#    if [ ! -f "${_current_soname}.1" ]; then
+#      echo "copying ${_current_soname} to ${_current_soname}.1 version"
+#      cp ${_current_soname} "${_current_soname}.1"
+#      echo "patching ${_current_soname}.1 SONAME to match ${_current_soname}.1"
+#      patchelf --set-soname "${_current_soname}.1" "${_current_soname}.1"
+#    fi
+#  done

   # Install profile and ld.so.config files
   install -Dm755 "${srcdir}/cuda.sh" "${pkgdir}/etc/profile.d/cuda.sh"

It looks like this block was added to work around another problem with the CUDA release. However, reverting it gives me back what I'd expect for the symlinks:

(venv36) [stiege@archie lib64]$ ls -l | grep cublas.so
lrwxrwxrwx 1 root root        15 Apr 16 21:26 libcublas.so -> libcublas.so.10
lrwxrwxrwx 1 root root        23 Apr 16 21:26 libcublas.so.10 -> libcublas.so.10.1.0.105
-rwxr-xr-x 1 root root  78315120 Apr 16 21:26 libcublas.so.10.1.0.105

Now, the name libcublas.so.10.1 is clearly no longer available, but that could be intended by CUDA, and in fact it appears to be, given the original SONAME of the ELF file:

(venv36) [stiege@archie lib64]$ readelf -d libcublas.so.10.1.0.105 | grep SONAME
 0x000000000000000e (SONAME)             Library soname: [libcublas.so.10]
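
The SONAME can also be pulled out of the `readelf -d` text programmatically, which helps when checking many libraries. A minimal sketch, assuming the output format shown above; `soname_from_readelf` is an illustrative name.

```python
import re

def soname_from_readelf(output):
    """Extract the SONAME from `readelf -d` text, or None if there is none."""
    match = re.search(r"\(SONAME\)\s+Library soname: \[([^\]]+)\]", output)
    return match.group(1) if match else None

sample = " 0x000000000000000e (SONAME)             Library soname: [libcublas.so.10]"
print(soname_from_readelf(sample))  # libcublas.so.10
```

The input text could come from something like `subprocess.run(["readelf", "-d", path], capture_output=True, text=True).stdout`, where `path` is the library file to inspect.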

I'm reasonably new to this, but based on the info at https://en.wikipedia.org/wiki/Soname this suggests CUDA is now aiming for backwards compatibility across 10 as a whole, as opposed to specifically 10.1 (if they have in fact changed this).

If we want to expose 10.1 for some reason, I'm not sure what that is supposed to look like. I've had a play, but it seems I'm fighting a few conventions and can't make it work (certainly not while keeping it working for the SONAME provided by CUDA).

Nothing more I can add here unfortunately.


#8 2019-04-17 10:22:04

Lone_Wolf
Member
From: Netherlands, Europe
Registered: 2005-10-04
Posts: 11,868

Re: Confused about loading shared objects

