#1 2024-02-16 20:28:54

byjove01
Member
From: Alps, France
Registered: 2021-02-15
Posts: 207

[SOLVED] RuntimeError: HIP error: shared object initialization failed

Hi.
To use a specific program (ComfyUI), I have to install the `python-torchsde` package, but most of its tests fail and I have no idea what is happening beyond this log.

RuntimeError: HIP error: shared object initialization failed

I'm using Arch Linux 6.7.4-arch1-1. GPU: Radeon RX 5700 XT. CPU: Ryzen 5 3600.
ComfyUI needs PyTorch, so I installed the "python-pytorch-rocm 2.2.0-1" package. I also tried the "opt-rocm" variant, which almost worked until I got other errors related to Blender.

==> check()…
=================================================================== test session starts ====================================================================
platform linux -- Python 3.11.7, pytest-7.4.4, pluggy-1.4.0
benchmark: 4.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: /home/byjove/.cache/yay/python-torchsde/src/python-torchsde
plugins: cov-4.1.0, anyio-4.2.0, hypothesis-6.98.2, benchmark-4.0.0
collected 3172 items

tests/test_adjoint.py FFF.FFF.FFFFFFFFFFF.FFF.FFFFFFFFFFFFFFF.FFF.FFFFFFFFFFFFFFFFFFFFFFFFFFFF                                                       [  2%]
tests/test_brownian_interval.py .F.F.F.F.F.F.F.F.F.F.F.F.F.F.F.F.F.F.F.F.F.F.F.F.F.F.F.F.F.F.F.F.F.F.F.F.F.F.F.F.F.F.F.F.F..FF..FF..FF..FF..FF..FF.. [  5%]
FF..FF..FF..FF..FF                                                                                                                                   [  6%]
tests/test_brownian_path.py .F.F.F                                                                                                                   [  6%]
tests/test_brownian_tree.py .F.F.F                                                                                                                   [  6%]
tests/test_sdeint.py .F.F........................................................................................................................... [ 10%]
.................................................................................................................................................... [ 15%]
.................................................................................................................................................... [ 20%]
.................................................................................................................................................... [ 24%]
.................................................................................................................................................... [ 29%]
.................................................................................................................................................... [ 34%]
.................................................................................................................................................... [ 38%]
.................................................................................................................................................... [ 43%]
.................................................................................................................................................... [ 48%]
.......................................................................................................................................FFFFFFFFFFFFF [ 52%]
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF [ 57%]
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF [ 62%]
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF [ 66%]

[...]

FAILED tests/test_sdeint.py::test_sdeint_run_shape_method[cuda-False-True-log_ode-ito-True-none-ExDiagonal] - RuntimeError: HIP error: shared object initialization failed
FAILED tests/test_sdeint.py::test_sdeint_run_shape_method[cuda-False-True-log_ode-ito-True-none-ExScalar] - RuntimeError: HIP error: shared object initialization failed
FAILED tests/test_sdeint.py::test_sdeint_run_shape_method[cuda-False-True-log_ode-ito-True-none-ExAdditive] - RuntimeError: HIP error: shared object initialization failed
FAILED tests/test_sdeint.py::test_sdeint_run_shape_method[cuda-False-True-log_ode-ito-True-none-NeuralGeneral] - RuntimeError: HIP error: shared object initialization failed
FAILED tests/test_sdeint.py::test_sdeint_run_shape_method[cuda-False-True-log_ode-ito-True-space-time-ExDiagonal] - RuntimeError: HIP error: shared object initialization failed
FAILED tests/test_sdeint.py::test_sdeint_run_shape_method[cuda-False-True-log_ode-ito-True-space-time-ExScalar] - RuntimeError: HIP error: shared object initialization failed
FAILED tests/test_sdeint.py::test_sdeint_run_shape_method[cuda-False-True-log_ode-ito-True-space-time-ExAdditive] - RuntimeError: HIP error: shared object initialization failed
FAILED tests/test_sdeint.py::test_sdeint_run_shape_method[cuda-False-True-log_ode-ito-True-space-time-NeuralGeneral] - RuntimeError: HIP error: shared object initialization failed
FAILED tests/test_sdeint.py::test_sdeint_run_shape_method[cuda-False-True-log_ode-ito-True-davie-ExDiagonal] - RuntimeError: HIP error: shared object initialization failed
FAILED tests/test_sdeint.py::test_sdeint_run_shape_method[cuda-False-True-log_ode-ito-True-davie-ExScalar] - RuntimeError: HIP error: shared object initialization failed
FAILED tests/test_sdeint.py::test_sdeint_run_shape_method[cuda-False-True-log_ode-ito-True-davie-ExAdditive] - RuntimeError: HIP error: shared object initialization failed
FAILED tests/test_sdeint.py::test_sdeint_run_shape_method[cuda-False-True-log_ode-ito-True-davie-NeuralGeneral] - RuntimeError: HIP error: shared object initialization failed
FAILED tests/test_sdeint.py::test_sdeint_run_shape_method[cuda-False-True-log_ode-ito-True-foster-ExDiagonal] - RuntimeError: HIP error: shared object initialization failed
FAILED tests/test_sdeint.py::test_sdeint_run_shape_method[cuda-False-True-log_ode-ito-True-foster-ExScalar] - RuntimeError: HIP error: shared object initialization failed
FAILED tests/test_sdeint.py::test_sdeint_run_shape_method[cuda-False-True-log_ode-ito-True-foster-ExAdditive] - RuntimeError: HIP error: shared object initialization failed
FAILED tests/test_sdeint.py::test_sdeint_run_shape_method[cuda-False-True-log_ode-ito-True-foster-NeuralGeneral] - RuntimeError: HIP error: shared object initialization failed
FAILED tests/test_sdeint.py::test_sdeint_run_shape_method[cuda-False-True-log_ode-stratonovich-False-None-ExDiagonal] - RuntimeError: HIP error: shared object initialization failed
FAILED tests/test_sdeint.py::test_sdeint_run_shape_method[cuda-False-True-log_ode-stratonovich-False-None-ExScalar] - RuntimeError: HIP error: shared object initialization failed
FAILED tests/test_sdeint.py::test_sdeint_run_shape_method[cuda-False-True-log_ode-stratonovich-False-None-ExAdditive] - RuntimeError: HIP error: shared object initialization failed
FAILED tests/test_sdeint.py::test_sdeint_run_shape_method[cuda-False-True-log_ode-stratonovich-False-None-NeuralGeneral] - RuntimeError: HIP error: shared object initialization failed
FAILED tests/test_sdeint.py::test_sdeint_run_shape_method[cuda-False-True-log_ode-stratonovich-True-none-ExDiagonal] - RuntimeError: HIP error: shared object initialization failed
FAILED tests/test_sdeint.py::test_sdeint_run_shape_method[cuda-False-True-log_ode-stratonovich-True-none-ExScalar] - RuntimeError: HIP error: shared object initialization failed
FAILED tests/test_sdeint.py::test_sdeint_run_shape_method[cuda-False-True-log_ode-stratonovich-True-none-ExAdditive] - RuntimeError: HIP error: shared object initialization failed
FAILED tests/test_sdeint.py::test_sdeint_run_shape_method[cuda-False-True-log_ode-stratonovich-True-none-NeuralGeneral] - RuntimeError: HIP error: shared object initialization failed
FAILED tests/test_sdeint.py::test_sdeint_run_shape_method[cuda-False-True-log_ode-stratonovich-True-space-time-ExDiagonal] - RuntimeError: HIP error: shared object initialization failed
FAILED tests/test_sdeint.py::test_sdeint_run_shape_method[cuda-False-True-log_ode-stratonovich-True-space-time-ExScalar] - RuntimeError: HIP error: shared object initialization failed
FAILED tests/test_sdeint.py::test_sdeint_run_shape_method[cuda-False-True-log_ode-stratonovich-True-space-time-ExAdditive] - RuntimeError: HIP error: shared object initialization failed
FAILED tests/test_sdeint.py::test_sdeint_run_shape_method[cuda-False-True-log_ode-stratonovich-True-space-time-NeuralGeneral] - RuntimeError: HIP error: shared object initialization failed
FAILED tests/test_sdeint.py::test_sdeint_run_shape_method[cuda-False-True-log_ode-stratonovich-True-davie-ExDiagonal] - RuntimeError: HIP error: shared object initialization failed
FAILED tests/test_sdeint.py::test_sdeint_run_shape_method[cuda-False-True-log_ode-stratonovich-True-davie-ExScalar] - RuntimeError: HIP error: shared object initialization failed
FAILED tests/test_sdeint.py::test_sdeint_run_shape_method[cuda-False-True-log_ode-stratonovich-True-davie-ExAdditive] - RuntimeError: HIP error: shared object initialization failed
FAILED tests/test_sdeint.py::test_sdeint_run_shape_method[cuda-False-True-log_ode-stratonovich-True-davie-NeuralGeneral] - RuntimeError: HIP error: shared object initialization failed
FAILED tests/test_sdeint.py::test_sdeint_run_shape_method[cuda-False-True-log_ode-stratonovich-True-foster-ExDiagonal] - RuntimeError: HIP error: shared object initialization failed
FAILED tests/test_sdeint.py::test_sdeint_run_shape_method[cuda-False-True-log_ode-stratonovich-True-foster-ExScalar] - RuntimeError: HIP error: shared object initialization failed
FAILED tests/test_sdeint.py::test_sdeint_run_shape_method[cuda-False-True-log_ode-stratonovich-True-foster-ExAdditive] - RuntimeError: HIP error: shared object initialization failed
FAILED tests/test_sdeint.py::test_sdeint_run_shape_method[cuda-False-True-log_ode-stratonovich-True-foster-NeuralGeneral] - RuntimeError: HIP error: shared object initialization failed
FAILED tests/test_sdeint.py::test_sdeint_dependencies[cuda-False-euler-BasicSDE1] - RuntimeError: HIP error: shared object initialization failed
FAILED tests/test_sdeint.py::test_sdeint_dependencies[cuda-False-euler-BasicSDE2] - RuntimeError: HIP error: shared object initialization failed
FAILED tests/test_sdeint.py::test_sdeint_dependencies[cuda-False-euler-BasicSDE3] - RuntimeError: HIP error: shared object initialization failed
FAILED tests/test_sdeint.py::test_sdeint_dependencies[cuda-False-euler-BasicSDE4] - RuntimeError: HIP error: shared object initialization failed
FAILED tests/test_sdeint.py::test_sdeint_dependencies[cuda-False-milstein-BasicSDE1] - RuntimeError: HIP error: shared object initialization failed
FAILED tests/test_sdeint.py::test_sdeint_dependencies[cuda-False-milstein-BasicSDE2] - RuntimeError: HIP error: shared object initialization failed
FAILED tests/test_sdeint.py::test_sdeint_dependencies[cuda-False-milstein-BasicSDE3] - RuntimeError: HIP error: shared object initialization failed
FAILED tests/test_sdeint.py::test_sdeint_dependencies[cuda-False-milstein-BasicSDE4] - RuntimeError: HIP error: shared object initialization failed
FAILED tests/test_sdeint.py::test_sdeint_dependencies[cuda-False-milstein_grad_free-BasicSDE1] - RuntimeError: HIP error: shared object initialization failed
FAILED tests/test_sdeint.py::test_sdeint_dependencies[cuda-False-milstein_grad_free-BasicSDE2] - RuntimeError: HIP error: shared object initialization failed
FAILED tests/test_sdeint.py::test_sdeint_dependencies[cuda-False-milstein_grad_free-BasicSDE3] - RuntimeError: HIP error: shared object initialization failed
FAILED tests/test_sdeint.py::test_sdeint_dependencies[cuda-False-milstein_grad_free-BasicSDE4] - RuntimeError: HIP error: shared object initialization failed
FAILED tests/test_sdeint.py::test_sdeint_dependencies[cuda-False-srk-BasicSDE1] - RuntimeError: HIP error: shared object initialization failed
FAILED tests/test_sdeint.py::test_sdeint_dependencies[cuda-False-srk-BasicSDE2] - RuntimeError: HIP error: shared object initialization failed
FAILED tests/test_sdeint.py::test_sdeint_dependencies[cuda-False-srk-BasicSDE3] - RuntimeError: HIP error: shared object initialization failed
FAILED tests/test_sdeint.py::test_sdeint_dependencies[cuda-False-srk-BasicSDE4] - RuntimeError: HIP error: shared object initialization failed
FAILED tests/test_sdeint.py::test_sdeint_dependencies[cuda-True-euler-BasicSDE1] - RuntimeError: HIP error: shared object initialization failed
FAILED tests/test_sdeint.py::test_sdeint_dependencies[cuda-True-euler-BasicSDE2] - RuntimeError: HIP error: shared object initialization failed
FAILED tests/test_sdeint.py::test_sdeint_dependencies[cuda-True-euler-BasicSDE3] - RuntimeError: HIP error: shared object initialization failed
FAILED tests/test_sdeint.py::test_sdeint_dependencies[cuda-True-euler-BasicSDE4] - RuntimeError: HIP error: shared object initialization failed
FAILED tests/test_sdeint.py::test_sdeint_dependencies[cuda-True-milstein-BasicSDE1] - RuntimeError: HIP error: shared object initialization failed
FAILED tests/test_sdeint.py::test_sdeint_dependencies[cuda-True-milstein-BasicSDE2] - RuntimeError: HIP error: shared object initialization failed
FAILED tests/test_sdeint.py::test_sdeint_dependencies[cuda-True-milstein-BasicSDE3] - RuntimeError: HIP error: shared object initialization failed
FAILED tests/test_sdeint.py::test_sdeint_dependencies[cuda-True-milstein-BasicSDE4] - RuntimeError: HIP error: shared object initialization failed
FAILED tests/test_sdeint.py::test_sdeint_dependencies[cuda-True-milstein_grad_free-BasicSDE1] - RuntimeError: HIP error: shared object initialization failed
FAILED tests/test_sdeint.py::test_sdeint_dependencies[cuda-True-milstein_grad_free-BasicSDE2] - RuntimeError: HIP error: shared object initialization failed
FAILED tests/test_sdeint.py::test_sdeint_dependencies[cuda-True-milstein_grad_free-BasicSDE3] - RuntimeError: HIP error: shared object initialization failed
FAILED tests/test_sdeint.py::test_sdeint_dependencies[cuda-True-milstein_grad_free-BasicSDE4] - RuntimeError: HIP error: shared object initialization failed
FAILED tests/test_sdeint.py::test_sdeint_dependencies[cuda-True-srk-BasicSDE1] - RuntimeError: HIP error: shared object initialization failed
FAILED tests/test_sdeint.py::test_sdeint_dependencies[cuda-True-srk-BasicSDE2] - RuntimeError: HIP error: shared object initialization failed
FAILED tests/test_sdeint.py::test_sdeint_dependencies[cuda-True-srk-BasicSDE3] - RuntimeError: HIP error: shared object initialization failed
FAILED tests/test_sdeint.py::test_sdeint_dependencies[cuda-True-srk-BasicSDE4] - RuntimeError: HIP error: shared object initialization failed
================================================ 1613 failed, 1559 passed, 28 warnings in 234.31s (0:03:54) ================================================

Last edited by byjove01 (2024-02-19 20:07:47)

Offline

#2 2024-02-19 18:55:49

loqs
Member
Registered: 2014-03-06
Posts: 17,554

Re: [SOLVED] RuntimeError: HIP error: shared object initialization failed

Do you intend to package ComfyUI? If not, I would suggest using a Python virtual environment with pip and the additional rocm5.7 repo provided by PyTorch to manage the modules. This avoids having to manage conflicting versions between ComfyUI and the Arch official repositories. rocm5.7 may well not be compatible, in which case I would drop the additional repo.
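A rough sketch of what that setup could look like, written as a small Python helper that only prints the commands and runs nothing. The index URL is my assumption of what "the rocm5.7 additional repo" refers to, and the environment directory and requirements file names are hypothetical.

```python
import sys
from pathlib import Path

# Assumption: the wheel index PyTorch publishes for its ROCm 5.7 builds.
ROCM_INDEX = "https://download.pytorch.org/whl/rocm5.7"


def venv_commands(env_dir, requirements="requirements.txt"):
    """Return the commands for an isolated environment (nothing is executed)."""
    env = Path(env_dir)
    pip = str(env / "bin" / "pip")
    return [
        # Create the virtual environment, separate from the pacman-managed modules.
        [sys.executable, "-m", "venv", str(env)],
        # Install the ROCm build of PyTorch from the additional index.
        [pip, "install", "torch", "--index-url", ROCM_INDEX],
        # Install the application's remaining Python dependencies from PyPI.
        [pip, "install", "-r", requirements],
    ]


# "comfy-venv" is a hypothetical directory name.
for cmd in venv_commands("comfy-venv"):
    print(" ".join(cmd))
```

If the rocm5.7 wheels turn out not to work with the GPU, dropping the `--index-url` part from the second command corresponds to dropping the additional repo.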

Offline

#3 2024-02-19 20:07:39

byjove01
Member
From: Alps, France
Registered: 2021-02-15
Posts: 207

Re: [SOLVED] RuntimeError: HIP error: shared object initialization failed

Thanks for the solution, it worked. Would've preferred to avoid that, but I'll deal with it.

Offline

#4 2024-02-22 11:34:00

loqs
Member
Registered: 2014-03-06
Posts: 17,554

Re: [SOLVED] RuntimeError: HIP error: shared object initialization failed

I think there is a slim chance it might be the same issue as https://gitlab.archlinux.org/archlinux/ … -/issues/2
Did any other PKGBUILDs depending on python-pytorch have a check function and pass?

Last edited by loqs (2024-02-22 11:34:40)

Offline

#5 2024-02-22 17:03:12

byjove01
Member
From: Alps, France
Registered: 2021-02-15
Posts: 207

Re: [SOLVED] RuntimeError: HIP error: shared object initialization failed

How can I check this, so that I could be sure of my answer?

Last edited by byjove01 (2024-02-22 17:03:24)

Offline

#6 2024-02-22 17:40:19

loqs
Member
Registered: 2014-03-06
Posts: 17,554

Re: [SOLVED] RuntimeError: HIP error: shared object initialization failed

If you can remember the build order you used, you can examine the PKGBUILDs in sequence to check whether each has a check function and a dependency on pytorch. Or, as a basic check, use torch.cuda.is_available or torch.device('cuda').
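A minimal sketch of such a basic check, assuming the system PyTorch is importable as `torch`. The function name `probe_gpu` is made up for illustration. Since the failures in the log only show up once a test actually uses the device, the sketch also runs a tiny computation instead of relying on torch.cuda.is_available alone.

```python
import importlib


def probe_gpu():
    """Return a short status string describing whether the 'cuda' device is usable."""
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        return "torch is not installed"
    if not torch.cuda.is_available():
        return "cuda device not available"
    try:
        # Force a real kernel launch; a HIP initialization problem
        # would be expected to surface here as a RuntimeError.
        x = torch.ones(2, device=torch.device("cuda"))
        total = float((x + x).sum().item())
    except RuntimeError as exc:
        return f"kernel launch failed: {exc}"
    return f"ok: {torch.cuda.get_device_name(0)} (sum={total})"


print(probe_gpu())
```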

Offline

#7 2024-04-09 09:15:47

shibe
Member
Registered: 2024-04-09
Posts: 5

Re: [SOLVED] RuntimeError: HIP error: shared object initialization failed

"cuda" device didn't work for me in PyTorch from python-pytorch-opt-rocm 2.2.2-1 package because of a problem in magma-hip 2.7.2-4 package. This affects any software that uses system-provided PyTorch. I think, python-pytorch-rocm would not work either.

ROCm/HIP binaries contain code compiled for particular GPU architectures. For a binary to work, it must include code for the kind of GPU you are trying to use. Because of a deficiency in the HIP runtime (as of ROCm 6.0), if an application links against any library that contains GPU code, but not for the needed architecture, the GPU code in all other libraries will not work either.

PyTorch links to /opt/rocm/lib/libmagma.so, which is compiled for gfx906. If you try to use it with any other kind of GPU, it will not work, and it will give "HIP error: shared object initialization failed".
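One way to see which architectures a library carries is to look for the target IDs embedded in it. The sketch below is only a heuristic: it assumes the bundled code objects are uncompressed and labelled with plain-ASCII IDs of the form "amdgcn-amd-amdhsa--gfx...", and the helper names are hypothetical.

```python
import re

# Assumption: target IDs appear in the file as plain ASCII, e.g.
# "amdgcn-amd-amdhsa--gfx906:xnack-" (processor plus optional feature flags).
TARGET_ID = re.compile(rb"amdgcn-amd-amdhsa--(gfx[0-9a-f]+(?::[a-z]+[+-])*)")


def embedded_targets(blob):
    """Return the sorted, de-duplicated GPU targets named inside a bytes blob."""
    return sorted({m.group(1).decode("ascii") for m in TARGET_ID.finditer(blob)})


def targets_in_file(path):
    """Read a shared object from disk and list the targets found in it."""
    with open(path, "rb") as fh:
        return embedded_targets(fh.read())


# Made-up bytes standing in for a library that bundles two targets.
sample = (
    b"\x00hipv4-amdgcn-amd-amdhsa--gfx906:xnack-\x00"
    b"\x00hipv4-amdgcn-amd-amdhsa--gfx1010\x00"
)
print(embedded_targets(sample))
```

On a real system the call would be something like `targets_in_file("/opt/rocm/lib/libmagma.so")`; if the list does not contain the architecture of the installed GPU, that library is a candidate for the error above.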

I managed to recompile magma-hip with some changes such that it includes code for multiple HIP architectures, and now "cuda" device works in system PyTorch. I reported the issue in build process of MAGMA library: https://bitbucket.org/icl/magma/issues/74

Offline

#8 2024-04-15 20:50:44

Nickola
Member
Registered: 2024-01-07
Posts: 4

Re: [SOLVED] RuntimeError: HIP error: shared object initialization failed

shibe wrote:

PyTorch links to /opt/rocm/lib/libmagma.so, which is compiled for gfx906. If you try to use it with any other kind of GPU, it will not work, and it will give "HIP error: shared object initialization failed".

I managed to recompile magma-hip with some changes such that it includes code for multiple HIP architectures, and now "cuda" device works in system PyTorch. I reported the issue in build process of MAGMA library: https://bitbucket.org/icl/magma/issues/74

The issue description is not accessible on Bitbucket now. Could you please share a patch with your fix for building magma-hip? It would be useful for fixing an issue with the python-pytorch-opt-rocm package: https://gitlab.archlinux.org/archlinux/ … /issues/10

Offline

#9 2024-04-16 23:07:54

shibe
Member
Registered: 2024-04-09
Posts: 5

Re: [SOLVED] RuntimeError: HIP error: shared object initialization failed

This seemingly fixes one of the problems with MAGMA:

--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -612,6 +612,7 @@
 	hip::host
         ${blas_fix}
         ${LAPACK_LIBRARIES}
+	hip::device
 	roc::hipblas
 	roc::hipsparse
 	)

Another problem is setting the list of architectures in CMake. I don't know the correct way of doing it. I set three CMake variables, AMDGPU_TARGETS, GPU_TARGET and GPU_TARGETS, to the same list of architectures, separated by semicolons instead of spaces.
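To illustrate the semicolon form: CMake treats a semicolon-separated string as a list, so the space-separated architecture list has to be converted before it is passed with -D. A small sketch (the helper name is hypothetical, the variable names are the three mentioned above):

```python
def cmake_target_args(targets, names=("AMDGPU_TARGETS", "GPU_TARGET", "GPU_TARGETS")):
    """Turn 'gfx900 gfx1010' into -D arguments holding the CMake list 'gfx900;gfx1010'."""
    cmake_list = ";".join(targets.split())
    return [f"-D{name}={cmake_list}" for name in names]


# Example with a shortened architecture list.
for arg in cmake_target_args("gfx900 gfx906:xnack- gfx1010"):
    print(arg)
```

When these arguments are typed into a shell, each one needs quoting, because an unquoted semicolon would end the command.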

By the way, if you just recompile MAGMA as is on the target computer, it will probably work with auto-detected GPU architecture.

Offline

#10 2024-04-20 18:12:34

Nickola
Member
Registered: 2024-01-07
Posts: 4

Re: [SOLVED] RuntimeError: HIP error: shared object initialization failed

Thanks for the answer.
I did a small investigation into the magma-hip package build.
Unfortunately, the CMake variables AMDGPU_TARGETS, GPU_TARGET and GPU_TARGETS have no effect on building the magma-hip package.

The PKGBUILD script uses CMake to build the package.
CMake processes the GPU_TARGET variable, but it is ignored when building the magma-hip package, so the libraries are compiled for the GPUs discovered on the build server.

A possible solution is to build the magma-hip package with make. The Makefile reads the GPU_TARGET environment variable when building the magma-hip libraries.

Regarding the CMakeLists patch: it looks like there is no need to add hip::device to build libmagma.so, at least in my case of building libmagma to support only one GPU.

Offline

#11 2024-04-21 07:32:26

shibe
Member
Registered: 2024-04-09
Posts: 5

Re: [SOLVED] RuntimeError: HIP error: shared object initialization failed

Nickola wrote:

Regarding the CMakeLists patch: it looks like there is no need to add hip::device to build libmagma.so, at least in my case of building libmagma to support only one GPU.

Yes, no change is needed to build for the GPU present on the build system. My patch is for building for multiple architectures, but it's not the only change that's needed. Here is the PKGBUILD that I used:

_pkgname=magma
pkgbase=$_pkgname
pkgname=(magma-hip)
pkgver=2.7.2
pkgrel=4
_pkgdesc="Matrix Algebra on GPU and Multicore Architectures"
arch=('x86_64')
url="https://icl.utk.edu/magma/"
license=('custom')
depends=('blas' 'lapack')
makedepends=('git' 'cmake' 'ninja' 'python' 'gcc-fortran'
             'rocm-core' 'hip-runtime-amd' 'hipblas' 'hipsparse')
optdepends=('python: for examples and tests'
            'gcc-fortran: Fortran interface')
_commit=a1625ff4d9bc362906bd01f805dbbe12612953f6  # commit after v2.7.2 with ROCm 6 fixes.
source=("${_pkgname}::git+https://bitbucket.org/icl/magma.git#commit=${_commit}"
        'hipdevice.diff')
sha256sums=('SKIP'
            '86739e85b015f8919e404ad32f5f57a446be86d2b728dba4eff17c536fbaef62')
options=(!lto)

_valid_gfx() {
  #List of GPU targets from rocBLAS
  echo "gfx900 gfx906:xnack- gfx908:xnack- gfx90a:xnack+ gfx90a:xnack- gfx940 gfx941 gfx942 gfx1010 gfx1012 gfx1030 gfx1031 gfx1100 gfx1101 gfx1102"
}

prepare() {
  cp -r "${_pkgname}" "${_pkgname}-${pkgver}-hip"
  cd "${srcdir}"
  cd "${_pkgname}-${pkgver}-hip"
  patch -Np1 -i "${srcdir}/hipdevice.diff"
  echo -e "BACKEND = hip\nFORT = true\nGPU_TARGET=$(_valid_gfx)" > make.inc
  make generate
}

build() {
  echo "Build with rocm/hip backend"
  cd "${srcdir}/${_pkgname}-${pkgver}-hip"
  local _rocm_ver=$(./tools/get-rocm-version.sh)
  # -fcf-protection is not supported by HIP, see
  # https://docs.amd.com/bundle/ROCm-Compiler-Reference-Guide-v5.5/page/Compiler_Options_and_Features.html#d2e2018
  CXXFLAGS+=" -fcf-protection=none"
  # ROCm version needs to be passed to the compiler since it's not part of the
  # cmake toolchain yet.
  CXXFLAGS+=" -DROCM_VERSION=$_rocm_ver"
  # With ROCm 6.0.0 the header moved from /opt/rocm/include to the subfolder hipsparse.
  # magma still uses the old location.
  CXXFLAGS+=" -isystem /opt/rocm/include/hipsparse"
  cmake \
    -Bbuild \
    -GNinja \
    "-DAMDGPU_TARGETS=$(_valid_gfx|tr " " ";")" \
    -DBUILD_SHARED_LIBS=ON \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_CXX_COMPILER=/opt/rocm/bin/hipcc \
    -DCMAKE_INSTALL_PREFIX=/opt/rocm \
    "-DGPU_TARGET=$(_valid_gfx|tr " " ";")" \
    "-DGPU_TARGETS=$(_valid_gfx|tr " " ";")" \
    -DMAGMA_ENABLE_HIP=ON
  ninja -C build
}

_package() {
  DESTDIR="${pkgdir}" ninja -Cbuild install
  local _prefix="$1"
  install -d "${pkgdir}/${_prefix}"/share/magma/example
  cp -r "${srcdir}"/magma/example/* "${pkgdir}/${_prefix}"/share/magma/example/
  install -d "${pkgdir}/${_prefix}"/share/magma/testing
  cp -r "${srcdir}"/magma/testing/* "${pkgdir}/${_prefix}"/share/magma/testing/
  install -Dm644 "${srcdir}"/magma/COPYRIGHT "${pkgdir}"/usr/share/licenses/${pkgname}/LICENSE
  echo "${_prefix}/lib" > "${pkgname}.conf"
  install -Dm644 "${pkgname}.conf" "${pkgdir}"/etc/ld.so.conf.d/"${pkgname}.conf"
}

package_magma-hip() {
  pkgdesc="${_pkgdesc} (with ROCm/HIP)"
  depends+=(hip-runtime-amd hipblas hipsparse)
  provides=(hipmagma)
  replaces=(hipmagma)

  cd "${srcdir}/${_pkgname}-${pkgver}-hip"
  _package "/opt/rocm"
}

It worked for me at least:

/opt/rocm/lib/libmagma.so:
amdgcn-amd-amdhsa--gfx1010
amdgcn-amd-amdhsa--gfx1012
amdgcn-amd-amdhsa--gfx1030
amdgcn-amd-amdhsa--gfx1031
amdgcn-amd-amdhsa--gfx1100
amdgcn-amd-amdhsa--gfx1101
amdgcn-amd-amdhsa--gfx1102
amdgcn-amd-amdhsa--gfx900
amdgcn-amd-amdhsa--gfx906:xnack-
amdgcn-amd-amdhsa--gfx908:xnack-
amdgcn-amd-amdhsa--gfx90a:xnack+
amdgcn-amd-amdhsa--gfx90a:xnack-
amdgcn-amd-amdhsa--gfx940
amdgcn-amd-amdhsa--gfx941
amdgcn-amd-amdhsa--gfx942

/opt/rocm/lib/libmagma_sparse.so:
amdgcn-amd-amdhsa--gfx1010
amdgcn-amd-amdhsa--gfx1012
amdgcn-amd-amdhsa--gfx1030
amdgcn-amd-amdhsa--gfx1031
amdgcn-amd-amdhsa--gfx1100
amdgcn-amd-amdhsa--gfx1101
amdgcn-amd-amdhsa--gfx1102
amdgcn-amd-amdhsa--gfx900
amdgcn-amd-amdhsa--gfx906:xnack-
amdgcn-amd-amdhsa--gfx908:xnack-
amdgcn-amd-amdhsa--gfx90a:xnack+
amdgcn-amd-amdhsa--gfx90a:xnack-
amdgcn-amd-amdhsa--gfx940
amdgcn-amd-amdhsa--gfx941
amdgcn-amd-amdhsa--gfx942
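A listing like this can be checked against a particular GPU with a few lines of Python. This is a simplified sketch: it compares only the processor name and ignores feature flags such as xnack, and it assumes that the RX 5700 XT from the first post reports as gfx1010. The helper names are hypothetical.

```python
def base_arch(target_id):
    """Strip the triple prefix and feature flags: '...--gfx906:xnack-' -> 'gfx906'."""
    return target_id.rsplit("--", 1)[-1].split(":", 1)[0]


def covers(listing, gpu_arch):
    """Return True if any non-empty line of the listing names the given processor."""
    lines = (line.strip() for line in listing.splitlines())
    return any(base_arch(line) == gpu_arch for line in lines if line)


# Shortened copy of the libmagma.so listing above.
listing = """
/opt/rocm/lib/libmagma.so:
amdgcn-amd-amdhsa--gfx1010
amdgcn-amd-amdhsa--gfx906:xnack-
amdgcn-amd-amdhsa--gfx90a:xnack+
"""

print(covers(listing, "gfx1010"))
print(covers(listing, "gfx1031"))
```

With the original package, built only for gfx906, the same check for gfx1010 would come back False, which matches the failure described earlier in the thread.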

Offline

#12 2024-04-22 18:11:44

Nickola
Member
Registered: 2024-01-07
Posts: 4

Re: [SOLVED] RuntimeError: HIP error: shared object initialization failed

Many thanks, shibe, your PKGBUILD works well.
I had tried to define AMDGPU_TARGETS inside of CMakeLists.txt by patching it before, but had no success. Defining AMDGPU_TARGETS on the cmake command line works perfectly; defining it in CMakeLists.txt does not (I don't know why).

So, anyone who wants to build magma-hip with support for several AMD GPU architectures should do the following:

  • patch CMakeLists.txt to add hip::device for libmagma.so target. Patch

  • add "-DAMDGPU_TARGETS=$(_valid_gfx|tr " " ";")" to the cmake command in the original magma PKGBUILD file, or use shibe's PKGBUILD file

P.S. I built magma 2.8.0 (git commit 17472eb935956c598368a5f66c9eb0336a68aab6) with GPU_TARGETS not defined. PyTorch with ROCm works well.

Offline

#13 2024-04-23 15:17:42

shibe
Member
Registered: 2024-04-09
Posts: 5

Re: [SOLVED] RuntimeError: HIP error: shared object initialization failed

My report on Bitbucket isn't getting any attention. Perhaps this temporary solution can be suggested to the Arch packager. I don't have an account there, so I didn't report it.

Nickola wrote:

I had tried to define AMDGPU_TARGETS inside of CMakeLists.txt by patching it before, but had no success.

AMDGPU_TARGETS is used in ROCm's CMake file, not directly in MAGMA's. Another possibility is that it was considered, but only applied to libmagma_sparse.so.

Offline

#14 2024-05-06 13:38:20

shibe
Member
Registered: 2024-04-09
Posts: 5

Re: [SOLVED] RuntimeError: HIP error: shared object initialization failed

magma-hip 2.8.0-1 seems to include the patch to enable multiple targets for libmagma.so, but unfortunately it is still built only for gfx906, probably because of a typo in the PKGBUILD. Here is one that worked for me:

_pkgname=magma
pkgbase=$_pkgname
pkgname=(magma-hip)
pkgver=2.8.0
pkgrel=1
_pkgdesc="Matrix Algebra on GPU and Multicore Architectures"
arch=('x86_64')
url="https://icl.utk.edu/magma/"
license=('BSD-3-Clause')
depends=('blas' 'lapack')
makedepends=('git' 'cmake' 'ninja' 'python' 'gcc-fortran'
             'rocm-core' 'hip-runtime-amd' 'hipblas' 'hipsparse')
optdepends=('python: for examples and tests'
            'gcc-fortran: Fortran interface')
source=("git+https://bitbucket.org/icl/magma.git#tag=v${pkgver}"
        'hip_device.patch')
sha256sums=('781bafd605579512b441664f76c9ba5559268f95f9357247cb8e04b76a72061e'
            '86739e85b015f8919e404ad32f5f57a446be86d2b728dba4eff17c536fbaef62')
options=(!lto)

_valid_gfx() {
  #List of GPU targets from rocBLAS
  echo "gfx900 gfx906:xnack- gfx908:xnack- gfx90a:xnack+ gfx90a:xnack- gfx940 gfx941 gfx942 gfx1010 gfx1012 gfx1030 gfx1100 gfx1101 gfx1102"
}

prepare() {
  cp -R "${_pkgname}" "${_pkgname}-${pkgver}-hip"

  cd "${_pkgname}-${pkgver}-hip"
  echo -e "BACKEND = hip\nFORT = true\nGPU_TARGET=$(_valid_gfx)" > make.inc
  patch -Np1 -i "${srcdir}/hip_device.patch"
}

build() {
  echo "Build with rocm/hip backend"
  cd "${srcdir}/${_pkgname}-${pkgver}-hip"
  make generate
  local _rocm_ver=$(./tools/get-rocm-version.sh)
  # -fcf-protection is not supported by HIP, see
  # https://docs.amd.com/bundle/ROCm-Compiler-Reference-Guide-v5.5/page/Compiler_Options_and_Features.html#d2e2018
  CXXFLAGS+=" -fcf-protection=none"
  # ROCm version needs to be passed to the compiler since it's not part of the
  # cmake toolchain yet.
  CXXFLAGS+=" -DROCM_VERSION=$_rocm_ver"
  # With ROCm 6.0.0 the header moved from /opt/rocm/include to the subfolder hipsparse.
  # magma still uses the old location.
  CXXFLAGS+=" -isystem /opt/rocm/include/hipsparse"
  cmake \
    -Bbuild \
    -GNinja \
    -DCMAKE_CXX_COMPILER=/opt/rocm/bin/hipcc \
    -DCMAKE_BUILD_TYPE=Release \
    -DCMAKE_INSTALL_PREFIX=/opt/rocm \
    -DBUILD_SHARED_LIBS=ON \
    -DMAGMA_ENABLE_HIP=ON \
    -DGPU_TARGET="$(_valid_gfx)" \
    -DAMDGPU_TARGETS="$(_valid_gfx | tr ' ' ';')"
  ninja -C build
}

_package() {
  DESTDIR="${pkgdir}" ninja -Cbuild install

  local _prefix="$1"
  install -d "${pkgdir}/${_prefix}"/share/magma/example
  cp -r "${srcdir}"/magma/example/* "${pkgdir}/${_prefix}"/share/magma/example/
  install -d "${pkgdir}/${_prefix}"/share/magma/testing
  cp -r "${srcdir}"/magma/testing/* "${pkgdir}/${_prefix}"/share/magma/testing/
  install -Dm644 "${srcdir}"/magma/COPYRIGHT "${pkgdir}"/usr/share/licenses/${pkgname}/LICENSE
  echo "${_prefix}/lib" > "${pkgname}.conf"
  install -Dm644 "${pkgname}.conf" "${pkgdir}"/etc/ld.so.conf.d/"${pkgname}.conf"
}

package_magma-hip() {
  pkgdesc="${_pkgdesc} (with ROCm/HIP)"
  depends+=(hip-runtime-amd hipblas hipsparse)
  provides=(hipmagma)
  replaces=(hipmagma)

  cd "${srcdir}/${_pkgname}-${pkgver}-hip"
  _package "/opt/rocm"
}

Additionally, python-pytorch-opt-rocm 2.3.0-3 depends on hipBLASLt, which supports a smaller set of GPU architectures. I tried to rebuild hipblaslt 6.0.2 and simply added more targets. The build process threw error messages, but somehow the package was created anyway, and PyTorch seems to work with it.

Offline
