
#1 2025-06-06 03:16:09

Snowy8179
Member
Registered: 2025-06-06
Posts: 1

7900XTX hangs with system python-pytorch-opt-rocm but not with venv

Hi, I just noticed that my 7900XTX hangs when using python-pytorch-opt-rocm or python-pytorch-rocm from the official repositories, but not when I run the same code inside a venv with PyTorch installed via pip (as instructed by the official PyTorch documentation).

This is the python code:

import torch

x = torch.rand(2098, 1, device='cuda')  # allocate a random 2098x1 tensor on the GPU
x  # GPU hangs here, when the interactive interpreter prints the tensor

What happens here: a random 2098x1 tensor is generated on the GPU, and then printing the tensor x (by evaluating it in the interactive interpreter) causes the GPU to hang.
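Presumably the same hang can be reproduced from a plain script as well, since printing just forces the tensor data to be copied back to the host for formatting; a minimal sketch:

import torch

x = torch.rand(2098, 1, device='cuda')  # allocate the tensor on the GPU
print(x)  # formatting the repr copies the data back to the host; this is where the GPU should hang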

It can also be triggered by

import torch

x = torch.rand(2098, 1, device='cpu')  # generate the tensor on cpu first
x # This will print out x correctly
x.cuda() # GPU hangs here, when trying to copy the tensor to the GPU

I can see in journalctl -f that this message appears very shortly after x.cuda() is executed:

Jun 06 01:52:16 tomahawk kernel: amdgpu: sq_intr: error, detail 0x00000000, type 2, sh 1, priv 1, wave_id 0, simd_id 0, wgp_id 0

Running the same code inside a virtual environment, with PyTorch installed via pip as documented on the official PyTorch website, gives the expected result:

>>> x.cuda()
tensor([[0.4804],
        [0.3825],
        [0.5009],
        ...,
        [0.4668],
        [0.7279],
        [0.5362]], device='cuda:0')

Here is how the venv is created:

mkdir ~/venv
cd ~/venv
python -m venv test
source ~/venv/test/bin/activate
pip install --upgrade pip
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.3

As of now, the official PyTorch pip wheel is 2.7.1 built against ROCm 6.3, while Arch's python-pytorch packages are 2.7.0 built against ROCm 6.4.0.
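For completeness, which build is active in either environment can be checked from Python itself (torch.version.hip is the HIP/ROCm version the build was compiled against; the exact strings in the comments are just what I'd expect to see):

import torch

print(torch.__version__)              # e.g. 2.7.1+rocm6.3 for the pip wheel, 2.7.0 for the Arch package
print(torch.version.hip)              # ROCm/HIP version the build targets
print(torch.cuda.get_device_name(0))  # the 7900 XTX should show up through the CUDA/HIP API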
