#1 2025-12-19 22:37:19

uttarax
Member
Registered: 2025-12-19
Posts: 5

[Solved] Ollama-rocm not offloading the workload to gpu

I have a discrete AMD GPU, a Radeon RX 7900 XT:

44:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 31 [Radeon RX 7900 XT/7900 XTX/7900 GRE/7900M] (rev cc)

I followed the instructions in https://wiki.archlinux.org/title/Ollama and installed ollama-rocm.

ollama_models]$ ollama --version
ollama version is 0.13.5

I tried running models, first without any overrides and then with HSA_OVERRIDE_GFX_VERSION set as suggested in the wiki:

Environment="OLLAMA_HOST=0.0.0.0:11434"
Environment="OLLAMA_MODELS=/disk2/ollama_models"
Environment="HSA_OVERRIDE_GFX_VERSION=11.0.0"

In both cases ollama uses 100% CPU with no GPU offload; I checked with ollama ps and the amdgpu_top command.
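(For anyone following along: those Environment lines live in a systemd drop-in for the ollama service. A minimal sketch of applying them, assuming the stock ollama.service unit name shipped by the package:)

```shell
# Sketch: add the Environment overrides as a systemd drop-in
# (assumes the unit is named ollama.service, as on Arch).
sudo systemctl edit ollama.service
# In the override file that opens, add:
#   [Service]
#   Environment="OLLAMA_HOST=0.0.0.0:11434"
#   Environment="OLLAMA_MODELS=/disk2/ollama_models"
#   Environment="HSA_OVERRIDE_GFX_VERSION=11.0.0"
sudo systemctl restart ollama.service   # systemctl edit runs daemon-reload itself
```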

The value 11.0.0 comes from the output of the following commands, as instructed in the wiki:

rocminfo | grep amdhsa
      Name:                    amdgcn-amd-amdhsa--gfx1100         
      Name:                    amdgcn-amd-amdhsa--gfx11-generic 

find /opt/rocm/lib/rocblas/library -name 'Kernels.so-*'
/opt/rocm/lib/rocblas/library/Kernels.so-000-gfx90a-xnack+.hsaco
/opt/rocm/lib/rocblas/library/Kernels.so-000-gfx1101.hsaco
/opt/rocm/lib/rocblas/library/Kernels.so-000-gfx906-xnack-.hsaco
/opt/rocm/lib/rocblas/library/Kernels.so-000-gfx1150.hsaco
/opt/rocm/lib/rocblas/library/Kernels.so-000-gfx942.hsaco
/opt/rocm/lib/rocblas/library/Kernels.so-000-gfx1100.hsaco
/opt/rocm/lib/rocblas/library/Kernels.so-000-gfx1010.hsaco
/opt/rocm/lib/rocblas/library/Kernels.so-000-gfx1151.hsaco
/opt/rocm/lib/rocblas/library/Kernels.so-000-gfx908-xnack-.hsaco
/opt/rocm/lib/rocblas/library/Kernels.so-000-gfx1103.hsaco
/opt/rocm/lib/rocblas/library/Kernels.so-000-gfx1201.hsaco
/opt/rocm/lib/rocblas/library/Kernels.so-000-gfx1030.hsaco
/opt/rocm/lib/rocblas/library/Kernels.so-000-gfx900.hsaco
/opt/rocm/lib/rocblas/library/Kernels.so-000-gfx950.hsaco
/opt/rocm/lib/rocblas/library/Kernels.so-000-gfx1102.hsaco
/opt/rocm/lib/rocblas/library/Kernels.so-000-gfx1200.hsaco
/opt/rocm/lib/rocblas/library/Kernels.so-000-gfx90a-xnack-.hsaco
/opt/rocm/lib/rocblas/library/Kernels.so-000-gfx1012.hsaco
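(That is, gfx1100 maps digit-by-digit to 11.0.0. As an illustrative sketch of that convention — the helper name and the hex handling for steppings like gfx90a are my own assumption, not something from the wiki:)

```python
def gfx_to_override(target: str) -> str:
    """Illustrative: derive an HSA_OVERRIDE_GFX_VERSION value from a gfx target.

    The last two characters are the minor version and the stepping (the
    stepping may be a hex digit, e.g. gfx90a); the rest is the major version.
    """
    digits = target.removeprefix("gfx")
    major, minor, step = digits[:-2], digits[-2], digits[-1]
    return f"{int(major)}.{int(minor)}.{int(step, 16)}"

print(gfx_to_override("gfx1100"))  # 11.0.0, the value used above
print(gfx_to_override("gfx1030"))  # 10.3.0
```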

What am I missing?
Other details: Linux bhokaal.linkpc.net 6.18.1-arch1-2 #1 SMP PREEMPT_DYNAMIC Sat, 13 Dec 2025 18:23:21 +0000 x86_64 GNU/Linux

Last edited by uttarax (2025-12-20 21:10:29)


#2 2025-12-19 23:12:18

Succulent of your garden
Member
From: Majestic kingdom of pot plants
Registered: 2024-02-29
Posts: 1,349

Re: [Solved] Ollama-rocm not offloading the workload to gpu

Did you check point 3.1 of the link on the Arch wiki about Ollama?

This initially sounds like you missed installing hip-runtime-amd, as described in point 4.2 of this wiki: https://wiki.archlinux.org/title/Genera … _units#HIP

If you need to use PyTorch in the future, the python-pytorch-rocm package is also needed, as point 6.1 says. But I think it is not needed for Ollama.

Is your GPU shown when you use nvtop?


str( @soyg ) == str( @potplant ) btw!

Also now with avatar logo included!


#3 2025-12-20 00:05:18

uttarax
Member
Registered: 2025-12-19
Posts: 5

Re: [Solved] Ollama-rocm not offloading the workload to gpu

Thank you for your response.

Succulent of your garden wrote:

Did you check point 3.1 of the link on the Arch wiki about Ollama?

Yes, that is why I tried adding Environment="HSA_OVERRIDE_GFX_VERSION=11.0.0".

I have both hip-runtime-amd & rocm-hip-runtime installed:

pacman -Q hip-runtime-amd rocm-hip-runtime
opencl-amd 1:7.1.1-1
opencl-amd 1:7.1.1-1

Succulent of your garden wrote:

Is your GPU shown when you use nvtop?

Since it is an AMD GPU I am checking with amdgpu_top, and it shows no usage; when I do ollama run, ollama ps also shows 100% CPU usage.

What else should I be doing?

A further point: when I use the LM Studio AppImage it is able to offload to the GPU fine, but I am having *no* success with ollama-rocm :{


#4 2025-12-20 00:08:58

loqs
Member
Registered: 2014-03-06
Posts: 18,796

Re: [Solved] Ollama-rocm not offloading the workload to gpu

Please start ollama from the console and post the output from when you load and run a model that uses the CPU.


#5 2025-12-20 04:59:54

uttarax
Member
Registered: 2025-12-19
Posts: 5

Re: [Solved] Ollama-rocm not offloading the workload to gpu

"if GPUs are not correctly discovered, unset and try again"

I am guessing it means it is not able to detect my GPU.

https://ollama.com/blog/amd-preview says my GPU, the 7900 XT, is supported.

Full output:

[bhokaal@bhokaal ~]$ OllAMA_DEBUG=5 ollama serve
time=2025-12-19T20:48:53.023-08:00 level=INFO source=routes.go:1554 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GGML_VK_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION:11.0.0 HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/home/bhokaal/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false OLLAMA_VULKAN:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2025-12-19T20:48:53.023-08:00 level=INFO source=images.go:493 msg="total blobs: 5"
time=2025-12-19T20:48:53.024-08:00 level=INFO source=images.go:500 msg="total unused blobs removed: 0"
time=2025-12-19T20:48:53.024-08:00 level=INFO source=routes.go:1607 msg="Listening on 127.0.0.1:11434 (version 0.13.5)"
time=2025-12-19T20:48:53.024-08:00 level=INFO source=runner.go:67 msg="discovering available GPUs..."
time=2025-12-19T20:48:53.024-08:00 level=WARN source=runner.go:485 msg="user overrode visible devices" HSA_OVERRIDE_GFX_VERSION=11.0.0
time=2025-12-19T20:48:53.024-08:00 level=WARN source=runner.go:489 msg="if GPUs are not correctly discovered, unset and try again"
time=2025-12-19T20:48:53.025-08:00 level=INFO source=server.go:429 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 42221"
time=2025-12-19T20:48:53.053-08:00 level=INFO source=types.go:60 msg="inference compute" id=cpu library=cpu compute="" name=cpu description=cpu libdirs=ollama driver="" pci_id="" type="" total="31.2 GiB" available="22.5 GiB"
time=2025-12-19T20:48:53.054-08:00 level=INFO source=routes.go:1648 msg="entering low vram mode" "total vram"="0 B" threshold="20.0 GiB"
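The last two lines are the key symptom: discovery only found the CPU, so total VRAM is 0 B, which puts the server under its 20 GiB low-VRAM threshold. A toy sketch of that threshold check (illustrative only, not ollama's actual code):

```python
GIB = 1024 ** 3

def low_vram_mode(total_vram_bytes: int, threshold_gib: float = 20.0) -> bool:
    # Mirrors the log line: "entering low vram mode" is reported when the
    # total discovered VRAM falls below the threshold.
    return total_vram_bytes < threshold_gib * GIB

print(low_vram_mode(0))         # True: no GPU was discovered at all
print(low_vram_mode(20 * GIB))  # False: a detected 7900 XT's 20 GiB clears it
```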


#6 2025-12-20 12:18:07

Succulent of your garden
Member
From: Majestic kingdom of pot plants
Registered: 2024-02-29
Posts: 1,349

Re: [Solved] Ollama-rocm not offloading the workload to gpu

Can you share the output of the following commands with us?

pacman -Q | grep amd
pacman -Q | grep hip
pacman -Q | grep rocm

That way we can see which dependencies you have and check whether there is some kind of conflict between two of them.

I'm fairly sure your GPU is supported. I was able to run ROCm fine just last week, though I have since swapped my GPU for an Nvidia one, so I can't test it now.

Also, where did you get HSA_OVERRIDE_GFX_VERSION from? Is it in the AMD links that the Arch wiki provides? I can't check right now because of a DNS issue reaching the page on my end. Have you tried without that variable? I was able to run ollama-rocm without it.




#7 2025-12-20 12:56:52

Lone_Wolf
Administrator
From: Netherlands, Europe
Registered: 2005-10-04
Posts: 14,795

Re: [Solved] Ollama-rocm not offloading the workload to gpu

pacman -Qs something tends to be faster and doesn't start a second process.

ADDED
https://wiki.archlinux.org/title/Ollama … grated_GPU

Last edited by Lone_Wolf (2025-12-20 12:58:45)


Disliking systemd intensely, but not satisfied with alternatives so focusing on taming systemd.

clean chroot building not flexible enough ?
Try clean chroot manager by graysky


#8 2025-12-20 13:25:48

Succulent of your garden
Member
From: Majestic kingdom of pot plants
Registered: 2024-02-29
Posts: 1,349

Re: [Solved] Ollama-rocm not offloading the workload to gpu

Lone_Wolf wrote:

pacman -Qs something tends to be faster and doesn't start a second process.

Thanks for the tip ^^

So if that HSA_OVERRIDE_GFX_VERSION is for integrated GPUs, why are you using it with a discrete GPU, uttarax?

Also, nvtop works for AMD GPUs too, so checking whether your GPU is listed there could help. For example, I currently have two Nvidia GPUs, one Pascal and one Blackwell. I can only see the Blackwell one in nvtop, because that is what the installed Nvidia driver supports, but with lspci or fastfetch I can clearly see the Pascal GPU as well. So for me nvtop is a good quick way to check whether the drivers are installed.




#9 2025-12-20 20:06:16

uttarax
Member
Registered: 2025-12-19
Posts: 5

Re: [Solved] Ollama-rocm not offloading the workload to gpu

Thank you both for your continued help and replies. Sorry for the delayed response; I was unexpectedly away from the keyboard.

[bhokaal@bhokaal ~]$ pacman -Qs | grep amd
local/amd-ucode 20251125-2
local/amdgpu_top 0.11.0-1
local/linux-firmware-amdgpu 20251125-2
local/opencl-amd 1:7.1.1-1
    ROCm components repackaged from AMD's Ubuntu releases (ROCr runtime, OpenCL runtime, HIP runtime) - This package is intended to work along with the free amdgpu stack.
local/xf86-video-amdgpu 25.0.0-1 (xorg-drivers)
    X.org amdgpu video driver
[bhokaal@bhokaal ~]$ pacman -Qs | grep hip
    Utility for reading, writing, erasing and verifying flash ROM chips
    Runtime libraries shipped by GCC
    Runtime libraries shipped by GCC (14.x.x)
local/hipblas 7.1.1-1
local/hipblas-common 7.1.1-1
    Common files shared by hipBLAS and hipBLASLt
    Linux Driver for ITE LPC chips
    A ship sinking game
    Each of two possible players controls a satellite spaceship orbiting the sun
    32-bit runtime libraries shipped by GCC
    A library to talk to FTDI chips, optional python bindings.
[bhokaal@bhokaal ~]$ pacman -Qs | grep rocm
local/ollama-rocm 0.13.5-1

I tried without setting HSA_OVERRIDE_GFX_VERSION first; the results were the same, still no GPU usage.
Here is a run without HSA_OVERRIDE_GFX_VERSION set:

[bhokaal@bhokaal ~]$ OllAMA_DEBUG=5 ollama serve
time=2025-12-20T12:04:13.545-08:00 level=INFO source=routes.go:1554 msg="server config" env="map[CUDA_VISIBLE_DEVICES: GGML_VK_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: HTTPS_PROXY: HTTP_PROXY: NO_PROXY: OLLAMA_CONTEXT_LENGTH:4096 OLLAMA_DEBUG:INFO OLLAMA_FLASH_ATTENTION:false OLLAMA_GPU_OVERHEAD:0 OLLAMA_HOST:http://127.0.0.1:11434 OLLAMA_KEEP_ALIVE:5m0s OLLAMA_KV_CACHE_TYPE: OLLAMA_LLM_LIBRARY: OLLAMA_LOAD_TIMEOUT:5m0s OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/home/bhokaal/.ollama/models OLLAMA_MULTIUSER_CACHE:false OLLAMA_NEW_ENGINE:false OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:1 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://* vscode-webview://* vscode-file://*] OLLAMA_REMOTES:[ollama.com] OLLAMA_SCHED_SPREAD:false OLLAMA_VULKAN:false ROCR_VISIBLE_DEVICES: http_proxy: https_proxy: no_proxy:]"
time=2025-12-20T12:04:13.548-08:00 level=INFO source=images.go:493 msg="total blobs: 5"
time=2025-12-20T12:04:13.548-08:00 level=INFO source=images.go:500 msg="total unused blobs removed: 0"
time=2025-12-20T12:04:13.549-08:00 level=INFO source=routes.go:1607 msg="Listening on 127.0.0.1:11434 (version 0.13.5)"
time=2025-12-20T12:04:13.551-08:00 level=INFO source=runner.go:67 msg="discovering available GPUs..."
time=2025-12-20T12:04:13.553-08:00 level=INFO source=server.go:429 msg="starting runner" cmd="/usr/bin/ollama runner --ollama-engine --port 37719"
time=2025-12-20T12:04:13.603-08:00 level=INFO source=types.go:60 msg="inference compute" id=cpu library=cpu compute="" name=cpu description=cpu libdirs=ollama driver="" pci_id="" type="" total="31.2 GiB" available="26.9 GiB"
time=2025-12-20T12:04:13.603-08:00 level=INFO source=routes.go:1648 msg="entering low vram mode" "total vram"="0 B" threshold="20.0 GiB"

Thanks for the tip about nvtop. I gave it a try and, of course, there was no GPU usage with ollama-rocm, but I can see usage with LM Studio.


#10 2025-12-20 20:17:24

Luciddream
Member
From: Greece
Registered: 2014-12-08
Posts: 67

Re: [Solved] Ollama-rocm not offloading the workload to gpu

Hey, not sure if it's causing your issue, but it looks like you have parts of ROCm from the official repositories and other parts from the AUR. There are a couple of ways to add ROCm to Arch Linux.

One is the official rocm-hip-sdk (7.1.1); the second is opencl-amd-dev (7.1.1); and if you want the latest preview (7.10.0) you can try rocm-gfx110x-bin for your GPU, which will eventually replace opencl-amd-dev.

I would suggest deciding which package you want and removing the others. For example, if you want to keep the official packages, remove opencl-amd first, then install rocm-hip-sdk.
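As a sketch of that cleanup (package names as named in this thread; review the list pacman prints before confirming either step):

```shell
# Sketch: keep the official ROCm stack and drop the AUR repackage.
sudo pacman -Rs opencl-amd    # remove the repackaged AUR ROCm stack first
sudo pacman -S rocm-hip-sdk   # then install the official ROCm HIP SDK
```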


edit: maybe off topic, but I'm not to blame for not including hipblas in the lighter opencl-amd. That package works for most OpenCL and HIP runtime use cases, and I'm trying to follow AMD's packaging. For example, hipblas depends on over 800 MB of packages.

Package: hipblas
Architecture: amd64
Depends: rocblas (>= 5.1.0), rocsolver (>= 3.31.0), rocm-core

also from hipblas repository:

To use hipBLAS, you must first install rocBLAS, rocSPARSE, and rocSOLVER or cuBLAS.

Last edited by Luciddream (2025-12-20 20:47:58)


#11 2025-12-20 21:08:09

uttarax
Member
Registered: 2025-12-19
Posts: 5

Re: [Solved] Ollama-rocm not offloading the workload to gpu

Thanks, your tip solved my problem.

Here is how I fixed it:

sudo pacman -Rs ollama-rocm

sudo pacman -Rs opencl-amd

sudo pacman -S ollama-rocm

Now it just works!! No need for any overrides; the GPU is detected and the model is loaded onto the GPU automatically.
Perfect.

Thank you very much for all your help. I will mark this thread as solved.

Last edited by uttarax (2025-12-20 21:10:48)

