When running local LLMs, one of the most frustrating problems is this: your machine clearly has a GPU, yet Ollama still leans heavily on the CPU, and performance is painfully slow.
The short version is that this is usually not caused by one single issue. The most common causes are:
- Ollama is not detecting any usable GPU
- The driver, ROCm, or CUDA environment is not set up correctly
- The Ollama service was started without the right environment variables
- The model is too large and has fallen back to CPU or mixed CPU/GPU loading
- On AMD platforms, there may be extra compatibility issues such as ROCm version mismatch, gfx settings, or device visibility problems
The fastest way to troubleshoot it is to go through the checks below in order.
1. First, confirm whether Ollama is really not using the GPU
The most direct check is:
```bash
ollama ps
```
Focus on the PROCESSOR column:

- 100% GPU: the model is fully running on the GPU
- 100% CPU: the GPU is not being used at all
- Results like 48%/52% CPU/GPU: part of the model is in VRAM, and part has spilled into system memory
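For reference, a mixed-loading case might look roughly like this; the model name, ID, size, and exact column layout below are placeholders and will vary by Ollama version:

```bash
$ ollama ps
NAME           ID              SIZE     PROCESSOR          UNTIL
qwen2.5:14b    0123456789ab    11 GB    48%/52% CPU/GPU    4 minutes from now
```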
If you see 100% CPU, the next step is to focus on environment and service configuration.
If you see mixed loading, that does not necessarily mean the GPU is broken. In many cases, it simply means VRAM is not enough.
2. Rule out the most common misunderstanding first: the model does not fit into VRAM
Many people assume that once a GPU is installed, Ollama will always run fully on it. That is not how it works.
If the model is too large, the context is too long, or some other loaded model is already occupying VRAM, Ollama may fall back to:
- Partial GPU + partial CPU
- Full 100% CPU
At this point, the two simplest tests are:
- Try a smaller model first. For example, test with a 4B or 7B model before jumping straight to much larger ones.
- Unload other active models and test again. Run `ollama ps` first and make sure nothing else is occupying VRAM (a minimal sketch of both checks follows below).
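A minimal shell sketch of both checks, assuming a fairly recent Ollama build (which provides `ollama stop`); the model names are only examples, so substitute whatever you actually have pulled locally:

```bash
# See what is currently loaded and how it is split across CPU/GPU
ollama ps

# Unload a model that is occupying VRAM (model name is an example)
ollama stop qwen2.5:32b

# Re-test with a smaller model, then check the PROCESSOR column again
ollama run llama3.1:8b "hello"
ollama ps
```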
If smaller models use the GPU but larger ones do not, the real problem is usually VRAM capacity rather than the driver.
3. Check whether the GPU driver and the lower-level runtime are actually working
If even small models run only on CPU, the next step is to check the underlying environment.
NVIDIA
First confirm that the driver is working and the system can see the GPU. A common check is:
```bash
nvidia-smi
```
If this already fails, Ollama is very unlikely to use the GPU correctly.
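If `nvidia-smi` does run, two quick checks are usually enough at this layer (the comments describe what to look for; the exact output varies by driver version):

```bash
# Full status table: driver version, CUDA version, detected GPUs, VRAM usage
nvidia-smi

# One line per detected GPU, e.g. "GPU 0: NVIDIA GeForce RTX ..."
nvidia-smi -L
```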
AMD / ROCm
If you are using an AMD GPU, especially with ROCm, start with:
```bash
rocminfo
rocm-smi
```
If these tools cannot list the device properly, the problem is still below Ollama, so there is no point debugging the application layer yet.
On AMD, the most common issue is not simply “is the driver installed,” but rather:
- The ROCm version does not match the OS version
- The current GPU architecture has incomplete ROCm support
- The device exists, but the runtime is not being exposed correctly to Ollama
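One quick, hedged way to see which gfx target ROCm reports for your GPU, which also helps later when deciding whether `HSA_OVERRIDE_GFX_VERSION` is needed:

```bash
# Print the gfx architecture names ROCm reports for each agent
rocminfo | grep -i gfx
```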
4. Restart the Ollama service, not just your terminal
This is a very common trap.
Many people install drivers, change environment variables, fix ROCm, then just open a new terminal and continue with `ollama run`. But if Ollama is running as a background service, it may still be using the old environment.
So the safer approach is:
- Fully restart the Ollama service
- Reboot the machine if necessary
If you are running it as a service on Linux, make sure the service process was actually restarted instead of reusing the old one.
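On a typical Linux install via the official script, Ollama runs as a systemd service, usually named `ollama` (adjust the name if yours differs). A restart that actually replaces the process looks roughly like this:

```bash
# Restart the background service so it picks up new drivers and variables
sudo systemctl restart ollama

# Confirm a new process is running: the PID and "since" timestamp should have changed
systemctl status ollama
```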
5. Check whether the environment variables are really reaching the service
This matters especially on AMD ROCm systems.
Some machines work fine when commands are run manually in a shell, but the Ollama service still uses only CPU. In that case, the usual reason is that the service process never received the variables you set in your shell.
Common variables to look at include:
```bash
ROCR_VISIBLE_DEVICES
HSA_OVERRIDE_GFX_VERSION
```
Specifically:
- `ROCR_VISIBLE_DEVICES` limits or selects which GPUs ROCm can see
- `HSA_OVERRIDE_GFX_VERSION` is often used as a compatibility workaround on some AMD platforms
If you only export these variables in the current terminal, but Ollama is started by systemd, a desktop background service, or another daemon, they may not take effect.
In other words, “it looks set in my terminal” does not mean Ollama is actually using it.
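If Ollama is started by systemd, here is a sketch of putting the variables into the service environment and verifying that the running process actually received them; the values shown are examples only, not universal settings:

```bash
# Add the variables to the systemd unit via a drop-in override
sudo systemctl edit ollama
# In the editor that opens, add:
#   [Service]
#   Environment="ROCR_VISIBLE_DEVICES=0"
#   Environment="HSA_OVERRIDE_GFX_VERSION=11.0.0"   # example value; depends on your GPU

# Reload and restart so the service actually picks the variables up
sudo systemctl daemon-reload
sudo systemctl restart ollama

# Verify the running service process really has them in its environment
sudo cat /proc/$(pgrep -x ollama | head -n 1)/environ | tr '\0' '\n' | grep -E 'ROCR|HSA'
```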
6. On AMD platforms, focus on ROCm compatibility
Judging by the page metadata, the original video for this topic is tied to the AMD Max+ 395 (Strix Halo) platform and AMD ROCm.
On setups like these, whether Ollama can use the GPU depends much more heavily on version matching than it typically does on NVIDIA systems.
Start by checking these:
- Whether the installed ROCm version fits the current OS and GPU
- Whether the GPU belongs to an architecture with solid ROCm support
- Whether you need to set `HSA_OVERRIDE_GFX_VERSION`
- Whether an older Ollama build or an older inference runtime is causing compatibility issues
If rocminfo works and the GPU is visible to the system, but Ollama still runs only on CPU, the issue is often in the version combination rather than in model parameters.
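One way to test an override quickly, without touching the service configuration, is to run the server manually in the foreground with the variable set; `11.0.0` below is only an example, and the right value depends on your GPU's gfx family:

```bash
# Stop the background service first so it does not hold the port
sudo systemctl stop ollama

# Run the server in the foreground with the override (example value)
HSA_OVERRIDE_GFX_VERSION=11.0.0 ollama serve

# In a second terminal, load a model and re-check the PROCESSOR column
ollama run llama3.1:8b "hello"
ollama ps
```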
7. In Docker, WSL, or remote environments, also check device mapping
If you are not running on bare metal but inside:
- Docker
- WSL
- Remote containers
- Virtualized environments
then you need to check one more layer: whether the GPU device is actually being exposed inside that environment.
A typical symptom looks like this:
- The host machine can see the GPU
- Ollama inside the container or subsystem still uses only the CPU
In that case, the issue may not be Ollama itself. The container or subsystem may simply not have GPU access.
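As a rough reference, the official Ollama Docker images are started with explicit GPU flags; the commands below follow the pattern in Ollama's Docker documentation, but check the current docs for your exact setup:

```bash
# NVIDIA: requires the NVIDIA Container Toolkit on the host
docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

# AMD ROCm: pass the kernel GPU devices through and use the ROCm image
docker run -d --device /dev/kfd --device /dev/dri \
  -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama:rocm
```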
8. Check logs last, but check them for the right reason
If you have already gone through the earlier steps, the most effective next move is not endless reinstalling, but looking directly at the Ollama startup and runtime logs.
Focus on two kinds of messages:
- Whether a GPU was detected at all
- Whether there are driver, library loading, or device initialization errors
If the logs clearly say something like “no compatible GPU found” or “failed to initialize ROCm/CUDA,” the troubleshooting direction becomes much clearer immediately.
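Where the logs live depends on how Ollama was installed; on a systemd-based Linux install (service name assumed to be `ollama`), journalctl is usually the quickest view:

```bash
# Follow the service log and watch the startup lines about GPU discovery
journalctl -u ollama -f

# Or filter recent startup output for GPU-related messages
journalctl -u ollama -n 200 --no-pager | grep -iE 'gpu|rocm|cuda'
```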
Troubleshooting Order
If you only want the shortest path, use this order:
- Run `ollama ps` and confirm whether loading is GPU, CPU, or mixed
- Try a smaller model to rule out VRAM limits
- Use `nvidia-smi`, `rocminfo`, and `rocm-smi` to verify the lower-level environment first
- Fully restart the Ollama service
- Check service environment variables, especially `ROCR_VISIBLE_DEVICES` and `HSA_OVERRIDE_GFX_VERSION` on AMD
- If you are in Docker or WSL, verify device mapping
- Finally, inspect logs for the exact error
Conclusion
When Ollama uses CPU instead of GPU, the root cause usually falls into one of three groups:
- The GPU is not being detected at all
- The GPU is detectable, but the runtime environment is not reaching Ollama
- The GPU is working, but the model is too large and falls back to CPU or mixed CPU/GPU loading
Once you separate those three cases, troubleshooting becomes much faster.
If you are on an AMD platform, pay special attention to ROCm version matching, device visibility, and compatibility variables instead of focusing only on the Ollama command itself.
Original video: https://www.bilibili.com/video/BV1cHoYBqE8k/