<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
    <channel>
        <title>AMD on KnightLi Blog</title>
        <link>https://www.knightli.com/en/tags/amd/</link>
        <description>Recent content in AMD on KnightLi Blog</description>
        <generator>Hugo -- gohugo.io</generator>
        <language>en</language>
        <lastBuildDate>Fri, 24 Apr 2026 18:30:00 +0800</lastBuildDate><atom:link href="https://www.knightli.com/en/tags/amd/index.xml" rel="self" type="application/rss+xml" /><item>
        <title>How to Fix Ollama Using CPU Instead of GPU</title>
        <link>https://www.knightli.com/en/2026/04/24/fix-ollama-using-cpu-instead-of-gpu/</link>
        <pubDate>Fri, 24 Apr 2026 18:30:00 +0800</pubDate>
        
        <guid>https://www.knightli.com/en/2026/04/24/fix-ollama-using-cpu-instead-of-gpu/</guid>
        <description>&lt;p&gt;When running local LLMs, one of the most frustrating problems is this: your machine clearly has a GPU, yet &lt;code&gt;Ollama&lt;/code&gt; still leans heavily on the &lt;code&gt;CPU&lt;/code&gt;, and performance is painfully slow.&lt;/p&gt;
&lt;p&gt;The short version: there is rarely one single culprit. The most common causes are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;Ollama&lt;/code&gt; is not detecting any usable GPU&lt;/li&gt;
&lt;li&gt;The driver, &lt;code&gt;ROCm&lt;/code&gt;, or &lt;code&gt;CUDA&lt;/code&gt; environment is not set up correctly&lt;/li&gt;
&lt;li&gt;The &lt;code&gt;Ollama&lt;/code&gt; service was started without the right environment variables&lt;/li&gt;
&lt;li&gt;The model is too large and has fallen back to &lt;code&gt;CPU&lt;/code&gt; or mixed &lt;code&gt;CPU/GPU&lt;/code&gt; loading&lt;/li&gt;
&lt;li&gt;On AMD platforms, there may be extra compatibility issues such as &lt;code&gt;ROCm&lt;/code&gt; version mismatch, &lt;code&gt;gfx&lt;/code&gt; settings, or device visibility problems&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The fastest way to troubleshoot it is to go through the checks below in order.&lt;/p&gt;
&lt;h2 id=&#34;1-first-confirm-whether-ollama-is-really-not-using-the-gpu&#34;&gt;1. First, confirm whether Ollama is really not using the GPU
&lt;/h2&gt;&lt;p&gt;The most direct check is:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;ollama ps
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Focus on the &lt;code&gt;PROCESSOR&lt;/code&gt; column.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;100% GPU&lt;/code&gt;: the model is fully running on the GPU&lt;/li&gt;
&lt;li&gt;&lt;code&gt;100% CPU&lt;/code&gt;: the GPU is not being used at all&lt;/li&gt;
&lt;li&gt;Results like &lt;code&gt;48%/52% CPU/GPU&lt;/code&gt;: part of the model is in VRAM, and part has spilled into system memory&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you see &lt;code&gt;100% CPU&lt;/code&gt;, the next step is to focus on environment and service configuration.&lt;br&gt;
If you see mixed loading, that does not necessarily mean the GPU is broken. In many cases, it simply means VRAM is not enough.&lt;/p&gt;
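&lt;p&gt;A second corroborating signal is live GPU utilization while a prompt is actually generating. A quick sketch (the model name is only an example; pick the tool for your vendor):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-bash&#34;&gt;# In one terminal, keep a generation running
ollama run llama3

# In another terminal, watch utilization refresh every second
watch -n 1 nvidia-smi    # NVIDIA
watch -n 1 rocm-smi      # AMD
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If utilization stays near zero during generation, that confirms what the &lt;code&gt;PROCESSOR&lt;/code&gt; column is telling you.&lt;/p&gt;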
&lt;h2 id=&#34;2-rule-out-the-most-common-misunderstanding-first-the-model-does-not-fit-into-vram&#34;&gt;2. Rule out the most common misunderstanding first: the model does not fit into VRAM
&lt;/h2&gt;&lt;p&gt;Many people assume that once a GPU is installed, &lt;code&gt;Ollama&lt;/code&gt; will always run fully on it. That is not how it works.&lt;/p&gt;
&lt;p&gt;If the model is too large, the context is too long, or some other loaded model is already occupying VRAM, &lt;code&gt;Ollama&lt;/code&gt; may fall back to:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Partial GPU + partial CPU&lt;/li&gt;
&lt;li&gt;Full &lt;code&gt;100% CPU&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;At this point, the two simplest tests are:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Try a smaller model first&lt;br&gt;
For example, test with a &lt;code&gt;4B&lt;/code&gt; or &lt;code&gt;7B&lt;/code&gt; model before jumping straight to much larger ones.&lt;/li&gt;
&lt;li&gt;Unload other active models and test again&lt;br&gt;
Run &lt;code&gt;ollama ps&lt;/code&gt; first and make sure nothing else is occupying VRAM.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;If smaller models use the GPU but larger ones do not, the real problem is usually VRAM capacity rather than the driver.&lt;/p&gt;
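&lt;p&gt;As a concrete sketch of this test (the model names are only examples; substitute whatever you have pulled):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-bash&#34;&gt;# See what is currently loaded and how it is placed
ollama ps

# Unload any model still resident in VRAM
# (replace llama3 with the NAME column from ollama ps)
ollama stop llama3

# Load a small model and check the PROCESSOR column again
ollama run qwen2.5:7b
ollama ps
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;If the small model shows &lt;code&gt;100% GPU&lt;/code&gt; while the large one did not, you are hitting a VRAM ceiling, not a driver bug.&lt;/p&gt;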
&lt;h2 id=&#34;3-check-whether-the-gpu-driver-and-the-lower-level-runtime-are-actually-working&#34;&gt;3. Check whether the GPU driver and the lower-level runtime are actually working
&lt;/h2&gt;&lt;p&gt;If even small models run only on &lt;code&gt;CPU&lt;/code&gt;, the next step is to check the underlying environment.&lt;/p&gt;
&lt;h3 id=&#34;nvidia&#34;&gt;NVIDIA
&lt;/h3&gt;&lt;p&gt;First confirm that the driver is working and the system can see the GPU. A common check is:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;nvidia-smi
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;If this already fails, &lt;code&gt;Ollama&lt;/code&gt; is very unlikely to use the GPU correctly.&lt;/p&gt;
&lt;h3 id=&#34;amd--rocm&#34;&gt;AMD / ROCm
&lt;/h3&gt;&lt;p&gt;If you are using an &lt;code&gt;AMD GPU&lt;/code&gt;, especially with &lt;code&gt;ROCm&lt;/code&gt;, start with:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;rocminfo
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;rocm-smi
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;If these tools cannot list the device properly, the problem is still below &lt;code&gt;Ollama&lt;/code&gt;, so there is no point debugging the application layer yet.&lt;/p&gt;
&lt;p&gt;On AMD, the most common issue is not simply “is the driver installed,” but rather:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The &lt;code&gt;ROCm&lt;/code&gt; version does not match the OS version&lt;/li&gt;
&lt;li&gt;The current GPU architecture has incomplete support&lt;/li&gt;
&lt;li&gt;The device exists, but the runtime is not being exposed correctly to &lt;code&gt;Ollama&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;4-restart-the-ollama-service-not-just-your-terminal&#34;&gt;4. Restart the Ollama service, not just your terminal
&lt;/h2&gt;&lt;p&gt;This is a very common trap.&lt;/p&gt;
&lt;p&gt;Many people install drivers, change environment variables, fix &lt;code&gt;ROCm&lt;/code&gt;, then just open a new terminal and continue with &lt;code&gt;ollama run&lt;/code&gt;. But if &lt;code&gt;Ollama&lt;/code&gt; is running as a background service, it may still be using the old environment.&lt;/p&gt;
&lt;p&gt;So the safer approach is:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Fully restart the &lt;code&gt;Ollama&lt;/code&gt; service&lt;/li&gt;
&lt;li&gt;Reboot the machine if necessary&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you are running it as a service on Linux, make sure the service process was actually restarted instead of reusing the old one.&lt;/p&gt;
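&lt;p&gt;On a systemd-based Linux install (the official install script sets up an &lt;code&gt;ollama&lt;/code&gt; service), a clean restart looks like this:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-bash&#34;&gt;# Restart the background service so it picks up new drivers and variables
sudo systemctl restart ollama

# Confirm the process was actually replaced:
# the Active timestamp and Main PID should both be fresh
systemctl status ollama
&lt;/code&gt;&lt;/pre&gt;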
&lt;h2 id=&#34;5-check-whether-the-environment-variables-are-really-reaching-the-service&#34;&gt;5. Check whether the environment variables are really reaching the service
&lt;/h2&gt;&lt;p&gt;This matters especially on &lt;code&gt;AMD ROCm&lt;/code&gt; systems.&lt;/p&gt;
&lt;p&gt;Some machines work fine when commands are run manually in a shell, but the &lt;code&gt;Ollama&lt;/code&gt; service still uses only &lt;code&gt;CPU&lt;/code&gt;. In that case, the usual reason is that the service process never received the variables you set in your shell.&lt;/p&gt;
&lt;p&gt;Common variables to look at include:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;ROCR_VISIBLE_DEVICES
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;HSA_OVERRIDE_GFX_VERSION
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Specifically:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;ROCR_VISIBLE_DEVICES&lt;/code&gt; limits or selects which GPUs &lt;code&gt;ROCm&lt;/code&gt; can see&lt;/li&gt;
&lt;li&gt;&lt;code&gt;HSA_OVERRIDE_GFX_VERSION&lt;/code&gt; is often used as a compatibility workaround on some AMD platforms&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you only &lt;code&gt;export&lt;/code&gt; these variables in the current terminal, but &lt;code&gt;Ollama&lt;/code&gt; is started by systemd, a desktop background service, or another daemon, they may not take effect.&lt;/p&gt;
&lt;p&gt;In other words, “it looks set in my terminal” does not mean &lt;code&gt;Ollama&lt;/code&gt; is actually using it.&lt;/p&gt;
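&lt;p&gt;On systemd, the reliable way to hand variables to the service is a unit override rather than a shell &lt;code&gt;export&lt;/code&gt;. A minimal sketch (the &lt;code&gt;gfx&lt;/code&gt; value is only an example; use the one that matches your GPU):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-bash&#34;&gt;# Open an override file for the service
sudo systemctl edit ollama

# In the editor that opens, add:
#   [Service]
#   Environment=&#34;HSA_OVERRIDE_GFX_VERSION=11.0.0&#34;
#   Environment=&#34;ROCR_VISIBLE_DEVICES=0&#34;

# Then reload and restart so the service process sees them
sudo systemctl daemon-reload
sudo systemctl restart ollama
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Afterwards, &lt;code&gt;systemctl show ollama | grep Environment&lt;/code&gt; confirms what the service actually received.&lt;/p&gt;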
&lt;h2 id=&#34;6-on-amd-platforms-focus-on-rocm-compatibility&#34;&gt;6. On AMD platforms, focus on ROCm compatibility
&lt;/h2&gt;&lt;p&gt;The original video for this topic covers an &lt;code&gt;AMD Ryzen AI Max+ 395&lt;/code&gt; (&lt;code&gt;Strix Halo&lt;/code&gt;) system running &lt;code&gt;AMD ROCm&lt;/code&gt;.&lt;br&gt;
On setups like these, &lt;code&gt;Ollama&lt;/code&gt; failing to use the GPU depends far more on version matching than it does on NVIDIA systems.&lt;/p&gt;
&lt;p&gt;Start by checking these:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Whether the installed &lt;code&gt;ROCm&lt;/code&gt; version fits the current OS and GPU&lt;/li&gt;
&lt;li&gt;Whether the GPU belongs to an architecture with solid &lt;code&gt;ROCm&lt;/code&gt; support&lt;/li&gt;
&lt;li&gt;Whether you need to set &lt;code&gt;HSA_OVERRIDE_GFX_VERSION&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Whether an older &lt;code&gt;Ollama&lt;/code&gt; build or older inference runtime is causing compatibility issues&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;If &lt;code&gt;rocminfo&lt;/code&gt; works and the GPU is visible to the system, but &lt;code&gt;Ollama&lt;/code&gt; still runs only on &lt;code&gt;CPU&lt;/code&gt;, the issue is often in the version combination rather than in model parameters.&lt;/p&gt;
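&lt;p&gt;Two quick checks that help pin this down (the path assumes a standard &lt;code&gt;/opt/rocm&lt;/code&gt; install):&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-bash&#34;&gt;# Which ROCm version is actually installed
cat /opt/rocm/.info/version

# Which gfx target the runtime reports for your GPU
rocminfo | grep -i gfx
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Compare the reported &lt;code&gt;gfx&lt;/code&gt; target against the support matrix for your &lt;code&gt;ROCm&lt;/code&gt; version; a mismatch here is exactly the case where &lt;code&gt;HSA_OVERRIDE_GFX_VERSION&lt;/code&gt; comes into play.&lt;/p&gt;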
&lt;h2 id=&#34;7-in-docker-wsl-or-remote-environments-also-check-device-mapping&#34;&gt;7. In Docker, WSL, or remote environments, also check device mapping
&lt;/h2&gt;&lt;p&gt;If you are not running on bare metal but inside:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Docker&lt;/li&gt;
&lt;li&gt;WSL&lt;/li&gt;
&lt;li&gt;Remote containers&lt;/li&gt;
&lt;li&gt;Virtualized environments&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;then you need to check one more layer: whether the GPU device is actually being exposed inside that environment.&lt;/p&gt;
&lt;p&gt;A typical symptom looks like this:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The host machine can see the GPU&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Ollama&lt;/code&gt; inside the container or subsystem still uses only &lt;code&gt;CPU&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In that case, the issue may not be &lt;code&gt;Ollama&lt;/code&gt; itself. The container or subsystem may simply not have GPU access.&lt;/p&gt;
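&lt;p&gt;For Docker, the GPU has to be passed in explicitly. Sketches based on the flags the official &lt;code&gt;ollama/ollama&lt;/code&gt; images document:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-bash&#34;&gt;# NVIDIA: requires the NVIDIA Container Toolkit on the host
docker run -d --gpus=all -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

# AMD: pass the ROCm device nodes and use the rocm image tag
docker run -d --device /dev/kfd --device /dev/dri -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama:rocm
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Without &lt;code&gt;--gpus&lt;/code&gt; or the &lt;code&gt;--device&lt;/code&gt; flags, the container only ever sees the CPU, no matter how healthy the host driver is.&lt;/p&gt;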
&lt;h2 id=&#34;8-check-logs-last-but-check-them-for-the-right-reason&#34;&gt;8. Check logs last, but check them for the right reason
&lt;/h2&gt;&lt;p&gt;If you have already gone through the earlier steps, the most effective next move is not endless reinstalling, but looking directly at the &lt;code&gt;Ollama&lt;/code&gt; startup and runtime logs.&lt;/p&gt;
&lt;p&gt;Focus on two kinds of messages:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Whether a GPU was detected at all&lt;/li&gt;
&lt;li&gt;Whether there are driver, library loading, or device initialization errors&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If the logs clearly say something like “no compatible GPU found” or “failed to initialize ROCm/CUDA,” the troubleshooting direction becomes much clearer immediately.&lt;/p&gt;
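&lt;p&gt;Where the logs live depends on how the service runs. On a systemd-based Linux install, for example:&lt;/p&gt;
&lt;pre&gt;&lt;code class=&#34;language-bash&#34;&gt;# Follow the service log and filter for GPU-related lines
journalctl -u ollama --no-pager | grep -iE &#39;gpu|rocm|cuda&#39;
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;On macOS, the same information is written to &lt;code&gt;~/.ollama/logs/server.log&lt;/code&gt;.&lt;/p&gt;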
&lt;h2 id=&#34;troubleshooting-order&#34;&gt;Troubleshooting Order
&lt;/h2&gt;&lt;p&gt;If you only want the shortest path, use this order:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Run &lt;code&gt;ollama ps&lt;/code&gt; and confirm whether it is &lt;code&gt;GPU&lt;/code&gt;, &lt;code&gt;CPU&lt;/code&gt;, or mixed loading&lt;/li&gt;
&lt;li&gt;Try a smaller model to rule out VRAM limits&lt;/li&gt;
&lt;li&gt;Use &lt;code&gt;nvidia-smi&lt;/code&gt;, &lt;code&gt;rocminfo&lt;/code&gt;, and &lt;code&gt;rocm-smi&lt;/code&gt; to verify the lower-level environment first&lt;/li&gt;
&lt;li&gt;Fully restart the &lt;code&gt;Ollama&lt;/code&gt; service&lt;/li&gt;
&lt;li&gt;Check service environment variables, especially &lt;code&gt;ROCR_VISIBLE_DEVICES&lt;/code&gt; and &lt;code&gt;HSA_OVERRIDE_GFX_VERSION&lt;/code&gt; on AMD&lt;/li&gt;
&lt;li&gt;If you are in Docker or WSL, verify device mapping&lt;/li&gt;
&lt;li&gt;Finally, inspect logs for the exact error&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 id=&#34;conclusion&#34;&gt;Conclusion
&lt;/h2&gt;&lt;p&gt;When &lt;code&gt;Ollama&lt;/code&gt; uses &lt;code&gt;CPU&lt;/code&gt; instead of &lt;code&gt;GPU&lt;/code&gt;, the root cause usually falls into one of three groups:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The GPU is not being detected at all&lt;/li&gt;
&lt;li&gt;The GPU is detectable, but the runtime environment is not reaching &lt;code&gt;Ollama&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;The GPU is working, but the model is too large and falls back to &lt;code&gt;CPU&lt;/code&gt; or mixed memory&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Once you separate those three cases, troubleshooting becomes much faster.&lt;br&gt;
If you are on an AMD platform, pay special attention to &lt;code&gt;ROCm&lt;/code&gt; version matching, device visibility, and compatibility variables instead of focusing only on the &lt;code&gt;Ollama&lt;/code&gt; command itself.&lt;/p&gt;
&lt;p&gt;Original video: &lt;a class=&#34;link&#34; href=&#34;https://www.bilibili.com/video/BV1cHoYBqE8k/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://www.bilibili.com/video/BV1cHoYBqE8k/&lt;/a&gt;&lt;/p&gt;
</description>
        </item>
        
    </channel>
</rss>
