If you want to confirm whether an Ollama model is actually running on the GPU, the most direct way is to check the processor allocation of the currently loaded models with `ollama ps`.
Command

```shell
ollama ps
```
Example Output

```
NAME             ID              SIZE      PROCESSOR    UNTIL
llama3:latest    365c0bd3c000    5.4 GB    100% GPU     4 minutes from now
```

(The model name, ID, and times above are illustrative; your output will differ.)
How to Read the PROCESSOR Column
- 100% GPU: The model is fully loaded into GPU VRAM.
- 100% CPU: The model is fully loaded in system memory (no GPU inference).
- 48%/52% CPU/GPU: The model is split between system memory and GPU VRAM.
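If you want to script this check rather than eyeball the table, the PROCESSOR column can be pulled out of the `ollama ps` text output. This is a minimal sketch that assumes the default table layout (model NAME in the first column, PROCESSOR in one of the three forms listed above):

```python
import re

# Matches the three PROCESSOR forms: "100% GPU", "100% CPU", "48%/52% CPU/GPU"
PROC_RE = re.compile(r"\d+%/\d+% CPU/GPU|\d+% (?:GPU|CPU)")

def parse_ollama_ps(output: str) -> dict[str, str]:
    """Map each loaded model's name to its PROCESSOR value.

    `output` is the plain-text table printed by `ollama ps`.
    """
    procs: dict[str, str] = {}
    for line in output.strip().splitlines()[1:]:  # skip the header row
        if not line.strip():
            continue
        match = PROC_RE.search(line)
        if match:
            procs[line.split()[0]] = match.group(0)
    return procs
```

In practice you would feed it the captured output of `ollama ps` (e.g. via `subprocess.run(["ollama", "ps"], capture_output=True, text=True).stdout`) and inspect the resulting dict.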
Practical Tips
- If you expect GPU usage but see 100% CPU, first check your GPU drivers, the CUDA/ROCm environment, and Ollama runtime settings.
- With larger models and limited VRAM, mixed CPU/GPU loading is common.
- For performance troubleshooting, run `ollama ps` before checking speed metrics to locate bottlenecks faster.
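The tips above can be folded into a small triage helper that turns a PROCESSOR value into a suggested next step. The messages here are illustrative assumptions for this sketch, not part of Ollama itself:

```python
def diagnose_processor(processor: str) -> str:
    """Suggest a next step based on an `ollama ps` PROCESSOR value."""
    if "CPU/GPU" in processor:
        # Mixed load: the model did not fit entirely in free VRAM.
        return "split load: try a smaller model or quantization, or free VRAM"
    if processor == "100% GPU":
        return "fully on GPU: no action needed"
    if processor == "100% CPU":
        # No GPU inference at all.
        return "CPU only: check GPU drivers and the CUDA/ROCm environment"
    return "unrecognized PROCESSOR value: " + processor
```

For example, `diagnose_processor("100% CPU")` points you straight at the driver/runtime checks above instead of at speed metrics.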
Summary
`ollama ps` is the first step in verifying real GPU usage. Focus on the PROCESSOR column to quickly identify where the model is loaded and decide your next optimization action.