How to Check Whether an Ollama Model Is Loaded on GPU

Use `ollama ps` to quickly verify whether a model is running on GPU, CPU, or a CPU/GPU mixed memory setup, and learn how to read the `PROCESSOR` column.

If you want to confirm whether an Ollama model is actually running on the GPU, the most direct way is to check the processor allocation of the currently loaded models.

Command

```shell
ollama ps
```

Example Output

```text
NAME        ID            SIZE    PROCESSOR   UNTIL
llama3:70b  bcfb190ca3a7  42 GB   100% GPU    4 minutes from now
```

How to Read the PROCESSOR Column

  • 100% GPU: The model is fully loaded into GPU VRAM.
  • 100% CPU: The model is fully loaded in system memory (no GPU inference).
  • 48%/52% CPU/GPU: The model is split between system memory and GPU VRAM.
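If you want to read the `PROCESSOR` value programmatically, a minimal sketch is to slice the column out of the `ollama ps` text, using the header row to find the column boundaries. This assumes the tabular layout shown above; Ollama does not document a stable machine-readable format for `ollama ps`, so treat the parsing as illustrative:

```python
# Minimal sketch: extract the PROCESSOR column from `ollama ps` output.
# Assumes the fixed-width layout shown above (header names PROCESSOR/UNTIL);
# the column positions are taken from the header row, not hard-coded.

def processor_for(ps_output: str) -> dict[str, str]:
    """Map model NAME -> PROCESSOR string (e.g. '100% GPU')."""
    lines = ps_output.strip().splitlines()
    header, rows = lines[0], lines[1:]
    start = header.index("PROCESSOR")
    end = header.index("UNTIL")
    return {row.split()[0]: row[start:end].strip() for row in rows}

# Canned sample standing in for real `ollama ps` output:
sample = (
    "NAME        ID            SIZE    PROCESSOR   UNTIL\n"
    "llama3:70b  bcfb190ca3a7  42 GB   100% GPU    4 minutes from now\n"
)
print(processor_for(sample))  # {'llama3:70b': '100% GPU'}
```

In a real script you would capture the command's stdout (e.g. with `subprocess.run(["ollama", "ps"], capture_output=True, text=True)`) and pass it to the same function.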

Practical Tips

  1. If you expect GPU usage but see 100% CPU, first check GPU drivers, CUDA/ROCm environment, and Ollama runtime settings.
  2. With larger models and limited VRAM, CPU/GPU mixed loading is common.
  3. For performance troubleshooting, run `ollama ps` before checking speed metrics to locate bottlenecks faster.
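Tip 1 can be automated as a quick check: scan each loaded model's row and flag any that mention CPU, since both `100% CPU` and mixed `CPU/GPU` splits indicate the model is not fully in VRAM. The sketch below is self-contained and uses a canned sample; in practice you would feed it the stdout of `ollama ps`:

```python
# Minimal sketch: flag models that fell back to CPU, fully or partially.
# In a real script, ps_output would come from e.g.
# subprocess.run(["ollama", "ps"], capture_output=True, text=True).stdout.

def cpu_fallback_models(ps_output: str) -> list[str]:
    """Return the NAMEs whose PROCESSOR column mentions CPU."""
    fallbacks = []
    for row in ps_output.strip().splitlines()[1:]:  # skip header row
        if "CPU" in row:
            fallbacks.append(row.split()[0])
    return fallbacks

# Canned sample with one mixed CPU/GPU model and one fully-on-GPU model:
sample = (
    "NAME        ID    SIZE    PROCESSOR        UNTIL\n"
    "llama3:70b  aaaa  42 GB   48%/52% CPU/GPU  4 minutes from now\n"
    "phi3:mini   bbbb  2.2 GB  100% GPU         4 minutes from now\n"
)
print(cpu_fallback_models(sample))  # ['llama3:70b']
```

A hit from this check is the cue to inspect GPU drivers, the CUDA/ROCm environment, and available VRAM, as described in the tips above.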

Summary

`ollama ps` is the first step in verifying real GPU usage. Focus on the `PROCESSOR` column to quickly see where the model is loaded and decide on your next optimization step.
