How to Check Whether an Ollama Model Is Loaded on GPU

Use `ollama ps` to quickly verify whether a model is running on GPU, CPU, or a CPU/GPU mixed memory setup, and learn how to read the `PROCESSOR` column.

If you want to confirm whether an Ollama model is actually running on the GPU, the most direct way is to check the processor allocation of the currently loaded models.

Command

```shell
ollama ps
```

Example Output

```text
NAME        ID            SIZE    PROCESSOR   UNTIL
llama3:70b  bcfb190ca3a7  42 GB   100% GPU    4 minutes from now
```

How to Read the PROCESSOR Column

  • 100% GPU: The model is fully loaded into GPU VRAM.
  • 100% CPU: The model is fully loaded in system memory (no GPU inference).
  • 48%/52% CPU/GPU: The model is split between system memory and GPU VRAM.
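If you want to read the `PROCESSOR` value programmatically, a minimal sketch is to slice the column out of the `ollama ps` text, using the header row to find the column boundaries. This assumes the tabular layout shown above; Ollama does not document a stable machine-readable format for `ollama ps`, so treat the parsing as illustrative:

```python
# Minimal sketch: extract the PROCESSOR column from `ollama ps` output.
# Assumes the fixed-width layout shown above (header names PROCESSOR/UNTIL);
# the column positions are taken from the header row, not hard-coded.

def processor_for(ps_output: str) -> dict[str, str]:
    """Map model NAME -> PROCESSOR string (e.g. '100% GPU')."""
    lines = ps_output.strip().splitlines()
    header, rows = lines[0], lines[1:]
    start = header.index("PROCESSOR")
    end = header.index("UNTIL")
    return {row.split()[0]: row[start:end].strip() for row in rows}

# Canned sample standing in for real `ollama ps` output:
sample = (
    "NAME        ID            SIZE    PROCESSOR   UNTIL\n"
    "llama3:70b  bcfb190ca3a7  42 GB   100% GPU    4 minutes from now\n"
)
print(processor_for(sample))  # {'llama3:70b': '100% GPU'}
```

In a real script you would capture the command's stdout (e.g. with `subprocess.run(["ollama", "ps"], capture_output=True, text=True)`) and pass it to the same function.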

Practical Tips

  1. If you expect GPU usage but see 100% CPU, first check GPU drivers, CUDA/ROCm environment, and Ollama runtime settings.
  2. With larger models and limited VRAM, CPU/GPU mixed loading is common.
  3. For performance troubleshooting, run `ollama ps` before checking speed metrics to locate bottlenecks faster.
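Tip 1 can be automated as a quick check: scan each loaded model's row and flag any that mention CPU, since both `100% CPU` and mixed `CPU/GPU` splits indicate the model is not fully in VRAM. The sketch below is self-contained and uses a canned sample; in practice you would feed it the stdout of `ollama ps`:

```python
# Minimal sketch: flag models that fell back to CPU, fully or partially.
# In a real script, ps_output would come from e.g.
# subprocess.run(["ollama", "ps"], capture_output=True, text=True).stdout.

def cpu_fallback_models(ps_output: str) -> list[str]:
    """Return the NAMEs whose PROCESSOR column mentions CPU."""
    fallbacks = []
    for row in ps_output.strip().splitlines()[1:]:  # skip header row
        if "CPU" in row:
            fallbacks.append(row.split()[0])
    return fallbacks

# Canned sample with one mixed CPU/GPU model and one fully-on-GPU model:
sample = (
    "NAME        ID    SIZE    PROCESSOR        UNTIL\n"
    "llama3:70b  aaaa  42 GB   48%/52% CPU/GPU  4 minutes from now\n"
    "phi3:mini   bbbb  2.2 GB  100% GPU         4 minutes from now\n"
)
print(cpu_fallback_models(sample))  # ['llama3:70b']
```

A hit from this check is the cue to inspect GPU drivers, the CUDA/ROCm environment, and available VRAM, as described in the tips above.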

Summary

`ollama ps` is the first step in verifying real GPU usage. Focus on the `PROCESSOR` column to quickly see where the model is loaded and decide on your next optimization step.
