If you want to run Gemma 4 locally on a laptop, Ollama is one of the fastest and simplest options. Even without complex setup, you can usually get it running in about five minutes.
Step 1: Install Ollama
- Open https://ollama.com and download the installer for your OS.
- Complete the installation for your system:
  - macOS: drag it to Applications.
  - Windows: run the `.exe` installer.
  - Linux: use the install script from the official site.
After installation, Ollama runs as a background service. Beyond initial setup, daily usage is mostly simple commands.
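Once the background service is running, it listens on Ollama's default local API port (11434). A quick Python sketch to confirm it is reachable (the port is Ollama's documented default; adjust the URL if you changed it):

```python
import urllib.error
import urllib.request

def ollama_up(base_url: str = "http://localhost:11434") -> bool:
    """Return True if the local Ollama service answers on its default port."""
    try:
        # The root endpoint replies with a short "Ollama is running" banner.
        with urllib.request.urlopen(base_url, timeout=2) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

if __name__ == "__main__":
    print("Ollama reachable:", ollama_up())
```

If this prints `False`, the service is not up yet; launching the Ollama app (or running any `ollama` command) starts it.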
Step 2: Download a Gemma 4 Model
Open a terminal and run:
```shell
ollama pull gemma4:4b
```
If your machine has more headroom, you can switch to the 12b or 27b tag. Once downloaded, the model is stored locally.
Check downloaded models with:
```shell
ollama list
```
Step 3: Run the Model
```shell
ollama run gemma4:4b
```
This opens an interactive chat session in your terminal. Type your prompt and press Enter. To exit, type:
```shell
/bye
```
If you prefer a browser chat UI, you can pair it with Open WebUI. It wraps Ollama with a local web interface and is usually quick to set up with Docker.
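Besides the terminal chat, the same local server exposes an HTTP API, which is what tools like Open WebUI talk to. A minimal Python sketch against Ollama's `/api/chat` endpoint (the `gemma4:4b` tag mirrors the pull command above; swap in whatever `ollama list` shows on your machine):

```python
import json
import urllib.request

def chat_payload(model: str, prompt: str) -> dict:
    """Build a non-streaming request body for Ollama's /api/chat endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # return one JSON object instead of a token stream
    }

def chat(model: str, prompt: str, base_url: str = "http://localhost:11434") -> str:
    body = json.dumps(chat_payload(model, prompt)).encode()
    req = urllib.request.Request(
        f"{base_url}/api/chat",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # With stream disabled, the reply is one JSON object whose
        # "message" field holds the assistant's answer.
        return json.loads(resp.read())["message"]["content"]

if __name__ == "__main__":
    try:
        print(chat("gemma4:4b", "Summarize what Ollama does in one sentence."))
    except OSError:
        print("Ollama server not reachable; start it and retry.")
```

The same payload shape works from any language with an HTTP client, which is why the browser UIs layer on so easily.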
Laptop Performance Tips
- Apple Silicon (M2/M3/M4): Metal acceleration is enabled by default, and 12B can run well.
- NVIDIA GPU: CUDA is used automatically when a compatible GPU is detected. Keep drivers updated.
- CPU-only inference: works, but larger models will be slower. For most CPU-only setups, 4B is the practical default.
- Free memory before loading large models: as a rough rule, each billion parameters needs about 0.5-1 GB of RAM.
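That last rule of thumb (0.5-1 GB of RAM per billion parameters; the low end assumes a quantized model) turns into a quick back-of-the-envelope estimate:

```python
def ram_estimate_gb(billions_of_params: float) -> tuple[float, float]:
    """Rough (low, high) RAM estimate in GB: 0.5-1 GB per billion parameters."""
    return (0.5 * billions_of_params, 1.0 * billions_of_params)

# Estimates for the four Gemma 4 sizes discussed in this guide.
for size in (1, 4, 12, 27):
    low, high = ram_estimate_gb(size)
    print(f"{size}B model: roughly {low:g}-{high:g} GB of RAM")
```

So the 27B tag can want up to ~27 GB free, which is why it sits outside most laptops' comfort zone.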
How to Choose a Model
- Gemma 4 1B: good for lightweight Q&A, simple summarization, and quick lookups; limited on complex reasoning.
- Gemma 4 4B: best for most daily tasks (writing help, coding help, document summarization) with a strong speed/quality balance.
- Gemma 4 12B: better for longer context and more complex tasks, especially coding and reasoning.
- Gemma 4 27B: better for high-demand workloads and closer to frontier-cloud quality, but needs significantly stronger hardware.
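The guidance above can be collapsed into a simple picker that uses free RAM as the deciding factor. The thresholds follow the pessimistic end of the 0.5-1 GB-per-billion rule from the performance tips; they are illustrative defaults, not official requirements:

```python
def pick_gemma4_tag(free_ram_gb: float) -> str:
    """Suggest a Gemma 4 tag that should fit in the given free RAM.

    Uses the conservative 1 GB-per-billion-parameters end of the rule of
    thumb so the model loads with headroom to spare.
    """
    for tag, billions in (("27b", 27), ("12b", 12), ("4b", 4), ("1b", 1)):
        if free_ram_gb >= billions:
            return f"gemma4:{tag}"
    return "gemma4:1b"  # smallest tag as the fallback

print(pick_gemma4_tag(16))  # → gemma4:12b
```

A 16 GB laptop with most of its memory free lands on the 12B tag, which matches the Apple Silicon note above.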