<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
    <channel>
        <title>LM Studio on KnightLi Blog</title>
        <link>https://www.knightli.com/en/tags/lm-studio/</link>
        <description>Recent content in LM Studio on KnightLi Blog</description>
        <generator>Hugo -- gohugo.io</generator>
        <language>en</language>
        <lastBuildDate>Wed, 08 Apr 2026 18:42:00 +0800</lastBuildDate><atom:link href="https://www.knightli.com/en/tags/lm-studio/index.xml" rel="self" type="application/rss+xml" /><item>
        <title>Gemma 4 on Raspberry Pi 5: It Works, But Responses Are Slow</title>
        <link>https://www.knightli.com/en/2026/04/08/gemma4-on-raspberry-pi5-benchmark/</link>
        <pubDate>Wed, 08 Apr 2026 18:42:00 +0800</pubDate>
        
        <guid>https://www.knightli.com/en/2026/04/08/gemma4-on-raspberry-pi5-benchmark/</guid>
        <description>&lt;p&gt;I ran a near-limit experiment: running Gemma 4 on a &lt;code&gt;Raspberry Pi 5 (8GB RAM)&lt;/code&gt;. I was not targeting larger variants, only the smallest &lt;code&gt;E2B&lt;/code&gt; model.&lt;/p&gt;
&lt;p&gt;Conclusion first: it runs and it is usable, but it fits low-interaction workflows better than real-time chat.&lt;/p&gt;
&lt;h2 id=&#34;test-environment&#34;&gt;Test Environment
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;Device: Raspberry Pi 5 (4-core CPU, 8GB RAM)&lt;/li&gt;
&lt;li&gt;OS: Ubuntu Server (no GUI)&lt;/li&gt;
&lt;li&gt;Access method: SSH&lt;/li&gt;
&lt;li&gt;Runtime: LM Studio CLI (command-line-only mode)&lt;/li&gt;
&lt;li&gt;Model: Gemma 4 E2B (about 4.5GB)&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;step-1-install-and-start-lm-studio-cli&#34;&gt;Step 1: Install and Start LM Studio CLI
&lt;/h2&gt;&lt;p&gt;I installed the LM Studio CLI build on the Pi, then started the service and checked available commands.&lt;/p&gt;
&lt;p&gt;For a terminal-only setup, this deployment mode is a good fit for Raspberry Pi.&lt;/p&gt;
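&lt;p&gt;As a rough sketch, the setup commands looked like the following. The &lt;code&gt;lms&lt;/code&gt; subcommand names may differ slightly between CLI versions, so check &lt;code&gt;lms --help&lt;/code&gt; on your install:&lt;/p&gt;

```shell
# Start the LM Studio local service in headless mode
lms server start

# Confirm the service is running
lms server status

# List the available subcommands
lms --help
```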
&lt;h2 id=&#34;step-2-move-model-storage-to-ssd&#34;&gt;Step 2: Move Model Storage to SSD
&lt;/h2&gt;&lt;p&gt;To avoid heavy SD card writes, I switched model download storage to an external SSD.&lt;/p&gt;
&lt;p&gt;On Raspberry Pi 5, SSD usage is much more practical than on older models. For long-term local model runs, SSD is strongly recommended.&lt;/p&gt;
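&lt;p&gt;One way to do this is to relocate the model directory to the SSD and leave a symlink behind. The mount point &lt;code&gt;/mnt/ssd&lt;/code&gt; and the default model path below are assumptions; check where your LM Studio install actually stores models before moving anything:&lt;/p&gt;

```shell
# Assumes the SSD is mounted at /mnt/ssd (hypothetical mount point)
mkdir -p /mnt/ssd/lmstudio-models

# Move any already-downloaded models off the SD card
mv ~/.lmstudio/models/* /mnt/ssd/lmstudio-models/ 2>/dev/null || true

# Replace the default model directory with a symlink to the SSD
rm -rf ~/.lmstudio/models
ln -s /mnt/ssd/lmstudio-models ~/.lmstudio/models
```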
&lt;h2 id=&#34;step-3-download-and-load-gemma-4-e2b&#34;&gt;Step 3: Download and Load Gemma 4 E2B
&lt;/h2&gt;&lt;p&gt;After download, the model loaded into memory successfully.&lt;/p&gt;
&lt;p&gt;According to official information, Gemma 4 includes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Tool-calling support for agent-style workflows (function calling)&lt;/li&gt;
&lt;li&gt;Multimodal capabilities (image/video; the smaller variants also include audio capabilities)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;128K&lt;/code&gt; context window&lt;/li&gt;
&lt;li&gt;Apache 2.0 license (commercial use allowed)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Given Raspberry Pi hardware limits, E2B is the most practical tier to start with.&lt;/p&gt;
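&lt;p&gt;The download and load steps, sketched with the &lt;code&gt;lms&lt;/code&gt; CLI. The model identifier here is illustrative, not the exact registry name; use whatever identifier the model search on your install reports:&lt;/p&gt;

```shell
# Download the model (identifier is a placeholder)
lms get gemma-4-e2b

# Load it into memory
lms load gemma-4-e2b

# Verify that it is loaded
lms ps
```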
&lt;h2 id=&#34;step-4-start-api-and-enable-lan-access&#34;&gt;Step 4: Start API and Enable LAN Access
&lt;/h2&gt;&lt;p&gt;After loading, I started the API on local port &lt;code&gt;4000&lt;/code&gt; and confirmed model listing works via HTTP.&lt;/p&gt;
&lt;p&gt;The issue: by default, it only listens on localhost, so other LAN devices cannot access it directly.&lt;/p&gt;
&lt;p&gt;Since the startup options did not expose host binding, I used &lt;code&gt;socat&lt;/code&gt; for port forwarding, bridging an externally reachable Pi port to LM Studio&amp;rsquo;s internal port.&lt;/p&gt;
&lt;p&gt;Result: successful. I could query the model list from a MacBook on the same LAN.&lt;/p&gt;
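&lt;p&gt;The forwarding itself is a one-liner. This sketch assumes LM Studio listens on &lt;code&gt;127.0.0.1:4000&lt;/code&gt; as described above and exposes it on port &lt;code&gt;8000&lt;/code&gt; (an arbitrary choice); the LAN IP is a placeholder for your Pi&amp;rsquo;s address:&lt;/p&gt;

```shell
# Bridge an externally reachable port to the localhost-only API.
# Runs in the foreground; use tmux or a systemd unit to keep it alive.
socat TCP-LISTEN:8000,fork,reuseaddr TCP:127.0.0.1:4000

# From another machine on the LAN:
curl http://192.168.1.50:8000/v1/models
```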
&lt;h2 id=&#34;step-5-connect-to-editor-zed&#34;&gt;Step 5: Connect to Editor (Zed)
&lt;/h2&gt;&lt;p&gt;LM Studio&amp;rsquo;s local server is OpenAI-API-compatible, so most tools that support custom &lt;code&gt;base_url&lt;/code&gt; can connect.&lt;/p&gt;
&lt;p&gt;I added a new LLM provider in Zed pointing to the Pi-hosted Gemma 4 instance, and in-editor chat worked.&lt;/p&gt;
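&lt;p&gt;In Zed this is a custom provider entry in &lt;code&gt;settings.json&lt;/code&gt;. The snippet below is an assumption about the exact schema, which varies between Zed versions; the IP, provider label, and model name are placeholders for your own setup:&lt;/p&gt;

```json
{
  "language_models": {
    "openai_compatible": {
      "Pi Gemma": {
        "api_url": "http://192.168.1.50:8000/v1",
        "available_models": [
          { "name": "gemma-4-e2b", "max_tokens": 8192 }
        ]
      }
    }
  }
}
```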
&lt;h2 id=&#34;practical-usability&#34;&gt;Practical Usability
&lt;/h2&gt;&lt;p&gt;This setup is suitable for:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Local automation scripts&lt;/li&gt;
&lt;li&gt;Low-concurrency assistant tasks that can tolerate latency&lt;/li&gt;
&lt;li&gt;Personal learning and edge-device experimentation&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Less suitable for:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;High-frequency interactive chat&lt;/li&gt;
&lt;li&gt;Development collaboration scenarios sensitive to response latency&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;conclusion&#34;&gt;Conclusion
&lt;/h2&gt;&lt;p&gt;Running Gemma 4 (E2B) on &lt;code&gt;Raspberry Pi 5&lt;/code&gt; is feasible, and the practical output quality is better than expected.&lt;/p&gt;
&lt;p&gt;If your goal is offline operation, tool integration, and lightweight-to-mid tasks, this setup is worth trying. If your goal is smooth real-time interaction, stronger hardware is still the better choice.&lt;/p&gt;
&lt;h2 id=&#34;related-posts&#34;&gt;Related Posts
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://www.knightli.com/en/2026/04/05/google-gemma-4-model-comparison/&#34; &gt;Google Gemma 4 Model Comparison: How to Choose Between 2B/4B/26B/31B&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://www.knightli.com/en/2026/04/08/android-gemma4-install-run-guide/&#34; &gt;How to Install and Run Gemma 4 on Android: Complete Getting-Started Guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://www.knightli.com/en/2026/04/08/run-gemma4-on-laptop/&#34; &gt;How to Run Gemma 4 on a Laptop: 5-Minute Local Setup Guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://www.knightli.com/en/2026/04/08/openclaw-connect-gemma4-local/&#34; &gt;Connect OpenClaw to Local Gemma 4: Complete Setup Guide&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
        </item>
        
    </channel>
</rss>
