I ran a near-the-limit experiment: Gemma 4 on a Raspberry Pi 5 with 8GB of RAM. I did not target the larger variants, only the smallest E2B model.
Conclusion first: it runs and it is usable, but it fits low-interaction workflows better than real-time chat.
Test Environment
- Device: Raspberry Pi 5 (4-core CPU, 8GB RAM)
- OS: Ubuntu Server (no GUI)
- Access method: SSH
- Runtime: LM Studio CLI (command-line-only mode)
- Model: Gemma 4 E2B (about 4.5GB)
Step 1: Install and Start LM Studio CLI
I installed the LM Studio CLI build on the Pi, then started the service and checked available commands.
For a headless, terminal-only setup, this deployment mode is a natural fit for the Raspberry Pi.
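The invocations I relied on can be sketched as a small helper that builds `lms` (LM Studio CLI) argument lists. The subcommand and flag names below are assumptions based on the CLI's help output, so verify them against `lms --help` on your build:

```python
import subprocess  # run the composed commands, e.g. subprocess.run(start_server)

def lms(*args):
    """Compose an LM Studio CLI (`lms`) invocation as an argv list."""
    return ["lms", *args]

# Commands used in this walkthrough; flag names are assumptions
# taken from the CLI help, not verified against every version.
start_server = lms("server", "start", "--port", "4000")  # start the local API server
check_status = lms("server", "status")                   # confirm it is running
list_models  = lms("ls")                                 # list downloaded models
```

Keeping commands as argv lists (rather than shell strings) avoids quoting issues when scripting these steps over SSH.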
Step 2: Move Model Storage to SSD
To avoid heavy SD card writes, I switched model download storage to an external SSD.
On Raspberry Pi 5, SSD usage is much more practical than on older models. For long-term local model runs, SSD is strongly recommended.
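One way to relocate the model store is to move it to the SSD and leave a symlink at the old path, so the runtime keeps finding models where it expects them. The paths in the commented example are illustrative; your SSD mount point and the runtime's model directory may differ:

```python
import shutil
from pathlib import Path

def relocate_models(models_dir: Path, ssd_dir: Path) -> None:
    """Move the model store onto the SSD and symlink the old location to it."""
    ssd_dir.mkdir(parents=True, exist_ok=True)
    if models_dir.exists() and not models_dir.is_symlink():
        for item in models_dir.iterdir():
            # shutil.move handles the SD-card -> SSD cross-filesystem copy,
            # which a plain os.rename would refuse.
            shutil.move(str(item), str(ssd_dir / item.name))
        models_dir.rmdir()
    if not models_dir.exists():
        models_dir.symlink_to(ssd_dir, target_is_directory=True)

# Example paths (assumptions -- adjust to your mount point and runtime):
# relocate_models(Path.home() / ".lmstudio" / "models", Path("/mnt/ssd/models"))
```

After this, all multi-gigabyte downloads land on the SSD while the runtime's configuration stays untouched.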
Step 3: Download and Load Gemma 4 E2B
After download, the model loaded into memory successfully.
According to official information, Gemma 4 includes:
- Tool-calling support for agent-style workflows (function calling)
- Multimodal capabilities (image/video; smaller models also include audio-related capability)
- 128K context window
- Apache 2.0 license (commercial use allowed)
Given Raspberry Pi hardware limits, E2B is the most practical tier to start with.
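As a rough sanity check before loading, you can compare the model file size plus runtime overhead against the RAM left after the OS. The overhead and OS-reserve figures below are guesses for illustration, not measured numbers:

```python
def fits_in_ram(model_gb: float, ram_gb: float,
                overhead_gb: float = 1.5, os_reserved_gb: float = 1.0) -> bool:
    """Rough heuristic: weights plus KV-cache/runtime overhead must fit
    in the RAM remaining after the OS. All figures are estimates."""
    return model_gb + overhead_gb <= ram_gb - os_reserved_gb

# A ~4.5GB E2B model on an 8GB Pi: 4.5 + 1.5 <= 8.0 - 1.0 -> fits
print(fits_in_ram(4.5, 8.0))
```

By the same arithmetic, a model much above ~5.5GB would not leave enough headroom on an 8GB board, which is why E2B is the practical choice here.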
Step 4: Start API and Enable LAN Access
After loading, I started the API on local port 4000 and confirmed model listing works via HTTP.
The issue: by default, it only listens on localhost, so other LAN devices cannot access it directly.
Since the startup options did not expose host binding, I used socat for port forwarding, bridging an externally reachable Pi port to LM Studio's internal port.
Result: successful. I could query the model list from a MacBook on the same LAN.
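What socat does here can be sketched as a minimal TCP relay: accept on an externally visible port, connect to the localhost-only service, and copy bytes in both directions. This Python version is illustrative; in socat terms, an invocation of the form `socat TCP-LISTEN:<external-port>,fork,reuseaddr TCP:127.0.0.1:4000` performs the same bridging (port numbers here are assumptions):

```python
import socket
import threading

def _pipe(src, dst):
    """Copy bytes one way until the source side closes."""
    try:
        while (chunk := src.recv(4096)):
            dst.sendall(chunk)
    except OSError:
        pass
    finally:
        try:
            dst.shutdown(socket.SHUT_WR)
        except OSError:
            pass

def make_listener(host, port):
    """Bind the externally visible socket (port 0 picks a free port)."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen()
    return srv

def serve(listener, target_host, target_port):
    """Relay each accepted connection to the target, both directions."""
    while True:
        client, _ = listener.accept()
        upstream = socket.create_connection((target_host, target_port))
        threading.Thread(target=_pipe, args=(client, upstream), daemon=True).start()
        threading.Thread(target=_pipe, args=(upstream, client), daemon=True).start()
```

On the Pi, the listener would bind 0.0.0.0 on the external port and relay to 127.0.0.1:4000; socat is the simpler choice in practice, but the sketch shows there is no magic involved.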
Step 5: Connect to Editor (Zed)
LM Studio’s local server is OpenAI-API-compatible, so most tools that support custom base_url can connect.
I added a new LLM provider in Zed pointing to the Pi-hosted Gemma 4 instance, and in-editor chat worked.
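Any OpenAI-compatible client can talk to this endpoint, not just editors. Here is a dependency-free sketch using the standard `/v1/chat/completions` route; the base URL and model name in the commented example are placeholders for your setup:

```python
import json
import urllib.request

def chat(base_url, model, prompt, timeout=120):
    """POST an OpenAI-style chat completion and return the reply text."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# Example (placeholders -- substitute your Pi's address and model id):
# print(chat("http://192.168.1.50:4000", "gemma-e2b", "Summarize this log line."))
```

This is the same contract Zed uses under the hood, which is why pointing its custom provider at the Pi's base URL just works.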
Practical Usability
This setup is suitable for:
- Local automation scripts
- Low-concurrency assistant tasks without hard real-time demands
- Personal learning and edge-device experimentation
Less suitable for:
- High-frequency interactive chat
- Development collaboration scenarios sensitive to response latency
Conclusion
Running Gemma 4 (E2B) on Raspberry Pi 5 is feasible, and the practical output quality is better than expected.
If your goal is offline operation, tool integration, and lightweight-to-mid tasks, this setup is worth trying. If your goal is smooth real-time interaction, stronger hardware is still the better choice.