<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
    <channel>
        <title>LM Studio on KnightLi Blog</title>
        <link>https://www.knightli.com/en/tags/lm-studio/</link>
        <description>Recent content in LM Studio on KnightLi Blog</description>
        <generator>Hugo -- gohugo.io</generator>
        <language>en</language>
        <lastBuildDate>Wed, 08 Apr 2026 18:42:00 +0800</lastBuildDate><atom:link href="https://www.knightli.com/en/tags/lm-studio/index.xml" rel="self" type="application/rss+xml" /><item>
        <title>Gemma 4 on Raspberry Pi 5: It Works, But Responses Are Slow</title>
        <link>https://www.knightli.com/en/2026/04/08/gemma4-on-raspberry-pi5-benchmark/</link>
        <pubDate>Wed, 08 Apr 2026 18:42:00 +0800</pubDate>
        
        <guid>https://www.knightli.com/en/2026/04/08/gemma4-on-raspberry-pi5-benchmark/</guid>
        <description>&lt;p&gt;I ran a near-limit experiment: running Gemma 4 on a &lt;code&gt;Raspberry Pi 5 (8GB RAM)&lt;/code&gt;. I was not targeting larger variants, only the smallest &lt;code&gt;E2B&lt;/code&gt; model.&lt;/p&gt;
&lt;p&gt;Conclusion first: it runs and it is usable, but it fits low-interaction workflows better than real-time chat.&lt;/p&gt;
&lt;h2 id=&#34;test-environment&#34;&gt;Test Environment
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;Device: Raspberry Pi 5 (4-core CPU, 8GB RAM)&lt;/li&gt;
&lt;li&gt;OS: Ubuntu Server (no GUI)&lt;/li&gt;
&lt;li&gt;Access method: SSH&lt;/li&gt;
&lt;li&gt;Runtime: LM Studio CLI (command-line-only mode)&lt;/li&gt;
&lt;li&gt;Model: Gemma 4 E2B (about 4.5GB)&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;step-1-install-and-start-lm-studio-cli&#34;&gt;Step 1: Install and Start LM Studio CLI
&lt;/h2&gt;&lt;p&gt;I installed the LM Studio CLI build on the Pi, then started the service and checked available commands.&lt;/p&gt;
&lt;p&gt;For a terminal-only setup, this deployment mode is a good fit for Raspberry Pi.&lt;/p&gt;
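&lt;p&gt;As a rough sketch, the setup commands looked like the following. The &lt;code&gt;lms&lt;/code&gt; subcommand names may differ slightly between CLI versions, so check &lt;code&gt;lms --help&lt;/code&gt; on your install:&lt;/p&gt;

```shell
# Start the LM Studio local service in headless mode
lms server start

# Confirm the service is running
lms server status

# List the available subcommands
lms --help
```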
&lt;h2 id=&#34;step-2-move-model-storage-to-ssd&#34;&gt;Step 2: Move Model Storage to SSD
&lt;/h2&gt;&lt;p&gt;To avoid heavy SD card writes, I switched model download storage to an external SSD.&lt;/p&gt;
&lt;p&gt;On Raspberry Pi 5, SSD usage is much more practical than on older models. For long-term local model runs, SSD is strongly recommended.&lt;/p&gt;
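&lt;p&gt;One way to do this is to relocate the model directory to the SSD and leave a symlink behind. The mount point &lt;code&gt;/mnt/ssd&lt;/code&gt; and the default model path below are assumptions; check where your LM Studio install actually stores models before moving anything:&lt;/p&gt;

```shell
# Assumes the SSD is mounted at /mnt/ssd (hypothetical mount point)
mkdir -p /mnt/ssd/lmstudio-models

# Move any already-downloaded models off the SD card
mv ~/.lmstudio/models/* /mnt/ssd/lmstudio-models/ 2>/dev/null || true

# Replace the default model directory with a symlink to the SSD
rm -rf ~/.lmstudio/models
ln -s /mnt/ssd/lmstudio-models ~/.lmstudio/models
```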
&lt;h2 id=&#34;step-3-download-and-load-gemma-4-e2b&#34;&gt;Step 3: Download and Load Gemma 4 E2B
&lt;/h2&gt;&lt;p&gt;After download, the model loaded into memory successfully.&lt;/p&gt;
&lt;p&gt;According to official information, Gemma 4 includes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Tool-calling support for agent-style workflows (function calling)&lt;/li&gt;
&lt;li&gt;Multimodal capabilities (image/video; the smaller variants also include audio capabilities)&lt;/li&gt;
&lt;li&gt;&lt;code&gt;128K&lt;/code&gt; context window&lt;/li&gt;
&lt;li&gt;Apache 2.0 license (commercial use allowed)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Given Raspberry Pi hardware limits, E2B is the most practical tier to start with.&lt;/p&gt;
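&lt;p&gt;The download and load steps, sketched with the &lt;code&gt;lms&lt;/code&gt; CLI. The model identifier here is illustrative, not the exact registry name; use whatever identifier the model search on your install reports:&lt;/p&gt;

```shell
# Download the model (identifier is a placeholder)
lms get gemma-4-e2b

# Load it into memory
lms load gemma-4-e2b

# Verify that it is loaded
lms ps
```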
&lt;h2 id=&#34;step-4-start-api-and-enable-lan-access&#34;&gt;Step 4: Start API and Enable LAN Access
&lt;/h2&gt;&lt;p&gt;After loading, I started the API on local port &lt;code&gt;4000&lt;/code&gt; and confirmed model listing works via HTTP.&lt;/p&gt;
&lt;p&gt;The issue: by default, it only listens on localhost, so other LAN devices cannot access it directly.&lt;/p&gt;
&lt;p&gt;Since the startup options did not expose host binding, I used &lt;code&gt;socat&lt;/code&gt; for port forwarding, bridging an externally reachable Pi port to LM Studio&amp;rsquo;s internal port.&lt;/p&gt;
&lt;p&gt;Result: successful. I could query the model list from a MacBook on the same LAN.&lt;/p&gt;
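&lt;p&gt;The forwarding itself is a one-liner. This sketch assumes LM Studio listens on &lt;code&gt;127.0.0.1:4000&lt;/code&gt; as described above and exposes it on port &lt;code&gt;8000&lt;/code&gt; (an arbitrary choice); the LAN IP is a placeholder for your Pi&amp;rsquo;s address:&lt;/p&gt;

```shell
# Bridge an externally reachable port to the localhost-only API.
# Runs in the foreground; use tmux or a systemd unit to keep it alive.
socat TCP-LISTEN:8000,fork,reuseaddr TCP:127.0.0.1:4000

# From another machine on the LAN:
curl http://192.168.1.50:8000/v1/models
```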
&lt;h2 id=&#34;step-5-connect-to-editor-zed&#34;&gt;Step 5: Connect to Editor (Zed)
&lt;/h2&gt;&lt;p&gt;LM Studio&amp;rsquo;s local server is OpenAI-API-compatible, so most tools that support custom &lt;code&gt;base_url&lt;/code&gt; can connect.&lt;/p&gt;
&lt;p&gt;I added a new LLM provider in Zed pointing to the Pi-hosted Gemma 4 instance, and in-editor chat worked.&lt;/p&gt;
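&lt;p&gt;In Zed this is a custom provider entry in &lt;code&gt;settings.json&lt;/code&gt;. The snippet below is an assumption about the exact schema, which varies between Zed versions; the IP, provider label, and model name are placeholders for your own setup:&lt;/p&gt;

```json
{
  "language_models": {
    "openai_compatible": {
      "Pi Gemma": {
        "api_url": "http://192.168.1.50:8000/v1",
        "available_models": [
          { "name": "gemma-4-e2b", "max_tokens": 8192 }
        ]
      }
    }
  }
}
```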
&lt;h2 id=&#34;practical-usability&#34;&gt;Practical Usability
&lt;/h2&gt;&lt;p&gt;This setup is suitable for:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Local automation scripts&lt;/li&gt;
&lt;li&gt;Low-concurrency assistant tasks that can tolerate latency&lt;/li&gt;
&lt;li&gt;Personal learning and edge-device experimentation&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Less suitable for:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;High-frequency interactive chat&lt;/li&gt;
&lt;li&gt;Development collaboration scenarios sensitive to response latency&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;conclusion&#34;&gt;Conclusion
&lt;/h2&gt;&lt;p&gt;Running Gemma 4 (E2B) on &lt;code&gt;Raspberry Pi 5&lt;/code&gt; is feasible, and the practical output quality is better than expected.&lt;/p&gt;
&lt;p&gt;If your goal is offline operation, tool integration, and lightweight-to-mid tasks, this setup is worth trying. If your goal is smooth real-time interaction, stronger hardware is still the better choice.&lt;/p&gt;
&lt;h2 id=&#34;related-posts&#34;&gt;Related Posts
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://www.knightli.com/en/2026/04/05/google-gemma-4-model-comparison/&#34; &gt;Google Gemma 4 Model Comparison: How to Choose Between 2B/4B/26B/31B&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://www.knightli.com/en/2026/04/08/android-gemma4-install-run-guide/&#34; &gt;How to Install and Run Gemma 4 on Android: Complete Getting-Started Guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://www.knightli.com/en/2026/04/08/run-gemma4-on-laptop/&#34; &gt;How to Run Gemma 4 on a Laptop: 5-Minute Local Setup Guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://www.knightli.com/en/2026/04/08/openclaw-connect-gemma4-local/&#34; &gt;Connect OpenClaw to Local Gemma 4: Complete Setup Guide&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
        </item>
        
    </channel>
</rss>
