VLLM on KnightLi的博客

VLLM on KnightLi的博客 https://www.knightli.com/zh-tw/tags/vllm/ Recent content in VLLM on KnightLi的博客 Hugo -- gohugo.io zh-tw Fri, 10 Apr 2026 22:54:17 +0800 Gemma 4 本地調用指南：從一鍵啟動到開發整合 https://www.knightli.com/zh-tw/2026/04/10/gemma4-local-runtime-options/ Fri, 10 Apr 2026 22:54:17 +0800 https://www.knightli.com/zh-tw/2026/04/10/gemma4-local-runtime-options/ <p>如果你想在本地調用 Gemma 4，可以依需求從以下四種主流方案中選擇。</p> <h2 id="1-最快上手ollama推薦">1) 最快上手：Ollama（推薦） </h2><p>這是門檻最低的方式，適合快速測試、日常對話與本地 API 調用。</p> <div class="highlight"><div class="chroma"> <table class="lntable"><tr><td class="lntd"> <pre tabindex="0" class="chroma"><code><span class="lnt">1 </span></code></pre></td> <td class="lntd"> <pre tabindex="0" class="chroma"><code class="language-bash" data-lang="bash"><span class="line"><span class="cl">ollama run gemma4 </span></span></code></pre></td></tr></table> </div> </div><p>特點：</p> <ul> <li>支援 Win/Mac/Linux</li> <li>自動處理硬體加速</li> <li>提供相容 OpenAI 風格的本地 API</li> </ul> <h2 id="2-圖形介面lm-studio--unsloth-studio">2) 圖形介面：LM Studio / Unsloth Studio </h2><p>如果你偏好桌面 GUI（像 ChatGPT）：</p> <ul> <li>LM Studio：可直接搜尋與下載 Hugging Face 上的 Gemma 4 量化模型（如 4-bit、8-bit），並查看資源占用。</li> <li>Unsloth Studio：除推理外，也支援低顯存微調；對 6GB-8GB 顯存更友善。</li> </ul> <h2 id="3-低配與深度控制llamacpp">3) 低配與深度控制：llama.cpp </h2><p>適合舊機、純 CPU 場景，或希望細調推理參數的使用者。</p> <p>你可以使用 <code>.gguf</code> 模型檔配合量化版本，在更低硬體門檻下運行 Gemma 4。</p> <h2 id="4-開發者整合transformers--vllm">4) 開發者整合：Transformers / vLLM </h2><p>如果你要把 Gemma 4 接進自己的應用：</p> <ul> <li>Transformers：適合 Python 專案直接載入模型</li> <li>vLLM：適合高效能 GPU 與高吞吐推理服務</li> </ul> <h2 id="快速選型">快速選型 </h2><table> <thead> <tr> <th>需求</th> <th>推薦工具</th> <th>硬體門檻</th> </tr> </thead> <tbody> <tr> <td>我只想先跑起來</td> <td>Ollama</td> <td>低（自動適配）</td> </tr> <tr> <td>我想用圖形介面</td> <td>LM Studio</td> <td>中</td> </tr> <tr> <td>顯存很吃緊（6GB-8GB）</td> <td>Unsloth / llama.cpp</td> <td>低</td> </tr> <tr> <td>我要做本地 AI 應用開發</td> <td>Ollama / Transformers / vLLM</td> <td>中到高</td> </tr> <tr> <td>我要做微調訓練</td> <td>Unsloth Studio</td> <td>中到高</td> </tr> </tbody> </table> <h2 id="模型尺寸建議">模型尺寸建議 </h2><p>Gemma 4 有多種尺寸（如 E2B、E4B、31B）。</p> <ul> <li>一般筆電建議先用量化後的 E2B / E4B</li> <li>顯存充足後再嘗試更大版本</li> </ul>