<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
    <channel>
        <title>Inference Engine on KnightLi Blog</title>
        <link>https://www.knightli.com/en/tags/inference-engine/</link>
        <description>Recent content in Inference Engine on KnightLi Blog</description>
        <generator>Hugo -- gohugo.io</generator>
        <language>en</language>
        <lastBuildDate>Mon, 11 May 2026 08:51:37 +0800</lastBuildDate><atom:link href="https://www.knightli.com/en/tags/inference-engine/index.xml" rel="self" type="application/rss+xml" /><item>
        <title>Running DeepSeek V4 Flash Locally: Antirez&#39;s ds4 Experiment on Apple Silicon Macs</title>
        <link>https://www.knightli.com/en/2026/05/11/deepseek-v4-flash-ds4-metal/</link>
        <pubDate>Mon, 11 May 2026 08:51:37 +0800</pubDate>
        
        <guid>https://www.knightli.com/en/2026/05/11/deepseek-v4-flash-ds4-metal/</guid>
        <description>&lt;p&gt;Antirez has open-sourced a new project: &lt;code&gt;ds4&lt;/code&gt;. It is not a general-purpose LLM framework but a local inference engine for DeepSeek V4 Flash, focused on Apple Silicon and the Metal backend.&lt;/p&gt;
&lt;p&gt;Project URL: &lt;a class=&#34;link&#34; href=&#34;https://github.com/antirez/ds4&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://github.com/antirez/ds4&lt;/a&gt;&lt;/p&gt;
&lt;h2 id=&#34;what-is-ds4&#34;&gt;What is ds4?
&lt;/h2&gt;&lt;p&gt;&lt;code&gt;ds4&lt;/code&gt; has a clear goal: running DeepSeek V4 Flash locally on a Mac.&lt;/p&gt;
&lt;p&gt;It currently provides three ways to use it:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Interactive CLI.&lt;/li&gt;
&lt;li&gt;HTTP server.&lt;/li&gt;
&lt;li&gt;An experimental Agent mode.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Judging from its positioning, it is an inference project deeply optimized for one specific model rather than a replacement for general-purpose tools such as &lt;code&gt;llama.cpp&lt;/code&gt;, Ollama, or vLLM.&lt;/p&gt;
&lt;h2 id=&#34;why-it-is-worth-watching&#34;&gt;Why it is worth watching
&lt;/h2&gt;&lt;p&gt;There are three main reasons this kind of project is worth following.&lt;/p&gt;
&lt;p&gt;First, the author is Antirez, the creator of Redis. He has long focused on low-level systems, performance, and simple tools, and his projects tend to be direct and free of unnecessary layers.&lt;/p&gt;
&lt;p&gt;Second, DeepSeek V4 Flash is oriented toward efficient inference. If it runs well enough locally, it could be very attractive for Mac users.&lt;/p&gt;
&lt;p&gt;Third, &lt;code&gt;ds4&lt;/code&gt; targets Apple Metal directly. Rather than supporting every platform first and optimizing later, it tries to go deep on one well-defined scenario.&lt;/p&gt;
&lt;h2 id=&#34;who-should-try-it&#34;&gt;Who should try it
&lt;/h2&gt;&lt;p&gt;&lt;code&gt;ds4&lt;/code&gt; is best suited to users who:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Use an Apple Silicon Mac.&lt;/li&gt;
&lt;li&gt;Want to run DeepSeek V4 Flash locally.&lt;/li&gt;
&lt;li&gt;Care about Metal inference performance.&lt;/li&gt;
&lt;li&gt;Are willing to try an alpha-stage project.&lt;/li&gt;
&lt;li&gt;Want to study lightweight inference engines and model runtime details.&lt;/li&gt;
&lt;/ul&gt;
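&lt;p&gt;The first item in the list above can be checked from a terminal. The following is a minimal sketch, not part of &lt;code&gt;ds4&lt;/code&gt;: the helper name &lt;code&gt;is_apple_silicon&lt;/code&gt; is hypothetical, and it assumes that &lt;code&gt;uname&lt;/code&gt; reporting &lt;code&gt;Darwin&lt;/code&gt; and &lt;code&gt;arm64&lt;/code&gt; is a good enough signal for an Apple Silicon Mac.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;# Sketch only: is_apple_silicon is a hypothetical helper, not part of ds4.
# Assumption: Darwin plus arm64 from uname means an Apple Silicon Mac.
is_apple_silicon() {
    # $1 = kernel name, $2 = machine architecture
    case &#34;$1-$2&#34; in
        Darwin-arm64) return 0 ;;
        *) return 1 ;;
    esac
}

if is_apple_silicon &#34;$(uname -s)&#34; &#34;$(uname -m)&#34;; then
    echo &#34;Apple Silicon Mac detected&#34;
else
    echo &#34;Not an Apple Silicon Mac: ds4 targets Apple Silicon and Metal&#34;
fi
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;One caveat: a shell running under Rosetta may report &lt;code&gt;x86_64&lt;/code&gt; even on Apple Silicon hardware, so a negative result from a translated terminal is not conclusive.&lt;/p&gt;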
&lt;p&gt;If your goal is stable deployment, cross-platform operation, or OpenAI API-compatible infrastructure, it may not be the first choice at this stage. It is better treated as an experimental tool and a technical project to watch.&lt;/p&gt;
&lt;h2 id=&#34;how-to-use-it&#34;&gt;How to use it
&lt;/h2&gt;&lt;p&gt;The basic workflow in the project README is to build it first, then run it.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;git clone https://github.com/antirez/ds4.git
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nb&#34;&gt;cd&lt;/span&gt; ds4
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;make
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Run it interactively:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;./ds4
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Start the HTTP server:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;./ds4 --server
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Start the experimental Agent mode:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;./ds4 --agent
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;The project is still changing quickly, so treat the commands above as a snapshot and follow the repository README for exact parameters and model file preparation.&lt;/p&gt;
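&lt;p&gt;Because behavior may differ from one commit to the next, it can help to note which commit a given build came from. The sketch below uses only standard &lt;code&gt;git&lt;/code&gt; commands; the helper name &lt;code&gt;built_commit&lt;/code&gt; is hypothetical and not part of &lt;code&gt;ds4&lt;/code&gt;, and the path passed to it is assumed to be the cloned repository.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;# Sketch only: built_commit is a hypothetical helper, not part of ds4.
# $1 = path to a git checkout, assumed here to be the cloned ds4 directory.
built_commit() {
    # Print the abbreviated hash of the commit currently checked out.
    git -C &#34;$1&#34; rev-parse --short HEAD
}

# Example, run from the directory that contains the clone:
#   echo &#34;ds4 built from commit $(built_commit ds4)&#34;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Keeping that hash next to any notes or benchmark numbers makes it easier to tell whether a later difference comes from your setup or from a change in the project.&lt;/p&gt;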
&lt;h2 id=&#34;current-risks&#34;&gt;Current risks
&lt;/h2&gt;&lt;p&gt;&lt;code&gt;ds4&lt;/code&gt; is still at an early stage, so set expectations before using it:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Features may be incomplete.&lt;/li&gt;
&lt;li&gt;Parameters, model formats, and command-line behavior may change.&lt;/li&gt;
&lt;li&gt;Support is centered on Apple Silicon and Metal.&lt;/li&gt;
&lt;li&gt;Agent mode is even more experimental and is not suitable for production use.&lt;/li&gt;
&lt;li&gt;When something breaks, you may need to read the README, issues, or source code yourself.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In other words, it is currently more of an open source experiment worth trying than a one-click tool for ordinary users.&lt;/p&gt;
&lt;h2 id=&#34;how-it-differs-from-general-inference-tools&#34;&gt;How it differs from general inference tools
&lt;/h2&gt;&lt;p&gt;General-purpose inference tools usually aim for broad compatibility across model formats, platforms, backends, and APIs. &lt;code&gt;ds4&lt;/code&gt; takes a narrower path: local DeepSeek V4 Flash inference on Metal.&lt;/p&gt;
&lt;p&gt;That choice has both benefits and trade-offs.&lt;/p&gt;
&lt;p&gt;The benefit is that the implementation can stay focused, making performance and user experience easier to optimize around a single target. The trade-off is a limited scope: it is not meant to run every possible model, nor to replace a complete deployment platform.&lt;/p&gt;
&lt;p&gt;If you already use &lt;code&gt;llama.cpp&lt;/code&gt; or Ollama, &lt;code&gt;ds4&lt;/code&gt; is better treated as a supplementary testing tool, not an immediate replacement for your existing workflow.&lt;/p&gt;
&lt;h2 id=&#34;summary&#34;&gt;Summary
&lt;/h2&gt;&lt;p&gt;The interesting part of &lt;code&gt;ds4&lt;/code&gt; is not that it is yet another local LLM tool. It is that its scope is intentionally narrow: DeepSeek V4 Flash, Apple Silicon, Metal, and local inference.&lt;/p&gt;
&lt;p&gt;If you have a suitable Mac and are willing to tinker with an early-stage project, it is worth watching its performance, its approach to model support, and its server and agent capabilities. For production environments, it is better to wait until the interfaces and usage patterns become more stable.&lt;/p&gt;
&lt;h2 id=&#34;references&#34;&gt;References
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;GitHub project: &lt;a class=&#34;link&#34; href=&#34;https://github.com/antirez/ds4&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://github.com/antirez/ds4&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
        </item>
        
    </channel>
</rss>
