Antirez has open-sourced a new project: ds4. It is not a general-purpose LLM framework but a local inference engine for DeepSeek V4 Flash, focused on Apple Silicon and the Metal backend.
Project URL: https://github.com/antirez/ds4
What is ds4?
ds4 has a clear goal: running DeepSeek V4 Flash locally on a Mac.
It currently provides three ways to use it:
- Interactive CLI.
- HTTP server.
- An experimental Agent mode.
By design, it is an inference engine deeply optimized for one specific model, not a replacement for general-purpose tools such as llama.cpp, Ollama, or vLLM.
Why it is worth watching
There are three main reasons this kind of project is worth following.
First, the author is Antirez, the creator of Redis. He has long focused on low-level systems, performance, and simple tools, and his projects are usually quite direct in style.
Second, DeepSeek V4 Flash is positioned as an efficiency-oriented model. If it runs well locally, it could be very attractive for Mac users.
Third, ds4 directly targets Apple Metal. Rather than supporting every platform first and optimizing later, it goes deep on one well-defined scenario.
Who should try it
ds4 is better suited for users who:
- Use an Apple Silicon Mac.
- Want to run DeepSeek V4 Flash locally.
- Care about Metal inference performance.
- Are willing to try an alpha-stage project.
- Want to study lightweight inference engines and model runtime details.
If your goal is stable deployment, cross-platform operation, or OpenAI API-compatible infrastructure, it may not be the first choice at this stage. It is better treated as an experimental tool and a technical project to watch.
How to use it
The basic workflow in the project README is to build it first, then run it.
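A minimal sketch of the build step, assuming a standard clone-and-make flow (the plain `make` invocation and any build dependencies are assumptions, not confirmed details from the repository):

```shell
# Clone the repository and build it.
# `make` with no target is an assumption; the README documents
# the actual build steps and dependencies.
git clone https://github.com/antirez/ds4
cd ds4
make
```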
Run it interactively:
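As a hedged sketch, an interactive session might look like the following; the `--model` flag, the model path, and the file format are assumptions drawn from common local-inference conventions, not from the ds4 README:

```shell
# Launch an interactive chat session against a local model file.
# Flag name and model path are hypothetical.
./ds4 --model ./models/deepseek-v4-flash.bin
```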
Start the HTTP server:
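A hypothetical server invocation and request, assuming a `--server` flag, a local port, and a `/completion` endpoint. All of these names are illustrative guesses; the real interface is defined in the repository README:

```shell
# Start ds4 as a local HTTP server (flag and port are assumptions).
./ds4 --server --port 8080

# Send a prompt to a hypothetical completion endpoint.
curl -s -X POST http://localhost:8080/completion \
  -H 'Content-Type: application/json' \
  -d '{"prompt": "Hello"}'
```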
Agent mode:
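Agent mode is described as experimental, so any invocation here is doubly speculative; assuming a simple flag:

```shell
# Run the experimental agent mode (flag name is an assumption).
./ds4 --agent
```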
For exact parameters and model file preparation, follow the repository README, because the project is still changing quickly.
Current risks
ds4 is still at an early stage, so set expectations before using it:
- Features may be incomplete.
- Parameters, model formats, and command-line behavior may change.
- Compatibility centers on Apple Silicon and Metal.
- Agent mode is more experimental and is not suitable for direct production use.
- When something breaks, you may need to read the README, issues, or source code yourself.
In other words, it is currently more of an open source experiment worth trying than a one-click tool for ordinary users.
How it differs from general inference tools
General-purpose inference tools usually aim for broad compatibility across model formats, platforms, backends, and APIs. ds4 takes a narrower path: local DeepSeek V4 Flash inference on Metal.
That choice has both benefits and trade-offs.
The benefit is that the implementation can stay focused, making performance and user experience easier to optimize around a single target. The trade-off is a limited scope: it is not meant to run every possible model, nor to replace a complete deployment platform.
If you already use llama.cpp or Ollama, ds4 is better treated as a supplementary testing tool, not an immediate replacement for your existing workflow.
Summary
The interesting part of ds4 is not that it is yet another local LLM tool. It is that its scope is intentionally narrow: DeepSeek V4 Flash, Apple Silicon, Metal, and local inference.
If you have a suitable Mac and are willing to tinker with an early-stage project, it is worth watching its performance, model support approach, and server/agent capabilities. For production environments, it is better to keep observing until the interfaces and usage patterns become more stable.
References
- GitHub project: https://github.com/antirez/ds4