How DeepSeek V4 Price Cuts Rewrite the Cost Model for AI Agents

Fri, 01 May 2026 19:47:47 +0800

DeepSeek V4 did not arrive with an especially loud launch. There was no major event, nor a benchmark story that instantly crushed every competitor. But a few days later, the part that truly affects the industry became visible: repeated price cuts.

The point of this change is not that “the model got a little stronger”, but that “usage cost has been pushed into another tier”. When token prices become low enough that an ordinary Agent task can finish for a few cents or a couple of yuan, the business logic behind many Coding Plans and Token Plans needs to be reconsidered.

Launch Day Was Not Explosive

The first wave of reactions to DeepSeek V4 was not especially heated. Many people expected it to deliver the kind of shock R1 did: across-the-board benchmark leadership, validation of domestic compute, and simultaneous breakthroughs in multimodal and Agent capabilities. After the actual release, however, it looked more like a steady upgrade.

V4 Pro is indeed a strong model, especially in coding, math, long context, and agentic coding. But it is not the kind of product that instantly makes every peer model look outdated. So on launch day, the discussion felt a little awkward: people wanted to praise it, but it was hard to find a sufficiently explosive angle.

The real turning point was not launch day, but the price adjustments that followed.

Successive Price Cuts Are the Key

After DeepSeek V4 was released, prices started to move downward. According to DeepSeek’s official pricing page and the information summarized in the source article, the rough prices at that time were:

DeepSeek V4 Flash: about 1 yuan per 1 million input tokens; about 0.02 yuan per 1 million tokens after a cache hit;
DeepSeek V4 Pro: about 3 yuan per 1 million input tokens; about 0.025 yuan per 1 million tokens after a cache hit;
the cache-hit input price across the model family dropped to one tenth of the launch price;
V4 Pro was in a 75% discount period, which was extended until May 31, 2026 at 23:59 Beijing time (UTC+8).

The API prices in US dollars make the difference easier to see:

Model	Cached input	Non-cached input	Output	Context
`deepseek-v4-flash`	$0.0028 / 1M tokens	$0.14 / 1M tokens	$0.28 / 1M tokens	1M
`deepseek-v4-pro` promotional price	$0.003625 / 1M tokens	$0.435 / 1M tokens	$0.87 / 1M tokens	1M
`deepseek-v4-pro` regular price	$0.0145 / 1M tokens	$1.74 / 1M tokens	$3.48 / 1M tokens	1M

Two details matter here.

First, V4 Pro’s $0.435 / $0.87 (non-cached input / output) is a promotional price, not the long-term regular price. In DeepSeek’s official notes, this 75% discount was extended until May 31, 2026 at 15:59 UTC, which is 23:59 Beijing time (UTC+8).

Second, cache-hit pricing is the key variable in the Agent cost model. Flash’s cached input price is as low as $0.0028 / 1M tokens, while Pro’s promotional cached input price is $0.003625 / 1M tokens. That means repeated project context, tool definitions, system prompts, and historical summaries no longer need to be charged at the full input price.

The most important effect of this pricing is that it makes many tasks far less sensitive to token cost. In the past, developers worried that a single Agent task would consume a large amount of context, repeatedly read and write code, and call tools frequently. Now, as long as the cache hit rate is high enough, the cost can be pushed very low.
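
As a rough illustration of why the cache hit rate dominates, the average input price can be treated as a weighted mix of the cached and non-cached prices. The sketch below uses the USD prices from the table above; the function name `blended_input_price` and the sample hit rates are illustrative assumptions, not part of any DeepSeek API.

```python
def blended_input_price(cached: float, uncached: float, hit_rate: float) -> float:
    """Average input price per 1M tokens at a given cache hit rate (0.0 to 1.0)."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * cached + (1.0 - hit_rate) * uncached


# USD per 1M input tokens as (cached, non-cached), copied from the table above.
INPUT_PRICES = {
    "deepseek-v4-flash": (0.0028, 0.14),
    "deepseek-v4-pro (promotional)": (0.003625, 0.435),
}

if __name__ == "__main__":
    # The hit rates are arbitrary sample points chosen for illustration.
    for model, (cached, uncached) in INPUT_PRICES.items():
        for hit_rate in (0.0, 0.5, 0.8, 0.95):
            price = blended_input_price(cached, uncached, hit_rate)
            print(f"{model:32s} hit rate {hit_rate:4.0%}: ${price:.4f} / 1M input tokens")
```

Under these prices, Flash input at a 95% hit rate averages out to less than one cent per million tokens.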

Price Comparison With GPT and Claude

DeepSeek’s own prices alone do not fully convey the gap. The contrast becomes much clearer when placed next to common closed-source models from the same period.

Model	Input	Cached input	Output	Best fit
`deepseek-v4-flash`	$0.14 / M	$0.0028 / M	$0.28 / M	High-frequency Agents, routine coding, batch tasks
`deepseek-v4-pro` promotional price	$0.435 / M	$0.003625 / M	$0.87 / M	Complex coding, planning, fact checking
`deepseek-v4-pro` regular price	$1.74 / M	$0.0145 / M	$3.48 / M	Pro cost baseline after the promotion
GPT-5.5	$5 / M	$0.50 / M	$30 / M	High-quality complex tasks, general reasoning
GPT-5.4	$2.50 / M	$0.25 / M	$15 / M	Mid-range choice for programming and professional tasks
GPT-5.4 mini	$0.75 / M	$0.075 / M	$4.50 / M	Lower-cost general and subtask model
Claude Opus 4.7	$5 / M	$0.50 / M	$25 / M	High-quality writing, complex reasoning, long tasks
Claude Sonnet 4.6	$3 / M	$0.30 / M	$15 / M	Programming, Agents, general work
Claude Haiku 4.5	$1 / M	$0.10 / M	$5 / M	Lightweight tasks, summarization, classification

The most striking number in this table is output price. Agents do not only read context; they also keep generating plans, patches, explanations, logs, and next actions. For output-heavy tasks, DeepSeek V4 Pro’s promotional $0.87 / M is roughly 34 times lower than GPT-5.5’s $30 / M and roughly 17 times lower than Claude Sonnet 4.6’s $15 / M.

Even at V4 Pro’s regular output price of $3.48 / M, it is still clearly below GPT-5.4, GPT-5.5, and Claude Sonnet / Opus. If the task can be handled by Flash, the output price drops further to $0.28 / M.

The cached input gap is even more extreme. DeepSeek V4 Flash’s cached input price is $0.0028 / M, while GPT-5.5 and Claude Opus 4.7 are both $0.50 / M, roughly 180 times higher and more than two orders of magnitude apart. For Agents that repeatedly read the same code repository, this gap matters more than it does in ordinary chat.

Why Agent Tasks Are Especially Affected

AI Agents are different from ordinary chat. Ordinary chat is usually a question-and-answer flow with relatively limited input context. Agent tasks repeatedly read project files, generate plans, call tools, inspect results, and then modify code again.

These tasks have two traits:

large token consumption;
lots of repeated context.

The second point is crucial. In a code project, the model repeatedly reads the same files, directory structure, error logs, and modification results. If the platform supports cache hits, the cost of repeated input drops sharply.

The source article described a real test: DeepSeek V4 Pro and Flash were connected to a Claude Code-like tool, which was asked to pull a prompt repository and turn it into a local search site. The task was completed at a total cost of a little over 0.8 yuan, and Pro reached a cache hit rate of 98.7%.

This example illustrates a practical issue: the more an Agent task resembles “repeated work around the same project”, the more valuable cache hits become. If generating a website, fixing a bug, or changing a frontend costs only a few cents to a few yuan, subscription plans become less attractive.

We can estimate the gap with a simplified task. Assume one coding agent task includes:

500,000 input tokens, of which 80% can hit cache;
50,000 output tokens;
no tool calls, search costs, or platform markup included, only model token cost.

The rough costs are:

Model	Estimated cost
DeepSeek V4 Flash	about $0.03
DeepSeek V4 Pro promotional price	about $0.09
DeepSeek V4 Pro regular price	about $0.35
GPT-5.4 mini	about $0.33
GPT-5.4	about $1.10
GPT-5.5	about $2.20
Claude Sonnet 4.6	about $1.17
Claude Opus 4.7	about $1.95
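
The estimate can be reproduced with a few lines of arithmetic. The sketch below is a minimal calculator under the assumptions listed above (500,000 input tokens, 80% cache hits, 50,000 output tokens, model token cost only). The `Price` class and `task_cost` function are illustrative names, and only the two promotional-period DeepSeek rows are filled in; other models can be added from the comparison table in the same way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    """USD per 1M tokens, copied from the comparison table."""
    cached_input: float
    uncached_input: float
    output: float


PRICES = {
    "deepseek-v4-flash": Price(0.0028, 0.14, 0.28),
    "deepseek-v4-pro (promotional)": Price(0.003625, 0.435, 0.87),
}


def task_cost(price: Price, input_tokens: int, cache_hit_rate: float,
              output_tokens: int) -> float:
    """Model token cost in USD for one task; tools, search, and markup are excluded."""
    cached = input_tokens * cache_hit_rate
    uncached = input_tokens - cached
    total = (cached * price.cached_input
             + uncached * price.uncached_input
             + output_tokens * price.output)
    return total / 1_000_000  # prices are quoted per 1M tokens


if __name__ == "__main__":
    for model, price in PRICES.items():
        cost = task_cost(price, input_tokens=500_000, cache_hit_rate=0.8,
                         output_tokens=50_000)
        print(f"{model}: ${cost:.4f}")
```

Changing `cache_hit_rate` shows how quickly the non-cached input share and the output tokens come to dominate the bill once most of the context is served from cache.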

This estimate does not mean DeepSeek is better for every task. Model quality, tool-call stability, long-context retrieval ability, coding style, and factual reliability all need separate evaluation. But from a cost perspective, DeepSeek V4 pushes the marginal cost of “letting the Agent run a few more rounds” very low. That will encourage developers to design longer workflows, more frequent self-checks, and more candidate solutions instead of worrying about the token bill every time.

The Difference Between Coding Plans and Token Plans

Many AI products now offer two types of plans: Coding Plans and Token Plans.

The rough difference is:

Coding Plans are usually aimed mainly at programming;
Token Plans usually cover more capabilities, such as STT (speech to text), TTS (text to speech), image generation, search, embedding, and RAG;
Coding Plans often restrict users to programming scenarios, while other capabilities still require separate purchases.

From a business perspective, a Coding Plan is more like a buffet. Users pay a fixed fee in advance, while the vendor bets that most people will not use up the quota. Some users consume more, others consume less, and the platform can still make money on average.

But if pay-as-you-go token prices are low enough, users start calculating: why do I have to buy a plan? If the real monthly usage cost is only a few yuan or a dozen yuan, a 40-yuan or 200-yuan plan may no longer be worthwhile.
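
A back-of-the-envelope break-even check makes this calculation concrete. The sketch assumes a flat pay-as-you-go cost of 0.8 yuan per task, borrowed from the example task described earlier; real tasks vary widely, and the 40-yuan and 200-yuan plan prices are the illustrative figures from the paragraph above, not any specific vendor's pricing. The function name is hypothetical.

```python
def break_even_tasks(plan_price_yuan: float, cost_per_task_yuan: float) -> float:
    """Tasks per month at which pay-as-you-go spending equals the plan price."""
    if cost_per_task_yuan <= 0:
        raise ValueError("cost_per_task_yuan must be positive")
    return plan_price_yuan / cost_per_task_yuan


if __name__ == "__main__":
    COST_PER_TASK_YUAN = 0.8  # assumption: every task costs about as much as the example
    for plan_price in (40, 200):
        tasks = break_even_tasks(plan_price, COST_PER_TASK_YUAN)
        print(f"{plan_price}-yuan plan: pay-as-you-go is cheaper below "
              f"about {tasks:.0f} tasks per month")
```

Under these assumptions, a 40-yuan plan only pays off beyond roughly 50 such tasks a month, and a 200-yuan plan beyond roughly 250.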

Why Price Cuts Challenge the Subscription Model

Subscription plans rely on one premise: users feel that each individual use is expensive, or they do not want to calculate the cost of every call. When token prices are high, a plan feels reassuring. When token prices are almost negligible, pay-as-you-go becomes more natural.

DeepSeek V4’s price cut effectively reveals the underlying cost:

Agent tasks can be very cheap;
long context is not necessarily too expensive to use;
cache hits can reduce cost significantly;
ordinary developers do not necessarily need a fixed subscription;
the model entry point can shift from a “plan platform” to a “low-cost API”.

This will make platforms built around Coding Plans uncomfortable. If users find pay-as-you-go calls cheaper and freer, they have less reason to be locked into one platform’s subscription.

How to Choose Between Flash and Pro

A practical way to use DeepSeek V4 is to split work between Flash and Pro.

Flash is suitable for high-frequency, lightweight, repeatable tasks:

fixing bugs;
writing frontend code;
writing scripts;
routine code understanding;
processing ordinary information in long context;
running large numbers of subtasks.

Flash is cheap, fast, and also supports very long context. For everyday coding agents, many tasks do not need Pro from the start.

Pro is better for complex judgment and fallback work:

multi-round planning;
complex Agent workflows;
multiple function calls;
fact checking;
financial research;
content production that requires stronger knowledge and judgment;
high-risk code changes.

A reasonable setup is: Flash handles volume, Pro handles fallback. Start ordinary tasks with Flash, then switch to Pro for long-horizon planning, complex judgment, fact checking, or multi-tool collaboration. This keeps cost under control while preserving model quality.
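
The split can be sketched as a small routing function. This is a minimal sketch, not a recommended implementation: the task labels are hypothetical names for the categories listed above, and treating "fallback" as "retry on Pro after a failed Flash attempt" is an assumption about how a tool might apply it. Only the model identifiers come from the pricing table.

```python
FLASH = "deepseek-v4-flash"
PRO = "deepseek-v4-pro"

# Hypothetical labels for the Pro-oriented task types listed above.
PRO_TASKS = {
    "multi_round_planning",
    "complex_agent_workflow",
    "multiple_function_calls",
    "fact_checking",
    "financial_research",
    "knowledge_heavy_content",
    "high_risk_code_change",
}


def choose_model(task_kind: str, flash_failed: bool = False) -> str:
    """Return Flash by default; return Pro for listed task kinds or after a failed Flash attempt."""
    if task_kind in PRO_TASKS or flash_failed:
        return PRO
    return FLASH


if __name__ == "__main__":
    print(choose_model("bug_fix"))                      # routine work stays on Flash
    print(choose_model("fact_checking"))                # listed task kind goes to Pro
    print(choose_model("bug_fix", flash_failed=True))   # escalate after Flash fails
```

Keeping the rule this simple makes the cost behavior predictable: Pro is only billed for task types that were explicitly marked as needing it, or for retries.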

Why DeepSeek Can Price This Way

DeepSeek has a different business structure from many large platforms. It does not have e-commerce, social networking, short video, cloud computing, phones, cars, office suites, operating systems, browsers, or a large enterprise SaaS ecosystem.

That means it does not need to lock users into a complete platform. It can simply sell text model capability: use cheap text models here, and call any other capability elsewhere.

Large platforms usually think differently. If you buy their Coding Plan or Token Plan, you are pulled into their cloud, search, image generation, voice, database, and developer-tool ecosystem. The plan is not merely selling the model; it is competing for the user entry point.

DeepSeek’s approach is more direct: push text model prices down and try to become the default model entry point for Agents. Once the default entry point is occupied, many developers and toolchains will naturally adapt around it.

Open Models and the Default Entry Point

If DeepSeek V4 keeps an open model route, third-party cloud vendors and platforms may deploy it themselves and provide services. For DeepSeek, that is both a distribution channel and a potential diversion of traffic away from its own API.

This is where a low-price official API matters. If the official price is already low enough, other platforms will struggle to offer an obvious price advantage even if they can deploy the model. Users will tend to use the default, cheap, stable entry point directly.

This is especially true for Agent tools. Agent tasks depend on long context, caching, tool calls, and stable throughput. Once a model is cheap enough in these scenarios, it has a chance to become the default option.

Coding Plans Are Still Not Useless

This does not mean Coding Plans will disappear immediately. They still fit some users.

For truly heavy users who max out their quota every day, a fixed subscription may still be economical. It works like a buffet: if nobody could ever eat enough to get their money’s worth, nobody would buy it.

The problem is that most users are not that kind of extremely high-frequency user. Low-frequency users, lightweight developers, and people who occasionally write scripts or modify projects are better suited to pay-as-you-go. Now that DeepSeek has lowered pay-as-you-go costs, the appeal of plans weakens.

The future is more likely to become a layered choice:

heavy high-frequency users keep buying Coding Plans;
ordinary users move to low-cost APIs;
Agent tools automatically choose Flash / Pro according to the task;
platform plans need to provide more non-model value, such as workflows, IDE integration, deployment, team management, and security auditing.

Summary

DeepSeek V4 did not create its biggest impact through benchmarks. What truly changed industry expectations was the price reduction that followed.

When input and cache-hit token prices are pushed very low, the cost of using AI Agents changes. Long context, code-project analysis, and multi-round tool calls that used to look expensive may now become everyday costs of a few cents to a few yuan.

This directly challenges the business logic of Coding Plans and Token Plans. If users can pay by usage, freely combine models and tools, and keep costs low enough, they may not want to be tied to a specific platform plan.

What DeepSeek V4 truly touches this time is not only the ranking of model capability, but the cost structure of AI Agents and the battle for the default entry point.

References:

LLM Pricing on KnightLi Blog