Claude Code Token-Saving Guide: How Models, MCP, CLAUDE.md, and Skills Affect Cache

In long Claude Code tasks, Prompt Cache hit rate directly affects cost and speed. Many users know that caching can save tokens, but not which actions make the cache suddenly miss.

The simplest mental model is a left-to-right context chain:

1

tools -> system -> CLAUDE.md / skills -> messages

The farther left something sits, the more stable it should be and the larger the cache benefit. If a left-side section changes, everything after it may need to be recalculated. If a right-side section changes, the impact is smaller.

So optimizing Prompt Cache in Claude Code is not guesswork. The rule is simple: before a task begins, prepare the model, MCP servers, Skills, CLAUDE.md, and other base context. Once the task starts, change as little of that fixed context as possible.

Prompt Cache does not cache plain text

Prompt Cache is not just a string cache for prompts. In Transformer inference, what matters is the Key/Value state calculated by attention layers from the prefix context, usually called KV cache.

That means two things:

If the prefix stays stable, part of the previous computation can be reused.
If the model, tool definitions, system prompt, or prefix messages change, old cache entries may no longer match.

Anthropic’s documentation summarizes the invalidation hierarchy as tools -> system -> messages. Changes to tool definitions can invalidate the whole cache; system changes affect system and messages; message changes mainly affect message cache.

Claude Code adds more context sources such as CLAUDE.md, Skills, MCP, plugins, and subagents, so it is easier to accidentally break cache reuse.

Cache killer 1: switching models mid-task

Switching models is one of the most expensive changes.

Prompt Cache is isolated by model. Opus, Sonnet, and Haiku have different architectures and weights, so the KV cache calculated from the same text is not interchangeable. If you build a long context in Opus and then switch to Sonnet, Sonnet cannot reuse Opus’s cache.

This creates a counterintuitive result: switching models mid-task to save money may make the previous cache useless. Context that could have been read at cache-read price may need to be written and computed again.

A steadier pattern is:

Keep the main conversation on one model.
Use a subagent for side tasks that can run on a cheaper model.
Let the side agent search, explore, or summarize, then hand a concise result back to the main conversation.

This keeps the long main-context prefix stable and improves cache hit consistency.

Cache killer 2: adding MCP or reloading plugins mid-task

MCP provides tools to Claude Code. When you add an MCP server, the tool list changes, and tool definitions sit at the far left of the context chain.

From a Prompt Cache perspective, when the tool list changes, the system and messages that follow may need to be recalculated. If you use many MCP servers, the tool definitions themselves can be large, so the cost of invalidation becomes obvious.

One detail matters: Claude Code usually reads MCP configuration at session startup. Changing config mid-session may not affect the current session immediately. The dangerous moments are restart, resume, plugin reload, or anything that rebuilds the tool list.

Recommended practice:

Install required MCP servers before starting a long task.
Avoid discovering missing tools halfway through and then reloading.
Reduce default-enabled MCP servers when possible.
Do not keep rarely used MCP servers always enabled.

Stable tool definitions are the foundation of stable Prompt Cache hits.

Cache killer 3: editing CLAUDE.md mid-session

CLAUDE.md is Claude Code’s project memory file. It is useful for build commands, test commands, architecture conventions, code style, and project-specific constraints.

It is helpful, but it also enters the context. Claude’s help documentation explains that CLAUDE.md is read at session start and delivered as a user message. It also benefits from Anthropic Prompt Cache: the first request pays full input price, while later requests can hit the lower cache-read price if the cache is still valid.

The catch is that CLAUDE.md is content-addressed. Once the file changes, the old cache no longer matches.

So avoid frequently editing CLAUDE.md during a long task. Better practices:

Check whether CLAUDE.md is sufficient before the task starts.
Put stable rules in the file and temporary instructions in the current conversation.
Do not edit long-term memory for one-off instructions.
If you must change it, treat the next stage as a new session or new phase.

CLAUDE.md should be stable project guidance, not a scratchpad that changes every round.

Cache killer 4: installing or updating Skills mid-task

Skills are also part of the context. Installing a new Skill, updating a Skill, or changing the Skill list changes what gets injected into the session.

These changes often do not fully take effect until reload, resume, or a new session. Once messages are rebuilt, old cache entries may no longer match.

The same advice applies:

Decide which Skills are needed before starting.
Keep the Skill set stable for the same kind of task.
Avoid installing Skills in the middle of a long task.
If you install a new Skill, treat it as the beginning of a new stage.

For repeatable workflows such as content production, review, deployment, and translation, keeping a fixed Skill set helps keep the context structure stable.

Cache killer 5: idle time exceeding TTL

Prompt Cache does not last forever. A common default TTL is on the order of minutes, and Claude Code-related documentation often refers to roughly a five-minute cache window. After TTL expires, even the same request may need to rebuild the cache.

This explains a common feeling in long tasks: everything was cheap and fast, then after a coffee break the token cost jumps again.

Long tasks hit this easily. You may review Claude Code output, inspect files, run tests, or think about the next step. Five minutes can disappear quickly.

If your environment supports it, you can request a one-hour Prompt Cache TTL before long tasks:

1

export ENABLE_PROMPT_CACHING_1H=1

In Windows PowerShell:

1

$env:ENABLE_PROMPT_CACHING_1H="1"

One-hour cache writes usually cost more than five-minute cache writes. It is not always worth it for short tasks, but for large codebases, long conversations, and complex multi-step development, it may be cheaper than repeated cache expiration.

A token-saving Claude Code workflow

A steadier long-task setup looks like this:

Choose the model before the task starts and avoid frequent switching.
Enable the MCP servers you need and disable the ones you do not.
Keep CLAUDE.md short, stable, and focused on durable rules.
Prepare the Skills needed for this task in advance.
For complex tasks, consider one-hour TTL.
Split the task into phases, but keep context structure stable within each phase.
Use subagents or separate sessions for side exploration instead of disturbing the main conversation.

The goal is not to prevent every cache miss. It is to avoid the high-cost misses that are easy to overlook.

A simple rule of thumb

Ask one question:

Does this operation change the model, tool definitions, system context, or fixed messages near the start of the session?

If yes, it probably affects Prompt Cache. The farther left it is in the context chain, the greater the impact.

Common operations:

Switch model: high risk, model caches are isolated.
Add MCP or reload plugins: high risk, tool list changes.
Edit CLAUDE.md: medium-high risk, project memory changes.
Install Skills: medium-high risk, injected context changes.
Continue normal conversation: low risk, mostly appends messages.
Idle past TTL: high risk, server-side cache expires.

Summary

Prompt Cache optimization in Claude Code is about keeping the session prefix stable.

Do not switch models casually. Do not install MCP servers and Skills halfway through. Do not use CLAUDE.md as a temporary scratchpad. For complex tasks, consider a longer TTL. Once these basics are stable, token cost and response speed become much more predictable.

The most practical sentence is: configure before you start, change less after you start.