Using Claude Code Quota More Efficiently: Models, Context, Caching, and /compact

Sun, 19 Apr 2026 15:29:06 +0800

Many Claude Code and Claude Max users run into the same problem: even after paying for Pro, Max 5x, or Max 20x, the usage warning appears quickly, or they have to wait for the next reset. This is especially noticeable when Claude Code reads many files, fixes complicated bugs, or runs long tasks in a large project.

The key point is this: usage is not metered by the minute. It depends on the model, context length, attachments, codebase size, conversation history, tool calls, and current capacity. In the same 5-hour window, one person may work for a long time while another hits the limit in minutes. Usually the account is not broken; each request is simply too heavy.

This note collects a set of practical habits for using quota more efficiently.

01 First Understand Claude’s Usage Window

Claude Pro and Max both have usage limits. Claude Code usage is shared with Claude on web, desktop, and mobile under the same subscription quota. Anthropic’s help center explains that message counts depend on message length, attachment size, current conversation length, and the model or feature used, and that Claude Code usage is also affected by project complexity, codebase size, and auto-accept settings.

A simple way to think about it:

Pro: suitable for light usage and small projects.
Max 5x: suitable for more frequent usage and larger codebases.
Max 20x: suitable for heavier daily collaboration.
Usage windows reset on a 5-hour session basis.
Long messages, long conversations, large files, and complex tasks consume usage faster.
Stronger models such as Opus hit limits faster than Sonnet.

So “I only used it for 20 minutes” does not explain much by itself. What matters is how much context Claude read during those 20 minutes, which model was used, whether large files were processed repeatedly, and whether the same long conversation kept accumulating more tasks.

02 First Habit: Do Not Default to the Most Expensive Model

The Claude model family is commonly positioned like this:

Opus: strongest capability, suitable for complex reasoning, architecture decisions, and hard bugs.
Sonnet: balanced capability and cost, suitable for most everyday coding tasks.
Haiku: lighter, suitable for simple classification, summarization, and format conversion.

For daily scripts, small bug fixes, documentation cleanup, and code explanation, Sonnet is usually enough. Save Opus for cases such as:

Complex architecture design.
Deep multi-file refactors.
Bugs that are hard to reproduce.
Long-chain troubleshooting.
Tasks where the normal model is clearly stuck.

In Claude Code, use /model to switch models, or set the default in /config. A steadier habit is to use Sonnet by default and switch to Opus only at key points, rather than running the whole task on Opus.

03 Second Habit: Control Context, Do Not Drag Old Tasks Along

The longer the context, the more Claude needs to process on each turn, and the faster usage is consumed. The Claude Code docs explicitly recommend proactive context management:

Use /clear when switching to an unrelated task.
Use /compact when one phase is done but important context should remain.
Use /context to see what is taking space.
Configure a status line if you want continuous status visibility.

A useful rhythm:

Small phase done: /compact
Large task done: /clear
Switching to unrelated work: /clear
Context usage getting high: /compact early

/compact summarizes earlier conversation history while preserving key task state, conclusions, file paths, and remaining work. It reduces the amount of history carried into later requests. You can also add a short instruction:

/compact Preserve changed files, test results, remaining TODOs, and key design decisions

Do not wait for automatic compaction. The docs note that Claude Code auto-compacts when context approaches the limit, but manually compacting at phase boundaries is usually easier to control.
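
The effect of compacting at phase boundaries can be illustrated with a rough model. This is a minimal sketch, not how Claude Code actually meters usage: it assumes every turn adds a fixed number of tokens, every request re-reads the full history, and a compaction shrinks the history to a short summary. The function name, the per-turn size, and the summary size are all made up for illustration, and the model ignores caching, output tokens, and the cost of producing the summary itself.

```python
def total_input_tokens(turns, tokens_per_turn, compact_every=None, summary_tokens=2_000):
    """Sum the context re-read across all turns of one session.

    Simplified model: each turn adds `tokens_per_turn` to the history, and
    each request carries the whole history. A compaction replaces the history
    with a summary of `summary_tokens` (an assumed, illustrative size).
    """
    history = 0
    total = 0
    for turn in range(1, turns + 1):
        history += tokens_per_turn
        total += history  # this request carries the full history so far
        if compact_every and turn % compact_every == 0:
            history = summary_tokens  # history collapses to a short summary
    return total


print(total_input_tokens(40, 3_000))                    # 2460000
print(total_input_tokens(40, 3_000, compact_every=10))  # 720000
```

Under these assumptions, 40 turns without compaction re-read about 2.46 million tokens in total, while compacting every 10 turns brings that down to about 0.72 million. The exact numbers are not meaningful; the point is that uncompacted history grows the total roughly quadratically with the number of turns.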

04 Third Habit: Long Conversations and Large Files Make Every Request Heavier

Many people assume that “I only asked one more question” should be cheap. But in a long conversation, that question may carry a lot of history, file summaries, tool definitions, and system rules behind it.

Things that easily bloat context include:

Long conversations that are never cleared.
Asking Claude to read entire large files.
Pasting long logs, build output, or test output.
Adding many screenshots or images at once.
Asking it to repeatedly scan the whole repository.
An overly long CLAUDE.md.
Too many MCP servers enabled.

A more efficient approach: paste only key errors from logs, include only failing parts of test output, and let Claude use rg, head, tail, and symbol search before reading only the necessary parts. If command-line filtering can shrink the content, do not paste the whole thing into context.
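
As one way to filter before pasting, the sketch below keeps only the lines around error-like matches in a log and marks gaps between kept regions with "...". The function name, the regular expression, and the default window sizes are illustrative assumptions; adjust the pattern to whatever your build or test tool actually prints.

```python
import re

# Assumed pattern for "interesting" lines; tune it for your own tooling.
ERROR_PATTERN = re.compile(r"error|exception|failed|traceback", re.IGNORECASE)


def extract_error_context(lines, before=2, after=5, pattern=ERROR_PATTERN):
    """Return only the lines near a match, with '...' between kept regions."""
    keep = set()
    for i, line in enumerate(lines):
        if pattern.search(line):
            start = max(0, i - before)
            stop = min(len(lines), i + after + 1)
            keep.update(range(start, stop))

    result = []
    previous = None
    for i in sorted(keep):
        if previous is not None and i != previous + 1:
            result.append("...")  # lines were skipped here
        result.append(lines[i])
        previous = i
    return result


# Tiny demonstration: 10 log lines, one of them an error.
sample = [f"step {n} ok" for n in range(1, 11)]
sample[6] = "ERROR: test_login failed"
print("\n".join(extract_error_context(sample, before=1, after=1)))
```

In the demonstration, only three of the ten lines survive: the error and one line on each side. On a real build log the reduction is usually much larger, and the trimmed output is what goes into the conversation.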

05 Fourth Habit: Understand Caching, but Do Not Worship It

Anthropic’s Prompt Caching can cache repeated prompt prefixes. The default cache lifetime is 5 minutes, and a 1-hour cache is also supported. On a cache hit, large repeated context does not need to be fully reprocessed, which helps reduce cost and makes better use of rate limits.
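
To get a feel for why a stable prefix matters, here is a rough arithmetic sketch in units of "base input tokens". The two multipliers are assumptions based on Anthropic's published API pricing for the 5-minute cache at the time of writing (writes slightly more expensive than normal input, reads much cheaper); check the prompt caching docs for current values. The function name and the token counts are made up for illustration, and this says nothing certain about how subscription quota accounts for cached tokens.

```python
CACHE_WRITE_5M = 1.25  # assumed multiplier for writing a 5-minute cache entry
CACHE_READ = 0.10      # assumed multiplier for reading cached tokens


def relative_input_cost(prefix_tokens, new_tokens_per_turn, turns, cached):
    """Input cost, in base-input-token units, for a prefix reused every turn.

    Simplification: the prefix never changes, and every turn after the first
    arrives before the cache entry expires.
    """
    if not cached:
        return turns * (prefix_tokens + new_tokens_per_turn)
    first = prefix_tokens * CACHE_WRITE_5M + new_tokens_per_turn
    later = (turns - 1) * (prefix_tokens * CACHE_READ + new_tokens_per_turn)
    return first + later


# A 50,000-token stable prefix, 1,000 new tokens per turn, 10 turns.
print(relative_input_cost(50_000, 1_000, 10, cached=False))        # 510000
print(round(relative_input_cost(50_000, 1_000, 10, cached=True)))  # 117500
```

Under these assumptions the cached session costs well under a quarter of the uncached one on the input side. The saving disappears whenever the prefix changes or the cache entry expires, because the next request pays the write price again.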

But caching has limitations:

Content must match exactly, including text and images.
The default cache is short-lived.
Changing models, tools, system prompts, or context structure may reduce cache hits.
Output tokens do not disappear because of caching; the response still needs to be generated.
How Claude Code uses caching is a product-level implementation detail, so do not treat it as permanent “free memory.”

In practice, the important part is not studying every caching detail. It is keeping the session stable:

Avoid frequent model switching within the same phase.
Do not repeatedly rewrite large rule blocks mid-task.
Do not keep adding new images inside the same task.
Do not leave a long task idle for too long and then return with another huge request.
Use /compact at phase boundaries.

This makes repeated context easier to reuse and reduces later request weight.

06 About Peak Hours: Avoid Them When You Can, but Do Not Treat Them as a Formula

People often say certain hours feel tighter. Anthropic’s help center is more careful: message counts can be affected by current Claude capacity, conversation length, attachments, model, and features. In other words, peak capacity can affect the experience, but do not treat a specific local time window as a permanent rule.

Practical suggestions:

Put large refactors and heavy analysis in periods when both your network and the service are stable.
Do not start a huge task right before you plan to step away.
If you expect to leave for a long time, run /compact or /clear first.
For small edits, do not use Opus with a long context unless you really need it.

This is more reliable than memorizing a fixed “do not use it from X to Y” rule.

07 Slim Down CLAUDE.md, rules, MCP, and skills

Claude Code loads project rules, tool information, and some environment context into the session. The official docs also recommend separating general rules from specialized rules so every session does not start with a large amount of unrelated text.

A useful split:

CLAUDE.md: only global rules that always apply.
rules: path-specific or file-type-specific rules.
skills: specific workflows, such as publishing posts, deployment, image generation, or committing code.
MCP: only enable servers that the current task actually needs.

If CLAUDE.md is hundreds or thousands of lines long, every session carries that cost. A better pattern is to move occasional workflows into skills and load them only when needed.
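
A quick way to see how heavy the always-loaded files are is to count their lines and estimate their tokens. The sketch below is an approximation only: it assumes roughly four characters per token, which is a common rule of thumb for English text and not a real tokenizer, and the function name and file list are illustrative. For actual numbers inside a session, /context remains the better source.

```python
from pathlib import Path

CHARS_PER_TOKEN = 4  # rough rule of thumb for English text, not a tokenizer


def estimate_always_on_tokens(paths):
    """Return {path: (line_count, estimated_tokens)} for the files that exist."""
    report = {}
    for path in map(Path, paths):
        if path.is_file():
            text = path.read_text(encoding="utf-8", errors="replace")
            lines = len(text.splitlines())
            report[str(path)] = (lines, len(text) // CHARS_PER_TOKEN)
    return report


# Check the project's CLAUDE.md if it exists in the current directory.
for name, (lines, tokens) in estimate_always_on_tokens(["CLAUDE.md"]).items():
    print(f"{name}: {lines} lines, ~{tokens} tokens")
```

If the estimate is in the thousands of tokens, that amount is carried by every session before any real work starts, which is a good signal to move occasional workflows elsewhere.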

MCP is similar. More tools do not automatically mean more efficiency. The Claude Code docs mention using /mcp to view and disable unnecessary servers, and /context to see what is consuming context space.

08 Practical Command List

These are the most useful daily commands:

/model

Switch models. Sonnet is a good default; use Opus for complex reasoning.

/clear

Clear the current context. Use it when switching to unrelated work.

/compact

Compress conversation history. Use it when a phase is done but the same task continues.

/context

Inspect context usage and find what is taking space.

/status

Check subscription or usage-related status. Anthropic’s help center also recommends monitoring remaining allocation.

/mcp

View and manage MCP servers, and disable tools not needed for the current task.

If you use API billing, /cost can be useful. But for Pro/Max subscriptions, the Claude Code docs explain that the dollar estimate from /cost is not the right billing reference; subscribers should rely more on usage information such as /stats and /status.

09 A Quota-Saving Workflow

A practical workflow looks like this:

Run /clear before starting a new task.
Use Sonnet by default.
Let Claude inspect project structure and key files first, not the whole repository.
Run /compact after each small phase.
Switch to Opus only for hard blockers.
Filter logs, errors, and test output before pasting them.
Run /clear after the task is done; do not start new work with stale context.
Periodically review CLAUDE.md, MCP, and skills to shrink always-on context.

The core idea is simple: let Claude see only what it truly needs for the current task.

10 Summary

Claude Code usage running out quickly is usually not caused by one thing. It is often a combination of high-cost models, long uncleared conversations, too many files and logs, heavy MCP and rule context, weaker cache reuse, and peak capacity fluctuations.

The practical fixes are also simple:

Use Sonnet for daily work.
Save Opus for truly complex problems.
Use /compact when a phase is done.
Use /clear when switching tasks.
Use /context to find context bloat.
Slim down CLAUDE.md, rules, MCP, and skills.
Do not dump the whole repository, full logs, or large image batches into context.

How much work the same Pro or Max plan can support depends heavily on how you manage context. Make the context smaller and task boundaries clearer, and Claude Code will feel much steadier.

References

Claude Help Center: Using Claude Code with your Pro or Max plan: https://support.claude.com/en/articles/11145838-using-claude-code-with-your-pro-or-max-plan
Claude Help Center: About Claude’s Max Plan Usage: https://support.anthropic.com/en/articles/11014257-about-claude-s-max-plan-usage/
Claude Code Docs: Manage costs effectively: https://code.claude.com/docs/en/costs
Anthropic Docs: Prompt caching: https://docs.anthropic.com/en/docs/build-with-claude/prompt-caching

Context Management on KnightLi Blog