Anthropic recently published an engineering write-up on Harness. On the surface, it explains a product implementation. At a deeper level, it answers a longer-term question:
As model capabilities keep evolving, which layers in an Agent system should stay stable, and which should remain fast to replace?
Core Judgment
My key takeaway is: Agent infrastructure is becoming more like a lightweight Agent OS.
The focus is not to hard-code today’s best workflow, but to define long-lived system abstractions.
Why This Matters
Common problems in many Agent frameworks include:
- turning temporary model limitations into permanent architecture
- treating prompt engineering as a system boundary
- turning one useful patch into a long-term dependency
Models will keep improving. A patch that is reasonable today may become technical debt tomorrow.
Anthropic’s Approach: From Concrete Harness to Meta-Harness
Instead of committing to one fixed orchestration style, this approach abstracts three stable interfaces:
- session: recoverable event and state history
- harness: reasoning and orchestration loop (brain)
- sandbox: execution environment and tool capabilities (hands)
After separation, the system becomes easier to replace, recover, and scale.
1) Session Is Not the Context Window
The critical point: a session is not the model's context window.
Session should be a queryable, replayable, and recoverable event log, not a direct history dump into the model.
Benefits of this design:
- trimming does not mean history disappears
- compaction does not mean facts are lost
- crash recovery can return to the event layer instead of relying on summary memory
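A minimal sketch of what this separation looks like in code. The names here (`Session`, `Event`, `model_view`) are illustrative, not Anthropic's actual API; the point is that trimming only affects the view sent to the model, while the full log survives for replay and recovery.

```python
from dataclasses import dataclass, field

@dataclass
class Event:
    kind: str      # e.g. "user", "assistant", "tool_result"
    payload: str

@dataclass
class Session:
    # Append-only event log: the source of truth, separate from model context.
    events: list[Event] = field(default_factory=list)

    def append(self, kind: str, payload: str) -> None:
        self.events.append(Event(kind, payload))

    def model_view(self, max_events: int) -> list[Event]:
        # Trimming/compaction shapes only what the model sees;
        # nothing is deleted from the underlying log.
        return self.events[-max_events:]

    def replay(self) -> list[Event]:
        # Crash recovery rebuilds state from the event layer,
        # not from a lossy summary.
        return list(self.events)

s = Session()
for i in range(5):
    s.append("assistant", f"step {i}")
assert len(s.model_view(max_events=2)) == 2   # trimmed context...
assert len(s.replay()) == 5                   # ...but history survives
```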
2) Harness as a Replaceable Orchestration Layer
Harness should focus on orchestration rather than holding business state.
An ideal interface is closer to:
execute(name, input) -> string
This means the model only needs to know what capabilities it can call, without being tightly bound to specific devices, containers, or operating systems.
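A hedged sketch of that narrow surface. The registry and tool names below are hypothetical; the design point is that the model addresses capabilities by name, and the harness hides where and how they actually run.

```python
from typing import Callable

class Harness:
    def __init__(self) -> None:
        self._tools: dict[str, Callable[[str], str]] = {}

    def register(self, name: str, fn: Callable[[str], str]) -> None:
        self._tools[name] = fn

    def execute(self, name: str, input: str) -> str:
        # The model only knows capability names; the binding to a
        # device, container, or OS lives behind this call.
        if name not in self._tools:
            return f"error: unknown tool {name!r}"
        return self._tools[name](input)

h = Harness()
h.register("echo", lambda s: s.upper())
assert h.execute("echo", "hi") == "HI"
assert h.execute("missing", "x").startswith("error:")
```

Because the return type is just a string, swapping a tool's backend never changes the contract the model sees.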
3) Sandbox Is the “Hands,” Not the “Brain”
When brain and hands are decoupled:
- tool environments can evolve independently
- different infrastructure can be integrated in parallel
- not every session needs a fully prewarmed execution environment
This directly improves startup and scalability behavior.
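The decoupling above can be sketched as one interface with interchangeable backends. `LocalSandbox` and `RemoteContainerSandbox` are invented names for illustration; the orchestration loop depends only on the shared protocol.

```python
from typing import Protocol

class Sandbox(Protocol):
    def run(self, command: str) -> str: ...

class LocalSandbox:
    def run(self, command: str) -> str:
        return f"local: ran {command}"

class RemoteContainerSandbox:
    def run(self, command: str) -> str:
        return f"container: ran {command}"

def brain_step(sandbox: Sandbox) -> str:
    # The "brain" never knows which pair of "hands" it is holding.
    return sandbox.run("ls")

assert brain_step(LocalSandbox()).startswith("local:")
assert brain_step(RemoteContainerSandbox()).startswith("container:")
```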
Performance and Security Insights
This split often improves both performance and security.
On performance:
- start the brain first, then provision hands on demand
- reduce Time To First Token (TTFT)
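A toy illustration of lazy provisioning, with invented names: the brain can start emitting tokens immediately, and the sandbox is only created when a tool call first needs it.

```python
class LazySandbox:
    def __init__(self) -> None:
        self._provisioned = False

    def _provision(self) -> None:
        # In a real system this would start a container or VM;
        # here it is just a flag for illustration.
        self._provisioned = True

    def run(self, command: str) -> str:
        if not self._provisioned:
            self._provision()   # hands created on first use, not at session start
        return f"ran {command}"

sandbox = LazySandbox()
first_token = "Thinking..."            # brain responds before any sandbox exists
assert sandbox._provisioned is False
out = sandbox.run("pytest")            # provisioning happens only here
assert sandbox._provisioned is True
```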
On security:
- do not expose high-value credentials directly to the model
- use controlled proxy/vault paths for indirect credential access
- build security boundaries on system constraints, not on assumptions that “the model probably can’t do this”
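The credential pattern can be sketched as follows. `Vault`, `CredentialProxy`, and the handle format are all hypothetical; the invariant being demonstrated is that the model-facing path only ever carries an opaque handle, while the real secret is resolved inside the system boundary.

```python
class Vault:
    def __init__(self) -> None:
        self._secrets: dict[str, str] = {}

    def store(self, name: str, secret: str) -> str:
        self._secrets[name] = secret
        return f"handle:{name}"          # opaque reference, safe to show the model

    def resolve(self, handle: str) -> str:
        return self._secrets[handle.removeprefix("handle:")]

class CredentialProxy:
    """Attaches the real secret server-side; the model never sees it."""
    def __init__(self, vault: Vault) -> None:
        self._vault = vault

    def call_api(self, handle: str, request: str) -> str:
        token = self._vault.resolve(handle)    # resolved inside the boundary
        _ = token                              # used for auth, never echoed back
        return f"request={request} auth=***"

vault = Vault()
handle = vault.store("github", "ghp_secret123")
proxy = CredentialProxy(vault)
assert "ghp_secret123" not in proxy.call_api(handle, "GET /repos")
assert handle == "handle:github"
```

The boundary here is structural: even a model that asks for the secret verbatim cannot receive it, because no model-facing code path returns it.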