<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
    <channel>
        <title>Ralph on KnightLi Blog</title>
        <link>https://www.knightli.com/en/tags/ralph/</link>
        <description>Recent content in Ralph on KnightLi Blog</description>
        <generator>Hugo -- gohugo.io</generator>
        <language>en</language>
        <lastBuildDate>Mon, 27 Apr 2026 08:19:02 +0800</lastBuildDate><atom:link href="https://www.knightli.com/en/tags/ralph/index.xml" rel="self" type="application/rss+xml" /><item>
        <title>Ralph and Multi-Agent Collaboration: How to Keep AI Working Reliably Over Long Tasks</title>
        <link>https://www.knightli.com/en/2026/04/27/ralph-multi-agent-long-running-ai-workflows/</link>
        <pubDate>Mon, 27 Apr 2026 08:19:02 +0800</pubDate>
        
        <guid>https://www.knightli.com/en/2026/04/27/ralph-multi-agent-long-running-ai-workflows/</guid>
        <description>&lt;p&gt;If you have been using coding agents lately, you quickly run into a very practical question: &lt;strong&gt;AI can work, sure, but how do you keep it working for hours without drifting, forgetting requirements, or redoing the same work?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;That is the real question behind many discussions around &lt;code&gt;Ralph&lt;/code&gt; and multi-agent collaboration. The point is not simply to compare which model is stronger. The more useful question is this: &lt;strong&gt;how do you design a workflow that lets AI stay stable during long tasks?&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;If you break the problem down, there are usually two main routes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The &lt;code&gt;Ralph&lt;/code&gt; approach: keep starting fresh sessions and connect context through the filesystem&lt;/li&gt;
&lt;li&gt;The multi-agent approach: let a lead agent coordinate while worker agents split the execution&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Put more simply, the question is not &amp;ldquo;which model is more powerful,&amp;rdquo; but &amp;ldquo;how do you organize AI so it behaves more like a small team that can keep delivering?&amp;rdquo;&lt;/p&gt;
&lt;h2 id=&#34;01-why-long-tasks-go-off-the-rails&#34;&gt;01 Why Long Tasks Go Off the Rails
&lt;/h2&gt;&lt;p&gt;In short tasks, many problems stay hidden. You give an instruction, the model reads a few files, changes a few lines, and the job is done.&lt;/p&gt;
&lt;p&gt;Once the task gets longer, the common failure modes start to pile up:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Conversations grow longer and context starts to bloat&lt;/li&gt;
&lt;li&gt;Earlier requirements get squeezed out by newer information&lt;/li&gt;
&lt;li&gt;One agent has to plan, implement, and test at the same time&lt;/li&gt;
&lt;li&gt;Without a clear acceptance step, &amp;ldquo;it is done&amp;rdquo; often just means &amp;ldquo;it says it is done&amp;rdquo;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;So when AI runs for a long time, the real challenge is often not single-shot model quality. It is &lt;strong&gt;task slicing, state handoff, role separation, and feedback loops&lt;/strong&gt;.&lt;/p&gt;
&lt;h2 id=&#34;02-the-ralph-approach-break-long-tasks-into-short-rounds&#34;&gt;02 The Ralph Approach: Break Long Tasks into Short Rounds
&lt;/h2&gt;&lt;p&gt;&lt;code&gt;Ralph&lt;/code&gt; is a good fit when the main problem is dirty, overloaded context.&lt;/p&gt;
&lt;p&gt;Its core pattern is straightforward:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Keep launching new agent sessions in a loop&lt;/li&gt;
&lt;li&gt;Let each round handle only one sufficiently small task&lt;/li&gt;
&lt;li&gt;Store cross-round state in files instead of forcing everything into one conversation&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The benefit is immediate: every round starts with fresh context, so the session stays more focused and is less likely to get dragged down by old history.&lt;/p&gt;
&lt;p&gt;If you have already looked at &lt;code&gt;Ralph&lt;/code&gt;-style projects, the structure will feel familiar:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Current tasks live in structured files&lt;/li&gt;
&lt;li&gt;Intermediate learnings go into progress files&lt;/li&gt;
&lt;li&gt;Code changes stay in git history&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In other words, &lt;code&gt;Ralph&lt;/code&gt; does not try to make one agent remember everything forever. It externalizes memory on purpose so the session itself can stay lighter.&lt;/p&gt;
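&lt;p&gt;As a rough sketch, the outer loop can fit in a few lines of shell. The &lt;code&gt;ralph_loop&lt;/code&gt; function and the commented-out agent command below are placeholders, not a real implementation:&lt;/p&gt;

```shell
# Minimal sketch of a Ralph-style outer loop. The function name and the
# commented-out agent command are placeholders, not a real script.
ralph_loop() {
  local max=${1:-10}
  for i in $(seq 1 "$max"); do
    # Each round would launch a brand-new agent session here, e.g.:
    #   claude -p "$(cat prompt.md)"
    # The fresh session reads its task and prior learnings from disk,
    # handles one small story, then writes state back to those files.
    echo "round $i: fresh session, one small story"
  done
}

ralph_loop 3
```

The loop itself carries no memory; everything a round needs to know arrives through files, and everything it learns leaves through files.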
&lt;p&gt;This kind of setup works especially well when:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The work can already be split into small stories&lt;/li&gt;
&lt;li&gt;Each story can fit inside one context window&lt;/li&gt;
&lt;li&gt;The project already has tests, typecheck, or other checks&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;It is a solution to the problem of &lt;strong&gt;how to keep AI moving forward one round at a time&lt;/strong&gt;.&lt;/p&gt;
&lt;h2 id=&#34;03-the-multi-agent-approach-split-the-work-one-agent-cannot-handle-alone&#34;&gt;03 The Multi-Agent Approach: Split the Work One Agent Cannot Handle Alone
&lt;/h2&gt;&lt;p&gt;The other route is multi-agent collaboration.&lt;/p&gt;
&lt;p&gt;In this kind of workflow design, the more promising pattern is usually this: the lead agent should not do all the work directly. Instead, it coordinates while other agents handle development, testing, checking, and acceptance.&lt;/p&gt;
&lt;p&gt;That differs from &lt;code&gt;Ralph&lt;/code&gt; in an important way:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;Ralph&lt;/code&gt; feels more like serial iteration&lt;/li&gt;
&lt;li&gt;Multi-agent work feels more like parallel division of labor&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;When the task naturally contains different roles, multi-agent collaboration becomes easier to use. For example:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;One agent breaks down the task and writes the execution plan&lt;/li&gt;
&lt;li&gt;One agent implements the actual change&lt;/li&gt;
&lt;li&gt;One agent tests and validates the result&lt;/li&gt;
&lt;li&gt;One agent checks whether the result still matches the original goal&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The point is not to open more windows for the sake of it. The real value is role separation. Tasks that used to be piled onto one agent can now be split into clearer stages.&lt;/p&gt;
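&lt;p&gt;As a toy sketch, each role can simply be a separate, fresh invocation of a coding agent with its own narrow prompt. The &lt;code&gt;run_agent&lt;/code&gt; helper below is a stand-in for a real CLI call such as &lt;code&gt;claude -p&lt;/code&gt;, not an actual command:&lt;/p&gt;

```shell
# Toy sketch of role separation: one round, four narrow prompts, each a
# fresh session. run_agent is a placeholder, not a real command.
run_agent() { echo "[$1] $2"; }

run_agent planner  "break the story down and write the execution plan"
run_agent builder  "implement only the planned change"
run_agent tester   "run the tests and report what actually failed"
run_agent reviewer "check the result against the original goal"
```

The value is not the four invocations; it is that each prompt stays narrow enough that no single session has to hold the whole task.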
&lt;p&gt;Once the role boundaries are clear, several problems become lighter:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The agent writing the code does not have to be the one reviewing it&lt;/li&gt;
&lt;li&gt;The testing side does not have to reconstruct the full requirement every time&lt;/li&gt;
&lt;li&gt;The lead agent is less likely to drown in implementation detail&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is a solution to the problem of &lt;strong&gt;how to make AI cooperate more like a small team&lt;/strong&gt;.&lt;/p&gt;
&lt;h2 id=&#34;04-the-real-key-is-not-parallelism-but-task-design&#34;&gt;04 The Real Key Is Not Parallelism, but Task Design
&lt;/h2&gt;&lt;p&gt;Whether you choose &lt;code&gt;Ralph&lt;/code&gt; or multi-agent collaboration, the easiest thing to underestimate is this: &lt;strong&gt;workflow design matters more than opening more agents.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;If the task split is wrong, adding more agents only parallelizes the confusion.&lt;/p&gt;
&lt;p&gt;A more stable breakdown usually has a few traits:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;One task maps to one clear objective&lt;/li&gt;
&lt;li&gt;One role owns one category of output&lt;/li&gt;
&lt;li&gt;Every round has a clear done condition&lt;/li&gt;
&lt;li&gt;The output of one round can be consumed directly by the next&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For example, instead of giving AI one giant instruction like &amp;ldquo;build the whole feature,&amp;rdquo; a steadier structure is often:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Break out requirements and boundaries first&lt;/li&gt;
&lt;li&gt;Then split implementation&lt;/li&gt;
&lt;li&gt;Then split testing&lt;/li&gt;
&lt;li&gt;Then make acceptance its own step&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The advantage is that when something goes wrong, it becomes easier to tell whether the problem sits in understanding, implementation, testing, or delivery criteria.&lt;/p&gt;
&lt;h2 id=&#34;05-why-acceptance-matters-so-much&#34;&gt;05 Why Acceptance Matters So Much
&lt;/h2&gt;&lt;p&gt;Many AI workflows fail not because nothing happened earlier, but because the last step lacked a genuinely independent confirmation pass.&lt;/p&gt;
&lt;p&gt;In long tasks, there is often a wide gap between &amp;ldquo;a result was produced&amp;rdquo; and &amp;ldquo;the result is actually usable.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;So one especially important direction is to separate development from acceptance. Even without a complex process, it is worth asking at least these questions:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Did it really complete the original task?&lt;/li&gt;
&lt;li&gt;Did it only patch the surface without fixing the root cause?&lt;/li&gt;
&lt;li&gt;Did testing cover only the happy path?&lt;/li&gt;
&lt;li&gt;Did the upstream requirement get silently changed along the way?&lt;/li&gt;
&lt;/ul&gt;
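&lt;p&gt;One lightweight way to make that layer concrete is to keep the questions in a standalone prompt file and hand it to a separate reviewer session, so the acceptance pass does not inherit the builder&amp;rsquo;s context. The file name and wording below are illustrative, not from any specific project:&lt;/p&gt;

```shell
# Write the acceptance checklist to a standalone prompt file so the
# reviewing session starts from the questions, not the builder's context.
cat > acceptance-prompt.md <<'EOF'
Review the latest change against the original story:
1. Does it really complete the original task?
2. Is the root cause fixed, or only the surface patched?
3. Do the tests cover more than the happy path?
4. Was the upstream requirement silently changed?
EOF
```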
&lt;p&gt;Without that layer, AI can easily keep declaring success inside a long workflow.&lt;/p&gt;
&lt;h2 id=&#34;06-how-to-choose-between-the-two&#34;&gt;06 How to Choose Between the Two
&lt;/h2&gt;&lt;p&gt;If you want a fast rule of thumb:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;If your main pain is context bloat and long-session drift, start with &lt;code&gt;Ralph&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;If your main pain is one agent wearing too many hats, start with multi-agent collaboration&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;More specifically:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;Ralph&lt;/code&gt; fits work that is clear, granular, and easy to move forward round by round&lt;/li&gt;
&lt;li&gt;Multi-agent collaboration fits work with strong role boundaries and a need for parallelism and cross-checking&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In practice, these two approaches are not always competitors. A mature setup often combines them:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Use a &lt;code&gt;Ralph&lt;/code&gt;-style outer loop to push the larger task forward&lt;/li&gt;
&lt;li&gt;Use multi-agent collaboration inside each round for research, implementation, testing, and acceptance&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;That gives you both better control over long context and better collaboration inside a single round.&lt;/p&gt;
&lt;h2 id=&#34;07-one-sentence-summary&#34;&gt;07 One-Sentence Summary
&lt;/h2&gt;&lt;p&gt;What makes these approaches worth studying is not that they recommend &lt;code&gt;Ralph&lt;/code&gt; or multi-agent collaboration in isolation. It is that they make one practical truth very clear: &lt;strong&gt;keeping AI stable over long tasks depends less on the model itself and more on whether you designed context, tasks, roles, and acceptance well.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;If you are already asking &lt;code&gt;Claude Code&lt;/code&gt;, &lt;code&gt;Codex&lt;/code&gt;, or other coding agents to handle longer real-world tasks, this kind of workflow thinking is often more valuable than simply switching to a stronger model.&lt;/p&gt;
</description>
        </item>
        <item>
        <title>What Ralph Is: Turning Claude Code and Amp into a Repeatable Autonomous Development Loop</title>
        <link>https://www.knightli.com/en/2026/04/27/ralph-autonomous-agent-loop-claude-code-amp/</link>
        <pubDate>Mon, 27 Apr 2026 08:08:55 +0800</pubDate>
        
        <guid>https://www.knightli.com/en/2026/04/27/ralph-autonomous-agent-loop-claude-code-amp/</guid>
        <description>&lt;p&gt;If you have been paying attention to long-running coding agent workflows lately, &lt;code&gt;snarktank/ralph&lt;/code&gt; is a project worth a close look. It is not another model wrapper or another chat UI. Instead, it organizes &lt;code&gt;Claude Code&lt;/code&gt; or &lt;code&gt;Amp&lt;/code&gt; into an autonomous loop that keeps running through stories in a &lt;code&gt;PRD&lt;/code&gt; until everything is done.&lt;/p&gt;
&lt;p&gt;Its core idea is simple: &lt;strong&gt;do not force the same agent to keep working inside an increasingly long and messy context. Start a brand-new AI coding session for every iteration instead.&lt;/strong&gt; That keeps context from bloating and makes task boundaries much clearer.&lt;/p&gt;
&lt;h2 id=&#34;01-what-ralph-is&#34;&gt;01 What Ralph Is
&lt;/h2&gt;&lt;p&gt;Ralph describes itself very clearly: it is an autonomous AI agent loop that repeatedly runs an AI coding tool until the items in a &lt;code&gt;PRD&lt;/code&gt; are complete.&lt;/p&gt;
&lt;p&gt;The repository currently supports two tools:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;Amp CLI&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Claude Code&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Each iteration starts a fresh instance. In other words, it does not depend on one endlessly extended conversation. Instead, it keeps memory in external state:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;git history&lt;/li&gt;
&lt;li&gt;&lt;code&gt;progress.txt&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;prd.json&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;That detail matters a lot. When people let an agent run on large tasks, the main problem is often not that the model cannot code. It is that the session becomes heavier over time, starts losing context, forgets requirements, and repeats work. Ralph is designed almost entirely around that problem.&lt;/p&gt;
&lt;h2 id=&#34;02-how-it-works&#34;&gt;02 How It Works
&lt;/h2&gt;&lt;p&gt;Ralph&amp;rsquo;s workflow has three steps.&lt;/p&gt;
&lt;h3 id=&#34;1-write-a-prd-first&#34;&gt;1. Write a PRD first
&lt;/h3&gt;&lt;p&gt;The README suggests starting with the bundled &lt;code&gt;prd&lt;/code&gt; skill to generate a requirements document and break the feature into smaller stories.&lt;/p&gt;
&lt;h3 id=&#34;2-convert-the-prd-into-prdjson&#34;&gt;2. Convert the PRD into &lt;code&gt;prd.json&lt;/code&gt;
&lt;/h3&gt;&lt;p&gt;Then the &lt;code&gt;ralph&lt;/code&gt; skill converts the Markdown PRD into a structured &lt;code&gt;prd.json&lt;/code&gt;. That file stores the user stories and whether each one has passed.&lt;/p&gt;
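&lt;p&gt;Only two fields surface directly in this description (&lt;code&gt;branchName&lt;/code&gt; and a per-story &lt;code&gt;passes&lt;/code&gt; flag), so a hypothetical &lt;code&gt;prd.json&lt;/code&gt; might look roughly like the following; every other field name here is a guess for illustration:&lt;/p&gt;

```json
{
  "branchName": "ralph/invoice-filters",
  "stories": [
    { "id": "US-1", "title": "Add a status filter to the invoice list", "passes": false },
    { "id": "US-2", "title": "Persist the selected filter in the URL", "passes": false }
  ]
}
```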
&lt;h3 id=&#34;3-run-the-loop-script&#34;&gt;3. Run the loop script
&lt;/h3&gt;&lt;p&gt;The actual execution is handled by &lt;code&gt;ralph.sh&lt;/code&gt;. The commands look like this:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;./scripts/ralph/ralph.sh &lt;span class=&#34;o&#34;&gt;[&lt;/span&gt;max_iterations&lt;span class=&#34;o&#34;&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;./scripts/ralph/ralph.sh --tool claude &lt;span class=&#34;o&#34;&gt;[&lt;/span&gt;max_iterations&lt;span class=&#34;o&#34;&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;The default is 10 iterations. In each round, Ralph roughly does the following:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Create a branch from &lt;code&gt;branchName&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Pick the highest-priority story where &lt;code&gt;passes: false&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Implement only that story&lt;/li&gt;
&lt;li&gt;Run quality checks such as typecheck and tests&lt;/li&gt;
&lt;li&gt;Commit if the checks pass&lt;/li&gt;
&lt;li&gt;Update &lt;code&gt;prd.json&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Append learnings to &lt;code&gt;progress.txt&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Continue to the next round&lt;/li&gt;
&lt;/ol&gt;
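&lt;p&gt;Steps 4 through 7 are the part worth internalizing even outside Ralph: nothing is committed or marked done unless the checks pass. A minimal sketch, with &lt;code&gt;run_checks&lt;/code&gt; as a placeholder for a project&amp;rsquo;s real gates such as &lt;code&gt;npm run typecheck &amp;amp;&amp;amp; npm test&lt;/code&gt;:&lt;/p&gt;

```shell
# Sketch of the per-round quality gate (steps 4-7). run_checks stands in
# for the project's real typecheck/test commands.
run_checks() { true; }

if run_checks; then
  echo "checks passed: commit, then mark the story passes:true in prd.json"
else
  echo "checks failed: the story stays passes:false for the next round"
fi
echo "either way: append learnings to progress.txt"
```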
&lt;p&gt;So Ralph is not trying to finish everything in one go. It compresses work into many small loops that can fit inside a single context window.&lt;/p&gt;
&lt;h2 id=&#34;03-what-makes-ralph-interesting&#34;&gt;03 What Makes Ralph Interesting
&lt;/h2&gt;&lt;h3 id=&#34;1-every-round-uses-fresh-context&#34;&gt;1. Every round uses fresh context
&lt;/h3&gt;&lt;p&gt;This is Ralph&amp;rsquo;s defining design choice. The README emphasizes that every iteration is a brand-new AI instance, and cross-iteration memory lives only in git, &lt;code&gt;progress.txt&lt;/code&gt;, and &lt;code&gt;prd.json&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;That is very different from the common pattern of keeping &lt;code&gt;Claude Code&lt;/code&gt; or another tool inside one long conversation. Once tasks get larger, that approach often slows down under its own history and gradually loses focus. Ralph accepts that no single round should remember everything, then moves memory into files instead.&lt;/p&gt;
&lt;h3 id=&#34;2-it-forces-tasks-to-stay-small&#34;&gt;2. It forces tasks to stay small
&lt;/h3&gt;&lt;p&gt;The docs explicitly say that each PRD item must be small enough to finish within one context window. Tasks like adding a filter, updating a server action, or adding a database column are about the right size. Tasks like rebuilding the whole API or creating an entire dashboard are too large.&lt;/p&gt;
&lt;p&gt;That constraint is practical. Many autonomous agent loops fail not because the loop is bad, but because the task slicing is too coarse and each round carries too much at once.&lt;/p&gt;
&lt;h3 id=&#34;3-it-preserves-learnings-not-just-code&#34;&gt;3. It preserves learnings, not just code
&lt;/h3&gt;&lt;p&gt;Beyond &lt;code&gt;progress.txt&lt;/code&gt;, the README also stresses updating &lt;code&gt;AGENTS.md&lt;/code&gt;. The reason is straightforward: future iterations and future developers will read those notes, so patterns, gotchas, and conventions discovered in each round should be written down in the project itself.&lt;/p&gt;
&lt;p&gt;Put differently, Ralph is not only trying to keep an agent coding continuously. It is also trying to help the agent build working memory about the codebase over time.&lt;/p&gt;
&lt;h2 id=&#34;04-when-it-fits-best&#34;&gt;04 When It Fits Best
&lt;/h2&gt;&lt;p&gt;Ralph is a good fit when your task looks like this:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;It can already be broken into a clear set of user stories&lt;/li&gt;
&lt;li&gt;The codebase has reliable feedback loops such as tests, typecheck, or CI&lt;/li&gt;
&lt;li&gt;You want the agent to keep moving forward without putting everything into one long conversation&lt;/li&gt;
&lt;li&gt;You are fine with iterative progress instead of demanding a one-shot completion&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;On the other hand, if the requirement is still vague, or the work depends on frequent discussion and constant changes of direction, Ralph may not be the first thing to reach for. It fits better once the requirements are already shaped and execution needs to be steady.&lt;/p&gt;
&lt;h2 id=&#34;05-how-it-differs-from-normal-claude-code-usage&#34;&gt;05 How It Differs from Normal Claude Code Usage
&lt;/h2&gt;&lt;p&gt;With plain &lt;code&gt;Claude Code&lt;/code&gt;, the usual pattern is simple: open a session and let it keep reading code, editing files, and running commands. That works very well for small and medium tasks, but larger tasks often hit two problems:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Context keeps growing&lt;/li&gt;
&lt;li&gt;Intermediate decisions are harder to preserve in a structured way&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Ralph turns &lt;code&gt;Claude Code&lt;/code&gt; or &lt;code&gt;Amp&lt;/code&gt; into something closer to a batch executor:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The task source is &lt;code&gt;prd.json&lt;/code&gt;, not ad hoc chat instructions&lt;/li&gt;
&lt;li&gt;Each iteration takes on only one story&lt;/li&gt;
&lt;li&gt;Completion state is written back to files&lt;/li&gt;
&lt;li&gt;Learnings go into &lt;code&gt;progress.txt&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Code changes are preserved in git&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;So in practice, it feels less like a new AI assistant and more like an iteration controller added on top of a coding agent.&lt;/p&gt;
&lt;h2 id=&#34;06-one-important-requirement&#34;&gt;06 One Important Requirement
&lt;/h2&gt;&lt;p&gt;Whether Ralph works well depends less on the loop itself and more on the quality of your feedback loops. The README says this very directly: without typecheck, tests, and CI, errors will compound across later iterations.&lt;/p&gt;
&lt;p&gt;For frontend tasks, the repository even recommends adding browser verification to the acceptance criteria. Without real verification, an agent can easily confuse &amp;ldquo;it looks done&amp;rdquo; with &amp;ldquo;it actually works.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;That point is important. Ralph is not magical automation. It is more like a force multiplier for the engineering discipline you already have. If your project already has clear task breakdowns and reliable checks, Ralph becomes much more useful. If those foundations are missing, the loop will only repeat the confusion.&lt;/p&gt;
&lt;h2 id=&#34;07-one-sentence-summary&#34;&gt;07 One-Sentence Summary
&lt;/h2&gt;&lt;p&gt;What makes &lt;code&gt;Ralph&lt;/code&gt; worth studying is not that it introduces a huge amount of new infrastructure. It takes a simple but useful idea and turns it into a practical workflow: &lt;strong&gt;let &lt;code&gt;Claude Code&lt;/code&gt; or &lt;code&gt;Amp&lt;/code&gt; handle one small story per round, keep focus with fresh context, and preserve continuity through &lt;code&gt;git&lt;/code&gt;, &lt;code&gt;prd.json&lt;/code&gt;, and &lt;code&gt;progress.txt&lt;/code&gt;.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;If you are already using coding agents in real projects and keep getting stuck on how to push long tasks forward reliably, Ralph&amp;rsquo;s approach is well worth borrowing.&lt;/p&gt;
&lt;h2 id=&#34;references&#34;&gt;References
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;GitHub repository: &lt;a class=&#34;link&#34; href=&#34;https://github.com/snarktank/ralph&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://github.com/snarktank/ralph&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Interactive flowchart: &lt;a class=&#34;link&#34; href=&#34;https://snarktank.github.io&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://snarktank.github.io&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
        </item>
        
    </channel>
</rss>
