Use Tests and Behavior Descriptions to Keep AI Coding Under Control

Tue, 05 May 2026 14:35:38 +0800

When you use AI to write code, the common pattern is easy to recognize: the beginning feels fast, and the later stages get messy. A feature can be scaffolded quickly at first, but once the project grows and the number of changes increases, fixing one bug can easily create three more.

This is not entirely an AI problem. Many human developers write code this way too. AI simply writes faster, so the problems surface faster. To reduce this loss of control, the key is not to make AI “try harder”, but to give it clearer boundaries: define what counts as correct first, then ask it to implement.

TDD and BDD fit naturally into an AI coding workflow. TDD turns “is this correct?” into automated tests. BDD turns “is this the feature I actually want?” into behavior descriptions that humans can read. Used together, they reduce guessing, limit free interpretation, and make the result easier to review.

What TDD Solves

TDD stands for Test-Driven Development. Its basic sequence is:

Write the test first.
Run the test and confirm that it fails.
Write the feature code.
Keep adjusting the implementation until the test passes.

This is the opposite of how many people naturally work. If you are writing a sorting function, the intuitive approach is to write the function first, then try a few inputs and see whether the results look right. TDD asks you to write the expected behavior as tests first. For example, the input [3, 1, 2] should produce [1, 2, 3], an empty array should produce an empty array, and an array with duplicate values should still be sorted correctly.
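
As a minimal sketch, those three expectations could be written as plain assertion-style tests in Python. The name `sort_numbers` is hypothetical, and its body is only a placeholder so the example runs; in a real TDD cycle the tests would exist, and fail, before that body is written.

```python
def sort_numbers(values):
    """Hypothetical function under test (placeholder implementation)."""
    return sorted(values)


def test_unordered_input_is_sorted():
    # Business rule: results come back in ascending order.
    assert sort_numbers([3, 1, 2]) == [1, 2, 3]


def test_empty_input_returns_empty_list():
    # Edge case: nothing in, nothing out.
    assert sort_numbers([]) == []


def test_duplicates_are_kept_and_sorted():
    # Edge case: duplicate values must not be dropped or reordered wrongly.
    assert sort_numbers([2, 3, 2, 1]) == [1, 2, 2, 3]
```

A runner such as pytest can collect functions named `test_*` from a test file, and the functions can also simply be called directly.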

The point is that the correct result is defined before development begins. Later, no matter who changes the code, rerunning the tests tells you whether previously agreed behavior has been broken.

Why TDD Used to Be Hard to Keep Up

TDD sounds great, but it is not easy to practice consistently in real projects.

First, it feels counterintuitive. When facing an empty file, many people would rather write the feature first than write tests first. This is especially true when the requirement is still unclear, because test cases are hard to write when the behavior itself is fuzzy.

Second, requirements change quickly. A dozen carefully written tests today may need to be rewritten tomorrow after the requirement changes. In the short term, TDD can slow the development rhythm.

Third, tests have their own cost. Test code does not appear out of nowhere. In the past, developers had to write it, maintain it, and explain its value. In teams that only care about short-term delivery speed, this work is easy to squeeze out.

AI changes that cost structure. Turning requirements into test code is exactly the kind of work AI is good at. Asking AI to implement against tests is also far more reliable than asking it to freely interpret a vague paragraph.

How to Use TDD When AI Writes Code

When using AI to build a feature, change the prompt from “implement this feature for me” into this sequence:

Ask AI to list test cases from the requirement first.
Require each test case to include a plain-language explanation.
Review whether the test cases match the real requirement.
After confirming the tests, ask AI to implement the feature.
Ask AI to run the tests and keep fixing the implementation based on the failures.

At this point, the main thing you review is no longer a large block of implementation code. Instead, you review whether the tests describe the requirement clearly. Test cases usually boil down to “what is the input, what should the output be, and how should edge cases behave”, which makes them much easier to review than implementation logic.

For example, you can ask AI like this:

Do not implement the feature yet.
Write test cases based on the requirement below. Add a plain-language comment to each test case explaining the business rule it covers.
After the tests are confirmed, implement the code according to the tests.

This workflow reduces two common problems: AI drifting away from the requirement while coding, and later changes breaking old behavior.

TDD Is Not Enough

TDD alone still leaves two gaps.

The first gap is that passing tests does not mean the product actually meets expectations. Tests only prove that the code satisfies the rules written into the tests. If the tests themselves fail to express the user need clearly, the code may still “correctly do the wrong thing”.
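
Here is a small illustration of that gap. Both the `rank_scores` helper and the requirement (a leaderboard that should show the highest score first) are hypothetical, chosen only to show how a passing test can encode the wrong rule.

```python
def rank_scores(scores):
    """Hypothetical helper. Assumed real need: highest score first."""
    # Ascending order: satisfies the test below, but not the assumed need.
    return sorted(scores)


def test_rank_scores_orders_values():
    # The test itself encodes "ascending", so it passes
    # even though the user wanted the highest score on top.
    assert rank_scores([30, 10, 20]) == [10, 20, 30]
```

The test is green and the code is "correct" by its own definition. Only someone who knows the real requirement can notice that the expected list is the wrong way around.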

The second gap is that test code is still unfriendly to non-technical users. Even with plain-language comments, many people do not want to read through a pile of unit tests. The more product-oriented a requirement is, the harder it is to confirm from test code alone that “this is what I wanted”.

That is where BDD helps.

What BDD Solves

BDD stands for Behavior-Driven Development. It focuses less on how code is written internally and more on how the system should behave in a given scenario.

BDD often uses the Given / When / Then format:

Given: a specific starting state.
When: an action performed by the user or system.
Then: the expected result.

For example, a game character with a lifesteal effect can be described like this:

Given there is a vampire on the board with 1 remaining HP, 2 attack, and 5 max HP
And an adjacent enemy unit has 10 remaining HP
When the vampire attacks that enemy unit
Then the enemy unit has 8 remaining HP
And the vampire recovers to 3 HP

This is not code, but it is much more precise than “recover health when attacking an enemy”. It describes the initial state, the action, and the result. It also exposes rules that need clarification: if the enemy only has 1 HP left, should the vampire recover based on damage dealt or attack value? If the vampire is already at full health, what happens to excess healing?
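
To show how directly such a scenario maps onto a test, here is a minimal Python sketch. The `Unit` class and `lifesteal_attack` function are hypothetical names, and the enemy's attack and max HP are placeholder values because the scenario does not state them. The implementation deliberately ignores the open questions (overkill, full health), since those rules are not decided yet.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    hp: int
    attack: int
    max_hp: int


def lifesteal_attack(attacker: Unit, target: Unit) -> None:
    """Hypothetical rule covering only the written scenario:
    deal damage equal to attack, then heal by the same amount."""
    target.hp -= attacker.attack
    attacker.hp += attacker.attack


def test_vampire_heals_when_attacking():
    # Given a vampire with 1 HP, 2 attack, and 5 max HP
    vampire = Unit(hp=1, attack=2, max_hp=5)
    # And an adjacent enemy with 10 HP (attack and max HP are placeholders)
    enemy = Unit(hp=10, attack=0, max_hp=10)
    # When the vampire attacks that enemy
    lifesteal_attack(vampire, enemy)
    # Then the enemy has 8 HP, and the vampire recovers to 3 HP
    assert enemy.hp == 8
    assert vampire.hp == 3
```

Each Given / When / Then line becomes setup, action, or assertion, so the test can be checked against the scenario line by line.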

The earlier these questions appear, the less AI has to guess later.

Why BDD Fits AI So Well

BDD also used to have a high adoption cost. It asks product, engineering, and testing teams to communicate with the same behavior descriptions. In reality, many teams do not have that collaboration habit.

In the AI era, the cost of BDD drops. You can start with a rough requirement such as:

After the vampire attacks an enemy, it recovers health equal to the damage dealt.

Then ask AI to generate Given / When / Then scenarios. A good AI will add edge cases and ask about unclear rules. Your job is to confirm those behavior descriptions, not to read the implementation code directly.

Once the behavior descriptions are clear, ask AI to convert them into tests, and then implement the feature based on those tests. The path becomes much smoother.
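
Continuing the lifesteal example, this is a sketch of what confirmed edge-case scenarios might look like as tests. The `Unit` and `lifesteal_attack` names are hypothetical. The healing rule (equal to damage dealt) comes from the rough requirement above, while the other two rules are assumptions that a human would have to confirm first: damage cannot exceed the enemy's remaining HP, and healing above max HP is lost.

```python
from dataclasses import dataclass


@dataclass
class Unit:
    hp: int
    attack: int
    max_hp: int


def lifesteal_attack(attacker: Unit, target: Unit) -> None:
    # Assumed rule: damage cannot exceed the target's remaining HP.
    damage = min(attacker.attack, target.hp)
    target.hp -= damage
    # Rule from the requirement: heal by the damage actually dealt.
    # Assumed rule: healing is capped at max HP; the excess is lost.
    attacker.hp = min(attacker.max_hp, attacker.hp + damage)


def test_heal_is_based_on_damage_dealt_not_attack_value():
    # Given a vampire with 1 HP and 2 attack, and an enemy with only 1 HP
    vampire = Unit(hp=1, attack=2, max_hp=5)
    enemy = Unit(hp=1, attack=0, max_hp=10)
    # When the vampire attacks
    lifesteal_attack(vampire, enemy)
    # Then only 1 damage is dealt, so only 1 HP is recovered
    assert enemy.hp == 0
    assert vampire.hp == 2


def test_excess_healing_is_lost_at_full_health():
    # Given a vampire already at full health
    vampire = Unit(hp=5, attack=2, max_hp=5)
    enemy = Unit(hp=10, attack=0, max_hp=10)
    # When the vampire attacks
    lifesteal_attack(vampire, enemy)
    # Then the enemy still takes damage, but the vampire stays at max HP
    assert enemy.hp == 8
    assert vampire.hp == 5
```

If the confirmed answers turn out differently, only the expected values in these tests change, and the implementation then follows the tests.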

A More Reliable AI Coding Workflow

In practice, you can chain BDD and TDD together:

Write the requirement in natural language.
Ask AI to convert it into BDD behavior scenarios.
Confirm whether the Given / When / Then scenarios match your expectation.
Ask AI to convert the behavior scenarios into automated tests.
Quickly review test coverage.
Ask AI to implement the feature.
Run the tests. If they fail, ask AI to fix the code based on the errors.
Finish with manual acceptance and code review.

The key is the order. Do not ask AI to write the full implementation at the beginning. First ask it to turn the requirement into reviewable behavior, then into executable tests. This leaves much less room for free interpretation.

You can use a prompt like this:

Handle this requirement using a BDD + TDD workflow.

Step 1: First organize the requirement into Given / When / Then behavior scenarios. Do not write code.
Step 2: List any unclear rules you find and ask me to confirm them.
Step 3: After the behavior scenarios are confirmed, convert them into test cases.
Step 4: After the tests are confirmed, implement the feature.
Step 5: Run the tests and fix failures until all tests pass.

This kind of prompt is not complicated, but it can noticeably change how AI works. It narrows the requirement first, then moves into implementation, instead of immediately producing code that looks complete but is hard to verify.

Where to Use It First

BDD + TDD is not necessary for every task. For one-off scripts, temporary data processing, or small style tweaks, the full workflow may be too heavy.

It is better suited to these cases:

Business rules are numerous and easy to misunderstand.
There are many edge cases, and the feature will continue to change.
The feature is logic-heavy, such as games, billing, permissions, state machines, or form validation.
Multiple people need to confirm the requirement together.
The code will be maintained for a long time, not generated once and thrown away.
The project already shows signs of AI making things messier after each change.

If you only need AI to change the text on a button, you do not need the full workflow. But if you are building a character skill system, order state transitions, permission checks, or points rules, writing behavior scenarios and tests first is usually worth it.

What to Watch Out For

First, more tests are not always better. Tests should cover key rules and high-risk boundaries, not lock every implementation detail in place. Otherwise, even a small requirement change can turn the tests into a maintenance burden.
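
The difference can be sketched with a hypothetical `PointsAccount` class (the class and its private `_entries` list are invented for illustration). The first test checks a rule; the second pins an implementation detail.

```python
class PointsAccount:
    """Hypothetical points account; the list is an internal detail."""

    def __init__(self):
        self._entries = []

    def earn(self, points):
        self._entries.append(points)

    def balance(self):
        return sum(self._entries)


def test_balance_is_sum_of_earned_points():
    # Behavior test: checks only the rule a user cares about.
    account = PointsAccount()
    account.earn(10)
    account.earn(5)
    assert account.balance() == 15


def test_entries_list_layout():
    # Brittle test: pins the private storage format. Switching to a
    # running total would break it although no visible rule changed.
    account = PointsAccount()
    account.earn(10)
    account.earn(5)
    assert account._entries == [10, 5]
```

Both tests pass today, but only the first one would survive a refactor of how points are stored.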

Second, BDD scenarios must be specific. Do not write unverifiable descriptions like “the system should work normally” or “the experience should be smooth”. Be clear about the state, the action, and the expected result.

Third, humans still need to review. AI can generate tests and behavior scenarios, but it does not know the product tradeoffs you actually want. Boundary rules in particular must be confirmed by a human.

Fourth, after the tests pass, you still need to run the feature for real. Automated tests can catch logic problems, but interface experience, performance, interaction details, and the overall feel still need manual acceptance.

Summary

AI writes code quickly, but speed is not the same as stability. The more complex the requirement is, the less you should rely on a single “help me implement this” prompt. A better approach is to break the requirement into reviewable behavior, turn that behavior into executable tests, and then let AI implement against those tests.

TDD tells AI what counts as correct. BDD makes it easier for humans to confirm whether the feature is actually what they wanted. Together, they are not about adding ceremony. They are about reducing the space for AI to guess, turning “writes fast” into “changes safely”.
