GPT-5.5 Prompt Migration Guide: Why old prompts should be trimmed before being rewritten

A practical summary of OpenAI's GPT-5.5 prompting guide: shorter outcome-first prompts, reasoning effort, preambles and phase, retrieval budgets, validation rules, and what to remove first when migrating old prompts.

OpenAI has updated the GPT-5.5 prompting guide in its API documentation. The most useful part of the guide is not that it gives yet another longer prompt template, but that it reminds developers of something easy to miss: when migrating to GPT-5.5, many old prompts should become shorter.

Official documentation: https://developers.openai.com/api/docs/guides/prompt-guidance

In one sentence, the prompting direction for GPT-5.5 is: write less process and more outcome; stack fewer rules and define acceptance criteria better; use fewer “always must” instructions and specify when to stop, when to validate, and when to gather more evidence.

Why old prompts need rewriting

Many production prompts are built layer by layer. When a model is unstable, one rule is added. When tool use fails, another prohibition is added. When output gets verbose, another formatting paragraph appears. Over time, a system prompt becomes a heavy operations manual.

That style can be useful with older models, because the model may need more step-by-step constraints to stay on track. But with GPT-5.5, OpenAI’s advice is clear: do not move the old prompt stack over unchanged.

Over-specifying the process brings several side effects:

  • More noise: the model must find the truly important constraints inside many old rules.
  • Narrower search space: the model becomes less willing to choose a more efficient solution.
  • Mechanical output: it looks like script execution instead of problem solving.
  • Conflicting old rules: tool calls and final answers can both become worse.

GPT-5.5 is better served by prompts that describe the target state, constraints, available evidence, and final output, instead of hard-coding every step.

Outcome-first: define what "done" means first

The official documentation repeatedly emphasizes one direction: GPT-5.5 works best with outcome-first prompts.

That means the prompt should first define:

  • What the target result is.
  • What counts as success.
  • Which constraints cannot be crossed.
  • What context is currently available.
  • Which fields or sections the final answer must include.
  • What to do when evidence is insufficient.

A less recommended style is:

First check A, then check B, then compare all fields, then consider every exception, then decide which tool to call, then call the tool, and finally explain the full process.

A better style for GPT-5.5 is:

Solve the user's problem. Success criteria:
- Make the decision based on available policy and account data
- If the action is allowed, complete it before replying
- Final output includes completed_actions, customer_message, blockers
- If key evidence is missing, ask only for the smallest necessary fields

This does not make the prompt vague. It moves control from “process order” to “outcome and boundaries.” The model can choose its own search, reasoning, and tool-use path, but it must satisfy the success criteria.

Use fewer absolute rules and more decision rules

Old prompts often contain many instances of ALWAYS, NEVER, must, and only. These words are not forbidden, but they should be reserved for constraints that truly cannot be violated, such as safety rules, required fields, and prohibited actions.

For decisions like “when to search,” “when to ask the user,” “when to keep iterating,” and “when to stop,” GPT-5.5 is better served by decision rules.

For example, instead of writing:

Always search three times first.

Write:

Start with one search that covers the core question. If the first few results already support the key facts, stop searching and answer. Continue searching only when evidence is conflicting, missing, or insufficient to support the conclusion.

This gives the model room to decide, and it also gives it a stopping condition. For products that use web search, retrieval, file search, or database queries, this matters because every additional tool call adds latency and cost.

Add a retrieval budget

One type of rule worth adding to GPT-5.5 prompts is a retrieval budget.

This is not a money budget. It is a retrieval stopping rule. It tells the model when evidence is sufficient, when to keep looking, and when to admit that evidence is missing.

A practical version:

For ordinary Q&A, start with one broad search using short and distinctive keywords. If the first few results already support the core request, answer based on those results and do not continue searching. Add more retrieval only when results conflict, key facts are missing, or the conclusion cannot be supported.

This kind of rule reduces two common problems:

  • Too little search, producing answers without evidence.
  • Too much search, wasting time in a tool loop.

More importantly, the documentation also reminds us that failing to find evidence should not automatically become a factual “no.” Sometimes the right behavior is to state that the evidence is insufficient, or narrow the question and continue checking.
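The retrieval-budget rule above can also be mirrored in application code when your agent loop decides whether to issue another search. A minimal sketch, assuming a simplified result shape; the `supports_claim` and `conflicts` flags are illustrative names, not fields from any real search API:

```python
# Illustrative "retrieval budget" as an explicit stopping decision.
# Result dicts and flag names are assumptions for this sketch.

def should_continue_search(results: list[dict], max_searches: int, searches_done: int) -> str:
    """Return 'answer', 'search_again', or 'report_insufficient'."""
    if searches_done >= max_searches:
        # Budget exhausted: admit missing evidence instead of looping.
        if any(r.get("supports_claim") for r in results):
            return "answer"
        return "report_insufficient"
    if not results:
        return "search_again"
    supports = [r for r in results if r.get("supports_claim")]
    conflicts = [r for r in results if r.get("conflicts")]
    if supports and not conflicts:
        return "answer"          # evidence sufficient: stop searching
    return "search_again"        # conflicting or missing evidence: one more pass
```

Note that running out of budget without support maps to "report insufficient evidence", not to a factual "no", matching the guidance above.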

Do not raise reasoning effort too early

GPT-5.5 is more reasoning-efficient, so when quality falls short, OpenAI recommends re-evaluating the low and medium settings before immediately raising reasoning effort.

A steadier order is:

  1. First check whether the prompt clearly defines the goal, output format, and stop conditions.
  2. Add a validation loop, such as tests, citations, review, or render checks.
  3. Add persistence rules and completion criteria for tool use.
  4. Only then raise reasoning effort if the task still needs it.

In other words, reasoning.effort is more like a final tuning knob. It should not replace clear prompt design.

For short classification, field extraction, support ticket routing, or format conversion, start with lower reasoning cost. For long-document synthesis, conflicting-source judgment, strategy writing, or complex research, consider medium or higher.
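The escalation order above can be expressed as a small decision helper that names the next fix before touching reasoning effort. A sketch only; the flag names describing prompt state are assumptions for illustration, not API fields:

```python
# Illustrative helper: which tuning step to try next, in the order the
# article recommends. The prompt_state flags are hypothetical names.

def next_tuning_step(prompt_state: dict) -> str:
    """prompt_state flags which elements the prompt already defines."""
    if not prompt_state.get("goal_and_stop_conditions"):
        return "clarify goal, output format, and stop conditions"
    if not prompt_state.get("validation_loop"):
        return "add a validation loop (tests, citations, review)"
    if not prompt_state.get("tool_completion_criteria"):
        return "add persistence rules and completion criteria for tool use"
    return "raise reasoning.effort"
```

Only a prompt that already defines all three earlier elements reaches the "raise reasoning.effort" step.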

text.verbosity controls output, not thinking

GPT-5.5 is highly controllable in output format. The official documentation recommends using text.verbosity together with the output requirements in the prompt.

The default text.verbosity is medium. If the product needs shorter, cleaner replies, use low. But that does not mean every part of the result should become short.

A typical pattern:

  • Keep user-facing status updates and final summaries short.
  • Still require readability when generating code, configuration, or structured results.
  • Do not sacrifice field completeness, citations, or necessary caveats just to be “brief.”

This is especially useful for code products. Chat replies can be shorter, while generated code can still require readable variable names, clear structure, and necessary comments.

Preamble and phase: making progress visible in long tasks

In complex tasks, GPT-5.5 may first reason, plan, or prepare tool calls before producing visible text. For streaming products, users can feel the first-token delay.

The official recommendation is: for multi-step, tool-heavy, or long-running tasks, let the model send a short preamble first. It does not need to explain the full plan; it only needs to tell the user what it will do first.

For example:

I will first inspect the relevant files and existing configuration, then suggest the changes.

In long-running or tool-heavy Responses API workflows, also pay attention to the assistant item’s phase. If the application uses previous_response_id, the API keeps prior assistant state automatically. If the application manually replays assistant output, it must preserve the original phase value.

Common conventions:

  • phase: "commentary": intermediate status update.
  • phase: "final_answer": final answer.
  • Do not add phase to user messages.

This may look like a low-level implementation detail, but it matters for products with tool calls, status updates, and final answers. Losing phase during manual replay can make the model confuse progress updates with final conclusions.
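When replaying history manually, the rule reduces to: copy phase through on assistant items, and never add it to user messages. A sketch under that assumption, using simplified item dictionaries rather than the full Responses API item schema:

```python
# Illustrative manual replay that preserves "phase" on assistant items.
# Item shapes here are simplified assumptions, not the full API schema.

def replay_history(prior_items: list[dict]) -> list[dict]:
    """Build the next request's input list, keeping phase where it existed."""
    replayed = []
    for item in prior_items:
        entry = {"role": item["role"], "content": item["content"]}
        # Preserve phase only on assistant items; never add it to user messages.
        if item["role"] == "assistant" and "phase" in item:
            entry["phase"] = item["phase"]
        replayed.append(entry)
    return replayed
```

Dropping `phase` here is exactly the failure mode described above: a "commentary" status update replayed without its phase can later be read as a final conclusion.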

Prompt the model to check its work

Another very practical point in the GPT-5.5 guide: for tasks that can be verified, give the model validation tools and validation rules.

For code agents, explicitly require:

  • Run relevant unit tests after making changes.
  • Run type checks or lint when necessary.
  • Run build when the affected package is large.
  • If full validation is too expensive, at least do the smallest smoke test.
  • If validation cannot run, explain why and give the next best check.

For visual or page outputs, require rendering first, then checking layout, cropping, spacing, missing content, and visual consistency.

For engineering plans, require mappings to requirements, affected files/APIs/systems, state transitions, validation commands, failure behavior, privacy and security considerations, and open questions that truly affect implementation.

These rules are much more effective than “please be careful.” They turn “careful” into executable checks.
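A validation rule is only executable if the agent harness can actually run a check and report the result. A minimal sketch of such a harness; the commands are placeholders to be replaced with your project's real test runner:

```python
# Minimal validation harness: run the cheapest available check and report
# pass/fail instead of guessing. Commands are illustrative placeholders.
import subprocess
import sys

def run_smoke_test(command: list[str]) -> dict:
    """Run one validation command; report the outcome honestly."""
    try:
        proc = subprocess.run(command, capture_output=True, text=True, timeout=60)
        return {"passed": proc.returncode == 0, "output": proc.stdout + proc.stderr}
    except (OSError, subprocess.TimeoutExpired) as exc:
        # If validation cannot run, explain why (the guide's fallback rule).
        return {"passed": False, "output": f"could not run validation: {exc}"}
```

The except branch implements the last bullet above: when validation cannot run, say so rather than claiming success.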

A prompt skeleton better suited for GPT-5.5

The structure in OpenAI’s docs can be simplified like this:

# Role
What role you are playing and what context you are working in.

# Personality
Tone, collaboration style, whether warmth or point of view is needed.

# Goal
The user-visible target result.

# Success criteria
Conditions that must be satisfied before the final answer.

# Constraints
Safety, business, evidence, permission, cost, and side-effect boundaries.

# Output
Output structure, length, tone, and required fields.

# Stop rules
When to continue, retry, degrade, ask, or stop.

The point of this skeleton is not that every prompt must use all these headings. The real idea is that prompts for complex tasks should tell the model the destination, boundaries, and deliverable, instead of hard-coding every step.
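If you maintain several agents, assembling the skeleton programmatically keeps section names consistent. A sketch, assuming the section headings from this article; everything else is illustrative:

```python
# Illustrative builder for the prompt skeleton above. Section names follow
# the article; omitted sections are simply skipped.

SECTIONS = ["Role", "Personality", "Goal", "Success criteria",
            "Constraints", "Output", "Stop rules"]

def build_system_prompt(parts: dict[str, str]) -> str:
    """Join only the sections actually provided, in skeleton order."""
    chunks = []
    for name in SECTIONS:
        if parts.get(name):
            chunks.append(f"# {name}\n{parts[name].strip()}")
    return "\n\n".join(chunks)
```

Skipping empty sections reflects the point above: not every prompt needs every heading.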

A practical order for migrating old prompts

If you already have old prompts for GPT-4.1, GPT-4o, GPT-5.2, or GPT-5.4, do not rewrite everything at once.

A steadier migration order:

  1. First switch the model while keeping current reasoning effort and output parameters fixed.
  2. Run existing evals or real samples and identify behavior changes.
  3. Delete process rules that are clearly outdated, duplicated, or contradictory.
  4. Convert “step requirements” into “success criteria” and “stop conditions.”
  5. Add retrieval budgets, citation rules, and behavior for missing evidence.
  6. Add validation loops for tool tasks.
  7. Tune reasoning.effort and text.verbosity last.

If you do not have evals, at least prepare a set of representative tasks: simple Q&A, complex retrieval, tool use, formatted output, refusal/degradation, and long-task completion. Do not judge prompt quality from a single demo case.
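Even without a full eval framework, the representative-task set can be a few dozen lines. A sketch under obvious assumptions: `call_model` is a stand-in for your actual API call, and the prompts and checks here are placeholders:

```python
# Minimal representative-task harness: run each task category and record
# pass/fail, instead of judging prompt quality from one demo case.
# Prompts and checks are illustrative placeholders.

REPRESENTATIVE_TASKS = [
    {"category": "simple_qa", "prompt": "...", "check": lambda out: len(out) > 0},
    {"category": "formatted_output", "prompt": "...", "check": lambda out: out.startswith("{")},
]

def run_eval(call_model, tasks=REPRESENTATIVE_TASKS) -> dict:
    """Return {category: passed} for each representative task."""
    results = {}
    for task in tasks:
        output = call_model(task["prompt"])
        results[task["category"]] = bool(task["check"](output))
    return results
```

Run this before and after each deletion pass in the migration order above, so you can see which removed rules actually mattered.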

A checklist for migrating old prompts

When migrating an old prompt, start with this checklist. The goal is not simply to make the prompt shorter, but to delete ineffective constraints and rewrite important constraints into verifiable form.

  • Repeated rules. Problem: the same instruction appears in multiple sections, sometimes with inconsistent wording. Fix: merge into one clear rule and keep only the final version.
  • Absolute words. Problem: ALWAYS, NEVER, must, and only appear everywhere. Fix: reserve absolute constraints for safety, compliance, permissions, and required fields.
  • No stop condition. Problem: the model is told to keep searching, analyzing, or fixing without a stopping rule. Fix: add stop rules such as evidence sufficiency, validation success, turn limits, or cost limits.
  • No validation command. Problem: the prompt says "ensure correctness" but gives no tests, lint, citations, or checks. Fix: replace with concrete checks: tests, type checks, build, citations, or smoke tests.
  • Too much process. Problem: every step is hard-coded, leaving no room for better paths. Fix: rewrite as goals, success criteria, boundaries, and output requirements.
  • Old model patches. Problem: rules written for older model weaknesses are still present. Fix: remove first, then use evals to decide whether they are still needed.
  • Vague tool rules. Problem: the prompt only says "use tools when needed". Fix: define when to call tools, when to stop, and how to degrade on failure.
  • Output drift. Problem: there is a format requirement but no field-completeness rule. Fix: define required fields, optional fields, and missing-evidence behavior.

If you can only do one thing, prioritize “no stop condition” and “no validation command.” These two issues are the easiest way to turn GPT-5.5 into an infinite tool loop, or into a model that gives a polished answer without verification.
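Parts of this checklist can be automated as a cheap prompt lint. A sketch that flags the two highest-priority issues named above plus heavy use of absolute words; the keyword lists are rough heuristics, not a complete check:

```python
# Rough prompt lint for the checklist above. Keyword heuristics are
# illustrative and will produce false positives/negatives.
import re

def lint_prompt(prompt: str) -> list[str]:
    """Return a list of findings; an empty list means no flags raised."""
    findings = []
    lowered = prompt.lower()
    absolutes = re.findall(r"\b(always|never|must|only)\b", lowered)
    if len(absolutes) > 5:
        findings.append(f"{len(absolutes)} absolute words; reserve them for hard constraints")
    if not any(w in lowered for w in ("stop", "until", "at most", "no more than")):
        findings.append("no stop condition found")
    if not any(w in lowered for w in ("test", "lint", "validate", "citation", "check")):
        findings.append("no validation rule found")
    return findings
```

A human still has to judge each flag, but running this over a prompt library quickly surfaces the prompts most likely to loop or skip verification.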

GPT-5.5 prompt examples: old vs new

These are not full system prompts. They are common local rewrites during migration.

Example 1: retrieval Q&A

Old:

Before answering, you must search at least 3 times. You must read all relevant results. You must provide a complete explanation.

New:

Start with one search that covers the core question. If the first few results already support the key facts, stop searching and answer. If results conflict or key facts are missing, add another search. In the final answer, explain the basis; when evidence is insufficient, say so clearly.

The new version changes “number of searches” into “whether evidence is sufficient.” It gives the model a reason to continue and a reason to stop.

Example 2: code changes

Old:

Carefully modify the code. Do not break existing logic. Tell me what changed when finished.

New:

Make the smallest necessary code change requested by the user. Success criteria:
- Only modify files related to the task
- Preserve existing public API compatibility unless the user explicitly asks for a change
- Run relevant unit tests after the change; if they cannot run, explain why and the next best validation method
- Final summary includes changes, validation result, and remaining risks

The new version does not vaguely ask the model to be careful. It grounds caution in file scope, API compatibility, test commands, and risk reporting.

Example 3: structured output

Old:

Output JSON. Do not output extra content. Make fields complete.

New:

Output strict JSON without Markdown. Required fields:
- status: "ok" | "needs_more_info" | "blocked"
- answer: string
- evidence: string[]
- missing_info: string[]
If evidence is insufficient, use status "needs_more_info" and do not invent evidence.

The new version does not only require JSON. It also defines a valid path when evidence is missing, so the model does not have to invent information to satisfy “complete fields.”
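The contract in the new version is also cheap to enforce on the application side. A sketch that parses strictly and checks the required fields and status enum, rather than trusting the model output:

```python
# Validator for the structured-output contract in Example 3.
import json

REQUIRED = {"status", "answer", "evidence", "missing_info"}
STATUSES = {"ok", "needs_more_info", "blocked"}

def validate_output(raw: str) -> tuple[bool, str]:
    """Return (is_valid, reason) for a model reply claiming to be strict JSON."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return False, f"not valid JSON: {exc}"
    missing = REQUIRED - data.keys()
    if missing:
        return False, f"missing fields: {sorted(missing)}"
    if data["status"] not in STATUSES:
        return False, f"invalid status: {data['status']}"
    return True, "ok"
```

A failed validation can feed a retry with the reason attached, which is far more effective than re-prompting with "make fields complete".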

How to configure the parameters

reasoning.effort and text.verbosity should not be viewed in isolation. The former controls how much reasoning the model invests; the latter controls how detailed the output is. A common mistake is to raise reasoning.effort whenever quality is not enough, or to write harsher prompts whenever output is too long. A better approach is to configure them by task type.

  • Field extraction, classification, short format conversion: reasoning.effort none or low, text.verbosity low. Optimize for low latency; the output schema matters most.
  • Support routing, simple tool routing: reasoning.effort low, text.verbosity low or medium. Clear rules usually do not need high reasoning.
  • Ordinary Q&A, light retrieval summary: reasoning.effort low or medium, text.verbosity medium. Needs some judgment, but high reasoning should not be the default.
  • Multi-document synthesis, conflict judgment: reasoning.effort medium, text.verbosity medium. First ensure evidence rules and citations, then consider raising effort.
  • Complex code changes, long-task agents: reasoning.effort medium or high, text.verbosity low for user replies while keeping code output clear. Chat updates can be short; code and diffs should stay readable.
  • Strategy, planning, risk analysis: reasoning.effort medium or high, text.verbosity medium or high. Needs tradeoffs, risks, and assumptions.

For most applications, start with low or medium. Raise reasoning.effort only after the prompt already defines success criteria, stop conditions, and validation rules, and the model still misses important constraints.

text.verbosity is not always better when lower. Low verbosity works well for status updates, short customer support replies, and operation summaries. For code, configuration, migration plans, or audit explanations, overly short output can make the result hard to review.
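The scenario table can be turned into a lookup that builds the request parameters in one place. The request shape follows the OpenAI Responses API; the "gpt-5.5" model id and the scenario-to-setting mapping restate this article's suggestions and are assumptions, not official defaults:

```python
# Scenario-based parameter lookup. Model id and mapping values are
# assumptions taken from this article, not official recommendations.

PARAMS_BY_SCENARIO = {
    "field_extraction": {"effort": "low", "verbosity": "low"},
    "support_routing": {"effort": "low", "verbosity": "medium"},
    "ordinary_qa": {"effort": "medium", "verbosity": "medium"},
    "multi_doc_synthesis": {"effort": "medium", "verbosity": "medium"},
    "code_agent": {"effort": "high", "verbosity": "low"},
    "strategy_analysis": {"effort": "high", "verbosity": "high"},
}

def request_params(scenario: str, prompt: str) -> dict:
    """Build kwargs for a Responses API call from a scenario label."""
    cfg = PARAMS_BY_SCENARIO.get(scenario, {"effort": "medium", "verbosity": "medium"})
    return {
        "model": "gpt-5.5",                       # assumed model id
        "input": prompt,
        "reasoning": {"effort": cfg["effort"]},
        "text": {"verbosity": cfg["verbosity"]},
    }

# Usage (requires an API key; shown for shape only):
# from openai import OpenAI
# client = OpenAI()
# resp = client.responses.create(**request_params("ordinary_qa", "..."))
```

Centralizing the mapping like this makes the "tune parameters last" step a one-line config change instead of a prompt rewrite.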

Which rules should stay

Migrating to GPT-5.5 does not mean deleting the old prompt entirely. The following rules should usually stay, and they should be made more explicit.

  • Safety rules: actions that cannot be taken, content that cannot be generated, and cases that require refusal or degradation.
  • Compliance rules: industry policies, regional restrictions, age limits, audit requirements, approval requirements.
  • Privacy rules: personal data handling, sensitive data redaction, logging limits, data transfer limits.
  • Output fields: API responses, JSON schemas, table fields, fixed structures required by frontend components.
  • Business boundaries: refund rules, account permissions, service levels, contract scope, escalation conditions.
  • Tool permission boundaries: which tools can be called, which require confirmation, and which are prohibited.
  • Citation and evidence rules: when sources are required and how to handle conflicting evidence.

These are not old baggage. They are product contracts. The difference is that during migration, they should be rewritten from slogans into executable constraints.

For example:

Do not leak user privacy.

Can become:

Do not output full phone numbers, national ID numbers, access tokens, API keys, or internal user IDs in the final answer. When a reference is needed, show only a redacted version, such as keeping the last 4 digits of a phone number.
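Rules like this can also be enforced in post-processing, independent of the prompt. A sketch that masks long digit runs while keeping the last 4 digits, as the example suggests; the regex is illustrative, not production-grade PII detection:

```python
# Illustrative redaction pass for the rewritten privacy rule. A simple
# digit-run regex, not a complete PII detector.
import re

def redact_long_digit_runs(text: str) -> str:
    """Mask digit runs of 7+ characters, keeping only the last 4 digits."""
    return re.sub(
        r"\d{7,}",
        lambda m: "*" * (len(m.group()) - 4) + m.group()[-4:],
        text,
    )
```

Belt-and-suspenders: the prompt tells the model not to emit the data, and the output filter catches what slips through.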

What should not be accidentally deleted

The biggest danger when trimming prompts is not deleting fluff. It is deleting real system boundaries. The following content should not be removed lightly, even if it looks old.

  • Privacy and data handling requirements: especially rules for logging, export, cross-system transfer, and third-party tool calls.
  • Safety and permission limits: confirmation rules for deleting data, transferring money, sending email, changing permissions, or running shell commands.
  • Citation format: if the product depends on citations, footnotes, source lists, or audit chains, do not delete them just because they take space.
  • Tool call boundaries: which tools are read-only, which are write-capable, and which require user confirmation.
  • Failure behavior: how to degrade when APIs time out, data is missing, retrieval fails, or permissions are insufficient.
  • Hard business rules: pricing, refunds, bans, risk controls, and compliance review rules that the model should not improvise.

A simple rule of thumb: if deleting a rule only changes output style, consider deleting it. If deleting it could cause privilege overreach, data leakage, incorrect actions, false promises, or broken audit trails, keep it and rewrite it more precisely.

Summary

The core of the GPT-5.5 prompting guide is not “write more advanced prompts.” It is to remove over-specified process instructions from old prompts.

A prompt better suited for GPT-5.5 should:

  • Prioritize goals, not steps.
  • Define success criteria, not just ask the model to “do well.”
  • Include stop conditions, instead of infinite search or infinite tool loops.
  • Include an evidence budget, instead of answering without evidence or searching forever.
  • Include validation rules, instead of relying on the model’s self-discipline.
  • Tune parameters later, instead of immediately raising reasoning effort.

If your old system prompt is already long, the first step in migrating to GPT-5.5 may not be adding content, but deleting content. Keep the truly non-negotiable rules, and turn process details into outcomes, boundaries, and checks. That is usually more effective than continuing to pile on prompts.
