<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
    <channel>
        <title>GPT-5.4 on KnightLi Blog</title>
        <link>https://www.knightli.com/en/tags/gpt-5.4/</link>
        <description>Recent content in GPT-5.4 on KnightLi Blog</description>
        <generator>Hugo -- gohugo.io</generator>
        <language>en</language>
        <lastBuildDate>Sun, 10 May 2026 08:43:17 +0800</lastBuildDate><atom:link href="https://www.knightli.com/en/tags/gpt-5.4/index.xml" rel="self" type="application/rss+xml" /><item>
        <title>How to Choose Between GPT-5.5, GPT-5.4, and GPT-5.3-Codex</title>
        <link>https://www.knightli.com/en/2026/05/10/gpt-5-5-vs-gpt-5-4-vs-gpt-5-3-codex/</link>
        <pubDate>Sun, 10 May 2026 08:43:17 +0800</pubDate>
        
        <guid>https://www.knightli.com/en/2026/05/10/gpt-5-5-vs-gpt-5-4-vs-gpt-5-3-codex/</guid>
<description>&lt;p&gt;If you only want the short version, the conclusion is simple: default to &lt;code&gt;GPT-5.5&lt;/code&gt;, choose &lt;code&gt;GPT-5.4&lt;/code&gt; when budget and usage limits matter more, and reach for &lt;code&gt;GPT-5.3-Codex&lt;/code&gt; mainly when you are doing longer-running software engineering work inside Codex or need capabilities such as Cloud Tasks and Code Review.&lt;/p&gt;
&lt;p&gt;This is not just a subjective impression. As of &lt;code&gt;2026-05-10&lt;/code&gt;, OpenAI&amp;rsquo;s Codex documentation still says that most tasks should start with &lt;code&gt;gpt-5.5&lt;/code&gt;; if &lt;code&gt;gpt-5.5&lt;/code&gt; is not available yet, continue using &lt;code&gt;gpt-5.4&lt;/code&gt;; and for lighter tasks or subagents, &lt;code&gt;gpt-5.4-mini&lt;/code&gt; is the better fit.&lt;/p&gt;
&lt;h2 id=&#34;positioning-of-the-three-models&#34;&gt;Positioning of the three models
&lt;/h2&gt;&lt;p&gt;Start with the official positioning.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;GPT-5.5&lt;/code&gt; is the newest frontier model in Codex, aimed at complex coding, computer use, knowledge work, and research workflows. It behaves like the default flagship model for harder analysis, multi-step tasks, cross-file edits, solution design, and heavier document work.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;GPT-5.4&lt;/code&gt; is a steadier all-around choice. OpenAI describes it as bringing the industry-leading coding capabilities of &lt;code&gt;GPT-5.3-Codex&lt;/code&gt; together with stronger reasoning, tool use, and agentic workflows. In other words, it is not simply a weaker &lt;code&gt;5.5&lt;/code&gt;; it is a more balanced model that is easier to use as a long-term default.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;GPT-5.3-Codex&lt;/code&gt; is still a very strong coding model, but its core strengths are more concentrated in real software engineering and native Codex workflows. The official docs also make a point of saying that it is optimized for agentic coding tasks, while &lt;code&gt;GPT-5.4&lt;/code&gt; already inherits much of its coding strength.&lt;/p&gt;
&lt;p&gt;So today it no longer makes much sense to treat &lt;code&gt;GPT-5.3-Codex&lt;/code&gt; as the automatic choice for &amp;ldquo;the strongest coding model.&amp;rdquo; In most day-to-day development scenarios, &lt;code&gt;GPT-5.5&lt;/code&gt; and &lt;code&gt;GPT-5.4&lt;/code&gt; deserve attention first.&lt;/p&gt;
&lt;h2 id=&#34;how-to-choose-by-use-case&#34;&gt;How to choose by use case
&lt;/h2&gt;&lt;p&gt;If your work is daily Q&amp;amp;A, complex explanations, research summaries, file analysis, or long-form synthesis, &lt;code&gt;GPT-5.5&lt;/code&gt; is the best fit. It is not only good at coding, but also better at handling demanding knowledge work outside pure code.&lt;/p&gt;
&lt;p&gt;If your work is complex programming, refactoring, debugging, architecture design, or multi-file edits, &lt;code&gt;GPT-5.5&lt;/code&gt; is still the first choice. That is also how the Codex documentation frames it: when &lt;code&gt;gpt-5.5&lt;/code&gt; is available, most tasks should start there.&lt;/p&gt;
&lt;p&gt;If you care more about usage limits and cost while still wanting strong quality, &lt;code&gt;GPT-5.4&lt;/code&gt; is often the more practical default. For many routine development tasks, rewrites, standard translations, script generation, and bug fixes, &lt;code&gt;GPT-5.4&lt;/code&gt; is already strong enough and noticeably cheaper.&lt;/p&gt;
&lt;p&gt;If you are using Codex CLI, the IDE extension, or the app for more agent-like engineering work, such as long sessions reading a repository, continuous code changes, queued tasks, or Cloud Tasks and Code Review, &lt;code&gt;GPT-5.3-Codex&lt;/code&gt; still matters. That is not because it is more advanced than &lt;code&gt;GPT-5.5&lt;/code&gt;, but because Cloud Tasks and Code Review in Codex still run on &lt;code&gt;GPT-5.3-Codex&lt;/code&gt;.&lt;/p&gt;
&lt;h2 id=&#34;how-much-credit-does-each-one-use&#34;&gt;How much credit does each one use
&lt;/h2&gt;&lt;p&gt;The Codex credit table makes the differences very clear.&lt;/p&gt;
&lt;p&gt;Under the Business / New Enterprise token-based pricing:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;code&gt;GPT-5.5&lt;/code&gt;: &lt;code&gt;125 credits / 1M input tokens&lt;/code&gt;, &lt;code&gt;12.5 credits&lt;/code&gt; for cached input, &lt;code&gt;750 credits&lt;/code&gt; for output&lt;/li&gt;
&lt;li&gt;&lt;code&gt;GPT-5.4&lt;/code&gt;: &lt;code&gt;62.5 credits / 1M input tokens&lt;/code&gt;, &lt;code&gt;6.25 credits&lt;/code&gt; for cached input, &lt;code&gt;375 credits&lt;/code&gt; for output&lt;/li&gt;
&lt;li&gt;&lt;code&gt;GPT-5.3-Codex&lt;/code&gt;: &lt;code&gt;43.75 credits / 1M input tokens&lt;/code&gt;, &lt;code&gt;4.375 credits&lt;/code&gt; for cached input, &lt;code&gt;350 credits&lt;/code&gt; for output&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;That means, by headline pricing, &lt;code&gt;GPT-5.4&lt;/code&gt; costs exactly half of &lt;code&gt;GPT-5.5&lt;/code&gt; at every tier: input, cached input, and output. &lt;code&gt;GPT-5.3-Codex&lt;/code&gt; is cheaper on input, but its output cost is already very close to &lt;code&gt;GPT-5.4&lt;/code&gt;, so it is not dramatically cheaper overall.&lt;/p&gt;
&lt;p&gt;There is another detail that is easy to miss. The official Codex docs also say that &lt;code&gt;GPT-5.5 uses significantly fewer tokens to achieve results comparable to GPT-5.4&lt;/code&gt;. So although its unit price is higher, in some complex tasks it may reduce the gap through lower token usage and fewer retries.&lt;/p&gt;
&lt;p&gt;For fixed-template article rewriting, translation, and SEO description generation, however, input and output lengths are usually stable. In that kind of work, the advantage of taking fewer wrong turns is smaller than in complex engineering tasks. In practice, &lt;code&gt;GPT-5.4&lt;/code&gt; is still usually the cheaper option, often by roughly &lt;code&gt;45%&lt;/code&gt; to &lt;code&gt;50%&lt;/code&gt;.&lt;/p&gt;
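&lt;p&gt;The credit table above can be turned into a quick back-of-the-envelope calculator. The prices come straight from the table; the token counts for the example task are hypothetical:&lt;/p&gt;

```python
# Headline (uncached) credit prices per 1M tokens, from the table above.
PRICING = {
    "gpt-5.5":       {"input": 125.0,  "output": 750.0},
    "gpt-5.4":       {"input": 62.5,   "output": 375.0},
    "gpt-5.3-codex": {"input": 43.75,  "output": 350.0},
}

def task_cost(model, input_tokens, output_tokens):
    """Credits consumed by a single task at headline pricing."""
    p = PRICING[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# Hypothetical template rewrite: ~4k tokens in, ~1.2k tokens out.
flagship = task_cost("gpt-5.5", 4_000, 1_200)  # 1.4 credits
default  = task_cost("gpt-5.4", 4_000, 1_200)  # 0.7 credits
print(f"GPT-5.4 saves {1 - default / flagship:.0%}")  # GPT-5.4 saves 50%
```

&lt;p&gt;Because every &lt;code&gt;GPT-5.4&lt;/code&gt; price is exactly half the corresponding &lt;code&gt;GPT-5.5&lt;/code&gt; price, the ratio holds for any mix of input, cached input, and output; real-world savings dip below &lt;code&gt;50%&lt;/code&gt; only when &lt;code&gt;GPT-5.5&lt;/code&gt; genuinely uses fewer tokens for the same task.&lt;/p&gt;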
&lt;h2 id=&#34;differences-in-codex-usage-limits&#34;&gt;Differences in Codex usage limits
&lt;/h2&gt;&lt;p&gt;Beyond raw pricing, these models are not available in exactly the same ways inside Codex.&lt;/p&gt;
&lt;p&gt;As of &lt;code&gt;2026-05-10&lt;/code&gt;, &lt;code&gt;GPT-5.5&lt;/code&gt; is the recommended model in Codex, but it is currently only available when you sign in to Codex with ChatGPT, and it does not support API-key authentication. &lt;code&gt;GPT-5.4&lt;/code&gt; and &lt;code&gt;GPT-5.3-Codex&lt;/code&gt; do support API access.&lt;/p&gt;
&lt;p&gt;Also, &lt;code&gt;GPT-5.5&lt;/code&gt; and &lt;code&gt;GPT-5.4&lt;/code&gt; currently do not support Codex Cloud Tasks or Code Review. Those two features still belong to &lt;code&gt;GPT-5.3-Codex&lt;/code&gt;. So if what you really mean is long-running engineering work inside Codex, you cannot only compare model quality. You also have to ask whether the feature you need is still tied to &lt;code&gt;GPT-5.3-Codex&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;If you are only sending local messages, the official Plus-plan allowance per five-hour window is roughly:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;code&gt;GPT-5.5&lt;/code&gt;: &lt;code&gt;15-80&lt;/code&gt; messages&lt;/li&gt;
&lt;li&gt;&lt;code&gt;GPT-5.4&lt;/code&gt;: &lt;code&gt;20-100&lt;/code&gt; messages&lt;/li&gt;
&lt;li&gt;&lt;code&gt;GPT-5.3-Codex&lt;/code&gt;: &lt;code&gt;30-150&lt;/code&gt; messages&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;This also shows a practical difference: &lt;code&gt;GPT-5.5&lt;/code&gt; is the strongest but gives you the fewest uses under fixed limits; &lt;code&gt;GPT-5.4&lt;/code&gt; sits in the middle; and &lt;code&gt;GPT-5.3-Codex&lt;/code&gt; stretches the furthest for local messages.&lt;/p&gt;
&lt;h2 id=&#34;how-to-choose-across-common-scenarios&#34;&gt;How to choose across common scenarios
&lt;/h2&gt;&lt;p&gt;There are many high-frequency tasks in daily work. The more useful way to compare these models is not to ask in the abstract which one is &amp;ldquo;better,&amp;rdquo; but to break the decision down by scenario.&lt;/p&gt;
&lt;h3 id=&#34;1-daily-qa-research-organization-and-long-summaries&#34;&gt;1. Daily Q&amp;amp;A, research organization, and long summaries
&lt;/h3&gt;&lt;p&gt;&lt;code&gt;GPT-5.5&lt;/code&gt;: Best fit. It is better at handling ambiguous prompts, filling in context, and turning scattered information into structured output.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;GPT-5.4&lt;/code&gt;: Good for normal summaries and bulk organization. When the difficulty is moderate and the volume is high, it is usually the more economical choice.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;GPT-5.3-Codex&lt;/code&gt;: Not ideal as the main choice. It can do the work, but this is not where it stands out most.&lt;/p&gt;
&lt;h3 id=&#34;2-explaining-technical-concepts-code-walkthroughs-and-reading-old-projects&#34;&gt;2. Explaining technical concepts, code walkthroughs, and reading old projects
&lt;/h3&gt;&lt;p&gt;&lt;code&gt;GPT-5.5&lt;/code&gt;: Better for complex projects. It is more reliable when relationships span many files, call chains are long, and historical baggage is heavy.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;GPT-5.4&lt;/code&gt;: Good for normal reading and explanation. It works well for understanding functions, modules, configuration, and getting up to speed on a project quickly.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;GPT-5.3-Codex&lt;/code&gt;: More execution-oriented, not the first choice for explanation-heavy tasks.&lt;/p&gt;
&lt;h3 id=&#34;3-writing-scripts-small-tools-sql-shell-commands-and-regex&#34;&gt;3. Writing scripts, small tools, SQL, shell commands, and regex
&lt;/h3&gt;&lt;p&gt;&lt;code&gt;GPT-5.5&lt;/code&gt;: Better when the script is tied to broader system design, multiple services, or more complex constraints.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;GPT-5.4&lt;/code&gt;: The best default main choice. Most scripts, small tools, SQL tasks, and command-line work are well within its comfort zone, and it uses fewer credits.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;GPT-5.3-Codex&lt;/code&gt;: Worth considering if the script is only one part of a larger engineering-agent workflow, but not necessary as the first choice for standalone scripting.&lt;/p&gt;
&lt;h3 id=&#34;4-fixing-bugs-making-small-feature-changes-adding-tests-and-routine-development&#34;&gt;4. Fixing bugs, making small feature changes, adding tests, and routine development
&lt;/h3&gt;&lt;p&gt;&lt;code&gt;GPT-5.5&lt;/code&gt;: Better for somewhat harder fixes, especially when it needs to analyze the cause first, then edit across files, then add tests.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;GPT-5.4&lt;/code&gt;: The best daily development workhorse. For ordinary bugs, small features, test scaffolding, renaming, and formatting cleanup, it has the best cost-performance balance.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;GPT-5.3-Codex&lt;/code&gt;: Capable, but usually not the first choice unless you specifically need Cloud Tasks or an engineering-agent workflow.&lt;/p&gt;
&lt;h3 id=&#34;5-complex-refactoring-architecture-design-and-hard-debugging&#34;&gt;5. Complex refactoring, architecture design, and hard debugging
&lt;/h3&gt;&lt;p&gt;&lt;code&gt;GPT-5.5&lt;/code&gt;: Best fit. In complex tasks, the expensive part is usually rework, not a single output. &lt;code&gt;GPT-5.5&lt;/code&gt; is better suited to be the main problem-solving model.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;GPT-5.4&lt;/code&gt;: Good for medium-complexity work. It can handle refactors and design discussions, but for very long context, multi-step reasoning, and high-uncertainty tasks, it is usually less steady than &lt;code&gt;GPT-5.5&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;GPT-5.3-Codex&lt;/code&gt;: More execution-oriented and not the default priority for hard decision-heavy work.&lt;/p&gt;
&lt;h3 id=&#34;6-bulk-light-tasks-repetitive-work-and-split-sub-tasks&#34;&gt;6. Bulk light tasks, repetitive work, and split sub-tasks
&lt;/h3&gt;&lt;p&gt;&lt;code&gt;GPT-5.5&lt;/code&gt;: Capable, but usually not cost-effective.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;GPT-5.4&lt;/code&gt;: Best fit. For batch comment edits, bulk formatting, template-style code generation, and repetitive content changes, it is the most balanced option.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;GPT-5.3-Codex&lt;/code&gt;: Worth considering if the work is already embedded in a Codex engineering workflow, but in plain cost-performance terms it is still usually weaker than &lt;code&gt;GPT-5.4&lt;/code&gt;.&lt;/p&gt;
&lt;h3 id=&#34;7-automation-pipelines-agent-execution-and-continuous-repository-work&#34;&gt;7. Automation pipelines, agent execution, and continuous repository work
&lt;/h3&gt;&lt;p&gt;&lt;code&gt;GPT-5.5&lt;/code&gt;: Good for early-stage design, rules, and breaking down complex tasks.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;GPT-5.4&lt;/code&gt;: Good for writing automation scripts and filling in medium-complexity workflow logic, especially when API access matters.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;GPT-5.3-Codex&lt;/code&gt;: The most relevant model here. Because Codex Cloud Tasks and Code Review still run on it, it is better suited to scenarios where you want the system to keep running on its own.&lt;/p&gt;
&lt;h3 id=&#34;8-important-page-copy-brand-introductions-and-final-polish&#34;&gt;8. Important page copy, brand introductions, and final polish
&lt;/h3&gt;&lt;p&gt;&lt;code&gt;GPT-5.5&lt;/code&gt;: Best fit. It is strongest in naturalness, style control, and long-context consistency.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;GPT-5.4&lt;/code&gt;: Good for most ordinary pages and daily updates. Important pages can start with a draft in &lt;code&gt;GPT-5.4&lt;/code&gt; and then be polished with &lt;code&gt;GPT-5.5&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;GPT-5.3-Codex&lt;/code&gt;: Not suitable as a primary writing model.&lt;/p&gt;
&lt;h3 id=&#34;9-fixed-template-website-rewriting-translation-and-seo-descriptions&#34;&gt;9. Fixed-template website rewriting, translation, and SEO descriptions
&lt;/h3&gt;&lt;p&gt;&lt;code&gt;GPT-5.5&lt;/code&gt;: Better for template design, final polish, high-value pages, and more natural Chinese-to-English translation.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;GPT-5.4&lt;/code&gt;: Best fit for bulk production. For standard article rewriting, fixed-structure translation, product copy rewriting, and batch meta-description generation, it usually offers the best quality-cost balance.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;GPT-5.3-Codex&lt;/code&gt;: Not suitable as the primary writing model. It is more useful for writing batch-processing scripts, cleaning HTML, preserving tag structure, and improving publishing workflows.&lt;/p&gt;
&lt;h3 id=&#34;10-e-commerce-product-copy-category-pages-and-bulk-content-operations&#34;&gt;10. E-commerce product copy, category pages, and bulk content operations
&lt;/h3&gt;&lt;p&gt;&lt;code&gt;GPT-5.5&lt;/code&gt;: Good for defining rules, spot-checking, and polishing high-value pages.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;GPT-5.4&lt;/code&gt;: Best fit for bulk production. It is more balanced for product titles, category descriptions, campaign copy, and long-tail SEO content.&lt;/p&gt;
&lt;p&gt;&lt;code&gt;GPT-5.3-Codex&lt;/code&gt;: Good for crawling, cleaning, batch processing, and auto-publishing scripts, but not ideal for the core copy itself.&lt;/p&gt;
&lt;p&gt;If you compress all of these scenarios into one line:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Complex knowledge work, complex analysis, and high-value writing: prioritize &lt;code&gt;GPT-5.5&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Daily development, bulk production, and repetitive work: prioritize &lt;code&gt;GPT-5.4&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Codex engineering agents, Cloud Tasks, and Code Review: pay special attention to &lt;code&gt;GPT-5.3-Codex&lt;/code&gt;&lt;/li&gt;
&lt;/ol&gt;
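&lt;p&gt;That one-line summary can be sketched as a tiny routing helper. The task-category names and the mapping are this article&amp;rsquo;s heuristics, invented here for illustration, not any official API:&lt;/p&gt;

```python
# Routing heuristics from the scenario breakdown above. The category
# names are made up for this sketch, not part of any official API.
ROUTES = {
    "complex-analysis":   "gpt-5.5",
    "high-value-writing": "gpt-5.5",
    "daily-dev":          "gpt-5.4",
    "bulk-content":       "gpt-5.4",
    "cloud-tasks":        "gpt-5.3-codex",
    "code-review":        "gpt-5.3-codex",
}

def pick_model(task_category):
    """Return the suggested model for a task category."""
    # Unknown or ambiguous work falls through to the flagship,
    # mirroring the docs' advice to start most tasks on gpt-5.5.
    return ROUTES.get(task_category, "gpt-5.5")

print(pick_model("bulk-content"))  # gpt-5.4
```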
&lt;h2 id=&#34;final-recommendation&#34;&gt;Final recommendation
&lt;/h2&gt;&lt;p&gt;If your work is mostly ordinary coding, bug fixing, technical questions, and accompanying documentation, &lt;code&gt;GPT-5.4&lt;/code&gt; is a very steady default model.&lt;/p&gt;
&lt;p&gt;If you need more complex project analysis, multi-file changes, architecture planning, hard debugging, or one model that can cover both engineering and demanding knowledge work, go straight to &lt;code&gt;GPT-5.5&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;If what matters most is the engineering workflow inside Codex itself, such as Cloud Tasks, Code Review, and long-running agent execution, then &lt;code&gt;GPT-5.3-Codex&lt;/code&gt; is still worth keeping around, but it no longer makes much sense as the first default choice.&lt;/p&gt;
&lt;p&gt;For a fixed-template content site, the more practical setup is usually:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;code&gt;GPT-5.4&lt;/code&gt; for bulk production&lt;/li&gt;
&lt;li&gt;&lt;code&gt;GPT-5.5&lt;/code&gt; for template design, spot checks, and final polishing&lt;/li&gt;
&lt;li&gt;&lt;code&gt;GPT-5.3-Codex&lt;/code&gt; for writing automation tools rather than the main content&lt;/li&gt;
&lt;/ol&gt;
&lt;h2 id=&#34;summary&#34;&gt;Summary
&lt;/h2&gt;&lt;p&gt;The more practical default order now is &lt;code&gt;GPT-5.5&lt;/code&gt; first, &lt;code&gt;GPT-5.4&lt;/code&gt; second, and &lt;code&gt;GPT-5.3-Codex&lt;/code&gt; reserved for more engineering-agent-heavy or Codex-specific scenarios.&lt;/p&gt;
&lt;p&gt;If your question is specifically &amp;ldquo;How much does &lt;code&gt;GPT-5.4&lt;/code&gt; save versus &lt;code&gt;GPT-5.5&lt;/code&gt; for rewriting the same template article?&amp;rdquo;, then based on the official credit table and the typical token structure of this type of work, it is reasonable to think of it as saving close to half. For content-heavy batch sites, that difference is large enough that the common pattern is not to use &lt;code&gt;GPT-5.5&lt;/code&gt; for everything, but to use &lt;code&gt;GPT-5.5&lt;/code&gt; to define the rules and style first, then hand the bulk work to &lt;code&gt;GPT-5.4&lt;/code&gt;.&lt;/p&gt;
</description>
        </item>
        
    </channel>
</rss>
