<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
    <channel>
        <title>GPT-5.5 on KnightLi Blog</title>
        <link>https://www.knightli.com/en/tags/gpt-5.5/</link>
        <description>Recent content in GPT-5.5 on KnightLi Blog</description>
        <generator>Hugo -- gohugo.io</generator>
        <language>en</language>
        <lastBuildDate>Fri, 24 Apr 2026 08:39:56 +0800</lastBuildDate><atom:link href="https://www.knightli.com/en/tags/gpt-5.5/index.xml" rel="self" type="application/rss+xml" /><item>
        <title>OpenAI Releases GPT-5.5: Stronger Agentic Coding, Knowledge Work, and Research</title>
        <link>https://www.knightli.com/en/2026/04/24/openai-gpt-5-5-release/</link>
        <pubDate>Fri, 24 Apr 2026 08:39:56 +0800</pubDate>
        
        <guid>https://www.knightli.com/en/2026/04/24/openai-gpt-5-5-release/</guid>
        <description>&lt;p&gt;OpenAI published &lt;a class=&#34;link&#34; href=&#34;https://openai.com/index/introducing-gpt-5-5/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Introducing GPT-5.5&lt;/a&gt; on April 23, 2026. Judging from the official page, this release is less about making the model &amp;ldquo;smarter&amp;rdquo; and more about whether it can keep pushing complex tasks through to completion.&lt;/p&gt;
&lt;p&gt;OpenAI positions GPT-5.5 as a model better suited for real work. It is expected not only to answer questions, but also to write code, debug, research online, analyze data, create documents and spreadsheets, operate software, and move across tools until the task is finished.&lt;/p&gt;
&lt;h2 id=&#34;1-where-gpt-55-is-strongest&#34;&gt;1. Where GPT-5.5 Is Strongest
&lt;/h2&gt;&lt;p&gt;The release page repeatedly highlights four areas:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Agentic coding&lt;/li&gt;
&lt;li&gt;Computer use and tool use&lt;/li&gt;
&lt;li&gt;Knowledge work&lt;/li&gt;
&lt;li&gt;Early scientific research assistance&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In other words, GPT-5.5 is aimed less at short Q&amp;amp;A and more at long-running tasks. For example, an engineering problem is not just &amp;ldquo;how should this code be changed&amp;rdquo;; the model needs to understand the project structure, locate the cause of a failure, edit the related files, add tests, and verify the result, with less repeated prompting from the user.&lt;/p&gt;
&lt;p&gt;OpenAI also emphasizes that GPT-5.5 uses fewer tokens in Codex tasks. This matters in practice because coding agents can consume tokens quickly once they start reading files, running commands, and fixing bugs. If a model can complete the same task in fewer steps, both cost and waiting time go down.&lt;/p&gt;
&lt;h2 id=&#34;2-coding-is-the-main-showcase&#34;&gt;2. Coding Is the Main Showcase
&lt;/h2&gt;&lt;p&gt;OpenAI calls GPT-5.5 its strongest agentic coding model to date.&lt;/p&gt;
&lt;p&gt;The most notable public numbers include:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;Terminal-Bench 2.0&lt;/code&gt;: GPT-5.5 reaches &lt;code&gt;82.7%&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;SWE-Bench Pro&lt;/code&gt;: GPT-5.5 reaches &lt;code&gt;58.6%&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;OpenAI&amp;rsquo;s internal &lt;code&gt;Expert-SWE&lt;/code&gt;: GPT-5.5 also scores higher than GPT-5.4&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These evaluations have one thing in common: they are closer to real engineering workflows than isolated algorithm questions. Terminal-Bench in particular involves command-line operations, planning, trial and error, tool coordination, and multi-step verification.&lt;/p&gt;
&lt;p&gt;For everyday developers, the implication is direct: whether a model can take on larger tasks depends on its ability to hold context over a long horizon, check its own assumptions, know when to run tests, and understand what else a change may affect.&lt;/p&gt;
&lt;p&gt;GPT-5.5&amp;rsquo;s value in Codex shows up mainly in these behaviors: it feels more like a collaborator that can take over part of an engineering task than a tool that only completes code fragments.&lt;/p&gt;
&lt;h2 id=&#34;3-knowledge-work-becomes-a-core-scenario&#34;&gt;3. Knowledge Work Becomes a Core Scenario
&lt;/h2&gt;&lt;p&gt;Beyond coding, OpenAI is placing GPT-5.5 into a broader office-work context.&lt;/p&gt;
&lt;p&gt;The announcement says GPT-5.5 produces better documents, spreadsheets, and slide decks in Codex, and is also better suited to operational research, spreadsheet modeling, and organizing business materials. Combined with computer use, the goal is not merely to offer suggestions, but to carry out the full workflow of finding information, understanding content, using tools, checking output, and turning raw material into a result.&lt;/p&gt;
&lt;p&gt;The page also notes that OpenAI already uses Codex across many internal departments, including software engineering, finance, communications, marketing, data science, and product management. The interesting part is not any single example, but the direction: OpenAI is expanding Codex from a developer tool into a more general work tool.&lt;/p&gt;
&lt;p&gt;In ChatGPT, GPT-5.5 Thinking is available to Plus, Pro, Business, and Enterprise users; GPT-5.5 Pro is aimed at harder questions and higher-accuracy work, and is available to Pro, Business, and Enterprise users.&lt;/p&gt;
&lt;h2 id=&#34;4-research-capability-is-more-than-better-answers&#34;&gt;4. Research Capability Is More Than Better Answers
&lt;/h2&gt;&lt;p&gt;GPT-5.5 also receives a strong research-focused presentation.&lt;/p&gt;
&lt;p&gt;OpenAI says it has improved in genetics, quantitative biology, bioinformatics, mathematical proof, and related areas. The key is not whether the model can recall a fact, but whether it can handle more realistic research problems: reading data, spotting anomalies, proposing analyses, interpreting results, and continuing based on intermediate findings.&lt;/p&gt;
&lt;p&gt;The release page mentions &lt;code&gt;GeneBench&lt;/code&gt; and &lt;code&gt;BixBench&lt;/code&gt;, both of which focus more on multi-stage scientific analysis. OpenAI also says an internal version of GPT-5.5, with a custom harness, helped discover a new proof related to Ramsey numbers and verified it with Lean.&lt;/p&gt;
&lt;p&gt;These examples should not be simplified into &amp;ldquo;AI can now do research independently.&amp;rdquo; But they do suggest that models are moving from answer engines toward research collaborators. In scenarios where code, data, papers, experiment ideas, and notes are mixed together, GPT-5.5&amp;rsquo;s long-horizon reasoning and tool use become especially important.&lt;/p&gt;
&lt;h2 id=&#34;5-inference-efficiency-stronger-without-getting-much-slower&#34;&gt;5. Inference Efficiency: Stronger Without Getting Much Slower
&lt;/h2&gt;&lt;p&gt;One easily overlooked point is that OpenAI says GPT-5.5 matches GPT-5.4 in real-world per-token latency.&lt;/p&gt;
&lt;p&gt;Normally, larger and more capable models bring higher latency. This time, OpenAI emphasizes that inference-system optimization let GPT-5.5 become more capable while keeping speed stable. The release page also mentions that Codex analyzed production traffic patterns and wrote load-balancing heuristics, increasing token generation speed by more than &lt;code&gt;20%&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;That detail is interesting: the model is not only served by infrastructure, but also helps improve the infrastructure that serves it.&lt;/p&gt;
&lt;h2 id=&#34;6-safety-gets-stricter-especially-around-cybersecurity&#34;&gt;6. Safety Gets Stricter, Especially Around Cybersecurity
&lt;/h2&gt;&lt;p&gt;Because GPT-5.5 has stronger cybersecurity capabilities, OpenAI is also tightening safety controls.&lt;/p&gt;
&lt;p&gt;The announcement says GPT-5.5 improves over GPT-5.4 in cybersecurity capability, so OpenAI is deploying stricter classifiers, especially for high-risk activity, sensitive cybersecurity requests, and repeated misuse.&lt;/p&gt;
&lt;p&gt;This means some users may see more refusals or friction when working on cybersecurity-related tasks. OpenAI also offers Trusted Access for Cyber, intended to reduce unnecessary barriers for verified defensive users.&lt;/p&gt;
&lt;p&gt;For ordinary developers, the simple takeaway is: legitimate security hardening, vulnerability fixing, and code auditing should continue to be supported, while high-risk attack workflows will be more tightly controlled.&lt;/p&gt;
&lt;h2 id=&#34;7-availability-and-api-pricing&#34;&gt;7. Availability and API Pricing
&lt;/h2&gt;&lt;p&gt;According to OpenAI&amp;rsquo;s release page, GPT-5.5 availability is as follows:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;ChatGPT: GPT-5.5 Thinking for Plus, Pro, Business, and Enterprise users&lt;/li&gt;
&lt;li&gt;ChatGPT: GPT-5.5 Pro for Pro, Business, and Enterprise users&lt;/li&gt;
&lt;li&gt;Codex: GPT-5.5 for Plus, Pro, Business, Enterprise, Edu, and Go plans&lt;/li&gt;
&lt;li&gt;Codex: &lt;code&gt;400K&lt;/code&gt; context window&lt;/li&gt;
&lt;li&gt;Codex Fast mode: about &lt;code&gt;1.5x&lt;/code&gt; token generation speed at &lt;code&gt;2.5x&lt;/code&gt; the cost&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For the API, OpenAI says &lt;code&gt;gpt-5.5&lt;/code&gt; and &lt;code&gt;gpt-5.5-pro&lt;/code&gt; will be available soon.&lt;/p&gt;
&lt;p&gt;The announced API prices are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;gpt-5.5&lt;/code&gt;: &lt;code&gt;US$5 / 1M tokens&lt;/code&gt; input and &lt;code&gt;US$30 / 1M tokens&lt;/code&gt; output&lt;/li&gt;
&lt;li&gt;&lt;code&gt;gpt-5.5-pro&lt;/code&gt;: &lt;code&gt;US$30 / 1M tokens&lt;/code&gt; input and &lt;code&gt;US$180 / 1M tokens&lt;/code&gt; output&lt;/li&gt;
&lt;li&gt;&lt;code&gt;gpt-5.5&lt;/code&gt; API context window: &lt;code&gt;1M&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Batch and Flex are half the standard API price&lt;/li&gt;
&lt;li&gt;Priority processing is &lt;code&gt;2.5x&lt;/code&gt; the standard price&lt;/li&gt;
&lt;/ul&gt;
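&lt;p&gt;As a sanity check, these prices translate into a small back-of-the-envelope cost estimator. The sketch below only encodes the figures quoted above (per-million-token rates, half-price Batch/Flex, 2.5x priority); the model names and rates are pre-launch announcements, so treat the numbers as illustrative:&lt;/p&gt;

```python
# Back-of-the-envelope cost estimator for the announced GPT-5.5 API prices.
# Prices are USD per 1M tokens as quoted on the release page; tier
# multipliers (Batch/Flex at half price, priority at 2.5x) are also from
# the announcement. All figures are pre-launch and may change.

PRICES = {
    "gpt-5.5": {"input": 5.0, "output": 30.0},
    "gpt-5.5-pro": {"input": 30.0, "output": 180.0},
}

TIER_MULTIPLIER = {"standard": 1.0, "batch": 0.5, "flex": 0.5, "priority": 2.5}

def estimate_cost(model, input_tokens, output_tokens, tier="standard"):
    """Return the estimated USD cost of one request at the announced rates."""
    price = PRICES[model]
    base = (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000
    return base * TIER_MULTIPLIER[tier]

# Example: a Codex-style task reading 200K tokens and writing 20K tokens.
print(round(estimate_cost("gpt-5.5", 200_000, 20_000), 2))              # 1.6
print(round(estimate_cost("gpt-5.5-pro", 200_000, 20_000, "batch"), 2)) # 4.8
```

&lt;p&gt;At these rates, a long agentic task that reads &lt;code&gt;200K&lt;/code&gt; tokens and writes &lt;code&gt;20K&lt;/code&gt; tokens would cost about &lt;code&gt;US$1.60&lt;/code&gt; on &lt;code&gt;gpt-5.5&lt;/code&gt; at standard pricing, which makes the token-efficiency claims in the release directly relevant to cost.&lt;/p&gt;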
&lt;p&gt;This is clearly more expensive than many everyday models, so it is better suited for high-value tasks: complex engineering changes, long-document analysis, office automation, research assistance, and important business workflows, rather than casual chat.&lt;/p&gt;
&lt;h2 id=&#34;8-how-to-read-this-release&#34;&gt;8. How to Read This Release
&lt;/h2&gt;&lt;p&gt;In one sentence, GPT-5.5 is about OpenAI pushing models further from &amp;ldquo;answering questions&amp;rdquo; toward &amp;ldquo;getting work done.&amp;rdquo;&lt;/p&gt;
&lt;p&gt;The most important part is not just higher benchmark scores, but the convergence of several capabilities:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Better long-task persistence&lt;/li&gt;
&lt;li&gt;More reliable tool use&lt;/li&gt;
&lt;li&gt;Stronger engineering context understanding&lt;/li&gt;
&lt;li&gt;Better fit for documents, spreadsheets, research, and business workflows&lt;/li&gt;
&lt;li&gt;Longer context and higher token efficiency&lt;/li&gt;
&lt;li&gt;Stricter controls around high-risk capabilities&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For developers, the most interesting thing to test is complex engineering work in Codex. For enterprise users, the bigger question is whether it can turn some cross-tool, cross-document, cross-process work into deliverable output.&lt;/p&gt;
&lt;p&gt;GPT-5.5 is not a small update aimed only at chat experience. It looks more like another step in OpenAI&amp;rsquo;s push toward AI as an execution layer for work.&lt;/p&gt;
&lt;h2 id=&#34;related-links&#34;&gt;Related Links
&lt;/h2&gt;&lt;ul&gt;
&lt;li&gt;&lt;a class=&#34;link&#34; href=&#34;https://openai.com/index/introducing-gpt-5-5/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Introducing GPT-5.5 - OpenAI&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>
        </item>
        
    </channel>
</rss>
