GPT-5.4 mini and nano: the right model isn't the biggest, it's the one that fits your agent
OpenAI launched two new models today — and they are not for you to use directly in chat. They are for delegation. The era of agents has a new cost logic, and it changes how you will build with AI.

There's a question every dev working with AI starts to ask sooner or later: why am I paying for the most expensive model at every step?
You use GPT-5.4 to plan. To write code. To review. To search the codebase. To classify a file. To extract data from a document. All with the same model, all at the same cost, even when the task is trivial.
GPT-5.4 mini and nano arrived today to signal that this usage model is over. Or at least it should be.

What was launched
- GPT-5.4 (flagship, the reference model): ideal for planning, coordination, and final review.
- GPT-5.4 mini (new): input $0.75/M tokens, output $4.50/M tokens, 400k-token context, 30% of flagship quota in Codex.
- GPT-5.4 nano (new, cheapest): input $0.20/M tokens, output $1.25/M tokens, ideal for classification, extraction, and ranking.
The nano is the cheapest model OpenAI has ever launched. $0.20 per million input tokens — for high-volume tasks where you need speed and scale, not deep reasoning.
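The per-million-token prices above make back-of-envelope comparisons easy. A minimal sketch, using the published mini and nano prices (the workload volumes are illustrative, and the flagship is omitted because its prices aren't listed here):

```python
# Back-of-envelope cost comparison using the per-million-token prices above.
# Token volumes are illustrative, not measurements.

PRICES = {  # USD per 1M tokens: (input, output)
    "gpt-5.4-mini": (0.75, 4.50),
    "gpt-5.4-nano": (0.20, 1.25),
}

def cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Total cost in USD for a given token volume."""
    inp, out = PRICES[model]
    return (input_tokens / 1_000_000) * inp + (output_tokens / 1_000_000) * out

# Example: a high-volume job with 10M input tokens and ~1M output tokens.
print(f"mini: ${cost('gpt-5.4-mini', 10_000_000, 1_000_000):.2f}")  # mini: $12.00
print(f"nano: ${cost('gpt-5.4-nano', 10_000_000, 1_000_000):.2f}")  # nano: $3.25
```

Same job, roughly a quarter of the cost on nano. That gap is the whole argument for routing high-volume, low-nuance work downward.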
But is mini good enough?
That's the question that matters. And the benchmarks have an interesting answer.
SWE-bench Pro — code tasks in real repositories:
- GPT-5.4: ~56%
- GPT-5.4 mini: 54.38% — only 2 points behind
- GPT-5.4 nano: ~28%
OSWorld-Verified — computer and interface usage:
- GPT-5.4: 75.03%
- GPT-5.4 mini: 72.13% — 3 points behind
- GPT-5.4 nano: 39.61%
The mini is 2 percentage points behind the flagship in code. In computer usage, 3 points. And it runs more than twice as fast.
This isn't "almost good". It's good enough for 80% of the tasks a coding agent needs to do.
The logic of sub-agents
What OpenAI is signaling goes beyond prices. It's an architectural change — and it's already happening in Codex, their agentic coding engine.
How Codex divides the work
- GPT-5.4: planning, coordination, architectural decisions, final review
- GPT-5.4 mini: parallel sub-agents (codebase search, large file review, support document processing)
- GPT-5.4 nano: high volume (classification, data extraction, ranking, light code support)
The large model thinks. The smaller models execute. In parallel, in volume, without consuming flagship quota for tasks that don't need it.
It's the same logic as microservices applied to AI models: you don't use the most expensive server to serve a static file. You use the right one for each function.
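The division of labor above boils down to a routing table. A minimal sketch of that idea, where the task taxonomy and the default fallback are assumptions of mine, not an actual Codex or OpenAI API:

```python
# Tiered model routing, following the division of labor described above.
# The task categories and the fallback choice are illustrative assumptions.

MODEL_FOR_TASK = {
    # Flagship: the thinking work.
    "plan": "gpt-5.4",
    "final_review": "gpt-5.4",
    # Mini: parallel sub-agent work.
    "search_codebase": "gpt-5.4-mini",
    "review_large_file": "gpt-5.4-mini",
    # Nano: high-volume, low-nuance work.
    "classify": "gpt-5.4-nano",
    "extract": "gpt-5.4-nano",
    "rank": "gpt-5.4-nano",
}

def route(task_type: str) -> str:
    """Pick the model tier for a task; default to mini when unsure."""
    return MODEL_FOR_TASK.get(task_type, "gpt-5.4-mini")

print(route("plan"))      # gpt-5.4
print(route("classify"))  # gpt-5.4-nano
```

The design choice worth noting: when a task doesn't fit a known category, the fallback is mini rather than the flagship, so unknown work defaults to the cheap-but-capable tier instead of silently burning flagship quota.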
What this changes for those building with AI
If you're building anything that calls AI models in multiple stages — whether a coding agent, an analytics pipeline, or an automation built with n8n or LangChain — this model architecture makes much more sense than using the flagship for everything.
Think of a simple pipeline: receive a document, extract structured data, classify by category, generate a summary, review. Each step has a different level of complexity. Using GPT-5.4 for all of them is like hiring a senior architect to do housekeeping.
The math in Codex is straightforward: mini consumes only 30% of the GPT-5.4 quota. For parallel tasks — ten sub-agents running simultaneously — that's the difference between scaling financially and not scaling at all.
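The quota arithmetic is easy to make concrete. A sketch using the 30% figure cited above (the agent count is illustrative, and "flagship-units" is my own label for normalized quota):

```python
# Quota consumed by parallel sub-agents, normalized to flagship units.
# The 0.30 weight for mini is the figure cited in the article;
# the ten-agent scenario is illustrative.

QUOTA_WEIGHT = {"gpt-5.4": 1.0, "gpt-5.4-mini": 0.30}

def quota_used(model: str, n_agents: int) -> float:
    """Quota consumed, in flagship-units, by n parallel agents."""
    return QUOTA_WEIGHT[model] * n_agents

print(quota_used("gpt-5.4", 10))       # 10.0 flagship-units
print(quota_used("gpt-5.4-mini", 10))  # 3.0 — same parallelism, 70% less quota
```

Ten flagship sub-agents burn ten units of quota; ten mini sub-agents burn three. Same fan-out, a third of the budget.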
The nano isn't for everything. At 39% on OSWorld-Verified, it falls well short on tasks that require chained reasoning. It shines in volume and simplicity — classifying, extracting, ranking. If the task has nuance, reach for the mini.
A quote that summarizes it well
OpenAI said something worth remembering:
"The best model is often not the biggest — it's the one that can respond quickly, use tools reliably, and still perform well on complex and specialized tasks."
This is a shift in mindset. For a long time, the race was for increasingly larger models. Now the conversation is shifting towards increasingly suitable models — for the right cost, at the right speed, for the right task.
Key Takeaways
- GPT-5.4 mini is 2-3 points behind the flagship in code and computer usage, while running more than 2× faster.
- GPT-5.4 nano is OpenAI's cheapest model — $0.20/M tokens — for high-volume tasks.
- In Codex, mini consumes 30% of GPT-5.4's quota — real financial scaling in parallel pipelines.
- Using the right model for each task is no longer advanced optimization — it's basic architecture.

