OpenAI Releases GPT-5.4 mini and nano, Approaching Flagship Model Performance at Lower Cost


On Tuesday, OpenAI announced GPT-5.4 mini and GPT-5.4 nano, its two most capable small models to date, which narrow the performance gap with flagship models while offering lower latency and reduced cost.

GPT-5.4 mini surpasses the previous-generation GPT-5 mini across core areas such as programming, reasoning, multimodal understanding, and tool invocation. It runs more than twice as fast as its predecessor and approaches the performance of larger models such as GPT-5.4 on benchmarks like SWE-bench Pro.

GPT-5.4 nano is positioned as the lowest-cost, lowest-latency lightweight option, available only via API, designed for data classification, extraction, and simple programming sub-tasks.

The launch targets real-time interactive scenarios where the high latency of large models has limited deployment, directly addressing rapidly growing commercial markets such as programming assistants, AI agent systems, and multimodal applications.

Mini for consumers, nano with dedicated API

Starting today, GPT-5.4 mini is available across OpenAI API, Codex platform, and ChatGPT.

API pricing for GPT-5.4 mini is $0.75 per million input tokens and $4.50 per million output tokens. It supports text and image inputs, tool invocation, function calling, web search, file retrieval, computer control, and skill extension, with a context window of up to 400,000 tokens.

On the Codex platform, GPT-5.4 mini consumes only 30% of GPT-5.4 quota, reducing costs for simple programming tasks to about one-third of the flagship model. Codex also supports delegating workloads to sub-agents running GPT-5.4 mini, enabling lower-cost models to handle less inference-intensive tasks automatically.

On ChatGPT, free and Plus users can access GPT-5.4 mini via the “+” menu under “Thinking”; other paid users will see this model as an automatic fallback once GPT-5.4 Thinking reaches its rate limit.

GPT-5.4 nano is currently only available via API, priced at $0.20 per million input tokens and $1.25 per million output tokens, making it the most affordable among the new models. OpenAI states that nano is suitable for scenarios where high-level models coordinate and handle supporting tasks for sub-agents.
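The per-token prices quoted above make per-request costs easy to estimate. The following sketch uses the article's published rates; the model identifier strings and the example token counts are illustrative assumptions, and actual billing may differ.

```python
# Estimate request cost from the per-million-token prices quoted in the article.
# Model name strings are assumptions for illustration, not confirmed API identifiers.

PRICES = {  # USD per million tokens: (input, output)
    "gpt-5.4-mini": (0.75, 4.50),
    "gpt-5.4-nano": (0.20, 1.25),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a single API request."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Example: a 10,000-token prompt producing a 2,000-token completion.
mini_cost = request_cost("gpt-5.4-mini", 10_000, 2_000)  # 0.0165 USD
nano_cost = request_cost("gpt-5.4-nano", 10_000, 2_000)  # 0.0045 USD
```

At these rates, nano handles the same request for roughly a quarter of mini's cost, which is why OpenAI positions it for high-volume supporting tasks.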

Mini approaches flagship performance, nano surpasses previous generation

According to OpenAI’s published evaluation data, GPT-5.4 mini performs especially well in programming and multimodal tasks.

On the SWE-bench Pro programming benchmark, mini scores 54.4%, narrowing the gap to GPT-5.4’s 57.7% to just 3.3 percentage points, significantly higher than GPT-5 mini’s 45.7%.

On the OSWorld-Verified computer control benchmark, mini scores 72.1%, close to GPT-5.4’s 75.0%, and far ahead of GPT-5 mini’s 42.0%.

In tool invocation, GPT-5.4 mini scores 93.4% on the τ2-bench telecom test, a notable improvement over GPT-5 mini's 74.1%. On the GPQA Diamond science-reasoning benchmark, mini scores 88.0% and nano 82.8%, both surpassing GPT-5 mini's 81.6%.

It’s worth noting that GPT-5.4 nano underperforms GPT-5 mini on some visual tasks, with an OSWorld-Verified score of 39.0% versus 42.0%. However, in programming and tool invocation tasks, nano still shows significant improvements over previous generations.

OpenAI states that nano’s design prioritizes low latency and low cost over comprehensive performance, and developers should weigh their specific task requirements when choosing.

Sub-agent architecture, multi-model collaboration as a new product paradigm

OpenAI emphasizes the positioning of these models within layered multi-model systems.

For example, in their proprietary coding assistant Codex, GPT-5.4 handles planning, coordination, and final judgment, while GPT-5.4 mini sub-agents handle more granular tasks like codebase retrieval, large file review, and document assistance.

OpenAI notes that as smaller models become faster and more capable, developers no longer need to rely on a single model for all tasks. Instead, they can build systems where large models make decisions, and smaller models execute tasks rapidly at scale. OpenAI states:

GPT-5.4 mini is the most powerful small model we’ve developed for this workflow to date.

This architecture is especially critical for high-concurrency scenarios, such as programming assistants, screenshot analysis, and real-time image understanding, where response latency directly impacts user experience. The best choice is often not the most capable model but one that balances speed, tool reliability, and task performance.
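The layered pattern described above can be sketched as a simple task router. Everything here is a toy illustration under stated assumptions: the model names and the keyword-based classification rule are hypothetical, not OpenAI's actual Codex routing logic.

```python
# Toy sketch of a "large model plans, small model executes" pipeline.
# Model names and the routing heuristic are illustrative assumptions.

PLANNER = "gpt-5.4"       # handles planning, coordination, final judgment
WORKER = "gpt-5.4-mini"   # fast, cheap sub-agent for granular tasks

# Keywords marking lightweight sub-tasks a worker model could absorb.
LIGHTWEIGHT_HINTS = ("retrieve", "search", "review", "summarize")

def route(task: str) -> str:
    """Pick a model for a task: the cheap worker for lightweight steps,
    the flagship planner for everything else."""
    if any(hint in task.lower() for hint in LIGHTWEIGHT_HINTS):
        return WORKER
    return PLANNER

tasks = [
    "Retrieve matching files from the codebase",
    "Review this large diff for obvious issues",
    "Decide the overall refactoring plan",
]
assignments = {task: route(task) for task in tasks}
```

A production system would classify tasks with the planner model itself rather than keywords, but the cost structure is the same: decisions flow through the flagship model once, while the bulk of token volume lands on the smaller model.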

For developers, the release of GPT-5.4 mini and nano further clarifies the path to significantly reducing inference costs without sacrificing overall system intelligence.

